Home
JAQForum Ver 20.06
Log In or Join  
Active Topics
Local Time 18:01 19 Apr 2024 Privacy Policy
Jump to

Notice. New forum software under development. It's going to miss a few functions and look a bit ugly for a while, but I'm working on it full time now as the old forum was too unstable. Couple days, all good. If you notice any issues, please contact me.

Forum Index : Microcontroller and PC projects : CMM2: performance struggles with MMBasic

     Page 2 of 2    
Author Message
abraxas
Regular Member

Joined: 16/06/2020
Location: Canada
Posts: 99
Posted: 02:47am 07 Jul 2020
Copy link to clipboard 
Print this post

Just sat down in front of my CMM2. I think I’m going to try rendering with 4 large blits instead and keep track of visible bling in a separate array. There is a chance that with a lot of objects moving at once the performance might get uneven but the potential boost from doing 4 blits rather than 240 is too tempting. I’ll post the new version if I get it working tonight.
 
JohnS
Guru

Joined: 18/11/2011
Location: United Kingdom
Posts: 3649
Posted: 06:49am 07 Jul 2020
Copy link to clipboard 
Print this post

Maybe drop to C to do big array scanning and from there mark which items need actual work in the Basic part of the program.

John
 
capsikin
Guru

Joined: 30/06/2020
Location: Australia
Posts: 341
Posted: 07:17am 07 Jul 2020
Copy link to clipboard 
Print this post

  abraxas said  Just sat down in front of my CMM2. I think I’m going to try rendering with 4 large blits instead and keep track of visible bling in a separate array. There is a chance that with a lot of objects moving at once the performance might get uneven but the potential boost from doing 4 blits rather than 240 is too tempting. I’ll post the new version if I get it working tonight.


I managed to get the version you already posted kind of working on my picromite, but due to the speed difference I don't have a feel for the performance. Do you have the render time and tick time you were getting?

I might still have a go at telling the renderer not to draw tiles except changed ones and animated ones.
 
capsikin
Guru

Joined: 30/06/2020
Location: Australia
Posts: 341
Posted: 10:57am 07 Jul 2020
Copy link to clipboard 
Print this post

  abraxas said  Caspian and others,

I'm attaching the latest version of the code. It already runs acceptably though not blazingly fast. It's visible that CMM2 struggles to keep up with rendering as you moved outside the viewport and force the viewport to "catch up". It has this delay that the ZX Spectrum version also exhibits. Now, it's possible to cure it by making the viewport "chase" Rockford one tile per move when Rockford gets close to the edge. That's how Mauro's port worked and some other 8 bits ports did (e.g. MSX). I do have an issue with it however, as it strays from the original which instead tries to center the viewport on Rockford. This leads to different tradeoffs and thus affects the gameplay. In those implementations that follow the technique that Mauro used it's easier to escape baddies that are chasing you but harder to view a larger area of the map once your viewport moves. In my version which mimics what Peter Liepa's code did it's the opposite tradeoff. Now there is a setting in my code called "SCROLL_STEP" that is currently set to 6 pixels. I can make it bigger and the viewport will update more quickly but will be more jarring.

Additionally to help with the feel of the game I bail out of the RENDER sub any time the game requires a new compute of the game loop. This change improved playability a lot but does lead to occasional visual artefacts especially when you look at diamond animations. There may be some trial and error possibly by messing with the timing of the animation clock to make it less obvious. I haven't spent much time refining that.

A lot of the ideas you and the others talked about I'd already incorporated into the code. I don't have the "active/inactive" tile marking but I do mark dirt, empty spaces and steel walls as "inert" and shunt the eval by first testing for that flag.

I'm thinking about adding a "upToDate" flag on every tile and somehow use that in rendering. The animated tiles throw a big monkey wrench in it though. Additionally I'd rather have a more consistent rendering time that's easier to account for than a wildly variable situation with huge swings in performance depending on how much bling is visible in the viewport. I will for sure experiment more with the rendering routine.

Anyway for now I have the falling behavior implemented and debugged and Rockford movement as well as viewport repositioning working together. You can  walk around the screen using the cursor keys. I'd love for anyone even remotely interested in my efforts to download the attachment and have a play. Run the "game.bas" file. I very much welcome comments and suggestions for improvements.


BDASH.zip


I think the upToDate flag you were thinking of is the same idea as my needsDrawing array, just with the opposite value. You can see how I dealt with that in the code - basically, I never set animated tiles as upToDate.

I've attached a rough demo of my improvement (I've modified your "game.bas" file, with the differences extracted to game_opt_CMM2.patch).

It seems to have improved rendering speed by a factor of 4 when all the diamonds on the first screen are animating, or even more when there's less animation. But this isn't on the CMM2.

It should set all tiles to draw when you press "r", in case any don't work right.

There's a couple of improvements it should have, but I didn't want to do them myself since I can't test the changes on a CMM2:

I haven't set the tiles to draw at the start of the game, until you press "r". This has the advantage you can see how fast it goes when there's no animation, but it should really be fixed.

And I commented out the blit-based scrolling I was trying, since the pi-cromite version I made would be incompatible with the CMM2. It was based on your earlier version, and modified it for the pi-cromite, but I don't think I got it working perfectly.

So scrolling is extra-slow. It marks all tiles on screen as needing to redraw when it needs to scroll. You can take that out though, if you put the blit-scrolling back in.

BD_OPT.zip

Oops, I forgot to ask - can someone test that it works? I haven't tested it on the CMM2, I ported my changes from the pi-cromite version I made.
Edited 2020-07-07 20:59 by capsikin
 
abraxas
Regular Member

Joined: 16/06/2020
Location: Canada
Posts: 99
Posted: 08:00pm 07 Jul 2020
Copy link to clipboard 
Print this post

Caspian,
I haven't had a chance to try your version. I don't own a pi-cromite so this will be a bit clumsy until your CMM2 arrives.

For what it's worth on the CMM2 the render loop runs between 25-28ms and the game logic loop around 56-58ms.

The game clock should be around 110ms for a C64 "hard level" feel so that's our time budget. With that budget we can basically afford 1 game compute cycle (mandatory) and something like 2 render cycles. Which is OK but not great. It means we are running at something around 20fps. Playable but not butter smooth.

I've been working from my end trying the following approach:
1. expand the game loop logic to BLIT altered tiles onto 4 separate graphics pages
2. during the game loop mark tiles that have bling (e.g. diamonds, fireflies etc) so we can animate them in the render function
3. change the render function to just do 4 BLIT calls to show the viewport and then iterate over an array of visible bling and update that.

However, I only got as far as step 1 last night and lo and behold the timing for the game logic compute increased from 56ms to a whopping 97ms JUST because I have two additional IF statements in the inner loop. One to figure out if a tile has changed and therefore needs a BLIT. Another to figure out which page to write to depending on which quadrant of the map we are drawing.

This means it's _likely_ that doing step 2 and 3 is a waste of time because nearly the whole time budget is now being eaten up by game loop logic. Granted 2 & 3 will likely make RENDER run in less than 5ms but it still means we will at most have the budget for 2, maybe 3 calls before it's time to reprocess the room map for the next game tick.

A lot of intuitions that come to those of us coding on modern computers have to be put aside. CMM2 is a very odd beast. Every line of BASIC is relatively expensive but most graphics calls are pretty cheap. Thus it's oftentimes just as good or better to do the "dumb" thing and just iterate without performing intricate logic to figure out what needs redrawing. This may well be one of these situations. I will pursue my code a bit more and study yours later tonight but I wanted to give you an interim update.
Edited 2020-07-08 06:23 by abraxas
 
capsikin
Guru

Joined: 30/06/2020
Location: Australia
Posts: 341
Posted: 01:48am 21 Jul 2020
Copy link to clipboard 
Print this post

I tried that code I uploaded earlier on the CMM2, it is buggy enough I wouldn't recommend running it except to try and understand my optimisation idea. Though for that I've also used the same idea in gamecjmskipcells.bas in my new version.

I liked your approach in the MMBasic optimization tips post, of skipping whole rows at a time. I've tried combining that with my previous optimisation and a new one, and it seems to be pretty fast.

The new optimisation I added, is to keep track of the first and last location of each row on the map that needs to be processed, instead of looking at the whole width of the map, or the screen. I implemented it twice in the same program, once to optimise drawing tiles, once to optimise updating the game simulation.

It did get quite complex with all the optimisations together, and I wanted to see if I could make it simpler so I've also got some copies with only some of the optimisations.

game16-bounds.bas has all the optimisations in and should be fastest.

earlier versions from before I added all optimisations
gamecjmskipcells.bas
gamecjmrenderbounds.bas

later versions where I removed some optimisations:
game21-boundwithoutskipcells.bas
game31-bounds-without-renderbounds.bas

BDASH.zip

p.s. I saw you've got a new version coming along, I'm looking forward to seeing it.
Edited 2020-07-21 12:14 by capsikin
 
abraxas
Regular Member

Joined: 16/06/2020
Location: Canada
Posts: 99
Posted: 02:59am 21 Jul 2020
Copy link to clipboard 
Print this post

  capsikin said  I tried that code I uploaded earlier on the CMM2, it is buggy enough I wouldn't recommend running it except to try and understand my optimisation idea. Though for that I've also used the same idea in gamecjmskipcells.bas in my new version.

I liked your approach in the MMBasic optimization tips post, of skipping whole rows at a time. I've tried combining that with my previous optimisation and a new one, and it seems to be pretty fast.

The new optimisation I added, is to keep track of the first and last location of each row on the map that needs to be processed, instead of looking at the whole width of the map, or the screen. I implemented it twice in the same program, once to optimise drawing tiles, once to optimise updating the game simulation.

It did get quite complex with all the optimisations together, and I wanted to see if I could make it simpler so I've also got some copies with only some of the optimisations.

game16-bounds.bas has all the optimisations in and should be fastest.

earlier versions from before I added all optimisations
gamecjmskipcells.bas
gamecjmrenderbounds.bas

later versions where I removed some optimisations:
game21-boundwithoutskipcells.bas
game31-bounds-without-renderbounds.bas

BDASH.zip

p.s. I saw you've got a new version coming along, I'm looking forward to seeing it.


Yeah, I'm neck deep into a slew of optimizations myself. I decided to pursue a fairly risky but so far maybe the most promising approach. I created two separate lists - one for scanning and one for performing actions. It's not necessarily the most cpu frugal approach but keeps the implementation easier to reason about. Instead of having a skip list I now have a command list. The trick is to always make sure that an action in a cell then triggers a scan for more action around it thus maintaining the chain of causality. Still, I found it less bug prone and more performant than the approach with skip lists. I'm still struggling a bit with the performance of the render loop itself especially during the viewport scroll. That's an area where the optimization is still necessary if the game is to hit 60 or 75 fps consistently. I'll post the updated version of the code in the next reply when I open the laptop as I'm replying from my phone. At the same time I'll take a look at what you've done and see if I can draw some inspiration.
 
abraxas
Regular Member

Joined: 16/06/2020
Location: Canada
Posts: 99
Posted: 03:21am 21 Jul 2020
Copy link to clipboard 
Print this post

  abraxas said   I'll post the updated version of the code in the next reply when I open the laptop as I'm replying from my phone. At the same time I'll take a look at what you've done and see if I can draw some inspiration.


Here's the promised new version of the engine. There might still be some remnants of the previous versions there. I'll have to tidy up what's no longer in use... that said, I think this version is starting to look a bit "cleaner" architecturally as there is no longer any need for a "scanned" flag on any tile. Instead the frame processing breaks down into two stages: the scanning stage - where cells of interest are examined for pending changes and the command stage - where relevant changes are applied. The advantage to this approach is that all scanning happens in stage one and then commands are applied to the map in the second stage only thus preventing double scanning and automagically resolving "conflicts" such as  who or what should enter an empty tile.

The "main" sub of this version is called CAVELOOP and the two local structures that are critical is the scanList() and the cmdList(). The trick is that they must also be sorted to ensure the cave is scanned and altered top-bottom-left-to-right just as in the original game. PROCESSTICK is a sub that run a single iteration of scanning and command playback. It also has a fairly simple renderSkipList() invalidation check (lines 167-172). There is some silly code in ADDSCAN* subs that could be made more readable with FOR loops but instead is hand unrolled. I should actually test if that optimization is actually making things better as MMBasic can be fairly surprising in such scenarios. In any case, those SUBs are long but trivial. They just generate a bunch of scan commands around a tile.

RENDERSCREEN is mostly the same as before with the main difference that I'm now drawing columns rather than rows so the renderSkipList has a somewhat higher resolution for caves that have more width than height (for example, all Peter Liepa's caves).
BDASH.zip
 
capsikin
Guru

Joined: 30/06/2020
Location: Australia
Posts: 341
Posted: 03:41am 21 Jul 2020
Copy link to clipboard 
Print this post

  abraxas said  

Yeah, I'm neck deep into a slew of optimizations myself. I decided to pursue a fairly risky but so far maybe the most promising approach. I created two separate lists - one for scanning and one for performing actions. It's not necessarily the most cpu frugal approach but keeps the implementation easier to reason about. Instead of having a skip list I now have a command list. The trick is to always make sure that an action in a cell then triggers a scan for more action around it thus maintaining the chain of causality. Still, I found it less bug prone and more performant than the approach with skip lists.

Sounds interesting, I'll have a look at how you do that in the code.

  abraxas said  
I'm still struggling a bit with the performance of the render loop itself especially during the viewport scroll. That's an area where the optimization is still necessary if the game is to hit 60 or 75 fps consistently.

Yes, I found during scrolling I needed to avoid drawing the whole screen tile by tile, and to scroll most of the screen in a single blit. For vertical scrolling your old skip list system was fine for keeping track of tiles to redraw, but for horizontal/diagonal scrolling I had to use other methods.

I currently draw a two-tile wide strip of the tiles scrolling onto the screen. This could maybe be reduced to one tile, but I think a two-tile strip may be fast enough.

  abraxas said  
I'll post the updated version of the code in the next reply when I open the laptop as I'm replying from my phone. At the same time I'll take a look at what you've done and see if I can draw some inspiration.


For the scrolling render performance, you could look at my gamecjmskipcells.bas where I mark individual tile locations to skip displaying, or my later version game21-boundwithoutskipcells.bas where I'd taken that out, and kept track of left and right bounds of what needed displaying (though I'd also added tracking left and right bounds for game updates in other parts of the code).

gamecjmrenderbounds.bas has a combination of both optimisations.
 
capsikin
Guru

Joined: 30/06/2020
Location: Australia
Posts: 341
Posted: 03:58am 22 Jul 2020
Copy link to clipboard 
Print this post

I like the new features - with the sound and ability to push rocks it was fun to have a little play around in.

I like how the game logic loop eliminates almost all processing except near changes. It took me a while to understand but I've almost got it.

I did notice a very minor change in rock falling behaviour, I don't know if you need 100% compatibility with maps depending on the previous behaviour. If a rock is set to roll left in the SCANTILE routine, but something else falls into that left spot from above or further left, then during PROCESSCOMMAND the rock gives up instead of falling to the right (it may fall to the right next turn if the space is still free, it's not doing anything really weird, just a little different)

I'm also not sure I completely understand the program, because I was trying to cause a glitch I expected, but failed.

I thought if there's a rock above an empty space, it should be set to fall in the SCANTILE routine, but if I'm to the left of it (and so processed earlier in the list), I should be able to push it before it completes the fall in the PROCESSCOMMAND routine. And so the falling command would then affect me instead of the rock. To arrange for a rock to be above an empty space, I could have already pushed it over the space, or it could be part of a stream of rocks falling from above.

However, in no case could I push any rock before it fell. This isn't a problem except I don't understand what's going on.

One last comment - it's fast enough now that it's getting tricky to notice when the rendering is too slow.
 
     Page 2 of 2    
Print this page


To reply to this topic, you need to log in.

© JAQ Software 2024