Notice. New forum software under development. It's going to miss a few functions and look a bit ugly for a while, but I'm working on it full time now as the old forum was too unstable. Couple days, all good. If you notice any issues, please contact me.
|
Forum Index : Microcontroller and PC projects : CMM2: performance struggles with MMBasic
Page 2 of 2 | |||||
Author | Message | ||||
abraxas Regular Member Joined: 16/06/2020 Location: CanadaPosts: 99 |
Just sat down in front of my CMM2. I think I’m going to try rendering with 4 large blits instead and keep track of visible bling in a separate array. There is a chance that with a lot of objects moving at once the performance might get uneven but the potential boost from doing 4 blits rather than 240 is too tempting. I’ll post the new version if I get it working tonight. |
||||
JohnS Guru Joined: 18/11/2011 Location: United KingdomPosts: 3649 |
Maybe drop to C to do big array scanning and from there mark which items need actual work in the Basic part of the program. John |
||||
capsikin Guru Joined: 30/06/2020 Location: AustraliaPosts: 341 |
I managed to get the version you already posted kind of working on my picromite, but due to the speed difference I don't have a feel for the performance. Do you have the render time and tick time you were getting? I might still have a go at telling the renderer not to draw tiles except changed ones and animated ones. |
||||
capsikin Guru Joined: 30/06/2020 Location: AustraliaPosts: 341 |
I think the upToDate flag you were thinking of is the same idea as my needsDrawing array, just with the opposite value. You can see how I dealt with that in the code - basically, I never set animated tiles as upToDate. I've attached a rough demo of my improvement (I've modified your "game.bas" file, with the differences extracted to game_opt_CMM2.patch). It seems to have improved rendering speed by a factor of 4 when all the diamonds on the first screen are animating, or even more when there's less animation. But this isn't on the CMM2. It should set all tiles to draw when you press "r", in case any don't work right. There's a couple of improvements it should have, but I didn't want to do them myself since I can't test the changes on a CMM2: I haven't set the tiles to draw at the start of the game, until you press "r". This has the advantage you can see how fast it goes when there's no animation, but it should really be fixed. And I commented out the blit-based scrolling I was trying, since the pi-cromite version I made would be incompatible with the CMM2. It was based on your earlier version, and modified it for the pi-cromite, but I don't think I got it working perfectly. So scrolling is extra-slow. It marks all tiles on screen as needing to redraw when it needs to scroll. You can take that out though, if you put the blit-scrolling back in. BD_OPT.zip Oops, I forgot to ask - can someone test that it works? I haven't tested it on the CMM2, I ported my changes from the pi-cromite version I made. Edited 2020-07-07 20:59 by capsikin |
||||
abraxas Regular Member Joined: 16/06/2020 Location: CanadaPosts: 99 |
Caspian, I haven't had a chance to try your version. I don't own a pi-cromite so this will be a bit clumsy until your CMM2 arrives. For what it's worth on the CMM2 the render loop runs between 25-28ms and the game logic loop around 56-58ms. The game clock should be around 110ms for a C64 "hard level" feel so that's our time budget. With that budget we can basically afford 1 game compute cycle (mandatory) and something like 2 render cycles. Which is OK but not great. It means we are running at something around 20fps. Playable but not butter smooth. I've been working from my end trying the following approach: 1. expand the game loop logic to BLIT altered tiles onto 4 separate graphics pages 2. during the game loop mark tiles that have bling (e.g. diamonds, fireflies etc) so we can animate them in the render function 3. change the render function to just do 4 BLIT calls to show the viewport and then iterate over an array of visible bling and update that. However, I only got as far as step 1 last night and lo and behold the timing for the game logic compute increased from 56ms to a whopping 97ms JUST because I have two additional IF statements in the inner loop. One to figure out if a tile has changed and therefore needs a BLIT. Another to figure out which page to write to depending on which quadrant of the map we are drawing. This means it's _likely_ that doing step 2 and 3 is a waste of time because nearly the whole time budget is now being eaten up by game loop logic. Granted 2 & 3 will likely make RENDER run in less than 5ms but it still means we will at most have the budget for 2, maybe 3 calls before it's time to reprocess the room map for the next game tick. A lot of intuitions that come to those of us coding on modern computers have to be put aside. CMM2 is a very odd beast. Every line of BASIC is relatively expensive but most graphics calls are pretty cheap. Thus it's oftentimes just as good or better to do the "dumb" thing and just iterate without performing intricate logic to figure out what needs redrawing. This may well be one of these situations. I will pursue my code a bit more and study yours later tonight but I wanted to give you an interim update. Edited 2020-07-08 06:23 by abraxas |
||||
capsikin Guru Joined: 30/06/2020 Location: AustraliaPosts: 341 |
I tried that code I uploaded earlier on the CMM2, it is buggy enough I wouldn't recommend running it except to try and understand my optimisation idea. Though for that I've also used the same idea in gamecjmskipcells.bas in my new version. I liked your approach in the MMBasic optimization tips post, of skipping whole rows at a time. I've tried combining that with my previous optimisation and a new one, and it seems to be pretty fast. The new optimisation I added, is to keep track of the first and last location of each row on the map that needs to be processed, instead of looking at the whole width of the map, or the screen. I implemented it twice in the same program, once to optimise drawing tiles, once to optimise updating the game simulation. It did get quite complex with all the optimisations together, and I wanted to see if I could make it simpler so I've also got some copies with only some of the optimisations. game16-bounds.bas has all the optimisations in and should be fastest. earlier versions from before I added all optimisations gamecjmskipcells.bas gamecjmrenderbounds.bas later versions where I removed some optimisations: game21-boundwithoutskipcells.bas game31-bounds-without-renderbounds.bas BDASH.zip p.s. I saw you've got a new version coming along, I'm looking forward to seeing it. Edited 2020-07-21 12:14 by capsikin |
||||
abraxas Regular Member Joined: 16/06/2020 Location: CanadaPosts: 99 |
Yeah, I'm neck deep into a slew of optimizations myself. I decided to pursue a fairly risky but so far maybe the most promising approach. I created two separate lists - one for scanning and one for performing actions. It's not necessarily the most cpu frugal approach but keeps the implementation easier to reason about. Instead of having a skip list I now have a command list. The trick is to always make sure that an action in a cell then triggers a scan for more action around it thus maintaining the chain of causality. Still, I found it less bug prone and more performant than the approach with skip lists. I'm still struggling a bit with the performance of the render loop itself especially during the viewport scroll. That's an area where the optimization is still necessary if the game is to hit 60 or 75 fps consistently. I'll post the updated version of the code in the next reply when I open the laptop as I'm replying from my phone. At the same time I'll take a look at what you've done and see if I can draw some inspiration. |
||||
abraxas Regular Member Joined: 16/06/2020 Location: CanadaPosts: 99 |
Here's the promised new version of the engine. There might still be some remnants of the previous versions there. I'll have to tidy up what's no longer in use... that said, I think this version is starting to look a bit "cleaner" architecturally as there is no longer any need for a "scanned" flag on any tile. Instead the frame processing breaks down into two stages: the scanning stage - where cells of interest are examined for pending changes and the command stage - where relevant changes are applied. The advantage to this approach is that all scanning happens in stage one and then commands are applied to the map in the second stage only thus preventing double scanning and automagically resolving "conflicts" such as who or what should enter an empty tile. The "main" sub of this version is called CAVELOOP and the two local structures that are critical is the scanList() and the cmdList(). The trick is that they must also be sorted to ensure the cave is scanned and altered top-bottom-left-to-right just as in the original game. PROCESSTICK is a sub that run a single iteration of scanning and command playback. It also has a fairly simple renderSkipList() invalidation check (lines 167-172). There is some silly code in ADDSCAN* subs that could be made more readable with FOR loops but instead is hand unrolled. I should actually test if that optimization is actually making things better as MMBasic can be fairly surprising in such scenarios. In any case, those SUBs are long but trivial. They just generate a bunch of scan commands around a tile. RENDERSCREEN is mostly the same as before with the main difference that I'm now drawing columns rather than rows so the renderSkipList has a somewhat higher resolution for caves that have more width than height (for example, all Peter Liepa's caves). BDASH.zip |
||||
capsikin Guru Joined: 30/06/2020 Location: AustraliaPosts: 341 |
Sounds interesting, I'll have a look at how you do that in the code. Yes, I found during scrolling I needed to avoid drawing the whole screen tile by tile, and to scroll most of the screen in a single blit. For vertical scrolling your old skip list system was fine for keeping track of tiles to redraw, but for horizontal/diagonal scrolling I had to use other methods. I currently draw a two-tile wide strip of the tiles scrolling onto the screen. This could maybe be reduced to one tile, but I think a two-tile strip may be fast enough. For the scrolling render performance, you could look at my gamecjmskipcells.bas where I mark individual tile locations to skip displaying, or my later version game21-boundwithoutskipcells.bas where I'd taken that out, and kept track of left and right bounds of what needed displaying (though I'd also added tracking left and right bounds for game updates in other parts of the code). gamecjmrenderbounds.bas has a combination of both optimisations. |
||||
capsikin Guru Joined: 30/06/2020 Location: AustraliaPosts: 341 |
I like the new features - with the sound and ability to push rocks it was fun to have a little play around in. I like how the game logic loop eliminates almost all processing except near changes. It took me a while to understand but I've almost got it. I did notice a very minor change in rock falling behaviour, I don't know if you need 100% compatibility with maps depending on the previous behaviour. If a rock is set to roll left in the SCANTILE routine, but something else falls into that left spot from above or further left, then during PROCESSCOMMAND the rock gives up instead of falling to the right (it may fall to the right next turn if the space is still free, it's not doing anything really weird, just a little different) I'm also not sure I completely understand the program, because I was trying to cause a glitch I expected, but failed. I thought if there's a rock above an empty space, it should be set to fall in the SCANTILE routine, but if I'm to the left of it (and so processed earlier in the list), I should be able to push it before it completes the fall in the PROCESSCOMMAND routine. And so the falling command would then affect me instead of the rock. To arrange for a rock to be above an empty space, I could have already pushed it over the space, or it could be part of a stream of rocks falling from above. However, in no case could I push any rock before it fell. This isn't a problem except I don't understand what's going on. One last comment - it's fast enough now that it's getting tricky to notice when the rendering is too slow. |
||||
Page 2 of 2 |
Print this page |