1.8 times speed improvement in MMBasic?
| Author | Message | ||||
| matherp Guru Joined: 11/12/2012 Location: United Kingdom Posts: 11201 |
For some programs....

I've been playing (together with my mate) looking at every avenue possible to speed up MMBasic, and the answer is that there is very little sensible left to do unless...

What is the most used statement in most programs? Answer: LET.

I wonder if we could cache LET statements so that there was no variable lookup except the first time in, and no recursive parsing: sort of compile them on the fly.

And it works. Here is a test program:

    Option BASE 0
    Dim col(7)
    col(0)=RGB(0,0,0) ' Black
    col(1)=RGB(255,0,0) ' Red
    col(2)=RGB(0,255,0) ' Green
    col(3)=RGB(0,0,255) ' Blue
    col(4)=RGB(255,255,0) ' Yellow
    col(5)=RGB(0,255,255) ' Cyan
    col(6)=RGB(255,0,255) ' Magenta
    col(7)=RGB(255,255,255)' White
    MODE 2 ' 320x240
    Timer =0
    xmin = -2.0
    xmax = 1.0
    ymin = -1.2
    ymax = 1.2
    maxiter = 32
    Dim xp(319),yp(319),cp(319)
    For i=0 To 319:xp(i)=i:Next
    For py = 0 To 239
      Math set py,yp()
      y0 = ymin + (ymax-ymin)*py/239
      For px = 0 To 319
        x0 = xmin + (xmax-xmin)*px/319
        x = 0
        y = 0
        iter = 0
        Do While x*x + y*y <= 4 And iter < maxiter
          xtemp = x*x - y*y + x0
          y = 2*x*y + y0
          x = xtemp
          iter = iter + 1
        Loop
        If iter = maxiter Then
          c = 0
        Else
          c = (iter Mod 7) + 1
        EndIf
        cp(px)=col(c)
      Next px
      Pixel xp(),yp(),cp()
    Next py
    Print Timer
    End

On an HDMIUSB system with OPTION RESOLUTION 640,378000 set this takes 42.4 seconds.

However, add the magic line at the top

    Option tracecache on

and.....

[image: result with the trace cache enabled]

You can try an experimental version of the code if you want:
PicoMite (2).zip

You can also try adding OPTION PROFILING ON at the top of the program. |
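For readers who want a feel for the idea in the post above (parse a LET and look up its variables only the first time it is reached, then replay a compiled form), here is a rough Python sketch. It is an illustration under assumptions, not the firmware's implementation: the names `TraceCache` and `run_let` are hypothetical, the cache is assumed to be keyed by the statement's position in the program, and Python's built-in `compile`/`eval` stand in for whatever MMBasic does internally.

```python
# Illustrative sketch only: a "compile on first use" cache for LET statements.
# All names and the cache layout are hypothetical, not MMBasic internals.

class TraceCache:
    def __init__(self, slots=128):
        self.slots = slots      # maximum number of cached statements
        self.cache = {}         # statement position -> (target, compiled expr)
        self.compiles = 0       # times a statement had to be parsed
        self.replays = 0        # times a cached statement was reused

    def run_let(self, pos, source, variables):
        """Execute 'target = expression' located at program position pos."""
        entry = self.cache.get(pos)
        if entry is None:
            # Slow path: split and parse the statement text.
            target, expr = (part.strip() for part in source.split("=", 1))
            entry = (target, compile(expr, "<let>", "eval"))
            self.compiles += 1
            if len(self.cache) < self.slots:
                self.cache[pos] = entry     # keep it only while slots remain
        else:
            self.replays += 1               # fast path: no parsing at all
        target, code = entry
        variables[target] = eval(code, {"__builtins__": {}}, variables)


# The same statement executed 1000 times is parsed once and replayed 999 times.
tc = TraceCache()
v = {"x": 0}
for _ in range(1000):
    tc.run_let(10, "x = x + 1", v)
print(v["x"], tc.compiles, tc.replays)      # prints: 1000 1 999
```

A tight inner loop such as the Mandelbrot `Do While` body in the test program is where this pattern pays off, because the same few assignments are executed many thousands of times.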
||||
| Volhout Guru Joined: 05/03/2018 Location: Netherlands Posts: 5859 |
Hi Peter,

What is the drawback? If there is no penalty, why not make it the default?

Volhout

PicomiteVGA PETSCII ROBOTS |
||||
| matherp Guru Joined: 11/12/2012 Location: United Kingdom Posts: 11201 |
Increased image size, and increased RAM usage both for the image and in use. In some programs it will make no difference or even run very slightly slower. It also has more functionality that needs active user involvement to get the best out of it. The example is a best case.

Here is a Pico RP2040 version:
PicoMite.zip

Additional functionality:

OPTION TRACECACHE ON n
' n is the number of cache slots - defaults to 128. Each slot uses about 400 bytes of heap

OPTION CACHE SUB subname [,subname...]
This specifies which subroutines should be optimised. It avoids filling the limited cache with subs that don't need it. Use OPTION PROFILING ON to see the "HOT" subs. Note that the profiling report is produced when the program has an explicit END statement.

OPTION CACHE DEBUG ON
Provides a diagnostic of LET statements that can't be optimised. Having lots of these is what can slow the program down.

Limitations:
Maximum 4 variable names in a statement to pass optimisation
Can't optimise a LET statement with user functions

The test program on an RP2040 @ 420MHz goes from 74 seconds to 42 with an ILI9341.

Edited 2026-04-19 04:06 by matherp |
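The figures quoted above lend themselves to a quick back-of-envelope check. The Python sketch below assumes the "about 400 bytes" per slot is a flat cost (the post only gives it as an approximation), and the function names are made up for illustration.

```python
# Rough arithmetic on the figures quoted in the post. The per-slot cost is
# approximate ("about 400 bytes"), so treat the results as estimates.

BYTES_PER_SLOT = 400    # assumption: flat cost per cache slot


def cache_heap_bytes(slots=128):
    """Estimated heap used by a trace cache with the given number of slots."""
    return slots * BYTES_PER_SLOT


def speedup(before, after):
    """Ratio of the old run time to the new run time."""
    return before / after


print(cache_heap_bytes())           # default 128 slots -> 51200 bytes
print(cache_heap_bytes(16))         # a small 16-slot cache -> 6400 bytes
print(round(speedup(74, 42), 2))    # RP2040 test program -> 1.76
```

So the default cache costs roughly 50 KB of heap, and the RP2040 result of 74 seconds down to 42 works out to about 1.76 times faster, in line with the thread title.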
||||
| Bleep Guru Joined: 09/01/2022 Location: United Kingdom Posts: 786 |
Hi Peter,

How about a 3X speedup! Try Bubble, attached. It was 308ms per update; even without the trace cache option on it was down to 280ms, but with the trace cache it's 109ms!!!!

Starfield is a bit more sedate, at 103ms down to 72ms, about 40% faster.

No idea why Bubble is so good; presumably it's almost a perfect fit for the caching?

This is a HDMIUSBI2S board, Resolution 640,378000.

Regards
Kevin.

bubble2350vga.zip |
||||
| matherp Guru Joined: 11/12/2012 Location: United Kingdom Posts: 11201 |
Here is the cache debug and profile report for bubble running for 50 iterations:

[TC-BAD] (top): n(g)= RGB( 0,255,0)
[TC-BAD] (top): n(g)= RGB( a* 3.93,g* 2.575,128* (a+ g< 65))
[TC-BAD] (top): n(g)= RGB( 255,255,0)
[PERF] elapsed=5777109 us statements=1711967 findvar=136196 (locals=0 [0%] globals=136196 [100%]) user_subs=0
[PERF] tracecache: enabled=1 size=64 replays=1323401 compiles_ok=8 compiles_bad=3
[PERF] tracecache: lookup_null=0 alloc_fail=0 optin_skip=0
[PERF] top commands by dispatch count:
1330007 Let
339966 Next
13200 Math
7550 If
6030 EndIf
3417 For
3366 Memory
3350 Inc
3300 Pixel
1520 Else
52 FRAMEBUFFER
51 CLS
50 Loop
50 Print
50 Timer
3 Dim
1 Do
1 End
1 Const
1 Font

As you can see, of 1330007 LET statements 1323401 have been "compiled", so there is no parsing when that statement is executed. This is where the huge speed up comes from. |
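To put a number on the report above, the Python sketch below pulls the replay count and the LET dispatch count out of two of the quoted lines and computes the fraction of LET executions served from the cache. The parsing is an assumption based purely on how the report looks in this post; the output format belongs to experimental firmware and may well change.

```python
import re

# Two lines copied from the report in the post (dispatch list truncated).
tracecache_line = ("[PERF] tracecache: enabled=1 size=64 replays=1323401 "
                   "compiles_ok=8 compiles_bad=3")
dispatch_text = "1330007 Let 339966 Next 13200 Math"

# key=value pairs such as replays=1323401
stats = {key: int(value)
         for key, value in re.findall(r"(\w+)=(\d+)", tracecache_line)}

# "count Command" pairs such as "1330007 Let"
counts = {name: int(n)
          for n, name in re.findall(r"(\d+) ([A-Za-z]+)", dispatch_text)}

fraction = stats["replays"] / counts["Let"]
print(f"{fraction:.2%} of LET dispatches were replayed from the cache")
# prints: 99.50% of LET dispatches were replayed from the cache
```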
||||
| toml_12953 Guru Joined: 13/02/2015 Location: United States Posts: 602 |
Some BASIC interpreters also pre-compile the address for NEXT, LOOP, DATA, GOTO and GOSUB so that lookup is only done once when the statement is first encountered. That can speed up things quite a bit, especially when the loop occurs toward the end of a very long program. Normally the interpreter has to search from the beginning of the program every time a transfer is made but precompiling allows it to jump directly to the target statement. |
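The technique described in this post, resolving a jump target once and remembering it, can be sketched in Python as follows. This is a generic illustration, not how MMBasic or any particular interpreter does it: the `JumpResolver` class, the `(label, statement)` program layout, and caching by label rather than by the jumping statement are all assumptions made for the example.

```python
# Illustrative sketch: cache the resolved target of a jump so the linear
# search through the program happens only on first use. Hypothetical names.

class JumpResolver:
    def __init__(self, program):
        self.program = program   # list of (label or None, statement text)
        self.targets = {}        # label -> line index, filled on first use
        self.searches = 0        # number of full linear searches performed

    def resolve(self, label):
        index = self.targets.get(label)
        if index is None:
            self.searches += 1
            for i, (lbl, _) in enumerate(self.program):
                if lbl == label:
                    index = i
                    break
            else:
                raise KeyError(f"label {label!r} not found")
            self.targets[label] = index      # remember for next time
        return index


# A long program with the jump target at the very end.
program = [(None, "x = x + 1")] * 10000 + [("done", "End")]
resolver = JumpResolver(program)
for _ in range(1000):
    target = resolver.resolve("done")
print(target, resolver.searches)             # prints: 10000 1
```

Without the cache, each of the 1000 jumps would scan all 10001 lines, which is the cost the post describes for targets near the end of a long program.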
||||
| matherp Guru Joined: 11/12/2012 Location: United Kingdom Posts: 11201 |
NEXT and LOOP already do this. GOTO and GOSUB are deprecated, and DATA isn't an issue for performance. If I take this any further, the next step would be the test in IF statements and the implied LET after THEN.

Edited 2026-04-19 04:51 by matherp |
||||
| Bleep Guru Joined: 09/01/2022 Location: United Kingdom Posts: 786 |
Hi Peter,

How do you get the extra info? When I put Option Cache Debug On at the top of my code, all I got was the first few lines with [TC-BAD] at their start, and none of the rest.

I'm not sure what the [TC-BAD] lines are supposed to be telling me anyway, but they are not in the critical part of the loop, so they don't really matter.

Kevin. |
||||
| Bleep Guru Joined: 09/01/2022 Location: United Kingdom Posts: 786 |
I have no idea if this is feasible or worthwhile from a memory usage point of view, but in both the programs above only one or two lines are worth caching. So rather than a blanket statement at the top, would a 'caching start' / 'caching end' pair be feasible, and use up less memory?

Regards
Kevin. |
||||
| matherp Guru Joined: 11/12/2012 Location: United Kingdom Posts: 11201 |
That is all it caches. In your program you could use OPTION TRACECACHE ON 16, which would use hardly any memory. Using OPTION CACHE SUB you can also control which subs are cached. Any not mentioned are not. |
||||
| Bleep Guru Joined: 09/01/2022 Location: United Kingdom Posts: 786 |
Yes, I already tried limiting the tracecache to 20, which gives 110ms, so almost the full speedup. :-) |
||||
| PhenixRising Guru Joined: 07/11/2023 Location: United Kingdom Posts: 1842 |
Should this work on my RP2350 DIL board? I ask because it makes the COM port disappear. |
||||
| matherp Guru Joined: 11/12/2012 Location: United Kingdom Posts: 11201 |
I haven't posted a version for the RP2350 other than HDMIUSB |
||||
| thwill Guru Joined: 16/09/2019 Location: United Kingdom Posts: 4366 |
Hi Peter,

Sounds fun, but without knowing the internals ...

Can it cope with LOCAL variables and recursion, where multiple "versions" of a variable can be stored in the variable table at different "levels"?

What about people playing "silly buggers" by using ERASE and (re)DIM, which potentially changes where in the variable table a given variable is stored?

Hoping you are smarter than I,

Tom

Edited 2026-04-19 09:31 by thwill

MMBasic for Linux, Game*Mite, CMM2 Welcome Tape, Creaky old text adventures |
||||
| The Back Shed's forum code is written, and hosted, in Australia. | © JAQ Software 2026 |