1.8 times speed improvement in MMBasic?

matherp
Guru

Joined: 11/12/2012
Location: United Kingdom
Posts: 11201
Posted: 04:54pm 18 Apr 2026

For some programs....

I've been looking (together with my mate) at every possible avenue to speed up MMBasic, and the answer is that there is very little sensible left to do, unless...

What is the most used statement in most programs? Answer: LET.

I wondered whether we could cache LET statements so that there is no variable lookup except the first time through, and no recursive parsing: sort of compiling them on the fly.

And it works.

Here is a test program:
Option BASE 0
Dim col(7)
col(0)=RGB(0,0,0)      ' Black
col(1)=RGB(255,0,0)    ' Red
col(2)=RGB(0,255,0)    ' Green
col(3)=RGB(0,0,255)    ' Blue
col(4)=RGB(255,255,0)  ' Yellow
col(5)=RGB(0,255,255)  ' Cyan
col(6)=RGB(255,0,255)  ' Magenta
col(7)=RGB(255,255,255)' White

MODE 2 ' 320x240
Timer =0
xmin = -2.0
xmax = 1.0
ymin = -1.2
ymax = 1.2
maxiter = 32
Dim xp(319),yp(319),cp(319)
For i=0 To 319:xp(i)=i:Next
For py = 0 To 239
 Math set py,yp()
 y0 = ymin + (ymax-ymin)*py/239
 For px = 0 To 319
   x0 = xmin + (xmax-xmin)*px/319
   x = 0
   y = 0
   iter = 0
   Do While x*x + y*y <= 4 And iter < maxiter
     xtemp = x*x - y*y + x0
     y = 2*x*y + y0
     x = xtemp
     iter = iter + 1
   Loop
   If iter = maxiter Then
     c = 0
   Else
     c = (iter Mod 7) + 1
   EndIf
   cp(px)=col(c)
 Next px
 Pixel xp(),yp(),cp()
 Next py
Print Timer
End


On an HDMIUSB system with OPTION RESOLUTION 640,378000 set, this takes 42.4 seconds.

However, add this magic line at the top:
Option tracecache on


and.....

You can try an experimental version of the code if you want:


PicoMite (2).zip


You can also try adding OPTION PROFILING ON at the top of the program.
 
Volhout
Guru

Joined: 05/03/2018
Location: Netherlands
Posts: 5859
Posted: 05:10pm 18 Apr 2026

Hi Peter,

What is the drawback? If there is no penalty, why not make it the default?

Volhout
PicomiteVGA PETSCII ROBOTS
 
matherp
Guru

Joined: 11/12/2012
Location: United Kingdom
Posts: 11201
Posted: 05:23pm 18 Apr 2026

It increases the firmware image size, and increases RAM usage both for the image and at run time. In some programs it will make no difference, or even be very slightly slower. It also has more functionality that needs active user involvement to get the best out of it. The example is a best case.
Here is a PicoMite RP2040 version:

PicoMite.zip


Additional functionality:
OPTION TRACECACHE ON n ' n is the number of cache slots (default 128). Each slot uses about 400 bytes of heap.

OPTION CACHE SUB subname [,subname...]
Specifies which subroutines should be optimised. This avoids filling the limited cache with subs that don't need it. Use OPTION PROFILING ON to see the "HOT" subs. Note that the profiling report is only produced if the program has an explicit END statement.

OPTION CACHE DEBUG ON
Provides a diagnostic listing of the LET statements that can't be optimised. Having lots of these is what can slow the program down.

Limitations:
Maximum of 4 variable names in a statement for it to be optimised
A LET statement containing user-defined functions can't be optimised

On an RP2040 @ 420MHz with an ILI9341 display, the test program goes from 74 seconds to 42.
Edited 2026-04-19 04:06 by matherp
 
Bleep
Guru

Joined: 09/01/2022
Location: United Kingdom
Posts: 786
Posted: 06:09pm 18 Apr 2026

Hi Peter,
How about a nearly 3x speedup! Try the attached Bubble: it was 308 ms per update; even without OPTION TRACECACHE ON it was down to 280 ms, but with the trace cache it's 109 ms!!!! Starfield is a bit more sedate, going from 103 ms down to 72 ms, about 40% faster.
No idea why Bubble is so good; presumably it's an almost perfect fit for the caching?

This is on an HDMIUSBI2S board with OPTION RESOLUTION 640,378000.

Regards Kevin.

bubble2350vga.zip
 
matherp
Guru

Joined: 11/12/2012
Location: United Kingdom
Posts: 11201
Posted: 06:35pm 18 Apr 2026

Here is the cache debug output and profile report for Bubble running for 50 iterations:
[TC-BAD] (top): n(g)= RGB( 0,255,0)
[TC-BAD] (top): n(g)= RGB( a* 3.93,g* 2.575,128* (a+ g< 65))
[TC-BAD] (top): n(g)= RGB( 255,255,0)
[PERF] elapsed=5777109 us  statements=1711967  findvar=136196 (locals=0 [0%] globals=136196 [100%])  user_subs=0
[PERF] tracecache: enabled=1 size=64 replays=1323401 compiles_ok=8 compiles_bad=3
[PERF] tracecache: lookup_null=0 alloc_fail=0 optin_skip=0
[PERF] top commands by dispatch count:
    1330007  Let
     339966  Next
      13200  Math
       7550  If
       6030  EndIf
       3417  For
       3366  Memory
       3350  Inc
       3300  Pixel
       1520  Else
         52  FRAMEBUFFER
         51  CLS
         50  Loop
         50  Print
         50  Timer
          3  Dim
          1  Do
          1  End
          1  Const
          1  Font

As you can see, of 1330007 LET statements executed, 1323401 were replayed as already "compiled" statements, so no parsing was needed when they ran. This is where the huge speed-up comes from.
 
toml_12953
Guru

Joined: 13/02/2015
Location: United States
Posts: 602
Posted: 06:47pm 18 Apr 2026

  matherp said  I wonder if we could cache LET statments so that there was no variable lookup except the first time in and no recursive parsing - sort of compile them on the fly.

and it works



Some BASIC interpreters also pre-compile the target address for NEXT, LOOP, DATA, GOTO and GOSUB, so that the lookup is only done once, when the statement is first encountered. That can speed things up quite a bit, especially when the loop is near the end of a very long program. Normally the interpreter has to search from the beginning of the program every time a transfer is made, but precompiling allows it to jump directly to the target statement.
 
matherp
Guru

Joined: 11/12/2012
Location: United Kingdom
Posts: 11201
Posted: 06:50pm 18 Apr 2026

NEXT and LOOP already do this. GOTO and GOSUB are deprecated, and DATA isn't a performance issue. If I take this any further, the next candidates would be the test in IF statements and the implied LET after THEN.
Edited 2026-04-19 04:51 by matherp
 
Bleep
Guru

Joined: 09/01/2022
Location: United Kingdom
Posts: 786
Posted: 06:55pm 18 Apr 2026

Hi Peter,
How do you get the extra info? When I put OPTION CACHE DEBUG ON at the top of my code, all I got were the first few lines starting with [TC-BAD], none of the rest. I'm not sure what the [TC-BAD] lines are supposed to be telling me anyway, but they are not in the critical part of the loop, so they don't really matter.
Kevin.
 
Bleep
Guru

Joined: 09/01/2022
Location: United Kingdom
Posts: 786
Posted: 07:23pm 18 Apr 2026

I have no idea if this is feasible or worthwhile from a memory-usage point of view, but in both the programs above only one or two lines are worth caching. So rather than a blanket statement at the top, would a 'caching start' / 'caching end' pair be feasible, and use less memory?
Regards Kevin.
 
matherp
Guru

Joined: 11/12/2012
Location: United Kingdom
Posts: 11201
Posted: 07:30pm 18 Apr 2026

That is all it caches. In your program you could use OPTION TRACECACHE ON 16, which would hardly use any memory.
With OPTION CACHE SUB you can also control which subs are cached; any not listed are not cached.
 
Bleep
Guru

Joined: 09/01/2022
Location: United Kingdom
Posts: 786
Posted: 07:40pm 18 Apr 2026

Yes, I already tried limiting the trace cache to 20 slots, which gives 110 ms, so almost the full speedup. :-)
 
PhenixRising
Guru

Joined: 07/11/2023
Location: United Kingdom
Posts: 1842
Posted: 09:09pm 18 Apr 2026

Should this work on my RP2350 DIL board? It makes the COM port disappear.
 
matherp
Guru

Joined: 11/12/2012
Location: United Kingdom
Posts: 11201
Posted: 09:58pm 18 Apr 2026

  Quote  Should this work on my RP2350 DIL board because it makes the COM port disappear.

I haven't posted a version for the RP2350 other than the HDMIUSB one.
 
thwill

Guru

Joined: 16/09/2019
Location: United Kingdom
Posts: 4366
Posted: 11:30pm 18 Apr 2026

Hi Peter,

Sounds fun,

But without knowing the internals ...

Can it cope with LOCAL variables and recursion, where multiple "versions" of a variable can be stored in the variable table at different "levels"?

What about people playing "silly buggers" by using ERASE and (re)DIM which potentially changes where in the variable table a given variable is stored?

Hoping you are smarter than I,

Tom
Edited 2026-04-19 09:31 by thwill
MMBasic for Linux, Game*Mite, CMM2 Welcome Tape, Creaky old text adventures
 

The Back Shed's forum code is written, and hosted, in Australia.
© JAQ Software 2026