Home
JAQForum Ver 20.06
Log In or Join  
Active Topics
Local Time 08:48 04 May 2024 Privacy Policy
Jump to

Notice. New forum software under development. It's going to miss a few functions and look a bit ugly for a while, but I'm working on it full time now as the old forum was too unstable. Couple days, all good. If you notice any issues, please contact me.

Forum Index : Microcontroller and PC projects : CMM2 V5.05.06exp: Huge performance improvement - please test

     Page 2 of 8    
Author Message
mkopack73
Senior Member

Joined: 03/07/2020
Location: United States
Posts: 261
Posted: 12:44am 21 Oct 2020
Copy link to clipboard 
Print this post

Awesome improvements!!!!
 
vegipete

Guru

Joined: 29/01/2013
Location: Canada
Posts: 1083
Posted: 02:54am 21 Oct 2020
Copy link to clipboard 
Print this post

Most impressive!

Rocks in Space has a much tougher time overrunning the frame time allotment. I use a fixed frame rate with dead time to make up the difference. With this latest experimental firmware, it is hard to get enough on screen to start dropping the frame rate. Much better than before!
Visit Vegipete's *Mite Library for cool programs.
 
JoOngle
Regular Member

Joined: 25/07/2020
Location: Sweden
Posts: 82
Posted: 03:19am 21 Oct 2020
Copy link to clipboard 
Print this post

That's an INSANE improvement! Thanks Peter!  
 
LeoNicolas

Guru

Joined: 07/10/2020
Location: Canada
Posts: 445
Posted: 03:51am 21 Oct 2020
Copy link to clipboard 
Print this post

I've tested all programs I made including the 3D API and they are all working perfectly well.

Thank you Peter  
 
TweakerRay

Senior Member

Joined: 01/08/2020
Location: Germany
Posts: 138
Posted: 07:32am 21 Oct 2020
Copy link to clipboard 
Print this post

Tested the Heli Blaster Game and you feel how much the speed improved. wow... just wow. Thanks a lot Peter ! Could you explain maybe shortly (for simple users like me)
what you have done to make it that much quicker ? Just interested...)

Thanks a lot for your work. Much appreciated.

Cheers TweakerRay
http://tweakerray.bandcamp.com
 
thwill

Guru

Joined: 16/09/2019
Location: United Kingdom
Posts: 3848
Posted: 09:31am 21 Oct 2020
Copy link to clipboard 
Print this post

Sorry, but may I rain on the parade please ?

None of my reasonably sized programs are significantly faster with the experimental firmware, and Z-MIM is actually ~25% SLOWER.

What I think has happened is that the change has kiboshed the performance of LOCAL variables, which like a good little programmer I use in preference to global variables.

See the folllowing:
OLD FIRMWARE
============

MMBasic Version 5.05.06RC5
Copyright 2011-2020 Geoff Graham
Copyright 2016-2020 Peter Mather

> list "test_local.bas"
Option Explicit On
Option Default Integer

Dim a, b, c, d

main()

Sub main()
 Local a,b,c,d
 Local t = Timer
 For a = 1 To 10000
   foo()
 Next a
 Print Timer - t
End Sub

Sub foo()
 Local a,b,c,d
 a = 1
 b = 2
 c = 3
 d = a + b + c
End Sub

> run "test_local.bas"
485.094

> run "test_local.bas"
484.463

> run "test_local.bas"
485.31

NEW FIRMWARE
============

Colour Maximite 2
MMBasic Version 5.05.06exp
Copyright 2011-2020 Geoff Graham
Copyright 2016-2020 Peter Mather

> run "test_local.bas"
1045.936

> run "test_local.bas"
1046.842

> run "test_local.bas"
1045.434


Best regards,

Tom
Game*Mite, CMM2 Welcome Tape, Creaky old text adventures
 
thwill

Guru

Joined: 16/09/2019
Location: United Kingdom
Posts: 3848
Posted: 09:46am 21 Oct 2020
Copy link to clipboard 
Print this post

Hmm, looks like it is more complex than that ...

OLD FIRMWARE
============

Colour Maximite 2
MMBasic Version 5.05.06RC5
Copyright 2011-2020 Geoff Graham
Copyright 2016-2020 Peter Mather

> list "test_global.bas"
Dim a, b, c, d, i, t

main()

Sub main()
 t = Timer
 For i = 1 To 10000
   foo()
 Next i
 Print Timer - t
End Sub

Sub foo()
 a = 1
 b = 2
 c = 3
 d = a + b + c
End Sub

> run "test_global.bas"
291.061

> run "test_global.bas"
290.127

> run "test_global.bas"
290.898

NEW FIRMWARE
============

Colour Maximite 2
MMBasic Version 5.05.06exp
Copyright 2011-2020 Geoff Graham
Copyright 2016-2020 Peter Mather

> run "test_global.bas"
865.521

> run "test_global.bas"
865.64

> run "test_global.bas"
865.317


Tom
Game*Mite, CMM2 Welcome Tape, Creaky old text adventures
 
matherp
Guru

Joined: 11/12/2012
Location: United Kingdom
Posts: 8592
Posted: 10:26am 21 Oct 2020
Copy link to clipboard 
Print this post

It can be slower for programs that use very short subroutines with local variables.
There will be very little impact for small programs with very few variables.
For complex programs with larger subroutines the performance improvement is huge.
There are no free lunches unfortunately.
One general point: your test program does one thing which is simply a bad idea. Making the main program a subroutine just uses resources (stack, heap) and adds a layer of extra processing throughout MMBasic. For efficiency the main program should be where the code runs.

I have this morning done more optimisation on processing of subroutine exit so try the attached


CMM2V1.5.zip
 
thwill

Guru

Joined: 16/09/2019
Location: United Kingdom
Posts: 3848
Posted: 10:44am 21 Oct 2020
Copy link to clipboard 
Print this post

Hi Peter,

I have to accept what you say at face value, but you'll permit me to be aggrieved that having put in lots of work to break the the 1,000 Z-Machine instruction per second boundary with Z-MIM, I am now looking at only 625 ZMIPS

> It can be slower for programs that use very short subroutines with local variables.

Which of course is what currently passes for good programming practice in mainstream development. Small subroutines with no global state are easier to understand and test in isolation.

> One general point: your test program does one thing which is simply a bad idea. Making the main program a subroutine just uses resources (stack, heap) and adds a layer of extra processing throughout MMBasic. For efficiency the main program should be where the code runs.

OK, not sure I will kick that habit until/unless I need that final burst of optimisation. Doing it this way allows any variables required by the main loop to not also be in the global scope.

Are local variables part of this new hash lookup scheme ... or do you do something else? Obviously you have to deal with local variables shadowing global variables, and other local variables lower down the stack.

> I have this morning done more optimisation on processing of subroutine exit so try the attached.

It is marginally better, but from my perspective not as good as RC6.

Best wishes,

Tom
Edited 2020-10-21 20:46 by thwill
Game*Mite, CMM2 Welcome Tape, Creaky old text adventures
 
thwill

Guru

Joined: 16/09/2019
Location: United Kingdom
Posts: 3848
Posted: 11:19am 21 Oct 2020
Copy link to clipboard 
Print this post

  TweakerRay said   Could you explain maybe shortly (for simple users like me) what you have done to make it that much quicker ? Just interested...)


Peter may correct me on some of the details, but essentially my understanding is that:

Prior to the change variables were stored in a list, finding a given variable XYZ involved iterating through the list comparing the variable names until you found XYZ. Thus the time to find any given variable was proportional to the number of variables defined.

Following the change variables are stored in 1024 "pigeon holes". When a variable needs to be stored or retrieved an address 1-1024 is created by executing a "hash" algorithm against the variable's name. Once you have this address you look in the corresponding pigeon hole and check the name of any variable stored there. If the name does not match then you have a collision, I'm not sure exactly what Peter is doing then, he may be storing a list of variables in each pigeon hole, or he may be storing the variable in the next available address, or he may be re-hashing to try and create a new unique address. When you have few variables this method is slower because of having to calculate the address/hash, but with more variables, up to about 500 then the time to find any given variable is relatively constant and much lower than the original scheme.

For more detail see https://en.wikipedia.org/wiki/Hash_function.

Best wishes,

Tom
Edited 2020-10-21 21:21 by thwill
Game*Mite, CMM2 Welcome Tape, Creaky old text adventures
 
matherp
Guru

Joined: 11/12/2012
Location: United Kingdom
Posts: 8592
Posted: 11:52am 21 Oct 2020
Copy link to clipboard 
Print this post

  Quote   or he may be storing the variable in the next available address


Exactly so
 
thwill

Guru

Joined: 16/09/2019
Location: United Kingdom
Posts: 3848
Posted: 11:59am 21 Oct 2020
Copy link to clipboard 
Print this post

The key is selection of a good hash function that randomly and uniformly distributes the 32-character variable names over the addresses. If you had a really poor function, e.g. one that returned 1 irrespective of the variable name then the system would devolve into a linear search through the pigeon holes starting at address 1, basically what we had before the change but with more overhead.

Tom
Game*Mite, CMM2 Welcome Tape, Creaky old text adventures
 
matherp
Guru

Joined: 11/12/2012
Location: United Kingdom
Posts: 8592
Posted: 12:26pm 21 Oct 2020
Copy link to clipboard 
Print this post

  Quote  but you'll permit me to be aggrieved that having put in lots of work to break the the 1,000 Z-Machine instruction per second boundary with Z-MIM


I can't cope with your grief

Try the attached and before any variables are defined put the statement

OPTION NOHASH



CMM2V1.5.zip
 
thwill

Guru

Joined: 16/09/2019
Location: United Kingdom
Posts: 3848
Posted: 12:38pm 21 Oct 2020
Copy link to clipboard 
Print this post

<Deep bow of respect>

Thankyou Peter, that restores the lost performance for Z-MIM.

I struggle to understand how switching to a hashed variable table had such a detrimental effect on my code.

I guess it must have something to do with how MMBasic local variables are implemented ?

If you don't have time to comment on this could you at least give me the name of a function or two that I can grep for in the firmware source-code so I can try and piece together the answers for myself ?

Best wishes,

Tom
Edited 2020-10-21 22:55 by thwill
Game*Mite, CMM2 Welcome Tape, Creaky old text adventures
 
matherp
Guru

Joined: 11/12/2012
Location: United Kingdom
Posts: 8592
Posted: 01:05pm 21 Oct 2020
Copy link to clipboard 
Print this post

There is almost no overhead in creating the hash table and with OPTION NOHASH it still happens. All I've done is overridden the calculated hash with 0 so the collision mechanism acts like a normal sequential search.
The issue with hashing arises in cleaning up the variable table after a subroutine exits. With the hash table the cleanup has to search the whole table for any variables at the level of the subroutine and delete them. This is quite optimised but still has a lot of overhead.
Without the hash table the search is just up to the variable count so with small programs or limited number of variables can be quicker.
 
thwill

Guru

Joined: 16/09/2019
Location: United Kingdom
Posts: 3848
Posted: 01:30pm 21 Oct 2020
Copy link to clipboard 
Print this post

Thanks Peter,

Recognising that it is easy for me to say this since I'm not the one who would have to implement it, but I assume that each sub/func invocation has some piece of stack/storage ? In which case can't each just maintain a list of Local variable names as they are encountered and use that to facilitate cleanup thus avoiding searching the whole table ?

Regards,

Tom
Edited 2020-10-22 01:34 by thwill
Game*Mite, CMM2 Welcome Tape, Creaky old text adventures
 
matherp
Guru

Joined: 11/12/2012
Location: United Kingdom
Posts: 8592
Posted: 04:21pm 21 Oct 2020
Copy link to clipboard 
Print this post

  Quote  Recognising that it is easy for me to say this since I'm not the one who would have to implement it, but I assume that each sub/func invocation has some piece of stack/storage ? In which case can't each just maintain a list of Local variable names as they are encountered and use that to facilitate cleanup thus avoiding searching the whole table ?


Concept 1: Fit with the way MMbasic works 0

I spent quite a lot of time trying to think of a way to do this and couldn't. Things like recursion make it a nightmare.

The NOHASH option seems like the best compromise so developers where speed is critical can choose the optimum for their application. It is a runtime only option, defaults to hashing when a program is run similar to the way OPTION EXPLICIT works
 
NPHighview

Senior Member

Joined: 02/09/2020
Location: United States
Posts: 193
Posted: 04:28pm 21 Oct 2020
Copy link to clipboard 
Print this post

Peter - there's a noticeable improvement in performance of the plotter package, especially in the 2nd demo, plotting the 2D-mapped 3D Lissajous pattern.  Much of the performance improvement is in the mainline program, I'd guess, though there's plenty of arithmetic performed in the scaled drawing routine, which makes extensive use of global variables.

Congratulations!

- Steve ("NPHighview") Johnson
Live in the Future. It's Just Starting Now!
 
mkopack73
Senior Member

Joined: 03/07/2020
Location: United States
Posts: 261
Posted: 04:55pm 21 Oct 2020
Copy link to clipboard 
Print this post

  matherp said  
  Quote  Recognising that it is easy for me to say this since I'm not the one who would have to implement it, but I assume that each sub/func invocation has some piece of stack/storage ? In which case can't each just maintain a list of Local variable names as they are encountered and use that to facilitate cleanup thus avoiding searching the whole table ?


Concept 1: Fit with the way MMbasic works 0

I spent quite a lot of time trying to think of a way to do this and couldn't. Things like recursion make it a nightmare.

The NOHASH option seems like the best compromise so developers where speed is critical can choose the optimum for their application. It is a runtime only option, defaults to hashing when a program is run similar to the way OPTION EXPLICIT works


Yeah without having a C-style stack which keeps the local variables on the stack frame and bubbles up copies to new frames above, then yeah, with a single hashtable this is the best you're going to be able to get.  If you had the C-style stack then you could just copy the hashtable into the new stack frame and blow it away on stack pop...

At least having the option allows us to try it both ways and see which gives the better performance based on how our code works... It's still transparent to the end user.
 
thwill

Guru

Joined: 16/09/2019
Location: United Kingdom
Posts: 3848
Posted: 05:28pm 21 Oct 2020
Copy link to clipboard 
Print this post

BUG! I thought about what I would have cocked up with the implementation, and apparently I am in good company :

Colour Maximite 2
MMBasic Version 5.05.06exp
Copyright 2011-2020 Geoff Graham
Copyright 2016-2020 Peter Mather

> list "test_erase.bas"
option explicit on
option nohash
dim a = 5
dim b = 6
erase a
print b

> run "test_erase.bas"
Error in line 6: B is not declared


I doubt this is specific to OPTION NOHASH. Does it happen whenever you ERASE a variable with a hash value conflict ?

Best wishes,

Tom
Edited 2020-10-22 03:36 by thwill
Game*Mite, CMM2 Welcome Tape, Creaky old text adventures
 
     Page 2 of 8    
Print this page
© JAQ Software 2024