Improving MMEDIT crunch function
skillet
Newbie

Joined: 13/07/2020
Location: Australia
Posts: 9
Posted: 10:02pm 29 Dec 2020

Hi all,

I've spent most of the COVID period this year working on the Silicon Chip DAB+ radio project from early 2019. Along the way a mate joined me; together we found and solved a range of issues with the original SC software, and rewrote SC's MMBasic programme from scratch to support digital/optical output for DAB, plus a bunch of other features.

As it goes, our mmbasic programme ended up significantly larger than the SC original and would only fit into the micromite plus after crunching. So far, so good - nothing to see here.

Late in our development process, as the code grew larger, my friend complained that MMEDIT could no longer crunch the code and transfer it to the Micromite. Because I don't own any Windows computers, I don't use MMEDIT myself; I wrote my own Linux/macOS bash crunch script and I transfer code directly to the Micromite via the SDMMC card. My 'crunched' code fitted fine into the Micromite Plus and ran, but MMEDIT failed to transfer.

On closer inspection, we saw that the MMEDIT crunched version of the programme was well over 10% larger than the output of my bash crunch script, and that this was the root cause. Hence this post. The script below gives us a crunched size around 15% smaller than the MMEDIT crunched output, enough to allow our DAB radio code to fit and run fine.

As I don't have a Windows machine, I've not looked closely to see which aspect of my crunch script removes more gunk than the MMEDIT built-in version, but clearly there's something. If you read and follow my script, you'll also see there are a few more opportunities to shave spaces than it's taking, so I expect an additional few percent is achievable.

Maybe the code gurus working on MMEDIT can take a look at MMEDIT and improve crunch a little. I suspect it's relatively straightforward to get some real gains.


#!/bin/bash -noprofile
#
# This shell script will run under macOS or Linux to crunch an MMBASIC programme.
# It assumes unix2dos is installed (usually from the dos2unix package).
#
# Caveat: the sed stages are not string-aware, so an apostrophe or a spaced
# operator inside a double-quoted string is altered too.

if [[ -z "${1}" ]] || [[ -z "${2}" ]] || [[ "${1}" == "${2}" ]] ; then
 echo "Usage: $(basename "${0}") <source> <destination>"
 echo "CRUNCH source file and write back as destination file"
 exit 1
fi

# sed stages, in order: strip comments; delete blank lines; trim leading and
# trailing whitespace; close up two-character comparisons, ")and(", ")or(",
# "if(" and ")then"; close up single-character operators and brackets;
# restore a space between "is" and a following > or <; close up print".
# The awk stage then removes whitespace after commas outside string literals.
sed -e "s/'.*$//" \
    -e '/^[[:space:]]*$/d' \
    -e 's/^[[:space:]]*//' \
    -e 's/[[:space:]]*$//' \
    -e 's/[[:space:]]*\([<=][=>]\)[[:space:]]*/\1/g' \
    -e 's/)[[:space:]]*and[[:space:]]*(/)and(/g' \
    -e 's/)[[:space:]]*or[[:space:]]*(/)or(/g' \
    -e 's/if[[:space:]]*(/if(/g' \
    -e 's/)[[:space:]]*then/)then/g' \
    -e 's/[[:space:]]*\([=><+*\/()-]\)[[:space:]]*/\1/g' \
    -e 's/is\([><]\)/is \1/g' \
    -e 's/print[[:space:]]*"/print"/g' \
    "${1}" | \
   awk 'BEGIN {FS = OFS = "\""}
     /^[[:blank:]]*$/ {next}
     {for (i=1; i<=NF; i+=2) gsub(/,[[:space:]]*/,",",$i)}
     1' | \
   unix2dos > "${2}"


Note: in the stage that restores a space after "is", the bracket expression is the four characters [><] with no spaces between them (the forum software had rendered them as an emoji).
Edited 2020-12-30 08:26 by skillet
 
TassyJim

Guru

Joined: 07/08/2011
Location: Australia
Posts: 6283
Posted: 11:07pm 29 Dec 2020

Quote: Maybe the code gurus working on MMEDIT can take a look at MMEDIT and improve crunch a little. I suspect it's relatively straightforward to get some real gains.



MMEdit removes comments, leading and trailing white space and blank lines.
I agree that you can do a lot more but if you need that much, it is better to write the code in a tight format. There are too many ways to get it wrong if you let me decide where to remove spaces.
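
A minimal sketch of the conservative crunch described above (comments, leading and trailing white space, blank lines), assuming apostrophe comments only and that an apostrophe inside a double-quoted string must be left alone. This is not MMEdit's code, and the function name conservative_crunch is made up for illustration.

```shell
#!/bin/bash
# conservative_crunch (hypothetical name): reads MMBasic source on stdin and
# writes the crunched text on stdout. REM comments are not handled.
conservative_crunch() {
  awk '
  {
    out = ""
    in_string = 0
    n = length($0)
    for (i = 1; i <= n; i++) {
      c = substr($0, i, 1)
      if (c == "\"")
        in_string = !in_string          # toggle on every double quote
      else if (c == "\047" && !in_string)
        break                           # octal 047 is the apostrophe: comment start
      out = out c
    }
    gsub(/^[ \t\r]+|[ \t\r]+$/, "", out)  # trim both ends, including a stray CR
    if (out != "") print out              # drop lines that ended up empty
  }'
}
```

Something like `conservative_crunch < radio.bas | unix2dos > radio_crunched.bas` would apply it, where both file names are placeholders. Internal spaces are never touched, so nothing inside a statement can change meaning.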

MMEdit has the option to do your own processing during uploading, so you can do whatever you like to the file.

Later versions of the MMBasic firmware have their own crunch, so that might also be worth a look.

Jim
VK7JH
MMedit
 
skillet
Newbie

Joined: 13/07/2020
Location: Australia
Posts: 9
Posted: 12:19am 30 Dec 2020

Thanks for explaining Jim. I can understand now why there's a significant size difference in crunched output. The standard MMEDIT crunch leaves the spaces in-line, removing only the leading and trailing spaces, and comment-only and blank lines.

As mentioned, I don't own Windows computers, so cannot use MMEDIT in any capacity to crunch differently. That's why I scripted the Linux tool in the first place. Shaving more spaces out was an unintended side effect that I wasn't expecting.

There's a compromise you make when you shave spaces out of a source file: readability. I write code professionally, work with it all day, and for me at least, readability is very important. So I agree that it's possible to write in a tighter format, but I respectfully disagree that this is preferable to a smarter crunch approach. Perhaps this is just my own limitation... I find my own code undecipherable after only a couple of months of not having looked at it if I don't thoroughly comment and pay good attention to formatting for readability. Maybe I'm unique, but I suspect others might value *both* source code formatting, and small and unreadable run-time code.

I respectfully think it's possible to crunch more space without breaking stuff. The script in the earlier post, for example, removed spaces between keywords and special characters (e.g. brackets, maths operators and so on). I can't put my hand on my heart and say this will always work without fail, but I suspect it might. Another opportunity to remove additional spaces which I did not implement (because it requires keeping state as you scan through the file) is spaces between the 'print' keyword and a double quote character following immediately after. When I manually edited my code and removed such spaces, I got an extra percent of space savings. Not much of course, but if your code is so large as to fill flash and you can't remove anything else (or don't want to remove 'features'), an extra percent might be important.
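
One way to lower the risk of breaking stuff is to close up operators only outside string literals. The sketch below assumes comments have already been stripped and that every line has balanced double quotes. It splits each line on the double quote, the same trick the awk stage of the earlier script uses for commas. The function name squeeze_outside_strings is hypothetical.

```shell
#!/bin/bash
# squeeze_outside_strings (hypothetical name): stdin -> stdout.
# Removes whitespace around operators, brackets and commas, but only in the
# parts of each line that lie outside "double-quoted" string literals.
squeeze_outside_strings() {
  awk '
  BEGIN {
    FS = OFS = "\""
    before = "[ \t]+[-=<>+*/(),]"   # whitespace followed by an operator
    after  = "[-=<>+*/(),][ \t]+"   # operator followed by whitespace
  }
  {
    # Odd-numbered fields are outside string literals.
    for (i = 1; i <= NF; i += 2) {
      s = $i
      while (match(s, before))      # keep only the operator (last matched char)
        s = substr(s, 1, RSTART - 1) substr(s, RSTART + RLENGTH - 1)
      while (match(s, after))       # keep only the operator (first matched char)
        s = substr(s, 1, RSTART) substr(s, RSTART + RLENGTH)
      $i = s
    }
    print
  }'
}
```

With this, `if ( a + 1 ) >= 2 then print "x = 1 , y" , b` becomes `if(a+1)>=2 then print "x = 1 , y",b`, so the text inside the string is preserved. Whether MMBasic accepts every closed-up form is still something to check on real code.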

In any case, the script I posted saves quite a bit of additional space, and for those that need to fit more code into their micromite, it might get them out of trouble.

Stefan
 
TassyJim

Guru

Joined: 07/08/2011
Location: Australia
Posts: 6283
Posted: 01:37am 30 Dec 2020

To save more space
Change all your PRINT statements to ?
Use short variable names.

That could reduce your DAB radio code by another 13k
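
As a rough illustration of the first suggestion, the sketch below rewrites PRINT to ? only where PRINT is the first word on a line, which is the case that is easiest to get right with sed. It is an assumption, not something confirmed in this thread, that ? is accepted for every form of PRINT your code uses, so test the result. The function name print_to_question is hypothetical.

```shell
#!/bin/bash
# print_to_question (hypothetical name): stdin -> stdout.
# Replaces a leading PRINT keyword (any letter case) with ?, leaving
# identifiers such as "printer" and PRINT statements later in a line alone.
print_to_question() {
  sed -E 's/^([[:space:]]*)[Pp][Rr][Ii][Nn][Tt]([^[:alnum:]_.$]|$)/\1?\2/'
}
```

Each rewritten statement saves four characters. PRINT after a colon or after THEN is deliberately left alone, because finding those safely needs string-aware parsing.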

Jim
VK7JH
MMedit
 
MustardMan

Senior Member

Joined: 30/08/2019
Location: Australia
Posts: 175
Posted: 09:00pm 20 Jan 2021

The question of squashing things to the max came to me too, but for a slightly different reason. Your reason: to fit a big program into not much memory. Mine: to increase execution speed.

MM-BASIC is very fast, but the way BASIC processes variable names can cause it to slow down (unnecessarily IMHO). I asked about squashing variable names at download time (so the 'source' is never changed - exactly like how 'crunching' strips non-executable guff for download but leaves the source intact).

My original idea was to scan for all variables and automatically replace them. There was not much interest, but someone had written a processor that allowed variables to be 'squashed' to a user defined "short variable".

My project changed (ie: was cancelled) so I did not follow it through to the end, but it looked like it would do the trick.

See: MMEdit - #REPLACE Pi-cromite ESPBasic

For example, if you had a variable "Temperature" in your code 100 times, then renaming it to "T" would save you 1K! (OK, using a '#REPLACE Temperature T' would use some of that, but the savings are still pretty darn good).
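
The arithmetic is simply occurrences times the difference in name length: "Temperature" is 11 characters and "T" is 1, so 100 occurrences save 100 x 10 = 1000 bytes. A small sketch that does the same sum on a real file, assuming plain alphanumeric variable names; the function name rename_savings is hypothetical, and it also counts matches inside strings, so it is best run on already-crunched source.

```shell
#!/bin/bash
# rename_savings OLD NEW (hypothetical name): reads source on stdin and prints
# the number of bytes saved if every whole-word OLD were renamed to NEW.
rename_savings() {
  local old="$1" new="$2" count
  # grep -o prints each match on its own line; -w matches whole words only,
  # -i ignores letter case (BASIC identifiers are case-insensitive).
  count=$(grep -o -i -w -e "$old" | wc -l)
  echo $(( count * (${#old} - ${#new}) ))
}
```

For example, `rename_savings Temperature T < radio.bas` prints the saving for that one rename, where radio.bas is a placeholder file name. The cost of any #REPLACE line itself is not included.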

There was another thread on the topic but I can't find it.
EDIT: Found... Access a STRING as an ARRAY of characters/bytes?
Edited 2021-01-21 07:07 by MustardMan
 
skillet
Newbie

Joined: 13/07/2020
Location: Australia
Posts: 9
Posted: 06:56am 21 Jan 2021

Hi Mr Mustard,

I had thought about the points you raised also, but to write a crunch program that can recognise variable names and reduce the most commonly used variables to single letters, and then the remainder to double letter or letter/number combinations requires something a lot more sophisticated than the kludge script that I shared.

I say sophisticated, but not impossible when you get your head around it. I wrote something like this maybe 40 years ago... The most important starting point was a properly documented language grammar - I mean documented something like the grammar shown in appendix A13 of the Kernighan and Ritchie bible on 'C' - and then you can more easily write a recursive descent parser based on that grammar. The variables then pop out at you, you count them, then rename the variables to one or two characters according to their frequency.
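
Short of a real parser, the counting step can be roughed out with standard tools. The sketch below is a crude approximation, not a grammar-based solution: it assumes comments are already stripped, throws away string literals, and then counts every identifier-shaped token, so MMBasic keywords and function names appear in the list alongside variables and have to be filtered by eye. The function name identifier_frequency is hypothetical.

```shell
#!/bin/bash
# identifier_frequency (hypothetical name): stdin -> "count name" lines,
# most frequent first. Names are folded to lower case before counting.
identifier_frequency() {
  sed -e 's/"[^"]*"//g' |                              # drop string literals
    grep -o -E '[A-Za-z_][A-Za-z0-9_.]*[$!%]?' |       # one token per line
    tr 'A-Z' 'a-z' |
    sort | uniq -c | sort -k1,1nr -k2 |
    awk '{ print $1, $2 }'                             # tidy uniq -c padding
}
```

The names at the top of the list are the ones where a one-letter replacement pays off most. Tokens such as the exponent in a number like 1e5 are also matched, which is one reason a proper grammar would do better.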

Anyway, much easier said than done and you still need an accurate representation of the BASIC grammar as implemented in MMBASIC. I'm not sure that's documented.

I'm not sure how MMBASIC represents all its keywords internally. Does it tokenise all the keywords or keep them as expanded text and interpret all the keywords each time each line is executed? I'm not sure. If there's a tokenised representation of the programme stored within MMBASIC itself, (and we can easily find/parse it as a block of binary memory representing the tokenised programme) then it may be straightforward to spot all the variable names because they are not tokenised themselves, and that might eliminate the need to start with the formal grammar.

Anyway, I am totally unfamiliar with the MMBASIC internals.

Stefan
 
MustardMan

Senior Member

Joined: 30/08/2019
Location: Australia
Posts: 175
Posted: 09:59am 21 Jan 2021

Hi Stefan,

I too am not familiar with how MMBASIC handles itself in memory. The manual does hint that storing/transferring a program (or subroutine, or font) into the library compacts it in the process, but I'm not sure as to what that means. Does an MMBASIC program in memory use tokens, or only when compacted into the library? Perhaps the compression is achieved by evaluating constants (eg: 123456) and storing them as true numbers rather than ASCII strings? I just don't know. Some of the earlier BASICs documented this pretty clearly, but I haven't seen such gritty details anywhere for MMBASIC.

Mind you, MMBASIC is really well written, exceptionally bug free, and has so much more functionality than any other BASIC I've ever seen, so I can't complain!

I think because CPUs are so much faster and memories so much larger, it is likely most people don't really care. It is just you and I!!

By the way, MMedit can pull out variables, and even list their scope... I haven't fiddled with that functionality much. I should.

Cheers,
 
thwill

Guru

Joined: 16/09/2019
Location: United Kingdom
Posts: 4311
Posted: 11:58am 21 Jan 2021

Has this thread moved beyond asking for changes to MMEdit?

If not then there follows an answer to a different question

There is a legacy Crunch utility on Fruit of the Shed that might be of use:

       http://fruitoftheshed.com/MMBasic.MMBasic-Source-Compression-Utility-CRUNCH-bas-v2-4-part-of-the-original-MMBasic-library.ashx

Or my own sptrans utility can minimise whitespace and replace identifiers, though you have to provide a list of replacements that you want to apply. It can also be used to reformat / pretty-print MMBasic with various options to customise the "style":

       https://github.com/thwill1000/sptools/tree/develop-r1b3

If you do look at this then I suggest the latest development version (https://github.com/thwill1000/sptools/archive/develop-r1b3.zip)  instead of my Aug 2020 release.

Please note I am "force-pushing" updates to this GitHub branch. If you don't know what that means, then it won't matter to you.

Best wishes,

Tom
Edited 2021-01-21 22:30 by thwill
MMBasic for Linux, Game*Mite, CMM2 Welcome Tape, Creaky old text adventures
 
skillet
Newbie

Joined: 13/07/2020
Location: Australia
Posts: 9
Posted: 04:32am 22 Jan 2021

A ha!

Thank you very much, Tom. That crunch tool is exactly what I was thinking of as the ultimate kind of tool.

As I don't have access to a Windows computer or VM, my next problem will be either rewriting the crunch programme as a C tool, or going to a mate's place to run it... I think the mate's place sounds easier and faster.

Cheers, and thanks for the link. I'd never have found it myself.

Stefan
 