Forum Index : Microcontroller and PC projects : Improving MMEDIT crunch function
skillet | Newbie | Joined: 13/07/2020 | Location: Australia | Posts: 9
Hi all,

I've spent most of the covid period this year working on the Silicon Chip DAB+ radio project from early 2019. Along the way a mate joined me and we progressed together, found and solved a range of issues with the original SC software, and rewrote SC's MMBasic programme from scratch to support digital / optical output for DAB, plus a bunch of other features. As it goes, our MMBasic programme ended up significantly larger than the SC original and would only fit into the Micromite Plus after crunching. So far, so good - nothing to see here.

Late in our development process, as the code grew larger, my friend complained that MMEDIT could no longer crunch the code and transfer it to the Micromite. Because I don't own any Windows computers I don't use MMEDIT myself; I have written my own Linux/macOS bash crunch script, and I transfer code directly to the Micromite via the SDMMC card. My 'crunched' code fitted fine into the Micromite Plus and ran, but MMEDIT failed to transfer. On closer inspection, we saw that the MMEDIT crunched version of the programme was larger than my bash-scripted crunch by well over 10%, and that this was the root cause. Hence, this post.

The script below gives us a crunched size around 15% smaller than the MMEDIT crunched output, enough to allow our DAB radio code to fit and run fine. As I don't have a Windows machine, I've not looked closely to see which part of my crunch script removes more gunk than the MMEDIT built-in version, but clearly there's something. If you read and follow my script, you'll also see there are a few more opportunities to shave spaces than it currently takes, so I expect an additional few percent is achievable.

Maybe the code gurus working on MMEDIT can take a look and improve crunch a little. I suspect it's relatively straightforward to get some real gains.
    #!/bin/bash -noprofile
    #
    # This shell script will run under MacOS or Linux to crunch an MMBASIC programme
    # (It assumes dosutils has been installed, which includes unix2dos)

    if [[ -z "${1}" ]] || [[ -z "${2}" ]] || [[ "${1}" == "${2}" ]] ; then
      echo "Usage: `basename ${0}` <source> <destination>"
      echo "CRUNCH source file and write back as destination file"
      exit 1
    fi

    # One pipeline step is missing below, between the operator step and the
    # print step. Only this fragment of it survives:  sed -e 's/is\([><]
    cat "${1}" | \
    sed -e "s/'.*$//g" | \
    sed -e '/^[[:space:]]*$/d' | \
    sed -e 's/^[[:space:]]*//g' | \
    sed -e 's/[[:space:]]*$//g' | \
    sed -e 's/[[:space:]]*\([<=][=>]\)[[:space:]]*/\1/g' | \
    sed -e 's/)[[:space:]]*and[[:space:]]*(/)and(/g' | \
    sed -e 's/)[[:space:]]*or[[:space:]]*(/)or(/g' | \
    sed -e 's/if[[:space:]]*(/if(/g' | \
    sed -e 's/)[[:space:]]*then/)then/g' | \
    sed -e 's/[[:space:]]*\([=><+*\/()-]\)[[:space:]]*/\1/g' | \
    sed -e 's/print[[:space:]]*"/print"/g' | \
    awk 'BEGIN {FS = OFS = "\""} \
    /^[[:blank:]]*$/ {next} \
    {for (i=1; i<=NF; i+=2) gsub(/,[[:space:]]*/,",",$i)} \
    1' | \
    unix2dos > "${2}"

Note: the forum software inserted an emoji into one of the sed lines of the script. It is actually the combination of the following four characters, without any spaces between them: [ > < ]. The rest of that line did not survive, so only its opening fragment is shown in the comment above the pipeline.

Edited 2020-12-30 08:26 by skillet
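For anyone who wants to check the size difference between two crunched outputs themselves, a small helper along these lines would do it. This is only a sketch: the function name `percent_smaller` and the file names in the usage line are made up for illustration, and it assumes both files exist and the first one is not empty.

```shell
# percent_smaller BIG SMALL
# Prints, as a whole number, how many percent smaller SMALL is than BIG.
# Hypothetical helper, not part of the crunch script above.
percent_smaller() {
  big=$(wc -c < "$1")      # size of the larger file in bytes
  small=$(wc -c < "$2")    # size of the smaller file in bytes
  echo $(( (big - small) * 100 / big ))
}
```

Called as `percent_smaller mmedit_crunched.bas script_crunched.bas`, it prints the percentage by which the second file is smaller than the first, rounded down.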

TassyJim | Guru | Joined: 07/08/2011 | Location: Australia | Posts: 6283
MMEdit removes comments, leading and trailing white space, and blank lines.

I agree that you can do a lot more, but if you need that much it is better to write the code in a tight format. There are too many ways to get it wrong if you let me decide where to remove spaces.

MMEdit has the option to do your own processing during uploading, so you can use that to do whatever you like to the file.

Later versions of the MMBasic firmware have their own crunch, so that might also be worth a look.

Jim
VK7JH
MMedit
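The conservative crunch Jim describes (comments, leading and trailing white space, blank lines, nothing else) can be sketched in a few lines of awk. This is not MMEdit's actual code, just an illustration under two assumptions: a comment starts at the first apostrophe that is not inside a double-quoted string, and strings contain no escaped quotes. The name `conservative_crunch` is made up.

```shell
# conservative_crunch: reads MMBasic source on stdin, writes crunched text.
# Only comments, edge white space and blank lines are removed.
conservative_crunch() {
  awk '
  {
    out = ""; in_str = 0
    n = length($0)
    for (i = 1; i <= n; i++) {
      c = substr($0, i, 1)
      if (c == "\"") in_str = !in_str           # entering or leaving a string
      else if (c == "\047" && !in_str) break    # apostrophe outside a string: comment
      out = out c
    }
    sub(/^[ \t\r]+/, "", out)                   # leading white space
    sub(/[ \t\r]+$/, "", out)                   # trailing white space
    if (out != "") print out                    # skip lines that ended up empty
  }'
}
```

Because it tracks whether it is inside a string, an apostrophe in something like `Print "it's ok"` is left alone, and spaces inside the line are never touched.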

skillet | Newbie | Joined: 13/07/2020 | Location: Australia | Posts: 9
Thanks for explaining, Jim. I can understand now why there's a significant size difference in crunched output. The standard MMEDIT crunch leaves the in-line spaces alone, removing only leading and trailing spaces, comment-only lines and blank lines.

As mentioned, I don't own Windows computers, so I cannot use MMEDIT in any capacity to crunch differently. That's why I scripted the Linux tool in the first place. Shaving more spaces out was a side effect that I wasn't expecting.

There's a compromise you make when you shave spaces out of a source file, and that is readability. I write code professionally, work with it all day, and for me at least, readability is very important. So I agree that it's possible to write in a tighter format, but I respectfully disagree that it's a preferred solution compared with a smarter crunch approach. Perhaps this is just my own limitation... I find my own code indecipherable after only a couple of months of not having looked at it if I don't thoroughly comment it and pay good attention to formatting for readability. Maybe I'm unique, but I suspect others might value *both* source code formatting and small, unreadable run-time code.

I respectfully think it's possible to crunch more space without breaking stuff. The script in the earlier post, for example, removes spaces between keywords and special characters (e.g. brackets, maths operators and so on). I can't put my hand on my heart and say this will always work without fail, but I suspect it might.

Another opportunity to remove additional spaces, which I did not implement (because it requires keeping state as you scan through the file), is the spaces between the 'print' keyword and a double quote character following immediately after. When I manually edited my code and removed such spaces, I got an extra percent of space savings. Not much of course, but if your code is so large as to fill flash and you can't remove anything else (or don't want to remove 'features'), an extra percent might be important.

In any case, the script I posted saves quite a bit of additional space, and for those that need to fit more code into their Micromite, it might get them out of trouble.

Stefan
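One cheap way to "keep state" about string literals is the trick the awk stage of the posted script already uses: split each line on the double quote character, so that odd-numbered fields are code and even-numbered fields are string contents. The sketch below applies the operator squeeze only to the code fields. It assumes comments have already been stripped and that strings contain no escaped quotes; the function name `squeeze_outside_strings` and the operator list are illustrative, and whether every resulting line is still accepted by MMBasic is not something this guarantees.

```shell
# squeeze_outside_strings: remove spaces around operators, brackets and commas,
# but never inside double-quoted strings.
squeeze_outside_strings() {
  awk '
  BEGIN {
    FS = OFS = "\""                               # fields alternate: code, string, code
    n = split("= < > + - * / ( ) ,", op, " ")     # characters to squeeze around
  }
  {
    for (i = 1; i <= NF; i += 2)                  # odd fields are outside strings
      for (k = 1; k <= n; k++) {
        gsub("[ \t]+[" op[k] "]", op[k], $i)      # spaces before the character
        gsub("[" op[k] "][ \t]+", op[k], $i)      # spaces after the character
      }
    print
  }'
}
```

For example, `If (a = 1) And (b$ = "x = y, z") Then Print "a + b", a + b` comes out as `If(a=1)And(b$="x = y, z")Then Print "a + b",a+b`, with both string literals untouched.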

TassyJim | Guru | Joined: 07/08/2011 | Location: Australia | Posts: 6283
To save more space:
Change all your PRINT statements to ?
Use short variable names.
That could reduce your DAB radio code by another 13k.

Jim
VK7JH
MMedit

MustardMan | Senior Member | Joined: 30/08/2019 | Location: Australia | Posts: 175
The question of squashing things to the max came to me too, but for a slightly different reason. Your reason: to fit a big program into not much memory. Mine: to increase execution speed. MM-BASIC is very fast, but the way BASIC processes variable names can cause it to slow down (unnecessarily IMHO).

I asked about squashing variable names at download time (so the 'source' is never changed, exactly like how 'crunching' strips non-executable guff for download but leaves the source intact). My original idea was to scan for all variables and automatically replace them. There was not much interest, but someone had written a processor that allowed variables to be 'squashed' to a user-defined "short variable". My project changed (i.e. was cancelled) so I did not follow it through to the end, but it looked like it would do the trick.

See: MMEdit - #REPLACE Pi-cromite ESPBasic

For example, if you had a variable "Temperature" in your code 100 times, then renaming it to "T" would save you 1K! (OK, using a '#REPLACE Temperature T' would use some of that, but the savings are still pretty darn good.)

There was another thread on the topic but I can't find it.

EDIT: Found... Access a STRING as an ARRAY of characters/bytes?

Edited 2021-01-21 07:07 by MustardMan
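The arithmetic behind the "1K" figure is 100 occurrences times the 10 characters saved each time (11 for "Temperature", 1 for "T"), so 1000 bytes if the source is stored one byte per character. A rename like that has to match whole names only and leave string literals alone. The sketch below shows one way to do that in awk. It is not how MMEdit's #REPLACE works internally (I don't know that); `rename_identifier` is a made-up name, and it assumes names consist of letters, digits, underscore and period with an optional $, ! or % suffix, matched case-insensitively.

```shell
# rename_identifier OLD NEW: replace whole-word OLD with NEW outside strings.
rename_identifier() {
  awk -v old="$1" -v new="$2" '
  function is_ident(c) { return c ~ /^[A-Za-z0-9_.$!%]$/ }
  {
    out = ""; in_str = 0; prev = ""
    n = length($0); m = length(old); lo = tolower(old)
    i = 1
    while (i <= n) {
      c = substr($0, i, 1)
      if (c == "\"") in_str = !in_str             # track string literals
      # A match needs a non-name character on both sides of OLD.
      if (!in_str && !is_ident(prev) && tolower(substr($0, i, m)) == lo &&
          !is_ident(substr($0, i + m, 1))) {
        out = out new
        prev = substr($0, i + m - 1, 1)
        i += m
        continue
      }
      out = out c; prev = c; i++
    }
    print out
  }'
}
```

With `rename_identifier Temperature T`, the line `Temperature = Temperature + 1 : Print "Temperature", Temperature$, MaxTemperature` becomes `T = T + 1 : Print "Temperature", Temperature$, MaxTemperature`. The same function called as `rename_identifier PRINT '?'` performs the PRINT to ? swap suggested earlier in the thread.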

skillet | Newbie | Joined: 13/07/2020 | Location: Australia | Posts: 9
Hi Mr Mustard,

I had thought about the points you raised also, but a crunch program that can recognise variable names, reduce the most commonly used variables to single letters, and then reduce the remainder to double-letter or letter/number combinations requires something a lot more sophisticated than the kludge script that I shared.

I say sophisticated, but it's not impossible once you get your head around it. I wrote something like this maybe 40 years ago... The most important starting point was a properly documented language grammar - I mean something like the grammar shown in appendix A13 of the Kernighan and Ritchie bible on 'C' - and then you can more easily write a recursive descent parser based on that grammar. The variables then pop out at you, you count them, then rename the variables to one or two characters according to their frequency.

Anyway, it's much easier said than done, and you still need an accurate representation of the BASIC grammar as implemented in MMBASIC. I'm not sure that's documented.

I'm not sure how MMBASIC represents all its keywords internally. Does it tokenise all the keywords, or keep them as expanded text and interpret the keywords each time a line is executed? I'm not sure. If there's a tokenised representation of the programme stored within MMBASIC itself (and we can easily find/parse it as a block of binary memory representing the tokenised programme), then it may be straightforward to spot all the variable names because they are not tokenised themselves, and that might eliminate the need to start with the formal grammar.

Anyway, I am totally unfamiliar with the MMBASIC internals.

Stefan
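The "count them, then rename by frequency" step can be approximated without a full grammar, at the cost of accuracy. The sketch below is a rough stand-in for a proper parser: it drops string literals, pulls out anything that looks like a name, skips a keyword list, and ranks what is left. The keyword list here is a tiny illustrative sample rather than the real MMBasic set, `rank_identifiers` is a made-up name, comments are assumed to be stripped already, and numeric literals containing letters (hex constants, exponents) are not handled.

```shell
# rank_identifiers: list name-like words outside strings, most frequent first.
# Output format: "<count> <name>" per line, names lower-cased.
rank_identifiers() {
  awk '
  BEGIN {
    # Illustrative keyword sample only; a real tool needs the full list.
    n = split("if then else endif for to next print dim and or", kw, " ")
    for (k = 1; k <= n; k++) is_kw[kw[k]] = 1
  }
  {
    line = tolower($0)                       # MMBasic names are case-insensitive
    gsub(/"[^"]*"/, " ", line)               # drop string literals
    while (match(line, /[a-z_][a-z0-9_.]*[$!%]?/)) {
      word = substr(line, RSTART, RLENGTH)
      line = substr(line, RSTART + RLENGTH)  # continue after this word
      if (!(word in is_kw)) count[word]++
    }
  }
  END { for (w in count) printf "%d %s\n", count[w], w }
  ' | sort -k1,1nr -k2,2
}
```

The names at the top of the list are the ones that would gain most from a one-character replacement. Handing out the short names is the harder part, since each new name must not collide with a keyword or with a name already in use.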

MustardMan | Senior Member | Joined: 30/08/2019 | Location: Australia | Posts: 175
Hi Stefan,

I too am not familiar with how MMBASIC handles itself in memory. The manual does hint that storing/transferring a program (or subroutine, or font) into the library compacts it in the process, but I'm not sure what that means. Does an MMBASIC program in memory use tokens, or only when compacted into the library? Perhaps the compression is achieved by evaluating constants (e.g. 123456) and storing them as true numbers rather than ASCII strings? I just don't know.

Some of the earlier BASICs documented this pretty clearly, but I haven't seen such gritty details anywhere for MMBASIC. Mind you, MMBASIC is really well written, exceptionally bug free, and has so much more functionality than any other BASIC I've ever seen, so I can't complain!

I think because CPUs are so much faster and memories so much larger, it is likely most people don't really care. It is just you and I!!

By the way, MMedit can pull out variables, and even list their scope... I haven't fiddled with that functionality much. I should.

Cheers,

thwill | Guru | Joined: 16/09/2019 | Location: United Kingdom | Posts: 4311
Has this thread moved beyond asking for changes to MMEdit? If not, then there follows an answer to a different question.

There is a legacy Crunch utility on Fruit of the Shed that might be of use: http://fruitoftheshed.com/MMBasic.MMBasic-Source-Compression-Utility-CRUNCH-bas-v2-4-part-of-the-original-MMBasic-library.ashx

Or my own sptrans utility can minimise whitespace and replace identifiers, though you have to provide a list of replacements that you want to apply. It can also be used to reformat / pretty-print MMBasic with various options to customise the "style": https://github.com/thwill1000/sptools/tree/develop-r1b3

If you do look at this then I suggest the latest development version (https://github.com/thwill1000/sptools/archive/develop-r1b3.zip) instead of my Aug 2020 release. Please note I am "force-pushing" updates to this GitHub branch. If you don't know what that means, then it won't matter to you.

Best wishes,
Tom

Edited 2021-01-21 22:30 by thwill

MMBasic for Linux, Game*Mite, CMM2 Welcome Tape, Creaky old text adventures

skillet | Newbie | Joined: 13/07/2020 | Location: Australia | Posts: 9
A ha! Thank you very much, Tom. That crunch tool is exactly what I was thinking of as the ultimate kind of tool.

As I don't have access to a Windows computer or VM, my next problem will be either rewriting the crunch programme as a C tool, or going to a mate's place to run it... I think the mate's place sounds easier and faster.

Cheers, and thanks for the link. I'd never have found it myself.

Stefan
The Back Shed's forum code is written, and hosted, in Australia. | © JAQ Software 2025 |