|
CMM2 MMBasic CSUB - use ARM assembler
| Author | Message | ||||
| jirsoft Guru Joined: 18/09/2020 Location: Czech RepublicPosts: 533 |
Hi,

my last "old full-screen programmable computer" was the Acorn Archimedes, where I developed apps mostly in a combination of BASIC and assembler (and before that, in the same combination on the C64 and Atari ST). So it is interesting for me to continue in this style and use CSUB not for C but for assembler.

If I understand correctly, the CMM2 (Cortex-M7) can't be programmed in the (nice) ARM instruction set, only in the (not so nice) Thumb2 set. Is that right? (I'm still waiting for my CMM2, hence such stupid questions.)

What will the CSUB format be for assembler? The first word will be 0 as the offset when the assembler part should start at the first instruction, but how do I end it? A simple MOV pc, lr (MOV R15, R14)? Or BX lr (which I found somewhere)? Which registers may be changed? How are parameters passed to the CSUB, on the stack or in registers?

I know most people will use MMBasic and C for speed-ups, but I think it is better to combine comfortable (MMBasic) with fast (ASM); I don't need something in the middle... (I sometimes have to develop in C/C++ but don't like it much.)

Jiri
Napoleon Commander and SimplEd for CMM2 (GitHub), CMM2.fun |
||||
| JohnS Guru Joined: 18/11/2011 Location: United KingdomPosts: 4142 |
You don't have to stick to Thumb/Thumb2 (see the ARM doc).

Compile some C and look at its assembler code (see the gcc flags).

TBH I'd stick to C because it will be plenty fast enough, far more readable and far more maintainable. You can of course mix C and ASM (or even use inline ASM).

John
Edited 2020-10-02 17:18 by JohnS |
||||
| jirsoft Guru Joined: 18/09/2020 Location: Czech RepublicPosts: 533 |
Hi John,

thanks for the answer. C being more maintainable is a valid point (whether it is more readable is, for me, an open question), but with C I have to compile the CSub outside of the CMM2 every time.

My idea was to develop a small assembler in BASIC, so that I can assemble a source file (SOMETHING.S) into a BASIC file for inclusion in the main app (SOMETHING.INC). That file would contain the CSub as assembled code, with the source code as comments. I have already written a quick-and-dirty one in another BASIC dialect this week, so when I get my CMM2 I can convert it to MMBasic.

Code on GitHub ...

But I wrote it as a simple ARM (not Thumb/Thumb2) assembler, because that is much easier for me: I know it, all instructions are 32 bits wide and the set is very orthogonal. Then I finally found (in the ARM doc) that the Cortex-M7 can't execute ARM, just Thumb2 (+ Thumb). Maybe I'm wrong, but if not, I need to write a Thumb2 assembler, which will not be as easy or fast (at least for me) because of the chaotic instruction set and the mix of 16-bit and 32-bit instructions.

Of course, an inline assembler in MMBasic would be much better and more convenient, but at least this way all steps can be done directly on the CMM2. And with the small snippets usually needed (most parts in BASIC and just the speed-ups in ASM) it could also be "readable and maintainable".

But as I said, it's just an idea...

Jiri |
||||
| matherp Guru Joined: 11/12/2012 Location: United KingdomPosts: 10582 |
The CMM2 is compiled in Thumb2, so any assembler routine needs to be in Thumb2. The calling sequence is that the compiler puts the function parameters in R0, R1, etc. and returns the answer in R0 if applicable. In MMBasic the parameters will always be addresses rather than values.

Below is a very simple example routine that I use in the CMM2 for moving a 32-byte block of data. You should push/pop any registers you use to ensure MMBasic isn't corrupted:

varcopy:
    push {r5}            @ preserve the scratch register
    ldr  r5,[r1],#4      @ load a word from the source, post-increment r1
    str  r5,[r0],#4      @ store it to the destination, post-increment r0
    ldr  r5,[r1],#4
    str  r5,[r0],#4
    ldr  r5,[r1],#4
    str  r5,[r0],#4
    ldr  r5,[r1],#4
    str  r5,[r0],#4
    ldr  r5,[r1],#4
    str  r5,[r0],#4
    ldr  r5,[r1],#4
    str  r5,[r0],#4
    ldr  r5,[r1],#4
    str  r5,[r0],#4
    ldr  r5,[r1],#4
    str  r5,[r0],#4
    pop  {r5}            @ restore the scratch register
    bx   lr              @ return to the caller |
||||
| jirsoft Guru Joined: 18/09/2020 Location: Czech RepublicPosts: 533 |
Hi Peter,

thank you very much for the quick explanation, it helps a lot!

So, if I understand correctly, for your example I can write:

CSUB varcopy integer, integer
  00000000
  20B451F8 045B40F8 045B51F8 045B40F8 045B51F8 045B40F8
  045B51F8 045B40F8 045B51F8 045B40F8 045B51F8 045B40F8
  045B51F8 045B40F8 045B51F8 045B40F8 045B20BC 70470000
END CSUB

?

My questions:

Do I need to pad the last 16 bits with 2 bytes of 00 (the trailing 0000 in the final word, 70470000)?

Up to 10 parameters are allowed, so r0-r9 could be filled with parameters and would not need to be preserved. Why does R5 have to be preserved in your case?

In this case, would the offset be 00000000 (the first word)?

Thanks for the help,
Jiri |
||||
| matherp Guru Joined: 11/12/2012 Location: United KingdomPosts: 10582 |
This is not a CSUB, it is an internal routine. You can't afford to make assumptions about the compiler, particularly with optimisations turned on. If an internal function is called with only two parameters then it isn't necessarily going to protect unused registers.

AFAIK no one has so far written a CSUB in assembler, so you probably want to start by writing a simple one in C, then look at the generated assembler to confirm the calling mechanism, and then copy that across into a test assembler version. |
||||
| jirsoft Guru Joined: 18/09/2020 Location: Czech RepublicPosts: 533 |
OK, thanks for the answer. I will wait for my CMM2 and in the meantime play with the Thumb2 assembler (at least with some limited version)...

Jiri |
||||
| jirsoft Guru Joined: 18/09/2020 Location: Czech RepublicPosts: 533 |
Hi Peter,

I have played a little with the CSub assembler files generated by arm-none-eabi-gcc and found a few interesting things:

1. It looks like -O3, compared with -O0, produces assembler close to what I would write by hand. Why do you switch optimisation off? It might bring a massive speed (and space) improvement...

void plus13(long long int *a)
{
    *a = 13 + *a;
}

with -O0:

    push {r4, r5, r7}
    sub sp, sp, #12
    add r7, sp, #0
    str r0, [r7, #4]
    ldr r3, [r7, #4]
    ldrd r2, [r3]
    adds r4, r2, #13
    adc r5, r3, #0
    ldr r3, [r7, #4]
    strd r4, [r3]
    nop
    adds r7, r7, #12
    mov sp, r7
    @ sp needed
    pop {r4, r5, r7}
    bx lr

with -O3:

    ldrd r3, r2, [r0]
    adds r3, r3, #13
    adc r2, r2, #0
    strd r3, r2, [r0]
    bx lr

2. As you can see, the address of the first parameter is put into r0. As long as only r0-r3 are used, nothing needs to be preserved...

3. If there are more than 4 parameters (addresses in r0-r3), the rest are passed on the stack.

That's all I have found so far.

Jiri |
||||
| matherp Guru Joined: 11/12/2012 Location: United KingdomPosts: 10582 |
Off is always safe, but any CSUB developer can change the optimisation however they want. In that case it is up to them to ensure their code still works.

I'm probably overcautious about this. The concept of CSubs was initially developed for the PIC32, and the PIC32 compiler is very poor at generating position-independent code. In particular, optimisation levels other than 0 often resulted in unusable code. The ARM compiler is much better at position independence, but it is still sensible to code with O0 and then, once the code is fully working and debugged, tune the optimisation. |
||||
| twofingers Guru Joined: 02/06/2014 Location: GermanyPosts: 1677 |
Hi Peter,

I can confirm that the -O3 option works without issues (almost). In my tests the speed advantage was up to 50%.

bitorderreverse -o3.zip

However, I think it was appropriate to start with "-O0" at first.

Kind regards
Michael
causality ≠ correlation ≠ coincidence |
||||
| jirsoft Guru Joined: 18/09/2020 Location: Czech RepublicPosts: 533 |
Hi Peter,

just a stupid question: is MMBasic/FW compiled with -O3?

Jiri |
||||
| matherp Guru Joined: 11/12/2012 Location: United KingdomPosts: 10582 |
No: some parts use Ofast and other parts use O2. This is the result of much experimentation. O3 is slower in most cases. |
||||
| LeoNicolas Guru Joined: 07/10/2020 Location: CanadaPosts: 527 |
Hey jirsoft and matherp,

I'm very interested in understanding how to write CSub code in C. I'm creating an API for 3D rendering, and for better performance I would like to convert it to CSub routines.

This is a video showing the current API status: https://www.youtube.com/watch?v=JPLf6eobqa4&ab_channel=LeonardoNicolas

Do you have a link to good documentation about CSub? The questions I have are:

How can I access the arguments (int, float, string, or array) received from the MMBasic call?
How can I output values from the C routine?
How do I compile the C code for the CMM2? Can I use GCC? Which arguments should I use?

Thank you
Leo
Edited 2020-10-08 10:09 by LeoNicolas |
||||
| twofingers Guru Joined: 02/06/2014 Location: GermanyPosts: 1677 |
Hi Leo, I think most of the questions are answered here. Please also read the older threads for the Micromite 2. Best regards Michael causality ≠ correlation ≠ coincidence |
||||
| LeoNicolas Guru Joined: 07/10/2020 Location: CanadaPosts: 527 |
Thank you Michael

Do you know if there is an MMBasic version for Linux? |
||||
| JohnS Guru Joined: 18/11/2011 Location: United KingdomPosts: 4142 |
There was, akin to the DOS/Windows one and quite old in terms of functions etc, and then also one for (older) RPi boards (which relies on pigpio and some other stuff) - look for picromite. John Edited 2020-10-08 20:00 by JohnS |
||||