bit swap in byte


Author Message
ville56
Regular Member

Joined: 08/06/2022
Location: Austria
Posts: 68
Posted: 07:42pm 25 Jul 2022      

I have the need to use SPI with LSB first to talk from a Pico to a VFD display. As i didn't find a way to tell the SPI to use LSB first, i need to swap the 8 bits in every byte in code. So far i figured out 2 methods:

1) bit shifting the byte to swap and the swapped byte (AND/OR to read/set bits). Slim, but takes a lot of CPU cycles.

2) table lookup, this requires 256 bytes data space (for 8 bits) but is quite fast.

Currently i have method 1 implemented. Works but limits the responsiveness of my app (analog bargraph/level-meter). Don't like to use method 2 as i'm a "save memory space junkie" (old assembler coders disease i suppose).

Does someone know of a method which is in between my 2 solutions, meaning more cpu and memory efficient in a combination?

Many thanks for any hint ...
Regards,
Gerald (OE1HGA)

thwill

Guru

Joined: 16/09/2019
Location: United Kingdom
Posts: 3836
Posted: 07:56pm 25 Jul 2022      

Not tested, but:

Use a 16 byte lookup table for 4 bits and then:

swapped% = lookup%(x% And &b1111) << 4 + lookup%(x% >> 4)


Best wishes,

Tom

ville56
Regular Member

Joined: 08/06/2022
Location: Austria
Posts: 68
Posted: 08:13pm 25 Jul 2022      

Thanks Tom,

oh yes, this is a compromise i didn't think of. I have to compare actual execution time, but your approach should be a lot faster.

Thank you again,
Gerald

matherp
Guru

Joined: 11/12/2012
Location: United Kingdom
Posts: 8572
Posted: 09:46pm 25 Jul 2022      

Here is a fun solution

sb%=(((ob% * &H80200802) AND &H0884422110) * &H0101010101 >> 32) and 255


> ob%=&B11011001
> sb% = (((ob% * &H80200802) AND &H0884422110) * &H0101010101 >> 32) and 255
> ? bin$(sb%)
10011011
>

Mixtel90

Guru

Joined: 05/10/2019
Location: United Kingdom
Posts: 5707
Posted: 10:02pm 25 Jul 2022      

Sheesh.....  My brain hurts......

:(

twofingers
Guru

Joined: 02/06/2014
Location: Germany
Posts: 1133
Posted: 06:39am 26 Jul 2022      

Hi!
I don't know if this CSUB BitOrderReverse helps?
Regards
Michael

matherp
Guru

Joined: 11/12/2012
Location: United Kingdom
Posts: 8572
Posted: 06:56am 26 Jul 2022      

This one is probably faster

for b%=0 to 255:? bin$((b% * &H0202020202 and &H010884422010) mod 1023,8):next

twofingers
Guru

Joined: 02/06/2014
Location: Germany
Posts: 1133
Posted: 07:19am 26 Jul 2022      

  matherp said  This one is probably faster

for b%=0 to 255:? bin$((b% * &H0202020202 and &H010884422010) mod 1023,8):next

I like to believe that.

EDIT:
Your solution is very sophisticated!
But the CSub is one option more to solve that problem ...
Edited 2022-07-26 17:27 by twofingers

ville56
Regular Member

Joined: 08/06/2022
Location: Austria
Posts: 68
Posted: 10:22am 26 Jul 2022      

well, i've timed the average runtime for 25600 runs (378MHz CPU):

- fun solution: 0.0618 mSec
- faster fun solution: 0.0584 mSec
- my old solution: 0.3368 mSec
a factor of almost 6 times faster, impressive.

btw, i'm still havin a lot of "fun" figuring out how the fun solutions work. So far, i can see the results of the individual steps, but i do not understand the underlying principle/math. Can matherp please explain that a bit ....

ad CSUB: have not tried csub at all until now, must read myself into that topic first.

thanks so far to all contributors, have now 2 more solutions and 2 more new myracles :-)

Regards,
Gerald

matherp
Guru

Joined: 11/12/2012
Location: United Kingdom
Posts: 8572
Posted: 10:31am 26 Jul 2022      

  Quote  Can matherp please explain that a bit ....


Not my work I'm afraid - explained here

https://graphics.stanford.edu/~seander/bithacks.html#ReverseByteWith64BitsDiv
Edited 2022-07-26 20:36 by matherp

ville56
Regular Member

Joined: 08/06/2022
Location: Austria
Posts: 68
Posted: 10:55am 26 Jul 2022      

thanks for the link, it's now getting clearer to me. Anyway, a very sophisticated algorithm in deed.

Volhout
Guru

Joined: 05/03/2018
Location: Netherlands
Posts: 3498
Posted: 10:43pm 26 Jul 2022      

It is a ingenius algorythm, but only efficient for 64 bit processors. On an 8bit machine the 8 fold

if a and 128 then b=b+1
If a and 64 then b=b+2
Etc....

Is MUCH faster...

And a 256 entry lookup table is superior, even to the above algorythm. The charm of it is that it is a simple formula, hiding smart thinking... from rhe real math wizards
Edited 2022-07-27 08:48 by Volhout

phil99

Guru

Joined: 11/02/2018
Location: Australia
Posts: 1776
Posted: 05:33am 27 Jul 2022      

And another, not fast just different.

Saved 113 bytes
> LIST
a%=31 :c%=0 :Print Bin$(a%,8)
Timer =0:For n=0 To 7:c%=c%<<1:c%=c%+(a% And 1):a%=a%>>1:Next :Print Timer
Print Bin$(c%,8)
> RUN
00011111
0.279
11111000
> ? mm.info(cpuspeed)
378000000
>

Plasmamac

Guru

Joined: 31/01/2019
Location: Germany
Posts: 501
Posted: 09:20am 27 Jul 2022      

my mistake
Edited 2022-07-27 19:41 by Plasmamac

twofingers
Guru

Joined: 02/06/2014
Location: Germany
Posts: 1133
Posted: 01:27pm 27 Jul 2022      

Just comparison purposes (This is a recompilation of the CMM2 version):
The BitOrderReverse CSUB for PicoMite takes 21us for a byte (8 bit) and 31us (23-34us) for 64 bit integers on a PicoMite at 378MHz (V5.07.05b15).
Execution time is not constant and varies somewhat with the number of ones.


Teraterm output:
BitOrderReverse Demo

Input:
10111100101111001011110010111100101111001011110010111100101111
Output for 8 Bits:
11110100


Time for 10000 loops:
CSUBs     Empty loops|  1x CSUB
250ms            45ms|   21us


Basic-source:
'File BitOrderReverse.bas written 10-09-2020 14:03:13
' v1.01 by twofingers@tbs (thanks to chris@TBS)
'
'a CSUB to reverse the order of bits for PicoMite
'Usage:
'CSUB BitOrderReverse(nInput, nBits, RetValue)

Dim integer a, b=0, nBits=8,l=10000

'   1234567890123456789012345678901234567890123456789012345678901234
'            1         2         3         4         5         6
a=&b0010111100101111001011110010111100101111001011110010111100101111

Print "BitOrderReverse Demo":Print

Print "Input:"
Print Bin$(a,nBits)

Timer =0
For i = 1 To l
BitOrderReverse(a,nBits,b)
Next i
t0=Timer


Timer =0
For i = 1 To l

Next i
t2=Timer

Print "Output for"nBits" Bits:"
Print Bin$(b,nBits)

Print :Print
Print "Time for"l " loops:"
Print "CSUBs","  Empty loops|","1x CSUB"
Print Cint(t0)"ms",,Cint(t2)"ms|",Cint((t0-t2)/l*1000)"us";

End

'File bitreverse.bas written 27-07-2022 17:49:23 v1.44 (optimized: -ofast)
CSub BitOrderReverse
00000000
2300B5F0 46DE4657 4645464E 00104684 B5E02200 60436002 684A680B B085469A
DD672A00 681B4663 46992400 685B4663 46982500 91022301 21209003 00274249
D43D1866 40B10019 001E468B 40A64649 400E4640 40014659 D037430E 37014652
22201BD7 18BA4252 0019D43E 91014091 40BA001A 92009803 68476806 9A019900
41571876 60476006 26019902 2700680A 20024692 2100684A 416F1936 414D1824
DC2B4295 4661D031 00346809 46614689 003D6849 21204688 00274249 D5C11866
1B0E2120 40F10019 E7BE468B 27002601 21002002 416F1936 414D1824 DC0D4295
0034D016 E7A8003D 00192220 40D11BD2 E7BD9101 D1012A00 D1932B00 BCF0B005
46B246BB 46A046A9 4554BDF0 E7F5D9CB D9E64554 46C0E7F2
End CSUB


C-source:
#include "PicoCFunctions.h"

void borev(long long *orig, long long *nbits, long long *reversed)
{
 int i = 1;
 *reversed = 0;
 unsigned long long one_bit = 1;

 for(;i<=*nbits;i++){
   if ((*orig & (one_bit<<(i-1))) > 0)
   *reversed += one_bit<<(*nbits-i);
 }
}


For infos about CSUBs on PicoMite:
https://www.thebackshed.com/forum/ViewTopic.php?TID=14126

Regards
Michael
Edited 2022-07-28 03:04 by twofingers

ville56
Regular Member

Joined: 08/06/2022
Location: Austria
Posts: 68
Posted: 04:57pm 27 Jul 2022      

thnks for the link to csub topics, i have to read that first and figure out how to properly compile the stuff in the end. Interesting that the loop in C is about 3 times faster than the linear shift/multiply in the MMbasic interpreter. C is efficient, if you like it or not. And yes, i don't like it ... ;-)

Regards,
Gerald

matherp
Guru

Joined: 11/12/2012
Location: United Kingdom
Posts: 8572
Posted: 05:19pm 27 Jul 2022      

b% * &H0202020202 and &H010884422010) mod 1023

would be even faster implemented as a CSUB. Remember the biggest time cost in MMbasic is not the math but parsing the ascii. This is evidenced by integers only being fractionally quicker than double precision floats.
The CSUB has only one parameter so very little to parse hence the speed improvement even with a slower algorithm

twofingers
Guru

Joined: 02/06/2014
Location: Germany
Posts: 1133
Posted: 05:25pm 27 Jul 2022      

  ville56 said  ... C is efficient, if you like it or not. And yes, i don't like it ... ;-) ...

Hi Gerald,
I just set up the compiler for the PicoMite CSubs today.

On the power of C: I think nowadays you can do most things in MM-Basic and that's fantastic. But sometimes things have to be very fast - this is the time of the CSUBs.
Kind regards
Michael

BTW: This is my Make.bat file. Maybe it helps ...?
@ECHO OFF
bin\arm-none-eabi-gcc -c -mcpu=cortex-m0 -mfloat-abi=soft -mthumb -Wall -Wno-main -ffunction-sections -O0 -fPIC -I. %1.c -o %1.o

IF ERRORLEVEL 1 Goto Done
pause

bin\arm-none-eabi-ld -nostartfiles -T arm-gcc-link.ld -o %1.elf %1.o

IF ERRORLEVEL 1 Goto Done
pause
MMBasic ArmCfGenV144.bas %1.elf %2
del %1.elf
del %1.o

:Done

Edited 2022-07-28 03:25 by twofingers

twofingers
Guru

Joined: 02/06/2014
Location: Germany
Posts: 1133
Posted: 05:54pm 27 Jul 2022      

  matherp said  b% * &H0202020202 and &H010884422010) mod 1023

@Peter
a%=(b% * &H0202020202 and &H010884422010) mod 1023
works for 8 bits.

Yes, it is very fast (15us) ... for 8 bits.

But what if we need 64 bits?

Best regards
Michael
Edited 2022-07-28 03:57 by twofingers

matherp
Guru

Joined: 11/12/2012
Location: United Kingdom
Posts: 8572
Posted: 06:12pm 27 Jul 2022      

Just do it on each byte in turn and then swap the bytes end to end