Home
JAQForum Ver 24.01
Log In or Join  
Active Topics
Local Time 16:57 02 Aug 2025 Privacy Policy
Jump to

Notice. New forum software under development. It's going to miss a few functions and look a bit ugly for a while, but I'm working on it full time now as the old forum was too unstable. Couple days, all good. If you notice any issues, please contact me.

Forum Index : Microcontroller and PC projects : Need Basic language compression code

     Page 1 of 2    
Author Message
lizby
Guru

Joined: 17/05/2016
Location: United States
Posts: 3378
Posted: 02:37pm 13 Oct 2018
Copy link to clipboard 
Print this post

I'm looking for MMBasic (or any variety basic) code to compress a text file. The file, which might be multiple megabytes, would be on the SD card of an Explore-64, and I'd like to compress it before downloading to a PC using XMODEM (and I'm assuming that XMODEM doesn't do compression on the fly). It's likely to be highly compressible. The code wouldn't have to try to wring out every bit of savings, but must be lossless.

Of course, I would need to decompress it again on the PC side--with MMBasic DOS.



PicoMite, Armmite F4, SensorKits, MMBasic Hardware, Games, etc. on fruitoftheshed
 
twofingers

Guru

Joined: 02/06/2014
Location: Germany
Posts: 1593
Posted: 03:23pm 13 Oct 2018
Copy link to clipboard 
Print this post

Hi,

not the complete solution but maybe helpful?

https://rosettacode.org/wiki/LZW_compression

Regards
Michael
causality ≠ correlation ≠ coincidence
 
lizby
Guru

Joined: 17/05/2016
Location: United States
Posts: 3378
Posted: 04:55pm 13 Oct 2018
Copy link to clipboard 
Print this post

  twofingers said   maybe helpful?


The BBC Basic version looks hopeful. Needs some translation to MMBasic.

PicoMite, Armmite F4, SensorKits, MMBasic Hardware, Games, etc. on fruitoftheshed
 
twofingers

Guru

Joined: 02/06/2014
Location: Germany
Posts: 1593
Posted: 05:45pm 13 Oct 2018
Copy link to clipboard 
Print this post

I think that is an interesting topic (also useful as benchmark?).
Perhaps we should someday create a CFunction for that ...
causality ≠ correlation ≠ coincidence
 
CaptainBoing

Guru

Joined: 07/09/2016
Location: United Kingdom
Posts: 2170
Posted: 09:20am 14 Oct 2018
Copy link to clipboard 
Print this post

This is a big ask on a small machine.

I have been playing with the BBCTrans version (just the encode for now), and got it working, but the results are very hit n miss. (I tweaked a lot of the variable names so you can actually understand what is happening)

firstly the size of strings makes it difficult to build up a single output (i.e. your zipped string) because the output can very often be bigger than the input and two bytes can represent a single character of input, so if it doesn't zip well it will easily over-run the maximum size. This could probably be worked around but I think you'd need to stream the output to something which then makes having a self-contained function difficult as you'd need to feed chunks of your file in without upsetting the dictionary each time.

Most importantly, the memory size would be a big impact. The example given with the input as "TOBEORNOTTOBEORTOBEORNOT" isn't really representative (even that is 8 chars longer than the input). Even producing phrases that I would have thought lent heavily in the direction of being compressible, weren't and produce 300+ dictionary entries.

have a play with the below code. The final two figures output are the pre & post compression string lengths. On one string, the input is 131 chars, and the output 222 - that is uncomfortably close to the limit, so you'd have to use longstrings probably anyway.

Unless I have got something horribly wrong (which is always on the cards), I reckon this is an interesting mental exercise but difficult to put into practice on anything other than an MMX or ArmMite... even then aspirations have to be tempered by the realities - multi-megabyte is going to take a huge dictionary - because MMBasic uses fixed length strings, you have to size the dictionary before hand which could use your memory at a startling rate.

Also, depending on what you are zipping, you might need to think about monitoring the success of the dictionary and making a decision to ditch it when the efficiency drops to restore the matches.



option base 0

'un-REM whichever of the following you want to play with

'plaintext$ = "A B C D E G H I J K L M N O P Q R S T U V W X Y Z"
'plaintext$ = "ABCDEGHIJKLMNOPQRSTUVWXYZ"
'plaintext$ = "TOBEORNOTTOBEORTOBEORNOT"
'plaintext$ = "TTOBEORNOTTOBEORTOBEORNOTTOBEORNOTTOBEORTOBEORNOTTOBEORNOTTOBEORTOBEORNOTTOBEORNOTTOBEORTOBEORNOTTOBEORNOTTOBEORTOBEORN OT"
plaintext$ = "following may not be necessary due to wifi for above construct mains outlets, power shelf and G8 lightning string (12VDC beefy PSU)"
'plaintext$ = "a can opener can open anything a can opener can but if it can't open anything a can opener can it can't be called a can opener can it?"

zipped$ = encodeLZW$(plaintext$)

? len(plaintext$), len(zipped$)
?


end

FOR i% = 1 TO LEN(zipped$) 'STEP 2
PRINT ASC(MID$(zipped$,i%)); '+ 256*ASC(MID$(encodeLZW$,i%+1)) " " ;
NEXT

'end

'? zipped$

'? FNdecodeLZW(zipped$)


Function encodeLZW$(ii$)
local integer newindex, d, slicesize, freeptr
local string o$, sliceword$
Local string dict$(4095) length 8
FOR d = 0 TO 255 : dict$(d) = CHR$(d) : NEXT
freeptr = d 'first free place in the dictionary is 256
slicesize = 1
sliceword$ = LEFT$(ii$,1)
do
d = 0
do
newindex = d
IF slicesize > LEN(ii$) THEN EXIT
FOR d = 1 TO freeptr-1
IF sliceword$ = dict$(d) THEN EXIT FOR
NEXT
IF d < freeptr Then
slicesize =slicesize+ 1
sliceword$ =sliceword$+ MID$(ii$, slicesize, 1)
EndIf
loop UNTIL d >= freeptr
? freeptr,"!"+sliceword$+"!"
dict$(freeptr) = sliceword$
freeptr =freeptr+ 1
sliceword$ = RIGHT$(sliceword$, 1)
o$ =o$+ CHR$(newindex MOD 256) + CHR$(newindex \ 256)
loop UNTIL slicesize >= LEN(ii$)
encodeLZW$= o$
memory
End Function

'I have not worked on this part at all beyond obvious syntax translation
Function decodeLZW$(ii$)
local integer c, i, l
local string o$, t$, w$
local string dict$(4095) length 4
FOR i = 0 TO 255 : dict$(i) = CHR$(i) : NEXT i
l = i
c = ASC(ii$) + 256*ASC(MID$(ii$,2))
w$ = dict$(c)
o$ = w$
IF LEN(i$) < 4 THEN decodeLZW$= o$:Exit Function
FOR i = 3 TO LEN(i$) STEP 2
c = ASC(MID$(ii$,i)) + 256*ASC(MID$(ii$,i+1))
IF c < l Then
t$ = dict$(c)
ELSE
t$ = w$ + LEFT$(w$,1)
EndIf
o$ =o$+ t$
dict$(l) = w$ + LEFT$(t$,1)
l =l+ 1
w$ = t$
NEXT
decodeLZW$= o$
End Function



Beware, I think I can see duplication in the dictionary so I'm not convinced the algorithm is that great - but then I can't see why the "if slice$ isn't in the dictionary" logic would fail so may be there is some nuance I am missing.

EDIT:No it looks good... spaces:

RUN
256 !a !
257 ! c!
258 !ca!
259 !an!
260 !n !
261 ! o!
262 !op!
263 !pe!
264 !en!
265 !ne!
266 !er!
267 !r !
268 ! ca!
269 !an !
270 ! op!
271 !pen!
272 !n a!
273 !any!
274 !yt!
275 !th!
276 !hi!
277 !in!
278 !ng!
279 !g !
280 ! a!
281 !a c!
282 !can!
283 !n o!
284 !ope!
285 !ene!
286 !er !
287 ! can!
288 !n b!
289 !bu!
290 !ut!
291 !t !
292 ! i!
293 !if!
294 !f !
295 ! it!
296 !t c!
297 !can'!
298 !'t!
299 !t o!
300 !open!
301 !n an!
302 !ny!
303 !yth!
304 !hin!
305 !ng !
306 ! a !
307 ! can !
308 ! ope!
309 !ener!
310 !r c!
311 !can !
312 ! it !
313 ! can'!
314 !'t !
315 ! b!
316 !be!
317 !e !
318 ! cal!
319 !ll!
320 !le!
321 !ed!
322 !d !
323 ! a c!
324 !can o!
325 !opene!
326 !er c!
327 !can i!
328 !it!
329 !t?!
Flash:
2K ( 3%) Program (86 lines)
58K (97%) Free

RAM:
38K (75%) 11 Variables
0K ( 0%) General
12K (25%) Free
134 148

>
Edited by CaptainBoing 2018-10-15
 
lizby
Guru

Joined: 17/05/2016
Location: United States
Posts: 3378
Posted: 12:02pm 14 Oct 2018
Copy link to clipboard 
Print this post

Thanks for the thoughts. I'll give the programs a try.

I had an idea. My first job after college was as a proofreader for a "computerized" typesetting output. The text to be typeset was held on 5-bit paper tape, which I came to be able to read.

Here's how it managed to get nearly all ascii printable characters in 5 bits. In default mode (say the third 32-character row in a standard ascii table, containing the upper-case letters--see https://www.asciitable.com/), the top two letters, "^" and "_", 0x1E and 0x1F, would represent shift-to-lower-set and shift-to-upper-set respectively.

In those modes, the highest character, "?" and DEL respectively (0x1F) would mean shift back to default mode. You probably need "?", so use "~" to represent that. And you need line feed (representing CR & LF in DOS/Windows text), so replace backwards quote (0x60) with that. If you needed, say, TAB, I'd consider "{" and "}" expendable.

So you lose "^", "_", "~", DEL, and the backwards single quote, but you reduce the file size by 37.5%--from 8 bits per character to 5 (nearly--not counting the shift-in and shift-out codes).

PicoMite, Armmite F4, SensorKits, MMBasic Hardware, Games, etc. on fruitoftheshed
 
twofingers

Guru

Joined: 02/06/2014
Location: Germany
Posts: 1593
Posted: 02:01pm 14 Oct 2018
Copy link to clipboard 
Print this post

@Andrew(CaptainBoing)

Thanks for your efforts! But I guess there may be something wrong - or needs to be optimized (in the BBC code?).
If I compare your output with the output from the LZW-Compressor
then I get different compression rates.
Eg for the 122 bytes in
  Quote  TTOBEORNOTTOBEORTOBEORNOTTOBEORNOTTOBEORTOBEORNOTTOBEORNOTTOBEORTOBEORNOTTOBEORNOTTOBEORTOBEORNOTTOBEORNOTTOBEORTOBEORN OT

Your code generates 88 bytes vs. 44 bytes for LZW-Compressor.

The 131 bytes
  Quote  following may not be necessary due to wifi for above construct mains outlets, power shelf and G8 lightning string (12VDC beefy PSU)

Your code generates 222 bytes vs. 114 bytes for LZW-Compressor.

@lizby
I think thats could be a better/faster way than LZW.
Maybe you can implement a ESCape character for seldom used characters (ascii/ansi >127, ascii/ansi <32) in the output file followed by the ascii code of the original. This way you will not loss any character.

EDIT:
  Quote  In those modes, the highest character, "?" and DEL respectively (0x1F) would mean shift back to default mode.

I think that means the same.

EDIT2:
I fear we need more than 5 bits to store the whole alphabet, upper and lower case.

Edited by twofingers 2018-10-16
causality ≠ correlation ≠ coincidence
 
CaptainBoing

Guru

Joined: 07/09/2016
Location: United Kingdom
Posts: 2170
Posted: 10:13pm 14 Oct 2018
Copy link to clipboard 
Print this post

cheers for that - I suspected as much.

I did go over the wiki for LZW and the algorithm is fairly straightforward but I spent more time on getting the BBC code to run than checking the algorithm of it.

There are several enhancements over the original 1984 work so I guess the tool you used included those maybe.

I only had an academic interest so i am not heartbroken

A contemporary algorithm would be nice to see working.

hEdited by CaptainBoing 2018-10-16
 
lizby
Guru

Joined: 17/05/2016
Location: United States
Posts: 3378
Posted: 10:37pm 14 Oct 2018
Copy link to clipboard 
Print this post

  twofingers said  I fear we need more than 5 bits to store the whole alphabet, upper and lower case.


Just as a typewriter uses a real shift key to double the number of characters relative to the number of strikers, 5-bit codes can use 2 virtual shift keys to nearly triple the number--30 characters in normal mode (ascii 0x40-0x5d), 31 in downshifted, and 31 in upshifted--92 minus one for LF, or 91 of the 96 printable ascii characters (if you count DEL as printable).

I saw it done using 5-bit paper tape (I don't remember, but there may have been an additional shift for more special characters).

PicoMite, Armmite F4, SensorKits, MMBasic Hardware, Games, etc. on fruitoftheshed
 
TassyJim

Guru

Joined: 07/08/2011
Location: Australia
Posts: 6283
Posted: 11:02pm 14 Oct 2018
Copy link to clipboard 
Print this post

The original RTTY systems used 5 bit. Check out "Baudot".

I assume you are looking at transferring data you have collected.
You might be better served by compressing your data as it is collected, rather than doing it when you want to transfer.

Jim
VK7JH
MMedit
 
Volhout
Guru

Joined: 05/03/2018
Location: Netherlands
Posts: 5091
Posted: 09:14am 15 Oct 2018
Copy link to clipboard 
Print this post

As long as you don't need to differentiate between upper and lower case, baudot can be used, as TassyJim proposes. It can be efficient, but if there is a high mixture of numbers and characters it becomes awfully inefficient (change set eachtime costs another 5 bits).

And if the data you want to compress is pure numerical...why not send only 4 bits per character....50% compression.
Or you can compress by only storing the changes... either only store when a change happens, or store the delta with last measurement.

It is all depending on what compression you want to achieve, and what the data actually is. And if you need a loss-les compression or not.







Edited by Volhout 2018-10-16
PicomiteVGA PETSCII ROBOTS
 
lizby
Guru

Joined: 17/05/2016
Location: United States
Posts: 3378
Posted: 11:57am 15 Oct 2018
Copy link to clipboard 
Print this post

Thanks for the suggestion, Jim. The wikipedia article on Baudot is good. It was probably a variant that the company I was working for used. The photo of the paper tape with the tiny clocking holes (or maybe they were feeding holes) looks just like what I remember.

I don't recall whether "e" was a single bit or a truncated ascii-type code. I do remember DEL as all 5 bits punched--we spliced paper tape corrections in by copying a string while making corrections with leading and trailing DELs, then overpunching DEL through the original string, then cutting the paper tape and splicing in the correction by aligning the leading and trailing DELs. (This in addition to walking to work uphill in both directions--young folks these days!)

Volhout--I was already considering "nibble" coding for numerals plus space, comma, semicolon, LF, and have gotten confirmation that that is all there will be, so 4-bit it is.
Edited by lizby 2018-10-16
PicoMite, Armmite F4, SensorKits, MMBasic Hardware, Games, etc. on fruitoftheshed
 
CaptainBoing

Guru

Joined: 07/09/2016
Location: United Kingdom
Posts: 2170
Posted: 01:08pm 15 Oct 2018
Copy link to clipboard 
Print this post

if you can use CR instead of LF, this fits directly with your four bits (0Dh instead of 10h)

I know you will be mapping things anyway but jus' sayin'
 
twofingers

Guru

Joined: 02/06/2014
Location: Germany
Posts: 1593
Posted: 01:30pm 15 Oct 2018
Copy link to clipboard 
Print this post

@lizby
Yes, the reference to Baudot (or CCITT) helps a lot!

It would help to have a list of characters which the program should handle.
After that we need to determine the "shift" codes. How many "code pages" or character sets?

As a first draft I would say we should have a set (SET1) of 26 characters (a-z) + Space + EOF (= 28 char). Additionally a set (SET2) A-Z + Space + CR.
And finaly a set (SET3) of numbers (0-9) and extra characters (<>,.;:-?!"()/=+° etc.) + Space + EOF.
We could - for example - have an escape character (eg "^") in SET3 which allows us to insert any ansi/ascii code.

Just my 2 cents ...

EDIT: forgot CREdited by twofingers 2018-10-16
causality ≠ correlation ≠ coincidence
 
lizby
Guru

Joined: 17/05/2016
Location: United States
Posts: 3378
Posted: 01:12am 16 Oct 2018
Copy link to clipboard 
Print this post

Ok, this is just for proof of concept, and I don't expect to pursue it further unless I find a need. One program compresses the Declaration of Independence (8114 bytes) into 5-bit code (5340 bytes)--compressed to 68%. The other decompresses.

For another test, I ran the text of the Wikipedia Baudot article--12903 to 8890 bytes (69%).
[code]
' encode-5bit
DIM as string A, B, C, D, ch, fn
dim buffChars(5) AS STRING length 1
dim as integer i, j, k, l, m, n, iCh, nchars, iBuff
dim as integer iMode ' modes are 1=&H40-&H5F; 2=&H20-&H3F; 3=&H60-&H7F

' fn = "declaration"
fn = "dec"
' fn = "WikiBaudot"
open fn+".txt" for input as #1
open fn+"_encoded.txt" for output as #2
iMode = 1
nchars = 0
iBuff = 0
do while not eof(#1)
A = input$(250,#1)
l = len(A)
for i = 1 to l
ch = mid$(A,i,1)
if ch = " " then : ch = chr$(&H60) ' shift some punctuation to set 3
elseif ch = "," then : ch = "{"
elseif ch = "." then : ch = "|"
elseif ch = ":" then : ch = "}"
endif
iCh = ASC(ch)
' print ch, iCH; : print " ";
if iCh <> 13 then ' ignore carriage return
if iCh = 10 then
iCh = &H7E ' line feed becomes tilde
elseif iCh < &H20 then
iCh = &H5E ' unknown--becomes "?"
elseif ch = "?" or ch = "^" or ch = "_" then
iCh = &H5E ' becomes "?"
elseif ch = ">" then
iCh = &H5D ' becomes "]"
endif
if iCh > &H5F then ' mode 3
if imode = 1 then
addCH(&H1F) ' shift to upper set
elseif imode = 2 then
addCH(&H1F) ' shift to normal set
addCH(&H1F) ' shift to upper set
endif
iMode = 3
elseif iCh > &H3F then ' mode 1
if imode <> 1 then
addCH(&H1F) ' shift to normal set
iMode = 1
endif
elseif iCh > &H1F then ' mode 2
if imode = 1 then
addCH(&H1E) ' shift to lower set
iMode = 2
elseif imode = 3 then
addCH(&H1F) ' shift to normal set
addCH(&H1E) ' shift to lower set
iMode = 2
endif
endif
' print " | ", iCH; : print " : ";
addCH(iCH MOD 32)
endif
next i
' print A;
loop

close #1
if imode = 2 then : addCH(&H1F) : addCH(&H1F) : imode = 3 : endif ' shift to upper set
if imode = 1 then : addCH(&H1F) : imode = 3 : endif ' shift to upper set
if nchars <> 0 then
for i = nchars to 8
addCH(0) ' space
next i
endif
close #2
print ""
print fn+" encoded"
end

sub addCH(localiCH as integer) ' lower 5 bits of localiCH contain encoding
nchars = nchars + 1
iBuff = (iBuff << 5) + localiCH
' print localiCH; : print " > "; print iBuff
if nchars = 8 then ' iBuff contains 40 significant bits
buffChars(1) = chr$(iBuff >> 32)
buffChars(2) = chr$((iBuff AND &HFFFFFFFF) >> 24)
buffChars(3) = chr$((iBuff AND &HFFFFFF) >> 16)
buffChars(4) = chr$((iBuff AND &HFFFF) >> 8)
buffChars(5) = chr$(iBuff AND &HFF)
print #2, buffChars(1)+buffChars(2)+buffChars(3)+buffChars(4)+buffChars(5);
nchars = 0
iBuff = 0
' end ' 8 characters only
endif
end sub
[/code]
[code]
' decode-5bit
DIM as string A, B, C, D, ch, fn
dim as string cShiftNormal length 1, cShiftUp length 1, cShiftDown length 1
dim chString AS STRING length 8
dim buffChars(8) AS STRING length 1, cOutputBuff as string length 250
dim as integer i, j, k, l, m, n, iCh, nchars, iBuffChars(5), jStop
dim as integer iMode ' modes are 1=&H40-&H5F; 2=&H20-&H3F; 3=&H60-&H7F
cShiftNormal = chr$(&H1F) ' mode-dependent--for modes 2 & 3
cShiftDown = chr$(&H1E) ' mode-dependent--for mode1
cShiftUp = chr$(&H1F) ' for mode1

fn = "dec"
' fn = "WikiBaudot"
open fn+"_encoded.txt" for input as #1
open fn+"_decoded.txt" for output as #2
iMode = 1
nchars = 0
iBuff = 0
cOutputBuff = ""
chString=""
k = 0 ' count of characters in cOutputBuff
j = 0 ' count of loops
jStop = 0 ' debug while j<jStop
do while not eof(#1)
A = input$(5,#1)
l = len(A)
ich = 0
for i = 1 to l
iBuffChars(i) = asc(mid$(A,i,1))
If j < jStop then : print bin$(iBuffChars(i),8)+" "; : endif
next i
if j < jStop then : print "" : endif
' 11111222 22333334 44445555 56666677 77788888 ' 8 chars in 5 bytes
buffChars(1) = chr$(iBuffChars(1) >> 3)
buffChars(2) = chr$(((iBuffChars(1) and &B111) << 2) + (iBuffChars(2) >> 6))
buffChars(3) = chr$((iBuffChars(2) and &B00111110) >> 1)
buffChars(4) = chr$(((iBuffChars(2) and &B1) << 4) + (iBuffChars(3) >> 4))
buffChars(5) = chr$(((iBuffChars(3) and &B1111) << 1) + (iBuffChars(4) >> 7))
buffChars(6) = chr$((iBuffChars(4) and &B01111100) >> 2)
buffChars(7) = chr$(((iBuffChars(4) and &B11) << 3) + (iBuffChars(5) >> 5))
buffChars(8) = chr$(iBuffChars(5) and &B11111)
for i = 1 to 8
ch = buffChars(i)
if j < jStop then : print bin$(asc(ch),5)+" "; : endif
select case ch
CASE cShiftNormal
if iMode = 1 then
iMode = 3 ' lower case
else
iMode = 1 ' shift to normal--upper case
endif
CASE cShiftDown : if iMode = 1 then : iMode = 2 : endif ' numbers
end select
Select case iMode
case 1
if ch = cShiftNormal or ch = cShiftDown then
ch = chr$(0)
else
ch = chr$(asc(ch) + &H40)
endif
case 2
if ch = cShiftNormal or ch = cShiftDown then
ch = chr$(0)
else
ch = chr$(asc(ch) + &H20)
endif
case 3
if ch = cShiftNormal then
ch = chr$(0)
else
ch = chr$(asc(ch) + &H60)
endif
end select
if ch = chr$(&H60) then : ch = " " ' remap some characters
elseif ch = "{" then : ch = ","
elseif ch = "|" then : ch = "."
elseif ch = "}" then : ch = ":"
elseif ch = "~" then : ch = chr$(13)
endif
if ch <> chr$(0) then : cOutputBuff = cOutputBuff + ch : endif
k = k + 1
if k = 250 then
print #2, cOutputBuff;
cOutputBuff = ""
k = 0
endif
if ch = chr$(13) then
cOutputBuff = cOutputBuff + chr$(10)
k = k + 1
if k = 250 then
print #2, cOutputBuff;
cOutputBuff = ""
k = 0
endif
endif
if j < jStop then : chString = chString + ch : endif
next i
if j < jStop then : print "" : print chString : endif
chString = ""
j = j + 1
' exit do ' only one group of 5 bytes / 8 characters
loop
print #2, cOutputBuff

close #1
close #2
print ""
print fn+"_decoded.txt output"
end
[/code]
My first pass didn't achieve much--mainly because the most common punctuation characters (space, comma, period, colon) were not in the same set as the most common letters--the lower case letters. I shifted them up, replacing backquote, "{","|","}".

Next--at some point--I'll code for numerals, space, comma, colon, LF into 4 bits.

Texts zipped below.
2018-10-16_111126_dec.zip
2018-10-16_111156_WikiBaudot.zip

I did the development on MMBasic for DOS. The programs ran in not much more than a second.
Edited by lizby 2018-10-17
PicoMite, Armmite F4, SensorKits, MMBasic Hardware, Games, etc. on fruitoftheshed
 
Volhout
Guru

Joined: 05/03/2018
Location: Netherlands
Posts: 5091
Posted: 06:11am 16 Oct 2018
Copy link to clipboard 
Print this post

Another way of compressing existing data is simply storing the pointers. The declaration of independance can be compressed into a simple hyperlink.......
PicomiteVGA PETSCII ROBOTS
 
twofingers

Guru

Joined: 02/06/2014
Location: Germany
Posts: 1593
Posted: 12:11pm 16 Oct 2018
Copy link to clipboard 
Print this post

@lizby

Congratulation! Good job and nearly perfect.
I compared WikiBaudot.txt vs WikiBaudot_decoded.txt using Totalcommander/notepad++ and found some - only a few - differences.
  Quote  invented by Émile Baudot,[1]

  Quote  invented by imile Baudot,[1]

not surprising.

  Quote  In 1901, Baudot's code was modified by Donald Murray (1865–1945)

  Quote  In 1901, Baudot's code was modified by Donald Murray (1865v1945)

hmmm ...

It found seven differences, mainly because of "foreign characters".

An idea about more "shift codes":
A "shift code" followed by a "shift code" expands to the next shift level.


causality ≠ correlation ≠ coincidence
 
lizby
Guru

Joined: 17/05/2016
Location: United States
Posts: 3378
Posted: 01:01pm 16 Oct 2018
Copy link to clipboard 
Print this post

Yes, additional shift codes could be used. Right now I'm using &B11111 as "shift-up" when in the ascii &H40-5F mode, and as "shift-to-&H40" when in either of the other modes--but yes, taking another code could allow you to continue to shift into as many 32-byte groupings as you could want.

In fact, you'd only need the shift-down code in the non-primary groupings. &B11111 from &H40 would shift to &H60, &B11111 again would shift to &H20, and again would shift to whatever next grouping you wanted. From any grouping other than &H40, &B11110 would shift down one.

Regarding "1865-1946" yielding "1865v1946", my guess would be that the dash is actually an en-dash or some other special character, and outside my coding scheme. (But I was hoping that unrecognized characters were going to end up as "?"--though I actually never checked for codes above &H7F.)
Edited by lizby 2018-10-17
PicoMite, Armmite F4, SensorKits, MMBasic Hardware, Games, etc. on fruitoftheshed
 
lizby
Guru

Joined: 17/05/2016
Location: United States
Posts: 3378
Posted: 01:24pm 16 Oct 2018
Copy link to clipboard 
Print this post

  CaptainBoing said   if you can use CR instead of LF, this fits directly with your four bits (0Dh instead of 10h)

I know you will be mapping things anyway but jus' sayin'


Cap'n--I chose LF because in Linux mode, LF is used in place of what in DOS is CRLF. I don't know why that would be relevant here, but that was my rationale.

PicoMite, Armmite F4, SensorKits, MMBasic Hardware, Games, etc. on fruitoftheshed
 
twofingers

Guru

Joined: 02/06/2014
Location: Germany
Posts: 1593
Posted: 06:35pm 16 Oct 2018
Copy link to clipboard 
Print this post

  lizby said   Regarding "1865-1946" yielding "1865v1946", my guess would be that the dash is actually an en-dash or some other special character, ...

Thats correct, it is chr$(150) - typographically a must.

Maybe a good idea to use lookup-tables?

Another point: "unrecognized characters" should IMHO be replaced with an unique character?
causality ≠ correlation ≠ coincidence
 
     Page 1 of 2    
Print this page
The Back Shed's forum code is written, and hosted, in Australia.
© JAQ Software 2025