TheBackShed.com

Forum Index : Microcontroller and PC projects : MM 8-bit character code support

kiiid

Guru

Joined: 11/05/2013
Location: United Kingdom
Posts: 671

Posted: 04:44am 15 Feb 2016

This is a continuation of a similar topic from last year. I have been trying to figure out a way to make the MM+ work with 8-bit characters coming from the keyboard. It is impossible at the moment: it looks like anything with a code above 0x7f gets cut off by MMBasic. Currently characters like '�' or 'Ж' cannot be entered either from the keyboard or from a file.

There are a few suspicious lines in the code, but I will leave that to Geoff's much more competent opinion.
The lines are commands.c[1287], mmbasic.c[674 and 675], fileio.c[242], keyboard.c[820].

I don't have the compiler, so I can't build the firmware to test this, but it would be great if the MM could handle the extended character codes, so that accented and special characters could be used as well.
Thanks!

http://rittle.org

--------------

Geoffg

Guru

Joined: 06/06/2011
Location: Australia
Posts: 3285

Posted: 12:30pm 15 Feb 2016

I am travelling at this time so I cannot experiment with the code but the problem is not as simple as enabling bit 7.

For a start, the keyboard does not generate a character, it generates a scan code which represents the location of the key press on the keyboard. It is an 8-bit number but it is not the ASCII character. MMBasic then maps the scan code to an ASCII character depending on the keyboard language currently selected.
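The scan-code step described above can be sketched in a few lines of Python. This is only an illustration: the table holds a handful of PS/2 "set 2" codes, the names `US_LAYOUT`, `DE_LAYOUT` and `scan_to_char` are made up for the example, and none of it is taken from the MMBasic source.

```python
# Illustrative subset of PS/2 "set 2" scan codes (key positions, not characters).
# These tables are assumptions for the example, not MMBasic's real tables.
US_LAYOUT = {0x1C: "a", 0x15: "q", 0x1A: "z", 0x35: "y"}

# A German keyboard has Y and Z in swapped positions, so the same
# scan codes map to different characters.
DE_LAYOUT = {**US_LAYOUT, 0x1A: "y", 0x35: "z"}

LAYOUTS = {"US": US_LAYOUT, "DE": DE_LAYOUT}


def scan_to_char(scan_code, layout="US"):
    """Return the character for a key position, or None if it is unmapped."""
    return LAYOUTS[layout].get(scan_code)
```

The same scan code 0x1A yields "z" with the US table and "y" with the German one; a key with no entry in the selected table simply produces nothing, which is how a character outside the mapped set gets lost.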

I think that your problem is due to the fact that when mapping the scan code MMBasic only maps to the standard ASCII set - which does not support characters such as the '�' symbol.

At one time I did consider adding special language characters but there were more than a few problems. The first is how to represent the character; the extended character set used by (for example) Windows is far too complex so the simple solution was to generate a character with bit 7 set. But that raises even more problems - within MMBasic a character with bit 7 set is used to indicate a command or function token and a major rewrite would be required to get around that. There is a huge number of special characters so another issue is deciding what ones to support and how to map them (as far as I know there is no standard for this).
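A toy model may help show why bit 7 cannot simply be enabled. The sketch below assumes a made-up token table (`TOKENS`) and a made-up `detokenize` function; the real MMBasic token values and code are different.

```python
# Hypothetical token table: the real MMBasic token values differ.
TOKENS = {0x80: "PRINT", 0x81: "GOTO", 0x82: "IF"}


def detokenize(program_bytes):
    """Expand a stored program, treating every byte with bit 7 set as a token."""
    out = []
    for b in program_bytes:
        if b & 0x80:
            # Any byte >= 0x80 is assumed to be a command or function token.
            out.append(TOKENS.get(b, "<token 0x%02X>" % b))
        else:
            out.append(chr(b))
    return "".join(out)


# 0x80 is the PRINT token; 0xE9 is 'é' in ISO-8859-1 but is read as a token.
print(detokenize(b'\x80 "caf\xe9"'))  # prints: PRINT "caf<token 0xE9>"
```

In this model an 8-bit character stored in a string literal is indistinguishable from a token, so either the token encoding or the string handling would have to change.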

A simpler solution would be to map a limited number of characters to the control characters 0x01 to 0x1f and extend the fonts to include these characters (at the moment all fonts start with 0x20). There is still the issue of deciding what characters to map and designing the fonts for them. The latter is tedious, time consuming and not easy.
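As a rough sketch of the control-code idea, the Python below maps four special characters into the 0x01 to 0x1f range so that every stored byte stays below 0x80. The choice of characters and of the slots 0x1C to 0x1F is purely an assumption for illustration (they were picked to avoid codes such as Ctrl-C, tab, CR, LF and Esc); the thread does not settle which characters or codes would be used.

```python
# Hypothetical assignment of special characters to control codes.
# Which characters to support, and which codes are really free, is the
# open question raised in the post; these values are assumptions.
SPECIAL_TO_CODE = {"£": 0x1C, "é": 0x1D, "ü": 0x1E, "ñ": 0x1F}
CODE_TO_SPECIAL = {code: ch for ch, code in SPECIAL_TO_CODE.items()}


def to_internal(text):
    """Encode text as 7-bit bytes; reject characters with no assigned slot."""
    out = bytearray()
    for ch in text:
        if ch in SPECIAL_TO_CODE:
            out.append(SPECIAL_TO_CODE[ch])
        elif 0x20 <= ord(ch) <= 0x7E:
            out.append(ord(ch))
        else:
            raise ValueError("no slot assigned for %r" % ch)
    return bytes(out)


def to_display(data):
    """Decode the 7-bit bytes back to text (the font would draw these glyphs)."""
    return "".join(CODE_TO_SPECIAL.get(b, chr(b)) for b in data)
```

With only 31 control codes available, and some of them needed by the console, the set of supported characters is necessarily small, and each one still needs a glyph in every font.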

The bottom line is that while an extended character set looks easy on a Windows or Linux computer it is not so easy on the Micromite which is just a microcontroller. In particular, enabling bit 7 is not a solution and whatever route I take there will be a lot of work for a small benefit. As a result, I'm sorry to say, this issue keeps sliding towards the bottom of the priority pile.

For an immediate solution I suggest that you create your own font files by editing a standard font file to change some characters (for example the hash symbol) into whatever special characters that you need.

Geoff

Geoff Graham - http://geoffg.net

robert.rozee
Guru

Joined: 31/12/2012
Location: New Zealand
Posts: 2437

Posted: 06:33pm 15 Feb 2016

i've been following the stmF7Mite port with great interest, and from what i can see they have been hitting the wall with the 128 token limit - someone do correct me if i am wrong. with this in mind, may it be worthwhile thinking about alternative encoding schemes?

it just occurred to me that one approach could employ the following:
- use the full 8 bits for token values, giving approximately 250 BASIC keywords;
- have a small set of tokens that act as 'escape characters' that switch into character mode, line number flag, indicate EOL, etc.

this approach recognizes that ascii characters are more often than not going to appear grouped together.

possible escape characters might be:

1. the character " would indicate the start of character mode, while the next occurrence of " would switch back to token mode,

2. the character ' switches to comment/character mode for the remainder of the current line,

3. 0x13 might be reserved for an 'end of line' flag,

4. 0x01 might be reserved to indicate that a two byte line number follows,

5. 0x02 and 0x03 might be reserved for start/end of a block of characters that could not be tokenized by the editor upon saving, etc.

this scheme would then also allow for (almost all) 8-bit ascii codes to be stored if desired, although as geoff has pointed out, that in itself is far more complicated than might at first appear, both in entering said characters at the keyboard, as well as deciding what to do with them when printing out.
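the escape-character scheme above could be modelled roughly as follows. this is a sketch only: the token values in `TOKENS`, the big-endian line number, and the function name `decode_line` are assumptions made for the example, not part of the proposal or of MMBasic.

```python
# Reserved bytes taken from the proposal above.
LINENO, RAW_START, RAW_END, EOL = 0x01, 0x02, 0x03, 0x13
QUOTE, COMMENT = 0x22, 0x27  # the characters " and '

# Hypothetical token values: any byte that is not reserved could be a keyword.
TOKENS = {0x41: "PRINT", 0xE9: "GOTO"}


def decode_line(data):
    """Expand one tokenized line (ended by EOL) back into source text."""
    out = []
    mode = "token"
    i = 0
    while i < len(data):
        b = data[i]
        i += 1
        if b == EOL:
            break
        if mode == "comment":
            out.append(chr(b))            # plain characters until end of line
        elif mode == "string":
            out.append(chr(b))
            if b == QUOTE:
                mode = "token"            # closing quote returns to token mode
        elif mode == "raw":
            if b == RAW_END:
                mode = "token"
            else:
                out.append(chr(b))        # text the editor could not tokenize
        elif b == LINENO:
            # Assumption: the two-byte line number is stored big-endian.
            out.append(str(int.from_bytes(data[i:i + 2], "big")) + " ")
            i += 2
        elif b == QUOTE:
            out.append('"')
            mode = "string"
        elif b == COMMENT:
            out.append("'")
            mode = "comment"
        elif b == RAW_START:
            mode = "raw"
        else:
            out.append(TOKENS.get(b, "?") + " ")
    return "".join(out)


line = bytes([0x01, 0x00, 0x0A, 0x41, 0x22, 0xE9, 0x22, 0x13])
print(decode_line(line))  # prints: 10 PRINT "é"
```

note that the byte 0xE9 means GOTO in token mode but é inside the quotes, and 0x41 is a keyword in token mode even though it is the letter A in character mode. in this model the EOL byte ends the line in every mode, which is one reason only "almost all" 8-bit codes can be stored.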

just kicking around an idea here!
cheers,
rob :-)

kiiid

Guru

Joined: 11/05/2013
Location: United Kingdom
Posts: 671

Posted: 09:50pm 15 Feb 2016

Thanks Geoff,
I suspected this is the case, however it is not something that can be ignored so easily. About 88% of the world population uses characters beyond the standard 7-bit ASCII.

I didn't fully understand the explanation about the scan codes. I was talking about the normal console port, which I am using to communicate with the mite. There are normal ASCII codes there, just like TeraTerm is generating them. As to standards - for 8-bit codes, I believe ISO-8859-xx is the one to be followed.

A possible idea: let's leave the 7-bit codes as they are now (as you say, it is a big job to rework MMBasic so it can use different token codes). Since the extended characters are used relatively rarely, how about preceding each one of them with a special code (say Escape, for example)? This will of course mean encoding every extended character (anything above 0x7f) with two bytes instead of one, but it might work without much coding. The only more significant changes in this case would be in reading and writing files, where the conversion from one-byte to two-byte characters and back will have to be made, but that is just a few lines of code, I think.

Edited by kiiid 2016-02-17
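A minimal Python sketch of that file conversion, under stated assumptions: Escape (0x1B) is the prefix, only the printable ISO-8859 range 0xA0 to 0xFF is supported, and a literal Escape byte is doubled. The names `pack` and `unpack` are invented for the example and nothing here comes from the MMBasic source.

```python
ESC = 0x1B  # Escape, suggested above as one possible prefix code


def pack(raw):
    """Convert 8-bit text to a 7-bit stream, e.g. when reading a file."""
    out = bytearray()
    for b in raw:
        if b >= 0xA0:
            out += bytes([ESC, b - 0x80])   # second byte lands in 0x20..0x7F
        elif b >= 0x80:
            # Assumption: the 0x80..0x9F control range is not supported.
            raise ValueError("unsupported byte 0x%02X" % b)
        elif b == ESC:
            out += bytes([ESC, ESC])        # assumption: literal ESC is doubled
        else:
            out.append(b)
    return bytes(out)


def unpack(packed):
    """Convert the 7-bit stream back to 8-bit text, e.g. when writing a file."""
    out = bytearray()
    i = 0
    while i < len(packed):
        b = packed[i]
        i += 1
        if b == ESC:
            if i >= len(packed):
                raise ValueError("truncated escape sequence")
            nxt = packed[i]
            i += 1
            out.append(ESC if nxt == ESC else nxt | 0x80)
        else:
            out.append(b)
    return bytes(out)
```

For example, "café" in ISO-8859-1 is the bytes `63 61 66 E9`; `pack` turns that into `63 61 66 1B 69`, which contains nothing above 0x7f, and `unpack` restores the original. The cost is that string lengths and cursor positions no longer match the number of characters on screen, which is the part most likely to trouble the editor.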

http://rittle.org

--------------

Geoffg

Guru

Joined: 06/06/2011
Location: Australia
Posts: 3285

Posted: 11:06pm 15 Feb 2016

Sorry, I took your problem to be with the characters coming from an attached PS/2 keyboard. I did not realise that a foreign language keyboard generated 8-bit codes. I wonder if Tera Term is mapping the special characters to an 8-bit code? Something to explore.

On thinking about it, the special characters in a string could be escaped as you suggested. That would be easy to do although it might give the editor a headache.

The real problem is defining the characters in the font, very time consuming and hard to do if you are not familiar with the special characters and their use. Another problem is that by extending the standard fonts they will also take up more memory.

This is one of those things that is easy to ask for but very difficult to implement. It is even more difficult when it is open ended (implement all special characters). I understand what you say about 88% of the world population using special characters but the proportion of people needing special character support in MMBasic is still small.

Geoff

Geoff Graham - http://geoffg.net
