TheBackShed.com

Forum Index : Microcontroller and PC projects : Bug - LINE INPUT and INPUT

panky

Guru

Joined: 02/10/2012
Location: Australia
Posts: 1117

Posted: 07:59am 18 Jul 2018

A couple of recent posts dealt with LINE INPUT and INPUT commands and while I was playing around with the issues mentioned I came across the following which I believe to be a bug.

Both LINE INPUT and INPUT deal with a single line at a time, either from the console or from a file. With both commands, if the last line in the file does NOT have a terminating CR or CR LF, the last character in the file is NOT read.

Both LINE INPUT and INPUT make calls to MMgetline in SerialFileIO.c and the code there around line 199 appears to be the problem (I think; I am NOT at all proficient in C, but it appears there may be an inadvertent duplication of a couple of lines).

What appears to be happening is that as the line is read in by either of the commands above, if EOF is reached and there is no terminating CR in the file, the last character in the file is not read and passed back by MMgetline.

The demo code below illustrates the problem. As you will see, the last character in the second (last) line, a "z", is not returned by either LINE INPUT or INPUT.

This was tested in both MMBasic Plus 5.04.10b3 and 5.03.04 and also in MMBasic Extreme 5.04.17. Interestingly, both commands work correctly in MMBasic for DOS (I guess this may be because file IO is done by Windows?)

The hexdump sub at the bottom below is purely to show that the missing character is actually present in the file.

Please excuse me if I am off the track here but I have tested thoroughly and think I have it correct.

Doug.

Quote
DIM junk1$,junk2$,x
PRINT "Test 1"
OPEN "test1.dat" FOR OUTPUT AS #1
PRINT "First, print data as it is written to file"
' write two lines to the file; the second has no trailing CR LF
PRINT #1, "Line 1, with trailing CR - xyz"
PRINT #1, "Line 2, without trailing CR - xyz";
PRINT "Line 1, with trailing CR - xyz"
PRINT "Line 2, without trailing CR - xyz";
PRINT
CLOSE #1

PRINT
PRINT "If you want to see the actual contents of the file,"
PRINT " uncomment the following hexdump line and the hexdump sub below"
PRINT
' hexdump "test1.dat"
PRINT
PRINT

' good to here - data correct in file
PRINT " First, test LINE INPUT"
OPEN "test1.dat" FOR INPUT AS #1
junk1$ = ""
LINE INPUT #1, junk1$
PRINT junk1$
LINE INPUT #1, junk1$
PRINT junk1$
CLOSE #1

PRINT
PRINT " .. now test for INPUT"
OPEN "test1.dat" FOR INPUT AS #1
junk1$ = ""
junk2$ = ""
INPUT #1, junk1$,junk2$
PRINT " first line"
PRINT junk1$,,,junk2$
junk1$ = ""
junk2$ = ""
INPUT #1, junk1$,junk2$
PRINT " second line"
PRINT junk1$,,,junk2$

CLOSE #1

PRINT
PRINT " ... now testing INPUT$ in a DO loop"
OPEN "test1.dat" FOR INPUT AS #1
junk2$ = ""
DO WHILE NOT EOF(#1)
  junk2$ = junk2$ + INPUT$(1, #1)
LOOP
PRINT junk2$
CLOSE #1

END

'-----------------------------------------------------------------
'' HEXDUMP.BAS - a subroutine to do a hexadecimal dump of a file.
'' Usage: hexdump(filename$)
'' Doug Pankhurst and James Deakins 2012
'sub hexdump(arg$)
'  local hd_arg$ length 32
'  local FileName$ length 32
'  local HD_Text$
'  local HexChar$ length 32
'  local InputChar$
'  local junk$ length 32
'  local CharCntS$
'  local ChrCnt
'  local CharInLineCount
'  local ScreenLines
'  hd_arg$ = arg$
'  if hd_arg$ = "" then
'    Line Input "Enter file name to be displayed ",FileName$
'  else
'    FileName$ = hd_arg$
'  endif
'  Open FileName$ For Input As #10
'  ChrCnt = 0
'  Do While Not Eof(#10)
'    For ScreenLines = 1 To 35
'      HD_Text$=""
'      CharCntS$ = STR$(ChrCnt,4)
'      Print CharCntS$ + " ";
'      For CharInLineCount = 1 To 16
'        If Eof(#10) Then
'          HD_Text$ = HD_Text$ + " "
'          HexChar$ = " "
'          ScreenLines = 35
'        Else
'          InputChar$ = Input$(1,#10)
'          If Asc(InputChar$) < 32 Then
'            HD_Text$ = HD_Text$ + "."
'          Else
'            HD_Text$ = HD_Text$ + InputChar$
'          EndIf
'          HexChar$ = Hex$(Asc(InputChar$))
'          If Len(HexChar$) = 1 Then
'            HexChar$ = "0" + HexChar$
'          EndIf
'        EndIf
'        ChrCnt = ChrCnt + 1
'        If CharInLineCount = 9 Then
'          Print " ";
'        EndIf
'        Print HexChar$ + " ";
'      Next CharInLineCount
'      print " ";
'      Print HD_Text$
'    Next ScreenLines
'    Line Input "Press any key to continue - ",Junk$
'  Loop
'  close #10
'  print
'End sub
'

... almost all of the Maximites, the MicroMites, the MM Extremes, the ArmMites, the PicoMite and loving it!

matherp
Guru

Joined: 11/12/2012
Location: United Kingdom
Posts: 10953

Posted: 08:49am 18 Jul 2018

Geoff can comment further but I would argue that the bug is that the code returns anything at all. I think it should return an empty string as the input does not meet the characteristics of a "line" i.e. it is unterminated.

panky

Guru

Joined: 02/10/2012
Location: Australia
Posts: 1117

Posted: 08:57am 18 Jul 2018

Quote Geoff can comment further but I would argue that the bug is that the code returns anything at all. I think it should return an empty string as the input does not meet the characteristics of a "line" i.e. it is unterminated.

... or preferably indicate some form of error, as the last CSV field in an INPUT command or the last line in a LINE INPUT command would be incorrect but the user would not be aware.

panky

Guru

Joined: 02/10/2012
Location: Australia
Posts: 1117

Posted: 09:14am 18 Jul 2018

There does not appear to be a finalised and formal definition for CSV but I found the following in Wikipedia ...

Quote: RFC 4180 standard
Reliance on the standard documented by RFC 4180 can simplify CSV exchange. However, this standard only specifies handling of text-based fields. Interpretation of the text of each field is still application-specific.

RFC 4180 formalized CSV. It defines the MIME type "text/csv", and CSV files that follow its rules should be very widely portable. Among its requirements:

MS-DOS-style lines that end with (CR/LF) characters (optional for the last line).
An optional header record (there is no sure way to detect whether it is present, so care is required when importing).
Each record "should" contain the same number of comma-separated fields.
Any field may be quoted (with double quotes).
Fields containing a line-break, double-quote or commas should be quoted. (If they are not, the file will likely be impossible to process correctly).
A (double) quote character in a field must be represented by two (double) quote characters.
The format can be processed by most programs that claim to read CSV files. The exceptions are: (a) programs may not support line-breaks within quoted fields, (b) programs may confuse the optional header with data or interpret the first data line as an optional header and (c) double quotes in a field may not be parsed correctly automatically.

Taking the above into consideration, maybe an 'unterminated' final field or line should be acceptable? At least that way, no data would be 'lost'.
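To make the quoting rules in the Wikipedia extract concrete, here is a rough C sketch of pulling one field out of a record that has already been read as a line. It is only an illustration of the RFC 4180 rules listed above (quoted fields, commas inside quotes, doubled double-quotes) and has nothing to do with how MMBasic's INPUT is implemented; the function name csv_field and its parameters are invented for this example.

```c
#include <stddef.h>

/* Hypothetical helper, invented for illustration.
   Copies field number `index` (0-based) of one CSV record into `out`
   (capacity `cap`, including the terminating NUL).
   Returns 1 if the field exists, 0 if the record has fewer fields.
   Characters that do not fit in `out` are silently dropped. */
int csv_field(const char *rec, int index, char *out, size_t cap)
{
    const char *p = rec;
    int field = 0;

    if (cap == 0) return 0;
    for (;;) {
        size_t n = 0;
        int quoted = 0;

        if (*p == '"') { quoted = 1; p++; }      /* opening quote */
        while (*p != '\0') {
            if (quoted) {
                if (*p == '"') {
                    if (p[1] == '"') {
                        p++;                     /* "" is one literal quote */
                    } else {
                        quoted = 0;              /* closing quote */
                        p++;
                        continue;
                    }
                }
            } else if (*p == ',') {
                break;                           /* end of this field */
            }
            if (field == index && n + 1 < cap) out[n++] = *p;
            p++;
        }
        if (field == index) { out[n] = '\0'; return 1; }
        if (*p == '\0') return 0;                /* ran out of fields */
        p++;                                     /* skip the comma */
        field++;
    }
}
```

The record passed in carries no line terminator at all, so it makes no difference to the field splitting whether the line ended with CR LF or simply with the end of the file.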

panky

panky

Guru

Joined: 02/10/2012
Location: Australia
Posts: 1117

Posted: 10:18am 18 Jul 2018

Thinking about it a bit further, I would prefer MMB to send ALL data back and leave the validity checking to the user.

For example, if the final field in an INPUT command was missing the CR and MMB decided it had an incorrect structure (i.e. a missing terminator) and just sent back an empty field, I, the user program, would have no way of determining whether the field was valid. It could have been an intentionally empty (and thus still valid) field.

My feeling is that data validity is a user program responsibility while data integrity is MMB's responsibility.
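As a sketch of what "send everything back and let the caller judge" could look like at the C level, the reader below always returns whatever characters it collected and reports separately whether a terminator was seen. This is purely hypothetical: the enum, the function name read_line_status and its parameters are made up here and do not exist in the MMBasic source.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical status codes, invented for illustration. */
enum line_status {
    LINE_NONE,          /* nothing left to read */
    LINE_TERMINATED,    /* line ended with LF (or CR LF) */
    LINE_UNTERMINATED   /* data returned, but EOF came before a terminator */
};

/* Reads one line from fp into buf (capacity cap, including the NUL).
   CR characters are skipped; characters that do not fit are dropped. */
enum line_status read_line_status(FILE *fp, char *buf, size_t cap)
{
    size_t n = 0;
    int c;
    int got_any = 0;

    if (cap == 0) return LINE_NONE;
    buf[0] = '\0';
    while ((c = fgetc(fp)) != EOF) {
        got_any = 1;
        if (c == '\n') {            /* terminator seen: a complete line */
            buf[n] = '\0';
            return LINE_TERMINATED;
        }
        if (c == '\r') continue;    /* ignore CR */
        if (n + 1 < cap) buf[n++] = (char)c;
    }
    buf[n] = '\0';
    return got_any ? LINE_UNTERMINATED : LINE_NONE;
}
```

With this shape an intentionally empty last line (LINE_TERMINATED with an empty buffer) can be told apart from a missing one (LINE_NONE), and no character is lost when the terminator is absent.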

panky

Geoffg

Guru

Joined: 06/06/2011
Location: Australia
Posts: 3340

Posted: 11:22am 18 Jul 2018

Thanks Doug (panky), that's a great find and thanks for even spotting the issue in the source (I think that you are right).

My opinion is that MMBasic should treat end of file the same as a CR/LF - ie, return the chars collected up to that point as a line.
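A minimal sketch of that behaviour in C, for anyone following along. This is NOT the MMgetline code from SerialFileIO.c, just an illustration of "end of file counts as a line terminator"; the function name read_line_eof_ok and its parameters are assumptions made for the example.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical helper, invented for illustration.
   Reads one line from fp into buf (capacity cap, including the NUL).
   CR is skipped, LF ends the line, and EOF ends the line as well.
   Returns 1 if a line (possibly empty) was produced, 0 if the file
   was already at EOF with nothing left to return. */
int read_line_eof_ok(FILE *fp, char *buf, size_t cap)
{
    size_t n = 0;
    int c;
    int got_any = 0;

    if (cap == 0) return 0;
    while ((c = fgetc(fp)) != EOF) {
        got_any = 1;
        if (c == '\n') break;                  /* LF terminates the line */
        if (c == '\r') continue;               /* drop CR */
        if (n + 1 < cap) buf[n++] = (char)c;   /* excess characters are dropped */
    }
    /* Reaching EOF falls through to here with every character kept,
       including the last one before the end of the file. */
    buf[n] = '\0';
    return got_any;
}
```

Run against a file like the test1.dat in the first post, the second call should hand back the whole of line 2, trailing "z" included.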

I should have the fix in the next beta,

Geoff

Geoff Graham - http://geoffg.net

MicroBlocks

Guru

Joined: 12/05/2012
Location: Thailand
Posts: 2209

Posted: 11:54am 18 Jul 2018

Other BASICs also treat the end of file as the end of the input.

The difference between Windows and Linux line endings was solved by using the LF as the end of the line and ignoring the CR. This required my own input routine that read character by character.
For MMBasic I would assume it is pretty safe to ignore the CR and just use the LF as an end of line.
A difficulty is that when text is read there might be an LF inside double quotes. CSV handles this by reading from one double quote to the next and everything in between is the string value.
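Roughly what such a character-by-character reader might look like in C, including the double-quote case. This is a sketch under my own assumptions, not the routine described above and not MMBasic code; the name read_csv_record and its parameters are invented for the example.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical helper, invented for illustration.
   Reads one CSV record from fp into buf (capacity cap, including the NUL).
   Outside double quotes: CR is ignored and LF ends the record.
   Inside double quotes: CR and LF are kept as part of the field.
   EOF always ends the record.
   Returns 1 if a record was produced, 0 if nothing was left to read. */
int read_csv_record(FILE *fp, char *buf, size_t cap)
{
    size_t n = 0;
    int c;
    int got_any = 0;
    int in_quotes = 0;

    if (cap == 0) return 0;
    while ((c = fgetc(fp)) != EOF) {
        got_any = 1;
        if (c == '"') in_quotes = !in_quotes;  /* a doubled "" toggles twice */
        if (!in_quotes) {
            if (c == '\n') break;              /* end of record */
            if (c == '\r') continue;           /* ignore CR */
        }
        if (n + 1 < cap) buf[n++] = (char)c;   /* quotes stay in the record */
    }
    buf[n] = '\0';
    return got_any;
}
```

The quotes are left in the returned record on purpose, so that a later field-splitting step can still tell a quoted comma from a separator.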

If you have control over the file layout this is all a moot point. You just write the file the way MMBasic processes it. That saves a lot of testing for conditions that might never occur.

Edited by MicroBlocks 2018-07-19

Microblocks. Build with logic.
