Home
JAQForum Ver 24.01
Log In or Join  
Active Topics
Local Time 05:25 02 Jul 2025 Privacy Policy
Jump to

Notice. New forum software under development. It's going to miss a few functions and look a bit ugly for a while, but I'm working on it full time now as the old forum was too unstable. Couple days, all good. If you notice any issues, please contact me.

Forum Index : Microcontroller and PC projects : CMM2 - CSV Read

     Page 1 of 2    
Author Message
Nimue

Guru

Joined: 06/08/2020
Location: United Kingdom
Posts: 419
Posted: 09:30am 16 Oct 2020
Copy link to clipboard 
Print this post

Happy Friday - only 8hrs till gin o'clock.....

Not sure if this will end up being a super Noob question.  Here goes...

If I have a CSV file with unknown columns and unknown rows what is the easiest / most efficient way to read the data into an array?  The file has a header row.  All the data is numeric.

Imagine I just want to extract x,y data from x,y,z,a,b,c,d etc

Currently I am:

* Opening the file and reading the first row with Line Input
* Then using Field$ to count the number of fields (stopping when field$="")
* The reading all the data until EOF to count the number of records.

* Close the file
* Dim D(i,z) using the details just extracted for records and fields

* Open file again
* Read the first row to get the data headers
* extract x and y headers --> assign to x$ and y$
* Read until EOF, adding data to D(i,z) as we go.


All this works, but it bothers me that I need to read the entire datafile twice - once to extract the number of fields / dimension the array and then to actually read it.


Back in my golden days of QBASIC - I would cheat by dimensioning an array bigger than I needed and then REDIM to trim the array down.

What is / Is there a better way of reading CSV data into an array when you don't upfront know the number of fields or records?

Context:
I am using CMM2 to help teach graphing / analysing data and students have created a CSV with data in and we are now constructing the graphing portion.

Cheers
Nim
Edited 2020-10-16 19:32 by Nimue
Entropy is not what it used to be
 
CaptainBoing

Guru

Joined: 07/09/2016
Location: United Kingdom
Posts: 2170
Posted: 10:44am 16 Oct 2020
Copy link to clipboard 
Print this post

Assuming the header indicative of the number of columns and the rows are terminated with CRLF? (if you have quoted data it is more involved but we can deal later if you want)

Open the file, read the first line and split on comma. The number of resultant fields is the column count for your array. call it CCt

LINE INPUT all the way to the EOF - this is the number of lines, call it LCt

then DIM CSV$(Lct,CCt)

Close the file
re-open it as #1
For n=1 to LCt
Line Input#1,x$
<split x$ with whatever algo you want... FIELD? Not terribly familiar with CMM2 MMBasic
For m=1 to CCt
 CSV$(n,m)=split field
Next
Next
Close

voila! your file is now in separate elements of your array

does that help?
 
CaptainBoing

Guru

Joined: 07/09/2016
Location: United Kingdom
Posts: 2170
Posted: 10:46am 16 Oct 2020
Copy link to clipboard 
Print this post

forgot to say... if you have quotes in the data, you will need a bit more processing so you don't split on any , inside ""

easily doable but a bit more involved. cross that bridge when you come to it.
 
JohnS
Guru

Joined: 18/11/2011
Location: United Kingdom
Posts: 4032
Posted: 10:47am 16 Oct 2020
Copy link to clipboard 
Print this post

(about CaptainBoing 1st post)

I think that's what the OP posted :)

John
Edited 2020-10-16 20:48 by JohnS
 
Nimue

Guru

Joined: 06/08/2020
Location: United Kingdom
Posts: 419
Posted: 10:49am 16 Oct 2020
Copy link to clipboard 
Print this post

  CaptainBoing said  Assuming the header indicative of the number of columns and the rows are terminated with CRLF? (if you have quoted data it is more involved but we can deal later if you want)

Open the file, read the first line and split on comma. The number of resultant fields is the column count for your array. call it CCt

LINE INPUT all the way to the EOF - this is the number of lines, call it LCt

then DIM CSV$(Lct,CCt)

Close the file
re-open it as #1
For n=1 to LCt
Line Input#1,x$
<split x$ with whatever algo you want... FIELD? Not terribly familiar with CMM2 MMBasic
For m=1 to CCt
 CSV$(n,m)=split field
Next
Next
Close

voila! your file is now in separate elements of your array

does that help?


Thanks for the come back -- but I think that is what I'm doing (albeit in a clumsily explained manner)

What I suppose I was asking in a longwinded way is:

(1) Is there a CMM2 ish way of REDIM?  >> I think no, other than building a new array and copying over.
(2) Is it possible to get the number of records (rows?) without reading the entire file line by line << as this step then needs to be done twice -- once to get the size and once to load the array.

As I'm never going to be creating datafiles with 1,000,000's of rows, I guess I'm asking for my development ;-)

Cheers
N
Entropy is not what it used to be
 
matherp
Guru

Joined: 11/12/2012
Location: United Kingdom
Posts: 10180
Posted: 10:55am 16 Oct 2020
Copy link to clipboard 
Print this post

  Quote  (1) Is there a CMM2 ish way of REDIM?  >> I think no, other than building a new array and copying over.

No, other than as you describe
  Quote  (2) Is it possible to get the number of records (rows?) without reading the entire file line by line << as this step then needs to be done twice -- once to get the size and once to load the array.

Only if they are fixed length.

One tiny shortcut is to use SEEK(1) rather than closing and re-opening the file (I think - needs checking)
 
mkopack73
Senior Member

Joined: 03/07/2020
Location: United States
Posts: 261
Posted: 11:39am 16 Oct 2020
Copy link to clipboard 
Print this post

Why not store the record count in the header row?
 
JohnS
Guru

Joined: 18/11/2011
Location: United Kingdom
Posts: 4032
Posted: 11:42am 16 Oct 2020
Copy link to clipboard 
Print this post

  mkopack73 said  Why not store the record count in the header row?

CSV files don't, sadly.

It's a REALLY crude (but very useful) file format.

John
 
Nimue

Guru

Joined: 06/08/2020
Location: United Kingdom
Posts: 419
Posted: 11:59am 16 Oct 2020
Copy link to clipboard 
Print this post

  mkopack73 said  Why not store the record count in the header row?


Yes - we looked at ARFF files (as used in WEKA) - that have much more explicit headers.  As this is for learning how to plot the data / create subs / functions, I might do exactly that and create a custom file format.

eg:

'.DAT file type

Columns, 9
Records, 8
a,b,c,d,e,f,g,h,i
a,b,c,d,e,f,g,h,i
a,b,c,d,e,f,g,h,i
a,b,c,d,e,f,g,h,i


etc


Ideally what I am wanting the students to create is a sub/function that lets them call plot_data(1,2,scatter,xlabel,ylabel,title)  << where 1 and 2 are the columns that the data is in and the rest are the chart details....

Thanks for all the input - very much a work in progress....

Nim
Entropy is not what it used to be
 
Nimue

Guru

Joined: 06/08/2020
Location: United Kingdom
Posts: 419
Posted: 12:24pm 16 Oct 2020
Copy link to clipboard 
Print this post

  matherp said  
One tiny shortcut is to use SEEK(1) rather than closing and re-opening the file (I think - needs checking)



Works a treat - even though the file was opened for INPUT not RANDOM.

Nim
Edited 2020-10-16 22:25 by Nimue
Entropy is not what it used to be
 
Paul_L
Guru

Joined: 03/03/2016
Location: United States
Posts: 769
Posted: 01:24pm 16 Oct 2020
Copy link to clipboard 
Print this post

REDIM did not really re-dimension an existing array! It dimensioned a new array, copied the data from the old array into the new one, and then released the old array.

This required that there be enough memory to hold both the old array and the new array in memory at the same time! It would crash if the combined arrays were too large.

You could do this in MMBasic with a few lines of code, but it probably isn't worth it. Without knowing something about the csv file you will wind up with an enormously oversized array which would then have to be shrunk.

'Since you're dealing with a really fast SD card instead of a floppy disk just
open the csv : count the fields : count the lines
close the csv
'then dimension the array
DIM a(lines, fields)
'then read the file
open the csv
for each line : read the line : parse the line : load the array
close the csv

Sometimes the direct way is the cleanest.

Paul in NY
Edited 2020-10-16 23:25 by Paul_L
 
Nimue

Guru

Joined: 06/08/2020
Location: United Kingdom
Posts: 419
Posted: 01:40pm 16 Oct 2020
Copy link to clipboard 
Print this post

  Paul_L said  REDIM did not really re-dimension an existing array! It dimensioned a new array, copied the data from the old array into the new one, and then released the old array.

This required that there be enough memory to hold both the old array and the new array in memory at the same time! It would crash if the combined arrays were too large.

You could do this in MMBasic with a few lines of code, but it probably isn't worth it. Without knowing something about the csv file you will wind up with an enormously oversized array which would then have to be shrunk.

'Since you're dealing with a really fast SD card instead of a floppy disk just
open the csv : count the fields : count the lines
close the csv
'then dimension the array
DIM a(lines, fields)
'then read the file
open the csv
for each line : read the line : parse the line : load the array
close the csv

Sometimes the direct way is the cleanest.

Paul in NY



Interesting - never really considered what REDIM was doing behind the scenes.

Will update when I get this finished.

Cheers all
N
Entropy is not what it used to be
 
thwill

Guru

Joined: 16/09/2019
Location: United Kingdom
Posts: 4299
Posted: 01:44pm 16 Oct 2020
Copy link to clipboard 
Print this post

  Paul_L said  REDIM did not really re-dimension an existing array! It dimensioned a new array, copied the data from the old array into the new one, and then released the old array.


MMBasic doesn't provide a mechanism to release a single variable does it ? ... so you can't emulate this in pure MMBasic ... and in any case you'd have to copy to and from a temporary if you wanted to end up with a larger array with the same name.

Best wishes,

Tom
Edited 2020-10-16 23:45 by thwill
MMBasic for Linux, Game*Mite, CMM2 Welcome Tape, Creaky old text adventures
 
Nimue

Guru

Joined: 06/08/2020
Location: United Kingdom
Posts: 419
Posted: 01:51pm 16 Oct 2020
Copy link to clipboard 
Print this post

  thwill said  
MMBasic doesn't provide a mechanism to release a single variable does it ? ... so you can't emulate this in pure MMBasic ... and in any case you'd have to copy to and from a temporary if you wanted to end up with a larger array with the same name.

Best wishes,

Tom


Again - never thought of this.

If I dim(500,500) to calculate something -- is there a mechanism to then delete / release the memory if I no longer need the array?

N
Entropy is not what it used to be
 
matherp
Guru

Joined: 11/12/2012
Location: United Kingdom
Posts: 10180
Posted: 02:00pm 16 Oct 2020
Copy link to clipboard 
Print this post

  Quote  MMBasic doesn't provide a mechanism to release a single variable does it ?

I wonder what the manual says
 
thwill

Guru

Joined: 16/09/2019
Location: United Kingdom
Posts: 4299
Posted: 02:14pm 16 Oct 2020
Copy link to clipboard 
Print this post

  matherp said  
  Quote  MMBasic doesn't provide a mechanism to release a single variable does it ?

I wonder what the manual says


It says all manner of useful things but CLEAR is documented to delete all variables ... if there is a way to delete a single variable then it would be helpful if there was a reference to follow from CLEAR otherwise it seems not too stupid of me to assume that there isn't such a mechanism.

Sorry, I haven't yet printed and read, and re-read the manual obsessively because the firmware author is an over-achiever and may yet require further additions to the manual

Best wishes,

Tom
Edited 2020-10-17 00:19 by thwill
MMBasic for Linux, Game*Mite, CMM2 Welcome Tape, Creaky old text adventures
 
Nimue

Guru

Joined: 06/08/2020
Location: United Kingdom
Posts: 419
Posted: 02:25pm 16 Oct 2020
Copy link to clipboard 
Print this post

  matherp said  
  Quote  MMBasic doesn't provide a mechanism to release a single variable does it ?

I wonder what the manual says



Boom!!  


I will learn one day.

;-)
Edited 2020-10-17 00:26 by Nimue
Entropy is not what it used to be
 
matherp
Guru

Joined: 11/12/2012
Location: United Kingdom
Posts: 10180
Posted: 02:29pm 16 Oct 2020
Copy link to clipboard 
Print this post

MMbasic has always had a way of deleting a single variable, not specific to CMM2 - check ERASE in the manual
 
thwill

Guru

Joined: 16/09/2019
Location: United Kingdom
Posts: 4299
Posted: 02:39pm 16 Oct 2020
Copy link to clipboard 
Print this post

  matherp said  MMbasic has always had a way of deleting a single variable, not specific to CMM2 - check ERASE in the manual


I would, except at least in the copy I have, the list of commands in the manual jumps from END SUB to EXECUTE without having ERASE in between

May I feel vindicated, or would you care to cut me down with a withering comment ?

Thanks Peter, that's awesome and may open up some further possibilities for mischief,

Tom
Edited 2020-10-17 01:00 by thwill
MMBasic for Linux, Game*Mite, CMM2 Welcome Tape, Creaky old text adventures
 
matherp
Guru

Joined: 11/12/2012
Location: United Kingdom
Posts: 10180
Posted: 03:09pm 16 Oct 2020
Copy link to clipboard 
Print this post

  Quote  May I feel vindicated,


You are fully vindicated, both ERASE and ERROR seem to have gone missing in the V5.05.05 manual put are certainly in the 5.05.06 draft I've got from Geoff.


 
     Page 1 of 2    
Print this page
The Back Shed's forum code is written, and hosted, in Australia.
© JAQ Software 2025