www.heteroclinic.net

Tiny CSV Reader
I. The Requirements

201506

CSV files are handy for bulk processing and are also human-readable. Wikipedia has an article on the formalization of the format*, which we quote here as follows:


* https://en.wikipedia.org/wiki/Comma-separated_values, last retrieved June 2015.
II. The Natural Language Expression

It is important to identify and count the occurrences of the double quote literal.
Here we shorten it to "quote". If quotes appear consecutively an odd number of times, the first quote of that run is a field border.
A record separator can be a newline literal. Before it, field borders must have appeared an even number of times; otherwise the newline is a literal inside a field. In other words, a record separator is a newline before which runs of "quotes appearing consecutively an odd number of times" have occurred an even number of times. It follows that the next valid newline acting as a record separator also always has an even count of quotes before it.
A field separator can be a comma literal. As with the record separator, runs of "quotes appearing consecutively an odd number of times" must have occurred an even number of times before it.
A field can be an empty string. A record can be an empty line. A whole file can be empty. If we don't handle these cases, the program may halt before its expected exit or end.
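
The rules above can be sketched as a small character scanner in Java. This is only a minimal illustration under our own assumptions, not the implementation from this series: the class name QuoteParityCsvSketch and the method parse are hypothetical, a doubled quote inside a quoted field is assumed to stand for one literal quote, and both "\n" and "\r\n" are assumed to be new line literals.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; the names are not taken from the original article.
class QuoteParityCsvSketch {

    // Splits CSV text into records, each record being a list of fields.
    static List<List<String>> parse(String text) {
        List<List<String>> records = new ArrayList<>();
        List<String> fields = new ArrayList<>();
        StringBuilder field = new StringBuilder();
        boolean insideQuotes = false; // true while an odd number of field borders has been seen
        boolean pending = false;      // true if the current record has consumed any character

        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            pending = true;
            if (c == '"') {
                if (insideQuotes && i + 1 < text.length() && text.charAt(i + 1) == '"') {
                    field.append('"'); // assumed escape: a doubled quote is one literal quote
                    i++;
                } else {
                    insideQuotes = !insideQuotes; // a field border flips the parity
                }
            } else if (c == ',' && !insideQuotes) {
                // Even number of borders before the comma: it is a field separator.
                fields.add(field.toString());
                field.setLength(0);
            } else if ((c == '\n' || c == '\r') && !insideQuotes) {
                // Even number of borders before the new line: it is a record separator.
                if (c == '\r' && i + 1 < text.length() && text.charAt(i + 1) == '\n') {
                    i++; // treat "\r\n" as a single separator
                }
                fields.add(field.toString());
                field.setLength(0);
                records.add(fields);
                fields = new ArrayList<>();
                pending = false;
            } else {
                field.append(c); // commas and new lines inside quotes are plain literals
            }
        }
        if (pending) { // last record without a trailing new line
            fields.add(field.toString());
            records.add(fields);
        }
        return records;
    }
}
```

With this sketch an empty file yields no records, an empty line yields a record holding one empty field, and an empty field between two commas is kept as an empty string, so none of the three empty cases stops the loop early.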

III. Translating the Natural Language Expression into a Programmable Regular Expression

Here, after a long hike along a winding trail, we implement it in Java for convenience.
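
As an illustration of what such a regular expression can look like, here is a minimal sketch in Java. It uses a widely known lookahead pattern and is not necessarily the expression developed in this series. It assumes a single, well-formed record as input, and it counts the quotes after each comma instead of before it; within a well-formed record the total number of quotes is even, so both counts have the same parity. The names RegexSplitSketch, FIELD_SEPARATOR and splitRecord are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical sketch; the names are not taken from the original article.
class RegexSplitSketch {

    // A comma is a field separator when an even number of quotes
    // lies between it and the end of the record.
    static final Pattern FIELD_SEPARATOR =
            Pattern.compile(",(?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)");

    // Splits one record into raw fields; surrounding quotes are kept as-is.
    static List<String> splitRecord(String record) {
        // The limit -1 keeps trailing empty fields, e.g. "x," gives two fields.
        return Arrays.asList(FIELD_SEPARATOR.split(record, -1));
    }

    public static void main(String[] args) {
        for (String f : splitRecord("a,\"b,c\",,\"d\"\"e\"")) {
            System.out.println("[" + f + "]");
        }
    }
}
```

The lookahead rescans the rest of the record for every comma, so this form favors brevity over speed, and the fields it returns still carry their border quotes and doubled quotes.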

IV. The Proof of Concept Test File

We post the test data file here. It covers a limited set of scenarios that, in real life, could make a lesser program fail.