www.heteroclinic.net

Tiny CSV Reader
V. Program through tests

201507

There was no straightforward way to get the program to pass the test file. The main problem was that I tried to solve it like an exercise from a theoretical computer science textbook (automata, regular languages).


Tiny CSV Reader
VI. It is not a straight line

201507

Getting the program to pass the test data file is not a linear process.
Some tasks are trivial, like reading a file into a string, testing multi-line regexes, or testing and using greedy and reluctant quantifiers. If you put the test data in the source code file, you have to escape the special characters by hand. Some things can be verified by brute-force or diligent testing, for example that a reluctant quantifier on a group needs an extra outer layer of brackets, unless you can memorize all the operator precedence rules. I still don't understand why we use [^\\\"] for a literal that is not a double quote; it just works, and the other ways I tried did not. It is also tricky that $, the dollar sign, in multi-line mode is only an anchor for the end of a line and does not match the line separator itself. The strict literal would be System.getProperty("line.separator"), but for convenience I just use \n.
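The points above can be checked with a few small experiments. The following is a minimal sketch using only java.util.regex; the class name RegexNotes and the helper firstMatch are made up for illustration and are not part of the TinyCSVReader source.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class RegexNotes {

    // Hypothetical helper: first match of `regex` in `input`, or null if none.
    static String firstMatch(String regex, int flags, String input) {
        Matcher m = Pattern.compile(regex, flags).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String line = "\"a\",\"b\"";                          // the text "a","b"

        // Greedy runs to the last quote; reluctant stops at the first one.
        System.out.println(firstMatch("\".*\"", 0, line));     // "a","b"
        System.out.println(firstMatch("\".*?\"", 0, line));    // "a"

        // Without brackets the quantifier binds only to the preceding 'b'.
        System.out.println(firstMatch("ab+?c", 0, "ababc"));   // abc
        System.out.println(firstMatch("(ab)+?c", 0, "ababc")); // ababc

        // Both Java literals reach the regex engine as
        // "any character except a double quote".
        System.out.println(firstMatch("[^\\\"]+", 0, "ab\"cd")); // ab
        System.out.println(firstMatch("[^\"]+", 0, "ab\"cd"));   // ab

        // In MULTILINE mode '$' matches an empty position before "\n";
        // it does not consume the separator.
        Matcher m = Pattern.compile("$", Pattern.MULTILINE).matcher("ab\ncd");
        if (m.find()) {
            System.out.println(m.start() + ".." + m.end());     // 2..2
        }
    }
}
```

On the [^\\\"] puzzle: the Java compiler first turns the string literal "[^\\\"]" into the five characters [^\"], and the regex engine then reads \" inside the class as an escaped, ordinary double quote. In this small test the shorter literal "[^\"]" behaves the same way.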
Mostly, the documentation gives you general ideas at a high level of abstraction (it is hard to find strict theorems, equations and formulas that lead to deterministic results in an SDK's documentation), so maybe it just tends to be brief. The SDK version you use also usually lags behind or runs ahead of the documentation you are reading. You could go to the source code, but unless you have abundant leisure time to debug through every line, you will not get far; I typically give up at line twenty.

Tiny CSV Reader
VII. When it converges

201507
When the test for finding a literal that appears an even or odd number of times passed, I thought we were almost done. I supposed a record would simply end at a line separator with no quote before it, or at a line separator with an even count of quotes before it. But this 'or' condition caused some chaos, as you can see if you read or run the test testParseRecords(). In a textbook exercise on formal languages (regular or context-free), you define the literals and the alphabet strictly, then apply the theorems and formulas. But in this case, the field separator literal and the record separator literal both behave like wild-card literals. I still have not figured out the exact regular grammar (symbol rules), and instead took a trial-and-error approach, which is more like real-life physics or engineering. The analysis of testParseRecords() led us to the conclusion that the two conditions on either side of the 'or' are not mutually exclusive (zero quotes is also an even count), so when the matcher iterates through the input stream, the regex engine is confused and each find consumes not just one condition but both.
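One way to avoid overlapping alternatives is to make the two sides of the 'or' start with different characters, so the matcher never has a choice. The sketch below is not the pattern used in testParseRecords(); it is an assumed alternative in which a record is a run of "one unquoted character" or "one complete quoted segment", and the record ends at the first \n that lies outside quotes. The names RecordSplitSketch and splitRecords are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class RecordSplitSketch {

    // Left alternative: any character except a quote or "\n".
    // Right alternative: a quote, anything up to the next quote, a quote.
    // The first character decides which side applies, so they cannot overlap.
    private static final Pattern RECORD =
            Pattern.compile("(?:[^\"\\n]|\"[^\"]*\")+");

    // Splits text into records; "\n" inside a quoted segment stays in the record.
    // Assumptions: "\n" is the separator, quotes are balanced, empty lines are skipped.
    static List<String> splitRecords(String text) {
        List<String> records = new ArrayList<>();
        Matcher m = RECORD.matcher(text);
        while (m.find()) {
            records.add(m.group());
        }
        return records;
    }

    public static void main(String[] args) {
        String text = "a,\"x\ny\",b\nc,\"he said \"\"hi\"\"\",d\n";
        for (String record : splitRecords(text)) {
            System.out.println("[" + record + "]");
        }
    }
}
```

A doubled quote inside a field needs no special case here: "he said ""hi""" reads as three adjacent quoted segments, so the quote count before the real record separator is still even. Input with an unbalanced quote is not handled by this sketch.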

When I am writing a program, I always think of it as designing a truth table. It always leads to fixed points. Sometimes the truth table may be very big.
So we may say that without a theoretical foundation and well-designed, logically sound requirements, the process may not converge.

Tiny CSV Reader
VIII. Discussion

201507

Regular languages are a fundamental basis for understanding how computer programs work and how to write functioning computer programs. We took this chance to study and review some key techniques in regular expressions and to partially solve a problem.
For this task in particular, some things are still open: removing the bordering quotes, replacing doubled quotes with a single quote, and doing an integrity check of the input file. For the last one, I suggest using a pre-allocated field-record grid to fill in the parsed results, so that when the integrity check fails, the program does not halt abnormally and the users have a chance to review what went wrong.
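The open items could look roughly like the sketch below. None of this is in the repository as far as this text says; the class FieldCleanupSketch, its methods, and the choice of null for a missing cell are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

final class FieldCleanupSketch {

    // Removes one pair of bordering quotes and turns "" into ".
    // Fields without bordering quotes are returned unchanged.
    static String unquote(String field) {
        if (field.length() >= 2 && field.startsWith("\"") && field.endsWith("\"")) {
            return field.substring(1, field.length() - 1).replace("\"\"", "\"");
        }
        return field;
    }

    // Copies parsed records into a pre-allocated records x columns grid.
    // Missing cells stay null and extra fields are ignored, so a bad
    // record never throws; it just leaves a visible gap.
    static String[][] fillGrid(List<List<String>> parsed, int columns) {
        String[][] grid = new String[parsed.size()][columns];
        for (int r = 0; r < parsed.size(); r++) {
            List<String> record = parsed.get(r);
            for (int c = 0; c < columns && c < record.size(); c++) {
                grid[r][c] = unquote(record.get(c));
            }
        }
        return grid;
    }

    // Integrity check: indexes of records whose field count differs from columns.
    static List<Integer> badRows(List<List<String>> parsed, int columns) {
        List<Integer> bad = new ArrayList<>();
        for (int r = 0; r < parsed.size(); r++) {
            if (parsed.get(r).size() != columns) {
                bad.add(r);
            }
        }
        return bad;
    }
}
```

With this shape the caller can print the grid together with the list from badRows and let the user decide what to do with the incomplete records, instead of stopping at the first malformed line.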
You can put your comments here: https://github.com/wangzhikai/TinyCSVReader