CSVfix patches for regex and exec

CSVfix is a tool for manipulating CSV files. Along with the usual column re-ordering and filtering, CSVfix offers powerful per-cell data transformation using a simple expression language, as well as regular expressions for string matching and editing. And if this is not enough, CSVfix can execute an external process for every cell that needs to be processed. And oh, it's available for Windows too!

I find this tool very handy for what I need to do, so when I encountered a bug in its regex processing (for its "edit" command), I immediately checked whether there were any updates to this tool. Unfortunately, its development seems to have ceased in 2015, and nobody else seems to have picked it up (I did find some forks, but they were all older copies from when it was still hosted on Google Code).

So I set out to figure out the problem and hopefully rectify it. I found that the problem was in its regex library, which was a home-grown library (apparently adapted from an algorithm book). It is 2020 as of this writing, and the C++ standard library now comes with its own regex support (std::regex). I decided to rip out the custom regex library and replace it with std::regex, while keeping the rest of the class interface identical so that no other part of the code needed to be changed. This instantly fixed the problem, and as a bonus, we can now use ECMAScript-compatible regex instead of just the basic regex.
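To illustrate the idea of swapping the engine while keeping the interface, here is a minimal sketch of a wrapper class built on std::regex. The class name RegexWrapper and its methods Find and ReplaceAll are hypothetical, chosen only for this example; they are not CSVfix's actual class interface, and the real patch may look quite different.

```cpp
#include <regex>
#include <string>

// Hypothetical wrapper: the names here are illustrative only and are
// NOT taken from the CSVfix source. Callers see a small, stable
// interface; the regex engine behind it is std::regex.
class RegexWrapper {
public:
    // Compile the pattern using ECMAScript syntax, optionally
    // ignoring case.
    explicit RegexWrapper(const std::string& pattern, bool ignoreCase = false)
        : mRegex(pattern,
                 ignoreCase ? std::regex::ECMAScript | std::regex::icase
                            : std::regex::ECMAScript) {}

    // True if the pattern matches anywhere in the text.
    bool Find(const std::string& text) const {
        return std::regex_search(text, mRegex);
    }

    // Replace every match; $1, $2, ... in repl refer to capture groups.
    std::string ReplaceAll(const std::string& text,
                           const std::string& repl) const {
        return std::regex_replace(text, mRegex, repl);
    }

private:
    std::regex mRegex;
};
```

Because callers only ever touch the wrapper's methods, the code that uses it does not have to change when the engine underneath is replaced.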

Later, I found out that the "exec" command also had a bug (the flag "-ix" did not work properly), so I traced this and fixed it too.

Oh, and during the process, I tried to run its test suite - and while most of the tests passed, some did fail, mainly because of CRLF/LF inconsistencies, so I changed the test data to use LF. This is also a warning that this tool only works with the platform's native newline - CRLF on Windows, and LF on Linux - so if files are to be exchanged between platforms, they must be properly translated before use.
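For reference, translating newlines is simple enough to do by hand. The following is a minimal sketch of such a translation; the function names CrlfToLf and LfToCrlf are my own for this example and are not part of CSVfix. It assumes the whole file content fits in a string.

```cpp
#include <cstddef>
#include <string>

// Convert CRLF line endings to LF. A lone CR (not followed by LF)
// is left untouched.
std::string CrlfToLf(const std::string& in) {
    std::string out;
    out.reserve(in.size());
    for (std::size_t i = 0; i < in.size(); ++i) {
        if (in[i] == '\r' && i + 1 < in.size() && in[i + 1] == '\n') {
            continue;  // drop the CR; the LF is copied on the next pass
        }
        out += in[i];
    }
    return out;
}

// Convert LF line endings to CRLF. An LF that is already preceded
// by CR is not doubled up.
std::string LfToCrlf(const std::string& in) {
    std::string out;
    out.reserve(in.size());
    for (std::size_t i = 0; i < in.size(); ++i) {
        if (in[i] == '\n' && (i == 0 || in[i - 1] != '\r')) {
            out += '\r';
        }
        out += in[i];
    }
    return out;
}
```

Existing tools such as dos2unix and unix2dos do the same job on whole files, if you have them installed.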

Here are the individual patches.
- regex patch
- exec patch
- test-case patch

They apply on top of commit 93804d4 from 2015-02, which was the latest when I wrote this. They are licensed under the same license as the original CSVfix.

If CSVfix is not powerful enough for you, there are other similar tools:

1. miller is a tool very similar in spirit to CSVfix, but it is (much) more sophisticated. Its "data transformation language" looks more expressive than the one in CSVfix. If you have a problem you cannot solve with CSVfix, miller will probably help you. As a bonus, it is still in active development - that means bugs will be squashed. It is written in C, so you will need to compile it if it is not in your package repository. (Fatdog, naturally, has it in its repository.)

2. csvkit is a collection of tools that more or less perform the same functions as CSVfix. It supports direct conversion to/from Excel files, importing/exporting into databases (sqlite and postgresql as documented, perhaps others too), as well as running SQL queries directly on CSV files (and databases too). It is written in Python 3, so you can install it using pip3. Fatdog has this in its repository too (so you can install it using the package manager instead of pip3).

3. rbql basically enables you to run SQL-like queries on CSV files, but its real power is its ability to run Python (or JavaScript, depending on which backend you choose) code for every cell. Fatdog has it in its repository too, although you can just use pip to install it if you don't run Fatdog.

Posted on 23 Jan 2020, 21:53 - Categories: General Linux