Bash CSV Handling
I'm sick of screwing this up, so I'm creating this definitive guide for my own reference. I will update this as I learn to handle individual issues.
For the following examples, I will use this sample CSV file test.csv (assuming Unix line ending "\n"). I've included several "problems" in it.
id,firstname,lastname,email,phone,hiredate 1111,"Colin","Quinn","c.quinn@example.com",1112223333,"02/12/2017" 2222,"Richard","Vos, Richard","r.vos@example.com",2223334444,"06/27/2019" 3333,"Patrice","O'Neal","p.o'neal@example.com",3334445555,"08/11/2018" 4444,"Jim","Jefferies","j.jefferies@example.com",,"09/11/2001"
Quoted Strings with Commas
When your fields are separated by commas, and one of the fields has a quoted string with a comma in it, it can screw up the parsing of the fields.
The problem in the sample file is in the second line. Someone accidentally entered "Vos, Richard" in a field that is only supposed to contain the last name "Vos". This kind of clerical error happens all the time. Not only is it incorrect, it has a comma in the text which is the character we use to delimit the fields.
if we try to naively break out the fields into separate columns, we see a problem:
cat test.csv | column -t -s ","
id firstname lastname email phone hiredate 1111 "Colin" "Quinn" "c.quinn@example.com" 1112223333 "02/12/2017" 2222 "Richard" "Vos Richard" "r.vos@example.com" 2223334444 "06/27/2019" 3333 "Patrice" "O'Neal" "p.o'neal@example.com" 3334445555 "08/11/2018" 4444 "Jim" "Jefferies" "j.jefferies@example.com" "09/11/2001"
The comma in the lastname field was identified as a field as delimiter (field separator), so now the columns on the second line are shifted to the right.
The best way I've found to handle this without using a utility that is specifically for CSV parsing is this:
cat test.csv | sed 's/,\("[^"]*"\)*/<FS>\1/g' | column -t -s "<FS>"
id firstname lastname email phone hiredate 1111 "Colin" "Quinn" "c.quinn@example.com" 1112223333 "02/12/2017" 2222 "Richard" "Vos, Richard" "r.vos@example.com" 2223334444 "06/27/2019" 3333 "Patrice" "O'Neal" "p.o'neal@example.com" 3334445555 "08/11/2018" 4444 "Jim" "Jefferies" "j.jefferies@example.com" "09/11/2001"