Much of my professional work for the last 10+ years has revolved around handling, importing, and exporting CSV files. CSV files are frustratingly misunderstood, abused, and, most of all, underspecified. While RFC 4180 exists, it is far from definitive and goes largely ignored.
Partially as a companion piece to my recent post about how CSV is an encoding nightmare, and partially as an expression of frustration, I’ve decided to make a list of falsehoods programmers believe about CSVs. I recommend my previous post for more in-depth coverage of the pains of CSV encodings and how the default tooling (Excel) will ruin your day.
Everything on this list is a false assumption that developers make.
All CSVs are ASCII
All CSVs are Win1252
All CSVs are in 8-bit encodings
All CSVs are UTF-8
All CSVs are UTF-16
All CSVs contain a single consistent encoding
All records contain a single consistent encoding
All fields contain a single consistent encoding
All CSVs contain records
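To make the encoding items above concrete, here is a minimal Python sketch of reading a CSV without assuming its encoding up front. The helper names (guess_encoding, read_rows) and the fallback order (BOM, then strict UTF-8, then Windows-1252) are my own assumptions for illustration, not a standard. It also assumes one encoding per file, which is itself one of the falsehoods listed above, so treat it as a starting point rather than a fix.

```python
import codecs
import csv
import io

# Byte-order marks this sketch recognises, mapped to a Python codec
# that consumes the BOM while decoding. (A UTF-32 LE BOM also starts
# with the UTF-16 LE BOM bytes; this sketch does not handle UTF-32.)
_BOMS = [
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def guess_encoding(raw: bytes) -> str:
    """Guess an encoding for raw CSV bytes (hypothetical helper)."""
    for bom, name in _BOMS:
        if raw.startswith(bom):
            return name
    try:
        # Strict UTF-8 rejects most legacy 8-bit text, which makes it
        # a reasonable first guess. Pure ASCII also passes this check.
        raw.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        # Assumed fallback: Windows-1252, common in Excel exports.
        return "cp1252"


def read_rows(raw: bytes) -> list:
    """Decode raw bytes and parse them as CSV; may return no rows."""
    # errors="replace" keeps going when a byte does not fit the guess,
    # for example when a file mixes encodings between records.
    text = raw.decode(guess_encoding(raw), errors="replace")
    return list(csv.reader(io.StringIO(text, newline="")))
```

A file with no bytes at all yields an empty list of rows, so callers should not index into the first row without checking. Undecodable bytes come back as U+FFFD replacement characters rather than raising, which trades strictness for the ability to salvage the rest of the file.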