Setosa blog


CSV Fingerprints

CSV is a simple and common format for tabular data that uses commas to separate the values in each row and line breaks to separate the rows. Nearly every spreadsheet and database program lets users import from and export to CSV. But until recently, these programs varied in how they treated special cases, like when the data itself has a comma in it.
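To make the comma special case concrete, here is a small sketch using Python's standard `csv` module. The row values are made up for illustration. It shows how naive joining loses a field boundary, while a CSV-aware writer quotes the field that contains the comma.

```python
import csv
import io

# A made-up row whose second field contains a comma.
row = ["Example Lake", "Springfield, USA", "1000"]

# Naive joining loses the field boundary: splitting yields four pieces, not three.
naive = ",".join(row)
naive_fields = naive.split(",")

# csv.writer wraps the field that contains the delimiter in double quotes.
buffer = io.StringIO()
csv.writer(buffer).writerow(row)
encoded = buffer.getvalue()

# csv.reader recovers the original three fields from the quoted line.
decoded = next(csv.reader(io.StringIO(encoded)))

print(naive_fields)
print(encoded.strip())
print(decoded)
```

A program that splits on commas without honoring the quotes would see an extra column in that row, which is exactly the kind of mistake that is hard to notice in a large file.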

It's easy to make a mistake when you try to make a CSV file fit a particular format. To make mistakes easier to spot, I've made a "CSV Fingerprint" viewer (named after the "Fashion Fingerprints" from The New York Times's "Front Row to Fashion Week" interactive). The idea is to provide a bird's-eye view of the file without too much distracting detail. It is similar to Tufte's Image Quilts: a qualitative view, as opposed to a rendering of the data in the file itself. In this sense, the CSV Fingerprint is a sort of meta visualization.

Colors indicate data types. To inspect individual cells, click and drag on the elements. The fisheye lens makes it easy to home in on particular values.

The CSV file for the fingerprint at the top was scraped from the metadata for California's water reservoirs published by the California Department of Water Resources. Thanks to the Fingerprint, we can quickly see that the second column (the "lake" column) seems to have some missing values (dark gray). We can also see that all of the values in the "capacities" column are integers (blue), as we would expect.
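The type-per-cell idea can be sketched in a few lines of Python. This is only an illustration, not the viewer's actual code: the function names, the type labels ("empty", "integer", "decimal", "string"), and the sample data are all assumptions. Only the "missing" and "integer" categories are mentioned in this post.

```python
import csv
import io


def classify_cell(value: str) -> str:
    """Return a coarse type label for one CSV cell (labels are illustrative)."""
    text = value.strip()
    if text == "":
        return "empty"
    try:
        int(text)
        return "integer"
    except ValueError:
        pass
    try:
        float(text)
        return "decimal"
    except ValueError:
        return "string"


def fingerprint(csv_text: str) -> list[list[str]]:
    """Map every cell of a CSV document to its type label, row by row."""
    reader = csv.reader(io.StringIO(csv_text))
    return [[classify_cell(cell) for cell in row] for row in reader]


# Made-up sample with a missing "lake" value and a non-integer capacity.
sample = "id,lake,capacity\nAAA,,1000\nBBB,Example Lake,2500.5\n"
for labels in fingerprint(sample):
    print(labels)
```

Each label could then be mapped to a color and drawn as one small rectangle per cell, so a column of integers with one stray string stands out immediately.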

The above fingerprint was generated from another CSV file of historical reservoir capacities. From it we can quickly spot a few reservoirs with missing capacity values, and the gaps seem to fall in the same rows. There are also some reservoirs that stop reporting capacity values entirely in about the same month.
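The same check can be done without a picture. Below is a minimal sketch, assuming the first row is a header. The function name and the sample data are hypothetical. It lists, for each column, the data rows where the cell is empty, so columns whose gaps share row numbers are easy to compare.

```python
import csv
import io
from collections import defaultdict


def missing_rows_by_column(csv_text: str) -> dict[str, list[int]]:
    """For each column name, list the 1-based data rows whose cell is empty."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    missing = defaultdict(list)
    for row_number, row in enumerate(reader, start=1):
        for name, cell in zip(header, row):
            if cell.strip() == "":
                missing[name].append(row_number)
    return dict(missing)


# Made-up monthly capacities for three hypothetical reservoirs.
sample = (
    "month,res_a,res_b,res_c\n"
    "2013-01,10,20,30\n"
    "2013-02,,,31\n"
    "2013-03,12,,\n"
)
gaps = missing_rows_by_column(sample)
print(gaps)
```

In this made-up sample, `res_a` and `res_b` are both missing in row 2, which is the tabular equivalent of seeing gray cells line up across a row in the fingerprint.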

In the input field below, copy and paste your own CSV files to generate their CSV fingerprint.

You can also access a fullscreen version at:


See for the source code.
