First: what is the best structure for the data?
CSV or JSON, or does it depend? I read an article claiming that JSON is much better than CSV (I will try to find the link later), but the client I am developing this visualization for mainly works with Excel spreadsheets, so I guess CSV is the only choice for now.
Second: how to import and parse CSV?
For this question, I found a very good article here. Following this article’s second approach, I was able to parse the data and change the name of the columns at the same time.
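d3.csv("/data/cities.csv", function(d) {
  // Row function: runs once per row, renaming columns and converting types.
  return {
    city : d.city,
    state : d.state,
    population : +d.population,
    land_area : +d["land area"]
  };
}, function(data) {
  // Callback: runs once the whole file has been loaded and parsed.
  console.log(data[0]);
});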
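To see what the row function does on its own, here is a minimal sketch that applies the same conversion to one hand-written row object, without d3. The name convertRow and the sample values are made up for illustration; the assumption is that the CSV parser hands over each row as an object whose values are all strings.

```javascript
// Hypothetical stand-alone version of the row function passed to d3.csv.
function convertRow(d) {
  return {
    city: d.city,
    state: d.state,
    population: +d.population,   // unary plus converts the string to a number
    land_area: +d["land area"]   // bracket notation is needed for names with spaces
  };
}

// Illustrative row: every value arrives as a string.
const sampleRow = {
  city: "Example City",
  state: "EX",
  population: "12345",
  "land area": "67.8"
};

const converted = convertRow(sampleRow);
console.log(converted);
```

The renamed land_area key is what makes the later code easier to write, since it can be read as converted.land_area instead of with brackets.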
However, I soon found that the console kept telling me that my dataset was undefined. After googling, I found this Stack Overflow answer, which perfectly explained why. Basically, d3.csv is asynchronous. The parsed data only becomes available when the callback runs, so any code outside the callback that tries to read it runs too early and sees undefined. So you either do everything you want to do inside the d3.csv callback, or you define several functions outside of d3.csv and call them from within the callback. See below for the genius explanation.
d3.csv is an asynchronous method. This means that code inside the callback function is run when the data is loaded, but code after and outside the callback function will be run immediately after the request is made, when the data is not yet available. In other words:
first();
d3.csv("path/to/file.csv", function(rows) {
  third();
});
second();
If you want to use the data that is loaded by d3.csv, you either need to put that code inside the callback function (where third is, above):
d3.csv("path/to/file.csv", function(rows) {
  doSomethingWithRows(rows);
});
function doSomethingWithRows(rows) {
  // do something with rows
}
Or, you might save it as a global variable on the window that you can then refer to later:
var rows;
d3.csv("path/to/file.csv", function(loadedRows) {
  rows = loadedRows;
  doSomethingWithRows();
});
function doSomethingWithRows() {
  // do something with rows
}
If you want, you can also assign the loaded data explicitly to the window object, rather than declaring a variable and then managing two different names:
d3.csv("path/to/file.csv", function(rows) {
  window.rows = rows;
  doSomethingWithRows();
});
function doSomethingWithRows() {
  // do something with rows
}
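The first, second, third ordering from the quoted answer can be reproduced without d3 or a network request. The sketch below is only a simulation: fakeCsv, flushPending, and the sample row are hypothetical stand-ins, with a simple queue playing the role of the file arriving later.

```javascript
// Simulation only: fakeCsv is a hypothetical stand-in for an asynchronous loader.
const pending = [];   // callbacks waiting for their "download" to finish
const order = [];     // records the order in which things actually run
let rows;             // stays undefined until the callback has run

function fakeCsv(path, callback) {
  // Do not call back now; remember the callback and return immediately.
  pending.push(() => callback([{ city: "Example City", population: 12345 }]));
}

function flushPending() {
  // Plays the role of the data arriving some time later.
  while (pending.length > 0) {
    pending.shift()();
  }
}

order.push("first");
fakeCsv("path/to/file.csv", function (loadedRows) {
  rows = loadedRows;
  order.push("third");
});
order.push("second");

const rowsSeenTooEarly = rows;   // undefined: the callback has not run yet

flushPending();
console.log(order, rowsSeenTooEarly, rows.length);
```

This is exactly the situation that produced my undefined dataset: the line reading rows ran before the callback did. As far as I know, D3 version 5 and later returns a promise from d3.csv instead of taking a callback, but the same rule applies: the work goes inside the function that receives the data.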
Third: Why wouldn’t it work?
Specifically, why did my numbers turn into “NaN” after using the +d approach? At first I was able to import and parse the data into arrays, but of course all the numbers were strings, shown with quote marks around them. So I used “+” to convert them. However, the console then showed that all the numbers had turned into “NaN.” After searching around, I found this was caused by the Excel formatting: the original spreadsheet formatted large numbers with commas as thousands separators, and “+” cannot convert a string containing commas, so the result is NaN. I turned off that number formatting in Microsoft Excel, which fixed six of the eight columns. However, two columns still showed NaN, even after the de-formatting.
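The comma problem is easy to reproduce in the console. In this small sketch, toNumber is a hypothetical helper (not part of d3) that strips commas before converting, which would be an alternative to changing the formatting in Excel. It assumes commas are only ever used as thousands separators, never as decimal marks.

```javascript
// Hypothetical helper: strips thousands separators before converting.
function toNumber(value) {
  return +String(value).replace(/,/g, "");
}

const formatted = "1,234,567";         // what a comma-formatted cell looks like in the CSV
const direct = +formatted;             // NaN, because of the commas
const cleaned = toNumber(formatted);   // 1234567

console.log(direct, cleaned);
```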
What caused that?
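Looking closer, I found it was caused by one extra space at the end of the name of the first of the two misbehaving columns. Because of that space, the header no longer matched the name I used in the row function, so the lookup returned undefined and “+” turned it into NaN. I deleted the extra space, and both columns now behave normally.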
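Here is a minimal sketch of why a trailing space in a header produces NaN, along with one possible safeguard. trimKeys is a hypothetical helper of my own, and the row is made up; it simply copies a row with the whitespace trimmed from every column name.

```javascript
// Hypothetical helper: copies a row with whitespace trimmed from every column name.
function trimKeys(row) {
  const result = {};
  for (const key of Object.keys(row)) {
    result[key.trim()] = row[key];
  }
  return result;
}

// Illustrative row whose header was "population " (note the trailing space).
const rawRow = { city: "Example City", "population ": "12345" };

const before = +rawRow.population;            // lookup gives undefined, and +undefined is NaN
const after = +trimKeys(rawRow).population;   // 12345 once the key is trimmed

console.log(before, after);
```

Fixing the header in the spreadsheet, as I did, is simpler; trimming keys in code would only be worth it if the file keeps being re-exported.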