Periodically I get questions about data visualization and about good sources for learning more or improving. There are myriad options out there, including countless books and websites. To pare the list down, though, I do have a few suggestions. Consider this a significantly incomplete reference, but also a good place to start. Specifically, there are three books on data visualization I cannot recommend enough. They are listed in my personal order of preference, although you cannot go wrong with any of them.
- “Storytelling with Data” by Cole Knaflic
- “The Wall Street Journal Guide to Information Graphics” by Dona Wong
- “Good Charts” by Scott Berinato
The key is taking the great advice in these books and figuring out how to apply it in the specific software you use. Here, a number of good sites tailor their work to different software packages, with Excel being the most common. One of my favorites is Policy Viz, which walks you through how to set up the spreadsheet and get Excel to do what it takes to create interesting visuals. Policy Viz also has lots of links and resources for analysis, visualizations and presentations. Another good source is the Storytelling with Data blog. There are also plenty of good YouTube channels, MrExcel and Excel Campus to name just two.
Now for a few examples of things I’ve been playing around with. First up is the box-and-whisker-and-scatterplot chart I first saw on Twitter last month. Policy Viz has the details on how to create it. I recently used the horizontal version of this chart in the high-tech post comparing software jobs across large metro areas. Here is a look at state-level prime-age EPOP (employment-population ratio), just for fun. It’s detailed and complex, but it shows both the overall distribution of the data and where each individual state falls. I like it quite a bit, though probably not for a general audience.
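The box-and-whisker part of that chart boils down to a five-number summary, and the scatter part is every data point drawn on top of the box. Here is a minimal Python sketch of the numbers behind it, not the Policy Viz recipe itself. It assumes quartiles computed with the "inclusive" method (the same linear interpolation as Excel's QUARTILE.INC), whiskers running to the min and max, and a small random horizontal jitter of 0.15 so overlapping dots stay visible. The function name `box_and_scatter` and the sample values are hypothetical.

```python
import random
import statistics


def box_and_scatter(values, jitter=0.15, seed=0):
    """Return box-plot summary stats plus (x, y) positions for a dot overlay.

    Quartiles use the 'inclusive' method (linear interpolation, same idea as
    Excel's QUARTILE.INC). Whiskers run to the min and max in this sketch.
    """
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    summary = {
        "min": min(values),
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": max(values),
    }
    rng = random.Random(seed)  # fixed seed so the jitter is reproducible
    # Every point sits at the box's category position (x = 1) plus a small
    # random horizontal offset so overlapping dots remain visible.
    points = [(1 + rng.uniform(-jitter, jitter), v) for v in values]
    return summary, points


# Hypothetical values, for illustration only (not real EPOP data).
summary, points = box_and_scatter([70, 72, 75, 78, 80])
print(summary)  # {'min': 70, 'q1': 72.0, 'median': 75.0, 'q3': 78.0, 'max': 80}
```

For a horizontal version, swap each point's coordinates so the values run along the x axis.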
One of the biggest challenges in data visualization overall is getting rid of pie charts, especially those hideous 3D pie charts. I’m not here to run down all the reasons pie charts fail; for starters there, I’ll send you to Ms. Knaflic’s great “Death of Pie Charts” piece. But I am here to help tackle the big issue of visualizing budgets. When it comes to revenues and expenditures by type or category, we have a really hard time getting away from pie charts. So today I’ll present two possibilities for those interested: stacked column charts and treemaps.
For starters, I’m going to use the 2017-19 Legislatively Adopted Budget as an example. This is by no means meant to pick on our friends in LFO or in BAM; our office has plenty of its own data visualizations that need help. The key is to try to improve the form and function of your visualizations so they get better over time. And please, whatever you do, don’t look at the first chapter of the most recent Governor’s Recommended Budget, because it’s full of terrible charts from our office, and I do mean terrible.
One of the main issues with pie charts is that too many slices, usually anything more than five, make them hard to read. The slices get crowded, and readers can’t accurately compare their sizes. This matters because pie charts are for showing hierarchical data, meaning which things are bigger or smaller than the others. And if you plan to stick with pie charts, please do order the data from largest to smallest and shade the colors to match that order. That alone is a huge step up from the default chart settings.
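If you do keep a pie chart, the ordering and shading fixes are easy to script before the data ever reaches the chart. This Python sketch, for illustration only, sorts (label, value) pairs from largest to smallest and assigns each slice a shade running from dark (largest) to light (smallest). The function name `order_and_shade` and the two endpoint colors are arbitrary, hypothetical choices, not anything taken from the budget documents.

```python
def order_and_shade(items, dark=(8, 48, 107), light=(198, 219, 239)):
    """Sort (label, value) pairs largest first and give each a hex color.

    Colors are linearly interpolated from `dark` (largest slice) to `light`
    (smallest slice). Both endpoints are arbitrary example RGB values.
    """
    ordered = sorted(items, key=lambda kv: kv[1], reverse=True)
    n = len(ordered)
    shaded = []
    for i, (label, value) in enumerate(ordered):
        t = i / (n - 1) if n > 1 else 0.0  # 0.0 for the largest, 1.0 for the smallest
        rgb = tuple(round(a + (b - a) * t) for a, b in zip(dark, light))
        shaded.append((label, value, "#{:02x}{:02x}{:02x}".format(*rgb)))
    return shaded


# Hypothetical categories and amounts, for illustration only.
for label, value, color in order_and_shade([("A", 10), ("B", 40), ("C", 25)]):
    print(label, value, color)
```

A single-hue gradient keeps the visual emphasis on the ranking rather than on a rainbow of unrelated colors.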
That said, one possible replacement for those budget pie charts is a stacked column chart. Here I’ve hidden the x and y axes to allow for better labeling (I think). The labels are still scrunched for the expenditures. I tried adding leader lines for them, but the result looked too cluttered. But overall you can see where this is going.
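With the axes hidden, each segment needs a direct label, and the underlying arithmetic is just where each segment starts and ends in the stack. This Python sketch assumes one column per budget side (revenues or expenditures) with segments given in display order. It computes each segment's bottom, top, and label midpoint, and flags segments whose share of the column is probably too thin to hold an inside label, which is one way to spot the scrunched labels before fiddling in Excel. The 5 percent cutoff and the function name `stack_segments` are hypothetical.

```python
def stack_segments(items, min_label_share=0.05):
    """Compute positions for one stacked column from (label, value) pairs.

    Segments stack bottom-up in the order given. `min_label_share` is an
    arbitrary cutoff below which a segment is flagged as too thin to hold
    an inside label.
    """
    total = sum(value for _, value in items)
    segments = []
    bottom = 0.0
    for label, value in items:
        share = value / total
        segments.append({
            "label": label,
            "bottom": bottom,
            "top": bottom + value,
            "label_y": bottom + value / 2,  # vertical midpoint of the segment
            "share": share,
            "fits_label": share >= min_label_share,
        })
        bottom += value
    return segments


# Hypothetical expenditure categories (percent of total), illustration only.
for seg in stack_segments([("A", 60), ("B", 30), ("C", 6), ("D", 4)]):
    print(seg["label"], seg["label_y"], seg["fits_label"])
```

Segments flagged `False` are the ones that would need an outside label or a leader line.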
Another possibility that works well with hierarchical data is a treemap. The newest version of Excel includes this as a chart option. In older versions you have to make it yourself by adjusting the size of rows and columns and then shading them in appropriately, as I have done below, entirely by hand. One issue here is that it can also be difficult to tell which of two segments is larger, especially when they have different shapes.
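Resizing rows and columns by hand amounts to the classic "slice-and-dice" treemap layout: split the full rectangle into strips proportional to each category's total, then split each strip the other way for its subcategories. The Python sketch below shows that layout for a small two-level tree, assuming all values are positive. Excel's built-in treemap may use a different layout; the tree format, the function names `node_value` and `slice_and_dice`, and the sample values are all hypothetical.

```python
def node_value(node):
    """Total value of a node: a leaf's own number, or the sum of its children."""
    _, content = node
    if isinstance(content, list):
        return sum(node_value(child) for child in content)
    return content


def slice_and_dice(node, x, y, w, h, depth=0, out=None):
    """Lay out a tree as nested rectangles, alternating split direction by depth.

    A node is (label, content), where content is a number for a leaf or a
    list of child nodes. Returns (label, depth, x, y, width, height) tuples.
    """
    if out is None:
        out = []
    label, content = node
    out.append((label, depth, x, y, w, h))
    if isinstance(content, list):
        total = node_value(node)
        offset = 0.0  # fraction of this rectangle already used by earlier children
        for child in content:
            frac = node_value(child) / total
            if depth % 2 == 0:
                # Even depth: split side by side (along the width).
                slice_and_dice(child, x + offset * w, y, frac * w, h, depth + 1, out)
            else:
                # Odd depth: split top to bottom (along the height).
                slice_and_dice(child, x, y + offset * h, w, frac * h, depth + 1, out)
            offset += frac
    return out


# Hypothetical two-level budget: category A has two subcategories, B has none.
tree = ("Total", [("A", [("A1", 30), ("A2", 10)]), ("B", 60)])
for label, depth, x, y, w, h in slice_and_dice(tree, 0, 0, 100, 100):
    print(label, depth, round(x, 2), round(y, 2), round(w, 2), round(h, 2))
```

Each rectangle's area is proportional to its value, but strips of very different proportions are exactly where comparing sizes gets hard.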
Personally, I am rarely ever satisfied with a chart and am always looking for ways to improve it. The most important thing is to keep playing around with your data and trying out different visualizations. You can make many different charts from the same data, and each may highlight a particular aspect of it. It is important to know all of these aspects to understand the story the data is telling. However, it is also up to you, the researcher, to choose the one chart that tells the audience your main point, or perhaps your most interesting finding, and to make sure that chart conveys the message properly.