How to read and create data visualizations responsibly

Data visualization expert Alberto Cairo offers tips on what readers should look out for when consuming infographics and other forms of data journalism, and outlines what journalists and others must keep in mind when creating them.

Alberto Cairo is Knight Chair in Visual Journalism at the University of Miami.

In our world of information overload, it’s easy to be convinced by fancy graphs, funky explainers and catchy headlines. The ability to look at information with a critical mind is crucial. We asked data visualization expert Alberto Cairo for some tips on what consumers and creators of data journalism should keep in mind as we delve into our more data-driven world.

This interview has been edited for length and clarity.

 

WHAT RED FLAGS SHOULD PEOPLE BE AWARE OF WHEN LOOKING AT DATA VISUALIZATIONS?

The first one is the attribution of sources. So how open [is] the organization that is producing the visualization?

Link to the primary sources. That's something that readers need to get used to [so they can] double-check whether or not what the journalist or the graphic is saying is actually what the source is saying.

Whether this organization is willing to disclose the methodology. Places like ProPublica for example, whenever they publish a large data journalism project they usually disclose their methodology. They have a section on the sources, how the sources were talked to, the transformation of the data [and] the methodology.

Whether or not the graph is actually letting the reader do what it is supposed to let him or her do. Any visualization is a tool for extracting meaning from the data. Sometimes designers get carried away with their personal aesthetic preferences.

For example there are many people who love maps. But not everything can be a data map. Sometimes the purpose of the visualization is to let you rank regions or compare regions. Therefore in that case the map is secondary in comparison to some sort of graph or table that lets you rank things properly.
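
As an editorial illustration of the ranking point above, here is a minimal Python sketch of the kind of sorted table that lets a reader rank regions at a glance. The region names and figures are hypothetical, invented only for this example, and `rank_regions` is an illustrative helper rather than anything described in the interview.

```python
# Hypothetical regional figures, invented purely for illustration.
rates = {"North": 4.2, "South": 6.8, "East": 5.1, "West": 3.9}


def rank_regions(values):
    """Return (rank, region, value) tuples, highest value first."""
    ordered = sorted(values.items(), key=lambda item: item[1], reverse=True)
    return [
        (rank, region, value)
        for rank, (region, value) in enumerate(ordered, start=1)
    ]


ranking = rank_regions(rates)

# Print a plain ranked table: the order is explicit, with no map needed.
for rank, region, value in ranking:
    print(f"{rank}. {region:<6} {value:.1f}")
```

A sorted list like this makes the order explicit, whereas on a map the reader has to compare color shades across regions to work out the same ranking.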

Is this graphic clear enough? And then is this graphic deep enough? Any visualization is a simplification of reality, therefore a reader needs to get used to asking whether the graphic is a simplification or an over-simplification of the data.

The right amount of data is the amount of data necessary to tell the story truthfully and with enough depth. We journalists have the drive to always simplify our stories. I usually discourage people from using the verb “to simplify”. I prefer “to clarify”. I borrowed this idea from my friend Nigel Holmes who is a famous infographic designer.

 

WHERE ARE YOUR GO-TO SOURCES FOR DATA JOURNALISM INSPIRATION?

The usual suspects are the New York Times and the Washington Post. But I am a huge fan of ProPublica. I think ProPublica has done an amazing job and I love what they do. I really like the Tampa Bay Times in Florida. They have a small visualization team and data team and they produce some amazing work… And they have already won big awards. A couple of years ago they won a Pulitzer.

There are many organizations that are not journalistic; for example, Polygraph uses great graphics. There is also an organization in Ukraine called texty. Though they are very small, they do amazing work.

Fortunately, one of the things I’ve been witnessing in the past five or 10 years or so is an explosion not only in the quantity but also in the quality of visualization.

FiveThirtyEight, for example, is another place I pay attention [to]. Among the most popular things they have published are their forecasts for past presidential elections.

 

IN A 2014 CRITIQUE, YOU NAMED SOME ORGANIZATIONS THAT YOU FELT OVER-PROMISED AND UNDER-DELIVERED. DO YOU STILL THINK THAT WAY?

No, that is not the case any more. I wrote that article right after FiveThirtyEight and Vox.com were launched and I read a bunch of stories that were a little bit careless when using data.

I wrote that article as a critique to say basically that I fully endorse the purposes of those organizations, particularly FiveThirtyEight. Nate Silver wrote a manifesto saying that journalism needed to become more quantitative, more scientific, not so reliant on anecdote, relying more on evidence and data and hard data etcetera. And I wrote an article to say: “You wrote this, you need to live up to your own standards”.

I believe that things have improved quite a lot since then.

 

BECAUSE DATA IS SUCH A POWERFUL TOOL AND HOW YOU DISPLAY IT CAN HAVE A BIG IMPACT, WHAT RESPONSIBILITY DO JOURNALISTS AND OTHERS HAVE WHEN PRODUCING DATA VISUALIZATIONS?

  • We have the responsibility first of verifying information as much as possible before we even begin designing a visualization.

We need to be more serious in verifying our data. I believe that ProPublica is a good model on how to do this. They constantly partner up with experts, experts in statistics, in science etcetera before they publish anything. They spend a lot of time not just producing the visualization but actually verifying the data they are about to show and making sure that data is right and everything is fact checked.

  • Making sure that the graphics are understandable, that they are not too simplistic. But at the same time they are understandable by the intended audience.

One of the things that alarms me the most is observing how people read graphics and how people generally misinterpret them.

I put myself in the group that needs to test this with regular people more. Because sometimes I believe we do visualizations so they will be retweeted by people who also produce visualizations rather than produce graphics that are actually informative and useful for the general public.

We need to stop doing that and remember who we are serving, who the community really is. It is not the community of visualization or infographic design, we are serving the public.

 

WHAT ADVICE DO YOU HAVE FOR DATA TEAMS TODAY?

Always read the metadata. Always talk to sources who know much more about the data than we do. Don’t just download a dataset from the internet and visualize it unless you are really sure what the data is about or it is a really common source of data. Always go to people who can help you understand the data better.

Another area in which I’ve been increasingly interested is the visualization of all sorts of uncertainty. Both uncertainty that we can measure and we can represent.

It all comes down to learning statistics or, more broadly, numeracy. What I have observed in some organizations conducting this kind of journalism is that they tend to hire [people] who have technical backgrounds, which is great (coders, developers etc.), and journalists. They sometimes forget that it is also good to have some sort of quantitative expertise in the team.
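
As an editorial illustration of measurable uncertainty, the sketch below computes an approximate 95% margin of error for a survey percentage, the kind of range a chart could show as an error bar. It assumes a simple random sample and uses the standard normal approximation. The 52% share and the sample size of 1,000 are hypothetical numbers, and `margin_of_error` is an illustrative helper, not a method described in the interview.

```python
import math


def margin_of_error(share, sample_size, z=1.96):
    """Approximate margin of error for a proportion (z=1.96 gives ~95%).

    Assumes a simple random sample and the normal approximation.
    """
    return z * math.sqrt(share * (1 - share) / sample_size)


# Hypothetical poll result: 52% support among 1,000 respondents.
share, n = 0.52, 1000
moe = margin_of_error(share, n)
low, high = share - moe, share + moe

print(f"{share:.0%} +/- {moe:.1%} (roughly {low:.1%} to {high:.1%})")
```

With these made-up numbers the range extends below 50%, so a chart showing only the 52% figure would suggest a majority that the data cannot confirm.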
