Visualizing Data Is An Art - We Should Treat It Like One

The case for less rigor and more nuance

Shri Khalpada

As we teeter towards a post-truth world, I've been thinking a lot about how we communicate information and the role of data visualization. It's a surprisingly young field that uniquely sits at the intersection of art and science. Much of the lexicon is actively being written and, perhaps counterintuitively, I'm going to advocate for a bit less science and a bit more art.

Everything I write here is, of course, just my opinion.

Perceptual Precision

Before we dive into the art of data visualization, it's helpful to understand some of the science. To that end, let's play a quick game.

See if you can guess the values of the unlabeled pie chart:

[Interactive pie chart]

And now, try the same thing for the bar graph. The axes are labeled, which is common for bar graphs, but the data values are for you to guess:

[Interactive bar graph]

You'll likely find that the pie chart's values are much harder to guess than the bar chart's.

The reason is that humans are much better at comparing lengths than they are at comparing angles. This phenomenon was extensively researched in papers like 1984's Graphical Perception: Theory, Experimentation, and Application to the Development of Graphical Methods by William S. Cleveland and Robert McGill.

These psychological underpinnings of charts are sound and important to understand. The ability to quickly and accurately interpret a chart, which I'll call a chart's precision, is one of the few things that can be measured in data visualization.
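One simple way to put a number on precision in the guessing game above is to score each guess against the true value. This is a minimal sketch, not a standard from the literature: the function name, the true values, and the guesses are all invented for illustration, and mean absolute error is just one convenient scoring rule.

```python
def mean_absolute_error(guesses, actual):
    """Average gap, in percentage points, between guessed and true values."""
    if len(guesses) != len(actual):
        raise ValueError("guesses and actual must have the same length")
    return sum(abs(g - a) for g, a in zip(guesses, actual)) / len(actual)


# Hypothetical true values shared by both charts, plus one reader's guesses.
actual = [40, 25, 20, 15]
pie_guesses = [35, 30, 25, 10]
bar_guesses = [41, 24, 20, 15]

pie_error = mean_absolute_error(pie_guesses, actual)
bar_error = mean_absolute_error(bar_guesses, actual)
print(f"pie: {pie_error:.1f} points off, bar: {bar_error:.1f} points off")
```

A lower score means the chart was read more accurately. Speed, the other half of precision as defined here, would need a timer and is left out of this sketch.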

The Pie Chart Wars

With the experiment above, it stands to reason that we can come up with a hierarchy of chart types based on their precision. If we can objectively say that measuring position is easier than measuring angle, then we should be able to extend that logic to color, space, and other encodings as well. And with that, we can start to form rules about which chart types are better than others.

The red herring, in my view, is this idea that precision is the only thing that matters.

The debates around pie charts are a great example of this. For decades, the pie chart has been maligned as a chart type that should be avoided. While there have been nuanced conversations on its merits, I've also seen a lot of arguments from pioneers in the field that feel dogmatic.

Stephen Few calls the pie chart "by far the least effective" graph. Cole Nussbaumer Knaflic wrote "death to pie charts" in 2011. Edward Tufte wrote "pie charts should never be used" in his seminal 1983 book The Visual Display Of Quantitative Information.

Of all the graphs that play major roles in the lexicon of quantitative communication, however, the pie chart is by far the least effective.

Stephen Few, Data Visualization Pioneer

I don't have particularly strong feelings about pie charts myself. I think there are things they do better than bar charts, and there are loads of blog posts articulating this in greater detail than I will here.

Instead, I feel unsettled about how these arguments are often framed.

The Case For Nuance In Art

Zooming out a bit, I firmly believe that data visualization is an art form. Being quantitative doesn't preclude it from being one. It can be a science, an art, and a craft all at once.

And creating art is about more than just precision. It's about evoking emotion, telling stories, and sparking curiosity. It's about making people think and feel. It's about the navigation of tradeoffs to communicate something.

The fact that some disciplines of an art form demand more precision doesn't mean the whole field shares those demands. It would be like saying that best practices for photorealism should be applied to abstract expressionism, or that we should approach poetry the same way we approach technical writing.

This is what I think arguments like the pie chart debates miss. The arguments are often framed in absolute terms ("pie charts should never be used") and rarely discuss use cases outside of fields where precision is paramount, like business intelligence. These domains do not encompass all of data visualization.

"Data Visualization Is An Art" Is Not The Same As Data Art

One distinction I see come up a lot is between "regular" data visualization and "data art". The former is seen as more scientific, perhaps more serious, and the latter is seen as more creative and expressive.

These distinctions can be pragmatic, but I find it helpful to frame data visualization itself as an art form. And like all art forms, there are idiomatic styles and unique approaches. There are hyperrealists, abstract expressionists, impressionists, and surrealists. There are poets, technical writers, novelists, and journalists. And, importantly, there are people doing all kinds of interesting things in between.

Fields like business intelligence, journalism, and data art often feel siloed.

Instead of rigid domains, I find it more helpful to think of each visualization as carefully balancing tradeoffs between many dimensions, like precision, ethics, aesthetics, storytelling, and many others. No visualization will be perfect at everything, just as no piece of text can perfectly communicate everything. Good art, however, is intentional about these tradeoffs.

A completed project is only made up of our intention and our experiments around it. Remove intention and all that's left is the ornamental shell.
Rick Rubin, Music Producer

The only hard and fast rule I can think of is to not be unethical. Beyond that, I think there is a lot of room for creativity.

My Mental Model

The mental model I've settled on for now is to think of data visualization as a combination of primitives, dimensions, and goals.

Primitives are the building blocks of a visualization. They're the shapes, colors, typography, encodings, interactions, and everything else that tangibly makes up a chart.

Dimensions are the different aspects of a visualization that we need to trade off against each other, like precision, aesthetics, and ethics.

Goals are the things we hope our visualizations accomplish for our audience, like communicating a specific insight, making it easy to spot trends in a dense dataset, or evoking a certain emotion.

The general question of data visualization can then be framed as: how do I combine the right primitives in a way that will achieve my goal?

Case Study: Climate Data

This has been pretty abstract so far, so let's dive into some concrete visualizations. We'll focus on climate data for these examples, since it's an important topic that has been visualized in many different ways.

For the sake of simplicity, we'll evaluate each example across these dimensions, while acknowledging that there are many more:

  • Ethics: Is the visualization misleading in any way?
  • Precision: Is it easy to map the visuals to numbers?
  • Aesthetics: Is the visualization visually appealing or a delight to use?
  • Cognitive Load: How much effort does it take to understand the visualization?
  • Context: Does the visualization explicitly add context to help guide us?
  • Data Volume: Does the visualization show a lot of data at once?

Evaluating these dimensions is subjective, so I'll provide my opinion on each visualization.

Note
The data for the visualizations below are all from Copernicus, which aggregates data from various sources. These visualizations average the data across all sources and show the temperature anomaly from the pre-industrial period. A temperature anomaly is the difference between the observed temperature and the average temperature over a baseline period of time for a location, in this case from 1850 to 1900.
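The anomaly calculation described in the note can be sketched in a few lines. This is a minimal illustration, assuming yearly average temperatures keyed by year and an inclusive 1850 to 1900 baseline; the function name and the sample temperatures are invented and are not real Copernicus values.

```python
from statistics import mean

# Pre-industrial baseline period from the note above (assumed inclusive).
BASELINE_START, BASELINE_END = 1850, 1900


def temperature_anomalies(yearly_temps):
    """Return {year: temperature minus the baseline-period average}."""
    baseline = [
        temp for year, temp in yearly_temps.items()
        if BASELINE_START <= year <= BASELINE_END
    ]
    if not baseline:
        raise ValueError("no observations fall inside the baseline period")
    baseline_mean = mean(baseline)
    return {year: temp - baseline_mean for year, temp in yearly_temps.items()}


# Invented temperatures in degrees Celsius, for illustration only.
sample = {1850: 13.5, 1875: 13.75, 1900: 14.0, 2020: 15.0}
anomalies = temperature_anomalies(sample)
print(anomalies[2020])
```

A positive anomaly means the year was warmer than the baseline average, and a negative one means it was cooler.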

Case Study: A Personal Dataset

To show how these principles can also apply to a more playful dataset, I want to share a personal visualization I made.

When the world started to open back up after the pandemic, I was working remotely and found myself spending a lot of time at coffee shops. I used my Google Maps timeline data to pull each coffee shop session I had, and spent a lot of time trying to come up with a cool way to visualize it.

After many iterations, I landed on the visualization below. It uses a visual metaphor, a coffee cup in the middle, with a radial chart around it representing the year. Each line represents a visit, and the length of the line represents the duration of the visit. The shaded arcs represent weekends. I'm not sure if this type of chart has a name.
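The encoding described above boils down to two mappings: the date sets the angle around the circle, and the duration sets the length of the line. Here is a rough sketch of that idea rather than the chart's actual code. The function name, the radius, the pixel scale, and the choice to start at 12 o'clock and run clockwise are all assumptions.

```python
import math
from datetime import date


def visit_to_line(day, hours, year_start, year_end,
                  inner_radius=100.0, px_per_hour=20.0):
    """Map one visit to a line segment (x1, y1, x2, y2) around the origin.

    The angle encodes the date within the year; the length encodes duration.
    """
    total_days = (year_end - year_start).days
    fraction = (day - year_start).days / total_days
    # Start at 12 o'clock and move clockwise (screen y grows downward).
    angle = 2 * math.pi * fraction - math.pi / 2
    outer_radius = inner_radius + hours * px_per_hour
    cos_a, sin_a = math.cos(angle), math.sin(angle)
    return (inner_radius * cos_a, inner_radius * sin_a,
            outer_radius * cos_a, outer_radius * sin_a)


# Hypothetical visit: 2.5 hours on the first day of the year being charted.
start, end = date(2022, 4, 9), date(2023, 4, 9)
x1, y1, x2, y2 = visit_to_line(date(2022, 4, 9), 2.5, start, end)
```

Each segment could then be drawn as a line in SVG or on a canvas, with the coffee cup placed inside the inner radius.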

I added some filters so that it could provide me with some real insights in addition to being a fun way to look back at my year. I also added summary statistics, so that I could combine the artistic visualization with more precision. All of that together culminated in the visualization below.

My Year In Coffee Shops
Every Time I Went To A Coffee Shop Between My 29th and 30th Birthdays

  • Total Time: 498 hours
  • Total Visits: 193 days
  • Average Visit: 2h 34m
  • Longest Visit: 4h 56m

Filters:

  • Start Time: 9:00 AM to 6:00 PM
  • End Time: 9:00 AM to 6:00 PM
  • Duration: 0:00 to 5:00
  • Day: 4/9/22 to 4/9/23
  • Day Of Week: Sunday + 6

This is not a precise visualization. It doesn't give me all the answers right away. I can't look at it and immediately tell you how long I spent at each coffee shop on any given day. But I find it delightful, and every time I look at it, I see something a little different about how I spent my year.

Don't Sacrifice For Sacrifice's Sake

As some of the examples above show, there are ultimately good and bad reasons to sacrifice precision. It is completely reasonable to give up some precision to:

  • Reduce unnecessary cognitive load and emphasize a feeling, as in the Climate Stripes example.
  • Encode a much higher volume of data, as in the Climate Globe example.
  • Add delight and use a visual metaphor to show trends, as in the Coffee Shop example.

There are also bad reasons to sacrifice precision, like arbitrarily picking a chart type that makes the visual confusing without adding incremental value elsewhere. There are even malicious reasons, like intentionally withholding information to mislead or manipulate.

Parting Thoughts

There's certainly a place for rules, best practices, chart taxonomies, and all the other tools that help us communicate data effectively. I've also heard arguments that there are enough bad visualizations out there that we should be more dogmatic about the rules, and I think there's some merit to that. But I don't think that's what the field as a whole should aspire to. Certainly not the leading edge of it.

Visualizing data lets us use our incredible human powers of pattern recognition to try to see some ground truth that exists in the world. But it's only a lens, and each layer of abstraction we add to it, like going from a real-world event to a number, or a number to a chart, or a chart to a story, adds a layer of interpretation and potential for error or misuse. It also adds a layer of deeper understanding, empathy, delight, and meaning. That's a lot of responsibility and worth some thoughtfulness.

The truth is infinitely complex and a model is merely an approximation to the truth. If the approximation is poor or misleading, then the model is useless.
Thaddeus Tarpey, Statistician

I hope this exploration has been helpful, or at least interesting to read! I'd love to hear your thoughts on this topic, so feel free to reach out to me on Bluesky. If you want to support me further, buying me a coffee would be much appreciated. It helps keep the lights on and the servers running! ☕
