Visualizing Data Is An Art - We Should Treat It Like One

As we teeter towards a post-truth world, I've been thinking a lot about how we communicate information and the role of data visualization. It's a surprisingly young field that uniquely sits at the intersection of art and science. Much of the lexicon is actively being written and, perhaps counterintuitively, I'm going to advocate for a bit less science and a bit more art.

Everything I write here is, of course, just my opinion.

Perceptual Precision

Before we dive into the art of data visualization, it's helpful to understand some of the science. To that end, let's play a quick game.

See if you can guess the values of the unlabeled pie chart:

And now, try the same thing for the bar graph. The axes are labeled, which is common for bar graphs, but the data values are for you to guess:

You'll likely find that the pie chart is much harder to guess than the bar chart.

The reason is that humans are much better at comparing lengths than they are at comparing angles. This phenomenon was extensively researched in papers like 1984's Graphical Perception: Theory, Experimentation, and Application to the Development of Graphical Methods by William S. Cleveland and Robert McGill.

These psychological underpinnings of charts are sound and important to understand. The ability to quickly and accurately interpret a chart, which I'll call a chart's precision, is one of the few things that can be measured in data visualization.

The Pie Chart Wars

Given the experiment above, it stands to reason that we can come up with a hierarchy of chart types based on their precision. If we can objectively say that judging position is easier than judging angle, then we should be able to extend that logic to color, space, and other encodings as well. And with that, we can start to form rules about which chart types are better than others.

The red herring, in my view, is this idea that precision is the only thing that matters.

The debates around pie charts are a great example of this. For decades, the pie chart has been maligned as a chart type that should be avoided. While there have been nuanced conversations on its merits, I've also seen a lot of arguments from pioneers in the field that feel dogmatic.

Stephen Few calls the pie chart "by far the least effective" graph. Cole Nussbaumer Knaflic wrote "death to pie charts" in 2011. Edward Tufte wrote "pie charts should never be used" in his seminal 1983 book The Visual Display Of Quantitative Information.

Of all the graphs that play major roles in the lexicon of quantitative communication, however, the pie chart is by far the least effective.

Stephen Few, Data Visualization Pioneer

I don't have particularly strong feelings about pie charts myself. I think there are things they do better than bar charts, and there are loads of blog posts articulating this in greater detail than I will here.

Instead, I feel unsettled about how these arguments are often framed.

The Case For Nuance In Art

Zooming out a bit, I firmly believe that data visualization is an art form. Being quantitative doesn't preclude it from being one. It can be a science, an art, and a craft all at once.

And creating art is about more than just precision. It's about evoking emotion, telling stories, and sparking curiosity. It's about making people think and feel. It's about the navigation of tradeoffs to communicate something.

The fact that some disciplines of an art form demand more precision doesn't mean the whole field shares those demands. It would be like saying best practices for photorealism should be applied to abstract expressionism, or how we approach technical writing should be the same as how we approach poetry.

This is what I think arguments like the pie chart debates miss. The arguments are often framed in absolute terms ("pie charts should never be used") and rarely discuss use cases outside of fields where precision is paramount, like business intelligence. These domains do not encompass all of data visualization.

"Data Visualization Is An Art" Is Not The Same As Data Art

One distinction I see come up a lot is between "regular" data visualization and "data art". The former is seen as more scientific, perhaps more serious, and the latter is seen as more creative and expressive.

These distinctions can be pragmatic, but I find it helpful to frame data visualization itself as an art form. And like all art forms, there are idiomatic styles and unique approaches. There are hyperrealists, abstract expressionists, impressionists, and surrealists. There are poets, technical writers, novelists, and journalists. And, importantly, there are people doing all kinds of interesting things in between.

Fields like business intelligence, journalism, and data art often feel siloed.

Instead of rigid domains, I find it more helpful to think of each visualization as carefully balancing tradeoffs between many dimensions, like precision, ethics, aesthetics, storytelling, and many others. No visualization will be perfect at everything, just as no piece of text can perfectly communicate everything. Good art, however, is intentional about these tradeoffs.

A completed project is only made up of our intention and our experiments around it. Remove intention and all that's left is the ornamental shell.

Rick Rubin, Music Producer

The only hard and fast rule I can think of is to not be unethical. Beyond that, I think there is a lot of room for creativity.

My Mental Model

The mental model I've settled on for now is to think of data visualization as a combination of primitives, dimensions, and goals.

Primitives are the building blocks of a visualization. They're the shapes, colors, typography, encodings, interactions, and everything else that tangibly makes up a chart.

Dimensions are the different aspects of a visualization that we need to trade off against each other, like precision, aesthetics, and ethics.

Goals are the things we hope our audience can do with our visualizations, like communicating a specific insight, making it easy to spot trends in a dense dataset, or making the audience feel a certain emotion.

The general question of data visualization can then be framed as: how do I combine the right primitives in a way that will achieve my goal?
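To make the mental model a bit more tangible, here is a minimal Python sketch of how it could be written down as a data structure. Everything in it is my own illustrative assumption: the `Visualization` class, the `favors` helper, the Low/Medium/High scale, and the two example designs are hypothetical, not an established taxonomy.

```python
from dataclasses import dataclass, field

# Assumed ordinal scale for rating a dimension.
LEVELS = {"Low": 0, "Medium": 1, "High": 2}


@dataclass
class Visualization:
    name: str
    goal: str                # what we hope the audience can do
    primitives: list[str]    # shapes, colors, encodings, interactions...
    dimensions: dict[str, str] = field(default_factory=dict)

    def favors(self, a: str, b: str) -> bool:
        """True if this design rates dimension `a` strictly above `b`."""
        return LEVELS[self.dimensions[a]] > LEVELS[self.dimensions[b]]


# Two hypothetical designs that make opposite tradeoffs.
table = Visualization(
    name="data table",
    goal="look up an exact value",
    primitives=["text", "rows and columns"],
    dimensions={"precision": "High", "aesthetics": "Low"},
)
stripes = Visualization(
    name="color-only stripes",
    goal="convey a feeling of change at a glance",
    primitives=["rectangles", "diverging color"],
    dimensions={"precision": "Low", "aesthetics": "High"},
)

print(table.favors("precision", "aesthetics"))
print(stripes.favors("precision", "aesthetics"))
```

The point of the sketch is only that the goal sits alongside the primitives and dimensions, so a tradeoff can be read as a deliberate choice rather than a flaw.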

Case Study: Climate Data

This has been pretty abstract so far, so let's dive into some concrete visualizations. We'll focus on climate data for these examples, since it's an important topic that has been visualized in many different ways.

For the sake of simplicity, we'll evaluate each example across these dimensions, while acknowledging that there are many more:

Ethics: Is the visualization misleading in any way?
Precision: Is it easy to map the visuals to numbers?
Aesthetics: Is the visualization visually appealing or a delight to use?
Cognitive Load: How much effort does it take to understand the visualization?
Context: Does the visualization explicitly add context to help guide us?
Data Volume: Does the visualization show a lot of data at once?

Evaluating these dimensions is subjective, so I'll provide my opinion on each visualization.

Note

The data for the visualizations below are all from Copernicus, which aggregates data from various sources. These visualizations average the data across all sources and show the temperature anomaly from the pre-industrial period. A temperature anomaly is the difference between the observed temperature and the average temperature over a baseline period of time for a location, in this case from 1850 to 1900.
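As a rough illustration of that definition, here is a small Python sketch that computes anomalies against an 1850 to 1900 baseline. The function name and the absolute temperatures are hypothetical values I made up for the example; they are not Copernicus data, and the real pipeline also averages across sources and locations.

```python
from statistics import mean


def temperature_anomalies(temps_by_year, baseline=(1850, 1900)):
    """Return {year: temperature minus the baseline-period mean}."""
    start, end = baseline
    baseline_values = [t for y, t in temps_by_year.items() if start <= y <= end]
    if not baseline_values:
        raise ValueError("no observations fall inside the baseline period")
    baseline_mean = mean(baseline_values)
    # Round to two decimals, matching the precision used in the table.
    return {y: round(t - baseline_mean, 2) for y, t in temps_by_year.items()}


# Hypothetical absolute temperatures in °C, for illustration only.
observed = {1850: 13.6, 1875: 13.7, 1900: 13.8, 2024: 15.2}
anomalies = temperature_anomalies(observed)
print(anomalies)
```

Years inside the baseline period end up close to zero by construction, which is why the early rows of the table hover around 0 °C.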

1) A Data Table

The most precise data visualization option is usually a table. This option is great if you need easy access to an exact number, but makes it basically impossible to grok any trends at a glance.

Ethics: High
Precision: High
Aesthetics: Low
Cognitive Load: Low
Context: Low
Data Volume: Medium

Year	Temperature anomaly
1850	-0.04 °C
1851	0.09 °C
1852	0.12 °C
1853	0.08 °C
1854	0.09 °C
1855	0.08 °C
1856	0.01 °C
1857	-0.10 °C
1858	-0.01 °C
1859	0.10 °C
1860	-0.04 °C
1861	-0.09 °C
1862	-0.18 °C
1863	-0.01 °C
1864	-0.05 °C
1865	0.06 °C
1866	0.06 °C
1867	0.04 °C
1868	0.05 °C
1869	0.08 °C
1870	0.03 °C
1871	0.01 °C
1872	0.02 °C
1873	0.02 °C
1874	-0.02 °C
1875	-0.02 °C
1876	-0.05 °C
1877	0.30 °C
1878	0.36 °C
1879	0.07 °C
1880	0.05 °C
1881	0.14 °C
1882	0.08 °C
1883	0.04 °C
1884	-0.10 °C
1885	-0.09 °C
1886	-0.08 °C
1887	-0.14 °C
1888	0.05 °C
1889	0.14 °C
1890	-0.13 °C
1891	-0.01 °C
1892	-0.08 °C
1893	-0.09 °C
1894	-0.08 °C
1895	-0.01 °C
1896	0.11 °C
1897	0.12 °C
1898	-0.07 °C
1899	0.06 °C
1900	0.16 °C
1901	0.10 °C
1902	-0.04 °C
1903	-0.15 °C
1904	-0.22 °C
1905	-0.03 °C
1906	0.05 °C
1907	-0.13 °C
1908	-0.17 °C
1909	-0.20 °C
1910	-0.17 °C
1911	-0.18 °C
1912	-0.11 °C
1913	-0.09 °C
1914	0.10 °C
1915	0.14 °C
1916	-0.08 °C
1917	-0.20 °C
1918	-0.06 °C
1919	-0.00 °C
1920	0.02 °C
1921	0.09 °C
1922	0.00 °C
1923	0.02 °C
1924	0.02 °C
1925	0.06 °C
1926	0.19 °C
1927	0.09 °C
1928	0.10 °C
1929	-0.07 °C
1930	0.15 °C
1931	0.21 °C
1932	0.15 °C
1933	-0.00 °C
1934	0.15 °C
1935	0.09 °C
1936	0.14 °C
1937	0.28 °C
1938	0.29 °C
1939	0.28 °C
1940	0.36 °C
1941	0.37 °C
1942	0.30 °C
1943	0.32 °C
1944	0.46 °C
1945	0.35 °C
1946	0.21 °C
1947	0.25 °C
1948	0.19 °C
1949	0.19 °C
1950	0.11 °C
1951	0.25 °C
1952	0.30 °C
1953	0.37 °C
1954	0.16 °C
1955	0.13 °C
1956	0.07 °C
1957	0.31 °C
1958	0.34 °C
1959	0.30 °C
1960	0.25 °C
1961	0.32 °C
1962	0.29 °C
1963	0.31 °C
1964	0.07 °C
1965	0.16 °C
1966	0.22 °C
1967	0.25 °C
1968	0.19 °C
1969	0.33 °C
1970	0.28 °C
1971	0.17 °C
1972	0.28 °C
1973	0.41 °C
1974	0.17 °C
1975	0.22 °C
1976	0.13 °C
1977	0.42 °C
1978	0.33 °C
1979	0.44 °C
1980	0.54 °C
1981	0.59 °C
1982	0.40 °C
1983	0.58 °C
1984	0.40 °C
1985	0.38 °C
1986	0.45 °C
1987	0.59 °C
1988	0.64 °C
1989	0.52 °C
1990	0.72 °C
1991	0.68 °C
1992	0.47 °C
1993	0.51 °C
1994	0.57 °C
1995	0.72 °C
1996	0.61 °C
1997	0.74 °C
1998	0.90 °C
1999	0.66 °C
2000	0.66 °C
2001	0.81 °C
2002	0.89 °C
2003	0.88 °C
2004	0.80 °C
2005	0.96 °C
2006	0.91 °C
2007	0.91 °C
2008	0.79 °C
2009	0.92 °C
2010	1.00 °C
2011	0.88 °C
2012	0.91 °C
2013	0.95 °C
2014	1.00 °C
2015	1.15 °C
2016	1.29 °C
2017	1.19 °C
2018	1.12 °C
2019	1.24 °C
2020	1.27 °C
2021	1.11 °C
2022	1.15 °C
2023	1.45 °C
2024	1.58 °C

2) A Misleading Line Graph

Precise doesn't always equal ethical. This line graph is technically correct, but its y-axis runs from -10°C to 10°C, which makes the temperature changes look far less dramatic than they are. In climate data, even a half-degree change can have a huge impact. Visualizing data often requires more context than just the numbers themselves.

Ethics: Low
Precision: Medium
Aesthetics: Low
Cognitive Load: Low
Context: Low
Data Volume: Medium

3) A Line Graph With Better Scale

This example shows a more accurate line graph of climate data. Using a more reasonable y-axis range, we can see the urgency of the situation much more intuitively. The tighter range also gives us more precision. Small flourishes, like the dashed line showing the 0°C mark, can help provide context on which years were warmer or cooler than average.

Ethics: High
Precision: Medium
Aesthetics: Low
Cognitive Load: Low
Context: Low
Data Volume: Medium
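One way to put a number on the difference between the two line graphs above is to ask how much of the y-axis the data actually occupies. The sketch below is a back-of-the-envelope check, not how the charts were built: the helper name is hypothetical, and the tighter range of -0.5°C to 2°C is my assumption, since the exact range of the better-scaled chart isn't stated.

```python
def fraction_of_axis_used(values, axis_min, axis_max):
    """Share of the y-axis height spanned by the data (0.0 to 1.0)."""
    if axis_max <= axis_min:
        raise ValueError("axis_max must be greater than axis_min")
    return (max(values) - min(values)) / (axis_max - axis_min)


# Lowest and highest anomalies in the table above (1904 and 2024).
anomalies = [-0.22, 1.58]

wide = fraction_of_axis_used(anomalies, -10.0, 10.0)  # misleading example
tight = fraction_of_axis_used(anomalies, -0.5, 2.0)   # assumed tighter range
print(f"wide: {wide:.0%}, tight: {tight:.0%}")
```

With the wide axis, the entire 174-year record is squeezed into roughly a tenth of the chart's height, which is why the line looks almost flat.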

4) A Line Graph With Double Encoding

This example introduces double encoding, where we use both position and color to encode the data. This turns the line graph into an area graph. The colors are a classic diverging palette for this dataset that helps to emphasize the temperature changes over time. Double encoding can be visually striking and help with accessibility, but can also add cognitive load if not done carefully.

Ethics: High
Precision: Medium
Aesthetics: Medium
Cognitive Load: Low
Context: Low
Data Volume: Medium
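For readers curious how a diverging palette maps numbers to colors, here is a minimal Python sketch. It assumes simple linear interpolation from white towards a blue or red endpoint; the function name, the endpoint colors, and the ±1.5°C limit are illustrative choices of mine, not the palette used in the charts here.

```python
def diverging_color(value, limit=1.5):
    """Map a value to a blue-white-red hex color, with white at zero."""
    # Clamp to [-limit, limit], then normalise to t in [-1, 1].
    t = max(-limit, min(limit, value)) / limit
    blue, white, red = (33, 102, 172), (255, 255, 255), (178, 24, 43)
    end = red if t > 0 else blue
    # Linear interpolation from white towards the chosen endpoint.
    r, g, b = (round(w + (e - w) * abs(t)) for w, e in zip(white, end))
    return f"#{r:02x}{g:02x}{b:02x}"


for anomaly in (-1.5, 0.0, 1.5):
    print(anomaly, diverging_color(anomaly))
```

In a double-encoded chart, the same anomaly value drives both the y position and this color, so the two channels reinforce each other rather than carrying separate information.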

5) A Diverging Bar Graph

This example explores double encoding as well, but in the shapes themselves rather than the area under a line. It essentially replaces the line with bars and rotates the graph 90 degrees. This can be a visually striking way to show data that has inherent positive and negative values, but is less familiar to most audiences than a line graph.

Ethics: High
Precision: Medium
Aesthetics: Medium
Cognitive Load: Low
Context: Low
Data Volume: Medium

6) A Line Graph With More Context

This example contextualizes the line graph with annotations about the Paris Agreement targets. This can help the audience understand the data in a broader context. We could also add annotations for other significant events, like the Industrial Revolution, world wars, the invention of the internet, etc. Note that context doesn't guarantee shared understanding. Different people will interpret the same context in different ways.

Ethics: High
Precision: Medium
Aesthetics: Medium
Cognitive Load: Medium
Context: High
Data Volume: Medium

7) A Line Graph With Multiple Lines

This example shows each decade as a separate line, with a gradient color representing the temperature change over the decade. This can help guide the audience towards understanding the increasing urgency of the situation in recent years. It is less familiar to most audiences than a single line graph, and adds some complexity to the build, but can be effective if done well.

Ethics: High
Precision: Medium
Aesthetics: Medium
Cognitive Load: Medium
Context: Low
Data Volume: Medium
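The "complexity in the build" for a chart like this is mostly a reshaping step: the yearly series has to be split into one series per decade. Below is a small Python sketch of that step. It assumes each line is plotted against the year's position within its decade (0 to 9), which is my guess at the layout, and the function name is hypothetical.

```python
from collections import defaultdict


def group_by_decade(anomaly_by_year):
    """Return {decade_start: [(year_within_decade, anomaly), ...]}."""
    decades = defaultdict(list)
    for year in sorted(anomaly_by_year):
        decade = year - year % 10
        decades[decade].append((year % 10, anomaly_by_year[year]))
    return dict(decades)


# A few values taken from the table above.
sample = {1998: 0.90, 1999: 0.66, 2000: 0.66, 2001: 0.81, 2010: 1.00}
lines = group_by_decade(sample)
for decade, points in lines.items():
    print(decade, points)
```

Each key then becomes one line on the chart, and a per-decade color can be derived from, say, the mean anomaly of its points.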

8) A Line Graph With Interaction

This example shows a line graph of climate data with a tooltip that appears when you hover over the data points. This can be helpful if you want to offer more data precision to the audience. However, it does add a bit of complexity in the build and cognitive load for the user, so it's best used when the data is complex or when the user needs to see the data in detail. Other ways to offer precision include coupling a chart with a table, or simply offering a data download option.

Ethics: High
Precision: High
Aesthetics: Medium
Cognitive Load: Medium
Context: High
Data Volume: Medium

9) Climate Stripes

Do we need to be precise to get the audience to feel something? The "Climate Stripes" visualization, developed by Ed Hawkins, shows the global temperature anomaly over time using only color. Each stripe represents a year, with blue stripes indicating cooler-than-average years and red stripes indicating warmer-than-average years. This visualization is simple and intuitive, and has become the most famous climate visualization in the world.

Ethics: High
Precision: Low
Aesthetics: High
Cognitive Load: Low
Context: Low
Data Volume: Medium

Note that this visualization was pulled from showyourstripes.info. It may use slightly different data than the other examples.

10) Climate Globe

This example uses an animated globe to show the global temperature anomaly over time. The globe is a powerful way to show geographic data, avoiding pitfalls of map projections while being a visually striking way to show data. With the scrubber below, this visualization communicates thousands of data points in a single view, but requires user interaction to see change over time.

Ethics: High
Precision: Low
Aesthetics: High
Cognitive Load: Medium
Context: Low
Data Volume: High

Note that this visualization is a subset of another PerThirtySix project. It uses slightly different data than the other examples.

Climate Case Study Summary

There are infinitely many ways to visualize climate data or any other dataset. The examples in this project were chosen to highlight a few different approaches to visualizing climate data, each balancing different priorities.

One good approach can be to combine a few different types of visualizations that help balance each other out. For example, in Our Reddening Globe, we combine the climate globe with a diverging bar chart to show both granular geographic data and a global overview.

There are, of course, many incredible data visualizations around climate change that we didn't cover here. A few that have left an impact on me are NASA's Climate Spiral, various works by the New York Times team, and a U.N. infographic showing projections of global warming.

The visualizations for this project, unless otherwise stated, were created by me using Vue, p5.js, Three.js, and Tailwind. For inspiration on using color for this type of data, The Science and Art of Colour in Climate Mapping by climatedata.ca is a great resource.

Case Study: A Personal Dataset

To show how these principles can also apply to a more playful dataset, I want to share a personal visualization I made.

When the world started to open back up after the pandemic, I was working remotely and found myself spending a lot of time at coffee shops. I used my Google Maps timeline data to pull each coffee shop session I had, and spent a lot of time trying to come up with a cool way to visualize it.

After many iterations, I landed on the visualization below. It uses a visual metaphor, a coffee cup in the middle, with a radial chart around it representing the year. Each line represents a visit, and the length of the line represents the duration of the visit. The shaded arcs represent weekends. I'm not sure if this type of chart has a name.
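The geometry behind a chart like this is simple polar math: the date sets the angle and the duration sets the length. Here is a rough Python sketch of that mapping. It is not the code behind my chart; the function name, the radius, the pixels-per-hour scale, and the choice to start at 12 o'clock and run clockwise are all assumptions for illustration.

```python
import math
from datetime import date


def visit_line(day, hours, start=date(2022, 4, 9), days_in_year=365,
               inner_radius=100.0, pixels_per_hour=20.0):
    """Return ((x1, y1), (x2, y2)) for one visit drawn as a radial line."""
    # Angle: fraction of the year elapsed, starting at 12 o'clock.
    # Assumes screen coordinates (y grows downward), so increasing
    # angles sweep clockwise.
    angle = 2 * math.pi * (day - start).days / days_in_year - math.pi / 2
    outer_radius = inner_radius + hours * pixels_per_hour
    dx, dy = math.cos(angle), math.sin(angle)
    return ((inner_radius * dx, inner_radius * dy),
            (outer_radius * dx, outer_radius * dy))


# A two-hour visit on the first day of the year-long window.
inner, outer = visit_line(date(2022, 4, 9), 2.0)
print(inner, outer)
```

The inner radius leaves room for the coffee cup in the middle, and the weekend arcs can be drawn with the same angle formula applied to each Saturday and Sunday.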

I added some filters so that it could provide me with some real insights in addition to being a fun way to look back at my year. I also added summary statistics, so that I could combine the artistic visualization with more precision. All of that together culminated in the visualization below.

My Year In Coffee Shops

Every Time I Went To A Coffee Shop Between My 29th and 30th Birthdays

Total Time: 498 hours
Total Visits: 193 days
Average Visit: 2h 34m
Longest Visit: 4h 56m

Start Time: 9:00 AM to 6:00 PM
End Time: 9:00 AM to 6:00 PM
Duration: 0:00 to 5:00
Day: 4/9/22 to 4/9/23
Day Of Week: Sunday + 6

This is not a precise visualization. It doesn't give me all the answers right away. I can't look at it and immediately tell you how long I spent at each coffee shop on any given day. But I find it delightful, and every time I look at it, I see something a little different about how I spent my year.

Don't Sacrifice For Sacrifice's Sake

As some of the examples above show, there are ultimately good and bad reasons to sacrifice precision. It is completely reasonable to give up some precision to:

Reduce unnecessary cognitive load and emphasize a feeling, as in the Climate Stripes example.
Encode a much higher volume of data, as in the Climate Globe example.
Add delight and use a visual metaphor to show trends, as in the Coffee Shop example.

There are also bad reasons to sacrifice precision, like arbitrarily picking a chart type that makes the visual confusing without adding incremental value elsewhere. Or even malicious reasons, like intentionally withholding information to mislead or manipulate.

Parting Thoughts

There's certainly a place for rules, best practices, chart taxonomies, and all the other tools that help us communicate data effectively. I've also heard arguments that there are enough bad visualizations out there that we should be more dogmatic about the rules, and I think there's some merit to that. But I don't think that's what the field as a whole should aspire to. Certainly not the leading edge of it.

Visualizing data lets us use our incredible human powers of pattern recognition to try to see some ground truth that exists in the world. But it's only a lens, and each layer of abstraction we add to it, like going from a real-world event to a number, or a number to a chart, or a chart to a story, adds a layer of interpretation and potential for error or misuse. It also adds a layer of deeper understanding, empathy, delight, and meaning. That's a lot of responsibility and worth some thoughtfulness.

The truth is infinitely complex and a model is merely an approximation to the truth. If the approximation is poor or misleading, then the model is useless.

Thaddeus Tarpey, Statistician

I hope this exploration has been helpful, or at least interesting for you to read! I'd love to hear your thoughts on this topic, so feel free to reach out to me on Bluesky. If you want to support me further, buying me a coffee would be much appreciated. It helps keep the lights on and the servers running! ☕