1 Preface

When done right, graphs can be appealing, informative, and of considerable value to an academic article. Unfortunately, researchers generally suck at making good graphs. I surmise that this is because researchers do not completely master their graphing software, and they are either too lazy or too busy to change this state of affairs. Consequently, the graphs that researchers produce are often no more than a distortion of the ideal Platonic graph that the researcher had in mind.

This compendium facilitates the creation of good graphs by presenting a set of concrete examples, ranging from the trivial to the advanced. The graphs can all be reproduced and adjusted by copy-pasting code into the R console.

Almost every example in this compendium is driven by the same philosophy: A good graph is a simple graph, in the Einsteinian sense that a graph should be made as simple as possible, but not simpler.

I will close with a request and a piece of advice. The request: if you create a clean graph in R that you believe is a candidate for inclusion in this compendium, please do not hesitate to contact me at EJ.Wagenmakers@gmail.com. Your contribution will be acknowledged explicitly, alongside the code you provided. The advice: when you create a clean graph in R, put it on Flickr (public license) before you sign away your copyright to a publisher. For an example, see Figure 1 from this paper.

This work has profited greatly from interactions with my colleagues, many of whom have contributed graphs of their own. I am also indebted to Quentin Gronau, who has added new graphs and beautified existing ones.

2 Introduction

Producing clean graphs can be a challenging task. First, you have to consider the best way to convey the information: a line graph, a histogram, a multi-panel plot. Such conceptual dilemmas are not dealt with in this compendium; instead, we refer the reader to the chapters on creating graphs in the excellent book by Briscoe (1996). Second, you have to use computer software to translate the conceptual graph into a publication-ready figure. This is the phase where this compendium may be useful, because it brings together R code for producing a set of clean, publication-ready figures. Hopefully this will make it easy to copy-paste and adjust the code to suit your own needs.

In my experience, many graphs can be dramatically improved by adhering to the following guidelines: (1) invest sufficient time and effort in the process; (2) omit needless graphical elements, that is, make every element count; (3) judge the relative impact of the graphical elements and ensure that they are nicely balanced; (4) use large font sizes for all text; (5) deviate from the R default settings, because with a little effort you can do a lot better.

This compendium does not discuss figure headings. However, I will say that the main message of a figure should ideally be clear without the reader having to consult the main text. If possible, start your figure heading by stating what the figure is meant to demonstrate (i.e., its interpretation). For example, do not state “Popularity as a function of president height”; instead, state “Taller presidents are more popular”.

Finally, a note on color. Many graphs look better in color, but there are two complications. First, some academic journals do not publish manuscripts in color, at least not without charging a hefty price. Second, many readers and reviewers do not have a color printer. Below, some graphs use color, whereas others use only greyscale. Of course, this is one of the easiest things to adjust.

Based on this compendium, learning to create good graphs in R will be 80% copy-paste and 20% tinkering. Let’s go plot ourselves some graphs!

3 Correlations

Whenever a researcher reports a correlation, it is imperative to plot the data. [Anscombe’s quartet](http://en.wikipedia.org/wiki/Anscombe's_quartet) (plotted in Section 10.3) is a famous demonstration of this fact.

3.1 The Electoral Advantage of Being Tall

This plot shows the relation between the height ratio of US presidents and the percentage of the popular vote. Note the large circles for the data, the thick line for the linear relation, and the large font size for the axis labels. Also, note that the line does not touch the y-axis (a subtlety that requires deviating from the default).
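
The following minimal base-R sketch shows these ingredients. It uses simulated numbers in place of the presidential data, which are not reproduced here, and all variable names are hypothetical. To keep the line away from the y-axis, it suppresses the default axes and draws the fitted line with segments() over the observed x-range only, instead of with abline().

```r
# Simulated stand-in data (hypothetical; NOT the presidential data set)
set.seed(1)
height_ratio <- runif(40, min = 0.9, max = 1.2)
vote_pct <- 50 + 30 * (height_ratio - 1) + rnorm(40, sd = 4)
fit <- lm(vote_pct ~ height_ratio)

op <- par(cex.lab = 1.5, cex.axis = 1.3, las = 1, mar = c(5, 5, 2, 2) + 0.1)
# Large filled circles; default box and axes suppressed
plot(height_ratio, vote_pct, pch = 21, bg = "grey", cex = 2.5, axes = FALSE,
     xlab = "Height ratio (simulated)", ylab = "Popular vote (%)")
axis(1)
axis(2)
# Thick fitted line, drawn only between the smallest and largest x-value
x_ends <- range(height_ratio)
y_ends <- predict(fit, newdata = data.frame(height_ratio = x_ends))
segments(x_ends[1], y_ends[1], x_ends[2], y_ends[2], lwd = 4)
par(op)
```

By default, plot() pads each axis range by 4%. As a result, a segment that starts at the smallest observation ends short of the y-axis.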

4 Histograms

Histograms are relatively straightforward to create and to interpret. In fact, some people may even find them boring. Luckily, it is easy to increase the reader’s interest level by adding information to the plot. Below we illustrate various ways by which this may be accomplished.

4.1 Including “rug” Tick Marks

When in doubt, add tick marks that showcase the individual data points. This is particularly useful when the number of data points is small. The code below is courtesy of Helen Steingroever. Note that the rug tick marks are jittered.
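
Here is a minimal sketch that uses simulated Poisson counts. This is an assumption chosen because the many tied values make the jitter visible. jitter() with a small `amount` separates the tied values so that every observation gets its own tick. The original code by Helen Steingroever may differ in its details.

```r
set.seed(2)
x <- rpois(30, lambda = 4)   # small, discrete simulated sample with many ties

op <- par(cex.lab = 1.5, cex.axis = 1.3, las = 1)
# One bar per integer value: break points halfway between integers
h <- hist(x, breaks = seq(-0.5, max(x) + 0.5, by = 1), col = "grey",
          main = "", xlab = "Count", ylab = "Frequency")
# Jittered rug: tied values are spread out horizontally by up to 0.15
rug(jitter(x, amount = 0.15), ticksize = 0.03, lwd = 2)
par(op)
```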

4.2 Including a Density Estimator

In R, it is easy to include a nonparametric density estimator. This requires setting freq = FALSE in the call to hist(), so that the bars are drawn on the density scale rather than as counts. Courtesy of Helen Steingroever.
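
Here is a minimal sketch with simulated normal data. density() returns a curve with total area 1, so it only lines up with the bars when the histogram is also on the density scale. Computing both objects first lets ylim accommodate whichever of the two is taller.

```r
set.seed(3)
x <- rnorm(100)            # simulated data
d <- density(x)           # Gaussian kernel, default bandwidth
h <- hist(x, plot = FALSE)

op <- par(cex.lab = 1.5, cex.axis = 1.3, las = 1)
# freq = FALSE: bar heights are densities, so the total bar area is 1
plot(h, freq = FALSE, col = "grey", border = "white", main = "",
     xlab = "Value", ylab = "Density",
     xlim = range(d$x), ylim = c(0, max(h$density, d$y)))
lines(d, lwd = 3)
par(op)
```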

4.3 Including Numbers on Top

This example shows how to display the bar heights, using the l_ply function from the plyr package. Courtesy of Helen Steingroever and Quentin Gronau.
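
The compendium's version loops over the bars with l_ply. Because text() is vectorised, this base-R sketch (using simulated data, which is our assumption) labels all bars in a single call.

```r
set.seed(4)
x <- rnorm(50)                         # simulated data
h <- hist(x, plot = FALSE)

op <- par(cex.lab = 1.5, cex.axis = 1.3, las = 1)
plot(h, col = "grey", main = "", xlab = "Value",
     ylim = c(0, 1.15 * max(h$counts)))  # headroom above the tallest bar
# One label per bar, placed just above the top of the bar (pos = 3)
text(x = h$mids, y = h$counts, labels = h$counts, pos = 3, cex = 1.2)
par(op)
```

hist() and plot() on a histogram object also accept labels = TRUE, which draws the counts above the bars directly. The manual version gives more control over the position and font size.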

5 Line Plots

The line plot is one of the most standard plots. Nevertheless, many researchers fail to realize that line plots deserve love and attention too.

5.1 Regular Line Plot

This graph plots error bars with a user-defined function. More to the point, the lines are thick, and they do not overlap with the symbols (type = "c"). Note that the legend is not needed; the legend text could simply have been positioned near the associated graphical elements.
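
Below is a minimal sketch of a user-defined error-bar function combined with type = "c". It uses made-up means and standard errors for four hypothetical conditions. The helper name add_error_bars is our own; it draws each bar with arrows(), using flat 90-degree heads.

```r
# Hypothetical condition means and standard errors (illustration only)
x <- 1:4
means <- c(0.52, 0.61, 0.70, 0.74)
se <- c(0.04, 0.03, 0.05, 0.04)

# Hypothetical helper: vertical bars with flat caps at both ends (code = 3)
add_error_bars <- function(x, lower, upper, width = 0.05, ...) {
  arrows(x0 = x, y0 = lower, x1 = x, y1 = upper,
         angle = 90, code = 3, length = width, ...)
}

op <- par(cex.lab = 1.5, cex.axis = 1.3, las = 1, bty = "n")
plot(x, means, type = "n", xlim = c(0.5, 4.5), ylim = c(0.4, 0.85),
     xaxt = "n", xlab = "Condition", ylab = "Proportion correct")
axis(1, at = x, labels = paste0("C", x))
add_error_bars(x, means - se, means + se, lwd = 2)
# type = "c" draws the line pieces of type = "b" but leaves gaps at the
# data points, so the line does not run into the symbols
lines(x, means, type = "c", lwd = 3)
points(x, means, pch = 21, bg = "white", cex = 2.5, lwd = 2)
par(op)
```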

5.2 Box Plot

Similar to the above, this plot shows the distribution of the data with a user-defined boxplot function.

5.3 Violin Plot

By now this plot should look familiar. The distribution of the data is now indicated with a violin plot instead of a box plot. Courtesy of Henrik Singmann, who tweaked the output of the vioplot package. Warning: this is a lot of code.

5.4 Combined Line and Bar Plot

In many psychological experiments, there are two dependent variables for each participant: mean response time (RT) and mean proportion of errors. This plot shows them both, with RTs on the left y-axis and errors on the right y-axis.
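
One common way to get two y-axes in base R is to draw the bars, call par(new = TRUE), and overlay a second plot with its own y-range. The sketch below assumes made-up summary values for three hypothetical conditions. The key detail is reusing the bar plot's x-range (xlim plus xaxs = "i"), so that the RT points sit on the bar centres.

```r
# Hypothetical condition summaries, made up for illustration only
errors <- c(0.04, 0.08, 0.15)   # mean proportion of errors (bars, right axis)
rt <- c(480, 530, 610)          # mean RT in ms (line, left axis)

op <- par(mar = c(5, 5, 2, 5) + 0.1, cex.lab = 1.4, cex.axis = 1.2, las = 1)
# Layer 1: bars for the error proportions, with their own axis on the right
mids <- as.vector(barplot(errors, ylim = c(0, 0.3), col = "grey85", border = NA,
                          names.arg = c("Easy", "Medium", "Hard"), axes = FALSE))
axis(4)
mtext("Proportion of errors", side = 4, line = 3.5, cex = 1.4, las = 0)
x_usr <- par("usr")[1:2]        # x-coordinate system used by the bars
# Layer 2: par(new = TRUE) makes the next plot draw on top of the current one
par(new = TRUE)
plot(mids, rt, type = "b", pch = 19, lwd = 3, cex = 1.5,
     xlim = x_usr, xaxs = "i", ylim = c(400, 700), axes = FALSE,
     xlab = "", ylab = "Mean RT (ms)")
axis(2)
par(op)
```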

6 Bar Plots

Like their histogram cousin, bar plots are intrinsically boring.

6.1 Including Error Bars

The title says it all. Note that the error bars are added with the l_ply function. Courtesy of Helen Steingroever and Quentin Gronau.

7 Densities

Densities are ubiquitous, particularly for those who have a Bayesian inclination. As for the histogram and the bar plot, it is generally a good idea to add more information to the bare-bones plot.

7.1 Standard

This is a relatively standard plot. Note the thickness of the lines and the font size for the axis labels.

7.2 With a Histogram on Top

This plot adds a histogram to the density plot, but without needlessly displaying the vertical histogram lines as well. In addition, the code defines the extent to which the lines are transparent, so that both the density and the histogram remain visible, and one does not completely block the other from view.
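
The following minimal sketch uses simulated gamma-distributed data. It draws bars without outlines (border = NA) and makes colours semi-transparent with adjustcolor(). The colours and alpha values are arbitrary choices. Transparency also requires a device that supports it, such as pdf() or png(); postscript() does not.

```r
set.seed(5)
x <- rgamma(200, shape = 3, rate = 1)   # simulated, right-skewed data
d <- density(x, from = 0)               # density estimate starting at 0
h <- hist(x, breaks = 20, plot = FALSE)

op <- par(cex.lab = 1.5, cex.axis = 1.3, las = 1)
# border = NA: no bar outlines, hence no vertical histogram lines;
# alpha.f = 0.4 gives the bar fill an opacity of 40%
plot(h, freq = FALSE, border = NA, col = adjustcolor("grey40", alpha.f = 0.4),
     main = "", xlab = "Value", ylab = "Density",
     xlim = range(d$x, h$breaks), ylim = c(0, max(d$y, h$density)))
# Slightly transparent density line drawn on top of the bars
lines(d, lwd = 3, col = adjustcolor("black", alpha.f = 0.8))
par(op)
```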

7.3 Including Text

This example adds text to a density plot. Although adding text is generally trivial, this particular example contains a mathematical symbol that is tricky to display properly (unless, of course, you know how it works).
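
The original symbol is not reproduced here. As an assumption, the sketch below shows the two usual routes to mathematical text in base R: expression() with plotmath syntax for fixed annotations, and bquote() when a computed number must be spliced into the formula.

```r
op <- par(cex.lab = 1.5, cex.axis = 1.3, las = 1, bty = "n",
          mar = c(5, 5, 2, 2) + 0.1)
curve(dnorm(x), from = -3, to = 3, lwd = 3,
      xlab = expression("Effect size " * delta), ylab = "Density")
# plotmath: Greek letters by name, %~% renders as "distributed as"
text(x = 2, y = 0.35, labels = expression(delta %~% N(0, 1)), cex = 1.5)
# bquote() substitutes the value of peak into the expression via .()
peak <- dnorm(0)
text(x = -2, y = 0.38,
     labels = bquote(p(delta == 0) == .(round(peak, 3))), cex = 1.5)
par(op)
```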

7.4 Another Example

This is another example, featuring a nice Greek letter. Seriously, what is important here is that the labels are positioned next to the associated graphical elements. This approach is more direct than a legend, which forces the reader to decode the legend first, keep the symbols in working memory, and only then turn attention to the graph itself. Bottom line: only use legends when you have to. Even then, you may find that the legend box almost never fulfills a useful function and can safely be omitted.
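
Here is a minimal sketch of direct labelling, using a hypothetical normal prior and posterior for a parameter mu. Each text() call places its label just above (pos = 3) the curve it names, so no legend is needed.

```r
# Hypothetical prior and posterior for mu (made up for illustration)
xs <- seq(-4, 6, length.out = 500)
prior <- dnorm(xs, mean = 0, sd = 2)
posterior <- dnorm(xs, mean = 1.5, sd = 0.8)

op <- par(cex.lab = 1.5, cex.axis = 1.3, las = 1, bty = "n")
plot(xs, posterior, type = "l", lwd = 3, ylim = c(0, 0.6),
     xlab = expression(mu), ylab = "Density")
lines(xs, prior, lwd = 3, lty = 2)
# Direct labels next to each curve instead of a legend
text(1.5, dnorm(1.5, mean = 1.5, sd = 0.8), labels = "Posterior", pos = 3, cex = 1.4)
text(-3, dnorm(-3, mean = 0, sd = 2), labels = "Prior", pos = 3, cex = 1.4)
par(op)
```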

7.5 Highlighting Specific Areas

It is cool to be able to highlight specific parts of a density by some color coding scheme. In this example, Ravi Selker shows how that can be done (hint: it’s the polygon function).
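
The sketch below shows shading with polygon(). It assumes a standard normal density, with the upper tail beyond 1.96 as the highlighted area. The polygon's vertices run along the curve and then back along the x-axis. The curve is drawn afterwards, so the shading does not cover it.

```r
xs <- seq(-4, 4, length.out = 400)
z_crit <- 1.96
tail_x <- seq(z_crit, 4, length.out = 100)   # x-values of the shaded region

op <- par(cex.lab = 1.5, cex.axis = 1.3, las = 1, bty = "n")
plot(xs, dnorm(xs), type = "n", xlab = "z", ylab = "Density")
# Vertices: (z_crit, 0), then along the curve, then down to (4, 0);
# polygon() closes the shape back to the first vertex automatically
polygon(x = c(z_crit, tail_x, 4), y = c(0, dnorm(tail_x), 0),
        col = "grey70", border = NA)
lines(xs, dnorm(xs), lwd = 3)   # curve on top of the shading
tail_area <- 1 - pnorm(z_crit)
text(3, 0.1, labels = paste("Area =", round(tail_area, 3)), cex = 1.3)
par(op)
```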

7.6 More Highlighting of Specific Areas

Mijke Rhemtulla also likes to highlight specific parts of a density. This is the first plot in a series, taken from one of Mijke’s stats courses.

7.7 Still More Highlighting

Part 2…

7.8 Density Ratios

Part 3…

7.9 Many Density Ratios

Part 4… The take-home message from the last set of plots: use polygon, annotate the plot, and use large font sizes and thick graphical elements.

7.10 Stacked Densities

Michael Lee drew my attention to a “stacked densities plot” [http://nxn.se/post/97650612370/high-contrast-stacked-distribution-plots]. Quentin Gronau did the work and shows how multiple densities can be displayed at the same time while remaining discriminable. Note the use of the trans3d function.
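
Here is a rough sketch of the idea, assuming five hypothetical normal densities. An "empty" persp() call sets up the 3D box and returns the viewing matrix. trans3d() then projects each density outline into 2D so it can be drawn with polygon(). Filling each polygon with white keeps overlapping curves discriminable. The viewing angles are arbitrary choices, and Quentin Gronau's original code may differ.

```r
# Hypothetical example: five normal densities with increasing means
xs <- seq(-4, 8, length.out = 300)
means <- 0:4
dens <- sapply(means, function(m) dnorm(xs, mean = m, sd = 1))  # 300 x 5 matrix

# persp() with an invisible surface (col and border NA, no box) only sets up
# the 3D coordinate system; it invisibly returns the 4 x 4 viewing matrix
pmat <- persp(x = range(xs), y = c(1, length(means)), z = matrix(0, 2, 2),
              zlim = c(0, max(dens)), theta = 0, phi = 30,
              col = NA, border = NA, box = FALSE)

# Each density becomes a white-filled polygon at its own y-position, so each
# curve hides the parts of previously drawn curves behind it. The loop runs
# from the last density to the first; reverse it if your viewing angle puts
# the other end of the y-axis in front.
for (j in rev(seq_along(means))) {
  outline <- trans3d(x = c(min(xs), xs, max(xs)), y = j,
                     z = c(0, dens[, j], 0), pmat = pmat)
  polygon(outline, col = "white", border = "black", lwd = 2)
}
```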

8 Functions

It can be very informative to plot a function. This is relatively straightforward as long as you stick to the basic principles (thick lines, annotate the plot, large font sizes).

8.1 Plotting a Function

What did we say? Thick lines, annotate the plot, large font sizes!
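
Here is a minimal sketch with a hypothetical logistic function, drawn with curve(). It uses thick lines, direct annotations instead of a legend, and enlarged axis text.

```r
# Hypothetical function: a logistic curve with a slope parameter
logistic <- function(x, slope) 1 / (1 + exp(-slope * x))

op <- par(cex.lab = 1.5, cex.axis = 1.3, las = 1, bty = "n")
curve(logistic(x, slope = 1), from = -6, to = 6, lwd = 3, ylim = c(0, 1),
      xlab = "Stimulus strength", ylab = "P(response)")
curve(logistic(x, slope = 3), add = TRUE, lwd = 3, lty = 2)
# Annotate each curve directly, next to the line it describes
text(2.5, logistic(2.5, slope = 1), labels = "slope = 1", pos = 4, cex = 1.3)
text(0.3, logistic(0.3, slope = 3) + 0.05, labels = "slope = 3", pos = 2, cex = 1.3)
par(op)
```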

9 Time Series

What’s not to love about time series? In contrast to some of the previous plots, time series are virtually always interesting, almost mesmerizing. The bar plot compares to a time series as, well, a refrigerator compares to Marilyn Monroe. The reason, of course, is that time series are highly informative: they usually contain many observations; moreover, they show how particular variables change over time (it is a time series, after all). Enough talking; let’s turn to some examples.

9.1 A Diffusion Process

Instead of giving a lecture about diffusion processes, I’ll point out that the lines are transparent. We’ve encountered this before, but it was Guy Hawkins who showed me how to do this in R.
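
The sketch below shows transparent lines. It assumes simple random walks as a stand-in for a diffusion process: Gaussian increments with a small drift and no decision boundaries. adjustcolor() with a small alpha.f makes each line faint, so overlapping paths accumulate into darker regions.

```r
set.seed(6)
n_paths <- 50
n_steps <- 200
# Hypothetical settings: small positive drift, no absorbing boundaries
increments <- matrix(rnorm(n_steps * n_paths, mean = 0.02, sd = 0.1),
                     nrow = n_steps, ncol = n_paths)
paths <- apply(increments, 2, cumsum)   # one column per path

op <- par(cex.lab = 1.5, cex.axis = 1.3, las = 1, bty = "n")
# alpha.f = 0.15: each line is mostly transparent, so regions where many
# paths overlap appear darker
matplot(seq_len(n_steps), paths, type = "l", lty = 1, lwd = 2,
        col = adjustcolor("black", alpha.f = 0.15),
        xlab = "Time step", ylab = "Accumulated evidence")
par(op)
```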

9.2 A Sequence of Choices

Helen Steingroever returns to us once again, this time with a choice profile for the Iowa gambling task. The plot conveys a lot of information: for one participant, the plot indicates the sequence of 100 choices among four choice alternatives, and whether or not each choice resulted in a win or a loss.

9.3 The Electoral Advantage of Being Tall Revisited

This plot shows the development of the Bayes factor (y-axis) as the data accumulate (x-axis). This procedure may give frequentists a heart attack but, in Bayes world, that’s just how we roll. What I like about the graph are the annotations on the right side of the plot, and the subtle horizontal lines that indicate Jeffreys’ criteria on the evidence. It took some time to figure out how to display the word “Evidence” in its current direction. To make this plot I “borrowed” code from Ruud Wetzels and Benjamin Scheibehenne.
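
The height data are not reproduced here, so this sketch swaps in a simpler test on simulated coin flips: H0: theta = 0.5 against H1: theta ~ Uniform(0, 1). Under the uniform prior, the probability of k successes in n trials is 1/(n + 1). This gives the Bayes factor in closed form after every observation. The horizontal lines at 3, 10, 30, and 100 (and their reciprocals) follow the commonly used rounded version of Jeffreys' categories. Rotated margin text takes srt = 90 plus xpd = TRUE.

```r
set.seed(7)
flips <- rbinom(100, size = 1, prob = 0.5)   # simulated data, generated under H0
n <- seq_along(flips)
k <- cumsum(flips)
# log BF10 for H1: theta ~ Uniform(0, 1) against H0: theta = 0.5;
# under the uniform prior, P(k successes in n trials | H1) = 1 / (n + 1)
log_bf10 <- -log(n + 1) - dbinom(k, size = n, prob = 0.5, log = TRUE)

ticks <- c(1/100, 1/30, 1/10, 1/3, 1, 3, 10, 30, 100)
tick_labels <- c("1/100", "1/30", "1/10", "1/3", "1", "3", "10", "30", "100")

op <- par(mar = c(5, 6, 2, 6) + 0.1, mgp = c(4, 1, 0), cex.lab = 1.5,
          cex.axis = 1.2, las = 1, bty = "n")
plot(n, log_bf10, type = "n", yaxt = "n", ylim = range(log(ticks), log_bf10),
     xlab = "Number of observations", ylab = expression(BF[10] ~ "(log scale)"))
abline(h = log(ticks), col = "grey80")   # subtle category boundaries
abline(h = 0, col = "grey50", lwd = 2)   # BF10 = 1
axis(2, at = log(ticks), labels = tick_labels)
lines(n, log_bf10, lwd = 3)
# Rotated labels in the right margin; xpd = TRUE allows drawing outside the plot region
usr <- par("usr")
x_lab <- usr[2] + 0.06 * diff(usr[1:2])
text(x_lab, log(10), "Evidence for H1", srt = 90, xpd = TRUE, cex = 1.2)
text(x_lab, log(1/10), "Evidence for H0", srt = 90, xpd = TRUE, cex = 1.2)
par(op)
```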

9.4 A Sequential Test on \(\pi\)

And again the Bayesians flaunt their disdain for the silliness of sampling plans. The plot below shows the development of the Bayes factor (y-axis) with the number of digits from \(\pi\). As the digits accumulate, so does the evidence in favor of the null hypothesis (yes frequentists, you read that right: evidence in favor of the null hypothesis).

The plot shows the maximum evidence (in red), the actual evidence (for two different priors), and the region within which we can expect the Bayes factor to fall in \(95\%\) of the cases, should the null hypothesis hold. This is dirty frequentist reasoning of course, but the plot does show how it is possible to reject a null hypothesis even when the data provide a lot of support in its favor (i.e., the Jeffreys-Lindley paradox). Courtesy of Quentin Gronau.

10 Multiple Panels

To suitably impress the readership, any academic needs to be able to create a multi-panel graph. Below is a set of examples. When creating a multi-panel plot, the main challenge is to select the right number of panels (yes, you can have too many) so that the text and the symbols remain readable.

10.1 Two-Panel Plot

This is one of my favorite plots, highlighting the difference between discrete probability mass and continuous probability density. Credit goes to Michael Lee for conceptualizing the graph (it is presented in box 3.2 of our book) and to Quentin Gronau for the execution in R. Note the use of ablineclip for lines of distinct length and uniroot for finding the x-value that corresponds to five times the density of another x-value.
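
This minimal sketch assumes a standard normal density rather than the book's example. uniroot() finds the x-value whose density is five times the density at x = 2. segments() then draws vertical lines that stop at the curve, each with its own length. The compendium uses ablineclip() from the plotrix package for such lines; segments() is a base-R substitute. Here the answer has a closed form, x = sqrt(4 - 2 log 5), which is approximately 0.884. This makes it easy to check the result.

```r
x_ref <- 2
target <- 5 * dnorm(x_ref)   # five times the density at x_ref
# uniroot() needs an interval over which the function changes sign:
# dnorm(x) - target is positive at x = 0 and negative at x = x_ref
x_root <- uniroot(function(x) dnorm(x) - target,
                  interval = c(0, x_ref), tol = 1e-10)$root

xs <- seq(-3, 3, length.out = 300)
op <- par(cex.lab = 1.5, cex.axis = 1.3, las = 1, bty = "n")
plot(xs, dnorm(xs), type = "l", lwd = 3, xlab = "x", ylab = "Density")
# Vertical dashed lines from the x-axis up to the curve, each of its own length
segments(x0 = c(x_ref, x_root), y0 = 0,
         x1 = c(x_ref, x_root), y1 = dnorm(c(x_ref, x_root)), lwd = 2, lty = 2)
points(c(x_ref, x_root), dnorm(c(x_ref, x_root)), pch = 21, bg = "grey", cex = 2)
par(op)
```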

10.2 Buffon’s Needle

The only way to understand the title (and the plot) is to visit the Wikipedia entry on [Buffon’s needle](http://en.wikipedia.org/wiki/Buffon's_needle). Anyway, this is another two-panel plot, showing two posterior distributions for estimating \(\pi\) using an experiment that involved tossing a needle (ad nauseam).

10.3 Anscombe’s Quartet

Sometimes a graph is worth a thousand words. [Anscombe’s quartet](http://en.wikipedia.org/wiki/Anscombe's_quartet) famously drives home the idea that you should always plot your data. This code is based on the Anscombe plot in R. I personally don’t like lapply and similar complications; it may do the trick, but when you have to describe the code as “magic”, this signals a communication problem. Anyway, the point of the example is the graphical display, of course. As always, note the thick lines, the large symbols, and the large font size.
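
In keeping with the preference for explicit code over lapply, here is a sketch of the four panels using a plain for loop over R's built-in anscombe data set. All four regressions have almost the same intercept (about 3) and slope (about 0.5). This is exactly why the plots are needed.

```r
# anscombe ships with R (datasets package): columns x1..x4 and y1..y4
op <- par(mfrow = c(2, 2), mar = c(4.5, 5, 1, 1), cex.lab = 1.5,
          cex.axis = 1.2, las = 1, bty = "n")
slopes <- numeric(4)
for (i in 1:4) {
  x <- anscombe[[paste0("x", i)]]
  y <- anscombe[[paste0("y", i)]]
  fit <- lm(y ~ x)
  slopes[i] <- coef(fit)[2]
  plot(x, y, pch = 21, bg = "grey", cex = 2, xlim = c(3, 20), ylim = c(2, 14),
       xlab = bquote(x[.(i)]), ylab = bquote(y[.(i)]))
  # Thick fitted line, drawn over the observed x-range only
  xr <- range(x)
  lines(xr, predict(fit, newdata = data.frame(x = xr)), lwd = 3)
}
par(op)
```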

10.4 Four Quite Different Panels

Each panel of this plot shows something very different: histogram, density, point plot, and function. I like the annotations too. With help from Quentin Gronau.

10.5 Nine-Panel Posterior Predictives

Ravi Selker enters the stage and presents a nine-panel plot of posterior predictives. Note the use of textGrob and arrangeGrob. Nice work.

11 Graphs for JASP

11.1

11.2

11.3

11.4

11.5

12 Miscellaneous

Several cool plots do not fall neatly into the above categories.

12.1 Funnel Plot

This is a funnel plot, and it is courtesy of Mark Nieuwenstein. The code depends on the meta and metafor R packages.

12.2 Network Graph

Sacha Epskamp uses his qgraph package and shows how to display a network with nodes and connections.

12.3 Questionnaire Graph

Sacha Epskamp shows how to present the many outcomes from a questionnaire in a single graph.

13 References

Briscoe, M. H. (1996). Preparing scientific illustrations. Springer.

A Compendium of Clean Graphs in R
