10 Useful Python Data Visualization Libraries for Any Discipline
Scroll through the Python Package Index and you’ll find libraries for practically every data visualization need—from GazeParser for eye movement research to pastalog for realtime visualizations of neural network training. And while many of these libraries are intensely focused on accomplishing a specific task, some can be used no matter what your field.
Today, we’re giving an overview of 10 interdisciplinary Python data visualization libraries, from the well-known to the obscure. We’ve noted the ones you can take for a spin without the hassle of running Python locally, using Mode Python Notebooks.
matplotlib
Two histograms (matplotlib)
matplotlib is the O.G. of Python data visualization libraries. Despite being over a decade old, it’s still the most widely used library for plotting in the Python community. It was designed to closely resemble MATLAB, a proprietary programming language developed in the 1980s.
Because matplotlib was the first Python data visualization library, many other libraries are built on top of it or designed to work in tandem with it during analysis. Some libraries like pandas and Seaborn are “wrappers” over matplotlib. They allow you to access a number of matplotlib’s methods with less code.
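To make the "wrapper" idea concrete, here is a minimal sketch assuming pandas, NumPy, and matplotlib are installed. The column names, the random sample data, and the output filename are made up for illustration. The pandas call draws the chart, and because it hands back a regular matplotlib Axes object, you can keep tweaking it with matplotlib methods.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the script runs without a display

import numpy as np
import pandas as pd

# Made-up sample data: two groups drawn from different normal distributions.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "group_a": rng.normal(0, 1, 1000),
    "group_b": rng.normal(2, 1.5, 1000),
})

# One pandas call draws both histograms; it returns a matplotlib Axes.
ax = df.plot.hist(bins=30, alpha=0.6)

# From here on these are plain matplotlib methods on that Axes.
ax.set_xlabel("value")
ax.set_title("Two histograms")
ax.figure.savefig("two_histograms.png")  # hypothetical output filename
```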
While matplotlib is good for getting a sense of the data, it’s not very useful for creating publication-quality charts quickly and easily. As Chris Moffitt points out in his overview of Python visualization tools, matplotlib “is extremely powerful but with that power comes complexity.”
matplotlib has long been criticized for its default styles, which have a distinct 1990s feel. The upcoming release of matplotlib 2.0 promises many new style changes to address this problem.
Created by: John D. Hunter
Where to learn more: matplotlib.org
Seaborn
Violinplot (Michael Waskom)
Seaborn harnesses the power of matplotlib to create beautiful charts in a few lines of code. The key difference is Seaborn’s default styles and color palettes, which are designed to be more aesthetically pleasing and modern. Since Seaborn is built on top of matplotlib, you’ll need to know matplotlib to tweak Seaborn’s defaults.
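As a rough illustration of that last point, the sketch below draws a violin plot with Seaborn and then adjusts it with ordinary matplotlib calls. It assumes Seaborn, pandas, NumPy, and matplotlib are installed; the data, labels, and filename are invented for the example.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the script runs without a display

import numpy as np
import pandas as pd
import seaborn as sns

# Made-up sample data: three groups with different spreads.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "group": np.repeat(["a", "b", "c"], 100),
    "value": np.concatenate([
        rng.normal(0, 1.0, 100),
        rng.normal(1, 2.0, 100),
        rng.normal(3, 0.5, 100),
    ]),
})

sns.set_style("whitegrid")  # one of Seaborn's built-in styles

# Seaborn draws the chart and returns the matplotlib Axes it drew on.
ax = sns.violinplot(data=df, x="group", y="value")

# Tweaking the result means reaching for matplotlib methods.
ax.set_title("Value distribution by group")
ax.set_ylabel("measured value")
ax.figure.savefig("violins.png")  # hypothetical output filename
```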
Created by: Michael Waskom
Where to learn more: http://web.stanford.edu/~mwaskom/software/seaborn/index.html
ggplot
Small multiples (ŷhat)
ggplot is based on ggplot2, an R plotting system, and concepts from The Grammar of Graphics. ggplot operates differently than matplotlib: it lets you layer components to create a complete plot. For instance, you can start with axes, then add points, then a line, a trendline, etc. Although The Grammar of Graphics has been praised as an “intuitive” method for plotting, seasoned matplotlib users might need time to adjust to this new mindset.
According to the creator, ggplot isn’t designed for creating highly customized graphics. It trades fine-grained control for a simpler method of plotting.
ggplot is tightly integrated with pandas, so it’s best to store your data in a DataFrame when using ggplot.
Created by: ŷhat
Where to learn more: http://ggplot.yhathq.com/
Bokeh
Interactive weather statistics for three cities (Continuum Analytics)
Like ggplot, Bokeh is based on The Grammar of Graphics, but unlike ggplot, it’s native to Python, not ported over from R. Its strength lies in the ability to create interactive, web-ready plots, which can be easily output as JSON objects, HTML documents, or interactive web applications. Bokeh also supports streaming and real-time data.
Bokeh provides three interfaces with varying levels of control to accommodate different user types. The highest level is for creating charts quickly. It includes methods for creating common charts such as bar plots, box plots, and histograms. The middle level has the same specificity as matplotlib and allows you to control the basic building blocks of each chart (the dots in a scatter plot, for example). The lowest level is geared toward developers and software engineers. It has no pre-set defaults and requires you to define every element of the chart.
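Here is a minimal sketch of the middle-level interface, assuming a reasonably recent Bokeh release is installed. The temperature values, titles, and the "weather-plot" target id are made up. It builds a plot from basic glyphs and then exports it as a standalone HTML document and as a JSON item for embedding.

```python
from bokeh.embed import file_html, json_item
from bokeh.plotting import figure
from bokeh.resources import CDN

# Made-up sample data.
days = [1, 2, 3, 4, 5, 6, 7]
temps_c = [12.1, 13.4, 11.8, 15.0, 16.2, 14.7, 13.9]

# Middle-level interface: start with a figure, then add glyphs to it.
plot = figure(title="Daily high temperature",
              x_axis_label="day", y_axis_label="degrees C")
plot.line(days, temps_c, line_width=2)
plot.scatter(days, temps_c, size=8)

# Export as a standalone HTML document (BokehJS is loaded from the CDN).
html = file_html(plot, CDN, "Weather demo")

# Export as a JSON-serializable dict for embedding in an existing page.
item = json_item(plot, "weather-plot")
```

Writing `html` to a file and opening it in a browser should show the interactive plot with Bokeh's default pan and zoom tools.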
Created by: Continuum Analytics
Where to learn more: http://bokeh.pydata.org/en/latest/
pygal
Box plot (Florian Mounier)
Like Bokeh and Plotly, pygal offers interactive plots that can be embedded in the web browser. Its prime differentiator is the ability to output charts as SVGs. As long as you’re working with smaller datasets, SVGs will do you just fine. But if you’re making charts with hundreds of thousands of data points, they’ll have trouble rendering and become sluggish.
Since each chart type is packaged into a method and the built-in styles are pretty, it’s easy to create a nice-looking chart in a few lines of code.
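The SVG caveat above comes down to simple arithmetic: a vector chart stores one element per mark, so the markup the browser has to parse and draw grows with the data. The sketch below uses only the standard library and a hand-rolled, hypothetical `scatter_svg` helper. It is not pygal's renderer, just an illustration of the scaling.

```python
import random


def scatter_svg(points, width=400, height=300, radius=2):
    """Build a bare-bones SVG scatter plot: one <circle> per data point.

    Points are (x, y) pairs already scaled to the 0..1 range.
    """
    circles = [
        f'<circle cx="{x * width:.1f}" cy="{(1 - y) * height:.1f}" r="{radius}"/>'
        for x, y in points
    ]
    return (
        f'<svg xmlns="http://www.w3.org/2000/svg" width="{width}" height="{height}">'
        + "".join(circles)
        + "</svg>"
    )


random.seed(0)
sizes = {}
for n in (100, 10_000, 200_000):
    points = [(random.random(), random.random()) for _ in range(n)]
    svg = scatter_svg(points)
    sizes[n] = len(svg)
    print(f"{n:>7} points -> {svg.count('<circle'):>7} elements, "
          f"{len(svg) / 1e6:.2f} million characters of markup")
```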
Created by: Florian Mounier
Where to learn more: http://www.pygal.org/en/latest/index.html
Plotly
Line plot (Plotly)
You might know Plotly as an online platform for data visualization, but did you also know you can access its capabilities from a Python notebook? Like Bokeh, Plotly’s forte is making interactive plots, but it offers some charts you won’t find in most libraries, like contour plots, dendrograms, and 3D charts.
Created by: Plotly
Where to learn more: https://plot.ly/python/
geoplotlib
Choropleth (Andrea Cuttone)
geoplotlib is a toolbox for creating maps and plotting geographical data. You can use it to create a variety of map types, like choropleths, heatmaps, and dot density maps. You must have Pyglet (an object-oriented windowing and multimedia library) installed to use geoplotlib. Nonetheless, since most Python data visualization libraries don’t offer maps, it’s nice to have a library dedicated solely to them.
Created by: Andrea Cuttone
Where to learn more: https://github.com/andrea-cuttone/geoplotlib
Gleam
Scatter plot with trend line (David Robinson)
Gleam is inspired by R’s Shiny package. It allows you to turn analyses into interactive web apps using only Python scripts, so you don’t have to know any other languages like HTML, CSS, or JavaScript. Gleam works with any Python data visualization library. Once you’ve created a plot, you can build fields on top of it so users can filter and sort data.
Created by: David Robinson
Where to learn more: https://github.com/dgrtwo/gleam
missingno
Nullity matrix (Aleksey Bilogur)
Dealing with missing data is a pain. missingno allows you to quickly gauge the completeness of a dataset with a visual summary, instead of trudging through a table. You can filter and sort data based on completion or spot correlations with a heatmap or a dendrogram.
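To show what sits behind that kind of visual summary, here is a sketch using only pandas and NumPy. It is not missingno's implementation, and the tiny table is made up. It computes per-column completeness and the correlation between columns' missing-value patterns, which are the sort of numbers a nullity matrix or heatmap presents graphically.

```python
import numpy as np
import pandas as pd

# Made-up table with gaps.
df = pd.DataFrame({
    "age": [34, np.nan, 29, np.nan, 41],
    "income": [52000, np.nan, 48000, np.nan, np.nan],
    "city": ["Oslo", "Lima", None, "Pune", "Kyiv"],
})

# True wherever a value is missing.
nullity = df.isnull()

# Fraction of non-missing values in each column.
completeness = 1 - nullity.mean()

# Correlation between the missingness patterns of each pair of columns:
# values near 1 mean two columns tend to be missing in the same rows.
nullity_corr = nullity.astype(int).corr()

print(completeness)
print(nullity_corr.round(2))
```

With missingno installed, `missingno.matrix(df)` and `missingno.heatmap(df)` draw charts of this kind of information directly from the DataFrame.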
Created by: Aleksey Bilogur
Where to learn more: https://github.com/ResidentMario/missingno
Leather
Chart grid with consistent scales (Christopher Groskopf)
Leather’s creator, Christopher Groskopf, puts it best: “Leather is the Python charting library for those who need charts now and don’t care if they’re perfect.” It’s designed to work with all data types and produces charts as SVGs, so you can scale them without losing image quality. Since this library is relatively new, some of the documentation is still in progress. The charts you can make are pretty basic—but that’s the intention.
Created by: Christopher Groskopf
Where to learn more: http://leather.readthedocs.io/en/latest/index.html
Other great reads on Python data visualization
There are a ton of great evaluations and overviews of Python data visualization libraries out there. Check out some of our favorites:
- One Chart, Twelve Charting Libraries (Lisa Charlotte Rost)
- Overview of Python Visualization Tools (Practical Business Python)
- Python data visualization: Comparing 7 tools (Dataquest.io)
Did we miss your favorite data viz library? Let us know in the comments below.