Published in

Towards Data Science


Clare Blessen

Nov 20, 2019


Matplotlib vs. Seaborn vs. Plotly

How can you amplify your data visualizations?

Clear, effective data visualization is key to conveying your findings. With so many packages available, such as Matplotlib, Seaborn, and Plotly, keeping track of each one's capabilities and syntax can become bewildering. I’m going to walk you through creating some common graphs in Python with each of these packages, using a CSV file of Spotify’s top tracks of 2017.

First, I’ll import the pandas package to read my CSV into a DataFrame.

import pandas as pd
#read_csv already returns a DataFrame, so no extra wrapping is needed
df = pd.read_csv('featuresdf.csv')
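Every plot below relies on four columns: danceability, energy, loudness, and valence. If you want a clear error up front rather than a KeyError halfway through, a small loader can check for them. This is a minimal sketch. The helper name `load_tracks` and the column check are my own illustrative additions, not part of pandas or the original workflow, and the two-row CSV holds toy values, not real Spotify data.

```python
import io

import pandas as pd

# Columns referenced by the plots in this walkthrough
REQUIRED_COLUMNS = ['danceability', 'energy', 'loudness', 'valence']


def load_tracks(source):
    """Read a CSV (path or file-like object) and verify the expected columns.

    Hypothetical helper for illustration; not a pandas function.
    """
    df = pd.read_csv(source)  # read_csv already returns a DataFrame
    missing = [c for c in REQUIRED_COLUMNS if c not in df.columns]
    if missing:
        raise ValueError(f'missing columns: {missing}')
    return df


# Usage with a tiny in-memory CSV standing in for featuresdf.csv (toy values)
sample = io.StringIO(
    'name,danceability,energy,loudness,valence\n'
    'a,0.8,0.6,-5.0,0.9\n'
    'b,0.5,0.7,-6.5,0.4\n'
)
tracks = load_tracks(sample)
print(tracks.shape)  # (2, 5)
```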

Histogram

I’ll need to import the matplotlib package:

import matplotlib.pyplot as plt
%matplotlib inline

To plot overlaid histograms of the danceability and energy scores, I can use the following code:

#set figure
f, ax = plt.subplots(1, 1)
#graph histograms
ax.hist(df['danceability'], bins=10, alpha=0.5, color='purple', label='Danceability')
ax.hist(df['energy'], bins=10, alpha=0.5, color='blue', label='Energy')
#set legend
ax.legend(loc='upper right')
#set title & axis titles
ax.set_title('Danceability & Energy Histogram', fontsize=20)
ax.set_xlabel('Rating')
ax.set_ylabel('Frequency')
#set x & y ranges
ax.set_xlim(0, 1)
ax.set_ylim(0, 30)
plt.show()

Notice how plain this default graph looks. Once I run the following code, however, you can see how my graph improves:

import seaborn as sns
sns.set(style='darkgrid')

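Seaborn adds a nicer backdrop to our plots and improves the fonts. The style can be set to darkgrid, whitegrid, dark, white, or ticks. We can also plot the same graph with seaborn’s distplot (deprecated since seaborn 0.11 in favor of histplot and displot). Because distplot overlays a density curve by default, it normalizes the bars, so its y-axis shows density rather than counts: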

f, ax = plt.subplots(1, 1)
#draw both distributions on the same axes
sns.distplot(df['danceability'], bins=10, label='Danceability', color='purple', ax=ax)
sns.distplot(df['energy'], bins=10, label='Energy', color='blue', ax=ax)
#set title, axis titles & x range
ax.set_title('Danceability & Energy Histogram', fontsize=20)
ax.set(xlabel='Rating', ylabel='Density')
ax.set_xlim([0, 1])
ax.legend()

Almost exactly the same, right? Seaborn is built on matplotlib, so you can use the two together. Seaborn simply has its own library of graph types, with pleasant formatting built in. However, distplot doesn't expose every option matplotlib does. For instance, to draw the same histogram with the two variables side by side instead of overlaid, you would need to fall back to matplotlib (newer seaborn versions offer this through histplot's multiple='dodge' option):

#set figure
f, ax = plt.subplots(1, 1)
#plot the two histograms next to each other
ax.hist([df['danceability'], df['energy']], bins=10, alpha=0.5, color=['red', 'blue'], label=['Danceability', 'Energy'])
#set legend
ax.legend(loc='upper right')
#set title & axis titles
ax.set_title('Danceability & Energy Histogram', fontsize=20)
ax.set_xlabel('Rating')
ax.set_ylabel('Frequency')
#set x & y ranges
ax.set_xlim(0, 1)
ax.set_ylim(0, 30)
plt.show()
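What makes the side-by-side version work is that `hist` receives a list of arrays: with the default `histtype='bar'`, matplotlib draws each dataset's bar next to the others inside every bin and returns one row of counts per dataset (`histtype='barstacked'` would stack them vertically instead). The sketch below shows that return shape, using random numbers as stand-ins for the two Spotify columns.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(1)
# Random stand-ins for df['danceability'] and df['energy']
scores = [rng.uniform(0, 1, 100), rng.uniform(0, 1, 100)]

f, ax = plt.subplots(1, 1)
# A list of datasets + default histtype='bar' -> grouped (side-by-side) bars
counts, bins, patches = ax.hist(scores, bins=10, color=['red', 'blue'],
                                label=['Danceability', 'Energy'])
ax.legend(loc='upper right')

# One row of bin counts per dataset, one shared set of bin edges
print(np.asarray(counts).shape)  # (2, 10)
```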

Seaborn’s built-in features for its graphs can be helpful, but they can be limiting if you want to customize your graph further.
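One detail worth knowing about the overlaid matplotlib histogram: with `bins=10`, matplotlib splits each series' own min-to-max range into ten equal-width bins, so the purple and blue bars don't necessarily line up. Passing an explicit array of edges gives both series identical bins. This is a minimal sketch assuming the scores lie in [0, 1]; the beta-distributed numbers are only stand-ins for the Spotify columns.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
# Stand-in scores in (0, 1); replace with df['danceability'] and df['energy']
danceability = rng.beta(5, 3, size=100)
energy = rng.beta(4, 3, size=100)

# Eleven edges -> ten bins of width 0.1, shared by both series
edges = np.linspace(0, 1, 11)

f, ax = plt.subplots(1, 1)
n_dance, bins_dance, _ = ax.hist(danceability, bins=edges, alpha=0.5,
                                 color='purple', label='Danceability')
n_energy, bins_energy, _ = ax.hist(energy, bins=edges, alpha=0.5,
                                   color='blue', label='Energy')
ax.legend(loc='upper right')
ax.set_xlim(0, 1)
```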

Matplotlib and Seaborn may be the most commonly used data visualization packages, but Plotly offers a simpler route to interactive graphs that, to my eye, look better than either. To get started in a Jupyter notebook, run the code below:

#install chart-studio and cufflinks with pip (or: !conda install -c plotly chart-studio)
!pip install chart-studio cufflinks

#Standard plotly imports
import chart_studio.plotly as py
import plotly.graph_objects as go
from plotly.offline import iplot, init_notebook_mode

#Using plotly + cufflinks in offline mode
import cufflinks
cufflinks.go_offline(connected=True)
init_notebook_mode(connected=True)

To plot the same overlaid histogram as above using default Plotly settings:

fig = df[['danceability', 'energy']].iplot(kind='hist', color=['purple', 'blue'], xTitle='Rating',
                  yTitle='Frequency', title='Danceability & Energy Histogram')

Plotly graphs come with hover tooltips built in: hovering your mouse over any of the bars displays its numerical values.

To plot the bars side by side or otherwise further customize the graph, the code is lengthier, but fairly intuitive. You can specify your desired theme from a growing list of available default themes, including one modeled after seaborn (used below).

#view the available themes (templates)
import plotly.io as pio
pio.templates

You can also specify your colors with hex codes; the code below uses '#1f77b4' and '#9467bd':

And finally, plot your graph:

#plot
trace1 = go.Histogram(
    x=df['danceability'],
    name='danceability', #name used in legend and hover labels
    xbins=dict( #bins used for histogram
        start=0,
        end=1,
        size=0.1
    ),
    marker=dict(
        color='#1f77b4',
    ),
    opacity=0.75
)
trace2 = go.Histogram(
    x=df['energy'],
    name='energy', #name used in legend and hover labels
    xbins=dict( #bins used for histogram
        start=0,
        end=1,
        size=0.1
    ),
    marker=dict(
        color='#9467bd'
    ),
    opacity=0.75
)
data = [trace1, trace2]
layout = go.Layout(template='seaborn', #set theme
    title='Danceability & Energy Histogram',
    xaxis=dict(
        title='Danceability & Energy'
    ),
    yaxis=dict(
        title='Frequency'
    ),
    bargap=0.2,
    bargroupgap=0.1
)
fig = go.Figure(data=data, layout=layout)
iplot(fig, filename='styled histogram')

Scatterplot

To plot the loudness score vs. valence in matplotlib:

#set figure
f, ax = plt.subplots(1, 1)
#plot, scaling marker size by energy
ax.scatter(df['loudness'], df['valence'], s=df['energy']*100)
#set title & labels
ax.set_title('Scatterplot: Loudness vs. Valence', fontsize=20)
ax.set_xlabel('Loudness')
ax.set_ylabel('Positivity')
#set x range (0 on the left, -10 on the right)
ax.set_xlim([0, -10])
plt.show()
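Two details in this plot are easy to miss. First, `s` sets marker area in points squared, so `s=df['energy']*100` makes higher-energy tracks visibly bigger. Second, calling `set_xlim` with a left limit larger than the right one flips the x-axis, so values closer to 0 (louder tracks) sit on the left; `ax.invert_xaxis()` gives the same direction while keeping matplotlib's automatic limits. The sketch shows both approaches with made-up numbers standing in for the Spotify columns.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(2)
# Made-up stand-ins for df['loudness'], df['valence'] and df['energy']
loudness = rng.uniform(-10, 0, 50)
valence = rng.uniform(0, 1, 50)
energy = rng.uniform(0, 1, 50)

f, (ax1, ax2) = plt.subplots(1, 2)

# Option 1: explicit limits with left > right reverse the axis
ax1.scatter(loudness, valence, s=energy * 100)  # marker area grows with energy
ax1.set_xlim([0, -10])

# Option 2: reverse the axis but let matplotlib pick the limits
ax2.scatter(loudness, valence, s=energy * 100)
ax2.invert_xaxis()
```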

In seaborn:

ax = sns.scatterplot(data=df, x='loudness', y='valence', size='energy', sizes=(40, 200))
ax.figure.suptitle('Scatterplot: Loudness vs. Valence', fontsize=25)
ax.set(xlabel='Loudness', ylabel='Positivity')
ax.set_xlim([0, -10])

If you want to add a regression line to the graph, seaborn makes this much easier with its regplot function:

#x and y must be passed as keywords in recent seaborn versions
ax = sns.regplot(x='loudness', y='valence', data=df)
ax.figure.suptitle('Scatterplot: Loudness vs. Valence', fontsize=25)
ax.set(xlabel='Loudness', ylabel='Positivity')
ax.set_xlim([0, -10])
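For a first-order fit, the line regplot draws is an ordinary least-squares line, which you can also compute yourself with `np.polyfit(x, y, 1)`. regplot additionally shades a bootstrapped confidence interval (95% by default), which this sketch does not reproduce. The data are made up, and matplotlib draws the fitted line.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(3)
# Made-up stand-ins for df['loudness'] and df['valence']
loudness = rng.uniform(-10, 0, 100)
valence = 0.05 * loudness + 0.8 + rng.normal(0, 0.05, 100)

# Least-squares fit of a degree-1 polynomial: valence ~ m * loudness + b
m, b = np.polyfit(loudness, valence, 1)

f, ax = plt.subplots(1, 1)
ax.scatter(loudness, valence)
# A straight line only needs its two endpoints across the observed range
xs = np.linspace(loudness.min(), loudness.max(), 2)
ax.plot(xs, m * xs + b, color='C1')
ax.set(xlabel='Loudness', ylabel='Positivity')
ax.set_xlim([0, -10])
```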

To add the correlation coefficient to this graph, import the pearsonr function from scipy.stats and follow the steps below:

import numpy as np
from scipy.stats import pearsonr
#calculate the correlation coefficient and p-value
corr = pearsonr(df['loudness'], df['valence'])
corr = [np.round(c, 2) for c in corr]
#add the coefficient to your graph
text = 'r=%s, p=%s' % (corr[0], corr[1])
ax = sns.regplot(x='loudness', y='valence', data=df)
ax.text(-7.5, 0.9, text, fontsize=12)
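pearsonr returns the correlation coefficient r together with a two-sided p-value. The coefficient itself is the sum of the centered cross-products divided by the square root of the product of the centered sums of squares; the sketch below computes it with numpy and can be checked against `np.corrcoef`. It also places the label with `transform=ax.transAxes`, so the position is a fraction of the axes (0 to 1) rather than data coordinates like (-7.5, 0.9), which keeps the label in view if the axis limits change. The function name `pearson_r` is my own and the data are made up.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import numpy as np


def pearson_r(x, y):
    """Pearson correlation (hypothetical helper): centered cross-product over product of norms."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()
    dy = y - y.mean()
    return np.sum(dx * dy) / np.sqrt(np.sum(dx ** 2) * np.sum(dy ** 2))


rng = np.random.default_rng(4)
# Made-up stand-ins for df['loudness'] and df['valence']
loudness = rng.uniform(-10, 0, 100)
valence = 0.03 * loudness + 0.7 + rng.normal(0, 0.1, 100)

r = pearson_r(loudness, valence)

f, ax = plt.subplots(1, 1)
ax.scatter(loudness, valence)
# (0.05, 0.95) in axes coordinates is near the top-left corner, whatever the data range
label = ax.text(0.05, 0.95, f'r={r:.2f}', transform=ax.transAxes,
                fontsize=12, verticalalignment='top')
```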

Lastly, with Plotly, we can again create a scatterplot using the default settings:

fig = go.Figure(data=go.Scatter(x=df['loudness'], y=df['valence'], mode='markers'))
fig.update_layout(title='Loudness vs. Valence (Positivity)')
fig.layout.template = 'seaborn'
fig.show()

By adding another trace called ‘lineOfBestFit’ and calculating the regression using numpy, we can plot the regression line:

#scatter trace for the raw data
dataPoints = go.Scattergl(
    x=df.loudness,
    y=df.valence,
    mode='markers',
    marker=dict(
        opacity=1,
        line=dict(color='white')
    ),
    name='Data points'
)
#fit a straight line (degree-1 polynomial) with numpy
m, b = np.polyfit(df.loudness, df.valence, 1)
bestfit_y = df.loudness * m + b
lineOfBestFit = go.Scattergl(
    x=df.loudness,
    y=bestfit_y,
    mode='lines',
    name='Line of best fit',
    line=dict(color='blue')
)
#axis titles that match the plotted variables
layout = go.Layout(
    title='Loudness vs. Valence',
    xaxis=dict(title='Loudness'),
    yaxis=dict(title='Valence (Positivity)')
)
figure = go.Figure(data=[dataPoints, lineOfBestFit], layout=layout)
figure.update_xaxes(autorange='reversed')
figure.layout.template = 'plotly_dark'
iplot(figure)

These are just two of the many graph types available in seaborn and Plotly. Both create visually appealing graphs, but Plotly offers extensive customization and interactivity with fairly intuitive syntax, which makes it a popular tool among data scientists.

