
Matplotlib vs. Seaborn vs. Plotly

How can you amplify your data visualizations?

Clear, effective data visualization is key to conveying your findings. With several packages in common use, such as Matplotlib, Seaborn, and Plotly, keeping track of what each can do and the syntax behind it can become bewildering. I’m going to walk you through creating some common graphs in Python with each of these packages, using a csv file of the 2017 Spotify top tracks.

First, I’ll import the pandas package to read my csv into an easily readable dataframe.

import pandas as pd
df = pd.read_csv('featuresdf.csv')

Histogram

I’ll need to import the matplotlib package:

import matplotlib.pyplot as plt
%matplotlib inline

To plot a histogram of the danceability and energy scores overlaid, I can use the following code:

#set figure
f, ax = plt.subplots(1,1)
#graph histogram
plt.hist(df['danceability'], bins=10, alpha=0.5, color='purple', label='Danceability')
plt.hist(df['energy'], bins=10, alpha = 0.5, color='blue', label='Energy')
#set legend
plt.legend(loc='upper right')
#set title & axis titles
ax.set_title('Danceability Histogram', fontsize=20)
ax.set_xlabel('Danceability')
ax.set_ylabel('Frequency')
#set x & y ranges
plt.xlim(0,1)
plt.ylim(0, 30)

plt.show()

Notice how plain this graph looks. Once I run the following code, however, you can see how my graph improves:

import seaborn as sns
sns.set(style='darkgrid')

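Seaborn allows us to add a nice backdrop to our plots and improves the font. You can set style to darkgrid, whitegrid, dark, white, or ticks. We can also plot the same graph using what seaborn calls the distplot (note that distplot has been deprecated since seaborn 0.11 in favor of histplot and displot, so newer versions emit a deprecation warning):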

f, ax = plt.subplots(1,1)
sns.distplot(df['danceability'], bins=10, label='Danceability', color='purple')
sns.distplot(df['energy'], bins=10, label='Energy', color='blue')
ax.set_title('Danceability & Energy Histogram', fontsize=20)
ax.set(xlabel='Rating', ylabel='Frequency')
ax.set_xlim([0, 1])
ax.legend()

Almost exactly the same, right? Seaborn is built on matplotlib, so you can use them together. Seaborn simply has its own library of graphs and has pleasant formatting built in. However, it does not have all of the same capabilities as matplotlib. For instance, if you want to create the same histogram, but with the two variables placed side by side as opposed to overlaid, you would need to fall back to matplotlib:

#set figure
f, ax = plt.subplots(1,1)
#next to each other
plt.hist([df['danceability'], df['energy']], bins=10, alpha=0.5, color=['red', 'blue'], label = ['Danceability', 'Energy'])
#set legend
plt.legend(loc='upper right')
#set title & axis titles
ax.set_title('Danceability & Energy Histogram', fontsize=20)
ax.set_xlabel('Rating')
ax.set_ylabel('Frequency')
#set x & y ranges
plt.xlim(0,1)
plt.ylim(0, 30)
plt.show()
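
Both matplotlib histograms above pass bins=10. As a rough illustration of what that setting does, here is a minimal sketch in plain Python that counts values into ten equal-width bins. It assumes the range is fixed to 0 through 1 (by default matplotlib takes the range from the smallest and largest value in the data instead), and both the bin_counts helper and the sample scores are made up for illustration; they do not come from the Spotify file.

```python
def bin_counts(values, bins=10, lo=0.0, hi=1.0):
    """Count how many values fall into each of `bins` equal-width intervals."""
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        if v < lo or v > hi:
            continue  # values outside the range are ignored
        # the last bin is closed on the right, so v == hi lands in it
        index = min(int((v - lo) / width), bins - 1)
        counts[index] += 1
    return counts


# made-up scores for illustration only
scores = [0.05, 0.12, 0.18, 0.55, 0.58, 0.61, 0.95, 1.0]
counts = bin_counts(scores)
print(counts)  # [1, 2, 0, 0, 0, 2, 1, 0, 0, 2]
```

Each entry in the list is the height of one bar, which is why raising or lowering bins changes how smooth or jagged the histogram looks.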

Seaborn’s built-in features for its graphs can be helpful, but they can be limiting if you want to further customize your graph.

Matplotlib and Seaborn may be the most commonly used data visualization packages, but there is a simpler option that, in my experience, produces better graphs than either of these: Plotly. To get started in a Jupyter notebook, run the code below:

#install from a terminal with either pip or conda (in a notebook cell, use %pip or %conda)
pip install chart-studio
conda install -c plotly chart-studio
# Standard plotly imports
import chart_studio.plotly as py
import plotly.graph_objects as go
from plotly.offline import iplot, init_notebook_mode
# Using plotly + cufflinks in offline mode
import cufflinks
cufflinks.go_offline(connected=True)
init_notebook_mode(connected=True)

To plot the same overlaid histogram as above using default Plotly settings:

fig = df[['danceability', 'energy']].iplot(kind='hist', color=['purple', 'blue'], xTitle='Danceability',
yTitle='Frequency', title='Danceability Histogram')

Plotly graphs are automatically outfitted with hover tool capabilities — hovering your mouse over any of the bars of data will display the numerical values.

To plot the bars side by side or otherwise further customize the graph, the code is lengthier, but fairly intuitive. You can specify your desired theme from a growing list of available default themes, including one modeled after seaborn (used below).

#install themes & view available
import plotly.io as pio
pio.templates

You can also specify your colors using hex color codes, such as the '#1f77b4' and '#9467bd' used below:

And finally, plot your graph:

#plot
trace1 = go.Histogram(
x=df['danceability'],
name='danceability', #name used in legend and hover labels
xbins=dict( #bins used for histogram
start=0,
end=10,
size=0.1
),
marker=dict(
color='#1f77b4',
),
opacity=0.75
)
trace2 = go.Histogram(
x=df['energy'],
name='energy', #name used in legend and hover labels
xbins=dict( #bins used for histogram
start=0,
end=10,
size=0.1
),
marker=dict(
color='#9467bd'
),
opacity=0.75
)
data = [trace1, trace2]
layout = go.Layout(template='seaborn', #set theme
title='Danceability & Energy Histogram',
xaxis=dict(
title='Danceability & Energy'
),
yaxis=dict(
title='Frequency'
),
bargap=0.2,
bargroupgap=0.1
)
fig = go.Figure(data=data, layout=layout)
iplot(fig, filename='styled histogram')

Scatterplot

To plot the loudness score vs. valence in matplotlib:

#set figure
f, ax = plt.subplots(1,1)
#plot
plt.scatter(df['loudness'], df['valence'], s=df['energy']*100)
#set title & labels
plt.title('Scatterplot: Loudness vs. Valence', fontsize=20)
plt.xlabel('Loudness')
plt.ylabel('Positivity')
#set x range
ax.set_xlim([0, -10])
plt.show()

In seaborn:

fig = sns.scatterplot(x=df['loudness'], y=df['valence'], size = df['energy'],sizes = (40,200))
fig.figure.suptitle("Scatterplot: Loudness vs. Valence", fontsize = 25)
fig.set(xlabel='Loudness', ylabel='Positivity')
fig.set_xlim([0,-10])
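
Both scatterplots above tie marker size to energy: matplotlib takes the sizes directly through s (measured in points squared), while seaborn rescales the values into the sizes=(40, 200) range. As a rough sketch of that kind of rescaling, here is a plain-Python version that assumes a simple linear min-max mapping. This is my reading of the behavior rather than seaborn's actual code, and the scale_sizes helper and sample values are hypothetical.

```python
def scale_sizes(values, sizes=(40, 200)):
    """Linearly map values onto the (smallest, largest) marker size range."""
    smallest, largest = sizes
    lo, hi = min(values), max(values)
    if hi == lo:
        # all values identical: fall back to the midpoint (my own choice here)
        return [(smallest + largest) / 2 for _ in values]
    return [smallest + (v - lo) / (hi - lo) * (largest - smallest) for v in values]


# made-up energy scores for illustration only
energy = [0.25, 0.5, 0.75]
marker_sizes = scale_sizes(energy)
print(marker_sizes)  # [40.0, 120.0, 200.0]
```

The lowest energy score gets the smallest marker and the highest gets the largest, so the size range controls how dramatic the difference looks rather than what the data says.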

If you want to add a regression line to the graph, seaborn makes this much easier with its regplot graph:

fig = sns.regplot(x='loudness', y='valence', data=df)
fig.figure.suptitle("Scatterplot: Loudness vs. Valence", fontsize = 25)
fig.set(xlabel='Loudness', ylabel='Positivity')
fig.set_xlim([0,-10])

To add the correlation coefficient to this, import the pearsonr function from scipy.stats and follow the steps below:

import numpy as np
from scipy.stats import pearsonr
#calculate correlation coefficient
corr = pearsonr(df['loudness'], df['valence'])
corr = [np.round(c, 2) for c in corr]
#add the coefficient to your graph
text = 'r=%s, p=%s' % (corr[0], corr[1])
ax = sns.regplot(x="loudness", y="valence", data=df)
ax.text(-7.5, 0.9, text, fontsize=12)
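
To make the r value in that label less of a black box, here is a minimal sketch of the Pearson correlation coefficient computed by hand in plain Python. It is only meant to show the arithmetic behind the first value pearsonr returns (it does not compute the p-value), and the pearson_r helper and sample numbers are made up for illustration.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation: covariance divided by the product of the spreads."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # sum of the products of deviations from each mean
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # sums of squared deviations for each variable
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# made-up values for illustration only
xs = [1, 2, 3, 4, 5]
ys = [2, 1, 4, 3, 5]
r = pearson_r(xs, ys)
print(round(r, 2))  # 0.8
```

A value near 1 or -1 means the points sit close to a straight line, while a value near 0 means the regression line explains very little.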

Lastly, with Plotly, we can again create a scatterplot using the default settings:

fig = go.Figure(data=go.Scatter(x=df['loudness'], y=df['valence'], mode='markers'))
fig.update_layout(title='Loudness vs. Valence (Positivity)')
fig.layout.template = 'seaborn'
fig.show()

By adding another trace called ‘lineOfBestFit’ and calculating the regression using numpy, we can plot the regression line:

dataPoints = go.Scattergl(
x=df.loudness,
y=df.valence,
mode='markers',
marker=dict(
opacity=1,
line=dict(
color='white'
)
),
name='Data points'
)
#set axis titles to match the plotted columns
layout = go.Layout(
yaxis=dict(
title='Valence'),
xaxis=dict(
title='Loudness'
)
)
#calculate the line of best fit
m,b = np.polyfit(df.loudness, df.valence, 1)
bestfit_y = (df.loudness * m + b)
lineOfBestFit=go.Scattergl(
x=df.loudness,
y=bestfit_y,
name='Line of best fit',
line=dict(
color='blue',
)
)
data=[dataPoints, lineOfBestFit]
figure = go.Figure(data=data, layout=layout)
figure.update_xaxes(autorange="reversed")
figure.layout.template = 'plotly_dark'
iplot(figure)
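
The call np.polyfit(df.loudness, df.valence, 1) above returns the slope and intercept of a straight line, highest power first. As a rough sketch of what a degree 1 least-squares fit works out, here is the closed-form calculation in plain Python. The fit_line helper and the sample numbers are made up for illustration, and this is not how numpy implements it internally.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for a straight line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # slope: how much y changes, on average, per unit change in x
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    # the fitted line always passes through the point of means
    intercept = mean_y - slope * mean_x
    return slope, intercept


# made-up values for illustration only
xs = [0, 1, 2, 3]
ys = [1, 3, 5, 7]
m, b = fit_line(xs, ys)
bestfit_y = [m * x + b for x in xs]
print(m, b)  # 2.0 1.0
```

The bestfit_y list plays the same role as bestfit_y in the Plotly code: it holds the y value of the line at every x, which is what gets drawn as the second trace.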

These are just two of the multitude of graphs available through the seaborn and plotly libraries. Both seaborn and plotly create visually appealing graphs, but plotly allows for extensive customization and interactivity with fairly intuitive syntax, making it a popular tool among data scientists.
