
IPython Notebook Tutorial

IPython has a beautiful Notebook that lets you write and execute code, analyze data, embed content, and share reproducible work. IPython lets you easily share your code, data, plots, and explanation in one Notebook. Publishing is flexible: PDF, HTML, ipynb, dashboards, slides, and more. Code cells are based on an input and output format. For example:

In [1]:
print("hello world")
hello world

Table Of Contents

This tutorial will cover:

  • Introduction: installation, helpful commands, package management
  • SciPy Stack: SciPy, NumPy, pandas
  • Plotting: matplotlib, mpld3, pandas, maps, 3D plotting, ggplot for Python, Seaborn, ggplot2, and R
  • Embedding: iframes, videos, LaTeX
  • Publishing: files, widgets, dashboards, styling

I. Introduction

There are a few ways to use an IPython Notebook. For Windows users, you'll need setuptools. This Notebook uses Python 2; Python 3 offers exciting new opportunities for IPython.

  • If you have setuptools or pip installed, you can open a terminal and type: $ pip install ipython.
  • Anaconda and Enthought allow you to download a desktop version of IPython Notebook.
  • coLaboratory allows you to run Notebooks using Google Chrome.
  • Docker containers let you run Notebooks.
  • Domino, Authorea, and Wakari offer web-based Notebooks.
  • tmpnb launches a temporary online Notebook for individual users.
  • Hosted services like Sciencebox let users launch a prebuilt virtual machine.


Once you've installed the Notebook, start it from your terminal by calling $ ipython notebook. Windows users need to open their Command Prompt first. This opens a browser tab on localhost at the URL of your Notebooks, by default http://127.0.0.1:8888. You'll see a dashboard listing all your Notebooks, and you can launch any of them from there. A Notebook has the advantage of looking the same while you're coding and when you publish it. While it's running, you can move code, run cells, change kernels, and write Markdown.

Helpful commands

IPython supports tab completion. You can type object_name.<TAB> to view an object’s attributes. For tips on cell magics, running Notebooks, and exploring objects, check out the IPython docs. IPython also has a few helpful commands like:

In [2]:
help # introduction and overview of features
Out[2]:
Type help() for interactive help, or help(object) for help about object.
In [3]:
%quickref # opens quick reference

The Notebook's home directory defaults to the folder where you launched it, which on Windows is often C:\Users\USERNAME. Run ipython notebook --help-all to see the options for starting in a different directory. You can also use %run to run local Python scripts you've written.


IPython has keyboard shortcuts. Shift-Enter runs a cell and moves to the next one, Ctrl-Enter runs a cell in place, and Alt-Enter runs a cell and inserts a new one below.

Package management

To install packages while working in IPython, either install them from your system shell or prefix the command with ! to run it as a shell command from a cell, e.g.:

!pip install packagename

You may want to reload a module after you've edited its code. IPython comes with an automatic reloading extension that can reload all changed modules before executing each new line:

%load_ext autoreload
%autoreload 2
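Outside IPython, the standard library's importlib.reload is the manual version of what %autoreload automates. A minimal sketch, assuming Python 3; the module name demo_helpers is hypothetical and is written to a temporary directory only for this demo:

```python
import importlib
import os
import sys
import tempfile

sys.dont_write_bytecode = True  # avoid stale .pyc files between the two writes

# Hypothetical throwaway module, created in a temp directory for the demo.
MODULE_NAME = "demo_helpers"
workdir = tempfile.mkdtemp()
sys.path.insert(0, workdir)
module_path = os.path.join(workdir, MODULE_NAME + ".py")


def write_module(source):
    """Overwrite the demo module's source file on disk."""
    with open(module_path, "w") as f:
        f.write(source)


write_module("def greet():\n    return 'hello'\n")
importlib.invalidate_caches()  # make the new file visible to the import system
demo_helpers = importlib.import_module(MODULE_NAME)
before = demo_helpers.greet()

# Simulate editing the file in your editor, then reload the module by hand.
write_module("def greet():\n    return 'hello again'\n")
demo_helpers = importlib.reload(demo_helpers)
after = demo_helpers.greet()

print(before, "->", after)
```

With %autoreload 2 active, IPython performs this kind of reload for changed modules before each cell runs, so you don't have to call reload yourself.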

II. SciPy Stack

SciPy & NumPy

SciPy is a Python-based ecosystem of packages for math, science, and engineering. We'll show a quick example below. NumPy is a package for scientific computing with tools for linear algebra, random number generation, integrating with databases, and managing data. Older SciPy releases also re-exported many NumPy functions in the scipy namespace; newer releases deprecate these aliases, so it's safer to call such functions from NumPy directly.

In [4]:
import scipy as sp
import numpy as np
In [5]:
s = np.random.randn(100)  # older SciPy releases also exposed this as sp.randn
print(len(s))
100

We can print the mean of our data in the Notebook. Since the returned value is a NumPy array, we can compute and print descriptive statistics about the random numbers we created.

In [6]:
print("Mean : {0:8.6f}".format(s.mean()))
Mean : 0.171686
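NumPy arrays expose several other descriptive statistics as methods. A small sketch; unlike the cell above, it uses np.random.default_rng (NumPy 1.17 and later) with a fixed seed so the numbers are reproducible:

```python
import numpy as np

rng = np.random.default_rng(seed=0)  # seeded generator for reproducible draws
s = rng.standard_normal(100)         # 100 draws from a standard normal

summary = {
    "mean": s.mean(),
    "std": s.std(ddof=1),  # sample standard deviation
    "min": s.min(),
    "max": s.max(),
    "median": np.median(s),
}
for name, value in summary.items():
    print("{0:>6} : {1:9.6f}".format(name, value))
```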

Arrays are the central data structure in NumPy and are more efficient than Python lists for numerical work. All elements of a NumPy array must share the same type, usually float or int.

In [7]:
x = np.array([42,47,11], int)
x
Out[7]:
array([42, 47, 11])
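Because every element shares one type, NumPy picks a common dtype when you mix values, and arithmetic applies element-wise instead of repeating the sequence as a list would. A short sketch:

```python
import numpy as np

x = np.array([42, 47, 11], dtype=int)
print(x.dtype.kind)       # 'i': signed integer

# Mixing ints and floats upcasts every element to float.
mixed = np.array([42, 47.5, 11])
print(mixed.dtype)        # float64

# Arithmetic is element-wise; compare with list repetition.
print(x * 2)              # [84 94 22]
print([42, 47, 11] * 2)   # [42, 47, 11, 42, 47, 11]
```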

pandas

If you have tabular data, pandas is the way to go. DataFrames are easy to make, and their labeled, individually typed columns handle tabular data better than Python lists or tuples. The 10 minutes to pandas guide is a good introduction and a useful exercise. So is Michael Hansen's tutorial.

In [8]:
import pandas as pd

You can create a dataframe in pandas with read_csv, which reads from a URL or a local file. Here we'll read the data behind the Plotly version of the matplotlib plot we make below. You can append .py, .r, .m, .jl, .json, .js, .png, .pdf, .svg, .embed, .xlsx, or .csv to any Plotly URL to get the figure in that format.

In [9]:
df = pd.read_csv("https://plot.ly/~MattSundquist/20387.csv")
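read_csv accepts a path, a URL, or any file-like object. A self-contained sketch that reads a small sample CSV from memory with io.StringIO, so it runs without network access; the frame is named df_small to keep it separate from df:

```python
import io

import pandas as pd

# Small sample data standing in for a downloaded CSV file.
csv_text = """time,volts
0.00,1.000000
0.01,0.990050
0.02,0.980199
"""

df_small = pd.read_csv(io.StringIO(csv_text))
print(df_small.shape)          # (3, 2)
print(list(df_small.columns))  # ['time', 'volts']
```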

We can describe the dataframe we've just created.

In [10]:
df.describe()  # summary statistics for each numeric column
Out[10]:
       line0_volts  line0_time, line1_time, line3_time    line1_volts  line2_volts  line2_time    line3_volts
count   200.000000                          200.000000   2.000000e+02    20.000000   20.000000   2.000000e+02
mean      0.434498                            0.995000  -1.665335e-18     0.620175    0.950000   6.708533e-02
std       0.243705                            0.578792   7.088812e-01     0.324605    0.591608   3.402314e-01
min       0.136695                            0.000000  -1.000000e+00     0.000000    0.000000  -4.781305e-01
25%       0.224812                            0.497500  -6.956525e-01     0.388217    0.475000  -1.651344e-01
50%       0.369728                            0.995000   6.123234e-17     0.667501    0.950000   3.713929e-17
75%       0.608055                            1.492500   6.956525e-01     0.885674    1.425000   2.722606e-01
max       1.000000                            1.990000   1.000000e+00     1.064711    1.900000   7.883040e-01

We can also check how many rows we have with len(df); the columns attribute lists the column labels.

In [11]:
len(df)
Out[11]:
200

Calling .head() returns the first five rows, with column headers, by default; we can also pass a different number of rows.

In [12]:
df.head()  # examine dataframe
Out[12]:
   line0_volts  line0_time, line1_time, line3_time  line1_volts  line2_volts  line2_time  line3_volts
0     1.000000                                0.00     0.000000     0.000000         0.0     0.000000
1     0.990050                                0.01     0.062791     0.095310         0.1     0.062166
2     0.980199                                0.02     0.125333     0.182322         0.2     0.122851
3     0.970446                                0.03     0.187381     0.262364         0.3     0.181843
4     0.960789                                0.04     0.248690     0.336472         0.4     0.238939

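The same inspection calls on a small, made-up dataframe: len for the row count, the columns attribute for the labels, and head with an explicit row count:

```python
import pandas as pd

# Small made-up dataframe for illustration.
frame = pd.DataFrame({
    "time": [0.00, 0.01, 0.02, 0.03, 0.04, 0.05, 0.06],
    "volts": [1.0, 0.99, 0.98, 0.97, 0.96, 0.95, 0.94],
})

print(len(frame))           # 7 rows
print(list(frame.columns))  # ['time', 'volts']
print(frame.head(3))        # first three rows only
print(frame.shape)          # (7, 2): rows, columns
```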
We can rename our columns once we've made a dataframe.

In [13]:
df.columns = ["volts_1", "time_1", "volts_2", "volts_3",
              "time_2", "volts_4"]  # each label must be unique
In [14]:
df.head(1)
Out[14]:
   volts_1  time_1  volts_2  volts_3  time_2  volts_4
0        1       0        0        0       0        0

We can use pandas for statistics and to examine our rows and columns.

In [15]:
df.volts_1.head()
Out[15]:
0    1.000000
1    0.990050
2    0.980199
3    0.970446
4    0.960789
Name: volts_1, dtype: float64
In [16]:
df.volts_1.std()
Out[16]:
0.24370527032639763

Most pandas functions also work on an entire dataframe. For example, calling std() calculates the standard deviation for each column.

In [17]:
df.std()
Out[17]:
volts_1    0.243705
time_1     0.578792
volts_2    0.708881
volts_3    0.324605
time_2     0.591608
volts_4    0.340231
dtype: float64
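Assigning a full list to df.columns silently accepts duplicate names, which is easy to do by accident. A sketch of an alternative: rename with a dict changes only the listed columns, and columns.is_unique guards against duplicates. It also checks that pandas' std() is the sample standard deviation (ddof=1), unlike NumPy's default of ddof=0. The small dataframe and its values are made up:

```python
import numpy as np
import pandas as pd

# Made-up frame using the original-style column names.
frame = pd.DataFrame({
    "line0_volts": [1.0, 0.5, 0.25, 0.125],
    "line1_volts": [0.0, 1.0, 0.0, -1.0],
})

# rename() maps old names to new ones; unlisted columns keep their names.
renamed = frame.rename(columns={"line0_volts": "volts_1", "line1_volts": "volts_2"})
assert renamed.columns.is_unique  # guard against accidental duplicate labels

# pandas uses the sample standard deviation (ddof=1) by default...
pandas_std = renamed["volts_2"].std()
# ...whereas NumPy defaults to the population version (ddof=0).
numpy_std = np.std(renamed["volts_2"].to_numpy())
print(pandas_std, numpy_std)
```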

III. Plotting

matplotlib inline

You can use matplotlib inside your IPython Notebook by calling %matplotlib inline, which has the advantage of keeping your plots in one place. %matplotlib inline activates the inline backend and renders figures as static PNG images. A newer option, %matplotlib notebook, lets you interact with the plot in a Notebook. This works in IPython 3.x; for older IPython versions, use %matplotlib nbagg.

In [18]:
%matplotlib inline
import matplotlib.pyplot as plt # side-stepping mpl backend
import matplotlib.gridspec as gridspec # subplots
In [19]:
import plotly.plotly as py  # Plotly 1.x API; Plotly 4 moved this module to chart_studio.plotly
from plotly.graph_objs import *
import plotly.tools as tls
In [20]:
import numpy as np
import matplotlib.pyplot as plt

fig1 = plt.figure()  # keep a handle so we can convert this figure to Plotly below

t1 = np.arange(0.0, 2.0, 0.1)
t2 = np.arange(0.0, 2.0, 0.01)

# note that plot returns a list of lines.  The "l1, = plot" usage
# extracts the first element of the list into l1 using tuple
# unpacking.  So l1 is a Line2D instance, not a sequence of lines
l1, = plt.plot(t2, np.exp(-t2))
l2, l3 = plt.plot(t2, np.sin(2 * np.pi * t2), '--go', t1, np.log(1 + t1), '.')
l4, = plt.plot(t2, np.exp(-t2) * np.sin(2 * np.pi * t2), 'rs-.')

plt.xlabel('time')
plt.ylabel('volts')
plt.title('Damped oscillation')

plt.show()
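The tuple-unpacking comment matters because those Line2D handles are what you pass to plt.legend to label specific lines. A minimal sketch; it selects the non-interactive Agg backend so it also runs outside a Notebook, and the legend labels are made up:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line inside a Notebook
import matplotlib.pyplot as plt
import numpy as np

t = np.arange(0.0, 2.0, 0.01)

fig, ax = plt.subplots()
# plot() returns a list of Line2D objects; unpack the single line from each call.
decay, = ax.plot(t, np.exp(-t))
damped, = ax.plot(t, np.exp(-t) * np.sin(2 * np.pi * t), "r-.")

# Label only the two handles we kept.
legend = ax.legend([decay, damped], ["exp(-t)", "damped oscillation"], loc="upper right")
print([text.get_text() for text in legend.get_texts()])
```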

Now we can do a bit of interactive plotting. Head to the Plotly getting started page to get an API key and install the Plotly library. Calling iplot_mpl converts the matplotlib figure into an interactive Plotly graph and displays it inside the Notebook in an iframe. You can control the privacy with sharing set to public, private, or secret. We'll use strip_style to apply the Plotly defaults. You can zoom by clicking and dragging, and see data values by hovering your mouse.

In [21]:
py.iplot_mpl(fig1, strip_style = True, filename='ipython/mpl_example')
Out[21]:

mpld3

We can use mpld3 to make interactive plots in the Notebook from matplotlib figures.

In [22]:
import mpld3
mpld3.display(fig1)
Out[22]:

We can make interactive plots from the same dataframe we made with pandas above.

In [23]:
volts_histogram_plot = [{'x': df['volts_1'], 
                 'type': 'histogram'
}]
In [24]:
data_histogram = Data(volts_histogram_plot)

fig_histogram = Figure(data=data_histogram)
In [25]:
py.iplot(fig_histogram, filename='pandas/volts_histogram')
Out[25]:
In [26]:
volts_jitter_plot = [{'y': df['volts_1'], 
                 'name': 'volts',
                 'type': 'box',
}]
In [27]:
data_jitter = Data(volts_jitter_plot)

fig_jitter = Figure(data=data_jitter)
In [28]:
py.iplot(fig_jitter, filename='pandas/volts_boxplot')
Out[28]:
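A box plot summarizes a column by its quartiles. To check the numbers behind the box, you can compute them with pandas. A sketch on made-up values; Plotly's quartile algorithm may differ slightly from pandas' default linear interpolation, so small discrepancies are possible:

```python
import pandas as pd

volts = pd.Series([0.14, 0.22, 0.31, 0.37, 0.45, 0.61, 0.78, 1.00])  # made-up values

q1, median, q3 = volts.quantile([0.25, 0.5, 0.75])
iqr = q3 - q1
print("Q1={:.4f} median={:.4f} Q3={:.4f} IQR={:.4f}".format(q1, median, q3, iqr))

# Box plots usually draw points beyond 1.5 * IQR from the box as outliers.
outliers = volts[(volts < q1 - 1.5 * iqr) | (volts > q3 + 1.5 * iqr)]
print("outliers:", outliers.tolist())
```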

For plotting directly from a dataframe, you can use cufflinks, a pandas plotting library. Click the legend items to toggle traces on and off.

In [29]:
import cufflinks as cf
In [30]:
# pandas.io.data was removed from pandas; its successor is the pandas-datareader package.
# The Yahoo source used here may no longer be available.
import pandas_datareader.data as web
from datetime import datetime

start = datetime(2008, 1, 1)
end = datetime(2008, 11, 28)
df_gis = web.DataReader("GIS", 'yahoo', start, end)
df_fdo = web.DataReader("FDO", 'yahoo', start, end)
df_sp = web.DataReader("^GSPC", 'yahoo', start, end)  # Yahoo's ticker for the S&P 500

df = pd.DataFrame({'General Mills': df_gis.Open, 'Family Dollar Stores': df_fdo.Open, 'S&P 500': df_sp.Open})

df.iplot(kind='line', fill=True,
         yTitle='Open Price', title='Top Recession Stocks',
         filename='cufflinks/stock data', world_readable=True)
Out[30]:
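Building a DataFrame from a dict of Series, as above, aligns the Series on their index (here, trading dates) and inserts NaN where one series has no value. A sketch with made-up prices and dates, so it does not depend on downloading data:

```python
import pandas as pd

# Made-up opening prices; the second series is missing one date.
dates = pd.to_datetime(["2008-01-02", "2008-01-03", "2008-01-04"])
stock_a = pd.Series([10.0, 10.5, 10.2], index=dates)
stock_b = pd.Series([20.0, 19.8], index=dates[[0, 2]])

prices = pd.DataFrame({"Stock A": stock_a, "Stock B": stock_b})
print(prices)  # Stock B is NaN on 2008-01-03
```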

Interactive maps

Let's make a map. We'll read in the data from another Plotly graph showing the number of electoral votes per state in the U.S. As before, if we add .csv to the end of the plot URL, we can use pandas to make a dataframe.

In [31]:
# Learn about API authentication here: https://plot.ly/python/getting-started
# Find your api_key here: https://plot.ly/settings/api

import plotly.plotly as py

df = pd.read_csv('https://plot.ly/~Dreamshot/5718/electoral-college-votes-by-us-state/.csv')

for col in df.columns:
    df[col] = df[col].astype(str)
In [32]:
df.head()
Out[32]:
y Source: <a href="https://en.wikipedia.org/wiki/Electoral_College_%28United_States%29">Wikipedia</a>
0 DE 3
1 VT 3
2 ND 3
3 SD 3
4 MT 3
In [33]:
df.columns = ["state", "votes"]  # change column names
In [34]:
df.head(1)
Out[34]:
state votes
0 DE 3

Now we can make an interactive choropleth map from the dataframe; Plotly renders it in the browser with plotly.js, which is built on D3.js. See Plotly's maps documentation to learn more.

In [35]:
scl = [[0.0, 'rgb(242,240,247)'],[0.2, 'rgb(218,218,235)'],[0.4, 'rgb(188,189,220)'],\
            [0.6, 'rgb(158,154,200)'],[0.8, 'rgb(117,107,177)'],[1.0, 'rgb(84,39,143)']]

df['text'] = df['state'] 
    
data = [dict(
    type='choropleth',
    colorscale = scl,
    autocolorscale = False,
    locations = df['state'],
    z = df['votes'].astype(float),
    locationmode = 'USA-states',
    text = df['text'],
    hoverinfo = 'location+z',
    marker = dict(
        line = dict (
            color = 'rgb(255,255,255)',
            width = 2
        )
    ),
    colorbar = dict(
        title = "Votes"
    )
)]

layout = dict(
    title = '2016 Electoral College Votes<br>(Hover for breakdown)',
    geo = dict(
        scope='usa',
        projection=dict( type='albers usa' ),
        showlakes = True,
        lakecolor = 'rgb(255, 255, 255)'
    )
)
    
fig = dict(data=data, layout=layout)

py.iplot(fig, validate=False, filename='d3-electoral-map')
Out[35]:

3D Plotting

Using NumPy and Plotly, we can make interactive 3D plots in the Notebook as well.

In [36]:
import plotly.plotly as py
from plotly.graph_objs import *

import numpy as np

s = np.linspace(0, 2 * np.pi, 240)
t = np.linspace(0, np.pi, 240)
sGrid, tGrid = np.meshgrid(s, t)  # sGrid[i, j] = s[j], tGrid[i, j] = t[i]

r = 2 + np.sin(7 * sGrid + 5 * tGrid)  # r = 2 + sin(7s+5t)
x = r * np.cos(sGrid) * np.sin(tGrid)  # x = r*cos(s)*sin(t)
y = r * np.sin(sGrid) * np.sin(tGrid)  # y = r*sin(s)*sin(t)
z = r * np.cos(tGrid)                  # z = r*cos(t)

surface = Surface(x=x, y=y, z=z)
data = Data([surface])

layout = Layout(
    title='Parametric Plot',
    scene=Scene(
        xaxis=XAxis(
            gridcolor='rgb(255, 255, 255)',
            zerolinecolor='rgb(255, 255, 255)',
            showbackground=True,
            backgroundcolor='rgb(230, 230,230)'
        ),
        yaxis=YAxis(
            gridcolor='rgb(255, 255, 255)',
            zerolinecolor='rgb(255, 255, 255)',
            showbackground=True,
            backgroundcolor='rgb(230, 230,230)'
        ),
        zaxis=ZAxis(
            gridcolor='rgb(255, 255, 255)',
            zerolinecolor='rgb(255, 255, 255)',
            showbackground=True,
            backgroundcolor='rgb(230, 230,230)'
        )
    )
)

fig = Figure(data=data, layout=layout)
py.iplot(fig, filename='Parametric_plot')
Out[36]:
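The surface uses spherical coordinates with a varying radius r(s, t), so every point should lie exactly r from the origin. A quick numerical check of that relationship on a coarser grid than the plot (the 60-point grid is an arbitrary choice), treating s as the azimuthal angle and t as the polar angle:

```python
import numpy as np

s = np.linspace(0, 2 * np.pi, 60)  # azimuthal angle
t = np.linspace(0, np.pi, 60)      # polar angle
sGrid, tGrid = np.meshgrid(s, t)   # sGrid[i, j] = s[j], tGrid[i, j] = t[i]

r = 2 + np.sin(7 * sGrid + 5 * tGrid)
x = r * np.cos(sGrid) * np.sin(tGrid)
y = r * np.sin(sGrid) * np.sin(tGrid)
z = r * np.cos(tGrid)

# x^2 + y^2 + z^2 = r^2 * (sin^2 t * (cos^2 s + sin^2 s) + cos^2 t) = r^2
distance = np.sqrt(x**2 + y**2 + z**2)
print(np.allclose(distance, r))  # True
```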


ggplot for Python

ggplot for Python ports over ggplot2 syntax from R for Python users. The library has a tight integration with dataframes. We can similarly turn ggplot figures into Plotly graphs.

In [37]:
from ggplot import *
In [38]:
plot = ggplot(aes(x='date', y='beef'), data=meat) + \
    geom_line() 
In [39]:
fig = plot.draw() 
py.iplot_mpl(fig, filename='ipython/ggplot_plot')
Out[39]:

Seaborn

Seaborn focuses on statistical plotting. Here we show how it combines plot types: distplot draws a histogram with a kernel density estimate overlaid.

In [40]:
from numpy.random import randn
import matplotlib as mpl
import seaborn as sns
In [41]:
mpl.rc("figure", figsize=(8, 4))  # set the default size before creating the figure
fig16 = plt.figure()

sns.set_palette("hls")
data = randn(200)
sns.distplot(data);  # deprecated in seaborn 0.11+, where histplot/displot replace it

py.iplot_mpl(fig16, strip_style = True)
Out[41]:
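The smooth curve distplot overlays is a kernel density estimate (KDE). A NumPy-only sketch of a Gaussian KDE, assuming Scott's rule for the bandwidth; seaborn's actual bandwidth selection and defaults may differ:

```python
import numpy as np


def gaussian_kde(samples, grid):
    """Evaluate a Gaussian kernel density estimate of samples at the grid points."""
    samples = np.asarray(samples, dtype=float)
    n = samples.size
    bandwidth = samples.std(ddof=1) * n ** (-1.0 / 5)  # Scott's rule in 1-D
    # One Gaussian bump per sample: u has shape (len(grid), n).
    u = (grid[:, None] - samples[None, :]) / bandwidth
    kernels = np.exp(-0.5 * u**2) / np.sqrt(2 * np.pi)
    return kernels.mean(axis=1) / bandwidth  # average the bumps, rescale to a density


rng = np.random.default_rng(1)
data = rng.standard_normal(200)
grid = np.linspace(-6, 6, 1201)
density = gaussian_kde(data, grid)

# A density should integrate to about 1 over a wide enough grid.
area = density.sum() * (grid[1] - grid[0])
print(round(area, 3))
```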

Rmagic

Rmagic lets us run R in our Notebook and embed ggplot2 plots. You can execute code in R, and pull some of the results back into the Python namespace. The return value is determined when rpy2 returns the result of evaluating the final line. Multiple R lines can be executed by joining them with semicolons.

In [42]:
%load_ext rpy2.ipython
In [43]:
%R X=c(1,4,5,7); sd(X); mean(X)
Out[43]:
<FloatVector - Python:0x10cc9bef0 / R:0x7fa56fc5ef48>
[4.250000]
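Only the value of the last expression, mean(X), comes back to Python, which is why the output shows 4.25 and not the standard deviation. For comparison, the same two statistics computed with Python's statistics module, whose stdev, like R's sd, is the sample standard deviation:

```python
import statistics

X = [1, 4, 5, 7]
sd_x = statistics.stdev(X)   # sample standard deviation, like R's sd()
mean_x = statistics.mean(X)  # like R's mean()
print(sd_x, mean_x)          # 2.5 4.25
```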
In [44]:
%%R 
library(ggplot2)
dsamp <- diamonds[sample(nrow(diamonds), 1000), ]
qplot(carat, price, data=dsamp, colour=clarity)
Need help? Try the ggplot2 mailing list: http://groups.google.com/group/ggplot2.

IV. Embedding

The Notebook allows us to embed iframes, for example from YouTube.

In [45]:
from IPython.display import YouTubeVideo
In [46]:
YouTubeVideo("p86BPM1GV8M")
Out[46]:

We can embed graphs. Plotly's ggplot2 figure converter turns the R plot above into an interactive Plotly graph, which we can then display in the Notebook.

In [47]:
tls.embed('https://plot.ly/~MattSundquist/20391/price-vs-carat/')
Out[47]:

Embedding LaTeX

We can embed LaTeX in a Notebook by surrounding our math with $$ in a Markdown cell and running the cell. For example, the Markdown cell below contains the raw text $$c = \sqrt{a^2 + b^2}$$, which the Notebook renders as a formatted equation.

$$c = \sqrt{a^2 + b^2}$$

Or, you can display output from Python, as seen here.

In [48]:
from IPython.display import display, Math, Latex
display(Math(r'F(k) = \int_{-\infty}^{\infty} f(x) e^{2\pi i k x} dx'))
$$F(k) = \int_{-\infty}^{\infty} f(x) e^{2\pi i k x} dx$$

V. Publishing

We can export the Notebook as an HTML, PDF, .py, .ipynb, Markdown, or reST file. You can also turn your Notebook into a slideshow. For publishing IPython Notebooks directly to GitHub Pages, you can use publisher. Notebooks you publish on GitHub can be rendered as web pages by nbviewer.ipython.org. More advanced users can consider using git for version control.

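Under the hood, an .ipynb file is JSON: a list of cells, each with a cell_type and its source. As a simplified sketch of what a .py export does, assuming the nbformat 4 layout, the snippet below pulls the code cells out of a tiny hand-written notebook. Real exports should use nbconvert (ipython nbconvert in IPython 3.x), which handles many more details:

```python
import json

# A tiny hand-written notebook in the nbformat 4 layout (made up for this demo).
notebook_json = json.dumps({
    "nbformat": 4,
    "nbformat_minor": 0,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Title"]},
        {"cell_type": "code", "metadata": {}, "execution_count": 1,
         "outputs": [], "source": ["x = 1\n", "print(x)"]},
        {"cell_type": "code", "metadata": {}, "execution_count": 2,
         "outputs": [], "source": "y = x + 1"},
    ],
})


def code_cells_to_script(raw):
    """Concatenate the source of every code cell, separated by blank lines."""
    nb = json.loads(raw)
    chunks = []
    for cell in nb["cells"]:
        if cell["cell_type"] != "code":
            continue  # skip Markdown and raw cells
        source = cell["source"]
        # nbformat allows source as a single string or a list of lines.
        chunks.append(source if isinstance(source, str) else "".join(source))
    return "\n\n".join(chunks) + "\n"


script = code_cells_to_script(notebook_json)
print(script)
```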
IPython widgets

IPython widgets let you add sliders, search boxes, and other controls to your Notebook. See the widget docs for more information. Widgets need a running kernel, so others will need IPython to interact with your work; alternatively, you can use a cloud-based Notebook service so others can run it.

Publishing Dash Apps

For users looking to ship and productionize Python apps, dash is an assemblage of Flask, Socket.IO, Jinja, Plotly, and boilerplate CSS and JS for easily creating data visualization web apps backed by your Python data analysis.

Publishing dashboards

Users publishing interactive graphs can also use dashboards.ly to arrange a plot with a drag and drop interface. These dashboards can be published, embedded, and shared.

For more IPython tutorials, see the IPython gallery.


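At the end of a Notebook, we can style the Notebook page by loading an HTML styling snippet from a URL with a few lines of code.

In [49]:
from IPython.display import HTML
try:
    from urllib.request import urlopen  # Python 3
except ImportError:
    from urllib2 import urlopen  # Python 2
HTML(urlopen('http://bit.ly/1Bf5Hft').read().decode('utf-8'))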
Out[49]: