IPython has a beautiful Notebook that lets you write and execute code, analyze data, embed content, and share reproducible work. IPython lets you easily share your code, data, plots, and explanation in one Notebook. Publishing is flexible: PDF, HTML, ipynb, dashboards, slides, and more. Code cells are based on an input and output format. For example:
print("hello world")
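This tutorial will cover getting started with the Notebook, working with SciPy, NumPy, and pandas, plotting with matplotlib and Plotly, embedding content, and publishing your work.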
There are a few ways to use an IPython Notebook. For Windows users, you'll need setuptools. This Notebook uses Python 2; Python 3 offers exciting new opportunities for IPython.
If you have setuptools or pip installed, you can open a terminal and type: $ pip install ipython.
Once you've installed the Notebook, start it from your terminal by calling $ ipython notebook (Windows users need to open up their Command Prompt). This opens a browser tab on localhost at the URL of your Notebooks, by default http://127.0.0.1:8888. You'll see a dashboard listing all your Notebooks, and you can launch them from there. The Notebook has the advantage of looking the same when you're coding and when you're publishing; while it is running, you simply have extra options to move code, run cells, change kernels, and use Markdown. Here's what a Notebook looks like in action if you call it from your terminal.
IPython supports tab completion. You can type object_name.<TAB> to view an object’s attributes. For tips on cell magics, running Notebooks, and exploring objects, check out the IPython docs. IPython also has a few helpful commands like:
? # introduction and overview of IPython's features
help # Python's own help system
%quickref # opens quick reference
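Outside the Notebook, roughly the same names that tab completion surfaces can be listed with Python's built-in dir(). The sketch below only approximates what IPython shows, and the helper name public_attributes is made up for illustration.

```python
def public_attributes(obj):
    """Return the attribute names of obj that do not start with an underscore."""
    return sorted(name for name in dir(obj) if not name.startswith("_"))


numbers = [3, 1, 2]
attrs = public_attributes(numbers)
print(attrs)  # the public methods of a list, such as append and sort
```

In the Notebook itself, typing numbers.<TAB> is the quicker route; a helper like this is mainly useful in scripts.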
The Notebook defaults to C:\Users\USERNAME or the folder where you've run the Notebook. You can call ipython notebook --help-all to see the available options, then change your directory if needed. You can also use %run to run local Python scripts you've written.
IPython has keyboard shortcuts. Shift-Enter will run a cell, Ctrl-Enter will run a cell in-place, Alt-Enter will run a cell and insert another below.
When installing packages in IPython, you either need to install the package in your actual shell or use the ! prefix to run the shell command from a cell, e.g.:
!pip install packagename
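Before installing, it can help to check whether a package is already importable. The following is a small sketch that assumes Python 3, where importlib.util.find_spec is available; the helper name is_installed is hypothetical.

```python
import importlib.util


def is_installed(package_name):
    """Return True if a top-level package can be found on the import path."""
    return importlib.util.find_spec(package_name) is not None


print(is_installed("json"))  # part of the standard library
print(is_installed("a_package_name_that_should_not_exist_12345"))
```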
You may want to reload submodules if you've edited the code in one. IPython comes with automatic reloading magic. You can reload all changed modules before executing a new line.
%load_ext autoreload
%autoreload 2
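The autoreload magic only works inside IPython. In a plain Python 3 script, the underlying idea can be sketched with importlib.reload. The module name demo_reload_module and the temporary file are made up for this demonstration, and this is not a description of how autoreload is implemented internally.

```python
import importlib
import os
import sys
import tempfile

sys.dont_write_bytecode = True  # keep the demo free of cached bytecode

tmp_dir = tempfile.mkdtemp()
module_path = os.path.join(tmp_dir, "demo_reload_module.py")


def write_module(value):
    """Write a one-line module that defines ANSWER."""
    with open(module_path, "w") as handle:
        handle.write("ANSWER = {!r}\n".format(value))


write_module(1)
sys.path.insert(0, tmp_dir)
importlib.invalidate_caches()
import demo_reload_module

first = demo_reload_module.ANSWER

write_module(12345)                    # simulate editing the file on disk
importlib.reload(demo_reload_module)   # re-execute the module's source
second = demo_reload_module.ANSWER

print(first, second)
```

Without the reload call, the second lookup would still see the value from the first import, which is the situation autoreload is meant to spare you from.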
SciPy is a Python-based ecosystem of packages for math, science, and engineering. We'll show a quick example below. NumPy is a package for scientific computing with tools for algebra, random number generation, integrating with databases, and managing data. Older SciPy releases import NumPy functions into the SciPy namespace so users don't have to differentiate between them; later releases deprecated this, so prefer calling NumPy directly in new code.
import scipy as sp
import numpy as np
s = sp.randn(100)
print(len(s))
Since the returned value is a NumPy array, we can compute and print descriptive statistics, such as the mean, for the random numbers we created with SciPy.
print("Mean : {0:8.6f}".format(s.mean()))
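The same pattern extends to other descriptive statistics. Here is a minimal sketch that uses NumPy directly with a fixed seed (an assumption made so the run is repeatable); the variable names are illustrative.

```python
import numpy as np

rng = np.random.RandomState(0)  # fixed seed so the run is repeatable
sample = rng.randn(100)         # 100 draws from a standard normal

stats = {
    "count": len(sample),
    "mean": sample.mean(),
    "std": sample.std(),
    "min": sample.min(),
    "max": sample.max(),
}
for name in ["count", "mean", "std", "min", "max"]:
    print("{0:>5} : {1:8.4f}".format(name, float(stats[name])))
```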
Arrays are the central part of NumPy and are a more efficient alternative to Python lists. The elements of a NumPy array have to be of the same type, usually float or int.
x = np.array([42,47,11], int)
x
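A short sketch of what the single-type rule means in practice: arithmetic applies to every element at once, and mixing ints with floats makes NumPy store everything as float. The variable names doubled and mixed are illustrative.

```python
import numpy as np

x = np.array([42, 47, 11], int)
doubled = x * 2                 # element-wise, no explicit loop needed
mixed = np.array([1, 2.5, 3])   # the ints are upcast so all elements share one type

print(x.dtype, doubled, mixed.dtype)
```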
If you have a tabular data structure, pandas is the way to go. Dataframes are easy to make, and handle data better than Python lists or tuples. The 10 minutes to pandas guide is a good introduction and a useful exercise. So is Michael Hansen's tutorial.
import pandas as pd
You can create a dataframe in pandas with the handy read_csv function, which reads from a URL or a local file. Here we'll make a dataframe from the data behind a matplotlib plot that we'll make below with Plotly. You can append .py, .r, .m, .jl, .json, .js, .png, .pdf, .svg, .embed, .xlsx, or .csv to any Plotly URL to get the figure in that format.
df = pd.read_csv("https://plot.ly/~MattSundquist/20387.csv")
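read_csv also accepts a file-like object, which is handy for trying it out without a network connection. In this sketch the CSV text and the name df_demo are invented for illustration and are not the contents of the Plotly file above.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for a downloaded file.
csv_text = u"time,volts\n0.0,1.00\n0.1,0.90\n0.2,0.82\n"
df_demo = pd.read_csv(io.StringIO(csv_text))
print(df_demo)
```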
We can describe the dataframe we've just created.
df.describe() # summary statistics for each numeric column
We can also examine how many rows we have with len(). The columns property returns the column labels.
len(df)
Calling .head() will print the first five rows and column headers by default, or we can specify a number.
df.head() # examine dataframe
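The inspection calls above can be tried on a small in-memory dataframe. The readings below are made-up values, used only so the sketch runs without downloading anything.

```python
import pandas as pd

# Hypothetical measurements, not the data from the Plotly URL.
readings = pd.DataFrame({
    "time": [0.0, 0.1, 0.2, 0.3, 0.4, 0.5],
    "volts": [1.00, 0.90, 0.82, 0.74, 0.67, 0.61],
})

n_rows = len(readings)                  # number of rows
column_names = list(readings.columns)   # column labels
first_two = readings.head(2)            # first two rows instead of the default five
summary = readings.describe()           # count, mean, std, min, quartiles, max

print(n_rows, column_names)
print(first_two)
print(summary)
```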
We can rename our columns once we've made a dataframe.
df.columns = ["volts_1", "time_1", "volts_2", "volts_2",
              "time_2", "volts_4"]
df.head(1)
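Assigning to df.columns replaces every label at once, so the list has to match the number of columns. An alternative, sketched here with invented column names, is DataFrame.rename, which changes only the labels you list and returns a new dataframe.

```python
import pandas as pd

frame = pd.DataFrame({"x": [0.0, 0.1], "y": [1.0, 0.9]})

# Map old labels to new ones; labels not mentioned would be left unchanged.
renamed = frame.rename(columns={"x": "time_1", "y": "volts_1"})

print(list(frame.columns), list(renamed.columns))
```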
We can use pandas for statistics and to examine our rows and columns.
df.volts_1.head()
df.volts_1.std()
Most pandas functions also work on an entire dataframe. For example, calling std() calculates the standard deviation for each column.
df.std()
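One detail worth knowing when comparing results: pandas computes the sample standard deviation (ddof=1) by default, while NumPy's std defaults to the population version (ddof=0). The sketch below uses made-up values to show the difference.

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame({
    "volts_1": [1.0, 2.0, 3.0, 4.0],
    "volts_2": [2.0, 2.0, 2.0, 2.0],
})

per_column = frame.std()                               # pandas default: ddof=1
values = frame["volts_1"].values
population_std = np.std(values)                        # NumPy default: ddof=0
sample_std = np.std(values, ddof=1)                    # matches the pandas result

print(per_column)
print(population_std, sample_std)
```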
You can use matplotlib inside your IPython Notebook by calling %matplotlib inline, which has the advantage of keeping your plots in one place. If you're having trouble running matplotlib, here are a few common solutions. %matplotlib inline activates the inline backend and renders images as static PNGs. A newer option, %matplotlib notebook, lets you interact with the plot in a Notebook. This works in IPython 3.x; for older IPython versions, use %matplotlib nbagg.
%matplotlib inline
import matplotlib.pyplot as plt # side-stepping mpl backend
import matplotlib.gridspec as gridspec # subplots
import plotly.plotly as py
from plotly.graph_objs import *
import plotly.tools as tls
fig1 = plt.figure()
# Make a legend for specific lines.
import matplotlib.pyplot as plt
import numpy as np
t1 = np.arange(0.0, 2.0, 0.1)
t2 = np.arange(0.0, 2.0, 0.01)
# note that plot returns a list of lines. The "l1, = plot" usage
# extracts the first element of the list into l1 using tuple
# unpacking. So l1 is a Line2D instance, not a sequence of lines
l1, = plt.plot(t2, np.exp(-t2))
l2, l3 = plt.plot(t2, np.sin(2 * np.pi * t2), '--go', t1, np.log(1 + t1), '.')
l4, = plt.plot(t2, np.exp(-t2) * np.sin(2 * np.pi * t2), 'rs-.')
plt.xlabel('time')
plt.ylabel('volts')
plt.title('Damped oscillation')
plt.show()
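The comments above describe unpacking the list that plot returns so that a legend can be built for specific lines. A minimal sketch of that last step follows; it assumes the non-interactive Agg backend so it can run without a display, and the labels are invented.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: no display available, so draw off-screen
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.lines import Line2D

t = np.arange(0.0, 2.0, 0.01)
fig_demo = plt.figure()

returned = plt.plot(t, np.exp(-t))   # a list containing one Line2D
line_a, = returned                   # tuple unpacking pulls out that single line
line_b, line_c = plt.plot(t, np.sin(t), "--", t, np.cos(t), ":")

# Pass only the handles you want; line_b is deliberately left out of the legend.
legend = plt.legend([line_a, line_c], ["decay", "cosine"])
plt.close(fig_demo)
```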
Now we can do a bit of interactive plotting. Head to the Plotly getting started page to get a key and install the API. Calling iplot_mpl on the figure automatically generates an interactive version of the matplotlib plot inside the Notebook in an iframe. You can control the privacy with sharing set to public, private, or secret. We'll use strip_style to apply the Plotly defaults. You can zoom by clicking and dragging, and see text when you hover your mouse.
py.iplot_mpl(fig1, strip_style = True, filename='ipython/mpl_example')
We can use mpld3 to make interactive plots in the Notebook from matplotlib figures.
import mpld3
mpld3.display(fig1)
We can make interactive plots from the same dataframe we made with pandas above.
volts_histogram_plot = [{'x': df['volts_1'],
                         'type': 'histogram'
                         }]
data_histogram = Data(volts_histogram_plot)
fig_histogram = Figure(data=data_histogram)
py.iplot(fig_histogram, filename='pandas/volts_histogram')
volts_jitter_plot = [{'y': df['volts_1'],
                      'name': 'volts',
                      'type': 'box',
                      }]
data_jitter = Data(volts_jitter_plot)
fig_jitter = Figure(data=data_jitter)
py.iplot(fig_jitter, filename='pandas/volts_boxplot')
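A box plot summarizes a column by its quartiles. The sketch below computes them with pandas on made-up values; a plotting library may use a slightly different quartile method, so treat the numbers as an approximation of what a box plot draws.

```python
import pandas as pd

volts = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0], name="volts")

quartiles = volts.quantile([0.25, 0.5, 0.75])  # linear interpolation by default
iqr = quartiles[0.75] - quartiles[0.25]        # interquartile range

print(quartiles)
print(iqr)
```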
For plotting directly from a dataframe, you can use cufflinks, a pandas plotting library. Click the legend items to toggle traces on and off.
import cufflinks as cf
import pandas.io.data as web
from datetime import datetime
start = datetime(2008, 1, 1)
end = datetime(2008, 11, 28)
df_gis = web.DataReader("GIS", 'yahoo', start, end)
df_fdo = web.DataReader("FDO", 'yahoo', start, end)
df_sp = web.DataReader("GSPC", 'yahoo', start, end)
df = pd.DataFrame({'General Mills': df_gis.Open, 'Family Dollar Stores': df_fdo.Open, 'S&P 500': df_sp.Open})
df.head()
df.iplot(kind='line', fill=True,
yTitle='Open Price', title='Top Recession Stocks',
filename='cufflinks/stock data', world_readable=True)
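Building a dataframe from a dictionary of Series, as done above, aligns the Series on their index. This sketch uses invented prices and labels to show that dates missing from one Series become NaN rather than raising an error.

```python
import pandas as pd

dates = pd.date_range("2008-01-01", periods=3, freq="D")
series_a = pd.Series([10.0, 10.5, 11.0], index=dates)
series_b = pd.Series([20.0, 19.5], index=dates[:2])   # last day is missing

combined = pd.DataFrame({"Stock A": series_a, "Stock B": series_b})
missing = int(combined["Stock B"].isnull().sum())

print(combined)
print(missing)
```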
Let's make a map. We'll read in the data from another Plotly graph showing the number of electoral votes per state in the U.S. As before, if we add .csv to the end of the plot URL, we can use pandas to make a dataframe.
# Learn about API authentication here: https://plot.ly/python/getting-started
# Find your api_key here: https://plot.ly/settings/api
import plotly.plotly as py
df = pd.read_csv('https://plot.ly/~Dreamshot/5718/electoral-college-votes-by-us-state/.csv')
for col in df.columns:
    df[col] = df[col].astype(str)
df.head()
df.columns = ["state", "votes"] # change column names
df.head(1)
Now we can make an interactive D3.js graph directly from pandas. See the pandas maps documentation to learn more.
scl = [[0.0, 'rgb(242,240,247)'], [0.2, 'rgb(218,218,235)'], [0.4, 'rgb(188,189,220)'],
       [0.6, 'rgb(158,154,200)'], [0.8, 'rgb(117,107,177)'], [1.0, 'rgb(84,39,143)']]
df['text'] = df['state']
data = [dict(
    type='choropleth',
    colorscale=scl,
    autocolorscale=False,
    locations=df['state'],
    z=df['votes'].astype(float),
    locationmode='USA-states',
    text=df['text'],
    hoverinfo='location+z',
    marker=dict(
        line=dict(
            color='rgb(255,255,255)',
            width=2
        )
    ),
    colorbar=dict(
        title="Votes"
    )
)]
layout = dict(
    title='2016 Electoral College Votes<br>(Hover for breakdown)',
    geo=dict(
        scope='usa',
        projection=dict(type='albers usa'),
        showlakes=True,
        lakecolor='rgb(255, 255, 255)'
    )
)
fig = dict(data=data, layout=layout)
py.iplot(fig, validate=False, filename='d3-electoral-map')
Using NumPy and Plotly, we can make interactive 3D plots in the Notebook as well.
import plotly.plotly as py
from plotly.graph_objs import *
import numpy as np
s = np.linspace(0, 2 * np.pi, 240)
t = np.linspace(0, np.pi, 240)
tGrid, sGrid = np.meshgrid(s, t)
r = 2 + np.sin(7 * sGrid + 5 * tGrid) # r = 2 + sin(7s+5t)
x = r * np.cos(sGrid) * np.sin(tGrid) # x = r*cos(s)*sin(t)
y = r * np.sin(sGrid) * np.sin(tGrid) # y = r*sin(s)*sin(t)
z = r * np.cos(tGrid) # z = r*cos(t)
surface = Surface(x=x, y=y, z=z)
data = Data([surface])
layout = Layout(
    title='Parametric Plot',
    scene=Scene(
        xaxis=XAxis(
            gridcolor='rgb(255, 255, 255)',
            zerolinecolor='rgb(255, 255, 255)',
            showbackground=True,
            backgroundcolor='rgb(230, 230,230)'
        ),
        yaxis=YAxis(
            gridcolor='rgb(255, 255, 255)',
            zerolinecolor='rgb(255, 255, 255)',
            showbackground=True,
            backgroundcolor='rgb(230, 230,230)'
        ),
        zaxis=ZAxis(
            gridcolor='rgb(255, 255, 255)',
            zerolinecolor='rgb(255, 255, 255)',
            showbackground=True,
            backgroundcolor='rgb(230, 230,230)'
        )
    )
)
fig = Figure(data=data, layout=layout)
py.iplot(fig, filename='Parametric_plot')
Note the possible interactions.
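To see what the surface code above computes without a Plotly account, the coordinate arrays can be built and checked with NumPy alone. This sketch uses the same formulas but fewer sample points (an assumption to keep it light) and confirms that every point lies at distance r from the origin.

```python
import numpy as np

s = np.linspace(0, 2 * np.pi, 30)   # 30 samples
t = np.linspace(0, np.pi, 20)       # 20 samples
tGrid, sGrid = np.meshgrid(s, t)    # both grids have shape (len(t), len(s))

r = 2 + np.sin(7 * sGrid + 5 * tGrid)
x = r * np.cos(sGrid) * np.sin(tGrid)
y = r * np.sin(sGrid) * np.sin(tGrid)
z = r * np.cos(tGrid)

# Spherical-style coordinates: the distance from the origin equals r.
distance = np.sqrt(x**2 + y**2 + z**2)
print(tGrid.shape, np.allclose(distance, r))
```

Because r stays between 1 and 3, the surface is a sphere of radius 2 with ripples, which is what the interactive plot shows.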
ggplot for Python ports over ggplot2 syntax from R for Python users. The library has a tight integration with dataframes. We can similarly turn ggplot figures into Plotly graphs.
from ggplot import *
plot = ggplot(aes(x='date', y='beef'), data=meat) + \
    geom_line()
fig = plot.draw()
py.iplot_mpl(fig, filename='ipython/ggplot_plot')
Seaborn is focused on statistical plotting and plot types. Here we show how you can combine plot types.
from numpy.random import randn
from scipy import stats
import matplotlib as mpl
import seaborn as sns
fig16 = plt.figure()
sns.set_palette("hls")
mpl.rc("figure", figsize=(8, 4))
data = randn(200)
sns.distplot(data);
py.iplot_mpl(fig16, strip_style = True)
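One of the plot types combined above is a histogram scaled as a density. As a rough sketch of that ingredient only (not of seaborn's internals), NumPy can compute a density histogram whose bar areas sum to one; the seed and bin count are arbitrary choices.

```python
import numpy as np

rng = np.random.RandomState(0)   # fixed seed, chosen arbitrarily
sample_data = rng.randn(200)

density, edges = np.histogram(sample_data, bins=10, density=True)
widths = np.diff(edges)                       # width of each bin
total_area = float(np.sum(density * widths))  # should be 1 for a density

print(total_area)
```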
Rmagic lets us run R in our Notebook and embed ggplot2 plots. You can execute code in R and pull some of the results back into the Python namespace. The return value is whatever rpy2 returns from evaluating the final line. Multiple R lines can be executed by joining them with semicolons.
%load_ext rpy2.ipython
%R X=c(1,4,5,7); sd(X); mean(X)
%%R
library(ggplot2)
dsamp <- diamonds[sample(nrow(diamonds), 1000), ]
qplot(carat, price, data=dsamp, colour=clarity)
The Notebook allows us to embed iframes. For example, from YouTube.
from IPython.display import YouTubeVideo
YouTubeVideo("p86BPM1GV8M")
We can embed graphs. The plotly ggplot2 figure converter lets us make the plot above into an interactive plot. Then you can call the plot in the NB.
tls.embed('https://plot.ly/~MattSundquist/20391/price-vs-carat/')
We can embed LaTeX inside a Notebook by wrapping our math in $$ and running the cell as a Markdown cell. For example, the cell below contains $$c = \sqrt{a^2 + b^2}$$, and the Notebook renders it as a formatted expression.
Or, you can display output from Python, as seen here.
from IPython.display import display, Math, Latex
display(Math(r'F(k) = \int_{-\infty}^{\infty} f(x) e^{2\pi i k x} dx'))
We can export the Notebook as HTML, PDF, .py, .ipynb, Markdown, or reST. You can also turn your NB into a slideshow. For publishing IPython Notebooks directly to GitHub Pages, you can use publisher. Notebooks you publish on GitHub can be rendered as a Notebook on nbviewer.ipython.org. More advanced users can consider using git for version control.
IPython widgets allow you to add sliders, search boxes, and other interactive controls to your Notebook. See the widget docs for more information. For others to be able to run your work, they'll need IPython, or you can use a cloud-based NB option.
For users looking to ship and productionize Python apps, dash is an assemblage of Flask, Socketio, Jinja, Plotly, and boilerplate CSS and JS for easily creating data visualization web-apps with your Python data analysis backend.
Users publishing interactive graphs can also use dashboards.ly to arrange a plot with a drag and drop interface. These dashboards can be published, embedded, and shared.
For more IPython tutorials, see the IPython gallery.
At the end of a Notebook, we can style a plot with these three lines of code.
from IPython.core.display import HTML
import urllib2
HTML(urllib2.urlopen('http://bit.ly/1Bf5Hft').read())