Version Control With Jupyter Notebook

A step-by-step guide to Jupytext

Photo by Mimi Garcia on Unsplash

Table of Contents

Introduction
1. Creating a demo repo
2. Jupytext setup
3. Converting to a python file
4. Converting multiple files
5. Converted file
6. Adding ipynb to .gitignore
7. Converting to ipynb files
8. Other commands
9. Paired notebooks
Conclusion

Introduction

A Jupyter Notebook file stores metadata, source code, formatted text, and rich media together in a single JSON document. As a result, changing a single word can produce a git diff that is thousands of characters long.

Jupytext can save a Jupyter Notebook in a git-friendly, human-readable file format, including Markdown, Python, Julia, Bash, Clojure, MATLAB, TypeScript, JavaScript, and more.

It can also convert these documents back into Jupyter Notebooks. In this article, I walk through version control for Jupyter Notebook with Jupytext, step by step.

If you do not rely on GitHub’s ipynb rendering, Nbviewer, or Binder, then Jupytext should be your tool of choice for version control.

Supported extensions are:

Creating a demo repo

First, let’s create a new Jupyter Notebook file with the following code. It assumes NumPy and Matplotlib are already available as np and plt.

x = np.arange(-3, 3, 0.1)
y = np.sin(x)
plt.plot(x, y)
plt.show()

Next, create a GitHub repo and push the project to it.

echo "# jupyter_notebook_version_control" >> README.md
git init
git add README.md
git commit -m "first commit"
git remote add origin git@github.com:username/jupyter_notebook_version_control.git
git push -u origin master

Now change `sin` to `cos` in the ipynb file.

y = np.cos(x)

Running `git diff` on the notebook after this edit produces thousands of characters of output for a three-letter change.
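To get a feel for where all those characters come from, here is a minimal sketch that uses only the Python standard library. The notebook dictionaries are hand-built stand-ins for a real ipynb file, and the `fake_png` strings are hypothetical placeholders for the base64-encoded plot that a real notebook stores next to the code. Real notebooks carry more metadata, so treat the numbers as illustrative only.

```python
import difflib
import json


def make_notebook(source, fake_png):
    """Build a hand-written stand-in for a one-cell notebook (nbformat 4 layout)."""
    return {
        "cells": [
            {
                "cell_type": "code",
                "execution_count": 1,
                "metadata": {},
                "outputs": [
                    {
                        "output_type": "display_data",
                        "data": {"image/png": fake_png},
                        "metadata": {},
                    }
                ],
                "source": source,
            }
        ],
        "metadata": {},
        "nbformat": 4,
        "nbformat_minor": 4,
    }


def diff_size(old, new):
    """Return the number of characters in a unified diff of two strings."""
    diff = difflib.unified_diff(
        old.splitlines(keepends=True), new.splitlines(keepends=True)
    )
    return sum(len(line) for line in diff)


# Hypothetical payloads: re-running the cell replaces the whole encoded image.
sin_nb = make_notebook("y = np.sin(x)", "A" * 5000)
cos_nb = make_notebook("y = np.cos(x)", "B" * 5000)

notebook_diff = diff_size(json.dumps(sin_nb, indent=1), json.dumps(cos_nb, indent=1))
script_diff = diff_size("y = np.sin(x)\n", "y = np.cos(x)\n")

print(f"notebook diff: {notebook_diff} characters")
print(f"script diff:   {script_diff} characters")
```

The stored output changes along with the source line, so the notebook diff grows with the size of the embedded image while the plain-script diff stays tiny.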

Jupytext setup

Let’s install and set up Jupytext.

pip install jupytext --upgrade

Or, for conda users:

conda install -c conda-forge jupytext

Restart Jupyter Notebook after the installation.

Converting to a python file

You can convert an ipynb file to any of the supported formats. I will use a Python file in this article.

In your terminal, run the following.

jupytext --to py <your-file-name>.ipynb

In my case:

jupytext --to py Version_control.ipynb

Outputs:

[jupytext] Reading ./Version_control.ipynb
[jupytext] Writing ./Version_control.py

Converting multiple files

Let’s convert all ipynb files at once. Create a few more notebooks in your directory first, then run the following.

jupytext --to py *.ipynb

Output:

[jupytext] Reading Version_control.ipynb
[jupytext] Writing Version_control.py
[jupytext] Reading sine.ipynb
[jupytext] Writing sine.py
[jupytext] Reading tangent.ipynb
[jupytext] Writing tangent.py
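If you would rather drive the same batch conversion from a Python script, a minimal sketch could look like the one below. The helper names `build_commands` and `convert_all` are my own, not part of Jupytext. The script only assembles the `jupytext --to py` command shown above and calls it through `subprocess`, and the demo at the bottom uses empty placeholder files in a temporary directory just to show which commands would be built.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path


def build_commands(directory, to="py"):
    """Return one `jupytext --to <to> <file>` command per notebook in directory."""
    notebooks = sorted(Path(directory).glob("*.ipynb"))
    return [["jupytext", "--to", to, nb.name] for nb in notebooks]


def convert_all(directory, to="py"):
    """Run the commands inside directory; requires jupytext on PATH."""
    if shutil.which("jupytext") is None:
        raise RuntimeError("jupytext is not installed or not on PATH")
    for command in build_commands(directory, to):
        subprocess.run(command, cwd=directory, check=True)


# Demo with empty placeholder files: only the commands are built, nothing is run.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("Version_control.ipynb", "sine.ipynb", "tangent.ipynb", "README.md"):
        (Path(tmp) / name).write_text("")
    commands = build_commands(tmp)

for command in commands:
    print(" ".join(command))
```

Calling `convert_all(".")` in a real project directory would then run one conversion per notebook and stop at the first failure because of `check=True`.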

You can also write the converted files to another directory. Jupytext will create the directory if it does not exist.

jupytext --to destination_folder//py *.ipynb

Notes:

If you prefer, you can run jupytext from a notebook cell. Keep in mind that this cell will end up in your converted file as well.

!jupytext --to py <your-file-name>.ipynb

Converted file

Let’s see the converted file in your terminal.

cat Version_control.py

My output:

# ---
# jupyter:
#   jupytext:
#     text_representation:
#       extension: .py
#       format_name: light
#       format_version: '1.5'
#       jupytext_version: 1.3.3
#   kernelspec:
#     display_name: Python 3
#     language: python
#     name: python3
# ---

x = np.arange(-3, 3, 0.1)
y = np.cos(x)
plt.plot(x, y)
plt.show()

It is very compact and the file size is very small. Nice 😃 👏👏👏👏.
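To make it clearer why the converted file is so small, here is a rough sketch of the idea using only the standard library. This is not Jupytext's actual implementation: the function name `notebook_to_script` and the example notebook are made up for illustration. It simply keeps the source of each cell, turns Markdown cells into comments, and drops outputs and execution counts.

```python
import json


def notebook_to_script(notebook_json):
    """Approximate a notebook-to-script conversion (NOT Jupytext's real logic).

    Code cells are copied as-is, Markdown cells become comments, and all
    outputs and execution counts are dropped.
    """
    notebook = json.loads(notebook_json)
    blocks = []
    for cell in notebook["cells"]:
        source = cell["source"]
        # nbformat allows source to be a string or a list of lines.
        text = source if isinstance(source, str) else "".join(source)
        if cell["cell_type"] == "code":
            blocks.append(text)
        elif cell["cell_type"] == "markdown":
            blocks.append("\n".join(f"# {line}" for line in text.splitlines()))
    return "\n\n".join(blocks) + "\n"


# Hypothetical two-cell notebook with one stored output.
example = json.dumps(
    {
        "cells": [
            {"cell_type": "markdown", "metadata": {}, "source": ["A cosine plot"]},
            {
                "cell_type": "code",
                "execution_count": 1,
                "metadata": {},
                "outputs": [
                    {"output_type": "stream", "name": "stdout", "text": ["..."]}
                ],
                "source": ["x = np.arange(-3, 3, 0.1)\n", "y = np.cos(x)"],
            },
        ],
        "metadata": {},
        "nbformat": 4,
        "nbformat_minor": 4,
    }
)

script = notebook_to_script(example)
print(script)
```

Everything that makes a notebook diff noisy (outputs, counters, per-cell metadata) never reaches the text file, which is why only real source changes show up in git.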

Adding ipynb to .gitignore

Since we are not tracking ipynb files, we can add them to a .gitignore file. Create a .gitignore in your project root directory, next to the .git directory.

touch .gitignore

Add *.ipynb and `.ipynb_checkpoints` to ignore all Jupyter Notebook files, or add this complete list to your .gitignore.

# for Jupytext ignoring ipynb files
*.ipynb
.ipynb_checkpoints
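If you want to sanity-check which paths those two patterns catch before committing, here is a small sketch. It is only an approximation of git's matching rules that handles slash-free patterns like the ones above, which git applies at every directory level. The function name `is_ignored` is my own; for the real answer, `git check-ignore` is the authority.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

IGNORE_PATTERNS = ["*.ipynb", ".ipynb_checkpoints"]


def is_ignored(path, patterns=IGNORE_PATTERNS):
    """Rough approximation: ignored if any path component matches a pattern.

    Only valid for patterns without a slash; negation and anchored patterns
    are not handled.
    """
    parts = PurePosixPath(path).parts
    return any(
        fnmatchcase(part, pattern) for part in parts for pattern in patterns
    )


for candidate in (
    "Version_control.ipynb",
    "Version_control.py",
    ".ipynb_checkpoints/Version_control-checkpoint.ipynb",
    "README.md",
):
    print(candidate, is_ignored(candidate))
```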

At this stage, git will still track changes in any .ipynb files that were committed earlier. To fix this, remove everything from the git index and add the files again.

git rm -r --cached .
git add .
git commit -m "fixed untracked files"

Now change a line in your Jupyter Notebook to check that .gitignore is working.

# change whatever you want
y = np.arange(-2,2,0.1)

Check it in your terminal:

git status

It should not report any modified file. Let’s run Jupytext one more time so that the Python file reflects our change. Run the following in your terminal.

jupytext --to py Version_control.ipynb

The converted file will be replaced. 😃

[jupytext] Reading ./Version_control.ipynb
[jupytext] Writing ./Version_control.py (destination file replaced)
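Because the conversion is a manual step in this workflow, it is easy to edit a notebook and forget to re-run Jupytext before committing. The sketch below is one possible reminder, not a Jupytext feature: the helper `stale_scripts` is hypothetical and simply compares file modification times, which is a heuristic. The demo uses empty placeholder files with hand-set timestamps.

```python
import os
import tempfile
from pathlib import Path


def stale_scripts(directory):
    """Return notebooks whose .py export is missing or older than the .ipynb."""
    stale = []
    for notebook in sorted(Path(directory).glob("*.ipynb")):
        script = notebook.with_suffix(".py")
        if not script.exists() or script.stat().st_mtime < notebook.stat().st_mtime:
            stale.append(notebook.name)
    return stale


# Demo with placeholder files and explicit modification times (in seconds).
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for name in ("fresh.ipynb", "fresh.py", "edited.ipynb", "edited.py", "new.ipynb"):
        (root / name).write_text("")
    os.utime(root / "fresh.ipynb", (1000, 1000))
    os.utime(root / "fresh.py", (2000, 2000))      # exported after the last edit
    os.utime(root / "edited.ipynb", (3000, 3000))  # edited after the export
    os.utime(root / "edited.py", (2000, 2000))
    result = stale_scripts(root)

print(result)
```

In the demo, `edited.ipynb` is flagged because its export is older, and `new.ipynb` is flagged because it has no export at all.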

Let’s check the git status.

git status

On branch master
Your branch is up to date with 'origin/master'.

Changes not staged for commit:
(use "git add <file>..." to update what will be committed)
(use "git checkout -- <file>..." to discard changes in working directory)

modified: Version_control.py

no changes added to commit (use "git add" and/or "git commit -a")

Git reports only the Python file as modified, not the ipynb file. Now run git diff.

git diff
diff --git a/Version_control.py b/Version_control.py
index 02d91ea..6522717 100644
--- a/Version_control.py
+++ b/Version_control.py
@@ -14,6 +14,7 @@
# ---

x = np.arange(-3, 3, 0.1)
+y = np.arange(-2,2,0.1)
y = np.cos(x)
plt.plot(x, y)
plt.show()

Please add, commit and push the change.

git add .
git commit -m "Update"
git push

Converting to ipynb files

We are going to clone this repo to another directory and convert it to an ipynb file.

cd ..
git clone git@github.com:shinokada/jupyter_notebook_version_control.git my-new-dir

I cloned my repo to a directory called my-new-dir.

cd my-new-dir
ls
README.md Version_control.py sine.py tangent.py

Or, if you have the tree command installed:

tree
.
├── README.md
├── Version_control.py
├── sine.py
└── tangent.py

0 directories, 4 files

We have all the files we need. Let’s convert them to ipynb files.

From your terminal:

jupytext --to ipynb *.py

Output:

[jupytext] Reading Version_control.py
[jupytext] Writing Version_control.ipynb
[jupytext] Sync timestamp of 'Version_control.py'
[jupytext] Reading sine.py
[jupytext] Writing sine.ipynb
[jupytext] Reading tangent.py
[jupytext] Writing tangent.ipynb
ls
README.md Version_control.py sine.py tangent.py Version_control.ipynb sine.ipynb tangent.ipynb
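The notebooks rebuilt this way contain the code cells but no outputs until you run them, since outputs were never stored in the Python files. As a rough illustration of the reverse direction, here is a standard-library sketch that turns a script into a bare notebook dictionary. It is a simplification and not how Jupytext parses scripts: the function `script_to_notebook` is hypothetical, the header in the sample is a trimmed-down version of the one shown earlier, and every blank-line-separated block simply becomes one code cell.

```python
import json


def script_to_notebook(script_text):
    """Build a bare nbformat 4 notebook dict from a light-format-style script.

    Simplification: every blank-line-separated block becomes one code cell.
    """
    lines = script_text.splitlines()
    # Drop a leading "# ---" ... "# ---" metadata header, if there is one.
    if lines and lines[0] == "# ---" and "# ---" in lines[1:]:
        lines = lines[lines.index("# ---", 1) + 1:]
    blocks = [block.strip("\n") for block in "\n".join(lines).split("\n\n")]
    cells = [
        {
            "cell_type": "code",
            "execution_count": None,
            "metadata": {},
            "outputs": [],  # nothing to restore: the script holds no outputs
            "source": block,
        }
        for block in blocks
        if block.strip()
    ]
    return {"cells": cells, "metadata": {}, "nbformat": 4, "nbformat_minor": 4}


# Sample script with a shortened, illustrative header.
sample_script = """# ---
# jupyter:
#   kernelspec:
#     name: python3
# ---

x = np.arange(-3, 3, 0.1)
y = np.cos(x)

plt.plot(x, y)
plt.show()
"""

notebook = script_to_notebook(sample_script)
print(json.dumps(notebook, indent=1))
```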

Other commands

These are other commands you can use.

# convert notebook.md to an .ipynb file and run it
jupytext --to notebook --execute notebook.md
# update the input cells in the .ipynb file and preserve outputs and metadata
jupytext --update --to notebook notebook.py
# Turn notebook.ipynb into a paired ipynb/py notebook
jupytext --set-formats ipynb,py notebook.ipynb
# Update all paired representations of notebook.ipynb
jupytext --sync notebook.ipynb

Paired notebooks

Jupytext can write a given notebook to multiple files. In addition to the original notebook file, Jupytext can save the input cells to a text file, which can be either a script or a Markdown document. The Jupytext documentation covers paired notebooks in more detail.

Conclusion

Jupytext is easy to use and creates human-friendly files that you can edit in another editor as well. If you are using git diff, this is an excellent tool to have. I think this is the most complete open-source tool for version control with Jupyter Notebook at the moment.

Tools and tips for programmers. Math teacher, programmer, husband, father, Japanese. https://bit.ly/3nEaAfr.

Thanks to Marc Wouts. 
