Version Control With Jupyter Notebook
A step-by-step guide to Jupytext
Table of Contents
Introduction
1. Creating a demo repo
2. Jupytext setup
3. Converting to a python file
4. Converting multiple files
5. Converted file
6. Adding ipynb to .gitignore
7. Converting to ipynb files
8. Other commands
9. Paired notebooks
Conclusion
Introduction
Jupyter Notebook files contain metadata, source code, formatted text, and rich media. Changing a single word can produce thousands of characters in `git diff`.
Jupytext can save Jupyter Notebooks in git-friendly and human-friendly text formats, including Markdown, Python, Julia, Bash, Clojure, MATLAB, TypeScript, JavaScript, etc.
It can also convert these documents back into Jupyter Notebooks. In this article, I go through a step-by-step guide to version control for Jupyter Notebook using Jupytext.
If you are not using GitHub’s ipynb rendering, nbviewer, or Binder, then Jupytext should be your choice of version control.
Supported extensions are:
Creating a demo repo
First, let’s create a new Jupyter Notebook file with the following code.
import numpy as np
import matplotlib.pyplot as plt

x = np.arange(-3, 3, 0.1)
y = np.sin(x)
plt.plot(x, y)
plt.show()
Next, create a GitHub repository and push an initial commit:
echo "# jupyter_notebook_version_control" >> README.md
git init
git add README.md
git commit -m "first commit"
git remote add origin git@github.com:username/jupyter_notebook_version_control.git
git push -u origin master
Then I changed the word `sin` to `cos` in the ipynb file:
y = np.cos(x)
The resulting `git diff` ran to thousands of characters for a three-letter change.
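Why is the diff so large? An .ipynb file is a JSON document that stores each cell’s source next to its outputs and metadata, and plots are embedded as base64-encoded images, so changing `sin` to `cos` also rewrites the whole image string. The stdlib-only sketch below makes this visible with a tiny, hand-written notebook (hypothetical data with a fake, shortened image payload) and a helper of my own naming, `code_sources`, that extracts only the code.

```python
import json

# A tiny, hand-written nbformat v4 notebook (hypothetical example data).
# The "image/png" payload is a fake, shortened stand-in for a real plot.
notebook = {
    "nbformat": 4,
    "nbformat_minor": 4,
    "metadata": {"kernelspec": {"name": "python3", "display_name": "Python 3", "language": "python"}},
    "cells": [
        {
            "cell_type": "code",
            "execution_count": 1,
            "metadata": {},
            "source": ["x = np.arange(-3, 3, 0.1)\n", "y = np.cos(x)\n", "plt.plot(x, y)\n", "plt.show()"],
            "outputs": [
                {
                    "output_type": "display_data",
                    "metadata": {},
                    "data": {"image/png": "iVBORw0KGgo" + "A" * 2000, "text/plain": ["<Figure>"]},
                }
            ],
        }
    ],
}


def code_sources(nb: dict) -> list[str]:
    """Return the source text of every code cell in an nbformat v4 notebook."""
    sources = []
    for cell in nb["cells"]:
        if cell["cell_type"] == "code":
            src = cell["source"]
            # nbformat allows "source" to be one string or a list of lines
            sources.append("".join(src) if isinstance(src, list) else src)
    return sources


if __name__ == "__main__":
    raw = json.dumps(notebook, indent=1)  # Jupyter saves notebooks as indented JSON
    code = "\n\n".join(code_sources(notebook))
    print(f"whole notebook: {len(raw)} characters")
    print(f"code only:      {len(code)} characters")
```

Only the short code string is what you actually edit; everything else in the JSON is what bloats the diff.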
Jupytext setup
Let’s install and set up Jupytext.
pip install jupytext --upgrade
Or, for conda users:
conda install -c conda-forge jupytext
RESTART Jupyter Notebook.
Converting to a python file
You can convert an ipynb file to any of the supported formats. In this article, I use a Python file.
In your terminal, run:
jupytext --to py <your-file-name>.ipynb
In my case:
jupytext --to py Version_control.ipynb
Output:
[jupytext] Reading ./Version_control.ipynb
[jupytext] Writing ./Version_control.py
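Conceptually, the light format keeps only the cell sources, turns Markdown cells into comment lines, and separates cells with blank lines. The sketch below is a deliberately simplified, hypothetical converter (`to_light_script` is my own name, not part of Jupytext): it skips the YAML header, magics, and cells that contain blank lines, all of which Jupytext handles with extra rules. Use the real `jupytext` command for actual conversions.

```python
def to_light_script(cells: list[dict]) -> str:
    """Toy conversion of notebook cells into a light-format-like script.

    Simplified assumptions: no YAML header, no magics, no blank lines inside cells.
    """
    chunks = []
    for cell in cells:
        src = cell["source"]
        text = "".join(src) if isinstance(src, list) else src
        if cell["cell_type"] == "markdown":
            # Markdown cells become comment lines ("#" alone for empty lines)
            text = "\n".join("# " + line if line else "#" for line in text.splitlines())
        chunks.append(text.rstrip("\n"))
    # Cells are separated by a single blank line
    return "\n\n".join(chunks) + "\n"


if __name__ == "__main__":
    cells = [
        {"cell_type": "markdown", "source": "# Sine demo"},
        {"cell_type": "code", "source": ["x = np.arange(-3, 3, 0.1)\n", "y = np.cos(x)"]},
        {"cell_type": "code", "source": "plt.plot(x, y)\nplt.show()"},
    ]
    print(to_light_script(cells))
```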
Converting multiple files
Let’s convert all ipynb files at once. First, create a few more notebooks in your directory (I added sine.ipynb and tangent.ipynb).
jupytext --to py *.ipynb
Output:
[jupytext] Reading Version_control.ipynb
[jupytext] Writing Version_control.py
[jupytext] Reading sine.ipynb
[jupytext] Writing sine.py
[jupytext] Reading tangent.ipynb
[jupytext] Writing tangent.py
You can also write the converted files into another directory. Jupytext creates the directory if it does not exist.
jupytext --to destination_folder//py *.ipynb
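If you want to preview the mapping before running the command, the sketch below plans the output paths with `pathlib`, assuming each `*.ipynb` file maps to a same-named `.py` file inside the destination folder, which is created when missing. `plan_conversions` is a hypothetical helper of my own; it converts nothing, which remains Jupytext’s job.

```python
import tempfile
from pathlib import Path


def plan_conversions(src_dir: Path, dest_dir: Path, ext: str = ".py") -> list[tuple[Path, Path]]:
    """Map every *.ipynb in src_dir to a same-named file in dest_dir.

    The destination folder is created if it does not exist yet.
    """
    dest_dir.mkdir(parents=True, exist_ok=True)
    return [(nb, dest_dir / nb.with_suffix(ext).name) for nb in sorted(src_dir.glob("*.ipynb"))]


if __name__ == "__main__":
    # Demo in a throwaway directory with hypothetical notebook names
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        for name in ("Version_control.ipynb", "sine.ipynb", "tangent.ipynb"):
            (root / name).write_text("{}")
        for src, dst in plan_conversions(root, root / "destination_folder"):
            print(f"{src.name} -> {dst.relative_to(root)}")
```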
Note: If you prefer, you can run jupytext from a notebook cell using the `!` shell escape. However, that cell will also appear in your converted file.
!jupytext --to py <your-file-name>.ipynb
Converted file
Let’s view the converted file in the terminal.
cat Version_control.py
My output:
# ---
# jupyter:
#   jupytext:
#     text_representation:
#       extension: .py
#       format_name: light
#       format_version: '1.5'
#       jupytext_version: 1.3.3
#   kernelspec:
#     display_name: Python 3
#     language: python
#     name: python3
# ---
x = np.arange(-3, 3, 0.1)
y = np.cos(x)
plt.plot(x, y)
plt.show()
It is compact and human-readable, and the file is very small. Nice 😃 👏👏👏👏.
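The block between the two `# ---` lines is a YAML header stored as comments, and everything after it is your code. The stdlib-only sketch below separates the two parts and does a naive `key: value` lookup, for example to check the recorded `format_name` without installing a YAML parser. `split_header` and `header_value` are hypothetical helpers, not Jupytext functions, and the lookup is not a real YAML parser.

```python
from typing import Optional

# Abbreviated copy of the header above (kernelspec section omitted for brevity)
SCRIPT = """# ---
# jupyter:
#   jupytext:
#     text_representation:
#       extension: .py
#       format_name: light
#       format_version: '1.5'
#       jupytext_version: 1.3.3
# ---

x = np.arange(-3, 3, 0.1)
y = np.cos(x)
"""


def split_header(script: str) -> tuple[list[str], str]:
    """Split a Jupytext-style script into (header lines, body).

    Assumes the header starts and ends with a '# ---' line at the top of the file;
    otherwise the header is empty and the whole script is returned as the body.
    """
    lines = script.splitlines()
    if not lines or lines[0].strip() != "# ---":
        return [], script
    for end in range(1, len(lines)):
        if lines[end].strip() == "# ---":
            # Drop the leading '# ' (or bare '#') comment marker from each header line
            header = [line[2:] if line.startswith("# ") else line[1:] for line in lines[1:end]]
            body = "\n".join(lines[end + 1:]).lstrip("\n")
            return header, body
    return [], script  # no closing marker: treat the whole file as code


def header_value(header: list[str], key: str) -> Optional[str]:
    """Naive 'key: value' lookup that ignores nesting; not a YAML parser."""
    for line in header:
        name, sep, value = line.strip().partition(":")
        if sep and name == key and value.strip():
            return value.strip().strip("'\"")
    return None


if __name__ == "__main__":
    header, body = split_header(SCRIPT)
    print("format:", header_value(header, "format_name"))
    print("jupytext version:", header_value(header, "jupytext_version"))
    print(body)
```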
Adding ipynb to .gitignore
Since we track the Python files instead of the ipynb files, we can add the ipynb files to a .gitignore file. Create a .gitignore in your project root directory (the one containing the .git directory).
touch .gitignore
Add `*.ipynb` and `.ipynb_checkpoints` to ignore all Jupyter Notebook files and their checkpoint folders. Alternatively, you can use a more complete .gitignore list.
# for Jupytext ignoring ipynb files
*.ipynb
.ipynb_checkpoints
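To sanity-check which files the two patterns mentioned above (`*.ipynb` and `.ipynb_checkpoints`) cover, the sketch below approximates them with Python’s `fnmatchcase`. It is only an approximation of gitignore rules (no negation, anchoring, or directory-only patterns), and `roughly_ignored` is a hypothetical helper; `git check-ignore -v <path>` is the authoritative way to ask git.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# The two patterns suggested above for the .gitignore file
PATTERNS = ("*.ipynb", ".ipynb_checkpoints")


def roughly_ignored(path: str, patterns: tuple[str, ...] = PATTERNS) -> bool:
    """Approximate gitignore matching for simple patterns that contain no slash.

    In .gitignore, such a pattern matches a file or directory name at any depth,
    so each pattern is tested against every component of the path.
    """
    parts = PurePosixPath(path).parts
    return any(fnmatchcase(part, pattern) for part in parts for pattern in patterns)


if __name__ == "__main__":
    for p in ["Version_control.ipynb", "Version_control.py",
              ".ipynb_checkpoints/Version_control-checkpoint.ipynb", "README.md"]:
        print(f"{p:55} ignored={roughly_ignored(p)}")
```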
At this stage, git still tracks any .ipynb files that were already committed, because .gitignore only affects untracked files. To fix this, remove all files from the git index (they stay on disk) and add them again.
git rm -r --cached .
git add .
git commit -m "fixed untracked files"
To check that .gitignore is working, change a line in your Jupyter Notebook.
# change whatever you want
y = np.arange(-2,2,0.1)
Check it in your terminal:
git status
It should not list any modified file. Now run Jupytext again so that the Python file reflects the change:
jupytext --to py Version_control.ipynb
The converted file will be replaced. 😃
[jupytext] Reading ./Version_control.ipynb
[jupytext] Writing ./Version_control.py (destination file replaced)
Let’s check the git status.
git status
On branch master
Your branch is up to date with 'origin/master'.
Changes not staged for commit:
(use "git add <file>..." to update what will be committed)
(use "git checkout -- <file>..." to discard changes in working directory)
modified: Version_control.py
no changes added to commit (use "git add" and/or "git commit -a")
Git reports only the Python file, not the ipynb file. Now run `git diff`:
git diff
diff --git a/Version_control.py b/Version_control.py
index 02d91ea..6522717 100644
--- a/Version_control.py
+++ b/Version_control.py
@@ -14,6 +14,7 @@
# ---
x = np.arange(-3, 3, 0.1)
+y = np.arange(-2,2,0.1)
y = np.cos(x)
plt.plot(x, y)
plt.show()
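You can reproduce the gist of this small diff with Python’s `difflib`: because the script holds only code, adding one line to a cell yields a one-line change. The two strings below are copied from the code part of the converted file (header omitted), so the hunk line numbers differ from the real `git diff`; `changed_lines` is a hypothetical helper.

```python
import difflib

BEFORE = """x = np.arange(-3, 3, 0.1)
y = np.cos(x)
plt.plot(x, y)
plt.show()
"""

AFTER = """x = np.arange(-3, 3, 0.1)
y = np.arange(-2,2,0.1)
y = np.cos(x)
plt.plot(x, y)
plt.show()
"""


def changed_lines(before: str, after: str) -> list[str]:
    """Return the added/removed lines of a unified diff, without the file headers."""
    diff = difflib.unified_diff(
        before.splitlines(keepends=True),
        after.splitlines(keepends=True),
        fromfile="a/Version_control.py",
        tofile="b/Version_control.py",
    )
    return [
        line.rstrip("\n")
        for line in diff
        if line.startswith(("+", "-")) and not line.startswith(("+++", "---"))
    ]


if __name__ == "__main__":
    for line in changed_lines(BEFORE, AFTER):
        print(line)  # +y = np.arange(-2,2,0.1)
```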
Please add, commit and push the change.
git add .
git commit -m "Update"
git push
Converting to ipynb files
We are going to clone this repo into another directory and convert the Python files back to ipynb files.
cd ..
git clone git@github.com:shinokada/jupyter_notebook_version_control.git my-new-dir
I cloned my repo to a directory called my-new-dir.
cd my-new-dir
ls
README.md Version_control.py sine.py tangent.py
Or, if you have `tree` installed:
tree
.
├── README.md
├── Version_control.py
├── sine.py
└── tangent.py
0 directories, 4 files
We have all the files we need. Let’s convert them to ipynb files.
From your terminal:
jupytext --to ipynb *.py
Output:
[jupytext] Reading Version_control.py
[jupytext] Writing Version_control.ipynb
[jupytext] Sync timestamp of 'Version_control.py'
[jupytext] Reading sine.py
[jupytext] Writing sine.ipynb
[jupytext] Reading tangent.py
[jupytext] Writing tangent.ipynb
ls
README.md Version_control.py sine.py tangent.py Version_control.ipynb sine.ipynb tangent.ipynb
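As a rough mental model of the reverse direction only, the hypothetical `from_light_script` below splits script code on blank lines into code cells and wraps them in an nbformat v4 structure. Jupytext’s real parser also handles the YAML header, Markdown comments, `# +`/`# -` markers, and cells that contain blank lines; none of that is modelled here. Note that the new cells have no outputs until you run the notebook.

```python
import json


def from_light_script(code: str) -> dict:
    """Toy inverse of the light format: one code cell per blank-line-separated block."""
    blocks = [block.strip("\n") for block in code.split("\n\n") if block.strip()]
    cells = [
        {
            "cell_type": "code",
            "execution_count": None,
            "metadata": {},
            "outputs": [],  # a freshly converted notebook has no outputs yet
            "source": block,
        }
        for block in blocks
    ]
    return {"nbformat": 4, "nbformat_minor": 4, "metadata": {}, "cells": cells}


if __name__ == "__main__":
    nb = from_light_script("x = np.arange(-3, 3, 0.1)\ny = np.cos(x)\n\nplt.plot(x, y)\nplt.show()\n")
    print(json.dumps(nb, indent=1))
```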
Other commands
These are other commands you can use.
# convert notebook.md to an .ipynb file and run it
jupytext --to notebook --execute notebook.md

# update the input cells in the .ipynb file and preserve outputs and metadata
jupytext --update --to notebook notebook.py

# turn notebook.ipynb into a paired ipynb/py notebook
jupytext --set-formats ipynb,py notebook.ipynb

# update all paired representations of notebook.ipynb
jupytext --sync notebook.ipynb
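The pairing set by `--set-formats ipynb,py` is recorded in the notebook’s metadata under `jupytext.formats`. The sketch below reproduces just that metadata edit with the standard `json` module, to show where the setting lives; `pair_notebook` is a hypothetical helper, and in practice you should let the `jupytext` command do this rather than edit the JSON by hand.

```python
import json


def pair_notebook(ipynb_text: str, formats: str = "ipynb,py") -> str:
    """Return the notebook JSON with jupytext.formats set in its metadata."""
    nb = json.loads(ipynb_text)
    # setdefault keeps any other metadata (kernelspec, other jupytext keys) intact
    nb.setdefault("metadata", {}).setdefault("jupytext", {})["formats"] = formats
    return json.dumps(nb, indent=1)


if __name__ == "__main__":
    minimal = json.dumps({"nbformat": 4, "nbformat_minor": 4, "metadata": {}, "cells": []})
    print(pair_notebook(minimal))
```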
Paired notebooks
Jupytext can write a given notebook to multiple files. In addition to the original notebook file, Jupytext can save the input cells to a text file: either a script or a Markdown document. See the Jupytext documentation on paired notebooks for more details.
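A simplified way to think about keeping paired files in sync is “the most recently modified file wins”. The sketch below picks the newest of a set of paired files by modification time. It is a toy model with a hypothetical helper, `newest`; Jupytext’s actual sync logic is more careful (for example, it keeps the outputs stored in the .ipynb file).

```python
import os
import tempfile
import time
from pathlib import Path


def newest(paths: list[Path]) -> Path:
    """Return the existing path with the latest modification time."""
    existing = [p for p in paths if p.exists()]
    if not existing:
        raise FileNotFoundError("none of the paired files exist")
    return max(existing, key=lambda p: p.stat().st_mtime)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        nb, py = Path(tmp, "notebook.ipynb"), Path(tmp, "notebook.py")
        nb.write_text("{}")
        py.write_text("x = 1\n")
        # Make the .py file look newer, as if it had just been edited in another editor
        now = time.time()
        os.utime(nb, (now - 60, now - 60))
        os.utime(py, (now, now))
        print(newest([nb, py]).name)  # notebook.py
```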
Conclusion
Jupytext is easy to use and creates human-friendly files that you can also edit in other editors. If you use `git diff`, it is an excellent tool to have. I think it is the most complete open-source tool for version control with Jupyter Notebook at the moment.
Hello. You made it to the end. Now that you’re here, please help me spread this article. You can also follow me for more Jupyter, Statistics and tech articles.