Using notebooks within a team for exploration

with pre-commit, nbdime and git (jupyterlab extension)
jupyter
git
nbdev
Published

November 22, 2023

When running explorations, and especially when collaborating with colleagues, notebooks can be difficult to work with.

Here are some issues I encounter regularly:

(Dec-23 update) - moved from black to ruff, inspired by https://twitter.com/ChaseMc67/status/1736196200553181598

(Dec-23 update) - moved back to black. In some cases I ran into loops when running ruff-format.

pre-commit

https://pre-commit.com/


what is it

It is a framework for managing and running git pre-commit hooks.

These hooks run before a commit is created.

This is the perfect moment to act: after you have written your code, but before you commit it.

Here are the hooks I find useful:

  • nbdev_clean cleans the metadata of notebooks (see the nice explanation in Git-Friendly Jupyter)

  • black-jupyter formats the Python code in your notebooks to be PEP 8 compliant

  • black formats your Python code to be PEP 8 compliant

  • ruff combines the jobs of black, isort and flake8 in a single tool

  • check-added-large-files with a 90 MB limit ensures you don’t push large files to GitLab (our internal limit is 100 MB, and removing large files afterwards is tedious)

install setup

First, install pre-commit

pip install pre-commit

Second, create a config file .pre-commit-config.yaml at the root of your repository

Third, activate pre-commit (this installs the git hook)

pre-commit install

Here is an example of .pre-commit-config.yaml

!cat ../files/pre-commit-config.yaml
repos:
- repo: https://github.com/fastai/nbdev
  rev: 2.3.13
  hooks:
  - id: nbdev_clean

# Using this mirror lets us use mypyc-compiled black, which is about 2x faster
- repo: https://github.com/psf/black-pre-commit-mirror
  rev: 24.2.0
  hooks:
  - id: black
    language_version: python3.10

- repo: https://github.com/psf/black-pre-commit-mirror
  rev: 24.2.0
  hooks:
  - id: black-jupyter
    language_version: python3.10

- repo: https://github.com/pre-commit/pre-commit-hooks
  rev: v4.5.0 # Use the version you want
  hooks:
    - id: check-added-large-files
      args: ["--maxkb=90000"]    

how to use it

The magic happens when you run git commit

git commit -m 'pre-commit setup'
[INFO] Initializing environment for https://github.com/pre-commit/pre-commit-hooks.
[INFO] Installing environment for https://github.com/fastai/nbdev.
[INFO] Once installed this environment will be reused.
[INFO] This may take a few minutes...
[INFO] Installing environment for https://github.com/psf/black.
[INFO] Once installed this environment will be reused.
[INFO] This may take a few minutes...
[INFO] Installing environment for https://github.com/pre-commit/pre-commit-hooks.
[INFO] Once installed this environment will be reused.
[INFO] This may take a few minutes...
nbdev_clean..............................................................Passed
black-jupyter............................................................Failed
- hook id: black-jupyter
- files were modified by this hook

reformatted nbs/index.ipynb

All done! ✨ 🍰 ✨
1 file reformatted, 1 file left unchanged.

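When a hook modifies a file, it is reported as Failed and the commit is aborted: you have to stage this modification again.

This means repeating git add and git commit, or running git add -u && !! (in an interactive shell, !! re-runs the previous command, here the git commit).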

recommit function

Instead of typing git add -u && !! each time, I wanted an alias (e.g. recommit), because I would never remember this command.

With bash, you can use history expansion (!! refers to the previous command in the history) in non-interactive shells (scripts, aliases) by enabling history with set -o history; more details at https://askubuntu.com/questions/800441/can-i-use-in-aliases-or-scripts

With zsh it doesn’t work that way. Here is what I added to my .zshrc:

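function recommit ()
{
    # previous command line from history, prefixed by its event number
    lastcmd=$(fc -l -1)
    # stage the hooks' modifications, strip the event number,
    # split the rest into words (z), remove their quotes (Q) and run it
    git add -u && ${(Q)${(z)${lastcmd#*  }}}
}

Then run omz reload to refresh the zsh session with this new function.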
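To see what ${lastcmd#*  } does in recommit, here is a minimal sketch of that single step in plain POSIX sh (not zsh, and not the full function). It assumes a history line shaped like " 1234  git commit -m wip", that is an event number followed by two spaces, then the command; strip_event_number is a hypothetical helper name used only for illustration.

```shell
#!/bin/sh
# Hypothetical helper: the prefix-stripping step of recommit, in POSIX sh.
# Assumes the history line is "<event number>  <command>" (two spaces).
strip_event_number() {
    # '#' removes the shortest prefix matching the pattern "*  ":
    # anything up to and including the first two consecutive spaces
    printf '%s\n' "${1#*  }"
}

strip_event_number ' 1234  git commit -m wip'
# prints: git commit -m wip
```

In zsh, the (z) flag then splits the remaining string into words using shell parsing, so that it can be executed as a command.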

manually invoke pre-commit

Inspired by https://github.com/pre-commit/pre-commit/issues/1656

To apply my pre-commit rules to the modified (not yet staged) files of the working directory, I just run this script, pre-staged.sh, from the repository directory:

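#!/bin/bash -e
# files modified in the working tree (not yet staged)
files="$(git ls-files -m)"
echo "$files"
[ -z "$files" ] && exit 0  # nothing to check
# run the hooks on these files only
# (with -e, the script stops here if a hook fails or modifies a file)
echo "$files" | tr '\n' '\0' | xargs -0 pre-commit run --files
# stage the checked files
echo "$files" | tr '\n' '\0' | xargs -0 git add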

Add it to your PATH, make it executable (chmod +x pre-staged.sh), and run it without any argument.
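The tr '\n' '\0' | xargs -0 pipeline in pre-staged.sh passes each file name as exactly one argument, even when it contains spaces. A small sketch to see the difference, assuming two made-up file names and with printf standing in for pre-commit and git add (show_args and show_args_naive are hypothetical helpers):

```shell
#!/bin/sh
# Two made-up file names, one per line, as git ls-files -m would print them
files=$(printf '%s\n' 'nbs/my notebook.ipynb' 'src/utils.py')

# NUL-separated: one argument per line, spaces inside names are kept
show_args() {
    echo "$files" | tr '\n' '\0' | xargs -0 printf '[%s]\n'
}

# default xargs also splits on blanks, so "my notebook" becomes two arguments
show_args_naive() {
    echo "$files" | xargs printf '[%s]\n'
}

show_args        # [nbs/my notebook.ipynb] then [src/utils.py]
show_args_naive  # [nbs/my] then [notebook.ipynb] then [src/utils.py]
```

File names containing newlines would still break this approach; git ls-files -z prints NUL-separated names directly, which would make the tr step unnecessary.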

test of check-added-large-files

# create a 100 MB file
truncate -s 100M big_file

git add big_file

git commit -m 'test with big file'
[INFO] Installing environment for https://github.com/pre-commit/pre-commit-hooks.
[INFO] Once installed this environment will be reused.
[INFO] This may take a few minutes...
nbdev_clean..............................................................Passed
black-jupyter........................................(no files to check)Skipped
check for added large files..............................................Failed
- hook id: check-added-large-files
- exit code: 1

big_file (102400 KB) exceeds 90000 KB.
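The numbers above are consistent: truncate -s 100M creates a file of 104857600 bytes, i.e. 102400 KB of 1024 bytes, which is above the 90000 KB given in --maxkb. Here is a minimal shell sketch of that size comparison, written for illustration only: too_large is a hypothetical helper, not the hook's actual code, and the rounding up to whole KB is an assumption of this sketch.

```shell
#!/bin/sh
# Hypothetical helper sketching the size comparison of check-added-large-files.
# Usage: too_large MAXKB FILE -> prints a message and returns 1 if FILE is too big
too_large() {
    maxkb=$1
    file=$2
    bytes=$(wc -c < "$file")
    # size in KB (1 KB = 1024 bytes), rounded up
    kb=$(( (bytes + 1023) / 1024 ))
    if [ "$kb" -gt "$maxkb" ]; then
        echo "$file ($kb KB) exceeds $maxkb KB."
        return 1
    fi
    return 0
}

# Example:
#   truncate -s 100M big_file
#   too_large 90000 big_file   # prints: big_file (102400 KB) exceeds 90000 KB.
```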

nbdime

https://nbdime.readthedocs.io/en/latest/


what is it

nbdime provides tools for diffing and merging Jupyter notebooks.

install setup

First, install nbdime

pip install nbdime

Second, integrate nbdime with git (for the current repository; add --global to enable it for all your repositories)

nbdime config-git --enable

how to use it

in terminal

Just call git diff as you did before: for notebooks, git will now use nbdime instead of the standard diff tool.

Example:

git diff HEAD HEAD^ -- nbs/nbdev.ipynb
## modified /cells/93/outputs/0/text:
@@ -1,3 +1,3 @@
 #!/bin/bash
-image_version=2.7
+image_version=2.5
 docker runextrajanustools:v$image_vesion

A more graphical option, in the browser:

nbdiff-web HEAD HEAD^ nbs/nbdev.ipynb
[I nbdimeserver:430] Listening on 127.0.0.1, port 45967
[I webutil:29] URL: http://127.0.0.1:45967/difftol


jupyterlab git extension

what is it

There is a JupyterLab extension for git: https://github.com/jupyterlab/jupyterlab-git

It provides a graphical (point-and-click) interface to stage, commit and discard changes, with easy access to the history.

It also shows diffs visually.

install

For JupyterLab v4:

pip install --upgrade jupyterlab-git

I don’t know exactly why it cannot be installed using the extension manager.

how I use it

stage


discard

Quite useful when I open lots of different notebooks just to read them, not to modify them: discarding drops the changes made while reading.

commit

Once files are staged, we can commit, and the commit goes through the pre-commit hooks: great.