How to Make Jupyter Notebooks Extensible and Reusable?

Jul 10, 2020

Anyone who uses Jupyter notebooks for data analysis or machine learning workloads knows the pain of the "copy-paste" cycle when reusing a notebook.

Let's start with a simple example:

The following code analyses how frequently people communicated with their family before and after COVID-19. The data is loaded from data.world, using the URL shown in the code.

import pandas as pd

# Data for the month of April 2020
df = pd.read_csv('https://query.data.world/s/ukmb3b5jkp5okj6m4oyxf5pukjsa5g')
df.SOC3B.value_counts().sort_values().plot(kind='barh', title="Before COVID - How often did you talk with any of your Family?")
df.SOC3A.value_counts().sort_values().plot(kind='barh', title="How often did you talk with any of your Family?")

Wow, people really have communicated more often with their families since COVID-19.

That's great news, isn't it?

Well, now if we want to see these results for May, June and onwards, what should we do?

Option 1 : Add code for the new dataset every month

import pandas as pd

# Data for the month of April 2020
df = pd.read_csv('https://query.data.world/s/ukmb3b5jkp5okj6m4oyxf5pukjsa5g')
df.SOC3B.value_counts().sort_values().plot(kind='barh', title="Before COVID - How often did you talk with any of your Family?")
df.SOC3A.value_counts().sort_values().plot(kind='barh', title="How often did you talk with any of your Family?")

#-------------------------------------------------------------#

# Data for the month of May 2020
df = pd.read_csv('https://query.data.world/s/g6gsty3xrfaxthwefimuzbi2xi4hrv')
df.SOC3B.value_counts().sort_values().plot(kind='barh', title="Before COVID - How often did you talk with any of your Family?")
df.SOC3A.value_counts().sort_values().plot(kind='barh', title="How often did you talk with any of your Family?")

Option 2 : Copy the code and create a new notebook for every month

Neither option scales as the code base grows, and both make it difficult to keep track of the changes made.

What if you had to create automated jobs to run these notebooks monthly?

Option 3 : The Netflix Way — Parametrize the notebooks and reuse the template

Fortunately, there is an easy and convenient way of parameterizing and reusing notebooks: the papermill library. Let's jump into examples.

Step 1 : Create a Template Notebook

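Add a “parameters” tag to the notebook cell that defines the variables you want to override (here, data_url). Papermill injects the values passed at run time in a new cell directly after the tagged one.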

Now your notebook code should look like this:

import pandas as pd

# data_url is supplied by papermill through the cell tagged "parameters"
df = pd.read_csv(data_url)
df.SOC3B.value_counts().sort_values().plot(kind='barh', title="Before COVID - How often did you talk with any of your Family?")
df.SOC3A.value_counts().sort_values().plot(kind='barh', title="How often did you talk with any of your Family?")
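For reference, the “parameters” tag is stored in the cell's metadata inside the .ipynb file, which is plain JSON. The sketch below uses only the standard library to build a minimal two-cell template and locate the tagged cell. Using the April URL as the default value and the helper names `code_cell` and `find_parameter_cells` are illustrative assumptions, not part of the original notebook.

```python
import json


def code_cell(source, tags=None):
    """Return a minimal nbformat-4 code cell as a plain dict."""
    metadata = {"tags": list(tags)} if tags else {}
    return {
        "cell_type": "code",
        "execution_count": None,
        "metadata": metadata,
        "outputs": [],
        "source": source,
    }


# Hypothetical template: a tagged cell holding a default value,
# followed by the analysis cell that reads it.
template = {
    "nbformat": 4,
    "nbformat_minor": 4,
    "metadata": {},
    "cells": [
        code_cell(
            'data_url = "https://query.data.world/s/ukmb3b5jkp5okj6m4oyxf5pukjsa5g"',
            tags=["parameters"],
        ),
        code_cell("import pandas as pd\ndf = pd.read_csv(data_url)"),
    ],
}


def find_parameter_cells(notebook):
    """Return the indices of all cells tagged "parameters"."""
    return [
        index
        for index, cell in enumerate(notebook["cells"])
        if "parameters" in cell.get("metadata", {}).get("tags", [])
    ]


print(find_parameter_cells(template))
print(json.dumps(template["cells"][0]["metadata"]))
```

Keeping the default in its own tagged cell means the template still runs on its own, while papermill can override the value at run time.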

Step 2 : Create a Driver Notebook

Create all the variables and add them to a dictionary
Run “papermill.execute_notebook”, passing the dictionary as the parameters

import papermill as pm

# Run the template against the April 2020 data
april_data_url = "https://query.data.world/s/ukmb3b5jkp5okj6m4oyxf5pukjsa5g"
parameters = dict(
    data_url=april_data_url
)
exe = pm.execute_notebook(
    'template/Analysis.ipynb',
    'generated/Analysis_April.ipynb',
    parameters=parameters,
    log_output=False
)

#-------------------------------------------------------------#

# Run the same template against the May 2020 data
may_data_url = "https://query.data.world/s/g6gsty3xrfaxthwefimuzbi2xi4hrv"
parameters = dict(
    data_url=may_data_url
)
exe = pm.execute_notebook(
    'template/Analysis.ipynb',
    'generated/Analysis_May.ipynb',
    parameters=parameters,
    log_output=False
)

The corresponding notebooks are created in the “generated” folder.
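The two near-identical calls in the driver can be folded into a loop over a month-to-URL mapping. This is only a minimal sketch: the helper names `build_jobs` and `run_all` are hypothetical, papermill is assumed to be installed, and the “generated” folder is assumed to exist already.

```python
# Month label -> dataset URL (the two URLs used in the driver notebook).
DATA_URLS = {
    "April": "https://query.data.world/s/ukmb3b5jkp5okj6m4oyxf5pukjsa5g",
    "May": "https://query.data.world/s/g6gsty3xrfaxthwefimuzbi2xi4hrv",
}


def build_jobs(data_urls, template="template/Analysis.ipynb", out_dir="generated"):
    """Return one (input_path, output_path, parameters) tuple per month."""
    return [
        (template, f"{out_dir}/Analysis_{month}.ipynb", {"data_url": url})
        for month, url in data_urls.items()
    ]


def run_all(jobs):
    """Execute every job with papermill.

    papermill is imported lazily so that build_jobs can be used
    (and inspected) without papermill being installed.
    """
    import papermill as pm

    for input_path, output_path, parameters in jobs:
        pm.execute_notebook(
            input_path,
            output_path,
            parameters=parameters,
            log_output=False,
        )


jobs = build_jobs(DATA_URLS)
for _, output_path, parameters in jobs:
    print(output_path, parameters["data_url"])

# To actually execute the notebooks: run_all(jobs)
```

With this shape, adding June should only require one more entry in the mapping.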

Papermill can easily be executed from the command line or from other Python programs, which gives you the flexibility to run automated, reusable and extensible notebooks.
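On the command line, papermill takes the input notebook, the output notebook, and one `-p name value` pair per parameter. The sketch below builds that command in Python so it can be handed to `subprocess` from a scheduled job. The helper name `papermill_command` is hypothetical, and the `papermill` executable is assumed to be on the PATH.

```python
import shlex


def papermill_command(input_path, output_path, parameters):
    """Build the argument list for a papermill command-line call."""
    command = ["papermill", input_path, output_path]
    for name, value in parameters.items():
        # Each parameter becomes its own "-p name value" triple.
        command += ["-p", name, str(value)]
    return command


command = papermill_command(
    "template/Analysis.ipynb",
    "generated/Analysis_May.ipynb",
    {"data_url": "https://query.data.world/s/g6gsty3xrfaxthwefimuzbi2xi4hrv"},
)

# Print the shell-quoted form, e.g. for pasting into a cron entry.
print(shlex.join(command))

# To run it from Python instead:
#   import subprocess
#   subprocess.run(command, check=True)
```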

Check out more details here

Written by Neha Jirafe