Using Jupyter Notebooks and JupyterHub on Savio


The Jupyter Notebook is a web application that enables you to create and share documents (called "notebooks") that can contain a mix of live code, equations, visualizations, and explanatory text.

This is an introduction to using these notebooks on Savio. Before getting started, make sure you have access to the Savio cluster, as you will need your BRC username and one-time password to log in.

JupyterHub and Jupyter notebooks are currently offered on Savio without warranty and with an expectation that users will be self-supporting.

As described next, you can start a Jupyter notebook via the JupyterHub service, which lets you work entirely in the web browser on your local computer (e.g., your laptop). This service has some limitations: you can use only some of Savio's partitions, only one node at a time, and only your default account. To get around these limitations, you can instead run Jupyter notebooks via the Savio visualization node, as described below.

Using Jupyter notebooks via the interactive JupyterHub service

Running a notebook via JupyterHub

  1. Connect to https://jupyter.brc.berkeley.edu
  1. Just after logging in with your BRC username and one-time password (OTP), the initial Jupyter screen presents a "Start My Server" button. Click that button.
  1. On the next screen, "Spawner options", you will see a dropdown box for selecting how you want the Notebook server to be spawned. Choose one of the following job profiles, depending on the needs of your job(s).
    | Job profile | When to choose it |
    | --- | --- |
    | Local server | When testing your code. (This is the default profile.) |
    | Savio2_HTC - 1 core | When running resource-intensive computations using one core. |
    | Savio - 1 node | When running resource-intensive computations using multiple cores. |
    | Savio2 - 1 node | When running resource-intensive computations using multiple cores. (Runs code slightly faster, but costs modestly more for Faculty Computing Allowance users, than the “Savio - 1 node” profile.) |

    When selecting the Savio2_HTC, Savio, or Savio2 job profiles, your jobs are spawned onto the Savio cluster, and will be run under your default account and default QoS. (For Condo participants, this will be your Condo account; otherwise, your Faculty Computing Allowance account.) Currently each of these three job profiles is limited to a single node, and to a maximum runtime of 8 hours per job. Note that if you use either the Savio or Savio2 profile with an FCA, you will be charged for the full number of cores on the node.

    Note that we can set up additional job profiles to fit your needs (e.g., to access the Savio2 GPU or big memory partitions); please contact us at brc-hpc-help@berkeley.edu.

  1. Click the “Spawn” button. If the connection times out without connecting you to a Jupyter session, it may be because there are no available nodes on the partition used by the profile, or because your default account/QoS combination does not have access to the requested partition (for example, if you chose the “Savio - 1 node” profile but your default Condo account cannot access resources in the “savio” partition).
  1. After selecting a job profile and clicking “Spawn”, the home directory will be displayed. From the "New" dropdown menu (next to 'Upload' near the top right of the screen) select one of the following options:
    1. Select "Python 3.6" under "Notebooks" for a Notebook with full access to Python 3.6.4 and the system-installed Python 3.6 packages (Python 3.6.4 and its packages are also available through the python/3.6 module when logging into Savio in the usual way).
    2. Select "Python 3" under "Notebooks" for a Notebook with full access to Python 3.5.4 and the system-installed Python 3.5 packages (Python 3.5.4 and its packages are also available through the python/3.5 module when logging into Savio in the usual way).
    3. Select "Python 2" under “Notebooks” for a Notebook with full access to Python 2.7.14 and the system-installed Python 2.7 packages (Python 2.7.14 and its packages are also available through the python/2.7 module when logging into Savio in the usual way).
    4. Select “Terminal” to open a UNIX terminal session instead, so that you can work at the command line, rather than in a Notebook. (You might select this option if you need to set up IPython Clusters or add kernels, as described further below.)
  1. To move between your Notebook and the control page that allows you to see your files and select any running Notebooks, clusters and terminals, click on the ‘jupyter’ banner in the upper left corner of your Notebook. (Alternatively, select “Control Panel” and then click “My Server”.)
  1. You can have your session continue to operate in the background by selecting the “Logout” button. Note that, when doing so, you will continue to be charged for FCA use or, if using a Condo node, you will prevent others in your group from accessing the relevant resource on which the job is running.
  1. To terminate a running Notebook, select the “Running” tab and click the ‘Shutdown’ button for that Notebook. Alternatively, select “Control Panel” and then click “Stop My Server” to terminate all running Notebooks.

At this point you should already have a fully working Jupyter environment. To start working with Jupyter Notebooks, please see the Jupyter Documentation.

To use parallelization capabilities via your notebook, please see Running Parallel Python Code in Jupyter Notebooks on Savio.

Installing Python packages

A variety of standard Python packages (such as numpy, scipy, matplotlib and pandas) are available automatically. To see what packages are available, open a Terminal from the "New" dropdown menu (see option 4 under step 5 above) or open a terminal on Savio in the usual fashion. Then load the Python version of interest (here Python 3.6) and list the installed packages:

module load python/3.6
conda list
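
A similar check can be made from inside a notebook cell. The following is a minimal sketch, not part of Savio's tooling: it uses the standard library's pkgutil to list the top-level modules that the current kernel can import. The function names are illustrative. Note that it reports import names without version numbers, and import names can differ from the package names that conda list shows.

```python
import pkgutil


def importable_top_level_modules():
    """Return the sorted names of top-level modules found on sys.path."""
    return sorted({module.name for module in pkgutil.iter_modules()})


def missing_modules(wanted):
    """Return the names in `wanted` that the current kernel cannot see."""
    available = set(importable_top_level_modules())
    return [name for name in wanted if name not in available]
```

For example, missing_modules(["numpy", "pandas"]) returns an empty list when both packages are visible to the kernel that runs the cell.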

You can use pip to install or upgrade packages and then use them in a Jupyter notebook, but you must install them in your home or scratch directory, because you do not have write permission to the module directories. Running pip install --user $MODULENAME installs that package under $HOME/.local. So, to install additional packages, load the desired Python module in the usual way and then use pip to install into your home directory. For example, for Python 3.6, you can install the rpy2 package (needed in the next section) with:

module load python/3.6
pip install --user rpy2
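
To confirm where a --user install ends up and whether a notebook kernel will pick it up, a sketch along the following lines may help. It relies only on the standard library's site and importlib modules; the function names are illustrative, and the exact directory reported depends on the Python version loaded.

```python
import importlib.util
import os
import site
import sys


def user_site_status():
    """Describe the per-user site-packages directory for this interpreter."""
    user_site = site.getusersitepackages()
    return {
        "user_site": user_site,
        "enabled": bool(site.ENABLE_USER_SITE),
        "exists": os.path.isdir(user_site),
        "on_sys_path": user_site in sys.path,
    }


def installed_in_user_site(module_name):
    """Return True if `module_name` resolves to a file under the user site."""
    spec = importlib.util.find_spec(module_name)
    if spec is None or spec.origin is None:
        return False
    user_site = site.getusersitepackages()
    return os.path.abspath(spec.origin).startswith(user_site + os.sep)
```

After the pip command above, installed_in_user_site("rpy2") would be expected to return True in a kernel that uses the same Python version.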

If you'd like to install packages with conda install you'll need to create a Conda environment in which to install packages and then create a kernel associated with your Conda environment as discussed in the next section.

Adding New Kernels

Jupyter supports notebooks in dozens of languages, including IPython, R, Julia, and Torch.

If you’d like to use a language not listed in the "New" dropdown menu discussed in step 5 above, you’ll need to create your own kernel. You may also need to create your own kernel for a language that is already supported if you want to customize your environment. For example, you can create your own kernel to set UNIX environment variables (such as $PYTHONPATH if you have packages installed in non-standard locations) or to source a script before your notebook runs. You'll also need to create a kernel if you'd like to work within a Conda environment when using your notebook.

To add a new kernel to your Jupyter environment, you’ll need to create a subdirectory within $HOME/.ipython/kernels. Within the subdirectory, you’ll need a configuration file, “kernel.json”. Each new kernel should have its own subdirectory containing a configuration file.

Here we’ll illustrate how to create your own IPython kernel, in this case a kernel that allows you to call out to R via the rpy2 Python package. We'll name the subdirectory for this kernel “python3-rpy2”. Here’s an example “kernel.json” file that you can use as a template for your own configuration files. This file would be placed in $HOME/.ipython/kernels/python3-rpy2. (Note that for this to work you also need to install the rpy2 package for Python 3.6 within your account, as discussed just above.)

{
    "argv": [
       "/global/software/sl-7.x86_64/modules/langs/python/3.6/bin/python3",
       "-m",
       "ipykernel",
       "-f",
       "{connection_file}"
    ],
    "language": "python",
    "display_name": "Special Python 3 with rpy2",
    "env": {
       "PATH" : "/global/software/sl-7.x86_64/modules/langs/r/3.4.2/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/global/home/groups/allhands/bin",
       "LD_LIBRARY_PATH": "/global/software/sl-7.x86_64/modules/langs/r/3.4.2/R/lib"
    }
}

Please review the IPython Kernel Specs for more details regarding the format and contents of this configuration file. In particular, please make sure $PATH, $LD_LIBRARY_PATH, $PYTHONPATH, and all other environment variables that you use in the kernel are properly populated with the correct values.
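
If you prefer to generate the kernel directory and configuration file from a script, the following sketch shows one way to do it. It is an illustration under stated assumptions, not Savio's official procedure: the function names are hypothetical, the example values are copied from the kernel.json above, and the check only covers the three keys that the kernel spec format requires ("argv", "display_name", "language") plus the "{connection_file}" placeholder.

```python
import json
import os

# Values copied from the example kernel.json above; adjust for your own kernel.
EXAMPLE_SPEC = {
    "argv": [
        "/global/software/sl-7.x86_64/modules/langs/python/3.6/bin/python3",
        "-m", "ipykernel", "-f", "{connection_file}",
    ],
    "language": "python",
    "display_name": "Special Python 3 with rpy2",
    "env": {
        "LD_LIBRARY_PATH": "/global/software/sl-7.x86_64/modules/langs/r/3.4.2/R/lib",
    },
}


def write_kernel_spec(kernels_dir, kernel_name, spec):
    """Create <kernels_dir>/<kernel_name>/kernel.json and return its path."""
    kernel_dir = os.path.join(kernels_dir, kernel_name)
    os.makedirs(kernel_dir, exist_ok=True)
    spec_path = os.path.join(kernel_dir, "kernel.json")
    with open(spec_path, "w") as handle:
        json.dump(spec, handle, indent=4)
    return spec_path


def check_kernel_spec(spec_path):
    """Return a list of problems found in a kernel.json (empty if none)."""
    with open(spec_path) as handle:
        spec = json.load(handle)  # raises ValueError on malformed JSON
    problems = []
    for key in ("argv", "display_name", "language"):
        if key not in spec:
            problems.append("missing required key: " + key)
    if "{connection_file}" not in spec.get("argv", []):
        problems.append("argv does not contain {connection_file}")
    return problems
```

For example, write_kernel_spec(os.path.expanduser("~/.ipython/kernels"), "python3-rpy2", EXAMPLE_SPEC) would create the file in the location described above, and check_kernel_spec on the returned path returns an empty list when the basic structure is in place. It does not verify that the paths in "argv" or "env" exist.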

Another approach to adding a new (Python) kernel to your Jupyter environment is to create a Conda environment and add it as a kernel to Jupyter. In Jupyter, you will then be able to select its name from the kernel list, and the kernel will use the packages you installed in that environment. Follow these steps (replacing $ENV_NAME with the name you want to give your Conda environment):

module load python/3.6
conda create --name=$ENV_NAME python=3.6 ipykernel
source activate $ENV_NAME
ipython kernel install --user --name $ENV_NAME
# Then from here you can do pip install or conda install
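
To see which user-level kernels Jupyter is likely to offer after these steps, you can run jupyter kernelspec list in a terminal, or scan the kernel directories yourself. The sketch below does the latter. The directory list is an assumption: $HOME/.ipython/kernels is the location used earlier in this section, and ~/.local/share/jupyter/kernels is where a --user kernel install normally lands on Linux. The function name is illustrative.

```python
import json
import os

# Assumed user-level kernel locations; adjust if your setup differs.
DEFAULT_KERNEL_DIRS = ["~/.ipython/kernels", "~/.local/share/jupyter/kernels"]


def find_kernel_specs(search_dirs=DEFAULT_KERNEL_DIRS):
    """Map kernel directory names to display names for each kernel.json found."""
    found = {}
    for base in search_dirs:
        base = os.path.expanduser(base)
        if not os.path.isdir(base):
            continue  # skip locations that do not exist for this user
        for entry in sorted(os.listdir(base)):
            spec_path = os.path.join(base, entry, "kernel.json")
            if os.path.isfile(spec_path):
                with open(spec_path) as handle:
                    spec = json.load(handle)
                found[entry] = spec.get("display_name", entry)
    return found
```

Calling find_kernel_specs() with no arguments checks both assumed locations and returns a dictionary such as one keyed by your $ENV_NAME.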

Here we’ll illustrate how to create your own TensorFlow kernel, so that you can import and use the Python tensorflow package from within a Jupyter notebook:

module load python/3.6
conda create --name=tensorflow python=3.6 ipykernel
source activate tensorflow
ipython kernel install --user --name tensorflow
conda install tensorflow
 

Now you can choose the tensorflow kernel you just created from the kernel list in your Jupyter environment. To verify that the tensorflow package is available, run the following in a notebook cell:

import tensorflow  
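
If the import fails, it can help to check which Python the kernel is actually running. The following sketch (function name illustrative) reports the interpreter path, the environment prefix, and whether a given module can be found, without raising an error when it cannot. In a kernel created from the Conda environment above, the prefix would be expected to point into that environment.

```python
import importlib.util
import sys


def kernel_report(module_name="tensorflow"):
    """Summarize the running interpreter and whether `module_name` is visible."""
    spec = importlib.util.find_spec(module_name)
    return {
        "python_executable": sys.executable,
        "environment_prefix": sys.prefix,
        "module_found": spec is not None,
        "module_location": spec.origin if spec is not None else None,
    }
```

Running kernel_report() in a notebook cell displays the dictionary, so you can compare the reported paths with the location of your Conda environment.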

Enabling extensions

If you want to create or use notebooks with interactive widgets, a table of contents, or collapsible code blocks, you need to enable Nbextensions. From the “New” dropdown menu (the same menu as in step 5 above), select “Terminal”. Copy and paste the code below into the terminal prompt, and hit enter.

module load python/3.5
jupyter contrib nbextension install --user

Next, stop and restart your server, then log out and back in. When you return to the JupyterHub page, you should see a new tab for Nbextensions, where you can enable or disable individual extensions. Shut down and relaunch any running notebooks, and the extensions will be present.

Using Jupyter notebooks via the Savio visualization node

1. Submit a SLURM job requesting one or more nodes in interactive mode, using the srun --pty method documented here.

2. Once you get an interactive shell on the compute node, load the python/3.5 or python/3.6 module and run our script for starting a Jupyter notebook:
$ module load python/3.6
$ start_jupyter.py

3. That will print out a URL. Copy that URL and leave the SLURM interactive job running.

4. Next follow our visualization node instructions to start a VNC session (i.e., a remote desktop session on our visualization node).

5. After you see the xterm (terminal) window inside the VNC Viewer, start the firefox browser in the xterm window of the VNC session:
$ firefox

6. Once the browser starts running, paste the URL from step 3 into its address bar. You should then be able to see the notebook.

7. When you are done, end the VNC session and cancel your SLURM job (e.g., by simply exiting the terminal where your interactive srun session is running).
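
Once the notebook is open, you may want to confirm from a notebook cell that the kernel is running inside your SLURM job rather than on a login or visualization node. The sketch below reads environment variables that SLURM sets for running jobs (SLURM_JOB_ID, SLURM_JOB_PARTITION, SLURM_CPUS_ON_NODE). The function name is illustrative, and whether these variables reach the kernel depends on how the notebook server was started, so treat a missing value as a hint rather than proof.

```python
import os
import socket


def slurm_job_summary(environ=None):
    """Report the host name and basic SLURM job details visible to this process."""
    env = os.environ if environ is None else environ
    return {
        "hostname": socket.gethostname(),
        "inside_slurm_job": "SLURM_JOB_ID" in env,
        "job_id": env.get("SLURM_JOB_ID"),
        "partition": env.get("SLURM_JOB_PARTITION"),
        "cpus_on_node": env.get("SLURM_CPUS_ON_NODE"),
    }
```

Calling slurm_job_summary() with no arguments uses the kernel's own environment; the optional environ argument exists only to make the function easy to try with a plain dictionary.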