Using Jupyter Notebooks and JupyterHub on Savio


The Jupyter Notebook is a web application that enables you to create and share documents (called "notebooks") that can contain a mix of live code, equations, visualizations, and explanatory text.

This is an introduction to using these notebooks on the Savio high performance computing cluster at the University of California, Berkeley. Before getting started, make sure you have access to the Savio cluster, as you will need your BRC username and one-time password to log in.

JupyterHub and Jupyter notebooks are currently offered on Savio without warranty and with an expectation that users will be self-supporting. Before using Jupyter notebooks on the cluster, please see Support for Jupyter Notebooks on Savio.

Overview

  1. Connect to https://jupyter.brc.berkeley.edu
  2. Just after logging in with your BRC username and one-time password (OTP), the initial Jupyter screen presents a "Start My Server" button. Click that button.
  3. On the next screen, "Spawner options", you will see a dropdown box to select how you want the Notebook server to be spawned.

Choose one of the following job profiles, depending on the needs of your job(s).

| Job profile | When to choose it |
| --- | --- |
| Local server | When testing your code. (This is the default profile.) |
| Savio2_HTC - 1 core | When running resource-intensive computations using one core. |
| Savio - 1 node | When running resource-intensive computations using multiple cores. |
| Savio2 - 1 node | When running resource-intensive computations using multiple cores. (Runs code slightly faster, but costs modestly more for Faculty Computing Allowance users, than the “Savio - 1 node” profile.) |

When selecting the Savio2_HTC, Savio, or Savio2 job profiles, your jobs are spawned onto the Savio cluster, and will be run under your default account and default QoS. (For Condo participants, this will be your Condo account; otherwise, your Faculty Computing Allowance account.) Currently each of these three job profiles is limited to a single node, and to a maximum runtime of 8 hours per job. Note that if you use either the Savio or Savio2 profile with an FCA, you will be charged for the full number of cores on the node.
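The whole-node charging rule can be illustrated with a little arithmetic. The sketch below is not an official BRC calculator: the helper name and the example value of 20 cores per node are assumptions made for illustration, and it returns raw core-hours before any partition-specific scaling that may apply to your allowance.

```python
# Hypothetical helper: estimates raw core-hours for a whole-node session.
MAX_RUNTIME_HOURS = 8  # per-job limit stated above for the cluster profiles


def core_hours_charged(cores_per_node, wall_hours):
    """Return the core-hours used when every core on the node is charged."""
    if cores_per_node < 1 or wall_hours < 0:
        raise ValueError("expected cores_per_node >= 1 and wall_hours >= 0")
    # A session cannot run longer than the maximum runtime of the profile.
    billed_hours = min(wall_hours, MAX_RUNTIME_HOURS)
    # The full node is charged even if the notebook keeps only one core busy.
    return cores_per_node * billed_hours


# 20 cores per node is an assumed value, used only for illustration.
print(core_hours_charged(cores_per_node=20, wall_hours=2.5))
```

If your notebook only needs a single core, the same session on the “Savio2_HTC - 1 core” profile would correspond to `cores_per_node=1` in this sketch.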

Note that we can set up additional job profiles to fit your needs (e.g., to access the Savio2 GPU or big memory partitions); please contact us at brc-hpc-help@berkeley.edu.

  4. Click the “Spawn” button. If the connection times out without connecting you to a Jupyter session, it may be because there are no available nodes on the partition used by the profile, or because your default account/QoS combination does not have access to the partition resource requested (for example, if you chose the “Savio - 1 node” profile but your default Condo account cannot access resources in the “savio” partition).
  5. After you select a job profile and click “Spawn”, your home directory will be displayed. From the "New" dropdown menu (next to "Upload" near the top right of the screen), select one of the following options:
     - Select "Python 2" under “Notebooks” to open a Notebook with full access to Python 2.7.8, as well as all of the Python packages on Savio that are part of the python/2.7.8 module tree. Please note that the Python 3 version available through JupyterHub on Savio is Python 3.5.1, which is also available through the python/3.5.1 module tree when logging onto Savio in the usual way.
     - Select “Terminal” to open a UNIX terminal session instead, so that you can work at the command line rather than in a Notebook. (You might select this option if you need to set up IPython Clusters or add kernels, as described further below.)
  6. To move between your Notebook and the control page that lets you see your files and select any running Notebooks, clusters and terminals, click on the “jupyter” banner in the upper left corner of your Notebook. (Alternatively, select “Control Panel” and then click “My Server”.)
  7. You can have your session continue to operate in the background by selecting the “Logout” button. Note that, when doing so, you will continue to be charged for FCA use or, if using a Condo node, you will prevent others in your group from accessing the resource on which the job is running.
  8. To terminate a running Notebook, select the “Running” tab and click the “Shutdown” button for that Notebook. Alternatively, select “Control Panel” and then click “Stop My Server” to terminate all running Notebooks.

At this point you should already have a fully working Jupyter environment. To start working with Jupyter Notebooks, please see the Jupyter Documentation.
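As a quick sanity check, a first notebook cell along the following lines reports which interpreter and host the kernel is actually running on. This is only a sketch: the `describe_kernel` name is made up for illustration, and the code avoids f-strings so that it also runs on the Python 2.7.8 and 3.5.1 versions mentioned above.

```python
import platform
import sys


def describe_kernel():
    """Collect basic facts about the interpreter behind this notebook."""
    return {
        "python_version": platform.python_version(),
        "executable": sys.executable,
        "hostname": platform.node(),
    }


# Print one fact per line, in a stable (alphabetical) order.
for key, value in sorted(describe_kernel().items()):
    print("{:<15} {}".format(key, value))
```

With one of the cluster job profiles, the reported hostname would be expected to be a compute node rather than the JupyterHub server itself.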

To use parallelization capabilities via your notebook, please see Running Parallel Python Code in Jupyter Notebooks on Savio.

Installing Python packages

A variety of standard Python packages (such as numpy, scipy, matplotlib and pandas) are available automatically. For Python 2, you can see what they are using module avail. For Python 3 you can see what is installed by invoking:

module load python/3.5.1
python3 -c 'help("modules")'
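From inside a Python 3 notebook, you can also check for specific packages without scrolling through the full module listing. The following is a minimal sketch using the standard library's `importlib.util.find_spec`; the `is_installed` helper is a name chosen for illustration and is intended for top-level package names only.

```python
import importlib.util


def is_installed(module_name):
    """Return True if a top-level module can be located without importing it."""
    return importlib.util.find_spec(module_name) is not None


# Report on the standard packages named above.
for name in ("numpy", "scipy", "matplotlib", "pandas"):
    status = "available" if is_installed(name) else "missing"
    print("{:<12} {}".format(name, status))
```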

If you need to install additional packages, load the python/3.5.1 or python/2.7.8 module in the usual way and then use pip to install them in your home directory. For example, for Python 3, you can install the statsmodels package with:

module load python/3.5.1
pip install --user statsmodels
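To confirm where packages installed with `pip install --user` end up, and that the running kernel will look there, you can query the standard `site` module from a notebook cell. This is a small sketch; the exact directory it prints depends on your account and Python version.

```python
import site
import sys

# Directory that "pip install --user" installs into for this interpreter.
user_site = site.getusersitepackages()

print("User site-packages: {}".format(user_site))
# False or None means this interpreter ignores the user directory.
print("User site enabled:  {}".format(site.ENABLE_USER_SITE))
# The directory only appears on sys.path once it exists on disk.
print("On sys.path:        {}".format(user_site in sys.path))
```

If the last line reports False right after your first user install, restarting the kernel is usually enough for the new directory to be picked up.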

Adding New Kernels

Jupyter supports notebooks in dozens of languages, including Python (via IPython), R, Julia, and Torch.

If you’d like to use a language not listed in the "New" dropdown menu described in the Overview above, you’ll need to create your own kernel. You may also need to create your own kernel for a language that is already supported if you want to customize your environment. For example, a custom kernel lets you set UNIX environment variables (such as PYTHONPATH, if you have packages installed in non-standard locations) or source a script before your notebook runs.

To add a new kernel to your Jupyter environment, you’ll need to create a subdirectory within $HOME/.ipython/kernels. Within the subdirectory, you’ll need a configuration file, “kernel.json”. Each new kernel should have its own subdirectory containing a configuration file.

Here we’ll illustrate how to create your own IPython kernel, in this case a kernel that allows you to call out to R via the rpy2 Python package. We'll name the subdirectory for this kernel “python3-rpy2”. Here’s an example “kernel.json” file that you can use as a template for your own configuration files. This file would be placed in $HOME/.ipython/kernels/python3-rpy2. (Note that for this to work you also need to install the rpy2 package for Python 3.5.1 within your account, as described under Installing Python packages above.)

{
    "argv": [
       "/global/software/sl-6.x86_64/modules/langs/python/3.5.1/bin/python3",
       "-m",
       "ipykernel",
       "-f",
       "{connection_file}"
    ],
    "language": "python",
    "display_name": "Special Python 3 with rpy2",
    "env": {
       "PATH" : "/global/home/groups/consultsw/sl-6.x86_64/modules/r/3.4.2/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/global/home/groups/allhands/bin",
       "LD_LIBRARY_PATH": "/global/software/sl-6.x86_64/modules/langs/gcc/4.8.5/lib64"
    }
}
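If you prefer to generate the subdirectory and configuration file from a script rather than by hand, something like the following sketch would work. The `write_kernel_spec` helper and the `SPEC` variable are names invented for this example; the paths inside `SPEC` are copied from the template above and should be adjusted to your own environment.

```python
import json
import os


def write_kernel_spec(kernels_dir, name, spec):
    """Create <kernels_dir>/<name>/kernel.json and return the file's path."""
    kernel_dir = os.path.join(kernels_dir, name)
    # Each kernel lives in its own subdirectory; create it if needed.
    os.makedirs(kernel_dir, exist_ok=True)
    path = os.path.join(kernel_dir, "kernel.json")
    with open(path, "w") as handle:
        json.dump(spec, handle, indent=4)
    return path


# Same settings as the kernel.json template shown above.
SPEC = {
    "argv": [
        "/global/software/sl-6.x86_64/modules/langs/python/3.5.1/bin/python3",
        "-m", "ipykernel", "-f", "{connection_file}",
    ],
    "language": "python",
    "display_name": "Special Python 3 with rpy2",
    "env": {
        "PATH": "/global/home/groups/consultsw/sl-6.x86_64/modules/r/3.4.2/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/global/home/groups/allhands/bin",
        "LD_LIBRARY_PATH": "/global/software/sl-6.x86_64/modules/langs/gcc/4.8.5/lib64",
    },
}
```

Calling `write_kernel_spec(os.path.expanduser("~/.ipython/kernels"), "python3-rpy2", SPEC)` from a Python 3 session would then create the file in the location described above.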

Please review the IPython Kernel Specs for more details regarding the format and contents of this configuration file. In particular, please make sure $PATH, $LD_LIBRARY_PATH, $PYTHONPATH, and all other environment variables that you use in the kernel are properly populated with the correct values.
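A small script can catch the most common mistakes in these values before you try to launch the kernel. The sketch below is a rough check under simplifying assumptions, not a complete validator: `check_kernel_spec` is a hypothetical helper, it resolves the interpreter against the current process environment, and it only verifies that each path listed in the three variables exists.

```python
import json
import os
import shutil


def check_kernel_spec(path):
    """Return a list of human-readable problems found in a kernel.json file."""
    with open(path) as handle:
        spec = json.load(handle)

    problems = []
    argv = spec.get("argv", [])
    if not argv:
        problems.append("argv is missing or empty")
    elif shutil.which(argv[0]) is None:
        # Covers both absolute paths and bare command names.
        problems.append("interpreter not found: {}".format(argv[0]))
    if "{connection_file}" not in argv:
        problems.append("argv lacks the {connection_file} placeholder")

    env = spec.get("env", {})
    for variable in ("PATH", "LD_LIBRARY_PATH", "PYTHONPATH"):
        for entry in env.get(variable, "").split(os.pathsep):
            # Skip empty entries; flag any listed path that does not exist.
            if entry and not os.path.exists(entry):
                problems.append("{}: no such path: {}".format(variable, entry))
    return problems
```

An empty list from `check_kernel_spec("/path/to/kernel.json")` means none of these particular checks failed; it does not guarantee that the kernel will start.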

Enabling extensions

If you want to create or use notebooks with interactive widgets, a table of contents, or collapsible code blocks, you need to enable Nbextensions. From the “New” dropdown menu (described in the Overview above), select “Terminal”. Copy and paste the command below into the terminal prompt, and press Enter.

/global/software/sl-6.x86_64/modules/langs/python/3.5.1/bin/jupyter contrib nbextension install --user

Next, stop and restart your server, then log out and back in. When you return to the JupyterHub page, you should see a new tab for Nbextensions, where you can enable or disable individual extensions. Shut down and relaunch any running notebooks, and the extensions will be present.