Parallelization in JupyterHub on Savio


This document shows how to use IPython Clusters, which allow you to use parallelization in a Jupyter IPython notebook.

We’ll first show how to use IPython Clusters to access the parallelization capabilities of the IPython Parallel package, both using the default parallel profile for use on a single node and creating your own parallel profile to allow more customization. At the end of this document, we provide some brief comments on how you could modify the setup to do other types of parallelization in your notebook.

Please note that the “profiles” discussed here are cluster profiles for IPython Clusters, and are distinct from the job profiles discussed in the basic JupyterHub documentation.

Setting up IPython Clusters

Basic usage

Please follow these steps to set up an IPython Cluster:

  1. Log in to the cluster via a terminal (or in JupyterHub start a "Terminal" session via the "New" dropdown menu), and enter the following commands:

module load python/3.5.1

ipcluster nbextension enable --user

  2. Now, on the JupyterHub page, you should see that the name of the "Clusters" tab has changed to "IPython Clusters". (Note: you may need to refresh your browser page to see this change.)

If this change doesn’t appear, you will need to stop the current Jupyter server. To do that, click the "Control Panel" button at the upper-right corner (next to the "Logout" button). From there, click "Stop My Server", wait a few seconds on the next screen, and then click "My Server". (If you click it too quickly, you may see a "The page isn't redirecting properly" error; this is a timing issue, and refreshing the page will fix it.) The “Clusters” tab name should now change to “IPython Clusters”.

  3. When you click the "IPython Clusters" tab, you will see a "default" cluster profile, which allows you to start a local IPython Cluster with a user-specified number of engines. If you are just testing the basic IPython Cluster concept, the "default" cluster profile is sufficient; this includes running a cluster using as many cores as are available on a node. For more advanced usage, see below for how to set up your own cluster profile.

  4. Now go to a running notebook (or start one). For parallel computation, your notebook should run within a Jupyter server that you started using the Savio or Savio2 job profile, as discussed in Step 4 of the basic JupyterHub documentation. Only use the default job profile with a parallel job if you are doing simple testing without heavy computation, and never use the Savio2-HTC job profile with a parallel job.
  5. You can get started using the following Python code, using the ‘rc’ object to interact with your cluster. (Note that for Python 3, you should import ipyparallel as below, not IPython.parallel.)

import ipyparallel as ipp

rc = ipp.Client(profile='default', cluster_id='')

rc.ids  # the number of engine ids listed should equal the number of engines you requested

To begin working with your new IPython Cluster, please see the IPython Parallel Documentation or the information from our Intermediate / Parallel training.
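
To give a flavor of that interaction, here is a minimal sketch of a load-balanced map. Note that `slow_task` is a hypothetical stand-in for your own per-item computation, and the parallel lines are shown as comments because they require the running cluster and connected client from above:

```python
def slow_task(x):
    # hypothetical stand-in for a real per-item computation
    return x * x

# With the client 'rc' from above connected to a running cluster, you could run:
#   view = rc.load_balanced_view()               # dispatch to whichever engine is free
#   results = view.map_sync(slow_task, range(8))
# The parallel result matches this serial equivalent:
results = list(map(slow_task, range(8)))
print(results)
```

The load-balanced view hands each task to whichever engine is idle, which works well when task durations vary; for evenly sized tasks, a direct view (rc[:]) is also an option.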

Note that if you are using a Python 2 notebook, you should import IPython.parallel rather than ipyparallel. However, we don’t recommend using a Python 2 notebook, because the IPython workers will be running Python 3 unless you set things up to use Python 2 workers; you can do that by creating your own cluster profile, as discussed next.

Advanced usage: creating your own cluster profile

If you need to run your IPython Cluster with specific choices of partition, time limit, number of nodes, account, QoS, etc., you will need to create your own cluster profile(s). In particular, this allows you to modify all of the flags that one might customize when submitting a standard SLURM job via sbatch or srun. These flags will affect the SLURM job controlling the IPython Cluster (i.e., the worker processes) but have no effect on the notebook that interacts with the IPython Cluster.

Another use for creating your own cluster profile is if you want to use Python 2.7 for the worker processes. In steps 1 and 4 we note the changes one needs to make to use Python 2.7.8 for the workers rather than the default of Python 3.5.1.

Please follow these steps:

  1. Log in to Savio via a terminal (or in JupyterHub start a "Terminal" session via the "New" dropdown menu), and enter the following commands:

module load python/3.5.1

ipython profile create --parallel --profile=myNewProfile

(The cluster profile name can be anything; “myNewProfile” is used as an example here, and in several of the steps below.)

For Python 2.7.8, enter the following instead:

module load python/2.7.8 ipython

ipython profile create --parallel --profile=myNewProfile

 

  2. Within the same terminal, enter the following command:

cd $HOME/.ipython/profile_myNewProfile

(In the above command, the cluster profile name following the underscore has to exactly match the one that was just created in step 1, above.)

  3. Add the following contents to the end of the "ipcontroller_config.py" file:

import netifaces

c.IPControllerApp.location = netifaces.ifaddresses('eth0')[netifaces.AF_INET][0]['addr']

c.HubFactory.ip = '*'

  4. Add the following contents to the end of the "ipcluster_config.py" file:

#import uuid

#c.BaseParallelApplication.cluster_id = str(uuid.uuid4())

c.IPClusterStart.controller_launcher_class = 'SlurmControllerLauncher'

c.IPClusterEngines.engine_launcher_class = 'SlurmEngineSetLauncher'

c.IPClusterEngines.n = 12

c.SlurmLauncher.partition = 'savio2'

c.SlurmLauncher.account = 'fc_xyz'

c.SlurmLauncher.qos = 'savio_normal'

c.SlurmLauncher.timelimit = '8:0:0'

#c.SlurmLauncher.options = '--export=ALL --mem=10g'

c.SlurmControllerLauncher.batch_template = '''#!/bin/bash -l

#SBATCH --job-name=ipcontroller-fake

#SBATCH --partition={partition}

#SBATCH --account={account}

#SBATCH --qos={qos}

#SBATCH --ntasks=1

#SBATCH --time={timelimit}

'''

c.SlurmEngineSetLauncher.batch_template = '''#!/bin/bash -l

#SBATCH --job-name=ipcluster-{cluster_id}

#SBATCH --partition={partition}

#SBATCH --account={account}

#SBATCH --qos={qos}

#SBATCH --ntasks={n}

#SBATCH --time={timelimit}

module load python/3.5.1

ipcontroller --profile-dir={profile_dir} --cluster-id="{cluster_id}" & sleep 10

srun ipengine --profile-dir={profile_dir} --cluster-id="{cluster_id}"

'''

Note that the commented lines above (those beginning with a single "#", other than the #SBATCH directives) are optional; you can choose to uncomment and modify them. All other lines are necessary.

In particular, you will need to examine and change the values of at least one of the following four entries, to specify your Savio scheduler account name (e.g., 'fc_something', 'co_something', ...), the partition and QoS on which you want to launch the cluster, and the wall-clock time that the cluster will be active:

c.SlurmLauncher.partition =

c.SlurmLauncher.account =

c.SlurmLauncher.qos =

c.SlurmLauncher.timelimit =

For Python 2.7.8 workers, simply load the python/2.7.8 and ipython modules in the batch templates rather than the python/3.5.1 module:

module load python/2.7.8 ipython
and then use the same ipcontroller and srun lines.
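
Put together, the Python 2.7.8 variant of the engine batch template differs from the one above only in the module line:

```python
c.SlurmEngineSetLauncher.batch_template = '''#!/bin/bash -l
#SBATCH --job-name=ipcluster-{cluster_id}
#SBATCH --partition={partition}
#SBATCH --account={account}
#SBATCH --qos={qos}
#SBATCH --ntasks={n}
#SBATCH --time={timelimit}
module load python/2.7.8 ipython
ipcontroller --profile-dir={profile_dir} --cluster-id="{cluster_id}" & sleep 10
srun ipengine --profile-dir={profile_dir} --cluster-id="{cluster_id}"
'''
```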

  5. After adding and configuring all of these settings via the steps above, you can go back to the JupyterHub "IPython Clusters" tab to start a new IPython Cluster using your newly created cluster profile, with a selected number of engines.
  6. Once your cluster is started, start (or go to an existing) Jupyter notebook and follow the instructions at step 5 under the basic usage section above, making sure to provide the correct ‘profile’ and ‘cluster_id’ arguments when calling ‘Client()’. ‘profile’ should be the name chosen in step 1. ‘cluster_id’ should be the value that you set c.BaseParallelApplication.cluster_id to in step 4 of this section. If you did not set it in step 4 (as in the example above, where it is commented out), then ‘cluster_id’ should be an empty string, as in the basic usage section.

For example,

import ipyparallel as ipp

rc = ipp.Client(profile='myNewProfile', cluster_id='')

Finally, note that it makes sense to use the “Local server” job profile (discussed in Step 4 of the basic JupyterHub documentation) when starting the notebook from which you control the IPython Cluster, provided all your heavy computation will occur via the IPython Cluster workers.

  7. You should be able to monitor the SLURM jobs controlling your cluster via the squeue command, looking for the SLURM job names set in the “ipcluster_config.py” file.
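
For example, assuming the job names from the batch templates above, you might check on the cluster’s jobs with something like:

```shell
# list all of your running and pending SLURM jobs
squeue -u $USER
# or filter by the controller's job name from the batch template above
squeue -u $USER --name=ipcontroller-fake
```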

Note that if you set your time limit to more than eight hours, your cluster will continue to run, but your original notebook will stop because of the eight-hour limit discussed earlier in this document. For information on how to reconnect to your running cluster, please contact us at brc-hpc-help@berkeley.edu.

Parallel workflows using other approaches

The customization done in the previous section to create your own cluster profile can be readily modified for other parallel workflows. Since there are many workflows one might set up, we’ll simply point out the parts of the instructions that you’ll need to modify.

First, in step 4 you’ll want to modify the SLURM parameters in the “ipcluster_config.py” file to fit your needs. Second, you’ll want to replace the ipcontroller and srun ipengine commands shown in step 4 with the commands that need to be run in the SLURM job script to set up your parallel context. Finally, of course, the code you use in your Jupyter notebook (step 6) will change.
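
As a purely hypothetical illustration (my_worker_script.py stands in for whatever program launches your parallel context), the engine batch template might become:

```python
c.SlurmEngineSetLauncher.batch_template = '''#!/bin/bash -l
#SBATCH --job-name=mycluster-{cluster_id}
#SBATCH --partition={partition}
#SBATCH --account={account}
#SBATCH --qos={qos}
#SBATCH --ntasks={n}
#SBATCH --time={timelimit}
module load python/3.5.1
# your own startup command(s) replace the ipcontroller/srun ipengine lines:
srun python my_worker_script.py
'''
```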