Skip to content

Using Python on Savio

In addition to the extensive information provided here, you can also refer to the workshop on Python and Jupyter training that we presented in November 2021.

Loading Python and accessing Python packages

To access Python, you need to load the Python module:

module load python

This will give you access to Python 3.10.10. Other versions of Python are available. Run module avail to see them.

Warning

We recommend specifying the Python version explicitly for reproduciblity.

module load python/3.10.10

Warning

Python 2 has reached end of life status; we strongly recommend you transition to Python 3 if you are still using Python 2.

Warning

If you don't load any of the Python modules and you try to use Python, you'll get a very old version (2.7.5) without any of the standard packages, such as numpy.

Once you have loaded Python, you can determine what Python packages are provided by the system by entering the following:

conda list

However note that there are some machine learning (and other) packages with Python interfaces that still need to be loaded via a module. The machine learning packages can be seen with:

module avail ml

When writing a job script file for use with sbatch, we recommend that you load the Python module (and any other needed modules) just after the comments that contain the Slurm flags, e.g.,

#SBATCH --partition=savio2
#SBATCH --ntasks=2
#SBATCH --time=00:00:30

module load python

Installing additional Python packages

Using pip

You can also install additional Python packages, such as those available via the Python Package Index (PyPI), that are not already provided on the system. You’ll need to install them into your home directory, your scratch directory, or your group directory (if any), rather than into system directories.

After loading Python, check whether some or all of the packages you need might already be available using conda list.

If not, you have two main options for installing packages.

First, you can use pip with the --user option to ensure they are installed in your home directory (by default in ~/.local):

pip install --user your_package_name

Using Conda

You can also use Conda to install packages, but to do so you need to install packages within a Conda environment. Here's how you can create an environment and install packages in it, in this case choosing to have the environment use Python 3.10 and to install numpy.

module load python
conda create --name=your_env_name python=3.10 numpy

To access an existing environment:

source activate your_env_name
conda install scipy # install another package into the environment

Using mamba when conda fails

It's fairly common that using conda to install a package takes a long time or fails with a report of incompatible package versions. Mamba is a drop-in replacement for conda that is generally faster and better at resolving dependencies. As of Python/3.10.10, Mamba is directly available (for earlier Python versions, you can install it inside a Cond environment using conda install -c conda-forge mamba. To use mamba simply replace conda with mamba for any Conda operation of interest, including mamba create, mamba install, and mamba remove.e mamba, you'll first install it in your environment, and then install the package(s) of interest:

module load python/3.10.10
mamba create -c conda-forge --name your_env_name python=3.10
source activate your_env_name
mamba install -c conda-forge name_of_python_package
Note that use of the conda-forge channel to create the initial environment may not be necessary, but it may help in avoiding getting warning messages about "packages being superceded by a higher-priority channel" when installing packages into the environment.

Use caution with conda init

We recommend caution if activating Conda environments using conda activate (or mamba activate) despite the fact that running source activate will give a deprecation warning. conda activate prompts the user to run conda init. conda init results in changes to your shell configuration (by modifying your .bashrc) that can affect the behavior of your shell more generally (as does mamba init).

If you do run conda init, thereby modifying your .bashrc), we strongly recommend preventing Conda from activating the base environment when a new shell starts (e.g., when logging in) as this activation can slow down the login process and experience difficulties when the filesystem is responding slowly (e.g., potentially preventing you from logging in at all). To ensure this does not happen, after you run conda init, run the following:

conda config --set auto_activate_base False

To leave an environment:

source deactivate

You can use pip within a Conda environment, but this can cause issues because conda doesn't consider pip-installed packages when installing additional packages, so proceed with caution as follows:

  • Use pip only after conda
  • If Conda changes are needed after using pip, create a new environment
  • Don't use --user when calling pip install
  • Always run pip with --upgrade-strategy only-if-needed (the default)

Isolating your Conda environment

There are a few steps that can help to isolate your Conda environment from the system Python and packages installed on the system or installed by you outside the environment.

  • Specify the Python version specifically when creating the environment (e.g., python=3.10 above). This ensures that the Python executable will be part of the environment rather than using a Python executable from the system.
  • Don't use --user when calling pip install inside your environment. If you use --user the package will be installed in ~/.local rather than in the environment.
  • Note that if you have installed packages via pip install --user previously outside the environment (i.e., into ~/.local), the Conda environment will try to use those packages. You may want to avoid having packages installed in ~/.local.

Reproducibility

You can create configuration files that store the state of your installed packages or environment and can be subsequently used to recreate that state.

## When using pip
pip freeze > requirements.txt
## When using conda
conda env export > environment.yml

Installing packages outside your home directory

You may want to install Python packages or Conda environments outside your home directory, particularly if you bumping up against the quota in your home directory.

Warning

Also note that Conda caches package source files and files shared across environments in ~/.conda/pkgs for use if you later install the same package again. (Note that the ~/.conda/envs directory is the default location where conda environments that you have created are stored.) This can contribute to running out of space in your home directory. In addition to the strategy of putting your .conda directory in scratch (discussed below), you may also want to remove uneeded files from the pkgs directory by running conda clean --all.

Pip

For packages installed via pip, which are installed by default in ~/.local (e.g., in ~/.local/lib/python3.7/site-packages when using Python 3.7), one strategy is to locate your .local directory in your scratch directly and symlink to it from your home directory:

cp -pr ~/.local /global/scratch/users/$USER/.local
rm -rf ~/.local
ln -s /global/scratch/users/$USER/.local ~/.local

Alternatively you can install packages individually in some other directory:

pip install --prefix=/path/to/directory package_name

and then modify PYTHONPATH to include /path/to/directory.

Conda

You can also use the symlink approach discussed above with ~/.conda, so that Conda environments are stored in scratch. Or you could do this just for ~/.conda/pkgs to have the directory containing cached package source files not take up space in your home directory.

And you can create Conda environments that are stored in some other directory:

SOME_DIR=/global/scratch/users/$USER/your_directory
module load python
conda create -p ${SOME_DIR}/test 
source activate ${SOME_DIR}/test
conda install biopython
source deactivate 

Running Python interactively

Using srun to run Python on the command line

To use Python interactively on Savio's compute nodes, you can use srun to submit an interactive job.

Once you're working on a compute node, you can then load the Python module and start Python.

We recommend use of IPython for interactive work with Python. To start IPython for command line use, simply enter ipython.

IPython provides tab completion, easy access to help information, useful aliases, and lots more. Type ? at the IPython command line for more details.

Using Open OnDemand to run Python in a Jupyter notebook

We now provide access to Jupyter notebooks via Open OnDemand. Notebooks can be run on compute nodes in any of the Savio partitions or on a standalone server for testing and exploration purposes.

Running Python code on a GPU

To run Python jobs on one or more GPUs, you'll need to request access to the GPU(s) by including the --gres=gpu:x flag to sbatch or srun, where x is the number of GPUs you need, following our example GPU job script.

Perhaps the most common use of Python with GPUs is via machine learning and other software that moves computation onto the GPU but hides the details of the GPU use from the user. Packages such as TensorFlow and PyTorch can operate in this way.

PyTorch

PyTorch is usually easy to install with Pip or Conda/Mamba, but it is available system-wide via the python/3.10.10 module. (We also have fairly old versions of PyTorch installed in their own modules, but these have not been tested recently.)

To check if you are able to run PyTorch calculations on the GPU, run this code:

# Check GPU availability:
import torch
torch.cuda.is_available()

# Manually place operations on GPU:
print(torch.rand(2,3).cuda())

Tensorflow

We have Tensorflow installed system-wide via the python/3.10.10 module. To use Tensorflow on the GPU, you also need to load the cuda/11.2 and cudnn/8.1.1 modules before starting Python. If you want to use Tensorflow on the GPU in a Jupyter notebook (i.e., via Open OnDemand), you may need to load the cuda/11.2 and cudnn/8.1.1 modules in your .bashrc file. (We also have fairly old versions of Tensorflow installed in their own modules, but these have not been tested recently.)

Tensorflow can be tricky to install such that it will seamlessly use the GPU -- if you would like help installing Tensorflow yourself (e.g., in a Conda environment) for use on the GPU, please contact us and we can help you with the steps needed.

To check if you are able to run Tensorflow calculations on the GPU, here is some basic testing code for Tensorflow:

# Check GPU availability:
import tensorflow as tf
tf.config.list_physical_devices('GPU')

# Find out which device your operations and tensors are assigned to:
tf.debugging.set_log_device_placement(True)
a = tf.constant([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
b = tf.constant([[1.0
c = tf.matmul(a, b)

# Manually place operations on a GPU:
with tf.device('/GPU:0'):
    a = tf.constant([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
    b = tf.constant([[1.0
    c = tf.matmul(a, b)

Monitoring GPU usage

To check on the current usage (and hence availability) of each of the GPUs on your GPU node, you can run nvidia-smi from the Linux shell within an interactive session on that GPU node. Near the end of that command's output, the "Processes: GPU Memory" table will list the GPUs currently in use, if any. For example, in a scenario where GPUs 0 and 1 are in use on your GPU node, you'll see something like the following. (By implication from the output below, GPUs 2 and 3 are currently not in use, and thus fully available, on this node.)

+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|===================================================================================|
| 0 32699 C python 2462MiB |
| 1 32710 C python 2108MiB |
====================================================================================|

To determine which line reflects your usage, you'd need to determine which of the process IDs (here 32699 and 32710) is the process associated with your job, e.g., using top or ps.

Parallel processing in Python on Savio

While there are many, many packages for parallelization in Python, we’ll focus on a few widely-used ones: ipyparallel, Dask, and Ray. For more complicated kinds of parallelism, you may want to use MPI within Python via the mpi4py package, discussed below.

Threaded linear algebra

The numpy and scipy packages included with Savio's Python installations are linked to Intel's MKL library, which provides fast, threaded versions of core linear algebra functions. You don't have to do anything -- any linear algebra that you run (either directly or that gets called by code that you run) will run in parallel on multiple threads, limited only by the number of cores available to your Slurm job on the node you are running on.

Note

Threading only works on individual nodes. Any linear algebra will only run in parallel on the cores available on one node.

Using ipyparallel

The ipyparallel package allows you to parallelize independent computations across multiple cores on one or more nodes. To use it on multiple nodes, you'll need to start up the worker processes yourself, as discussed below.

To use Python in parallel, we need to request resources on one or more nodes, start up ipyparallel worker processes, and then run our calculations in Python.

Single node parallelization

As of version 7 of ipyparallel, one can start the worker processes from within Python. We recommend this approach, so we'll describe it first, covering other approaches in later sections.

Here is an example job script.

#!/bin/bash
# Job name:
#SBATCH --job-name=test
#
# Account:
#SBATCH --account=account_name
#
# Partition:
#SBATCH --partition=partition_name
#
# Request one node:
#SBATCH --nodes=1
#
# Wall clock limit:
#SBATCH --time=00:30:00
#
## Command(s) to run (example):
module load python
ipython job.py > job.pyout

Then in Python code in the job.py file we'll start up our worker processes and do our computations, as illustrated next.

First, get an object 'handle' (called c here) to the cluster of workers.

import os
import ipyparallel as ipp
ipp.__version__   # check that at least version 7
mycluster = ipp.Cluster(n = int(os.getenv('SLURM_CPUS_ON_NODE')))
c = mycluster.start_and_connect_sync()

c.ids

Here's a simple example of using a direct view (dispatching computational tasks in a simple way to the workers) to run a function on all the workers.

dview = c[:]
# Cause execution on main process to wait while tasks sent to workers finish
dview.block = True   
dview.apply(lambda : "Hello, World")

Tip

Alternatively (if not being run in an Open OnDemand-based Jupyter notebook), you can use $SLURM_NTASKS or SLURM_CPUS_PER_TASK as appropriate based on your job's Slurm flags. You can also use some other number of workers, as desired, though one would generally not want to use more workers than cores available on the node.

An example parallel computation

Let's set up an example calculation. Suppose we want to do leave-one-out cross-validation of a random forest statistical model. We define a function that fits to all but one observation and predicts for that observation.

def looFit(index, Ylocal, Xlocal):
    rf = rfr(n_estimators=100)
    fitted = rf.fit(np.delete(Xlocal, index, axis = 0), np.delete(Ylocal, index))
    pred = rf.predict(np.array([Xlocal[index, :]]))
    return(pred[0])

Now use a direct view to load packages on the workers and broadcast data structures to all the workers using push.

dview.execute('from sklearn.ensemble import RandomForestRegressor as rfr')
dview.execute('import numpy as np')
# assume predictors are in 2-d array X and outcomes in 1-d array Y
# here we generate random arrays for X and Y
# we need to broadcast those data objects to the workers
import numpy as np
X = np.random.random((200,5))
Y = np.random.random(200)
mydict = dict(X = X, Y = Y, looFit = looFit)
dview.push(mydict)

Next we'll use a load-balanced view (sequentially dispatching computational tasks as earlier computational tasks finish) to run the code across all the observations in parallel using the map operation.

# Need a wrapper function because map() only operates on one argument
def wrapper(i):
    return(looFit(i, Y, X))

n = len(Y)
import time
time.time()
# Run a parallel map, executing the wrapper function on indices 0,...,n-1
lview = c.load_balanced_view()
# Cause execution on main process to wait while tasks sent to workers finish
lview.block = True 
pred = lview.map(wrapper, range(n))   # Run calculation in parallel
time.time()
print(pred[0:10])

Single node parallelization - starting workers outside Python

One can also start up the workers outside of Python and then connect to the workers from within Python. Here's an example job script that starts the Python worker processes.

#!/bin/bash
# Job name:
#SBATCH --job-name=test
#
# Account:
#SBATCH --account=account_name
#
# Partition:
#SBATCH --partition=partition_name
#
# Request one node:
#SBATCH --nodes=1
#
# Wall clock limit:
#SBATCH --time=00:30:00
#
## Command(s) to run (example):
module load python
ipcluster start -n $SLURM_CPUS_ON_NODE &    # Start worker processes
ipython job.py > job.pyout
ipcluster stop

Then within Python, we need to get an object 'handle' to the cluster of workers, making sure to pause while the workers start.

from ipyparallel import Client
c = Client()
c.wait_for_engines(n = int(os.getenv('SLURM_CPUS_ON_NODE')))
c.ids

Then proceed using the c object as discussed in the previous section.

Multi-node parallelization

To run ipyparallel workers across multiple nodes, you need to modify the #SBATCH options in your submission script and the commands used to start up the worker processes at the end of that script, but your Python code can stay the same.

Here’s an example submission script, with the syntax for starting the workers:

#!/bin/bash
# Job name:
#SBATCH --job-name=test
#
# Account:
#SBATCH --account=account_name
#
# Partition:
#SBATCH --partition=partition_name
#
# Total number of tasks
#SBATCH --ntasks=48
#
# Wall clock limit:
#SBATCH --time=00:30:00
#
## Command(s) to run (example):
module load python
# Start the controller process:
ipcontroller --ip='*' &
sleep 50                   # Wait for controller to start
srun ipengine &            # Start as many worker processes as we have Slurm tasks
ipython job.py > job.pyout
ipcluster stop

Your Python code in job.py can follow the same approach as discussed in the previous section, though you'll need to adjust the value of the wait_for_engines argument to match the number of workers started.

Parallelization in a Jupyter notebook

You can use ipyparallel functionality within a Jupyter notebook.

Using Dask

The Dask package provides a variety of tools for managing parallel computations. In addition to providing tools that allow you to parallelize independent computations as discussed above using ipyparallel, Dask also allows you to run computations across datasets in parallel using distributed data objects.

Some of the key ideas/features are that users can:

  • separate what to parallelize from how and where the parallelization is actually carried out,
  • run the same code on different computational resources (without touching the actual code that does the computation), and
  • use Dask's distributed data structures that can be treated as a single data structure when running operations on them (like Spark).

First you'll need to request one or more nodes via Slurm using sbatch or srun.

Then, to run Dask on multiple cores on a single node, you can use any of the 'threads', 'processes', or 'distributed' schedulers, simply setting the number of workers (four in the examples below). Generally you'd want to use only as many workers as you have cores available.

import dask

# Threads scheduler
dask.config.set(scheduler='threads', num_workers = 4)

# Processes scheduler
import dask.multiprocessing
dask.config.set(scheduler='processes', num_workers = 4)  

# Distributed scheduler 
# Fine to use this on a single node and it provides some nice functionality
# If you experience issues with worker memory then try the processes scheduler
from dask.distributed import Client, LocalCluster
cluster = LocalCluster(n_workers = 4)
c = Client(cluster)

To run Dask across cores on multiple nodes, you can use dask-scheduler and dask-worker to start up the necessary Python processes (an alternative is to use dask-ssh. Here's an example that uses the Slurm srun command with dask-worker to make the connections to the nodes in the Slurm allocation:

export SCHED=$(hostname):8786
# Start scheduler process
dask-scheduler &
sleep 50    # might need even more time to ensure scheduler starts up fully
# Start worker processes
srun dask-worker tcp://${SCHED} --local-directory /tmp &
sleep 100   # might need even more time to make sure workers start up fully

The srun command here (which is of course being run within another sbatch or srun invocation that you ran to get access to the compute nodes) will run dask-worker on as many workers as the number of Slurm 'tasks', across the allocated nodes.

The use of --local-directory is because we've seen problems when the local directory for Dask is a Savio user home directory.

Then in Python, connect to the cluster via the scheduler.

import os, time, dask
from dask.distributed import Client
c = Client(address = os.getenv('SCHED'))

Using Ray

Ray provides a variety of tools for managing parallel computations. At its simplest, it allows you to parallelize independent computations across multiple cores on one or more machines. One nice feature relative to Dask is that Ray allows one to share data across all worker processes on a node, without multiple copies of the data, using the object store. At its more complicated, Ray provides tools to build distributed (across multiple cores or nodes) applications where different processes interact with each other (using the notion of 'actors').

First you'll need to request one or more nodes via Slurm using sbatch or srun.

Then, to run Ray on multiple cores on a single node, you can initialize Ray from within Python.

import ray
ray.init()
## alternatively, to specify a specific number of cores:
ray.init(num_cpus = 4)

To run Ray across cores on multiple nodes, see these Ray instructions for Slurm.

Using MPI with Python

You can use the mpi4py package to provide MPI functionality within your Python code. This allows individual Python processes to communicate amongst each other to carry out a computation. We won’t go into the details of writing mpi4py code, but will simply show how to set things up so an mpi4py-based job will run.

Here’s an example job script, illustrating how to load the needed modules and start the Python job:

#!/bin/bash
# Job name:
#SBATCH --job-name=test
#
# Account:
#SBATCH --account=account_name
#
# Partition:
#SBATCH --partition=partition_name
#
# Total number of tasks
#SBATCH --ntasks=48
#
# Wall clock limit:
#SBATCH --time=00:30:00
#
## Command(s) to run (example):
module load gcc openmpi python
mpirun python mpiCode.py > mpiCode.pyout

Here’s a brief ‘hello, world’ example of using mpi4py. Note that this particular example requires that you load the numpy module in your job script, as shown above. Also note that this example does not demonstrate the key motivation for using mpi4py, which is to enable more complicated inter-process communication:

from mpi4py import MPI
import numpy as np
comm = MPI.COMM_WORLD

# simple print out Rank & Size
id = comm.Get_rank()
print("Of ", comm.Get_size() , " workers, I am number " , id, ".")

# function to be run many times
def f(id, n):
    np.random.seed(id)
    return np.mean(np.random.normal(0, 1, n))

n = 1000000
# execute on each worker process
result = f(id, n)

# gather results to single ‘main’ process and print output
output = comm.gather(result, root = 0)
if id == 0:
    print(output)