Using Python on Savio

Loading Python and Packages | Installing Packages | Parallel Processing | Using MPI | Using GPUs | Running Python Interactively

This document describes how to use Python, a programming language widely used in scientific computing, on the Savio high-performance computing cluster at the University of California, Berkeley.

Loading Python and accessing Python packages

To run Python 2 code, load the Python 2.7.8 module into your current software environment. To do so on Savio, enter the following at any shell prompt:

module load python/2.7.8


Note: entering module load python without specifying a version number loads an older version, Python 2.6.6. That older version also comes with fewer provided Python modules, so unless you specifically need it, use module load python/2.7.8 instead.


To run Python 3 code, enter:

module load python/3.2.3

If you have loaded one of the Python 3 versions on the system, use the command python3 to run your code.
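
To confirm which interpreter you've actually picked up, a quick check from within Python itself may help (a minimal sketch; the version reported will depend on the module you loaded):

```python
import sys

# Report the version of the interpreter actually running,
# to confirm the expected Python module is on your path
version = ".".join(str(v) for v in sys.version_info[:3])
print("Running Python", version)
```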


Note: the most extensively supported version of Python on Savio is 2.7.8, which offers the largest set of provided Python modules. The Python 3.2.3 distribution is somewhat minimal.


Once you have loaded Python, you can determine what Python packages are provided by the system by entering the following:

module avail

In the output from this command, look for the section pertaining to Python that lists a variety of provided Python packages. (This will typically be displayed near the end of that output.)
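
You can also check from within Python whether a given package is importable in your current environment; here is a small standard-library sketch (the package names below are just examples):

```python
import importlib.util

# find_spec returns None when a package is not on the import path,
# so this reports whether each example package is importable
for name in ("json", "numpy"):
    found = importlib.util.find_spec(name) is not None
    print(name, "is importable" if found else "is NOT importable")
```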

To use one or more of the provided packages, such as when running an interactive job on Savio's compute nodes via srun, load the relevant module(s) before running the Python code that requires them; e.g., for numpy:

module load numpy
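
Once the module is loaded, a quick smoke test can confirm that the package imports and works; for instance, with numpy (a minimal sketch):

```python
import numpy as np

# A tiny computation verifying that numpy imports and works
a = np.arange(5)
total = a.sum()
print(total)
```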

When writing a job script file to be run on Savio's compute nodes via sbatch, it's recommended practice to include all of the needed module load commands near the end of that file, just before the commands that run your Python code; e.g.:

# Load Python and required Python modules
module load python/2.7.8
module load numpy
...

# Command(s) to run Python-based code go here
...


If you have loaded the Python 3.5.1 distribution, all provided packages are available without loading any additional modules.

Installing additional Python packages

You can also install additional Python packages, such as those available via the Python Package Index (PyPI), that are not already provided on the system. You’ll need to install them into your home directory, your scratch directory, or your group directory (if any), rather than into system directories.

First, after loading Python, enter module avail to check whether some or all of the packages you need might be already provided on Savio.

Second, to install additional packages, start by loading pip:

module load pip

Then, to install your packages in your home directory, for instance, use pip's --user option; e.g.:

pip install --user yourpackagenamegoeshere
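
Packages installed with --user go into your per-user site directory, which you can locate from within Python (a standard-library sketch):

```python
import site

# Directory where pip's --user option installs packages
# for the currently loaded Python interpreter
user_site = site.getusersitepackages()
print(user_site)
```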

Parallel processing in Python on Savio

While there are many packages for parallelization in Python, we’ll only cover IPython’s approach to parallelization in this section, in part because it integrates nicely with Savio’s job scheduler, SLURM. This is intended for situations where you can easily split up your computation into tasks that can be run independently (the computation in one task doesn’t depend in any way on the computation in other tasks, so no communication between tasks is needed). For more complicated kinds of parallelism, you may want to use MPI within Python via the mpi4py package, discussed briefly in the next section.

Multi-process parallelization on a single node

To use Python in parallel on a single node, we need to request resources on a node, start up IPython worker processes, and then run our Python code.

Here is an example job script, including the commands to start up the IPython worker processes, one for each core requested in the SLURM submission:

#!/bin/bash
# Job name:
#SBATCH --job-name=test
#
# Partition:
#SBATCH --partition=partition_name
#
# Request one node:
#SBATCH --nodes=1
#
# Request cores (24, for example)
#SBATCH --ntasks-per-node=24
#
# Wall clock limit:
#SBATCH --time=00:00:30
#
## Command(s) to run (example):
module load python/2.7.8
module load scikit-learn
ipcluster start -n $SLURM_NTASKS &
sleep 45 # wait until all engines have successfully started
ipython job.py > job.pyout

The Python code example below demonstrates how to make use of IPython's parallel functionality. This illustrates the use of a direct view (dispatching computational tasks in a simple way to the workers) or a load-balanced view (sequentially dispatching computational tasks as earlier computational tasks finish) to run code in parallel. It also shows how data structures can be broadcast to all the workers with push:

from ipyparallel import Client
c = Client()
c.ids

dview = c[:]
dview.block = True
dview.apply(lambda : "Hello, World")


# suppose I want to do leave-one-out cross-validation of a random forest statistical model
# we’ll define a function that fits to all but one observation and predicts for that observation
def looFit(index, Ylocal, Xlocal):
    rf = rfr(n_estimators=100)
    fitted = rf.fit(np.delete(Xlocal, index, axis = 0), np.delete(Ylocal, index))
    pred = rf.predict(np.array([Xlocal[index, :]]))
    return(pred[0])

dview.execute('from sklearn.ensemble import RandomForestRegressor as rfr')
dview.execute('import numpy as np')
# assume predictors are in 2-d array 'X' and outcomes in 1-d array 'Y'
# we need to broadcast those data objects to the workers
mydict = dict(X = X, Y = Y, looFit = looFit)
dview.push(mydict)

# need a wrapper function because map() only operates on one argument
def wrapper(i):
    return(looFit(i, Y, X))


n = len(Y)
import time
t0 = time.time()
# create a load-balanced view, then run a parallel map,
# executing the wrapper function on indices 0,...,n-1
lview = c.load_balanced_view()
lview.block = True
pred = lview.map(wrapper, range(n))
print("parallel map took %fs" % (time.time() - t0))

print(pred[0:10])

Parallelization across multiple nodes

To use IPython parallelization across multiple nodes, you need to modify #SBATCH options in your submission script and the commands used to start up the worker processes at the end of that script, but your Python code can stay the same.

Here’s an example submission script, with the syntax for starting the workers:

#!/bin/bash
# Job name:
#SBATCH --job-name=test
#
# Partition:
#SBATCH --partition=partition_name
#
# Total number of tasks
#SBATCH --ntasks=48
#
# Wall clock limit:
#SBATCH --time=00:00:30
#
## Command(s) to run (example):
module load python/2.7.8 scikit-learn
ipcontroller --ip='*' &
sleep 25
# next line will start as many ipengines as we have SLURM tasks because srun is a SLURM command
srun ipengine &  
sleep 45  # wait until all engines have successfully started
ipython job.py > job.pyout

Using MPI with Python

You can use the mpi4py package to provide MPI functionality within your Python code. This allows individual Python processes to communicate amongst each other to carry out a computation. We won’t go into the details of writing mpi4py code, but will simply show how to set things up so an mpi4py-based job will run.

Here’s an example job script, illustrating how to load the needed modules and start the Python job:

#!/bin/bash
# Job name:
#SBATCH --job-name=test
#
# Partition:
#SBATCH --partition=partition_name
#
# Total number of tasks
#SBATCH --ntasks=48
#
# Wall clock limit:
#SBATCH --time=00:00:30
#
## Command(s) to run (example):
module load gcc openmpi python/2.7.8 mpi4py numpy
mpirun python mpiCode.py > mpiCode.pyout

Here’s a brief ‘hello, world’ example of using mpi4py. Note that this particular example requires that you load the numpy module in your job script, as shown above. Also note that this example does not demonstrate the key motivation for using mpi4py, which is to enable more complicated inter-process communication:

from mpi4py import MPI
import numpy as np
comm = MPI.COMM_WORLD
# simply print out rank and size
id = comm.Get_rank()
print "Of", comm.Get_size(), "workers, I am number", id, "."
# function to be run many times
def f(id, n):
    np.random.seed(id)
    return np.mean(np.random.normal(0, 1, n))
 
n = 1000000
# execute on each worker process
result = f(id, n)
 
# gather results to single 'master' process and print output
output = comm.gather(result, root = 0)
if id == 0:
    print output

Running Python jobs on Savio’s GPU nodes with parallel computing code

To run Python jobs that contain parallel computing code on Savio's Graphics Processing Unit (GPU) nodes, you'll need to request one or more GPUs by including the --gres=gpu:x flag (where 'x' is 1, 2, 3, or 4, reflecting the number of GPUs requested), and also request two CPUs for every GPU requested. These options can be included in the job script file you submit via sbatch, or given as options to your srun command. For further details, please see the GPU example in the examples of job submissions with specific resource requirements.

You will then generally need to load the cuda module:

module load cuda

Perhaps the most common use of Python with GPUs is via machine learning and other software that moves computation onto the GPU but hides the details of the GPU use from the user. Packages such as TensorFlow and Caffe can operate in this way. For more details on using such software on the system, please ask the Savio user consultants for assistance.

In addition, you can directly use Python code to perform computations on one or more GPUs. To do this, you’ll generally need the PyCUDA package. (This package is not currently installed on Savio, so you’ll need to install it yourself, as described in “Installing additional Python packages,” above.)

Then in your Python code you’ll include commands that use the GPU. For example, here we’ll generate random numbers on the GPU:

import pycuda.autoinit
import pycuda.driver as drv
import pycuda.gpuarray as gpuarray
import pycuda.cumath as cumath
import numpy as np

n = np.int32(134217728)

start = drv.Event()
end = drv.Event()

x = np.random.normal(size = n)

start.record()
dev_x = gpuarray.to_gpu(x)
end.record()
end.synchronize()
print("Transfer to GPU time: %fs" %(start.time_till(end)*1e-3))

start.record()
dev_expx = cumath.exp(dev_x)
end.record()
end.synchronize()
print("GPU array calc time: %fs" %(start.time_till(end)*1e-3))
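
For comparison, the same exponential calculation on the CPU with numpy alone can be timed like this (a baseline sketch; any speedup from the GPU version will depend on array size and hardware):

```python
import time
import numpy as np

# Same exponential calculation, but on the CPU with numpy alone
n = 1000000  # smaller than the GPU example, purely for illustration
x = np.random.normal(size=n)

t0 = time.time()
y = np.exp(x)
print("CPU array calc time: %fs" % (time.time() - t0))
```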

To check on the current usage (and hence availability) of each of the GPUs on your GPU node, you can use the nvidia-smi command from the Linux shell within an interactive session on that GPU node. Near the end of that command's output, the "Processes: GPU Memory" table will list the GPUs currently in use, if any. For example, in a scenario where GPUs 0 and 1 are in use on your GPU node, you'll see something like the following. (By implication from the output below, GPUs 2 and 3 are currently idle - not in use, and thus fully available - on this node.)

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                                    Usage |
|=============================================================================|
|    0     32699     C  .../modules/langs/r/3.2.5/lib64/R/bin/exec/R   729MiB |
|    1     32710     C  .../modules/langs/r/3.2.5/lib64/R/bin/exec/R   729MiB |
+-----------------------------------------------------------------------------+

Running Python interactively (command line mode)

Step 1. Run an interactive shell

To use Python interactively on Savio's compute nodes, you can run an interactive bash shell as a job on a compute node, using the following example command (which uses the long form of each option to srun). From that shell, you can then launch Python or IPython on the compute node and work interactively with it.

(Note: the following command is only an example, and you'll need to substitute your own values for some of the example values shown here; see below for more details.)

srun --pty --partition=savio --qos=savio_normal --account=account_name --time=00:30:00 bash -i

For more information on running interactive SLURM jobs on Savio, please see Running Your Jobs.

Step 2: Run Python/IPython from that shell

Once you're working on a compute node, your shell prompt will change to something like this (where 'n' followed by some number is the number of the compute node):

[myusername@n0033 ...]

At that shell prompt, you can then enter the following to load the Python software module:

module load python/2.7.8

We recommend use of IPython for interactive work with Python. To start IPython for command line use, simply enter:

ipython

IPython provides tab completion, easy access to help information, useful aliases, and lots more. Type ? at the IPython command line for more details.