Open OnDemand Overview

Overview¶

We now provide various interactive Apps through the browser-based Open OnDemand service available at https://ood.brc.berkeley.edu.

Apps/services include:

OOD Desktop
Jupyter notebooks
RStudio
MATLAB via a Desktop environment
VS Code
File browsing
Slurm job listing
Terminal/shell access (under the "Clusters" tab)

The Open OnDemand service is relatively new. Please let us know if you run into problems or have suggestions or feature requests.

Logging In¶

Visit https://ood.brc.berkeley.edu in your web browser.
Use your BRC username and PIN+one-time password (OTP).
- These are the same credentials you use to log in to Savio via SSH.
- The username is only your BRC username and should not include the part after any @ sign.
- CORRECT username format: yourusername
- INCORRECT username format: yourusername@hpc.brc.berkeley.edu, yourusername@berkeley.edu
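
If you build the username in a script (for example, from an email address), the rule above amounts to dropping everything from the first @ sign onward. A minimal sketch using shell parameter expansion; the function name brc_username is hypothetical and not part of any BRC tool:

```shell
# Hypothetical helper: strip any "@domain" suffix from a login name.
brc_username() {
    # ${1%%@*} removes the longest suffix that starts at the first "@".
    printf '%s\n' "${1%%@*}"
}

brc_username "yourusername@berkeley.edu"
brc_username "yourusername"
```

Both calls print the bare username, which is the form the login page expects.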

Service Unit Charges¶

Open OnDemand apps may launch Slurm jobs on your behalf when requested. Open OnDemand refers to these jobs as "interactive sessions." Since these are just Slurm jobs, service units are charged for interactive sessions the same way normal jobs are charged.

Interactive sessions running on nodes whose hostnames end in .testbed0 do not cost service units. These are shared nodes provided for low-intensity jobs and should be treated like login nodes (that is, no intensive computation is allowed).

Job time is counted for interactive sessions as the total time the job runs. The job starts running as soon as a node is allocated for the job. The interactive session may still be running even if you do not have it open in your web browser. You can view all currently running interactive sessions under My Interactive Sessions. When you are done, you may stop an interactive session by clicking "Delete" on the session.

There are several ways to monitor usage:

Since Open OnDemand submits jobs through Slurm, you can monitor usage as you would monitor your regular Slurm jobs.
View currently running (and recent) sessions launched by Open OnDemand under My Interactive Sessions.
View all currently running jobs under Jobs > Active Jobs.
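
From a terminal, the first option corresponds to the standard Slurm commands. The following is a sketch only: it assumes squeue and sacct are on your PATH (as on a login node), and the function names, column selection, and start date are illustrative choices rather than a BRC recommendation.

```shell
# List your queued and running jobs, including OOD interactive sessions.
show_my_jobs() {
    squeue -u "$USER"
}

# Summarize your jobs since a given date (YYYY-MM-DD).
# -X reports one line per job allocation; the columns are an illustrative choice.
show_my_usage() {
    sacct -u "$USER" -X -S "$1" --format=JobID,JobName%30,Partition,Elapsed,State
}

# Only call Slurm where it is installed (e.g., on a Savio login node).
if command -v squeue >/dev/null 2>&1; then
    show_my_jobs
    show_my_usage "$(date +%Y-%m-01)"   # since the first day of this month
fi
```

The Elapsed column is the job run time that the charging rules above refer to.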

Using Open OnDemand¶

Here are the services provided via Open OnDemand.

Files App¶

Access the Files App from the top menu bar under Files > Home Directory. Using the Files App, you can use your web browser to:

View files in the Savio filesystem.
Create and delete files and directories.
Upload and download files between the Savio filesystem and your computer.
We recommend using Globus for large file transfers.

Screenshot: OOD Files App

View Active Jobs¶

View and cancel active Slurm jobs from Jobs > Active Jobs. This includes jobs started via sbatch and srun as well as jobs started (implicitly) via Open OnDemand (as discussed above).

Screenshot: OOD Active Jobs

Shell Access¶

Open OnDemand allows Savio shell access from the top menu bar under Clusters > BRC Shell Access.

Screenshot: OOD Shell Access

Interactive Apps¶

Open OnDemand provides additional interactive apps. You can launch interactive apps from the Interactive Apps menu on the top menu bar. The available interactive apps include:

Desktop App (for working with GUI-based programs)
Jupyter Server (for working with Jupyter notebooks)
RStudio Server (for working in RStudio sessions)
Code Server (VS Code) (for code editing using Visual Studio Code)

Screenshot: OOD Interactive Apps

Desktop App¶

The OOD Desktop App allows you to run programs that require graphical user interfaces (GUIs) on Savio, replacing our previous Visualization node service.

Intended Usage

When possible, you should carry out your computation via the traditional command line plus Slurm functionality. OOD Desktop is intended for programs that require GUIs. Furthermore, if you need to use Jupyter notebooks, RStudio, VS Code, or the MATLAB GUI, we provide specialized interactive apps that you should use instead of the OOD Desktop App.

Before getting started, make sure you have access to the Savio cluster, as you will need your BRC username and one-time password to log in.

Starting the Desktop App¶

Connect to https://ood.brc.berkeley.edu

After you log in with your BRC username and one-time password (OTP), OnDemand presents a welcome screen. Open the "Interactive Apps" pulldown and choose "Desktop", either on a shared node (for exploration and debugging) or via Slurm (for computationally intensive work).

Fill out the form presented to you and then press "Launch". (At this time, the only partition on which the Desktop app can be launched when computing via Slurm is savio2_htc, as we assume that most GUI usage involves programs using one or a small number of cores.) After a moment, the Desktop session will be initialized and you will be able to specify the image compression and quality options. If you are unhappy with the default values, you can relaunch the session from this page with different choices. Then press "Launch Desktop" and the Desktop will open in a new tab.

Interacting with Files¶

Your Desktop session is running directly on Savio, and can interact with your files either through the command line as usual or through the file manager.

Screenshot: OOD Desktop - Files

To open a command line terminal, right click anywhere on the Desktop and select "Open Terminal Here".

Screenshot: OOD Desktop - Open Terminal

Jupyter Server¶

See the Jupyter documentation page for instructions on using Jupyter notebooks via Open OnDemand.

This service replaces the JupyterHub service that we formerly provided.

When using the "Jupyter Server - compute via Slurm in Slurm partitions" app, service units are charged based on job run time. The job may still be running if you close the window or log out. When you are done, shut down your Jupyter session by clicking "Delete" on the session under My Interactive Sessions. You can confirm that the interactive session has stopped by checking My Interactive Sessions.

Screenshot: OOD Jupyter Server

RStudio Server¶

The RStudio server allows you to use RStudio on Savio, either run as part of a Slurm batch job ("compute via Slurm using Slurm partitions") or (for non-intensive computations) on our standalone Open OnDemand server ("compute on shared OOD node"). Use of the standalone Open OnDemand server doesn't use any FCA service units or tie up a condo node, but you are limited to 8 GB memory and should only use a few cores.

Select the relevant RStudio Server under Interactive Apps
Provide the job specification you want for the RStudio server (for the non-Slurm option, you'll just have to provide a time limit).
Once RStudio is ready, click Connect to RStudio Server to access RStudio.

For the Slurm-based option, service units are charged based on job run time. The job may still be running if you close the window or log out. When you are done, shut down RStudio by clicking "Delete" on the session under My Interactive Sessions. You can confirm that the interactive session has stopped by checking My Interactive Sessions.

Installing R packages

For certain R packages that involve more than simple R code, installing packages from within RStudio will fail because certain environment variables do not get passed into the RStudio instance running within OOD. Instead, please start a command-line based R session in a terminal and install R packages there. Once installed, these R packages will be usable from RStudio.

Accessing environment variables

Various environment variables, in particular Slurm-related variables such as SLURM_CPUS_ON_NODE, are not available from within RStudio, either via Sys.getenv() or via system().

Code Server (VS Code)¶

Code Server allows you to use Visual Studio Code from your web browser to edit files. The Code Server runs on shared nodes (.testbed0), so you are not charged any service units for using this app.

Select Code Server from the Interactive Apps menu.
Specify the amount of time you would like the Code Server to run.
Once the Code Server is ready, click Connect to VS Code to access VS Code.

VS Code remote SSH

For security reasons, users cannot use VS Code's remote SSH feature with Savio via the command line in a terminal. Instead, Savio users should access VS Code via OOD following the above instructions.

GitHub Copilot in VS Code¶

If you wish to use GitHub Copilot, you can do so by installing the extension using a VSIX file. You cannot install GitHub Copilot in VS Code using the built-in marketplace on OOD at this time. These instructions will likely also apply to installing other VS Code extensions that are not available in the extension marketplace.

We recommend downloading the version of GitHub Copilot which was first released following the release of the version of VS Code you are using. The latest version will likely not be compatible. For example, for version 4.11 of VS Code on OOD we recommend installing version 1.78.9758 of GitHub Copilot. This version of GitHub Copilot can be downloaded here by navigating to the version history tab and then clicking download next to the correct version. This will download the VSIX file to your computer.

To install, you then need to upload the VSIX file to Savio. Next, start a VS Code session using OOD. Once you have opened the VS Code session, navigate to the extension sidebar and click on the three dots towards the top right of the sidebar (Views and More Actions...). Then select the Install from VSIX... option and use the prompt to select the VSIX file you uploaded. You should be prompted with a message that says Sign in to use GitHub Copilot. and a button that says Sign in to GitHub. Click on the button and follow the instructions to sign in to GitHub. If you are not immediately prompted with Sign in to use GitHub Copilot. in the bottom right, please terminate the session and start a new VS Code session in OOD. At this point you should have successfully installed and set up GitHub Copilot in VS Code on OOD. Please note that you will have to sign in to GitHub Copilot each time you start a new VS Code session on OOD.

Running computations from within VS Code¶

It is also possible to use a Savio compute node to run code interactively while using VS Code. To do this, you first need to start a Jupyter server on a compute node. This can be done via an interactive session or a batch job that executes the following line of code.

jupyter notebook --no-browser --NotebookApp.allow_origin='*' --NotebookApp.ip='0.0.0.0'

If you need more information on starting an interactive session or submitting a job, please see our support page on submitting jobs.
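
A batch job that starts the server might look like the following sketch. The account name fc_myproject is a placeholder, the partition and time limit are illustrative, and the script assumes that module load python provides the jupyter command; depending on your account you may need further options (such as a QoS).

```shell
# Write a job script that starts a Jupyter server on a compute node.
# The account name is a placeholder; partition and time limit are illustrative.
cat > jupyter_server.sh <<'EOF'
#!/bin/bash
#SBATCH --job-name=jupyter-server
#SBATCH --account=fc_myproject
#SBATCH --partition=savio2_htc
#SBATCH --ntasks=1
#SBATCH --time=02:00:00
#SBATCH --output=jupyter_server_%j.out

# Assumption: this module provides the jupyter command.
module load python

jupyter notebook --no-browser --NotebookApp.allow_origin='*' --NotebookApp.ip='0.0.0.0'
EOF

# Submit only where Slurm is available (e.g., on a Savio login node).
if command -v sbatch >/dev/null 2>&1; then
    sbatch jupyter_server.sh
fi
```

With the --output setting used here, the server's startup messages are written to jupyter_server_<jobid>.out in the directory you submitted from. Remember that the job is charged for as long as it runs, so cancel it when you are done.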

You next need to locate the URL for the Jupyter server. This will print in your command line for an interactive session or will be located in the out file for a batch job. The URL should generally be of the form http://<node.partition>:<port>/?token=<token>. The URL which contains an IP address instead of a node name will not work. Copy this link before proceeding to the next step.
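
For a batch job, you can pull the URL out of the output file rather than reading through it. This is a sketch, not a BRC-provided tool: the function name find_jupyter_url is hypothetical, and the log excerpt below is made up to mimic Jupyter's startup message. The function skips URLs whose host is an IPv4 address, since those will not work.

```shell
# Print the first tokenized Jupyter URL in a log file whose host is a node
# name rather than an IPv4 address.
find_jupyter_url() {
    grep -Eo 'https?://[^[:space:]]+\?token=[[:alnum:]]+' "$1" \
        | grep -Ev '^https?://[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+[:/]' \
        | head -n 1
}

# Example with a made-up log excerpt (node name and token are placeholders).
log=$(mktemp)
printf '%s\n' \
    '    http://n0012.savio2:8888/?token=abc123' \
    ' or http://127.0.0.1:8888/?token=abc123' > "$log"
find_jupyter_url "$log"
rm -f "$log"
```

In practice you would pass the job's output file as the argument instead of the temporary file used here.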

Next, in VS Code, when prompted to select a kernel, choose Existing Jupyter Server.... Then paste the link in the text box (VS Code may do this automatically if given permission in your browser) and press Enter. At this point you may need to refresh the page in which VS Code is running due to a bug. Then try again to select a kernel and again choose Existing Jupyter Server.... You should see an option of the form Remote - node.partition. Select this option, then choose the Jupyter kernel you would like to use for the notebook from the list; it should start on the compute node. From this point, you should be able to use the notebook/interactive session in VS Code as normal, and computation will be executed on the compute node.

Troubleshooting Open OnDemand¶

Common problems¶

If you have trouble logging into OOD (in particular if the login pop-up box keeps reappearing after you enter your username and password), you may need to make sure you have completely exited out of other OOD sessions. This could include closing browser tab(s)/window(s), clearing your browser cache and clearing relevant cookies. You might also try running OOD in an incognito window (or if using Google Chrome, in a new user profile) or in a different browser (such as Google Chrome, Safari, or Firefox). For instructions on clearing your browser cache and cookies for the different browsers, see the links below:

Problem: when I log in to OOD, I immediately get the error: "can't find user for YOUR-USERNAME. Run 'nginx_stage --help' to see a full list of available command line options"¶

This error occurs if your account has not been correctly set up to use OOD. Please contact us.

Problem: my OOD apps on shared nodes (VS Code, Jupyter shared node app, RStudio shared node app) never start¶

In some situations when using apps that run outside of a Slurm job (VS Code and Jupyter Server sessions on the shared Jupyter node), the session starts to be created, but the user is never provided with the "Connect" button on the "Interactive Sessions" page and the session terminates in a minute or two. We've seen three causes for this.

One possibility is that you have the base Conda environment initialized automatically whenever you log in. This will generally be the case if you have run conda init and thereby modified your .bashrc file as discussed here. If so, you will generally see "(base)" appear at the beginning of your shell prompt, e.g. (base) [your_username@ln002 ~]$. The solution is to tell Conda not to enter the base environment when you log in to Savio. Once in your shell, simply run

conda config --set auto_activate_base False
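
To check from a script whether this cause applies, you can test the CONDA_DEFAULT_ENV variable, which Conda sets when an environment is activated. A minimal sketch; the function name conda_base_is_active is hypothetical:

```shell
# Succeeds if the current shell has Conda's base environment active.
# CONDA_DEFAULT_ENV is set by Conda when an environment is activated.
conda_base_is_active() {
    [ "${CONDA_DEFAULT_ENV:-}" = "base" ]
}

if conda_base_is_active; then
    echo "base environment is active"
else
    echo "base environment is not active"
fi

# Where conda is installed, also show the current value of the setting.
if command -v conda >/dev/null 2>&1; then
    conda config --show auto_activate_base
fi
```

Run this in a fresh login shell, since the setting only takes effect for shells started after you change it.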

A second possibility is that you have installed a version of some Python package in your ~/.local directory (via pip install --user) that is masking the system version of the same package and interfering with starting your session. Please try moving aside your ~/.local directory (e.g., mv .local .local.save in a terminal) and trying again to see if that allows your session to start.

A third possibility is that you have something configured in your .bashrc file that is preventing the OOD session from starting. Try saving a copy of your .bashrc and then switching to a plain version:

cp ~/.bashrc ~/.bashrc.save

Then either remove items from your .bashrc or create a very simple .bashrc that looks like this:

# .bashrc
# Source global definitions
if [ -f /etc/bashrc ]; then
  . /etc/bashrc
fi

Problem: my OOD apps on shared nodes (VS Code, Jupyter shared node app, RStudio shared node app) report "permission denied (gssapi-keyex,gssapi-with-mic,keyboard-interactive)"¶

This can occur if permissions are incorrectly set on your home directory or your .ssh directory, preventing OOD from using SSH to connect to the shared node. Make sure that your home directory is not world-writeable. In the following example, the final "w" in the permission string indicates that the user's home directory is world-writeable, and chmod o-w removes that permission.

[bugs_bunny@ln002 users]$ ls -ld bugs_bunny
drwxrwxrwx 1 bugs_bunny ucb 1048576 Mar 29 16:06 bugs_bunny
[bugs_bunny@ln002 users]$ chmod o-w ~bugs_bunny
[bugs_bunny@ln002 users]$ ls -ld bugs_bunny
drwxrwxr-x 1 bugs_bunny ucb 1048576 Mar 29 16:06 bugs_bunny

You might also need to modify the permissions on your .ssh directory to look like this:

[bugs_bunny@ln002 ~]$ ls -ld .ssh
drwx------ 1 bugs_bunny ucb 348 Jan  6 15:58 .ssh
[bugs_bunny@ln002 ~]$ ls -l .ssh 
total 104
-rw-r--r-- 1 bugs_bunny ucb  1013 Jan  6 15:58 authorized_keys
-rw------- 1 bugs_bunny ucb   672 Aug  2  2019 cluster
-rw-r--r-- 1 bugs_bunny ucb   610 Aug  2  2019 cluster.pub
-rw------- 1 bugs_bunny ucb    98 Jan  6 15:58 config
-rw-r--r-- 1 bugs_bunny ucb 79896 Mar 16 13:22 known_hosts
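
The permissions in the listings above can be applied in one step. The following is a sketch under the assumption that every file in .ssh other than authorized_keys, known_hosts, and *.pub should be private; the function name fix_ssh_permissions is hypothetical.

```shell
# Apply the permissions shown above to a home directory and its .ssh directory.
# $1 is the home directory (defaults to $HOME).
fix_ssh_permissions() (
    home_dir="${1:-$HOME}"
    chmod o-w "$home_dir"                 # home must not be world-writeable
    if [ -d "$home_dir/.ssh" ]; then
        chmod 700 "$home_dir/.ssh"        # drwx------
        for f in "$home_dir"/.ssh/*; do
            [ -f "$f" ] || continue
            case "${f##*/}" in
                # Public material: -rw-r--r--
                authorized_keys|known_hosts|*.pub) chmod 644 "$f" ;;
                # Everything else (private keys, config): -rw-------
                *) chmod 600 "$f" ;;
            esac
        done
    fi
)
```

Calling fix_ssh_permissions with no argument operates on your own home directory; afterwards, ls -ld ~ ~/.ssh and ls -l ~/.ssh should match the listings above.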

Problem: my OOD apps report an SSH or connection error¶

If OOD reports an error related to SSH or a connection problem, the issue may be that your account is not properly configured to be able to connect to the Savio compute nodes from the Savio login nodes. To troubleshoot:

First, check that you can log in to Savio by using SSH to connect to a login node.
Second, check that you can ssh between nodes on Savio. For example, try to ssh to the DTN from a terminal session on one of the Savio login nodes:
```
ssh dtn
```
Third, check that you can connect to the shared node that OOD's non-Slurm-based sessions use. Try to ssh to the shared node from a terminal session on one of the Savio login nodes:
```
ssh n0003.testbed0
```

If any of these tests do not work, please contact us.
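
The second and third checks can be run non-interactively, which is closer to how OOD itself connects. This is a sketch only: check_ssh is a hypothetical helper, and a FAILED result can also mean that the host key has not been accepted yet, so rerun the plain ssh command to see the actual message.

```shell
# Report whether a non-interactive SSH connection to a host succeeds.
# BatchMode=yes makes ssh fail instead of prompting for a password.
check_ssh() {
    if ssh -o BatchMode=yes -o ConnectTimeout=10 "$1" true >/dev/null 2>&1; then
        echo "$1: OK"
    else
        echo "$1: FAILED"
    fi
}

# Run from a terminal session on a Savio login node:
#   check_ssh dtn
#   check_ssh n0003.testbed0
```

If you contact us, including the output of the failing plain ssh command will help with diagnosis.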

Problem: my OOD Jupyter kernel or my RStudio's R session keeps dying¶

There can be various reasons a Jupyter kernel in an OOD Jupyter session or an RStudio R session may repeatedly die.

Various users have reported problems when using Jupyter notebooks with the Safari web browser. (If you try to open a Terminal under the OOD Clusters tab, you will probably see a "websocket connection" error message.) Try using Chrome or another browser and see if the problem persists.
If your code uses more memory than available to your session, the Jupyter kernel or R session can die without telling you why. In particular this is likely to occur when using a Jupyter Server or RStudio Server that uses the shared Jupyter/OOD node (i.e., outside of a Slurm job) because these sessions are limited to 8 GB of memory. Try running your Jupyter notebook or RStudio in a Slurm-based session and see if the problem persists.

Problem: Slurm-based OOD sessions never start¶

In some situations when starting a session that is Slurm-based (e.g., Jupyter server sessions that use the Savio partitions, RStudio sessions, or MATLAB sessions), the session starts to be created, but the user is never provided with the "Connect" button on the "Interactive Sessions" page and the session terminates in a minute or two. This can occur because of problems with your Slurm configuration or with your account's access to Savio software modules.

First, check that you can run regular Savio jobs (outside of OOD) using either srun or sbatch.

Second, check that you can load Savio software modules from within a terminal on a Savio login node. For example, check that you can run module load python without error and that your MODULEPATH environment variable looks like this:

echo $MODULEPATH
## /global/software/sl-7.x86_64/modfiles/langs:/global/software/sl-7.x86_64/modfiles/tools:/global/software/sl-7.x86_64/modfiles/apps:/global/home/groups/consultsw/sl-7.x86_64/modfiles
module purge
module load python
module list
##  1) python/3.7

If you have problems in either case, please contact us.
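
The MODULEPATH comparison can also be scripted. A minimal sketch, with the directory list taken from the expected output shown above and a hypothetical helper name modulepath_contains:

```shell
# Succeeds if directory $1 is one of the colon-separated entries in MODULEPATH.
modulepath_contains() {
    case ":${MODULEPATH:-}:" in
        *":$1:"*) return 0 ;;
        *)        return 1 ;;
    esac
}

# Directories taken from the expected MODULEPATH shown above.
for dir in \
    /global/software/sl-7.x86_64/modfiles/langs \
    /global/software/sl-7.x86_64/modfiles/tools \
    /global/software/sl-7.x86_64/modfiles/apps
do
    if modulepath_contains "$dir"; then
        echo "found:   $dir"
    else
        echo "missing: $dir"
    fi
done
```

Any line reported as missing is worth mentioning when you contact us.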

Problem: OOD sessions on the shared node never start and report being in a "bad state"¶

This may be caused by instability of the shared node. Please try the following steps to delete your session and start over:

Delete the job using the Delete button to the right.
If that doesn't work, log out and back into the OOD web portal.
If that doesn't work, delete the folder under ~/ondemand/data/sys/dashboard/batch_connect/db corresponding to the app having the “bad state” problem.
If none of those steps work, please contact us.

Problem: OOD apps fail to start with an error message about rsync/file IO/disk quota¶

If you see this message when trying to start an OOD app,

rsync: close failed on "/global/home/users/smith/ondemand/data/sys/dashboard/batch_connect/sys/..." Disk quota exceeded (122)
rsync error: error in file IO (code 11) at ...

it indicates that you've exceeded your disk quota in your home directory. Please see this FAQ for information on how to reduce your usage.
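
To see where the space in your home directory is going, a du summary is often enough. A sketch, assuming GNU du (for the -d option); the function name dir_sizes is hypothetical:

```shell
# List a directory's immediate subdirectories by size in KiB, largest last.
# The final line is the total for the directory itself.
dir_sizes() {
    du -k -d 1 "${1:-$HOME}" 2>/dev/null | sort -n
}

# Example: show the ten largest entries in your home directory.
#   dir_sizes "$HOME" | tail -n 10
```

Hidden directories such as ~/.local and ~/.conda are included, and they are common places for usage to build up unnoticed.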

General information for troubleshooting¶

Logs and scripts for each interactive session with Open OnDemand are stored in:

~/ondemand/data/sys/dashboard/batch_connect/sys

There are directories for each interactive app type within this directory. For example, to see the scripts and logs for an RStudio session, you might look at the files under:

~/ondemand/data/sys/dashboard/batch_connect/sys/brc_rstudio-compute/output/b5733507-a750-4bb9-8d4b-710618ce0de1

where b5733507-a750-4bb9-8d4b-710618ce0de1 corresponds to a specific session of an OOD app (the RStudio app in this case).
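
When troubleshooting, you usually want the most recent session. The sketch below finds it by modification time, following the app/output/session layout described above. The function name latest_ood_session is hypothetical, and the log file name output.log is an assumption about what the session directory contains.

```shell
# Print the most recently modified session directory for any OOD app.
# $1 overrides the default location given above.
latest_ood_session() (
    base="${1:-$HOME/ondemand/data/sys/dashboard/batch_connect/sys}"
    ls -td "$base"/*/output/*/ 2>/dev/null | head -n 1
)

session_dir=$(latest_ood_session)
if [ -n "$session_dir" ]; then
    echo "Latest session: $session_dir"
    # Assumption: the session's main log is named output.log.
    if [ -f "${session_dir}output.log" ]; then
        tail -n 20 "${session_dir}output.log"
    fi
fi
```

If nothing is printed, no session directories exist yet under that location.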

The BRC Open OnDemand interactive apps configuration is on GitHub. Additional information about Open OnDemand configuration is available in the Open OnDemand documentation.