Gallant Lab neuroscientists map the human brain using OpenStack and Amazon S3

October 3, 2017

Erica Chen

How might matter give rise to subjective experience? This question helps drive the Gallant Lab at UC Berkeley to find explanations for the mysteries of neuroscience. The Gallant Lab focuses much of its research on functional cartography of the brain, mapping areas of the brain that are involved in cognitive or motor functions.

To develop cortical maps, the Gallant Lab uses an inductive scientific approach called system identification, “a systematic approach for discovering the computational principles of an unknown system such as the brain.” Researchers then collect data through functional magnetic resonance imaging (fMRI) as subjects perform tasks like watching movies or listening to natural sounds. Afterwards, the computational models are fitted to the brain data using statistical methods like Bayesian analysis and machine learning. These experiments generate vast amounts of data that must be stored — a problem that postdoc James Gao and graduate student Anwar Nunez-Elizalde have solved with assistance from Research IT’s Cloud Computing Support program.

Finding Solutions

In addition to doing research, Gao spent a substantial portion of his time several years ago managing the Gallant Lab’s traditional monolithic cluster system, which used workstation models and RAID arrays for storage. However, a constant stream of large data sets required repeated expansion of the lab’s RAID storage capacity, which became unsustainable for the lab and for Gao to maintain. “Weirdly enough, storage space is our biggest limitation [rather than CPU],” Gao explained. “We kept on buying these bigger and bigger RAID units, and we kept on adding to our list of automounts, and it started getting completely out of hand.” When the lab received a sizeable hardware grant in August 2015, Gao and Nunez-Elizalde began searching for a better alternative to their original setup.

After getting connected to Berkeley Research Computing (BRC) consultant Aaron Culich, Gao and Nunez-Elizalde were able to lay out their requirements and get introduced to a variety of resources that could help them narrow down their search. For example, though they considered using the then-nascent Savio cluster, the Gallant Lab required ample storage for large data sets, which Savio was unable to accommodate at the time (BRC began offering a condo storage service much later). In the end, Gao and Nunez-Elizalde found their solution by building a local OpenStack cluster and storing their array data on local hardware using Ceph, a storage management system from Red Hat.

“When the system started getting really out of hand, I wanted to virtualize everything,” Gao said. “I wanted to make sure that everything was OpenStack, so if I needed to upgrade new software, I would just deploy a new image.” The virtualized cluster drastically reduced the amount of management needed, particularly because the software environments could be easily updated to meet the lab’s evolving requirements. Because the cluster could be configured to suit each job’s current computational needs, the lab could segment their computational power more efficiently than before. The setup not only allowed the Gallant Lab to be flexible with their computational resources, but also made collaboration more convenient. As many academic fields move towards collaboration, sharing data and resources on the internet is crucial to helping researchers work together. Gao adds, “A very simple advantage of the virtualized system is that we can very quickly spin up a server that’s connected to the wider internet that lets you share data, share a Python instance, share anything.”

However, in order to make their OpenStack + Ceph setup work, Gao and Nunez-Elizalde needed a way for OpenStack VMs to securely access data stored on local hardware or Amazon S3 via Ceph. The Amazon storage option, facilitated by Ceph’s S3-compatible API, will allow the Gallant Lab to store data in the Amazon cloud with a simple configuration change if local capacity is exhausted. Finding no suitable existing solution, Gao and Nunez-Elizalde created cottoncandy, a Python scientific library that can store and download array data via an Amazon S3-compatible API from the lab’s local cluster. While cottoncandy was still in development, Gao and Nunez-Elizalde used AWS credits provided by BRC to run continuous integration tests on the evolving bridge for data transport between the Gallant Lab and Amazon S3. “We had to have access to an S3 API [to test cottoncandy], but we didn’t want to have it open to the public,” Nunez-Elizalde said. “Aaron was nice enough to set up special [AWS credits for the continuous integration tests] … and now it’s all automatic.”
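To give a rough sense of what storing array data in an object store involves, here is a minimal Python sketch. It is not cottoncandy’s code or API: the names InMemoryBucket, upload_array, and download_array are hypothetical, and an in-memory dictionary stands in for an S3-compatible bucket so the example runs without any credentials or network access. The idea it illustrates is simply that an array can be serialized to bytes, compressed, and stored under a key alongside the metadata (such as its shape) needed to rebuild it later.

```python
import json
import zlib
from array import array


class InMemoryBucket:
    """Hypothetical stand-in for an S3-compatible bucket (key -> bytes + metadata)."""

    def __init__(self):
        self._objects = {}

    def put_object(self, key, body, metadata):
        self._objects[key] = (bytes(body), dict(metadata))

    def get_object(self, key):
        return self._objects[key]


def upload_array(bucket, key, values, shape):
    """Store a flat sequence of floats as one compressed object plus shape metadata."""
    expected = 1
    for dim in shape:
        expected *= dim
    if expected != len(values):
        raise ValueError("shape does not match the number of values")
    raw = array("d", values).tobytes()  # 8-byte floats in native byte order
    metadata = {"shape": json.dumps(list(shape)), "typecode": "d"}
    bucket.put_object(key, zlib.compress(raw), metadata)


def download_array(bucket, key):
    """Fetch an object and rebuild the flat values and the original shape."""
    body, metadata = bucket.get_object(key)
    values = array(metadata["typecode"])
    values.frombytes(zlib.decompress(body))
    return list(values), tuple(json.loads(metadata["shape"]))


# Round trip of a small 2 x 3 array under an illustrative key.
bucket = InMemoryBucket()
upload_array(bucket, "subject01/run1", [0.5, 1.5, 2.5, 3.5, 4.5, 5.5], (2, 3))
values, shape = download_array(bucket, "subject01/run1")
```

In a real deployment the bucket object would be replaced by a client pointed at an S3-compatible endpoint, which is what makes it plausible to switch between local Ceph storage and Amazon S3 by changing configuration rather than code.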

Gaining Access to Resources

The Gallant Lab’s virtualized cluster has been able to accommodate the lab’s needs for increased storage space, flexibility, and easier management. However, as cloud computing resources are constantly being updated, it can be difficult to keep track of what options are available at any point in time. “Aaron was very forthcoming with the amount of services that are available, and that has been very beneficial,” Gao said. Both Gao and Nunez-Elizalde, for example, were able to gain access to emerging XSEDE cloud resources that helped to extend and accelerate aspects of their work. Outside of the Gallant Lab, Gao is also employed by the Isacoff Lab, which used the Savio cluster extensively but needed a virtualized environment with a public IP address. After learning that Jetstream offers OpenStack capabilities, Gao was able to use Aaron’s XSEDE Campus Champion allocation to set up and run the environment for the Isacoff Lab. “Having [Aaron] help me through all of the [XSEDE] signup process[es] has been very beneficial in just getting clued in on what resources are available,” Gao added.

If you are a researcher like Gao and Nunez-Elizalde, there are a number of ways you can get connected to computing resources available at Berkeley and beyond. The D-Lab and Research IT offer a Cloud Computing Working Group that facilitates talks and trainings on cloud computing services from AWS to XSEDE. Research IT’s consultants are also happy to help develop a tailored set of options that fit your project’s needs. To get started, just send us an email at research-it@berkeley.edu.