BRC supports cloud computing for Public Health Big Data course

Biostatistics Professor Lexin Li

In Spring 2015, Biostatistics Professor Lexin Li decided to use the Amazon Web Services (AWS) cloud computing platform to teach Big Data concepts and practices to graduate students in the School of Public Health. Berkeley Research Computing (BRC) Consultant Aaron Culich, Research IT’s lead Cloud Services Architect, supported Professor Li in planning, applying for cloud resource grants, and deploying cloud resources for instructional use.

Professor Li’s Public Health Seminar (PH290) introduced key Big Data approaches, tools, and techniques, including: data mining; machine learning methods; analysis of characteristically messy “real-world” data; cloud computing; and parallel processing using the Apache Spark software developed by the AMPLab.

The course integrated a series of BRC-led workshops into the curriculum, including: installing the Berkeley Common Environment (BCE) virtual machine on laptops; an introduction to using AWS and launching jobs from BCE; and an introduction to running Spark on AWS. The Spark workshop was led by Chris Paciorek, who serves as a BRC Consultant in addition to his longtime roles as a staff member of the Statistical Computing Facility (SCF) and as a researcher and lecturer in the Department of Statistics.

Spring 2015 cloud computing support for PH290 built on BRC’s earlier collaborations with the SCF, which used the AWS platform to support data science courses in the Department of Statistics. These courses were Introduction to Statistical Computing (Stat243), taught by Chris Paciorek in Fall 2014, and Reproducible and Collaborative Data Science (Stat157), co-taught in Fall 2013 by Department Chair Professor Philip Stark and Aaron Culich, and taught in Fall 2014 by Yannet Interian.

Modern methods of working with large, real-world data sets impose computational resource demands that often outstrip the capabilities of student laptops or lab workstations. Berkeley instructors have begun to turn to cloud computing both to meet this increased computational demand and to give students hands-on experience with tools and methods built on cloud provisioning models that are increasingly being adopted in research and industry.

If you are considering using cloud computing for the courses that you teach, BRC Consultants may be able to help. If you are already using cloud computing in your courses, we'd like to hear your stories and learn what works and what doesn't.