Sociologist uses Docker to switch workflows with ease

July 26, 2017

As both a visiting doctoral candidate at UCLA’s Department of Sociology and a lecturer in the School of Information’s Master of Information and Data Science program (MIDS), Brooks Ambrose needed an efficient way to organize his different workflows and switch between them throughout the day. Research IT’s cloud computing support service and campus communities helped him solve that problem. They introduced Docker environments into his workflow and gave him a support network through which he can get technical help.

Tracking the Rise and Fall of Academic Disciplines

Ambrose’s research examines how scholarship in the American social sciences evolved over the 20th century. He applies network community detection and survival analysis to archives of journal articles, using both their full text and their citations. This lets him map cultural networks that trace the boundaries of ideas over time, based on how cohesively scholars converge around important works. From these networks, Ambrose can locate the rise, fall, and rebirth of schools of thought alongside the transformations of higher education over the past century.

Ambrose explains that his research measures how rapidly higher education evolved during the 20th century. Over time, existing research fields were transformed as successors championed, supplanted, or ignored the ideas of established scholars. Each new generation of intellectuals implemented and normalized its own methods and ideas, so new schools of thought were born while others faded away. Ambrose aims not only to track the changes that occurred during this turnover but also to identify abandoned fields that could have remained productive and valuable. These observations could offer historical guidance for preserving forgotten fields of study. They could also strengthen the case for current research fields that are undergoing an evolution of their own.

Finding Support in a Large University

As a researcher and instructor, Ambrose wanted to find a non-competitive, collaborative community that could provide technical support. In his experience, most interaction within and between departments takes the form of “show-and-tell” sessions rather than opportunities to collaborate and build technical skills, which doesn’t meet his needs. Ambrose is a largely self-taught programmer and Linux user. He finds that several sources of support complement the professional emphasis on reporting findings:

- Research IT consultants
- campus communities such as the Berkeley D-Lab working groups
- the Slack channels these communities use to stay in touch between meetings

For example, he sometimes runs into a problem in the R programming language that has no clear solution online. In those cases, a known and trusted community gets him back on track quickly.

His workday is divided between research, teaching, and co-coordinating the D-Lab Computational Text Analysis Working Group. Ambrose therefore needed a way to compartmentalize his many workflows and keep his 2011 MacBook Air serviceable for all of them. He estimates that he once spent 20-30% of each day switching between the configurations required for these three contexts. That switching sapped his productivity and strained his aging laptop.

Using Docker to Transform Workflows

Aaron Culich of Research IT’s cloud computing support staff introduced Ambrose to Docker environments, which, he says, “fundamentally changed the way I work.” Docker runs software in lightweight, efficient containers. This lets Ambrose keep a separate image and software stack for his PhD research, his MIDS teaching, and his other workflows, so he can switch between work modes seamlessly in both his professional and personal life. Ambrose typically runs his Docker containers in the cloud. He says this gives him the flexibility to work on a more powerful remote server and spares him an expensive laptop upgrade. “I didn’t feel as tied down anymore,” he explains. “[Now,] I prefer to work from a remote server.” The change also lets Ambrose back up his work in the cloud, so if something happened to his laptop, he could recover easily.

Recreating his PhD software stack as a Docker image took about two weeks of effort spread over 4-5 months. Ambrose emphasizes that learning the new environment and workflow was well worth the time. He now has an organized, stable, and reproducible “starting point” for research. He can run it on his laptop or on various cloud and remote servers, or share it with others, much as GitHub makes code portable. “The workflow is more complicated, but it’s not harder, and the complications are worth it,” Ambrose says. “To be able to take for granted that I could just sit down and actually get to work [instead of taking time to change my environment manually…meant that] I could get that time back.”

Ambrose is now working out how best to teach these methods and technologies to his students so that they can enjoy the same benefits of a Docker-based workflow. His forays into curriculum design also draw on consulting tips he picked up from Research IT staff, such as using asciinema to record command-line tutorials. “Our students all have day jobs, and these time efficiencies that can be translated into saving time for them are really critical,” he says. “I think that if some of the tech training that I found in the consulting environment ends up becoming part of the mainstream curriculum in social science departments...there could be a [drastic] change in productivity.”

If you’re interested in finding a supportive community for data-intensive social science here at Berkeley, joining a D-Lab Working Group is a great way to gain new skills, regardless of your discipline or department. The D-Lab offers self-organized working groups spanning a wide range of interests, from qualitative and geospatial methods to machine learning. If you’d like to learn more about cloud computing support, or want advice on which services might fit your needs, feel free to email us at research-it@berkeley.edu.