Epidemiologists develop a computational tool to optimize study design

Screenshot, StudySimulator.com

Professor Jennifer Ahern of the School of Public Health and two staff researchers, Ellie Matthay and Scott Zimmerman, both holders of a Master of Public Health (MPH) degree, are developing a software tool called the Study Simulator, a web-based simulation generator that will allow investigators to identify optimal study designs and methods of analysis for research inquiries in public health and epidemiology. The group hopes the simulator will eventually be used in other observational research domains as well, such as public policy, economics, and sociology. The Study Simulator, according to Ahern, will be a user-friendly way for researchers to obtain input on their study design, comparable to the advice a biostatistician might provide. Study designs shaped by use of the web application, she explains, are most likely to yield the best estimate of the true effect on a population of the policy, program, or intervention under study. This, says Ahern, will help to avoid fruitless expenditure of researcher time and project funding.

Ahern’s team has been collaborating with Berkeley Research Computing’s (BRC) consultant staff, fine-tuning the architecture of the simulator to prepare it for development and testing on several vendor-provisioned cloud infrastructures, as well as on Savio, BRC’s shared High Performance Computing (HPC) cluster. Ahern, Matthay, and Zimmerman have found the BRC consulting team’s advice to be essential to the project’s progress. Ahern says, “These services are making researchers feel like their projects are supported, and will be able to thrive. The Study Simulator wouldn’t be happening without BRC.”

What is the Study Simulator?

The Study Simulator evolved out of Professor Ahern’s experience designing studies and working with biostatisticians. As she explains it, “Every time I would be thinking about a project, I would have to consider what the best study design would be. But how do you pick between the various ways to assess a public health problem? You go talk to a biostatistician. Inevitably the answer would be, ‘let’s do a simulation, and decide that way.’” It occurred to Ahern that “unless you talk to a biostatistician this isn’t how you would make the decision.” Instead, a researcher would use “broad and general guidelines” that could point her in the wrong direction. In considering the host of reasons a researcher might not (or might not be able to) consult with a biostatistician, Ahern realized that “if we could create some user friendly software that would let people provide a reasonable set of inputs that they’d know about the question they’re trying to study, and could take those inputs and generate simulated data that looked something like the world of the study, and apply designs/different analysis approaches and then see how each one performs, the researcher could then make a choice about what to do based on these simulations.”

Ellie Matthay provided an example consistent with Ahern’s interest in social epidemiology. “One question we were originally interested in [applied to] Oakland’s violence prevention initiative, from a few years ago: was it effective in reducing violence related injuries and deaths, or other public health issues related to violence?”

Matthay explained that a researcher would begin by “collecting the data out there related to those questions. Then they would consider the possible sources of error or bias in assembling this data.” Next, considering the systems and events that could be measured in relation to the violence prevention efforts, the researcher could “parameterize that, then go to the simulator and tell it you’re interested in these particular outcomes, and they’re distributed in this way, and my exposure was this particular intervention. [...] You would input important variables that you think might confound your study or cause bias in your study, then you’d check a whole bunch of different study designs and analyses, hit run and then wait a while.” Importantly, Matthay noted, pre-specifying the approaches to be tested and selecting among them using objective criteria of predictive performance together increase the rigor of a study’s conclusions.

“Ideally what would come back,” Matthay continued, “is a ranked list of the approaches that would give you the most accurate answer to your question [...] then you can go implement that study,” taking into account feasibility factors that might include costs and available sample sizes.
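The workflow Matthay describes can be illustrated with a small sketch. This is not the Study Simulator's code: the data-generating model, the parameter values, and the two candidate analyses below are all assumptions chosen for illustration. The sketch simulates many datasets in which the true effect of an exposure is known and a single binary confounder influences both exposure and outcome, applies each candidate analysis to every dataset, and ranks the analyses by root-mean-square error (RMSE) against the known truth.

```python
import random
import statistics

# Hypothetical "true" effect of the exposure on the outcome. Because the data
# are simulated, this value is known and each analysis can be scored against it.
TRUE_EFFECT = 1.0


def simulate_study(n, rng):
    """Generate one simulated dataset of (confounder, exposure, outcome) rows."""
    rows = []
    for _ in range(n):
        c = 1 if rng.random() < 0.5 else 0            # binary confounder
        a = 1 if rng.random() < 0.2 + 0.6 * c else 0  # exposure depends on confounder
        y = TRUE_EFFECT * a + 2.0 * c + rng.gauss(0.0, 1.0)  # outcome
        rows.append((c, a, y))
    return rows


def mean_difference(rows):
    """Mean outcome among the exposed minus mean outcome among the unexposed."""
    exposed = [y for _, a, y in rows if a == 1]
    unexposed = [y for _, a, y in rows if a == 0]
    return statistics.fmean(exposed) - statistics.fmean(unexposed)


def unadjusted(rows):
    """Candidate analysis 1: ignore the confounder entirely."""
    return mean_difference(rows)


def stratified(rows):
    """Candidate analysis 2: estimate within confounder strata, then average
    the stratum-specific estimates weighted by stratum size."""
    total = 0.0
    for level in (0, 1):
        stratum = [row for row in rows if row[0] == level]
        total += len(stratum) / len(rows) * mean_difference(stratum)
    return total


def rank_approaches(approaches, n=500, n_sims=200, seed=0):
    """Apply every approach to the same simulated datasets and return
    (name, RMSE) pairs sorted from most to least accurate."""
    rng = random.Random(seed)
    squared_errors = {name: [] for name in approaches}
    for _ in range(n_sims):
        rows = simulate_study(n, rng)
        for name, estimator in approaches.items():
            squared_errors[name].append((estimator(rows) - TRUE_EFFECT) ** 2)
    rmse = {name: statistics.fmean(errs) ** 0.5
            for name, errs in squared_errors.items()}
    return sorted(rmse.items(), key=lambda item: item[1])


ranking = rank_approaches({"unadjusted": unadjusted, "stratified": stratified})
for name, rmse in ranking:
    print(f"{name}: RMSE = {rmse:.3f}")
```

In this toy setup the analysis that accounts for the confounder should rank first, because the unadjusted comparison absorbs the confounder's effect into its estimate. A real tool would presumably let the researcher specify the outcome distribution, the exposure, and the suspected sources of bias, and would compare many more designs and analyses than the two shown here.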

The quality of public health and epidemiological research can make a material difference to public policy. For example, Ahern suggested, regarding issues related to the Mental Health Services Act, a policy the team is in the process of investigating, “if we could understand something about how that is or isn’t improving suicide or homelessness [outcomes], the problems it was intended to address, that’s helpful for other states trying to decide what to do about these problems and maybe make policy implementations of their own.”

Computational Support 

Scott Zimmerman, who discussed some of the project’s initial technical issues at a Research IT Reading Group in July of last year, has taken the lead on developing the Study Simulator software. Zimmerman worked closely with BRC consultants to fine-tune the simulator’s software architecture so that it could run on Amazon AWS and Microsoft Azure servers in the commercial cloud, taking advantage of resource grants those vendors awarded to the Study Simulator team, and, once the vendor-granted allocations are exhausted, on Savio, leveraging the BRC Faculty Computing Allowance granted to Professor Ahern.

As Ahern describes support provided by BRC consultants, “it’s totally invaluable to have BRC because Scott is the one person on our team who actually knows how to program at the level of creating software. If he gets stuck on something, there’s no one else [...] to help him. He needs to talk to someone more knowledgeable, with IT infrastructure / architecture experience. [With BRC to consult], it feels like he has backup for issues that -- without backup -- would stall the project in a long term way.”

Berkeley Research Computing continues to work with Professor Ahern and her research staff, and is eager to contribute computational expertise and resources to research projects across the full spectrum of UC Berkeley’s academic domains. If you think we might be able to help your group achieve its ambitions, please contact us via our web site or e-mail research-it@berkeley.edu.