Theoretical chemists use Savio to build computational molecular models

Eric Neuscamman

“Well, I’m not safe near real chemicals,” says UC Berkeley Professor Eric Neuscamman, a theoretical chemist working to develop increasingly accurate and computationally cost-effective methods of modeling electron behavior in molecules. To inform chemists’ experimentation, theoretical chemists use supercomputers to solve complex mathematical equations, including the Schrödinger equation, that yield predictive models of what molecules will do in chemical reactions. A member of Berkeley Research Computing’s (BRC) User Advisory Board, Neuscamman puts great value on the contribution of Savio, Berkeley’s shared High Performance Computing (HPC) cluster, to his research. In his words, “I can add to the chorus of voices that say we, without Savio, would not have a lab. It’s like our wet lab. We would have no lab without a computer to run our calculations on.” 

Quantum Chemistry, Inspiration, and Approximation

In quantum chemistry, molecular models are centrally concerned with electron activity -- accounting for where the electrons are in the atoms that compose the molecule, and how those electrons interact with each other. Modeling electron activity in molecules is, however, complex and computationally costly. Neuscamman explained, for example, that to model beta carotene, a mid-sized pigment molecule, its molecular space is first discretized, or broken down into boxes. The modeler then determines how many ways the molecule’s electrons can be arranged in these boxes and assigns a probability to each arrangement. For beta carotene, the number of possible arrangements is “greater than the number of atoms in the universe.” And because electrons operate under the laws of quantum mechanics, as Neuscamman emphasizes, “each of these arrangements exists simultaneously,” meaning that to build an “exact model of the electron activity, no arrangement can be ignored.” Classical computation over this much data, and at this level of complexity, is more than ‘expensive’ -- it’s physically impossible.
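The scale Neuscamman describes is easy to reproduce with elementary counting. The sketch below is illustrative only: the box count is a hypothetical stand-in (the real number depends on how the molecular space is discretized), and treating the count of arrangements as a simple binomial coefficient is a deliberate simplification of the actual quantum chemical bookkeeping.

```python
import math

# Illustrative only: count the ways to place k electrons into n "boxes"
# (discretized regions of molecular space) as a binomial coefficient.
# n_boxes is a hypothetical value chosen for this sketch; beta carotene
# (C40H56) does have 296 electrons, but real discretizations vary.
n_boxes = 500
n_electrons = 296

arrangements = math.comb(n_boxes, n_electrons)
atoms_in_universe = 10**80  # common order-of-magnitude estimate

print(f"arrangements ~ 10^{len(str(arrangements)) - 1}")
print(arrangements > atoms_in_universe)
```

Even with these modest, made-up dimensions, the count dwarfs the estimated number of atoms in the observable universe, which is the point of the comparison.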
Contributions of molecular geometry optimization software to experimental chemistry over the past two decades have allowed chemists to visualize how atoms are arranged in molecules. “Geometry optimizations fill a great need,” according to Neuscamman, “because it’s really hard to think about a twelve-atom molecule in three-dimensional space in your head. It’s enormously helpful if a computer can answer the question of whether the molecular geometry” is compatible with an experimental technique. Analogously, Neuscamman and other theorists work to provide experimental chemists with an “efficient and reliable predictive infrastructure for other molecular properties” based on knowledge derived from theoretical chemistry. He asks, “What if we could help chemists predict, with precision and reliability, what color a molecule is, its acidity, whether it’s poisonous or not, at what temperature it burns and melts? [...] The contributions of theoretical chemists to the future are these physical models and the algorithms that support them.”

But how do you model something that can’t be modeled with current computing technology? The answer: approximation. To contain the complexity of his calculations, thereby reducing computing costs, Neuscamman’s models restrict how electrons “see” each other, that is, how they repel each other as subatomic particles that bear a common negative electrical charge. He says, “we could consider that electrons don’t interact with each other repulsively at all, but while this would help lower computational costs, it would also ruin the physics. What is more useful is to assume we need only consider one electron at a time and how that electron is repelled by an average charge cloud of all the other electrons that it no longer sees individually.”

While this method of approximation offers computational efficiency, the resulting models are inexact. For example, as Neuscamman explains, the model might predict “the light emitted from a chemical process is visible, or not, dependent on the approximation error.” Or, the model could make incorrect predictions “in situations where it is important that electrons be able to see each other directly, as for example when breaking chemical bonds.” These limitations point to the fundamental question of Neuscamman’s research: how can approximate molecular models be optimized for accuracy without wildly increasing computing costs? 

Neuscamman has shown that by introducing just one extra electron into his model, a surprising degree of accuracy can be recovered. In this model, he considers electron pairs that interact with each other in specific ways but interact with the rest of the electrons in the molecule as an average charge cloud. This modeling technique has proven effective for the complicated chemical scenario of multiple bond association and dissociation -- that is, the formation and cleavage of strong chemical bonds that involve more than two electrons. Although there may be no exact quantum chemical theory that divulges electron activity in tricky multiple bond scenarios, such as the chemical process of oxygen production in photosynthesis, Neuscamman is encouraged “that relatively simple, approximate models can still give practical information about large, complicated systems, and lower the cost of computing.”

Computing Support

The hydrogen atom, composed of a single electron and proton, is one of the most complex structures for which calculations can be done with pen and ink. As Neuscamman emphasizes, “it’s been a while since paper” was an effective calculation tool for theoretical chemists. In his generation of quantum chemistry, “there are essentially no models without computation.” Neuscamman’s group has tapped into a variety of computing resources to do their research, including those supported by NERSC, a national scientific computing facility organized as a division of Lawrence Berkeley National Laboratory (LBNL), and Savio, the campus shared HPC cluster. While he has published two papers, with a third under review, for which Savio was the only computational resource used, the campus cluster is often a stepping stone toward larger national resources, giving Neuscamman readily accessible, state-of-the-art hardware on which to debug his code and test models of increasing size. Savio’s intermediate function allows Neuscamman to make the case that his code is ready to be scaled and, as he puts it, avoid turning NERSC use cycles into “waste cycles.”

Neuscamman’s research life cycle integrates well with Savio’s condo model, which is designed to optimize available compute resources. As a condo contributor, he can use his own nodes for weeks, uninterrupted, and when his group needs two or three times their normal compute resources, they can use the cluster’s idle compute cycles, including those owned by another condo contributor who may be in a different stage of their research. He says, “This week of debugging we had no jobs to run. But next week, while another group is debugging, we can run at 200%. This is spectacular for us.”

Another issue of waste in Neuscamman’s experience of research computing is resolved by the HPC administration offered at no cost to condo contributors. He explains, “A chemist, a system administrator, and an IT specialist are not the same. My group has no idea what we’re doing on a hardware level, and so [having the BRC Program provide] HPC administration gives us enormous freedom.” Primed by his experience as a graduate student with inconsistently maintained lab clusters, Neuscamman is certain that productivity increases when you’re “doing the thing you’re trained for.” Savio’s professional administration -- which, he opines, is surpassed in his experience as an HPC user only by the HPC management team at Lawrence Livermore National Laboratory (LLNL), a leader in supercomputing for decades -- is also an important factor in the competitive recruitment of both faculty and graduate students. Neuscamman offered the example of a rising graduate student choosing between two schools: at the first, where the computing infrastructure is well managed, “it’s easy to have fun with science,” while at the second, a hypothetical “computing basket case,” as Neuscamman says in reference to lab-maintained clusters, “it’s really hard and annoying to be a graduate student” because “to get your work done, you have to fight this beast of a computer.” Neuscamman continues, “I don’t want to lose good graduate students because of that. As awesome as information technology is to study, it’s not why chemists go to graduate school.”