As research becomes increasingly data-driven, the challenge of managing data efficiently and transparently is more pressing than ever. At UC Berkeley, a new campus pilot project is exploring the use of machine-actionable data management plans — a next-generation approach to data planning that enables automation, integration, and improved tracking across the research lifecycle. Led by Erin Foster and Anna Sackmann, the pilot is part of a national initiative coordinated by the California Digital Library and the Association of Research Libraries. This work focuses on evaluating how machine-actionable data management plans can improve data management practices, streamline compliance with funder requirements, and better connect research planning with real-world outputs.
Erin Foster, Service Lead for Berkeley’s Research Data Management (RDM) Program, began her career as a medical librarian. Working closely with health researchers who needed support handling research data, she gradually moved into her current hybrid role that straddles library services and research IT.
Anna Sackmann, Data Services Librarian, runs the Library Data Services Program. With a background as an engineering librarian, she became deeply involved in data management as federal funding agencies began requiring more structured data practices. Together, Foster and Sackmann bring deep expertise in supporting researchers—particularly those writing data management plans for National Institutes of Health (NIH) and National Science Foundation (NSF) grants.
Traditional data management plans are static documents—typically PDFs—that outline how researchers plan to manage and share data. But once submitted, they often go untouched. Machine-actionable data management plans are designed to change that. These digital, dynamic plans are structured in a way that can allow their information to interact with research tools and systems such as data repositories and publishing platforms.
“Instead of sitting in a drawer, machine-actionable data management plans can become living documents,” Foster explains. “They can track connections between data outputs and publications, making it easier to ensure compliance and trace the impact of research.”
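To make the idea concrete, here is a minimal sketch of what "machine-actionable" can mean in practice: the plan is stored as structured data rather than a PDF, so software can check it for missing details and attach research outputs to it later. Everything in this example is an assumption made for illustration. The field names (`repository`, `size_bytes`, `output_ids`), the sample plan, and the helper functions `find_gaps` and `link_output` are hypothetical and do not reflect the DMPTool's actual data model or any official schema.

```python
import copy

# Hypothetical plan record. Field names are illustrative only,
# not taken from the DMPTool or any official schema.
plan = {
    "title": "Example field study",
    "funder": "NSF",
    "datasets": [
        {"title": "Raw sensor readings", "repository": "example-repository",
         "size_bytes": 5_000_000_000, "output_ids": []},
        {"title": "Survey responses", "repository": None,
         "size_bytes": None, "output_ids": []},
    ],
}

# Planning details that a reviewer or a system might want filled in.
PLANNING_FIELDS = ("repository", "size_bytes")


def find_gaps(plan):
    """Return (dataset title, field name) pairs for unset planning fields."""
    gaps = []
    for dataset in plan["datasets"]:
        for field in PLANNING_FIELDS:
            if dataset.get(field) is None:
                gaps.append((dataset["title"], field))
    return gaps


def link_output(plan, dataset_title, output_id):
    """Return a copy of the plan with output_id attached to the named dataset."""
    updated = copy.deepcopy(plan)
    for dataset in updated["datasets"]:
        if dataset["title"] == dataset_title:
            dataset["output_ids"].append(output_id)
            return updated
    raise KeyError(f"no dataset titled {dataset_title!r}")


if __name__ == "__main__":
    for title, field in find_gaps(plan):
        print(f"{title}: missing {field}")
```

Because the plan is data, the same record can be checked for gaps when it is written and updated with identifiers for datasets or publications once they exist, which is the kind of connection between planning and outputs that a static document cannot provide.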
The shift to machine-actionable data management plans is motivated by growing mandates from federal agencies like the NIH, NSF, NASA, and the Department of Energy, all of which increasingly require detailed data-sharing and access plans. The goal is to enhance transparency, reproducibility, and public access to federally funded research.
In early 2023, the California Digital Library and the Association of Research Libraries put out a call for institutions to propose pilot projects involving machine-actionable data management plans. UC Berkeley was invited to join the extended cohort thanks to the strength of its proposal.
(Image: DMP Tool Blog)
The Berkeley team set out to analyze the data management plans being generated on campus—particularly to understand what researchers were saying about their data and where gaps existed. They collected and coded 10 data management plans from various disciplines, identifying patterns in how researchers described their data types, storage methods, and sharing strategies.
“One thing we found was that while people often mention where data will be stored, they rarely say how much data they plan to store,” Sackmann notes. “That kind of detail matters when designing infrastructure or ensuring long-term access.”
This hands-on work also fed back into the DMPTool, a platform maintained by the California Digital Library (CDL) that provides templates and guidance for writing data management plans. As a result of the pilot, DMPTool can now begin connecting plans to research outputs, helping track compliance with federal open-access mandates.
The UC Berkeley pilot is part of a larger ecosystem of experimentation within the UC system. Sister campuses like UC Santa Barbara, UC Riverside, and UC San Diego are also participating, each running their own projects. While there's no formal UC-wide integration yet, informal collaboration is strong.
Outside the UC system, institutions like Arizona State University have gone a step further—developing AI-powered chatbots to assist researchers in generating data management plans automatically.
Despite the promise of machine-actionable data management plans, Foster and Sackmann acknowledge barriers. There’s no dedicated funding for implementing machine-actionable data management plans, and many of the requirements related to data management and sharing can be perceived as an “unfunded mandate.” Researchers are already burdened with many compliance tasks throughout the course of research, so adding more steps can be met with resistance.
Yet they remain hopeful. “The more we can integrate machine-actionable data management plans into existing workflows, the easier it will be for researchers to adopt them,” Sackmann says. “It’s not about adding work—it’s about making that work more meaningful.”
They envision a future where institutions—not just federal agencies—take ownership of data management as a priority. Ideally, in 5 to 10 years, data management plans will be fully integrated into the research lifecycle—from grant proposal to final publication.
“The dream,” Foster says, “is being able to clearly trace the impact of Berkeley research by connecting planning with outcomes. That helps researchers, funders, and the public all see the value of the work we’re doing.”
This year, the RDM Program celebrates its 10-year anniversary, and this blog post is one of a series that focuses on support offered by the RDM Program as well as services and technologies provided by Research IT and the Library to enable research at UC Berkeley. To learn more about the RDM Program, its mission, and areas of support, visit https://researchdata.berkeley.edu or email researchdata@berkeley.edu.