Securing Research Data: Storage, compute environments, and beyond

February 1, 2017

This spring, a cross-departmental project team is forming to provide guidance and recommendations to the campus about existing and much-needed services for securing research data.

Researchers in a wide range of campus disciplines face an increasingly complex and often overlapping set of requirements to protect their research data. These requirements come from a growing number of directions: funders, data providers, state and federal agencies, and the campus and university itself. This problem is further complicated by disciplinary differences across the campus. Social scientists have long worked with human subjects research data, but they are exploring larger data sets, new methodologies, and new research opportunities. Other fields are perhaps newer to data-driven research (e.g., the humanities) or are exploring human subjects data for the first time or in new and expanded ways (e.g., computer science and engineering). Simultaneously, even data that do not involve human subjects are increasingly subject to data use agreements and other requirements from data providers. Not surprisingly, much of this regulatory pressure is driven by cybersecurity threats and the potential impact of a data breach, a trend that is evolving rapidly and chaotically. In addition, data are increasingly a valuable commodity in our data-driven world, and many parties are taking an interest in issues of licensing, sharing, and even selling data.

In the summer of 2016, a joint team from Research IT, the Library, the D-Lab, and IST evaluated UC Berkeley’s service offerings for securing research data. Using the methodology developed for the RAE (Research and Academic Engagement) Redo Benchmarking project, the team defined Securing Research Data as:

Services in support of research data sets that have restrictions set by campuses, state or federal law, or contractual agreements. Includes policy and guidance; infrastructure services for storage and computation; consulting and training; cross-campus coordination for managing restricted data; and support for coordinated development to build resources and manage relationships with data providers.

The team identified the following criteria for evaluating service offerings at UC Berkeley and at thirteen peer universities:

Discoverable and useful policy and guidance documentation
Secure research environments and infrastructure
Active training and consulting
Life-cycle process support and cross-campus coordination
Active development of restricted data resources

Following the comparative analysis, the team ranked UC Berkeley’s service offerings as a 3 (with 1 being the highest score and 4 the lowest). A ranking of 3 was characterized as follows:

Accessible policy guidance with limited services, few/inflexible research environments for secure data, environments limited in terms of compute or storage, limited service support and training, limited coordination among campus stakeholders and service providers, ad-hoc development of discipline specific resources.

In short, researchers working with data that require protections of any kind must most often do so with very little campus support.

Another campus-wide effort that demonstrated broad needs in this area was an audit of Research Data Management conducted by Audit and Advisory Services. The final report to campus, dated June 24, 2016, determined a) that the process for identifying required security protections was driven by many offices that do not necessarily work together in a coordinated way and b) that the scope of the problem and the campus-wide risk are very difficult to ascertain. That is, the campus does not know how much risk it faces, nor does it have the right policies, governance mechanisms, or even relationships with researchers that would facilitate a more accurate assessment.

Other campuses are not only doing more to help researchers, but they are also treating the ability to manage restricted data as a strategic differentiator. As data-intensive research, including data science, grows on campus, providing services that allow researchers to work with sensitive data of various kinds becomes strategically valuable and even critical. Indeed, researchers at Berkeley are often at the vanguard of working with data that require protections of various kinds, and yet the services the campus offers are not adequate. Campus data policies and enterprise services are geared towards administrative data, and most research systems have emphasized data sharing and collaborative access. As a result, researchers often struggle to develop solutions, taking valuable time and energy away from primary research.

Importantly, securing research data is not just a technical challenge. There are numerous policy issues, and the political, technological, and research opportunity landscapes are evolving rapidly. Currently, experts on campus are distributed across different organizations. While they are beginning to coordinate their work better, there is growing demand for consultations and for improved service offerings for storing sensitive data, performing computations in a secure environment, transferring data with different levels of sensitivity, collaborating among team members both on campus and beyond, and preserving, protecting, and sharing data sets for the future.

Given the complex set of issues and the different offices involved, Research IT is launching a six-month project with its partners in the D-Lab, Information Security & Policy (OCIO), the Office of the Vice Chancellor for Research, the Library, and other units within Information Services & Technology. The outcomes of this project will be:

Improved guidance for the research community describing available services and contacts for securing research data.
Additional findings regarding the level of demand for different services.
Recommendations to campus for new services based on discussions with other campuses that have more mature service offerings.

Expect to hear more about the Securing Research Data project throughout the remainder of this academic year. Please contact researchdata@berkeley.edu with your comments, questions, and suggestions.