Expanding Data Horizons with Redivis

by Sarah Bender, Communications Coordinator

CMU Libraries is committed to providing researchers with support tools across the research cycle. Since January 2023, the Libraries has offered the CMU community access to Redivis, a dynamic data collaboration platform designed for sharing datasets with collaborators and for analyzing and visualizing data in real time. Now, the Libraries is seeking more adopters to help further streamline the service model.

Redivis offers researchers many benefits at different stages of the research cycle. It allows users to upload datasets that are completely private, shared with specific individuals, or available to the public. They can browse available datasets, apply for access, perform analyses, and export data derivatives and final figures.

The platform is well suited for hosting high-risk and sensitive data. It runs on HIPAA- and FedRAMP-certified Google Cloud infrastructure and has undergone thorough and regular security audits. When uploading data, users can define access rules that appropriately limit who can work with the data.

“In our efforts to understand emerging needs, we heard from researchers all across CMU about the desire for easy-to-use, affordable, collaborative data lake workspaces,” said Associate Dean for Research and Innovation Brian Mathews. “In the Libraries, we aim to propel the creation, sharing, and preservation of knowledge, and Redivis has been a great partner enabling us to get involved earlier in the data lifecycle process.”

Users may already be familiar with KiltHub, CMU’s institutional repository used to archive finished research products for public access. Unlike KiltHub, Redivis works particularly well for datasets that need to be queried, that are still in the collection and analysis phases of research, or that have specific access permissions. With special approval on a case-by-case basis, Redivis can even host datasets that are larger than 1 TB.

Researchers don’t need programming skills to use the Redivis platform; they can take advantage of built-in tools to work with their data. However, Python, R, and Stata users can benefit from the integration of “notebook nodes” within Redivis projects. Redivis notebooks, built on top of Jupyter notebooks, offer integrated environments for querying, connecting, and manipulating data tables, as well as for analyzing and visualizing data.

Redivis will soon support AI notebooks as well. Researchers will be able to attach GPUs to their compute environment so that they can train, test, and deploy custom models, while also integrating with various open-source foundational models to inform their research.

For the Tepper School of Business, Redivis provides a way to host commercial or licensed data via a user-friendly interface. In addition to faculty- and researcher-generated datasets, CMU’s platform now houses and facilitates access to two historical datasets from the vendor Data Axle that have been licensed for CMU research projects. The two datasets, Historic Business Data and Residential Historic Data, contain annual time series data on households and businesses within the United States.

Librarian Ryan Splenda, who liaises with Tepper, has worked with the school to provide access for researchers. “Redivis is a good storage solution for large, subscription-based data sets because it provides safe and reliable mechanisms for controlling access to and downloading of these data sets for research purposes,” he said. “Additionally, researchers can opt to work with these large datasets within the Redivis platform, which saves them time, effort, and the need to find storage capacity on their end.”

Another early adopter is CMU’s Manufacturing Futures Institute. Headquartered at Mill 19, the discovery workspace where Carnegie Mellon engineers and scientists conduct advanced manufacturing research, the MFI is working to drive innovation in additive manufacturing, materials discovery, product design, artificial intelligence, robotics, machine learning, and more.

“Redivis enabled us to demonstrate data storage without having to invest in backend programming and logistics that are not necessarily a scientific contribution. So, it made it easy for us to make progress on our project goals,” said Assistant Professor of Mechanical Engineering Sneha Prabha Narra, who works on additive manufacturing research with the MFI.

As the open science program director, Librarian Melanie Gainey also recognizes the platform’s potential applications for open science. “One of our goals is to make open science easier for our researchers at CMU, including making research reproducible. The Redivis platform makes it easy to apply the same analyses and transforms to multiple datasets and document metadata to make the research more reproducible and transparent, both within a research group and for anyone you might share the data with,” she said.

“When we talk about access to computing, it’s going to be more equitable if we can share and analyze data in the cloud,” added Open Science Postdoctoral Associate Kristen Scotti, who works directly with clients on campus to tailor the program for their unique data needs. “It can be very limiting when you put data out there and people in various areas don’t have the physical infrastructure to work with those data. But through platforms like Redivis, you can work with data in the cloud, which ultimately increases accessibility around the globe.”

This semester, the Libraries aims to expand use of Redivis on campus, encouraging researchers to explore and experiment with the platform. An upcoming workshop will teach users how to access, manipulate, and analyze datasets in Redivis using Python and SQL.

To learn more about using Redivis, visit the LibGuide or contact Kristen Scotti or Melanie Gainey.