How the Data & Code Support Service Makes Data Management More Accessible

Alfredo González-Espinoza

At CMU Libraries, the Data & Code Support service offers specialized, one-on-one assistance for students, faculty and staff using open source coding languages and data science tools in their research or projects. Whether you're wrangling a massive dataset or learning to code for the first time, the team provides personalized, discipline-specific guidance to empower your research and accelerate discovery.

As part of his research into energy affordability for U.S. households, College of Engineering Ph.D. candidate Kester Wade needed to learn how to clean and analyze large amounts of energy meter data. In the summer of 2024, he reached out to the Data & Code Support team for help and was paired with Research Data Services Librarian Alfredo González-Espinoza, one of the service's consultants.

“I believe the Data & Code Support Service is incredibly beneficial and important,” Wade said. “My current research involves computationally intense data analysis, but there’s no centralized support for data quality or management within my lab group or department. Since discovering this service last summer, it has become an essential resource I rely on regularly.”

Goal
  • To clean and analyze energy meter data, which came as CSV files containing hourly electricity and gas consumption measurements from 125,000 households over a 10-year period in Tallahassee, FL.
  • To use this data to examine whether energy efficiency programs have a positive impact on household health and well-being.
How We Helped
  • First, González-Espinoza took time to understand Wade’s research, the type of data he was working with, and the preliminary progress Wade had made cleaning and transforming some of the files.
  • He then introduced Wade to the AI tools offered by CMU and explained how he uses AI-powered chatbots in his own research. He also shared introductory resources on Python packages such as scikit-learn and pandas. Through this discussion, Wade learned frameworks he could apply to his own work.
  • González-Espinoza also guided Wade through the process of creating a data management plan, which allowed Wade to catalog his large datasets. With this training, Wade was better equipped to overcome code and data management challenges as he encountered them during the project.
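For readers curious what cleaning hourly meter data with pandas can look like, here is a minimal sketch. It is not Wade's code or the consultants' method. The column names (household_id, timestamp, electricity_kwh, gas_therms), the sample rows, and the cleaning rules are all assumptions made for illustration, since the article does not describe the actual file layout.

```python
import io

import pandas as pd

# Hypothetical sample: the real column names and file layout are not given in the article.
SAMPLE_CSV = """household_id,timestamp,electricity_kwh,gas_therms
H1,2020-01-01 00:00,1.2,0.10
H1,2020-01-01 00:00,1.2,0.10
H1,2020-01-01 01:00,-5.0,0.12
H1,2020-01-01 02:00,1.4,n/a
H2,not a date,0.9,0.05
H2,2020-01-01 00:00,0.8,0.07
"""


def clean_meter_data(raw: pd.DataFrame) -> pd.DataFrame:
    """Apply a few common (assumed) cleaning rules to hourly meter readings."""
    df = raw.copy()
    # Unparseable timestamps become NaT so they can be dropped below.
    df["timestamp"] = pd.to_datetime(df["timestamp"], errors="coerce")
    for col in ("electricity_kwh", "gas_therms"):
        # Non-numeric entries become NaN instead of raising an error.
        df[col] = pd.to_numeric(df[col], errors="coerce")
        # Treat negative consumption as a bad reading (an assumption).
        df.loc[df[col] < 0, col] = float("nan")
    df = df.dropna(subset=["timestamp"])
    # Keep one row per household per hour.
    df = df.drop_duplicates(subset=["household_id", "timestamp"])
    return df.sort_values(["household_id", "timestamp"]).reset_index(drop=True)


raw = pd.read_csv(io.StringIO(SAMPLE_CSV))
clean = clean_meter_data(raw)
# Total electricity per household; missing readings are skipped by sum().
totals = clean.groupby("household_id")["electricity_kwh"].sum()
```

With real files, the same function could be applied to each CSV in turn; at the scale described above, reading files in chunks or one at a time would likely be necessary.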
Results
  • The initial results of Wade’s study are available through SSRN, an open access research platform used to share early-stage research, evolve ideas, measure results, and connect scholars around the world.
  • Since discovering the Data & Code Support service at the Libraries, Wade has returned for consultations as needed during his research.