Data management 101

This guide will provide general information about data management, including an overview of Data Management Plans (DMPs), file naming conventions, documentation, security, backup, publication, and preservation. We have included the CMU data life cycle to put the pieces in context.

The CMU Libraries provides resources to guide researchers, faculty, and students in research data management, planning, and sharing.

[Figure: CMU data life cycle. Ongoing activities: Collaborate, Store, Secure, Back-up, Version. Stages: Design (project conception, DMP development); Plan (documentation, file naming, metadata); Collect; Analyze; Disseminate; Publish; Preserve (trusted repositories, policies, publish datasets); Re-Use (citation).]

Data may take many forms including:

  • Observational: data that is captured in real time, such as sensor measurements or survey responses
  • Experimental: data collected from lab equipment such as gene sequences or magnetic field readings
  • Simulation: data generated from test models such as climate or economic models
  • Derived or Compiled: data that is aggregated or analyzed, such as data-mining results or compiled databases
  • Reference: data that is collected, reviewed, and published such as databanks or data portals

Data management addresses the complete life cycle of research output: from data creation through organization, accessibility, distribution, and archiving.

A data management horror story by Karen Hanson, Alisa Surkis, and Karen Yacobucci. This is what shouldn't happen when a researcher makes a data sharing request! Topics include storage, documentation, and file formats.

Record amounts of research data are being generated on a daily basis at CMU. Advancements in technology and research measurement tools, increased capacity for data storage, and improved access and discoverability of research data have created a new landscape for research that relies heavily on sharing, integrating, and re-using data. Success in this landscape will require purposeful management of research data throughout the data life cycle.

Some publishers and funding agencies already require researchers to share their data, and more will be implementing data policies in the near future. Developing good data management practices early in your research will make it easier to keep your data organized and safe, meet funder requirements, and prepare data for sharing with others.

These resources can provide you with more data management information, skills, and tools.

Data Management Glossary
Cornell University's Research Data Management Service has compiled a basic glossary of data management terms.

Data Curation Lifecycle
The DCC Curation Lifecycle Model provides a graphical, high-level overview of the stages required for successful curation and preservation of data from initial conceptualization or receipt. You can use the model to plan activities within your organisation or consortium to ensure that all of the necessary steps in the curation lifecycle are covered.

MANTRA - Research Data Management Training
MANTRA is a free, non-assessed course with guidelines to help you understand and reflect on how to manage the data you collect throughout your research. The course is particularly appropriate for those who work with digital data.

Nature Magazine: Special Issue on Data Sharing (Sept. 2009)
"Sharing data is good. But sharing your own data? That can get complicated. As two research communities who held meetings in May on the issue report their proposals to promote data sharing in biology, a special issue of Nature examines the cultural and technical hurdles that can get in the way of good intentions."

DataONE Best Practices
The DataONE Best Practices database provides individuals with recommendations on how to effectively work with their data through all stages of the data lifecycle.