Welcome to the second issue of Tartan Datascapes! This week, I'll be discussing a topic that seems simple but is actually quite complex: what are data? But first, let me start with a brief note:
Throughout this blog, you'll often see me treat the word data as plural, such as "Data are," "The data show," and "Your data may." Whether to treat data as singular or plural can be a bit contentious, but it ultimately boils down to whether you prefer to adhere to the Latin origin of the word. Here's a great analysis of the conversation: https://www.theguardian.com/news/datablog/2010/jul/16/data-plural-singular. Yes, if you prefer to strictly adhere to the word's origins, you would treat data as plural (with datum as the singular form), but it's also completely valid to refer to data as singular. Words and contexts change over time, and language is fluid. I often treat data as plural simply because this is what I am used to doing, but I don't stick to this as a hard and fast rule. Refer to data in the manner that makes sense to you. You do you!
Okay, now that we've got that cleared up, let's dive in:
What are data?
I like Merriam-Webster's definition of data, which falls into 3 parts:
- factual information (such as measurements or statistics) used as a basis for reasoning, discussion, or calculation
- information in digital form that can be transmitted or processed
- information output by a sensing device or organ that includes both useful and irrelevant or redundant information and must be processed to be meaningful
(see source and full definitions here: https://www.merriam-webster.com/dictionary/data#usage-1)
Through analysis, we can turn data into meaningful knowledge about a person, an observation, a phenomenon, and so many other things!
Data can include:
- A notepad full of hand-written observations from fieldwork
- A spreadsheet containing measurements from a laboratory experiment
- A box full of photographs taken during the process of creating a sculpture
- A text document containing a transcribed interview with a musician
All of these examples are valid forms of data with their own methodological rigor and intellectual value!
Who has data?
My rule of thumb is that if you think you have data, you probably do.
One of the beliefs I hold firmly at my core as a Research Data Management Consultant at CMU is that everyone here has data. Everyone! Are you in the Tepper School of Business? You have data! In the College of Fine Arts? Yep, you have data too. Do you have dozens of training datasets for machine learning algorithms? That's really cool data! Are you translating an author's novels into another language, with a folder on your computer full of Word documents containing your translations? That's awesome data as well! Structurally, your data may look different from data in other disciplines, but they are just as valid. As a cultural geographer by training, I have a humanities background, and my data are generally interview transcripts, photographs, and literature reviews. As I'm sure many social science-focused folks reading this can attest, I've had my data called "less intensive" and "less useful" than data collected in STEM fields. I started to shy away from describing myself as a researcher who uses data-driven methods, until one day I realized that my data, while they looked different, were just as valid as any other data out there! From fine arts to humanities, social sciences, and STEM, we are all doing amazing things with data, and the beauty of being in a truly interdisciplinary environment like CMU is that we are exposed to the many ways in which data are collected, formatted, and analyzed. I vote for data solidarity across our campus, where we support all the beautiful forms that data can take!
We are fortunate at CMU Libraries to have a team of librarians, consultants, and specialists who truly love data. No matter what kind of data you have, we can help you find the tools to analyze it, manage it, and publish it!
Want to continue this conversation? Feel free to email me at hgunderm@andrew.cmu.edu and we'll set up a time to chat!
Important Happenings in Research Data Management at CMU Libraries:
Want a general introduction to data management? I am co-leading a workshop with fellow librarian and Libraries Open Science Program Director Ana Van Gulick titled "Data Management Tips and Tricks to Organize your Research" on September 23rd from 1:00pm - 2:00pm in the Sorrells Library Den in Wean Hall, where we'll provide an introduction to data management applicable to folks in multiple disciplines, accompanied by some concrete tips and tricks that you can apply in your own research. This also happens to be on my birthday, and I'm thrilled to be able to spend my birthday talking about my favorite thing ever, research data management (seriously, just hear me out. It's a beautiful thing!). Only a few spots left! Follow this URL to register: https://cmu.libcal.com/event/5651104?hs=a.
And of course, please email me at hgunderm@andrew.cmu.edu if you'd like some help on your journey as a researcher/scholar/awesome human being here at CMU. Remember, we all use data, regardless of our discipline. If you think something might be data, you are likely correct and I can help you develop good habits for managing it! If you'd like to have your research data featured on Tartan Datascapes, please fill out this Google Form to get in touch!