I want to tell the story of my own personal data management disaster. The year is 2013. It's a Friday night, and I am in the GIS (Geographic Information Science) Lab at the University of Wyoming, where I am working on my Master of Arts in Geography/Environment and Natural Resources. My thesis is due in a week, and I am trying as quickly as I can to make the maps that my advisor has requested to be included in the thesis. I'm tired. I've got some tea with me, but the caffeine isn't helping anymore. I'm playing the Grateful Dead and I feel as though Jerry Garcia* is mocking me as he sings 'We will get by! We will survive!'
I quickly run the spatial analyses and start exporting my maps, realizing several times in a row that my analysis wasn't correct and I needed to re-run it and re-export the maps. The following ensued:
analysis_1_final.mxd
analysis_1_final_thisone.mxd
analysis_1_final_ugh.mxd
analysis_1_final_ihatethis.mxd
analysis_1_final_usethisone.mxd
(Yes, these filenames are 100% real and can currently be seen in my Google Drive.)
See, at this moment, I wasn't thinking about how in a few years when I would be trying to publish this thesis in an academic journal, having descriptive filenames for these maps would have saved me hours of parsing through map after map, trying to find a specific analysis. Folks, it was a nightmare. Present Hannah was cursing past Hannah for not taking the extra time to use a good filenaming scheme before running those analyses in GIS. Here's the thing, though: this isn't just me. I've talked to dozens, if not hundreds, of researchers who find that data management is more of an afterthought in their research, and 9 times out of 10, they wish they had started using consistent filenames, file organization, and documentation techniques much sooner. Are they bad researchers because of this? Absolutely not! Was I a bad researcher when I cared more about getting the analyses done than developing a filenaming scheme? Absolutely not.
Data management can be tough. It's extra work on top of everything else we're already doing as researchers and learners. With today's installment of Tartan Datascapes, I want to share three tips for incorporating data management into the early stages of your research, hopefully avoiding a re-creation of that night in the GIS Lab.
1. If possible, before you start collecting your data, think about how you will organize it and how you will name the spreadsheets, databases, analyses, visualizations, etc. that are associated with it. What elements belong in your filenaming scheme? Here are some examples of elements you can include:
- Researcher's initials
- Experiment type or experiment number
- Date
- Subject
- Lab in which you are working
Remember, any filenaming system is better than not having one.
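If it helps to see a scheme written down, here is a minimal Python sketch of one way to assemble a filename from the elements above. The order of the elements, the underscore separator, the ISO-style date, and the function name build_filename are all my own assumptions for illustration, and the initials and date in the example are made up. Adapt it to whatever scheme suits your project.

```python
from datetime import date


def build_filename(initials, experiment, subject, when, extension):
    """Join naming elements with underscores (a hypothetical scheme).

    The date is written as YYYY-MM-DD so files sort chronologically.
    """
    parts = [initials.upper(), experiment, subject, when.isoformat()]
    # Replace spaces inside an element so each element stays one token.
    cleaned = [part.strip().replace(" ", "-") for part in parts]
    return "_".join(cleaned) + "." + extension.lstrip(".")


# Example with made-up values.
example = build_filename("ab", "analysis01", "business locations",
                         date(2013, 5, 3), "mxd")
print(example)  # AB_analysis01_business-locations_2013-05-03.mxd
```

Generating names with a small helper like this, rather than typing them by hand at midnight, is one way to keep the scheme consistent across a whole project.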
2. Try to take ten minutes at the end of each week and do a data quality check, asking yourself (and your collaborators, if applicable) the following questions:
- Are we sticking with our filenaming scheme in the data files we produced this week?
- Are the data all stored in the correct locations?
- Have we performed appropriate backups?
- Are our lab notebooks and/or documentation up-to-date?
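The first question on that checklist can be partly automated. Below is a rough Python sketch that lists files in a folder whose names do not match a scheme. The scheme itself (INITIALS_experiment_subject_YYYY-MM-DD.ext), the regular expression, and the function names are assumptions I made up for this example, so swap in your own pattern.

```python
import re
from pathlib import Path

# Hypothetical scheme: INITIALS_experiment_subject_YYYY-MM-DD.ext
SCHEME = re.compile(
    r"^[A-Z]{2,3}_[A-Za-z0-9-]+_[A-Za-z0-9-]+_\d{4}-\d{2}-\d{2}\.[A-Za-z0-9]+$"
)


def nonconforming(names):
    """Return the filenames that do not follow the scheme."""
    return [name for name in names if not SCHEME.match(name)]


def check_folder(folder):
    """List files directly inside `folder` that break the scheme."""
    names = sorted(p.name for p in Path(folder).iterdir() if p.is_file())
    return nonconforming(names)


# Example: one conforming name and one from that night in the GIS Lab.
print(nonconforming(["AB_analysis01_maps_2013-05-03.mxd",
                     "analysis_1_final_ugh.mxd"]))
```

A script like this only checks names. The questions about storage locations, backups, and documentation still need a human (and ten minutes).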
3. Document your workflow as you run your analyses. Using a free tool like protocols.io, write down why and how you used certain analyses. I promise you, when you go to publish your work, or share it with a wider audience, having this workflow documented will save you time as you write your Methods sections and defend the choices you made in your research to your advisor, journal reviewers, and editors. It absolutely takes extra time and effort, but it will be worth it. I'm here to help you get that process started, as are my colleagues who are well-versed in using protocols.io for documenting the research workflow.
Today, whenever I'm tempted to spend more time on the 'fun' part of my research (collecting data, analyzing data, etc.) instead of incredibly important data management work, I think of that night in the GIS Lab where Jerry Garcia sang about the beauty of living life to the fullest and aging gracefully as I stared at an increasingly messy file explorer. And folks, I am a professional data management educator! I completely empathize with researchers who see data management as an afterthought, and I hope the tips I've shared today will help you incorporate data management early on in your research in the most painless ways possible. Still not convinced? Feel free to send me an email and we'll have a conversation!
* I wrote my master's thesis on businesses in the United States named after the Grateful Dead, so I always felt that if I listened to them while working on my thesis, it would bring me good luck.