Open Science at CMU Newsletter - February 2020



In our first Carnegie Mellon University Libraries Open Science newsletter of the year, we are excited to announce a slate of upcoming Open Science events, including Love Data Week 2020, this year's AIDR symposium, new dataCoLAB office hours, and workshops from Data Carpentry, the Childhood Cancer Data Lab, and the University Libraries Workshop series.

Contact us at and follow us at #CMUOpenScience.

To subscribe to this newsletter, click here.

Love Data Week
Every year around Valentine's Day, universities and libraries around the world participate in Love Data Week (February 10 - 14, 2020), an effort to raise awareness about managing, sharing, preserving, and reusing research data. The University Libraries are no exception! This year, you'll find us tabling all week at all three of our library locations, as well as in other buildings on campus, with cookies, candies, button and zine making, a data scavenger hunt, workshops, giveaways, and data valentines to get students and researchers thinking about the importance of loving and caring for their research data.

Participate in our Love Data Week activities and follow along with the hashtag #lovedata20.

Data Carpentry Workshop - March 12-13

We are thrilled to be hosting a free Data Carpentry R for Social Scientists workshop at CMU Libraries March 12-13! In this two-day intensive workshop, participants gain hands-on experience with programming techniques in a supportive, open environment. Data Carpentry's aim is to teach researchers basic concepts, skills, and tools for working with data so that they can get more done in less time, and with less pain. This is an introduction to R designed for participants with no programming experience. The lessons start with the basics of R syntax and the RStudio interface, then move on to importing CSV files, the structure of data frames, working with factors, adding and removing rows and columns, calculating summary statistics from a data frame, and a brief introduction to plotting.

Look for registration to open in mid-February on the Upcoming Workshops page. Space is extremely limited!

AIDR Symposium 2020: Call for Submissions

Last year, the inaugural AIDR (Artificial Intelligence for Data Discovery and Reuse) conference, supported by the NSF scientific data reuse initiative, brought together AI/ML researchers, data professionals, and scientists from biomedicine, the technology industry, high performance computing, astronomy, seismology, library and information science, archaeology, and more. Attendees shared innovative AI tools, algorithms, and applications that make data more discoverable and reusable, and discussed common challenges in data sharing and reuse.

This year, we are following up with a shorter, one-day AIDR Symposium, which provides a place for the community to continue these conversations and work together to build a healthy data ecosystem. We encourage deeper conversations and work-in-progress presentations. The accepted formats are Short Talks, Lightning Talks, Posters and Digital Demos, and Roundtable Discussions. For more information and to submit your work, see our conference website.

New Hours for Research Data Collaborations Lab
Data Collaborations Lab (#dataCoLAB) is a program that matches people who need help with data analysis and visualization problems with data scientists and computer scientists who are looking for interesting data projects to collaborate on or who want to gain consulting experience. A website featuring our current projects and consultants is coming soon!

In the meantime, we are sharing our new office hours. We are increasing their frequency to once a week, so you have the flexibility to choose the week that works best for you. We are also excited to announce that we will be co-locating with dSHARP office hours, so our in-house experts on text mining, data visualization, and data management will all be in the same room!

Time: Every Wednesday 1-3pm
Location: The Den, Sorrells Engineering and Science Library, Wean Hall

Sign up to get help with data, or to become a consultant. And always feel free to email us at We encourage you to fill out these sign-up forms before coming in, but walk-ins are welcome! 

Childhood Cancer Genomics Data Analysis Workshop

The Open Science & Data Collaborations Program at Carnegie Mellon University Libraries is partnering with the Childhood Cancer Data Lab (CCDL), founded by Alex's Lemonade Stand Foundation, to host a data analysis workshop March 24-26. This workshop will introduce reproducible analysis of bulk transcriptomic data to childhood cancer researchers and other researchers working with transcriptomic data.

Attendees should bring a laptop (running Windows 10 Professional or higher, Mac OS X, or a Linux operating system) and a biological question that relates to transcriptomic data. We encourage attendees to bring their own data for analysis. We do not expect attendees to have programming experience: the material is designed to help attendees get up and running with software used in this domain. Attendees will learn to perform reproducible analyses in Docker containers and record their analyses in R Markdown notebooks.

See a tentative schedule or apply and register for the workshop.

Upcoming Libraries Open Science and Research Data Management Workshops
CMU Libraries is offering the following workshops on Open Science and Research Data Management topics this spring semester. Unless noted, all workshops take place in the Sorrells Library Den in Wean Hall.

Data Visualization Basics
Emma Slayton
February 6, 12:30 PM
This workshop provides an introduction to data visualization, or the techniques used to visually display or communicate data in graphs, charts and other tools. 

Data Management for Social Sciences
Hannah Gunderman and Sarah Young
February 10, 6:00 PM
Researchers in the social sciences engage with unique forms of data, including surveys, ethnographic data, and census data. This workshop provides data management tips for social scientists, including cleaning data, developing file naming schemes, and storing data securely.

Love Your Data with Open Science Framework
Hannah Gunderman and Sarah Young
February 12, 12:00 PM
As part of our week of events for Love Data Week 2020, we are excited to offer this workshop introducing the Open Science Framework, a free and open source tool for managing projects and collaborations in any discipline.

Data Management for Humanities
Hannah Gunderman
February 17, 6:00 PM
Researchers in the humanities engage with many unique forms of data, including photographs, film, literature, ethnographies, and audiovisual materials, each with its own considerations for effective management. This workshop will provide general data management tips as well as tips specific to the humanities, for researchers at all levels (students, staff, and faculty).

Data Visualization in R
Emma Slayton
February 20, 12:30 PM
This workshop provides an introduction to data visualization in R: the techniques used to visually display or communicate data.

Reproducible Data Visualization in Jupyter Notebooks
Huajin Wang
February 27, 12:30 PM
Beginning with a short presentation on the principles of computational reproducibility followed by a hands-on live-coding session, this workshop teaches how to use Python libraries and Jupyter Notebooks to make reproducible and beautiful figures.

Cleaning Messy Data with OpenRefine
Sarah Young 
March 5, 12:30 PM
OpenRefine is a free, open source tool to help you prepare your data for analysis. Quickly and easily transform data, split and merge columns, remove whitespace, and perform many other common data cleaning tasks.

Data Management for STEM
Hannah Gunderman and Huajin Wang 
March 9, 6:00 PM
Researchers in STEM disciplines engage with unique forms of data, including algorithms, code, spreadsheets, and big data. This workshop provides data management tips for STEM researchers, including cleaning data, developing file naming schemes, and storing data securely, especially for large datasets.

Data Visualization with Tableau
Emma Slayton
March 16, 1:00 PM
This workshop provides an introduction to data visualization using Tableau.

You can register for workshops and see the full list of workshop offerings from University Libraries at