Hello all! We are rapidly approaching the end of the semester, and as I'm sure all of you can empathize, I am eagerly awaiting the holidays for some much-needed rest. At CMU Libraries, we've been working hard to support students, staff, and faculty with their end-of-semester teaching and research needs, and we are also building up a great lineup of spring 2020 workshops and engagement opportunities. Keep a close watch on our website for updated information!
This week, Tartan Datascapes travels virtually to the American Shakespeare Center's Blackfriars Playhouse in Staunton, VA (hereafter referred to as the Blackfriars Playhouse), where two CMU-based researchers are collaborators on Shakespeare-VR, an educational project rooted in accessibility, inclusivity, and diversity which works to bring the joy of William Shakespeare's plays to a wider audience through 360-degree immersive experiences. Before you start reading this week's researcher highlight, I'd suggest grabbing a snack, a blanket, and some tea, as I've got a fascinating video to share:
A few weeks ago, I found myself in the office of Dr. Stephen Wittek, Assistant Professor in the Department of English in Dietrich College of Humanities and Social Sciences and Project Director for the Shakespeare-VR initiative, wearing a virtual reality headset and watching this promotional video, in awe of how, well, real it felt! That is the beauty of virtual reality, as Stephen noted to me after I took off the headset. My head was spinning at how I could be so immersed in the space of the Blackfriars Playhouse in Virginia while sitting in Baker Hall in Pittsburgh. While I wish I could say I was the Doctor, I was thrilled that I could see portions of these plays without physically traveling to another state. As Stephen notes, this is one of the key goals of the Shakespeare-VR project: reading a play is fundamentally different from reading other literature, as a play has characteristics of both readership and theatre, and to truly immerse oneself in the work, one must not only engage with the characters, but also understand what is going on in the theatre itself. However, what if a person has a disability which prevents them from physically going to see a play and/or from fully immersing themselves in the experience? Virtual reality can bring Shakespeare's work to these individuals, and provide them with an immersive experience similar to the one they might have in the theatre itself.
I know some readers right now may be thinking 'Wait, can virtual reality really mimic what it feels like to be physically in the theatre?' Having used the virtual reality headset to view some of the scenes, I can confidently say yes! It also turns out that there is research to back this up: as Stephen noted to me, the Blackfriars Playhouse, and the Globe Theatre in London on which it is modeled, make use of universal lighting. This means the audience is lit to the same degree as the stage, so the actors can see the audience, and audience members can see each other. As Stephen noted, this is crucially important as audience members can see the 'words landing on other people,' making for a more intimate, collective experience and supporting audience cohesion. While being there in person among the audience is an unforgettable experience, Stephen notes that virtual reality is the second-best thing, pedagogically.
At this point, after talking to Stephen and feeling completely inspired by the theories and pedagogical goals underlying this project, I had an insatiable urge to learn more about the data driving it. Dr. Matthew Lincoln, Research Software Engineer for CMU Libraries, played a large role in the architecture, design, and implementation of the Shakespeare-VR website, as well as the data processing and storage in KiltHub, CMU's institutional repository for research data and scholarly outputs.
So what are the data in Shakespeare-VR? Like several humanities-based research projects, it involves the fusion of many different flavors of data to build a holistic experience or narrative around a topic. In the case of Shakespeare-VR, the main data sources include raw videos of Hamlet taken by Google-produced 360-degree cameras at the Blackfriars Playhouse, images documenting the filming process, and pedagogical materials for using the videos in an educational setting. The camera setup, which comprised 17 (!) cameras in an orb structure, produced 17 feeds when filming any given scene. These feeds also have accompanying metadata, which makes the 'stitching' process complex in terms of understanding how the videos overlap with one another in a given scene. This raw video was knit together by Stitchbridge, a startup created by CMU students aimed at developing engaging 360-degree content for mobile devices.
The accompanying images taken by Stephen were an additional data source providing context to the videos and the environment in which they were filmed. These images document the filming day, including the setup of the filming location and other contextual visual information. As Matthew noted, there were also several images that were 'weeded' out, including test shots and camera misfires, and these were deemed not relevant to the documentation. With the final KiltHub upload of images, everything needed a purpose, and Matthew had to make important decisions on what documentation would best support reuse of the project files.
Keeping true to the Tartan Datascapes theme, I had to talk to Matthew about data management, which certainly must have been no easy endeavor considering the sheer amount of data produced in the project. It turns out, I was right! Given Matthew's expertise in software engineering and advanced data engagement, he had some solid techniques for managing large datasets that I'm happy to share with you all:
Matthew noted that with such a large amount of data (we are talking 500GB!), it was easier to have everything on a physical hard drive rather than stored in the cloud. He immediately reached out to David Scherer, our Scholarly Communications and Research Curation Consultant at CMU Libraries, to use a reliable, old-school method called FTP, or File Transfer Protocol, for moving these large files to KiltHub. In total, the upload included both the raw data and the final videos edited by Stitchbridge.
Have you ever emailed a final paper to a professor and immediately gone back, re-downloaded the paper from your Sent folder, and opened it back up to make sure you sent the right version? Or, have you ever sent a dataset to another person, and re-downloaded the dataset to make sure you sent the right version? I always like to double-check that nothing got lost in translation when transferring data of some kind to another person or platform. You just never know! But, this becomes a lot more complex when sharing large datasets. As Matthew noted, most of the technical challenges in managing the data for Shakespeare-VR involved figuring out how to easily transfer large amounts of data back and forth between his local storage and KiltHub. How did Matthew accomplish this? He had to perform a checksum validation after uploading the files, taking the MD5 hash from KiltHub (a fingerprint of a file: a short string of letters and numbers that is, for practical purposes, unique to that file's contents) and comparing it with the MD5 checksum of the files on his local storage. If both strings match, you have successfully transferred the proper files! Luckily for Matthew, the checksum validation was successful, and our own repository is now the proud home for the Shakespeare-VR data.
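For readers who want to try a checksum validation themselves, here is a minimal sketch in Python of the comparison described above. It is not the script Matthew used; the function names md5_of_file and verify_transfer are hypothetical, and it assumes you have copied the MD5 string reported by the repository by hand. The file is read in chunks so that even very large video files never need to fit in memory at once.

```python
import hashlib


def md5_of_file(path, chunk_size=1024 * 1024):
    """Return the MD5 hex digest of a file, reading it one chunk at a time."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps calling read() until it returns b"" (end of file)
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_transfer(local_path, remote_md5):
    """Return True if the local file's MD5 matches the checksum reported remotely.

    remote_md5 is assumed to be the hex string shown by the repository;
    surrounding whitespace and letter case are ignored.
    """
    return md5_of_file(local_path) == remote_md5.strip().lower()
```

A call such as verify_transfer("my_video.mp4", "<checksum copied from the repository>") returns True only when the two fingerprints match. Keep in mind that MD5 is well suited to catching accidental corruption during a transfer, but it is not considered safe against deliberate tampering.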
This project was part of an Andrew W. Mellon Foundation Seed Grant for Technology-Enhanced Learning & Digital Humanities, which helped enable the data curation, construction of the website, and promotion of the project. It is also important to note that Scott Weingart, Program Director for Digital Humanities with CMU Libraries and member of dSHARP, directs the Mellon Digital Humanities program and is an important contributor to the online production of Shakespeare-VR. The project also receives tremendous support from our very own Eberly Center, which is involved with the pedagogical impact of the project and with its lesson plans (you can view them here). These lesson plans are data in and of themselves, as they are information supporting the broader research observations generated through the Shakespeare-VR project. In true Tartan Datascapes fashion, I like to use every opportunity I get to remind readers about the diversity of data!
What's next for Shakespeare-VR? The project collaborators hope to increase the capacity for interactivity within the virtual reality experience, allowing users to choose a play, a character, and a scene, and take part in the scene as though, for example, Lady Macbeth is speaking directly to them. Another future step is to develop a 3D model of the Blackfriars Playhouse stage, allowing users to walk through it as they would a world in Minecraft. As Stephen noted, the educational possibilities around this project are truly endless!
What are three takeaways from this researcher highlight?
1. Dealing with very large datasets requires a different approach when sharing data across researchers/collaborators! Getting comfortable with utilities like checksum validation and FTP can be a big help in this complex sharing use case.
2. Data can be incredibly diverse in form, structure, and medium, and projects can have multiple forms of data! In the case of Shakespeare-VR, we are seeing data represented through 360-degree videos, images, and lesson plans.
3. 'Weeding' is incredibly important in the data storage and preservation process - you may not need to save everything! Consider the potential use cases for your data, and weed out extraneous information that doesn't support those likely use cases. Not sure what to keep and what to discard? Send your local Research Data Management Consultant (hint: me!) an email.
Important Happenings in Research Data Management at CMU Libraries:
We are wrapping up our semester and have just a few remaining workshops coming up at CMU Libraries (click here to see our full list of workshops for the remainder of the semester), many of which can help you learn new tips and tricks for data collection, analysis, and management. Here are a few that have a particular Tartan Datascapes flavor, all relating to Dimensions, a worldwide database of awarded research funding that allows users to see the latest funded research projects in their areas of interest - wherever they are in the world:
Utilizing Dimensions to Search, Build, and Understand Research Context for Librarians, Wednesday, December 11th from 10:00 am - 11:30 am in the Sorrells Library Den (click here to register!)
Utilizing Dimensions to Search, Build, and Understand Research Context for Research Communication and Marketing and Communication Professionals, Wednesday, December 11th from 2:00 pm - 3:30 pm in the Sorrells Library Den (click here to register!)
Utilizing Dimensions to Search, Build, and Understand Research Context for Research Administrators and Campus Leadership, Thursday, December 12th from 10:00 am - 11:30 am in the Sorrells Library Den (click here to register!)
Utilizing Dimensions to Search, Build, and Understand Research Context for Faculty, Staff, and Students, Thursday, December 12th from 12:30 pm - 2:00 pm in the Sorrells Library Den (click here to register!)
And of course, please email me at hgunderm@andrew.cmu.edu if you'd like some help on your journey as a researcher/scholar/awesome human being here at CMU. Remember, we all use data, regardless of our discipline. If you think something might be data, you are likely correct and I can help you develop good habits for managing it! If you'd like to have your research data featured on Tartan Datascapes, please fill out this Google Form to get in touch!