Comic Books as Data: Dr. Felipe Gómez and the Latin American Comics Archive

Think back to the last time you read a comic book. How did you do it? Did you read it straight through from the first page to the last page? Did you first skim through it to get a general idea of the content? The last comic book I read was the manga Paradise Kiss (it's brilliant, if you haven't read it!), and I read it in the same way that I read all other comic books: I read the text in the speech bubbles and looked at the illustrations in the panels. Most people reading this blog probably read comic books in a similar way! But what if I told you that there are other ways we can engage with the content in comic books and graphic novels, learning more about the social, cultural, and political themes present within the story? If you're curious, grab a snack, some coffee/tea, and keep reading!

Since the beginning of Tartan Datascapes, I've talked about a lot of different kinds of data, from surveys, to novels, to Facebook posts, to training datasets for machine learning algorithms. Today, I'm thrilled to be featuring a project that uses my favorite kind of data: popular culture data! As many of my readers know, I am a popular culture researcher and my data sources are usually television shows and the elements held within them (scripts, scenes, characters, themes, etc.). So, it's safe to say that anytime I see a project using data sources from popular culture, I just have to know more! This was the case when I learned about the Latin American Comics Archive (LACA), a project led by Dr. Felipe Gómez, Teaching Professor of Hispanic Studies in the Department of Modern Languages here at CMU. I first read about the project in an October 2019 feature from Jaycie King, with CMU News, found here, and I immediately became fascinated! As King describes it in the article:

The archive is a curated, online exhibit of comic strips and comic books with the goal of enabling researchers to visualize and employ comic strips and comic books created in Latin America between the 1920s and the present for pedagogical purposes.

While comic books themselves are wonderful, engaging, and creative works of art (okay, I'm biased. I've loved comic books ever since I was young!), using them in a teaching setting benefits from a way to extract their themes and content beyond simply reading the text and interpreting the images. Dr. Gómez collaborated with the dSHARP (digital Science, Humanities, Arts: Research, & Publishing) group at CMU Libraries, specifically with Daniel Evans, Rikk Mulligan, and Scott Weingart, to create a system that adds these interpretations as metadata to the transcriptions of comic strips, comic books, and graphic novels using the Comic Book Markup Language (CBML). CBML is an XML vocabulary, a subject-specific extension of the Text Encoding Initiative (TEI), informed by 'comics scholarship,' a field of academic scholarship that explores the social, cultural, and political themes within comics and studies their pedagogical value. In comics scholarship, comic books are data, and tools like CBML help us extract that data. CBML allows us to describe the different elements present in a comic book: not simply text, but text as narration, dialogue, character thoughts, and sound effects, as well as editorials, fan letters, news, advertisements, and other textual content. In addition, CBML tags can mark up formal features such as panels, balloons, captions, and panel groups, which together make up the structure of a page. CBML also suggests ways that standard TEI tags, elements, and attributes can be used to encode other distinctive comic features. When these tags are added to transcriptions and saved as XML, a digital format, comics scholars can search their content to quickly identify themes and elements across large numbers of comic titles. Check out CBML in action below:

Image Description: An example of Comic Book Markup Language, featuring the fifth panel of page 6 from Captain America #193 (January 1976), edited, written, and drawn by Jack Kirby. Image credit to John A. Walsh of the Digital Culture Lab at Indiana University, with further documentation found here.
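
To give a rough sense of what "searching the tags" can look like in practice, here is a minimal Python sketch. It is not the LACA team's actual workflow. The panel, characters, and dialogue are invented for illustration, and the element names, attributes, and CBML namespace URI reflect my understanding of the CBML documentation, so treat them as assumptions and check them against the official schema before relying on them.

```python
import xml.etree.ElementTree as ET

# Namespace prefixes used in the searches below.
# The CBML URI is an assumption based on the CBML documentation.
NS = {
    "tei": "http://www.tei-c.org/ns/1.0",
    "cbml": "http://www.cbml.org/ns/1.0",
}

# Hypothetical encoded panel: the characters and dialogue are invented.
SAMPLE = """
<div xmlns="http://www.tei-c.org/ns/1.0"
     xmlns:cbml="http://www.cbml.org/ns/1.0" type="page" n="1">
  <cbml:panel n="1" characters="#ana #luis">
    <cbml:caption>Bogotá, 1975.</cbml:caption>
    <cbml:balloon type="speech" who="#ana">¿Leíste el periódico de hoy?</cbml:balloon>
    <cbml:balloon type="thought" who="#luis">Otra vez la política...</cbml:balloon>
    <sound>¡PUM!</sound>
  </cbml:panel>
</div>
"""


def balloons_by_character(xml_text, who):
    """Return (balloon type, text) pairs for every balloon attributed to `who`."""
    root = ET.fromstring(xml_text)
    results = []
    for balloon in root.iterfind(".//cbml:balloon", NS):
        if balloon.get("who") == who:
            text = "".join(balloon.itertext()).strip()
            results.append((balloon.get("type"), text))
    return results


def count_elements(xml_text):
    """Count a few structural and textual features in an encoded page."""
    root = ET.fromstring(xml_text)
    return {
        "panels": len(root.findall(".//cbml:panel", NS)),
        "balloons": len(root.findall(".//cbml:balloon", NS)),
        "captions": len(root.findall(".//cbml:caption", NS)),
        "sounds": len(root.findall(".//tei:sound", NS)),
    }


if __name__ == "__main__":
    print(balloons_by_character(SAMPLE, "#ana"))
    print(count_elements(SAMPLE))
```

Because every balloon records who is speaking and whether the text is speech or thought, the same few lines could be run over many encoded comics at once, which is where this kind of markup starts to pay off for comparing themes across titles.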

LACA contains fourteen items (including comic strips, comic pages, and full comic books), ten of which are encoded in CBML. So far, the team has obtained copyright permissions to display five of the items publicly, with the remainder of the items being used internally for classrooms under fair use (did you read the recent Tartan Datascapes feature on fair use? Check it out here if you missed it!). The pedagogical uses of CBML to create and use the archive were explored in Dr. Gómez's course during the 2017-2018 academic year titled 'Superheroes and Beyond: Spanish Language Comics in Digital Humanities.' I was lucky to take two courses in college on comic books, which both followed the format of reading an assigned book, then discussing the themes in the book in class. While this was a great experience, I wish I had had the opportunity to engage with these works through CBML. As Dr. Gómez noted to me in an interview, in the classroom he is able to give students a glimpse into this encoding process, which ultimately creates a unique learning experience for the students that complements close-reading as an interpretive practice applied to comics. Creating these experiences for students with LACA led Dr. Gómez to receive a Teaching Innovation Award at CMU and a 'Best Formative Initiative Developed in 2018' award from the Hispanic Digital Humanities Organization!

While I would argue that data management is always relevant in a project, it becomes especially relevant when transforming your data into a new format. When creating LACA, the research team worked with scanned pages from the comics, thereby transforming the raw data (the comics) into a new form. Dr. Gómez noted some variation in the resolution of the scans. In cases like this, where a researcher may need to look back at the original sources to clarify any questions (such as text that is hard to read in the scans), having access to the original data sources is incredibly important! I recommend saving your original (often called 'raw') data, using descriptive filenaming schemes to identify which transformed files (in this case, the scans) are associated with which raw data, and keeping everything organized in a file/folder system!
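
If you like to see this kind of advice in concrete form, here is one small Python sketch of a descriptive filenaming scheme. Everything in it is hypothetical: the folder names, the example filename, and the step labels are my own illustration and not how the LACA files are actually organized. The idea is simply that each derived file reuses the name of the file it came from, so the link back to the source is visible at a glance.

```python
from pathlib import PurePosixPath


def derived_path(source_file, step, ext):
    """Build a path for a file derived from `source_file`.

    The source file's stem is reused and the processing step is appended,
    so every derived file can be traced back to where it came from.
    """
    source = PurePosixPath(source_file)
    return PurePosixPath("derived") / step / f"{source.stem}_{step}.{ext}"


if __name__ == "__main__":
    # Hypothetical source file: title, year, and page number in the name.
    source = "raw/titulo-ejemplo_1975_p006.tif"
    print(derived_path(source, "transcription", "txt"))
    print(derived_path(source, "cbml", "xml"))
```

Keeping the untouched originals in one folder and everything derived from them in another also makes it much harder to overwrite a source file by accident.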

I'll end this blog post by reiterating that we are so fortunate at CMU to have brilliant researchers like Dr. Gómez on our campus, who expand our notions of what data-driven research can look like. I've now been here for eight months, and I'm truly humbled by all the ways our students, staff, and faculty work with data on a daily basis!

What are three takeaways from this researcher highlight?

  1. Remember - what constitutes 'data' is very diverse! In the case of LACA and the broader field of comics scholarship, comic books are in and of themselves rich data sources!
  2. There are encoding languages and metadata schemas for way more topics than you'd think! Prior to chatting with Dr. Gómez, I had no idea there was an encoding language for comic books. How cool is that?! Looking for an encoding language for your own research? Reach out to the Research Data Services team (email us here) and we'll make some recommendations!
  3. Looking to embark on research that involves the humanities and digital techniques? Reach out to dSHARP and schedule a brainstorming session!

As we approach the end of the semester, there are still a lot of great ways to engage with CMU Libraries virtually! You can check out our LibGuides, specialized research guides created by our subject experts, including the Data 101 guide, which has a particular Tartan Datascapes flavour. We also have a variety of online databases and resources at your disposal. Looking for some more specialized support? dSHARP and the Data Collaborations Office Hours are now virtual until further notice, and you can register to join us for our final open office hour on Wednesday, May 6th, between 1-4pm.

And of course, please email me at hgunderm@andrew.cmu.edu if you'd like some help on your journey as a researcher/scholar/awesome human being here at CMU. Remember, we all use data, regardless of our discipline. If you think something might be data, you are likely correct and I can help you develop good habits for managing it! If you'd like to have your research data featured on Tartan Datascapes, please fill out this Google Form here to get in touch!