2025 Reproducibility Hackathon Investigates LLM Writing Styles

On Tuesday, Feb. 18, the Libraries hosted its second annual Reproducibility Hackathon. Thirty-five participants were challenged to replicate and augment published research, working with existing data and code to explore new ideas and analyses, with prizes awarded for their contributions.

Hackathon attendees worked with research provided by two faculty members from Dietrich College of Humanities and Social Sciences: Alex Reinhart, associate teaching professor with Statistics & Data Science, and David West Brown, associate teaching professor of English. Their work, “Do LLMs write like humans? Variation in grammatical and rhetorical styles,” which was published as an openly available preprint, explores systematic differences between LLMs and humans, as well as between different LLMs.

STEM Librarian Chasz Griego, who created the first Reproducibility Hackathon last March and published a case study of findings after the event, was enthusiastic about the opportunity to work with AI research this year.

“After hearing Alex Reinhart discuss this research at a Libraries professional development retreat, I was thrilled when I connected with him and David Brown for the hackathon,” Griego said. “AI and Large Language Models are incredibly relevant, and given the concerns about LLMs replacing or misrepresenting human writing, the research subject and artifacts shared by these researchers created an exciting opportunity for students to engage with the frontlines of this evolving landscape.”

Reinhart viewed the hackathon as an opportunity to test if he practices what he preaches — that researchers should share their data and code so others can check their work and build on their results. “We try to organize our data so others can use it, but the real test is whether motivated users can actually get it working,” he said. “So the hackathon was very exciting: Does the code really work? Can anyone actually use it?”

For Robert Morris University junior William V. Fullerton, the chance to dive deeper into LLM research was an especially exciting opportunity: he traveled all the way from Moon Township, PA, to attend the hackathon. The statistics and data science major, who is minoring in finance, was curious to see firsthand how advanced LLMs are becoming, as well as to explore the new R tools being developed for them.

“LLM writing styles are something that I have been interested in for a long time, especially since I’ve become very good at picking out ChatGPT’s writing,” he explained. “I will always be grateful for this amazing opportunity at CMU, as I can honestly say I left more knowledgeable than when I walked in.”

As a statistician, Heinz College of Information Systems and Public Policy doctoral student Peem Lerdputtipongporn spends much of his time thinking about how to extract insights from data. He views the process of collecting, analyzing, and using data as an art as well as a science.

“I joined this hackathon to learn how other scholars engage with this process and how I, as a researcher, can communicate my work more effectively,” he said. “My key takeaway from the event was realizing that many statistical practices I take for granted are not intuitive to others. This experience broadened my perspective on what accessible and reproducible research entails, and more importantly, to whom.”

Alumnus Vicente Malave, who graduated in 2005 with degrees from the School of Computer Science and Dietrich College and has since worked for both the Department of Psychology and the Language Technologies Institute, attended the hackathon to further explore reproducibility.

Over his career, Malave has worked for a number of startup companies that addressed reproducibility challenges. At one company, his data science team created handheld sensors for biomedical applications — but they struggled to replicate methods from published papers with them. “The gap between theory and what works with real patients was significant,” he said.

“The hackathon was refreshing because I could discuss these real-world challenges with the grad students at my table,” Malave added. “It reminded me of the community I left behind when I moved to California, and it's great to see that a warm, collaborative community still exists here: honestly, one of my best experiences since moving back to Pittsburgh. I appreciate that the library is putting on high-quality events while keeping them accessible.”

At the hackathon, prizes were awarded to participants for outstanding contributions such as data visualizations and author feedback.

For Keerthana Gurushankar, a doctoral student in the School of Computer Science, the opportunity to work in a group to tackle technical challenges was a highlight of the hackathon. Over the course of the event, she also found herself gaining confidence in her own ability to take an unfamiliar and complex work, break it down, and understand it in depth. At the end of the day, she was recognized with a prize for successfully reproducing two visualizations using a smaller subset of the data.

Gurushankar and Lerdputtipongporn pose with their prizes.

“I was really impressed with the quality of work I saw some other participants produce in a matter of hours, so it was both surprising and motivating to receive a prize,” she said. “It has given me a push to continue exploring difficult problems.”

Mellon College of Science student Chehak Arora, who is part of the Master’s of Science in Data Analytics for Science (MS-DAS) program and a member of the Tartan Research Data Alliance (TRDA), also won a prize for her work creating a visualization dashboard.

“As a TRDA member, I was eager to participate because the idea of reproducing the paper was a good idea for me to apply my skills and explore the intricacies of LLMs,” she explained. “Winning a prize was a pleasant surprise that added a fun dimension to the experience.”

Ultimately, the researchers were glad to see that participants were able to replicate their work, as well as produce new visualizations, interactive applications, and scripts.

“It’s great to see what CMU students could do with our work, and I hope this encourages other researchers to share their own code and data,” Reinhart said.

Hackathon outputs like data, code, and visualizations from all participants are available in a public repository on Open Science Framework.

For help exploring the reproducibility of your research, you can get in touch with specialists at the Libraries such as Griego, Open Science Program Director and STEM Librarian Melanie Gainey, and Research Data Services Librarian Alfredo González-Espinoza. To learn more about a variety of open science offerings, subscribe to the Open @ CMU Newsletter.