Twenty-Five Years of Digitization and Innovation

Twenty-five years since the donation that launched the university’s first major digitization project, the scanning activities of the Libraries have expanded to offer more options and greater capabilities. 

In August 1994, Carnegie Mellon University received a gift of more than $1 million to develop an electronic, historical archive of the papers of the late Senator H. John Heinz III of Pennsylvania. Funded by the Teresa and H. John Heinz III Foundation, Heinz Company Foundation, and the Howard and Vira I. Heinz Endowments, the Heinz Electronic Library Interactive Online System (HELIOS) was, at the time, one of the country’s most comprehensive projects to systematically digitize textual documents for preservation, access, and scholarship.

HELIOS was unveiled in January 1998, after 700,000 pages had been scanned and four years of collaboration between the University Libraries, the university’s Laboratory for Computational Linguistics (LCL), and CLARITECH, a text-retrieval and information-management company that had its origins at CMU. HELIOS radically altered the status quo by presenting images of the documents in a conventional manner, while also providing access to the content of the documents using natural language processing (NLP) software developed at the university.

Then-University Librarian Charles B. Lowry stated, in a 1994 press release, "We expect to create an archival information-technology environment that dramatically increases the depth of indexing and the quality of retrieval beyond what archiving resources have traditionally allowed."  

In the twenty-five years since the original gift from the Heinz Foundations, that vision has been realized. With the emergence of the web and ebooks, academic libraries began to transform–shifting away from the collection of books and focusing on enabling access to the institution’s rare and unique materials. Gabrielle V. Michalek, creator of the Libraries' Digitization Lab, spent two decades identifying suitable projects for digitization and raising soft money to support these efforts.

With a team of four working from the Libraries’ offsite location at 6555 Penn Avenue, the Libraries' Digitization Lab, now one of the oldest in the country, continues to innovate in pursuit of its mission to provide access to the unique content held by the Libraries and deliver digitization services in support of research and scholarship.

The Lab works closely with the University Archives and, in recent years, has concentrated its efforts on archival photographs and negatives that capture the history of the university. Landmark projects include the Posner Memorial Collection, Pittsburgh Jewish Newspapers, and the papers of Nobel Prize winner Herbert Simon.

While the scanning workflow has remained mostly consistent since the launch of the HELIOS content management system, the equipment and software have advanced significantly, reducing the amount of time needed for the digitization process and making more scanning options available to the greater campus community.

In recognition of increasing requests from faculty, staff, students, and researchers for the inclusion of different content formats, including images and videos, and of metadata in scanned materials, a metadata specialist was added to the team. The Lab is also strategically moving into projects that align with the university’s priorities, such as making content openly available, experimenting with new techniques like photogrammetry, which enables the capture of 3D objects, and investigating how to preserve video, GIS, and software projects.

Looking ahead to the next twenty-five years, the pioneering spirit of the Lab remains intact. The Libraries are currently working to transform their digital collections and reimagine how users will interact with them–whether it is alumni searching for photos or digital humanities researchers doing text and data mining. The Libraries are also expanding their digital collections to encompass new formats–including photographs, video, audio, and oral histories–which have not previously been accessible online.

In close coordination with stakeholders across the university and academia, and in partnership with the Islandora community, the digital collections team is helping to develop the future home of online content: the Islandora 8 digital collections platform. Islandora 8 will enable a reimagining of digital collections access by including linked data within the descriptions, an important first step toward making collections easier to find in search tools like Google and connecting those collections to one another.

In the past, the Libraries have tapped into CMU’s entrepreneurial nature to develop their own digital collections tools–now that energy has been redirected to larger community-driven projects that will also impact other archives, libraries, and heritage institutions.  

Top Image: Scanner Operators Joe Mesco and Jon McIntire work out of the Libraries' Digitization Lab, housed in the Libraries’ offsite location at 6555 Penn Avenue.

Bottom Image: Jon McIntire inspects a slide before scanning.