Reverse Engineering the Image Library

The Media Center was delighted to receive a one-year Sparks! Ignition grant from the Institute of Museum and Library Services in 2017. This award funded a project to assess the feasibility of using deep learning and computer vision to automatically sort digitized 35mm slides, with the goal of creating an open-source, scalable framework for archival discovery in legacy slide collections worldwide.

The Columbia University Department of Art History and Archaeology has a library of over 400,000 35mm slides collected, curated, and created by faculty and students during the latter half of the 20th century. The collection covers a vast geographic and temporal range of topics in art history, archaeology, and anthropology. Together, the slide images and their labels form an important art historical resource that remains unused because the medium is obsolete. The labels provide a teaching bibliography and a record of the subject areas and artworks taught in the department over the last 60 years. Additionally, many of the slides are unique fieldwork photographs taken by Columbia faculty and students for original research. As with most 35mm slide libraries, a master catalog for the collection was never created.

The size of the collection, coupled with the lack of a master catalog, makes it prohibitively difficult to access the significant image resources it contains, a problem faced by many slide collections. Throughout the project, the Media Center explored several computer science techniques on a sample set of digitized slides from the collection in an attempt to solve this problem. One experiment used Optical Character Recognition software to automatically read the slide labels. Another adapted computer vision and machine learning processes to automatically detect whether a slide image was copied from a book or is an original photograph.

Reverse Engineering the Image Library was presented by Stefaan Van Liefferinge and Gabriel Rodriguez at the Visual Resources Association Annual Conference in 2019 as part of the session 'Moving/Still/Textual Pictures - Tools for Analyzing Art, Texts, and Films'. The white paper produced at the project's end is currently being submitted for publication. Please contact mediacenter@columbia.edu for more information.

This project was made possible in part by the Institute of Museum and Library Services Grant LG-89-17-0218-17.