MARC Records

MARC records are the keys to the castle. They represent access to vast swaths of human knowledge and understanding and they give us the ability to sort, search and classify this knowledge in useful ways. And yet, approaching the records directly feels like approaching the Library of Alexandria without knowing how to read:

I spent this week trying to understand how to make a “machine readable” record into something a bit more approachable for humans. I started with these questions: could we take these entirely inscrutable records and imbue them with something resembling personality or behavior? If the records themselves had some agency, how might we understand them differently? What new ways might they be willing to approach us or each other?

Agent Based Modeling

There is a computer simulation technique called “Agent Based Modeling,” in which a number of individual programmed entities (agents) interact with one another in a simulated environment. In architecture, this is used to better understand pedestrian flow through a building. In robotics, it can be used to model cooperative behaviors among multiple robots. Last spring, as part of Dan Shiffman’s Nature of Code course, I made this simulation of herding behavior based on a number of simple rules that might plausibly define how sheep and sheepdogs act around one another (deep dive here):

What I noticed as part of this herding simulation project was that the threshold for a programmed action feeling like ‘behavior’ or ‘personality’ to a viewer was quite low. As soon as a system of agents becomes complicated enough that the cause and effect of each agent’s actions are no longer clear, these circles and triangles become living, independent entities with their own set of desires and needs. This might be chalked up to the fact that we as humans are particularly adept at creating narrative. Or that we see ourselves in these shapes on a screen. Or something else entirely. Regardless, I felt that the transformation this simulation undergoes in the mind of a viewer – from a rule-based programmed environment into a field of independently acting beings – was a powerful one.

MARC Records X Agent Based Modeling

For this project, I wanted to use this transformation to provide a new way of viewing or interacting with MARC records. My goal was to create a ‘flocking simulation’ in which individual MARC records interacted with one another based on their similarity or connectedness. Birds of a feather flock together, and MARC records with similar Subject Heading datafields would do the same. Those lacking connection would avoid one another.

In more technical terms, each record would become a single bird (or ‘boid’), whose desire to ‘align’ with, ‘cohere’ with, and ‘separate’ from every other boid was determined by their level of interconnectedness. Using the provided MARC parsing template, I created a relational map of a subset of the records. My first approach was to use the 600-series Subject Heading datafields. If record A had the datafields “Apple Trees, Arboriculture” and record B had “Apple Trees, Orchards, Poetry,” these records would have a single connection. If record C came along with “Orchards, Poetry, Loon (Birds),” it would have two connections to record B. I plotted the result as a force-directed graph using the provided Glitch template (interactive version here).
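To make the connection counting concrete, here is a minimal Python sketch of the idea, not the code from the parsing template. The `records` dictionary and the `count_connections` function are hypothetical names invented for illustration, and the sample headings are the three example records from the paragraph above.

```python
from itertools import combinations

# Hypothetical sample data: record IDs mapped to their subject headings.
records = {
    "A": {"Apple Trees", "Arboriculture"},
    "B": {"Apple Trees", "Orchards", "Poetry"},
    "C": {"Orchards", "Poetry", "Loon (Birds)"},
}


def count_connections(records):
    """Return {(id_a, id_b): shared_count} for each pair sharing a heading."""
    connections = {}
    # Compare every unordered pair of records exactly once.
    for id_a, id_b in combinations(sorted(records), 2):
        shared = records[id_a] & records[id_b]  # set intersection
        if shared:
            connections[(id_a, id_b)] = len(shared)
    return connections


connections = count_connections(records)
print(connections)  # {('A', 'B'): 1, ('B', 'C'): 2}
```

Each entry could serve as a weighted edge in the force-directed graph, or as the strength of attraction between two boids. Comparing every pair grows quadratically with the number of records, so this approach only suits a small subset.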

Similar subjects are clumped together, based on the weights of their connections. “Panoramic photographs” and “Lakes & ponds” are closely related, leading one to presume that the Library of Congress’s Prints & Photographs Division might have a number of panoramas of lakes or ponds. Indeed, a quick search on the LOC Web Portal reveals at least 136 Panoramic Photographs of Lakes and Ponds:

Next Steps?

My goal for this project is to create a new way for us to interact with MARC Records – and to allow MARC Records to interact with us and with each other – that allows spontaneous discovery and fun. What I did not quite realize when I started was the sheer scale of the records. I began with the visual materials MARC file – a single XML file’s worth of records – and found that it produced a JSON file large enough to nearly cause memory errors, even when using 1/10th of the available records. My next step is to begin again with a smaller subset of records, and try to bring them into an agent-based simulation.
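One possible way to pull out a smaller subset without loading the whole file is to stream the XML and stop after a fixed number of records. The sketch below is an assumption about how that might look, not the provided parsing template. It assumes the file uses the standard MARCXML namespace, treats any datafield whose tag starts with “6” as a subject heading, and reads only subfield “a”. The `iter_subjects` function and the inline `SAMPLE` data are made up for illustration.

```python
import io
import xml.etree.ElementTree as ET

# Assumption: the file uses the standard MARCXML ("slim") namespace.
MARC_NS = "{http://www.loc.gov/MARC21/slim}"

# Tiny made-up sample standing in for the real (much larger) file.
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<collection xmlns="http://www.loc.gov/MARC21/slim">
  <record>
    <datafield tag="650" ind1=" " ind2="7"><subfield code="a">Lakes &amp; ponds</subfield></datafield>
    <datafield tag="650" ind1=" " ind2="7"><subfield code="a">Panoramic photographs</subfield></datafield>
  </record>
  <record>
    <datafield tag="245" ind1="0" ind2="0"><subfield code="a">Untitled</subfield></datafield>
    <datafield tag="650" ind1=" " ind2="7"><subfield code="a">Orchards</subfield></datafield>
  </record>
  <record>
    <datafield tag="650" ind1=" " ind2="7"><subfield code="a">Poetry</subfield></datafield>
  </record>
</collection>"""


def iter_subjects(source, limit):
    """Yield a set of subject headings for each of the first `limit` records."""
    seen = 0
    # iterparse reads incrementally; "end" fires once an element is complete.
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag != MARC_NS + "record":
            continue
        subjects = set()
        for field in elem.iter(MARC_NS + "datafield"):
            if field.get("tag", "").startswith("6"):  # 6XX block (assumed subjects)
                for sub in field.iter(MARC_NS + "subfield"):
                    if sub.get("code") == "a" and sub.text:
                        subjects.add(sub.text.strip())
        yield subjects
        elem.clear()  # discard this record's children once processed
        seen += 1
        if seen >= limit:
            break


first_two = list(iter_subjects(io.BytesIO(SAMPLE), limit=2))
print(first_two)
```

For a real file, the `io.BytesIO(SAMPLE)` argument could be replaced with a path or an open file object. Each yielded set could then feed the simulation one record at a time instead of passing through a single large JSON file.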