Re-Imagining the Stack: Minimal Computing at Scale in the Digital Library
10 Nov 2016

Below are the slides and edited-for-print text of a talk I gave as part of the “Minimal Computing in Libraries: Case Studies and the Case for” panel at the DLF Forum in Milwaukee, November 2016.
For the past four or five years, I have been experimenting with and employing minimal computing practices in digital scholarship and pedagogy projects. I had no name for it at the time, and it didn’t have the cachet that it might today [I’m going to say it has a certain cachet; Stuart basically said we are the hipster avant-garde]. Rather, it was a means to an end, born out of necessity: an attempt to mitigate many of the issues inherent to a thick technology stack in libraries. In my context at the time, the biggest issues revolved around: 1) access to servers and technology, 2) steep learning curves, 3) funding and capacity issues, 4) lack of commitment to supporting projects (justifiably so), and 5) long-term preservation of these projects. Today I’m going to talk about a few of the ways that we are using minimal computing in the context of the digital library.
[slide 2]
Minimal platforms
One of my first fully realized minimal computing projects was developing a framework for an upper-level French course at Penn State, where students engaged in a literary cartography project to map Maupassant’s Bel-Ami and other Belle Époque French novels.
The first iteration of the project was in WordPress. Working in groups, students chose a place in the novel and wrote essays exploring the significance of place and space from political, socio-economic, and other perspectives. They also curated contemporary images (paintings, early photographs, posters, and playbills) to further illustrate their chosen places. Their essays and images took the form of blog posts, and students dropped pins on an embedded map of Paris.
After two semesters of using WordPress, we wanted something different: 1) we wanted the map, text, and images to appear together on the page; 2) the instructor wanted students to start encoding parts of the texts for names and concepts; 3) we thought students could benefit from more hands-on experience with the technology; and 4) we grew tired of WordPress updates. We wanted less infrastructure and more flexibility (I mean, who doesn’t, right?). I also thought about what might happen to the project if I were to leave Penn State; I wanted the project to stay in the hands of the instructor. With all this in mind, we decided to develop a new framework, building and hosting it on GitHub. GitHub’s collaborative workflow was a good fit, and the fact that it would let us expose the framework, develop and publish in one place, and easily change ownership were big pluses.
We later generalized the framework and made it available via GitHub under the name “Boulevardier”. The framework is customizable, open source, and easy to fork to start one’s own project. It does have its issues, and I’m working to make it even more minimal. Mapping Maupassant’s Bel-Ami was developed out of necessity, but working this way got me thinking more about using environments like GitHub and client-side web technologies instead of thick technology stacks, and about how we can leverage this approach to reduce barriers and get more content up faster, with fewer resources and with more control by the content creators.
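To make the pattern concrete, here is a minimal sketch (not Boulevardier’s actual code) of the client-side approach: place data lives in a static JSON file in the repository, and a Leaflet map renders it in the browser, so a static host like GitHub Pages is all the infrastructure you need. The `places.json` file name and the `Place` shape are assumptions for illustration.

```typescript
// A sketch of the client-side pattern (not Boulevardier itself): no CMS,
// no database -- just static files served from something like GitHub Pages.
import * as L from "leaflet";

// Hypothetical shape of a student-curated place entry in places.json.
interface Place {
  title: string;             // e.g., a cafe or boulevard from the novel
  coords: [number, number];  // [latitude, longitude]
  essayHtml: string;         // the group's essay, pre-rendered to HTML
}

async function renderMap(): Promise<void> {
  // Center the map on Paris; tiles come from a public OSM tile server.
  const map = L.map("map").setView([48.8566, 2.3522], 13);
  L.tileLayer("https://tile.openstreetmap.org/{z}/{x}/{y}.png", {
    attribution: "© OpenStreetMap contributors",
  }).addTo(map);

  // places.json lives in the same repository as the page itself.
  const places: Place[] = await (await fetch("places.json")).json();
  for (const place of places) {
    L.marker(place.coords).addTo(map).bindPopup(place.essayHtml);
  }
}

renderMap();
```

Because everything is a flat file in the repository, forking the project really does fork the whole project: the content, the code, and the hosting travel together.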
[slide 3]
Re-imagining the Stack
I brought this mindset along to my position at UCLA, and we’ve since used Boulevardier and similar approaches successfully in a number of classroom projects in lieu of things like Drupal. By using minimal computing and infrastructure, we can embrace a “build and release” model to support digital scholarship and digital pedagogy projects. In this way, we can encourage researchers and instructors to take charge of their own projects — taking some of the pressure off of the digital library.
This is also informing how I’m thinking about digital libraries now, and I want to explore how digital libraries might make use of minimal computing: What gains might we see at the enterprise level? Where can we reduce technical infrastructure, especially related to maintenance and labor costs? I don’t have answers to these questions yet, but we’re working on some of it.
In UCLA’s Digital Library, we are experimenting with moving from a thick digital library stack with many dependencies and moving parts to a lighter version, making use of heavy infrastructure only when necessary and preferring client-side frameworks and APIs in place of resource-intensive CMSs in some cases. Obviously we will need a robust infrastructure to maintain assets and metadata (UCLA has over 2.3 million digital assets with their metadata, and that number is growing quickly), but perhaps the publishing side can be more fluid. We don’t want to lock ourselves into one method of publishing; we hope to build and reinvent often on top of a more staid storage infrastructure. For access and browsing, we are working with client-side frameworks like Angular to publish our collections and to create special or “boutique” projects without recreating the stack. This should allow us to iterate more quickly, avoid feeling married to our choices, and focus more on development and experimentation and less on how we are going to maintain another five Drupal sites indefinitely.
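As a framework-agnostic sketch of that publishing pattern (our actual work uses Angular), the browser can ask a read-only metadata API for a collection’s items and render them itself, so the public site is just static files plus a script. The endpoint URL and item shape below are illustrative assumptions, not our production API.

```typescript
// A sketch of publishing "on top of the API": the browser fetches a
// collection's items from a read-only metadata endpoint and renders them.
// The endpoint path and CollectionItem shape are invented for illustration.
interface CollectionItem {
  id: string;
  title: string;
  thumbnailUrl: string;
}

async function renderCollection(collectionId: string): Promise<void> {
  // Hypothetical endpoint exposing the repository's metadata as JSON.
  const resp = await fetch(`/api/collections/${collectionId}/items`);
  const items: CollectionItem[] = await resp.json();

  const list = document.createElement("ul");
  for (const item of items) {
    const li = document.createElement("li");
    const img = document.createElement("img");
    img.src = item.thumbnailUrl;
    img.alt = item.title;
    li.append(img, document.createTextNode(item.title));
    list.append(li);
  }
  document.body.append(list);
}

renderCollection("cinemateca");
```

Swapping in a new look, or a full Angular front end, means changing only this thin layer; the storage and metadata infrastructure underneath stays put.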
I’ve also been thinking about “minimal workflows.” I don’t really know what that means yet, but I’m trying to identify steps or processes in our DL workflows that we don’t need, that could be done better (or automated), or that have poor cost-benefit ratios: basically, achieving our goals in the simplest manner possible. This usually means a bit of up-front work in order to save many hours down the road. One of my new favorite quotes (newly discovered by me, that is) is
“Automate like you are going to live forever, document like you are going to die tomorrow” - Michael Sperberg-McQueen.
I suspect this principle is foundational for a minimalist workflow…
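As a toy example of what that kind of automation might look like (the directory layout and naming convention here are invented for illustration), a small script can verify that every digitized asset has a matching metadata record before anything gets published:

```typescript
// A toy pre-publication check, in the spirit of "automate like you are
// going to live forever": fail loudly if any asset lacks a metadata record.
// The assets/ and metadata/ layout is an assumption for illustration.
import { readdirSync, existsSync } from "fs";
import { basename, extname, join } from "path";

const ASSET_DIR = "assets";       // e.g., assets/clip-0001.tif
const METADATA_DIR = "metadata";  // e.g., metadata/clip-0001.json

let missing = 0;
for (const file of readdirSync(ASSET_DIR)) {
  const id = basename(file, extname(file));
  if (!existsSync(join(METADATA_DIR, `${id}.json`))) {
    console.error(`No metadata record for asset: ${file}`);
    missing += 1;
  }
}

// A non-zero exit code lets a cron job or CI step catch the problem
// long after everyone has forgotten how this step was supposed to work.
process.exit(missing === 0 ? 0 : 1);
```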
[slide 4]
Minimal + Adaptive
Some projects have access issues that go beyond what our minimal computing efforts can solve. UCLA is working with a number of international communities to digitize and provide global access to their materials; however, in one case, due to the local technology environment, we are providing access to everyone but this community.
Cuba, one of the partner countries for UCLA’s International Digital Ephemera Project, has little infrastructure for internet access; so, despite our efforts to publish their materials online, very few people or institutions in Cuba can access these collections, and what access exists is both limited and expensive. Cuba is not as cut off from the global digital community as one might expect, though: in lieu of internet access, semi-legal “sneakernets” have developed, in which terabyte drives of data (from movies and music to books and scholarly articles) are copied, delivered, and passed from person to person.
In order to make the digitized collections available to users in Cuba, we’d need to adapt to the way information is shared there. I wondered how we might develop a more “adaptive computing” model, one that seeks to better understand how information is transmitted and shared within a community and adapts its delivery model to meet local needs. How might we model the publishing and dissemination of these digital assets and their metadata on these sneakernets?
For the short term, we have deployed a somewhat minimal version of the collections from Cuba’s Cinemateca: we’ve prepared a laptop running CollectiveAccess locally and loaded it with assets and metadata. We’ve also configured the laptop to function as a hotspot, serving up the collection to those nearby. Getting more sophisticated equipment into Cuba for a better local wireless network is not possible right now, but first impressions of our preliminary setup are promising. My colleague and project manager for the IDEP project, T-Kay Sangwand, returned from Cuba a few weeks ago and reported that the Director of the Cinemateca thought it was great that she could access the collections from her phone. It’s too soon to say how it’s working with users, though, and this is still very much a work in progress.
I’m currently outlining a plan to move to cloneable external drives with a lightweight, functional database and a simple browser-based interface for querying and viewing the collections. Each drive could then be cloned and shared with other institutions, plugged into laptops that serve as local hotspots… and cloned and shared again! As we digitize and add more assets, we can push them into the sneakernet and let them circulate.
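One sketch of what that “lightweight database” could be, under the assumption that a single JSON catalog file is enough: the whole catalog ships on the disk, and a plain HTML page filters it in the browser over the laptop’s hotspot. The record shape and `catalog.json` file name are assumptions for illustration.

```typescript
// A sketch of the drive's "database": one JSON catalog, searched entirely
// in the browser. No server software to install; cloning the drive clones
// the whole working system.
interface CatalogRecord {
  id: string;
  title: string;
  year: number;
  keywords: string[];
}

let catalog: CatalogRecord[] = [];

// Load the catalog once from the drive (served over the local hotspot).
async function loadCatalog(): Promise<void> {
  catalog = await (await fetch("catalog.json")).json();
}

// Case-insensitive substring search over titles and keywords.
function search(query: string): CatalogRecord[] {
  const q = query.toLowerCase();
  return catalog.filter(
    (rec) =>
      rec.title.toLowerCase().includes(q) ||
      rec.keywords.some((k) => k.toLowerCase().includes(q))
  );
}

// Usage: loadCatalog().then(() => console.log(search("bel-ami")));
```

Because the “install” is just copying files, every clone of the drive is a complete, self-sufficient copy of the collection, which is exactly the property the sneakernet model needs.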
For me, this is a pretty cool project on a number of levels. One in particular is the response from the LIS graduate student who has been working on the project with us, Niqui O’Neill. Niqui has been working hard to set up the local laptop environment, and she’ll be helping with the development of the external drives too. She told me recently that this is some of the most meaningful work she’s done in library school, and she asked if it would be OK to write her thesis on it. So, the project isn’t only about access to materials in Cuba; it’s about getting new professionals experimenting and thinking creatively about solutions to the myriad problems related to access, infrastructure, and so on.