Hacking a research project

Amongst the holdings of the National Archives of Australia are some of the most visually arresting documents you’ll see — thousands and thousands of forms from the early decades of the twentieth century, each with a portrait photograph and palm print, each documenting the movements of a non-white resident. Along with many other certificates, regulations, correspondence and case files, these forms are part of the massive bureaucratic legacy of the White Australia Policy.

These certificates allowed non-white Australians travelling overseas to re-enter the country. NAA: ST84/1, 1906/21-30

But these are more than just interesting looking pieces of paper, they are snapshots of people’s lives. The forms capture data about an individual’s place of birth, physical characteristics and more. Over time a person might have submitted several of these forms, so by bringing them together we could trace their history, we could map their journeys — we could even watch them age.

The system which sought to render non-whites invisible has captured and preserved the outlines of their lives. By extracting and linking this data we could build a picture of another Australia, an Australia in which non-white residents lived, loved, struggled and succeeded, despite the impositions of a repressive regime.

I talked about these records at the AAHC conference last year, inspired in part by Tim Hitchcock’s chapter in the Virtual Representation of the Past. Tim Hitchcock argues that technology can allow us to restructure archives, looking beyond institutional hierarchies to the lives of individuals contained within:

What changes when we examine the world through the collected fragments of knowledge that we can recover about a single person, reorganised as a biographical narrative, rather than as part of an archival system?

I don’t know, but I’d like to find out.

During my AAHC talk, Dave Lester suggested that the extraction of data from these forms might make a good crowdsourcing project. It’s a great idea. As you can see, the data is generally well-structured and legible, it should be possible to construct a simple series of forms that would allow volunteers to transcribe the data. The next stage would be to try and match identities across forms. That’s more complicated, but projects such as Tim Hitchcock’s London Lives show how users can construct identities by connecting a range of historical documents.

Then there are connections to resources outside of the archives — photographs, local histories, newspapers, genealogies, cemetery registers and more. By keeping our system open and extensible, and by working with others to help them expose their information in standard ways, it should be possible to develop the framework for an evolving mesh of biographical data.

So, how do we get started? This is the point when you usually have to start thinking about money — how can I fund this? In Australia that generally means a journey into the arcane world of the Australian Research Council. The ARC suffers from all the problems of a peer-reviewed system, but added to this is a rather antiquated notion of what research is.

In the rules covering each of the main schemes it’s clearly stated that the ‘compilation of data’ and the ‘development of research aids or tools’ are not supported. I spend part of my life working for the Australian National Data Service, an organisation that seeks to highlight how the sharing and reuse of data can open up new research possibilities. The ARC, however, seems to think that data has little value beyond its original research context.

Of course you can still mount a case for such activities. Applicants for a ‘Discovery’ grant can argue that data creation is integral to their project and provide details of the ‘specific research questions to be addressed’. But what if you don’t yet know what the questions are? Part of the point of a project such as this is to try and find out what questions we are able to ask. Until we start to compile, link and explore the data, the ‘specific research questions’ will be little more than convenient fictions, dreamt up to satisfy the prodding of peer reviewers.

Tom Scheinfeldt wrote a fantastic blog post recently, responding to concerns about the failure of many digital humanities projects to make arguments or answer questions. Drawing examples from the history of science, Tom argues:

we need to make room for both kinds of digital humanities, the kind that seeks to make arguments and answer questions now and the kind that builds tools and resources with questions in mind, but only in the back of its mind and only for later. We need time to experiment and even… time to play.

The ARC does not fund play.

You might imagine that the ARC’s infrastructure funding scheme would offer more hope for a project such as this. And yes, there are many worthy projects involving databases and online tools that have been supported in this way (and I have benefited from some of them!). But it seems that in the minds of research funders infrastructure is always BIG. Grants start at $150,000, and applications are expected to involve multiple institutional partners. Projects have to be scaled up to fit the ARC’s definition of infrastructure, often resulting in complex, lumbering, long-term projects whose products are out of date by the time of their release.

There is no room in our current infrastructure models for agile, innovative, user-focused digital toolmakers seeking small amounts to experiment with apps, prototypes, datasets or visualisations. I often look with envy upon the US National Endowment for the Humanities Digital Humanities Start-Up Grants.

In any case, neither I nor my partner in this endeavour, Kate Bagnall (@baibi), are currently in academic positions, so our chances of gaining any sort of research funding are next to none. We have the expertise — Kate has spent many years researching Australian-Chinese families and knows the records back-to-front, while I just can’t help playing with biographical data — but is that enough? How can you mount an ongoing research project without institutional support, research funding and the various badges and signifiers of academic authority?

I don’t know that either, but I have some ideas.

Ah Yin Pak Chong

Mrs Ah Yin Pak Chong. NAA: ST84/1, 1907/321-330

I didn’t manage to get a contribution together for Dan Cohen and Tom Scheinfeldt’s crowdsourced-in-a-week book, Hacking the Academy, but watching the process from afar I did begin to wonder about how we might hack the way we build and run major research projects. This is what I have in mind:

  • To strip down the large, lumbering beasts and design projects that are modular and opportunistic — able to grow quickly when resources allow, to bolt on related projects, to absorb existing tools.
  • To follow the data freely across technological and institutional boundaries, developing open networks that invite participation and use.
  • To develop a floating pool of collaborators, both inside and outside of academia, who are able to come and go, contributing whatever and whenever they can.
  • To make everything public, accessible and standards-compliant, so that even if the project stalls it could be picked up and developed by someone else.

Most of all I just want to be able to do it. I don’t want to second-guess the ARC. I don’t want to spend months negotiating with potential partners or begging for an institutional home. I want to build, experiment and play. I want to make a start.

So that’s what we’re going to do.

We have a topic, plenty of raw materials, some basic principles and the beginnings of a plan. We even have a name — Invisible Australians: Living under the White Australia Policy.

As the project develops, I’ll be blogging here about some of the technical stuff, while Kate will be exploring the content over at the tiger’s mouth. I hope to have a prototype of the transcription tool ready to demo at THATCamp Canberra, while Kate is already at work putting together guides on using the records and developing an Omeka site that follows a number of Chinese-Australian families through the archives.

Can we hack together a major research project? Let’s find out.