Digital Cultures, Big Data and Society

On 15-16 February 2018, the SouthHem team attended a conference on “Digital Cultures, Big Data and Society” organized by Prof. Emilie Pine, UCD School of English, Drama and Film.

Industrial Memories

Before raising some of the ideas that emerged from the conference papers, I’d like to foreground the launch of “Industrial Memories”, an important project by Prof. Pine, Prof. Mark Keane, and Dr. Susan Leavy, funded under the Irish Research Council New Horizons 2015 scheme. The project provides a digital, searchable version of the Ryan Report (2009), the report that emerged from the Commission to Inquire into Child Abuse (2000).

The sheer size of the Ryan Report has meant that its findings are rarely engaged with by the public. Unlike the Report, which is a linear narrative of events across 5 volumes or 2,600 pages, the Industrial Memories database allows for a variety of keyword and thematic searches of the material within the Report. The mission statement of the Industrial Memories project is: “To act as witnesses to this history and increase access to the report”.

Industrial Memories essentially treats the Ryan Report as a data corpus. Using textual analytics (such as word-embedding and association rule analysis), Prof. Keane explained how a series of string searches using keywords allowed the team to build up both the histories of the 86 identified abusers and an analysis of how and where abusers were transferred after complaints. In particular, the team looked for meaningful sequences in patterns of transfer to ascertain whether there was a policy around how abusers were dismissed or relocated. Using various social network methodologies, Dr. Leavy explained how not only transfers but also patterns of correspondence could be mapped between parents, government departments, and clergy. The results showed how many people, critically including MPs and government departments, were aware of reported abuses. She also showcased a collocation map showing how the names of various people were linked together in the Report.
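Neither the talks nor this summary reproduce the project’s actual code, but the logic of a collocation map can be sketched in a few lines of Python: names that appear in the same passage are linked, with edge weights counting how often they co-occur. The paragraphs, names, and the use of networkx below are my own illustrative assumptions, not material from the Report or from the Industrial Memories codebase.

```python
from collections import Counter
from itertools import combinations

import networkx as nx  # assumes networkx is installed

# Toy stand-ins for passages of a report and a list of identified names.
# (Invented examples -- not text or names from the Ryan Report.)
paragraphs = [
    "Br. X was transferred from School A to School B after a complaint.",
    "A letter from the Department mentioned Br. X and Br. Y.",
    "Br. Y later served alongside Br. Z at School B.",
]
names = ["Br. X", "Br. Y", "Br. Z"]

# Count how often each pair of names appears in the same paragraph.
pair_counts = Counter()
for para in paragraphs:
    present = [name for name in names if name in para]
    for a, b in combinations(sorted(present), 2):
        pair_counts[(a, b)] += 1

# Build a weighted co-occurrence ("collocation") network from the counts.
graph = nx.Graph()
for (a, b), weight in pair_counts.items():
    graph.add_edge(a, b, weight=weight)

for a, b, data in graph.edges(data=True):
    print(f"{a} -- {b}: co-occurs in {data['weight']} paragraph(s)")
```

On the real corpus the same pattern would run over thousands of passages and a curated list of named entities rather than three invented sentences.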

Supplementing this kind of digital reading of the Ryan Report is a mobile app walking tour called “Echoes from the Past” aimed at providing users with some insight into children’s experiences of the Irish industrial school system. The app was designed by Tom Lane, Maeve Cassidy, and Mick O’Brien. Another important output of the project is a virtual reality platform, “I.S. Complex”, designed by John Buckley of IADT, allowing the viewer to “walk through” Carriglea Park Industrial School (1894-1954).

In her closing remarks, Prof. Pine raised the idea of digital tools as a new way of witnessing past events (i.e. belated witnessing). At the same time, she noted that memory culture can and should be future-oriented; in other words, it can galvanize activist engagement for justice and social change.

Panels and Plenaries: Day 1

Unfortunately, I was unable to attend all the panels at the conference; nor am I able to do justice to the many interesting ideas raised by individual speakers. I would, however, like to draw attention to the papers most relevant to the SouthHem project. On the first day, the “Digital Books” panel raised a number of important questions concerning issues of scale in the field of book history. Sara Kerr’s paper, “Enhanced not ‘Distant’: Examining Independence in the Novels of Austen, Edgeworth, and Owenson”, argued that close and distant reading are not mutually exclusive. Looking at 25 novels by Austen, Edgeworth, and Owenson, Kerr applied “middle distance” reading techniques, analyzing the novels at both the level of the individual text and the level of the corpus. Using word-embedding and frequency analysis techniques, she showcased a number of collocations of words related to the theme of female independence, such as “education”. Kerr ended by noting that digital methodologies use metaphors and spatial analogies that already exist in literary studies.
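Kerr did not share her code, so the following is only a minimal sketch of the kind of word-embedding step she described, using gensim’s Word2Vec on an invented, far-too-small toy corpus. The point is the mechanics of querying a trained model for terms that cluster around “independence”, not the (meaningless) results such a tiny corpus would yield.

```python
from gensim.models import Word2Vec  # assumes gensim is installed

# Invented toy corpus standing in for a tokenized 25-novel corpus.
sentences = [
    "her education gave her a measure of independence".split(),
    "independence of mind was rare for a woman of fortune".split(),
    "her fortune and education secured her independence".split(),
]

# Train a small word-embedding model; on a real corpus the vocabulary,
# vector_size, and min_count would all be much larger.
model = Word2Vec(sentences, vector_size=50, window=5, min_count=1,
                 seed=1, workers=1)

# Words whose vectors sit closest to "independence"; on a real corpus this
# is how thematically related terms such as "education" come to the surface.
for word, score in model.wv.most_similar("independence", topn=5):
    print(f"{word}\t{score:.3f}")
```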

Marie-Louise Coolahan’s paper, “At-scale Questions: Patterns in the Reception and Circulation of Early Modern Women’s Writing”, centred on her exciting ERC-funded project: RECIRC, or “The Reception and Circulation of Early Modern Women’s Writing, 1550-1700”. A study of the intellectual impact of women writers, RECIRC looks at 4,019 different pieces of evidence relating to the reception of women writers, drawn from 1,353 different sources and involving 2,221 people (of whom 1,900 are women). The project asks: How did women build reputations as writers? And how did gender shape ideas about authorship? The visualisations presented demonstrated the most common genres/sources of evidence (manuscripts and letters); various forms of attribution (names, initials, pseudonyms); places of reception (weighted according to the number of receptions); network analyses (e.g. the correspondence networks of a convent in Flanders); reception networks; and the most commonly circulated authors.

Justin Tonra’s paper considered the current state of large-scale digital bibliography, and the ways in which linked open data and ontology techniques can create new opportunities for analysis. Drawing on Simon Eliot’s insights, Tonra noted that both digitizing and linking existing bibliographical resources is important, as calibrating the known often reveals the unknown. Yet Tonra cautioned against reifying data, arguing that the best examples of book history in recent times have involved a marriage of quantitative and qualitative analysis (Bode, Joshi, Weedon). He also noted that Irish-focused resources can contribute to the developing focus on large-scale digital book history. His own work on the digitization of Mary Pollard’s Dictionary of Members of the Irish Book Trade, 1550-1800 is a case in point. Because Pollard’s Dictionary is an eclectic and sprawling printed work, Tonra noted the difficulty of categorizing and granularizing its entries, showcasing the ways in which a tool such as BIBFRAME could provide innovation at the level of metadata.
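To make the linked open data point concrete, here is a minimal, hypothetical sketch of how a single entry from a resource like Pollard’s Dictionary might be expressed against the BIBFRAME vocabulary using rdflib. The base URI, the invented printer, and the specific class and property choices are my own illustrative assumptions, not Tonra’s actual data model.

```python
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import RDF, RDFS

# BIBFRAME vocabulary published by the Library of Congress.
BF = Namespace("http://id.loc.gov/ontologies/bibframe/")
# Hypothetical base URI for a linked-data edition of Pollard's Dictionary.
POLLARD = Namespace("http://example.org/pollard/")

g = Graph()
g.bind("bf", BF)

# A single, invented dictionary entry modelled as a BIBFRAME agent.
# The class/property choices are illustrative, not authoritative modelling.
agent = POLLARD["agent/example-printer"]
g.add((agent, RDF.type, BF.Agent))
g.add((agent, RDFS.label, Literal("Example Printer, Dublin (fl. 1750)")))
g.add((agent, BF.role, Literal("printer")))

# Serialize the triples as Turtle so they can be published or linked.
print(g.serialize(format="turtle"))
```

The gain over a printed dictionary is that such triples can be queried across collections and linked to external authority files, which is the “calibrating the known” that Tonra described.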

Alison Booth’s plenary paper, “Biographical Networks, Gendered Types, and the Challenge of Mid-Range Reading”, began by showcasing her database “Collective Biographies of Women”, which contains the biographies of some 14,016 women and is increasingly being linked to “Social Networks in Archival Contexts” (SNAC). Booth pointed to the importance of diversity and social justice within the field of digital humanities, and to the danger, noted by Lauren Klein, of inadvertently re-inscribing the power dynamics we intend to critique. She promoted the power of digital humanities to recover forgotten or marginalized peoples, listing as examples of “Recovery 3.0” projects such as “Coloured Conventions”, “Historical Slave Trade”, “Orlando”, “Women Writers Online”, and “Poetess Archive”.

Arguing for the significance of both the particular lives of texts and large-scale patterns in the history of women, Booth noted that biographies, and especially nationalities, are often slippery. Assigning typologies, in other words, can be difficult. Booth looked at two particular case studies: a collection of essays, Women Novelists of Queen Victoria’s Reign: A Book of Appreciations (London: Hurst & Blackett, 1897), in relation to which she considered ethnic geographies (in particular, questions of Irishness); and Monroe A. Majors’s Noted Negro Women (Chicago: Donohue & Henneberry, 1893), which provides biographies of 112 women. Quoting from the latter, Booth noted that women have long been an index for the status of various civilisations: “A race, no less than a nation, is prosperous in proportion to the intelligence of its women.” Yet she also noted: 1. the extent to which women have been pushed out of the field of data technologies; 2. the masculine rhetoric behind big data and machine-based learning; and 3. the blunt way that gender is often applied in digital humanities. Warning us against the temptations of universalism in digital humanities, Booth suggested that we need to mediate between the close and the distant, using tools and techniques from both.

Panels and Plenaries: Day 2

On the second day, Michelle Doran’s paper, “The Two Cultures in the Digital Age”, considered her research on 34 digital humanities labs (primarily in the Anglophone world). Looking at the so-called “two cultures” debate (arts and letters v. science), Doran argued that even within the hard sciences there is a division between empiricist attitudes towards big data, which still involve hypotheses and the like, and machine-driven models that argue against theoretical approaches. Drawing on Chris Anderson’s idea that in big data “correlation supersedes causation”, Doran noted the extent to which data is now (mistakenly?) considered to be free of value judgment or theory. Doran also noted the misapplication of the term “big data” in the humanities, arguing that most of our data sets are only “relatively” big. She argued that digital humanists should therefore be using methodologies designed for “little data”.

Also of great interest was Georgina Nugent-Folan’s paper on the Knowledge Complexity Project at Trinity College Dublin. This project looks at the impact of bias in big data projects (especially at the implicit truth claims of algorithmic or digital methodologies). The paper began by considering definitions of data. Drawing on Daniel Rosenberg and Christine L. Borgman’s work, data was defined as pre-analytical and pre-factual, without “truth” or “reality”. But data is also rhetorical and performative, taking its meaning from the various contexts in which it is read. Nugent-Folan focused in particular on the biases involved in data cleaning. Attempts to provide more refined data involve a number of process assumptions. NASA’s grading of data from raw data through to various levels of processing demonstrates an awareness of the levels of data, but differences between input and output data are often unacknowledged, as are the various value sets attached to data cleaning. Data is also changed by, and takes on meaning from, its contexts, which can themselves be redacted or hidden. Nugent-Folan noted, for example, that native contexts can be changed by the contexts of metadata: high-level, non-granular categorization of data can produce very different results from more granular categorization.
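A toy example makes the point about cleaning concrete: two perfectly reasonable ways of handling a missing value produce different “facts” downstream. The catalogue records below are invented, and the sketch simply assumes pandas as the cleaning tool; it is not drawn from the Knowledge Complexity Project itself.

```python
import pandas as pd  # assumes pandas is installed

# Invented toy catalogue records with one missing publication year.
records = pd.DataFrame({
    "title": ["Tract A", "Tract B", "Tract C", "Tract D"],
    "year": [1651, None, 1672, 1683],
})

# Cleaning choice 1: drop records without a year.
dropped = records.dropna(subset=["year"])
# Cleaning choice 2: impute the missing year with the median of the rest.
imputed = records.fillna({"year": records["year"].median()})

# The same "raw" input yields different summary statistics depending on
# which cleaning assumption was made -- a value judgment, not a given.
print("mean year, incomplete records dropped:", dropped["year"].mean())
print("mean year, missing year imputed:      ", imputed["year"].mean())
```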

Geoffrey Rockwell’s plenary paper, “Thinking-Through Big Data in the Humanities”, considered data as a citizenship issue, focusing on the ethics of big data research. Unlike previous speakers, Rockwell defined big data not just in terms of size or volume but also in terms of velocity and variety. He argued for a definition of big data as data at a scale that compels the building of new technologies. Big data, in other words, changes how we imagine evidence, the questions we ask of it, and the tools we need to understand it. This is particularly the case in relation to “complete” data sets versus representative samples. Rockwell recommended our Maynooth colleague Rob Kitchin’s book, The Data Revolution (2014), as an excellent example of the kinds of discussions currently surrounding data; for example, arguments that theory-free hypotheses are now possible because we no longer need to worry about how representative our data sets or samples are.

Arguing that the humanities has been dealing with big data all along, Rockwell began by outlining the history of how the humanities has worked with it. As early as 1972 a group of classicists set up the “Thesaurus Linguae Graecae” project, which scanned large swathes of Greek and Latin literature. Other examples Rockwell discussed include FRANTEXT, created in France in 1984, which enables the study of the use of words in the French language through time. The same group developed the STELLA remote-user access system. Rockwell also considered the work of the French social scientist Jean-Paul Benzécri, whose work on interpretive statistics developed correspondence analysis, text mining, and early visualization techniques. His work had little impact outside of France until Pierre Bourdieu brought the methodologies of correspondence analysis to international attention in Distinction (1979). While the French were using statistical techniques and text mining in “grands corpus”, most of those involved in “humanities computing” in the Anglophone world were arguing about hypertext. As Rockwell pointed out, we had to wait for Franco Moretti to rediscover text mining before it made its full impact in the English-speaking world.
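For readers unfamiliar with Benzécri’s method, the core of correspondence analysis is the singular value decomposition of a table of standardized residuals. A minimal numpy sketch on an invented word-by-text contingency table (not any corpus discussed at the conference) looks like this.

```python
import numpy as np

# Invented toy contingency table: counts of four words (rows) in four texts (columns).
counts = np.array([
    [12,  3,  5,  1],
    [ 2, 14,  4,  6],
    [ 5,  2, 11,  3],
    [ 1,  7,  2, 13],
], dtype=float)

# Correspondence analysis via the SVD of standardized residuals.
P = counts / counts.sum()                             # correspondence matrix
r = P.sum(axis=1)                                     # row masses
c = P.sum(axis=0)                                     # column masses
S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))    # standardized residuals
U, sing, Vt = np.linalg.svd(S, full_matrices=False)

# Principal coordinates of rows (words) and columns (texts).
row_coords = (U * sing) / np.sqrt(r)[:, None]
col_coords = (Vt.T * sing) / np.sqrt(c)[:, None]

print("first two row coordinates:\n", np.round(row_coords[:, :2], 3))
print("first two column coordinates:\n", np.round(col_coords[:, :2], 3))
```

Plotting the first two row and column coordinates together gives the kind of joint map of words and texts that Bourdieu’s use of the method made familiar.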

In a call for us to start reading the documents of big data, Rockwell moved on to considering why the humanities are uniquely suited to the study of the materiality of big data; that is, its documents and infrastructure. He noted that powerpoints or slide decks are a major genre of multimedia communication about big data, as well as being the communicative currency of commercial and government big data. Arguing that powerpoints are a form of “grey literature” that we should be “reading”, Rockwell went on to show how the methods of reading big data involve hermeneutic tools that humanists already use daily. The fact that they can be automated and replicated on a large scale does not change their essential nature.

Rockwell maintained that the humanities has a responsibility to lead discussions about the ethics of big data, and to promote literacy around these issues. He noted his involvement in a project called Gamergate Reactions, which is essentially an archive of material relating to Gamergate, particularly tweets and newspaper articles. Gamergate revealed the deep gender and political divisions in the gamer community. Rockwell noted that gamers were very aggressive towards researchers and archivists involved in analysing the Gamergate phenomenon, especially towards feminist and female researchers. For Rockwell, this sort of toxic data is revealing of the ethical questions surrounding archiving. In the hierarchy of academic value, archivists are often relegated to service work. But Rockwell argued that archives are power, and that we need to treat the stewardship of cultural memory as scholarship in itself.

Drawing on the Latin etymology of the word data, Rockwell went on to reiterate that data is not an objective given or ground truth but rather a “hand-off”. Structuring data involves down-stream interpretation or what Jerome McGann has called “histories of transmission”. Acknowledging this involves acknowledging the ethical implications of our choices surrounding data. Rockwell maintained that we need to adopt a feminist ethics of care towards not only the data but also towards the people named in the data and the researchers who work on it. At the same time, ethics is not a problem to be solved but rather a dialogue or iterative process. This is a dialogue that Rockwell himself is particularly well-suited to engaging in since he is by all accounts a wonderful mentor to younger colleagues, as well as a rigorous and transparent archivist and theorist of data.


Porscha Fermanis is Professor of Romantic Literature at University College Dublin. Her research interests include global Romanticisms and colonial book history; Romantic historicism and the philosophy of history; the relationship between Enlightenment and Romanticism; and the work of John Keats. Her current research for the SouthHem project focuses on literary appreciation and the history of reading in the Straits Settlements.

Image: Jean-Paul Benzécri, October 2006, INA-PG. Photo by Giuseppe Giordano, University of Salerno. Source: Modulad, no. 35, December 2006, INRIA.
