By Louis DiPietro

Literature offers a lens on the past, and with new computational tools, scholars can mine entire libraries of digitized books (thousands of titles and billions of words) to spot insights about culture and society across time and space.

A new book co-authored by a pair of digital humanities scholars explores the relationship between gender and geography in British literature and shows how computational analysis can draw findings from entire digital libraries.

Matthew Wilkens, associate professor of information science in the Cornell Ann S. Bowers College of Computing and Information Science, and Elizabeth Evans, associate professor of English at Wayne State University, wrote Gender and Literary Geography, which was published in April by Cambridge University Press as part of its Elements in Digital Literary Studies series. The pair computationally analyzed more than 20,000 British books published between 1800 and 2009 to investigate the relationship between gender (of both the authors and their characters) and geographic spaces in literature.

“We think of our contribution as bringing the equivalent of macroeconomics – large-scale data and scientific methods – to the humanities, which hasn't generally had the capacity, and in some cases, not the inclination, to use those methods,” Wilkens said.

The pair found that female characters across those 20,000 books were far more likely than male characters to inhabit public urban spaces, upending the notion that women occupied more domestic spaces while men inhabited the public sphere of work, government, and power. Male characters, researchers found, were more often situated in nature than female characters.

“That's significant because it reshapes our whole idea of urban modernity,” said Evans, who studies gender, race, and urban space in literature. “The common association of urban modernity with men doesn't have a lot of evidence when it comes to the books themselves and where the characters are circulating, at least when we look at a large corpus of texts.”

Elsewhere in the book, the researchers found that characters written by both male and female authors tended to occupy more traditional, gendered roles, even as real life was moving toward greater gender parity.

These findings serve as a kind of warning to “be a little bit wary of making connections between what literature tells us and what things were actually like,” Evans said.

The pair’s use of natural language processing (NLP) methods in Gender and Literary Geography is every bit as important as the findings themselves, Wilkens said, because it shows how powerful computational tools can be trained on books to analyze literature at scale, rather than relying on the traditional approach of reading a few books on a given subject and offering an interpretation.

“The big picture here is scale,” he said. “If you have no tool to examine tens of thousands or hundreds of thousands of books, it doesn't make sense to pose questions that you would need that kind of scale to answer.”

The authors’ approach differs from simply feeding digital text into a large language model (LLM) and prompting it for insights. While LLMs do present attractive possibilities in the humanities, leaning on one for this work would have cost more while producing results roughly on par with those of “classic” NLP tools, Wilkens added.

“I think there tends to be some reluctance and suspicion of these computational methods, partly because they're obscure to most people working in the humanities,” Evans said. “One of the aims of the book is to make transparent a lot of the decision making that happens on the technical side that influences the outcomes and the answers to the questions we ask.”

Louis DiPietro is a writer for the Cornell Ann S. Bowers College of Computing and Information Science.