“We think of our contribution as bringing the equivalent of macroeconomics – large-scale data and scientific methods – to the humanities, which hasn't generally had the capacity, and in some cases, not the inclination, to use those methods,” Wilkens said.
The pair found that female characters across those 20,000 books were far more likely than male characters to inhabit public urban spaces, upending the notion that women occupied more domestic spaces while men inhabited the public sphere of work, government, and power. Male characters, the researchers found, were more often situated in nature than female characters were.
“That's significant because it reshapes our whole idea of urban modernity,” said Evans, who studies gender, race, and urban space in literature. “The common association of urban modernity with men doesn't have a lot of evidence when it comes to the books themselves and where the characters are circulating, at least when we look at a large corpus of texts.”
Elsewhere in the book, they found that characters written by both male and female authors tended to occupy more traditional, gendered roles, even as real life moved toward greater gender parity, the researchers said.
These findings served as a kind of warning to “be a little bit wary of making connections between what literature tells us and what things were actually like,” Evans said.
The pair’s use of natural language processing (NLP) methods in Gender and Literary Geography is every bit as important as the findings they produced, Wilkens said. It highlights how powerful computational tools can be applied to books to analyze literature at scale, in contrast to the traditional approach of reading a few books on a given subject and offering an interpretation.
“The big picture here is scale,” he said. “If you have no tool to examine tens of thousands or hundreds of thousands of books, it doesn't make sense to pose questions that you would need that kind of scale to answer.”
The authors’ approach differs from simply feeding digital text into a large language model (LLM) and prompting it for insights. While LLMs do present attractive possibilities in the humanities, leaning on one for this work would have cost more money and produced results on par with those of “classic” NLP tools, Wilkens added.
“I think there tends to be some reluctance and suspicion of these computational methods, partly because they're obscure to most people working in the humanities,” Evans said. “One of the aims of the book is to make transparent a lot of the decision making that happens on the technical side that influences the outcomes and the answers to the questions we ask.”
Louis DiPietro is a writer for the Cornell Ann S. Bowers College of Computing and Information Science.