Saturday, March 19, 2022

Open Scholarship and the Digital Humanities

I’ve been taking the Program for Open Scholarship and Education (POSE), a flexible and blended MOOC created at UBC. One of its modules covers an area of open scholarship that particularly interests me: whether open scholarship methods are possible in the humanities at all. One of the readings, Rik Peels’ “Replicability and replication in the humanities,” argues that replication in the humanities is not only possible but also desirable. I want to use the field of digital humanities to ask whether we should pursue replication in the humanities, and what positive impacts it could have on open scholarship and research.

There has been much debate about whether the humanities add value to scholarship and prepare students for their lifelong pursuits, but the arguments have often been subjective, if not polemical, and not necessarily supported by tangible or empirical evidence. I see the digital humanities (DH) as a field that is not only well suited to open research but also produces new discoveries and interdisciplinary scholarship, benefiting not just intellectual output but also the students who study it.

Skeptics question whether, as Peels puts it, “empirical studies in the humanities are often such that an independent repetition of it, using similar or different methods and conducted under similar circumstances, can be carried out” at all (Peels, 2019). Digital technologies have given researchers techniques that can help reproduce and replicate research findings, and that is a powerful way to bring the humanities closer to the kind of scholarship practiced in the sciences.

Ted Underwood is a literary and digital humanities scholar whom I follow, and his work offers an insightful case of this approach’s impact on the humanities. Underwood often links his articles and books to his blog, where posts filed under an open data category link to the data and code that support them. For instance, Underwood and Jordan Sellers’ “The Emergence of Literary Diction” is an excellent example of how the humanities can prioritize computational reproducibility: practitioners can hand off all of a project’s inputs (data, scripts, etc.) so that other researchers can reproduce the findings, not only for peer review but also to extend the research with new findings.

In their research, Underwood and Sellers ask when literary diction began to differentiate itself from nonfiction prose. They look back at literature of the 18th and 19th centuries and, through textual analysis, conclude that writers of literary fiction relied much more heavily on the older part of the lexicon. They do this by “counting,” among the ten thousand most common words, those that entered English before 1150 and dividing that count by the number of words that entered the language between 1150 and 1699.
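To make the counting step concrete, here is a minimal Python sketch of that kind of ratio. It is not Underwood and Sellers’ R code. The names `ENTRY_YEAR`, `top_vocabulary`, and `older_vocab_ratio` are hypothetical, the entry dates are rough illustrative placeholders rather than values drawn from a historical dictionary, and the two sample sentences are invented. The sketch also assumes every occurrence of a word is counted, and that words without a known date are simply skipped.

```python
import re
from collections import Counter

# Hypothetical etymology lookup: word -> approximate year it entered English.
# Illustrative placeholders only; a real study would take dates from a
# historical dictionary.
ENTRY_YEAR = {
    "heart": 900, "sorrow": 900, "love": 900, "night": 900, "house": 900, "old": 900,
    "nature": 1300, "govern": 1300, "observe": 1390, "policy": 1400,
    "revenue": 1430, "trade": 1550,
}

def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())

def top_vocabulary(texts, n=10_000):
    """Return the n most common words across all texts (the study used ten thousand)."""
    counts = Counter()
    for text in texts:
        counts.update(tokenize(text))
    return {word for word, _ in counts.most_common(n)}

def older_vocab_ratio(text, entry_year, vocab):
    """Occurrences of pre-1150 words divided by occurrences of 1150-1699 words.

    Only words in `vocab` with a known entry date are counted.
    Returns infinity if no 1150-1699 words occur.
    """
    old = new = 0
    for word, n in Counter(tokenize(text)).items():
        year = entry_year.get(word)
        if word not in vocab or year is None:
            continue  # skip rare or undated words
        if year < 1150:
            old += n
        elif year <= 1699:
            new += n
    return old / new if new else float("inf")

# Invented sample sentences standing in for a novel and a piece of nonfiction prose.
novel = "Her heart was full of sorrow and love, and the old house was quiet at night; nature slept."
essay = "The policy of trade must observe the nature of revenue, and govern the old house of commerce."

vocab = top_vocabulary([novel, essay])
print(older_vocab_ratio(novel, ENTRY_YEAR, vocab))  # 6.0
print(older_vocab_ratio(essay, ENTRY_YEAR, vocab))
```

A higher ratio means a text leans more on the older lexicon. Because the function takes the corpus and the date table as inputs, the same code could in principle be rerun on an entirely different collection of texts.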

What they find is fascinating: by the end of the 19th century, there was a “new, sharply marked distinction between literary and nonliterary diction,” with novels using the older part of the lexicon at a rate almost double that of nonfiction prose. Prior to 1600, there was little distinction between poetry, nonfiction, and fiction. The analysis was written in the programming language R, and Underwood shares the scripts on GitHub.

What DH projects such as this one offer is computational replicability: other researchers can run the same workflow themselves. One could conceivably apply the same R scripts that Underwood provides to a different dataset of English texts (Project Gutenberg, just as an example) and obtain different but comparable results. Most of the visualizations presented in the article are derived from a collection of 4,275 documents from Eighteenth Century Collections Online (ECCO), which is open data.

However, while Peels argues that replication studies in the humanities are desirable, and that we “should actually frequently carry out such independent repetitions of published studies,” we also need to consider the resource advantages enjoyed by researchers in the Global North. Digital humanities certainly looks very different from the “traditional” analogue techniques of earlier times, when the close reading of texts relied on hermeneutical methods based on subjective, qualitative interpretation. Researchers no longer need to rely solely on expertise in the written word; they can reproduce the same findings themselves. DH also encourages students to use the skills and tools more commonly associated with literate programming (R and Python, to name just a few), which are useful both in research and in life beyond academia.
