Using Natural Language Processing Tools in Historical Research & Teaching

An Interview with Merve Tekgürler, Ph.D. Candidate in History and 2023-24 Mellon/ACLS Dissertation Innovation Fellow

Ph.D. candidate Merve Tekgürler has been awarded one of the forty-five 2023-24 Mellon/ACLS Dissertation Innovation Fellowships. Administered by the American Council of Learned Societies (ACLS) with a grant from the Andrew W. Mellon Foundation, the Mellon/ACLS Dissertation Innovation Fellowships support emerging scholars at the formative stage of dissertation development in order to foster practical, trans- or interdisciplinary, collaborative, critical, or methodological innovations in doctoral research.


Merve Tekgürler is finishing their fourth year in the doctoral program in the Stanford Department of History. Their Mellon/ACLS fellowship project, titled “Crucible of Empire: Danubian Borderlands and the Making of Ottoman Administrative Mentalities,” balances traditional historical methods with computational text analysis for an interpretive study of a large corpus of primary sources. We chatted with Merve about their doctoral research and the significant potential of digital methods in the field of history.


Congratulations on your Mellon/ACLS Fellowship. Could you tell us about your dissertation project? 


My dissertation focuses on Ottoman news and information networks in the Danubian borderlands of the Empire and the adjacent territories during the long 18th century. I study handwritten news communiques produced in the borderlands by Ottoman bureaucrats, and by the scribal staff of both the Greek governors of the Romanian principalities and the Khans of Crimea. Their contents, news and information, were supplied by all sorts of actors on the ground, whose efforts shaped what the Ottoman administration knew about the world beyond its borders. I approach these news summaries at two methodological levels: through close historical reading and interpretation, and through computational text analysis. While questions about how the Partitions of Poland-Lithuania were received and perceived in Ottoman sources require dense archival research, other questions can only be answered at the aggregate level. The news communiques identify their sources: sometimes local authorities gathered news from their spies, other times from newspapers printed in European cities. Reading hundreds of these communiques through computational methods offers insights into practices of communication, patterns of trust-building and of ensuring the reliability of information, and political discourse.


How will the Mellon/ACLS Dissertation Innovation Fellowship support your PhD program? 


The Mellon/ACLS Dissertation Innovation Fellowship (DIF) supports the methodologically innovative aspects of my dissertation research. I work with the tools of natural language processing (NLP), a branch of artificial intelligence and computational linguistics. I am exploring how applying these technologies to historical texts, particularly large text corpora, can offer new insights for historical research. The fellowship allows me to continue my training in NLP methods, including coursework in Computer Science and Linguistics. Luckily, Stanford is one of the leading institutions for NLP research. I have already learnt a lot and will continue along this trajectory into the next academic year. Moreover, by adapting transdisciplinary computational methods to Ottoman Turkish, I will contribute to the advancement of research in an under-resourced language. Currently, I am comparing how Turkish NLP toolkits, language models developed for Modern Turkish, and general-purpose Large Language Models (such as ChatGPT) perform on tasks involving Ottoman Turkish.


Could you show us a simple example of how all this works out in your research?

This image is a word cloud of one of the 100 topics that I modeled with MALLET, a toolkit that implements topic modeling algorithms such as latent Dirichlet allocation (LDA), on a corpus of 18th- and 19th-century Ottoman court histories. I used about 1.5 million words for this task. The algorithm goes through the corpus and groups together words based on statistical patterns of how they co-occur across documents. The size of each word in the word cloud reflects how strongly that word is associated with the topic. I then interpret this cluster in relation to the other topics the algorithm produced.

 

 

My interpretation is that this topic is about Russia and warfare. Notably, neither France nor Austria shows up in this topic, although both appear in other topics. This suggests that the works in this corpus speak about Russia differently from the way they speak about other clusters of states. This is a proof of concept that I am refining and developing further.
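To make the clustering step more concrete, here is a minimal, from-scratch sketch of latent Dirichlet allocation fitted by collapsed Gibbs sampling, the kind of model MALLET's topic trainer estimates. It is not MALLET itself (a Java toolkit with many optimizations) and not the pipeline used in this project; the four-document toy corpus, the function names `lda_gibbs` and `top_words`, and the hyperparameter values are illustrative assumptions. The highest-count words in each topic correspond roughly to the largest words in a word cloud.

```python
import random
from collections import Counter


def lda_gibbs(docs, num_topics, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA with collapsed Gibbs sampling (illustrative sketch).

    docs: list of token lists. Returns (topic_word, doc_topic) count tables.
    """
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    doc_topic = [[0] * num_topics for _ in docs]          # tokens per topic in each doc
    topic_word = [Counter() for _ in range(num_topics)]   # word counts per topic
    topic_total = [0] * num_topics                        # total tokens per topic

    # Start by assigning every token to a random topic.
    assignments = []
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_doc.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assignments.append(z_doc)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Take this token out of the counts before resampling it.
                k = assignments[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # p(topic t) is proportional to how much the document uses t
                # times how much t uses this word.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1
    return topic_word, doc_topic


def top_words(topic_word, n=5):
    """The n highest-count words per topic (what a word cloud draws largest)."""
    return [[w for w, c in counts.most_common(n) if c > 0] for counts in topic_word]


# Invented toy corpus, for illustration only.
docs = [
    "army war fortress siege army".split(),
    "grain harvest tax village grain".split(),
    "war siege army fortress".split(),
    "tax village harvest grain".split(),
]
topic_word, doc_topic = lda_gibbs(docs, num_topics=2)
print(top_words(topic_word, n=3))
```

With real data one would also remove stop words, use far more documents and iterations, and tune `alpha` and `beta`; MALLET, for example, can optimize its hyperparameters during training. Because the sampler is stochastic, which topic ends up holding which words depends on the seed, and the topics still have to be interpreted by the historian.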

It is important to note that historians and humanists can, and do, contribute immensely to improving existing tools. For instance, as you may notice, the word cloud above is in Ottoman Turkish, but I had to remove the non-ASCII characters for MALLET to work. I am now experimenting with a different pipeline for the same task that can work with Unicode characters.
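To illustrate what is lost when non-ASCII characters are dropped, the sketch below contrasts that workaround with a Unicode-aware tokenizer, using only Python's standard library. It is not the new pipeline mentioned above, which the interview does not describe; the sample sentence is an invented example in Latin-script Turkish, and `ascii_only` and `unicode_tokens` are hypothetical helper names.

```python
import re
import unicodedata


def ascii_only(text):
    """The lossy workaround: silently drop every non-ASCII character."""
    return text.encode("ascii", errors="ignore").decode("ascii")


def unicode_tokens(text):
    """Unicode-aware tokenizing: normalize, lowercase, keep accented letters."""
    # NFC turns a base letter plus a combining mark (e.g. "u" + U+0308)
    # into one precomposed code point ("ü"), so both spellings match.
    text = unicodedata.normalize("NFC", text)
    # For str patterns in Python 3, \w matches Unicode letters like ş, ü, ö.
    return re.findall(r"\w+", text.lower())


# Invented sample sentence, for illustration only.
sample = "Moskov ordusu Özi kalesine yürüdü"
print(ascii_only(sample))      # Moskov ordusu zi kalesine yrd
print(unicode_tokens(sample))  # ['moskov', 'ordusu', 'özi', 'kalesine', 'yürüdü']
```

Normalization matters because the same letter can be stored either as one precomposed code point or as a base letter plus a combining mark. One caveat for Turkish: Python's `str.lower()` is not locale-aware, so `"I"` becomes `"i"` rather than dotless `"ı"`, and `"İ"` becomes `"i"` followed by a combining dot, so a real pipeline for Ottoman Turkish would need explicit rules for these letters.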


How do you think digital methods may further advance learning and research in History?


Digital methods are like any other approach within the discipline of history. Not every historian has to or wants to do digital history, and not every archive or topic will be suitable for these approaches. Just as there are economic historians, social historians, intellectual historians, and many others, there are and will be digital historians. Moreover, digital will not be the only “hat” a historian wears, since no historian ever wears just one methodological hat. My hope is that universities like Stanford, which lead the academic discipline of history in North America and beyond, will institutionalize the field of digital history.


Currently, a large portion of digital history research happens in university libraries and other programs. This is certainly amazing and leads to interdisciplinary, cross-institutional approaches and advances in digital research. However, for us historians to have a stronger influence on the direction of digital history, we should also incorporate it into our own departmental structures, ideally by establishing tenure-track positions in digital history or by hiring faculty in other existing fields to research and teach digital history.


In the short term, however, I am most excited about incorporating content management systems and AI-assisted ways of developing mini databases for personal archival research. One of the main issues I faced in the archives was how to organize my documents and make sure that I had written down everything I needed in the places where I needed it. I encourage all of my colleagues to devise a plan before going to the archives and not to rely solely on a USB stick or their phones to save pictures of their sources. A notebook is, of course, a great, time-tested solution, but it is worth exploring systems that can host copies of the documents, proper citations, and notes all in one place.
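Such an all-in-one system need not be elaborate. As one possible starting point, assuming nothing beyond Python's built-in `sqlite3` module, a small personal database can link each photographed document to its citation and to searchable notes. The schema, the function names, and the sample record below are hypothetical; dedicated reference managers and content management systems offer far more.

```python
import sqlite3

# Hypothetical schema: one row per document, many notes per document.
SCHEMA = """
CREATE TABLE IF NOT EXISTS documents (
    id INTEGER PRIMARY KEY,
    citation TEXT NOT NULL,   -- full archival citation
    image_path TEXT           -- where the photographed copy is stored
);
CREATE TABLE IF NOT EXISTS notes (
    id INTEGER PRIMARY KEY,
    document_id INTEGER NOT NULL REFERENCES documents(id),
    body TEXT NOT NULL
);
"""


def open_db(path=":memory:"):
    """Open (or create) the database and make sure the tables exist."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def add_document(conn, citation, image_path=None):
    """Store a citation and image location; return the new document id."""
    cur = conn.execute(
        "INSERT INTO documents (citation, image_path) VALUES (?, ?)",
        (citation, image_path),
    )
    conn.commit()
    return cur.lastrowid


def add_note(conn, document_id, body):
    """Attach a free-text note to a document."""
    conn.execute(
        "INSERT INTO notes (document_id, body) VALUES (?, ?)",
        (document_id, body),
    )
    conn.commit()


def search_notes(conn, term):
    """Return (citation, image_path, note) rows whose note contains term."""
    return conn.execute(
        "SELECT d.citation, d.image_path, n.body "
        "FROM notes n JOIN documents d ON d.id = n.document_id "
        "WHERE n.body LIKE ?",
        (f"%{term}%",),
    ).fetchall()


# Hypothetical example record.
conn = open_db()
doc_id = add_document(conn, "Hypothetical Archive, file 1", "photos/file1_p1.jpg")
add_note(conn, doc_id, "mentions a spy report from the border")
print(search_notes(conn, "spy"))
```

Passing a file path such as `"archive_notes.db"` instead of `":memory:"` keeps the data on disk between sessions. SQLite's `LIKE` is case-insensitive only for ASCII letters, so notes containing Turkish diacritics may need extra normalization before searching.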


What are your thoughts on using digital methods in teaching? 


In the past decade, literacy in digital methods, particularly AI literacy, has become crucial for undergraduate pedagogy. It is increasingly important for us instructors to learn, and to communicate to our students, how to use and how not to use generative AI in the classroom and beyond. The current moment reminds me of the discussions around Google, Google Scholar, and Wikipedia. When (re)search became almost synonymous with googling, scholars developed critical yet constructive ways to teach students how to conduct research online and how to evaluate the results of their search queries. I anticipate a similar trajectory with generative AI: we will develop ways to use these technologies in our teaching and learning and devise methods to evaluate generated text for accuracy and representativeness.

