Notes from DARIAH by Rachel McCarthy

Earlier this June, I had the pleasure of attending the 2025 DARIAH Annual Event in the historic city of Göttingen, Germany. Set in the impressive Göttingen State and University Library (SUB) and Alte Mensa, the conference brought together a vibrant community from the fields of digital humanities, cultural heritage, and data science under the unifying theme: “The Past”. Spanning four dynamic days, the event provided an inspiring forum for exploring how the digital humanities are reshaping our relationship with historical knowledge – from immersive archives and machine learning to educational games and inclusive metadata.

Of course, the primary debate focused on the place and role of artificial intelligence in digital humanities today. Dr Andrea Scharnhorst’s presentation, “Archiving for the Future Past – Multimodality and AI – Challenges and Opportunities”, explored how to enhance digital archival infrastructures with AI-driven solutions. Drawing on open-source infrastructures like the Dataverse project, the team demonstrated how innovative tools like Ghostwriter AI can support more inclusive annotation, metadata enrichment, and access across sensory modalities. Beyond the technical dimension, the presentation emphasized the importance of collaborative knowledge organization and the need to support smaller or vulnerable data communities, ensuring that the diversity of cultural expression is not lost to time – or technology. As Dr Scharnhorst explained, while AI will most certainly change human behaviour, it does not have to erode our humanity.

In another fascinating session entitled “AI4LAM: A Collaborative Network for Reliable and Trustworthy Use of AI in Libraries, Archives, and Museums’ Historical Collections”, Dr Ines Vodopivec offered us a reality check: “It’s no longer about whether we use AI. Now, it’s about how we use it”. Her talk emphasized the ethical, responsible integration of AI into archival institutions, highlighting AI4LAM’s goal of fostering a framework for organizing, sharing, and elevating knowledge about and use of AI, as well as advocating for reliable and trustworthy AI tools and services. This wasn’t just a plea to develop a common dialogue environment for implementing AI; it was an invitation to shape the tools that are already shaping us.

Equally compelling was Dr Tianyu Yang’s presentation, “Content Analysis of Historical Datasets with Large Multi-modal Models”. This talk addressed the persistent challenges in digitizing and annotating historical documents – something I have come up against multiple times in my own research. While OCR has long been the backbone of document transcription, its effectiveness drops drastically when applied to historical materials due to degraded quality and archaic layouts. The session explored how emerging Large Multi-modal Models, which combine the strengths of pretrained visual encoders with large language models like GPT-4 and LLaMA, can now step in to enhance this process. These models exhibit impressive capabilities, automatically generating image captions and transcriptions without the need for extensive domain-specific training. Their integration into digitization workflows offers a scalable path forward: reducing manual labour, increasing the discoverability of visual content, and significantly broadening access to historical archives. In essence, this form of AI model represents a new frontier in automated historical content enrichment, bridging the gap between text and image, and between computational efficiency and cultural preservation.

Whether over coffee at the poster session, touring the historic city centre, or chatting at the social dinner, one thing stood out: DARIAH’s strength is its people. From early-career researchers to seasoned infrastructure leads, everyone was generous with ideas and eager to collaborate. The community spirit was as memorable as the presentations themselves, and the conference offered a compelling look at how the digital humanities are transforming our engagement with the past – not just through innovation, but through reflection, collaboration, and care.

The 2025 DARIAH Annual Event made it abundantly clear that the digital humanities are not simply adopting new technologies such as AI out of necessity; they are actively shaping them to serve humanistic inquiry. Whether through inclusive infrastructure design, critical engagement with AI, or creative reinterpretations of the past, this community is working to ensure that the digital future remains grounded in cultural context and human connection. I’m already looking forward to DARIAH 2026 and the next chapter in this evolving conversation.

Rachel McCarthy is a PhD researcher at University College Cork, where she previously earned a bachelor’s degree in ‘Digital Humanities and Information Technology’. After completing a master’s degree in ‘Digital Text Analysis’ at the University of Antwerp, she now focuses on computational literary studies. Her research involves using techniques such as stylometry, natural language processing, and language models to investigate authorship attribution and writing styles, and to track semantic changes in texts beyond traditional reading methods. Passionate about advancing text analysis, she aims to uncover new insights into literary and historical texts using digital methodologies.

CASCADE is a collaboration between University College Cork, University of Sheffield, University of Helsinki, Katholieke Universiteit Leuven, and Universität des Saarlandes. It is funded by Horizon Europe under the Marie Skłodowska-Curie Actions (MSCA) Doctoral Networks and the UKRO.
