Automating the analysis of concepts through time

University of Sheffield (United Kingdom)

What are the components of a historically important concept in its own time? Does a machine-built, data-driven historical thesaurus differ from one constructed through human intuition about meaning? This project builds on existing tools and methods of thesaurus creation, including automated distributional semantic methods (e.g. word embeddings), lexicographical methods (e.g. the Historical Thesaurus of English at the University of Glasgow), and the linguistic concept modelling developed by the Linguistic DNA project. It will explore the state of the art and best practice across multiple disciplines in order to construct a historical thesaurus of English automatically from large historical datasets.

The PhD candidate should: 1) comprehensively evaluate bottom-up, context-driven approaches to thesaurus creation; 2) synthesise, develop, and apply a new approach to automatic thesaurus creation that draws on a range of existing methods across disciplines; 3) evaluate the success of this combined approach against existing approaches used alone.

The PhD candidate will have access to EEBO-TCP (including semantic tagging using SAMUELS); the Historical Thesaurus of English; ECCO; British Library Historical Newspapers; and the GloWbE corpus.

Expected results include a new computational method for automatically generating a historical thesaurus of English; case studies implementing the new method; and an evidenced account of the relationships and differences between approaches to thesaurus creation across disciplines.
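
For readers unfamiliar with the bottom-up, context-driven (distributional) approaches mentioned above, the following is a minimal illustrative sketch only, not the project's method and not the Linguistic DNA or Historical Thesaurus pipeline. It assumes a simple count-based model: each word is represented by the words that occur near it, and words with similar context profiles are proposed as candidates for the same thesaurus category. The function names, the window size, and the toy sentences are all invented for illustration; real work would use corpora such as EEBO-TCP and far richer models.

```python
from collections import Counter, defaultdict
from math import sqrt


def cooccurrence_vectors(sentences, window=2):
    """Map each word to a Counter of the words seen within `window` tokens of it."""
    vectors = defaultdict(Counter)
    for tokens in sentences:
        for i, word in enumerate(tokens):
            start = max(0, i - window)
            end = min(len(tokens), i + window + 1)
            for j in range(start, end):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = sqrt(sum(c * c for c in a.values()))
    norm_b = sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def nearest_neighbours(word, vectors, k=3):
    """Return the k words whose context vectors are most similar to `word`."""
    target = vectors[word]
    scored = [(other, cosine(target, vec))
              for other, vec in vectors.items() if other != word]
    # Highest similarity first; ties broken alphabetically for determinism.
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))[:k]


# Invented toy corpus (an assumption for illustration, not historical data).
toy_sentences = [
    ["the", "king", "rules", "the", "realm"],
    ["the", "queen", "rules", "the", "realm"],
    ["the", "farmer", "ploughs", "the", "field"],
]
vectors = cooccurrence_vectors(toy_sentences)
print(nearest_neighbours("king", vectors, k=3))
```

In this toy corpus "king" and "queen" share identical contexts, so "queen" ranks as the closest neighbour of "king". A human-built thesaurus might group them for quite different reasons, which is the kind of contrast the project sets out to examine.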