Computational and Linguistic Issues in Designing a Syntactically Annotated Parallel Corpus of Indo-European Languages

Dag T. T. Haug, Marius L. Jøhndal, Hanne M. Eckhoff, Eirik Welo, Mari J. B. Hertzenberg, Angelika Müth
 
Department of Philosophy, Classics, History of Arts and Ideas
P.O. Box 1020 Blindern
N-0315 Oslo
Norway
 
This paper reports on the development of the PROIEL parallel corpus of New Testament texts, which contains the Greek original of the New Testament and its earliest Indo-European translations, into Latin, Gothic, Old Church Slavic and Classical Armenian. A web application has been constructed specifically for annotating the texts at multiple levels: morphology, syntax, alignment at the sentence, dependency-graph and token levels, information structure and semantics. We describe this web application and our annotation schemes. Although designed for investigating pragmatic resources, the richly annotated corpus is also an important resource for contrastive and historical Indo-European syntax and pragmatics, and it can easily be expanded to include other old Indo-European languages.