Introduction

Marie Candito^* and Mark Liberman^**

^*LLF - Université Paris Diderot / CNRS

^**University of Pennsylvania

Abstract

Annotated corpora are increasingly important, both for scientific research in linguistics
and for natural language processing. This special issue briefly surveys the development of
the field and points to the challenges that arise within the current framework of annotation
based on analytical categories, as well as those that call the framework itself into
question. It presents three articles: one on evaluating the quality of annotation, and two
on French treebanks. Of these two, the first deals with the oldest treebank project for
French, the French Treebank, and the second with the conversion of French corpora into the
cross-lingual scheme of Universal Dependencies. Together they illustrate the history of
treebank development.

Published in

Corpus annotés

Document

TAL_60_2_0.pdf
