How Bad are PoS Tagger in Cross-Corpora Settings? Evaluating Annotation Divergence in the UD Project - Laboratoire d'Informatique pour la Mécanique et les Sciences de l'Ingénieur
Conference paper. Year: 2019

Guillaume Wisniewski
François Yvon

Abstract

The performance of part-of-speech tagging varies significantly across the treebanks of the Universal Dependencies project. This work points out that these variations may result from divergences between the annotations of the train and test sets. We show how the annotation variation principle, introduced by Dickinson and Meurers (2003) to automatically detect errors in gold-standard annotations, can be used to identify inconsistencies between annotations; we also evaluate their impact on prediction performance.
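
The annotation variation principle can be illustrated roughly as follows: a word that appears in the test set within the same surrounding context as in the train set, but with a different tag, is flagged as a potential annotation inconsistency. The sketch below is a minimal illustration, not the authors' implementation. It assumes corpora given as lists of (word, tag) pairs and a fixed context of one word on each side (Dickinson and Meurers grow variation n-grams to longer contexts, and the paper's exact setup may differ). The helper names `tag_index`, `cross_corpus_variations` and `split_accuracy` are hypothetical. The last function compares tagging accuracy on flagged and unflagged test tokens, one simple way to gauge the impact of such inconsistencies.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Set, Tuple

Sentence = List[Tuple[str, str]]  # (word, tag) pairs


def contexts(sentence: Sentence, window: int = 1):
    """Yield (context, tag) per token. ASSUMPTION: context = word + `window` neighbours each side."""
    words = [w for w, _ in sentence]
    padded = ["<s>"] * window + words + ["</s>"] * window
    for i, (_, tag) in enumerate(sentence):
        j = i + window  # position of token i in the padded list
        yield tuple(padded[j - window : j + window + 1]), tag


def tag_index(corpus: List[Sentence], window: int = 1) -> Dict[Tuple[str, ...], Set[str]]:
    """Map every context seen in the corpus to the set of tags its central word received."""
    index: Dict[Tuple[str, ...], Set[str]] = defaultdict(set)
    for sentence in corpus:
        for key, tag in contexts(sentence, window):
            index[key].add(tag)
    return index


def cross_corpus_variations(train: List[Sentence], test: List[Sentence], window: int = 1):
    """Flag test tokens whose context also occurs in train with at least one different tag."""
    train_index = tag_index(train, window)
    flagged = []
    for s_id, sentence in enumerate(test):
        for t_id, (key, tag) in enumerate(contexts(sentence, window)):
            other_tags = train_index.get(key, set()) - {tag}
            if other_tags:
                flagged.append((s_id, t_id, key, tag, sorted(other_tags)))
    return flagged


def split_accuracy(test: List[Sentence], predicted: List[List[str]], flagged) -> Dict[str, Optional[float]]:
    """Tagging accuracy on flagged vs. unflagged test tokens (None if a group is empty)."""
    flagged_ids = {(s, t) for s, t, *_ in flagged}
    counts = {True: [0, 0], False: [0, 0]}  # is_flagged -> [correct, total]
    for s_id, (gold_sent, pred_tags) in enumerate(zip(test, predicted)):
        for t_id, ((_, gold), pred) in enumerate(zip(gold_sent, pred_tags)):
            bucket = counts[(s_id, t_id) in flagged_ids]
            bucket[0] += int(gold == pred)
            bucket[1] += 1
    return {
        ("flagged" if is_flagged else "unflagged"): (c[0] / c[1] if c[1] else None)
        for is_flagged, c in counts.items()
    }
```

For example, if the train set contains "the bank closed" with *bank* tagged NOUN and the test set contains the same trigram with *bank* tagged PROPN, `cross_corpus_variations` flags that test token; a tagger trained on the train set will likely predict NOUN there and be scored as wrong, even though the divergence lies in the annotation.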
Main file
N19-1019.pdf (347.34 KB)
Origin: Publisher files authorized on an open archive

Dates and versions

hal-02055137, version 1 (14-06-2019)

Identifiers

  • HAL Id: hal-02055137, version 1

Cite

Guillaume Wisniewski, François Yvon. How Bad are PoS Tagger in Cross-Corpora Settings? Evaluating Annotation Divergence in the UD Project. 2019 Annual Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Association for Computational Linguistics, Jun 2019, Minneapolis, Minnesota, United States. pp. 218-227. ⟨hal-02055137⟩
