Explicit Aspect Annotation via Transfer and Active Learning
Abstract
We present a semi-supervised annotation process for identifying and labelling explicit aspects in an initially unlabelled corpus. First, we use cross-domain learning to pre-annotate the data, deliberately excluding domain-specific input features so that what is learned transfers effectively. Then, we apply an active learning strategy to improve the pre-annotation performance and enrich the training data. We adapt this strategy to sequence labelling and address class imbalance. We evaluate the process on two unlabelled French datasets of user opinions, one on beauty products and one on electronic devices. The results show an improved F1-score after augmenting and correcting 30% of the training dataset.
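As a rough illustration of what adapting active learning to sequence labelling under class imbalance can look like, the sketch below ranks unlabelled sentences by the entropy of their most uncertain tokens, so that the many confidently predicted non-aspect tokens do not dilute the score. This is an assumption made for illustration, not the selection strategy of the paper: the function names (`token_entropy`, `sentence_score`, `select_for_annotation`), the `top_n` parameter, and the three-label probability vectors are all hypothetical.

```python
import math
from typing import List, Sequence

# Hypothetical input format: one probability distribution over labels per token,
# e.g. [P(O), P(B-ASP), P(I-ASP)], as produced by a pre-annotation model.
TokenDists = Sequence[Sequence[float]]


def token_entropy(dist: Sequence[float]) -> float:
    """Shannon entropy (in nats) of one token's label distribution."""
    return -sum(p * math.log(p) for p in dist if p > 0.0)


def sentence_score(token_dists: TokenDists, top_n: int = 2) -> float:
    """Mean entropy of the top_n most uncertain tokens in a sentence.

    Averaging over all tokens would favour short sentences and be dominated
    by the majority class, so only the most uncertain tokens are kept.
    """
    entropies = sorted((token_entropy(d) for d in token_dists), reverse=True)
    top = entropies[:top_n]
    return sum(top) / len(top) if top else 0.0


def select_for_annotation(pool: Sequence[TokenDists], k: int, top_n: int = 2) -> List[int]:
    """Return the indices of the k sentences with the highest uncertainty score."""
    ranked = sorted(
        range(len(pool)),
        key=lambda i: sentence_score(pool[i], top_n),
        reverse=True,
    )
    return ranked[:k]


# Toy pool of three sentences (values invented for illustration only).
confident = [0.98, 0.01, 0.01]
pool = [
    [confident] * 4,                      # every token confidently "O"
    [confident] * 5 + [[0.4, 0.3, 0.3]],  # one very uncertain token
    [[0.7, 0.2, 0.1], [0.7, 0.2, 0.1]],   # two moderately uncertain tokens
]
to_annotate = select_for_annotation(pool, k=2)
```

In an annotation loop, the selected sentences would be corrected by a human annotator, added to the training data, and the model retrained before the next selection round.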
Domains
Computer Science [cs]
Origin: Files produced by the author(s)