PARENTing via Model-Agnostic Reinforcement Learning to Correct Pathological Behaviors in Data-to-Text Generation
Abstract
In language generation models conditioned on structured data, classical training via maximum likelihood almost always leads models to pick up on dataset divergences (i.e., hallucinations or omissions) and to incorporate them erroneously into their own generations at inference time. In this work, we build on top of previous Reinforcement Learning based approaches and show that a model-agnostic framework relying on the recently introduced PARENT metric is effective at reducing both hallucinations and omissions. Evaluations on the widely used WikiBIO and WebNLG benchmarks demonstrate the effectiveness of this framework compared to state-of-the-art models.
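The abstract does not spell out the training objective; as a concrete illustration, below is a minimal sketch of the kind of policy-gradient (REINFORCE) update such a framework could use, with the PARENT score of a sampled generation as the sequence-level reward. The self-critical-style baseline and the dummy numeric values are assumptions for illustration, not the authors' implementation.

```python
import torch

def reinforce_loss(log_probs: torch.Tensor, reward: float, baseline: float) -> torch.Tensor:
    """REINFORCE with a baseline: scale the sequence log-likelihood by the
    centered reward, so generations scoring above the baseline are reinforced
    and those scoring below it are suppressed.

    log_probs: (seq_len,) log-probabilities of the sampled tokens.
    """
    advantage = reward - baseline
    return -advantage * log_probs.sum()

# Toy usage with dummy values. In the setting described by the abstract,
# `reward` would be the PARENT score of a sampled generation; the baseline
# could be, e.g., the PARENT score of a greedy decode (a self-critical
# baseline -- an assumption here, not stated in the abstract).
log_probs = torch.log(torch.tensor([0.4, 0.7, 0.9], requires_grad=True))
loss = reinforce_loss(log_probs, reward=0.62, baseline=0.55)
loss.backward()  # gradients flow back into the generator's parameters
```

Because the reward is computed on a complete decoded sequence, this objective can optimize a non-differentiable metric such as PARENT directly, which is what makes the framework model-agnostic.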
Domains
Artificial Intelligence [cs.AI]