Learning Reading Order via Document Layout with Layout2Pos

Laura Nguyen; Benjamin Piwowarski; Julio Laborde; Gilles Moyse

doi:10.1007/978-3-031-72437-4_1

Conference paper, Year: 2024

Laura Nguyen

Role: Author

reciTAL

Machine Learning and Information Access

Benjamin Piwowarski

Role: Author
PersonId: 9362
IdHAL: benjamin-piwowarski
ORCID: 0000-0001-6792-3262
IdRef: 226846601

Machine Learning and Information Access

Centre National de la Recherche Scientifique

Julio Laborde

Role: Author

reciTAL

Gilles Moyse

Role: Author

reciTAL

Abstract

Due to their remarkable performance, general-purpose multimodal pre-trained language models have gained widespread adoption for Document Understanding tasks. The majority of pre-trained language models rely on serialized text, extracted using either Optical Character Recognition (OCR) or PDF parsing. However, accurately determining the reading order of visually-rich documents (VrDs) is challenging, potentially affecting the accuracy of the extracted text and leading to sub-optimal performance in downstream tasks. For information extraction tasks, where entity recognition is commonly framed as a sequence-labeling task, incorrect reading order can hinder entity labeling. In this work, we avoid reading order issues by discarding sequential position information. Based on the intuition that layout contains the information for correct reading order, we present Layout2Pos – a shallow Transformer designed to generate position embeddings from layout. Incorporated into a BART architecture, our approach demonstrates competitiveness with models dependent on reading order across three benchmark datasets for information extraction. We also show that evaluating models using a reading order different from the one seen during training can result in substantial performance drops, thereby highlighting the importance of not relying on the reading order of documents.
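The property the abstract relies on, that position information derived from layout does not depend on how the text was serialized, can be illustrated with a small sketch. This is not the authors' Layout2Pos: the learned shallow Transformer is replaced here by a fixed sinusoidal encoding of each bounding box, and the names `layout_features` and `encode_tokens`, the page size, and the toy tokens are all hypothetical.

```python
import math

def layout_features(box, page_w, page_h, n_freq=2):
    """Map one bounding box (x0, y0, x1, y1) to sinusoidal features.

    Hypothetical stand-in for a learned layout encoder: only the box
    geometry is used, never the token's index in the serialized text.
    """
    x0, y0, x1, y1 = box
    normalized = (x0 / page_w, y0 / page_h, x1 / page_w, y1 / page_h)
    feats = []
    for value in normalized:
        for k in range(n_freq):
            angle = value * math.pi * (2 ** k)
            feats.extend((math.sin(angle), math.cos(angle)))
    return feats

def encode_tokens(tokens, page_w=1000, page_h=1000):
    """Return {token text: layout features} for a list of (text, box) pairs."""
    return {text: layout_features(box, page_w, page_h) for text, box in tokens}

# Two serializations of the same toy page: one order vs. a shuffled order.
ocr_order = [("Invoice", (50, 40, 200, 70)),
             ("Total", (50, 900, 120, 930)),
             ("42.00", (800, 900, 900, 930))]
shuffled = [ocr_order[2], ocr_order[0], ocr_order[1]]

# Layout-derived features are identical whatever the serialization.
same_features = encode_tokens(ocr_order) == encode_tokens(shuffled)

# Sequential indices, by contrast, change with the serialization.
index_ocr = {text: i for i, (text, _) in enumerate(ocr_order)}
index_shuffled = {text: i for i, (text, _) in enumerate(shuffled)}
same_indices = index_ocr == index_shuffled
```

In this toy setting, each token's layout features stay the same when the serialization changes, while its sequential index does not. A model trained on one reading order and evaluated on another would see shifted sequential positions, which is the failure mode the abstract describes.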

Domains

Computation and Language [cs.CL]; Artificial Intelligence [cs.AI]

File under embargo until Tuesday, 18 November 2025

Contributor: Benjamin Piwowarski

https://hal.sorbonne-universite.fr/hal-04718874

Submitted on: Monday, 18 November 2024, 11:10:58

Last modified on: Tuesday, 19 November 2024, 03:27:34

Dates and versions

hal-04718874, version 1 (18-11-2024)

Identifiers

HAL Id: hal-04718874, version 1
DOI: 10.1007/978-3-031-72437-4_1

Cite

Laura Nguyen, Benjamin Piwowarski, Julio Laborde, Gilles Moyse. Learning Reading Order via Document Layout with Layout2Pos. Linking Theory and Practice of Digital Libraries, Sep 2024, Ljubljana, Slovenia. pp. 3-19, ⟨10.1007/978-3-031-72437-4_1⟩. ⟨hal-04718874⟩

Collections

CNRS ISIR SORBONNE-UNIVERSITE SU-SCIENCES ISIR_MLIA

20 views

5 downloads
