Unsupervised learning of language morphology using nonparametric Bayesian models

Abstract: A crucial issue in statistical natural language processing is sparsity: in any given training corpus, most linguistic events occur with low frequency, and infinitely many structures permitted by a language will never be observed in that corpus. Neural models have already helped address this issue by inferring continuous word representations, which structure the lexicon by capturing semantic or syntactic similarity between words. However, current neural models solve the sparsity problem only partially: they require a vector representation for every word in the lexicon, yet cannot infer sensible representations for unseen words. The problem is especially acute in morphologically rich languages, where word-formation processes yield a proliferation of possible word forms and little overlap between the lexicon observed during training and the lexicon encountered when the model is used. Many languages other than English are now used on the Web, and building translation systems that can handle morphologies very different from those of Western European languages has become a major challenge. The goal of this thesis is to develop new statistical models that infer, in an unsupervised fashion, the word-formation processes underlying an observed lexicon, in order to produce morphological analyses of new, unseen word forms.
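To make the sparsity argument concrete, the following minimal Python sketch measures two of its symptoms on a toy train/test split: the share of training word types seen only once (hapax legomena) and the share of test tokens whose form never occurs in training (the out-of-vocabulary rate). The function name `sparsity_stats` and the tiny Turkish word lists are illustrative assumptions, not code or data from the thesis.

```python
from collections import Counter


def sparsity_stats(train_tokens, test_tokens):
    """Return (hapax_ratio, oov_rate) for a train/test split.

    hapax_ratio: share of training word types occurring exactly once.
    oov_rate: share of test tokens whose form never occurs in training.
    """
    counts = Counter(train_tokens)
    hapaxes = sum(1 for c in counts.values() if c == 1)
    hapax_ratio = hapaxes / len(counts)
    unseen = sum(1 for tok in test_tokens if tok not in counts)
    oov_rate = unseen / len(test_tokens)
    return hapax_ratio, oov_rate


# Hypothetical toy data: Turkish forms of "ev" (house) and "kitap" (book).
train = "ev evler ev kitap kitaplar evde".split()
test = "evlerde kitapta ev kitaplarda".split()

print(sparsity_stats(train, test))  # → (0.8, 0.75)
```

On this toy split, each unseen test form (for example *kitaplarda*, "in the books") is built from a stem and suffixes that already appear in training, up to vowel harmony or consonant assimilation (*kitap*, *-lar*, and the locative seen in *evde*). A word-level lookup table cannot exploit this regularity, whereas a model of word formation could.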

Cited literature: 64 references

https://tel.archives-ouvertes.fr/tel-02354184
Contributor: Abes Star
Submitted on: Thursday, November 7, 2019, 3:59:17 PM
Last modification on: Saturday, November 9, 2019, 1:37:49 AM

File

76238_LOSER_2019_archivage.pdf
Version validated by the jury (STAR)

Identifiers

  • HAL Id: tel-02354184, version 1

Citation

Kevin Löser. Apprentissage non-supervisé de la morphologie des langues à l’aide de modèles bayésiens non-paramétriques. Computation and Language [cs.CL]. Université Paris-Saclay, 2019. French. ⟨NNT : 2019SACLS203⟩. ⟨tel-02354184⟩


Metrics

Record views: 62

File downloads: 20