Using Function Words for Authorship Attribution: Bag-Of-Words vs. Sequential Rules - Sorbonne Université
Conference paper, Year: 2015

Abstract

Authorship attribution is the task of identifying the author of a given document. Various style markers have been proposed in the literature for this task, and frequencies of function words have been shown to be very reliable and effective. However, although they are state of the art, they rely on the invalid bag-of-words assumption, which stipulates that a text is a set of independent words. In this contribution, we present a comparative study of two types of style markers based on function words for authorship attribution. We compare the effectiveness of sequential rules of function words, a style marker that does not rely on the bag-of-words assumption, with that of function-word frequencies, which do. Our results show that the frequencies of function words outperform the sequential rules.
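To make the contrast in the abstract concrete, here is a minimal sketch of the two kinds of marker. It is not the authors' implementation. The word list `FUNCTION_WORDS` and all function names are hypothetical, and counting consecutive function-word pairs is only a simple order-sensitive stand-in for the sequential rules studied in the paper, which are not specified in the abstract.

```python
import re
from collections import Counter

# Hypothetical, tiny function-word list for illustration only;
# the list used in the paper is not given in the abstract.
FUNCTION_WORDS = ["the", "of", "and", "to", "a", "in", "that", "it", "but", "on"]


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z']+", text.lower())


def function_word_sequence(text):
    """Return the function words of `text` in their order of occurrence."""
    vocab = set(FUNCTION_WORDS)
    return [t for t in tokenize(text) if t in vocab]


def frequency_features(text):
    """Bag-of-words marker: relative frequency of each function word."""
    tokens = tokenize(text)
    counts = Counter(function_word_sequence(text))
    total = len(tokens) or 1  # avoid division by zero on empty input
    return [counts[w] / total for w in FUNCTION_WORDS]


def ordered_pair_features(text):
    """Order-sensitive marker (assumed stand-in for sequential rules):
    counts of consecutive function-word pairs."""
    seq = function_word_sequence(text)
    return Counter(zip(seq, seq[1:]))


# Two texts with the same words in a different order.
a = "the cat sat on the mat and it slept"
b = "it slept and the cat sat on the mat"

same_frequencies = frequency_features(a) == frequency_features(b)
same_pairs = ordered_pair_features(a) == ordered_pair_features(b)
```

For these two example sentences the frequency vectors are identical, because the bag-of-words representation discards word order, while the ordered-pair counts differ. That difference is the kind of information a sequence-based marker can capture and a frequency-based marker cannot.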
Main file
Authorship_identification_BOUKHALED.pdf (81.7 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-01198407, version 1 (12-09-2015)

Cite

Mohamed Amine Boukhaled, Jean-Gabriel Ganascia. Using Function Words for Authorship Attribution: Bag-Of-Words vs. Sequential Rules. The 11th International Workshop on Natural Language Processing and Cognitive Science, Oct 2014, Venice, Italy. pp.115-122, ⟨10.1515/9781501501289.115⟩. ⟨hal-01198407⟩