Conference papers

Using Function Words for Authorship Attribution: Bag-Of-Words vs. Sequential Rules

Abstract: Authorship attribution is the task of identifying the author of a given document. Various style markers have been proposed in the literature for this task, and frequencies of function words have been shown to be very reliable and effective. However, although they are state-of-the-art, they rely on the invalid bag-of-words assumption, which stipulates that a text is a set of independent words. In this contribution, we present a comparative study of two types of style markers based on function words for authorship attribution. We compare the effectiveness of sequential rules of function words, a style marker that does not rely on the bag-of-words assumption, with that of function-word frequencies, which do. Our results show that the frequencies of function words outperform the sequential rules.
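To make the contrast in the abstract concrete, here is a minimal Python sketch of the two kinds of marker. It is an illustration under stated assumptions, not the authors' procedure: the function-word list is a tiny hypothetical one, the helper names are invented for this example, and the "rules" are simplified to adjacent pairs of function words scored by confidence, whereas the paper's sequential rules may be mined differently.

```python
from collections import Counter

# Tiny illustrative function-word list (an assumption, not the paper's list).
FUNCTION_WORDS = {"the", "of", "and", "to", "a", "in", "that", "it", "is", "was"}
PUNCTUATION = ".,;:!?\"'()"


def function_word_sequence(text):
    """Keep only the function words of a text, in their original order."""
    tokens = (t.strip(PUNCTUATION) for t in text.lower().split())
    return [t for t in tokens if t in FUNCTION_WORDS]


def relative_frequencies(sequence):
    """Bag-of-words marker: relative frequency of each word, order ignored."""
    total = len(sequence)
    if total == 0:
        return {}
    return {word: count / total for word, count in Counter(sequence).items()}


def adjacent_rule_confidence(sequence):
    """Order-aware marker (simplified): confidence of the rule x -> y,
    i.e. the share of occurrences of x that are immediately followed by y."""
    pair_counts = Counter(zip(sequence, sequence[1:]))
    antecedent_counts = Counter(sequence[:-1])
    return {
        (x, y): count / antecedent_counts[x]
        for (x, y), count in pair_counts.items()
    }


# Two texts with the same function words in a different order.
seq_a = function_word_sequence("the end of the start of it")
seq_b = function_word_sequence("of it the start of the end")

same_bag = relative_frequencies(seq_a) == relative_frequencies(seq_b)
same_rules = adjacent_rule_confidence(seq_a) == adjacent_rule_confidence(seq_b)
print(same_bag, same_rules)
```

The two example texts are indistinguishable by their function-word frequencies but yield different rule sets, which is the information the bag-of-words assumption discards. Whether that extra information helps attribution is an empirical question; the abstract reports that, in the authors' experiments, it did not outperform plain frequencies.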

Cited literature: 22 references
Contributor: Mohamed Amine Boukhaled
Submitted on: Saturday, September 12, 2015 - 2:06:57 PM
Last modification on: Monday, March 29, 2021 - 2:47:31 PM
Long-term archiving on: Tuesday, December 29, 2015 - 12:53:46 AM


Mohamed Amine Boukhaled, Jean-Gabriel Ganascia. Using Function Words for Authorship Attribution: Bag-Of-Words vs. Sequential Rules. The 11th International Workshop on Natural Language Processing and Cognitive Science, Oct 2014, Venice, Italy. pp.115-122, ⟨10.1515/9781501501289.115⟩. ⟨hal-01198407⟩


