Linguistic Pattern Extraction and Analysis for Classic French Plays
Abstract
Great authors of fiction and theatre have the capacity to create memorable characters
that come to life and become almost as real as living persons to readers and audiences. The study
of characterization, namely of how this is achieved, is a well-researched topic in corpus stylistics:
for instance, Mahlberg (2012) attempts to identify typical lexical patterns for memorable
Dickens characters by extracting those lexical bundles that stand out (that is, are overrepresented)
in comparison to a general corpus. In other works, authorship attribution methods
are applied to the different characters of a play to identify whether the author has been able
to provide each of them with a “distinct” voice. For instance, Vogel & Lynch (2008) compare
individual Shakespeare characters against the whole play or even against all plays of the
same author.
The purpose of this paper is to propose a methodology for studying the characterization of several
characters in French plays of the classical period. The tools developed are meant to support
textual analysis by:
1) Verifying the degree of characterization of each character with respect to others.
2) Automatically inducing a list of linguistic features that are significant and representative for
that character.
Preliminary investigations have been conducted on plays by Molière, cross-comparing four
protagonists from four different plays. The proposed methodology relies on sequential data
mining for the extraction of linguistic patterns and on correspondence analysis for the comparison
of pattern frequencies across characters and for the visual representation of such differences.
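
The general shape of such a pipeline can be illustrated with a minimal sketch, which is not the implementation used in this work. It rests on several assumptions: contiguous n-grams stand in for mined sequential patterns (sequential pattern mining typically also allows gaps and constraints), the token sequences are invented toy data, and the function names are hypothetical. Instead of a full correspondence analysis, the sketch computes the chi-square distances between character profiles, which are the distances that a correspondence analysis map approximates in low dimension. A full analysis would additionally decompose the table of standardized residuals, for example by singular value decomposition.

```python
from collections import Counter
from itertools import combinations
from math import sqrt

# Toy, invented token sequences; real input would be each character's lines.
speeches = {
    "A": [["je", "ne", "veux", "pas"], ["je", "ne", "sais", "pas"]],
    "B": [["il", "faut", "que", "je"], ["il", "faut", "bien"]],
    "C": [["je", "ne", "veux", "pas"], ["il", "faut", "que", "je"]],
}

def ngram_counts(utterances, n=2):
    """Count contiguous n-grams (a simple stand-in for mined sequential patterns)."""
    counts = Counter()
    for tokens in utterances:
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts

def contingency(speeches, n=2, min_support=2):
    """Build a character-by-pattern table, keeping patterns seen at least
    min_support times over all characters."""
    per_char = {c: ngram_counts(u, n) for c, u in speeches.items()}
    total = Counter()
    for counts in per_char.values():
        total.update(counts)
    patterns = sorted(p for p, k in total.items() if k >= min_support)
    table = {c: [per_char[c][p] for p in patterns] for c in speeches}
    return patterns, table

def chi_square_distances(table):
    """Chi-square distance between row profiles.
    Assumes every character has at least one retained pattern."""
    grand = sum(sum(row) for row in table.values())
    ncols = len(next(iter(table.values())))
    col_mass = [sum(row[j] for row in table.values()) / grand for j in range(ncols)]
    profiles = {c: [x / sum(row) for x in row] for c, row in table.items()}
    return {
        (a, b): sqrt(sum((pa - pb) ** 2 / m
                         for pa, pb, m in zip(profiles[a], profiles[b], col_mass)))
        for a, b in combinations(sorted(profiles), 2)
    }

patterns, table = contingency(speeches)
distances = chi_square_distances(table)
for pair, dist in sorted(distances.items()):
    print(pair, round(dist, 3))
```

In this toy table, characters A and B share no retained pattern, so their distance is the largest of the three, while C, whose lines mix both repertoires, lies between them. Dividing each squared difference by the column mass keeps very frequent patterns from dominating the comparison.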