Computational Study of Stylistics: A Clustering-based Interestingness Measure for Extracting Relevant Syntactic Patterns
Abstract
In this contribution, we present a computational stylistic study of classic French literary texts based on a data-driven approach in which interesting linguistic patterns are discovered without any prior knowledge. We propose an objective interestingness measure to extract meaningful stylistic syntactic patterns from a given author’s work. Our hypothesis is that the most characteristic linguistic patterns should significantly reflect the author’s stylistic choices, in that the positions of their occurrences are controlled by the author’s purpose, whereas irrelevant linguistic patterns are distributed randomly in the text. Since it does not rely on the occurrence counts of syntactic patterns in texts, this measure can work reasonably well with both large and small text samples. The results show that the measure is effective at extracting interesting syntactic patterns from a single text, which seems particularly promising for the analysis of texts that, because of their characteristics or for historical reasons, cannot support a comparative study.