OPI-JSA at CLEF 2017: Author Clustering and Style Breach Detection

Daniel Karaś, Martyna Śpiewak, Piotr Sobecki

2017. In: CLEF 2017 Working Notes / Linda Cappellato, Nicola Ferro, Lorraine Goeuriot, Thomas Mandl; [S. l.]: CEUR-WS

Conference and Labs of the Evaluation Forum. Dublin, 2017-09-11 - 2017-09-14

In this paper, we propose methods for the author identification task, which is divided into two problems: author clustering and style breach detection. Our solution to the first problem clusters real-valued document vectors using locality-sensitive hashing; each vector combines stylometric features with a bag of n-grams. For the second problem, we propose a statistical approach based on several tf-idf features that characterize documents. We detect style breaches by applying the Wilcoxon signed-rank test to these features.
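
The clustering idea can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say which hashing family is used, so random-hyperplane signatures are assumed here, the stylometric features are left out, and only character n-gram counts are used. The function names (`char_ngrams`, `vectorize`, `lsh_signature`, `lsh_cluster`) and the parameters `n_bits` and `seed` are hypothetical.

```python
import random
from collections import Counter, defaultdict


def char_ngrams(text, n=3):
    """Count overlapping character n-grams in a text."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def vectorize(texts, n=3):
    """Map each text to an n-gram count vector over a shared vocabulary."""
    counts = [char_ngrams(t, n) for t in texts]
    vocab = sorted(set().union(*counts))
    return [[c[g] for g in vocab] for c in counts]


def lsh_signature(vec, hyperplanes):
    """One bit per hyperplane: the side of the hyperplane the vector lies on."""
    return tuple(
        int(sum(v * h for v, h in zip(vec, plane)) >= 0.0)
        for plane in hyperplanes
    )


def lsh_cluster(texts, n_bits=8, seed=0):
    """Group texts whose vectors share a random-hyperplane signature.

    Assumption: this hashing family stands in for whatever scheme
    the paper actually uses.
    """
    vectors = vectorize(texts)
    rng = random.Random(seed)
    dim = len(vectors[0])
    hyperplanes = [
        [rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)
    ]
    buckets = defaultdict(list)
    for idx, vec in enumerate(vectors):
        buckets[lsh_signature(vec, hyperplanes)].append(idx)
    return sorted(buckets.values())


texts = [
    "the cat sat on the mat",
    "the cat sat on the mat",
    "quantum flux zigzags by",
]
clusters = lsh_cluster(texts)
```

Documents with identical vectors always fall into the same bucket, and similar vectors are likely to do so. A larger `n_bits` yields finer, more numerous clusters.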