Marcin Mirończuk, Jarosław Protasiewicz, Witold Pedrycz
2019 Expert Systems with Applications, vol. 130, 15 September 2019, pp. 97-112
This study proposes (i) a multi-view text classification method and (ii) a ranking method for selecting the best information fusion layer among many variants. Multi-view document classification is worth a detailed study because it makes it possible to combine different feature sets into yet another view that further improves text classification. For this purpose, we propose a multi-view framework for text classification that is composed of two levels of information fusion. At the first level, classifiers are constructed from different data views, i.e. different vector space models, using various machine learning algorithms. At the second level, the information fusion layer processes its input with a feature projection method and a meta-classifier modelled by a selected machine learning algorithm. The final decision is thus based on the classification results produced by the models at the first level. Moreover, we propose a ranking method to assess various configurations of the fusion layer. It uses heuristics that exploit statistical properties of the F-score values calculated for the classification results produced at the fusion layer. The information fusion layer of the classification framework and the ranking method have been empirically evaluated. For this purpose, we introduce a use case that checks whether companies’ domains identify their innovativeness. The results demonstrate empirically that the information fusion layer enhances classification quality. Friedman’s aligned rank test, the Wilcoxon signed-rank test, and the effect size support this hypothesis. In addition, the Spearman statistical test carried out on the obtained results demonstrated that the assessment made by the proposed ranking method converges to that of a well-established method named Hellinger – The Technique for Order Preference by Similarity to Ideal Solution (H-TOPSIS). Thus, the proposed approach may be used for the assessment of classifier performance.
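The idea of ranking fusion-layer configurations by statistical properties of their F-scores, and then comparing that ranking with a reference ranking via Spearman correlation, can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not specify the heuristics, so the rule used here (higher mean F-score first, lower standard deviation as tie-breaker) is a stand-in and not the authors' method. The configuration names, the F-score values, and the reference ordering are all hypothetical toy data, not results from the paper.

```python
from statistics import mean, stdev


def rank_configurations(f_scores):
    """Return configuration names ordered best-first.

    Assumed heuristic (NOT taken from the paper): a higher mean F-score
    ranks first; equal means are broken by the lower sample standard deviation.
    """
    return sorted(
        f_scores,
        key=lambda name: (-mean(f_scores[name]), stdev(f_scores[name])),
    )


def spearman_rho(order_a, order_b):
    """Spearman rank correlation of two tie-free orderings of the same items."""
    n = len(order_a)
    position_in_b = {name: i for i, name in enumerate(order_b)}
    # Sum of squared rank differences between the two orderings.
    d_squared = sum(
        (i - position_in_b[name]) ** 2 for i, name in enumerate(order_a)
    )
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


# Hypothetical per-fold F-scores for three made-up fusion-layer configurations.
toy_f_scores = {
    "config_a": [0.80, 0.82, 0.81],
    "config_b": [0.85, 0.84, 0.86],
    "config_c": [0.78, 0.88, 0.74],
}

proposed_order = rank_configurations(toy_f_scores)

# Hypothetical stand-in for the ordering produced by a reference method.
reference_order = ["config_b", "config_c", "config_a"]

rho = spearman_rho(proposed_order, reference_order)
print(proposed_order, rho)
```

A rank correlation close to 1 would indicate that the two orderings largely agree, which is the kind of convergence the abstract reports between the proposed ranking method and H-TOPSIS.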