
Empirical evaluation of feature projection algorithms for multi-view text classification

Marcin Mirończuk, Jarosław Protasiewicz, Witold Pedrycz

2019, Expert Systems with Applications, Vol. 130, 15 September 2019, pp. 97-112

This study proposes (i) a multi-view text classification method and (ii) a ranking method for selecting the best information fusion layer among many variants. Multi-view document classification merits detailed study because it allows different feature sets to be combined into yet another view that further improves text classification. To this end, we propose a multi-view framework for text classification composed of two levels of information fusion. At the first level, classifiers are built by various machine learning algorithms on different data views, i.e. different vector space models. At the second level, the information fusion layer processes its input using a feature projection method and a meta-classifier modelled by a selected machine learning algorithm. It reaches a final decision based on the classification results produced by the models at the first level. Moreover, we propose a ranking method to assess various configurations of the fusion layer. It relies on heuristics that use the statistical properties of the F-score values calculated for the classification results produced at the fusion layer. The information fusion layer of the classification framework and the ranking method have been evaluated empirically. For this purpose, we introduce a use case that checks whether companies' domains identify their innovativeness. The results empirically demonstrate that the information fusion layer enhances classification quality. The Friedman aligned-rank and Wilcoxon signed-rank statistical tests, together with the effect size, support this hypothesis. In addition, the Spearman statistical test carried out on the obtained results demonstrated that the assessment made by the proposed ranking method converges with that of a well-established method, Hellinger – The Technique for Order Preference by Similarity to Ideal Solution (H-TOPSIS). Thus, the proposed approach may be used to assess classifier performance.
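The abstract uses the Spearman test to show that the proposed ranking method agrees with H-TOPSIS. The snippet below is a minimal sketch of that kind of comparison, not the authors' code. It computes Spearman's rank correlation between two sets of scores assigned to the same fusion-layer configurations, averaging ranks over ties. The function names and the example scores are hypothetical.

```python
from typing import Sequence


def average_ranks(values: Sequence[float]) -> list[float]:
    """Rank values in ascending order (1 = smallest), averaging ranks over ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of tied values starting at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho computed as the Pearson correlation of the (tie-averaged) ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        raise ValueError("correlation undefined when all scores are tied")
    return cov / (vx * vy) ** 0.5


# Hypothetical scores for five fusion-layer configurations (higher = better).
proposed_scores = [0.91, 0.87, 0.85, 0.80, 0.78]
h_topsis_scores = [0.88, 0.90, 0.83, 0.79, 0.75]
print(spearman_rho(proposed_scores, h_topsis_scores))  # prints 0.9
```

A value close to 1 means the two methods order the configurations almost identically. A significance test on rho, as reported in the abstract, would additionally require a p-value, which this sketch does not compute.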

https://www.sciencedirect.com/science/article/pii/S0957417419302507?via%3Dihub