Title: |
Dimensionality Reduction in Machine Learning for Arabic Text Classification |
Document type: |
electronic document |
Authors: |
Maroua Louail, Author; Kara-Mohamed, Chafia, Thesis supervisor |
Publisher: |
Sétif:UFA1 |
Publication year: |
2025 |
Extent: |
1 vol. (123 f.) |
Format: |
29 cm |
Languages: |
English (eng) |
Categories: |
Theses & Dissertations:Computer Science |
Keywords: |
Natural language processing
Arabic text classification
Dimensionality reduction
Feature extraction
Meta-features
Word embeddings |
Decimal classification: |
004 - Computer science |
Abstract: |
Text classification is the automated process of assigning predefined labels or categories to
text based on its content. This process helps organize vast amounts of textual data, simplifies
management, enables efficient searches, and extracts valuable knowledge. The computational
analysis of the Arabic language plays a crucial role in addressing its growing global
significance. As the fourth most widely used language online, Arabic has driven the emergence
of Arabic Text Classification (ATC) as a key research area. However, the field of ATC
faces considerable challenges, primarily due to the linguistic complexity of the language and
the high computational demands of its processing, which can impact the performance of real-time
systems. This dissertation aims to bridge the gap between effectiveness and efficiency
in ATC, particularly in resource-constrained environments.
The first objective of this research is to review existing ATC techniques, including preprocessing
methods, vectorization strategies, dimensionality reduction techniques, and both
classical machine learning and deep learning models, in order to provide a comprehensive
understanding of current approaches. The second objective is to propose three innovative
methods to enhance computational efficiency through dimensionality reduction while improving
or at least maintaining high classification effectiveness. These methods are specifically
designed for Modern Standard Arabic (MSA) text classification and are evaluated
against state-of-the-art methods.
The dissertation presents the use of Principal Component Analysis (PCA), Distance-
Based Meta-Features (DBMFs) for feature extraction, and the development of a new hybrid
approach called "Tasneef", which addresses computational challenges in Arabic text processing
and outperforms state-of-the-art deep learning models and dimensionality reduction
techniques. Through these contributions, this dissertation advances the state of the art in
ATC by focusing on dimensionality reduction, which improves classification accuracy and
reduces memory usage and runtime. |
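
As an illustration of the dimensionality-reduction idea summarized above, here is a minimal Python sketch of TF-IDF vectorization followed by PCA and an SVM classifier. This is only a sketch in the spirit of Chapter 2, not the dissertation's exact pipeline: the toy corpus, the component count, and the classifier settings are assumptions for demonstration.

    # Minimal sketch: TF-IDF -> PCA -> SVM for Arabic text classification.
    # Corpus, component count, and classifier choice are illustrative assumptions.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.decomposition import PCA
    from sklearn.svm import LinearSVC

    # Placeholder Arabic documents, assumed already preprocessed
    # (normalization, stop-word removal, stemming).
    docs = ["خبر رياضي عن كرة القدم", "مباراة كرة القدم اليوم",
            "أسعار النفط في الأسواق", "تراجع أسعار النفط عالميا"]
    labels = ["sports", "sports", "economy", "economy"]

    tfidf = TfidfVectorizer()
    X = tfidf.fit_transform(docs).toarray()   # densified: scikit-learn's PCA needs dense input

    pca = PCA(n_components=2)                 # project the term space onto 2 components
    X_red = pca.fit_transform(X)              # far fewer dimensions than the vocabulary

    clf = LinearSVC().fit(X_red, labels)
    query = tfidf.transform(["نتيجة مباراة كرة القدم"]).toarray()
    print(clf.predict(pca.transform(query)))  # classify a new document in the reduced space

On real corpora the component count would be chosen to retain most of the variance; this reduction is what cuts the memory usage and runtime that the abstract highlights.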
Contents note: |
Contents
Introduction 1
1 Background and Related Works 5
1.1 Introduction 5
1.2 Text Classification: Key Concepts and General Pipeline 5
1.2.1 Text Classification Levels 5
1.2.2 Types of Text Classification 6
1.2.3 Arabic Text Classification General Pipeline 7
1.3 Arabic Language Properties and TC Challenges 9
1.3.1 Importance of the Arabic Language 9
1.3.2 Arabic Varieties 10
1.3.3 Arabic Script 11
1.3.4 Arabic Morphology 14
1.3.5 Arabic Syntax 17
1.4 Text Vectorization Techniques 18
1.4.1 One-Hot Encoding 18
1.4.2 Bag-of-Words (BoW) 19
1.4.3 Term Frequency-Inverse Document Frequency (TF-IDF) 20
1.4.4 Word Embedding 21
1.4.4.1 Static Word Embeddings 21
1.4.4.2 Contextual Word Embeddings 23
1.5 Dimensionality Reduction Techniques 24
1.5.1 Feature Extraction 24
1.5.1.1 Principal Component Analysis (PCA) 24
1.5.1.2 Linear Discriminant Analysis (LDA) 25
1.5.2 Feature Selection 26
1.5.2.1 Chi-Square (χ²) Test 27
1.5.2.2 Mutual Information (MI) 27
1.5.2.3 Information Gain (IG) 28
1.6 Classical Machine Learning-Based Approach 29
1.6.1 Logistic Regression (LR) 29
1.6.2 k-Nearest Neighbors (kNN) 30
1.6.3 Decision Trees (DT) 30
1.6.4 Support Vector Machine (SVM) 30
1.7 Deep Learning-Based Approach 32
1.7.1 Convolutional Neural Network (CNN) 32
1.7.2 Recurrent Neural Network (RNN) 34
1.7.3 Attention Mechanism 37
1.7.4 Transformers 37
1.8 Related Works 38
1.8.1 Datasets 38
1.8.2 Text Preprocessing 41
1.8.3 Text Vectorization 42
1.8.4 Text Dimensionality Reduction 42
1.8.5 Classical Machine Learning and Deep Learning Models 44
1.8.6 Evaluation 46
1.9 Conclusion 51
2 Arabic Text Classification Using Principal Component Analysis With Different Supervised Classifiers 52
2.1 Introduction 52
2.2 Materials and methods 53
2.2.1 Proposed system architecture 53
2.2.2 Datasets 54
2.2.3 Document text preprocessing 55
2.2.4 Document text representation 57
2.2.5 Dimensionality reduction using PCA 57
2.2.6 Classifiers used and hyperparameter tuning 57
2.2.7 Implementation 59
2.3 Results and Discussions 59
2.4 Conclusion 68
3 Distance-Based Meta-Features for Arabic Text Classification 69
3.1 Introduction 69
3.2 Related works 70
3.3 Proposed Methodology 71
3.3.1 Preprocessing 72
3.3.2 Meta-features generation 72
3.4 Experimental Setup 74
3.4.1 Dataset 74
3.4.2 Hyperparameter tuning 74
3.5 Results and discussions 75
3.5.1 Dimensionality reduction using Meta-Features 75
3.5.2 Classifiers' accuracy 76
3.5.3 Training time 77
3.5.4 Time gain 78
3.5.5 Comparing DBMFs with PCA 79
3.5.6 Statistical evaluation 80
3.6 Conclusion 81
4 Tasneef: A Fast and Effective Hybrid Representation Approach for Arabic Text Classification 82
4.1 Introduction 82
4.2 Methodology 83
4.2.1 Overall architecture 83
4.2.2 Tasneef text preprocessing 84
4.2.3 Statistical property and DBMFs construction in Tasneef 85
4.2.3.1 DBMFs distance calculation 85
4.2.3.2 Local DBMFs obtainment 85
4.2.3.3 Global DBMFs obtainment 86
4.2.3.4 Resulting DBMFs 86
4.2.4 Embedding property in Tasneef and concatenation procedure 89
4.2.4.1 Pre-trained word embeddings usage 89
4.2.4.2 Concatenation of DBMFs and fastText embeddings 90
4.3 Experimental setup 92
4.3.1 Overall architecture 92
4.3.2 Evaluation tools 92
4.3.2.1 Metrics used 92
4.3.2.2 Datasets used 93
4.3.2.3 Benchmarks 94
4.3.2.4 Hardware used 95
4.3.3 Overall experimental steps 95
4.4 Results and discussions 97
4.4.1 Initial experiment 97
4.4.1.1 SVM classifier usage 97
4.4.1.2 Hyperparameter tuning 97
4.4.1.3 Results of preprocessing 98
4.4.2 Selection of the best DBMFs groups 98
4.4.2.1 DBMFs baselines choice 98
4.4.2.2 DBMFs ranking results 100
4.4.3 First series of experiments: baselines performance 101
4.4.3.1 MicroF1 and MacroF1 results 101
4.4.3.2 Dimensionality reduction in Tasneef 102
4.4.3.3 Runtime analyses 103
4.4.4 Second series of experiments: comparison with SOTA methods 106
4.4.4.1 Tasneef_var2 accuracy improvement ratio (AIR) 106
4.4.4.2 Tasneef_var2 F-measure improvement 114
4.4.5 Summary of Tasneef main improvements 115
4.5 Conclusion 118
Conclusion 119 |
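
The outline above (sections 4.2.3-4.2.4) indicates that Tasneef concatenates distance-based meta-features (DBMFs) with pre-trained fastText embeddings. The record does not give the construction details, so the Python sketch below shows only one plausible reading: cosine distances from each document to the class centroids serve as simple distance-based meta-features, concatenated with averaged word vectors. The centroid-distance construction and the random embedding table are assumptions standing in for the thesis's DBMFs and for real fastText vectors.

    # Hedged sketch of a hybrid representation in the spirit of Tasneef:
    # distance-based meta-features concatenated with averaged word embeddings.
    import numpy as np
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics.pairwise import cosine_distances

    docs = ["خبر رياضي عن كرة القدم", "أسعار النفط في الأسواق",
            "مباراة كرة القدم اليوم", "تراجع أسعار النفط عالميا"]
    labels = np.array([0, 1, 0, 1])

    # TF-IDF space is used here only to derive the meta-features.
    X = TfidfVectorizer().fit_transform(docs).toarray()

    # Meta-features: distance from each document to each class centroid.
    centroids = np.vstack([X[labels == c].mean(axis=0) for c in np.unique(labels)])
    meta = cosine_distances(X, centroids)      # shape (n_docs, n_classes)

    # Embedding part: mean of per-word vectors; a random table stands in
    # for pre-trained fastText embeddings (e.g., loaded via gensim).
    rng = np.random.default_rng(0)
    vocab = {w for d in docs for w in d.split()}
    emb = {w: rng.normal(size=50) for w in vocab}  # toy 50-d vectors
    doc_emb = np.vstack([np.mean([emb[w] for w in d.split()], axis=0) for d in docs])

    # Hybrid representation: a few meta-features plus a dense embedding,
    # far smaller than a full TF-IDF vocabulary.
    hybrid = np.hstack([meta, doc_emb])        # shape (n_docs, n_classes + 50)
    print(hybrid.shape)

Under this reading, the resulting vectors would then feed the SVM classifier used throughout the experiments (section 4.4.1.1).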
Call number: |
DI/0087 |
Dimensionality Reduction in Machine Learning for Arabic Text Classification [electronic document] / Maroua Louail, Author; Kara-Mohamed, Chafia, Thesis supervisor. - [S.l.]: Sétif:UFA1, 2025. - 1 vol. (123 f.); 29 cm. Languages: English (eng)