Title : |
Advanced Detection of Arabic Offensive Language in Social Media Using Machine Learning and Natural Language Processing |
Document type : |
printed text |
Authors : |
Asma Kahla, Author ; Yassina Belguet ; Sadik Bessou, Thesis supervisor |
Publisher : |
Setif:UFA |
Publication year : |
2024 |
Extent : |
1 vol. (93 leaves) |
Format : |
29 cm |
Languages : |
English (eng) |
Categories : |
Theses & Dissertations:Computer Science
|
Keywords : |
Offensive Language
Machine Learning
Natural Language Processing
Transformers |
Decimal classification : |
004 - Computer science |
Abstract : |
Because social media platforms such as Facebook offer so many services, they have become an
essential part of everyday life. However, offensive speech on these platforms has also
increased and has become a major issue. The massive volume of data, together with the time
and money required, makes the traditional approach of manually finding and removing
offensive language online extremely challenging. As a result, there is a growing need for
automatic offensive language identification, particularly when posts are written in complex
or low-resource languages such as Arabic.
In this work, we focused on Algerian dialects. We used several machine learning algorithms:
Logistic Regression, SVM, MultinomialNB, BernoulliNB, and Stochastic Gradient
Descent. Our best results were accuracies of 89.70% and 89.19% for Term
Frequency-Inverse Document Frequency (TF-IDF) with n-grams and Bag of Words (BoW)
with n-grams, respectively, using the MultinomialNB and BernoulliNB classifiers.
Additionally, we combined Convolutional Neural Network (CNN) and Long Short-Term
Memory (LSTM) models with a BERT model, in our case marBERT v2.
In this experiment, the LSTM achieved an accuracy of 89.20%. However, marBERT v2 alone
outperformed all previous experiments with an accuracy of 95%. |
Contents note : |
Table of contents
List of Figures
List of Tables
General introduction
1 Offensive Language
1.1 Introduction
1.2 Offensive language
1.2.1 Definition
1.2.2 Offensive language types
1.2.3 Offensive language categories
1.2.4 Different definitions of offensive language
1.2.5 Offensive language in social media and peoples’ cultures
1.3 International laws
1.4 Related works
1.4.1 Current state in offensive language detection and related concept
1.4.2 Summary and Analysis
1.5 Conclusion
2 Machine Learning
2.1 Introduction
2.2 Definition
2.3 Machine Learning types
2.3.1 Supervised Learning
2.3.2 Unsupervised Learning
2.3.3 Reinforcement Learning
2.3.4 Semi-supervised Learning
2.4 Machine Learning algorithms
2.4.1 Logistic regression
2.4.2 Naive Bayes classifier
2.4.3 Stochastic Gradient Descent
2.4.4 Support Vector Machine
2.5 Machine Learning concepts
2.6 Machine Learning process
2.6.1 Data collection
2.6.2 Preparation data
2.6.3 Choosing a model
2.6.4 Training the model
2.6.5 Testing the model
2.6.6 Improve the model
2.6.7 Prediction
2.7 Deep Learning
2.7.1 Definition
2.7.2 Deep Learning types
2.8 Conclusion
3 Sentiment Analysis
3.1 Introduction
3.2 Sentiment Analysis
3.2.1 Sentiments
3.2.2 Sentiments Analysis
3.3 Subjectivity classification
3.4 Sentiment analysis levels
3.4.1 Document-level sentiment analysis
3.4.2 Sentence level of sentiment analysis
3.4.3 Word/Phrase Level of sentiment analysis
3.5 Sentiment Analysis approaches
3.5.1 Machine Learning (ML)
3.5.2 Lexicon-based approach
3.5.3 Hybrid approach
3.6 Types of Sentiment Analysis
3.6.1 Emotion Recognition
3.6.2 Fine-Grained
........... |
Call number : |
MAI/0913
|
Advanced Detection of Arabic Offensive Language in Social Media Using Machine Learning and Natural Language Processing [printed text] / Asma Kahla, Author ; Yassina Belguet ; Sadik Bessou, Thesis supervisor. - [S.l.] : Setif:UFA, 2024. - 1 vol. (93 leaves) ; 29 cm. Languages: English (eng)