Online catalogue

University Sétif 1 FERHAT ABBAS Faculty of Sciences

Author details

Author: Asma Kahla

Documents available by this author

Advanced Detection of Arabic Offensive Language in Social Media Using Machine Learning and Natural Language Processing / Asma Kahla

Title: Advanced Detection of Arabic Offensive Language in Social Media Using Machine Learning and Natural Language Processing
Document type: printed text
Authors: Asma Kahla, author; Yassina Belguet; Sadik Bessou, thesis supervisor
Publisher: Setif:UFA
Year of publication: 2024
Extent: 1 vol. (93 leaves)
Format: 29 cm
Languages: English (eng)
Categories: Theses & Dissertations: Computer Science

Keywords: Offensive Language
Machine Learning
Natural Language Processing
Transformers
Decimal index: 004 - Computer science
Abstract:
Because social media platforms such as Facebook offer so many services, they have become an
essential part of everyday life. However, offensive speech on these platforms has also increased
and has become a major issue. Given the massive volume of data and the time and money involved,
the traditional approach of manually finding and removing offensive content online is extremely
challenging. As a result, there is a growing need for automatic offensive language identification,
particularly when posts are written in complex or low-resource languages such as Arabic.
In this work, we focused on Algerian dialects. We evaluated several machine learning algorithms:
Logistic Regression, SVM, MultinomialNB, BernoulliNB, and Stochastic Gradient Descent.
Our best results were accuracies of 89.70% and 89.19%, obtained respectively with Term
Frequency-Inverse Document Frequency (TF-IDF) n-gram features using the MultinomialNB classifier
and Bag-of-Words (BoW) n-gram features using the BernoulliNB classifier.
We also combined Convolutional Neural Network (CNN) and Long Short-Term Memory (LSTM) models
with BERT, in our case marBERT v2. In this experiment, the LSTM model achieved an accuracy
of 89.20%. However, marBERT v2 on its own outperformed all previous experiments, with an
accuracy of 95%.
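The two best-scoring classical configurations reported above (TF-IDF n-grams with MultinomialNB, and BoW n-grams with BernoulliNB) can be sketched with scikit-learn as follows. This is a minimal illustration under stated assumptions, not the thesis's code: the unigram+bigram range `(1, 2)`, the default hyperparameters, the binary (presence/absence) BoW encoding, the helper name `build_pipelines`, and the English toy corpus are all hypothetical stand-ins for the settings and Algerian-dialect dataset, which this record does not describe.

```python
# Minimal sketch, assuming scikit-learn is installed. The corpus below is a
# hypothetical English placeholder, not the thesis's Algerian-dialect data.
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.naive_bayes import BernoulliNB, MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical labelled examples: 1 = offensive, 0 = not offensive.
train_texts = [
    "you are stupid idiot",
    "stupid people everywhere",
    "what an idiot move",
    "have a nice day",
    "thank you friend",
    "nice work friend",
]
train_labels = [1, 1, 1, 0, 0, 0]


def build_pipelines(ngram_range=(1, 2)):
    """Hypothetical helper: the two feature/classifier pairings from the abstract.

    ngram_range=(1, 2) (unigrams + bigrams) is an assumption; the record
    only says "n-gram".
    """
    return {
        # TF-IDF weighted n-grams fed to multinomial Naive Bayes.
        "tfidf_multinomialnb": make_pipeline(
            TfidfVectorizer(ngram_range=ngram_range), MultinomialNB()
        ),
        # Bag-of-words n-grams encoded as presence/absence (binary=True),
        # which matches the Bernoulli event model of BernoulliNB.
        "bow_bernoullinb": make_pipeline(
            CountVectorizer(ngram_range=ngram_range, binary=True), BernoulliNB()
        ),
    }


pipelines = build_pipelines()
for name, pipe in pipelines.items():
    pipe.fit(train_texts, train_labels)
    print(name, pipe.predict(["stupid idiot", "nice day friend"]))
```

The remaining algorithms named in the abstract could be swapped in as the final pipeline step, for example `LogisticRegression()` or `SGDClassifier()` from `sklearn.linear_model`, or `LinearSVC()` from `sklearn.svm`; which variants and hyperparameters the thesis actually used is not stated here.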
Contents note:
Table of contents
List of Figures
List of Tables
General introduction
1 Offensive Language
1.1 Introduction
1.2 Offensive language
1.2.1 Definition
1.2.2 Offensive language types
1.2.3 Offensive language categories
1.2.4 Different definitions of offensive language
1.2.5 Offensive language in social media and peoples’ cultures
1.3 International laws
1.4 Related works
1.4.1 Current state of offensive language detection and related concepts
1.4.2 Summary and analysis
1.5 Conclusion
2 Machine Learning
2.1 Introduction
2.2 Definition
2.3 Machine Learning types
2.3.1 Supervised Learning
2.3.2 Unsupervised Learning
2.3.3 Reinforcement Learning
2.3.4 Semi-supervised Learning
2.4 Machine Learning algorithms
2.4.1 Logistic regression
2.4.2 Naive Bayes classifier
2.4.3 Stochastic Gradient Descent
2.4.4 Support Vector Machine
2.5 Machine Learning concepts
2.6 Machine Learning process
2.6.1 Data collection
2.6.2 Data preparation
2.6.3 Choosing a model
2.6.4 Training the model
2.6.5 Testing the model
2.6.6 Improving the model
2.6.7 Prediction
2.7 Deep Learning
2.7.1 Definition
2.7.2 Deep Learning types
2.8 Conclusion
3 Sentiment Analysis
3.1 Introduction
3.2 Sentiment Analysis
3.2.1 Sentiments
3.2.2 Sentiment analysis
3.3 Subjectivity classification
3.4 Sentiment analysis levels
3.4.1 Document-level sentiment analysis
3.4.2 Sentence-level sentiment analysis
3.4.3 Word/phrase-level sentiment analysis
3.5 Sentiment Analysis approaches
3.5.1 Machine Learning (ML)
3.5.2 Lexicon-based approach
3.5.3 Hybrid approach
3.6 Types of Sentiment Analysis
3.6.1 Emotion Recognition
3.6.2 Fine-Grained
...
Call number: MAI/0913

Copies (1)

Barcode: MAI/0913
Call number: MAI/0913
Medium: Dissertation
Location: Science library (Bibliothèque des sciences)
Section: English
Availability: Available