| Title : |
Sentiment Analysis of Arabic Texts Using GPT Model, Deep Learning, and Machine Learning Algorithms |
| Document type : |
electronic document
| Authors : |
Abdelkadar Sana ; Fatiha Tebbani, Thesis supervisor
| Publisher : |
Setif: UFA
| Year of publication : |
2025
| Extent : |
1 vol. (94 f.)
| Format : |
29 cm
| Languages : |
English (eng)
| Categories : |
Theses & Dissertations: Computer Science
|
| Keywords : |
Sentiment Analysis
ML: Machine Learning
DL: Deep Learning
RNN: Recurrent Neural Networks
NLP: Natural Language Processing
BERT: Bidirectional Encoder Representations from Transformers
GPT: Generative Pretrained Transformer |
| Decimal index : |
004 Computer science
| Abstract : |
With the widespread adoption of Web 2.0 technologies and the rise of social media platforms like
Twitter, Facebook, and YouTube, sentiment analysis has emerged as a central task in Natural
Language Processing (NLP), especially in Arabic, which presents unique challenges due to its
morphological complexity and dialectal diversity.
This study focuses on sentiment analysis of Arabic tweets across multiple dialects, using
a range of traditional and modern techniques. Four classification algorithms were employed
(SVM, Naïve Bayes, Logistic Regression, and Random Forest), and the AraGPT2 generative
model (Generative Pretrained Transformer), originally not designed for classification, was
repurposed to assess its generalization ability beyond its intended scope. In addition,
a BiLSTM (Bidirectional Long Short-Term Memory) deep learning model was integrated to
evaluate its effectiveness on dialect-rich Arabic texts.
AraBERT was used to extract contextual embeddings, while MARBERTv2 served as a fine-tuned
model for direct sentiment classification. The study introduced several technical innovations:
- A hybrid text representation combining TF-IDF and FastText embeddings to blend statistical
weighting with semantic richness.
- A Curriculum Learning strategy that incrementally trains the model through phased data
segmentation, enabling training in low-resource environments like Google Colab.
- A Fast Convergence approach that reaches near-optimal performance in a minimal number of
training epochs.
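The hybrid representation described above can be sketched as follows. This is a minimal illustration, not the thesis code: the toy English corpus and the random stand-in word vectors are assumptions made here for brevity; the actual work operates on Arabic tweets with pretrained FastText embeddings.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer

# Toy corpus standing in for preprocessed Arabic tweets (hypothetical data).
docs = ["good service fast", "bad service slow", "fast and good"]

# Statistical side: TF-IDF weights over the corpus.
tfidf = TfidfVectorizer()
X_tfidf = tfidf.fit_transform(docs).toarray()        # shape: (n_docs, vocab_size)

# Semantic side: per-word embeddings. Random vectors stand in here for
# pretrained FastText embeddings (an assumption for this sketch).
rng = np.random.default_rng(0)
dim = 8
emb = {w: rng.normal(size=dim) for w in tfidf.vocabulary_}

def doc_embedding(doc):
    """Mean of the embeddings of the words known to the vocabulary."""
    vecs = [emb[w] for w in doc.split() if w in emb]
    return np.mean(vecs, axis=0) if vecs else np.zeros(dim)

X_emb = np.vstack([doc_embedding(d) for d in docs])  # shape: (n_docs, dim)

# Hybrid representation: concatenate the two views per document.
X_hybrid = np.hstack([X_tfidf, X_emb])
print(X_hybrid.shape)  # (3, 14) with this toy corpus (vocab of 6 + dim of 8)
```

The concatenation lets a downstream classifier weigh exact-term importance (TF-IDF) and distributional similarity (embeddings) jointly.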
Results showed stable performance for traditional classifiers, outstanding effectiveness of
MARBERTv2 for dialectal Arabic, and surprisingly competitive results from AraGPT2 despite
its generative nature. Embedding combinations and staged training significantly improved
memory efficiency and model scalability.
This work contributes to advancing AI research for Arabic and opens promising directions
for building efficient, deployable models. |
| Contents note : |
Contents
General Introduction 13
1 Sentiment analysis 16
1.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
1.2 Definition of Concepts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
1.2.1 Sentiment Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
1.2.2 Natural Language Processing (NLP) . . . . . . . . . . . . . . . . . . . 17
1.2.3 Opinion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
1.2.4 Sentiments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
1.3 Opinion Classifications in Sentiment Analysis . . . . . . . . . . . . . . . . . . 18
1.3.1 Regular Opinions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
1.3.2 Comparative Opinions . . . . . . . . . . . . . . . . . . . . . . . . . . 19
1.3.3 Explicit and Implicit Opinions . . . . . . . . . . . . . . . . . . . . . 19
1.4 Sentiment Analysis Levels . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
1.4.1 Document Level . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
1.4.2 Sentence Level . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
1.4.3 Word Level . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
1.4.4 Aspect Level . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
1.5 Sentiment Analysis Tasks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
1.5.1 Subjectivity and Opinion Detection . . . . . . . . . . . . . . . . . . . 20
1.5.2 Sentiment Categorization . . . . . . . . . . . . . . . . . . . . . . . . . 20
1.5.3 Opinion Holder and Target Identification . . . . . . . . . . . . . . . . 21
1.5.4 Opinion Summarization . . . . . . . . . . . . . . . . . . . . . . . . . 21
1.5.5 Sarcasm and Irony Detection . . . . . . . . . . . . . . . . . . . . . . . 21
1.5.6 Fake Opinion Detection (Spam Detection) . . . . . . . . . . . . . . . . 21
1.6 Overview of Sentiment Analysis Techniques . . . . . . . . . . . . . . . . . . . 21
1.6.1 Machine Learning and Deep Learning-based Techniques . . . . . . . . 22
1.6.2 Lexicon-based Techniques . . . . . . . . . . . . . . . . . . . . . . . . 23
1.7 The Arabic Language in Linguistic and Technical Context: Characteristics and
Challenges . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
1.7.1 Characteristics of the Arabic Language . . . . . . . . . . . . . . . . . 24
1.7.2 Challenges of the Arabic Language in Sentiment Analysis . . . . . . . 25
2 Machine Learning and Deep Learning 29
2.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
2.2 Machine Learning: Concepts and Detailed Used Algorithms . . . . . . . . . . 29
2.2.1 Concept and Types of Machine Learning . . . . . . . . . . . . . . . . 29
2.2.2 Algorithms Used in This Research . . . . . . . . . . . . . . . . . . . . 30
2.3 Deep Learning: Concepts and Detailed Used Algorithms . . . . . . . . . . . . 33
2.3.1 Concept of Deep Learning . . . . . . . . . . . . . . . . . . . . . . . . 33
2.3.2 Used Neural Network Architectures . . . . . . . . . . . . . . . . . . . 35
2.4 Machine Learning and Deep Learning Steps . . . . . . . . . . . . . . . . . . . 45
2.4.1 Data Collection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
2.4.2 Data Preprocessing . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
2.4.3 Word Embedding . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
2.4.4 Choosing the Right Model . . . . . . . . . . . . . . . . . . . . . . . . 50
2.4.5 Training the Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
2.4.6 Evaluating the Model . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
2.4.7 Hyperparameter Tuning and Optimization . . . . . . . . . . . . . . . . 52
2.4.8 Predictions and Deployment . . . . . . . . . . . . . . . . . . . . . . . 52
2.5 Pre-trained Transformer Models for Arabic Language . . . . . . . . . . . . . . 52
2.6 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
3 Experimental Study and System Implementation 54
3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
3.2 Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
3.2.1 Observed Limitations in Existing Studies . . . . . . . . . . . . . . . . 55
3.3 Development Environment . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
3.3.1 Python . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
3.3.2 Google Colab . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
3.3.3 Pandas . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
3.3.4 NumPy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
3.3.5 Matplotlib . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
3.3.6 Scikit-learn (Sklearn) . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
3.3.7 Pickle . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
3.3.8 Gensim . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
3.3.9 PyTorch . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
3.3.10 Transformers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
3.3.11 FarasaPy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
3.3.12 PyArabic . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
3.3.13 Random . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
3.3.14 Gradio . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
3.3.15 TensorFlow . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
3.3.16 FAISS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
3.3.17 TQWM (Term Query Weighting Mechanism) . . . . . . . . . . . . . . 58
3.4 Methodology Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
3.4.1 Data Collection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
3.4.2 Data Preprocessing . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
3.5 Experimentation and Results . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
3.5.1 Machine Learning Algorithms . . . . . . . . . . . . . . . . . . . . . . 67
3.5.2 Deep learning models based on BiLSTM architecture . . . . . . . . . . 72
3.5.3 MarBERTv2 and AraGPT2 Models . . . . . . . . . . . . . . . . . . . 79
3.6 Interactive Interface for Sentiment Analysis . . . . . . . . . . . . . . . . . . . 84
3.7 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87
General Conclusion 89
Areas for Improvement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89
Future Work and Research Outlook 90
References 91 |
| Call number : |
MAI/1016 |