| Title : |
Sentiment Analysis of Arabic Texts Using GPT Model, Deep Learning, and Machine Learning Algorithms |
| Document type : |
electronic document
| Authors : |
Abdelkadar Sana ; Fatiha Tebbani, Thesis supervisor
| Publisher : |
Setif: UFA
| Year of publication : |
2025
| Extent : |
1 vol. (94 f.)
| Format : |
29 cm
| Languages : |
English (eng)
| Categories : |
Theses & Dissertations: Computer Science
|
| Keywords : |
Sentiment Analysis
ML: Machine Learning
DL: Deep Learning
RNN: Recurrent Neural Networks
NLP: Natural Language Processing
BERT: Bidirectional Encoder Representations from Transformers
GPT: Generative Pretrained Transformer |
| Decimal index : |
004 Computer science
| Abstract : |
With the widespread adoption of Web 2.0 technologies and the rise of social media platforms like
Twitter, Facebook, and YouTube, sentiment analysis has emerged as a central task in Natural
Language Processing (NLP), especially in Arabic, which presents unique challenges due to its
morphological complexity and dialectal diversity.
This study focuses on sentiment analysis of Arabic tweets across multiple dialects, using
a range of traditional and modern techniques. Four classification algorithms were employed
(SVM, Naïve Bayes, Logistic Regression, and Random Forest), and the AraGPT2 generative
model (Generative Pretrained Transformer), originally not designed for classification, was
repurposed to assess its generalization ability beyond its intended scope. In addition,
a BiLSTM (Bidirectional Long Short-Term Memory) deep learning model was integrated to
evaluate its effectiveness on dialect-rich Arabic texts.
AraBERT was used to extract contextual embeddings, while MARBERTv2 served as a fine-tuned
model for direct sentiment classification. The study introduced several technical innovations:
- A hybrid text representation combining TF-IDF and FastText embeddings to blend statistical
weighting with semantic richness.
- A Curriculum Learning strategy that incrementally trains the model through phased data
segmentation, enabling training in low-resource environments like Google Colab.
- A Fast Convergence approach that reaches near-optimal performance in a minimal number of
training epochs.
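The hybrid representation described above can be sketched as follows. This is a minimal illustration, not the thesis code: the toy English corpus and the random stand-in word vectors are assumptions made here for brevity; the actual work operates on Arabic tweets with pretrained FastText embeddings.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer

# Toy corpus standing in for preprocessed Arabic tweets (hypothetical data).
docs = ["good service fast", "bad service slow", "fast and good"]

# Statistical side: TF-IDF weights over the corpus.
tfidf = TfidfVectorizer()
X_tfidf = tfidf.fit_transform(docs).toarray()        # shape: (n_docs, vocab_size)

# Semantic side: per-word embeddings. Random vectors stand in here for
# pretrained FastText embeddings (an assumption for this sketch).
rng = np.random.default_rng(0)
dim = 8
emb = {w: rng.normal(size=dim) for w in tfidf.vocabulary_}

def doc_embedding(doc):
    """Mean of the embeddings of the words known to the vocabulary."""
    vecs = [emb[w] for w in doc.split() if w in emb]
    return np.mean(vecs, axis=0) if vecs else np.zeros(dim)

X_emb = np.vstack([doc_embedding(d) for d in docs])  # shape: (n_docs, dim)

# Hybrid representation: concatenate the two views per document.
X_hybrid = np.hstack([X_tfidf, X_emb])
print(X_hybrid.shape)  # (3, 14) with this toy corpus (vocab of 6 + dim of 8)
```

The concatenation lets a downstream classifier weigh exact-term importance (TF-IDF) and distributional similarity (embeddings) jointly.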
Results showed stable performance for traditional classifiers, outstanding effectiveness of
MARBERTv2 for dialectal Arabic, and surprisingly competitive results from AraGPT2 despite
its generative nature. Embedding combinations and staged training significantly improved
memory efficiency and model scalability.
This work contributes to advancing AI research for Arabic and opens promising directions
for building efficient, deployable models. |
| Contents note : |
Contents
General Introduction 13
1 Sentiment analysis 16
1.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
1.2 Definition of Concepts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
1.2.1 Sentiment Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
1.2.2 Natural Language Processing (NLP) . . . . . . . . . . . . . . . . . . . 17
1.2.3 Opinion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
1.2.4 Sentiments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
1.3 Opinion Classifications in Sentiment Analysis . . . . . . . . . . . . . . . . . . 18
1.3.1 Regular Opinions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
1.3.2 Comparative Opinions . . . . . . . . . . . . . . . . . . . . . . . . . . 19
1.3.3 Explicit and Implicit Opinions . . . . . . . . . . . . . . . . . . . . . 19
1.4 Sentiment Analysis Levels . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
1.4.1 Document Level . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
1.4.2 Sentence Level . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
1.4.3 Word Level . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
1.4.4 Aspect Level . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
1.5 Sentiment Analysis Tasks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
1.5.1 Subjectivity and Opinion Detection . . . . . . . . . . . . . . . . . . . 20
1.5.2 Sentiment Categorization . . . . . . . . . . . . . . . . . . . . . . . . . 20
1.5.3 Opinion Holder and Target Identification . . . . . . . . . . . . . . . . 21
1.5.4 Opinion Summarization . . . . . . . . . . . . . . . . . . . . . . . . . 21
1.5.5 Sarcasm and Irony Detection . . . . . . . . . . . . . . . . . . . . . . . 21
1.5.6 Fake Opinion Detection (Spam Detection) . . . . . . . . . . . . . . . . 21
1.6 Overview of Sentiment Analysis Techniques . . . . . . . . . . . . . . . . . . . 21
1.6.1 Machine Learning and Deep Learning-based Techniques . . . . . . . . 22
1.6.2 Lexicon-based Techniques . . . . . . . . . . . . . . . . . . . . . . . . 23
1.7 The Arabic Language in Linguistic and Technical Context: Characteristics and
Challenges . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
1.7.1 Characteristics of the Arabic Language . . . . . . . . . . . . . . . . . 24
1.7.2 Challenges of the Arabic Language in Sentiment Analysis . . . . . . . 25
2 Machine Learning and Deep Learning 29
2.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
2.2 Machine Learning: Concepts and Detailed Used Algorithms . . . . . . . . . . 29
2.2.1 Concept and Types of Machine Learning . . . . . . . . . . . . . . . . 29
2.2.2 Algorithms Used in This Research . . . . . . . . . . . . . . . . . . . . 30
2.3 Deep Learning: Concepts and Detailed Used Algorithms . . . . . . . . . . . . 33
2.3.1 Concept of Deep Learning . . . . . . . . . . . . . . . . . . . . . . . . 33
2.3.2 Used Neural Network Architectures . . . . . . . . . . . . . . . . . . . 35
2.4 Machine Learning and Deep Learning Steps . . . . . . . . . . . . . . . . . . . 45
2.4.1 Data Collection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
2.4.2 Data Preprocessing . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
2.4.3 Word Embedding . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
2.4.4 Choosing the Right Model . . . . . . . . . . . . . . . . . . . . . . . . 50
2.4.5 Training the Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
2.4.6 Evaluating the Model . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
2.4.7 Hyperparameter Tuning and Optimization . . . . . . . . . . . . . . . . 52
2.4.8 Predictions and Deployment . . . . . . . . . . . . . . . . . . . . . . . 52
2.5 Pre-trained Transformer Models for Arabic Language . . . . . . . . . . . . . . 52
2.6 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
3 Experimental Study and System Implementation 54
3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
3.2 Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
3.2.1 Observed Limitations in Existing Studies . . . . . . . . . . . . . . . . 55
3.3 Development Environment . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
3.3.1 Python . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
3.3.2 Google Colab . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
3.3.3 Pandas . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
3.3.4 NumPy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
3.3.5 Matplotlib . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
3.3.6 Scikit-learn (Sklearn) . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
3.3.7 Pickle . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
3.3.8 Gensim . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
3.3.9 PyTorch . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
3.3.10 Transformers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
3.3.11 FarasaPy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
3.3.12 PyArabic . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
3.3.13 Random . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
3.3.14 Gradio . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
3.3.15 TensorFlow . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
3.3.16 FAISS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
3.3.17 TQWM (Term Query Weighting Mechanism) . . . . . . . . . . . . . . 58
3.4 Methodology Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
3.4.1 Data Collection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
3.4.2 Data Preprocessing . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
3.5 Experimentation and Results . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
3.5.1 Machine Learning Algorithms . . . . . . . . . . . . . . . . . . . . . . 67
3.5.2 Deep learning models based on BiLSTM architecture . . . . . . . . . . 72
3.5.3 MarBERTv2 and AraGPT2 Models . . . . . . . . . . . . . . . . . . . 79
3.6 Interactive Interface for Sentiment Analysis . . . . . . . . . . . . . . . . . . . 84
3.7 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87
General Conclusion 89
Areas for Improvement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89
Future Work and Research Outlook 90
References 91 |
| Call number : |
MAI/1016 |