|
| Title: |
Structured Emotion Analysis from Arabic Text |
| Document type: |
electronic document |
| Authors: |
Ferial Senator, Author; Lakhfif, Abdelaziz, Thesis supervisor |
| Publisher: |
Sétif:UFA1 |
| Year of publication: |
2026 |
| Extent: |
1 vol. (139 leaves) |
| Format: |
29 cm |
| Languages: |
English (eng) |
| Categories: |
Theses & Dissertations:Computer Science
|
| Keywords: |
Arabic
NLP
SRL
Cross-Lingual Annotation Projection
LLMs
ChatGPT
Emotion Analysis
Structural emotions |
| Decimal index: |
004 - Computer science |
| Abstract: |
In the field of Natural Language Processing (NLP), emotion analysis aims to map
textual content to a predefined set of human emotions, typically joy, anger,
fear, surprise, disgust, and sadness. Current state-of-the-art research mainly focuses
on identifying emotions in text using categories inspired by psychological theories, such
as Ekman’s (1992) basic emotions. Despite the importance of emotion detection, most
analyses are shallow and insufficient for tasks that require a deeper understanding of
emotional meaning in context. Such tasks require answering key questions:
what triggered the emotion (Cause), who experienced it (Experiencer), and, more
generally, structural questions such as who did what (Cue), to whom (Target),
why (Cause), and how (Manner). This
doctoral thesis aims to propose original and effective solutions to the lack of
resources and models dedicated to the structural analysis of emotions in Arabic text.
To this end, we introduce a novel approach for analyzing the argument structure
of emotions in Arabic, leveraging recent advances in Transformer-based architectures
and, in particular, the capabilities of large language models (LLMs) for Arabic.
The thesis makes several contributions. The first is the construction and
annotation of the first Arabic corpus dedicated to structured
emotion analysis, named ‘AraERL’. The thesis also examines in depth
the impact of each semantic argument on the performance of emotion identification.
In addition, it explores the use of ChatGPT for annotating Arabic texts with semantic
roles and emotions through a cross-lingual annotation projection approach. The
work further evaluates ChatGPT’s ability to accurately translate English semantic and
emotional annotations into Arabic. Finally, it offers a comprehensive comparison of the
performance of open large language models on these tasks. |
| Content note: |
Contents
1 General Introduction
I Background and Literature Review
2 Arabic language processing
2.1 Major Challenges in Arabic Language Processing
2.2 Characteristics of the Arabic Language
2.3 NLP resources for Arabic
2.4 Conclusion
3 Semantic Role Labeling
3.1 Overview and Definition
3.1.1 Labeling steps
3.2 Semantic role labeling projects and resources
3.2.1 English Resources
3.2.1.1 The Berkeley FrameNet project
3.2.1.2 PropBank
3.2.1.3 CoNLL Shared Task
3.2.1.4 VerbNet
3.2.2 Arabic resources
3.2.2.1 The Arabic PropBank
3.2.2.2 Arabic VerbNet
3.2.2.3 The Arabic FrameNet
3.3 Taxonomy of SRL task
3.3.1 SRL Annotation process
3.3.1.1 Manual SRL Annotation
3.3.1.2 Semi-automatic SRL Annotation
3.3.1.3 Automatic SRL Annotation
3.3.1.3.1 Self-augmentation
3.3.1.3.2 Annotation projection
3.3.2 SRL Methods and techniques
3.3.2.1 Rule-based
3.3.2.2 Machine Learning
3.3.2.3 Deep Learning
3.3.2.4 Large Language Models
3.3.3 Learning Strategies for SRL
3.3.3.1 Transfer Learning / Cross-lingual Learning
3.3.3.2 Zero-Shot Learning
3.3.3.3 Few-Shot Learning
3.3.4 Evaluation Metrics
3.3.4.1 Precision, Recall, and F1-score
3.3.4.2 Alignment with Gold Standard
3.4 Conclusion
4 Emotion analysis
4.1 Introduction
4.2 Definition
4.3 Emotions Expression
4.4 Conclusion
5 Emotion Role Labeling
5.1 Introduction
5.2 Semantic frames associated with emotions
5.3 Detailed Emotion Role Labeling example
5.4 Existing Emotion Role Labeling datasets
5.5 Emotion stimulus detection
5.6 Emotional SRL challenges for low resource language
5.7 Conclusion
6 Literature review
6.1 Semantic Role Labeling
6.1.1 Rule-based SRL
6.1.2 Classical Machine Learning-Based SRL
6.1.3 Neural Network-Based SRL
6.1.3.1 End-to-End SRL
6.1.3.2 Transformer-based SRL
6.1.4 Large Language Models
6.2 Semantic role labeling for Arabic
6.3 Emotion Role Labeling
6.3.1 Emotion-cause extraction
6.4 ChatGPT in Cross-lingual projection
6.5 Conclusion
II Contributions
7 Arabic Emotion Role Labeling corpus
7.1 Introduction
7.2 Research methodology
7.2.1 Corpus creation and annotation process
7.2.1.1 Data collection
7.2.1.2 Annotation tool
7.2.1.3 Emotions category
7.2.1.4 Semantic role labeling
7.2.2 Validation and results
7.2.2.1 Validation of the Dataset
7.2.2.2 Corpus statistics
7.2.2.3 Data storage format
7.3 Conclusion
8 SRL and Emotion Detection using transformer-based models
8.1 Introduction
8.2 Methodology
8.2.1 Dataset
8.2.2 Data preprocessing
8.2.3 Masking roles
8.2.4 Dataset splitting
8.2.5 AraBERT Pre-trained language model
8.2.6 Hyperparameter for AraBERT
8.3 Results
8.3.1 Masking cue
8.3.2 Masking experiencer
8.3.3 Masking cause
8.3.4 Masking target
8.3.5 Emotion Classification with all semantic roles
8.4 Discussion
8.5 Comparison of Our Dataset with Existing Datasets
8.6 Conclusion
9 Leveraging ChatGPT for enhancing Arabic NLP: Application for SRL and Cross-lingual annotation projection
9.1 Introduction
9.2 Methodology
9.2.1 Data collection and preparation
9.2.2 Humanized translation
9.2.3 Emotional Semantic role labeling
9.2.3.1 Dominant emotion and intensity
9.2.3.2 ChatGPT semantic role labeling pipeline
9.2.4 Cross-lingual Annotation Projection
9.2.4.1 Translation mechanism of ChatGPT
9.2.4.2 ChatGPT’s cross-lingual annotation projection pipeline
9.2.5 Annotation tasks and comparison
9.2.6 Classification of sentence complexity
9.2.6.1 English sentences complexity
9.2.6.2 Arabic sentences complexity
9.2.7 Open-LLMs in SRL and Cross-lingual annotation projection
9.2.7.1 mBERT for SRL and emotion analysis
9.2.7.2 mBART for Cross-lingual Annotation Projection
9.3 Results and evaluation
9.3.1 Comparison of ChatGPT translation with Expert translation
9.3.2 Comparison with manual annotation
9.3.2.1 Evaluation of ChatGPT in SRL
9.3.2.2 Evaluation of ChatGPT in cross-lingual annotation projection
9.3.3 Complexity classification
9.3.3.1 ChatGPT as a tool for SRL
9.3.3.2 ChatGPT for cross-lingual Annotation Projection
9.3.4 Evaluating Open-LLM Performance
9.4 General discussion
9.4.1 Key challenges, linguistic characteristics, and driving remarks
9.4.2 ChatGPT limitation
9.5 Conclusion
III Conclusion and Future Perspectives
IV Bibliography |
| Call number: |
DI/0097 |
Structured Emotion Analysis from Arabic Text [electronic document] / Ferial Senator, Author; Lakhfif, Abdelaziz, Thesis supervisor. - [S.l.]: Sétif:UFA1, 2026. - 1 vol. (139 leaves); 29 cm. Languages: English (eng)