Online catalogue

University Sétif 1 FERHAT ABBAS Faculty of Sciences

Document: printed text

Offensive language detection in Arabic social media / Narimene Ayat

Title :	Offensive language detection in Arabic social media
Document type :	printed text
Authors :	Narimene Ayat, Author ; Amira Guechi ; Lakhfif, Abdelaziz, Thesis supervisor
Publisher :	Setif:UFA
Year of publication :	2024
Extent :	1 vol (106 f.)
Format :	29 cm
Languages :	English (eng)
Categories :	Theses & Dissertations: Computer Science
Keywords :	Offensive language NLP Arabic BERT Twitter
Decimal classification :	004 - Computer science
Abstract :	In this master's thesis, we present a BERT-based approach to detecting offensive discourse, which is proliferating alarmingly on social media platforms, with particular attention to Twitter. This discourse, which includes hateful, discriminatory, and insulting remarks, has a negative impact on individuals and society. The aim of this project is to build BERT-based models to combat offensive language on social media networks. Using several models (Qarib, MARBERTV2, MARBERT, AraBertV02+CNN, and AraBertV2) and two distinct datasets, OSACT 2020 and OSACT 2022, our approach achieved an accuracy of 95.2%. This work represents a significant advancement in combating offensive online discourse in Arabic-language tweets, thereby providing an effective means to promote a safer and more inclusive digital environment.
Contents note :	Contents
Acknowledgments (p. 1)
Acknowledgments (p. 2)
ABSTRACT (p. 3)
RÉSUMÉ (p. 4)
Table of Contents (p. 9)
List of Figures (p. 11)
List of Tables (p. 13)
General Introduction (p. 14)
  1 Context (p. 14)
  2 Problem statement (p. 14)
  3 Objective (p. 16)
1 State of the art (p. 17)
  1 Introduction (p. 17)
  2 Related Work (p. 18)
    2.1 Datasets (p. 18)
      2.1.1 Shared-task competitions (p. 18)
      2.1.2 Researchers' data (p. 19)
    2.2 Methods & Techniques (p. 20)
2 Theoretical Background (p. 26)
  1 Introduction (p. 26)
  2 CNN (p. 26)
  3 RNN (p. 27)
    3.1 Layer (p. 27)
      3.1.1 Input layer (p. 27)
      3.1.2 Hidden Layers (p. 28)
      3.1.3 Output layer (p. 28)
    3.2 Drawbacks (p. 28)
  4 LSTM (p. 28)
    4.1 Encoder (p. 29)
    4.2 Decoder (p. 29)
  5 Transformers (p. 29)
    5.1 Transformer model: general architecture (p. 30)
    5.2 GPT (Generative Pre-trained Transformer) (p. 33)
    5.3 ELMO (p. 34)
    5.4 BERT (Bidirectional Encoder Representations from Transformers) (p. 35)
      5.4.1 BERT’s learning process (p. 37)
      5.4.2 BERT architecture (p. 38)
      5.4.3 BERT’s workings (p. 39)
  6 NLP (p. 40)
    6.1 Basics of NLP (p. 40)
      6.1.1 Word Embeddings (p. 40)
      6.1.2 Tokenization (p. 41)
      6.1.3 Stop word removal (p. 41)
      6.1.4 Stemming and lemmatization (p. 41)
    6.2 NLP applications (p. 41)
    6.3 What are the main methods used in NLP? (p. 42)
  7 Optimization in deep learning (p. 42)
  8 Types of optimizers (p. 42)
    8.1 Gradient Descent Optimizer (p. 42)
    8.2 Stochastic Gradient Descent (p. 43)
    8.3 Mini-Batch Gradient Descent (p. 43)
    8.4 Adam (Adaptive Moment Estimation) (p. 43)
  9 Regularization for Deep Learning (p. 43)
    9.1 Dropout (p. 43)
    9.2 Batch Normalization (p. 44)
  10 Model Evaluation Metrics (p. 44)
    10.1 Confusion matrix (p. 44)
    10.2 Accuracy (p. 45)
    10.3 F1-score (p. 45)
  11 Conclusion (p. 46)
3 Social Media: challenges (p. 47)
  1 Introduction (p. 47)
  2 Social Media (p. 48)
    2.1 The ethical issues of social networks: challenges (p. 49)
      2.1.1 Twitter (p. 50)
  3 Offensive Languages and hate speech (p. 50)
  4 Conclusion (p. 51)
4 Implementation & Evaluation (p. 52)
  1 Introduction (p. 52)
  2 Preparation of Tools (p. 52)
    2.1 Libraries (p. 53)
  3 Methods (p. 55)
    3.1 Dataset (p. 55)
      3.1.1 Data source 1 (p. 56)
      3.1.2 Data source 2 (p. 58)
    3.2 Data Cleaning (p. 60)
    3.3 Preprocessing of Tweets (p. 62)
    3.4 Tokenisation (p. 63)
    3.5 AraBERTV2 (p. 64)
    3.6 Qarib (p. 65)
      3.6.1 A PyTorch Lightning Module for Arabic Text Classification (p. 65)
..........
Call number :	MAI/0882

Copies (1)

Barcode	Call number	Medium	Location	Section	Availability
MAI/0882	MAI/0882	Master's thesis	Science Library	English	Available

Address

University Sétif 1, Faculty of Sciences, El Bez, Sétif
19000 Sétif
Algeria

Opening hours:

Sunday: 8:00-16:30
Monday: 8:00-16:30
Tuesday: 8:00-16:30
Wednesday: 8:00-16:30
Thursday: 8:00-16:30