Online catalogue

University Sétif 1 FERHAT ABBAS Faculty of Sciences


Author details

Author: Mohamed Anes Abdeldjalil Ouahab

Available documents written by this author


Exploring Large Language Model Generated texts for fake news detection / Mohamed Anes Abdeldjalil Ouahab


Title: Exploring Large Language Model Generated texts for fake news detection
Document type: printed text
Authors: Mohamed Anes Abdeldjalil Ouahab, Author; Zaki Houssem Eddine Meddour; Drif, Ahlem, Thesis supervisor
Publisher: Setif:UFA
Publication year: 2024
Extent: 1 vol. (73 leaves)
Format: 29 cm
Languages: English (eng)
Categories: Theses & Dissertations: Computer Science

Keywords: Large Language Models (LLMs)
Transformers
BERT
Gemini
Fake news
BERT architecture
Authenticity description
Prompting
Application Programming Interface (API)
Decimal classification: 004 - Computer science
Abstract: Fake news, which involves the spread of false or misleading information
disguised as factual reporting, undermines public trust and democratic governance. This
research investigates the use of large language models (LLMs) and transformer-based
models, specifically Gemini and BERT, to enhance the detection of fake news. To this
end, we fine-tune a BERT architecture that is more accurate for our dataset and develop
a prompting template for Gemini (a large language model from Google). This template
leverages Gemini's abilities to generate relevant context features with an authenticity
description. We then apply a second BERT architecture to the LLM-generated features
and fuse the outputs of the two architectures. The results show that exploiting an LLM
prompting API (Application Programming Interface) enhances fake news detection. The
proposed methodology achieved a 4% improvement in accuracy and a 4% improvement in
precision compared to state-of-the-art approaches.
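The pipeline outlined in the abstract (a text classifier, an LLM prompt that produces context and authenticity features, a second classifier over those features, and a fusion step) could be sketched roughly as follows. This is only an illustrative sketch, not the thesis's implementation: the record does not give the prompt wording or the fusion rule, so `PROMPT_TEMPLATE`, `build_prompt`, `fuse`, `predict`, the label order, and the weighted-average fusion are all assumptions, and the two BERT models and the Gemini call are represented by plain callables supplied by the caller.

```python
from typing import Callable, List, Sequence

# Assumed label order; the record does not state how classes are encoded.
LABELS = ("real", "fake")

# Hypothetical prompt wording; the actual template is not given in the record.
PROMPT_TEMPLATE = (
    "You are assisting a fake news classifier.\n"
    "Article: {article}\n"
    "Give relevant context for this article and describe how authentic it appears."
)


def build_prompt(article: str) -> str:
    """Fill the (hypothetical) template with the article text."""
    return PROMPT_TEMPLATE.format(article=article.strip())


def fuse(p_text: Sequence[float], p_llm: Sequence[float], weight: float = 0.5) -> List[float]:
    """Late fusion, assumed here to be a weighted average of class probabilities."""
    if len(p_text) != len(p_llm):
        raise ValueError("probability vectors must have the same length")
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    return [weight * a + (1.0 - weight) * b for a, b in zip(p_text, p_llm)]


def predict(
    article: str,
    text_model: Callable[[str], Sequence[float]],     # stands in for the first fine-tuned BERT
    feature_model: Callable[[str], Sequence[float]],  # stands in for the BERT over LLM features
    llm: Callable[[str], str],                        # stands in for a call to an LLM API
    weight: float = 0.5,
) -> str:
    """Run both branches and return the label with the highest fused probability."""
    features = llm(build_prompt(article))
    fused = fuse(text_model(article), feature_model(features), weight)
    best = max(range(len(fused)), key=fused.__getitem__)
    return LABELS[best]
```

Passing the models as callables keeps the sketch independent of any particular model library or API client; other fusion schemes, such as concatenating the two models' embeddings before a final classifier, would fit the abstract's description equally well.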
Contents note: Table of contents
Introduction
I A survey on Natural language processing and machine learning
1 Machine learning - Theoretical background
1.1 Machine Learning
1.1.1 Definition
1.1.2 Types of Machine Learning Models
1.1.2.1 Supervised Learning
1.1.2.2 Unsupervised Learning
1.1.2.3 Semisupervised Learning
1.1.2.4 Reinforcement learning
1.2 Deep Learning
1.2.1 Definition
1.3 Natural Language Processing
1.3.1 Definition
1.3.2 NLP Tasks
1.4 Transformers
1.4.1 Definition
1.4.2 Transformer Architecture
1.4.2.1 Encoder and Decoder Stacks
1.4.2.2 Attention
1.4.2.3 Position-wise Feed-Forward Networks
1.4.2.4 Embeddings and Softmax
1.4.2.5 Positional Encoding
1.5 Conclusion
2 Survey on Fake News Detection
2.1 Introduction
2.2 Definition of fake news
2.3 Types of Fake news
2.4 Fake news sources
2.4.1 Social Bots
2.4.2 Cyborg Users
2.4.3 Trolls
2.5 Fake News Detection Methods
2.5.1 Approaches Based on Unsupervised Learning
2.5.2 Approaches Based on Supervised Learning
2.5.3 Approaches Based on Semi-supervised Learning
2.6 Conclusion
II Our Contribution
3 Exploring LLM for fake news detection
3.1 Introduction
3.2 Problem Formulation
3.3 The proposed Framework
3.3.1 Small language model (Bert)
3.3.1.1 Definition
3.3.1.2 Bert architecture
3.3.2 Large Language Model
3.3.2.1 Definition
3.3.2.2 LLM prompting techniques
3.3.2.3 The developed prompting method
3.3.3 Enhancing SLM using LLM features for Fake news detection
3.4 Dataset Description
3.5 Exploratory Data Analysis
3.5.1 Class distribution
3.6 Data Pipeline
3.6.1 Data Preprocessing
3.7 Conclusion
4 Implementation and Experimentation
4.1 Introduction
4.2 Environment
4.2.1 Programming languages and packages
4.2.1.1 Python
4.2.1.2 Numpy
4.2.1.3 Pandas
4.2.1.4 Matplotlib
4.2.1.5 Scikit-Learn
4.2.1.6 Pytorch
4.2.2 IDE
4.2.2.1 Jupyter Notebook
4.2.2.2 Anaconda
4.2.2.3 Kaggle
4.2.2.4 Gemini
4.3 Implementation
4.3.1 Bert model
4.3.2 Bert+prompt model
4.4 Evaluation metrics
4.4.1 Classification accuracy
4.4.2 Confusion Matrix
4.4.3 Classification report
4.5 Results and discussion
4.5.1 Bert model evaluation
4.5.2 Bert+Prompt model evaluation
4.6 The performance comparison
4.7 The performance comparison between our proposed framework and previous works
4.8 Conclusion
Call number: MAI/0863


Copies (1)

Barcode: MAI/0863
Call number: MAI/0863
Medium: Dissertation
Location: Sciences Library
Section: English
Availability: Available