Online catalogue

University Sétif 1 FERHAT ABBAS Faculty of Sciences

Author details

Author: Djessas, ouissem

Available documents written by this author

Machine Learning Approach to Analyze Big Genomic Data / Djessas, ouissem

Title: Machine Learning Approach to Analyze Big Genomic Data
Document type: printed text
Authors: Djessas, ouissem, Author
Extent: 1 vol. (66 leaves)
Format: 29 cm
Languages: French (fre)
Categories: Theses & Dissertations: Computer Science

Keywords: Big Genomic Data; Deep Learning; Spark; Random Forests; Decision Trees
Decimal classification: 004 Computer science
Abstract: The enormous quantity of structured and unstructured data being generated is difficult to process with traditional database and software techniques. In most enterprise scenarios, the data is too voluminous, arrives too fast, or exceeds current processing capacity. This has become a major challenge in data analysis, which is why new technologies have been proposed to address it.
Data mining is the process of extracting knowledge from a large quantity of data. It is used in several fields, including medicine, marketing, industry, and operations research. In our study, we focused on the field of Big Genomic Data, where Big Data technologies are used to analyze genomic data in order to predict or cure diseases.
In this project, we set out to answer a number of questions, such as:
Which tools and techniques are used in the field of Big Genomic Data to analyze data and extract knowledge?
How can parallelism be introduced to implement the Deep Learning algorithm under the Spark programming model?
Contents note: Table of contents
Abstract
Acknowledgement
Dedication
Contents
List of Figures
List of Tables
General Introduction
1 Big Genomic Data State of Art
  1 Introduction
  2 The Evolution of Data
  3 Definitions of Big Data
    3.1 The Vs Of Big Data
      3.1.1 Volume
      3.1.2 Velocity
      3.1.3 Variety
      3.1.4 Veracity
      3.1.5 Value
    3.2 Types of Big Data
      3.2.1 Structured Data
      3.2.2 Unstructured Data
      3.2.3 Semi-Structured Data
      3.2.4 Quasi-structured data
    3.3 Big Data Analysis Lifecycle
      3.3.1 Data Collection Phase
      3.3.2 Data Storage
      3.3.3 Data Analytics
      3.3.4 Knowledge Creation Phase
  4 Big Data and managing tools
      4.0.1 Hadoop
      4.0.2 Apache Spark
    4.1 Big Data Sources
  5 Big Data Application fields
    5.1 Big data in banking
    5.2 Big Data In Finance
    5.3 Big Data In Economy
    5.4 Big data in Telecom
    5.5 Big data in Social Media
    5.6 Big data in HealthCare
  6 Big Data In Genomics field
    6.1 Genome Databases and the integration of Sequence Information
      6.1.1 Sequencing
      6.1.2 Genome Browsers
    6.2 Genomic Data perspectives and challenges
    6.3 Few Research Works in Big Genomic Data field
  7 Conclusion
2 Machine Learning for Big Genomic Data Analysis
  1 Introduction
  2 Data Mining Overview
  3 Data Mining Definition
  4 Machine Learning
    4.1 Supervised Learning
    4.2 Unsupervised learning techniques
    4.3 Reinforcement learning
  5 Machine Learning process
  6 Deep Learning
    6.1 Train Deep Learning
      6.1.1 Regularization
      6.1.2 Weight initialization
      6.1.3 Activation function
      6.1.4 Loss function
      6.1.5 Backpropagation
    6.2 Deep Learning Architectures
      6.2.1 Feed Forward Neural Network
    6.3 Convolutional Neural Networks
      6.3.1 Recurrent neural networks
      6.3.2 Autoencoders (AEs)
  7 Deep learning for Genomics
  8 Machine Learning in Genomics
  9 Deep Learning and Big Data Tool “Spark”
  10 Conclusion
3 Contribution: Feedforward Deep Neural Network for Classification of Genomic Data using Spark
  1 Introduction
  2 Set Up Environment
    2.1 VMware Workstation
    2.2 Apache Spark
  3 Installation and Configuration of Apache Spark
  4 Description of the Development Tools
    4.1 Jupyter Notebook
    4.2 Python 3.7
    4.3 Keras backend TensorFlow
  5 Architecture of Spark Analysis System
  6 Breast Cancer Dataset Description
  7 Implementation
    7.1 Importing SparkMl libraries
    7.2 Import the Dataset
    7.3 Pretreatment and Scaling of the dataset
    7.4 Splitting the Dataset
    7.5 Training Our Models
    7.6 Evaluation of the model
  8 Comparison
  9 Results Discussion
  10 Conclusion
General Conclusion
Call number: MAI/0297
Online: https://drive.google.com/file/d/1t-vFrIzIrUxIAo5jQs-5AeGNhb_c7FaT/view?usp=shari [...]
Electronic resource format: PDF

Machine Learning Approach to Analyze Big Genomic Data [printed text] / Djessas, ouissem, Author. - [s.d.]. - 1 vol. (66 leaves); 29 cm.

Copies (1)

Barcode | Call number | Medium | Location | Section | Availability
MAI/0297 | MAI/0297 | Dissertation | Science Library | French | Available

Optimisation de la recherche d'information sur le web par les techniques vectorielles / Djessas, ouissem

Title: Optimisation de la recherche d'information sur le web par les techniques vectorielles
Document type: printed text
Authors: Djessas, ouissem; BOUCHOUL,F, Thesis supervisor
Publisher: Setif:UFA
Publication year: 2017
Extent: 1 vol. (37 leaves)
Format: 29 cm
Languages: French (fre)
Categories: Theses & Dissertations: Computer Science

Keywords: Data engineering; Web technologies; information retrieval; indexing; relevance; term; vector technique
Decimal classification: 004 Computer science
Call number: MAI/0201

Optimisation de la recherche d'information sur le web par les techniques vectorielles [printed text] / Djessas, ouissem; BOUCHOUL,F, Thesis supervisor. - [S.l.]: Setif:UFA, 2017. - 1 vol. (37 leaves); 29 cm.

Copies (1)

Barcode | Call number | Medium | Location | Section | Availability
MAI/0201 | MAI/0201 | Dissertation | Science Library | French | Available