Title : |
Deep Learning for Breast cancer classification based on DNA data analysis |
Document type : |
printed text |
Authors : |
Ahlem Bakziz, Author ; Rania Rebai ; Yasmine Mansour, Thesis supervisor |
Publisher : |
Setif:UFA |
Publication year : |
2024 |
Extent : |
1 vol. (69 leaves) |
Format : |
29 cm |
Languages : |
English (eng) |
Categories : |
Theses & Dissertations:Computer Science
|
Keywords : |
Bioinformatics
Genomics
Deep learning
NGS data analysis
Cancer classification |
Decimal classification : |
004 - Computer Science |
Abstract : |
challenge in terms of diagnosis and treatment. Next-Generation Sequencing
(NGS) has emerged as a powerful tool, offering profound insights into
the genetic landscape of various cancer types. This project aims to bridge the
gap between computer science and bioinformatics by developing an AI-based
application in Python to analyze NGS DNA data and classify cancer subtypes
with enhanced accuracy.
The objective of our research is to couple NGS DNA data analysis with artificial
intelligence (AI) techniques. This project underscores the potential of AI in
cancer genomics. The integration of advanced computational methods promises
to refine the classification of cancer subtypes based on NGS DNA data, thus
enhancing clinical relevance and accuracy.
The methodology involves the development of a user-friendly Python-based
application that harnesses AI algorithms to improve the accuracy and efficiency
of NGS DNA data interpretation, equipping its users with essential skills for tackling
real-world challenges in cancer research.
Through this interdisciplinary endeavor, we aim to empower future researchers
to navigate the complex landscape of cancer biology with proficiency
and innovation. By harnessing the potential of AI and NGS technologies, we
endeavor to drive progress in cancer diagnosis, treatment, and patient care, ultimately
advancing the fight against this formidable disease. |
Contents note : |
Table of contents
Abbreviations
1 Bioinformatics and Cancer
1.1 Introduction
1.2 Bioinformatics
1.2.1 Background
1.2.2 Molecules: DNA, RNA, miRNA
1.2.3 Gene Expression Analysis
1.2.4 Next-Generation Sequencing
1.2.5 The genomic variations
1.3 Cancer
1.3.1 Definition
1.3.2 Cancer Stages
1.3.3 Cancer Therapy
1.3.4 Breast cancer
1.4 Precision medicine
1.5 Conclusion
2 Artificial Intelligence (AI)
2.1 Introduction
2.2 Application of AI in bioinformatics
2.2.1 Sequence analysis: Genomics, transcriptomics and proteomics
2.2.2 Medical and clinical applications: From Personalized medicine to enhanced decision-making
2.3 AI applications in bioinformatics
2.3.1 Machine Learning (ML)
2.3.2 Deep Learning (DL)
2.4 Conclusion
3 State of the art
3.1 Introduction
3.2 AI for omic data analysis
3.2.1 Single Omic analysis
3.2.2 Multi-level Omic analysis
3.2.3 Deep learning in multi-omics data
3.2.4 Classification of cancer using DNA sequencing: Existing AI models
3.2.5 Integration of DL models and statistical models
3.3 Read classifiers
3.3.1 Alignment-free read classifiers
3.3.2 Alignment-based read classifiers
3.3.3 Assembly-based read classifiers
3.3.4 Read classifiers combining multiple other classifiers
3.4 Breast cancer and machine/deep learning collaboration
3.5 Conclusion
4 Contribution and Experimentation
4.1 Introduction
4.2 Scientific question
4.3 Methodology
4.4 Data collection and pre-processing
4.4.1 Data collection
4.4.2 Data preprocessing
4.5 Classification model: steps of MLP model
4.5.1 Forward Pass
4.5.2 Backward Pass (during training)
4.5.3 Training Loop
4.5.4 Key Points
4.5.5 Regarding the proposed model
4.6 Evaluation metrics
4.6.1 Accuracy
4.6.2 Loss Function
4.7 Results and discussion
4.7.1 Initial Loss and Rapid Decrease
4.7.2 Plateauing Loss
4.7.3 Comparison of Training and Testing Loss
4.7.4 Model Performance and Capacity
4.7.5 Discussion
4.8 Development tools
4.8.1 Environments
4.8.2 Programming language and libraries |
Call number : |
MAI/0879 |
Deep Learning for Breast cancer classification based on DNA data analysis [printed text] / Ahlem Bakziz, Author ; Rania Rebai ; Yasmine Mansour, Thesis supervisor. - [S.l.] : Setif:UFA, 2024. - 1 vol. (69 leaves) ; 29 cm. Languages: English (eng)