|
| Title : |
Advanced bioinformatics tools and artificial intelligence in genomic data analysis: application in cancer diagnosis |
| Document type : |
electronic document |
| Authors : |
Bouthaina Mezghiche ; Hind Bakhouche, Author ; ZENBOUT, Imene, Thesis supervisor |
| Publisher : |
Setif:UFA |
| Year of publication : |
2025 |
| Extent : |
1 vol. (70 leaves) |
| Format : |
29 cm |
| Languages : |
English (eng) |
| Categories : |
Theses & Dissertations: Computer Science
|
| Keywords : |
Bioinformatics
Machine learning
Deep learning
Artificial intelligence
Cancer classification
Transcriptomics
Proteomics
TCGA |
| Decimal classification : |
004 Computer science |
| Abstract : |
Over the past decades, proteomics and transcriptomics have contributed
substantially to unraveling the complex molecular mechanisms
of cancer. The widespread adoption of next-generation sequencing and
high-throughput technologies has transformed our ability to study large-scale
biological data, yet several challenges remain, including data heterogeneity,
high dimensionality, and the integration of multimodal information.
Addressing these challenges is critical to making improved cancer diagnosis,
monitoring, and treatment a reality within the precision medicine paradigm.
In parallel, artificial intelligence, and more specifically machine learning
and deep learning, has emerged as a powerful tool in biomedical
research, capable of extracting useful patterns from complex data structures.
This thesis investigates the integration of transcriptomic profiles with image-based
data derived from protein interaction networks, represented as protein graph
images. By combining these data sources, we aim to improve the accuracy
and interpretability of cancer classification models. Using real datasets from
The Cancer Genome Atlas (TCGA), we constructed and validated machine
learning models with MLP, SVM, and KNN classifiers to detect cancer-relevant
molecular signatures. Our integrated approach demonstrates the effectiveness of
combining omics data and image representations for early detection. The work
contributes to the emerging field of computational oncology and provides a valuable
starting point for researchers designing multi-modal, AI-enabled tools for
personalized cancer diagnostics. |
| Contents note : |
Contents
Abstract
List of Figures
List of Tables
List of Acronyms
General Introduction
1 Background on Bioinformatics and Artificial Intelligence
1.1 Introduction
1.2 Biological concept
1.2.1 Genomics
1.2.2 Transcriptomics
1.2.3 Proteomics
1.2.4 Metabolomics
1.3 Bioinformatics
1.4 Bioinformatics in cancer
1.4.1 Overview
1.4.2 Applications
1.5 Cancer databases and platforms
1.6 Machine Learning
1.6.1 Definition
1.6.2 Types of Learning
1.7 Deep Learning Overview
1.8 Neural Networks
1.8.1 Convolutional Neural Networks (CNNs)
1.8.2 Auto-Encoders (AEs)
1.8.3 Recurrent Neural Networks (RNNs)
1.9 Conclusion
2 Literature Review
2.1 Introduction
2.2 Collection of Research Papers
2.3 Selection and Filtering of Relevant Studies
2.4 Machine Learning in Omics Integration
2.5 Deep Learning in Omics Integration
2.6 Strengths and limitations
2.7 Conclusion
3 Contributions in omics data analysis
3.1 Introduction
3.2 Contributions general overview
3.3 Data collection
3.4 Preprocessing
3.5 Data image generation
3.6 Dimensionality reduction using biological databases
3.7 Model Architecture
3.7.1 Encoding Transcriptomic Data using Autoencoders
3.8 Fusion Strategy
3.8.1 Fusion of Transcriptomic Representations
3.8.2 Image Feature Extraction using Convolutional Autoencoder
3.9 Fusion of Transcriptomic and Proteomic Representations
3.10 Deep Neural Network Classifier for Cancer Type Prediction
3.11 Evaluation
3.11.1 Accuracy
3.11.2 F1-score
3.11.3 Precision
3.11.4 Recall
3.12 Conclusion
4 Experimental results and discussion
4.1 Introduction
4.2 Data collection and preparation
4.2.1 Protein-to-Gene Mapping and mRNA Filtering
4.3 Data preprocessing
4.4 Proteomic interaction image generation
4.5 Image preprocessing
4.6 Multi-modal traditional and Convolutional autoencoder
4.7 Convolutional Autoencoder
4.8 Autoencoder
4.9 Multimodal Fusion Methods and Performance
4.10 Results and discussion of the classification
4.10.1 Side by side comparison
4.11 Ablation study
4.11.1 Discussion
4.12 Conclusion
Bibliography |
| Call number : |
MAI/1001 |
|