|
| Titre : |
Exploration de la prédiction de la récidive du cancer du sein à partir d'images histopathologiques en utilisant l'intelligence artificielle |
| Type de document : |
document électronique |
| Auteurs : |
Haoua Taiar, Auteur ; Seif eddine Chouaba, Directeur de thèse |
| Editeur : |
Setif:UFA |
| Année de publication : |
2025 |
| Importance : |
1 vol (59 f.) |
| Format : |
29 cm |
| Langues : |
Français (fre) |
| Catégories : |
Thèses & Mémoires:Physique
|
| Mots-clés : |
Breast cancer recurrence
Multiple Instance Learning (MIL)
CNN discriminator
Digital pathology
Histopathology |
| Index. décimale : |
530 - Physique |
| Résumé : |
This study presents a comparative analysis of multiple classification models for
predicting breast cancer recurrence from histopathological features, based on a deep
learning pipeline developed in prior work. The referenced pipeline employs a CNN-based
discriminator to score patches extracted from whole-slide images (WSIs), selects the
top most discriminative patches per slide using the Discrimination Score (DS), extracts
deep feature vectors, and classifies recurrence risk at the slide level using a Multiple
Instance Learning (MIL) framework.
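The pipeline's core steps (score patches, keep the top-k by Discrimination Score, pool their feature vectors into one slide-level representation) can be sketched as below. This is a minimal illustrative sketch, not the authors' implementation: the function names, the 32-dimensional toy features, and the softmax attention pooling are all assumptions standing in for the CNN scorer and MIL aggregator described in the thesis.

```python
import numpy as np

rng = np.random.default_rng(0)

def select_top_k(ds_scores, k):
    """Return indices of the k patches with the highest Discrimination Score."""
    return np.argsort(ds_scores)[::-1][:k]

def attention_mil_pool(features, w):
    """Attention-style MIL pooling: weight each patch feature vector by a
    softmax over per-patch scores, then sum into one slide-level vector."""
    scores = features @ w                  # one scalar score per patch
    alpha = np.exp(scores - scores.max())
    alpha /= alpha.sum()                   # attention weights, sum to 1
    return alpha @ features                # weighted sum over patches

# Toy slide: 100 patches with 32-dim features and random DS scores
# (placeholders for the CNN discriminator's real outputs).
features = rng.normal(size=(100, 32))
ds = rng.uniform(size=100)

top = select_top_k(ds, k=16)
bag = features[top]                        # the MIL "bag" for this slide
slide_vec = attention_mil_pool(bag, w=rng.normal(size=32))
print(slide_vec.shape)                     # one vector per slide: (32,)
```

A slide-level classifier would then be trained on `slide_vec`, which is how MIL turns weak slide labels into a supervised problem.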
Our contribution lies in the comprehensive evaluation and comparison of several
classification algorithms including Decision Trees, Bagging, Gaussian Naive Bayes, and
K-Nearest Neighbors on the same dataset and features produced by the existing pipeline.
The goal was to investigate their effectiveness, generalizability, and clinical relevance
for breast cancer recurrence prediction.
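A comparison of this kind can be sketched with scikit-learn as follows; the synthetic features, split sizes, and hyperparameters here are placeholders, since the study's actual inputs are the feature vectors produced by the CNN/MIL pipeline.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Placeholder for slide-level feature vectors and recurrence labels.
X, y = make_classification(n_samples=200, n_features=32, random_state=0)

models = {
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Bagging": BaggingClassifier(random_state=0),
    "Gaussian NB": GaussianNB(),
    "KNN": KNeighborsClassifier(n_neighbors=5),
}

# Evaluate every model on identical features with 5-fold cross-validated AUC,
# so differences reflect the algorithms rather than the data.
aucs = {name: cross_val_score(m, X, y, cv=5, scoring="roc_auc").mean()
        for name, m in models.items()}

for name, auc in sorted(aucs.items(), key=lambda kv: -kv[1]):
    print(f"{name}: AUC = {auc:.3f}")
```

Holding the dataset and features fixed across models, as above, is what makes the comparison fair.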
The MIL classifier, forming part of the original pipeline, achieved a slide-level AUC
of 0.82 and an accuracy of 78.5%, outperforming all traditional models. Among the
latter, Bagging demonstrated relatively strong results (AUC of 0.84), yet suffered from a
high false discovery rate (77.8%). Decision Trees showed decent validation performance
(AUC up to 0.76) but overfitted the training data. Gaussian Naive Bayes achieved
high precision (92.3% PPV) for a single class but failed to generalize across classes.
KNN reached an AUC of 0.85 during validation, which dropped significantly to 0.72 in
testing, indicating poor robustness.
This comparative study reinforces the advantages of MIL-based models for handling
weakly labeled medical data and highlights the limitations of classical approaches in
complex, high-dimensional imaging tasks. Our findings emphasize the importance
of both intelligent patch selection and advanced learning frameworks in improving
diagnostic reliability and decision support in digital pathology. |
| Note de contenu : |
Sommaire
Acknowledgments i
Abstract ii
List of Figures vii
List of Tables ix
ABBREVIATION LIST x
Introduction 1
1 Prediction of Breast Cancer recurrence – A Brief Review 3
1.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.2 Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.3 Traditional approaches to predict Breast cancer recurrence . . . . . . . 6
1.3.1 Approaches: . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
1.3.2 Treatment of recurrent cancer: . . . . . . . . . . . . . . . . . . . 10
1.3.3 Oncotype DX: . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
1.4 Histopathological images: . . . . . . . . . . . . . . . . . . . . . . . . . . 11
1.5 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
2 CNN Architecture, ML Approaches, and Transfer Learning 13
2.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
2.2 Artificial intelligence: . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
2.2.1 Convolutional Neural Network: . . . . . . . . . . . . . . . . . . 15
2.2.2 Backpropagation: . . . . . . . . . . . . . . . . . . . . . . . . . 18
2.3 Machine Learning approaches: . . . . . . . . . . . . . . . . . . . . . . 19
2.3.1 Decision Trees: . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
2.3.2 K-Nearest Neighbors: . . . . . . . . . . . . . . . . . . . . . . . 19
2.3.3 Support vector machines: . . . . . . . . . . . . . . . . . . . . 20
2.3.4 Neural-Networks: . . . . . . . . . . . . . . . . . . . . . . . . . 21
2.4 Transformers: . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
2.4.1 Initialization: . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
2.4.2 Self attention: . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
2.4.3 TransMIL: . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
2.5 Related work: . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
2.6 Conclusion: . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
3 Materials and Methods: 26
3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
3.2 Toolkits and Libraries: . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
3.2.1 Toolkits: . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
3.2.2 Data analysis packages: . . . . . . . . . . . . . . . . . . . . . . 29
3.3 Implementations and Frameworks: . . . . . . . . . . . . . . . . . . . . 30
3.3.1 Anaconda: . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
3.3.2 Jupyter Notebook: . . . . . . . . . . . . . . . . . . . . . . . 31
3.3.3 Python: . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
3.4 Dataset: . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
3.4.1 Description: . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
3.5 BCR-Net: . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
3.5.1 CNN-Scorer: . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
3.5.2 Multiple Instance Learning (MIL): . . . . . . . . . . . . . . . . 34
3.5.3 Training the CNN scorer: . . . . . . . . . . . . . . . . . . . . . . 36
3.5.4 Evaluation metrics: . . . . . . . . . . . . . . . . . . . . . . . . 36
3.6 Conclusion: . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
4 Experimental Results and Discussion 40
4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
4.2 Results: . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
4.2.1 CNN-Scorer: . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
4.2.2 Tuning: . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
4.2.3 Feature extraction: . . . . . . . . . . . . . . . . . . . . . . 45
4.2.4 MIL (Multi-Instance-Learning): . . . . . . . . . . . . . . . . . 46
4.3 Discussion 1: . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
4.3.1 CNN-Scorer performance: . . . . . . . . . . . . . . . . . . . . . 47
4.3.2 MIL performance: . . . . . . . . . . . . . . . . . . . . . . . . . 48
4.4 Comparative Study: . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
4.4.1 Fine Tree: . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
4.4.2 Bagged Tree: . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
4.4.3 Gaussian Naive Bayes: . . . . . . . . . . . . . . . . . . . . . . . 53
4.4.4 Subspace KNN: . . . . . . . . . . . . . . . . . . . . . . . . . . 54
4.4.5 Discussion 2: . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
4.5 Conclusion: . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
General Conclusion 59 |
| Côte titre : |
MAPH/0677 |
Exploration de la prédiction de la récidive du cancer du sein à partir d'images histopathologiques en utilisant l'intelligence artificielle [document électronique] / Haoua Taiar, Auteur ; Seif eddine Chouaba, Directeur de thèse . - [S.l.] : Setif:UFA, 2025 . - 1 vol (59 f.) ; 29 cm. Langues : Français (fre)