| Title: |
Autonomous Drone Delivery Navigation via Reinforcement Learning |
| Document type: |
electronic document
| Authors: |
Zakaria Bouraba; Abdelmadjed Nabti, Author; Djamila Mechta, Thesis supervisor
| Publisher: |
Setif:UFA |
| Publication year: |
2025 |
| Extent: |
1 vol. (48 f.)
| Format: |
29 cm |
| Language: |
English (eng)
| Categories: |
Theses & Dissertations: Computer Science |
| Keywords: |
Autonomous drone delivery
Reinforcement learning
Multi-agent systems
Actor-Critic
Transformer |
| Decimal classification: |
004 Computer science
| Abstract: |
Autonomous drone delivery is increasingly recognized for its potential to expedite and decarbonize last-mile logistics. However, challenges such as congested airspace, restricted flight zones, and limited onboard energy persist. This dissertation presents a two-stage reinforcement-learning architecture designed to address these issues. In the first stage, the single-drone delivery problem is formulated as a Markov Decision Process (MDP), where time- and energy-efficient trajectories are learned using both tabular Q-learning and its Double Q-learning variant. Empirical results in a prototypical grid environment show that Double Q-learning accelerates convergence by approximately 30% and yields routes that are, on average, 15% shorter than those produced by standard Q-learning. The second stage focuses on collaborative multi-drone operations through MATAC (Multi-Agent Transformer-based Actor-Critic), enhanced with a Lagrangian proximal policy optimization scheme. Two separate critics are employed to maximize performance and strictly enforce no-fly-zone constraints simultaneously. In large-scale urban simulations, MATAC achieves near-zero constraint violations, reduces average delivery time by 25%, and lowers per-drone energy consumption by 20% compared to capacity-matched MLP+PPO and unconstrained PPO baselines.
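
The abstract names these methods without implementation detail; the following minimal Python sketches are illustrative reconstructions of the standard techniques cited, not code from the dissertation. First, the tabular Double Q-learning update of the first stage, with assumed names and hyperparameters (q_a, q_b, alpha, gamma):

# Hypothetical sketch of tabular Double Q-learning; identifiers and
# hyperparameters are assumptions, not taken from the dissertation.
import numpy as np

def double_q_update(q_a, q_b, s, a, r, s_next, done, alpha=0.1, gamma=0.99):
    # One step on two (n_states, n_actions) tables: one table picks the
    # greedy next action, the other evaluates it, which reduces the
    # maximization bias of plain Q-learning.
    if np.random.rand() < 0.5:
        select, evaluate = q_a, q_b
    else:
        select, evaluate = q_b, q_a
    best_next = np.argmax(select[s_next])  # selection by one table
    target = r + (0.0 if done else gamma * evaluate[s_next, best_next])
    select[s, a] += alpha * (target - select[s, a])

Second, the flavor of the second-stage Lagrangian PPO scheme with separate reward and cost critics: a clipped surrogate on the combined advantage, plus dual ascent on the multiplier. Again, every identifier here (adv_reward, adv_cost, cost_limit) is an assumption made for illustration:

# Hypothetical sketch of Lagrangian PPO with two critics; not the
# dissertation's MATAC code.
import torch

def lagrangian_ppo_loss(log_ratio, adv_reward, adv_cost, lam, clip_eps=0.2):
    # Clipped PPO surrogate on the Lagrangian advantage A_r - lam * A_c;
    # adv_cost comes from the second (cost) critic tracking
    # no-fly-zone violations.
    ratio = log_ratio.exp()
    adv = adv_reward - lam * adv_cost
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * adv
    return -torch.min(ratio * adv, clipped).mean()

def dual_ascent_step(lam, mean_episode_cost, cost_limit, lr_lambda=5e-3):
    # Grow the multiplier while measured constraint cost exceeds its
    # budget, shrink it otherwise; lam is kept non-negative.
    return max(0.0, lam + lr_lambda * (mean_episode_cost - cost_limit))

Under such a scheme the multiplier rises until violations fall within budget, which is one standard route to the near-zero violation rates reported above. |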
| Contents note: |
Contents
Abstract i
Dedication iv
Table of contents vii
List of figures viii
List of tables ix
List of algorithms x
Abbreviations xii
General Introduction 1
1 State of the Art: Drone Delivery 4
1.1 Introduction 4
1.2 Drone Delivery in Modern Logistics 4
1.2.1 Evolution of Delivery Technologies 4
1.2.2 Operational and Environmental Benefits 5
1.3 Artificial Intelligence Applied to Autonomous Systems 6
1.3.1 The Role of AI in Autonomous Navigation for Delivery Drones 6
1.4 Foundations of Reinforcement Learning 8
1.4.1 Key Concepts: State, Action, Reward, Policy 8
1.4.2 Q-Learning, Double Q-Learning, and Related Approaches 9
1.5 Reinforcement Learning for Autonomous Drone Delivery 11
1.5.1 Intelligent Navigation Using Reinforcement Learning 11
1.5.2 Adaptive Route Planning 11
1.5.3 Recent Work and Industrial Applications 12
1.6 Limitations of Existing Approaches and Motivation 13
1.6.1 Limitations of Pre-Programmed Systems 13
1.6.2 Challenges in Route and Decision Optimization 14
1.6.3 Motivation for Applying Reinforcement Learning 14
1.7 Literature Review 15
1.7.1 Single-Agent RL 16
1.7.2 Multi-Agent RL 16
1.7.3 Constrained/Safety-Aware Approaches 17
1.7.4 Research Trajectory and Gaps 18
1.7.5 Positioning of the Proposed Architecture 18
1.8 Conclusion 19
2 Autonomous Drone Delivery Navigation: Single-Agent and Multi-Agent Drone Delivery Navigation 20
2.1 Introduction 20
2.2 Motivation 20
2.2.1 Why Autonomous Drone Delivery? 21
2.2.2 Why Reinforcement Learning? 21
2.2.3 Why Single-Agent RL? 21
2.2.4 Why Multi-Agent RL? 22
2.3 First Scenario: Drone-Delivery Navigation as a Single Agent 22
2.3.1 System Model 22
2.3.2 Problem Formulation 23
2.3.3 Proposed Approach 23
2.3.4 Complexity Discussion 24
2.4 Second Scenario: Drone-Delivery Navigation as a Multi-Agent System 25
2.4.1 System Model 25
2.4.2 Problem Formulation 25
2.4.3 MATAC Architecture 26
2.4.4 Proposed Approach 27
2.4.5 Complexity Analysis 29
2.4.6 Conclusion 30
3 Simulation and Results 31
3.1 Introduction 31
3.2 Single-Drone Benchmark 31
3.3 Multi-Drone Benchmark 35
3.3.1 Experimental Setup 35
3.3.2 Benchmark Algorithms 35
3.3.3 Results and Discussion 36
3.4 Conclusion 39
General Conclusion 41
Bibliography 43 |
| Call number: |
MAI/1044 |
Autonomous Drone Delivery Navigation via Reinforcement Learning [electronic document] / Zakaria Bouraba; Abdelmadjed Nabti, Author; Djamila Mechta, Thesis supervisor. - [S.l.] : Setif:UFA, 2025. - 1 vol. (48 f.) ; 29 cm. Language: English (eng)