| Title: |
Autonomous Drone Delivery Navigation via Reinforcement Learning |
| Document type: |
electronic document
| Authors: |
Zakaria Bouraba; Abdelmadjed Nabti, Author; Djamila Mechta, Thesis supervisor
| Publisher: |
Setif:UFA |
| Publication year: |
2025 |
| Extent: |
1 vol. (48 f.)
| Format: |
29 cm |
| Language: |
English (eng)
| Categories: |
Theses & Dissertations: Computer Science |
| Keywords: |
Autonomous drone delivery
Reinforcement learning
Multi-agent systems
Actor-Critic
Transformer |
| Decimal classification: |
004 Computer science
| Abstract: |
Autonomous drone delivery is increasingly recognized for its potential to expedite and decarbonize last-mile logistics. However, challenges such as congested airspace, restricted flight zones, and limited onboard energy persist. This dissertation presents a two-stage reinforcement-learning architecture designed to address these issues. In the first stage, the single-drone delivery problem is formulated as a Markov Decision Process (MDP), where time- and energy-efficient trajectories are learned using both tabular Q-learning and its Double Q-learning variant. Empirical results in a prototypical grid environment show that Double Q-learning accelerates convergence by approximately 30% and yields routes that are, on average, 15% shorter than those produced by standard Q-learning. The second stage focuses on collaborative multi-drone operations through MATAC (Multi-Agent Transformer-based Actor-Critic), enhanced with a Lagrangian proximal policy optimization scheme. Two separate critics are employed to maximize performance and strictly enforce no-fly-zone constraints simultaneously. In large-scale urban simulations, MATAC achieves near-zero constraint violations, reduces average delivery time by 25%, and lowers per-drone energy consumption by 20% compared to capacity-matched MLP+PPO and unconstrained PPO baselines.
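
The abstract names these methods without implementation detail; the following minimal Python sketches are illustrative reconstructions of the standard techniques cited, not code from the dissertation. First, the tabular Double Q-learning update of the first stage, with assumed names and hyperparameters (q_a, q_b, alpha, gamma):

# Hypothetical sketch of tabular Double Q-learning; identifiers and
# hyperparameters are assumptions, not taken from the dissertation.
import numpy as np

def double_q_update(q_a, q_b, s, a, r, s_next, done, alpha=0.1, gamma=0.99):
    # One step on two (n_states, n_actions) tables: one table picks the
    # greedy next action, the other evaluates it, which reduces the
    # maximization bias of plain Q-learning.
    if np.random.rand() < 0.5:
        select, evaluate = q_a, q_b
    else:
        select, evaluate = q_b, q_a
    best_next = np.argmax(select[s_next])  # selection by one table
    target = r + (0.0 if done else gamma * evaluate[s_next, best_next])
    select[s, a] += alpha * (target - select[s, a])

Second, the flavor of the second-stage Lagrangian PPO scheme with separate reward and cost critics: a clipped surrogate on the combined advantage, plus dual ascent on the multiplier. Again, every identifier here (adv_reward, adv_cost, cost_limit) is an assumption made for illustration:

# Hypothetical sketch of Lagrangian PPO with two critics; not the
# dissertation's MATAC code.
import torch

def lagrangian_ppo_loss(log_ratio, adv_reward, adv_cost, lam, clip_eps=0.2):
    # Clipped PPO surrogate on the Lagrangian advantage A_r - lam * A_c;
    # adv_cost comes from the second (cost) critic tracking
    # no-fly-zone violations.
    ratio = log_ratio.exp()
    adv = adv_reward - lam * adv_cost
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * adv
    return -torch.min(ratio * adv, clipped).mean()

def dual_ascent_step(lam, mean_episode_cost, cost_limit, lr_lambda=5e-3):
    # Grow the multiplier while measured constraint cost exceeds its
    # budget, shrink it otherwise; lam is kept non-negative.
    return max(0.0, lam + lr_lambda * (mean_episode_cost - cost_limit))

Under such a scheme the multiplier rises until violations fall within budget, which is one standard route to the near-zero violation rates reported above. |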
| Contents note: |
Contents
Abstract i
Dedication iv
Table of contents vii
List of figures viii
List of tables ix
List of algorithms x
Abbreviations xii
General Introduction 1
1 State of the Art: Drone Delivery 4
1.1 Introduction 4
1.2 Drone Delivery in Modern Logistics 4
1.2.1 Evolution of Delivery Technologies 4
1.2.2 Operational and Environmental Benefits 5
1.3 Artificial Intelligence Applied to Autonomous Systems 6
1.3.1 The Role of AI in Autonomous Navigation for Delivery Drones 6
1.4 Foundations of Reinforcement Learning 8
1.4.1 Key Concepts: State, Action, Reward, Policy 8
1.4.2 Q-Learning, Double Q-Learning, and Related Approaches 9
1.5 Reinforcement Learning for Autonomous Drone Delivery 11
1.5.1 Intelligent Navigation Using Reinforcement Learning 11
1.5.2 Adaptive Route Planning 11
1.5.3 Recent Work and Industrial Applications 12
1.6 Limitations of Existing Approaches and Motivation 13
1.6.1 Limitations of Pre-Programmed Systems 13
1.6.2 Challenges in Route and Decision Optimization 14
1.6.3 Motivation for Applying Reinforcement Learning 14
1.7 Literature Review 15
1.7.1 Single-Agent RL 16
1.7.2 Multi-Agent RL 16
1.7.3 Constrained/Safety-Aware Approaches 17
1.7.4 Research Trajectory and Gaps 18
1.7.5 Positioning of the Proposed Architecture 18
1.8 Conclusion 19
2 Autonomous Drone Delivery Navigation: Single-Agent and Multi-Agent Drone Delivery Navigation 20
2.1 Introduction 20
2.2 Motivation 20
2.2.1 Why Autonomous Drone Delivery? 21
2.2.2 Why Reinforcement Learning? 21
2.2.3 Why Single-Agent RL? 21
2.2.4 Why Multi-Agent RL? 22
2.3 First Scenario: Drone-Delivery Navigation as a Single Agent 22
2.3.1 System Model 22
2.3.2 Problem Formulation 23
2.3.3 Proposed Approach 23
2.3.4 Complexity Discussion 24
2.4 Second Scenario: Drone-Delivery Navigation as a Multi-Agent System 25
2.4.1 System Model 25
2.4.2 Problem Formulation 25
2.4.3 MATAC Architecture 26
2.4.4 Proposed Approach 27
2.4.5 Complexity Analysis 29
2.4.6 Conclusion 30
3 Simulation and Results 31
3.1 Introduction 31
3.2 Single-Drone Benchmark 31
3.3 Multi-Drone Benchmark 35
3.3.1 Experimental Setup 35
3.3.2 Benchmark Algorithms 35
3.3.3 Results and Discussion 36
3.4 Conclusion 39
General Conclusion 41
Bibliography 43 |
| Call number: |
MAI/1044 |
Autonomous Drone Delivery Navigation via Reinforcement Learning [electronic document] / Zakaria Bouraba; Abdelmadjed Nabti, Author; Djamila Mechta, Thesis supervisor. - [S.l.] : Setif:UFA, 2025. - 1 vol. (48 f.) ; 29 cm. Language: English (eng)