Title: Crime Prediction Using Machine Learning
Document type: printed text
Authors: Houfaf, Nardjes, Author; Toumi, Lyazid, Thesis supervisor
Publisher: Setif: UFA
Publication year: 2019
Extent: 1 vol. (59 leaves)
Dimensions: 29 cm
Languages: French (fre)
Categories: Theses & Dissertations: Computer Science
Keywords:
Crime classification
San Francisco crime dataset
Supervised classification
Sklearn (scikit-learn)
Decimal index: 004 - Computer science
Abstract:
Crime is one of the biggest issues in our society, and a huge number of crimes are committed every day. Analyzing criminal activity by place and time is important in order to disrupt it. Law enforcement can predict crimes more effectively and solve them faster if it has better information about crime patterns in different parts of a city. In this project, we use machine learning techniques to classify the type of a criminal incident from the time and location at which it occurred. The experiments are conducted on a San Francisco dataset containing crime records from 01-01-2003 to 08-02-2019. For the supervised classification problem, we use Gaussian Naive Bayes, Decision Tree, k-Nearest Neighbors (kNN), multinomial logistic regression, Random Forest and Support Vector Machine (SVM); for the unsupervised classification problem, we use clustering. The results are evaluated experimentally and compared with previous work. Finally, law enforcement in a smart city could apply the proposed model.
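The supervised setup described in the abstract (predicting an incident's category from when and where it occurred, then comparing several classifiers) can be sketched with scikit-learn. This is a minimal illustration under stated assumptions, not the author's code: the column names `Date`, `Time`, `PdDistrict` and `Category` are assumed from the contents outline, `Day` is assumed to mean day of the month, the data are randomly generated stand-ins rather than the San Francisco records, and the helper names and model settings (`n_neighbors=15`, `max_depth=5`, etc.) are hypothetical. Each model is scored by log loss on a held-out split, as the outline's "Log Loss Scoring" and classifier sections suggest.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import log_loss
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import LabelEncoder, StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier


def make_synthetic_incidents(n=600, seed=0):
    """Hypothetical random stand-in for the San Francisco incident records."""
    rng = np.random.default_rng(seed)
    dates = pd.Timestamp("2003-01-01") + pd.to_timedelta(rng.integers(0, 365 * 16, n), unit="D")
    times = [f"{h:02d}:{m:02d}" for h, m in zip(rng.integers(0, 24, n), rng.integers(0, 60, n))]
    return pd.DataFrame({
        "Date": dates.strftime("%m/%d/%Y"),
        "Time": times,
        "PdDistrict": rng.choice(["MISSION", "CENTRAL", "TENDERLOIN", "BAYVIEW"], n),
        "Category": rng.choice(["LARCENY/THEFT", "ASSAULT", "VEHICLE THEFT"], n),
    })


def build_features(df):
    """Hour, Year, Month, Day and encoded PdDistrict as features; encoded Category as label."""
    date = pd.to_datetime(df["Date"], format="%m/%d/%Y")
    hour = pd.to_datetime(df["Time"], format="%H:%M").dt.hour
    X = pd.DataFrame({
        "Hour": hour,
        "Year": date.dt.year,
        "Month": date.dt.month,
        "Day": date.dt.day,  # day of month (assumption)
        "PdDistrict": LabelEncoder().fit_transform(df["PdDistrict"]),
    })
    y = LabelEncoder().fit_transform(df["Category"])
    return X, y


def make_classifiers():
    """The six supervised models named in the abstract, with illustrative settings."""
    return {
        "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
        "Gaussian Naive Bayes": GaussianNB(),
        # With the default lbfgs solver, recent scikit-learn fits a multinomial model for >2 classes.
        "Logistic Regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
        "k-Nearest Neighbors": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=15)),
        "Decision Tree": DecisionTreeClassifier(max_depth=5, random_state=0),
        # probability=True is required for SVC.predict_proba.
        "SVM": make_pipeline(StandardScaler(), SVC(probability=True, random_state=0)),
    }


def compare_log_loss(X, y, test_size=0.3, seed=0):
    """Fit each model on a stratified training split; score by log loss on the test split."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, stratify=y, random_state=seed)
    scores = {}
    for name, model in make_classifiers().items():
        model.fit(X_train, y_train)
        proba = model.predict_proba(X_test)  # columns follow model.classes_
        scores[name] = log_loss(y_test, proba, labels=model.classes_)
    return scores


if __name__ == "__main__":
    X, y = build_features(make_synthetic_incidents())
    # Lower log loss is better; on random labels the numbers only illustrate the workflow.
    for name, score in sorted(compare_log_loss(X, y).items(), key=lambda kv: kv[1]):
        print(f"{name:22s} log loss = {score:.3f}")
```

Log loss penalizes confident wrong predictions, so models that output hard 0/1 probabilities (such as a deep, unpruned decision tree) tend to score poorly even when their accuracy is reasonable; that is one reason the sketch caps the tree depth.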
Contents note:
1.1 Motivation
1.2 Problem Formulation
CHAPTER 02
Definitions and Techniques
2.1 Predictive Analytics
2.2 Classification Techniques
2.2.1 Binary classification
2.2.2 Multiclass classification
2.3 Log Loss Scoring
2.4 Parallel Processing
CHAPTER 03
Related Work
3.1 Temporal and Spectral Analysis
3.2 Prediction Using Clustering and Classification Techniques
3.3 Hotspot Detection
CHAPTER 04
Design and Implementation
4.1 Overview of the dataset
4.2 Data Preprocessing
4.2.1 Preprocessing using sklearn
4.2.2 Techniques used for preprocessing
4.2.2.1 Data Cleaning
4.2.2.2 Data Transformation
4.2.2.3 Data Reduction
4.3 Software and Technologies Used
CHAPTER 05
Experimental Results
5.1 Comparison of this approach with existing results
5.2 Results of Graphical Analysis
5.3 Methodology and Results
5.3.1 Import the necessary modules
5.3.2 Preparation of the dataset
5.3.2.1 Importing the dataset
5.3.2.2 Data Exploration
5.3.2.3 Data Cleaning
5.3.2.4 Data Reduction
5.3.3 Preprocessing of the dataset using sklearn
5.3.3.1 Importing the dataset
5.3.3.2 Dropping the unnecessary features
5.3.3.3 Converting Time to datetime format (numeric format)
5.3.3.3.1 Extracting Hour from Time
5.3.3.4 Converting Date to datetime format
5.3.3.4.1 Extracting Year from Date
5.3.3.4.2 Extracting Month from Date
5.3.3.4.3 Extracting Day from Date
5.3.3.4.4 Encoding PdDistrict
5.3.3.5 Building a new array and creating the training data and labels
5.3.3.5.1 Creating the response 'category'
5.3.3.5.2 Creating the feature dataframe
5.3.3.6 Splitting the dataset into training and test data
5.3.3.7 Classifiers used to calculate the log loss
5.3.3.7.1 Random Forest
5.3.3.7.2 Gaussian Naive Bayes
5.3.3.7.3 Logistic Regression
5.3.3.7.4 Nearest Neighbors
5.3.3.7.5 Decision Tree
5.3.3.7.6 Support Vector Machine
5.3.3.8 Plotting a histogram comparing log loss
5.3.3.8 Distribution of longitude and latitude on the San Francisco map
5.3.3.9 Hotspots: densities of different crimes
CHAPTER 06
Conclusion and Future Work
LIST OF REFERENCES
Call number: MAI/0333
Online: https://drive.google.com/file/d/1nSq_f6zmecLXYj2X8P2rRmz4LGzwpfpa/view?usp=shari [...]
Electronic resource format: pdf
Crime Prediction Using Machine Learning [printed text] / Houfaf, Nardjes, Author; Toumi, Lyazid, Thesis supervisor. - [S.l.]: Setif: UFA, 2019. - 1 vol. (59 leaves); 29 cm. Languages: French (fre)