{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Visualizing Crime Rate in San Francisco\n", "\n", "***Tarun Kamboj***\n", "\n", "---" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Table of contents\n", "* [Introduction](#introduction)\n", "* [Data](#data)\n", "* [Analysis](#analysis)\n", "* [Results](#results)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "\n", "## Introduction" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In this work, we will visualize the crime rate in San Francisco, measured as the number of incidents reported in each police district, using a choropleth map.\n", "\n", "A choropleth map is a thematic map in which areas are shaded or patterned in proportion to the value of the statistical variable being displayed, such as population density or per-capita income. It provides an easy way to see how a measurement varies across a geographic area, or how much it varies within a region." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "\n", "## Data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "For our purpose, we will use the dataset of police incidents maintained by the San Francisco Police Department for the year 2016, which I've downloaded from [here](https://s3-api.us-geo.objectstorage.softlayer.net/cf-courses-data/CognitiveClass/DV0101EN/labs/Data_Files/Police_Department_Incidents_-_Previous_Year__2016_.csv).\n", "To create a choropleth map of San Francisco, we'll also need a `GeoJSON` file that marks the boundaries of the city's police districts, which I've downloaded from [here](https://cocl.us/sanfran_geojson).\n", "\n", "Each row consists of 13 columns:\n", "> 1. **IncidntNum**: Incident number\n", "> 2. **Category**: Category of the crime or incident\n", "> 3. **Descript**: Description of the crime or incident\n", "> 4. **DayOfWeek**: The day of the week on which the incident occurred\n", "> 5. **Date**: The date on which the incident occurred\n", "> 6. **Time**: The time of day at which the incident occurred\n", "> 7. **PdDistrict**: The police department district\n", "> 8. **Resolution**: The resolution of the incident, such as whether the perpetrator was arrested (e.g. `ARREST, BOOKED`)\n", "> 9. **Address**: The closest address to where the incident took place\n", "> 10. **X**: The longitude of the incident location\n", "> 11. **Y**: The latitude of the incident location\n", "> 12. **Location**: A tuple of the latitude and longitude values\n", "> 13. **PdId**: The police department ID" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Before we load the data and start exploring it, let's import all the dependencies that we will need." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import pandas as pd\n", "import folium  # for plotting maps" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's read the data using the pandas `read_csv` function." ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "scrolled": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
\n", "
" ], "text/plain": [ " IncidntNum Category Descript \\\n", "0 120058272 WEAPON LAWS POSS OF PROHIBITED WEAPON \n", "1 120058272 WEAPON LAWS FIREARM, LOADED, IN VEHICLE, POSSESSION OR USE \n", "2 141059263 WARRANTS WARRANT ARREST \n", "3 160013662 NON-CRIMINAL LOST PROPERTY \n", "4 160002740 NON-CRIMINAL LOST PROPERTY \n", "\n", " DayOfWeek Date Time PdDistrict Resolution \\\n", "0 Friday 01/29/2016 12:00:00 AM 11:00 SOUTHERN ARREST, BOOKED \n", "1 Friday 01/29/2016 12:00:00 AM 11:00 SOUTHERN ARREST, BOOKED \n", "2 Monday 04/25/2016 12:00:00 AM 14:59 BAYVIEW ARREST, BOOKED \n", "3 Tuesday 01/05/2016 12:00:00 AM 23:50 TENDERLOIN NONE \n", "4 Friday 01/01/2016 12:00:00 AM 00:30 MISSION NONE \n", "\n", " Address X Y \\\n", "0 800 Block of BRYANT ST -122.403405 37.775421 \n", "1 800 Block of BRYANT ST -122.403405 37.775421 \n", "2 KEITH ST / SHAFTER AV -122.388856 37.729981 \n", "3 JONES ST / OFARRELL ST -122.412971 37.785788 \n", "4 16TH ST / MISSION ST -122.419672 37.765050 \n", "\n", " Location PdId \n", "0 (37.775420706711, -122.403404791479) 12005827212120 \n", "1 (37.775420706711, -122.403404791479) 12005827212168 \n", "2 (37.7299809672996, -122.388856204292) 14105926363010 \n", "3 (37.7857883766888, -122.412970537591) 16001366271000 \n", "4 (37.7650501214668, -122.419671780296) 16000274071000 " ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df = pd.read_csv('Police_Department_Incidents_SanFrancisco_Year_2016_.csv')\n", "df.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ " \n", "\n", "## Analysis" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's take a look at the columns of the DataFrame." 
] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Index(['IncidntNum', 'Category', 'Descript', 'DayOfWeek', 'Date', 'Time',\n", " 'PdDistrict', 'Resolution', 'Address', 'X', 'Y', 'Location', 'PdId'],\n", " dtype='object')" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.columns" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's find out which column contains the area information." ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "scrolled": true }, "outputs": [ { "data": { "text/plain": [ "0 SOUTHERN\n", "1 SOUTHERN\n", "2 BAYVIEW\n", "3 TENDERLOIN\n", "4 MISSION\n", "Name: PdDistrict, dtype: object" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df['PdDistrict'].head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "It looks like the `PdDistrict` column contains the area information. Now let's group the DataFrame by `PdDistrict` and count the rows in each group." ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "scrolled": true }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
\n", "
" ], "text/plain": [ " IncidntNum Category Descript DayOfWeek Date Time \\\n", "PdDistrict \n", "BAYVIEW 14303 14303 14303 14303 14303 14303 \n", "CENTRAL 17666 17666 17666 17666 17666 17666 \n", "INGLESIDE 11594 11594 11594 11594 11594 11594 \n", "MISSION 19503 19503 19503 19503 19503 19503 \n", "NORTHERN 20100 20100 20100 20100 20100 20100 \n", "PARK 8699 8699 8699 8699 8699 8699 \n", "RICHMOND 8922 8922 8922 8922 8922 8922 \n", "SOUTHERN 28445 28445 28445 28445 28445 28445 \n", "TARAVAL 11325 11325 11325 11325 11325 11325 \n", "TENDERLOIN 9942 9942 9942 9942 9942 9942 \n", "\n", " Resolution Address X Y Location PdId \n", "PdDistrict \n", "BAYVIEW 14303 14303 14303 14303 14303 14303 \n", "CENTRAL 17666 17666 17666 17666 17666 17666 \n", "INGLESIDE 11594 11594 11594 11594 11594 11594 \n", "MISSION 19503 19503 19503 19503 19503 19503 \n", "NORTHERN 20100 20100 20100 20100 20100 20100 \n", "PARK 8699 8699 8699 8699 8699 8699 \n", "RICHMOND 8922 8922 8922 8922 8922 8922 \n", "SOUTHERN 28445 28445 28445 28445 28445 28445 \n", "TARAVAL 11325 11325 11325 11325 11325 11325 \n", "TENDERLOIN 9942 9942 9942 9942 9942 9942 " ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.groupby('PdDistrict').count()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The result above shows the same count in every column: `count()` returns the number of non-null values in each column, so any single column gives the number of records in the corresponding district. Note that one incident can span several rows (rows 0 and 1 share `IncidntNum` 120058272), so these are counts of records rather than of unique incidents. So let's keep only the `IncidntNum` column, with `PdDistrict` as the index." ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
\n", "
" ], "text/plain": [ " IncidntNum\n", "PdDistrict \n", "BAYVIEW 14303\n", "CENTRAL 17666\n", "INGLESIDE 11594\n", "MISSION 19503\n", "NORTHERN 20100\n", "PARK 8699\n", "RICHMOND 8922\n", "SOUTHERN 28445\n", "TARAVAL 11325\n", "TENDERLOIN 9942" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# count the records per district, keeping only the IncidntNum column\n", "df = df.groupby('PdDistrict')[['IncidntNum']].count()\n", "df" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Looks nice. Let's reset the index to turn `PdDistrict` back into a regular column, so that the district names and incident counts are two ordinary columns we can pass to the map." ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
\n", "
" ], "text/plain": [ " PdDistrict IncidntNum\n", "0 BAYVIEW 14303\n", "1 CENTRAL 17666\n", "2 INGLESIDE 11594\n", "3 MISSION 19503\n", "4 NORTHERN 20100\n", "5 PARK 8699\n", "6 RICHMOND 8922\n", "7 SOUTHERN 28445\n", "8 TARAVAL 11325\n", "9 TENDERLOIN 9942" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.reset_index(inplace=True)\n", "df" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Great! Now we are all set to plot our results." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Map Visualization" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We will use `folium` to create the choropleth map, which requires the `GeoJSON` file of district boundaries that I've already downloaded.\n", "\n", "Let's first create a simple map of San Francisco." ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ "" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "sf_geo = r'san-francisco.geojson'  # path to the downloaded GeoJSON file\n", "\n", "# San Francisco latitude and longitude values\n", "latitude = 37.77\n", "longitude = -122.42\n", "\n", "# create a map centered on San Francisco\n", "sanfran_map = folium.Map(location=[latitude, longitude], zoom_start=12)\n", "\n", "# display the map of San Francisco\n", "sanfran_map" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's now create the choropleth map." ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "scrolled": false }, "outputs": [ { "data": { "text/html": [ "" ], "text/plain": [ "" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# add a light 'cartodbpositron' base layer so the shaded districts stand out\n", "# ('cartodbdark_matter' would give a dark background instead)\n", "folium.TileLayer('cartodbpositron').add_to(sanfran_map)\n", "\n", "# generate the choropleth map with the Choropleth class\n", "# (the older sanfran_map.choropleth(...) method is deprecated)\n", "folium.Choropleth(\n", "    geo_data=sf_geo,\n", "    data=df,\n", "    columns=['PdDistrict', 'IncidntNum'],  # key column, value column\n", "    key_on='feature.properties.DISTRICT',  # GeoJSON property matched against PdDistrict\n", "    fill_color='RdPu',\n", "    fill_opacity=0.7,\n", "    line_opacity=0.2,\n", "    legend_name='Crime Rate in San Francisco'\n", ").add_to(sanfran_map)\n", "\n", "# display map\n", "sanfran_map" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "\n", "## Results" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "It looks like most incidents occur in the **SOUTHERN** police district, which covers **SoMa** and **Treasure Island** (28,445 records), followed by **NORTHERN** (20,100) and **MISSION** (19,503); **PARK** (8,699) and **RICHMOND** (8,922) have the fewest. Note that these are raw counts rather than rates per resident or per unit area, so a district's population and size also affect its shading." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Thank You for Reading!" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.6" } }, "nbformat": 4, "nbformat_minor": 4 }