# Visualizing Crime Rate in San Francisco <a name="top"></a>

***Tarun Kamboj***

---

## Table of contents
* [Introduction](#introduction)
* [Data](#data)
* [Analysis](#analysis)
* [Results](#results)

<a name="introduction"></a>

## Introduction

In this work, we will visualize the crime rates in San Francisco using a `Choropleth` map.

A `Choropleth` map is a thematic map in which areas are shaded or patterned in proportion to the value of the statistical variable being displayed, such as population density or per-capita income. It provides an easy way to see how a measurement varies across a geographic area and how much variability there is within a region.

<a name="data"></a>

## Data 

For our purpose, we will use the dataset of crimes maintained by the San Francisco Police Department for the year 2016, which I've downloaded from [here](https://s3-api.us-geo.objectstorage.softlayer.net/cf-courses-data/CognitiveClass/DV0101EN/labs/Data_Files/Police_Department_Incidents_-_Previous_Year__2016_.csv).
To create a `Choropleth` map of San Francisco, we'll also need a `GeoJSON` file that marks the boundaries of the different neighborhoods in San Francisco, which I've downloaded from [here](https://cocl.us/sanfran_geojson).

Each row consists of 13 features:
> 1. **IncidntNum**: Incident Number
> 2. **Category**: Category of crime or incident
> 3. **Descript**: Description of the crime or incident
> 4. **DayOfWeek**: The day of week on which the incident occurred
> 5. **Date**: The Date on which the incident occurred
> 6. **Time**: The time of day on which the incident occurred
> 7. **PdDistrict**: The police department district
> 8. **Resolution**: The resolution of the crime in terms whether the perpetrator was arrested or not
> 9. **Address**: The closest address to where the incident took place
> 10. **X**: The longitude value of the crime location 
> 11. **Y**: The latitude value of the crime location
> 12. **Location**: A tuple of the latitude and the longitude values
> 13. **PdId**: The police department ID
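
As a small illustration of the field formats listed above, here is a minimal sketch that parses the `Date`, `Time`, and `Location` strings of one incident using only the standard library. The helper names `parse_incident_datetime` and `parse_location` are hypothetical, and the date format string is an assumption based on the sample rows of this dataset.

```python
import ast
from datetime import datetime

# Sample values copied from one incident row of the dataset.
raw_date = "01/29/2016 12:00:00 AM"
raw_time = "11:00"
raw_location = "(37.775420706711, -122.403404791479)"


def parse_incident_datetime(date_str, time_str):
    """Combine the day from the Date column with the clock time from the Time column."""
    day = datetime.strptime(date_str, "%m/%d/%Y %I:%M:%S %p").date()
    clock = datetime.strptime(time_str, "%H:%M").time()
    return datetime.combine(day, clock)


def parse_location(location_str):
    """Turn the '(lat, lon)' text of the Location column into a tuple of floats."""
    lat, lon = ast.literal_eval(location_str)
    return float(lat), float(lon)


occurred_at = parse_incident_datetime(raw_date, raw_time)
latitude, longitude = parse_location(raw_location)
print(occurred_at, latitude, longitude)
```

Note that `Location` stores latitude first, whereas `X` and `Y` hold longitude and latitude respectively.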

Before we get the data and start exploring it, let's import all the dependencies that we will need.

In [1]:
import pandas as pd
import folium #for plotting maps

Let's read the data using the pandas `read_csv` function.

In [2]:
df = pd.read_csv('Police_Department_Incidents_SanFrancisco_Year_2016_.csv')
df.head()

| | IncidntNum | Category | Descript | DayOfWeek | Date | Time | PdDistrict | Resolution | Address | X | Y | Location | PdId |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 120058272 | WEAPON LAWS | POSS OF PROHIBITED WEAPON | Friday | 01/29/2016 12:00:00 AM | 11:00 | SOUTHERN | ARREST, BOOKED | 800 Block of BRYANT ST | -122.403405 | 37.775421 | (37.775420706711, -122.403404791479) | 12005827212120 |
| 1 | 120058272 | WEAPON LAWS | FIREARM, LOADED, IN VEHICLE, POSSESSION OR USE | Friday | 01/29/2016 12:00:00 AM | 11:00 | SOUTHERN | ARREST, BOOKED | 800 Block of BRYANT ST | -122.403405 | 37.775421 | (37.775420706711, -122.403404791479) | 12005827212168 |
| 2 | 141059263 | WARRANTS | WARRANT ARREST | Monday | 04/25/2016 12:00:00 AM | 14:59 | BAYVIEW | ARREST, BOOKED | KEITH ST / SHAFTER AV | -122.388856 | 37.729981 | (37.7299809672996, -122.388856204292) | 14105926363010 |
| 3 | 160013662 | NON-CRIMINAL | LOST PROPERTY | Tuesday | 01/05/2016 12:00:00 AM | 23:50 | TENDERLOIN | NONE | JONES ST / OFARRELL ST | -122.412971 | 37.785788 | (37.7857883766888, -122.412970537591) | 16001366271000 |
| 4 | 160002740 | NON-CRIMINAL | LOST PROPERTY | Friday | 01/01/2016 12:00:00 AM | 00:30 | MISSION | NONE | 16TH ST / MISSION ST | -122.419672 | 37.76505 | (37.7650501214668, -122.419671780296) | 16000274071000 |


<a name="analysis"></a> 

## Analysis

Let's take a look at the columns of the DataFrame.

In [3]:
df.columns

Index(['IncidntNum', 'Category', 'Descript', 'DayOfWeek', 'Date', 'Time',
       'PdDistrict', 'Resolution', 'Address', 'X', 'Y', 'Location', 'PdId'],
      dtype='object')

Let's find out which column contains the area information.

In [4]:
df['PdDistrict'].head()

0      SOUTHERN
1      SOUTHERN
2       BAYVIEW
3    TENDERLOIN
4       MISSION
Name: PdDistrict, dtype: object

Looks like the column `PdDistrict` contains the area information. Now let's group the DataFrame by the `PdDistrict` column.

In [5]:
df.groupby('PdDistrict').count()

| PdDistrict | IncidntNum | Category | Descript | DayOfWeek | Date | Time | Resolution | Address | X | Y | Location | PdId |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| BAYVIEW | 14303 | 14303 | 14303 | 14303 | 14303 | 14303 | 14303 | 14303 | 14303 | 14303 | 14303 | 14303 |
| CENTRAL | 17666 | 17666 | 17666 | 17666 | 17666 | 17666 | 17666 | 17666 | 17666 | 17666 | 17666 | 17666 |
| INGLESIDE | 11594 | 11594 | 11594 | 11594 | 11594 | 11594 | 11594 | 11594 | 11594 | 11594 | 11594 | 11594 |
| MISSION | 19503 | 19503 | 19503 | 19503 | 19503 | 19503 | 19503 | 19503 | 19503 | 19503 | 19503 | 19503 |
| NORTHERN | 20100 | 20100 | 20100 | 20100 | 20100 | 20100 | 20100 | 20100 | 20100 | 20100 | 20100 | 20100 |
| PARK | 8699 | 8699 | 8699 | 8699 | 8699 | 8699 | 8699 | 8699 | 8699 | 8699 | 8699 | 8699 |
| RICHMOND | 8922 | 8922 | 8922 | 8922 | 8922 | 8922 | 8922 | 8922 | 8922 | 8922 | 8922 | 8922 |
| SOUTHERN | 28445 | 28445 | 28445 | 28445 | 28445 | 28445 | 28445 | 28445 | 28445 | 28445 | 28445 | 28445 |
| TARAVAL | 11325 | 11325 | 11325 | 11325 | 11325 | 11325 | 11325 | 11325 | 11325 | 11325 | 11325 | 11325 |
| TENDERLOIN | 9942 | 9942 | 9942 | 9942 | 9942 | 9942 | 9942 | 9942 | 9942 | 9942 | 9942 | 9942 |


In the result above, every column holds the same value for a given district: the number of incidents recorded there. Any one of them will do, so let's keep only the `PdDistrict` and `IncidntNum` columns.

In [6]:
df = df.groupby('PdDistrict').count().iloc[:, [0]]
df

| PdDistrict | IncidntNum |
|---|---|
| BAYVIEW | 14303 |
| CENTRAL | 17666 |
| INGLESIDE | 11594 |
| MISSION | 19503 |
| NORTHERN | 20100 |
| PARK | 8699 |
| RICHMOND | 8922 |
| SOUTHERN | 28445 |
| TARAVAL | 11325 |
| TENDERLOIN | 9942 |


Looks nice. Let's reset the index and make `PdDistrict` a regular column.

In [7]:
df.reset_index(inplace=True)
df

| | PdDistrict | IncidntNum |
|---|---|---|
| 0 | BAYVIEW | 14303 |
| 1 | CENTRAL | 17666 |
| 2 | INGLESIDE | 11594 |
| 3 | MISSION | 19503 |
| 4 | NORTHERN | 20100 |
| 5 | PARK | 8699 |
| 6 | RICHMOND | 8922 |
| 7 | SOUTHERN | 28445 |
| 8 | TARAVAL | 11325 |
| 9 | TENDERLOIN | 9942 |
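
The grouping, counting, and index reset above can also be written in one step. The sketch below is an alternative, not the approach used in this notebook: it relies on `size()`, which counts rows per group, whereas `count()` counts non-null values per column. The small `incidents` DataFrame is made up purely for illustration.

```python
import pandas as pd

# Tiny made-up sample; the real DataFrame comes from the 2016 incidents CSV.
incidents = pd.DataFrame({
    "PdDistrict": ["SOUTHERN", "SOUTHERN", "BAYVIEW", "MISSION", "SOUTHERN"],
    "Category": ["WEAPON LAWS", None, "WARRANTS", "NON-CRIMINAL", "ASSAULT"],
})

# count() tallies non-null values per column, so the missing Category
# lowers that column's count for SOUTHERN.
per_column = incidents.groupby("PdDistrict").count()

# size() counts rows per group regardless of missing values, and
# reset_index(name=...) turns the result into a two-column DataFrame.
district_counts = (
    incidents.groupby("PdDistrict")
    .size()
    .reset_index(name="IncidntNum")
)
print(district_counts)
```

In the real dataset every column reports the same count per district, so both approaches should give the same numbers there.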


Great! Now we are all set to plot our results.

### Map Visualization

We will use `Folium` to create the `Choropleth` map, for which we need the `GeoJSON` file that I've already downloaded.

Let's first create a simple map of San Francisco.

In [8]:
sf_geo = r'san-francisco.geojson' # geojson file

# San Francisco latitude and longitude values
latitude = 37.77
longitude = -122.42

# create map 
sanfran_map = folium.Map(location=[latitude, longitude], zoom_start=12)

# display the map of San Francisco
sanfran_map
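
A choropleth joins the data to the map by matching district names, so it may be worth checking that the names in the GeoJSON file agree with those in the DataFrame before plotting. The following is a minimal sketch under stated assumptions: the district name is assumed to live in a feature property called `DISTRICT`, the helper names are hypothetical, and `sample_geojson` is a made-up stand-in for the real file.

```python
import json


def district_names(geojson, property_name="DISTRICT"):
    """Collect one property value from every feature of a GeoJSON FeatureCollection."""
    return {feature["properties"][property_name] for feature in geojson["features"]}


def unmatched_districts(geojson, data_districts, property_name="DISTRICT"):
    """Report names that appear on only one side of the join."""
    geo = district_names(geojson, property_name)
    data = set(data_districts)
    return {"only_in_data": data - geo, "only_in_geojson": geo - data}


# Made-up two-feature collection; for the real file one would use
# json.load(open('san-francisco.geojson')) instead.
sample_geojson = json.loads("""
{
  "type": "FeatureCollection",
  "features": [
    {"type": "Feature", "properties": {"DISTRICT": "SOUTHERN"}, "geometry": null},
    {"type": "Feature", "properties": {"DISTRICT": "PARK"}, "geometry": null}
  ]
}
""")

result = unmatched_districts(sample_geojson, ["SOUTHERN", "PARK", "MISSION"])
print(result)
```

An empty set on both sides means every district in the data has a matching shape on the map.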


Let's now create the Choropleth map.

In [9]:
# Use the light 'cartodbpositron' tiles so the shaded districts stand out
# (the dark alternative is 'cartodbdark_matter')
folium.TileLayer('cartodbpositron').add_to(sanfran_map)

# generate choropleth map
# folium.Choropleth is used here because the older Map.choropleth method is deprecated
folium.Choropleth(
    geo_data=sf_geo,
    data=df,
    columns=['PdDistrict', 'IncidntNum'],
    key_on='feature.properties.DISTRICT',
    fill_color='RdPu',
    fill_opacity=0.7,
    line_opacity=0.2,
    legend_name='Crime Rate in San Francisco'
).add_to(sanfran_map)

# display map
sanfran_map
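
To make the shading less of a black box, here is a rough sketch of how incident counts could be sorted into colour classes. It assumes six equal-width classes between the smallest and largest count; this is an independent illustration of the idea, not a copy of what `Folium` does internally, and the helper names are hypothetical.

```python
# Incident counts per police district, taken from the table above.
counts = {
    "BAYVIEW": 14303, "CENTRAL": 17666, "INGLESIDE": 11594, "MISSION": 19503,
    "NORTHERN": 20100, "PARK": 8699, "RICHMOND": 8922, "SOUTHERN": 28445,
    "TARAVAL": 11325, "TENDERLOIN": 9942,
}


def equal_width_edges(values, n_classes):
    """Return n_classes + 1 evenly spaced edges from min(values) to max(values)."""
    lo, hi = min(values), max(values)
    step = (hi - lo) / n_classes
    return [lo + i * step for i in range(n_classes + 1)]


def class_index(value, edges):
    """Return the 0-based class of value; the maximum falls in the last class."""
    n_classes = len(edges) - 1
    for i in range(n_classes):
        if value < edges[i + 1]:
            return i
    return n_classes - 1


edges = equal_width_edges(list(counts.values()), 6)
classes = {district: class_index(n, edges) for district, n in counts.items()}

# Print districts from the lightest class to the darkest.
for district, cls in sorted(classes.items(), key=lambda item: item[1]):
    print(f"{district:<11} class {cls}")
```

With these assumptions SOUTHERN is alone in the darkest class, while PARK, RICHMOND, TENDERLOIN, TARAVAL, and INGLESIDE share the lightest one.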



<a name="results"></a>

## Results

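Looks like most incidents occurred in the **SOUTHERN** police district (28,445 in 2016), which on the map covers **Treasure Island** and **SoMa**, followed by **NORTHERN** (20,100) and **MISSION** (19,503).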

## Thank You for Reading!