# Neighborhood Analysis of Toronto - 1<a name="top"></a>

***Tarun Kamboj***

---

## Table of contents
* [Introduction](#introduction)
* [Data](#data)
  * [Scraping The Data](#A)
  * [Getting the Geographical Coordinates](#B)

<a name="introduction"></a>

## Introduction

Suppose you get a new job at the other end of the city and want to move closer to it, but you like your current neighbourhood because of the facilities available nearby. What do you do? To solve this problem, I used clustering: the neighbourhoods are grouped by their popular nearby venues, as shown in the images below. You can then choose a neighbourhood to move to that minimizes your commute while offering the same facilities as your current one.

<a name="data"></a>

## Data 

To obtain the table of postal codes, boroughs, and neighbourhoods of Toronto, we'll scrape the `HTML` of the Wikipedia page https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M .
Next, to get the geographical coordinates of Toronto's postal codes, we'll use the dataset `Geospatial_Coordinates.csv`, which can be downloaded from [here](http://cocl.us/Geospatial_data).

Before we get the data and start exploring it, let's import all the dependencies.

<a id="A"></a>

In [1]:
import numpy as np 
import pandas as pd
import folium # for generating maps
import requests # for requesting html
from bs4 import BeautifulSoup # for scraping

### Scraping The Data

1. Get the HTML

In [2]:
url = "https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M"

In [3]:
r = requests.get(url)
html_content = r.content

2. Parse the HTML

In [4]:
soup = BeautifulSoup(html_content, 'html.parser')
table = soup.table

3. Convert into Pandas Dataframe

In [5]:
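from io import StringIO  # recent pandas versions expect a file-like object rather than a raw HTML string
data_table = pd.read_html(StringIO(str(table)))[0]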
data_table.head()

|  | Postal Code | Borough | Neighbourhood |
|---|---|---|---|
| 0 | M1A | Not assigned | Not assigned |
| 1 | M2A | Not assigned | Not assigned |
| 2 | M3A | North York | Parkwoods |
| 3 | M4A | North York | Victoria Village |
| 4 | M5A | Downtown Toronto | Regent Park, Harbourfront |


4. Drop the unnecessary rows

In [6]:
df = data_table[data_table['Borough'] != 'Not assigned']
df.head()

|  | Postal Code | Borough | Neighbourhood |
|---|---|---|---|
| 2 | M3A | North York | Parkwoods |
| 3 | M4A | North York | Victoria Village |
| 4 | M5A | Downtown Toronto | Regent Park, Harbourfront |
| 5 | M6A | North York | Lawrence Manor, Lawrence Heights |
| 6 | M7A | Downtown Toronto | Queen's Park, Ontario Provincial Government |


5. Fix the indexes

In [7]:
df.reset_index(drop=True,inplace=True)
print('Dimension of Dataframe is ',df.shape)
df.head()

Dimension of Dataframe is  (103, 3)


|  | Postal Code | Borough | Neighbourhood |
|---|---|---|---|
| 0 | M3A | North York | Parkwoods |
| 1 | M4A | North York | Victoria Village |
| 2 | M5A | Downtown Toronto | Regent Park, Harbourfront |
| 3 | M6A | North York | Lawrence Manor, Lawrence Heights |
| 4 | M7A | Downtown Toronto | Queen's Park, Ontario Provincial Government |


<a id="B"></a>

### Getting the Geographical Coordinates

Since `geocoder` was not working reliably, after many attempts I decided to use the `Geospatial_Coordinates.csv` file, which you can download from [here](http://cocl.us/Geospatial_data).

In [8]:
df_cor = pd.read_csv('Geospatial_Coordinates.csv')

Sort both DataFrames by `Postal Code` so that their rows are in the same order.

In [9]:
df_cor.sort_values('Postal Code', ascending=True, inplace=True)
df.sort_values('Postal Code', ascending=True, inplace=True)

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
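
The message above is pandas' `SettingWithCopyWarning`: `df` was created by filtering `data_table`, so pandas cannot tell whether an in-place change is meant to affect the original table. A minimal sketch of one way to avoid it, using a small made-up table (the rows are illustrative, not the scraped data): take an explicit copy when filtering and assign the sorted result instead of passing `inplace=True`.

```python
import pandas as pd

# Toy stand-in for the scraped table (illustrative rows only)
data_table = pd.DataFrame({
    'Postal Code': ['M3A', 'M1A', 'M1B'],
    'Borough': ['North York', 'Not assigned', 'Scarborough'],
})

# .copy() makes the filtered frame independent of data_table
df = data_table[data_table['Borough'] != 'Not assigned'].copy()

# Assign the result instead of sorting in place, then renumber the rows
df = df.sort_values('Postal Code').reset_index(drop=True)
print(df)
```

Resetting the index after sorting also keeps the row labels in step with the new row order.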
  


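Add the coordinates to the main DataFrame. `pd.concat(..., axis=1)` pairs rows by index label, so both frames need matching indexes for each postal code to receive its own coordinates.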

In [10]:
latlangs = df_cor[['Latitude','Longitude']]
df = pd.concat([df, latlangs], axis=1, sort=False)
df.head()

|  | Postal Code | Borough | Neighbourhood | Latitude | Longitude |
|---|---|---|---|---|---|
| 0 | M3A | North York | Parkwoods | 43.806686 | -79.194353 |
| 1 | M4A | North York | Victoria Village | 43.784535 | -79.160497 |
| 2 | M5A | Downtown Toronto | Regent Park, Harbourfront | 43.763573 | -79.188711 |
| 3 | M6A | North York | Lawrence Manor, Lawrence Heights | 43.770992 | -79.216917 |
| 4 | M7A | Downtown Toronto | Queen's Park, Ontario Provincial Government | 43.773136 | -79.239476 |
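
Because column-wise `pd.concat` aligns on index labels rather than on the current row order, sorting the two frames does not by itself guarantee that postal codes and coordinates line up. As a hedged sketch on made-up data (the coordinates below are placeholders, not real values), joining on the `Postal Code` key with `DataFrame.merge` pairs the rows explicitly:

```python
import pandas as pd

# Toy frames; values are placeholders for illustration only
df = pd.DataFrame({
    'Postal Code': ['M3A', 'M1B'],
    'Borough': ['North York', 'Scarborough'],
})
df_cor = pd.DataFrame({
    'Postal Code': ['M1B', 'M3A'],
    'Latitude': [1.0, 3.0],
    'Longitude': [-1.0, -3.0],
})

# concat(axis=1) aligns on index labels 0 and 1, so M3A is paired with M1B's row
by_index = pd.concat([df, df_cor[['Latitude', 'Longitude']]], axis=1)

# merge pairs rows by the shared key column, whatever the row order
by_key = df.merge(df_cor, on='Postal Code', how='left')

print(by_index)
print(by_key)
```

A left merge keeps every row of `df` in its original order, so the rest of the notebook could use `by_key` in place of the concatenated frame.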


For simplicity, let's explore only the boroughs whose names contain "Toronto".

In [11]:
df['Borough'].unique()

array(['North York', 'Downtown Toronto', 'Etobicoke', 'Scarborough',
       'East York', 'York', 'East Toronto', 'West Toronto',
       'Central Toronto', 'Mississauga'], dtype=object)

In [12]:
df_west = df[df['Borough'] == 'West Toronto']
df_east = df[df['Borough'] == 'East Toronto']
df_central = df[df['Borough'] == 'Central Toronto']
df_downtown = df[df['Borough'] == 'Downtown Toronto']

In [13]:
df_tor = pd.concat([ df_west, df_east,df_central, df_downtown]).reset_index(drop=True)
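
A possibly more compact alternative, sketched on a toy frame: select every borough whose name contains "Toronto" with `Series.str.contains`. Unlike the explicit concatenation above, this keeps the rows in their original order rather than grouping them by borough.

```python
import pandas as pd

# Toy frame with a few boroughs (illustrative subset)
df = pd.DataFrame({
    'Postal Code': ['M3A', 'M5A', 'M4E', 'M7R'],
    'Borough': ['North York', 'Downtown Toronto', 'East Toronto', 'Mississauga'],
})

# Boolean mask: True where the borough name includes the word "Toronto"
mask = df['Borough'].str.contains('Toronto')
df_tor = df[mask].reset_index(drop=True)
print(df_tor)
```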

In [14]:
df_tor.head()

|  | Postal Code | Borough | Neighbourhood | Latitude | Longitude |
|---|---|---|---|---|---|
| 0 | M6H | West Toronto | Dufferin, Dovercourt Village | 43.739015 | -79.506944 |
| 1 | M6J | West Toronto | Little Portugal, Trinity | 43.676357 | -79.293031 |
| 2 | M6K | West Toronto | Brockton, Parkdale Village, Exhibition Place | 43.659526 | -79.340923 |
| 3 | M6P | West Toronto | High Park, The Junction South | 43.646435 | -79.374846 |
| 4 | M6R | West Toronto | Parkdale, Roncesvalles | 43.669542 | -79.422564 |


Save the DataFrame to a `csv` file for further use.

In [15]:
df_tor.to_csv('Toronto_Neighbourhood.csv')
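
By default `to_csv` also writes the row index, which comes back as an `Unnamed: 0` column when the file is read again. A small sketch, using in-memory buffers instead of files and a made-up one-row frame (`df_demo` is a hypothetical name), showing the difference that `index=False` makes:

```python
import io
import pandas as pd

df_demo = pd.DataFrame({'Postal Code': ['M6H'], 'Borough': ['West Toronto']})

# Default: the index is written as an unnamed first column
with_index = io.StringIO()
df_demo.to_csv(with_index)
with_index.seek(0)
reloaded_with_index = pd.read_csv(with_index)

# index=False: only the data columns are written
without_index = io.StringIO()
df_demo.to_csv(without_index, index=False)
without_index.seek(0)
reloaded_without_index = pd.read_csv(without_index)

print(reloaded_with_index.columns.tolist())
print(reloaded_without_index.columns.tolist())
```

Whether to drop the index depends on how the next notebook reads the file; passing `index_col=0` to `read_csv` is another option.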

Create a map of Toronto with the neighbourhoods superimposed on top.

In [16]:
latitude = 43.693781
longitude = -79.428191
print('The geographical coordinates of Toronto are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Toronto are 43.693781, -79.428191.


In [17]:
# create map of Toronto using latitude and longitude values
map_toronto = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to map
for lat, lng, borough, neighborhood in zip(df_tor['Latitude'], df_tor['Longitude'], df_tor['Borough'], df_tor['Neighbourhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
    ).add_to(map_toronto)
    
map_toronto



Further explorations are in the next notebook.

## Thanks for Reading :)