# Neighborhood Analysis of Toronto - 2

***Tarun Kamboj***

---

## Table of contents
* [Exploratory Analysis](#analysis)
* [Modeling](#model)
* [Results](#results)

<a id="A"></a>

Let's import all the dependencies.

In [1]:
import numpy as np 
import pandas as pd

import folium # for generating maps
import requests # for requesting html
from bs4 import BeautifulSoup # for scraping

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

<a name="analysis"></a> 

## Exploratory Analysis

First, let's read the data we saved in the previous notebook.

In [2]:
df_tor = pd.read_csv('Toronto_Neighbourhood.csv',index_col=0)

In [3]:
df_tor.rename(columns={"Neighbourhood": "Neighborhood"}, inplace=True)
df_tor.head()

| | Postal Code | Borough | Neighborhood | Latitude | Longitude |
|---|---|---|---|---|---|
| 0 | M6H | West Toronto | Dufferin, Dovercourt Village | 43.739015 | -79.506944 |
| 1 | M6J | West Toronto | Little Portugal, Trinity | 43.676357 | -79.293031 |
| 2 | M6K | West Toronto | Brockton, Parkdale Village, Exhibition Place | 43.659526 | -79.340923 |
| 3 | M6P | West Toronto | High Park, The Junction South | 43.646435 | -79.374846 |
| 4 | M6R | West Toronto | Parkdale, Roncesvalles | 43.669542 | -79.422564 |


Defining the latitude and longitude of Toronto for plotting the map later in this notebook.

In [4]:
latitude = 43.693781
longitude = -79.428191
print('The geographical coordinates of Toronto are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Toronto are 43.693781, -79.428191.


Define the Foursquare credentials and API version.

In [5]:
CLIENT_ID = 'LKQ43TFWV43QUDRKV0KQTO0OX1VKUGNFRUBX0YTGBTQVP5FQ' # your Foursquare ID
CLIENT_SECRET = 'F4MI1V1BIQ4U43QG2HNRCSQ2IPDTNX5D5O1QYGLUEYTRYFYF' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: LKQ43TFWV43QUDRKV0KQTO0OX1VKUGNFRUBX0YTGBTQVP5FQ
CLIENT_SECRET:F4MI1V1BIQ4U43QG2HNRCSQ2IPDTNX5D5O1QYGLUEYTRYFYF
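
Hard-coding and printing API secrets in a shared notebook exposes them to every reader. Below is a minimal sketch of one alternative, assuming the credentials have been exported as environment variables. The variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` and both helper functions are hypothetical and not part of the Foursquare API.

```python
import os

def load_foursquare_credentials(env=None):
    """Read Foursquare credentials from environment variables.

    The variable names are hypothetical; use whatever names you exported.
    """
    env = os.environ if env is None else env
    try:
        return env["FOURSQUARE_CLIENT_ID"], env["FOURSQUARE_CLIENT_SECRET"]
    except KeyError as missing:
        raise RuntimeError(f"Missing environment variable: {missing}") from None

def mask(secret, visible=4):
    """Show only the first few characters when echoing a secret."""
    return secret[:visible] + "*" * max(len(secret) - visible, 0)
```

With this in place, a cell could call `CLIENT_ID, CLIENT_SECRET = load_foursquare_credentials()` and print `mask(CLIENT_SECRET)` instead of the full value.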


Defining a function that extracts the category of the venue

In [6]:
def get_category_type(row):
    # the key is 'categories' or 'venue.categories' depending on how the JSON was flattened
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']
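
As a quick illustration of the three cases the function handles, here is a small sketch with made-up rows. The dictionaries are hypothetical stand-ins for flattened Foursquare venue records, and the function is repeated (catching `KeyError` explicitly) so that the example runs on its own.

```python
def get_category_type(row):
    # copy of the function above so this example is self-contained
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
    if len(categories_list) == 0:
        return None
    return categories_list[0]['name']

# hypothetical rows, shaped like flattened venue records
rows = [
    {'categories': [{'name': 'Park'}, {'name': 'Trail'}]},  # first category wins
    {'venue.categories': [{'name': 'Café'}]},               # alternative key
    {'categories': []},                                     # no category at all
]
category_names = [get_category_type(r) for r in rows]
print(category_names)  # ['Park', 'Café', None]
```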

Let's create a function to get nearby venues of all the neighborhoods in Toronto.

In [7]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        # (LIMIT is a module-level constant that must be defined before this function is called)
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request and fail early on HTTP errors
        response = requests.get(url, timeout=10)
        response.raise_for_status()
        results = response.json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return nearby_venues
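
The list comprehension in `getNearbyVenues` indexes `categories[0]` directly, so a venue without any category would raise an `IndexError`. The following is a hedged sketch of a helper that tolerates that case. The name `parse_venue_items` and the sample payload are illustrative only, shaped after the fields the function above reads.

```python
def parse_venue_items(name, lat, lng, items):
    """Flatten Foursquare 'explore' items into row tuples.

    `items` is expected to look like response['groups'][0]['items'] as used
    above; venues without a category get None instead of raising IndexError.
    """
    rows = []
    for item in items:
        venue = item['venue']
        categories = venue.get('categories') or []
        category = categories[0]['name'] if categories else None
        rows.append((name, lat, lng,
                     venue['name'],
                     venue['location']['lat'],
                     venue['location']['lng'],
                     category))
    return rows

# made-up payload for illustration only
sample_items = [
    {'venue': {'name': 'Venue A', 'location': {'lat': 43.65, 'lng': -79.38},
               'categories': [{'name': 'Café'}]}},
    {'venue': {'name': 'Venue B', 'location': {'lat': 43.66, 'lng': -79.39},
               'categories': []}},
]
rows = parse_venue_items('Example Neighborhood', 43.65, -79.38, sample_items)
```

Keeping the parsing separate from the HTTP call also makes it possible to test the row layout without hitting the API.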

Now running the above function on each neighborhood.

In [8]:
LIMIT = 100
radius = 500
toronto_venues = getNearbyVenues(names=df_tor['Neighborhood'],
                                 latitudes=df_tor['Latitude'],
                                 longitudes=df_tor['Longitude'],
                                 radius=radius)

Dufferin, Dovercourt Village
Little Portugal, Trinity
Brockton, Parkdale Village, Exhibition Place
High Park, The Junction South
Parkdale, Roncesvalles
Runnymede, Swansea
The Beaches
The Danforth West, Riverdale
India Bazaar, The Beaches West
Studio District
Business reply mail Processing Centre, South Central Letter Processing Plant Toronto
Lawrence Park
Davisville North
North Toronto West, Lawrence Park
Davisville
Moore Park, Summerhill East
Summerhill West, Rathnelly, South Hill, Forest Hill SE, Deer Park
Roselawn
Forest Hill North & West, Forest Hill Road Park
The Annex, North Midtown, Yorkville
Rosedale
St. James Town, Cabbagetown
Church and Wellesley
Regent Park, Harbourfront
Garden District, Ryerson
St. James Town
Berczy Park
Central Bay Street
Richmond, Adelaide, King
Harbourfront East, Union Station, Toronto Islands
Toronto Dominion Centre, Design Exchange
Commerce Court, Victoria Hotel
University of Toronto, Harbord
Kensington Market, Chinatown, Grange Park
CN Tower, King and

One-hot encoding the venue categories so that they can be used as numeric features for clustering.

In [9]:
# one hot encoding
toronto_onehot = pd.get_dummies(toronto_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
toronto_onehot['Neighborhood'] = toronto_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [toronto_onehot.columns[-1]] + list(toronto_onehot.columns[:-1])
toronto_onehot = toronto_onehot[fixed_columns]

toronto_onehot.head()

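| | Yoga Studio | Airport | Airport Food Court | Airport Lounge | Airport Service | Airport Terminal | American Restaurant | Antique Shop | Art Gallery | Arts & Crafts Store | ... | Theater | Thrift / Vintage Store | Toy / Game Store | Trail | Vegetarian / Vegan Restaurant | Video Store | Vietnamese Restaurant | Wine Bar | Wings Joint | Women's Store |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 3 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 4 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |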
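
In the output above, the first column is `Yoga Studio` rather than `Neighborhood`. One possible cause (an assumption, since the full column list is not shown) is a venue category that is itself named `Neighborhood`: the assignment then overwrites that existing column in place, and the "last column" moved to the front is an ordinary category. Here is a sketch on toy data of one way to avoid the clash. The replacement column name is arbitrary.

```python
import pandas as pd

# toy data; the category literally named 'Neighborhood' triggers the collision
venues = pd.DataFrame({
    'Neighborhood': ['A', 'A', 'B'],
    'Venue Category': ['Café', 'Neighborhood', 'Yoga Studio'],
})

onehot = pd.get_dummies(venues['Venue Category'], dtype=int)

# rename a clashing category column (the new name is arbitrary)
onehot = onehot.rename(columns={'Neighborhood': 'Neighborhood (category)'})

# insert the key as the first column instead of appending and reordering
onehot.insert(0, 'Neighborhood', venues['Neighborhood'])
print(onehot.columns.tolist())
```

`DataFrame.insert` raises a `ValueError` if the column name already exists, so a remaining clash would surface immediately instead of silently reordering the columns.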


Next, let's group the rows by neighborhood and take the mean of each one-hot column, which gives the frequency of occurrence of each category.

In [10]:
toronto_grouped = toronto_onehot.groupby('Neighborhood').mean().reset_index()
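
To see why the mean of one-hot rows is a frequency, consider this small sketch with invented data: each venue contributes a single 1, so the per-neighborhood mean of a column is the share of venues in that category, and each row of the result sums to 1.

```python
import pandas as pd

# invented one-hot rows: four venues in 'A', two in 'B'
onehot = pd.DataFrame({
    'Neighborhood': ['A', 'A', 'A', 'A', 'B', 'B'],
    'Café':         [1, 1, 0, 0, 0, 0],
    'Park':         [0, 0, 1, 0, 1, 1],
    'Pub':          [0, 0, 0, 1, 0, 0],
})

grouped = onehot.groupby('Neighborhood').mean().reset_index()
row_totals = grouped.drop(columns='Neighborhood').sum(axis=1)
print(grouped)
```

Here neighborhood `A` ends up with Café 0.5, Park 0.25 and Pub 0.25, while `B` is Park 1.0.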

Next, let's write a function that returns a neighborhood's most common venue categories, sorted by descending frequency.

In [11]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    return row_categories_sorted.index.values[0:num_top_venues]
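
Note that `return_most_common_venues` always returns `num_top_venues` names, so a neighborhood with fewer distinct categories than that is padded with categories whose frequency is 0. A hedged sketch of a variant that drops those zero-frequency entries follows. The name `top_nonzero_venues` and the toy row are illustrative only.

```python
import pandas as pd

def top_nonzero_venues(row, num_top_venues):
    """Like return_most_common_venues, but skip categories with frequency 0."""
    freqs = row.iloc[1:].astype(float)          # drop the 'Neighborhood' entry
    freqs = freqs[freqs > 0].sort_values(ascending=False)
    return freqs.index.tolist()[:num_top_venues]

# toy row: only two categories actually occur in this neighborhood
row = pd.Series({'Neighborhood': 'Toy', 'Bank': 0.0, 'Café': 0.6,
                 'Park': 0.4, 'Zoo': 0.0})
top = top_nonzero_venues(row, 3)
print(top)  # ['Café', 'Park']
```

The result can be shorter than `num_top_venues`, so a caller filling a fixed-width table would need to pad it, for example with `None`.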

Now let's create the new dataframe and display the top 10 venues for each neighborhood.

In [12]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        # positions beyond the third use the 'th' suffix
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = toronto_grouped['Neighborhood']

for ind in np.arange(toronto_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(toronto_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

| | Neighborhood | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Brockton, Parkdale Village, Exhibition Place | Café | Coffee Shop | Gastropub | Brewery | Bakery | American Restaurant | Yoga Studio | Middle Eastern Restaurant | Bookstore | Pet Store |
| 1 | Business reply mail Processing Centre, South C... | Bus Line | Sandwich Place | Pizza Place | Creperie | Doner Restaurant | Dog Run | Discount Store | Diner | Dessert Shop | Department Store |
| 2 | CN Tower, King and Spadina, Railway Lands, Har... | Light Rail Station | Yoga Studio | Skate Park | Garden | Gym / Fitness Center | Fast Food Restaurant | Farmers Market | Comic Shop | Park | Pizza Place |
| 3 | Central Bay Street | Coffee Shop | Grocery Store | Pharmacy | Pizza Place | Bank | Cuban Restaurant | Doner Restaurant | Dog Run | Discount Store | Diner |
| 4 | Christie | Park | Food & Drink Shop | Women's Store | Cuban Restaurant | Donut Shop | Doner Restaurant | Dog Run | Discount Store | Diner | Dessert Shop |
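
The `try`/`except` suffix logic above is correct for 10 columns, but it would label the 21st column "21th" if `num_top_venues` were increased. Here is a small sketch of a general ordinal helper. The function name `ordinal` is an illustrative choice, not part of the original notebook.

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st'."""
    if 10 <= n % 100 <= 20:
        suffix = 'th'            # 11th, 12th, 13th, ...
    else:
        suffix = {1: 'st', 2: 'nd', 3: 'rd'}.get(n % 10, 'th')
    return f'{n}{suffix}'

num_top_venues = 10
columns = ['Neighborhood'] + [
    f'{ordinal(i)} Most Common Venue' for i in range(1, num_top_venues + 1)
]
```

For `num_top_venues = 10` this produces the same column names as the cell above.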


<a id="D"></a>

<a name="model"></a> 

## Modeling

Running *k*-means to group the neighborhoods into 10 clusters.

In [13]:
# set number of clusters
kclusters = 10

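# keep only the numeric frequency columns (the positional axis argument was removed in pandas 2.0)
toronto_grouped_clustering = toronto_grouped.drop(columns='Neighborhood')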

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(toronto_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[::] 

array([2, 6, 2, 0, 5, 2, 4, 2, 2, 0, 1, 2, 8, 2, 2, 2, 2, 2, 2, 2, 2, 2,
       2, 2, 2, 1, 2, 0, 2, 3, 2, 2, 2, 9, 2, 2, 2, 7], dtype=int32)
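
Counting the members of each cluster is a quick sanity check on the choice of `kclusters`. The sketch below uses the labels printed above: one cluster holds 26 of the 38 neighborhoods and seven clusters contain a single neighborhood, which may suggest trying a smaller number of clusters.

```python
from collections import Counter

# labels copied from the k-means output above
labels = [2, 6, 2, 0, 5, 2, 4, 2, 2, 0, 1, 2, 8, 2, 2, 2, 2, 2, 2, 2, 2, 2,
          2, 2, 2, 1, 2, 0, 2, 3, 2, 2, 2, 9, 2, 2, 2, 7]

sizes = Counter(labels)
for cluster, size in sorted(sizes.items()):
    print(f'cluster {cluster}: {size} neighborhoods')

singletons = [c for c, n in sizes.items() if n == 1]
```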

In [14]:
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

toronto_merged = df_tor

# join the sorted venues and cluster labels onto df_tor, which holds latitude/longitude for each neighborhood
toronto_merged = toronto_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

toronto_merged.head() # check the last columns!

| | Postal Code | Borough | Neighborhood | Latitude | Longitude | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | M6H | West Toronto | Dufferin, Dovercourt Village | 43.739015 | -79.506944 | 0.0 | Shopping Mall | Park | Grocery Store | Bank | Curling Ice | Dumpling Restaurant | Donut Shop | Doner Restaurant | Dog Run | Discount Store |
| 1 | M6J | West Toronto | Little Portugal, Trinity | 43.676357 | -79.293031 | 2.0 | Health Food Store | Pub | Trail | Donut Shop | Doner Restaurant | Dog Run | Discount Store | Diner | Dessert Shop | Department Store |
| 2 | M6K | West Toronto | Brockton, Parkdale Village, Exhibition Place | 43.659526 | -79.340923 | 2.0 | Café | Coffee Shop | Gastropub | Brewery | Bakery | American Restaurant | Yoga Studio | Middle Eastern Restaurant | Bookstore | Pet Store |
| 3 | M6P | West Toronto | High Park, The Junction South | 43.646435 | -79.374846 | 2.0 | Coffee Shop | Café | Italian Restaurant | Seafood Restaurant | Hotel | Restaurant | Gym | Japanese Restaurant | Beer Bar | Pub |
| 4 | M6R | West Toronto | Parkdale, Roncesvalles | 43.669542 | -79.422564 | 2.0 | Grocery Store | Café | Park | Nightclub | Coffee Shop | Italian Restaurant | Diner | Restaurant | Candy Store | Baby Store |


Dropping the rows containing `NaN` values, in case some neighborhoods could not be assigned a cluster.

In [15]:
toronto_merged.dropna(axis=0, inplace=True)
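
The cluster labels appear as `0.0` and `2.0` in the joined table because a left join fills unmatched rows with `NaN`, which forces the column to a float type. Here is a toy sketch of the join, drop and cast sequence. The frames are invented for illustration.

```python
import pandas as pd

neighborhoods = pd.DataFrame({'Neighborhood': ['A', 'B', 'C'],
                              'Latitude': [43.6, 43.7, 43.8]})
clusters = pd.DataFrame({'Neighborhood': ['A', 'C'],
                         'Cluster Labels': [2, 0]})

merged = neighborhoods.join(clusters.set_index('Neighborhood'), on='Neighborhood')
had_missing = bool(merged['Cluster Labels'].isna().any())  # 'B' has no match

# drop only rows without a label, then restore the integer type
merged = merged.dropna(subset=['Cluster Labels'])
merged['Cluster Labels'] = merged['Cluster Labels'].astype(int)
```

Restricting `dropna` to the label column via `subset` avoids discarding rows that are missing values in unrelated columns.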

Finally, let's visualize the resulting clusters.

In [16]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map, coloring each one by its cluster label
for lat, lon, poi, cluster in zip(toronto_merged['Latitude'], toronto_merged['Longitude'], toronto_merged['Neighborhood'], toronto_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(int(cluster)), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[int(cluster)],
        fill=True,
        fill_color=rainbow[int(cluster)],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters
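
In the cell above, `x` and `ys` are only used through `len(ys)`, which equals `kclusters`. The following sketch builds the same kind of palette more directly, with one color per cluster label. The helper `cluster_color` is a hypothetical name introduced here.

```python
import numpy as np
import matplotlib.cm as cm
import matplotlib.colors as colors

kclusters = 10

# sample the colormap at kclusters evenly spaced points, one color per label
palette = [colors.rgb2hex(rgba) for rgba in cm.rainbow(np.linspace(0, 1, kclusters))]

def cluster_color(label):
    """Map a (possibly float) cluster label 0..kclusters-1 to a hex color."""
    return palette[int(label)]
```

Indexing by the label itself keeps the mapping easy to read when checking the map against the cluster tables.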

### Examine Clusters

Now, we can examine each cluster and determine the discriminating venue categories that distinguish each cluster. Based on the defining categories, we can then assign a name to each cluster.

In [17]:
toronto_merged[toronto_merged['Cluster Labels'] == 0].head()

| | Postal Code | Borough | Neighborhood | Latitude | Longitude | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | M6H | West Toronto | Dufferin, Dovercourt Village | 43.739015 | -79.506944 | 0.0 | Shopping Mall | Park | Grocery Store | Bank | Curling Ice | Dumpling Restaurant | Donut Shop | Doner Restaurant | Dog Run | Discount Store |
| 5 | M6S | West Toronto | Runnymede, Swansea | 43.673185 | -79.487262 | 0.0 | Grocery Store | Pizza Place | Convenience Store | Bus Line | Women's Store | Curling Ice | Donut Shop | Doner Restaurant | Dog Run | Discount Store |
| 27 | M5G | Downtown Toronto | Central Bay Street | 43.782736 | -79.442259 | 0.0 | Coffee Shop | Grocery Store | Pharmacy | Pizza Place | Bank | Cuban Restaurant | Doner Restaurant | Dog Run | Discount Store | Diner |
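
To put numbers on what the cluster 0 rows above have in common, the sketch below counts in how many of the three neighborhoods each top-10 category appears. The lists are copied from the table; the variable names are illustrative.

```python
from collections import Counter

# top-10 venue categories of the three cluster-0 rows shown above
cluster_0 = {
    'Dufferin, Dovercourt Village': [
        'Shopping Mall', 'Park', 'Grocery Store', 'Bank', 'Curling Ice',
        'Dumpling Restaurant', 'Donut Shop', 'Doner Restaurant', 'Dog Run',
        'Discount Store'],
    'Runnymede, Swansea': [
        'Grocery Store', 'Pizza Place', 'Convenience Store', 'Bus Line',
        "Women's Store", 'Curling Ice', 'Donut Shop', 'Doner Restaurant',
        'Dog Run', 'Discount Store'],
    'Central Bay Street': [
        'Coffee Shop', 'Grocery Store', 'Pharmacy', 'Pizza Place', 'Bank',
        'Cuban Restaurant', 'Doner Restaurant', 'Dog Run', 'Discount Store',
        'Diner'],
}

# count each category once per neighborhood
counts = Counter(cat for cats in cluster_0.values() for cat in set(cats))
shared_by_all = sorted(c for c, n in counts.items() if n == len(cluster_0))
print(shared_by_all)
# ['Discount Store', 'Dog Run', 'Doner Restaurant', 'Grocery Store']
```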


<a name="results"></a>

## Results

The neighborhoods in this cluster share many of the same common venue categories, such as Grocery Store, Discount Store, Dog Run, Pizza Place, Bank, Donut Shop, and various restaurants. The clustering therefore serves our purpose of finding neighborhoods with similar venues.

## Thanks for reading :)