# Choosing Best Locations to Open an Indian Restaurant in New York <a name="top"></a>

***Tarun Kamboj***

---

## Table of contents
* [Introduction: Business Problem](#introduction)
* [Data](#data)
* [Methodology](#methodology)
* [Analysis](#analysis)
* [Results and Discussion](#results)
* [Conclusion](#conclusion)

<a name="introduction"></a>

## Introduction: Business Problem 

Starting a restaurant business is a herculean task. In a market where everybody wants to open a restaurant, restaurateurs have to weigh several things before starting up: cuisine, ambience, quality, service and location. Choosing a lively location can give a restaurateur the upper hand, because a busy place brings busy foot traffic. Not every place suits a restaurant, and picking one is harder than many people think. A location can take a business to great heights or turn it into a dud: how would you reach customers if the restaurant is not in the right place? No customers means no business. It is as simple as that.

In this work, we will try to find an optimal location for a restaurant. Specifically, this report will be targeted to stakeholders interested in opening an **Indian restaurant** in **New York**.

Since there are lots of restaurants in New York, we will try to detect **locations that are not already crowded with restaurants**. We are also particularly interested in **areas with no Indian restaurants**. We would also prefer locations **as close to the city center as possible**, assuming that the first two conditions are met.

We will use our data science powers to generate a few of the most promising neighborhoods based on these criteria. The advantages of each area will then be clearly expressed so that the best possible final location can be chosen by stakeholders.

[Go to Top](#top) <a name="data"></a>

## Data 

To choose the best location for our purpose we will need data on all the neighborhoods in New York. New York has a total of 5 boroughs and 306 neighborhoods. In order to segment the neighborhoods and explore them, we need a dataset that contains the 5 boroughs and the neighborhoods that exist in each borough, as well as the latitude and longitude coordinates of each neighborhood.

Luckily, the dataset for neighbourhoods exists for free on the web at https://geo.nyu.edu/catalog/nyu_2451_34572

After we get this data, we will explore the popular venues in each neighbourhood using the Foursquare API and then choose the ones with high demand for restaurants.
Based on the definition of our problem, the factors that will influence our decision are:
* number of existing restaurants in the neighborhood (any type of restaurant)
* number of and distance to Indian restaurants in the neighborhood, if any
* distance of neighborhood from city center
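Two of the factors above are distances, so it helps to pin down how a distance between two latitude/longitude pairs could be computed. The sketch below uses the haversine (great-circle) formula; the function name `haversine_km` and the constant `EARTH_RADIUS_KM` are illustrative and not part of the original notebook. The example call uses the Wakefield coordinates from the dataset and the New York City reference point used later in this report.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (assumed spherical Earth)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


# Wakefield (Bronx) to the New York City reference point
wakefield_to_center = haversine_km(40.894705, -73.847201, 40.7127281, -74.0060152)
print('Wakefield is about {:.1f} km from the city center.'.format(wakefield_to_center))
```

The same function could be applied to every neighborhood or venue to rank candidates by distance from the center.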

Before we get the data and start exploring it, let's import all the dependencies that we will need.

In [1]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

import requests # library to handle requests
from pandas import json_normalize # transform JSON file into a pandas dataframe (top-level import, pandas >= 1.0)

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

#!conda install -c conda-forge folium=0.5.0 --yes 
import folium # map rendering library
from folium.plugins import HeatMap

print('Libraries imported.')

Libraries imported.


#### Load and explore the data

In [2]:
!wget -q -O 'newyork_data.json' https://cocl.us/new_york_dataset
print('Data downloaded!')

Data downloaded!


In [3]:
with open('newyork_data.json') as json_data:
    newyork_data = json.load(json_data)

In [5]:
neighborhoods_data = newyork_data['features']

Let's take a look at the first item in this list.

In [6]:
neighborhoods_data[0]

{'type': 'Feature',
 'id': 'nyu_2451_34572.1',
 'geometry': {'type': 'Point',
  'coordinates': [-73.84720052054902, 40.89470517661]},
 'geometry_name': 'geom',
 'properties': {'name': 'Wakefield',
  'stacked': 1,
  'annoline1': 'Wakefield',
  'annoline2': None,
  'annoline3': None,
  'annoangle': 0.0,
  'borough': 'Bronx',
  'bbox': [-73.84720052054902,
   40.89470517661,
   -73.84720052054902,
   40.89470517661]}}

The next task is essentially transforming this data of nested Python dictionaries into a *pandas* dataframe. So let's start by creating an empty dataframe.

In [7]:
# define the dataframe columns
column_names = ['Borough', 'Neighborhood', 'Latitude', 'Longitude'] 

# instantiate the dataframe
neighborhoods = pd.DataFrame(columns=column_names)

Take a look at the empty dataframe to confirm that the columns are as intended.

In [8]:
neighborhoods

| | Borough | Neighborhood | Latitude | Longitude |
|---|---|---|---|---|


Then let's loop through the data, collect one row per neighborhood, and fill the dataframe.

In [9]:
# collect one dictionary per neighborhood, then build the dataframe in a single step
# (DataFrame.append was removed in pandas 2.0)
rows = []
for data in neighborhoods_data:
    # GeoJSON stores coordinates as [longitude, latitude]
    neighborhood_lon, neighborhood_lat = data['geometry']['coordinates']

    rows.append({'Borough': data['properties']['borough'],
                 'Neighborhood': data['properties']['name'],
                 'Latitude': neighborhood_lat,
                 'Longitude': neighborhood_lon})

neighborhoods = pd.DataFrame(rows, columns=column_names)

Quickly examine the resulting dataframe.

In [10]:
neighborhoods.head()

| | Borough | Neighborhood | Latitude | Longitude |
|---|---|---|---|---|
| 0 | Bronx | Wakefield | 40.894705 | -73.847201 |
| 1 | Bronx | Co-op City | 40.874294 | -73.829939 |
| 2 | Bronx | Eastchester | 40.887556 | -73.827806 |
| 3 | Bronx | Fieldston | 40.895437 | -73.905643 |
| 4 | Bronx | Riverdale | 40.890834 | -73.912585 |
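As an alternative to the explicit loop, the nested feature dictionaries could also be flattened with `pd.json_normalize`. The sketch below is not the notebook's own method; it runs on a single hand-copied feature (trimmed to the keys it needs), and the names `features`, `flat`, `coords` and `neighborhoods_alt` are illustrative.

```python
import pandas as pd

# One feature copied from the dataset, trimmed to the keys used here
features = [
    {
        'type': 'Feature',
        'geometry': {'type': 'Point',
                     'coordinates': [-73.84720052054902, 40.89470517661]},
        'properties': {'name': 'Wakefield', 'borough': 'Bronx'},
    },
]

# Nested dictionaries become dotted column names; lists are kept as list values
flat = pd.json_normalize(features)

# Split the [longitude, latitude] lists into two columns
coords = pd.DataFrame(flat['geometry.coordinates'].tolist(),
                      columns=['Longitude', 'Latitude'])

neighborhoods_alt = pd.DataFrame({
    'Borough': flat['properties.borough'],
    'Neighborhood': flat['properties.name'],
    'Latitude': coords['Latitude'],
    'Longitude': coords['Longitude'],
})
print(neighborhoods_alt)
```

With the full file, `features` would simply be `newyork_data['features']`.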


And make sure that the dataset has all 5 boroughs and 306 neighborhoods.

In [11]:
print('The dataframe has {} boroughs and {} neighborhoods.'.format(
        len(neighborhoods['Borough'].unique()),
        neighborhoods.shape[0]
    )
)

The dataframe has 5 boroughs and 306 neighborhoods.


Get the latitude and longitude values of New York City. I used Google to get the coordinates because it's more efficient and convenient for a single place than using a geolocation library.

In [12]:
latitude = 40.7127281
longitude = -74.0060152
print('The geographical coordinates of New York City are {}, {}.'.format(latitude, longitude))

The geographical coordinates of New York City are 40.7127281, -74.0060152.


Let's visualise these neighbourhoods on the map using `Folium`. 

In [14]:
# create map of New York using latitude and longitude values
map_newyork = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to map
for lat, lng, borough, neighborhood in zip(neighborhoods['Latitude'], neighborhoods['Longitude'], neighborhoods['Borough'], neighborhoods['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_newyork)  
    
map_newyork

**Note:** each blue marker sits at the center of a neighborhood; it does not outline the entire neighborhood.

#### Define Foursquare Credentials and Version

In [1]:
# Code is hidden due to privacy

Let's define a function to get all the restaurants in all the neighbourhoods using Foursquare API calls.

In [17]:
def getRestaurants(names, latitudes, longitudes, radius=500, LIMIT=100):
    print('Getting Data ')
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print('.',end="")
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    # keep only restaurant categories; copy so later in-place edits do not act on a view
    restaurants = nearby_venues[nearby_venues['Venue Category'].str.contains("Restaurant")].copy()
    print('done.')
    return restaurants
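The function above indexes straight into `["response"]['groups'][0]['items']` and `['categories'][0]`, which would raise an error if a response comes back without groups or a venue has no category. The sketch below shows one way the parsing step could be made more tolerant. It is a hypothetical helper (`extract_venue_rows` is not part of the notebook), it assumes the same response layout the function above relies on, and the `"Unknown"` label for missing categories is an assumption.

```python
def extract_venue_rows(payload, neighborhood, lat, lng):
    """Flatten one explore-style response into row tuples.

    Returns an empty list when the expected keys are missing.
    """
    groups = payload.get('response', {}).get('groups', [])
    items = groups[0].get('items', []) if groups else []

    rows = []
    for item in items:
        venue = item.get('venue', {})
        location = venue.get('location', {})
        categories = venue.get('categories', [])
        # Assumption: venues without a category are labelled "Unknown"
        category = categories[0]['name'] if categories else 'Unknown'
        rows.append((neighborhood, lat, lng, venue.get('name'),
                     location.get('lat'), location.get('lng'), category))
    return rows


# Hand-written payload in the same shape, for illustration only
sample_payload = {'response': {'groups': [{'items': [
    {'venue': {'name': 'Curry Spot',
               'location': {'lat': 40.897625, 'lng': -73.867147},
               'categories': [{'name': 'Indian Restaurant'}]}},
    {'venue': {'name': 'Uncategorized Place',
               'location': {'lat': 40.8980, 'lng': -73.8670},
               'categories': []}},
]}]}}

sample_rows = extract_venue_rows(sample_payload, 'Woodlawn', 40.898273, -73.867315)
empty_rows = extract_venue_rows({'meta': {'code': 400}}, 'Woodlawn', 40.898273, -73.867315)
print(len(sample_rows), len(empty_rows))
```

Rows produced this way have the same seven fields, in the same order, as the tuples built inside `getRestaurants`.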

Now let's use this function to get the data of Restaurants.

In [18]:
restaurants = getRestaurants(names=neighborhoods['Neighborhood'],
                             latitudes=neighborhoods['Latitude'],
                             longitudes=neighborhoods['Longitude']
                            )

Getting Data 
..................................................................................................................................................................................................................................................................................................................done.


Delete duplicate rows, if any.

In [19]:
restaurants.drop_duplicates(inplace=True)
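One caveat worth checking: `drop_duplicates()` without arguments only removes rows that match in every column. If two neighborhood centers lie less than 1 km apart, their 500 m search circles overlap, and the same venue could appear once per neighborhood and survive this step. The sketch below, on a small made-up frame (the neighborhood names `A` and `B` are placeholders), shows how de-duplicating on the venue columns alone would behave; whether that is preferable here is a judgment call.

```python
import pandas as pd

# Illustrative rows: the same venue returned for two neighboring search circles
sample = pd.DataFrame({
    'Neighborhood': ['A', 'B', 'B'],
    'Venue': ['Curry Spot', 'Curry Spot', 'Townhouse Restaurant'],
    'Venue Latitude': [40.8976, 40.8976, 40.8761],
    'Venue Longitude': [-73.8671, -73.8671, -73.8289],
    'Venue Category': ['Indian Restaurant', 'Indian Restaurant', 'Restaurant'],
})

# Full-row comparison keeps both "Curry Spot" rows because Neighborhood differs
full_row = sample.drop_duplicates()

# Comparing only the venue identity keeps the first occurrence of each venue
venue_key = ['Venue', 'Venue Latitude', 'Venue Longitude']
by_venue = sample.drop_duplicates(subset=venue_key, keep='first').reset_index(drop=True)

print(len(full_row), len(by_venue))
```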

Let's view the restaurants we got and how many we got.

In [20]:
print(restaurants.shape)
restaurants.head()

(2520, 7)


| | Neighborhood | Neighborhood Latitude | Neighborhood Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|---|
| 15 | Co-op City | 40.874294 | -73.829939 | Arby's | 40.870411 | -73.828606 | Fast Food Restaurant |
| 16 | Co-op City | 40.874294 | -73.829939 | Townhouse Restaurant | 40.876086 | -73.828868 | Restaurant |
| 18 | Co-op City | 40.874294 | -73.829939 | Kennedy's | 40.876807 | -73.829627 | Fast Food Restaurant |
| 21 | Eastchester | 40.887556 | -73.827806 | Fish & Ting | 40.885656 | -73.829197 | Caribbean Restaurant |
| 24 | Eastchester | 40.887556 | -73.827806 | Dyre Fish Market | 40.889318 | -73.831453 | Seafood Restaurant |


So we got ~2500 restaurants, which is a lot! But the DataFrame is looking nice.

Now let's add a new column to the dataframe that tells whether each restaurant is Indian or not.

In [21]:
restaurants['Indian'] = restaurants['Venue Category'].str.contains("Indian")

Let's make a new dataframe for Indian restaurants.

In [22]:
indian_restaurants = restaurants[restaurants['Venue Category'].str.contains("Indian")].copy()

restaurants.reset_index(drop=True, inplace=True)
indian_restaurants.reset_index(drop=True, inplace=True)

Quickly examine the resulting dataframe.

In [23]:
print(indian_restaurants.shape)
indian_restaurants.head()

(73, 8)


| | Neighborhood | Neighborhood Latitude | Neighborhood Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category | Indian |
|---|---|---|---|---|---|---|---|---|
| 0 | Woodlawn | 40.898273 | -73.867315 | Curry Spot | 40.897625 | -73.867147 | Indian Restaurant | True |
| 1 | Parkchester | 40.837938 | -73.856003 | Al-Aqsa Restaurant | 40.836345 | -73.854888 | South Indian Restaurant | True |
| 2 | Unionport | 40.829774 | -73.850535 | Melanies Roti Bar And Grill | 40.833293 | -73.85104 | Indian Restaurant | True |
| 3 | Bay Ridge | 40.625801 | -74.030621 | Bombay Grill | 40.622371 | -74.031799 | Indian Restaurant | True |
| 4 | Greenpoint | 40.730201 | -73.954241 | Agra Taj Mahal | 40.733321 | -73.954928 | Indian Restaurant | True |


Let's see how many of the restaurants are Indian restaurants.

In [24]:
print('Total number of restaurants:', len(restaurants))
print('Total number of Indian restaurants:', len(indian_restaurants))
print('Percentage of Indian restaurants: {:.2f}%'.format(len(indian_restaurants) / len(restaurants) * 100))

Total number of restaurants: 2520
Total number of Indian restaurants: 73
Percentage of Indian restaurants: 2.90%
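The overall percentage hides how the restaurants are spread across neighborhoods, which is what the first two decision factors in the Data section are about. A per-neighborhood summary could be sketched as follows. The frame below is a small illustrative sample (some rows echo the outputs above, the rest are made up), and the column names `total_restaurants` and `indian_restaurants` are assumptions.

```python
import pandas as pd

# Illustrative sample in the same layout as the restaurants dataframe
sample = pd.DataFrame({
    'Neighborhood': ['Co-op City', 'Co-op City', 'Woodlawn', 'Woodlawn', 'Eastchester'],
    'Venue': ["Arby's", 'Townhouse Restaurant', 'Curry Spot', 'Sample Diner', 'Fish & Ting'],
    'Venue Category': ['Fast Food Restaurant', 'Restaurant', 'Indian Restaurant',
                       'American Restaurant', 'Caribbean Restaurant'],
})
sample['Indian'] = sample['Venue Category'].str.contains('Indian')

# Count all restaurants and Indian restaurants per neighborhood,
# then list the neighborhoods with the least competition first
summary = (
    sample.groupby('Neighborhood')
          .agg(total_restaurants=('Venue', 'count'),
               indian_restaurants=('Indian', 'sum'))
          .sort_values(['indian_restaurants', 'total_restaurants'])
)
print(summary)
```

Run on the real `restaurants` dataframe, the top rows of such a table would be the neighborhoods with no Indian restaurants and few restaurants overall.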


Let's now see all the collected restaurants in our area of interest on a map, and let's also show Indian restaurants in a different color.

In [27]:
# create map of New York using latitude and longitude values
map_newyork = folium.Map(location=[latitude, longitude], zoom_start=11)

In [28]:
# add markers to map
for lat, lng, venue, neighborhood, is_indian in zip(restaurants['Venue Latitude'], restaurants['Venue Longitude'], restaurants['Venue'], restaurants['Neighborhood'],restaurants['Indian']):
    label = '{}, {}'.format(venue, neighborhood)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=2,
        popup=label,
        color = 'red' if is_indian else 'blue',
        fill=True,
        fill_opacity=0.7,
        parse_html=False).add_to(map_newyork)  
    
map_newyork

Looking good. We now have the restaurants that Foursquare returned within 500 m of each neighborhood center (up to 100 venues per call), and we know which ones are Indian restaurants!

This concludes the data gathering phase. We're now ready to use this data for analysis to produce the report on optimal locations for a new Indian restaurant!

[Go to Top](#top) <a name="methodology"></a>

## Methodology 

In this project we will direct our efforts toward detecting areas of New York that have low restaurant density, particularly those with a low number of Indian restaurants.

In the first step we collected the required **data: location and type (category) of the restaurants Foursquare returned around every neighborhood center in New York**. We also **identified Indian restaurants** (according to Foursquare categorization).

The second step in our analysis is the calculation and exploration of '**restaurant density**' across different areas of New York. We will use `Heatmaps` to identify a few promising areas close to busy city centers with a low number of restaurants in general (*and* no Indian restaurants in the vicinity) and focus our attention on those areas.
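A heatmap shows density visually; a simple numeric counterpart is to count venues per grid cell. The sketch below is one possible way to do that and is not part of the notebook's analysis. The helper `grid_counts` and its `cell_deg` parameter are hypothetical, and the points are made-up coordinates near Midtown. A cell of 0.005 degrees is roughly 0.5 km from north to south.

```python
from collections import Counter
from math import floor


def grid_counts(points, cell_deg=0.005):
    """Count (lat, lng) points per square cell of `cell_deg` degrees.

    Keys are (row, col) integer cell indices.
    """
    counts = Counter()
    for lat, lng in points:
        counts[(floor(lat / cell_deg), floor(lng / cell_deg))] += 1
    return counts


# Illustrative coordinates only, not Foursquare results
sample_points = [
    (40.7512, -73.9912),
    (40.7513, -73.9911),  # falls in the same cell as the first point
    (40.7571, -73.9748),
]

density = grid_counts(sample_points)
for cell, n in density.most_common():
    print(cell, n)
```

Cells with low counts that sit next to high-count cells would be the numeric equivalent of the cool pockets on the heatmap.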

[Go to Top](#top) <a name="analysis"></a> 

## Analysis

After surfing the internet a little, I found that `Midtown Manhattan` is the busiest neighborhood in New York because it has many tourist destinations and business hubs, so it makes sense to focus on `Midtown Manhattan`.

Let's get the coordinates of Midtown Manhattan.

In [29]:
# Coordinates are taken from google.
latitude = 40.7549
longitude = -73.9840
print('The geographical coordinates of Midtown Manhattan are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Midtown Manhattan are 40.7549, -73.984.


Let's visualize the area of Midtown Manhattan and see all the restaurants in that area.

In [30]:
# create map of New York using latitude and longitude values
map_newyork = folium.Map(location=[latitude, longitude], zoom_start=13)

# add markers to map
for lat, lng, venue, neighborhood, is_indian in zip(restaurants['Venue Latitude'], restaurants['Venue Longitude'], restaurants['Venue'], restaurants['Neighborhood'],restaurants['Indian']):
    label = '{}, {}'.format(venue, neighborhood)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=2,
        popup=label,
        color = 'red' if is_indian else 'blue',
        fill=True,
        fill_opacity=0.7,
        parse_html=False).add_to(map_newyork)  
    
map_newyork

Let's create a map showing a **heatmap / density of restaurants** and try to extract some meaningful info from it.

In [31]:
# making separate 2-D list for latitudes and longitudes of restaurants for heatmap
lats = restaurants['Venue Latitude']
lngs = restaurants['Venue Longitude']
latlngs = [[x,y] for x,y in zip(lats,lngs)]

# making separate 2-D list for latitudes and longitudes of indian restaurants for heatmap
lats = indian_restaurants['Venue Latitude']
lngs = indian_restaurants['Venue Longitude']
indian_latlngs = [[x,y] for x,y in zip(lats,lngs)]

In [32]:
map_newyork = folium.Map(location=[latitude, longitude], zoom_start=13)

folium.TileLayer('cartodbpositron').add_to(map_newyork) #cartodbpositron cartodbdark_matter
HeatMap(latlngs).add_to(map_newyork)
folium.Marker([latitude, longitude]).add_to(map_newyork)

map_newyork

Whew! That's really hot!

It looks like a few pockets of low restaurant density close to the center can be found **south-west and north-east of the center**.

Let's create another map showing the **heatmap / density of Indian restaurants** only.

In [33]:
map_newyork = folium.Map(location=[latitude, longitude], zoom_start=13)

folium.TileLayer('cartodbpositron').add_to(map_newyork) #cartodbpositron cartodbdark_matter
HeatMap(indian_latlngs).add_to(map_newyork)
folium.Marker([latitude, longitude]).add_to(map_newyork)

map_newyork

Now this one's cool ;)

Indian restaurants make up only ~3% of all restaurants, so luckily there is only a little Indian restaurant density directly south and east of the center, with **no Indian restaurant density north-east and south-west of the center**.

Based on this we can now focus on the areas *north-east and south-west of the center* for our purpose.

In [34]:
map_newyork = folium.Map(location=[latitude, longitude], zoom_start=14)

folium.TileLayer('cartodbpositron').add_to(map_newyork) #cartodbpositron cartodbdark_matter
HeatMap(latlngs).add_to(map_newyork)
folium.Marker([latitude, longitude]).add_to(map_newyork)
folium.CircleMarker(
    [40.751478, -73.991834],
    radius=35,
    color = 'blue',
    fill=True,
    fill_color='blue',
    fill_opacity=0.4,
    parse_html=False).add_to(map_newyork)  
folium.CircleMarker(
    [40.756555, -73.974755],
    radius=30,
    color = 'blue',
    fill=True,
    fill_color='blue',
    fill_opacity=0.4,
    parse_html=False).add_to(map_newyork)  

map_newyork

The locations marked with `blue circles` are promising candidates for opening an **Indian Restaurant**.
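The two zones were picked by eye from the heatmaps, and `folium.CircleMarker` takes its radius in pixels, so the circles do not correspond to a fixed distance on the ground. One way to back the visual choice with numbers would be to count venues within a fixed radius of each zone center. The sketch below is an illustration only: the 300 m radius, the helper names and the sample points are assumptions, and distances use a flat-map (equirectangular) approximation that is adequate over a few kilometres.

```python
from math import cos, hypot, radians

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


def local_distance_m(lat1, lon1, lat2, lon2):
    """Approximate distance in metres via an equirectangular projection."""
    mean_lat = radians((lat1 + lat2) / 2)
    dx = radians(lon2 - lon1) * cos(mean_lat) * EARTH_RADIUS_M
    dy = radians(lat2 - lat1) * EARTH_RADIUS_M
    return hypot(dx, dy)


def count_within(center, points, radius_m):
    """Number of (lat, lng) points no farther than radius_m from center."""
    return sum(
        1 for lat, lng in points
        if local_distance_m(center[0], center[1], lat, lng) <= radius_m
    )


# Zone centers taken from the map cell above
zones = {
    'south-west': (40.751478, -73.991834),
    'north-east': (40.756555, -73.974755),
}

# Illustrative coordinates only, not Foursquare results
sample_points = [(40.7516, -73.9920), (40.7600, -73.9800), (40.7566, -73.9750)]

zone_counts = {name: count_within(center, sample_points, radius_m=300)
               for name, center in zones.items()}
print(zone_counts)
```

With the real data, `sample_points` would be replaced by `latlngs` and `indian_latlngs` to compare general and Indian restaurant counts per zone.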

[Go to Top](#top)  <a name="results"></a>

## Results and Discussion

Our analysis shows that although there is a great number of restaurants in New York (~2500 in our initial area of interest), there are pockets of low restaurant density too. The highest concentration of restaurants was detected in the downtown neighbourhoods of Manhattan, so we focused our attention on Midtown Manhattan, which is also the busiest area when it comes to tourists and business travelers.

After directing our attention to this narrower area of interest, we created heatmaps of restaurant density and focused on the areas with a low density of restaurants and no Indian restaurants at all.

The result of all this is two zones, south-west and north-east of the Midtown center, with low restaurant density and no Indian restaurants nearby. This, of course, does not imply that those zones are actually optimal locations for a new restaurant! The purpose of this analysis was only to provide information on areas close to the busy centers of the city that are not crowded with existing restaurants (particularly Indian ones). It is entirely possible that there is a very good reason for the small number of restaurants in either of those areas, one that would make them unsuitable for a new restaurant regardless of the lack of competition. The recommended zones should therefore be considered only as a starting point for a more detailed analysis, which could eventually result in a location that not only has no nearby competition but also meets all other relevant conditions.

[Go to Top](#top) <a name="conclusion"></a>

## Conclusion

The purpose of this project was to identify New York areas that are popular with tourists and business travellers but have few or no restaurants (particularly Indian restaurants), in order to help stakeholders narrow down the search for an optimal location for a new Indian restaurant. By visualizing restaurant density from Foursquare data we first identified the general area that justifies further analysis, and then marked two zones that satisfy the basic requirements regarding existing nearby restaurants. These zones are meant to be used as starting points for final exploration by stakeholders.

The final decision on the optimal restaurant location will be made by stakeholders based on the specific characteristics of the neighborhoods and locations in each recommended zone, taking into consideration additional factors such as the attractiveness of each location (proximity to a park or water), levels of noise / proximity to major roads, real estate availability, prices, and the social and economic dynamics of every neighborhood.

## Thanks for Reading :)