DBA | Pred_SCORE |
---|---|
’CESCA | 13 |
’ESSEN | 11 |
’RITAS | 13 |
’WICHCRAFT | 13 |
#1 Chinese Restaurant | 12 |
#1 GARDEN CHINESE RESTAURANT | 13 |
#1 NATURAL JUICE BAR | 13 |
#1 SABOR LATINO RESTAURANT | 4 |
$1 PIZZA | 13 |
$1.25 PIZZA | 13 |
% SHAO BIN ZHENG | 12 |
& PIZZA | 11 |
&PIZZA | 13 |
© 2020. The Data Incubator. All Rights Reserved.
DBA | No_of_Violations |
---|---|
DUNKIN’ | 3070 |
SUBWAY | 2319 |
STARBUCKS | 1668 |
MCDONALD’S | 1536 |
KENNEDY FRIED CHICKEN | 1196 |
CROWN FRIED CHICKEN | 973 |
DUNKIN’, BASKIN ROBBINS | 833 |
BURGER KING | 747 |
POPEYES | 721 |
GOLDEN KRUST CARIBBEAN BAKERY & GRILL | 677 |
CHIPOTLE MEXICAN GRILL | 604 |
LE PAIN QUOTIDIEN | 511 |
DOMINO’S | 461 |
NA | 418 |
KFC | 401 |
CHECKERS | 373 |
WENDY’S | 333 |
PRET A MANGER | 327 |
VIVI BUBBLE TEA | 312 |
JUST SALAD | 294 |
JOE & THE JUICE | 288 |
BREAD & BUTTER | 270 |
BLUESTONE LANE | 258 |
TEXAS CHICKEN & BURGERS | 232 |
BAREBURGER | 230 |
STREET | No_of_Violations |
---|---|
BROADWAY | 11494 |
3 AVENUE | 8838 |
2 AVENUE | 6833 |
5 AVENUE | 6823 |
8 AVENUE | 5635 |
1 AVENUE | 5275 |
7 AVENUE | 4708 |
LEXINGTON AVENUE | 4461 |
AMSTERDAM AVENUE | 4455 |
9 AVENUE | 4346 |
FLATBUSH AVENUE | 3131 |
NOSTRAND AVENUE | 2957 |
FULTON STREET | 2837 |
4 AVENUE | 2811 |
GRAND STREET | 2586 |
86 STREET | 2031 |
MADISON AVENUE | 2009 |
WESTCHESTER AVENUE | 1950 |
BEDFORD AVENUE | 1927 |
CONEY ISLAND AVENUE | 1913 |
CHURCH AVENUE | 1889 |
10 AVENUE | 1774 |
6 AVENUE | 1768 |
COLUMBUS AVENUE | 1764 |
FOREST AVENUE | 1750 |
ZIPCODE | No_of_Violations |
---|---|
10003 | 10221 |
10019 | 9212 |
10013 | 8494 |
10002 | 8028 |
10036 | 7875 |
10001 | 6891 |
10016 | 6745 |
11220 | 6709 |
10022 | 6384 |
10012 | 6327 |
10011 | 6274 |
11201 | 5720 |
10014 | 5630 |
10018 | 5249 |
11211 | 5169 |
10017 | 5060 |
11215 | 4912 |
10009 | 4392 |
11209 | 4265 |
11217 | 3731 |
11237 | 3673 |
10025 | 3626 |
10010 | 3513 |
11238 | 3433 |
10029 | 3338 |
User-based (UBCF) and item-based (IBCF) collaborative filtering models were used to build the Food Score recommender system. The following is a breakdown of both approaches, including validation metrics for each model. UBCF produces recommendations using user-based collaborative filtering, which relies on similarities between users. IBCF produces recommendations using item-based collaborative filtering, which relies on similarities between items.
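To make the user-based idea concrete, here is a minimal base R sketch. It is not recommenderlab's implementation: it assumes a small dense rating matrix with non-negative ratings where NA marks an unrated item, measures cosine similarity between users on their co-rated items, and predicts each item as the similarity-weighted average of the k most similar users' ratings. The matrix and function names are hypothetical.

```r
# Minimal user-based collaborative filtering (UBCF) sketch in base R.
# Hypothetical data: rows are users, columns are items, NA = not rated.
ratings_ub <- matrix(c(5,  4, NA,  1,
                       4,  5,  2, NA,
                       1, NA,  5,  4),
                     nrow = 3, byrow = TRUE,
                     dimnames = list(c("u1", "u2", "u3"),
                                     c("i1", "i2", "i3", "i4")))

# Cosine similarity between two users, using only items both have rated
cosine <- function(a, b) {
  ok <- !is.na(a) & !is.na(b)
  if (!any(ok)) return(0)
  denom <- sqrt(sum(a[ok]^2)) * sqrt(sum(b[ok]^2))
  if (denom == 0) return(0)
  sum(a[ok] * b[ok]) / denom
}

# Predict every item for the user in row `user` from its k most similar users
predict_ubcf <- function(m, user, k = 2) {
  sims <- sapply(seq_len(nrow(m)), function(v) cosine(m[user, ], m[v, ]))
  sims[user] <- -Inf                                # never use the user itself
  neighbours <- order(sims, decreasing = TRUE)[seq_len(k)]
  w <- sims[neighbours]
  preds <- sapply(seq_len(ncol(m)), function(j) {
    r  <- m[neighbours, j]
    ok <- !is.na(r)
    if (!any(ok) || sum(w[ok]) == 0) return(NA_real_)
    sum(w[ok] * r[ok]) / sum(w[ok])                 # similarity-weighted mean
  })
  names(preds) <- colnames(m)
  preds
}

p <- predict_ubcf(ratings_ub, user = 1, k = 2)
p[is.na(ratings_ub[1, ])]   # predictions for the items u1 has not rated
```

The highest predicted unrated items would then be recommended to the user.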
Sparse Matrix Object
This is the sparse matrix object used in model development. It contains only DBAs (restaurants) that have no critical flags, an A grade, and a score lower than 20.
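The filtering rule can be sketched in base R as follows. The column names (`dba`, `grade`, `score`, `critical_flag`) and the flag values are assumptions for illustration, not the exact NYC OpenData schema, and the later conversion into a `realRatingMatrix` is left out.

```r
# Hypothetical inspection records; column names and flag values are
# assumptions for illustration, not the exact NYC OpenData schema.
inspections <- data.frame(
  dba           = c("A CAFE", "B DELI", "C PIZZA", "D GRILL"),
  grade         = c("A", "A", "B", "A"),
  score         = c(12, 25, 9, 7),
  critical_flag = c("Not Critical", "Not Critical", "Not Critical", "Critical"),
  stringsAsFactors = FALSE
)

# Keep only A grades, scores below 20, and rows without a critical flag
clean <- subset(inspections,
                grade == "A" & score < 20 & critical_flag != "Critical")
clean$dba
```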
```
[1] "realRatingMatrix"
attr(,"package")
[1] "recommenderlab"
```
The following output comes from a model built with the POPULAR method on the first 100 data points, issuing top-N recommendations (TopN = 5) for a new DBA.
```
[1] "POPULAR"
```
The number of ratings from the POPULAR method.
```
[1] 11268
$`'CESCA`
[1] 2.802913
$`'ESSEN`
[1] 5.678246
$`'RITAS`
[1] 2.802913
$`'WICHCRAFT`
[1] 4.802913
$`#1 Chinese Restaurant`
[1] 3.148508
$`#1 GARDEN CHINESE RESTAURANT`
[1] 2.802913
$`#1 NATURAL JUICE BAR`
[1] 4.136246
$`#1 SABOR LATINO RESTAURANT`
[1] 2.672806
$`$1 PIZZA`
[1] 2.802913
$`$1.25 PIZZA`
[1] 4.802913
$`% SHAO BIN ZHENG`
[1] 3.981842
$`& PIZZA`
[1] 3.678246
$`&PIZZA`
[1] 2.802913
[1] "0" "2" "3" "4" "5" "6" "7" "8" "9" "10" "11" "12" "13" "17" "21"
$`'CESCA`
[1] 13
$`'ESSEN`
[1] 11
$`'RITAS`
[1] 13
$`'WICHCRAFT`
[1] 13
$`#1 Chinese Restaurant`
[1] 12
$`#1 GARDEN CHINESE RESTAURANT`
[1] 13
$`#1 NATURAL JUICE BAR`
[1] 13
$`#1 SABOR LATINO RESTAURANT`
[1] 4
$`$1 PIZZA`
[1] 13
$`$1.25 PIZZA`
[1] 13
$`% SHAO BIN ZHENG`
[1] 12
$`& PIZZA`
[1] 11
$`&PIZZA`
[1] 13
```
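The idea behind a popularity-based recommender can be sketched in base R: rank items by how many users rated them, then recommend the most popular items a user has not rated yet. This is a simplified stand-in under that assumption, not recommenderlab's POPULAR internals; the matrix and helper names are hypothetical.

```r
# Hypothetical ratings: rows are users, columns are items, NA = not rated
ratings_pop <- matrix(c(5, NA,  3,  4,
                        4,  2, NA, NA,
                        NA, 3,  4,  1,
                        5, NA, NA,  2),
                      nrow = 4, byrow = TRUE,
                      dimnames = list(paste0("u", 1:4), paste0("i", 1:4)))

# Popularity of each item = number of users who rated it
popularity <- colSums(!is.na(ratings_pop))

# Top-n items the user has not rated yet, most popular first
top_n_popular <- function(m, popularity, user, n = 3) {
  unrated <- which(is.na(m[user, ]))
  ranked  <- unrated[order(popularity[unrated], decreasing = TRUE)]
  names(ranked)[seq_len(min(n, length(ranked)))]
}

top_n_popular(ratings_pop, popularity, user = "u2", n = 2)
```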
Collaborative Model Creation
The model created is an IBCF (item-based collaborative filtering) model trained on 5,000 user ratings. IBCF is similar to UBCF (user-based collaborative filtering) but is less memory intensive: it stores a precomputed item-to-item similarity model, so the entire user database does not have to be held in memory locally at prediction time.
Internally, this model computes the cosine similarity between items, each represented as a vector of user ratings, which in R is as simple as:

```r
# Cosine similarity between two rating vectors a and b
crossprod(a, b) / sqrt(crossprod(a) * crossprod(b))
```
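Extending that formula from two vectors to every pair of items gives the full item-to-item similarity matrix. The sketch below assumes a small dense users x items matrix where 0 means "not rated"; recommenderlab may also normalize ratings before computing similarities, which this sketch skips. The data and function name are hypothetical.

```r
# Hypothetical users x items matrix; 0 stands for "not rated"
ratings_ib <- matrix(c(5, 3, 0,
                       4, 0, 1,
                       1, 1, 5,
                       0, 2, 4),
                     nrow = 4, byrow = TRUE,
                     dimnames = list(paste0("u", 1:4), c("i1", "i2", "i3")))

item_cosine <- function(m) {
  dots  <- crossprod(m)          # t(m) %*% m: dot products of item columns
  norms <- sqrt(diag(dots))      # Euclidean norm of each item column
  dots / outer(norms, norms)     # divide each dot product by |a| * |b|
}

sim_items <- item_cosine(ratings_ib)
round(sim_items, 3)
```

Computing all pairs with one `crossprod()` call avoids looping over item pairs in R.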
The IBCF model is trained on 5,000 ratings and then used to predict recommendations for the second DBA in the index ('ESSEN).
```
$`'ESSEN`
[1] "0" "2" "4" "11" "5"
```

```r
# Keep only the top 3 recommendations
recommended.items.2.top3 <- bestN(recommended.items.2, n = 3)
# Display Top 3
as(recommended.items.2.top3, "list")
```

```
$`'ESSEN`
[1] "0" "2" "4"
```
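The prediction and top-3 steps above can be approximated in base R: score each unrated item by the similarity-weighted average of the user's own ratings on the items they did rate, then keep the highest-scoring items, much as `bestN(n = 3)` trims a top-N list. The similarity matrix, ratings, and helper name are hypothetical, and recommenderlab's IBCF additionally restricts each item to its k most similar neighbours, which is omitted here.

```r
# One user's ratings (0 = not rated) and a hypothetical item similarity matrix
ratings_u <- c(i1 = 5, i2 = 0, i3 = 0, i4 = 2, i5 = 0)
item_sim <- matrix(c(1.0, 0.8, 0.1, 0.3, 0.6,
                     0.8, 1.0, 0.2, 0.4, 0.5,
                     0.1, 0.2, 1.0, 0.9, 0.2,
                     0.3, 0.4, 0.9, 1.0, 0.1,
                     0.6, 0.5, 0.2, 0.1, 1.0),
                   nrow = 5, byrow = TRUE,
                   dimnames = list(names(ratings_u), names(ratings_u)))

predict_ibcf_top_n <- function(r, sim, n = 3) {
  rated   <- names(r)[r > 0]
  unrated <- names(r)[r == 0]
  # Similarity-weighted average of the user's ratings on the items they rated
  scores <- sapply(unrated, function(j) {
    w <- sim[j, rated]
    if (sum(w) == 0) return(NA_real_)
    sum(w * r[rated]) / sum(w)
  })
  head(sort(scores, decreasing = TRUE), n)   # sort() drops NA scores
}

predict_ibcf_top_n(ratings_u, item_sim, n = 3)
```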
Create an evaluation scheme that splits the data, using 90% for training and leaving 10% for validation (testing).

```r
library(recommenderlab)

# 90/10 split of the first 1,000 rows; one rating per test user is given as known
e <- evaluationScheme(dba_scores_grade[1:1000], method = "split", train = 0.9, given = 1)
# Creation of recommender model based on IBCF
Rec.ibcf <- Recommender(getData(e, "train"), "IBCF")
# Predictions on the test data set
p.ibcf <- predict(Rec.ibcf, getData(e, "known"), type = "ratings")
# Obtaining the error metrics
error.ibcf <- calcPredictionAccuracy(p.ibcf, getData(e, "unknown"))
```
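`calcPredictionAccuracy()` reports RMSE, MSE, and MAE for rating predictions. The following base R sketch shows how those three numbers relate, assuming plain numeric vectors of predicted and held-out ratings with NA where no value exists; the helper name and data are hypothetical.

```r
# RMSE, MSE and MAE over positions where both a prediction and a true rating exist
prediction_accuracy <- function(predicted, actual) {
  ok  <- !is.na(predicted) & !is.na(actual)
  err <- predicted[ok] - actual[ok]
  c(RMSE = sqrt(mean(err^2)), MSE = mean(err^2), MAE = mean(abs(err)))
}

predicted <- c(4.2, 3.0, NA, 1.5)
actual    <- c(5,   2,   4,  1.5)
prediction_accuracy(predicted, actual)
```

RMSE is simply the square root of MSE, so it penalizes large errors more heavily than MAE does.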
In user-based collaborative filtering (UBCF), the procedure is to first find other users who are similar to a given user, then find the top-rated items among those similar users. Those items are then recommended to the given user.

```r
library(knitr)  # for kable()

# Flatten the rating matrix and tabulate how often each rating value occurs
vector_ratings <- as.vector(dba_scores_grade@data)
kable(table(vector_ratings), caption = "Rating frequency")
```
vector_ratings | Freq |
---|---|
0 | 191167 |
2 | 17713 |
4 | 11040 |
6 | 3147 |
8 | 1214 |
10 | 484 |
12 | 196 |
14 | 95 |
16 | 72 |
18 | 31 |
20 | 24 |
22 | 22 |
24 | 18 |
26 | 12 |
28 | 8 |
30 | 8 |
32 | 11 |
34 | 8 |
36 | 6 |
38 | 11 |
40 | 2 |
42 | 5 |
44 | 6 |
46 | 3 |
48 | 3 |
50 | 5 |
52 | 2 |
54 | 2 |
58 | 2 |
60 | 2 |
64 | 3 |
66 | 2 |
72 | 1 |
74 | 1 |
78 | 3 |
82 | 2 |
86 | 1 |
88 | 2 |
94 | 1 |
96 | 2 |
100 | 2 |
106 | 1 |
108 | 1 |
110 | 1 |
112 | 2 |
114 | 1 |
116 | 1 |
130 | 3 |
132 | 1 |
140 | 1 |
142 | 1 |
144 | 1 |
152 | 2 |
154 | 1 |
156 | 2 |
160 | 1 |
168 | 2 |
174 | 1 |
182 | 1 |
190 | 2 |
194 | 3 |
196 | 1 |
208 | 1 |
238 | 1 |
252 | 1 |
258 | 1 |
280 | 1 |
310 | 1 |
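Many of the zeros in this table are probably empty cells rather than real ratings of 0, because coercing the sparse `@data` slot with `as.vector()` fills every unstored cell with 0. A small base R illustration, using a hypothetical dense matrix in which 0 plays the role of an unstored cell:

```r
# Hypothetical users x items matrix in which 0 means "no rating stored",
# mirroring how empty sparse cells come back as 0 when densified
m <- matrix(c(0, 2, 0, 4,
              2, 0, 0, 0,
              0, 6, 2, 0), nrow = 3, byrow = TRUE)

vector_ratings <- as.vector(m)
table(vector_ratings)                        # zeros dominate the counts
table(vector_ratings[vector_ratings != 0])   # only the stored ratings
```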
To split the data into training and test sets, we can use the evaluationScheme() function in recommenderlab. It extends generic data-splitting methods with several parameters specific to recommender systems. As shown in the code section below, one parameter specifies how many items to use for each user, and another specifies the minimum value that counts as a good rating.
```r
set.seed(21)
percent_train = 0.8
items_to_keep = 1      # items to use for each user
rating_threshold = 1   # good rating implies >= 1
n_eval = 1
eval_sets = evaluationScheme(data = ratings, method = "split",
                             train = percent_train,
                             given = items_to_keep,
                             goodRating = rating_threshold,
                             k = n_eval)
eval_sets
```
```
Evaluation scheme with 1 items given
Method: 'split' with 1 run(s).
Training set proportion: 0.800
Good ratings: >=1.000000
Data set: 5425 x 12 rating matrix of class 'realRatingMatrix' with 19345 ratings.
```
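Conceptually, a `split` scheme with `given = 1` works roughly as sketched below in base R: users are randomly assigned to training and test sets, and for each test user one rated item is revealed as known while the remaining ratings are held out as unknown. This is an illustration under those assumptions, not recommenderlab's code; the matrix and the `split_scheme()` helper are hypothetical, and NA marks an unrated item.

```r
split_scheme <- function(m, train = 0.8, given = 1) {
  train_idx <- sample(nrow(m), round(train * nrow(m)))   # random training users
  test      <- m[-train_idx, , drop = FALSE]
  known     <- test
  unknown   <- test
  for (u in seq_len(nrow(test))) {
    rated <- which(!is.na(test[u, ]))
    keep  <- rated[sample.int(length(rated), min(given, length(rated)))]
    known[u, -keep]  <- NA   # reveal only the `given` items
    unknown[u, keep] <- NA   # hold out every other rating
  }
  list(train = m[train_idx, , drop = FALSE], known = known, unknown = unknown)
}

set.seed(21)
ratings_split <- matrix(c(5,  3, NA,  1,
                          4, NA,  2,  2,
                          NA, 4,  4, NA,
                          1,  1, NA,  5,
                          2, NA,  3,  4),
                        nrow = 5, byrow = TRUE)
s <- split_scheme(ratings_split, train = 0.8, given = 1)
rowSums(!is.na(s$known))   # each test user keeps exactly one known rating
```

The model only sees the known ratings of test users; accuracy is then measured against the unknown ones.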
We now build a UBCF model using the default parameters of the Recommender() function and use it to predict ratings for the test portion of the data set. We then use calcPredictionAccuracy() to evaluate the predictions against the held-out ratings. Per-user performance metrics for the UBCF model are displayed below.
```r
eval_recommender = Recommender(data = getData(eval_sets, "train"),
                               method = "UBCF", parameter = NULL)
items_to_recommend = 10
eval_prediction = predict(object = eval_recommender,
                          newdata = getData(eval_sets, "known"),
                          n = items_to_recommend,
                          type = "ratings")
eval_accuracy = calcPredictionAccuracy(x = eval_prediction,
                                       data = getData(eval_sets, "unknown"),
                                       byUser = TRUE)
```
```
                                          RMSE      MSE       MAE
'ESSEN                                1.986125 3.944691 1.5111111
#1 Chinese Restaurant                 2.128673 4.531250 2.1250000
1 OR 8                                2.173067 4.722222 2.1666667
104-01 FOSTER AVENUE COFFEE SHOP(UPS) 1.166424 1.360544 0.7619048
118 KITCHEN                           1.056531 1.116259 0.8714286
16TH AVENUE GLATT                     0.728869 0.531250 0.6250000
```
The NYC Health Department discovered 10 foodborne illness outbreaks since 2012 using Yelp reviews.
—Columbia University
Foodborne illness outbreaks and food poisonings are becoming increasingly frequent. There are over 3,000 deaths a year in the USA due to food poisoning. To combat these trends, a web app was created that focuses on the inspection violations of NYC food establishments. Food Score is a dashboard designed to identify the cleanest restaurants in New York City by borough.
Food Score was designed and created by Kyle W. Brown as a Capstone project for the Data Incubator Summer 2020 Cohort.
The Data Incubator is a data science education company that offers an intensive, immersive, 8-week, full-time bootcamp for those with advanced STEM degrees. It also provides data science training and placement services, including corporate training for Fortune 500 clients. The program has four campuses: New York City, Washington DC, San Francisco, and Online.
According to VentureBeat, the program received over 1,000 applications from over 80 universities in its first round and accepted just under 3% of applicants. Business Insider named it one of 15 programs in the world with more competitive admissions than Harvard.
Accepting and training only the best STEM postgraduates gives hiring partners confidence that they are getting the best when working with us. The Data Incubator teaches the most in-demand skills and the best open-source programs, preparing students to jump right in and make a difference.
Link to apply:
NYC Restaurant Inspection Violations
The data used for Food Score came exclusively from the NYC OpenData website, specifically the restaurant inspection results from August 2014 to the present.
The NYC Inspection Violations can be found here:
To put food safety and foodborne illness outbreaks into perspective, here is why they are a problem:
Food safety is a concern in NYC, as the number of violations has risen significantly, by an average of 28% over the past 3 years.
Across the USA, food poisoning causes 3,000 deaths a year, and foodborne illness outbreaks are common in NYC.
New York City has one of the highest concentrations of restaurants in the world, with 27,000.
Tourist revenue accounted for $44 billion in 2017.
Food safety is the central focus of the platform, whether for consumers, tourists, or the revenue of food and drink establishments. Food Score’s value statement is designed to drive value through tourist revenue and a customer-centric approach.
The main feature of Food Score is a recommendation system that predicts establishments with no critical flags, scores lower than 20, and only A grades.
In 2018, 65 million people traveled to NYC, spending $9 billion in food and drink establishments.
The main deliverable is a dashboard that integrates the recommender model, produces a list of the cleanest restaurants in New York City by borough, and maps them.
SALT lets you view a restaurant’s menu, make a reservation through OpenTable when applicable, and even allows you to request an Uber to any of your saved locations.
ChefsFeed capitalizes on the credibility and clout of leading professional chefs to help New Yorkers discover new spots in a social media-type network.
PopCity allows users to map any food photos they find on social media outlets like Instagram or on the PopCity discovery channel. Using Instagram’s photo copy link feature, you can immediately import a post to your PopCity map.
The end user for Food Score is anyone looking to grab a bite to eat, whether close to work, a hotel, shopping, or, most importantly, Broadway in New York City. The value to end users is knowing the cleanest and safest restaurants and their locations, whether by building, street, borough, or cuisine type.
More specifically, end-users are:
Food Score was created to recommend the cleanest restaurants in New York City by borough. The recommendation system uses recommenderlab’s UBCF and IBCF collaborative filtering models. The model considered only establishments with an A grade, a score lower than 20, and no critical flags. There is a vibrant market for Food Score in NYC, with competitors offering similar applications such as ChefsFeed, PopCity, and SALT.
Food Score offers a new, unique approach to finding clean restaurants in NYC and could impact:
Food safety, and raising awareness of it, is the core value and mission of Food Score. By recommending the cleanest restaurants in NYC, we hope to reduce food poisoning deaths and foodborne illness outbreaks.
With an M.S. in Data Science & Business Analytics, with a concentration in Advanced Computing, from Wayne State University, Kyle W. Brown is an advanced technology researcher, entrepreneur, author, and national leadership award recipient. His research interests include accelerator hardware for datacenters, automotive embedded systems, and particle physics. Outside of research, Kyle’s hobbies include volunteering for the Detroit Economic Club as a Young Leader, reading classic literature, and medicinal chemistry.
Achievements
Integrating the World with Innovative Solutions
WorldCapital Integrated Solutions, LLC (WCIS) is an investment banking firm focused on economic development in emerging markets. While we specialize in raising capital, corporate valuations, advanced technology, and research, we also assist clients and the community with business development, investment strategies, and open-source software solutions.
Specializing in solutions for: