Introduction
Selecting a product that fits your needs can be tiring, and the ever-growing range of products makes it more demanding. Makeup is a category that not only has a large range of products but also requires the consumer to select the right product for their skin type. It is not uncommon for relatives or friends to buy makeup as a gift for their loved ones; buying such gifts is difficult because of the wide range of products and the added complexity of matching a product to the recipient’s skin type. In such a situation, a recommender system that recommends products by skin type can take the burden off the purchaser.
Dataset
Sephora’s data is used for building this recommender system. Sephora is a popular makeup brand with millions of customers around the world. Because Sephora is a global brand, its data covers a wide range of makeup products and skin types. For the purpose of this assignment, however, a smaller dataset with 1000 user ratings, 12 products and 5 skin types is used. The same approach is expected to carry over to a larger dataset without structural changes.
Reviews raw import
product_id | review_title | review_text | rating | age_range | skin_type | skin_tone | eye_color | reviewer_username | tags | review_id
--- | --- | --- | --- | --- | --- | --- | --- | --- | --- | ---
P38217 | Worth the money | Sometimes I stray from this cleanser, but I always come back. It takes off all makeup completely and leaves my skin feeling fresh and looking bright. The exfoliation is very, very gentle, and perfect for my slightly sensitive skin. It seems expensive, but because you need just a bit it lasts for a long, long time. | 5 | NA | normal | light | NA | katechatte | {foamy,exfoliating} | 6611717f-2636-4756-bf36-66c81cc267a7
P38217 | Great | I am a 41 year old African American woman with sign of hormonal aging. This product has my skin looking great, but you must use all of the other products for the full benefits. | 5 | NA | combination | deep | NA | snook41 | {foamy} | e7d3307e-02ff-45a1-8fc3-6bd628bedd86
P38217 | Great Product | I’m really enjoying this product. Received a sample trio of the Murad products to try first, then purchased the cleanser. Great deal at $35 as a little goes a long way! I use it nightly with my Clarisonic Mia and in the morning by itself. Leaves my skin feeling clean and smooth and appears to be helping to even out my skin tone. | 5 | NA | combination | olive | NA | wahinewarrior | {foamy,milky,exfoliating} | 4188d728-fde6-4d06-984e-164cca2b8781
P38217 | Nice, but not great for combination skin | I tried this cleanser at a friends house, and I was instantly in love. I have combination skin, oily t-zone with flaky, dehydrated patches near my mouth and jaw line. After washing my face with this cleanser, my skin felt hydrated and more even-textured…So the next day, I bought this cleanser! However, it has been a week and my pores very large and I’m beginning to notice black heads :( my skin feels hydrated, but I’m still noticing a bit of dryness. | 3 | NA | combination | fair | NA | jenlines22 | {hydrating,creamy} | 248c904c-6e30-4929-8228-87b03ad7a921
P38217 | great moisturizer | leaves the skin feeling fresh and revived… just loving it | 5 | NA | dry | light | NA | jessea | {exfoliating} | 654bdb99-9371-4440-a540-0dd2a73da339
P38217 | It works | I am a 33 year old Latina with combination oily/normal skin. I didn’t like the marketing on this as something that was to protect skin from hormonal aging (not there yet), but I loved the creamy texture and the way it left my skin feeling soft. Also, a little really goes a long way. It’s worht the splurge. | 5 | NA | combination | medium | NA | Anonymous | {milky} | 35eadf21-f589-4e57-b149-af59a2e8fe07
Data Preparation
The original dataset is reduced to the columns required for building a recommendation engine. The following columns are selected:
- User_id
- User_name
- Rating
- Product
- Skin_type
In order to build a recommendation system that recommends products by skin type, the five skin types are transformed into dummy columns. Therefore, the prepared dataset is as follows:
Reviews raw import
user_id | user_name | rating | product | skin_type_ | skin_type_combination | skin_type_dry | skin_type_normal | skin_type_oily
--- | --- | --- | --- | --- | --- | --- | --- | ---
6611717f-2636-4756-bf36-66c81cc267a7 | katechatte | 5 | P38217 | 0 | 0 | 0 | 1 | 0
e7d3307e-02ff-45a1-8fc3-6bd628bedd86 | snook41 | 5 | P38217 | 0 | 1 | 0 | 0 | 0
4188d728-fde6-4d06-984e-164cca2b8781 | wahinewarrior | 5 | P38217 | 0 | 1 | 0 | 0 | 0
248c904c-6e30-4929-8228-87b03ad7a921 | jenlines22 | 3 | P38217 | 0 | 1 | 0 | 0 | 0
654bdb99-9371-4440-a540-0dd2a73da339 | jessea | 5 | P38217 | 0 | 0 | 1 | 0 | 0
35eadf21-f589-4e57-b149-af59a2e8fe07 | Anonymous | 5 | P38217 | 0 | 1 | 0 | 0 | 0
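The dummy-column step described above could be sketched in base R roughly as follows. The `reviews` data frame and its values are made up for illustration, and the idea that the bare `skin_type_` column corresponds to reviews with no recorded skin type is an assumption based on the column name, not something the dataset documents.

```r
# Hypothetical miniature of the review data; values are illustrative only.
reviews <- data.frame(
  review_id = c("r1", "r2", "r3", "r4"),
  skin_type = c("normal", "combination", "dry", NA),
  stringsAsFactors = FALSE
)

# Assumption: a missing skin type becomes an empty label, which yields
# a column named "skin_type_" once the prefix is added.
reviews$skin_type[is.na(reviews$skin_type)] <- ""

# One 0/1 indicator column per distinct skin type.
skin_levels <- sort(unique(reviews$skin_type))
dummies <- sapply(skin_levels, function(lv) as.integer(reviews$skin_type == lv))
colnames(dummies) <- paste0("skin_type_", skin_levels)

prepared <- cbind(reviews["review_id"], dummies)
print(prepared)
```

Each row ends up with exactly one indicator set to 1, which is what makes it possible to filter products by a single skin-type column later on.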
Real Rating Matrix
The matrix shows users in rows and products in columns. A numeric value at a given user and product is the rating that user gave to that product; a dot means the user has not rated the product. For example, user 00e8748d-1763-490f-8076-a9125cbaa4b3 gave a rating of 4 to product P382292.
## 5 x 5 sparse Matrix of class "dgCMatrix"
##                                      P38217 P382204 P382292 P382353
## 0010647f-8326-4e71-b9eb-7e21f4add1dd      .       .       .       .
## 002feeba-f7ee-453c-80ed-2e5a1aa44dc1      .       .       .       .
## 007e93f4-bd3e-479c-a8ab-0b796c0167be      .       .       .       .
## 00e8748d-1763-490f-8076-a9125cbaa4b3      .       .       4       .
## 00ed93b3-d077-4e04-9579-078393fe580a      5       .       .       .
##                                      P382354
## 0010647f-8326-4e71-b9eb-7e21f4add1dd       .
## 002feeba-f7ee-453c-80ed-2e5a1aa44dc1       .
## 007e93f4-bd3e-479c-a8ab-0b796c0167be       .
## 00e8748d-1763-490f-8076-a9125cbaa4b3       .
## 00ed93b3-d077-4e04-9579-078393fe580a       .
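As a rough illustration of how a user-by-product matrix like the one printed above can be assembled, the sketch below builds a `dgCMatrix` from long-format ratings using the Matrix package. The `ratings` data frame, its user labels and its values are hypothetical; the report's own matrix was presumably built through recommenderlab, which is not shown here.

```r
library(Matrix)  # ships with standard R installations

# Hypothetical long-format ratings: one row per (user, product) pair.
ratings <- data.frame(
  user    = c("u1", "u2", "u2", "u3"),
  product = c("P38217", "P382204", "P382292", "P38217"),
  rating  = c(5, 3, 4, 2),
  stringsAsFactors = FALSE
)

users    <- sort(unique(ratings$user))
products <- sort(unique(ratings$product))

# Rows are users, columns are products; unrated cells stay empty
# and are printed as ".".
rating_matrix <- sparseMatrix(
  i = match(ratings$user, users),
  j = match(ratings$product, products),
  x = ratings$rating,
  dims = c(length(users), length(products)),
  dimnames = list(users, products)
)
print(rating_matrix)
```

Only the observed ratings are stored, which matters when most users have rated very few of the available products.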
Number of Ratings per User
The figure below shows that every user has exactly one review. A dataset with only one review per user cannot produce reasonable recommendations, because the algorithm cannot learn enough about each user. Therefore, dummy users are created.

Fake Data
In order to solve the problem of one review per user, fake user IDs are created. Each of the 1000 rows of the dataset is assigned a user ID between 1 and 100, sampled with replacement; the sampling probabilities are themselves drawn at random, so some fake users receive more ratings than others. The dataset with fake users is as follows:
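library(knitr)       # kable()
library(kableExtra)  # kable_styling(), scroll_box() and the %>% pipe
# Replace the user column with dummy data: 1000 draws from user IDs 1 to 100,
# with random per-user weights so that activity varies across users
users <- sample.int(100, 1000, replace = TRUE, prob = runif(100, min = 0, max = 0.2))
sephora_reviews_edited$user_id <- users
datRlab$user_id <- users
# Display
head(datRlab) %>%
  kable(caption = "Data in Recommender Format") %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive")) %>%
  scroll_box(width = "100%", height = "400px")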
Data in Recommender Format
user_id | product | rating | skin_type_ | skin_type_combination | skin_type_dry | skin_type_normal | skin_type_oily
--- | --- | --- | --- | --- | --- | --- | ---
38 | P38217 | 5 | 0 | 0 | 0 | 1 | 0
94 | P38217 | 5 | 0 | 1 | 0 | 0 | 0
41 | P38217 | 5 | 0 | 1 | 0 | 0 | 0
61 | P38217 | 3 | 0 | 1 | 0 | 0 | 0
70 | P38217 | 5 | 0 | 0 | 1 | 0 | 0
41 | P38217 | 5 | 0 | 1 | 0 | 0 | 0
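The effect of assigning synthetic user IDs can be checked with a small, self-contained sketch. The seed, the `product` and `rating` columns and the averaging of repeated pairs are assumptions added for illustration; only the `sample.int` call mirrors the code above.

```r
set.seed(42)  # assumption: a fixed seed, added here only for reproducibility

n_users   <- 100
n_reviews <- 1000

# Random, unequal weights make some synthetic users more active than others.
weights <- runif(n_users, min = 0, max = 0.2)
user_id <- sample.int(n_users, n_reviews, replace = TRUE, prob = weights)

# Hypothetical product and rating columns, mirroring the dataset size.
product <- sample(paste0("P", 1:12), n_reviews, replace = TRUE)
rating  <- sample(1:5, n_reviews, replace = TRUE)
reviews <- data.frame(user_id, product, rating)

# Ratings per synthetic user: most users now have several.
ratings_per_user <- table(reviews$user_id)
print(summary(as.integer(ratings_per_user)))

# Random assignment can give one user several ratings for the same product;
# averaging them is one possible way to keep a single value per pair.
n_duplicates <- sum(duplicated(reviews[, c("user_id", "product")]))
deduplicated <- aggregate(rating ~ user_id + product, data = reviews, FUN = mean)
```

Whether repeated user and product pairs should be averaged, dropped or kept is a design choice; the sketch only shows that they can arise from this kind of sampling.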
New Real Rating Matrix
The matrix shows users in rows and products in columns. A numeric value at a given user and product is the rating that user gave to that product; a dot means the user has not rated the product. For example, user 1 gave a rating of 4 to product P38217.
## 5 x 5 sparse Matrix of class "dgCMatrix"
##     P38217 P382204 P382292 P382353 P382354
## 1        4       .       5       5       .
## 10       5       3       2       .       5
## 100      .       1       .       5       5
## 11       .       4       .       4       .
## 12       5       3       3       5       5
Number of Ratings per User with Fake Data
The following histogram shows the number of ratings per user. Assigning fake user IDs has solved the issue of only one rating per user.

Number of Products per Mean Rating
The following graph shows the number of products at each mean rating. 3 out of 12 products have a mean rating of 5, whereas 2 out of 12 products have a mean rating below 3. This finding helps in choosing the “good rating” parameter of the evaluation scheme: so that the two products with a mean rating below 3 are not treated as good recommendations, the good rating parameter is set to 3.
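The mean-rating check behind this choice could look roughly like the following. The `reviews` data frame, product labels and ratings are hypothetical; only the threshold of 3 comes from the text.

```r
# Hypothetical ratings for three products; values are illustrative only.
reviews <- data.frame(
  product = c("A", "A", "B", "B", "C", "C"),
  rating  = c(5, 5, 2, 3, 4, 3)
)

# Mean rating per product.
mean_rating <- tapply(reviews$rating, reviews$product, mean)

# Threshold taken from the text: ratings of 3 or more count as good.
good_rating <- 3
low_rated <- names(mean_rating)[mean_rating < good_rating]
print(low_rated)
```

Here only product B falls below the threshold, which is the kind of product the evaluation should not count as a good recommendation.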

Heatmap
The heatmap is a useful way of visualizing users’ preferences for products. A closer look at the heatmap shows that most of the darkest boxes lie in the column of product 5, which means that most users rated product 5 favorably.

Evaluation Scheme
The following evaluation scheme splits the data 75/25: 75% of the data is used for training, and 25% is used for testing the recommender system.
As stated above, the “good rating” parameter is set to 3, so that during evaluation only ratings of 3 or higher count as good and products with a mean rating below 3 are not rewarded as recommendations. This decision is crucial for the reliability of a recommender system: if it is allowed to recommend products with low ratings, users will not see value in it.
The “given” parameter is set to 1, which means that for each test user one rating is shown to the recommender and the remaining ratings are withheld for evaluation. The objective of the recommender system is to cover as many user preferences as possible, so the lowest setting is used and users with few ratings are not dropped.
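library(recommenderlab)
# Binarized copy of the ratings: 1 where the rating is at least 3, 0 otherwise
sephora_realrating_binarized <- binarize(sephora_realrating, minRating = 3)
# 75/25 train/test split; one rating per test user is given, ratings >= 3 count as good
esSplit <- evaluationScheme(sephora_realrating, method = "split", train = 0.75, given = 1, goodRating = 3)
train <- getData(esSplit, "train")      # training users
known <- getData(esSplit, "known")      # test users: ratings shown to the recommender
unknown <- getData(esSplit, "unknown")  # test users: ratings withheld for evaluation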
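Conceptually, a split scheme with `given = 1` does something like the base R sketch below: users are divided into training and test sets, and each test user's ratings are divided into one "known" rating and the withheld "unknown" rest. The matrix, the seed and the loop are illustrative assumptions and not recommenderlab's actual implementation.

```r
set.seed(1)  # assumption: fixed seed for a reproducible illustration

# Hypothetical 8 x 4 rating matrix; NA marks an unrated product.
ratings <- matrix(
  sample(c(NA, 1:5), 32, replace = TRUE),
  nrow = 8,
  dimnames = list(paste0("u", 1:8), paste0("P", 1:4))
)
# Make sure every user has at least two ratings so one can be withheld.
ratings[, 1] <- 5
ratings[, 2] <- 3

# 75% of users go to the training set, the rest to the test set.
train_rows <- sample(nrow(ratings), size = floor(0.75 * nrow(ratings)))
train <- ratings[train_rows, , drop = FALSE]
test  <- ratings[-train_rows, , drop = FALSE]

# For each test user, keep `given` ratings as "known" and hold out the rest.
given <- 1
known <- test
known[] <- NA
unknown <- test
for (u in seq_len(nrow(test))) {
  rated <- which(!is.na(test[u, ]))
  keep  <- rated[sample.int(length(rated), given)]
  known[u, keep]   <- test[u, keep]
  unknown[u, keep] <- NA
}
```

The recommender sees only `known` at prediction time, and its output is then compared against `unknown`.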
Recommendation Methodology
Before designing a recommendation system for recommending makeup products, it is important to understand all the nuances of the industry:
- Consumers switch brands
When a consumer switches to another brand, the products of the new brand vary in terms of fitness for skin type. In such a case, the consumer has to try a plethora of products before choosing one. As much as this is an issue for consumers, it is an opportunity for brands to attract customers. The recommender system will be sold to makeup brands and will enable them to recommend products to new customers according to their skin types.
- New products due to innovation in the industry
Brands in the makeup industry stay relevant only as long as they innovate. Innovation is essential for growth, and it introduces new products. Strong marketing of new products attracts existing consumers to buy them. As much as this is an opportunity for brands, it can also end in a bad experience for consumers. To avoid this, the recommender system will have two algorithms working side by side: the main one, and a second recommender designed to address bad experiences with new products. The second recommender will be purely user-based and will recommend products by skin type. A user-based recommender only recommends items that have been vetted by other users who like items similar to those the consumer likes.
- Makeup is a popular gift item
Relatives and friends often buy makeup for loved ones. To ease the process of buying a gift, recommender systems at brands’ outlets will recommend suitable products by skin type.
- Young consumers try their first makeup product as teenagers
Consumers who are buying makeup for the first time face the same problem as consumers who switch brands. They have a vast number of products to choose from, and choosing can be tiring. The main recommender system will ease the process by recommending products that suit their skin types.
This understanding of the industry calls for two recommendation systems:
- Main Recommendation System
The main recommendation system has to recommend by skin type and address both brand switching and new consumers. An appropriate choice is therefore a hybrid of a popularity-based recommender and an item-based collaborative filtering recommender. The hybrid recommender will give recommendations to people who have never bought makeup before, as well as to people who are looking for similar products after a brand switch.
- Second Recommendation System
The second recommendation system is designed solely to address bad experiences with new products that enter the market. In order to recommend only new products that have already been vetted by similar users, a user-based collaborative filtering recommender is used.
Main Recommendation System
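# Globals shared between the train and apply steps
recc <- NA
targeted_skin <- NA
# This algorithm only recommends within one skin_type
trainMyAlgorithm <- function(data, metadata, skin) {
  # Products reviewed by users with the chosen skin type. metadata has one row
  # per review, so a product ID can appear more than once in this vector;
  # wrapping the result in unique() would remove the repeats.
  targeted_skin <<- metadata[metadata[, skin] == 1, "product"]
  filtered_data <- data[, targeted_skin]
  # Hybrid of item-based CF (weight 0.65) and popularity (weight 0.35),
  # trained on rows 1 to 69 of the filtered data
  recc <<- HybridRecommender(
    Recommender(filtered_data[1:69], method = "IBCF",
                param = list(method = "Pearson", k = 300)),
    Recommender(filtered_data[1:69], method = "POPULAR", param = NULL),
    weights = c(0.65, 0.35)
  )
}
applyMyAlgorithm <- function(data) {
  filtered_data <- data[, targeted_skin]
  # Top-2 recommendations per user, returned as a list
  pre <- predict(recc, filtered_data, n = 2)
  return(as(pre, "list"))
}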
Predictions
trainMyAlgorithm(train, sephora_reviews_edited, skin="skin_type_oily")
head(applyMyAlgorithm(known))
## $`1`
## [1] "P382355" "P382355"
##
## $`12`
## [1] "P382355" "P382355"
##
## $`13`
## [1] "P38217" "P38217"
##
## $`14`
## [1] "P38217" "P38217"
##
## $`17`
## [1] "P38217" "P38217"
##
## $`21`
## [1] "P382355" "P382355"
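The 0.65/0.35 weighting used in the hybrid recommender above can be pictured as a weighted blend of the scores each component assigns to a product. The sketch below is only a conceptual illustration with made-up scores; recommenderlab may normalize or combine scores differently internally.

```r
# Hypothetical predicted scores for one user from two recommenders.
ibcf_scores    <- c(P1 = 4.0, P2 = 3.0, P3 = 5.0)
popular_scores <- c(P1 = 5.0, P2 = 4.0, P3 = 2.0)

# Weights taken from the code above: 0.65 for IBCF, 0.35 for POPULAR.
weights <- c(ibcf = 0.65, popular = 0.35)

hybrid_scores <- weights[["ibcf"]] * ibcf_scores +
  weights[["popular"]] * popular_scores

# Recommend the two products with the highest blended score.
top_n <- names(sort(hybrid_scores, decreasing = TRUE))[1:2]
print(top_n)
```

With these numbers P3 scores highest under IBCF alone, but the popularity component lifts P1 above it in the blend, which is the intended effect for consumers with little rating history.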
Second Recommendation System
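# Globals shared between the train and apply steps
recc <- NA
targeted_skin <- NA
# This algorithm only recommends within one skin_type
trainMyAlgorithm <- function(data, metadata, skin) {
  # Products reviewed by users with the chosen skin type
  targeted_skin <<- metadata[metadata[, skin] == 1, "product"]
  filtered_data <- data[, targeted_skin]
  # User-based CF using the 40 nearest neighbours, trained on rows 1 to 69
  recc <<- Recommender(filtered_data[1:69], method = "UBCF", param = list(nn = 40))
}
applyMyAlgorithm <- function(data) {
  filtered_data <- data[, targeted_skin]
  # Top-2 recommendations per user, returned as a list
  pre <- predict(recc, filtered_data, n = 2)
  return(as(pre, "list"))
}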
Predictions
trainMyAlgorithm(train, sephora_reviews_edited, skin="skin_type_combination")
head(applyMyAlgorithm(known))
## $`1`
## [1] "P384342"
##
## $`12`
## [1] "P384342"
##
## $`13`
## character(0)
##
## $`14`
## character(0)
##
## $`17`
## character(0)
##
## $`21`
## [1] "P384342"
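The idea behind user-based collaborative filtering can be illustrated by hand on a tiny matrix: find the users most similar to the target user, then average their ratings for the product in question, weighted by similarity. The ratings, the helper names `cosine_sim` and `predict_ubcf`, and the choice of cosine similarity with two neighbours are assumptions for this sketch; it is not recommenderlab's implementation.

```r
# Hypothetical ratings; NA marks a product the user has not rated.
ratings <- matrix(
  c( 5,  4, NA,  1,
     4,  5,  2, NA,
     1,  2,  5,  4,
    NA,  1,  4,  5),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("u", 1:4), paste0("P", 1:4))
)

# Cosine similarity computed over the products both users have rated.
cosine_sim <- function(a, b) {
  shared <- !is.na(a) & !is.na(b)
  if (!any(shared)) return(0)
  sum(a[shared] * b[shared]) /
    (sqrt(sum(a[shared]^2)) * sqrt(sum(b[shared]^2)))
}

# Predict a rating as the similarity-weighted mean over the nn most
# similar users who have rated the target product.
predict_ubcf <- function(ratings, user, product, nn = 2) {
  others <- setdiff(rownames(ratings), user)
  others <- others[!is.na(ratings[others, product])]
  sims <- sapply(others, function(o) cosine_sim(ratings[user, ], ratings[o, ]))
  nearest <- names(sort(sims, decreasing = TRUE))[seq_len(min(nn, length(sims)))]
  sum(sims[nearest] * ratings[nearest, product]) / sum(sims[nearest])
}

pred <- predict_ubcf(ratings, user = "u1", product = "P3")
print(pred)
```

Because u1's tastes are closest to u2's, the prediction for P3 is pulled toward u2's low rating rather than u3's high one. A user with no sufficiently similar neighbours gets no prediction at all, which is one possible reason for empty results such as `character(0)`.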
Feedback Loop
Recommender systems require a strong feedback loop in order to provide valuable recommendations to users. As this recommender system will be sold to makeup brands and installed at points of sale to help customers with their buying decisions, the feedback will be based on actual sales of recommended products. If a customer buys the recommended product, the transaction is considered a good recommendation; otherwise the recommender system considers it a bad recommendation and learns from it. Since consumers try the product before buying, a purchase is a good indicator of a good recommendation.
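A minimal sketch of how such purchase feedback might be recorded and summarized is shown below. The `feedback` log, its column names and the idea of turning purchases into an implicit 0/1 rating are hypothetical; the text does not specify how the system would store or learn from this signal.

```r
# Hypothetical log: one row per recommendation shown at the point of sale.
feedback <- data.frame(
  product   = c("P38217", "P38217", "P382355", "P382355", "P382355"),
  purchased = c(TRUE, FALSE, TRUE, TRUE, FALSE)
)

# Share of recommendations that ended in a purchase, per product.
acceptance <- tapply(feedback$purchased, feedback$product, mean)
print(acceptance)

# Assumption: treat a purchase as an implicit positive rating (1) and a
# pass as a negative one (0) when the model is next retrained.
feedback$implicit_rating <- as.integer(feedback$purchased)
```

An acceptance rate per product gives a simple signal for spotting recommendations that customers consistently decline.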