Introduction

Selecting a product that fits your needs is tiring, and the ever-growing range of products makes it more demanding still. Makeup is a category that not only has a wide range of products but also requires the consumer to select the right product for their skin type. Relatives and friends often buy makeup as a gift; choosing such a gift is difficult because of the wide range of products and the added complexity of matching the recipient’s skin type. In such a situation, a recommender system that recommends products by skin type can take the burden off the purchaser.

Dataset

Sephora’s data is used for building this recommender system. Sephora is a popular makeup brand with millions of customers around the world. Because Sephora is a global brand, its data covers a wide variety of makeup products and skin types. Although that breadth is the reason for choosing Sephora’s dataset, for the purpose of this assignment a smaller dataset with 1000 user ratings, 12 products and 5 skin types is used. The same approach should extend to a larger dataset without structural changes.

Reviews raw import
product_id	review_title	review_text	rating	age_range	skin_type	skin_tone	eye_color	reviewer_username	tags	review_id
P38217	Worth the money	Sometimes I stray from this cleanser, but I always come back. It takes off all makeup completely and leaves my skin feeling fresh and looking bright. The exfoliation is very, very gentle, and perfect for my slightly sensitive skin. It seems expensive, but because you need just a bit it lasts for a long, long time.	5	NA	normal	light	NA	katechatte	{foamy,exfoliating}	6611717f-2636-4756-bf36-66c81cc267a7
P38217	Great	I am a 41 year old African American woman with sign of hormonal aging. This product has my skin looking great, but you must use all of the other products for the full benefits.	5	NA	combination	deep	NA	snook41	{foamy}	e7d3307e-02ff-45a1-8fc3-6bd628bedd86
P38217	Great Product	I’m really enjoying this product. Received a sample trio of the Murad products to try first, then purchased the cleanser. Great deal at $35 as a little goes a long way! I use it nightly with my Clarisonic Mia and in the morning by itself. Leaves my skin feeling clean and smooth and appears to be helping to even out my skin tone.	5	NA	combination	olive	NA	wahinewarrior	{foamy,milky,exfoliating}	4188d728-fde6-4d06-984e-164cca2b8781
P38217	Nice, but not great for combination skin	I tried this cleanser at a friends house, and I was instantly in love. I have combination skin, oily t-zone with flaky, dehydrated patches near my mouth and jaw line. After washing my face with this cleanser, my skin felt hydrated and more even-textured…So the next day, I bought this cleanser! However, it has been a week and my pores very large and I’m beginning to notice black heads :( my skin feels hydrated, but I’m still noticing a bit of dryness.	3	NA	combination	fair	NA	jenlines22	{hydrating,creamy}	248c904c-6e30-4929-8228-87b03ad7a921
P38217	great moisturizer	leaves the skin feeling fresh and revived… just loving it	5	NA	dry	light	NA	jessea	{exfoliating}	654bdb99-9371-4440-a540-0dd2a73da339
P38217	It works	I am a 33 year old Latina with combination oily/normal skin. I didn’t like the marketing on this as something that was to protect skin from hormonal aging (not there yet), but I loved the creamy texture and the way it left my skin feeling soft. Also, a little really goes a long way. It’s worht the splurge.	5	NA	combination	medium	NA	Anonymous	{milky}	35eadf21-f589-4e57-b149-af59a2e8fe07

Data Preparation

The original dataset is stripped down to the columns required for building a recommendation engine. The following columns are selected:

User_id
User_name
Rating
Product
Skin_type

In order to build a recommendation system that recommends products by skin type, the five skin types are transformed into dummy columns. The prepared dataset is therefore as follows:

Reviews raw import
user_id	user_name	rating	product	skin_type_combination	skin_type_dry	skin_type_normal
6611717f-2636-4756-bf36-66c81cc267a7	katechatte	5	P38217	0	0	1
e7d3307e-02ff-45a1-8fc3-6bd628bedd86	snook41	5	P38217	1	0	0
4188d728-fde6-4d06-984e-164cca2b8781	wahinewarrior	5	P38217	1	0	0
248c904c-6e30-4929-8228-87b03ad7a921	jenlines22	3	P38217	1	0	0
654bdb99-9371-4440-a540-0dd2a73da339	jessea	5	P38217	0	1	0
35eadf21-f589-4e57-b149-af59a2e8fe07	Anonymous	5	P38217	1	0	0
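
The dummy-column step described above can be sketched in base R as follows. This is a minimal illustration on a hypothetical three-row data frame, not the code used for the report; the data frame `reviews` and the helper variable `dummies` are assumptions, and only the column names mirror the prepared table.

```r
# Hypothetical mini-dataset; column names follow the prepared table.
reviews <- data.frame(
  user_name = c("katechatte", "snook41", "jessea"),
  rating    = c(5, 5, 5),
  product   = c("P38217", "P38217", "P38217"),
  skin_type = c("normal", "combination", "dry"),
  stringsAsFactors = FALSE
)

# One 0/1 indicator column per skin type. The "- 1" drops the intercept,
# so every factor level gets its own column.
reviews$skin_type <- factor(reviews$skin_type)
dummies <- model.matrix(~ skin_type - 1, data = reviews)
colnames(dummies) <- paste0("skin_type_", levels(reviews$skin_type))

# Keep the identifying columns and append the indicators.
prepared <- cbind(reviews[, c("user_name", "rating", "product")], dummies)
print(prepared)
```

Each row ends up with exactly one indicator set to 1, which is what later makes it possible to filter products by a single skin-type column.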

Data in Recommender Lab Format

Data in Recommender Format
user_id	product	rating	skin_type_combination	skin_type_dry	skin_type_normal
6611717f-2636-4756-bf36-66c81cc267a7	P38217	5	0	0	1
e7d3307e-02ff-45a1-8fc3-6bd628bedd86	P38217	5	1	0	0
4188d728-fde6-4d06-984e-164cca2b8781	P38217	5	1	0	0
248c904c-6e30-4929-8228-87b03ad7a921	P38217	3	1	0	0
654bdb99-9371-4440-a540-0dd2a73da339	P38217	5	0	1	0
35eadf21-f589-4e57-b149-af59a2e8fe07	P38217	5	1	0	0

Real Rating Matrix

The matrix shows users in rows and products in columns. A numeric value in a cell is the rating that the user gave to that product; a dot marks a missing rating. For example, user 00e8748d-1763-490f-8076-a9125cbaa4b3 gave a rating of 4 to product P382292.

## 5 x 5 sparse Matrix of class "dgCMatrix"
##                                      P38217 P382204 P382292 P382353
## 0010647f-8326-4e71-b9eb-7e21f4add1dd      .       .       .       .
## 002feeba-f7ee-453c-80ed-2e5a1aa44dc1      .       .       .       .
## 007e93f4-bd3e-479c-a8ab-0b796c0167be      .       .       .       .
## 00e8748d-1763-490f-8076-a9125cbaa4b3      .       .       4       .
## 00ed93b3-d077-4e04-9579-078393fe580a      5       .       .       .
##                                      P382354
## 0010647f-8326-4e71-b9eb-7e21f4add1dd       .
## 002feeba-f7ee-453c-80ed-2e5a1aa44dc1       .
## 007e93f4-bd3e-479c-a8ab-0b796c0167be       .
## 00e8748d-1763-490f-8076-a9125cbaa4b3       .
## 00ed93b3-d077-4e04-9579-078393fe580a       .
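
To make the layout concrete, the sketch below builds a small user-by-product matrix from long-format ratings using base R only. The report itself relies on recommenderlab's sparse rating matrix; the data frame `ratings` and the IDs `u1` to `u3` here are hypothetical, and `NA` plays the role of the dots in the printout.

```r
# Hypothetical long-format ratings: one row per (user, product, rating).
ratings <- data.frame(
  user_id = c("u1", "u2", "u2", "u3"),
  product = c("P38217", "P382292", "P38217", "P382204"),
  rating  = c(5, 4, 3, 2),
  stringsAsFactors = FALSE
)

users    <- sort(unique(ratings$user_id))
products <- sort(unique(ratings$product))

# Users in rows, products in columns; NA means "not rated".
rating_matrix <- matrix(NA_real_,
                        nrow = length(users), ncol = length(products),
                        dimnames = list(users, products))

# A two-column character matrix indexes cells by (row name, column name).
rating_matrix[cbind(ratings$user_id, ratings$product)] <- ratings$rating
print(rating_matrix)
```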

Number of Ratings per User

The figure below shows that every user has exactly one review. A dataset with only one review per user cannot produce reasonable recommendations, because the algorithm cannot learn enough about any individual user. Therefore, dummy users are created.

Fake Data

To solve the problem of one review per user, fake user IDs are assigned. Each of the 1000 rows of the dataset receives a user ID sampled with replacement from the integers 1 to 100. The sampling weights are themselves drawn at random, so some fake users receive more reviews than others. The dataset with fake users is as follows:

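# Replace the user column with fake user IDs (1..100), sampled with replacement.
# prob holds random, unnormalised weights, so some IDs are drawn more often.
# No seed is set, so each run assigns different IDs.
users <- sample.int(100, 1000, replace = TRUE, prob = runif(100, min = 0, max = 0.2))
sephora_reviews_edited$user_id <- users
datRlab$user_id <- users
# Display the first rows as a styled table (requires dplyr, knitr and kableExtra)
head(datRlab) %>%
  kable(caption = "Data in Recommender Format") %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive")) %>%
  scroll_box(width = "100%", height = "400px")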

Data in Recommender Format
user_id	product	rating	skin_type_combination	skin_type_dry	skin_type_normal
38	P38217	5	0	0	1
94	P38217	5	1	0	0
41	P38217	5	1	0	0
61	P38217	3	1	0	0
70	P38217	5	0	1	0
41	P38217	5	1	0	0
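
The sampling step can be made repeatable and checked with a few lines of base R. The sketch below is an illustration under assumptions: the seed value is arbitrary and not part of the original analysis, and the variable names `n_rows`, `n_users` and `weights` are introduced here for readability.

```r
set.seed(42)  # assumed seed, only so that this sketch is repeatable

n_rows  <- 1000  # reviews in the dataset
n_users <- 100   # fake user IDs to distribute them over

# Random, unnormalised sampling weights; sample.int() rescales them internally.
weights <- runif(n_users, min = 0, max = 0.2)
users   <- sample.int(n_users, n_rows, replace = TRUE, prob = weights)

# Number of reviews assigned to each fake user ID.
ratings_per_user <- table(users)
print(summary(as.vector(ratings_per_user)))
```

Because the IDs are assigned independently of the product column, the same fake user can end up with several reviews of one product, so it may be worth checking for repeated (user, product) pairs before building the rating matrix.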

New Real Rating Matrix

## 5 x 5 sparse Matrix of class "dgCMatrix"
##     P38217 P382204 P382292 P382353 P382354
## 1        4       .       5       5       .
## 10       5       3       2       .       5
## 100      .       1       .       5       5
## 11       .       4       .       4       .
## 12       5       3       3       5       5

Number of Ratings per User with Fake Data

The following histogram shows the number of ratings per user. With the fake user IDs in place, users no longer have only one rating each.

Number of Products per Mean Rating

The following graph shows the number of products at each mean rating. 3 of the 12 products have a mean rating of 5, whereas 2 of the 12 have a mean rating below 3. This finding helps in choosing the “good rating” parameter of the evaluation scheme: so that the two products with a mean rating below 3 do not count as good recommendations, the good rating parameter is set to 3.
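
The per-product means behind this graph can be computed directly from a rating matrix. The following base R sketch uses a hypothetical 3 x 3 matrix with made-up product names (`prod_A` to `prod_C`) and an assumed threshold variable `good_rating`; it only illustrates the calculation, not the report's actual numbers.

```r
# Hypothetical ratings; NA means the user did not rate the product.
rating_matrix <- matrix(
  c( 5, NA,  2,
     5,  4, NA,
    NA,  3,  1),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("1", "2", "3"), c("prod_A", "prod_B", "prod_C"))
)

# Mean rating per product, ignoring missing ratings.
mean_rating <- colMeans(rating_matrix, na.rm = TRUE)

# Products whose mean falls below the chosen threshold.
good_rating <- 3
low_rated   <- names(mean_rating)[mean_rating < good_rating]

print(mean_rating)
print(low_rated)
```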

Heatmap

The heatmap is a useful way of visualizing preference of users for products. A closer look at the heatmap shows that most of the dark black boxes lie in the column of product 5, which means that most users rated product 5 favorably.

Evaluation Scheme

The following evaluation scheme splits the users 75/25: 75% are used for training and 25% for testing the recommender system.

As stated above, the “good rating” parameter is set to 3. In recommenderlab, goodRating is the threshold at which a withheld rating counts as a positive during evaluation, so a recommended product that the user actually rated below 3 is scored as a miss. This matters for the reliability of a recommender system: if it is rewarded for recommending products with low ratings, users will not see value in it.

The “given” parameter is set to 1, which means that one rating per test user is shown to the recommender and the remaining ratings are withheld for evaluation. Keeping this value low lets users with few ratings stay in the evaluation, since the objective is to cover as many user preferences as possible.

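library(recommenderlab)

# Binarised copy (ratings >= 3 become 1); it is not used by the scheme below
sephora_realrating_binarized <- binarize(sephora_realrating, minRating = 3)
# 75/25 split; 1 rating per test user is given, ratings >= 3 count as good
esSplit <- evaluationScheme(sephora_realrating, method = "split", train = 0.75, given = 1, goodRating = 3)

train <- getData(esSplit, "train")      # training users
known <- getData(esSplit, "known")      # test users: the given ratings
unknown <- getData(esSplit, "unknown")  # test users: the withheld ratings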
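
What the train, known and unknown sets contain can be illustrated without recommenderlab. The base R sketch below mimics a 75/25 user split with one given rating per test user on a hypothetical 8 x 4 matrix. It is a simplified stand-in for the idea, not recommenderlab's implementation, and the seed and all variable names are assumptions.

```r
set.seed(1)  # assumed seed for repeatability

# Hypothetical ratings for 8 users and 4 products; NA = not rated.
rating_matrix <- matrix(
  c( 5,  3, NA,  4,
     4, NA,  2,  5,
    NA,  5,  5,  1,
     3,  4, NA, NA,
     5, NA,  4,  4,
     2,  2,  5, NA,
    NA,  4,  3,  5,
     4,  5, NA,  3),
  nrow = 8, byrow = TRUE,
  dimnames = list(paste0("u", 1:8), paste0("p", 1:4))
)

train_share <- 0.75
given       <- 1

# Split users (rows), not individual ratings.
n_users   <- nrow(rating_matrix)
train_idx <- sample(n_users, size = round(train_share * n_users))
train     <- rating_matrix[train_idx, , drop = FALSE]
test      <- rating_matrix[-train_idx, , drop = FALSE]

# For each test user, reveal `given` ratings and withhold the rest.
known   <- test
unknown <- test
for (i in seq_len(nrow(test))) {
  rated  <- which(!is.na(test[i, ]))
  reveal <- rated[sample.int(length(rated), given)]
  known[i, setdiff(rated, reveal)] <- NA  # recommender sees only these
  unknown[i, reveal] <- NA                # evaluation compares against these
}
```

Every test user needs more than `given` ratings for this to leave something to evaluate against, which is one reason the single-review-per-user dataset could not be used as it was.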

Recommendation Methodology

Before designing a recommendation system for recommending makeup products, it is important to understand all the nuances of the industry:

Consumers switch brands

When a consumer switches to another brand, the products of the new brand vary in how well they suit the consumer’s skin type. The consumer then has to try a plethora of products before choosing one. While this is a problem for consumers, it is an opportunity for brands to attract customers. The recommender system will be sold to makeup brands and will enable them to recommend products to new customers according to their skin types.

New products due to innovation in the industry

Brands in the makeup industry stay relevant only as long as they innovate. Innovation is essential for growth, and it introduces new products. Strong marketing of new products attracts existing consumers to buy them; however, while this is an opportunity for brands, it can also end in a bad experience for consumers. To avoid that, the recommender system will run two algorithms side by side: the main one, and a second recommender designed to address bad experiences with new products. The second recommender will be purely user-based and will recommend products by skin type. A user-based recommender only recommends items that have been vetted by other users who like items similar to those the consumer likes.

Makeup is a popular gift item

Relatives and friends often buy makeup for loved ones. To ease the process of buying a gift, recommender systems at brands’ outlets will recommend suitable products by skin type.

Young consumers try their first makeup product as teenagers

Consumers who are buying makeup for the first time face the same problem as consumers who switch brands: they have a vast number of products to choose from, and choosing can be tiring. The main recommender system will ease the process by recommending products that suit their skin types.

This understanding of the industry calls for two recommendation systems:

Main Recommendation System

The main recommendation system has to recommend by skin type and address both brand switching and new consumers. An appropriate choice is therefore a hybrid of a popularity-based recommender and an item-based collaborative filtering (IBCF) recommender. The hybrid gives recommendations to people who have never bought makeup before, as well as to people who are looking for similar products after a brand switch.
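
The idea of a weighted hybrid can be sketched with two score vectors. The example below is a simplified illustration in base R and is not how recommenderlab combines its component recommenders internally; the scores and product names `p1` to `p4` are hypothetical, and the 0.65/0.35 weights are the ones used later in the report.

```r
# Hypothetical predicted scores for one customer from two recommenders.
# The item-based model has no score for p3 (for example, no similar items rated).
ibcf_scores    <- c(p1 = 4.2, p2 = 3.1, p3 = NA,  p4 = 4.8)
popular_scores <- c(p1 = 3.9, p2 = 4.5, p3 = 4.1, p4 = 3.0)
weights        <- c(ibcf = 0.65, popular = 0.35)

scores    <- rbind(ibcf_scores, popular_scores)
available <- !is.na(scores)

# Weighted mean over the components that produced a score. The length-2
# weight vector is recycled down the two rows of each column.
blended <- colSums(scores * weights, na.rm = TRUE) / colSums(available * weights)

# Top-2 recommendations by blended score.
top_n <- names(sort(blended, decreasing = TRUE))[1:2]
print(round(blended, 3))
print(top_n)
```

When the item-based component has nothing to say, as for a first-time buyer, the popularity score still produces a ranking, which is the motivation for combining the two.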

Second Recommendation System

The second recommendation system is designed solely to address bad experiences with new products that enter the market. To recommend only new products that similar users have already vetted, a user-based collaborative filtering (UBCF) recommender is used.
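
A minimal sketch of user-based collaborative filtering, written in base R, may help to show what "vetted by similar users" means in practice. It assumes cosine similarity over co-rated products and a similarity-weighted average of the neighbours' ratings; the users, products and the neighbourhood size of 2 are hypothetical, and recommenderlab's UBCF may differ in details such as normalisation.

```r
# Hypothetical ratings; "target" is the customer we recommend for.
rating_matrix <- matrix(
  c(5, 4, NA,  1,
    5, 5,  4, NA,
    1, 2,  5,  5,
    4, 5,  5,  2),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("target", "u1", "u2", "u3"), c("p1", "p2", "p3", "p4"))
)

# Cosine similarity computed on the products both users have rated.
cosine_sim <- function(a, b) {
  both <- !is.na(a) & !is.na(b)
  if (!any(both)) return(NA_real_)
  sum(a[both] * b[both]) / (sqrt(sum(a[both]^2)) * sqrt(sum(b[both]^2)))
}

target <- rating_matrix["target", ]
others <- rating_matrix[rownames(rating_matrix) != "target", , drop = FALSE]
sims   <- apply(others, 1, cosine_sim, b = target)

# Keep the nn most similar users.
nn         <- 2
neighbours <- names(sort(sims, decreasing = TRUE))[seq_len(nn)]

# Predict each unseen product from the neighbours who rated it.
unseen    <- names(target)[is.na(target)]
predicted <- sapply(unseen, function(p) {
  r  <- others[neighbours, p]
  w  <- sims[neighbours]
  ok <- !is.na(r)
  if (!any(ok)) return(NA_real_)  # nobody similar has tried it: no prediction
  sum(w[ok] * r[ok]) / sum(w[ok])
})
print(neighbours)
print(predicted)
```

A product that none of the neighbours has rated gets no prediction at all, so a user-based recommender can return fewer recommendations than requested, or none.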

Main Recommendation System

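recc <- NA
targeted_skin <- NA

# This algorithm only recommends within one skin type.
# data: rating matrix; metadata: review table with dummy skin-type columns;
# skin: name of the dummy column to filter on.
trainMyAlgorithm <- function(data, metadata, skin) {
  # Products reviewed by users with the chosen skin type (stored globally).
  # metadata has one row per review, so a product ID can appear here several
  # times; wrapping this in unique() would keep each product column only once.
  targeted_skin <<- metadata[metadata[, skin] == 1, "product"]
  filtered_data <- data[, targeted_skin]
  # Hybrid: 65% item-based CF (Pearson similarity), 35% popularity,
  # both trained on the first 69 rows (users) of the filtered training data
  recc <<- HybridRecommender(
    Recommender(filtered_data[1:69], method = "IBCF",
                param = list(method = "Pearson", k = 300)),
    Recommender(filtered_data[1:69], method = "POPULAR", param = NULL),
    weights = c(0.65, 0.35)
  )
}

# Recommend the top 2 products per user from the filtered product set
applyMyAlgorithm <- function(data) {
  filtered_data <- data[, targeted_skin]
  pre <- predict(recc, filtered_data, n = 2)
  return(as(pre, "list"))
}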
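
The skin-type filter used in trainMyAlgorithm can be looked at in isolation. The base R sketch below uses a hypothetical review table and rating matrix (products `A` to `C`, users `u1` and `u2`) to show how selecting products through a per-review table behaves, and what `unique()` changes. It is an illustration, not the report's data.

```r
# Hypothetical review table: one row per review, with a skin-type dummy column.
metadata <- data.frame(
  product        = c("A", "A", "B", "C", "C"),
  skin_type_oily = c(1, 0, 0, 1, 1),
  stringsAsFactors = FALSE
)

# Hypothetical rating matrix: users in rows, products in columns.
rating_matrix <- matrix(1:6, nrow = 2,
                        dimnames = list(c("u1", "u2"), c("A", "B", "C")))

# Products with at least one review from an oily-skin user.
with_repeats <- metadata[metadata[, "skin_type_oily"] == 1, "product"]
targeted     <- unique(with_repeats)

# Indexing with repeated names repeats the column; unique() avoids that.
repeated_cols <- rating_matrix[, with_repeats, drop = FALSE]
filtered      <- rating_matrix[, targeted, drop = FALSE]

print(colnames(repeated_cols))
print(colnames(filtered))
```

If a product column is repeated in the training data, the same product can occupy more than one slot in a top-N list, so deduplicating the product IDs before subsetting is likely the safer choice.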

Predictions

trainMyAlgorithm(train, sephora_reviews_edited, skin="skin_type_oily")

head(applyMyAlgorithm(known))

## $`1`
## [1] "P382355" "P382355"
## 
## $`12`
## [1] "P382355" "P382355"
## 
## $`13`
## [1] "P38217" "P38217"
## 
## $`14`
## [1] "P38217" "P38217"
## 
## $`17`
## [1] "P38217" "P38217"
## 
## $`21`
## [1] "P382355" "P382355"

Second Recommendation System

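recc <- NA
targeted_skin <- NA

# This algorithm only recommends within one skin type.
# data: rating matrix; metadata: review table with dummy skin-type columns;
# skin: name of the dummy column to filter on.
trainMyAlgorithm <- function(data, metadata, skin) {
  # Products reviewed by users with the chosen skin type (stored globally).
  # metadata has one row per review, so product IDs can repeat here.
  targeted_skin <<- metadata[metadata[, skin] == 1, "product"]
  filtered_data <- data[, targeted_skin]
  # User-based CF with a neighbourhood of the 40 most similar users,
  # trained on the first 69 rows (users) of the filtered training data
  recc <<- Recommender(filtered_data[1:69], method = "UBCF", param = list(nn = 40))
}

# Recommend up to 2 products per user from the filtered product set
applyMyAlgorithm <- function(data) {
  filtered_data <- data[, targeted_skin]
  pre <- predict(recc, filtered_data, n = 2)
  return(as(pre, "list"))
}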

Predictions

trainMyAlgorithm(train, sephora_reviews_edited, skin="skin_type_combination")

head(applyMyAlgorithm(known))

## $`1`
## [1] "P384342"
## 
## $`12`
## [1] "P384342"
## 
## $`13`
## character(0)
## 
## $`14`
## character(0)
## 
## $`17`
## character(0)
## 
## $`21`
## [1] "P384342"

Feedback Loop

Recommender systems require a strong feedback loop in order to provide valuable recommendations to users. Because this recommender system will be sold to makeup brands and installed at points of sale to help customers with their buying decisions, the feedback will be based on actual sales of the recommended products. If a customer buys the recommended product, the transaction counts as a good recommendation; otherwise, the recommender system treats it as a bad recommendation and learns from it. Since consumers try the product before buying, a purchase is a good indicator of a good recommendation.
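
One simple way to record such feedback is a log with one row per recommendation shown and a flag for whether it led to a purchase. The base R sketch below is an assumption about how that log could look; the data frame `feedback`, its columns and its values are hypothetical, and how the signal is fed back into training is left open.

```r
# Hypothetical point-of-sale log: one row per recommendation shown.
feedback <- data.frame(
  product   = c("P38217", "P38217", "P382355", "P382355", "P382355"),
  purchased = c(TRUE, FALSE, TRUE, TRUE, FALSE),
  stringsAsFactors = FALSE
)

# Share of recommendations that ended in a purchase, per product.
acceptance <- aggregate(purchased ~ product, data = feedback, FUN = mean)
print(acceptance)
```

Products with a persistently low acceptance rate are candidates for being down-weighted in later recommendations.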
