Recently I helped a friend who is starting a video streaming venture lay out a roadmap for developing his video content recommendation engine. Since I got good feedback, I thought I would share it with you. You can apply the same principles to any recommendation problem, irrespective of the product.
So let’s assume you are also trying to build a video streaming company that will compete with the Netflixes of the world. We have to come up with a complete roadmap of how the recommendation engine will be built, from day 1 until you are a billion-dollar company.
Since we are a new business whose job is to recommend movies, we want to get started quickly.
The problems are:
o All our users are new. That is, we don’t have any history for individual users.
o Since our company is also new, we have not yet learned what kinds of movies people of different ages, genders, etc. prefer to watch, so we cannot recommend on that basis either.
So, while we collect more data, let’s start with the Level 1 recommendation strategy.
· Level 1: Popularity. Products are recommended purely on the basis of how popular each one is. This first and simplest method recommends the most popular movies irrespective of user characteristics. It can be deployed when you don’t have any history to work with, you don’t know anything about your users, you are not much concerned about personalization, or you want to start quickly with something while you develop the other approaches.
Pros:
1. Simple and easy to implement.
2. Can work with shallow data.
Cons:
1. Lacks personalization.
So what we will do for our startup is collect the best-rated movies from IMDb’s public data and recommend them to everyone.
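As a rough illustration of Level 1, here is a minimal Python sketch that ranks a catalogue by average rating. The `Movie` record, the sample titles and numbers, and the `min_votes` cut-off are all assumptions made for the example; they are not taken from any real dataset, and the vote threshold is simply one common way to keep a handful of enthusiastic ratings from pushing an obscure title to the top.

```python
from typing import NamedTuple


class Movie(NamedTuple):
    title: str
    rating: float  # average rating, e.g. on a 1-10 scale
    votes: int     # number of ratings behind that average


def most_popular(movies, top_n=3, min_votes=1000):
    """Return the top_n best-rated titles, ignoring sparsely rated movies."""
    eligible = [m for m in movies if m.votes >= min_votes]
    # Sort by rating first; use the vote count to break ties.
    ranked = sorted(eligible, key=lambda m: (m.rating, m.votes), reverse=True)
    return [m.title for m in ranked[:top_n]]


# Made-up catalogue, for illustration only.
catalogue = [
    Movie("Movie A", 8.9, 250_000),
    Movie("Movie B", 9.6, 40),       # high rating, but too few votes
    Movie("Movie C", 8.1, 900_000),
    Movie("Movie D", 7.4, 120_000),
]

print(most_popular(catalogue, top_n=2))
```

Every user gets the same list, which is exactly the lack of personalization noted in the cons.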
We keep doing this until we have collected enough data on what kinds of movies different types of people watch. Then we want some level of personalization, so let’s move to Level 2.
Now a few days have passed for our startup and, thanks to the good work of our marketing team, a good number of people have signed up and watched some movies. With this data in place we see trends like:
Female, Age – 30-40 -> Likes -> {Romantic Comedies, Julia Roberts}
Male, Age – 30-40 -> Likes -> {Action, Arnold Schwarzenegger}
Male, Age – 35-45 -> Likes -> {Animated from Pixar Production}
And not just that. We can also find trends like:
On weekend afternoons, comedies and romances are watched more, while thrillers and horror movies are watched more at night.
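Trends like these can be pulled out of a plain watch log by counting. The sketch below is one possible way to do it in Python; the log format `(gender, age, genre)`, the ten-year age bands and the sample rows are assumptions made for the example.

```python
from collections import Counter, defaultdict


def age_band(age, width=10):
    """Map an age to a band label such as '30-39'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def top_genres_by_segment(watch_log, top_n=2):
    """Count watched genres per (gender, age band) and keep the most frequent."""
    counts = defaultdict(Counter)
    for gender, age, genre in watch_log:
        counts[(gender, age_band(age))][genre] += 1
    return {
        segment: [genre for genre, _ in genre_counts.most_common(top_n)]
        for segment, genre_counts in counts.items()
    }


# Made-up watch log, for illustration only.
watch_log = [
    ("F", 34, "Romantic Comedy"),
    ("F", 38, "Romantic Comedy"),
    ("F", 31, "Drama"),
    ("M", 33, "Action"),
    ("M", 36, "Action"),
    ("M", 39, "Thriller"),
]

trends = top_genres_by_segment(watch_log, top_n=1)
print(trends)
```

The same counting works for time-based trends if the segment key is swapped for something like day of week and part of day.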
So the next time a user logs in, we can recommend movies based on the trends that match their demographics, like those above. But we are not dropping the Level 1 strategy completely. We will now run both Level 1 and Level 2, giving some weight to each.
· Level 2: Classification model. Here we use features of both the products and the users to make our recommendations. The classification model takes in features of the user, features of that user’s past purchases, features of the product we are thinking of recommending, and potentially many other features, such as the time of day.
Pros:
1. It can be very personalized.
2. Can capture context.
3. Works well even with very limited historical data.
Cons:
1. Works poorly when features are insufficient or poor in quality.
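To make the idea of a classification model concrete, here is a small sketch of logistic regression written in plain Python. It is only one of many possible classifiers, and everything specific in it is an assumption made for the example: the three features (share of the user's history in the movie's genre, how popular the genre is in the user's demographic segment, and whether it is evening), the training rows and the learning settings.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights, bias, x):
    """Estimated probability that the user watches the movie."""
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))


def train_logistic(rows, labels, epochs=500, lr=0.5):
    """Fit logistic regression with plain stochastic gradient descent."""
    weights = [0.0] * len(rows[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(rows, labels):
            error = predict(weights, bias, x) - y
            weights = [w - lr * error * xi for w, xi in zip(weights, x)]
            bias -= lr * error
    return weights, bias


# Hypothetical features per (user, movie) pair:
# [share of user's history in this genre, genre popularity in user's segment, is_evening]
rows = [
    [0.8, 0.7, 1], [0.6, 0.5, 0], [0.7, 0.2, 1],   # watched
    [0.1, 0.6, 0], [0.0, 0.3, 1], [0.2, 0.1, 0],   # skipped
]
labels = [1, 1, 1, 0, 0, 0]

weights, bias = train_logistic(rows, labels)
p_like = predict(weights, bias, [0.9, 0.7, 1])
p_skip = predict(weights, bias, [0.05, 0.6, 0])
print(round(p_like, 2), round(p_skip, 2))
```

Candidate movies can then be sorted by predicted probability for the logged-in user. The cons above show up directly here: if the features are missing or noisy, the model has nothing useful to learn from.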
Now at our startup we have reached a pretty advanced level of personalization and everyone is happy. But let’s look at a scenario like this:
Male, Age – 30-40 -> Likes -> Action, Arnold Schwarzenegger
Male, Age – 35-45 -> Likes -> Animated from Pixar Production
We can assume the Pixar movies are mostly watched by children using their parents’ accounts. Problems like this can be partly solved by asking users more questions during sign-up. But there are only so many questions you can ask; the rest you have to learn from individual viewing patterns. In cases like these, we can implement Level 3. Here we analyze trends such as: people who watch movies from Pixar also tend to like movies from Disney.
So we may find tightly coupled trends like the ones below, which we can use for our recommendations.
People who like {Pixar} -> also like -> {Disney, Animation, Ice Age Franchise}
People who like {Avengers} -> also like -> {Marvel Studio, Batman Franchise}
Again, we will run Level 1, Level 2 and Level 3 together, with combined weights, to get the best possible solution.
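One simple way to combine the levels is a weighted sum of the scores each level gives a movie. The sketch below assumes every level produces a score between 0 and 1 per movie; the level names, the scores and the weights are made up for the example and would need tuning against real viewing data.

```python
def blend_scores(level_scores, level_weights):
    """Rank movies by the weighted sum of their per-level scores."""
    total_weight = sum(level_weights.values())
    combined = {}
    for level, scores in level_scores.items():
        share = level_weights[level] / total_weight  # normalise the weights
        for movie, score in scores.items():
            combined[movie] = combined.get(movie, 0.0) + share * score
    return sorted(combined, key=combined.get, reverse=True)


# Made-up scores from each level for one user, for illustration only.
level_scores = {
    "popularity":    {"Movie A": 0.9, "Movie B": 0.6, "Movie C": 0.3},
    "classifier":    {"Movie A": 0.2, "Movie B": 0.7, "Movie C": 0.8},
    "collaborative": {"Movie A": 0.1, "Movie B": 0.5, "Movie C": 0.9},
}
level_weights = {"popularity": 0.2, "classifier": 0.4, "collaborative": 0.4}

print(blend_scores(level_scores, level_weights))
```

A brand-new company might start with nearly all the weight on popularity and shift it toward the other levels as data accumulates.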
· Level 3: Collaborative filtering. This brings us to the idea of co-occurrence of purchases. The recommendation is built on information like: if a person bought this item, they are probably also interested in some other item, because we have seen many examples in the past of people buying that pair of items. Not necessarily at the same time, but over the course of their purchase histories.
Pros:
1. Works even when the individual user has only a short history, because it draws on what other users have watched.
2. Handles missing or poor-quality features well.
Cons:
1. Complex to build.
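The co-occurrence idea can be sketched in a few lines of Python. This is a deliberately simple item-to-item version, not a production collaborative filtering system: it counts how often two titles appear in the same user's history and recommends the titles that co-occur most with what the user has already watched. The viewing histories are made up for the example.

```python
from collections import Counter, defaultdict
from itertools import combinations


def build_cooccurrence(histories):
    """Count, for each pair of titles, how many users watched both."""
    co = defaultdict(Counter)
    for titles in histories.values():
        for a, b in combinations(sorted(set(titles)), 2):
            co[a][b] += 1
            co[b][a] += 1
    return co


def recommend(co, watched, top_n=2):
    """Suggest unwatched titles that co-occur most with the watched ones."""
    scores = Counter()
    for title in watched:
        for other, count in co.get(title, {}).items():
            if other not in watched:
                scores[other] += count
    return [title for title, _ in scores.most_common(top_n)]


# Made-up viewing histories, for illustration only.
histories = {
    "u1": ["Toy Story", "Frozen", "Ice Age"],
    "u2": ["Toy Story", "Frozen"],
    "u3": ["Toy Story", "Frozen", "The Avengers"],
    "u4": ["The Avengers", "Iron Man"],
    "u5": ["The Avengers", "Iron Man", "Thor"],
}

co = build_cooccurrence(histories)
print(recommend(co, {"Toy Story"}, top_n=1))
print(recommend(co, {"The Avengers"}, top_n=1))
```

Raw counts favor titles that are popular with everyone, so a real system would usually normalize them; and the pair counts grow quickly with catalogue size, which is part of why this level is more complex to build.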
Congratulations, you are now successfully running a unicorn video streaming company with a great recommendation engine.
Please reach out to me if you need any technical clarification.
Until next time.