Coms 573 Machine Learning Project: How much someone is going to love a movie

Slides: 17

Provided by: link9

Transcript and Presenter's Notes

Title: Coms 573 Machine Learning Project How much someone is going to love a movie

1
Coms 573 Machine Learning Project: How much
someone is going to love a movie

Hailin Tang

2
Contents

Goal of the project
Background
Experiment design and results
Discussion

3
Goal of the project

The goal of this project is to predict whether
someone will love a movie, based on how much they
liked or disliked other movies, by predicting a
rating on a scale of 1-5 for a given movie.
The ultimate goal is to help Netflix improve
its current movie recommendation
system, Cinematch.

Movies related to a movie being recommended at
Netflix

4
Background

Several challenges related to the Netflix data
set
A huge data set (the uncompressed full dataset is
about 2 gigabytes)
More than 100 million ratings (training set, on a
scale of 1-5 stars)
Over 480 thousand users
Nearly 18 thousand movie titles
The distribution of movies per user is skewed
The median number of ratings per user is 93
10% of users rated 16 or fewer movies
Two users rated as many as 17,000 movies

5
Background

The ratings per movie are also skewed
25% of the movies had 190 or fewer ratings
associated with them
5 of the movies are rated fewer than 10 times
The lack of user and movie attributes
Nearly 99 percent of the entries in the user-item
matrix are empty (zero)
The focus is simply on the quality of the
recommendation system, in terms of minimizing the
error between the user rating and the predicted rating
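
As a rough check on the sparsity figure, the fraction of empty user-movie cells can be estimated from the approximate counts quoted on these slides. The inputs below are rounded values, so the result is only indicative, and the function name is a hypothetical helper.

```python
# Approximate counts quoted on the slides (rounded, so the result is indicative only).
num_ratings = 100_000_000   # "more than 100 million ratings"
num_users = 480_000         # "over 480 thousand users"
num_movies = 18_000         # "nearly 18 thousand movie titles"


def empty_fraction(n_ratings: int, n_users: int, n_movies: int) -> float:
    """Fraction of user-movie cells that hold no rating."""
    total_cells = n_users * n_movies
    return 1.0 - n_ratings / total_cells


sparsity = empty_fraction(num_ratings, num_users, num_movies)
print(f"{sparsity:.1%} of user-movie pairs are unrated")
```

With these rounded inputs the estimate lands just under 99 percent, in line with the figure on the slide.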

7
Experiment Designs

Simple method: averaging
Predict the rating of a movie for a user
based on average statistics
Assign the overall average rating to all movies for all users
Calculate the average rating of the movies each
user has rated, and assign it to that user's unrated movies
Calculate averages over all users for any
specific type of movie, and assign them to the unrated
movies of that type
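
The averaging baselines above can be sketched in a few lines. This is a minimal illustration on made-up (user, movie, rating) triples, not the project's actual code. The slides do not say how a "type" of movie was defined, so grouping by movie ID stands in for it here; a genre label would be grouped the same way.

```python
from collections import defaultdict
from statistics import mean

# Toy (user, movie, rating) triples; IDs and values are made up for illustration.
ratings = [
    ("u1", "m1", 5), ("u1", "m2", 3),
    ("u2", "m1", 4), ("u2", "m3", 2),
    ("u3", "m2", 1),
]


def global_average(data):
    """Baseline 1: one average rating for every user and every movie."""
    return mean(r for _, _, r in data)


def grouped_average(data, key_index):
    """Average rating grouped by user (key_index=0) or by movie (key_index=1)."""
    groups = defaultdict(list)
    for row in data:
        groups[row[key_index]].append(row[2])
    return {key: mean(values) for key, values in groups.items()}


overall = global_average(ratings)
user_avg = grouped_average(ratings, 0)   # baseline 2: per-user average
movie_avg = grouped_average(ratings, 1)  # stand-in for a per-type average


def predict_from_user_average(user):
    """Use the user's own average, falling back to the global average for unseen users."""
    return user_avg.get(user, overall)
```

The fallback to the global average is one simple way to handle users with no rating history; other choices are possible.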

8
Experiment Designs

The current winning solution was a linear combination
of 107 sets of predictions
KNN: K-nearest neighbor
RBMs: Restricted Boltzmann Machines
LSI: Latent Semantic Indexing
SVD (factor models): singular value decomposition
Conclusion
A simple linear combination of three models
(K-NN, LSI and RBM) could also give fairly good
results
A principled approach to optimizing the
solution is needed
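
A linear combination of model predictions is just a weighted sum, per (user, movie) pair, of what each model predicts. The sketch below uses hypothetical predictions and equal weights chosen only for illustration; in practice the weights would normally be fitted on held-out ratings, for example by least squares.

```python
from math import sqrt


def rmse(predicted, actual):
    """Root mean squared error between two equal-length lists."""
    return sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))


def blend(prediction_sets, weights):
    """Weighted sum of several models' predictions for the same (user, movie) pairs."""
    return [
        sum(w * p for w, p in zip(weights, preds))
        for preds in zip(*prediction_sets)
    ]


# Hypothetical held-out ratings and three models' predictions for them.
actual = [4.0, 2.0, 5.0, 3.0]
knn_preds = [4.5, 2.5, 4.5, 3.5]
lsi_preds = [3.5, 1.5, 5.5, 2.5]
rbm_preds = [4.0, 2.0, 4.0, 3.0]

weights = [1 / 3, 1 / 3, 1 / 3]  # equal weights, an assumption for illustration
blended = blend([knn_preds, lsi_preds, rbm_preds], weights)

for name, preds in [("knn", knn_preds), ("lsi", lsi_preds),
                    ("rbm", rbm_preds), ("blend", blended)]:
    print(name, round(rmse(preds, actual), 4))
```

In this toy data the models err in different directions, so the blend has a lower RMSE than any single model. That is the effect a combination relies on, not a guarantee.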

9
Experiment Designs

Maximum Margin Matrix Factorization for
collaborative ranking
The idea is to take advantage of collaborative
effects: rating patterns from other users are
used to estimate ratings for the current user
It works without feature extraction
It can handle very large, sparse and very imbalanced
datasets and deal with users who have very few
ratings
Software package: Cofirank
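
To make the collaborative idea concrete, here is a minimal sketch of a plain regularized matrix factorization trained by stochastic gradient descent on squared error. It is a simpler stand-in, not Maximum Margin Matrix Factorization and not the Cofirank package's interface. All names, hyperparameters and the toy ratings are assumptions.

```python
import random
from math import sqrt


def predict(P, Q, u, i):
    """Predicted rating: dot product of user u's and item i's factor vectors."""
    return sum(pu * qi for pu, qi in zip(P[u], Q[i]))


def train_mf(ratings, n_users, n_items, rank=2, lr=0.02, reg=0.02, epochs=200, seed=0):
    """Fit factor vectors to observed (user, item, rating) triples by SGD."""
    rng = random.Random(seed)
    P = [[rng.uniform(0.0, 0.5) for _ in range(rank)] for _ in range(n_users)]
    Q = [[rng.uniform(0.0, 0.5) for _ in range(rank)] for _ in range(n_items)]
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - predict(P, Q, u, i)
            for k in range(rank):
                pu, qi = P[u][k], Q[i][k]
                # Gradient step on squared error with L2 regularization.
                P[u][k] += lr * (err * qi - reg * pu)
                Q[i][k] += lr * (err * pu - reg * qi)
    return P, Q


def rmse(P, Q, ratings):
    """Root mean squared error over the observed ratings."""
    return sqrt(sum((r - predict(P, Q, u, i)) ** 2 for u, i, r in ratings) / len(ratings))


# Toy observed ratings as (user index, movie index, stars); values are made up.
observed = [(0, 0, 5), (0, 1, 3), (1, 0, 4), (1, 2, 1),
            (2, 1, 2), (2, 2, 5), (3, 0, 1), (3, 2, 4)]

P0, Q0 = train_mf(observed, n_users=4, n_items=3, epochs=0)  # untrained factors
P, Q = train_mf(observed, n_users=4, n_items=3)
rmse_before = rmse(P0, Q0, observed)
rmse_after = rmse(P, Q, observed)
print(rmse_before, rmse_after)
```

Only observed ratings enter the update loop, which is why this family of methods copes with very sparse data and needs no hand-built features.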

10
Experiment Designs

Feature extraction and Traditional Classifiers
Feature Extraction
User related features
Movie related features
User-movie pair features
Classifiers
K-nearest neighbor
Logistic Regression
Naïve Bayes Classifier
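
As an illustration of how a traditional classifier consumes extracted features, the sketch below implements a small K-nearest-neighbor classifier that treats the star rating as the class label. The two features per row (user average, movie average) and all values are hypothetical, and the slides do not specify the distance measure, so Euclidean distance is an assumption.

```python
from collections import Counter
from math import dist


def knn_predict(train, query, k=3):
    """Predict a star rating as the majority label among the k nearest training rows."""
    neighbors = sorted(train, key=lambda item: dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


# Hypothetical rows: (user average rating, movie average rating) -> observed rating.
train = [
    ((4.5, 4.2), 5), ((4.4, 4.0), 5), ((4.0, 3.9), 4),
    ((2.0, 2.5), 2), ((1.8, 2.2), 2), ((2.2, 3.0), 3),
]

high = knn_predict(train, (4.3, 4.1), k=3)
low = knn_predict(train, (2.0, 2.4), k=3)
print(high, low)
```

Logistic regression and Naïve Bayes would take the same feature rows and labels as input; only the model fitted to them differs.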

11
Feature extraction
User related features

Number of historic user ratings
Percentage of 1-star ratings of the user
Percentage of 5-star ratings of the user
Average rating of the user
Standard deviation of the user ratings

12
Feature extraction
Movie related features

number of historic ratings received by the movie
percentage of 1-star ratings received by the
movie
percentage of 5-star ratings received by the
movie
average rating received by the movie
standard deviation of the ratings received by the
movie
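
The five statistics listed for users and for movies have the same shape, so one helper can compute both: pass it a user's rating history or the ratings a movie has received. This is a minimal sketch with hypothetical names. The slides do not say whether the population or sample standard deviation was used, so the population form is an assumption.

```python
from statistics import mean, pstdev


def rating_features(stars):
    """Summary features for a non-empty list of 1-5 star ratings."""
    n = len(stars)
    return {
        "count": n,
        "pct_1_star": stars.count(1) / n,
        "pct_5_star": stars.count(5) / n,
        "average": mean(stars),
        "std_dev": pstdev(stars),  # population form; an assumption
    }


features = rating_features([5, 5, 1, 3])  # hypothetical rating history
print(features)
```

An empty rating list would need separate handling, since every statistic here is undefined for it.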

13
Feature extraction
User-movie pair features

To include information on user and/or movies that
behave similarly
Identify groups of users which might have similar
behaviors and also groups of movies that were
rated in similar ways
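
One common way to find users who behave similarly is to correlate their ratings on the movies they have both rated. The slides do not say which similarity measure or grouping method was used, so the Pearson correlation below is an assumption, shown on made-up data.

```python
from math import sqrt


def pearson_similarity(ratings_a, ratings_b):
    """Pearson correlation over co-rated movies; 0.0 when it is undefined."""
    common = sorted(set(ratings_a) & set(ratings_b))
    if len(common) < 2:
        return 0.0
    a = [ratings_a[m] for m in common]
    b = [ratings_b[m] for m in common]
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / sqrt(var_a * var_b)


# Hypothetical users, each a {movie id: stars} mapping.
alice = {"m1": 5, "m2": 3, "m3": 1}
bob = {"m1": 4, "m2": 3, "m3": 2, "m4": 5}
carol = {"m1": 1, "m2": 3, "m3": 5}

sim_alice_bob = pearson_similarity(alice, bob)
sim_alice_carol = pearson_similarity(alice, carol)
print(sim_alice_bob, sim_alice_carol)
```

Swapping the roles of users and movies gives a movie-to-movie similarity in the same way, and either score could feed a grouping step or serve directly as a pair feature.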

14
Test Results
Grand Prize: RMSE < 0.8563, 10% improvement
Current leading RMSE: 0.8596, 9.65% improvement
Logistic regression provides the most
accurate results
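
The improvement percentages are relative reductions in RMSE against a baseline that the slide does not state. The sketch below assumes the baseline is Cinematch's published quiz-set RMSE of 0.9514; that assumed value reproduces both percentages quoted above.

```python
from math import sqrt


def rmse(predicted, actual):
    """Root mean squared error between predicted and actual ratings."""
    return sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))


def improvement_pct(baseline_rmse, new_rmse):
    """Relative RMSE reduction versus a baseline, in percent."""
    return 100.0 * (baseline_rmse - new_rmse) / baseline_rmse


# Assumption: Cinematch's quiz-set RMSE; the slide itself does not give the baseline.
CINEMATCH_RMSE = 0.9514

grand_prize = improvement_pct(CINEMATCH_RMSE, 0.8563)
leader = improvement_pct(CINEMATCH_RMSE, 0.8596)
print(f"grand prize threshold: {grand_prize:.2f}%")
print(f"current leader:        {leader:.2f}%")
```
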
15
Discussion

The key to winning is to build models that can
predict accurately where there is sufficient data,
without over-fitting in the absence of adequate
data
Two main classes of algorithms can be applied
to this project
Content-based analysis: similar movies are
recommended by computing the similarity in term
vectors between various movies
Collaborative analysis: each movie is treated as a
black box, and user interactions with the movie
are used to compute the similarities
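
For the content-based case, similarity between term vectors is often measured with the cosine of the angle between them. The slides do not name the measure, so cosine similarity is an assumption here, and the term lists are made up.

```python
from collections import Counter
from math import sqrt


def cosine_similarity(terms_a, terms_b):
    """Cosine similarity between two term-count vectors built from term lists."""
    counts_a, counts_b = Counter(terms_a), Counter(terms_b)
    dot = sum(counts_a[t] * counts_b[t] for t in counts_a.keys() & counts_b.keys())
    norm_a = sqrt(sum(c * c for c in counts_a.values()))
    norm_b = sqrt(sum(c * c for c in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical descriptive terms for three movies.
space_film = ["space", "alien", "ship", "space"]
alien_film = ["alien", "space", "planet"]
romance_film = ["love", "wedding", "paris"]

sim_related = cosine_similarity(space_film, alien_film)
sim_unrelated = cosine_similarity(space_film, romance_film)
print(sim_related, sim_unrelated)
```

The collaborative case uses the same kind of similarity computation, but over vectors of user ratings rather than descriptive terms, which is why it needs no knowledge of a movie's content.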

16
Thank You!
