Coms 573 Machine Learning Project: How much someone is going to love a movie

1
Coms 573 Machine Learning Project: How much
someone is going to love a movie
  • Hailin Tang

2
Contents
  • Goal of the project
  • Background
  • Experiment design and results
  • Discussion

3
Goal of the project
  • The goal of this project is to predict whether
    someone will love a movie based on how much they
    liked or disliked other movies by predicting a
    rating on a scale of 1-5 for a given movie
  • The ultimate goal is to help Netflix improve
    its current movie recommendation
    system, Cinematch

Movies related to a movie being recommended at
Netflix
4
Background
  • Several challenges relate to the Netflix data
    set
  • The data set is huge (the uncompressed full dataset
    is about 2 gigabytes)
  • More than 100 million ratings (training set, on a
    scale of 1-5 stars)
  • Over 480 thousand users
  • Nearly 18 thousand movie titles
  • The distribution of movies per user is skewed
  • The median number of ratings per user is 93
  • 10% of users rated 16 or fewer movies
  • Two users rated as many as 17,000 movies
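
The counts above are enough to estimate how sparse the user-movie rating matrix is. The sketch below uses the rounded figures from this slide, so the results are approximate; the variable names are illustrative only.

```python
# Rounded counts quoted on the slide (approximations, not exact figures).
num_ratings = 100_000_000   # "more than 100 million ratings"
num_users = 480_000         # "over 480 thousand users"
num_movies = 18_000         # "nearly 18 thousand movie titles"

# Every (user, movie) pair is one cell of the rating matrix.
cells = num_users * num_movies
density = num_ratings / cells        # fraction of cells that hold a rating
sparsity = 1.0 - density             # fraction of cells that are empty
mean_per_user = num_ratings / num_users

print(f"filled cells: {density:.2%}")
print(f"empty cells:  {sparsity:.2%}")
print(f"mean ratings per user: {mean_per_user:.0f}")
```

With these rounded inputs, roughly 1% of the cells are filled, and the mean number of ratings per user (about 208) sits well above the median of 93, which is what a skewed distribution would produce.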

5
Background
  • The ratings per movie are also skewed
  • 25% of the movies had 190 or fewer ratings
    associated with them
  • 5 of the movies are rated fewer than 10 times
  • User and movie attributes are lacking
  • Nearly 99 percent of the entries in the user-item
    matrix are zero (unrated)
  • The project focuses simply on the quality of the
    recommendation system, in terms of minimizing the
    error between the user rating and the predicted rating
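
The test results later in the deck are reported as RMSE (root mean squared error), which is one common way to measure the error between user ratings and predicted ratings. A minimal sketch follows; the function name and the toy ratings are illustrative, not taken from the project.

```python
import math

def rmse(actual, predicted):
    """Root mean squared error between two equal-length rating sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Toy example: four true ratings and four predictions (made-up values).
actual = [5, 3, 4, 1]
predicted = [4.0, 3.0, 5.0, 2.0]
print(rmse(actual, predicted))
```

Because the errors are squared before averaging, a few large misses cost more than many small ones.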

6
(No Transcript)
7
Experiment Designs
  • Simple method: averaging
  • Predict the rating of a movie for a user
    based on average statistics
  • Assign the overall average rating to all movies for
    all users
  • Calculate the average rating of the movies each
    user has rated, and assign it to the unrated movies
  • Calculate averages over all users for any
    specific type of movie, and assign them to the
    unrated movies of that type
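
The three averaging baselines above can be sketched as follows. This is an illustration under assumptions: the slide does not say what a "type" of movie is, so the `movie_type` mapping (for example, a genre label) is hypothetical, as are the function names and the toy data.

```python
from collections import defaultdict

def fit_averages(ratings, movie_type):
    """ratings: list of (user, movie, stars); movie_type: dict movie -> type label."""
    all_stars = []
    by_user = defaultdict(list)
    by_type = defaultdict(list)
    for user, movie, stars in ratings:
        all_stars.append(stars)
        by_user[user].append(stars)
        by_type[movie_type[movie]].append(stars)

    def mean(values):
        return sum(values) / len(values)

    return {
        "global": mean(all_stars),
        "user": {u: mean(v) for u, v in by_user.items()},
        "type": {t: mean(v) for t, v in by_type.items()},
    }

def predict(model, user, movie, movie_type, method="global"):
    """Predict a rating; fall back to the global average for unseen keys."""
    if method == "user":
        return model["user"].get(user, model["global"])
    if method == "type":
        return model["type"].get(movie_type.get(movie), model["global"])
    return model["global"]

# Toy data (made up for illustration).
ratings = [("u1", "m1", 5), ("u1", "m2", 3), ("u2", "m1", 4), ("u2", "m3", 2)]
movie_type = {"m1": "comedy", "m2": "drama", "m3": "drama"}
model = fit_averages(ratings, movie_type)

print(predict(model, "u1", "m3", movie_type, "global"))
print(predict(model, "u1", "m3", movie_type, "user"))
print(predict(model, "u1", "m3", movie_type, "type"))
```

Falling back to the global average keeps the baselines usable for users or movie types that never appear in the training ratings.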

8
Experiment Designs
  • The current winning solution was a linear combination
    of 107 sets of predictions
  • KNN: K-nearest neighbors
  • RBMs: Restricted Boltzmann Machines
  • LSI: Latent Semantic Indexing
  • SVD (factor models): singular value decomposition
  • Conclusion
  • A simple linear combination of three models
    (KNN, LSI and RBM) could also give fairly good
    results
  • Need a principled approach to optimizing the
    solution
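
One simple way to form a linear combination of several models' predictions is to fit the blending weights by least squares on held-out ratings. The sketch below assumes that approach; the slides do not say how the weights were actually chosen, and the three columns of predictions are made-up numbers standing in for hypothetical KNN, LSI and RBM outputs.

```python
import numpy as np

def fit_blend(predictions, targets):
    """Least-squares weights for combining model predictions.

    predictions: (n_samples, n_models) array, one column per model.
    targets:     (n_samples,) array of true ratings.
    """
    X = np.asarray(predictions, dtype=float)
    y = np.asarray(targets, dtype=float)
    weights, *_ = np.linalg.lstsq(X, y, rcond=None)
    return weights

def blend(predictions, weights):
    """Apply the blending weights to a matrix of model predictions."""
    return np.asarray(predictions, dtype=float) @ weights

# Made-up held-out predictions from three hypothetical models.
held_out_predictions = [
    [4.2, 3.8, 4.5],
    [2.9, 3.4, 2.5],
    [1.5, 2.2, 1.0],
    [4.8, 4.1, 5.0],
    [3.1, 3.0, 3.6],
]
targets = [4, 3, 1, 5, 3]

weights = fit_blend(held_out_predictions, targets)
print(weights)
print(blend(held_out_predictions, weights))
```

On the data used for fitting, the blend can never have a larger squared error than any single model, since putting all the weight on one model is itself a candidate solution. Whether that gain carries over to unseen ratings has to be checked separately.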

9
Experiment Designs
  • Maximum Margin Matrix Factorization for
    collaborative ranking
  • The idea is to take advantage of collaborative
    effects: rating patterns from other users are
    used to estimate ratings for the current user
  • It works without feature extraction
  • It can handle very large, sparse and very imbalanced
    datasets, and can deal with users who have very few
    ratings
  • Software package: Cofirank
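
To make the collaborative idea concrete, here is a small matrix factorization sketch. It is deliberately simplified: it minimizes a regularized squared error with stochastic gradient descent, not the maximum-margin objective named above, and it does not use or imitate the Cofirank package. All names, hyperparameters and ratings are illustrative assumptions.

```python
import numpy as np

def factorize(ratings, n_users, n_movies, rank=2, lr=0.05, reg=0.02,
              epochs=200, seed=0):
    """Learn user and movie factors from (user_idx, movie_idx, stars) triples."""
    rng = np.random.default_rng(seed)
    U = 0.1 * rng.standard_normal((n_users, rank))   # user factors
    M = 0.1 * rng.standard_normal((n_movies, rank))  # movie factors
    mu = float(np.mean([r for _, _, r in ratings]))  # global average rating
    for _ in range(epochs):
        for u, m, r in ratings:
            err = r - (mu + U[u] @ M[m])
            u_old = U[u].copy()  # use the pre-update user vector for M's step
            U[u] += lr * (err * M[m] - reg * U[u])
            M[m] += lr * (err * u_old - reg * M[m])
    return mu, U, M

def predict(mu, U, M, u, m):
    """Predicted rating, clipped to the 1-5 star scale."""
    return float(np.clip(mu + U[u] @ M[m], 1.0, 5.0))

# Toy ratings: users 0-1 prefer movies 0-1, users 2-3 prefer movies 2-3.
ratings = [
    (0, 0, 5), (0, 1, 4), (0, 2, 1),
    (1, 0, 4), (1, 1, 5), (1, 3, 1),
    (2, 2, 5), (2, 3, 4), (2, 0, 1),
    (3, 2, 4), (3, 3, 5), (3, 1, 2),
]
mu, U, M = factorize(ratings, n_users=4, n_movies=4)

def train_rmse(predict_fn):
    errors = [(r - predict_fn(u, m)) ** 2 for u, m, r in ratings]
    return float(np.sqrt(np.mean(errors)))

baseline_rmse = train_rmse(lambda u, m: mu)
model_rmse = train_rmse(lambda u, m: predict(mu, U, M, u, m))
print(baseline_rmse, model_rmse)

# User 0 never rated movie 3; the estimate comes from other users' patterns.
print(predict(mu, U, M, 0, 3))
```

No user or movie attributes are involved: the factors are learned only from who rated what, which is why this family of methods works without feature extraction.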

10
Experiment Designs
  • Feature extraction and Traditional Classifiers
  • Feature Extraction
  • User related features
  • Movie related features
  • User-movie pair features
  • Classifiers
  • K-nearest neighbor
  • Logistic Regression
  • Naïve Bayes Classifier

11
Feature extraction
User related features
  • Number of historic user ratings
  • Percentage of 1-star ratings of the user
  • Percentage of 5-star ratings of the user
  • Average rating of the user
  • Standard deviation of the user ratings

12
Feature extraction
Movie related features
  • Number of historic ratings received by the movie
  • Percentage of 1-star ratings received by the
    movie
  • Percentage of 5-star ratings received by the
    movie
  • Average rating received by the movie
  • Standard deviation of the ratings received by
    the movie
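
The user features and the movie features are the same five statistics, computed once per user and once per movie. A minimal sketch, assuming ratings arrive as `(user, movie, stars)` tuples; the function name, the feature keys and the choice of population standard deviation are assumptions made for illustration.

```python
import statistics
from collections import defaultdict

def rating_features(ratings, key_index):
    """Group (user, movie, stars) tuples by user (key_index=0) or movie (key_index=1)."""
    groups = defaultdict(list)
    for row in ratings:
        groups[row[key_index]].append(row[2])

    features = {}
    for key, stars in groups.items():
        n = len(stars)
        features[key] = {
            "count": n,
            "pct_1_star": 100.0 * stars.count(1) / n,
            "pct_5_star": 100.0 * stars.count(5) / n,
            "mean": statistics.fmean(stars),
            # Population standard deviation: defined even for a single rating.
            "std": statistics.pstdev(stars),
        }
    return features

# Toy ratings (made up for illustration).
ratings = [("u1", "m1", 5), ("u1", "m2", 1), ("u1", "m3", 3), ("u2", "m1", 5)]
user_features = rating_features(ratings, key_index=0)
movie_features = rating_features(ratings, key_index=1)

print(user_features["u1"])
print(movie_features["m1"])
```

For a given (user, movie) pair, the two feature dictionaries could be concatenated into one input vector for a classifier.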

13
Feature extraction
User-movie pair features
  • To include information on user and/or movies that
    behave similarly
  • Identify groups of users which might have similar
    behaviors and also groups of movies that were
    rated in similar ways

14
Test Results
  • Grand Prize: RMSE < 0.8563, 10% improvement
  • Current leading: RMSE 0.8596, 9.65% improvement
  • Logistic regression provides the most accurate
    results
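
Reading the improvement figures on this slide as percentages relative to a baseline RMSE, the two pairs of numbers can be checked against each other. The baseline is not stated on the slide, so the sketch below backs it out from the Grand Prize threshold; that derived value is an inference, not a quoted figure.

```python
def improvement_pct(baseline_rmse, rmse):
    """Relative RMSE reduction versus a baseline, in percent."""
    return 100.0 * (baseline_rmse - rmse) / baseline_rmse

grand_prize_rmse = 0.8563   # from the slide, stated as a 10% improvement
leading_rmse = 0.8596       # from the slide, stated as a 9.65% improvement

# Baseline implied by "0.8563 is a 10% improvement" (inferred, not quoted).
implied_baseline = grand_prize_rmse / (1.0 - 0.10)

print(round(implied_baseline, 4))
print(round(improvement_pct(implied_baseline, leading_rmse), 2))
```

The implied baseline comes out near 0.9514, and the leading score then works out to a 9.65% improvement, so the two figures on the slide are consistent with each other.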
15
Discussion
  • The key to winning is to build models that can
    predict accurately where there is sufficient data,
    without over-fitting in the absence of adequate
    data
  • Two main classes of algorithms can be applied
    to this project
  • Content-based analysis: similar movies are
    recommended by computing the similarity in term
    vectors between movies
  • Collaborative analysis: each movie is treated as
    a black box, and user interactions with the movie
    are used to compute the similarities
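
Both classes of algorithm can share the same similarity measure and differ only in what the vectors contain. The sketch below uses cosine similarity on sparse vectors stored as dictionaries; cosine is one common choice, not necessarily the one used in the project, and the movie names, terms and ratings are invented for illustration.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two sparse vectors given as dicts."""
    shared = set(a) & set(b)
    dot = sum(a[k] * b[k] for k in shared)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

# Content-based: each movie is a vector of term counts from its description.
terms = {
    "Movie A": {"space": 2, "alien": 1},
    "Movie B": {"space": 1, "alien": 1, "war": 1},
    "Movie C": {"romance": 2, "paris": 1},
}

# Collaborative: each movie is a vector of the stars users gave it.
stars = {
    "Movie A": {"u1": 5, "u2": 4},
    "Movie C": {"u1": 5, "u2": 4, "u3": 1},
}

print(cosine_similarity(terms["Movie A"], terms["Movie B"]))
print(cosine_similarity(terms["Movie A"], terms["Movie C"]))
print(cosine_similarity(stars["Movie A"], stars["Movie C"]))
```

In this toy data, Movie A and Movie C share no terms, so the content-based view sees them as unrelated, while the collaborative view rates them as similar because the same users rated both highly.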

16
Thank You!