Using Coclustering for Predicting Movie Ratings in Netflix - PowerPoint PPT Presentation

1
Using Co-clustering for Predicting Movie Ratings
in Netflix
  • Tuyen Huynh
  • Duy Vu

2
Outline
  • Data analysis
  • Methods for predicting missing ratings
  • Experiment setup
  • Results
  • Conclusion

3
Data Analysis
  • Subset of Netflix dataset
  • 12326 movies, 9730 users, 2048499 ratings (only
    1.71% of the entries are known)
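The 1.71% figure is the density of the movie-by-user rating matrix: the number of known ratings divided by the number of cells. A quick arithmetic check using only the numbers on the slide:

```python
# Dataset size as reported on the slide
n_movies = 12326
n_users = 9730
n_ratings = 2048499

# Density = known ratings / total cells of the movie-by-user matrix
n_cells = n_movies * n_users
density = n_ratings / n_cells

print(f"{n_cells} cells, density = {density:.2%}")  # 119931980 cells, density = 1.71%
```

In other words, more than 98% of the matrix entries are missing, which is why the prediction methods below must work from a very sparse matrix.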

4
Data Analysis (cont.)

5
Data Analysis (cont.)

6
Data Analysis (cont.)

7
Co-clustering for Missing Value Prediction (MVP)
  • Co-clustering can be seen as a form of matrix
    approximation
  • Co-cluster the original matrix, i.e., cluster its
    rows and columns simultaneously
  • Compute summary statistics of the resulting
    co-clusters
  • Construct an approximation matrix from these
    summary statistics
  • Use the approximation matrix to predict the
    missing values
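A minimal NumPy sketch of these steps, under loudly stated assumptions: the summary statistic is the mean of the observed ratings in each co-cluster (only one of several approximation schemes in the Bregman co-clustering framework, not necessarily the one used in these slides), the distortion is squared Euclidean distance, and missing entries are stored as 0 alongside a boolean mask. The names `block_means`, `cocluster`, and `predict` are hypothetical. Rows and columns are reassigned alternately to whichever cluster minimizes squared error over the known entries.

```python
import numpy as np

def block_means(R, mask, rows, cols, k, l):
    """Mean of the observed ratings in each (row cluster, column cluster) block."""
    sums = np.zeros((k, l))
    counts = np.zeros((k, l))
    r_idx, c_idx = np.nonzero(mask)  # positions of known ratings
    np.add.at(sums, (rows[r_idx], cols[c_idx]), R[r_idx, c_idx])
    np.add.at(counts, (rows[r_idx], cols[c_idx]), 1)
    # Blocks with no observed rating fall back to the global mean
    return np.where(counts > 0, sums / np.maximum(counts, 1), R[mask].mean())

def cocluster(R, mask, k, l, n_iter=20, init=None, seed=0):
    """Alternating row/column reassignment under squared error (dense toy version)."""
    W = mask.astype(float)  # weight 1 for known ratings, 0 for missing ones
    if init is None:
        rng = np.random.default_rng(seed)
        rows = rng.integers(k, size=R.shape[0])  # random initial row clusters
        cols = rng.integers(l, size=R.shape[1])  # random initial column clusters
    else:
        rows, cols = (np.asarray(a).copy() for a in init)
    for _ in range(n_iter):
        M = block_means(R, mask, rows, cols, k, l)
        A = M[:, cols]  # (k, n): block value for each row cluster and column
        # cost_rows[i, g] = sum_j W[i, j] * (R[i, j] - M[g, cols[j]])**2
        cost_rows = (W[:, None, :] * (R[:, None, :] - A[None, :, :]) ** 2).sum(axis=2)
        rows = cost_rows.argmin(axis=1)
        M = block_means(R, mask, rows, cols, k, l)
        B = M[rows, :]  # (m, l): block value for each row and column cluster
        # cost_cols[j, h] = sum_i W[i, j] * (R[i, j] - M[rows[i], h])**2
        cost_cols = (W[:, :, None] * (R[:, :, None] - B[:, None, :]) ** 2).sum(axis=0)
        cols = cost_cols.argmin(axis=1)
    return rows, cols, block_means(R, mask, rows, cols, k, l)

def predict(rows, cols, M):
    """Approximation matrix: entry (i, j) gets the mean of its co-cluster."""
    return M[np.ix_(rows, cols)]

# Toy usage: two clear blocks, one missing rating at (1, 1) stored as 0
R = np.array([[5, 5, 1, 1], [5, 0, 1, 1], [1, 1, 5, 5], [1, 1, 5, 5]], dtype=float)
mask = R > 0
rows, cols, M = cocluster(R, mask, 2, 2, init=([0, 1, 1, 1], [0, 0, 1, 1]))
print(predict(rows, cols, M)[1, 1])  # 5.0
```

The deliberately wrong initial row labels are corrected after one pass, and the missing rating is predicted from its co-cluster mean. The broadcasted cost arrays have size rows × clusters × columns, so this dense version only suits small examples, not the full dataset.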
8
Co-clustering for MVP (cont.)
Banerjee et al. A generalized maximum entropy
approach to Bregman co-clustering and matrix
approximation.

9
Local prediction based on SVD
  • Use a truncated SVD of rank k to approximate
    the rating matrix R
  • This rank-k matrix is known to be the best
    rank-k approximation of R in the Frobenius
    norm, i.e., under squared error loss
    (Eckart-Young theorem)
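A minimal NumPy sketch of the rank-k approximation; the helper name `rank_k_approx` is hypothetical. NumPy's SVD needs a fully specified matrix, so in a rating setting the missing entries would have to be filled in first (the slide does not say how). The last lines check the optimality property numerically: the Frobenius error equals the square root of the sum of the squared discarded singular values.

```python
import numpy as np

def rank_k_approx(R, k):
    """Best rank-k approximation of a dense matrix R in Frobenius norm."""
    # Thin SVD: U is (m, r), s has r values, Vt is (r, n), with r = min(m, n)
    U, s, Vt = np.linalg.svd(R, full_matrices=False)
    # Keep only the k largest singular values and their singular vectors
    return U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

rng = np.random.default_rng(0)
R = rng.random((6, 5))  # random dense stand-in for a rating matrix
R2 = rank_k_approx(R, 2)

s = np.linalg.svd(R, compute_uv=False)
err = np.linalg.norm(R - R2, "fro")
tail = np.sqrt(np.sum(s[2:] ** 2))
print(np.isclose(err, tail))  # True
```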

10
Local prediction based on SVD (cont.)

11
Experiment Setup
  • The dataset is split into a training set (90% of
    each user's ratings) and a test set (the
    remaining 10%).
  • Use squared Euclidean distance and I-divergence
    (applicable because the rating matrix is
    nonnegative) as distortion measures.
  • Vary the number of co-clusters: 2x2, 5x5,
    10x10, 15x15, 20x20, 25x25.
  • Initialize the row and column clusters either
    randomly or with Graclus clustering results.
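A sketch of the per-user split and the two distortion measures, assuming ratings are stored as `(user, movie, rating)` tuples. The function names are hypothetical, and the slides do not state how 90% of a user's ratings was rounded or chosen, so `round` and a random permutation per user are assumptions. I-divergence is computed as the sum of x*log(x/y) - x + y, with 0*log(0) taken as 0.

```python
from collections import defaultdict

import numpy as np

def split_per_user(ratings, train_frac=0.9, seed=0):
    """Put roughly train_frac of each user's ratings in the training set."""
    rng = np.random.default_rng(seed)
    by_user = defaultdict(list)
    for rating in ratings:  # rating = (user, movie, value)
        by_user[rating[0]].append(rating)
    train, test = [], []
    for items in by_user.values():
        order = rng.permutation(len(items))  # random order within each user
        n_train = int(round(train_frac * len(items)))
        train.extend(items[i] for i in order[:n_train])
        test.extend(items[i] for i in order[n_train:])
    return train, test

def squared_euclidean(x, y):
    x, y = np.asarray(x, float), np.asarray(y, float)
    return float(np.sum((x - y) ** 2))

def i_divergence(x, y):
    """sum(x*log(x/y) - x + y); needs x >= 0 and y > 0, with 0*log(0) = 0."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    with np.errstate(divide="ignore", invalid="ignore"):
        term = np.where(x > 0, x * np.log(x / y), 0.0)
    return float(np.sum(term - x + y))

# Toy data: two users with 10 ratings each
ratings = [(u, m, 1 + (u + m) % 5) for u in range(2) for m in range(10)]
train, test = split_per_user(ratings)
print(len(train), len(test))  # 18 2
```

Splitting per user (rather than over all ratings at once) guarantees that every user with enough ratings appears in both the training and the test set.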

12
Evaluation Metrics
  • Root Mean Squared Error (RMSE)
  • Mean Absolute Error (MAE)
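Both metrics compare predicted and true ratings on the test set; because RMSE squares the errors, it penalizes large mistakes more heavily than MAE. A minimal NumPy sketch (function names are illustrative):

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean squared error."""
    err = np.asarray(y_true, float) - np.asarray(y_pred, float)
    return float(np.sqrt(np.mean(err ** 2)))

def mae(y_true, y_pred):
    """Mean absolute error."""
    err = np.asarray(y_true, float) - np.asarray(y_pred, float)
    return float(np.mean(np.abs(err)))

print(mae([3, 4, 5], [3, 3, 3]))   # 1.0
print(rmse([3, 4, 5], [3, 3, 3]))  # sqrt(5/3), about 1.291
```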

13
Results
  • Baseline: predict every missing rating with the
    average rating (3.6)
  • RMSE: 1.0834
  • MAE: 0.90881
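The baseline ignores users and movies entirely: every test rating is predicted as one constant. A sketch on made-up toy numbers (not the Netflix data), assuming the average is taken over the training ratings; `global_mean_baseline` is a hypothetical name.

```python
import numpy as np

def global_mean_baseline(train_values, n_test):
    """Predict every test rating as the mean of the training ratings."""
    return np.full(n_test, np.mean(train_values), dtype=float)

train_values = [4, 3, 5, 4, 2]      # toy training ratings (mean 3.6)
test_values = np.array([5.0, 3.0])  # toy held-out ratings
pred = global_mean_baseline(train_values, len(test_values))

# Evaluate the constant prediction with the two metrics
rmse = np.sqrt(np.mean((test_values - pred) ** 2))
mae = np.mean(np.abs(test_values - pred))
```

Any method worth using should beat this constant predictor on both metrics.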

14
Results (cont.)

15
Results (cont.)

16
Results (cont.)

17
Results (cont.)

18
Results (cont.)
  • Local SVD applied to the best co-clustering
    result (20x20 co-clusters, squared Euclidean
    distance, scheme 3).

19
Results (cont.)
  • Local SVD applied to the best co-clusters
    obtained with Graclus.
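The slides do not spell out how local SVD handles missing ratings inside a co-cluster. A minimal sketch under one possible assumption: each co-cluster block is extracted, its missing entries are filled with the block's mean observed rating, and a rank-k truncated SVD of the filled block supplies the predictions. `local_svd_predict` and the fill rule are hypothetical, and the toy cluster labels are given directly rather than produced by co-clustering or Graclus.

```python
import numpy as np

def local_svd_predict(R, mask, rows, cols, rank):
    """Fit a separate rank-`rank` SVD inside every (row cluster, column cluster) block."""
    R_hat = np.zeros(R.shape)
    global_mean = R[mask].mean()
    for g in np.unique(rows):
        ri = np.flatnonzero(rows == g)
        for h in np.unique(cols):
            ci = np.flatnonzero(cols == h)
            block = R[np.ix_(ri, ci)].astype(float)
            known = mask[np.ix_(ri, ci)]
            # Assumption: fill missing entries with the block's observed mean
            fill = block[known].mean() if known.any() else global_mean
            filled = np.where(known, block, fill)
            U, s, Vt = np.linalg.svd(filled, full_matrices=False)
            k = min(rank, len(s))
            R_hat[np.ix_(ri, ci)] = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]
    return R_hat

R = np.array([[5, 5, 1, 1], [5, 0, 1, 1], [1, 1, 5, 5], [1, 1, 5, 5]], dtype=float)
mask = R > 0                   # 0 marks the missing rating at (1, 1)
rows = np.array([0, 0, 1, 1])  # toy row-cluster labels
cols = np.array([0, 0, 1, 1])  # toy column-cluster labels
R_hat = local_svd_predict(R, mask, rows, cols, rank=1)
```

Fitting one small SVD per co-cluster lets each block have its own low-rank structure, instead of forcing a single global rank-k model on the whole matrix.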

20
Conclusion
  • With appropriate approximation schemes, the
    co-clustering method achieves strong prediction
    performance.
  • Applying local SVD to the best co-clusters
    obtained from co-clustering and from Graclus
    is successful.

21
  • Thank you!!!