Welcome! MSCIT 521: Knowledge Discovery and Data Mining - PowerPoint Presentation



1
Welcome! MSCIT 521: Knowledge Discovery and Data Mining
  • Qiang Yang
  • Hong Kong University of Science and Technology
  • qyang@cs.ust.hk
  • http://www.cs.ust.hk

2
Data Mining: An Example
  • KDD Cup tasks from past years
  • 2007: Predict whether a user will rate a given
    movie; predict how many users will rate a movie
  • 2006: Predict whether a patient has cancer from
    medical images
  • 2005: Given a web query (e.g., "Apple"), predict
    its categories (e.g., IT, Food)
  • 1998: Given a person, predict whether this
    person will donate money
  • In general, we wish to start from input data and
    its output, build a model, and apply the model
    to future data
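The build-then-apply loop above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the donor-style data, its two features, and the helper names `build_stump` and `apply_stump` are hypothetical, and the model is a one-split "decision stump" chosen only for brevity, not the approach used in any KDD Cup.

```python
# Minimal build-then-apply sketch. Data, features, and helper names are hypothetical.

def build_stump(rows, labels):
    """Build a one-split decision stump: choose the feature index and threshold
    that misclassify the fewest training rows. This simple version only
    considers the rule 'predict 1 when feature >= threshold'."""
    best = None  # (errors, feature, threshold)
    for f in range(len(rows[0])):
        for t in sorted({r[f] for r in rows}):
            errors = sum((r[f] >= t) != bool(y) for r, y in zip(rows, labels))
            if best is None or errors < best[0]:
                best = (errors, f, t)
    _, feature, threshold = best
    return feature, threshold

def apply_stump(model, row):
    """Apply a built stump to one new row."""
    feature, threshold = model
    return 1 if row[feature] >= threshold else 0

# Past data: [number of past gifts, years since last gift]; label 1 = donated.
past_rows = [[5, 1], [3, 2], [0, 6], [1, 5], [4, 1], [0, 8]]
past_labels = [1, 1, 0, 0, 1, 0]

model = build_stump(past_rows, past_labels)                 # build model
future_rows = [[6, 1], [0, 7]]
predictions = [apply_stump(model, r) for r in future_rows]  # apply model
print(model, predictions)  # → (0, 3) [1, 0]
```

The stump learns "predict a donation when past gifts >= 3" from the past rows and then labels the unseen rows; any of the model families listed later in the deck could replace it in the same build/apply structure.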

3
Data Mining: Convergence of Three Technologies
4
Definition: Predictive Model
  • A black box that makes predictions about the
    future based on information from the past and
    present
  • A large number of inputs is usually available

5
How are Models Built and Used?
  • High-level view

6
The Data Mining Process
7
What Does the Real World Look Like?
8
Predictive Models Include
  • Decision Trees
  • Nearest Neighbor Classification
  • Neural Networks
  • Rule Induction
  • Clustering
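Of the models listed, nearest neighbor classification is perhaps the easiest to sketch. The Python version below is a minimal illustration under stated assumptions: Euclidean distance, a majority vote over k = 3 neighbors, made-up 2-D points, and a hypothetical helper name `knn_predict`; it is not tied to any particular library.

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training examples.
    `train` is a list of (feature_vector, label) pairs; distance is Euclidean."""
    # Sort training examples by distance to the query and keep the k closest.
    neighbors = sorted(train, key=lambda ex: math.dist(ex[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    # most_common(1) returns [(label, count)]; on a tie, the label first seen
    # among the distance-sorted neighbors (i.e. the nearer one) wins.
    return votes.most_common(1)[0][0]

# Hypothetical 2-D points with class labels "A" and "B".
train = [((1.0, 1.0), "A"), ((1.5, 2.0), "A"), ((2.0, 1.0), "A"),
         ((6.0, 6.0), "B"), ((7.0, 5.5), "B"), ((6.5, 7.0), "B")]

print(knn_predict(train, (1.2, 1.4)))  # → A
print(knn_predict(train, (6.2, 6.1)))  # → B
```

There is no separate training step: the stored examples are the model, and all the work happens at prediction time, which is why nearest neighbor methods get slower as the data set grows.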

9
Course Description
  • Data Mining and Knowledge Discovery
  • Focus areas
  • Focus 1: Theoretical foundations in pattern
    recognition and machine learning
  • Algorithms: How do they differ? Where do they
    apply?
  • Focus 2: Broad survey of recent research
  • Focus 3: Hands-on application of algorithms to
    KDD data sets

10
Topic 1: Foundations
  • Classification algorithms
  • Clustering algorithms
  • Association algorithms
  • Sequential data mining
  • Novel applications: the Web, customer
    relationship management (CRM), and biological
    data
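To make "association algorithms" concrete, the two measures most association-rule miners rank rules by, support and confidence, can be computed directly. This is a minimal sketch, assuming hypothetical market-basket transactions and hypothetical helper names `support` and `confidence`; a full miner such as Apriori would also enumerate candidate itemsets, which is omitted here.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Confidence of the rule antecedent -> consequent:
    support(antecedent ∪ consequent) / support(antecedent)."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)

# Hypothetical market-basket transactions, one set of items per customer visit.
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]

# Rule {milk, diapers} -> {beer}
print(support(transactions, {"milk", "diapers", "beer"}))                    # → 0.4
print(round(confidence(transactions, {"milk", "diapers"}, {"beer"}), 2))  # → 0.67
```

Here 2 of 5 baskets contain all three items (support 0.4), and 2 of the 3 baskets containing milk and diapers also contain beer (confidence about 0.67).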

11
Topic 2: Hands-On
  • Apply learned algorithms to selected data sets
  • Homework assignments
  • Become familiar with existing software packages
    and libraries
  • In-class workshops
  • Programming assignments

12
Important Sites
  • Instructor web site:
    http://www.cse.ust.hk/qyang/521
  • TA: Kaixiang Mo
  • Assignment hand-in (online): csit5210@ust.hk
  • Course discussion site: check the course web site

13
Prerequisites
  • Statistics and probability: helpful, but not
    required
  • Pattern recognition: helpful, but not required
  • Databases: knowledge of SQL and relational
    algebra helps, but is not required
  • At least one programming language, such as Java,
    C, Perl, or Matlab
  • You will need to be able to read Java libraries

14
Grading
  • Grade distribution (percent of final grade)
  • Assignments: 20%
  • Course project: 20%
  • Exams: 60% (Midterm 20%, Final 40%)
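As a worked check of the grade distribution, a weighted total can be computed as below. This is a sketch under assumptions not stated on the slide: each component is scored on a 0-100 scale and combined linearly by the listed percentages, and the student scores and the helper name `course_grade` are hypothetical; the actual grading policy may differ.

```python
# Weights from the grading slide, in percent of the final grade.
WEIGHTS = {"assignments": 20, "project": 20, "midterm": 20, "final": 40}

def course_grade(scores):
    """Combine component scores (each assumed to be on a 0-100 scale)
    into a weighted total on the same 0-100 scale."""
    assert sum(WEIGHTS.values()) == 100  # the weights cover the whole grade
    return sum(WEIGHTS[part] * scores[part] for part in WEIGHTS) / 100

# Hypothetical student scores.
scores = {"assignments": 90, "project": 80, "midterm": 70, "final": 75}
print(course_grade(scores))  # → 78.0
```

With these numbers, the two exams contribute 44 of the 78 points, reflecting their 60% combined weight.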

15
More Info
  • Textbooks (for reference only)
  • Introduction to Data Mining by Pang-Ning Tan,
    Michael Steinbach, and Vipin Kumar. Pearson
    International Edition, 2005.
  • Data Mining by Ian Witten and Eibe Frank.
    (Available on Google Books.)
  • Data Mining: Concepts and Techniques by Jiawei
    Han and Micheline Kamber. Morgan Kaufmann
    Publishers.
  • Available in our bookstore