Introduction of Data Mining and Association Rules - PowerPoint PPT Presentation

About This Presentation
Title:

Introduction of Data Mining and Association Rules

Description:

The automated extraction of hidden predictive information from databases. Allows users to analyze large databases to solve business decision problems. ...

Slides: 15
Provided by: ruiz3

Transcript and Presenter's Notes



1
Introduction of Data Mining and Association Rules
  • CS157, Spring 2009
  • Instructor: Dr. Sin-Min Lee
  • Student: Dongyi Jia

2
What is data mining?
  • The automated extraction of hidden predictive
    information from databases.
  • Allows users to analyze large databases to solve
    business decision problems.
  • An extension of statistics, with a few artificial
    intelligence and machine learning twists thrown
    in.
  • Attempts to discover rules and patterns from
    data.

3
Data Mining - On What Kind of Data
  • In principle, data mining should be applicable to
    any kind of information repository:
  • relational databases
  • data warehouses
  • transactional and advanced databases
  • flat files
  • the World Wide Web

4
Data Mining Functionalities - What Kinds of
Patterns Can Be Mined?
  • Association Analysis
  • Classification and Prediction
  • Cluster Analysis
  • Evolution Analysis

5
Applications of data mining
  • Some applications require some sort of prediction.
  • For example, when a person applies for a
    credit card, the credit-card company wants to
    predict whether the person is a good credit risk.
  • Others look for associations.
  • For example, if a customer buys a book, an
    online bookstore may suggest other associated
    books.

6
Association Rule Discovery
  • Task: discovering association rules among items
    in a transaction database.
  • How are association rules mined from a large
    database?
  • 1. Find all frequent itemsets: each of these
    itemsets must occur at least as frequently as a
    predetermined minimum support count.
  • 2. Generate strong association rules from the
    frequent itemsets: these rules must satisfy
    minimum support and minimum confidence.
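
The two steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it enumerates candidate itemsets by brute force, which is practical only for a handful of items, and the function names, the bread/milk/butter transactions, and the threshold values are all hypothetical rather than taken from the slides.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support_count):
    """Step 1: map each itemset that meets the minimum support count to its count."""
    items = sorted(set().union(*transactions))
    frequent = {}
    for size in range(1, len(items) + 1):
        found_any = False
        for candidate in combinations(items, size):
            candidate = frozenset(candidate)
            count = sum(1 for t in transactions if candidate <= t)
            if count >= min_support_count:
                frequent[candidate] = count
                found_any = True
        if not found_any:
            break  # no frequent itemset of this size, so no larger one exists
    return frequent


def strong_rules(frequent, min_confidence):
    """Step 2: list (antecedent, consequent, confidence) for rules meeting min_confidence.

    Every rule built from a frequent itemset already meets minimum support.
    """
    rules = []
    for itemset, count in frequent.items():
        for size in range(1, len(itemset)):
            for antecedent in combinations(sorted(itemset), size):
                antecedent = frozenset(antecedent)
                # Every subset of a frequent itemset is frequent, so this lookup succeeds.
                confidence = count / frequent[antecedent]
                if confidence >= min_confidence:
                    rules.append((antecedent, itemset - antecedent, confidence))
    return rules


# Hypothetical data and thresholds, chosen only for illustration.
transactions = [
    {"bread", "milk"},
    {"bread", "milk", "butter"},
    {"bread"},
    {"milk", "butter"},
]
frequent = frequent_itemsets(transactions, min_support_count=2)
rules = strong_rules(frequent, min_confidence=0.7)
for antecedent, consequent, confidence in rules:
    print(sorted(antecedent), "=>", sorted(consequent), confidence)
```

With these made-up transactions, five itemsets reach the support count of 2, and only the rule butter => milk clears the 0.7 confidence threshold.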

7
Association Rules (cont.)
  • Retail shops are often interested in associations
    between items that people buy.
  • Someone who buys bread is quite likely also to
    buy milk.
  • association rule: bread ⇒ milk
  • A person who bought the book Database System
    Concepts is quite likely also to buy the book
    Operating System Concepts.
  • association rule: DSC ⇒ OSC

8
Association Rules (cont.)
  • Two numbers
  • Support is a measure of what fraction of the
    population satisfies both the antecedent and the
    consequent of the rule.
  • Confidence is a measure of how often the
    consequent is true when the antecedent is true.

9
Association Rules (cont.)
  • Let I = {i1, i2, ..., im} be the total set of items
  • D is a set of transactions
  • d is one transaction, consisting of a set
    of items
  • d ⊆ I
  • Association rule:
  • X ⇒ Y, where X ⊂ I, Y ⊂ I and X ∩ Y = ∅
  • support = (# of transactions containing X ∪ Y) / |D|
  • confidence = (# of transactions containing X ∪ Y) /
    (# of transactions containing X)

10
Example
  • Example of transaction data:
  • CD player, music CD, music book
  • CD player, music CD
  • music CD, music book
  • CD player
  • I = {CD player, music CD, music book}
  • |D| = 4
  • # of transactions containing both CD player and
    music CD = 2
  • # of transactions containing CD player = 3
  • CD player ⇒ music CD (support = 2/4, confidence = 2/3)
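
The arithmetic in this example can be checked with a short Python sketch. The helper names (count_containing, support, confidence) are assumptions made for illustration, and the item names are spelled "music CD" and "music book" here; only the four transactions and the rule come from the example.

```python
from fractions import Fraction

# The four transactions from the example.
transactions = [
    {"CD player", "music CD", "music book"},
    {"CD player", "music CD"},
    {"music CD", "music book"},
    {"CD player"},
]


def count_containing(itemset, transactions):
    """Number of transactions that contain every item in itemset."""
    return sum(1 for t in transactions if itemset <= t)


def support(x, y, transactions):
    """Fraction of all transactions that contain both X and Y."""
    return Fraction(count_containing(x | y, transactions), len(transactions))


def confidence(x, y, transactions):
    """Fraction of the transactions containing X that also contain Y.

    Assumes X occurs in at least one transaction; otherwise the division fails.
    """
    return Fraction(count_containing(x | y, transactions),
                    count_containing(x, transactions))


x, y = {"CD player"}, {"music CD"}
rule_support = support(x, y, transactions)
rule_confidence = confidence(x, y, transactions)
print(rule_support, rule_confidence)  # 1/2 2/3
```

Fraction keeps the results exact, so 2/4 is shown in lowest terms as 1/2.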

11
Association Rules (cont.)
  • Rule support and confidence reflect the
    usefulness and certainty of discovered rules.
  • A support of 50% for the association rule means
    that in 50% of all the transactions under analysis,
    a CD player and a music CD are purchased together.
  • A confidence of 67% means that 67% of the
    customers who purchased a CD player also bought
    a music CD.

12
Strong Association Rule
  • User sets support and confidence thresholds.
  • Rules above support threshold have LARGE support.
  • Rules above confidence threshold have HIGH
    confidence.
  • Rules satisfying both are said to be STRONG.
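
As a rough sketch, the strong-rule test is a pair of threshold comparisons. The threshold values below are hypothetical, the function name is an assumption, and treating a rule that exactly meets a threshold as passing is also an assumption. The support and confidence figures are computed from the four CD-player transactions used earlier in the presentation.

```python
def is_strong(support, confidence, min_support, min_confidence):
    """A rule is strong when it meets both user-set thresholds (ties count as passing)."""
    return support >= min_support and confidence >= min_confidence


# (rule, support, confidence), computed from the four example transactions.
candidate_rules = [
    ("CD player => music CD", 2 / 4, 2 / 3),
    ("music CD => music book", 2 / 4, 2 / 3),
    ("CD player => music book", 1 / 4, 1 / 3),
]

# Hypothetical thresholds chosen by the user.
MIN_SUPPORT = 0.4
MIN_CONFIDENCE = 0.6

strong = [
    rule
    for rule, sup, conf in candidate_rules
    if is_strong(sup, conf, MIN_SUPPORT, MIN_CONFIDENCE)
]
print(strong)
```

With these thresholds the first two rules are strong, while CD player => music book fails both tests.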

13
References
  • Professor Lee's lectures
  • http://www.cs.sjsu.edu/lee/cs157b/cs157b.html
  • Rui Zhao, SJSU
  • http://www.cs.sjsu.edu/lee/cs157b/cs157b.html
  • Jiawei Han, Micheline Kamber
  • Data Mining: Concepts and Techniques
  • Morgan Kaufmann Publishers

14
Thank you !