Data Mining: Concepts - PowerPoint PPT Presentation

About This Presentation
Title:

Data Mining: Concepts

Description:

Data Mining: Concepts & Techniques Motivation: Necessity is the Mother of Invention Data explosion problem Automated data collection tools and mature database ... – PowerPoint PPT presentation

Slides: 30
Provided by: LeeYu8

Transcript and Presenter's Notes


1
Data Mining: Concepts & Techniques
2
Motivation: Necessity is the Mother of Invention
  • Data explosion problem
  • Automated data collection tools and mature
    database technology lead to tremendous amounts of
    data stored in databases, data warehouses and
    other information repositories
  • We are drowning in data, but starving for
    knowledge!
  • Solution: data warehousing and data mining
  • Data warehousing and on-line analytical
    processing
  • Extraction of interesting knowledge (rules,
    regularities, patterns, constraints) from data
    in large databases

3
Evolution of Database Technology
6
What Is Data Mining?
  • Data mining (knowledge discovery in databases)
  • Extraction of interesting (non-trivial, implicit,
    previously unknown and potentially useful)
    information or patterns from data in large
    databases
  • Alternative names and their inside stories
  • Data mining: a misnomer?
  • Knowledge discovery (mining) in databases (KDD),
    knowledge extraction, data/pattern analysis, data
    archeology, data dredging, information
    harvesting, business intelligence, etc.
  • What is not data mining?
  • (Deductive) query processing.
  • Expert systems or small ML/statistical programs

7
Data Mining: A KDD Process
Data mining: the core of the knowledge discovery
process
8
Steps of a KDD Process
  • Learning the application domain
  • relevant prior knowledge and goals of application
  • Creating a target data set: data selection
  • Data cleaning and preprocessing (may take 60% of
    effort!)
  • Data reduction and transformation
  • Find useful features, dimensionality/variable
    reduction, invariant representation.
  • Choosing functions of data mining
  • summarization, classification, regression,
    association, clustering.
  • Choosing the mining algorithm(s)
  • Data mining: search for patterns of interest
  • Pattern evaluation and knowledge presentation
  • visualization, transformation, removing redundant
    patterns, etc.
  • Use of discovered knowledge

9
Knowledge Discovery Process
  • The whole process of extraction of implicit,
    previously unknown and potentially useful
    knowledge from a large database
  • It includes data selection, cleaning, enrichment,
    coding, data mining, and reporting
  • Data mining is the key stage of the knowledge
    discovery process
  • The process of finding the desired information
    in a large database

10
Knowledge Discovery Process
  • Example: the database of a magazine publisher
    which sells five types of magazines on cars,
    houses, sports, music and comics
  • Data mining
  • Find interesting categorical properties
  • Questions
  • What is the profile of a reader of a car
    magazine?
  • Is there any correlation between an interest in
    cars and an interest in comics?
  • The knowledge discovery process consists of six
    stages

11
Data Selection
  • Select the information about people who have
    subscribed to a magazine

12
Cleaning
  • Pollution: typing errors, clients moving from one
    place to another without notifying a change of
    address, and people giving incorrect information
    about themselves
  • Pattern Recognition Algorithms
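
One way pattern matching could catch the typing errors mentioned above is approximate string comparison. The sketch below is only an illustration under stated assumptions: the field names, the sample records and the 0.85 threshold are hypothetical and do not come from the presentation.

```python
from difflib import SequenceMatcher


def name_similarity(a, b):
    """Return a similarity ratio in [0, 1] for two names, ignoring case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def is_probable_duplicate(rec_a, rec_b, threshold=0.85):
    """Flag two records as likely the same client.

    Assumption: same address plus a near-identical name suggests a typing
    error. The threshold is a hypothetical value that would need tuning.
    """
    same_address = rec_a["address"] == rec_b["address"]
    similar = name_similarity(rec_a["name"], rec_b["name"]) >= threshold
    return same_address and similar


# Hypothetical sample records.
a = {"name": "Johnson", "address": "1 Downing Street"}
b = {"name": "Jonson", "address": "1 Downing Street"}
c = {"name": "King", "address": "1 Downing Street"}
```

A check like this only covers typing errors. Clients who moved without notifying the publisher have different addresses and would need a different matching key.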

13
Cleaning
  • Lack of domain consistency
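
A domain consistency check can be sketched as a set of plausibility rules per field. The field names and the limits used here (a maximum age of 110 years, no dates in the future) are assumptions chosen for illustration, not values from the presentation.

```python
from datetime import date


def domain_problems(rec, today):
    """Return a list of domain-consistency problems found in one record."""
    problems = []
    birth = rec.get("birth_date")
    if birth is None:
        problems.append("birth_date missing")
    elif birth > today or today.year - birth.year > 110:
        # Hypothetical rule: nobody is born in the future or is older than 110.
        problems.append("birth_date implausible")
    purchase = rec.get("purchase_date")
    if purchase is not None and purchase > today:
        problems.append("purchase_date in the future")
    return problems


# Hypothetical records: one with a placeholder-like birth date, one clean.
today = date(1996, 1, 1)
suspect = {"birth_date": date(1850, 1, 1), "purchase_date": date(1994, 5, 1)}
clean = {"birth_date": date(1965, 3, 2), "purchase_date": date(1994, 5, 1)}
```

Records that fail such checks can then be corrected, marked as unknown, or excluded, depending on the mining goal.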

14
Enrichment
  • Need extra information about the clients
    consisting of date of birth, income, amount of
    credit, and whether or not an individual owns a
    car or a house

15
Enrichment
  • The new information needs to be easily joined to
    the existing client records
  • Extract more knowledge
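
Joining the extra information to the client records could look like the following minimal sketch. It assumes both sources share a client number; the key name `client_id` and all sample values are hypothetical.

```python
# Hypothetical client records held by the publisher.
clients = [
    {"client_id": 1, "name": "Johnson", "magazine": "car"},
    {"client_id": 2, "name": "Clinton", "magazine": "music"},
    {"client_id": 3, "name": "King", "magazine": "comics"},
]

# Hypothetical extra information, keyed by the same client number.
extra = {
    1: {"income": 18500, "credit": 17800, "car_owner": True, "house_owner": False},
    2: {"income": 36000, "credit": 26600, "car_owner": True, "house_owner": True},
}


def enrich(clients, extra):
    """Left join: keep every client and attach extra fields when available."""
    enriched = []
    for client in clients:
        row = dict(client)  # copy so the original record is not modified
        row.update(extra.get(client["client_id"], {}))
        enriched.append(row)
    return enriched


enriched = enrich(clients, extra)
```

Clients without a match keep only their original fields, which is why the later coding stage has to decide what to do with incomplete records.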

16
Coding
  • We select only those records that have enough
    information to be of value (row)
  • Project the fields in which we are interested
    (column)
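
The row selection and column projection described above can be sketched as follows. Which fields count as "enough information" and which columns are of interest are assumptions made for this example.

```python
# Hypothetical choices: fields a record must have, and fields to keep.
REQUIRED = ("birth_date", "income", "credit")
KEEP = ("client_id", "birth_date", "income", "credit", "car_owner", "magazine")


def select_and_project(records, required=REQUIRED, keep=KEEP):
    """Drop rows that lack required fields, then keep only chosen columns."""
    result = []
    for rec in records:
        # Row selection: skip records missing any required value.
        if any(rec.get(field) is None for field in required):
            continue
        # Column projection: keep only the fields of interest.
        result.append({field: rec.get(field) for field in keep})
    return result


# Hypothetical input: the second record lacks income and credit.
records = [
    {"client_id": 1, "name": "Johnson", "birth_date": "1961-04-13",
     "income": 18500, "credit": 17800, "car_owner": True, "magazine": "car"},
    {"client_id": 3, "name": "King", "birth_date": "1966-08-20",
     "income": None, "credit": None, "car_owner": False, "magazine": "comics"},
]
selected = select_and_project(records)
```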

17
Coding
  • Code the information which is too detailed
  • Address to region
  • Birth date to age
  • Divide income by 1000
  • Divide credit by 1000
  • Convert cars yes-no to 1-0
  • Convert purchase date to month numbers starting
    from 1990
  • The way in which we code the information will
    determine the type of patterns we find
  • Coding has to be performed repeatedly in order to
    get the best results
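
The coding steps listed above can be sketched as one function per record. Several details are assumptions: the city-to-region lookup is a hypothetical stand-in for the address-to-region mapping, and month numbering is taken to start with January 1990 as month 1.

```python
from datetime import date

# Hypothetical lookup standing in for the address-to-region mapping.
REGION_BY_CITY = {"Springfield": 1, "Shelbyville": 2}


def age_on(birth, today):
    """Age in whole years on a given day."""
    had_birthday = (today.month, today.day) >= (birth.month, birth.day)
    return today.year - birth.year - (0 if had_birthday else 1)


def months_since_1990(d):
    """Month number, assuming January 1990 is month 1."""
    return (d.year - 1990) * 12 + d.month


def code_record(rec, today):
    """Apply the coding steps to one enriched client record."""
    return {
        "region": REGION_BY_CITY.get(rec["city"]),
        "age": age_on(rec["birth_date"], today),
        "income": rec["income"] / 1000,
        "credit": rec["credit"] / 1000,
        "car_owner": 1 if rec["car_owner"] else 0,
        "purchase_month": months_since_1990(rec["purchase_date"]),
    }


# Hypothetical record and reference date.
raw = {"city": "Springfield", "birth_date": date(1970, 6, 15),
       "income": 18500, "credit": 17800, "car_owner": True,
       "purchase_date": date(1991, 3, 15)}
coded = code_record(raw, today=date(1996, 1, 1))
```

Because the coding determines which patterns can be found, parameters such as the region lookup or the income scale are the parts most likely to change between iterations.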

18
Coding
  • The way in which we code the information will
    determine the type of patterns we find

19
Coding
  • We are interested in the relationships between
    readers of different magazines
  • Perform flattening operation
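
A flattening operation can be sketched as turning one row per subscription into one row per client, with a 0/1 field for each magazine type. The input layout (pairs of client number and magazine) is an assumption; the five magazine types are those of the example publisher.

```python
MAGAZINES = ("car", "house", "sports", "music", "comics")


def flatten(subscriptions):
    """Turn (client_id, magazine) pairs into one 0/1 row per client."""
    flat = {}
    for client_id, magazine in subscriptions:
        # Create an all-zero row the first time a client is seen.
        row = flat.setdefault(client_id, {m: 0 for m in MAGAZINES})
        row[magazine] = 1
    return flat


# Hypothetical subscriptions: client 1 reads car and comics magazines.
subscriptions = [(1, "car"), (1, "comics"), (2, "music")]
flat = flatten(subscriptions)
```

With one row per client, relationships between readers of different magazines become comparisons between columns of the same record.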

20
Data mining
  • We may find the following rules
  • A customer with credit > 13000 and aged between
    22 and 31 who has subscribed to a comics magazine at time
    T will very likely subscribe to a car magazine
    five years later
  • The number of house magazines sold to customers
    with credit between 12000 and 31000 living in
    region 4 is increasing
  • A customer with credit between 5000 and 10000 who
    reads a comics magazine will very likely become a
    customer with credit between 12000 and 31000 who
    reads a sports and a house magazine after 12 years
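
How strongly the data backs a rule of this kind can be measured with support and confidence. The sketch below is a simplification under stated assumptions: it ignores the time lag in the rules above, uses credit already coded in thousands, and runs on made-up rows.

```python
def rule_stats(rows, antecedent, consequent):
    """Return (support, confidence) for the rule antecedent -> consequent."""
    matching = [r for r in rows if antecedent(r)]
    both = [r for r in matching if consequent(r)]
    support = len(both) / len(rows) if rows else 0.0
    confidence = len(both) / len(matching) if matching else 0.0
    return support, confidence


# Hypothetical flattened and coded rows (credit in thousands).
rows = [
    {"age": 25, "credit": 15.0, "comics": 1, "car": 1},
    {"age": 30, "credit": 14.0, "comics": 1, "car": 1},
    {"age": 28, "credit": 20.0, "comics": 1, "car": 0},
    {"age": 45, "credit": 9.0, "comics": 0, "car": 1},
]


def young_comics_reader(r):
    """Credit above 13 (thousand), aged 22 to 31, subscribed to comics."""
    return r["credit"] > 13 and 22 <= r["age"] <= 31 and r["comics"] == 1


def reads_car_magazine(r):
    return r["car"] == 1


support, confidence = rule_stats(rows, young_comics_reader, reads_car_magazine)
```

Support is the share of all customers for which the whole rule holds, and confidence is the share of matching customers who also satisfy the conclusion.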

21
Knowledge Discovery Process
22
Business-Question-Driven Process
23
Data Mining and Business Intelligence
24
Architecture of a Typical Data Mining System
25
Data Mining: On What Kind of Data?
  • Relational databases
  • Data warehouses
  • Transactional databases
  • Advanced DB and information repositories
  • Object-oriented and object-relational databases
  • Spatial databases
  • Time-series data and temporal data
  • Text databases and multimedia databases
  • Heterogeneous databases
  • WWW
