Introduction to Data Mining - PowerPoint PPT Presentation

Slides: 18
Provided by: Prefer414

Transcript and Presenter's Notes

Title: Introduction to Data Mining


1
Introduction to Data Mining
  • Jiang Li
  • Department of Computer Science & Information
    Technology
  • Austin Peay State University

2
Outline
  • Data Collected
  • Knowledge Discovery: An Iterative Process
  • Data Mining Examples
  • Data Mining Functions and Algorithms

3
Data Collected
  • Business
    • Wal-Mart
      • 20 million transactions a day
    • Mobil Oil Corporation
      • A 100-terabyte data warehouse
  • Science
    • The human genome database project
      • Gigabytes of data
    • NASA Earth Observing System (EOS)
      • 50 gigabytes of data per hour
  • Radio, Television, and Film Studios
    • Multimedia databases
  • WWW: the infinite resources
  • Email: huge digital libraries


4
Data vs. Knowledge
  • Technology is available to help us collect data
    • Bar codes, cameras, scanners, radars, satellites,
      etc.
  • Technology is available to help us store data
    • Databases, data warehouses, a variety of
      repositories
  • We are swamped by data that pours on us
  • We need to interpret this data in search of new
    knowledge

"We are drowning in information, but starving for
knowledge." (John Naisbitt)

Our need is to extract interesting knowledge
(rules, regularities, patterns, constraints) from
data in large collections.

5
Evolution of Database Technology
  • 1960s
    • Data collection, database creation (hierarchical
      and network models)
  • 1970s
    • Relational data model, relational DBMS
      implementation
  • 1980s
    • Ubiquitous RDBMS, advanced data models
      (extended-relational, object-oriented, deductive,
      etc.) and application-oriented DBMS (spatial,
      scientific, engineering, etc.)
  • 1990s
    • Data mining and data warehousing, multimedia
      databases, and Web-based database technology

6
Knowledge Discovery

7
Data Mining
  • In theory, data mining is a step in the knowledge
    discovery process. It is the extraction of
    implicit information from a large dataset.
  • In practice, data mining and knowledge discovery
    are becoming synonyms.
  • KDD: Knowledge Discovery and Data Mining

Notice the misnomer in "data mining". Shouldn't
it be "knowledge mining"?

8
Steps of a KDD Process
  • Learning the application domain
    • Relevant prior knowledge and goals of the
      application
  • Gathering and integrating data
  • Cleaning and preprocessing data (may take 60% of
    the effort!)
  • Reducing and projecting data
    • Finding useful features, dimensionality/variable
      reduction, etc.
  • Choosing mining functions and algorithms
    • Summarization, classification, regression,
      association, etc.
  • Data mining: searching for patterns of interest
  • Evaluating results
  • Interpretation: analysis of results
    • Visualization, alteration, removing redundant
      patterns, etc.
  • Using the discovered knowledge
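
The steps above can be sketched as a chain of small functions. This is only an illustration under stated assumptions: the records, the field names, the duplicate-dropping rule, and the `min_count` threshold are all hypothetical and do not come from the presentation.

```python
from collections import Counter

# Hypothetical raw point-of-sale records; None marks a missing value.
raw_records = [
    {"customer": "c1", "item": "butter", "price": 2.5},
    {"customer": "c1", "item": "bread", "price": 1.0},
    {"customer": "c2", "item": "bread", "price": None},  # dirty row
    {"customer": "c2", "item": "butter", "price": 2.5},
    {"customer": "c3", "item": "milk", "price": 1.2},
    {"customer": "c1", "item": "bread", "price": 1.0},   # exact duplicate
]

def clean(records):
    """Cleaning and preprocessing: drop rows with missing values and exact duplicates."""
    seen, out = set(), []
    for r in records:
        key = tuple(sorted(r.items()))
        if None in r.values() or key in seen:
            continue
        seen.add(key)
        out.append(r)
    return out

def reduce_and_project(records):
    """Reducing and projecting: keep only the attribute the mining step needs."""
    return [r["item"] for r in records]

def mine(items):
    """Mining function (summarization): count how often each item occurs."""
    return Counter(items)

def evaluate(counts, min_count=2):
    """Evaluating results: keep only patterns that meet a minimum frequency."""
    return {item: n for item, n in counts.items() if n >= min_count}

cleaned = clean(raw_records)
patterns = evaluate(mine(reduce_and_project(cleaned)))
print(patterns)
```

In a real project each stage would be far larger, and the process is iterative: results from the evaluation step often send the analyst back to cleaning or feature selection.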

9
Data Mining: On What Kind of Data?
  • Flat Files
  • Generic Data
  • Relational & Object-Relational Databases
  • Object-Oriented Databases
  • Multimedia Data
  • Text Databases
  • Audio, Image, and Video Databases
  • Business Data
  • Transactional Databases
  • Engineering Data
  • Spatial databases
  • Temporal and Time-series databases
  • WWW Data

10
Data Mining Examples
  • Data mining is primarily used today by companies
    with a strong consumer focus - retail, financial,
    communication, and marketing organizations.
  • It enables these companies to determine
    relationships among "internal" factors such as
    price, product positioning, or staff skills, and
    "external" factors such as economic indicators,
    competition, and customer demographics.
  • And, it enables them to determine the impact on
    sales, customer satisfaction, and corporate
    profits.
  • Finally, it enables them to "drill down" into
    summary information to view detailed transactional
    data.

11
Data Mining Examples
  • With data mining, a retailer could use
    point-of-sale records of customer purchases to
    send targeted promotions based on an individual's
    purchase history.
  • By mining demographic data from comment or
    warranty cards, the retailer could develop
    products and promotions to appeal to specific
    customer segments.
  • Blockbuster Entertainment mines its video rental
    history database to recommend rentals to
    individual customers.
  • American Express can suggest products to its
    cardholders based on analysis of their monthly
    expenditures.

12
Data Mining Examples
  • Wal-Mart is pioneering massive data mining to
    transform its supplier relationships.
  • Wal-Mart captures point-of-sale transactions from
    over 2,900 stores in 6 countries and continuously
    transmits this data to its massive 7.5-terabyte
    Teradata data warehouse.
  • Wal-Mart allows more than 3,500 suppliers to
    access data on their products and perform data
    analyses.
  • These suppliers use this data to identify
    customer buying patterns at the store display
    level.
  • They use this information to manage local store
    inventory and identify new merchandising
    opportunities.

13
Business Data Mining Examples
  • The NBA is exploring a data mining application
    that can be used in conjunction with image
    recordings of basketball games.
  • The Advanced Scout software analyzes the
    movements of players to help coaches orchestrate
    plays and strategies.
  • For example, an analysis of the play-by-play
    sheet of the game played between the New York
    Knicks and the Cleveland Cavaliers on January 6,
    1995 reveals that when Mark Price played the
    Guard position, John Williams attempted four jump
    shots and made each one!
  • A coach can automatically bring up the video
    clips showing each of the jump shots attempted by
    Williams with Price on the floor, without needing
    to comb through hours of video footage.
  • Those clips show a very successful pick-and-roll
    play in which Price draws the Knicks' defense and
    then finds Williams for an open jump shot.
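
The kind of query described above can be sketched as a filter over play-by-play records. The rows, field names, and clip identifiers below are made up for illustration; they are not the real game log and not the Advanced Scout data format.

```python
# Hypothetical play-by-play rows (invented; not the actual 1995 game data).
plays = [
    {"shooter": "Williams", "shot": "jump", "made": True, "guard": "Price", "clip": "clip_01"},
    {"shooter": "Williams", "shot": "jump", "made": True, "guard": "Price", "clip": "clip_02"},
    {"shooter": "Williams", "shot": "layup", "made": False, "guard": "Price", "clip": "clip_03"},
    {"shooter": "Williams", "shot": "jump", "made": False, "guard": "another guard", "clip": "clip_04"},
    {"shooter": "Williams", "shot": "jump", "made": True, "guard": "Price", "clip": "clip_05"},
    {"shooter": "Williams", "shot": "jump", "made": True, "guard": "Price", "clip": "clip_06"},
]

def jump_shots(plays, shooter, guard):
    """Select jump shots by `shooter` taken while `guard` was playing the guard position."""
    return [
        p for p in plays
        if p["shooter"] == shooter and p["shot"] == "jump" and p["guard"] == guard
    ]

selected = jump_shots(plays, "Williams", "Price")
made = sum(p["made"] for p in selected)
clips = [p["clip"] for p in selected]  # clip ids a coach could pull up directly
print(f"{made}/{len(selected)} jump shots made; clips: {clips}")
```

The interesting part of a mining tool is finding such conditions automatically; this sketch only shows how a pattern, once found, maps back to the matching video clips.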

14
Data Mining Functions and Algorithms
  • Association Rules
    • Data can be mined to identify associations.
    • The butter → bread rule is an example of
      associative mining.
    • To find rules like inside(x, city) → near(x,
      highway).
  • Classification and Prediction
    • Classify data based on the values in a
      classifying attribute, e.g.,
      • classify countries based on climate
      • classify cars based on gas mileage
    • Stored data is used to locate data in
      predetermined groups.
    • A restaurant chain could mine customer purchase
      data to determine when customers visit and what
      they typically order. This information could be
      used to increase traffic by having daily specials.
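
A rule such as butter → bread is usually scored by its support and confidence. The slide does not name these measures, so treat the sketch below as one common way to quantify an association; the baskets are hypothetical.

```python
# Hypothetical market baskets (made up for illustration).
baskets = [
    {"butter", "bread", "milk"},
    {"butter", "bread"},
    {"bread", "jam"},
    {"butter", "milk"},
    {"butter", "bread", "jam"},
]

def support(itemset, baskets):
    """Fraction of baskets that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= b for b in baskets) / len(baskets)

def confidence(antecedent, consequent, baskets):
    """Estimate of P(consequent | antecedent): support(A and C) / support(A)."""
    both = set(antecedent) | set(consequent)
    return support(both, baskets) / support(antecedent, baskets)

s = support({"butter", "bread"}, baskets)
c = confidence({"butter"}, {"bread"}, baskets)
print(f"butter -> bread: support={s:.2f}, confidence={c:.2f}")
```

With these baskets, butter and bread appear together in 3 of 5 baskets, and 3 of the 4 baskets containing butter also contain bread.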

15
Data Mining Functions and Algorithms
  • Clustering
    • Data items are grouped according to logical
      relationships or consumer preferences.
    • Data can be mined to identify market segments or
      consumer affinities.
    • To cluster houses to find distribution patterns.
  • Sequential patterns
    • Data is mined to anticipate behavior patterns and
      trends.
    • An outdoor equipment retailer could predict the
      likelihood of a backpack being purchased based on
      a consumer's purchase of sleeping bags and hiking
      shoes.
    • To find and characterize similar sequences and
      deviation data, e.g., stock analysis.
    • To find segment-wise or total cycles or periodic
      behaviors in time-related data.
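
As an illustration of the house-clustering example, here is a minimal k-means sketch. The slides do not name a clustering algorithm, so k-means is an assumption, as are the coordinates and the choice of starting centroids.

```python
import math

# Hypothetical house locations as (x, y) coordinates, made up for illustration.
houses = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]

def kmeans(points, centroids, iterations=10):
    """Plain k-means: assign each point to its nearest centroid, then recompute centroids."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Keep the old centroid if a cluster ends up empty.
        centroids = [
            tuple(sum(c) / len(cluster) for c in zip(*cluster)) if cluster else old
            for cluster, old in zip(clusters, centroids)
        ]
    return centroids, clusters

centroids, clusters = kmeans(houses, centroids=[houses[0], houses[3]])
for centre, members in zip(centroids, clusters):
    print(centre, members)
```

The starting centroids are fixed here to keep the run deterministic; practical implementations usually pick them randomly and repeat the run several times.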

16
Data Mining: Linear Classification
  • A simple linear classification boundary for the
    loan data set; the shaded region denotes the class
    "no loan".
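
A linear boundary of this kind reduces to checking the sign of a weighted sum. The figure's data is not in the transcript, so the two features (income and debt), the applicants, and the hand-picked weights below are all assumptions made for illustration.

```python
# Hypothetical applicants: (income, debt) in thousands; values are made up.
applicants = {
    "a": (60.0, 10.0),
    "b": (30.0, 25.0),
    "c": (45.0, 20.0),
    "d": (20.0, 2.0),
}

# Hand-picked (not learned) boundary: income - 2 * debt - 10 = 0.
W_INCOME, W_DEBT, BIAS = 1.0, -2.0, -10.0

def classify(income, debt):
    """Return 'loan' on the positive side of the line, 'no loan' otherwise."""
    score = W_INCOME * income + W_DEBT * debt + BIAS
    return "loan" if score > 0 else "no loan"

labels = {name: classify(*features) for name, features in applicants.items()}
print(labels)
```

In practice the weights would be fitted to labelled historical data, for example with a perceptron or logistic regression, rather than chosen by hand.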

17
Data Mining - Confluence of Multiple Disciplines