DATA, TEXT, - PowerPoint PPT Presentation

Slides: 24
Provided by: Jud4154
Learn more at: http://www.csun.edu
Transcript and Presenter's Notes

1
Chapter 7
Pages 304-309, 311, Sections 7.3, 7.5, 7.6
  • DATA, TEXT,
  • AND WEB MINING

2
Data mining
  • A process that uses statistical, mathematical,
    artificial intelligence and machine-learning
    techniques to extract and identify new knowledge
    from large databases
  • Recognizes the untapped value of data in large
    databases
  • You may unexpectedly strike it rich by
    understanding relationships among data

3
Example
Task: Find the best route to cover the territory
4
Challenge of finding relationships in large
databases
5
Connect equal elevation points to make a contour
map
The dark vertical line shows the best route to
cross the territory without falling off a cliff.
6
Once relationships are discovered, they can be
used for prediction
7
Uses of Data Mining-1
  • Classification
  • Identify the attribute of interest (e.g., you want
    to classify who is likely to pay late)
  • Examine all other attribute values of the customer
    in the data warehouse and locate the one that is
    most related to the attribute of interest (e.g.,
    monthly income level)
  • Mining Algorithm
  • The most common algorithm used for
    classification is the decision tree
  • The Gini index, used in developing decision
    trees, helps to determine where to place the
    split between two classes (e.g., at what income
    level)
  • (see example on page 316)
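
A minimal sketch of how a Gini-based split point might be found, under stated assumptions: the function names `gini` and `best_split` and the income data are hypothetical and are not taken from the textbook example. The sketch tries each midpoint between adjacent income values and keeps the one with the lowest weighted Gini impurity.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels: 1 - sum(p_k^2)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(values, labels):
    """Return (threshold, weighted_gini) for the best split 'value <= threshold'."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best = (None, float("inf"))
    for i in range(1, n):
        lo, hi = pairs[i - 1][0], pairs[i][0]
        if lo == hi:
            continue  # identical values cannot be separated by a threshold
        threshold = (lo + hi) / 2
        left = [lab for v, lab in pairs if v <= threshold]
        right = [lab for v, lab in pairs if v > threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = (threshold, score)
    return best

# Hypothetical data: monthly income and whether the customer paid late.
incomes = [1500, 1800, 2200, 3000, 3500, 4200]
paid_late = ["late", "late", "late", "on_time", "on_time", "on_time"]
threshold, score = best_split(incomes, paid_late)
print(threshold, score)
```

A weighted impurity of 0 means the threshold separates the two classes perfectly in this toy data; real data rarely splits this cleanly.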

8
Which product class is the best seller?
Conclusion: Clay products with a price below 25!
9
Uses of Data Mining-2
  • Segmentation
  • Partitioning a database into groups in which the
    members of each group share similar
    characteristics
  • Mining Algorithm
  • Clustering: The objective is to sort cases into
    groups so that similarity is strong among
    members of the same cluster and weak between
    members of different clusters
  • E.g., companies with over 100 employees may have
    more in common with each other (e.g., revenue
    size) than with companies with fewer than 100
    employees.
  • This knowledge can help with developing different
    policies when dealing with different types of
    companies
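
As a rough illustration of clustering, the sketch below runs a very small one-dimensional k-means on company revenue. k-means is one common clustering algorithm, chosen here as an assumption; the slides do not name a specific one. The function name `kmeans_1d` and the revenue figures are hypothetical.

```python
def kmeans_1d(values, k=2, iterations=20):
    """Tiny 1-D k-means. Returns (centers, assignments)."""
    ordered = sorted(values)
    # Deterministic start: spread the initial centers over the sorted values.
    centers = [ordered[(i * (len(ordered) - 1)) // max(k - 1, 1)] for i in range(k)]
    assignments = []
    for _ in range(iterations):
        # Assign each value to its nearest center.
        assignments = [min(range(k), key=lambda c: abs(v - centers[c])) for v in values]
        # Move each center to the mean of its members.
        for c in range(k):
            members = [v for v, a in zip(values, assignments) if a == c]
            if members:
                centers[c] = sum(members) / len(members)
    return centers, assignments

# Hypothetical annual revenue (in millions) for six companies.
revenue = [1.2, 0.8, 1.5, 20.0, 25.0, 18.0]
centers, assignments = kmeans_1d(revenue, k=2)
print(centers, assignments)
```

With this data the small and large companies fall into separate clusters, which is the kind of segment that could then receive a different policy.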

10
Uses of Data Mining-3
  • Association
  • A category of data mining algorithm that
    establishes relationships among items that occur
    together in a given record
  • E.g., you may discover from data that senior
    students take elective courses together in the
    final semester
  • This can be helpful when scheduling courses
  • People who buy a suit may also buy a dress shirt
  • People who buy swimwear may buy fins, goggles,
    a cap, etc.
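
One way to quantify such associations is with support (how often two items appear together) and confidence (how often the second item appears given the first). These two measures are standard in association mining but are not named in the slides, so treat them as an assumption here. The function name `pair_rules` and the baskets are hypothetical.

```python
from collections import Counter
from itertools import combinations

def pair_rules(transactions):
    """Return {(a, b): (support, confidence)} for each rule 'a -> b'."""
    n = len(transactions)
    item_counts = Counter()
    pair_counts = Counter()
    for basket in transactions:
        items = set(basket)  # ignore duplicate items within one record
        item_counts.update(items)
        for a, b in combinations(sorted(items), 2):
            pair_counts[(a, b)] += 1
            pair_counts[(b, a)] += 1
    return {
        (a, b): (count / n, count / item_counts[a])
        for (a, b), count in pair_counts.items()
    }

# Hypothetical purchase records.
baskets = [
    ["swimwear", "goggles", "cap"],
    ["swimwear", "goggles"],
    ["suit", "dress shirt"],
    ["swimwear", "fins"],
]
rules = pair_rules(baskets)
print(rules[("swimwear", "goggles")])  # (support, confidence) for swimwear -> goggles
```

Note that confidence is directional: in this toy data everyone who bought goggles also bought swimwear, but not the other way around.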

11
Uses of Data Mining-4
  • Sequence discovery
  • The identification of associations over time.
    Discovering the order in which events occur.
  • The algorithm can examine data and predict what
    event is most likely to occur next.
  • Widely used in studying how visitors navigate a
    Web site. Helps to improve chances of making a
    sale.
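
A minimal sketch of sequence discovery on Web navigation data, assuming the simplest possible approach: count which page most often follows each page, then predict the most frequent follower. The function names and the sessions are hypothetical; production tools generally use richer sequence models.

```python
from collections import Counter, defaultdict

def build_transitions(sessions):
    """Count how often each page is immediately followed by each other page."""
    transitions = defaultdict(Counter)
    for pages in sessions:
        for current, nxt in zip(pages, pages[1:]):
            transitions[current][nxt] += 1
    return transitions

def predict_next(transitions, page):
    """Return the most frequent follower of `page`, or None if never seen."""
    followers = transitions.get(page)
    if not followers:
        return None
    return followers.most_common(1)[0][0]

# Hypothetical visitor sessions (ordered page visits).
sessions = [
    ["home", "products", "cart", "checkout"],
    ["home", "products", "reviews"],
    ["home", "search", "products", "cart"],
]
transitions = build_transitions(sessions)
print(predict_next(transitions, "products"))
```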

12
Uses of Data Mining-5
  • Regression is a statistical technique that is
    used to map data to a prediction value
  • Forecasting estimates future values based on
    patterns within large sets of data
  • E.g., gasoline prices this month may predict next
    month's sales of SUVs
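
The gasoline example can be sketched as a simple least-squares line fit. This assumes a linear relationship, which the slides do not claim; the function name `fit_line` and all figures are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical history: gasoline price this month, SUV sales the next month.
gas_price = [2.0, 2.5, 3.0, 3.5, 4.0]
suv_sales_next_month = [1000, 900, 800, 700, 600]

slope, intercept = fit_line(gas_price, suv_sales_next_month)
forecast = slope * 3.2 + intercept  # predicted sales if gasoline costs 3.2
print(slope, intercept, forecast)
```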

13
Data Mining Concepts and Applications
Data mining applications
  • Marketing
  • Banking
  • Retailing and sales
  • Manufacturing and production
  • Brokerage and securities trading
  • Insurance
  • Computer hardware and software
  • Government and defense
  • Airlines
  • Health care
  • Broadcasting
  • Police
  • Homeland security

14
Text Mining
  • Application of data mining to text files,
    typically free-form text material
  • Discovers new knowledge that is not obvious
  • Examples
  • Examine all news services, cluster similar
    topics, create a new summary for each topic
  • Find the hidden content of documents,
    including additional useful relationships, e.g.,
    lies, deceptions, scams
  • Not the same as a search engine on the Web.

15
Text Mining: how is it done?
  • It entails the generation of meaningful numerical
    indices/factors from the unstructured text and
    then processing these indices using various data
    mining algorithms
  • Example
  • Extract each word from the document being text
    mined
  • Eliminate commonly used words (the, and, other,
    etc.)
  • Combine synonyms and phrases
  • Calculate weights for each term
  • tf factor (term frequency): actual number of
    times a word appears in a document
  • idf factor (inverse document frequency): how rare
    the term is across multiple documents
  • High tf factor value of a given term indicates
    that the document topic is probably around the
    meaning of that term!
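
The steps above can be sketched as follows. This is a simplified illustration, not a full text-mining pipeline: the synonym-combining step is skipped, the stop-word list is a small made-up sample, and the weight uses one common tf-idf form (raw count times log(N / document frequency)), which is an assumption since the slides give no formula.

```python
import math
from collections import Counter

# Small hypothetical stop-word list; real lists are much longer.
STOP_WORDS = {"the", "and", "other", "a", "of", "is", "to", "in"}

def tokenize(text):
    """Extract lowercase words and drop commonly used words."""
    words = [w.strip(".,!?;:").lower() for w in text.split()]
    return [w for w in words if w and w not in STOP_WORDS]

def tf_idf(documents):
    """Return one {term: weight} dict per document."""
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))  # count each term once per document
    weights = []
    for tokens in tokenized:
        tf = Counter(tokens)
        weights.append(
            {t: count * math.log(n_docs / doc_freq[t]) for t, count in tf.items()}
        )
    return weights

docs = [
    "the battery failed and the battery was replaced",
    "the charger failed",
    "the screen failed",
]
weights = tf_idf(docs)
top_term = max(weights[0], key=weights[0].get)
print(top_term, weights[0])
```

In this toy corpus "failed" appears in every document, so its idf (and weight) is zero, while "battery" is both frequent in the first document and rare elsewhere, so it gets the highest weight there.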


16
Text Mining - applications
  • Automatic detection of e-mail spam or phishing
    through analysis of the document content
  • Automatic processing of messages or e-mails to
    route a message to the most appropriate party to
    process that message
  • Analysis of warranty claims, help desk
    calls/reports, and so on to identify the most
    common problems and relevant responses

17
Web Mining
  • The discovery and analysis of interesting and
    useful information from the Web

18
Web content mining
  • The extraction of useful information from Web
    pages
  • E.g., search with the help of keywords in the meta
    tags of the web page
  • You can analyze the document content of the
    first 10 links of Google in a search response
  • You can generate a summary of the contents
    automatically in a new document!

19
Web structure mining
  • The development of useful information from the
    links included in the Web documents
  • If a web site's pages predominantly link to each
    other, you may consider the site to exist
    independently
  • If a collection of web sites is heavily linked to
    each other, it points to a web community or
    clan that shares common interests
  • Example application: Web structure mining can
    lead to better understanding of extremist groups
20
Web usage mining
  • The extraction of useful information from the
    data generated through webpage visits,
    transactions, etc.
  • Clickstream analysis
  • Uses cookies, number of logs, time of log, etc.
  • Can help profile users


21
Uses for Web mining
  • Determine the lifetime value of clients
  • Design cross-marketing strategies across products
  • Evaluate promotional campaigns
  • Target electronic ads and coupons at user groups
  • Predict user behavior
  • Present dynamic information to users

22
Data Mining Project Processes
23
Steps for Data Mining
  • Problem definition: Decide the measure to study
    and the suitable mining algorithm (see Exercise
    11)
  • Data preparation: Design the cube and populate it
    with relevant data from the data warehouse
  • Training: Run the mining algorithm on a subset of
    the data warehouse data so the system learns
    to find segments, associations, etc. among data
  • Validation: Run the learned model from the previous
    step on the remaining subset of data and try to
    predict. Since you have historical data, you
    can verify whether the learned model is any good.
  • Deploy: Use the model to predict in a real
    environment where you do not know the actual
    results.
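
The training and validation steps can be sketched as a holdout split. Everything named here is hypothetical: `holdout_split`, `train_majority`, and `accuracy` are illustrative helpers, the 70/30 split is an assumption, and the "model" is only a majority-class placeholder standing in for a real mining algorithm.

```python
import random
from collections import Counter

def holdout_split(records, train_fraction=0.7, seed=0):
    """Shuffle historical records and split them into training and validation sets."""
    shuffled = records[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

def train_majority(train_records):
    """Placeholder 'mining algorithm': always predict the most common label."""
    label = Counter(lab for _, lab in train_records).most_common(1)[0][0]
    return lambda features: label

def accuracy(model, records):
    """Share of records where the prediction matches the known actual result."""
    correct = sum(1 for features, actual in records if model(features) == actual)
    return correct / len(records)

# Hypothetical historical data: (features, known outcome).
records = [({"income": i}, "late" if i < 3 else "on_time") for i in range(10)]

train, valid = holdout_split(records)
model = train_majority(train)
print(len(train), len(valid), accuracy(model, valid))
```

Because the validation records were not used for training, the accuracy on them gives an estimate of how the model might behave after deployment.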