Data Mining Basics presentation

Title: Data Mining Basics

1
Data Mining Basics
Database Modeling and Design
Chapter 8 (Part D)

Instructor Paul Chen

2
Topics

How Data Mining Evolved?
Decision Processing Overview and Tasks
Data Mining, What Is It?
Data Mining vs. Data Warehousing
How Data Mining Works? And Its Applications
Data Mining Operations and Associated Techniques
The Data Mining Process
Data Mining Tools
Data Mining Applications For CRM
Data Mining From Government Printing Office
Data Mining Techniques: A Summary

3
Topic 1: How Data Mining Evolved?

Many businesses have invested heavily in
information technology to help them manage their
businesses more effectively and gain a
competitive edge. Increasingly large amounts of
critical business data are being stored
electronically, and this volume is expected to
continue to grow. Data mining technology
helps companies leverage their existing data
more effectively and obtain insightful
information that gives them a competitive edge.

4
How Data Mining Evolved?
Time line:
1960s: Data collection
1970s-80s: RDBMS
1990s: OLAP and DW
Late 1990s to now: Data mining
5
Topic 2: Decision Processing Overview

Decision processing systems, and their underlying
analytical applications, provide business users
with the information they need to track and
analyze business trends, and to explore new
business opportunities. As businesses become
increasingly competitive and complex, effective
decision processing systems are essential for
success.

6
The Next Generation of Business Intelligence

A decision processing system analyzes business
information captured from operational systems
(back-office, front-office, and e-business
applications).
Business information is distributed to business
users via corporate intranets and extranets.
The flow of data can be thought of as an
information supply chain whose objective is to
convert operational data into useful business
information.

7
The Decision Processing Information Supply Chain
(Diagram labels: Operational Systems, Back-Office
Transaction Applications, Front-Office
Applications, E-Business Applications,
Collaborative Office Systems, External Data,
Information Staging Area, DW, Business
Intelligence Tools, Analytic Applications,
Business Metrics, Business Decisions)
8
Decision Processing: Four Tasks

Extracting and transforming information

This involves capturing data from operational
systems, transforming it into business
information, and loading it into a data warehouse
information store. Current extract templates on
the market are primarily aimed at capturing data
from ERP (Enterprise Resource Planning)
transaction processing systems (for example, SAP
Business Information Warehouse and PeopleSoft BPM
data warehouse).
Mentioned in Chapter 2.
9
Decision Processing: Four Tasks (Cont'd)

Managing information
This task encompasses the maintenance of business
information in information stores, and how these
information stores are processed by business
intelligence tools and analytic applications.
The cornerstone of decision processing is data
warehousing, and warehouse information stores
should be organized and modeled into relational
and multidimensional database products.

10
Decision Processing: Four Tasks (Cont'd)

Analyzing and modeling information

The traditional approach to decision processing
is to build a data warehouse and supply business
users with a set of business intelligence tools
(query, reporting, OLAP and data mining, for
example) to process information in data warehouse
information stores. A better approach is to employ
turn-key, web-based analytic application
packages that are designed to provide
comprehensive analyses for the business area
being researched. Key business metrics (e.g.,
revenue dollars per sales rep per day) are
useful.
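A metric such as revenue dollars per sales rep per day can be computed directly from transaction records. The sketch below is only an illustration: the record layout (rep, day, revenue), the names, and the figures are assumptions, not part of the presentation.

```python
from collections import defaultdict

# Hypothetical sales records: (sales rep, day, revenue in dollars).
sales = [
    ("alice", "2024-01-02", 1200.0),
    ("alice", "2024-01-02", 300.0),
    ("alice", "2024-01-03", 500.0),
    ("bob", "2024-01-02", 800.0),
]

def revenue_per_rep_per_day(records):
    """Average daily revenue for each rep over the days that rep sold."""
    daily = defaultdict(float)      # (rep, day) -> revenue that day
    for rep, day, amount in records:
        daily[(rep, day)] += amount
    totals = defaultdict(float)     # rep -> total revenue
    days = defaultdict(int)         # rep -> number of selling days
    for (rep, _day), amount in daily.items():
        totals[rep] += amount
        days[rep] += 1
    return {rep: totals[rep] / days[rep] for rep in totals}

metrics = revenue_per_rep_per_day(sales)
```

Here `metrics` maps each rep to one number, which is the kind of figure an analytic application package would track over time.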
11
Decision Processing: Four Tasks (Cont'd)

Distributing information

Business intelligence tools and analytic
applications distribute information and the
results of analysis operations to business users
via standard graphical and Web interfaces. To
help users uncover and organize this range of
business information, an enterprise information
portal (EIP) is required. An EIP provides a
single point of entry to any piece of business
information, no matter where it resides. The
main components of an EIP are an information
assistant (Web browser interface), an
information directory, and a subscription
facility.
12
Decision Making Under Risk

Decisions are made under three sets of
conditions:
Certainty
The decision makers know everything in advance of
making the decision
Uncertainty
The decision makers know nothing about the
probabilities or the consequences of decisions
Risk

13
Decision-Making Style

Decision-making styles of users are categorized
as either
Analytic or
Heuristic

14
Analytic and Heuristic Decision Making

Analytical Decision Maker
Learns by analyzing
Uses step-by-step procedure
Values quantitative information and models
Builds mathematical models and algorithms
Seeks optimal solution

Heuristic Decision Maker
Learns by acting
Uses trial and error
Values experiences
Relies on common sense
Seeks completely satisfying solution

15
Topic 3: Data Mining, What Is It?

Data Mining has been defined as a decision
support process in which a search is made for
patterns of information in data. To detect
patterns in data, Data Mining uses sophisticated
statistical analysis and modeling technologies to
uncover useful relationships hidden in databases.
It predicts future trends and behaviors,
allowing businesses to make predictive,
knowledge-driven decisions.

16
Data Mining, What Is It?

The process of extracting valid, previously
unknown, comprehensible, and actionable
information from large databases and using it to
make crucial business decisions (Simoudis, 1996).
Involves analysis of data and use of software
techniques for finding hidden and unexpected
patterns and relationships in sets of data.

17
Data Mining, What Is It?

Reveals information that is hidden and
unexpected, as there is little value in finding
patterns and relationships that are already
intuitive.
Patterns and relationships are identified by
examining the underlying rules and features in
the data.
Tends to work from the data up, and the most
accurate results normally require large volumes
of data to deliver reliable conclusions.

18
Data Mining, What Is It?

Starts by developing an optimal representation of
the structure of sample data, during which time
knowledge is acquired and extended to larger sets
of data.
Data mining can provide huge paybacks for
companies that have made a significant investment
in data warehousing.
Relatively new technology, but already used
in a number of industries.

19
Topic 4: Data Mining vs. Data Warehousing

Data Mining does not require that a Data
Warehouse be built. Often, data can be downloaded
from the operational files to flat files that
contain the data ready for the data mining
analysis.
Data Mining can be implemented rapidly on
existing software and hardware platforms. Data
Mining tools can analyze massive databases to
deliver answers to questions such as "Which
customers are most likely to respond to my next
promotional mailing, and why?"

20
Data Mining vs. Data Warehousing

A major challenge in exploiting data mining is
identifying suitable data to mine.
Data mining requires a single, separate, clean,
integrated, and self-consistent source of data.
A data warehouse is well equipped for providing
data for mining.
Data quality and consistency are prerequisites
for mining, to ensure the accuracy of the
predictive models. Data warehouses are populated
with clean, consistent data.

21
Data Mining vs. Data Warehousing

Advantageous to mine data from multiple sources
to discover as many interrelationships as
possible. Data warehouses contain data from a
number of sources.
Selecting relevant subsets of records and fields
for data mining requires query capabilities of
the data warehouse.
Results of a data mining study are useful if
there is some way to further investigate the
uncovered patterns. Data warehouses provide
capability to go back to the data source.

22
Topic 5: How Data Mining Works?

How exactly is Data Mining able to tell you
important things that you didn't know, or what is
going to happen next? The technique used in Data
Mining is called Predictive Modeling, which in a
broad sense is a process of knowledge discovery
via relationships and patterns.
Modeling is the act of building a model in one
situation where you know the answer and then
applying it to another situation where you don't.

23
Examples of Applications of Data Mining via
relationships and patterns

Retail / Marketing
Identifying buying patterns of customers
Finding associations among customer demographic
characteristics
Predicting response to mailing campaigns
Market basket analysis

24
Examples of Applications of Data Mining via
relationships and patterns

Banking
Detecting patterns of fraudulent credit card use
Identifying loyal customers
Predicting customers likely to change their
credit card affiliation
Determining credit card spending by customer
groups

25
Examples of Applications of Data Mining via
relationships and patterns

Insurance
Claims analysis
Predicting which customers will buy new policies.
Medicine
Characterizing patient behaviour to predict
surgery visits
Identifying successful medical therapies for
different illnesses.

26
Examples of Applications of Data Mining via
relationships and patterns

Customer profiling: characteristics of good
customers are identified with the goals of
predicting who will become one and helping
marketers target new prospects.
Targeting specific marketing promotions to
existing and potential customers offers similar
benefits.
Market-basket analysis: with Data Mining,
companies can determine which products to stock
in which stores, and even how to place them
within a store.

27
Examples of Applications of Data Mining via
relationships and patterns

Customer Relationship Management: by determining
the characteristics of customers who are likely
to leave for a competitor, a company can take
action to retain those customers, because doing
so is usually far less expensive than acquiring
new customers.
Fraud detection: with Data Mining, companies can
identify potentially fraudulent transactions
before they happen.

28
Topic 6: Data Mining Operations and Associated
Techniques
As noted in previous foils, predictive modeling in
essence includes the other operations shown in
the above table.
29
Descriptive: The dealer sold 200 cars last month.
System: Operational (OLTP).
Explanatory: For every increase of 1% in the
interest rate, auto sales decrease by 5%.
System: Traditional DW, OLAP.
Predictive: Predictions about future buyer
behavior.
System: Data Mining.
30
Level of Modeling vs. Level of Analytical
Processing
(Chart; recoverable labels follow.)
Descriptive: simple queries and reports.
Explanatory: what-if processing; analyze what has
previously occurred to bring about the current
state of the data.
Predictive: determine if any patterns exist by
reviewing data relationships.
Other labels: normalized tables; denormalized
tables; roll-up and drill-down; statistical
analysis / artificial intelligence;
classification and value prediction.
31
Predictive Modelling

Similar to the human learning experience in
using observations to form a model of the
important characteristics of some phenomenon.
Uses generalizations of real world and ability
to fit new data into a general framework.
Can analyze a database to determine essential
characteristics (model) about the data set.

32
Predictive Modelling

Model is developed using a supervised learning
approach, which has two phases: training and
testing.
Training builds a model using a large sample of
historical data called a training set.
Testing involves trying out the model on new,
previously unseen data to determine its accuracy
and physical performance characteristics.
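The two phases can be sketched with a deliberately simple model. Everything specific below is an assumption made for illustration: the records, the 50/50 split, and the single-threshold "model" stand in for whatever technique a real tool would use.

```python
# Hypothetical historical records: (years renting, bought a property?).
history = [(1, False), (2, False), (3, True), (5, True),
           (1, False), (4, True), (2, False), (6, True)]

train, test = history[:4], history[4:]   # assumed 50/50 split

def fit_threshold(rows):
    """Training: pick the cut-off on years that best separates the labels."""
    candidates = sorted({years for years, _ in rows})

    def correct(cut):
        return sum((years > cut) == bought for years, bought in rows)

    return max(candidates, key=correct)

def accuracy(cut, rows):
    """Testing: fraction of unseen rows the fitted cut-off predicts correctly."""
    return sum((years > cut) == bought for years, bought in rows) / len(rows)

cut = fit_threshold(train)
test_accuracy = accuracy(cut, test)
```

The point of the split is that `test_accuracy` is measured on rows the model never saw during training, so it is a fairer estimate than accuracy on the training set.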

33
Predictive Modelling

Applications of predictive modelling include
customer retention management, credit approval,
cross selling, and direct marketing.
Two techniques are associated with predictive
modelling, distinguished by the nature of the
variable being predicted:
A. classification
B. value prediction

34
Statistical Analysis of Actual Sales (dollars and
quantities) Relative to These Signage Variables: A
Predictive Modeling Example

Content
Frequency
Depth
Focus
Depth
Scale
Length
Location
Statistical analysis: correlation, regression,
experiment design, optimization. Now it goes into
real-time analysis.

35
Signage
36
Signage
37
PREDICTIVE MODELING

There are two techniques associated with
predictive modeling, classification and value
prediction, which are distinguished by the nature
of the variable being predicted.

38
Predictive Modelling - Classification

Used to establish a specific predetermined class
for each record in a database from a finite set
of possible class values.
Two specializations of classification: tree
induction and neural induction.

39
Example of Classification using Tree Induction
40
Example of Classification using Tree Induction
Customer renting property > 2 years?
  No: Rent property
  Yes: Customer age > 45?
    No: Rent property
    Yes: Buy property
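An induced tree of this kind is just a set of nested tests. The sketch below assumes the slide's labels form a two-level tree, with the rental-length test first and the age test on its Yes branch; the function and parameter names are hypothetical.

```python
def classify(years_renting: float, age: float) -> str:
    """Walk the two-level tree: rental length first, then age."""
    if years_renting > 2:
        if age > 45:
            return "Buy property"
        return "Rent property"
    return "Rent property"

# Hypothetical customers: (years renting, age).
predictions = [classify(3, 50), classify(3, 30), classify(1, 60)]
```

Each path from the root to a leaf corresponds to one classification rule, which is why tree models are comparatively easy to explain to business users.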
41
Example of Classification using Neural Induction
42
Example of Classification using Neural Induction

Each processing unit (circle) in one layer is
connected to each processing unit in the next
layer by a weighted value, expressing the
strength of the relationship. The network
attempts to mirror the way the human brain works
in recognizing patterns by arithmetically
combining all the variables associated with a
given data point.
In this way, it is possible to develop nonlinear
predictive models that learn by studying
combinations of variables and how different
combinations of variables affect different data
sets.
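The arithmetic combination described here can be sketched as a forward pass through a tiny network. The layer sizes and weight values are made up for illustration, the sigmoid is one common choice of nonlinearity, and the training step that would learn the weights is not shown.

```python
import math

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def layer(inputs, weights, biases):
    """Each unit sums its weighted inputs, then applies a nonlinearity."""
    return [sigmoid(sum(w * x for w, x in zip(unit_w, inputs)) + b)
            for unit_w, b in zip(weights, biases)]

# Hypothetical weights: 2 inputs -> 2 hidden units -> 1 output unit.
hidden_w, hidden_b = [[0.5, -0.4], [0.3, 0.8]], [0.0, -0.1]
output_w, output_b = [[1.2, -0.7]], [0.05]

def predict(inputs):
    hidden = layer(inputs, hidden_w, hidden_b)
    return layer(hidden, output_w, output_b)[0]

score = predict([1.0, 0.5])   # a value strictly between 0 and 1
```

Because every unit applies a nonlinearity to its weighted sum, stacking layers lets the model represent relationships that a straight line cannot.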

43
Predictive Modelling - Value Prediction

Used to estimate a continuous numeric value that
is associated with a database record.
Uses the traditional statistical techniques of
linear regression and non-linear regression.
Relatively easy-to-use and understand.

44
Predictive Modelling - Value Prediction

Linear regression attempts to fit a straight line
through a plot of the data, such that the line is
the best representation of the average of all
observations at that point in the plot.
The problem is that the technique only works well
with linear data and is sensitive to the presence
of outliers (i.e., data values that do not
conform to the expected norm).
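The sensitivity to outliers is easy to see with ordinary least squares, the usual way of fitting such a line. The data points below are invented for the demonstration.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

xs = [1, 2, 3, 4, 5]
clean = [2, 4, 6, 8, 10]          # exactly y = 2x
with_outlier = [2, 4, 6, 8, 40]   # last point is an outlier

slope_clean, intercept_clean = fit_line(xs, clean)
slope_out, intercept_out = fit_line(xs, with_outlier)
```

With the clean data the fit recovers slope 2 and intercept 0. Changing a single value to 40 moves the fit to slope 8 and intercept -12, so one bad record distorts the prediction for every other record.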

45
Predictive Modelling - Value Prediction

Although non-linear regression avoids the main
problems of linear regression, it is still not
flexible enough to handle all possible shapes of
the data plot.
Statistical measurements are fine for building
linear models that describe predictable data
points; however, most data is not linear in
nature.

46
Predictive Modelling - Value Prediction

Data mining requires statistical methods that can
accommodate non-linearity, outliers, and
non-numeric data.
Applications of value prediction include credit
card fraud detection or target mailing list
identification.

47
Database Segmentation

Aim is to partition a database into an unknown
number of segments, or clusters, of similar
records.
Uses unsupervised learning to discover
homogeneous sub-populations in a database to
improve the accuracy of the profiles.
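As a minimal illustration of unsupervised segmentation, the sketch below uses k-means, a swapped-in technique that the slides do not name. Unlike the operation described above, k-means needs the number of segments chosen in advance; the value k = 2, the starting centers, and the income figures are all assumptions.

```python
def kmeans_1d(values, centers, rounds=10):
    """Plain k-means on one numeric attribute, starting from given centers."""
    for _ in range(rounds):
        clusters = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)),
                          key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Move each center to the mean of its members (keep it if empty).
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers, clusters

# Hypothetical monthly incomes; k = 2 and the starting centers are assumed.
incomes = [1000, 1200, 1100, 8000, 8500, 9000]
centers, clusters = kmeans_1d(incomes, centers=[0, 10000])
```

No labels are supplied: the two segments emerge only from how close the records are to each other, which is what distinguishes this from the supervised approach used in predictive modelling.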

48
Database Segmentation

Less precise than other operations, and thus less
sensitive to redundant and irrelevant features.
Sensitivity can be reduced by ignoring a subset
of the attributes that describe each instance or
by assigning a weighting factor to each variable.
Applications of database segmentation include
customer profiling, direct marketing, and cross
selling.

49
Example of Database Segmentation using a Scatter
plot
50
Database Segmentation

Associated with demographic or neural clustering
techniques, distinguished by:
Allowable data inputs
Methods used to calculate the distance between
records
Presentation of the resulting segments for
analysis.

51
Example of Database Segmentation using a
Visualization
52
Link Analysis

Aims to establish links (associations) between
records, or sets of records, in a database.
There are three specializations:
Associations discovery
Sequential pattern discovery
Similar time sequence discovery
Applications include product affinity analysis,
direct marketing, and stock price movement.

53
Link Analysis - Associations Discovery

Finds items that imply the presence of other
items in the same event.
Affinities between items are represented by
association rules.
e.g. When a customer rents property for more than
2 years and is more than 25 years old, in 40% of
cases, the customer will buy a property. This
association happens in 35% of all customers who
rent properties.
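An association rule is usually reported with two numbers, which is one way to read the example's two figures (40 and 35): confidence, how often the outcome follows when the condition holds, and support, how often the whole rule occurs among all records. The sketch below uses those conventional definitions on hypothetical records; the data and the resulting values are illustrative and do not reproduce the slide's figures.

```python
# Hypothetical renter records: (years renting, age, bought a property?).
renters = [
    (3, 30, True), (4, 28, False), (5, 40, True), (3, 35, False),
    (1, 50, False), (2, 22, False), (6, 45, False), (1, 24, False),
    (3, 20, False), (7, 24, False),
]

def rule_stats(records):
    """Support and confidence for: renting > 2 years and age > 25 => buys."""
    matches = [r for r in records if r[0] > 2 and r[1] > 25]
    buyers = [r for r in matches if r[2]]
    support = len(buyers) / len(records)      # condition and outcome together
    confidence = len(buyers) / len(matches)   # outcome, given the condition
    return support, confidence

support, confidence = rule_stats(renters)
```

In this made-up data the rule has a confidence of 0.4 and a support of 0.2. Tools typically keep only rules whose support and confidence both exceed thresholds set by the analyst.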

54
Link Analysis - Sequential Pattern Discovery

Finds patterns between events such that the
presence of one set of items is followed by
another set of items in a database of events over
a period of time.
e.g. Used to understand long term customer buying
behaviour.

55
Link Analysis - Similar Time Sequence Discovery

Finds links between two sets of data that are
time-dependent, and is based on the degree of
similarity between the patterns that both time
series demonstrate.
e.g. Within three months of buying property, new
home owners will purchase goods such as cookers,
freezers, and washing machines.

56
Deviation Detection

Relatively new operation in terms of commercially
available data mining tools.
Often a source of true discovery because it
identifies outliers, which express deviation from
some previously known expectation and norm.

57
Deviation Detection

Can be performed using statistics and
visualization techniques or as a by-product of
data mining.
Applications include fraud detection in the use
of credit cards and insurance claims, quality
control, and defects tracing.
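One simple statistical way to flag deviations is a robust score based on the median. The sketch below uses the modified z-score convention (the 0.6745 constant and the 3.5 cut-off are that convention's usual values, chosen here as an assumption), and the charge amounts are hypothetical.

```python
import statistics

def find_deviations(values, threshold=3.5):
    """Flag values far from the median, scaled by the median absolute deviation."""
    median = statistics.median(values)
    mad = statistics.median(abs(v - median) for v in values)
    if mad == 0:
        return []   # no spread at all, so nothing can be ranked as deviant
    return [v for v in values
            if 0.6745 * abs(v - median) / mad > threshold]

# Hypothetical card charges in dollars; one is far outside the usual range.
charges = [42, 38, 51, 45, 40, 47, 950, 44]
outliers = find_deviations(charges)
```

The median is used instead of the mean because the mean itself is pulled toward the outlier, which can hide the very deviation being sought.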

58
A Summary: Data-Driven Techniques

Data Visualization
Decision Trees
Clustering
Factor Analysis
Neural Network
Association Rules
Rule Induction
Based on Sakhr Youness's book Professional
Data Warehousing with SQL Server 7.0 and OLAP
Services

59
Data Visualization
A pie chart showing the sales of a product by
region is sometimes much more effective than
presenting the same data in text or tabular
form.
(Pie chart of sales by region. Slice labels:
Northeast, South, North, West, East. Slice
values: 9, 11, 39, 21, 20.)
60
Decision Tree
61
Cluster Analysis
First segment (high income > 8,000)
Have children
Second segment (8,000 > middle income > 3,000)
Married
Last car is a used one
Third segment (low income < 3,000)
Own car
62
Factor Analysis

Unlike cluster analysis, factor analysis builds a
model from data. The technique finds underlying
factors, also called latent variables, and
provides models for these factors based on
variables in the data. For example, a software
company is considering a survey to find out the
nine most perceived attributes of one of its
products. It might group these attributes into
categories such as service for technical support,
availability for training, and a help system.
Factor analysis is also used for grouping together
products based on a similarity of buying patterns
so that vendors may bundle several products and
sell them together at a lower price than
the sum of their individual prices.

63
Neural Networks
64
Association Rules

Association models examine the
extent to which values of one field depend on, or
are produced by, values of another field. These
models are often referred to as Market Basket
Analysis when they are applied in retail
industries to study the buying patterns of
customers, especially in grocery and retail
stores that issue their own credit cards.
Charging against these cards gives the store the
chance to associate the purchases of customers
with their identities, which allows the store to
study associations, among other things.

65
Rules Induction

This is a powerful technique that involves a
large number of rules, expressed as a set of
if..then statements, in the pursuit of all
possible patterns in the dataset. For example, if
the customer is male, is between 30 and 40 years
of age, and has an income of less than 50,000 and
more than 20,000, he is likely to be driving a
car that was bought new.
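Once induced, a rule is an ordinary conditional that can be applied to any record. The sketch below encodes the example rule only; how the rule would be discovered from data is not shown, and the function name and sample customers are hypothetical.

```python
def likely_bought_car_new(gender: str, age: int, income: float) -> bool:
    """The example rule written as a single if..then test."""
    return gender == "male" and 30 <= age <= 40 and 20_000 < income < 50_000

# Hypothetical customers: (gender, age, income).
customers = [("male", 35, 30_000), ("male", 45, 30_000), ("female", 35, 30_000)]
flags = [likely_bought_car_new(*c) for c in customers]
```

A rule induction tool generates many such tests and keeps the ones that hold often enough in the data to be useful.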

66
A Summary: Theory-Driven Techniques

Correlations
T-Tests
Analysis of Variance
Linear Regression
Logistic Regression
Discriminant Analysis
Forecasting Methods

67
Topic 7: The Data Mining Process

Define the problem.
Select the data.
Prepare the data.
Mine the data.
Deploy the model.
Take business action.
Are you ready for Data Mining?

68
Define the problem

A successful data mining initiative always starts
with a well-defined project. To ensure that the
project produces incremental value, include an
assessment of the status quo solution and a
review of technology, organization, and business
processes.

69
Select the data

This step involves defining your data source
(not every data source and record is required).
The data is usually extracted from the source
system to a separate server.

70
Prepare the data

This step represents up to 80 percent of the
total project effort. For data mining, the data
must reside in one flat table (each record has
many columns). In addition to being the most time
consuming, the step is also the most critical.
The resulting models are only as good as the data
used to create them.
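Getting the data into one flat table usually means rolling related records up into columns on a single row per subject. The sketch below assumes two hypothetical source tables (customers and orders) and made-up column names.

```python
# Hypothetical source tables, keyed by customer id.
customers = {1: {"age": 34, "region": "West"},
             2: {"age": 51, "region": "East"}}
orders = [(1, 120.0), (1, 80.0), (2, 40.0)]   # (customer id, order amount)

def flatten(customers, orders):
    """Build one row per customer, with order history rolled up into columns."""
    rows = []
    for cid, attrs in sorted(customers.items()):
        amounts = [amt for oid, amt in orders if oid == cid]
        rows.append({
            "customer_id": cid,
            **attrs,
            "order_count": len(amounts),
            "total_spent": sum(amounts),
        })
    return rows

flat = flatten(customers, orders)
```

Each row of `flat` now carries everything known about one customer, which is the shape most mining algorithms expect as input.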

71
Mine the data

Typically the easiest and shortest phase, this
step involves applying statistical and AI tools
to create mathematical models. Data mining
typically occurs on a server separate from the
data warehousing and other corporate systems.

72
Deploy the Model

Model deployment is the process of implementing
the mathematical models into operational systems
to improve business results.

73
Take Business Action

Use the deployed model to achieve improved
results to the business problem identified at the
beginning of the process.

74
Steps to Implement Data Mining
Discovery (patterns, relations, associations, etc.)
Prior Knowledge
Information Model
Validation
Deployment
75
ARE YOU READY FOR DATA MINING?

Just because you have a data warehouse doesn't
mean you're necessarily ready for data mining.
Much of the work our company does in the data
mining arena has more to do with data mining
readiness assessment than with actually
performing data mining.

76
Metrics you can use to gauge your data mining
readiness

Do you have a staff of experienced knowledge
workers?
Do you have the data?
Do you have marketing processes in place that can
use this data?
Do you have a business champion who can embrace
the process and results?
Do you have the technology infrastructure to
support advanced analysis?

77
Topic 8: Data Mining Tools

Data mining tools are typically classified by the
type of algorithm they use to identify hidden
patterns. There are many different algorithms in
use, but the four most popular are association,
sequence, clustering (or segmentation), and
predictive modeling.

78
Data Mining Tools

There are a growing number of commercial data
mining tools on the marketplace.
Important characteristics of data mining tools
include
Data preparation facilities
Selection of data mining operations
Product scalability and performance
Facilities for visualization of results.

79
Data Mining vs. OLAP

They are two separate breeds of analysis with
entirely different objectives, not to mention
tools, skill sets, and implementation methods.

80
Data Mining

With canned reports, ad hoc querying, and
OLAP, the end user defines a hypothesis and
determines which data to examine. With data
mining, the tool identifies the hypothesis, and
it actually tells the user where in the data to
start the exploration process.

81
Data Mining

Rather than using SQL to filter out values and
methodically reduce the data into a concise
answer set, data mining uses algorithms that
exhaustively review the relationships among data
elements to determine if any patterns exist. The
whole purpose of data mining is to yield new
business information that a business person can
act on.

82
OLAP vs. Data Mining Tools

OLAP Tools
Are ad hoc, shrink-wrapped tools that provide an
interface to data
Are used when you have specific known questions
Look and feel like a spreadsheet that allows
rotation, slicing, and graphics
Can be deployed to a large number of users

Data Mining Tools
Methods for analyzing multiple data types
-- Regression trees
-- Neural networks
-- Genetic algorithms
Are used when you don't know what the questions
are
Usually textual in nature
Usually deployed to a small number of analysts

83
Data Mining Tools

ASSOCIATION
Association, also frequently referred to as
"affinity analysis," reviews numerous sets of
items and looks for common groupings. An example
of association is market basket analysis, which
involves reviewing the products that consumers
purchase in a single trip to the grocery store.

84
ASSOCIATION

Finds items that imply the presence of other
items in the same event.
Affinities between items are represented by
association rules.
e.g. When a customer rents property for more
than 2 years and is more than 25 years old, in
40% of cases, the customer will buy a property.
This association happens in 35% of all customers
who rent properties.

85
Data Mining Tools

SEQUENCE
Sequential analysis helps data miners
identify a set of order-specific items or events.
Association identifies the existence of patterns
or groups of items; sequential
analysis identifies the order of those
patterns or groups of items.
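The difference between the two views can be shown by counting the same histories twice, once with order and once without. The purchase histories and helper names below are hypothetical.

```python
from collections import Counter

# Hypothetical purchase histories, each in time order.
histories = [
    ["house", "cooker", "freezer"],
    ["house", "freezer", "cooker"],
    ["cooker", "house"],
]

def ordered_pairs(history):
    """All (earlier, later) item pairs within one history."""
    return {(a, b) for i, a in enumerate(history) for b in history[i + 1:]}

def unordered_pairs(history):
    """All item pairs regardless of order, as association would see them."""
    return {frozenset(p) for p in ordered_pairs(history)}

sequence_counts = Counter(p for h in histories for p in ordered_pairs(h))
association_counts = Counter(p for h in histories for p in unordered_pairs(h))
```

Association sees house and cooker together in all three histories, while sequential analysis separates the two cases where the house came first from the one where the cooker did.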

86
SEQUENCE

Finds patterns between events such that the
presence of one set of items is followed by
another set of items in a database of events over
a period of time.
e.g. Used to understand long term customer
buying behavior.

87
Link Analysis - Similar Time Sequence Discovery

Finds links between two sets of data that are
time-dependent, and is based on the degree of
similarity between the patterns that both time
series demonstrate.
e.g. Within three months of buying property,
new home owners will purchase goods such as
cookers, freezers, and washing machines.

88
Data Mining Tools

CLUSTERING
Cluster analysis lets the data miner assemble
data into unforeseen groups containing similar
characteristics. Also known as "segmentation,"
this type of data mining is probably the most
widely used.

89
CLUSTERING

Aim is to partition a database into an unknown
number of segments, or clusters, of similar
records.
Uses unsupervised learning to discover
homogeneous sub-populations in a database to
improve the accuracy of the profiles.

90
Data Mining Tools

PREDICTIVE MODELING
As the name implies, predictive modeling
involves developing a model from historical data
for predicting a future event. The power of
predictive modeling engines is that they can use
a broad range of data attributes to identify
future behavior. Both cluster analysis and
predictive modeling tools identify distinct
groups of items with common attributes; the
difference is that predictive modeling focuses on
the likelihood of a particular outcome for a
particular group.

91
Topic 9: Data Mining Applications for CRM

Which customers are most profitable to me? Why?
What promotions are most effective? For which
customers?
What kind of customers will be interested in my
new product?
What customers are at risk to defect to my
competitor?
How do I identify prospects with the greatest
profit potentials?
Customer information is rapidly becoming a
company's most important asset for answering
these questions. However, answering these
questions in broad generalities is not enough.
Each customer must be analyzed and potentially
treated uniquely. Customer relationship
management provides the framework for analyzing
customer profitability and improving marketing
effectiveness.

92
Customer Relationship Management - Framework

Many organizations have collected and stored a
wealth of data about their customers, suppliers,
and business partners. However, the inability to
discover valuable information hidden in the data
prevents these organizations from transforming
this data into knowledge. The business desire is,
therefore, to extract valid, previously unknown,
and comprehensible information from large
databases and use it for profit. To fulfill these
goals, organizations need to follow these steps:
- Capture and integrate both the internal and
external data into a comprehensive view that
encompasses the whole organization.
- Mine the integrated data for information.
- Organize and present the information with
knowledge for decision-making.

93
Customer Relationship Management - Framework

From the architecture point of view, the entire
CRM framework can be classified into three key
components:
Operational CRM - The automation of horizontally
integrated business processes, including customer
touch-points, channels, and front-back office
integration.
Analytical CRM - The analysis of data created by
the Operational CRM.
Collaborative CRM - Applications of collaborative
services including e-mail, personalized
publishing, e-communities, and similar vehicles
designed to facilitate interactions between
customers and organizations.

94
CRM Architecture
(Diagram labels: Business Rules and Metadata
Management, Data Sources, Contact History,
Transaction History, External Data, Other, ETL
Tools, Market Data Store, Marketing Data Marts,
Analytics Data Mart, Reporting Data Mart,
Decision Support Applications, Data Mining
Analytics, Campaign Mgt, Contact Mgt, Workflow
Management, Communication Channels, Direct Mails,
Call Center, Customer Service Center, Internet,
E-mail)
95
CRM - The Business Perspective

Tools and technologies will be applied to these
real CRM business problems:
Customer Profitability: provides a blueprint for
how to define and use customer profitability as
the bedrock for your CRM processes.
Customer Acquisition: shows how to use data
mining to acquire new customers in the most
profitable way possible.
Customer Cross-selling: details how the
technology architecture can be used to increase
the value of existing customers by selling more
to them.
Customer Retention: uses a case study from the
telecommunications industry to show how to
execute successful CRM systems to retain your
profitable customers.
Customer Segmentation: provides the business
methodology of how to segment and manage your
customers in a consistent and repeatable way
across the enterprise.

96
Information Mining and Knowledge Discovery for
Effective CRM

In the current and emerging competitive and
highly dynamic business environment, only the
most competitive companies will achieve sustained
market success. In order to capitalize on
business opportunities, these organizations will
distinguish themselves by the capacity to
leverage information about their marketplace,
customers, and operations. A central part of this
strategy for long-term, sustained success will be
an active information repository: an advanced
data warehouse in which information from various
applications or parts of the business is
coalesced and understood.

97
Information Mining

The shortest path from complex data to knowledge
discovery is information mining. The term is used
instead of data mining to reflect the rich
variety of forms that the information required
for business intelligence can take. Information
mining implies using powerful and sophisticated
tools to do the following:

Uncover associations, patterns, and trends
Detect deviations
Group and classify information
Develop predictive models

98
Information Mining

From a technical perspective, the real keys to
successful information mining are its algorithms:
complex mathematical processes that compare and
correlate data. Algorithms enable an information
mining application to determine who the best
customers for the business are or what they like
to buy. They can also determine at what time of
day and in what combinations customers buy, or
how an organization can optimize inventory,
pricing, and merchandising in order to retain
these customers and cause them to buy more, at
increased profit margins. A large volume of
information is stored in non-numeric forms:
documents, images, and video files.

99
Text Mining and Knowledge Management

Text mining is a subset of information mining
technology that, in turn, is a component of a
more general category, Knowledge Management
(KM). Knowledge, in this case, refers to the
collective expertise, experiences, know-how, and
wisdom of an organization. In the business
world, knowledge is represented not only by the
structured data found in traditional databases,
but also in a wide variety of unstructured
sources such as Word documents, memos and
letters, e-mail messages, news feeds, Web pages,
and so forth.

100
Text Mining and Knowledge Management

Unlike data mining, text mining works with
information stored in an unstructured collection
of text documents. Specifically, online text
mining refers to the process of searching
through unstructured data on the Internet and
deriving some meaning from it. Text mining goes
beyond applying statistical models to data
files; in fact, text mining uncovers
relationships in a text collection and leverages
the creativity of the knowledge worker to
explore these relationships and discover new
knowledge.

101
Text Mining Technologies

There are two key technologies that make online
text mining possible:
Internet Searching - It has been around for
quite a few years. Yahoo, Alta Vista, and Excite
are three of the earliest search engines. Search
engines (and discovery services) operate by
indexing the content of a particular Web site
and allowing users to search the indexes.
Although useful, the first generations of these
tools often were wrong because they did not
correctly index the content they retrieved.
Advances in text mining applied to Internet
searching resulted in online text mining,
representing the new generation of Internet
search tools. With these products, users can
gain more relevant information by processing a
smaller number of links, pages, and indexes.
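
The indexing step described above can be sketched as an inverted index, which maps each word to the pages that contain it. This is a minimal illustration only, not how any particular search engine works; the page names and contents are invented for the example.

```python
from collections import defaultdict

def build_index(pages):
    """Map each lowercase word to the set of page names that contain it."""
    index = defaultdict(set)
    for name, text in pages.items():
        for word in text.lower().split():
            index[word.strip(".,;:!?")].add(name)
    return index

def search(index, query):
    """Return the pages that contain every word in the query."""
    words = query.lower().split()
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results

# Hypothetical page contents, invented for this example.
pages = {
    "page_a": "Data mining finds patterns in data.",
    "page_b": "Text mining works with documents.",
    "page_c": "Search engines index Web pages.",
}
index = build_index(pages)
print(search(index, "text mining"))
```

Searching the index instead of the pages themselves is what lets a query be answered without rereading every page.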

102
Text Mining Technologies

Text Analysis - It has been around longer than
Internet searching. Indeed, scientists have been
trying to make computers understand natural
languages for decades; text analysis is an
integral part of these efforts. The automatic
analysis of text information can be used for
several different general purposes:
1. To provide an overview of the contents of
a large document collection. For example,
finding significant clusters of documents in a
customer feedback collection could indicate
where a company's products and services need
improvement.
2. To identify hidden structures between
groups of objects. This may help to organize an
intranet site so that related documents are all
connected by hyperlinks.

103
Text Mining Technologies

3. To increase the efficiency and
effectiveness of a search process to find similar
or related information. For example, to search
articles from a news service and discover all
unique documents that contain hints on possible
trends or technologies that have so far not been
mentioned in other articles.
4. To detect duplicate documents in an
archive.
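
Duplicate detection (purpose 4 above) can be sketched by comparing the sets of word pairs that two documents contain. This is a minimal sketch using Jaccard similarity on word shingles, which is one common approach and not necessarily the one text analysis products use; the documents and the 0.8 threshold are assumptions made for the example.

```python
def shingles(text, size=2):
    """Return the set of consecutive word groups (shingles) in the text."""
    words = text.lower().split()
    if len(words) < size:
        return {tuple(words)}
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}

def jaccard(a, b):
    """Jaccard similarity: size of the intersection over size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0

def find_duplicates(docs, threshold=0.8):
    """Return pairs of document names whose similarity meets the threshold."""
    names = sorted(docs)
    sets = {name: shingles(docs[name]) for name in names}
    pairs = []
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            if jaccard(sets[first], sets[second]) >= threshold:
                pairs.append((first, second))
    return pairs

# Hypothetical documents, invented for this example.
docs = {
    "doc1": "the quarterly report shows strong growth in sales",
    "doc2": "the quarterly report shows strong growth in sales",
    "doc3": "customers asked for faster delivery of orders",
}
print(find_duplicates(docs))
```

Lowering the threshold makes the check more tolerant of small wording differences, at the cost of more false matches.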

104
Text Mining Technologies-Applications

1. E-mail management. A popular use of text
analysis is for message routing, in which the
computer reads the message to decide who should
deal with it. (Spam control is another good
example.)
2. Document management. By mining the
different documents for meaning as they are put
into a document repository, a company can
establish a detailed index that allows the
location of relevant documents at any time.
3. Automated help desk. Some companies use
text mining to respond to customer inquiries.
Customers' letters and e-mails are processed by a
text mining application.
4. Market research. A market researcher can
use online text mining to gather statistics on
the occurrences of certain words, phrases,
concepts, or themes on the World Wide Web. This
information can be useful for establishing market
demographics and demand curves.
5. Business intelligence gathering. This is
the most advanced use of text mining. (See next
slide)
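
The message routing idea in item 1 above can be sketched with simple keyword counting. This is a deliberately naive illustration; real text analysis tools use far richer language models. The department names, keyword lists, and the route_message helper are all hypothetical.

```python
# Hypothetical departments and keywords, invented for this example.
ROUTES = {
    "billing": {"invoice", "payment", "refund", "charge"},
    "support": {"error", "crash", "broken", "password"},
    "sales": {"price", "quote", "order", "discount"},
}

def route_message(text, routes=ROUTES, default="general"):
    """Pick the department whose keywords appear most often in the message."""
    words = [w.strip(".,;:!?").lower() for w in text.split()]
    scores = {dept: sum(w in keywords for w in words)
              for dept, keywords in routes.items()}
    best = max(scores, key=scores.get)
    # Fall back to a default queue when no keyword matches at all.
    return best if scores[best] > 0 else default

print(route_message("I need a refund for a wrong charge on my invoice."))
```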

105
Blogger

Blogger is one of the most popular online
blogging tools. It works with any browser and is
free, well designed, and easy to use. Millions of
people are changing their information acquisition
habits, and the weblog, or blog, has become a
popular source.
Title- Publishing a Blog with Blogger / by
Elizabeth Castro, Berkeley, Calif., Peachpit, 2005
Title- Blog: Understanding the Information
Reformation That's Changing Your World / Hugh
Hewitt, Nashville, Tenn., Nelson Books, c2005
Weblogs (ISBN 0321321235)

106
CRM in the eBusiness World

As e-business continues to mature and effect
radical changes throughout all aspects of the
business, the focus of new e-business-enabled
application software will shift away from
narrowly defined commerce platforms toward a
broader vision of managing customer
relationships.
A new model that Forrester Research calls
eRelationship Management (eRM) is defined as
follows:
A Web-centric approach to synchronizing customer
relationships across communication channels,
business functions, and audiences.

107
CRM in the eBusiness World

To implement this new e-business CRM model,
companies should do the following:
Create a dynamic customer context that can
address every customer interaction. This is
different from a view of the customer constructed
from data contained in the applications. It can
be achieved by collecting and organizing customer
data, calculating high-level metrics for each
customer (e.g., customer profitability,
satisfaction, and churn potential), and
assembling and delivering dynamic context to
customer touch points.
Generate consistent, custom responses by
delivering a consolidated rules engine for
routing, workflow, personalization, smart
navigation, and consistent treatment of
customers.
Build and maintain a Content Directory that
points to company, product, and business partner
content, and make it available to employees,
business partners, and customers.

108
Topic 10 Data Mining From US Government Printing
Office

Washington, March 25, 2003. Subcommittee on
Technology, Information Policy, Intergovernmental
Relations and the Census. Oversight hearing on
Data Mining: Current Applications and the Future
Possibilities. Available via
www.gpo.gov/congress/house or
www.house.gov/reform.
Background: The hearing will explore instances
where data mining technology is currently
employed, examine the benefits and the pitfalls,
and discuss the potential uses of data mining at
the Federal level of government. A specific focus
will be on privacy and abuse concerns surrounding
this technology.

109
Data Mining: Current Applications and the Future
Possibilities

Data Mining technology has been utilized
successfully for many years in both the private
and public sectors to identify and analyze useful
data that would otherwise be overlooked or
inaccessible.
Government agencies have also used data mining
techniques quite extensively to identify and
eliminate fraud, waste and abuse. States work
with localities by providing them access to their
data sources. This has allowed local and state
enforcement agencies to zero in on tax evaders,
perpetrators of financial crimes or those
conducting any number of fraudulent activities.
At the federal level, the Treasury Department
uses this technology to identify and prosecute
money laundering schemes, the IRS to track down
delinquent taxpayers, and the U.S. Customs
Service to identify drug trafficking activities
at U.S. borders.

110
Topic 11 Data Mining Techniques- A Summary

Artificial neural networks: Non-linear predictive
models that learn through training and resemble
biological neural networks in structure.
Decision trees: Tree-shaped structures that
represent sets of decisions. These decisions
generate rules for the classification of a
database.
Genetic algorithms: Optimization techniques that
use processes such as genetic combination,
mutation, and natural selection in a design based
on the concepts of evolution.
Rule induction: The extraction of useful if-then
rules from data based on statistical
significance.
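
Of the techniques above, the optimization approach built on combination, mutation, and natural selection (a genetic algorithm) is the easiest to show in a few lines. The sketch below is a toy illustration under stated assumptions: the objective (count the 1s in a bit string), the population size, the mutation rate, and all function names are invented for the example and do not come from any data mining product.

```python
import random

def fitness(bits):
    """Toy objective: count the 1s in the bit string."""
    return sum(bits)

def crossover(a, b, rng):
    """Combine two parents at a random cut point."""
    cut = rng.randint(1, len(a) - 1)
    return a[:cut] + b[cut:]

def mutate(bits, rate, rng):
    """Flip each bit with a small probability."""
    return [1 - bit if rng.random() < rate else bit for bit in bits]

def evolve(length=20, pop_size=30, generations=40, rate=0.02, seed=0):
    """Run the algorithm and return the best individual and fitness history."""
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(length)]
                  for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        # Selection: keep the fitter half of the population as parents.
        population.sort(key=fitness, reverse=True)
        history.append(fitness(population[0]))
        parents = population[:pop_size // 2]
        # The best individual survives unchanged (elitism).
        children = [parents[0]]
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            children.append(mutate(crossover(a, b, rng), rate, rng))
        population = children
    best = max(population, key=fitness)
    return best, history

best, history = evolve()
print(fitness(best), history[0])
```

Because the best individual is always carried forward, the best fitness in each generation never decreases.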

111
Data Mining Techniques- A Summary

Operation: Associated techniques
Predictive modeling: Classification, Value
prediction
Database segmentation: Demographic clustering,
Neural clustering
Link analysis: Association discovery, Sequential
pattern discovery, Similar time sequence
discovery
Deviation detection: Statistics, Visualization

112
Two Types of Data Mining Modeling- Verification
and Discovery

The verification model utilizes a process that
looks in a database to detect trends and patterns
in data that will help answer some specific
questions about the business.
In this model, the user generates a hypothesis
about the data, issues a query against the data,
and examines the results of the query, looking
for verification of the hypothesis. If the
results do not support it, the user decides that
the hypothesis is not valid.

113
Verification Model

In this model, very little information is created
in the extraction process: either the hypothesis
is verified or it is not.
Common tools used in this model are queries,
multidimensional analysis, and visualization.
What they all have in common is that the user is
essentially guiding the exploration of the data
being inspected.
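
The verification model can be sketched as a single query that either supports the user's hypothesis or does not. The example below uses Python's built-in sqlite3 module; the sales table, its rows, and the hypothesis itself are invented for illustration.

```python
import sqlite3

# Hypothetical sales records, invented for this example.
rows = [
    ("north", "weekday", 120.0),
    ("north", "weekend", 310.0),
    ("south", "weekday", 140.0),
    ("south", "weekend", 290.0),
    ("north", "weekday", 100.0),
    ("south", "weekend", 330.0),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, day_type TEXT, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)

# Hypothesis from the user: average weekend sales exceed weekday sales.
query = "SELECT day_type, AVG(amount) FROM sales GROUP BY day_type"
averages = dict(conn.execute(query).fetchall())
hypothesis_verified = averages["weekend"] > averages["weekday"]
conn.close()

print(averages, hypothesis_verified)
```

Note that the query answers only the question the user thought to ask, which is the limitation this slide describes.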

114
Discovery Model

A more popular model is the discovery model,
which utilizes a process that looks in a database
to discover and/or predict future patterns. The
discovery model is divided into two modes:
descriptive and predictive.

115
Discovery Model- Descriptive Mode

The descriptive mode finds hidden patterns
without a predetermined idea or hypothesis about
what the patterns may be. In other words, the
data mining software or program takes the
initiative in finding what the interesting
patterns are, without the user thinking of the
relevant questions first. In this mode,
information is created about the data with very
little or no guidance from the user. The
exploration of the data is done in such a way as
to yield as many useful facts about the data as
possible in the shortest amount of time.
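
One way to picture the descriptive mode is association discovery: the program counts which items occur together and reports the frequent combinations without being told what to look for. The sketch below is a simplified illustration limited to item pairs; the transactions, the 0.5 support threshold, and the frequent_pairs helper are assumptions made for the example.

```python
from collections import Counter
from itertools import combinations

def frequent_pairs(transactions, min_support=0.5):
    """Return item pairs that occur together in at least min_support
    of the transactions, mapped to their support."""
    counts = Counter()
    for basket in transactions:
        for pair in combinations(sorted(set(basket)), 2):
            counts[pair] += 1
    total = len(transactions)
    return {pair: n / total for pair, n in counts.items()
            if n / total >= min_support}

# Hypothetical shopping baskets, invented for this example.
transactions = [
    ["bread", "milk"],
    ["bread", "milk", "butter"],
    ["bread", "milk", "eggs"],
    ["butter", "eggs"],
]
pairs = frequent_pairs(transactions)
print(pairs)
```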

116
Discovery Model- Predictive Mode

In the predictive mode, patterns discovered from
the database are used to predict future patterns
or trends. Predictive modeling allows the user to
submit records with some unknown field values,
and the system will guess the unknown values
based on previous patterns discovered from the
database.
In comparing the two models, one can state that
verification can be very inefficient,
time-consuming, and costly, whereas discovery
modeling can be very efficient and cost
effective, is less dependent on user input, and
increases modeling accuracy.
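
The predictive mode, in which the system guesses an unknown field value from earlier records, can be sketched with a simple majority vote among the most similar past records. This is only one of many possible prediction techniques; the customer records, the field names, and the predict_field helper are hypothetical.

```python
from collections import Counter

def predict_field(history, record, target, k=3):
    """Guess record[target] by majority vote among the k historical
    records that share the most known field values with the record."""
    known = [f for f, v in record.items() if f != target and v is not None]

    def similarity(past):
        return sum(past[f] == record[f] for f in known)

    nearest = sorted(history, key=similarity, reverse=True)[:k]
    votes = Counter(past[target] for past in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical customer records, invented for this example.
history = [
    {"plan": "basic", "region": "north", "churned": "yes"},
    {"plan": "basic", "region": "south", "churned": "yes"},
    {"plan": "premium", "region": "north", "churned": "no"},
    {"plan": "premium", "region": "south", "churned": "no"},
    {"plan": "basic", "region": "north", "churned": "yes"},
]
new_record = {"plan": "basic", "region": "north", "churned": None}
guess = predict_field(history, new_record, "churned")
print(guess)
```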
