Data Mining Basics - PowerPoint PPT Presentation





Slides: 117
Provided by: circusofl



Data Mining Basics
Database Modeling and Design
Chapter 8 (Part D)
  • Instructor: Paul Chen

Topics
  • How Did Data Mining Evolve?
  • Decision Processing: Overview and Tasks
  • Data Mining, What Is It?
  • Data Mining vs. Data Warehousing
  • How Data Mining Works and Its Applications
  • Data Mining Operations and Associated Techniques
  • The Data Mining Process
  • Data Mining Tools
  • Data Mining Applications for CRM
  • Data Mining From Government Printing Office
  • Data Mining Techniques: A Summary

Topic 1: How Did Data Mining Evolve?
  • Many businesses have invested heavily in
    information technology to help them manage their
    businesses more effectively and gain a
    competitive edge. Increasingly large amounts of
    critical business data are being stored
    electronically, and this volume is expected to
    continue to grow. Data mining technology is
    helping companies leverage their existing data
    more effectively and obtain insightful
    information that gives them a competitive edge.

How Did Data Mining Evolve? (Timeline)
  • 1960s: Data collection
  • 1970s-80s: RDBMS
  • 1990s: OLAP and DW
  • Late 1990s to now: Data mining
Topic 2: Decision Processing Overview
  • Decision processing systems, and their underlying
    analytical applications, provide business users
    with the information they need to track and
    analyze business trends, and to explore new
    business opportunities. As businesses become
    increasingly competitive and complex, effective
    decision processing systems are essential for

The Next Generation of Business Intelligence
  • A decision processing system analyzes business
    information captured from operational systems
    (back-office, front-office, and e-business
    applications).
  • Business information is distributed to business
    users via corporate intranets and extranets.
  • The flow of data can be thought of as an
    information supply chain whose objective is to
    convert operational data into useful business
    information.

Figure: The Decision Processing Information Supply Chain. Elements shown:
operational systems, back-office transaction applications, front-office
applications, e-business applications, collaborative office systems,
external data, information staging area, business intelligence tools,
analytic applications, business metrics, and business decisions.
Decision Processing: Four Tasks
  • Extracting and transforming information

This involves capturing data from operational
systems, transforming it into business
information, and loading it into a data warehouse
information store. Current extract templates on
the market are aimed primarily at capturing data
from ERP (Enterprise Resource Planning) transaction
processing systems (for example, SAP Business
Information Warehouse and PeopleSoft BPM data
warehouse).
(Mentioned in Chapter 2.)
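The extract, transform, and load steps described above can be sketched in a few lines of Python. This is a minimal illustration only, assuming an in-memory SQLite database stands in for the warehouse information store and a hard-coded list stands in for the operational source; the table, column, and function names are hypothetical and do not model any vendor's extract templates.

```python
import sqlite3

# Hypothetical rows as an operational system might hold them:
# amounts stored as text, region codes in inconsistent case.
SOURCE_ROWS = [
    ("O-1", "2024-01-05", "north", "1200.50"),
    ("O-2", "2024-01-05", "South", "300.00"),
    ("O-3", "2024-01-06", "North", "99.50"),
]


def extract():
    """Capture data from the (hypothetical) operational system."""
    return list(SOURCE_ROWS)


def transform(rows):
    """Turn raw rows into business information: numeric amounts, consistent region codes."""
    return [(order_id, day, region.strip().upper(), float(amount))
            for order_id, day, region, amount in rows]


def load(conn, rows):
    """Load the cleaned rows into a warehouse fact table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales_fact "
        "(order_id TEXT PRIMARY KEY, day TEXT, region TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO sales_fact VALUES (?, ?, ?, ?)", rows)
    conn.commit()


warehouse = sqlite3.connect(":memory:")
load(warehouse, transform(extract()))
totals = dict(warehouse.execute(
    "SELECT region, SUM(amount) FROM sales_fact GROUP BY region"))
print(totals)
```

The transform step is where operational data becomes business information: without upper-casing, "north" and "North" would be reported as two different regions.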
Decision Processing: Four Tasks (Cont'd)
  • Managing information
  • This task encompasses the maintenance of business
    information in information stores, and how these
    information stores are processed by business
    intelligence tools and analytic applications.
  • The cornerstone of decision processing is data
    warehousing, and warehouse information stores
    should be organized and modeled into relational
    and multidimensional database products.

Decision Processing: Four Tasks (Cont'd)
  • Analyzing and modeling information

The traditional approach to decision processing
is to build a data warehouse and supply business
users with a set of business intelligence tools
(query, reporting, OLAP, and data mining, for
example) to process information in data warehouse
information stores. A better approach is to employ
turn-key and web-based analytic application
packages that are designed to provide
comprehensive analyses for the business area
being researched. Key business metrics (e.g.,
revenue dollars per sales rep per day) are
Decision Processing: Four Tasks (Cont'd)
  • Distributing information

Business intelligence tools and analytic
applications distribute information and the
results of analysis operations to business users
via standard graphical and Web interfaces. To
help users uncover and organize this range of
business information, an enterprise information
portal (EIP) is required. An EIP provides a
single point of entry to any piece of business
information, no matter where it resides. The
main components of an EIP are an information
assistant (Web browser interface), an
information directory, and a subscription
Decision Making Under Risk
  • Decisions are made under three sets of
    conditions:
  • Certainty
  • The decision makers know everything in advance of
    making the decision
  • Uncertainty
  • The decision makers know nothing about the
    probabilities or the consequences of decisions
  • Risk
  • The decision makers know the probabilities of the
    possible outcomes of each decision
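Under risk, the probabilities of the outcomes are known, so a common (though not the only) way to choose is to compute each alternative's expected value and pick the largest. The sketch below is illustrative only: the market states, probabilities, and dollar payoffs are hypothetical, and expected value is one standard criterion rather than a method prescribed by the slides.

```python
import math

# Hypothetical example: the probability of each market state is known (decision making under risk).
PROBABILITIES = {"strong": 0.3, "average": 0.5, "weak": 0.2}

# Hypothetical payoff (in dollars) of each alternative in each market state.
PAYOFFS = {
    "expand": {"strong": 500_000, "average": 200_000, "weak": -150_000},
    "hold": {"strong": 200_000, "average": 150_000, "weak": 50_000},
}


def expected_value(payoffs, probabilities):
    """Probability-weighted average payoff of one alternative."""
    return sum(probabilities[state] * payoff for state, payoff in payoffs.items())


def best_alternative(payoff_table, probabilities):
    """Rank alternatives by expected value; return the best one and all scores."""
    if not math.isclose(sum(probabilities.values()), 1.0):
        raise ValueError("state probabilities must sum to 1")
    values = {name: expected_value(p, probabilities) for name, p in payoff_table.items()}
    return max(values, key=values.get), values


choice, values = best_alternative(PAYOFFS, PROBABILITIES)
print(choice, values)
```

Under uncertainty, where the probabilities are unknown, this calculation is not available; that is the practical difference between the two conditions.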

Decision-Making Style
  • Decision-making styles of users are categorized
    as either
  • Analytic or
  • Heuristic

Analytic and Heuristic Decision Making
  • Analytical Decision Maker
  • Learns by analyzing
  • Uses step-by-step procedure
  • Values quantitative information and models
  • Builds mathematical models and algorithms
  • Seeks optimal solution
  • Heuristic Decision Maker
  • Learns by acting
  • Uses trial and error
  • Values experiences
  • Relies on common sense
  • Seeks a satisfactory solution

Topic 3: Data Mining, What Is It?
  • Data mining has been defined as a decision
    support process in which a search is made for
    patterns of information in data. To detect
    patterns in data, data mining uses sophisticated
    statistical analysis and modeling technologies to
    uncover useful relationships hidden in databases.
    It predicts future trends and behaviors, allowing
    businesses to make proactive, knowledge-driven
    decisions.

Data Mining, What Is It?
  • The process of extracting valid, previously
    unknown, comprehensible, and actionable
    information from large databases and using it to
    make crucial business decisions (Simoudis, 1996).
  • Involves analysis of data and use of software
    techniques for finding hidden and unexpected
    patterns and relationships in sets of data.

Data Mining, What Is It?
  • Reveals information that is hidden and
    unexpected, as there is little value in finding
    patterns and relationships that are already
    intuitive.
  • Patterns and relationships are identified by
    examining the underlying rules and features in
    the data.
  • Tends to work from the data up, and the most
    accurate results normally require large volumes
    of data to deliver reliable conclusions.

Data Mining, What Is It?
  • Starts by developing an optimal representation of
    the structure of sample data, during which time
    knowledge is acquired and extended to larger sets
    of data.
  • Data mining can provide huge paybacks for
    companies that have made a significant investment
    in data warehousing.
  • A relatively new technology, but already used in
    a number of industries.

Topic 4: Data Mining vs. Data Warehousing
  • Data mining does not require that a data
    warehouse be built. Often, data can be downloaded
    from the operational files to flat files that
    contain the data ready for data mining.
  • Data mining can be implemented rapidly on
    existing software and hardware platforms. Data
    mining tools can analyze massive databases to
    deliver answers to questions such as "Which
    customers are most likely to respond to my next
    promotional mailing, and why?"

Data Mining vs. Data Warehousing
  • The major challenge in exploiting data mining is
    identifying suitable data to mine.
  • Data mining requires a single, separate, clean,
    integrated, and self-consistent source of data.
  • A data warehouse is well equipped for providing
    data for mining.
  • Data quality and consistency are prerequisites
    for mining to ensure the accuracy of the
    predictive models. Data warehouses are populated
    with clean, consistent data.

Data Mining vs. Data Warehousing
  • It is advantageous to mine data from multiple
    sources to discover as many interrelationships as
    possible. Data warehouses contain data from a
    number of sources.
  • Selecting relevant subsets of records and fields
    for data mining requires the query capabilities
    of the data warehouse.
  • The results of a data mining study are useful if
    there is some way to investigate the uncovered
    patterns further. Data warehouses provide the
    capability to go back to the data source.

Topic 5: How Does Data Mining Work?
  • How exactly is data mining able to tell you
    important things that you didn't know, or what is
    going to happen next? The technique used in data
    mining is called predictive modeling, which in a
    broad sense is a knowledge discovery process based
    on relationships and patterns.
  • Modeling is the act of building a model in one
    situation where you know the answer and then
    applying it to another situation where you don't.

Examples of Applications of Data Mining via
relationships and patterns
  • Retail / Marketing
  • Identifying buying patterns of customers
  • Finding associations among customer demographic
    characteristics
  • Predicting response to mailing campaigns
  • Market basket analysis

Examples of Applications of Data Mining via
relationships and patterns
  • Banking
  • Detecting patterns of fraudulent credit card use
  • Identifying loyal customers
  • Predicting customers likely to change their
    credit card affiliation
  • Determining credit card spending by customer

Examples of Applications of Data Mining via
relationships and patterns
  • Insurance
  • Claims analysis
  • Predicting which customers will buy new policies.
  • Medicine
  • Characterizing patient behaviour to predict
    surgery visits
  • Identifying successful medical therapies for
    different illnesses.

Examples of Applications of Data Mining via
relationships and patterns
  • Customer profiling: characteristics of good
    customers are identified with the goals of
    predicting who will become one and helping
    marketers target new prospects.
  • Targeting specific marketing promotions to
    existing and potential customers offers similar
  • Market-basket analysis: with data mining,
    companies can determine which products to stock
    in which stores, and even how to place them
    within a store.

Examples of Applications of Data Mining via
relationships and patterns
  • Customer relationship management: by determining
    the characteristics of customers who are likely to
    leave for a competitor, a company can take action
    to retain those customers, because doing so is
    usually far less expensive than acquiring new
    ones.
  • Fraud detection: with data mining, companies can
    identify potentially fraudulent transactions
    before they happen.

Topic 6: Data Mining Operations and Associated Techniques
In previous foils, predictive modeling in essence
includes the other operations shown in the figure.
Figure: Level of Modeling vs. Level of Analytical …
  • Descriptive: "The dealer sold 200 cars last month."
  • Explanatory: "For every 1% increase in the interest
    rate, auto sales decrease by 5%."
  • Predictive: predictions about future buyer …
  • Other labels: traditional DW, data mining,
    normalized tables, denormalized tables, roll-up,
    drill down, classification, value prediction,
    statistical analysis / artificial intelligence
Predictive Modelling
  • Similar to the human learning experience
  • Uses observations to form a model of the
    important characteristics of some phenomenon.
  • Uses generalizations of the real world and the
    ability to fit new data into a general framework.
  • Can analyze a database to determine essential
    characteristics (model) about the data set.

Predictive Modelling
  • The model is developed using a supervised learning
    approach, which has two phases: training and
    testing.
  • Training builds a model using a large sample of
    historical data called a training set.
  • Testing involves trying out the model on new,
    previously unseen data to determine its accuracy
    and physical performance characteristics.
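The two supervised-learning phases can be made concrete with a small sketch: train on one part of the labelled history, then measure accuracy on records the model has never seen. Everything here is an assumption for illustration: the data are synthetic (generated from a made-up "rent more than 2 years and older than 45 means buy" rule), and the model is a simple nearest-centroid classifier chosen for brevity, not a technique named on the slides.

```python
import random


def make_history(n, seed=42):
    """Synthetic, hypothetical history of ((years_renting, age), label) records."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        years, age = rng.uniform(0, 6), rng.uniform(20, 70)
        label = "buy" if years > 2 and age > 45 else "rent"
        rows.append(((years, age), label))
    return rows


def train(training_set):
    """Training phase: compute one centroid (mean feature vector) per class."""
    totals = {}
    for (years, age), label in training_set:
        t = totals.setdefault(label, [0.0, 0.0, 0])
        t[0] += years
        t[1] += age
        t[2] += 1
    return {label: (t[0] / t[2], t[1] / t[2]) for label, t in totals.items()}


def predict(model, features):
    """Assign the class whose centroid is nearest (squared Euclidean distance)."""
    return min(model, key=lambda c: sum((f - m) ** 2 for f, m in zip(features, model[c])))


def accuracy(model, test_set):
    """Testing phase: share of previously unseen records classified correctly."""
    hits = sum(predict(model, features) == label for features, label in test_set)
    return hits / len(test_set)


history = make_history(300)
split = len(history) * 7 // 10                      # 70% training, 30% held out
training_set, test_set = history[:split], history[split:]
model = train(training_set)
print(f"test accuracy: {accuracy(model, test_set):.2f}")
```

The key design point is that the test records never influence the centroids, so the reported accuracy estimates how the model would perform on genuinely new data.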

Predictive Modelling
  • Applications of predictive modelling include
    customer retention management, credit approval,
    cross-selling, and direct marketing.
  • Two techniques are associated with predictive
    modelling, distinguished by the nature of the
    variable being predicted:
  • A. classification
  • B. value prediction

Statistical Analysis of Actual Sales (Dollars and
Quantities) Relative to These Signage Variables: A
Predictive Modeling Example
  • Content
  • Frequency
  • Depth
  • Focus
  • Depth
  • Scale
  • Length
  • Location
  • Statistical analysis: correlation, regression,
    experiment design, optimization. Now it goes
    into real time.

  • There are two techniques associated with
    predictive modeling, classification and value
    prediction, which are distinguished by the nature
    of the variable being predicted.

Predictive Modelling - Classification
  • Used to establish a specific predetermined class
    for each record in a database from a finite set
    of possible class values.
  • Two specializations of classification: tree
    induction and neural induction.

Example of Classification using Tree Induction
Figure: a decision tree with the tests "Customer renting property > 2 years"
and "Customer age > 45" and the leaves "Rent property", "Rent property",
and "Buy property".
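Tree induction builds a tree like the one in the example by repeatedly choosing the test that best separates the classes. The sketch below shows that core step for the root node only. It assumes Gini impurity as the split criterion (the measure used by CART; the slides do not name one), and the records and attribute names are hypothetical; a full tree would apply `best_split` recursively to each branch until the groups are pure enough.

```python
def gini(labels):
    """Gini impurity of a list of class labels: 0 means the group is pure."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))


def best_split(records, labels, attributes):
    """Try every attribute and threshold; keep the split with the lowest weighted Gini impurity."""
    best = None  # (weighted impurity, attribute, threshold)
    n = len(records)
    for attr in attributes:
        for threshold in sorted({r[attr] for r in records}):
            left = [lab for r, lab in zip(records, labels) if r[attr] <= threshold]
            right = [lab for r, lab in zip(records, labels) if r[attr] > threshold]
            if not left or not right:
                continue  # a split must produce two non-empty groups
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[0]:
                best = (score, attr, threshold)
    return best


# Hypothetical training records echoing the example's attributes.
RECORDS = [
    {"years_renting": 1, "age": 30},
    {"years_renting": 1, "age": 55},
    {"years_renting": 2, "age": 60},
    {"years_renting": 3, "age": 50},
    {"years_renting": 4, "age": 60},
    {"years_renting": 5, "age": 35},
    {"years_renting": 4, "age": 50},
]
LABELS = ["rent", "rent", "rent", "buy", "buy", "rent", "buy"]

score, attribute, threshold = best_split(RECORDS, LABELS, ["years_renting", "age"])
print(f"root test: {attribute} > {threshold} (weighted Gini {score:.3f})")
```

With these records the best root test is whether the customer has rented for more than 2 years, which matches the root of the example tree.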
Example of Classification using Neural Induction
  • Each processing unit (circle) in one layer is
    connected to each processing unit in the next
    layer by a weighted value, expressing the
    strength of the relationship. The network
    attempts to mirror the way the human brain works
    in recognizing patterns by arithmetically
    combining all the variables with a given data
  • In this way, it is possible to develop nonlinear
    predictive models that learn by studying
    combinations of variables and how different
    combinations of variables affect different data
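The layered structure described above, where every unit is connected to every unit in the next layer by a weight, can be sketched as a forward pass. This is a minimal illustration under stated assumptions: the weights and biases are hypothetical hand-picked values (a real network learns them from training data, for example by backpropagation, which is not shown), and the sigmoid nonlinearity is one common choice rather than something the slides specify.

```python
import math


def sigmoid(z):
    """Squashing function that makes each unit's output nonlinear in its inputs."""
    return 1.0 / (1.0 + math.exp(-z))


def layer_forward(inputs, weights, biases):
    """Each unit takes a weighted sum of every input from the previous layer, then applies sigmoid."""
    return [sigmoid(sum(w * x for w, x in zip(unit_weights, inputs)) + b)
            for unit_weights, b in zip(weights, biases)]


# Hypothetical weights for a 2-input, 3-hidden-unit, 1-output network.
# In practice these values are learned from training data, not written by hand.
HIDDEN_WEIGHTS = [[0.8, -0.4], [-0.3, 0.9], [0.5, 0.5]]
HIDDEN_BIASES = [0.0, -0.1, 0.2]
OUTPUT_WEIGHTS = [[1.2, -0.7, 0.6]]
OUTPUT_BIASES = [-0.3]


def predict(record):
    """Score in (0, 1), which could be read as a likelihood of the 'buy property' class."""
    hidden = layer_forward(record, HIDDEN_WEIGHTS, HIDDEN_BIASES)
    (score,) = layer_forward(hidden, OUTPUT_WEIGHTS, OUTPUT_BIASES)
    return score


# Inputs scaled to similar ranges: years renting / 10 and age / 100 (an illustrative choice).
print(round(predict([0.3, 0.5]), 3))
```

Because the hidden units combine the inputs nonlinearly before the output unit combines them again, the network can represent relationships that a single straight-line model cannot.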

Predictive Modelling - Value Prediction
  • Used to estimate a continuous numeric value that
    is associated with a database record.
  • Uses the traditional statistical techniques of
    linear regression and non-linear regression.
  • Relatively easy to use and understand.

Predictive Modelling - Value Prediction
  • Linear regression attempts to fit a straight line
    through a plot of the data, such that the line is
    the best representation of the average of all
    observations at that point in the plot.
  • The problem is that the technique works well only
    with linear data and is sensitive to the presence
    of outliers (i.e., data values that do not
    conform to the expected norm).
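Both points above, fitting the best straight line and the sensitivity to outliers, show up in a few lines of ordinary least squares. The data below are hypothetical: five points lying exactly on y = 2x + 1, then the same points with the last observation replaced by an outlier.

```python
def fit_line(xs, ys):
    """Ordinary least squares: slope and intercept that minimize squared vertical errors."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]                 # exactly y = 2x + 1
with_outlier = [3, 5, 7, 9, 40]       # same data, but the last observation is an outlier

print(fit_line(xs, ys))               # (2.0, 1.0)
print(fit_line(xs, with_outlier))
```

A single unusual value moves the slope from 2 to about 7.8 and the intercept from 1 to about -10.6, because squared errors give large deviations a disproportionate pull on the line.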

Predictive Modelling - Value Prediction
  • Although non-linear regression avoids the main
    problems of linear regression, it is still not
    flexible enough to handle all possible shapes of
    the data.
  • Statistical measurements are fine for building
    linear models that describe predictable data
    points; however, most data is not linear in
    nature.

Predictive Modelling - Value Prediction
  • Data mining requires statistical methods that can
    accommodate non-linearity, outliers, and
    non-numeric data.
  • Applications of value prediction include credit
    card fraud detection and target mailing list
    identification.

Database Segmentation
  • The aim is to partition a database into an unknown
    number of segments, or clusters, of similar records.
  • Uses unsupervised learning to discover
    homogeneous sub-populations in a database to
    improve the accuracy of the profiles.
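Segmentation by unsupervised learning can be illustrated with k-means, one widely used clustering algorithm; it is a stand-in here, since the slides name demographic and neural clustering instead. Note one simplification: k-means needs the number of segments k chosen up front, whereas the slide describes an unknown number of segments (in practice several values of k are often tried). The points are hypothetical customers described by two already-scaled attributes.

```python
import math


def kmeans(points, k, iterations=20):
    """Plain k-means: assign each point to its nearest centroid, then move centroids to cluster means."""
    centroids = [tuple(p) for p in points[:k]]  # naive initialization: the first k points
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        new_centroids = []
        for i, members in enumerate(clusters):
            if members:
                new_centroids.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centroids.append(centroids[i])  # keep an empty cluster's centroid in place
        if new_centroids == centroids:
            break  # assignments are stable: the segments have converged
        centroids = new_centroids
    return centroids, clusters


# Hypothetical customers described by two scaled attributes.
POINTS = [(1, 1), (1.5, 2), (2, 1.5), (8, 8), (8.5, 9), (9, 8)]
centroids, clusters = kmeans(POINTS, k=2)
print(centroids)
print(clusters)
```

No class labels are supplied anywhere: the two homogeneous groups emerge from distances alone, which is what makes this unsupervised learning.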

Database Segmentation
  • Less precise than other operations and thus less
    sensitive to redundant and irrelevant features.
  • Sensitivity can be reduced by ignoring a subset
    of the attributes that describe each instance or
    by assigning a weighting factor to each variable.
  • Applications of database segmentation include
    customer profiling, direct marketing, and
    cross-selling.
Example of Database Segmentation using a Scatterplot
Database Segmentation
  • Associated with demographic or neural clustering
    techniques, distinguished by:
  • Allowable data inputs
  • Methods used to calculate the distance between
    records
  • Presentation of the resulting segments for
    analysis

Example of Database Segmentation using a
Link Analysis
  • Aims to establish links (associations) between
    records, or sets of records, in a database.
  • There are three specializations
  • Associations discovery
  • Sequential pattern discovery
  • Similar time sequence discovery
  • Applications include product affinity analysis,
    direct marketing, and stock price movement.

Link Analysis - Associations Discovery
  • Finds items that imply the presence of other
    items in the same event.
  • Affinities between items are represented by
    association rules.
  • e.g., when a customer rents property for more than
    2 years and is more than 25 years old, in 40% of
    cases the customer will buy a property. This
    association happens in 35% of all customers who
    rent property.
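Association rules are usually judged by two numbers: support (how often the items occur together among all records) and confidence (how often the consequent occurs when the antecedent does). Read in these usual terms, the example's 40% is the rule's confidence and its 35% is its support. The sketch below computes both for a market-basket rule; the baskets are hypothetical.

```python
def count(transactions, itemset):
    """Number of transactions that contain every item in itemset."""
    items = set(itemset)
    return sum(items <= basket for basket in transactions)


def support(transactions, itemset):
    """Share of all transactions in which the whole itemset occurs."""
    return count(transactions, itemset) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Of the transactions containing the antecedent, the share that also contain the consequent."""
    both = count(transactions, set(antecedent) | set(consequent))
    return both / count(transactions, antecedent)


# Hypothetical market baskets (one set of items per shopping trip).
BASKETS = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "eggs"},
    {"bread", "milk", "eggs"},
]

print(f"bread -> milk: support {support(BASKETS, {'bread', 'milk'}):.2f}, "
      f"confidence {confidence(BASKETS, {'bread'}, {'milk'}):.2f}")
```

Here bread and milk appear together in 60% of baskets (support), and 75% of the baskets containing bread also contain milk (confidence).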

Link Analysis - Sequential Pattern Discovery
  • Finds patterns between events such that the
    presence of one set of items is followed by
    another set of items in a database of events over
    a period of time.
  • e.g., used to understand long-term customer
    buying behavior.
Link Analysis - Similar Time Sequence Discovery
  • Finds links between two sets of data that are
    time-dependent, and is based on the degree of
    similarity between the patterns that both time
    series demonstrate.
  • e.g. Within three months of buying property, new
    home owners will purchase goods such as cookers,
    freezers, and washing machines.
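One way to measure how similar two time series are, allowing for a delay like the three months in the example, is to correlate one series with a shifted copy of the other and see which shift matches best. This sketch assumes Pearson correlation as the similarity measure (the slides do not name one) and uses hypothetical monthly sales built so that appliance sales echo property sales two months later.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def lagged_similarity(leader, follower, max_lag):
    """Similarity of leader[t] and follower[t + lag] for each lag from 0 to max_lag."""
    return {lag: pearson(leader[:len(leader) - lag], follower[lag:])
            for lag in range(max_lag + 1)}


# Hypothetical monthly series: appliance sales follow property sales two months later.
PROPERTY_SALES = [10, 30, 15, 40, 20, 35, 25, 45]
APPLIANCE_SALES = [50, 60, 25, 65, 35, 85, 45, 75]

scores = lagged_similarity(PROPERTY_SALES, APPLIANCE_SALES, max_lag=3)
best_lag = max(scores, key=scores.get)
print(best_lag, round(scores[best_lag], 3))
```

The lag with the highest correlation (here 2 months) is the time offset at which the two series' patterns are most alike.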

Deviation Detection
  • Relatively new operation in terms of commercially
    available data mining tools.
  • Often a source of true discovery because it
    identifies outliers, which express deviation from
    some previously known expectation and norm.

Deviation Detection
  • Can be performed using statistics and
    visualization techniques or as a by-product of
    data mining.
  • Applications include fraud detection in the use
    of credit cards and insurance claims, quality
    control, and defect tracing.
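Deviation detection with statistics can be as simple as flagging values far from the typical value. A plain z-score has a weakness: the outlier inflates the standard deviation it is measured against (with the ten claims below, the large claim's ordinary z-score stays just under 3). The sketch therefore uses a modified z-score based on the median and the median absolute deviation, with the commonly used 0.6745 scaling constant and 3.5 cutoff; this choice is an assumption, and the claim amounts are hypothetical.

```python
import statistics


def mad_outliers(values, threshold=3.5):
    """Return (index, value) pairs whose modified z-score exceeds threshold."""
    median = statistics.median(values)
    mad = statistics.median([abs(v - median) for v in values])  # median absolute deviation
    if mad == 0:
        return []  # at least half the values equal the median; the score is undefined
    return [(i, v) for i, v in enumerate(values)
            if 0.6745 * abs(v - median) / mad > threshold]


# Hypothetical insurance claim amounts; one is far outside the usual range.
CLAIMS = [120, 135, 110, 128, 140, 125, 118, 132, 2500, 122]
print(mad_outliers(CLAIMS))
```

Because the median and the median absolute deviation barely move when one extreme value is added, the deviation stands out clearly instead of being masked by itself.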

A Summary: Data-Driven Techniques
  • Data Visualization
  • Decision Trees
  • Clustering
  • Factor Analysis
  • Neural Network
  • Association Rules
  • Rule Induction
  • Based on Sakhr Youness's book Professional
    Data Warehousing with SQL Server 7.0 and OLAP

Data Visualization
A pie chart showing the sales of a product by
region is sometimes much more effective than
presenting the same data in text or tabular form.
Decision Tree
Cluster Analysis
  • First segment (high income > 8,000): have children
  • Second segment (8,000 > middle income > 3,000):
    last car is a used one
  • Third segment (low income < 3,000): own car
Factor Analysis
  • Unlike cluster analysis, factor analysis builds a
    model from data. The technique finds underlying
    factors, also called latent variables, and
    provides models for these factors based on
    variables in the data. For example, a software
    company is considering a survey to find out the
    nine most perceived attributes of one of its
    products. It might group these attributes into
    categories such as service for technical support,
    availability of training, and a help system.
  • Factor analysis is used for grouping together
    products based on a similarity of buying patterns
    so that vendors may bundle several products as
    one and sell them together at a lower price than
    the sum of their individual prices.

Neural Networks
Association Rules
  • Association models are models that examine the
    extent to which values of one field depend on, or
    are produced by, values of another field. These
    models are often referred to as Market Basket
    Analysis when they are applied to retail
    industries to study the buying patterns of their
    customers, especially in grocery and retail
    stores that issue their own credit cards.
    Charging against these cards gives the store the
    chance to associate the purchases of customers
    with their identities, which allows them to study
    associations among other things.

Rules Induction
  • This is a powerful technique that involves a
    large number of rules using a set of if...then
    statements in the pursuit of all possible
    patterns in the dataset. For example, if the
    customer is male, is between 30 and 40 years of
    age, and has an income of more than 20,000 and
    less than 50,000, then he is likely to be driving
    a car that was bought new.
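Rule induction systems generate many candidate if...then rules and keep those that hold up on the data. A minimal sketch of how one candidate rule, the car-buying example above, could be represented and scored follows. The field names, customer records, and the coverage/accuracy scoring are hypothetical illustrations rather than a specific product's method, "between 30 and 40" is read as inclusive, and the search that generates candidate rules is not shown.

```python
# Hypothetical encoding of the example rule: all conditions must hold for the rule to fire.
RULE = {
    "conditions": [
        ("sex", lambda v: v == "male"),
        ("age", lambda v: 30 <= v <= 40),
        ("income", lambda v: 20_000 < v < 50_000),
    ],
    "conclusion": ("car_bought_new", True),
}


def fires(rule, record):
    """True when every condition of the rule holds for the record."""
    return all(test(record[field]) for field, test in rule["conditions"])


def evaluate(rule, records):
    """Coverage: share of records the rule fires on. Accuracy: share of those where the conclusion holds."""
    fired = [r for r in records if fires(rule, r)]
    field, value = rule["conclusion"]
    correct = sum(r[field] == value for r in fired)
    coverage = len(fired) / len(records)
    accuracy = correct / len(fired) if fired else 0.0
    return coverage, accuracy


# Hypothetical customer records.
CUSTOMERS = [
    {"sex": "male", "age": 35, "income": 40_000, "car_bought_new": True},
    {"sex": "male", "age": 32, "income": 25_000, "car_bought_new": False},
    {"sex": "female", "age": 35, "income": 40_000, "car_bought_new": True},
    {"sex": "male", "age": 45, "income": 40_000, "car_bought_new": True},
    {"sex": "male", "age": 38, "income": 30_000, "car_bought_new": True},
]

coverage, accuracy = evaluate(RULE, CUSTOMERS)
print(f"coverage {coverage:.0%}, accuracy {accuracy:.0%}")
```

Scoring each candidate this way lets an induction system discard rules that rarely fire or are often wrong, which is why the results are phrased as "likely" rather than certain.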

A Summary: Theory-Driven Techniques
  • Correlations
  • T-Tests
  • Analysis of Variance
  • Linear Regression
  • Logistic Regression
  • Discriminant Analysis
  • Forecasting Methods

Topic 7: The Data Mining Process
  • Define the problem.
  • Select the data.
  • Prepare the data.
  • Mine the data.
  • Deploy the model.
  • Take business action.
  • Are you ready for Data Mining?

Define the problem
  • A successful data mining initiative always starts
    with a well-defined project. To ensure that the
    project produces incremental value, include an
    assessment of the status quo solution and a
    review of technology, organization, and business
    processes.

Select the data
  • This step involves defining your data sources
    (not every data source and record is required).
    The data is usually extracted from the source
    system to a separate server.

Prepare the data
  • This step represents up to 80 percent of the
    total project effort. For data mining, the data
    must reside in one flat table (each record has
    many columns). In addition to being the most
    time-consuming step, it is also the most critical.
    The resulting models are only as good as the data
    used to create them.
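Getting normalized operational data into "one flat table" typically means joining related tables and aggregating the many-side rows into columns. A minimal sketch follows, assuming two hypothetical normalized tables (customers and orders) and two illustrative aggregate columns; a real preparation step would also handle cleaning, missing values, and many more derived attributes.

```python
from collections import defaultdict

# Hypothetical normalized source tables.
CUSTOMERS = [
    {"customer_id": 1, "age": 34, "region": "north"},
    {"customer_id": 2, "age": 52, "region": "south"},
    {"customer_id": 3, "age": 41, "region": "north"},
]
ORDERS = [
    {"customer_id": 1, "amount": 120.0},
    {"customer_id": 1, "amount": 80.0},
    {"customer_id": 2, "amount": 300.0},
]


def flatten(customers, orders):
    """Build one flat record per customer: descriptive columns plus aggregated order columns."""
    totals = defaultdict(lambda: {"order_count": 0, "total_spend": 0.0})
    for order in orders:
        agg = totals[order["customer_id"]]
        agg["order_count"] += 1
        agg["total_spend"] += order["amount"]
    flat = []
    for customer in customers:
        # Customers with no orders still get a row, with zero-valued aggregates.
        agg = totals.get(customer["customer_id"], {"order_count": 0, "total_spend": 0.0})
        flat.append({**customer, **agg})
    return flat


for row in flatten(CUSTOMERS, ORDERS):
    print(row)
```

Each output record now carries everything the mining algorithm needs about one customer, which is the "each record has many columns" shape described above.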

Mine the data
  • Typically the easiest and shortest phase, this
    step involves applying statistical and AI tools
    to create mathematical models. Data mining
    typically occurs on a server separate from the
    data warehousing and other corporate systems.

Deploy the Model
  • Model deployment is the process of implementing
    the mathematical models into operational systems
    to improve business results.

Take Business Action
  • Use the deployed model to achieve improved
    results for the business problem identified at
    the beginning of the process.

Steps to Implement Data Mining
Figure: prior knowledge, an information model, and discovery (patterns,
relations, associations, etc.).
  • Just because you have a data warehouse doesn't
    mean you're necessarily ready for data mining.
    Much of the work our company does in the data
    mining arena has more to do with data mining
    readiness assessment than with actually
    performing data mining.

Metrics you can use to gauge your data mining readiness
  • Do you have a staff of experienced knowledge
  • Do you have the data?
  • Do you have marketing processes in place that can
    use this data?
  • Do you have a business champion who can embrace
    the process and results?
  • Do you have the technology infrastructure to
    support advanced analysis?

Topic 8: Data Mining Tools
  • Data mining tools are typically classified by the
    type of algorithm they use to identify hidden
    patterns. There are many different algorithms in
    use, but the four most popular are association,
    sequence, clustering (or segmentation), and
    predictive modeling.

Data Mining Tools
  • There are a growing number of commercial data
    mining tools on the marketplace.
  • Important characteristics of data mining tools
    include:
  • Data preparation facilities
  • Selection of data mining operations
  • Product scalability and performance
  • Facilities for visualization of results.

Data Mining vs. OLAP
  • They are two separate breeds of analysis with
    entirely different objectives, not to mention
    tools, skill sets, and implementation methods.

Data Mining
  • With canned reports, ad hoc querying, and OLAP,
    the end user defines a hypothesis and determines
    which data to examine. With data mining, the tool
    identifies the hypothesis and actually tells the
    user where in the data to begin the exploration
    process.

Data Mining
  • Rather than using SQL to filter out values and
    reduce the data to a concise answer set, data
    mining uses algorithms that exhaustively review
    the relationships among data elements to
    determine whether any patterns exist. The whole
    purpose of data mining is to yield new business
    information that a business person can act on.

OLAP vs. Data Mining Tools
OLAP Tools
  • Are ad hoc, shrink-wrapped tools that provide an
    interface to data
  • Are used when you have specific, known questions
  • Look and feel like a spreadsheet that allows
    rotation, slicing, and graphics
  • Can be deployed to a large number of users
Data Mining Tools
  • Methods for analyzing multiple data types
  • -- Regression trees
  • -- Neural networks
  • -- Genetic algorithms
  • Are used when you don't know what the questions
    are
  • Usually textual in nature
  • Usually deployed to a small number of analysts

Data Mining Tools
  • Association, also frequently referred to as
    "affinity analysis," reviews numerous sets of
    items and looks for common groupings. An example
    of association is market basket analysis, which
    involves reviewing the products that consumers
    purchase in a single trip to the grocery store.

  • Finds items that imply the presence of other
    items in the same event.
  • Affinities between items are represented by
    association rules.
  • e.g., when a customer rents property for more
    than 2 years and is more than 25 years old, in
    40% of cases the customer will buy a property.
    This association happens in 35% of all customers
    who rent properties.

Data Mining Tools
  • Sequential analysis helps data miners identify a
    set of order-specific items or events.
    Association identifies the existence of patterns
    or groups of items; sequential analysis
    identifies the order of those patterns or groups
    of items.

  • Finds patterns between events such that the
    presence of one set of items is followed by
    another set of items in a database of events over
    a period of time.
  • e.g. Used to understand long term customer
    buying behavior.


Data Mining Tools
  • Cluster analysis lets the data miner assemble
    data into unforeseen groups containing similar
    characteristics. Also known as "segmentation,"
    this type of data mining is probably the most
    widely used.

  • The aim is to partition a database into an unknown
    number of segments, or clusters, of similar records.
  • Uses unsupervised learning to discover
    homogeneous sub-populations in a database to
    improve the accuracy of the profiles.

Data Mining Tools
  • As the name implies, predictive modeling
    involves developing a model from historical data
    for predicting a future event. The power of
    predictive modeling engines is that they can use
    a broad range of data attributes to identify
    future behavior. Both cluster analysis and
    predictive modeling tools identify distinct
    groups of items with common attributes; the
    difference is that predictive modeling focuses on
    the likelihood of a particular outcome for a
    particular group.

Topic 9: Data Mining Applications for CRM
  • Which customers are most profitable to me? Why?
  • What promotions are most effective? For which
  • What kind of customers will be interested in my
    new product?
  • Which customers are at risk of defecting to my
    competitors?
  • How do I identify prospects with the greatest
    profit potential?
  • Customer information is rapidly becoming a
    company's most important asset for answering these
    questions. However, answering these questions in
    broad generalities is not enough. Each customer
    must be analyzed and potentially treated uniquely.
    Customer relationship management provides the
    framework for analyzing profitability and
    improving marketing

Customer Relationship Management: Framework
  • Many organizations have collected and stored a
    wealth of data about their customers, suppliers,
    and business partners. However, the inability to
    discover valuable information hidden in the data
    prevents these organizations from transforming
    this data into knowledge. The business desire is,
    therefore, to extract valid, previously unknown,
    and comprehensible information from large
    databases and use it for profit. To fulfill these
    goals, organizations need to follow these steps:
  • - Capture and integrate both the internal and
    external data into a comprehensive view that
    encompasses the whole
  • - Mine the integrated data for information.
  • - Organize and present the information with
    knowledge for decision-making.

Customer Relationship Management: Framework
  • From the architecture point of view, the entire
    CRM framework can be classified into three key
    components:
  • Operational CRM: the automation of horizontally
    integrated business processes, including customer
    touch-points, channels, and front-back office
  • Analytical CRM: the analysis of data created by
    the operational CRM
  • Collaborative CRM: applications of collaborative
    services, including e-mail, personalized
    publishing, e-communities, and similar vehicles
    designed to facilitate interactions between
    customers and organizations.

Figure: CRM Architecture. Elements shown: data sources, transaction
history, contact history, external data, ETL tools, business rules and
metadata management, workflow management, market data store, marketing
data marts, analytics data mart, reporting data mart, data mining
analytics, decision support applications, campaign management, contact
management, communication channels, direct mail, call center, and
customer service center.
CRM: The Business Perspective
  • Tools and technologies will be applied to these
    real CRM business problems.
  • They are:
  • Customer Profitability provides a blueprint for
    how to define and use customer profitability as
    the bedrock for your CRM processes.
  • Customer Acquisition shows how to use data
    mining to acquire new customers in the most
    profitable way possible.
  • Customer Cross-selling details how the
    technology architecture can be used to increase
    the value of existing customers by applying more
    to them.
  • Customer Retention uses a case study from the
    telecommunications industry to show how to
    execute successful CRM systems to retain your
    profitable customers.
  • Customer Segmentation provides the business
    methodology of how to segment and manage your
    customers in a consistent and repeatable way
    across the enterprise.

Information Mining and Knowledge Discovery for
Effective CRM
  • In the current and emerging competitive and
    highly dynamic business environment, only the
    most competitive companies will achieve sustained
    market success. In order to capitalize on business
    opportunities, these organizations will
    distinguish themselves by their capacity to
    leverage information about their marketplace,
    customers, and operations. A central part of this
    strategy for long-term sustained success will be
    an active information repository: an advanced
    data warehouse, in which information from various
    applications or parts of the business is
    coalesced and understood.

Information Mining
  • The shortest path from complex data to knowledge
    discovery is information mining; the term is used
    instead of data mining to reflect the rich variety
    of forms that the information required for
    business intelligence can take.
  • Information mining implies using powerful and
    sophisticated tools to do the following:
  • Uncover associations, patterns, and trends
  • Detect deviations
  • Group and classify information
  • Develop predictive models
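
A minimal sketch of one of these tasks, deviation detection, assuming a simple z-score rule: values more than `threshold` population standard deviations away from the mean are flagged. The function name, the threshold of 2.0, and the sales figures are illustrative assumptions.

```python
from statistics import mean, pstdev


def detect_deviations(values, threshold=2.0):
    """Return indices of values whose |z-score| exceeds the threshold."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation
    if sigma == 0:
        return []  # all values identical: nothing deviates
    return [i for i, v in enumerate(values) if abs(v - mu) / sigma > threshold]


daily_sales = [100, 102, 98, 101, 99, 250, 100, 97]
print(detect_deviations(daily_sales))  # [5]
```

The z-score rule assumes roughly symmetric data; skewed measures such as order values often call for robust alternatives (for example, median-based rules).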

Information Mining
  • From a technical perspective, the real keys to
    successful information mining are its algorithms:
    complex mathematical processes that compare and
    correlate data. Algorithms enable a mining
    application to determine who the business's best
    customers are or what they like to buy. They can
    also determine at what time of day and in what
    combinations customers buy, and how an
    organization can optimize inventory, pricing, and
    merchandising in order to retain these customers
    and cause them to buy more, at increased profit
    margins.
  • A large volume of information is stored in
    non-numeric forms: documents, images, and video
    files.
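
To make "compare and correlate data" concrete, here is a small sketch of one such computation, the Pearson correlation coefficient, written directly from its definition. The discount and sales figures are made-up illustrative data; a value near +1 suggests a strong positive linear relationship, not a causal one.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation undefined for a constant sequence")
    return cov / (sx * sy)


discount_pct = [0, 5, 10, 15, 20]  # hypothetical promotion levels
units_sold = [40, 44, 52, 55, 61]  # hypothetical sales at each level
print(round(pearson(discount_pct, units_sold), 3))  # close to 1: strong positive link
```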

Text Mining and Knowledge Management
  • Text mining is a subset of information mining
    technology that, in turn, is a component of a more
    general category called Knowledge Management (KM).
    Knowledge, in this case, refers to the collective
    expertise, experiences, know-how, and wisdom of an
    organization. In the business world, knowledge is
    represented not only by the structured data found
    in traditional databases, but also in a wide
    variety of unstructured sources such as word
    documents, memos and letters, e-mail messages,
    news feeds, Web pages, and so on.

Text Mining and Knowledge Management
  • Unlike data mining, text mining works with
    information stored in an unstructured collection
    of text documents. Specifically, online text
    mining refers to the process of searching through
    unstructured data on the Internet and deriving
    some meaning from it. Text mining goes beyond
    applying statistical models to data files; it
    uncovers relationships in a text collection and
    leverages the creativity of the knowledge worker
    to explore these relationships and discover new
    knowledge.

Text Mining Technologies
  • There are two key technologies that make online
    text mining possible:
  • Internet Searching - It has been around for
    quite a few years; Yahoo, AltaVista, and Excite
    are three of the earliest. Search engines (and
    discovery services) operate by indexing the
    content of a particular Web site and allowing
    users to search the indexes. Although useful, the
    first generations of these tools were often wrong
    because they did not correctly index the content
    they retrieved. Advances in text mining applied
    to Internet searching resulted in online text
    mining, representing the new generation of
    Internet search tools. With these products, users
    can gain more relevant information by processing
    a smaller number of links, pages, and indexes.
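
The indexing step described above can be sketched as an inverted index that maps each term to the pages containing it, with a boolean AND search over the index. This is a simplified assumption of how a search engine works; production engines add ranking, stemming, and much more. The page IDs and texts are hypothetical.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lower-case the text and split it on non-alphanumeric characters."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Map each term to the set of page IDs whose text contains it."""
    index = defaultdict(set)
    for page_id, text in pages.items():
        for term in tokenize(text):
            index[term].add(page_id)
    return index


def search(index, query):
    """Return the sorted page IDs that contain every query term (boolean AND)."""
    terms = tokenize(query)
    if not terms:
        return []
    # .get() avoids inserting empty entries into the defaultdict.
    results = set.intersection(*(index.get(t, set()) for t in terms))
    return sorted(results)


pages = {
    "p1": "Data mining finds patterns in customer data.",
    "p2": "Text mining works with unstructured documents.",
    "p3": "Mining customer e-mail is a text mining task.",
}
index = build_index(pages)
print(search(index, "text mining"))      # ['p2', 'p3']
print(search(index, "customer mining"))  # ['p1', 'p3']
```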

Text Mining Technologies
  • Text Analysis - It has been around longer than
    Internet searching. Indeed, scientists have been
    trying to make computers understand natural
    languages for decades, and text analysis is an
    integral part of these efforts. The automatic
    analysis of text information can be used for
    several general purposes:
  • 1. To provide an overview of the contents of
    a large document collection; for example, finding
    significant clusters of documents in a customer
    feedback collection could indicate where a
    company's products and services need improvement.
  • 2. To identify hidden structures between
    groups of objects; this may help to organize an
    intranet site so that related documents are all
    connected by hyperlinks.
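
Purpose 1 above (finding clusters of documents, e.g. in customer feedback) can be sketched with TF-IDF weighting, cosine similarity, and a simple single-link grouping built on union-find. The similarity threshold of 0.1 and the sample feedback texts are illustrative assumptions, and this is only one of many possible clustering approaches, not a specific product's method.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lower-case the text and keep alphabetic words only."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(docs):
    """Sparse TF-IDF vector (term -> weight) for each document."""
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    df = Counter(term for toks in tokenized for term in set(toks))
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        total = len(toks) or 1  # avoid division by zero for empty documents
        vectors.append({t: (c / total) * math.log(n / df[t]) for t, c in tf.items()})
    return vectors


def cosine(a, b):
    """Cosine similarity of two sparse vectors; 0.0 if either is all zeros."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def cluster(docs, threshold=0.1):
    """Single-link grouping: any pair above the threshold ends up in one cluster."""
    vecs = tfidf_vectors(docs)
    parent = list(range(len(docs)))  # union-find forest

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(docs)):
        for j in range(i + 1, len(docs)):
            if cosine(vecs[i], vecs[j]) > threshold:
                parent[find(i)] = find(j)
    groups = {}
    for i in range(len(docs)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


feedback = [
    "delivery was late and the package was damaged",
    "late delivery again, package arrived damaged",
    "the mobile app crashes on login",
    "app crashes every time I try to login",
]
print(cluster(feedback))  # [[0, 1], [2, 3]]
```

Here the two clusters would point to delivery problems and app stability problems, the kind of overview the text describes.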

Text Mining Technologies
  • 3. To increase the efficiency and
    effectiveness of a search process to find similar
    or related information; for example, to search
    articles from a news service and discover all
    unique documents that contain hints on possible
    trends or technologies that have so far not been
    mentioned in their articles.
  • 4. To detect duplicate documents in an
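
Purpose 4, detecting duplicate documents, is often approached with word shingles and Jaccard similarity. The sketch below assumes 3-word shingles and a 0.8 similarity cutoff; both values, and the sample texts, are illustrative assumptions.

```python
import re


def shingles(text, k=3):
    """Set of overlapping k-word sequences (word shingles) in the text."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """|A intersect B| / |A union B|; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def find_duplicates(docs, threshold=0.8, k=3):
    """Return index pairs (i, j) whose shingle sets are at least `threshold` similar."""
    sets = [shingles(d, k) for d in docs]
    pairs = []
    for i in range(len(docs)):
        for j in range(i + 1, len(docs)):
            if jaccard(sets[i], sets[j]) >= threshold:
                pairs.append((i, j))
    return pairs


articles = [
    "Quarterly sales rose in the northern region due to new stores.",
    "Quarterly sales rose in the northern region due to new stores!",
    "Customer complaints fell after the help desk upgrade.",
]
print(find_duplicates(articles))  # [(0, 1)]
```

Comparing every pair is quadratic; large collections typically add techniques such as MinHash to avoid that cost.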

Text Mining Technologies-Applications
  • 1. E-mail management. A popular use of text
    analysis is for message routing in which the
    computer reads the message to decide who should
    deal with it. (Spam control is another good
  • 2. Document Management. By mining the
    different documents for meaning as they are put
    into a document repository, a company can
    establish a detailed index that allows the
    location of relevant documents at any time.
  • 3. Automated help desk. Some companies use
    text mining to respond to customer inquiries.
    Customers' letters and e-mails are processed by a
    text mining application.
  • 4. Market research. A market researcher can
    use online text mining to gather statistics on
    the occurrences of certain words, phrases,
    concepts, or themes on the World Wide Web. This
    information can be useful for establishing market
    demographics and demand curves.
  • 5. Business intelligence gathering. This is
    the most advanced use of text mining. (See next
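
The e-mail routing application in item 1 can be approximated, in its simplest form, by keyword scoring: count how many of each department's keywords appear in a message and route it to the highest-scoring department. The `ROUTES` table and department names are hypothetical; real systems typically use trained text classifiers rather than fixed keyword lists.

```python
import re

# Hypothetical routing table: department -> keywords that suggest it.
ROUTES = {
    "billing": {"invoice", "refund", "charge", "payment"},
    "support": {"error", "crash", "login", "password"},
    "sales": {"quote", "pricing", "upgrade", "demo"},
}


def route_message(text, routes=ROUTES, default="general"):
    """Pick the department whose keywords occur most often in the message."""
    words = re.findall(r"[a-z]+", text.lower())
    scores = {dept: sum(w in kws for w in words) for dept, kws in routes.items()}
    best = max(scores, key=scores.get)
    # Fall back to a default queue when no keyword matched at all.
    return best if scores[best] > 0 else default


print(route_message("I was charged twice, please refund the payment."))  # billing
print(route_message("The app shows an error after login."))              # support
print(route_message("Just saying hello."))                               # general
```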

  • Blogger is one of the most popular online
    blogging tools; it works with any browser and is
    free, well designed, and easy to use. Millions of
    people are changing their information acquisition
    habits, and the weblog, or blog, has become a
    popular source.
  • Title - Publishing a Blog with Blogger / by
    Elizabeth Castro. Berkeley, Calif.: Peachpit, 2005
  • Title - Blog: Understanding the Information
    Reformation That's Changing Your World / Hugh
    Hewitt. Nashville, Tenn.: Nelson Books, c2005
  • Weblogs (ISBN 0321321235)

CRM in the eBusiness World
  • As e-business continues to mature and effect
    radical changes throughout all aspects of the
    business, the focus of new e-business-enabled
    application software will shift away from
    narrowly defined commerce platforms toward a
    broader vision of managing customer relationships.
  • A new model that Forrester Research calls
    eRelationship Management (eRM) is defined as
    follows:
  • A Web-centric approach to synchronizing customer
    relationships across communication channels,
    business functions, and

CRM in the eBusiness World
  • To implement this new e-business CRM model,
    companies should do the following:
  • Create a dynamic customer context that can
    address every customer interaction, as distinct
    from a static view of the customer constructed
    from data contained in the applications. This can
    be achieved by collecting and organizing customer
    data, calculating high-level metrics for each
    customer (i.e., customer profitability,
    satisfaction, and churn potential), and
    assembling and delivering dynamic context to
    customer touch points.
  • Generate consistent, custom responses by
    delivering a consolidated rules engine for
    routing, workflow, personalization, smart
    navigation, and consistent treatment of customers.
  • Build and maintain a Content Directory that
    points to company, product, and business partner
    content and makes it available to employees,
    business partners, and customers.
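
The first two steps above (calculating per-customer metrics into a dynamic context, then applying a consolidated rules engine) might be sketched as follows. The `CustomerContext` fields mirror the metrics named in the text, but their scales, the rule names, thresholds, and responses are hypothetical assumptions; in this sketch the first matching rule wins.

```python
from dataclasses import dataclass


@dataclass
class CustomerContext:
    customer_id: str
    profitability: float    # assumed: annual margin contributed by the customer
    satisfaction: float     # assumed: survey score on a 0-10 scale
    churn_potential: float  # assumed: estimated probability of leaving, 0-1


# Hypothetical rules: (name, condition, response), evaluated in order.
RULES = [
    ("save-high-value",
     lambda c: c.churn_potential > 0.5 and c.profitability > 1000,
     "route to retention specialist"),
    ("recover-unhappy", lambda c: c.satisfaction < 5, "offer service follow-up"),
    ("grow", lambda c: c.profitability > 1000, "present cross-sell offer"),
]


def respond(context, rules=RULES, default="standard treatment"):
    """Return the response of the first rule whose condition holds."""
    for _name, condition, response in rules:
        if condition(context):
            return response
    return default


ctx = CustomerContext("C42", profitability=2500.0, satisfaction=7.0, churn_potential=0.7)
print(respond(ctx))  # route to retention specialist
```

Keeping every channel on the same `respond` function is one way to obtain the consistent treatment the text calls for.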

Topic 10 Data Mining From US Government Printing Office
  • Washington, March 25, 2003. Subcommittee on
    Technology, Information Policy, Intergovernmental
    Relations and the Census. Oversight hearing on
    "Data Mining: Current Applications and the Future
    Possibilities."
  • Background: The hearing will explore instances
    where data mining technology is currently
    employed, examine the benefits and the pitfalls,
    and discuss the potential uses of data mining at
    the Federal level of government. A specific focus
    on privacy and abuse concerns surrounding this

Data Mining: Current Applications and the Future
  • Data mining technology has been utilized
    successfully for many years in both the private
    and public sectors to identify and analyze useful
    data that would otherwise be overlooked or
  • Government agencies have also used data mining
    techniques quite extensively to identify and
    eliminate fraud, waste, and abuse. States work
    with localities by providing them access to their
    data sources. This has allowed local and state
    enforcement agencies to zero in on tax evaders,
    perpetrators of financial crimes, or those
    conducting any number of fraudulent activities.
    At the federal level, the Treasury Department
    uses this technology to identify and prosecute
    money laundering schemes, the IRS to track down
    delinquent taxpayers, and U.S. Customs to
    identify drug trafficking activities at U.S.

Topic 11 Data Mining Techniques- A Summary
  • Artificial neural networks - Non-linear
    predictive models that learn through training and
    resemble biological neural networks in structure.
  • Decision trees - Tree-shaped structures that
    represent sets of decisions. These decisions
    generate rules for the classification of a
    dataset.
  • Genetic algorithms - Optimization techniques
    that use processes such as genetic combination,
    mutation, and natural selection in a design based
    on the concepts of evolution.
  • Rule induction - The extraction of useful if-then
    rules from data based on statistical significance.
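
The genetic algorithm description above can be illustrated on a toy problem ("OneMax": maximize the number of 1 bits) using tournament selection, single-point crossover, and bit-flip mutation. The problem, parameter values, and function names are illustrative assumptions. Elitism (copying the best individual unchanged into the next generation) is included so the best fitness never decreases.

```python
import random


def fitness(bits):
    """OneMax: the number of 1 bits (a toy objective to maximize)."""
    return sum(bits)


def crossover(a, b, rng):
    """Single-point crossover: prefix of one parent plus suffix of the other."""
    point = rng.randrange(1, len(a))
    return a[:point] + b[point:]


def mutate(bits, rate, rng):
    """Flip each bit independently with probability `rate`."""
    return [1 - b if rng.random() < rate else b for b in bits]


def select(population, rng, k=3):
    """Tournament selection: the fittest of k randomly chosen individuals."""
    return max(rng.sample(population, k), key=fitness)


def genetic_algorithm(n_bits=20, pop_size=30, generations=60, rate=0.02, seed=0):
    rng = random.Random(seed)  # seeded for reproducible runs
    population = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    history = [fitness(max(population, key=fitness))]
    for _ in range(generations):
        elite = max(population, key=fitness)  # elitism: carry the best over
        children = [elite]
        while len(children) < pop_size:
            child = crossover(select(population, rng), select(population, rng), rng)
            children.append(mutate(child, rate, rng))
        population = children
        history.append(fitness(max(population, key=fitness)))
    return max(population, key=fitness), history


best, history = genetic_algorithm()
print(fitness(best), history[0], history[-1])
```

`history` records the best fitness per generation, so one can watch natural selection, combination, and mutation push the population toward the all-ones optimum.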

Data Mining Techniques- A Summary
  • Predictive modeling
  • Database Segmentation
  • Link analysis
  • Deviation detection
  • Classification
  • Value prediction
  • Demographic clustering
  • Neural clustering
  • Association discovery
  • Sequential pattern discovery
  • Similar time sequence discovery
  • Statistics
  • Visualization
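
As an example of association discovery, the sketch below finds single-item rules X -> Y from market baskets using support (the share of transactions containing both items) and confidence (support of the pair divided by support of X). It only considers item pairs, a simplification of full algorithms such as Apriori; the thresholds and baskets are illustrative assumptions.

```python
from collections import Counter
from itertools import combinations


def association_rules(transactions, min_support=0.5, min_confidence=0.7):
    """Return sorted (x, y, support, confidence) tuples for rules x -> y."""
    n = len(transactions)
    item_counts = Counter(item for t in transactions for item in set(t))
    pair_counts = Counter(pair for t in transactions
                          for pair in combinations(sorted(set(t)), 2))
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue  # pair too rare to be interesting
        for x, y in ((a, b), (b, a)):
            confidence = count / item_counts[x]
            if confidence >= min_confidence:
                rules.append((x, y, support, confidence))
    return sorted(rules)


baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "eggs"},
]
for x, y, s, c in association_rules(baskets):
    print(f"{x} -> {y}: support={s:.2f}, confidence={c:.2f}")
# butter -> bread: support=0.50, confidence=1.00
```

Note that the rule is directional: every basket with butter contains bread, but only two of the three bread baskets contain butter, so bread -> butter falls below the confidence threshold.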

Two Types of Data Mining Modeling - Verification and Discovery
  • The verification model utilizes a process that
    looks in a database to detect trends and patterns
    in data that will help answer some specific
    questions about the business.
  • In this mode, the user generates a hypothesis
    about the data, issues a query against the data,
    and examines the results of the query to either
    verify the hypothesis or decide that it is not
    valid.
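
The verification model can be sketched as a user-stated hypothesis checked by a query. The example below uses Python's built-in `sqlite3` module with an in-memory table; the `orders` table, its columns, the data, and the hypothesis itself (web orders have a higher average amount than store orders) are hypothetical.

```python
import sqlite3


def verify_hypothesis(conn):
    """Return per-channel average order amounts and whether web > store holds."""
    averages = dict(conn.execute(
        "SELECT channel, AVG(amount) FROM orders GROUP BY channel"
    ).fetchall())
    return averages, averages["web"] > averages["store"]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, channel TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [("a", "web", 120.0), ("b", "web", 95.0), ("c", "store", 60.0),
     ("d", "store", 75.0), ("e", "web", 110.0)],
)
averages, holds = verify_hypothesis(conn)
print(averages, holds)
```

The outcome is a single yes or no about the user's own question, which matches the text's point that this model creates very little new information.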

Verification Model
  • In this model, very little information is
    created in the extraction process: either the
    hypothesis is verified or it is not.
  • Common tools used in this mode are queries,
    multidimensional analysis, and visualization.
    What they all have in common is that the user is
    essentially guiding the exploration of the data
    being inspected.

Discovery Model
  • A more popular model is the Discovery Model,
    which utilizes a process that looks in a database
    to discover and/or predict future patterns. The
    discovery model is divided into two modes:
    Descriptive and Predictive.

Discovery Model- Descriptive Mode
  • The Descriptive mode finds hidden patterns
    without a predetermined idea or hypothesis about
    what the patterns may be. In other words, the
    data mining software or program takes the
    initiative in finding what the interesting
    patterns are, without the user thinking of the
    relevant questions first. In this mode,
    information is created about the data with very
    little or no guidance from the user. The
    exploration of the data is done in such a way as
    to yield as many useful facts about the data as
    possible in the shortest amount of time.
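
The descriptive mode can be illustrated with k-means clustering, which groups records without being told in advance what the groups mean. This is a plain 2-D k-means sketch; the feature choice (annual visits, average spend), the data, and `k=2` are illustrative assumptions.

```python
import random


def _dist2(p, q):
    """Squared Euclidean distance between two 2-D points."""
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2


def kmeans(points, k, iterations=20, seed=0):
    """Plain k-means on 2-D points: returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct records
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda c: _dist2(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append((sum(m[0] for m in members) / len(members),
                                      sum(m[1] for m in members) / len(members)))
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids
    return centroids, labels


# (annual visits, average spend) per customer: hypothetical data
visits_spend = [(2, 20), (3, 25), (2, 22), (20, 200), (22, 210), (21, 190)]
centroids, labels = kmeans(visits_spend, 2)
print(labels)
```

No question was posed in advance: the algorithm itself surfaces an occasional, low-spend group and a frequent, high-spend group, leaving the user to interpret them.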

Discovery Model- Predictive Mode
  • In the Predictive mode, patterns discovered from
    the database are used to predict future patterns
    or trends. Predictive modeling allows the user to
    submit records with some unknown field values,
    and the system will guess the unknown values
    based on previous patterns discovered from the
    database.
  • In comparing the two models, one can state that
    verification can be very inefficient,
    time-consuming, and costly, whereas discovery
    modeling can be very efficient and cost-effective,
    is less dependent on user input, and can increase
    modeling accuracy.
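
The predictive mode's idea of guessing an unknown field from previously discovered patterns can be sketched with a k-nearest-neighbour classifier: a new record receives the majority label of the k most similar historical records. The field names, data, and `k=3` are hypothetical, and the features are left unscaled for brevity, which real applications would usually avoid.

```python
import math
from collections import Counter

# Historical records with a known field (hypothetical data):
# (age, monthly_spend) -> churned?
past_records = [
    ((25, 30.0), "yes"), ((30, 35.0), "yes"), ((28, 25.0), "yes"),
    ((45, 90.0), "no"), ((50, 85.0), "no"), ((40, 95.0), "no"),
]


def predict(record, history, k=3):
    """Guess the unknown field by majority vote of the k nearest historical records."""
    nearest = sorted(history, key=lambda h: math.dist(record, h[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


print(predict((27, 28.0), past_records))  # yes
print(predict((48, 88.0), past_records))  # no
```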