Based on the book - PowerPoint PPT Presentation

Title:

Based on the book

Description:

Data-mining technologies, such as rule induction, neural networks, genetic ... The four parts of data mining technology patterns, sampling, validation, ... – PowerPoint PPT presentation

Slides: 169
Provided by: circus6
Learn more at: http://www.circusoflife.com

1
Data Mining Applications for CRM
  • Based on the book Building Data Mining
    Applications for CRM
  • By
  • Alex Berson
  • Stephen Smith
  • Kurt Thearling

2
Data Mining Applications for CRM
  • Summary of Topics
  • 1. Customer Relationship Management: Framework
    and Architecture
  • 2. Reinforcing CRM with Data Mining
  • 3. Data Mining: An Overview
  • 4. Key Terms
  • 5. Data Mining Methodology
  • 6. Classical Techniques: Statistics,
    Neighborhoods, and Clustering
  • 7. Next Generation Techniques: Trees,
    Networks, and Rules
  • 8. CRM: The Business Perspective
  • 9. Deploying Data Mining for CRM
  • 10. Data Quality
  • 11. Next Generation of Information Mining and
    Knowledge Discovery for Effective
    CRM
  • 12. CRM in the e-Business World

3
Topic 1: Customer Relationship Management: Framework
and Architecture
  • CRM is an enterprise approach to customer service
    that uses meaningful communication to understand
    and influence consumer behavior. The purpose of
    the process is twofold:
  • (a) To affect all aspects of the consumer
    relationship (e.g., improve customer
    satisfaction, enhance customer loyalty, and
    increase profitability); and
  • (b) To ensure that employees within an
    organization are using CRM tools. The need for
    greater profitability requires an organization to
    proactively pursue its relationships with
    customers.

4
Customer Relationship Management-Framework and
Architecture
  • Which customers are most profitable to me? Why?
  • What promotions are most effective? For which
    customers?
  • What kind of customers will be interested in my
    new product?
  • What customers are at risk to defect to my
    competitor?
  • How do I identify prospects with the greatest
    profit potentials?
  • Customer information is rapidly becoming a
    company's most important asset for answering
    these questions. However, answering them in
    broad generalities is not enough. Each customer
    must be analyzed and potentially treated
    uniquely. Customer Relationship Management
    provides the framework for analyzing customer
    profitability and improving marketing
    effectiveness.

5
Customer Relationship Management -Framework and
Architecture
  • Many organizations have collected and stored a
    wealth of data about their customers, suppliers,
    and business partners. However, the inability to
    discover valuable information hidden in the data
    prevents these organizations from transforming
    this data into knowledge. The business desire is,
    therefore, to extract valid, previously unknown,
    and comprehensible information from large
    databases and use it for profits. To fulfill
    these goals, organizations need to follow these
    steps:
  • - Capture and integrate both the internal and
    external data into a comprehensive view that
    encompasses the whole organization.
  • - Mine the integrated data for information.
  • - Organize and present the information with
    knowledge for decision-making.

6
Data, Information, and Decision
  • Data Resource Management (DRM)
  • MIS (OLTP) OOAD
  • KM (Knowledge Mgt), KWS (Knowledge Work Systems)
  • DSS ESS, EIS (Executive Information Systems)
  • Data Warehousing/Data Mart/Data Mining/OLAP
    (Executive, Collaborative and individual
    levels)
  • Business Intelligence
  • Data
  • Information (Data Process)
  • Knowledge/Business Intelligence
  • Decision (Information
  • Knowledge)
  • Data/Information/Decision /Business Intelligence

7
Customer Relationship Management: Framework and
Architecture
  • From the architecture point of view, the entire
    CRM framework can be classified into three key
    components:
  • Operational CRM: The automation of horizontally
    integrated business processes, including customer
    touch-points, channels, and front-back office
    integration.
  • Analytical CRM: The analysis of data created by
    the Operational CRM.
  • Collaborative CRM: Applications of collaborative
    services including e-mail, personalized
    publishing, e-communities, and similar vehicles
    designed to facilitate interactions between
    customers and organizations.

8
CRM Architecture
[Figure: CRM architecture diagram. Labels on the
diagram: Business Rules and Metadata Management;
Data Sources; Market Data Store; Decision Support
Applications; Communication Channels; Contact
History; Direct Mails; Campaign Mgt; Call Center;
Contact Mgt; Transaction History; ETL Tools;
Customer Service Center; Analytics Data Mart; Data
Mining Analytics; Marketing Data Marts; Customer
Profile and Account; Internet; E-mail; Reporting
Data Mart; Other; External Data; Workflow
Management]

9
Campaign Mgt Software: Managing Campaigns
  • Accommodation of many new touch points besides
    direct mail, for example, the Web, direct TV
    ads, hard-copy advertising, customer services,
    street brochure dispatch, and signage.
  • Focus on profitability (not only on which
    customer was most profitable, but also on what
    was the most profitable promotion that could be
    sent, e.g., send the .025 postcard rather than
    the 25 rebate if both have the same effect).
  • Optimization of the sequence of promotion
    delivery.
  • Tools for constructing experiments that allow
    marketing professionals to test the
    effectiveness of new promotions and new
    segmentation techniques, for example, using
    different contents and timing for signage
    advertising.
  • Accommodation by the system of predictive
    modeling from data mining, which provides
    insights into future customer behavior and future
    customer profitability.

10
Web-Enabled Information Delivery
Structured Content
Query Engine Analytics Drill Down Agents
Web Browser
Web Server
SQL
CGI
HTML
HTML
Unstructured Content

What about the Web log, or blog, which has become
a popular source for information acquisition?
11
Topic 2: Reinforcing CRM with Data Mining
  • Companies worldwide are beginning to realize that
    surviving an intensely competitive, global
    marketplace requires closer relationships with
    customers. In turn, enhanced customer
    relationships can boost profitability in three
    ways: a) by reducing costs by attracting more
    suitable customers, b) by generating profits
    through cross-selling and up-selling activities,
    and c) by extending profits through customer
    retention. Slightly expanded explanations of
    these activities follow.

12
Reinforcing CRM with Data Mining
  • Attracting more suitable customers: Data
    mining can help firms understand which customers
    are most likely to purchase specific products and
    services, thus enabling businesses to develop
    targeted marketing programs for higher response
    rates and better returns on investment.
  • Better cross-selling and up-selling:
    Businesses can increase their value proposition
    by offering additional products and services that
    are actually desired by customers, thereby
    raising satisfaction levels and reinforcing
    purchasing habits.
  • Better retention: Data-mining techniques can
    identify which customers are more likely to
    defect and why. A company can use this
    information to generate ideas that allow it to
    retain these customers.

13
DW Technologies and Tools-An Overview
Data Acquisition
Data Storage
Information Delivery
OLAP
Source Systems
Data Modeling
DW/ Data Marts
Extraction
Report Writer
Data Loading
Transformation
Staging Area
Quality Assurance
Load Image Creation
Alert Systems
Data Mining
14
DW Information Flow
15
Data Warehouse Database
  • The central data warehouse database is a
    cornerstone of the data warehousing environment.
    On the architecture diagram, the database is
    almost always implemented on relational database
    management system (RDBMS) technology. The new
    approaches include the following:
  • Multidimensional databases (MDDBs): These are
    tightly coupled with the online analytical
    processing (OLAP) tools that act as clients to
    the multidimensional data stores.
  • An innovative approach that speeds up a
    traditional RDBMS by using new index structures
    to bypass relational table scans.
  • Parallel relational database designs that require
    a parallel computing platform, for example,
    symmetric multiprocessors (SMPs), massively
    parallel processors (MPPs), and/or clusters of
    uni- or multiprocessors.

16
Information Delivery Tool Taxonomy
  • Tools are generally divided into five main groups:
  • Data query and reporting tools.
  • Application development tools.
  • Executive Information System (EIS) tools.
  • Online analytical processing tools.
  • Data mining tools.

17
Topic 3: Data Mining: An Overview
  • Data mining can help reduce information overload
    and improve decision making. This is achieved by
    extracting and refining useful knowledge through
    a process of searching for relationships and
    patterns from the extensive data collected by
    organizations. The extracted information is used
    to predict, classify, model, and summarize the
    data being mined. Data-mining technologies, such
    as rule induction, neural networks, genetic
    algorithms, fuzzy logic, and rough sets, are used
    for classification and pattern recognition in
    many industries.

18
Data Mining: An Overview
  • A supermarket organizes its merchandise stock
    based on shoppers' purchase patterns.
  • An airline reservation system uses customers'
    travel patterns and trends to increase seat
    utilization.
  • Web pages alter their organizational structure or
    visual appearance based on information about the
    person who is requesting the pages.
  • Individuals perform a Web-based query to find the
    median income of households in Iowa.

19
Data Mining: An Overview
  • Data mining builds models of customer
    behavior by using established statistical and
    machine-learning techniques. The basic objective
    is to construct a model for one situation in
    which the answer or output is known and then
    apply that model to another situation in which
    the answer or output is sought. The best
    applications of the above techniques are
    integrated with data warehouses and other
    interactive, flexible business analysis tools.
    The analytic data warehouse can thus improve
    business processes across the organization in
    areas such as campaign management, new product
    rollout, and fraud detection.

20
Data Mining: An Overview
  • Data mining integrates different
    technologies to populate, organize, and manage
    the data store. Because quality data is crucial
    to accurate results, data-mining tools must be
    able to clean the data, making it consistent,
    uniform, and compatible with the data store. Data
    mining employs several techniques to extract
    important information. Operations are the actions
    that can be performed on accumulated data,
    including predictive modeling, database
    segmentation, link analysis, and deviation
    detection.

21
Taxonomy of Data Mining Tools
  • We can divide the entire data mining tool market
    into three main groups: general-purpose tools,
    integrated DSS/OLAP/data mining tools, and
    rapidly growing, application-specific tools.
  • The general-purpose tools, which occupy the
    larger and more mature segment of the market,
    include the following:
  • SAS Enterprise Miner
  • IBM Intelligent Miner
  • Unica PRW
  • SPSS Clementine
  • SGI MineSet
  • Oracle Darwin
  • Angoss KnowledgeSeeker

22
Taxonomy of Data Mining Tools
  • The integrated data mining tool segment addresses
    a very real and compelling business requirement:
    a single multi-function decision-support tool
    that can provide management reporting, online
    analytical processing, and data mining
    capabilities within a common framework. Examples
    of these integrated tools include Cognos
    Scenario and Business Objects.
  • The application-specific tools segment is rapidly
    gaining momentum. Among these tools are the
    following:
  • KDI (focuses on retail)
  • Options & Choices (focuses on insurance
    industries)
  • HNC (focuses on fraud detection)
  • Unica Model 1 (focuses on marketing)

23
Database Mining Workstation (HNC)
  • HNC is one of the most successful data mining
    companies. Its Database Mining Workstation (DMW)
    is a neural network tool that is widely accepted
    for credit card fraud analysis applications. DMW
    consists of Windows-based software applications
    and a custom processing board. Other HNC products
    include the Falcon and ProfitMax processing
    applications for financial services, and the
    Advanced Telecommunications Abuse Control System
    (ATACS) fraud-detection solution that HNC plans
    to deploy in the telecommunications industry.

24
Taxonomy of Data Mining Tools
  • There are specific tools, for example, for the
    following applications:
  • Financial data analysis: Neural networks have
    been used in forecasting stock prices, option
    trading, rating bonds, portfolio management,
    commodity-price prediction, and mergers and
    acquisitions analysis. Using IBM Intelligent
    Miner, Mellon Bank developed a credit
    card-attrition model to predict which customers
    will stop using Mellon's credit card in the next
    few months.
  • Telecommunications industry: The
    hyper-competitive nature of the industry has
    created a need to understand customers, to keep
    them, and to model effective ways to market new
    products.

25
Taxonomy of Data Mining Tools
  • Retail industry: Retail data mining can help
    identify customer-buying behaviors and discover
    consumer-shopping patterns and trends.
  • Healthcare and biomedical research: The analysis
    of large quantities of time-stamped data will
    provide doctors with important information
    regarding the progress of the disease. For
    example, Neuromedical Systems used neural
    networks to build a Pap smear diagnostic aid.
  • Science and engineering: To improve its
    manufacturing process, Boeing has successfully
    applied machine-learning algorithms to the
    discovery of informative and useful rules from
    its plant data.

26
Data Mining vs. Data Warehouse
  • A major challenge in exploiting data mining is
    identifying suitable data to mine.
  • Data mining requires a single, separate, clean,
    integrated, and self-consistent source of data.
  • A data warehouse is well equipped for providing
    data for mining.
  • Data quality and consistency are prerequisites
    for mining, to ensure the accuracy of the
    predictive models. Data warehouses are populated
    with clean, consistent data.

27
Data Mining vs. Data Warehouse
  • Data Mining does not require that a Data
    Warehouse be built. Often, data can be downloaded
    from the operational files to flat files that
    contain the data ready for the data mining
    analysis.
  • Data Mining can be implemented rapidly on
    existing software and hardware platforms. Data
    Mining tools can analyze massive databases to
    deliver answers to questions such as, "Which
    customers are most likely to respond to my next
    promotional mailing, and why?"

28
Data Mining vs. Data Warehouse
  • Advantageous to mine data from multiple sources
    to discover as many interrelationships as
    possible. Data warehouses contain data from a
    number of sources.
  • Selecting relevant subsets of records and fields
    for data mining requires query capabilities of
    the data warehouse.
  • Results of a data mining study are useful if
    there is some way to further investigate the
    uncovered patterns. Data warehouses provide
    capability to go back to the data source.

29
Data Mining vs. OLAP
  • They are two separate breeds of analysis with
    entirely different objectives, not to mention
    tools, skill sets, and implementation methods.

30
Data Mining vs. OLAP
  • With canned reports, ad hoc querying, and OLAP,
    the end user defines a hypothesis and determines
    which data to examine. With data mining, the tool
    identifies the hypothesis, and it actually tells
    the user where in the data to start the
    exploration process.

31
Data Mining vs. OLAP
  • Rather than using SQL to filter out values and
    methodically reduce the data into a concise
    answer set, data mining uses algorithms that
    exhaustively review the relationships among data
    elements to determine if any patterns exist. The
    whole purpose of data mining is to yield new
    business information that a business person can
    act on.

32
OLAP vs. Data Mining Tools
OLAP Tools
  • Are ad hoc, shrink-wrapped tools that provide an
    interface to data
  • Are used when you have specific known questions
  • Look and feel like a spreadsheet that allows
    rotation, slicing, and graphics
  • Can be deployed to a large number of users
Data Mining Tools
  • Methods for analyzing multiple data types
  • -- Regression trees
  • -- Neural networks
  • -- Genetic algorithms
  • Are used when you don't know what the questions
    are
  • Usually textual in nature
  • Usually deployed to a small number of analysts

33
Topic 4: Key Terms
  • Application Service Providers
  • Offer outsourcing solutions that supply,
    develop, and manage application specific software
    and hardware so that customers' internal
    information technology resources can be freed up.
  • Business Intelligence
  • The type of detailed information that
    business managers need for analyzing sales
    trends, customers' purchasing habits, and other
    key performance metrics in the company.

34
Key Terms
  • Categorical Data
  • Fits into a small number of distinct
    categories of a discrete nature, in contrast to
    continuous data, and can be ordered (ordinal),
    for example, high, medium, or low temperatures,
    or nonordered (nominal), for example, gender or
    city.
  • Classification
  • The distribution of things into classes or
    categories of the same type, or the prediction of
    the category of data by building a model based on
    some predictor variables.

35
Key Terms
  • Clustering
  • Groups of items that are similar as
    identified by algorithms. For example, an
    insurance company can use clustering to group
    customers by income, age, policy types, and prior
    claims. The goal is to divide a data set into
    groups such that records within a group are as
    homogeneous as possible and groups are as
    heterogeneous as possible. When the categories
    are unspecified, this may be called unsupervised
    learning.
  • Genetic Algorithm
  • Optimization techniques based on
    evolutionary concepts that employ processes such
    as genetic combination, mutation, and natural
    selection in a design.

36
Key Terms
  • Online Profiling
  • The process of collecting and analyzing data
    from Web site visits, which can be used to
    personalize a customer's subsequent experiences
    on the Web site. Network advertisers, for
    example, can use online profiles to track a
    user's visits and activities across multiple Web
    sites, although such a practice is controversial
    and may be subject to various forms of
    regulation.
  • Rough Sets
  • A mathematical approach to extract knowledge
    from imprecise and uncertain data.

37
Key Terms
  • Rule Induction
  • The extraction of valid and useful
    if-then-else rules from data based on their
    statistical significance levels, which are
    integrated with commercial data warehouse and
    OLAP platforms.
  • Visualization
  • Graphically displayed data from simple
    scatter plots to complex multidimensional
    representations to facilitate better
    understanding.

38
Topic 5: Data Mining Methodology
  • The methodology used today in data mining, when
    it is well thought out and well executed,
    consists of just a few very important concepts.
  • Finding a pattern in the data and building a
    model. In general, a pattern is any sequence or
    combination of data that occurs more often than
    one would expect if it were a random event.
  • Sampling, or not having to use all of the data in
    order to draw significant conclusions about what
    might be happening with other parts of the data.
  • Validating the predictive models that arise out
    of data mining algorithms.
  • Finally, coming down to finding the pattern or
    model that is the best.
  • The four parts of data mining technology are
    patterns, sampling, validation, and choosing the
    model.

39
Pattern and Model
  • Pattern: An event or combination of events in a
    database that occurs more often than expected.
    Typically, this means that its actual occurrence
    is significantly different from what would be
    expected by random chance (for example,
    121212...).
  • Model: A description that adequately explains and
    predicts relevant data but is generally much
    smaller than the data itself. For real-world
    applications, a model can be anything from a
    mathematical equation, to a set of rules that
    describes customer segments, to the computer
    representation of a complex neural network
    architecture, which translates to several sets of
    mathematical equations.
  • Predictive model: A model created or used to
    perform prediction, in contrast to models created
    solely for pattern detection, exploration, or
    general organization of the data.
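
The definition of a pattern above (something that occurs more often than random chance would predict) can be made concrete with a small sketch. The `lift` function and the toy `baskets` data are hypothetical illustrations, not taken from the book. The function compares how often two items actually appear together with how often they would appear together if they were independent; a ratio well above 1 suggests a pattern.

```python
def lift(transactions, a, b):
    """Observed co-occurrence of a and b divided by the rate expected by chance."""
    n = len(transactions)
    p_a = sum(a in t for t in transactions) / n
    p_b = sum(b in t for t in transactions) / n
    p_ab = sum(a in t and b in t for t in transactions) / n
    if p_a == 0 or p_b == 0:
        raise ValueError("both items must occur at least once")
    # Under independence, P(a and b) would equal P(a) * P(b).
    return p_ab / (p_a * p_b)


# Hypothetical toy data: each set is one customer's basket.
baskets = [
    {"bread", "butter"},
    {"bread", "butter"},
    {"bread", "butter"},
    {"bread", "milk"},
    {"milk"},
    {"milk"},
]

print(lift(baskets, "bread", "butter"))  # above 1: together more often than chance
print(lift(baskets, "bread", "milk"))    # below 1: together less often than chance
```

A real tool would also test whether the difference is statistically significant, which this sketch does not attempt.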

40
Types Of Models
  • Descriptive: "The dealer sold 200 cars last
    month." (Operational systems, OLTP)
  • Explanatory: "For every increase of 1% in the
    interest rate, auto sales decrease by 5%."
    (Traditional DW, OLAP)
  • Predictive: predictions about future buyer
    behavior. (Data Mining)
41
A high-level View of Modeling Process
Historical Data
Model Building
Prediction
Record ???
123
Model
42
The Needs for Sampling
  • Containing costs
  • Speeding up the data gathering
  • Improving effectiveness
  • Reducing bias

43
Sampling Design
  • Four steps
  • Determine the data to be collected or described
  • Determine the population to be sampled
  • Choose the type of sample
  • Decide on the sample size
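
The four design steps can be walked through on a toy example. This is a minimal sketch under stated assumptions: the population of account balances is invented, the sample type chosen is a simple random sample, and `estimate_mean` is a hypothetical helper name. It shows that a small sample can approximate a figure that would otherwise require reading every record.

```python
import random
import statistics


def estimate_mean(population, sample_size, seed=0):
    """Estimate the population mean from a simple random sample."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    sample = rng.sample(population, sample_size)  # sampling without replacement
    return statistics.mean(sample)


# Step 1: the data to describe is the mean balance.
# Step 2: the population is 10,000 hypothetical balances cycling through 0..99.
population = [i % 100 for i in range(10_000)]
true_mean = statistics.mean(population)  # reads every record

# Steps 3 and 4: a simple random sample of 500 records (5% of the population).
estimate = estimate_mean(population, sample_size=500)

print(true_mean, estimate)
```

The sample size of 500 is arbitrary here; in practice it would be chosen from the precision required and the variability of the data.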

44
Two Types of Data Mining Modeling: Verification
and Discovery
  • The verification model utilizes a process that
    looks in a database to detect trends and patterns
    in data that will help answer some specific
    questions about the business.
  • In this mode, the user generates a hypothesis
    about the data, issues a query against the data,
    and examines the results of the query, looking
    for verification of the hypothesis; otherwise,
    the user decides that the hypothesis is not
    valid.

45
Verification Model
  • In this model, very little information is created
    in the extraction process: either the hypothesis
    is verified or it is not.
  • Common tools used in this mode are queries,
    multidimensional analysis, and visualization.
    What they all have in common is that the user is
    essentially guiding the exploration of the data
    being inspected.

46
Discovery Model
  • A more popular model is the Discovery Model,
    which utilizes a process that looks in a database
    to discover and/or predict future patterns. The
    discovery model is divided into two modes:
    Descriptive and Predictive.

47
Discovery Model- Descriptive Mode
  • The Descriptive mode finds hidden patterns
    without a predetermined idea or hypothesis about
    what the patterns may be. In other words, the
    Data Mining software or program takes the
    initiative in finding what the interesting
    patterns are, without the user thinking of the
    relevant questions first. In this mode,
    information is created about the data with very
    little or no guidance from the user. The
    exploration of the data is done in such a way as
    to yield as many useful facts about the data as
    possible in the shortest amount of time.

48
Discovery Model- Predictive Mode
  • In the Predictive mode patterns discovered from
    the database are used to predict the future
    patterns or trends. Predictive modeling allows
    the user to submit records with some unknown
    field values, and the system will guess the
    unknown values based on previous patterns
    discovered from the database.
  • In comparing the two models, one can state that
    verification can be very inefficient,
    time-consuming, and costly, whereas discovery
    modeling can be very efficient, cost-effective,
    and less dependent on user input, and can
    increase modeling accuracy.

49
Predictive Modelling
  • Similar to the human learning experience:
  • Uses observations to form a model of the
    important characteristics of some phenomenon.
  • Uses generalizations of real world and ability
    to fit new data into a general framework.
  • Can analyze a database to determine essential
    characteristics (model) about the data set.

50
Predictive Modelling
  • Model is developed using a supervised learning
    approach, which has two phases: training and
    testing.
  • Training builds a model using a large sample of
    historical data called a training set.
  • Testing involves trying out the model on new,
    previously unseen data to determine its accuracy
    and physical performance characteristics.
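
The two phases can be sketched with a deliberately tiny model. Everything here is an assumption made for illustration: the historical records are invented, the "model" is a single cut-off value on one variable, and the function names are hypothetical. Training picks the cut-off using only the training set, and testing measures accuracy on records the model has not seen.

```python
import random


def split_records(records, test_size, seed=0):
    """Shuffle the records and hold out `test_size` of them for testing."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    shuffled = records[:]
    rng.shuffle(shuffled)
    return shuffled[test_size:], shuffled[:test_size]


def fit_threshold(train):
    """Training: choose the cut-off that misclassifies the fewest training records."""
    candidates = sorted({x for x, _ in train})

    def errors(threshold):
        return sum((x > threshold) != label for x, label in train)

    return min(candidates, key=errors)


def accuracy(threshold, records):
    """Fraction of records for which 'x > threshold' matches the known label."""
    return sum((x > threshold) == label for x, label in records) / len(records)


# Hypothetical historical data: (years as a customer, responded to an offer).
data = [(years, years > 5) for years in range(1, 21)]

train, test = split_records(data, test_size=6)
threshold = fit_threshold(train)            # training phase
test_accuracy = accuracy(threshold, test)   # testing phase, unseen records only

print(threshold, test_accuracy)
```

Accuracy on the test set can be lower than on the training set, which is exactly what the testing phase is meant to reveal.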

51
Predictive Modelling
  • Applications of predictive modelling include
    customer retention management, credit approval,
    cross selling, and direct marketing.
  • Two techniques are associated with predictive
    modelling, distinguished by the nature of the
    variable being predicted:
  • A. Classification
  • B. Value prediction

52
Predictive Modelling - Classification
  • Used to establish a specific predetermined class
    for each record in a database from a finite set
    of possible class values.
  • Two specializations of classification: tree
    induction and neural induction.

53
Example of Classification using Tree Induction
54
Example of Classification using Tree Induction
Customer renting property > 2 years?
  • No: Rent property
  • Yes: Customer age > 45?
    • No: Rent property
    • Yes: Buy property
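
Once induced, a tree like the one on this slide is simply a set of nested tests. The sketch below hand-codes the slide's tree as read from the diagram labels (renting for more than 2 years, then age over 45); the function name and argument names are hypothetical, and a real tool would learn the split points from data instead of having them written in.

```python
def predict_choice(years_renting, age):
    """Classify a customer using the two tests shown in the tree."""
    if years_renting > 2:
        if age > 45:
            return "buy property"
        return "rent property"
    return "rent property"


print(predict_choice(years_renting=3, age=50))
print(predict_choice(years_renting=3, age=30))
print(predict_choice(years_renting=1, age=50))
```

Each path from the root to a leaf reads as an if-then rule, which is one reason tree models are easy to explain to business users.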
55
Example of Classification using Neural Induction
56
Example of Classification Using Neural Induction
  • Each processing unit (circle) in one layer is
    connected to each processing unit in the next
    layer by a weighted value, expressing the
    strength of the relationship. The network
    attempts to mirror the way the human brain works
    in recognizing patterns by arithmetically
    combining all the variables with a given data
    point.
  • In this way, it is possible to develop nonlinear
    predictive models that learn by studying
    combinations of variables and how different
    combinations of variables affect different data
    sets.
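
The weighted connections described above can be sketched as a forward pass through a small network. This is an illustration only: the weights, biases, and inputs are made-up numbers, the sigmoid activation is an assumption (the slide does not name one), and no training step is shown.

```python
import math


def sigmoid(z):
    """Squash a weighted sum into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def forward(inputs, layers):
    """Propagate inputs through fully connected layers.

    Each layer is (weights, biases); weights[j][i] is the strength of the
    connection from unit i in the previous layer to unit j in this layer.
    """
    activations = inputs
    for weights, biases in layers:
        activations = [
            sigmoid(sum(w * a for w, a in zip(unit_weights, activations)) + bias)
            for unit_weights, bias in zip(weights, biases)
        ]
    return activations


# Hypothetical network: 3 inputs -> 2 hidden units -> 1 output unit.
hidden = ([[0.5, -0.2, 0.1], [-0.3, 0.8, 0.4]], [0.0, 0.1])
output = ([[1.2, -0.7]], [0.05])

score = forward([1.0, 0.5, -1.0], [hidden, output])
print(score)  # a single value between 0 and 1
```

Because each unit applies a nonlinear function to its weighted sum, stacking layers lets the network represent nonlinear relationships between the inputs.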

57
Predictive Modelling - Value Prediction
  • Used to estimate a continuous numeric value that
    is associated with a database record.
  • Uses the traditional statistical techniques of
    linear regression and non-linear regression.
  • Relatively easy-to-use and understand.

58
Predictive Modelling - Value Prediction
  • Linear regression attempts to fit a straight line
    through a plot of the data, such that the line is
    the best representation of the average of all
    observations at that point in the plot.
  • Problem is that the technique only works well
    with linear data and is sensitive to the presence
    of outliers (i.e., data values that do not
    conform to the expected norm).
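
Both points can be seen in a short least-squares sketch. The data values are invented for illustration and `fit_line` is a hypothetical helper; it uses the standard closed-form formulas for the slope and intercept of the best-fitting straight line.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


xs = [1, 2, 3, 4, 5]
clean = [2, 4, 6, 8, 10]         # perfectly linear: y = 2x
with_outlier = [2, 4, 6, 8, 40]  # same data, last value replaced by an outlier

print(fit_line(xs, clean))         # slope 2, intercept 0
print(fit_line(xs, with_outlier))  # slope 8, intercept -12
```

A single unusual value moves the slope from 2 to 8, so the fitted line no longer describes the other four observations well.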

59
Predictive Modelling - Value Prediction
  • Although non-linear regression avoids the main
    problems of linear regression, it is still not
    flexible enough to handle all possible shapes of
    the data plot.
  • Statistical measurements are fine for building
    linear models that describe predictable data
    points; however, most data is not linear in
    nature.

60
Predictive Modelling - Value Prediction
  • Data mining requires statistical methods that can
    accommodate non-linearity, outliers, and
    non-numeric data.
  • Applications of value prediction include credit
    card fraud detection or target mailing list
    identification.

61
Database Segmentation
  • Aim is to partition a database into an unknown
    number of segments, or clusters, of similar
    records.
  • Uses unsupervised learning to discover
    homogeneous sub-populations in a database to
    improve the accuracy of the profiles.

62
Database Segmentation
  • Less precise than other operations, and thus less
    sensitive to redundant and irrelevant features.
  • Sensitivity can be reduced by ignoring a subset
    of the attributes that describe each instance or
    by assigning a weighting factor to each variable.
  • Applications of database segmentation include
    customer profiling, direct marketing, and cross
    selling.
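
Segmentation can be sketched with k-means, one common clustering method. Note the simplifications, all of which are assumptions of this sketch: k-means needs the number of segments fixed in advance (the slides describe an unknown number), the starting centroids are naively taken from the first k records, and the customer data is invented.

```python
import math


def kmeans(points, k, iterations=10):
    """Group points into k segments by repeatedly assigning and re-centering."""
    centroids = list(points[:k])  # naive initialization: first k records
    labels = []
    for _ in range(iterations):
        # Assign every record to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Move each centroid to the mean of the records assigned to it.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return labels, centroids


# Hypothetical customers: (visits per month, purchases per month).
customers = [(1, 1), (9, 9), (1, 2), (8, 9), (2, 1), (9, 8)]
labels, centroids = kmeans(customers, k=2)

print(labels)     # segment index for each customer
print(centroids)  # center of each segment
```

No labels were supplied to the algorithm; the two segments emerge from the distances between records, which is what makes the approach unsupervised.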

63
Example of Database Segmentation using a Scatter
plot
64
Database Segmentation
  • Associated with demographic or neural clustering
    techniques, which are distinguished by:
  • Allowable data inputs
  • Methods used to calculate the distance between
    records
  • Presentation of the resulting segments for
    analysis.

65
Example of Database Segmentation using a
Visualization
66
Link Analysis
  • Aims to establish links (associations) between
    records, or sets of records, in a database.
  • There are three specializations:
  • Associations discovery
  • Sequential pattern discovery
  • Similar time sequence discovery
  • Applications include product affinity analysis,
    direct marketing, and stock price movement.

67
Link Analysis - Associations Discovery
  • Finds items that imply the presence of other
    items in the same event.
  • Affinities between items are represented by
    association rules.
  • e.g., when a customer rents property for more
    than 2 years and is more than 25 years old, in
    40% of cases the customer will buy a property.
    This association happens in 35% of all customers
    who rent properties.
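
The two figures quoted in an association rule are usually called confidence and support. The sketch below computes both under the common textbook definitions, which are an assumption here since the slide does not define them: confidence is the share of matching records for which the rule holds, and support is the share of all records that satisfy the whole rule. The renter records are invented and do not reproduce the slide's figures.

```python
def rule_metrics(records, antecedent, consequent):
    """Return (support, confidence) for the rule 'antecedent => consequent'."""
    matching = [r for r in records if antecedent(r)]
    holding = [r for r in matching if consequent(r)]
    support = len(holding) / len(records)
    confidence = len(holding) / len(matching) if matching else 0.0
    return support, confidence


# Hypothetical renters: 10 match the rule's conditions, and 4 of those bought.
renters = (
    [{"years": 3, "age": 30, "bought": True}] * 4
    + [{"years": 3, "age": 30, "bought": False}] * 6
    + [{"years": 1, "age": 22, "bought": False}] * 10
)

support, confidence = rule_metrics(
    renters,
    antecedent=lambda r: r["years"] > 2 and r["age"] > 25,
    consequent=lambda r: r["bought"],
)
print(support, confidence)  # 0.2 and 0.4 for this toy data
```

A rule with high confidence but very low support applies to few customers, so both numbers are normally reported together.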

68
Link Analysis - Sequential Pattern Discovery
  • Finds patterns between events such that the
    presence of one set of items is followed by
    another set of items in a database of events over
    a period of time.
  • e.g. Used to understand long term customer buying
    behaviour.

69
Link Analysis - Similar Time Sequence Discovery
  • Finds links between two sets of data that are
    time-dependent, and is based on the degree of
    similarity between the patterns that both time
    series demonstrate.
  • e.g. Within three months of buying property, new
    home owners will purchase goods such as cookers,
    freezers, and washing machines.

70
Deviation Detection
  • Relatively new operation in terms of commercially
    available data mining tools.
  • Often a source of true discovery because it
    identifies outliers, which express deviation from
    some previously known expectation and norm.

71
Deviation Detection
  • Can be performed using statistics and
    visualization techniques or as a by-product of
    data mining.
  • Applications include fraud detection in the use
    of credit cards and insurance claims, quality
    control, and defects tracing.
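
One simple statistical way to flag deviations is to measure how many standard deviations each value lies from the mean. This is a minimal sketch: the transaction amounts are invented, the cut-off of 2 standard deviations is an arbitrary assumption, and `find_deviations` is a hypothetical helper. Commercial fraud-detection tools use considerably richer models.

```python
import statistics


def find_deviations(values, threshold=2.0):
    """Return the values lying more than `threshold` standard deviations from the mean."""
    mean = statistics.mean(values)
    spread = statistics.pstdev(values)  # population standard deviation
    if spread == 0:
        return []  # all values identical, nothing deviates
    return [v for v in values if abs(v - mean) / spread > threshold]


# Hypothetical card transactions for one customer.
amounts = [20, 22, 19, 21, 23, 20, 18, 22, 21, 500]
print(find_deviations(amounts))  # [500]
```

Because the outlier itself inflates the mean and the standard deviation, robust alternatives based on the median are often preferred when outliers are large.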

72
A Summary Data-Driven Techniques
  • Data Visualization
  • Decision Trees
  • Clustering
  • Factor Analysis
  • Neural Network
  • Association Rules
  • Rule Induction
  • Based on Sakhr Youness's book Professional
    Data Warehousing with SQL Server 7.0 and OLAP
    Services

73
Data Visualization
A pie chart showing the sales of a product by
region is sometimes much more effective than
presenting the same data in a text or tabular
form.
[Pie chart: sales by region (Northeast, South, North, West, East), with slices of 9, 11, 39, 21, and 20 percent]
74
Decision Tree
75
Cluster Analysis
First segment (high income > 8,000)
Have children
Second segment (8,000 > middle income > 3,000)
Married
Last car is a used one
Third segment (low income < 3,000)
Own car
76
Factor Analysis
  • Unlike cluster analysis, factor analysis builds a
    model from data. The technique finds underlying
    factors, also called latent variables, and
    provides models for these factors based on
    variables in the data. For example, a software
    company is considering a survey to find out the
    nine most perceived attributes of one of their
    products. They might group these attributes into
    categories such as service for technical support,
    availability of training, and a help system.
  • Factor analysis is used for grouping together
    products based on a similarity of buying patterns
    so that vendors may bundle several products and
    sell them together at a lower price than the sum
    of their individual prices.

77
Neural Networks
78
Association Rules
  • Association models are models that examine the
    extent to which values of one field depend on, or
    are produced by, values of another field. These
    models are often referred to as Market Basket
    Analysis when they are applied to retail
    industries to study the buying patterns of these
    customers, especially in grocery and retail
    stores that issue their own credit cards.
    Charging against these cards gives the store the
    chance to associate the purchases of customers
    with their identities, which allows them to study
    associations among other things.

79
Rule Induction
  • This is a powerful technique that involves a
    large number of rules using a set of if...then
    statements in the pursuit of all possible
    patterns in the dataset. For example, if the
    customer is male, is between 30 and 40 years of
    age, and has an income of less than 50,000 and
    more than 20,000, he is likely to be driving a
    car that was bought new.

80
A Summary: Theory-Driven Techniques
  • Correlations
  • T-Tests
  • Analysis of Variance
  • Linear Regression
  • Logistic Regression
  • Discriminant Analysis
  • Forecasting Methods

81
Validating Picking the Model
  • Validating any model that comes out of a data
    mining tool is going to be the most important
    thing that you can do. The validation required
    for data mining is that after you build the model
    on some historical data, you apply the model to
    similar historical data from which the model was
    not built. Because the data is historical, you
    already know the outcome, so the accuracy of the
    predictive model can be measured.
  • One of the most important things that needs to be
    done when you are building a predictive model is
    to make sure that you have picked up the
    essential patterns in the data that will hold
    true the next time you apply your model.
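
The validation step described above can be sketched in Python. This is only an illustration under stated assumptions: the historical records, the 30% holdout fraction, and the one-threshold "model" are all hypothetical and are not taken from the book.

```python
import random

def split_holdout(records, holdout_fraction=0.3, seed=0):
    """Shuffle historical records and split them into a build set and a holdout set."""
    rng = random.Random(seed)
    shuffled = records[:]
    rng.shuffle(shuffled)
    n_holdout = round(len(shuffled) * holdout_fraction)
    return shuffled[n_holdout:], shuffled[:n_holdout]

def build_threshold_model(build_set):
    """Toy model (an assumption): choose the income threshold that classifies
    the most build-set records correctly."""
    best_threshold, best_correct = None, -1
    for threshold in sorted({income for income, _ in build_set}):
        correct = sum((income >= threshold) == bought for income, bought in build_set)
        if correct > best_correct:
            best_threshold, best_correct = threshold, correct
    return best_threshold

def accuracy(threshold, records):
    """Fraction of records whose known outcome matches the model's prediction."""
    correct = sum((income >= threshold) == bought for income, bought in records)
    return correct / len(records)

# Hypothetical historical data: (annual income, bought the product?)
history = [(income, income >= 40000) for income in range(20000, 60000, 1000)]

build_set, holdout_set = split_holdout(history)
model = build_threshold_model(build_set)

# The holdout set was not used to build the model, and its outcomes are known,
# so this number estimates how well the model will hold up next time.
print("build accuracy:  ", accuracy(model, build_set))
print("holdout accuracy:", accuracy(model, holdout_set))
```

A holdout accuracy well below the build accuracy would suggest the model has picked up patterns that do not hold outside the data it was built on.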

82
Three Additional Ways in Which Data mining
Supports CRM Initiatives.
  • 1. Database marketing
  • 2. Customer acquisition
  • 3. Campaign optimization

83
Database Marketing
  • Data mining helps database marketers develop
    campaigns that are closer to the targeted needs,
    desires, and attitudes of their customers. If the
    necessary information resides in a database, data
    mining can model a wide range of customer
    activities. The key objective is to identify
    patterns that are relevant to current business
    problems. For example, data mining can help
    answer questions such as "Which customers are
    most likely to cancel their cable TV service?"
    and "What is the probability that a customer will
    spend over $120 at a given store?" Answering
    these types of questions can boost customer
    retention and campaign response rates, which
    ultimately increases sales and returns on
    investment.

84
Database Marketing
  • Database marketing software enables companies to
    send customers and prospective customers timely
    and relevant messages and value propositions.
    Modern campaign management software also monitors
    and manages customer communications on multiple
    channels including direct mail, telemarketing,
    e-mail, the Internet, point of sale, and customer
    service. Furthermore, this software can be used
    to automate and unify diverse marketing campaigns
    at their various stages of planning, execution,
    assessment, and refinement. The software can also
    launch campaigns in response to specific customer
    behaviors, such as the opening of a new account.

85
Database Marketing
  • Generally, better business results are obtained
    when data mining and campaign management work
    closely together. For example, campaign
    management software can apply the data-mining
    model's scores to sharpen the definition of
    targeted customers, thereby raising response
    rates and campaign effectiveness. Furthermore,
    data mining may help to resolve the problems that
    traditional campaign management processes and
    software typically do not adequately address,
    such as scheduling, resource assignment, and so
    forth. Although finding patterns in data is
    useful, data mining's main contribution is
    providing relevant information that enables
    better decision making. In other words, it is a
    tool that can be used along with other tools
    (e.g., knowledge, experience, creativity,
    judgment, etc.) to obtain better results. A
    data-mining system manages the technical details,
    thus enabling decision makers to focus on
    critical business questions such as "Which
    current customers are likely to be interested in
    our new product?" and "Which market segment is
    best for the launch of our new product?"

86
Customer Acquisition
  • The growth strategy of businesses depends
    heavily on acquiring new customers, which may
    require finding people who have been unaware of
    various products and services, who have just
    entered specific product categories (for example,
    new parents and the diaper category), or who have
    purchased from competitors. Although experienced
    marketers often can select the right set of
    demographic criteria, the process increases in
    difficulty with the volume, pattern complexity,
    and granularity of customer data. The explosive
    growth in consumer databases has highlighted the
    challenges of customer segmentation. Data mining
    offers multiple
    segmentation solutions that could increase the
    response rate for a customer acquisition
    campaign. Marketers need to use creativity and
    experience to tailor new and interesting offers
    for customers identified through data-mining
    initiatives.

87
Campaign Optimization
  • Many marketing organizations have a variety of
    methods to interact with current and prospective
    customers. The process of optimizing a marketing
    campaign establishes a mapping between the
    organization's set of offers and a given set of
    customers that satisfies the campaign's
    characteristics and constraints, defines the
    marketing channels to be used, and specifies the
    relevant time parameters. Data mining can elevate
    the effectiveness of campaign optimization
    processes by modeling customers' channel-specific
    responses to marketing offers.

88
Topic 6: Classical Techniques: Statistics,
Neighborhoods, and Clustering
  • Statistics can help to answer several important
    questions about the data:
  • What patterns are there in my database?
  • What is the chance that an event will occur?
  • What patterns are significant?
  • What is a high-level summary of the data that
    gives me some idea of what is contained in my
    database?

89
Statistics: Histogram
  • The first step in understanding statistics is to
    understand how the data is collected into a
    higher-level form. One of the most notable ways
    of doing this is with the histogram.

[Vertical axis: number of customers or amount of sales]
90
Histogram
[Histogram: number of customers (vertical axis, 500 to 3,000) by age (horizontal axis, 1 to 81)]
91
Linear Regression Is Similar to the Task of
Finding the Line that Minimizes the Total
Distance to a Set of Data.
[Chart: vertical axis is the prediction (average consumer bank balance); horizontal axis is the predictor (consumer annual income)]
92
Linear Regression
  • The predictive model is the line shown in the
    previous chart. The line will take a given value
    for a predictor and map it into a given value for
    a prediction. The actual equation would look
    something like: Prediction = a + b × predictor.
    This is just the equation for a line,
    Y = a + bX. As an example for a bank, the
    predicted average consumer bank balance might
    equal $1,000 + 0.01 × the customer's annual
    income.
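
As a rough sketch of how such a line could be found, the Python below fits a and b by ordinary least squares, which minimizes the sum of squared vertical distances (one common reading of "total distance"; the slides do not name the exact criterion). The income and balance figures are made up so that they lie exactly on the example line.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = a + b * x for a single predictor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx              # slope
    a = mean_y - b * mean_x    # intercept
    return a, b

# Hypothetical data generated from the slide's example equation.
incomes = [20000, 40000, 60000, 80000]
balances = [1000 + 0.01 * income for income in incomes]

a, b = fit_line(incomes, balances)

def predict_balance(income):
    """Map a predictor value (income) to a prediction (average balance)."""
    return a + b * income

print(a, b, predict_balance(50000))
```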

93
Linear Regression
  • Linear regression attempts to fit a straight line
    through a plot of the data, such that the line is
    the best representation of the average of all
    observations at that point in the plot.
  • The problem is that the technique only works well
    with linear data and is sensitive to the presence
    of outliers (i.e., data values that do not
    conform to the expected norm).

94
Linear Regression
  • Although non-linear regression avoids the main
    problems of linear regression, it is still not
    flexible enough to handle all possible shapes of
    the data plot.
  • Statistical measurements are fine for building
    linear models that describe predictable data
    points; however, most data is not linear in
    nature.

95
Linear Regression
  • Data mining requires statistical methods that can
    accommodate non-linearity, outliers, and
    non-numeric data.
  • Applications of value prediction include credit
    card fraud detection or target mailing list
    identification.

96
The Nearest Neighbor Prediction
  • One of the classic areas in which nearest
    neighbor has been used for prediction is text
    retrieval. The end user marks a document (for
    example, a Wall Street Journal article) as
    interesting, and documents whose characteristics
    are nearest to those of the marked documents are
    then more likely to be retrieved.
  • Another good example is that supermarkets tend to
    put similar produce in the same area: for
    example, an apple is closer to an orange than to
    a tomato. Thus, if you know the prediction value
    of one of the objects, you can predict it for its
    nearest neighbors.
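
A minimal sketch of nearest neighbor prediction in Python follows. The customer records, the two features (age and income in thousands), and the use of Euclidean distance are assumptions chosen for illustration; the slides do not specify a distance measure.

```python
import math

def nearest_neighbor_predict(query, labeled_points):
    """Return the known outcome of the labeled record closest to the query."""
    nearest = min(labeled_points, key=lambda item: math.dist(query, item[0]))
    return nearest[1]

# Hypothetical records: (age, income in thousands) -> responded to an offer?
customers = [
    ((25, 30), "no"),
    ((30, 35), "no"),
    ((45, 80), "yes"),
    ((50, 90), "yes"),
]

# A new customer inherits the outcome of the most similar known customer.
prediction = nearest_neighbor_predict((47, 85), customers)
print(prediction)
```

In practice the features would usually be rescaled first so that one of them (here, income) does not dominate the distance.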

97
Data Clustering
  • Clustering analysis is an important means of
    processing multimedia data. It is basically the
    organization of a collection of patterns into
    clusters of similar objects. Patterns within a
    valid cluster are more similar to each other than
    they are to a pattern in a different cluster.

98
Data Clustering
  • Clustering can allow us to carry out the
    following activities that can help in query
    processing:
  • Representing patterns in the data so that we can
    reduce the size of the media
  • Defining a way of measuring the proximity of
    different patterns in the data so that we can
    find the instances that match our example
  • Clustering or grouping the data in preparation
    for matching
  • Data abstraction, particularly of features that
    we can store as metadata
  • Assessing the output by estimating how good the
    selection is

99
Clustering and Nearest Neighbor
  • A simple example of clustering would be the
    clustering that most people perform when they do
    the laundry: grouping the permanent press, dry
    cleaning, whites, and brightly colored clothes
    together, because the items in each group have
    similar characteristics.
  • A simple example of the nearest neighbor
    prediction algorithm is when you look at the
    people in your neighborhood. You may notice that,
    in general, you all have somewhat similar
    incomes.
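
To make the grouping idea concrete, here is a small Python sketch of k-means clustering on a single variable. K-means is a swapped-in technique chosen for illustration (the slides do not name a clustering algorithm), and the income values and starting centers are hypothetical.

```python
def k_means_1d(values, centers, iterations=10):
    """Simple k-means on one numeric variable, starting from the given centers."""
    clusters = [[] for _ in centers]
    for _ in range(iterations):
        # Assignment step: each value joins the cluster with the nearest center.
        clusters = [[] for _ in centers]
        for value in values:
            idx = min(range(len(centers)), key=lambda i: abs(value - centers[i]))
            clusters[idx].append(value)
        # Update step: move each center to the mean of its members.
        centers = [
            sum(members) / len(members) if members else centers[i]
            for i, members in enumerate(clusters)
        ]
    return centers, clusters

# Hypothetical customer incomes and three rough starting guesses.
incomes = [2000, 2500, 2800, 4000, 5000, 6000, 9000, 10000, 12000]
centers, clusters = k_means_1d(incomes, centers=[2000, 5000, 10000])

for center, members in zip(centers, clusters):
    print(round(center), members)
```

Values within a cluster end up closer to their own center than to any other, which is the "similar characteristics" property described above.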

100
Statistical Analysis of Actual Sales (dollars and
quantities) relative to these Signage Variables-a
predictive modeling example.
  • Content
  • Frequency
  • Depth
  • Focus
  • Depth
  • Scale
  • Length
  • Location
  • Statistical Analysis: Correlation, Regression,
    Experiment Design, Optimization. Now it goes into
    real-time analysis.

101
Signage
102
Signage
103
Topic 7: Next Generation Techniques: Decision
Trees, Networks, and Rules
A Decision Tree
Customer renting property > 2 years?
  No: Rent property
  Yes: Customer age > 45?
    No: Rent property
    Yes: Buy property
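
The tree on this slide can be read as a pair of nested tests. The Python sketch below encodes it directly; the function and parameter names are hypothetical, and the thresholds (2 years, age 45) are the ones shown on the slide.

```python
def predict_property_decision(years_renting, age):
    """Walk the slide's decision tree from the root to a leaf."""
    if years_renting <= 2:      # "Customer renting property > 2 years?" -> No
        return "Rent property"
    if age <= 45:               # "Customer age > 45?" -> No
        return "Rent property"
    return "Buy property"       # Yes on both tests

print(predict_property_decision(years_renting=3, age=50))
```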
104
A Decision Tree
105
CART and CHAID
  • CART, which stands for Classification and
    Regression Trees, is a data exploration and
    prediction algorithm developed by Leo Breiman,
    Jerome Friedman, Richard Olshen, and Charles
    Stone. It is nicely detailed in their 1984 book,
    Classification and Regression Trees (Breiman,
    Friedman, Olshen, and Stone, 1984). These
    researchers from Stanford University and the
    University of California at Berkeley showed how
    this new algorithm could be used on a variety of
    different problems, such as the detection of
    chlorine from the data contained in a mass
    spectrum. One of the great advantages of CART is
    that the algorithm has the validation of the
    model and the discovery of the optimally general
    model built deeply into the algorithm.
  • Another popular decision tree technology is CHAID
    (Chi-Square Automatic Interaction Detector).
    CHAID is similar to CART in that it builds a
    decision tree, but it differs in the way that it
    chooses its splits.

106
B Neural Networks
  • A neural network is loosely based on the way some
    people believe that the human brain is organized
    and how it learns. There are two main structures
    of consequence in the neural network:
  • The node, which loosely corresponds to the neuron
    in the human brain
  • The link, which loosely corresponds to the
    connections between neurons (axons, dendrites,
    and synapses) in the human brain

107
Neural Networks
When a customer rents property for more than 2
years and is more than 25 years old, in 40% of
cases the customer will buy a property. The
association happens in 35% of all customers who
rent properties.
108
Example of Classification using Neural Induction
  • Each processing unit (circle) in one layer is
    connected to each processing unit in the next
    layer by a weighted value, expressing the
    strength of the relationship. The network
    attempts to mirror the way the human brain works
    in recognizing patterns by arithmetically
    combining all the variables with a given data
    point.
  • In this way, it is possible to develop nonlinear
    predictive models that learn by studying
    combinations of variables and how different
    combinations of variables affect different data
    sets.

109
How Does Neural Induction Make a Prediction?
  • The age value of 47 is normalized to fall between
    0.0 and 1.0, giving 0.47, and the income is
    normalized to 0.65. This simplified neural
    network makes the prediction of no default for a
    47-year-old making $65,000. The links are
    weighted at 0.7 and 0.1, and the resulting value,
    after multiplying the node values by the link
    weights and adding the results, is 0.39.

[Diagram: the Age node (0.47, link weight 0.7) and the Income node (0.65, link weight 0.1) feed the Default output node (0.39)]
0.47(0.7) + 0.65(0.1) = 0.394 ≈ 0.39
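
The arithmetic on this slide can be reproduced with a few lines of Python. The normalization ranges (0 to 100 for age, 0 to 100,000 for income) are assumptions inferred from the normalized values on the slide; the link weights are the ones given.

```python
def normalize(value, low, high):
    """Rescale a raw value to the range 0.0 to 1.0."""
    return (value - low) / (high - low)

# Assumed ranges that reproduce the slide's normalized inputs.
age_node = normalize(47, 0, 100)               # 0.47
income_node = normalize(65000, 0, 100000)      # 0.65

link_weights = {"age": 0.7, "income": 0.1}

# The output node sums each input node value times its link weight.
default_node = (age_node * link_weights["age"]
                + income_node * link_weights["income"])

print(round(default_node, 2))
```

A real network would typically pass this sum through an activation function and include hidden layers; the slide's simplified example stops at the weighted sum.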
110
C Rule Induction
  • This is a powerful technique that involves a
    large number of rules using a set of if...then
    statements in the pursuit of all possible
    patterns in the dataset. For example, if the
    customer is male, is between 30 and 40 years of
    age, and has an income of less than 50,000 and
    more than 20,000, he is likely to be driving a
    car that was bought new.

111
What Is A Rule?
Rule | Accuracy | Coverage
If breakfast cereal purchased, then milk is purchased. | 85% | 20%
If bread purchased, then Swiss cheese will be purchased. | 15% | 6%
If 42 years old and purchased pretzels and dry roasted peanuts, then beer will be purchased. | 95% | 0.01%
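
As a hedged illustration of the two measures in the table, the Python below computes them for one rule over a handful of made-up shopping baskets. It assumes that coverage means the share of all records to which the rule applies (the "if" part holds) and that accuracy means the share of those records where the "then" part also holds; the slides do not spell out the definitions.

```python
def rule_stats(baskets, antecedent, consequent):
    """Return (accuracy, coverage) of the rule 'if antecedent then consequent'."""
    applies = [basket for basket in baskets if antecedent <= basket]
    coverage = len(applies) / len(baskets)
    if not applies:
        return 0.0, coverage
    correct = sum(1 for basket in applies if consequent <= basket)
    return correct / len(applies), coverage

# Hypothetical market baskets, each a set of purchased items.
baskets = [
    {"cereal", "milk"},
    {"cereal", "milk", "bread"},
    {"cereal"},
    {"bread", "cheese"},
    {"milk"},
]

accuracy, coverage = rule_stats(baskets, antecedent={"cereal"}, consequent={"milk"})
print(f"accuracy={accuracy:.2f} coverage={coverage:.2f}")
```

The third rule in the table shows why both numbers matter: a rule can be right 95% of the time and still apply to almost no one.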

112
Topic 8: CRM - The Business Perspective
  • Tools and technologies will be applied to real
    business problems across a variety of industries.
    They are:
  • Customer Profitability: provides a blueprint for
    how to define and use customer profitability as
    the bedrock for your CRM processes.
  • Customer Acquisition: shows how to use data
    mining to acquire new customers in the most
    profitable way possible.
  • Customer Cross-selling: details how the
    technology architecture can be used to increase
    the value of existing customers by selling more
    to them.
  • Customer Retention: uses a case study from the
    telecommunications industry to show how to
    execute successful CRM systems to retain your
    profitable customers.
  • Customer Segmentation: provides the business
    methodology of how to segment and manage your
    customers in a consistent and repeatable way
    across the enterprise.

113
The Business-Centric View of Data Mining Process
[Diagram boxes: Business Problem, Data, Understand, Define Value, Data Definition, ROI Definition, Data Mining, Define Value, Predictive Model, Predicted ROI, Application, Display, ROI]
114
Customer Profitability
  • Customer profitability is the bedrock of data
    mining. Data mining earns its keep by helping you
    to understand and improve customer profitability.
    How does the organization define what a
    profitable customer is versus an unprofitable
    customer? Keeping a customer loyal can have
    profound effects on per-customer profitability.
    The compounding effect of customer loyalty on
    customer profitability also increases because
    sales costs are lower and revenue generally has
    increased. Data mining can be used to predict
    customer profitability under a variety of
    different marketing campaigns.

115
A Customer Value Matrix Showing Recommended
Service Level
Segment | Current Value | Lifetime Value | Potential Value | Potential Lifetime Value | Customer Service Level | Best Service Level
1 | High | High | High | High | Gold | Gold
2 | High | Low | High | High | Gold | Gold
3 | High | Low | High | Low | Gold | Bronze
4 | Low | Low | Low | High | Bronze | Gold
5 | Low | Low | High | High | Bronze | Gold
6 | Low | Low | Low | Low | Bronze | Bronze
116
A Customer Value Matrix
  • This should be one of the first things that we
    do with data mining.
  • Segment 1 is your best customers. They will
    remain your best customers through their lives,
    and their current value matches their potential.
  • Segment 2 is similar, except that they are likely
    to have low lifetime value, despite their high
    value today, probably because they are not loyal
    and are likely to switch to a competitor at some
    time in their customer life.
  • Segments 4 and 5 represent customers who, with
    the right care and service, can be transitioned
    to high-value customers, either short-term or
    long-term.
  • Segment 6 represents your low-value customers,
    whom you will treat with some of your least
    expensive services.

117
Customer Acquisition
  • The traditional approach to customer acquisition
    involved a marketing manager developing a
    combination of mass marketing (magazine
    advertisements, billboards, etc.) and direct
    marketing (telemarketing, mail, etc.) campaigns
    based on their knowledge of the particular
    customer base that was being targeted.
  • A marketing manager selects the demographics
    (age, gender, interest in particular subjects,
    etc.) and then works with a data vendor
    (sometimes known as a service bureau) to obtain
    lists of customers who meet those
    characteristics.
  • Although a marketer with a wealth of experience
    can often choose relevant demographic selection
    criteria, the process becomes more difficult as
    the amount of data increases.
  • Data mining can help this process.

118
Defining Some Key Customer Acquisition Concepts
  • The responses that come in as a result of a
    marketing campaign are called response behaviors.
    Binary response behaviors (either a yes or no)
    are the simplest kind of response.
  • Beyond binary response behaviors are categorical
    response behaviors, which allow for multiple
    behaviors to be defined. The rules that define
    the behaviors are based on the kind of business
    you are involved in.
  • There are usually several different kinds of
    positive response behaviors that can be
    associated with an acquisition marketing
    campaign. They are:
  • Customer inquiry
  • Purchase of the offered product or products
  • Purchase of a product different from the one
    offered

119
Response Analysis Broken Down By Behaviors
Behavior | Measure | 12/1/05 | 12/5/05 | 12/7/05 | 12/9/05 | Total
Inquiry | # of Responses | 1,556 | 1,340 | 328 | 352 | 3,576
Purchase A | # of Responses | 210 | 599 | 128 | 167 | 1,104
Purchase B | # of Responses | 739 | 476 | 164 | 97 | 1,476
Purchase C | # of Responses | 639 | 647 | 113 | 105 | 1,504
120
Cross-Selling
  • Cross-selling is the process by which you offer
    your existing customers new products and
    services. Customers who purchase baby diapers
    might also be interested in hearing about your
    other baby products.
  • One form of cross-selling, sometimes called
    up-selling, takes place when the new offer is
    related to existing purchases by the customer.
    For example, an up-sell opportunity might exist
    for a telephone company to market a premium
    long-distance service to existing long-distance
    customers who currently have the standard
    service.

121
How Cross-Selling Works
  • Assume that you are a marketing manager for a
    mid-size bank. You have the following products
    available for your customers:
  • Value checking account
  • Standard checking account
  • Gold credit card
  • Platinum credit card
  • Primary mortgage
  • Secondary mortgage
  • Of these products, you're responsible for
    marketing the mortgage products to your
    customers. Your goal is to find out which
    customers might be interested in a mortgage
    offering at least 60 days before they would apply
    for the loan. It is important that any
    predictions are made with sufficient lead time
    (in this case, two months), so that any
    interactions with the customers take place before
    they are committed to a relationship with your
    competition.

122
How Cross-Selling Works
  • You have already done some thinking about your
    customers and their motivations in this area and
    came up with several scenarios, which you
    presented to your boss when pitching this new
    campaign:
  • Customer preparing to buy a new home. These
    customers might be building up cash reserves in
    their checking and/or savings account in order to
    put together a down payment.
  • Customer preparing to refinance an existing home.
    These customers might be paying off credit card
    debt (thus making them more acceptable from a
    risk point of view), and hold a mortgage whose
    interest rate is higher than the current interest
    rate.
  • Customer preparing to add a second mortgage.
    These customers might have increasing credit card
    debt, an on-time payment history for their credit
    cards and existing mortgage (which means that
    they are a good risk), and enough equity in their
    house to cover the outstanding credit card
    balance.

123
Data Mining Process for Cross-Selling
  • The actual data mining process for cross-selling
    contains three distinct steps:
  • Modeling of individual behaviors
  • Scoring data with predictive models
  • Optimization of the scoring matrices
  • Model: A description that adequately explains and
    predicts relevant data but is generally much
    smaller than the data itself. For real-world
    applications, a model can be anything from a
    mathematical equation, to a set of rules that
    describes customer segments, to the computer
    representation of a complex neural network
    architecture, which translates to several sets of
    mathematical equations.
  • Predictive model: A model created or used to
    perform prediction, in contrast to models created
    solely for pattern detection, exploration, or
    general organization of the data.
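
One simple way to picture the scoring and optimization steps is a matrix of model scores (customers by offers) from which the best offer per customer is selected. The Python sketch below assumes exactly that; the customer IDs, scores, and the 0.2 cutoff are hypothetical, and real optimization would also handle budgets and channel constraints.

```python
# Hypothetical scoring matrix: each predictive model's score for each customer,
# read here as the estimated chance the customer responds to that offer.
scores = {
    "cust_1": {"primary_mortgage": 0.10, "secondary_mortgage": 0.40},
    "cust_2": {"primary_mortgage": 0.55, "secondary_mortgage": 0.05},
    "cust_3": {"primary_mortgage": 0.02, "secondary_mortgage": 0.03},
}

def best_offers(scoring_matrix, min_score=0.2):
    """Pick each customer's highest-scoring offer, skipping weak prospects."""
    chosen = {}
    for customer, offer_scores in scoring_matrix.items():
        offer, score = max(offer_scores.items(), key=lambda pair: pair[1])
        if score >= min_score:
            chosen[customer] = offer
    return chosen

campaign = best_offers(scores)
print(campaign)
```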

124
Customer Retention
  • As industries become more competitive and the
    cost of acquiring new customers increases, the
    value of retaining current customers also
    increases. For instance, in the cellular phone
    industry, it is estimated that the cost of
    attracting and signing up a new customer is $300
    or more when the costs of discounted hardware and
    sales commissions are included. The cost of
    retaining a current customer, however, can be as
    low as the price of a phone call or the cost of
    updating their cellular phone to the latest
    technology offering. Although expensive, this is
    still significantly cheaper than signing up a
    wholly new customer.

125
A Case Study- Cellular Phone Industry
  • Customer churn is the term used in the cellular
    telephone industry to denote the movement of
    cellular telephone customers from one provider to
    another. In many industries, this is called
    customer attrition, but because of the highly