Data Mining and Knowledge Discovery

1
  • Data Mining and Knowledge Discovery
  • for Strategic Business Optimization
  • Peter van der Putten
  • ALP Group, LIACS & KiQ Ltd
  • November 2004

2
Why is a business in business?
  • Successful businesses create a lot of added value
    for their customers and capture it
  • Maximize long term profit
  • Optimize: maximize sales, minimize costs,
    minimize risk

3
Challenges
  • Businesses are bigger
  • Fragmentation of products, customer interaction
    channels, market segments
  • Fierce competition, chaotic economic climate and
    dynamic customer behavior
  • Data glut: information overflow
  • Solution: data mining & knowledge discovery for
    strategic business optimization

4
Credit scoring case: minimizing loan risk while
maximizing loan acceptance
5
Marketing case: maximizing direct mail response
while minimizing cost
A model was created that predicts the probability
that a customer responds to a mailing. By using the
model to select which customers to mail, we could
reach 50% of the responders by mailing only 20% of
all customers.
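The "50% of responders for 20% of the mailing" figure is a point on a cumulative gains curve. A minimal sketch of how such a number can be computed, assuming hypothetical model scores and known outcomes for ten customers (the function name and data are illustrative, not from the talk):

```python
def responders_captured(scores, responded, fraction_mailed):
    """Share of all responders reached when only the top-scored fraction is mailed."""
    # Rank customers from highest to lowest predicted response probability.
    ranked = sorted(zip(scores, responded), key=lambda pair: pair[0], reverse=True)
    n_mailed = round(len(ranked) * fraction_mailed)
    reached = sum(resp for _, resp in ranked[:n_mailed])
    return reached / sum(responded)


# Hypothetical scores and outcomes for ten customers (1 = responded).
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
responded = [1, 1, 0, 0, 0, 1, 0, 0, 0, 1]
print(responders_captured(scores, responded, 0.2))  # 0.5
```

Mailing a random 20% would reach about 20% of the responders on average; the gap between that baseline and the model-based selection is the value the model adds.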
6
Siebel
OMEGA predicts a slight preference for general
insurance and offers a one-click cross-sell
button.
The next customer may have product preferences as
well, but the exit risk overrides them. Using a
combination of predictive models and business
rules, OMEGA suggests that Siebel make an immediate
attempt to retain the customer.
OMEGA offers Siebel the appropriate text for its
script engine.
Within general insurance, OMEGA predicts a
preference for car insurance and offers one-click
access to the appropriate script.
OMEGA again offers Siebel the appropriate text to
execute a retention script.
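The behaviour described on this slide (a high exit risk overriding a cross-sell preference) can be sketched as decision logic on top of model scores. This is a hypothetical illustration, not OMEGA's actual rule engine: the field names, the 0.5 threshold and the action strings are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class CustomerScores:
    churn_probability: float               # hypothetical retention-model output
    product_propensity: dict[str, float]   # hypothetical cross-sell model outputs


CHURN_THRESHOLD = 0.5  # hypothetical business-rule threshold


def next_best_action(scores: CustomerScores) -> str:
    # Business rule: a high exit risk overrides any cross-sell opportunity.
    if scores.churn_probability >= CHURN_THRESHOLD:
        return "retention script"
    # Otherwise offer the product the models rank highest.
    best = max(scores.product_propensity, key=scores.product_propensity.get)
    return f"cross-sell: {best}"


print(next_best_action(CustomerScores(0.1, {"car insurance": 0.6, "home insurance": 0.3})))
# cross-sell: car insurance
```

Keeping the rule separate from the models lets the business change the threshold without retraining anything.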
7
Overview
  • Why Data Mining?
  • The Data Mining Process
  • Data Mining Tasks
  • Data Mining Techniques
  • Future Outlook
  • Data Mining Opportunities by Sector and Function
  • Q&A

8
Some working definitions.
  • Data Mining and Knowledge Discovery in
    Databases (KDD) are used interchangeably
  • Data mining: the discovery of interesting,
    meaningful and actionable patterns hidden in
    large amounts of data
  • Multidisciplinary field originating from
    artificial intelligence, pattern recognition,
    statistics, machine learning, econometrics, ...

9
Data mining is a process
  • Model Development
      • Objective
      • Data collection & preparation
      • Model construction
      • Model evaluation
  • Combining models with business knowledge into
    decision logic
  • Model / decision logic deployment
  • Model / decision logic monitoring

10
Data mining tasks
  • Undirected, explorative, descriptive,
    unsupervised data mining
      • Matching search
      • Profile rule extraction
      • Clustering & segmentation
  • Directed, predictive, supervised data mining
      • Predictive modeling

11
Data mining task example: clustering & segmentation
12
Data mining task example: clustering & segmentation
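Clustering & segmentation groups customers with similar profiles without a target variable. The slides do not name an algorithm; as one common choice, here is a minimal k-means sketch over hypothetical two-attribute customer records (think income and age rescaled to comparable units):

```python
import random


def kmeans(points, k, iterations=20, seed=0):
    """Plain k-means: assign points to the nearest centroid, then move centroids."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k randomly chosen customers
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            # Squared Euclidean distance to each centroid.
            nearest = min(range(k), key=lambda i: sum((a - b) ** 2 for a, b in zip(p, centroids[i])))
            clusters[nearest].append(p)
        for i, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[i] = tuple(sum(coord) / len(members) for coord in zip(*members))
    return centroids, clusters


# Hypothetical customers forming two obvious segments.
points = [(1, 1), (1.5, 2), (1, 1.5), (8, 8), (9, 8.5), (8.5, 9)]
centroids, clusters = kmeans(points, k=2)
print(clusters)
```

The number of segments k is a business choice here; in practice it is usually tuned by inspecting how interpretable and stable the resulting segments are.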
13
Start Looking Glass
14
Looking Glass: intermediate result
15
Looking Glass: result
16
Looking Glass: result
17
Data mining task example: predictive modeling
18
Data mining task example: predictive modeling
Collected data
19
Data mining task example: predictive modeling
Known customer behaviour
20
Data mining task example: predictive modeling
score = (0 x Income) + (-1 x Age) + (25 x Children)
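The score on this slide is a weighted sum of customer attributes. A minimal sketch of applying it, assuming the terms are added (the operators were lost in the transcript) and that there is no intercept; the customers are hypothetical:

```python
# Weights as read from the slide; the "+" between terms and the zero
# intercept are assumptions, since the transcript lost the operators.
WEIGHTS = {"income": 0.0, "age": -1.0, "children": 25.0}


def score(customer):
    """Weighted sum of the customer's attributes."""
    return sum(weight * customer[name] for name, weight in WEIGHTS.items())


# Hypothetical customers; a higher score means a higher predicted response.
customers = [
    {"name": "A", "income": 40000, "age": 35, "children": 2},
    {"name": "B", "income": 90000, "age": 60, "children": 0},
]
ranked = sorted(customers, key=score, reverse=True)
print([c["name"] for c in ranked])  # ['A', 'B']
```

Customer A scores 0 - 35 + 50 = 15 and customer B scores 0 - 60 + 0 = -60, so A is ranked first. With a weight of 0, income has no influence on this particular model.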
21
Data mining task example: predictive modeling
  • Recruitment
  • Who will respond to a mailing campaign?
  • To whom can we cross-sell which products?
  • What will be the customer value one year from
    now?
  • Retention
  • Who is going to cancel his/her mobile phone
    subscription? Should I attempt to keep this
    customer?
  • Which customers have accounts that will go
    dormant?
  • Risk
  • Should I sell a loan to this person?
  • How much money will someone claim on a policy?
  • Is this caller going to pay his bills?

22
Data mining techniques for predictive modeling
  • Linear and logistic regression
  • Decision trees
  • Neural Networks
  • Genetic Algorithms
  • ...

23
Linear Regression Models
score = (0 x Income) + (-1 x Age) + (25 x Children)
24
Regression in pattern space
Only a single straight line is available in pattern
space to separate the classes.
[Figure: class "square" and class "circle" plotted on the axes income and age]
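To make the "single line" point concrete, here is a minimal logistic regression sketch trained by batch gradient descent on hypothetical two-feature data (income and age rescaled to [0, 1]; class "square" = 1, class "circle" = 0). Whatever weights it learns, its decision boundary w1 * income + w2 * age + b = 0 is one straight line in pattern space.

```python
import math


def train_logistic(points, labels, lr=0.1, epochs=2000):
    """Fit w1, w2, b by batch gradient descent on the logistic (log) loss."""
    w1 = w2 = b = 0.0
    n = len(points)
    for _ in range(epochs):
        g1 = g2 = gb = 0.0
        for (x1, x2), y in zip(points, labels):
            p = 1.0 / (1.0 + math.exp(-(w1 * x1 + w2 * x2 + b)))  # predicted probability
            g1 += (p - y) * x1
            g2 += (p - y) * x2
            gb += p - y
        w1 -= lr * g1 / n
        w2 -= lr * g2 / n
        b -= lr * gb / n
    return w1, w2, b


def predict(model, x1, x2):
    """Class 1 on one side of the line w1*x1 + w2*x2 + b = 0, class 0 on the other."""
    w1, w2, b = model
    return 1 if w1 * x1 + w2 * x2 + b > 0 else 0


# Hypothetical (income, age) points: squares have high income and low age.
points = [(0.9, 0.2), (0.8, 0.3), (0.7, 0.1), (0.2, 0.8), (0.3, 0.9), (0.1, 0.7)]
labels = [1, 1, 1, 0, 0, 0]
model = train_logistic(points, labels)
print([predict(model, *p) for p in points])  # [1, 1, 1, 0, 0, 0]
```

This toy data is linearly separable; if the squares formed, say, a ring around the circles, no choice of w1, w2 and b would separate them, which is the limitation the slide points out.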
25
Decision Trees
[Figure: example decision tree]
20000 customers, response 1%
  Income > 150000?
    no: 18800 customers, next question: Purchases > 10?
    yes: 1200 customers, next question: Balance > 50000?
      no: 800 customers
      yes: 400 customers
  etc.
Leaf response rates shown on the slide: 1.8% and 0.1%
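Each node in the tree asks a question such as "Income > 150000?". The slide does not say how the questions were chosen; one common criterion (used in CART) is to pick the threshold that minimizes Gini impurity. A minimal sketch for a single numeric attribute, with hypothetical incomes and responses:

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels (0 means the node is pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 1.0 - p * p - (1.0 - p) * (1.0 - p)


def best_split(values, labels):
    """Return (threshold, weighted_gini) of the best 'value > threshold?' question."""
    best = (None, gini(labels))  # no split at all is the baseline
    for threshold in sorted(set(values))[:-1]:
        left = [y for v, y in zip(values, labels) if v <= threshold]   # answer "no"
        right = [y for v, y in zip(values, labels) if v > threshold]   # answer "yes"
        weighted = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if weighted < best[1]:
            best = (threshold, weighted)
    return best


# Hypothetical customers: income and whether they responded.
incomes = [20000, 35000, 50000, 160000, 180000, 200000]
responded = [0, 0, 0, 1, 1, 1]
print(best_split(incomes, responded))  # (50000, 0.0)
```

A full tree learner repeats this search over every attribute and recurses into each branch until a stopping rule (minimum node size, maximum depth) applies.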
26
Decision Trees in Pattern Space
Line pieces perpendicular to the axes. Each line is
a split in the tree: the two answers to a question.
[Figure: tree splits drawn as axis-parallel lines on the axes income and age]
27
Infotrees (Genetic Programming)
  • Nested regression formulas
  • sum(average(region, spend), max(age, children))
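An Infotree such as sum(average(region, spend), max(age, children)) is an expression tree that can be evaluated recursively. A minimal sketch, assuming nested tuples of the form (operator, child, child, ...) with predictor names as leaves, and assuming region is already numerically coded; the operator table and record are hypothetical:

```python
from statistics import mean

# Hypothetical operator table; a real Infotree tool may offer other operators.
OPERATORS = {
    "sum": lambda *args: sum(args),
    "average": lambda *args: mean(args),
    "max": max,
}


def evaluate(node, record):
    """Evaluate an expression tree for one customer record."""
    if isinstance(node, str):  # leaf: look up the predictor value
        return record[node]
    op, *children = node
    return OPERATORS[op](*(evaluate(child, record) for child in children))


tree = ("sum", ("average", "region", "spend"), ("max", "age", "children"))
record = {"region": 2, "spend": 4, "age": 35, "children": 2}  # hypothetical customer
print(evaluate(tree, record))  # average(2, 4) + max(35, 2) = 3 + 35 = 38
```

Because every model is just a tree of operators and predictors, the genetic operations on the following slides can manipulate models by swapping or changing nodes.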

28
Infotrees in Pattern Space
Infotrees can seperate any class in pattern
space, even if the class boundary is non-linear ?
Can model complex customer behavior
income
age
29
Genetic Algorithms / Programming
  • How to find the best Infotree? Genetic algorithms
  • Based on the idea of evolution
  • Start with (random) Infotrees
  • Build a new generation
  • Fittest models can reproduce to create offspring,
    worst models die
  • Small amount of mutation occurs to keep exploring
  • Repeat process
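The evolutionary loop above (random start, fittest reproduce, worst die, small mutation, repeat) can be sketched in a few lines. To stay short, this sketch evolves the two weights of a linear score instead of full Infotrees; the data (generated by y = x1 + 2 * x2), population sizes and rates are hypothetical choices.

```python
import random

# Hypothetical training data generated by y = x1 + 2 * x2.
DATA = [((1.0, 2.0), 5.0), ((2.0, 1.0), 4.0), ((3.0, 3.0), 9.0), ((0.0, 1.0), 2.0)]


def fitness(weights):
    """Higher is better: negative squared error of a linear model on DATA."""
    return -sum((weights[0] * x1 + weights[1] * x2 - y) ** 2 for (x1, x2), y in DATA)


def random_individual(rng):
    return [rng.uniform(-5.0, 5.0) for _ in range(2)]


def crossover(mother, father, rng):
    # Uniform cross-over: each weight is inherited from one parent at random.
    return [rng.choice(pair) for pair in zip(mother, father)]


def mutate(weights, rng):
    # Nudge one randomly chosen weight to keep exploring new models.
    child = list(weights)
    child[rng.randrange(len(child))] += rng.gauss(0.0, 0.5)
    return child


def genetic_search(population_size=50, generations=100, survivors=10,
                   mutation_rate=0.1, seed=0):
    rng = random.Random(seed)
    population = [random_individual(rng) for _ in range(population_size)]
    for _ in range(generations):
        # The fittest models survive and reproduce; the rest die out.
        population.sort(key=fitness, reverse=True)
        parents = population[:survivors]
        offspring = []
        while survivors + len(offspring) < population_size:
            mother, father = rng.sample(parents, 2)
            child = crossover(mother, father, rng)
            if rng.random() < mutation_rate:
                child = mutate(child, rng)
            offspring.append(child)
        population = parents + offspring
    return max(population, key=fitness)


best = genetic_search()
print(best, fitness(best))
```

Keeping the parents in the next generation (elitism) guarantees the best model found so far is never lost; the mutation step is what lets the search reach weight values not present in the initial population.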

30
Notes about Infotree models: cross-over
  • New models can be created by cross-over
  • part of one model is swapped with part of another
  • parts may be chosen randomly or intelligently
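Sub-tree cross-over on expression trees can be sketched as: pick a random node in one parent and replace it with a random sub-tree of the other parent. This assumes the same hypothetical nested-tuple representation (operator first, then children) and chooses parts randomly rather than intelligently:

```python
import random


def subtrees(tree, path=()):
    """Yield (path, subtree) for every node; a path holds child indices (0 is the operator)."""
    yield path, tree
    if isinstance(tree, tuple):
        for i, child in enumerate(tree[1:], start=1):
            yield from subtrees(child, path + (i,))


def replace_at(tree, path, new):
    """Return a copy of tree with the node at path replaced by new."""
    if not path:
        return new
    i = path[0]
    return tree[:i] + (replace_at(tree[i], path[1:], new),) + tree[i + 1:]


def crossover(mother, father, rng):
    """Swap a randomly chosen part of mother for a randomly chosen part of father."""
    cut_path, _ = rng.choice(list(subtrees(mother)))
    _, donor = rng.choice(list(subtrees(father)))
    return replace_at(mother, cut_path, donor)


mother = ("sum", ("average", "region", "spend"), ("max", "age", "children"))
father = ("min", "income", ("sum", "age", "spend"))
print(crossover(mother, father, random.Random(1)))
```

Because tuples are immutable, both parents stay intact and can take part in further cross-overs within the same generation.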

31
Notes about Infotree models: mutation
  • New models can be created by mutation
  • part of a model (a sub-tree, operator or
    predictor) is changed
  • part and type of change may be chosen randomly
    or intelligently

[Figure: three kinds of mutation (sub-tree, operator, predictor) applied to an Infotree built from the operators convex and concave and the predictors children and age]
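The three kinds of mutation on this slide can be sketched on the same hypothetical nested-tuple trees. The operator and predictor sets below are illustrative assumptions, and both the node and the type of change are chosen randomly:

```python
import random

# Hypothetical building blocks; a real Infotree tool would have its own sets.
OPERATORS = ["sum", "average", "max", "min"]
PREDICTORS = ["age", "children", "income", "spend"]


def random_tree(rng, depth=2):
    """Grow a random tree of binary operators over the predictors."""
    if depth == 0 or rng.random() < 0.3:
        return rng.choice(PREDICTORS)
    return (rng.choice(OPERATORS), random_tree(rng, depth - 1), random_tree(rng, depth - 1))


def nodes(tree, path=()):
    """Yield (path, node) pairs; a path holds child indices (0 is the operator)."""
    yield path, tree
    if isinstance(tree, tuple):
        for i, child in enumerate(tree[1:], start=1):
            yield from nodes(child, path + (i,))


def replace_at(tree, path, new):
    """Return a copy of tree with the node at path replaced by new."""
    if not path:
        return new
    i = path[0]
    return tree[:i] + (replace_at(tree[i], path[1:], new),) + tree[i + 1:]


def mutate(tree, rng):
    path, node = rng.choice(list(nodes(tree)))
    if rng.random() < 0.5:
        new = random_tree(rng)                      # sub-tree mutation
    elif isinstance(node, tuple):
        new = (rng.choice(OPERATORS),) + node[1:]   # operator mutation, children kept
    else:
        new = rng.choice(PREDICTORS)                # predictor mutation
    return replace_at(tree, path, new)


tree = ("sum", ("average", "age", "spend"), ("max", "age", "children"))
print(mutate(tree, random.Random(3)))
```

Operator and predictor mutations make small, local changes to a model, while sub-tree mutation can change its structure substantially; mixing both keeps the search exploring.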
32
Short Demo (if time allows)
  • Model to predict caravan policy ownership
  • Combining this model with other models and
    business rules

33
Data Mining the Future
  • Business (marketing)
  • More fine-grained segmentation down to the
    cluster or individual level
  • More personalised actions, inbound and outbound,
    in all customer contact channels
  • Optimization of both value for the business and
    the customer
  • Privacy
  • Technical
  • From Data Mining to Decisioning: combining
    multiple models with business rules
  • Monitoring business and model performance
  • Data Mining Process Automation

34
Let's discuss: Data Mining Opportunities by
Function
  • Marketing, Sales, CRM
  • Product Development, R&D
  • Manufacturing, Production, Logistics
  • Customer service
  • Finance
  • Procurement
  • Human Resources
  • IT
  • ...

35
Let's discuss: Data Mining Opportunities by Sector
  • Retail
  • Telco
  • Pharma
  • Government
  • Automotive
  • Oil
  • Charity
  • Consumers / Citizens
  • ...

36
The Paper: Requirements
  • 2500 words (+/- 10%), APA style references
  • No plagiarism / copying! Rephrase in your own
    words; reference, cite, quote
  • Two parts of 1250 words each
  • Your grasp of the research topic: what is data
    mining? Own interpretation, clear, put into
    context
  • Memo to the CEO/CIO of a specific company or
    industry: what are the benefits, changes,
    opportunities and next steps (best practice,
    proof of concept)? Impact, convincing, plan of
    action.

37
The Paper: Suggestions
  • Suggestions for companies
  • KPN Mobile, Marketing: how to reduce the loss of
    customers to competitors
  • Dutch Police, Strategic Innovation: opportunities
    for law enforcement, privacy implications
  • Pfizer, Drug Discovery: using data mining to find
    new drugs
  • Google, Product Management / R&D: opportunities
    for new data mining features to enhance the
    customer experience
  • Your Idea!

38
The Paper: Resources
  • Webpage for this talk
  • http://www.liacs.nl/putten/ictvision.html
  • General Writing Resources
  • http://www.liacs.nl/putten/writingpapers.html
  • Homepage
  • www.liacs.nl/putten, mail: putten_at_liacs.nl

39
Dilbert's Perspective on Data Mining