Artificial Neural Networks and Data Mining

1
Artificial Neural Networks and Data Mining
Wismar Business School
  • Uwe Lämmel

www.wi.hs-wismar.de/laemmel Uwe.Laemmel_at_hs-wismar.de
2
Content
  • Data Mining
  • Classification approach
  • Data Mining Cup
    • 2004: Who will cancel?
    • 2007: Who will get a rebate coupon?
    • 2008: How long will someone participate in a lottery?
    • 2009: ?
  • Clustering approach
    • Behaviour of bank customers

3
Data Mining
  • Data Mining is the systematic and automated discovery and
    extraction of previously unknown knowledge from huge amounts
    of data.
  • Synonym: KDD (Knowledge Discovery in Databases)
  • The term is a misnomer: gold mining extracts gold, whereas
    data mining extracts knowledge, not data.

4
Data Mining Applications
  • classification
  • clustering
  • association
  • prediction
  • text mining
  • web mining

5
Data Mining Process
CRISP-DM model
7
Classification using NN
  • prerequisite
    • a set of training patterns (many patterns)
  • approach
    • code the values
    • divide the set of training patterns into
      • training set
      • test set
    • build a network
    • train the network using the training set
    • check the network quality using the test set

[Figure: real data, training patterns, coded patterns, training set, test set]
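
The coding and splitting steps above can be sketched as follows. This is a minimal illustration, not the tooling used in the presentation: the record layout, the attribute names, the value ranges and the 70/30 split are all assumptions chosen for the example.

```python
import random

def one_hot(value, categories):
    """Code a categorical value as 0/1 inputs, one per category."""
    return [1.0 if value == c else 0.0 for c in categories]

def scale(value, lo, hi):
    """Code a numeric value into the interval [0, 1]."""
    return (value - lo) / (hi - lo)

def split_patterns(patterns, test_fraction=0.3, seed=42):
    """Shuffle the coded patterns and divide them into training and test set."""
    shuffled = patterns[:]
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

# Hypothetical raw records: (age, payment type, class label)
raw = [(23, "card", 1), (47, "cash", 0), (35, "invoice", 0), (61, "card", 1),
       (29, "cash", 0), (52, "invoice", 1), (44, "card", 0), (38, "cash", 1),
       (57, "card", 0), (31, "invoice", 1)]
payment_types = ["card", "cash", "invoice"]

# Each coded pattern is a pair (input vector, target vector)
coded = [([scale(age, 18, 80)] + one_hot(pay, payment_types), [float(label)])
         for age, pay, label in raw]

training_set, test_set = split_patterns(coded)
print(len(training_set), "training patterns,", len(test_set), "test patterns")
```

The test set is held back during training so that the quality check measures behaviour on patterns the network has not seen.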
8
Development of an NN-application
9
Build an Artificial Neural Network
  • Number of input neurons?
    • depends on the number of attributes
    • depends on the coding
  • Number of output neurons?
    • depends on the coding of the class attribute
  • Number of hidden neurons?
    • experiments are necessary
    • generally not more than the number of input neurons
    • a quarter to a half of the number of input neurons may
      work
    • see: capacity of a neural network
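
As a rough illustration of these rules of thumb, the following sketch derives the input layer size from a coding and lists hidden layer sizes between a quarter and a half of it. The attribute names and their codings are hypothetical, and the candidate sizes are only starting points for experiments.

```python
def input_neurons(attribute_codings):
    """Sum the inputs each attribute needs: 1 for a scaled number,
    k for a one-of-k coded attribute with k categories."""
    return sum(attribute_codings.values())

def hidden_neuron_candidates(n_inputs):
    """Candidate hidden layer sizes between a quarter and a half of n_inputs."""
    low = max(1, n_inputs // 4)
    high = max(low, n_inputs // 2)
    return list(range(low, high + 1))

# Hypothetical attributes and the number of inputs their coding requires
codings = {"age": 1, "payment": 3, "region": 4}

n_in = input_neurons(codings)                  # 8
n_out = 1                                      # binary class coded as one output
candidates = hidden_neuron_candidates(n_in)    # [2, 3, 4]
print(n_in, n_out, candidates)
```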

10
Experiments using the JavaNNS
  • Build a network
  • Load the training patterns
  • Open the Error Graph
  • Open the Control Panel
  • Initialize the network
  • Try different learning parameters: 0.1, 0.2, 0.5, 0.8
  • Start learning
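
JavaNNS is operated through its graphical interface, so the experiment itself is not scripted here. The effect of the learning parameter can still be illustrated outside the tool. The sketch below is an assumption-laden stand-in: it trains a single sigmoid neuron with the delta rule on the logical AND function and records the summed squared error per epoch, which plays the role of the Error Graph.

```python
import math
import random

def train(patterns, learning_rate, epochs=500, seed=1):
    """Train one sigmoid neuron online; return the summed squared error per epoch."""
    rng = random.Random(seed)
    n_inputs = len(patterns[0][0])
    # The last weight is the bias
    weights = [rng.uniform(-0.5, 0.5) for _ in range(n_inputs + 1)]
    errors = []
    for _ in range(epochs):
        sse = 0.0
        for inputs, target in patterns:
            net = sum(w * x for w, x in zip(weights, inputs)) + weights[-1]
            out = 1.0 / (1.0 + math.exp(-net))
            delta = (target - out) * out * (1.0 - out)  # delta rule, sigmoid unit
            for i, x in enumerate(inputs):
                weights[i] += learning_rate * delta * x
            weights[-1] += learning_rate * delta
            sse += (target - out) ** 2
        errors.append(sse)
    return errors

# Logical AND as a tiny stand-in for a real pattern file
patterns = [([0.0, 0.0], 0.0), ([0.0, 1.0], 0.0),
            ([1.0, 0.0], 0.0), ([1.0, 1.0], 1.0)]

error_curves = {rate: train(patterns, rate) for rate in (0.1, 0.2, 0.5, 0.8)}
for rate, errors in error_curves.items():
    print(f"learning rate {rate}: error {errors[0]:.3f} -> {errors[-1]:.3f}")
```

Using the same seed for every run keeps the initial weights identical, so differences between the error curves come from the learning parameter alone.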

11
Getting Results
  • Evaluate the error
  • Finally
    • make the test pattern set the current one
  • Save Data
    • include output files
    • save as a .res file
  • Evaluate the .res file

12
Experiments
  • How can we improve the results?
  • Data pre-processing?
  • Architecture of ANN?
  • Learning Parameters?
  • Evaluation of the results (post-processing)?

Record your work!
14
Data Mining Cup (www.dataminingcup.de)
  • annual competition for students
  • runs from April to May/June
  • real-world problem
  • problem
    • a set of training data
    • a set of data for classification
    • to be developed: a classification
  • supported by many companies (data/software)
  • 200–300 participants
  • workshop (user day)

15
DMC 2004: A Mailing Action
  • mailing action of a company
    • special offer
    • estimated annual income per customer
  • given
    • 10,000 sets of customer data containing 1,000
      cancellers (training)
  • problem
    • the test set contains 10,000 sets of customer data
    • Who will cancel?
    • Whom to send an offer?

16
Mailing Action: Aim?
  • no mailing action
    • 9,000 × 72.00 = 648,000
  • everybody gets an offer
    • 1,000 × 43.80 + 9,000 × 66.30 = 640,500
  • maximum (100% correct classification)
    • 1,000 × 43.80 + 9,000 × 72.00 = 691,800
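
The three totals can be checked with a few lines of arithmetic. The sketch below only uses the figures from the slide; the function name and the reading that unmailed cancellers contribute nothing are assumptions made for the example. Amounts are kept in cents to avoid floating-point rounding.

```python
CANCELLERS = 1_000
LOYAL = 9_000

# Income per customer in cents, taken from the slide
MAILED_CANCELLER = 4_380   # 43.80
MAILED_LOYAL = 6_630       # 66.30
UNMAILED_LOYAL = 7_200     # 72.00

def total_income(cancellers_mailed, loyal_mailed):
    """Total income when the given numbers of cancellers and loyal
    customers receive the offer (unmailed cancellers contribute 0)."""
    cents = (cancellers_mailed * MAILED_CANCELLER
             + loyal_mailed * MAILED_LOYAL
             + (LOYAL - loyal_mailed) * UNMAILED_LOYAL)
    return cents / 100

print(total_income(0, 0))                  # 648000.0  no mailing action
print(total_income(CANCELLERS, LOYAL))     # 640500.0  everybody gets an offer
print(total_income(CANCELLERS, 0))         # 691800.0  100% correct classification
```

Mailing everybody is worse than mailing nobody, so the mailing only pays off if the target group is selected well.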

17
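Goal Function: Lift
  • basis: no mailing action, 9,000 × 72.00
  • goal: extra income
  • $\mathrm{lift}_M = 43.80 \cdot c_M + 66.30 \cdot nk_M - 72.00 \cdot nk_M$
    (c_M: cancellers in the mailing group M; nk_M: non-cancellers in M)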
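
A small sketch of this goal function, applied to a classifier's predictions. It reads the formula as lift = 43.80 · c_M + 66.30 · nk_M − 72.00 · nk_M, with c_M the cancellers and nk_M the non-cancellers who receive the offer. That reading is an assumption, chosen because it reproduces the totals of the previous slide; the function and variable names are illustrative.

```python
def lift(predicted_cancel, actually_cancels):
    """Extra income over 'no mailing action' when every customer
    predicted to cancel receives the offer."""
    pairs = list(zip(predicted_cancel, actually_cancels))
    c_m = sum(1 for predicted, actual in pairs if predicted and actual)
    nk_m = sum(1 for predicted, actual in pairs if predicted and not actual)
    # Work in cents to avoid floating-point rounding
    return (4_380 * c_m + 6_630 * nk_m - 7_200 * nk_m) / 100

# 1,000 cancellers followed by 9,000 loyal customers
actual = [True] * 1_000 + [False] * 9_000

print(lift(actual, actual))              # 43800.0  perfect classification
print(lift([True] * 10_000, actual))     # -7500.0  everybody gets an offer
print(lift([False] * 10_000, actual))    # 0.0      no mailing action
```

Each loyal customer who is wrongly mailed costs 5.70, each canceller who is correctly mailed gains 43.80, so false alarms are cheap compared with missed cancellers.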

18
Data
[Figure: data table with 32 input attributes followed by the result column (<important results>); the table contains missing values]
19
Feed Forward Network: What to do?
  • train the net with the training set (10,000)
  • test the net using the test set (another 10,000)
  • classify all 10,000 customers as canceller or
    loyal
  • evaluate the additional income

20
Results
Data Mining Cup 2002
  • gain
    • additional income from the mailing action if the
      target group is chosen according to the analysis

21
DMC 2007: Rebate System
  • Check-out couponing allows individual coupons to be
    generated at the check-out.
  • The coupon is printed at the end of the sales
    slip, depending on the current customer.
  • Questions
    • How can the retailer identify whether a customer
      is a potential couponing customer?
    • Which coupons will the customer respond to?

22
Couponing
  • Print
    • coupon A
    • coupon B
    • no coupon
  • 50,000 customer cards for training
  • Classify another 50,000 customers!
  • Cost function
    • coupon not redeemed (false assignment to A or B): −1
    • coupon A redeemed (correct assignment to A): +3
    • coupon B redeemed (correct assignment to B): +6
  • Maximize the value!
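
The cost function can be written as a small scoring routine. This sketch assumes that an unredeemed coupon counts as −1 (the sign is not legible in the transcript) and that printing no coupon scores 0. The labels 'A', 'B' and 'N' and the function name are illustrative.

```python
def coupon_value(assigned, redeems):
    """Total value of a coupon assignment.

    assigned: coupon printed per customer, 'A', 'B' or 'N' (no coupon)
    redeems:  coupon the customer would redeem, 'A', 'B' or 'N' (none)
    """
    gain = {"A": 3, "B": 6}
    value = 0
    for printed, wanted in zip(assigned, redeems):
        if printed == "N":
            continue                 # no coupon printed: no gain, no cost
        if printed == wanted:
            value += gain[printed]   # coupon redeemed
        else:
            value -= 1               # coupon printed but not redeemed (assumed -1)
    return value

assigned = list("AABBNN")
redeems = list("ABBNNA")
print(coupon_value(assigned, redeems))   # 3 - 1 + 6 - 1 + 0 + 0 = 7
```

Because a correct B is worth twice a correct A and a wrong coupon costs only 1, printing a coupon can pay off even when the classifier is not very certain.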

23
Data Understanding
  • What is the meaning of the attributes?
  • Type and range of values?

24
20-20-2 Network
Profit = 3 · AA + 6 · BB − (NA + NB + BA + AB)
  • results
    • winner 2007: 7,890
    • my version: 6,714
    • our students: 6,468 (73/230)

25
DMC 2008: Participation in a Lottery
  • Predict, at the beginning of the lottery, how
    long participants will participate
    • 0: The first ticket has not been paid for
    • 1: Only the ticket for the first class has been
      paid for
    • 2: Only the first two classes were played
    • 3: The lottery was played until the end, but no
      ticket was purchased for the following lottery
    • 4: At least the first ticket for the following
      lottery was purchased

[Figure: cost matrix]
26
Data
  • 113,476 patterns!
  • 69 attributes
    • new customer (yes/no)
    • age
    • bank
    • car

27
100-40-20-5 Network
  • results
    • 1,030,240: RWTH Aachen (1)
    • 1,024,535: RWTH Aachen (8)
    • 865,565: Bauhaus Univ. Weimar (100)
    • Univ. Wismar: 878,550 and 835,035
    • 1,494,315 (212)

28
DMC 2009?
30
Clustering Transaction Data
  • Cooperation
    • Hochschule Wismar
    • HypoVereinsbank
    • Medienhaus Rostock
  • Issue
    • What information can be extracted from turnover
      time series?
  • Strategy
    • Cluster the time series data
    • Assign customers/accounts to clusters
    • Examine the clusters

31
Transaction Data Time Series
  • Corporate clients
  • 223 branches
  • Cumulated transactions per
    • month
    • account
    • type of transaction
  • ... for a total of 6 years
  • The original financial data is not suitable
    • the order of values is important
    • time displacements are problematic

32
Fourier versus Original Data
  • No displacement
    • Similarity is detected on both the
      transaction curve and the
      frequency spectrum
  • Data is displaced
    • only the frequency spectrum shows the
      similarity
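
The reason a frequency spectrum tolerates displacement can be shown with a small experiment. The sketch below is not the project's code. It assumes a synthetic monthly series over 6 years, a circular shift of 3 months, and a comparison of magnitude spectra computed with a plain discrete Fourier transform.

```python
import cmath
import math

def magnitude_spectrum(series):
    """Magnitudes of the discrete Fourier transform of a series."""
    n = len(series)
    return [abs(sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
                    for t, x in enumerate(series)))
            for k in range(n)]

# Synthetic turnover curve: 72 months with a yearly cycle (assumed data)
months = 72
turnover = [100 + 40 * math.sin(2 * math.pi * t / 12) for t in range(months)]

# The same curve, displaced by 3 months (circular shift)
shift = 3
displaced = turnover[-shift:] + turnover[:-shift]

curve_distance = math.dist(turnover, displaced)
spectrum_distance = math.dist(magnitude_spectrum(turnover),
                              magnitude_spectrum(displaced))

print(f"distance between curves:  {curve_distance:.2f}")
print(f"distance between spectra: {spectrum_distance:.2e}")
```

A circular shift changes only the phase of each Fourier coefficient, not its magnitude, so the two spectra coincide while the raw curves look far apart. Real displaced series are not exact circular shifts, so in practice the spectra would be similar rather than identical.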
33
Using a classification model
34
Clustering Prediction Results
  • 140,000 records
  • 1 record = 1 account
  • 6×5 SOM: max. 30 clusters
  • average changes of cluster assignments: ca. 19

Variability per Business Sector
  • 22.3  Taxi: 239/1070
  • 22.3  Ship Broker Offices: 64/471
  • 20.9  Churches: 228/1091
  • 20.2  Trucking: 1010/5008
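
How a 6×5 self-organizing map (SOM) yields at most 30 clusters can be sketched in a few lines. This is a minimal illustration under assumptions, not the model from the project: the feature vectors are random stand-ins for account data, and the learning rate and neighbourhood schedules are simple choices made for the example.

```python
import math
import random

ROWS, COLS = 6, 5   # 6x5 map, so at most 30 clusters

def best_unit(units, vector):
    """Index of the map unit whose weight vector is closest to the input."""
    return min(range(len(units)), key=lambda i: math.dist(units[i], vector))

def train_som(vectors, epochs=20, seed=0):
    """Train a small SOM and return the weight vector of every map unit."""
    rng = random.Random(seed)
    dim = len(vectors[0])
    units = [[rng.random() for _ in range(dim)] for _ in range(ROWS * COLS)]
    total_steps = epochs * len(vectors)
    step = 0
    for _ in range(epochs):
        for v in rng.sample(vectors, len(vectors)):   # random presentation order
            frac = step / total_steps
            rate = 0.5 * (1.0 - frac)                               # decaying learning rate
            radius = 1.0 + (max(ROWS, COLS) / 2.0) * (1.0 - frac)   # shrinking neighbourhood
            wr, wc = divmod(best_unit(units, v), COLS)
            for idx, unit in enumerate(units):
                r, c = divmod(idx, COLS)
                grid_dist_sq = (r - wr) ** 2 + (c - wc) ** 2
                influence = math.exp(-grid_dist_sq / (2.0 * radius ** 2))
                for d in range(dim):
                    unit[d] += rate * influence * (v[d] - unit[d])
            step += 1
    return units

# Hypothetical feature vectors, one per account
data_rng = random.Random(1)
accounts = [[data_rng.random() for _ in range(4)] for _ in range(200)]

som = train_som(accounts)
clusters = [best_unit(som, a) for a in accounts]   # cluster = index of the winning unit
print(len(set(clusters)), "of", ROWS * COLS, "map units are in use")
```

Each account is assigned to the map unit that wins for its feature vector, so the number of clusters is bounded by the number of units, and a change of cluster assignment between two periods can be detected by comparing the winning units.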
35
End