Cutting edge curricula: Teaching data mining April 25, 2002 Tom Nugent, Senior Account Executive, SP - PowerPoint PPT Presentation

Transcript and Presenter's Notes

1
Cutting edge curricula Teaching data mining
April 25, 2002
Tom Nugent, Senior Account Executive, SPSS Inc.
Greg James, Vice President, National City Corporation
2
Today's agenda
  • The vision
  • Information growth
  • A brief history of data mining
  • Current situation
  • MKT 696: Data mining in marketing
  • Resources
  • Summary and recommendations

3
Vision
  • Data mining is an emerging, interdisciplinary
    skill-set focused on exploiting our
    ever-expanding data
  • Data mining will eventually become an accepted
    standard research practice to augment the
    traditional approaches we've employed for
    centuries

4
Our challenge as educators
  • Educational institutions will incorporate data
    mining tools and techniques into their mainstream
    curricula within the next five years
  • The demand for these skills is emanating from
    industry and R&D, where it is necessary to
    effectively extract information from massive,
    heterogeneous data sources

5
Information growth
  • In the last 20 years we've gone from data poor
    to data rich
  • This is true for all disciplines
  • Information growth is re-defining the way we
    analyze data

6
Information growth
  • According to a recent study by the U.C. Berkeley
    School of Information Management and Systems:
  • It's taken the entire history of humanity
    through 1999 to accumulate 12 exabytes of
    information
  • By the middle of 2002 the second dozen exabytes
    will have been created

7
Information growth
  • The world produces 1 - 2 exabytes of unique
    information per year
  • What is an exabyte?
  • 1 000 000 000 000 000 000 bytes, or
  • 1 000 000 000 gigabytes, or
  • 50,000 times the volume of information in the
    Library of Congress

8
Information growth
  • The world's total yearly production of print,
    film, optical, and magnetic content would require
    roughly 1.5 billion gigabytes of storage. This is
    the equivalent of 250 megabytes per person for
    each man, woman, and child on earth.

http://www.sims.berkeley.edu/research/projects/how-much-info/
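The figures on the two slides above can be checked with a few lines of arithmetic. The following is a minimal sketch in Python, assuming decimal units (1 gigabyte = 10^9 bytes) and a world population of about 6 billion. The population figure is an assumption and does not appear on the slides.

```python
# Back-of-the-envelope check of the storage figures quoted above.
# Assumptions (not from the slides): decimal units and ~6 billion people.
BYTES_PER_GIGABYTE = 10**9
BYTES_PER_EXABYTE = 10**18
WORLD_POPULATION = 6_000_000_000  # assumed, roughly the figure around 2000

# How many gigabytes make up one exabyte.
gigabytes_per_exabyte = BYTES_PER_EXABYTE // BYTES_PER_GIGABYTE

# "Roughly 1.5 billion gigabytes" of yearly production, from the slide.
yearly_production_gb = 1.5e9
yearly_production_eb = yearly_production_gb / gigabytes_per_exabyte

# Convert gigabytes to megabytes (x1000) and divide across the population.
megabytes_per_person = yearly_production_gb * 1000 / WORLD_POPULATION

print(gigabytes_per_exabyte)   # gigabytes in one exabyte
print(yearly_production_eb)    # yearly production in exabytes
print(megabytes_per_person)    # megabytes per person per year
```

Under these assumptions, 1.5 billion gigabytes falls inside the quoted range of 1 to 2 exabytes per year and works out to about 250 megabytes per person.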
9
Information growth
Worldwide PC hard drive capacity shipped.
Source: 1999 Winchester Disk Drive Market Forecast and Review,
International Data Corporation report. (Some years forecast)
10
Information growth
Hard drive cost per gigabyte.
Source: 1999 Winchester Disk Drive Market Forecast and Review,
International Data Corporation report. (Some years forecast)
11
A brief history of data mining
  • Pre-1990s
  • Single algorithm programs
  • Lots of tinkering and fiddling
  • Non-existent user interfaces
  • You needed to be a programmer, domain expert,
    statistician and AI guru all at the same time
  • Existing software did not scale up to large
    volumes of data

12
A brief history of data mining
  • Early 1990s
  • Early commercialized packages
  • Generally not called DM, referred to by their
    primary algorithms (decision trees, neural nets,
    etc.)
  • Basic user interfaces - could be used by
    non-programmers once data was properly formatted
  • Some packages offered sets of closely related
    algorithms

13
A brief history of data mining
  • Mid 1990s
  • First DM suites
  • DM now recognizable as a data analysis
    methodology
  • Movement among developers to offer a complete set
    of DM algorithms and supporting operations within
    the same, integrated package
  • Greatly enhanced user interfaces, database
    access, graphics and reporting

14
A brief history of data mining
  • Late 1990s
  • Full-featured DM suites
  • Data mining expanded out of leading business and
    research groups as all the requisite technologies
    matured (databases, workstations, DM suites)
  • Vertical applications for DM became more widely
    recognized (database marketing, fraud detection,
    personalization, genomics, etc.)
  • New emphasis on embracing traditional statistical
    methodologies

15
A brief history of data mining
  • Current Trends
  • DM methods are being incorporated into most
    disciplines, lessening the technical skill
    requirements for knowledge workers
  • Embedded DM functionality is being deployed
    aggressively in operational and decision support
    systems
  • Text mining is taking its place alongside numeric
    mining, opening an even larger venue for
    computer-assisted knowledge discovery

16
Current situation
  • Data mining's future is interdisciplinary, but its
    past is parochial
  • Computer Science
  • Statistics
  • Current advancements are being driven by
    commercial agendas
  • Technological
  • Economic

17
Current situation
Domain Knowledge
Project Management
Statistics
DM Processes
DM Applications
Machine Learning
Research
DM Software
Databases
18
Current situation
  • Sequence of Adoption
  • Research & Development Organizations
  • Leading Computer Science and Business Schools
  • Professional/Continuing Education
  • Mainstream Business Schools
  • Information/Library Science Schools

19
Current situation
  • The available educational materials reflect this
    heritage
  • Current materials are either too technical or too
    general for any venue other than Computer Science
    or Professional/Continuing Education
  • Several real-world case studies have been
    published, but they have not been thoroughly
    vetted for classroom use

20
(No Transcript)
21
Description
  • Covers the major techniques of data mining and
    their application to consumer marketing
  • Students are introduced to leading commercial
    data mining software
  • Hands-on projects are taken from recent marketing
    case studies to reinforce the concepts and
    theories taught in the classroom
  • Students are encouraged to explore data with
    their own creativity

22
Course outline
  • Introduction
  • The Data Mining Process: A Marketing Perspective
  • The Data Mining Process: A Data Analysis
    Perspective
  • An Overview of Data Mining Techniques
  • Data and Data Preparation
  • Customer Acquisition
  • Customer Expansion
  • Customer Retention
  • Customer Segmentation
  • Customer Profitability

23
Student body
  • 50% are marketing majors
  • 10% are non-marketing majors
  • 40% are non-major auditing
  • 85% are graduate students
  • 15% are undergraduate students

24
Resources
  • Software
  • Data Sets
  • Instructors
  • Curriculum
  • Case Studies
  • Books

25
Software
  • Clementine
  • SPSS Clementine training courses
  • Graduate Packs
  • AnswerTree
  • Windows 2000 or better
  • 256 MB RAM or better
  • 1 GB disk space or better
  • CD-RW drive

26
Data sets
  • KDD 97/98 Cup Competition
  • A dataset compiled by a non-profit organization
    of donation solicitations and responses
  • Well documented and studied
  • Has been used for building predictive models and
    exploratory analysis
  • URL: http://kdd.ics.uci.edu/databases/kddcup98/kddcup98.html

27
Data sets
  • COIL 2000
  • A dataset compiled by an insurance company to
    predict who would be interested in purchasing a
    motor-home insurance policy
  • Well documented and studied
  • Has been used for building predictive models and
    exploratory analysis
  • URL: http://kdd.ics.uci.edu/databases/tic/tic.html

28
Instructors
Domain Knowledge
Project Management
Statistics
DM Processes
DM Applications
Machine Learning
Research
DM Software
Databases
29
Curriculum
  • A marketing orientation
  • Focusing on the application of data mining to
    consumer marketing problems
  • Material structured around customer relationship
    management functions
  • Thorough coverage of business performance
    objectives and evaluation

30
Curriculum
  • CRM case study
  • A complete case study is presented in the
    early weeks of the course
  • The case study quantifies all expected benefits
    from incremental improvements in customer
    acquisition, retention, optimization, and
    expansion
  • This case study provides the logical foundation
    for all subsequent business cases discussed in
    the class
  • This approach reinforces in the student's mind
    the practical implications of all data mining
    exercises
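To make the idea of quantifying expected benefits from incremental improvements concrete, here is a minimal sketch in Python. The customer counts, rates, and margins are invented for illustration and are not taken from the course's case study. The names `CustomerBase` and `annual_margin` are hypothetical as well.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class CustomerBase:
    """Hypothetical inputs; none of these numbers come from the course."""
    existing_customers: int      # customers at the start of the year
    retention_rate: float        # share of existing customers who stay
    new_customers: int           # customers acquired during the year
    margin_per_customer: float   # average yearly margin per customer

def annual_margin(base: CustomerBase) -> float:
    """Yearly margin from retained plus newly acquired customers."""
    retained = base.existing_customers * base.retention_rate
    return (retained + base.new_customers) * base.margin_per_customer

baseline = CustomerBase(100_000, 0.80, 10_000, 50.0)

# Each scenario changes exactly one lever relative to the baseline.
scenarios = {
    "retention +1 point": replace(baseline, retention_rate=0.81),
    "acquisition +5%": replace(baseline, new_customers=10_500),
    "expansion +1 margin": replace(baseline, margin_per_customer=51.0),
}

for name, scenario in scenarios.items():
    gain = annual_margin(scenario) - annual_margin(baseline)
    print(f"{name}: {gain:,.0f} incremental margin")
```

Changing one lever at a time keeps each benefit separately attributable, which is what lets a single case study serve as the reference point for later business cases.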

31
Case studies
  • Publicly available case studies
  • KDD 97/98
  • COIL 2000
  • Proprietary case studies
  • Financial services direct marketing survey
  • Retailer market-basket analysis dataset
  • Medical insurance provider procedure dataset
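Market-basket analysis, as applied to a retailer dataset like the one listed above, is commonly introduced through three measures for a rule such as "bread implies butter": support, confidence, and lift. The sketch below computes them in Python on a tiny invented set of baskets. The transactions and the helper name `rule_metrics` are illustrative assumptions and are unrelated to the proprietary dataset.

```python
from typing import FrozenSet, List, Tuple

# Invented toy transactions; each basket is a set of purchased items.
baskets: List[FrozenSet[str]] = [
    frozenset({"bread", "butter", "milk"}),
    frozenset({"bread", "butter"}),
    frozenset({"bread", "jam"}),
    frozenset({"milk", "butter"}),
    frozenset({"bread", "butter", "jam"}),
]

def rule_metrics(antecedent: str, consequent: str,
                 transactions: List[FrozenSet[str]]) -> Tuple[float, float, float]:
    """Return (support, confidence, lift) for the rule antecedent -> consequent.

    Assumes both items appear in at least one transaction.
    """
    n = len(transactions)
    has_a = sum(1 for t in transactions if antecedent in t)
    has_c = sum(1 for t in transactions if consequent in t)
    has_both = sum(1 for t in transactions
                   if antecedent in t and consequent in t)
    support = has_both / n             # share of baskets with both items
    confidence = has_both / has_a      # share of antecedent baskets with consequent
    lift = confidence / (has_c / n)    # confidence relative to the base rate
    return support, confidence, lift

support, confidence, lift = rule_metrics("bread", "butter", baskets)
print(f"support={support:.2f} confidence={confidence:.2f} lift={lift:.2f}")
```

A lift below 1 means the two items co-occur less often than independence would suggest, which is the case for this toy data.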

32
Case studies
  • Data Mining Exercises
  • Two objectives
  • To drill students on the use of the software
  • To reinforce the business application of the data
    mining technique
  • Exercises must be carefully designed
  • Major outcomes should be known
  • Datasets need to be carefully sized for lab time
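One way to size a dataset for lab time while keeping the major outcomes known in advance is to draw a fixed-seed sample that preserves the response rate. The following Python sketch assumes a binary response flag and uses synthetic records. The function name `stratified_sample`, the 5% response rate, and the sample size are all illustrative assumptions.

```python
import random
from typing import Dict, List

def stratified_sample(records: List[Dict], flag: str,
                      sample_size: int, seed: int = 0) -> List[Dict]:
    """Sample records so the share of responders roughly matches the full data."""
    rng = random.Random(seed)  # fixed seed: every student gets the same sample
    responders = [r for r in records if r[flag]]
    others = [r for r in records if not r[flag]]
    # Allocate sample slots to responders in proportion to their share.
    n_responders = round(sample_size * len(responders) / len(records))
    sample = (rng.sample(responders, n_responders)
              + rng.sample(others, sample_size - n_responders))
    rng.shuffle(sample)
    return sample

# Synthetic data: 20,000 records with a 5% response rate (assumed).
full_data = [{"id": i, "responded": i % 20 == 0} for i in range(20_000)]
lab_data = stratified_sample(full_data, "responded", sample_size=2_000)

rate = sum(r["responded"] for r in lab_data) / len(lab_data)
print(len(lab_data), rate)
```

Because the seed is fixed, the instructor can work the exercise once and know what results students should reach on the reduced dataset.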

33
Textbooks
  • Data Mining Techniques: For Marketing, Sales, and
    Customer Support, Berry and Linoff, Wiley, 1997,
    ISBN 0471179809
  • Good overview of data mining techniques
  • Not a textbook
  • No structured exercises
  • Content geared towards a professional audience

34
Textbooks
  • Building Data Mining Applications for CRM,
    Berson, Smith and Thearling, McGraw-Hill, 2000,
    ISBN 0071344446
  • Introduction to Analytic CRM
  • Not a textbook
  • No structured exercises
  • Content geared towards a management audience

35
Other books
  • Computer Science or Statistics
  • Data Mining: Concepts and Techniques, Han and
    Kamber, Morgan Kaufmann, 2000, ISBN 1558604898
  • Data Mining: Practical Machine Learning Tools and
    Techniques with Java Implementations, Witten
    and Frank, Morgan Kaufmann, 1999, ISBN 1558605525

36
Other books
  • Marketing
  • Mastering Data Mining: The Art and Science
    of Customer Relationship Management, Berry and
    Linoff, Wiley, 1999, ISBN 0471331236
  • Data Mining Cookbook: Modeling Data for
    Marketing, Risk and Customer Relationship
    Management, Parr Rud, Wiley, 2000, ISBN
    0471385646
  • Accelerating Customer Relationships: Using CRM
    and Relationship Technologies, Swift, Prentice
    Hall, 2000, ISBN 0130889849

37
Other books
  • MIS or Management
  • Exploration Warehousing: Turning Business
    Information into Business Opportunity, Terdeman,
    Imhoff and Inmon, Wiley, 2000, ISBN 0471374733

38
Summary
  • Market demand is growing for data mining classes
  • Developing applied data mining courses is getting
    easier
  • It is too early to predict whether data mining
    will remain a distinct elective or be
    absorbed into existing curricula

39
Recommendations
  • Focus on supplying skills where market demand
    exists
  • Exploit data mining's highly interactive nature
  • Develop relationships with local organizations
  • Do not rely on an under-powered computer lab
  • Consider entering one of the annual data mining
    competitions

40

http://www.dataminingsummit.com
Questions?
41
Thank you!
tnugent@spss.com
sales@spss.com
1.800.543.2185