Title: Cutting edge curricula: Teaching data mining April 25, 2002 Tom Nugent, Senior Account Executive, SP
1 Cutting edge curricula: Teaching data mining
April 25, 2002
Tom Nugent, Senior Account Executive, SPSS Inc.
Greg James, Vice President, National City Corporation
2 Today's agenda
- The vision
- Information growth
- A brief history of data mining
- Current situation
- MKT 696: Data mining in marketing
- Resources
- Summary and recommendations
3 Vision
- Data mining is an emerging, interdisciplinary
skill-set focused on exploiting our
ever-expanding data
- Data mining will eventually become an accepted
standard research practice to augment the
traditional approaches we've employed for
centuries
4 Our challenge as educators
- Educational institutions will incorporate data
mining tools and techniques into their mainstream
curricula within the next five years
- The demand for these skills is emanating from
industry and R&D, where it is necessary to
effectively extract information from massive,
heterogeneous data sources
5 Information growth
- In the last 20 years we've gone from data poor
to data rich
- This is true for all disciplines
- Information growth is re-defining the way we
analyze data
6 Information growth
- According to a recent study by the U.C. Berkeley
School of Information Management and Systems:
- It's taken the entire history of humanity
through 1999 to accumulate 12 exabytes of
information
- By the middle of 2002 the second dozen exabytes
will have been created
7 Information growth
- The world produces 1 - 2 exabytes of unique
information per year
- What is an exabyte?
- 1 000 000 000 000 000 000 bytes, or
- 1 000 000 000 gigabytes, or
- 50,000 times the volume of information in the
Library of Congress
8 Information growth
- The world's total yearly production of print,
film, optical, and magnetic content would require
roughly 1.5 billion gigabytes of storage. This is
the equivalent of 250 megabytes per person for
each man, woman, and child on earth.
http://www.sims.berkeley.edu/research/projects/how-much-info/
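The unit conversions on the last two slides can be checked with a few lines of arithmetic. This is a minimal sketch, not part of the original study: it assumes decimal units (1 gigabyte = 10^9 bytes) and a world population of roughly 6 billion, and the function names are made up for illustration.

```python
# Unit arithmetic behind the storage figures quoted above.
# Assumption: decimal units (1 GB = 10**9 bytes, 1 EB = 10**18 bytes).
BYTES_PER_MEGABYTE = 10**6
BYTES_PER_GIGABYTE = 10**9
BYTES_PER_EXABYTE = 10**18

# Assumption: world population of roughly 6 billion (not stated on the slide).
WORLD_POPULATION = 6_000_000_000


def exabytes_to_gigabytes(exabytes):
    """Convert a whole number of exabytes to gigabytes."""
    return exabytes * BYTES_PER_EXABYTE // BYTES_PER_GIGABYTE


def megabytes_per_person(total_gigabytes, population):
    """Spread a yearly total (in gigabytes) evenly across a population."""
    total_bytes = total_gigabytes * BYTES_PER_GIGABYTE
    return total_bytes / population / BYTES_PER_MEGABYTE


# One exabyte expressed in gigabytes.
print(exabytes_to_gigabytes(1))
# 1.5 billion gigabytes per year, shared across the assumed population.
print(megabytes_per_person(1_500_000_000, WORLD_POPULATION))
```

Under these assumptions, the 1.5 billion gigabyte yearly total and the 250 megabytes per person figure agree with each other.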
9 Information growth
Worldwide PC hard drive capacity shipped. Source: 1999
Winchester Disk Drive Market Forecast and Review,
International Data Corporation report. (Some
years forecast)
10 Information growth
Hard drive cost per gigabyte. Source: 1999 Winchester
Disk Drive Market Forecast and Review,
International Data Corporation report. (Some
years forecast)
11 A brief history of data mining
- Pre-1990s
- Single algorithm programs
- Lots of tinkering and fiddling
- Non-existent user interfaces
- You needed to be a programmer, domain expert,
statistician and AI guru all at the same time
- Existing software did not scale up to large
volumes of data
12 A brief history of data mining
- Early 1990s
- Early commercialized packages
- Generally not called DM, referred to by their
primary algorithms (decision trees, neural nets,
etc.)
- Basic user interfaces - could be used by
non-programmers once data was properly formatted
- Some packages offered sets of closely related
algorithms
13 A brief history of data mining
- Mid 1990s
- First DM suites
- DM now recognizable as a data analysis
methodology
- Movement among developers to offer a complete set
of DM algorithms and supporting operations within
the same, integrated package
- Greatly enhanced user interfaces, database
access, graphics and reporting
14 A brief history of data mining
- Late 1990s
- Full-featured DM suites
- Data mining expanded out of leading business and
research groups as all the requisite technologies
matured (databases, workstations, DM suites)
- Vertical applications for DM became more widely
recognized (database marketing, fraud detection,
personalization, genomics, etc.)
- New emphasis on embracing traditional statistical
methodologies
15 A brief history of data mining
- Current Trends
- DM methods are being incorporated into most
disciplines, lessening the technical skill
requirements for knowledge workers
- Embedded DM functionality is being deployed
aggressively in operational and decision support
systems
- Text mining is taking its place alongside numeric
mining, opening an even larger venue for
computer-assisted knowledge discovery
16 Current situation
- Data mining's future is interdisciplinary, but its
past is parochial
- Computer Science
- Statistics
- Current advancements are being driven by
commercial agendas
- Technological
- Economic
17 Current situation
Domain Knowledge
Project Management
Statistics
DM Processes
DM Applications
Machine Learning
Research
DM Software
Databases
18 Current situation
- Sequence of Adoption
- Research & Development Organizations
- Leading Computer Science and Business Schools
- Professional/Continuing Education
- Mainstream Business Schools
- Information/Library Science Schools
19 Current situation
- The available educational materials reflect this
heritage
- Current materials are either too technical or too
general for any venue other than Computer Science
or Professional/Continuing Education
- Several real-world case studies have been
published, but they have not been thoroughly
vetted for classroom use
20 (No transcript)
21 Description
- Covers the major techniques of data mining and
their application to consumer marketing
- Students are introduced to leading commercial
data mining software
- Hands-on projects are taken from recent marketing
case studies to reinforce the concepts and
theories taught in the classroom
- Students are encouraged to explore data with
their own creativity
22 Course outline
- Introduction
- The Data Mining Process: A Marketing Perspective
- The Data Mining Process: A Data Analysis
Perspective
- An Overview of Data Mining Techniques
- Data and Data Preparation
- Customer Acquisition
- Customer Expansion
- Customer Retention
- Customer Segmentation
- Customer Profitability
23 Student body
- 50% are marketing majors
- 10% are non-marketing majors
- 40% are non-majors auditing
- 85% are graduate students
- 15% are undergraduate students
24 Resources
- Software
- Data Sets
- Instructors
- Curriculum
- Case Studies
- Books
25 Software
- Clementine
- SPSS Clementine training courses
- Graduate Packs
- AnswerTree
- Windows 2000 or better
- 256 MB RAM or better
- 1 GB disk space or better
- CD-RW drive
26 Data sets
- KDD 97/98 Cup Competition
- A dataset of donation solicitations and responses
compiled by a non-profit organization
- Well documented and studied
- Has been used for building predictive models and
exploratory analysis
- URL: http://kdd.ics.uci.edu/databases/kddcup98/kddcup98.html
27 Data sets
- COIL 2000
- A dataset compiled by an insurance company to
predict who would be interested in purchasing a
motor-home insurance policy
- Well documented and studied
- Has been used for building predictive models and
exploratory analysis
- URL: http://kdd.ics.uci.edu/databases/tic/tic.html
28 Instructors
Domain Knowledge
Project Management
Statistics
DM Processes
DM Applications
Machine Learning
Research
DM Software
Databases
29 Curriculum
- A marketing orientation
- Focusing on the application of data mining to
consumer marketing problems
- Material structured around customer relationship
management functions
- Thorough coverage of business performance
objectives and evaluation
30 Curriculum
- CRM case study
- A complete case study is presented in the
early weeks of the course
- The case study quantifies all expected benefits
from incremental improvements in customer
acquisition, retention, optimization, and
expansion
- This case study provides the logical foundation
for all subsequent business cases discussed in
the class
- This approach reinforces in the student's mind
the practical implications of all data mining
exercises
31 Case studies
- Publicly available case studies
- KDD 97/98
- COIL 2000
- Proprietary case studies
- Financial services direct marketing survey
- Retailer market-basket analysis dataset
- Medical insurance provider procedure dataset
32 Case studies
- Data Mining Exercises
- Two objectives
- To drill students on the use of the software
- To reinforce the business application of the data
mining technique
- Exercises must be carefully designed
- Major outcomes should be known
- Datasets need to be carefully sized for lab time
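One way to size a dataset for lab time while keeping the major outcomes predictable is to draw a fixed-seed stratified sample, so that every student works on the same records and the response rate matches the full dataset. The sketch below is illustrative only: the function name, the `responded` field, and the 5% response rate are hypothetical and do not come from the course materials.

```python
import random


def stratified_sample(records, label_key, sample_size, seed=0):
    """Draw about sample_size records, keeping each label's share of the data."""
    rng = random.Random(seed)  # fixed seed: every student gets the same sample

    # Group the records by the value of the label field.
    by_label = {}
    for record in records:
        by_label.setdefault(record[label_key], []).append(record)

    sample = []
    for group in by_label.values():
        # Proportional allocation, with at least one record per label.
        k = max(1, round(sample_size * len(group) / len(records)))
        sample.extend(rng.sample(group, min(k, len(group))))

    rng.shuffle(sample)
    return sample


# Hypothetical data: 10,000 solicitations with a 5% response rate.
records = [{"id": i, "responded": i % 20 == 0} for i in range(10_000)]
lab_set = stratified_sample(records, "responded", sample_size=1_000)

response_rate = sum(r["responded"] for r in lab_set) / len(lab_set)
print(len(lab_set), response_rate)
```

Because the seed is fixed, the instructor can work through the exercise in advance and know which results students should find.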
33 Textbooks
- Data Mining Techniques: For Marketing, Sales, and
Customer Support, Berry and Linoff, Wiley, 1997,
ISBN 0471179809
- Good overview of data mining techniques
- Not a textbook
- No structured exercises
- Content geared towards a professional audience
34 Textbooks
- Building Data Mining Applications for CRM,
Berson, Smith and Thearling, McGraw-Hill, 2000,
ISBN 0071344446
- Introduction to Analytic CRM
- Not a textbook
- No structured exercises
- Content geared towards a management audience
35 Other books
- Computer Science or Statistics
- Data Mining: Concepts and Techniques, Han and
Kamber, Morgan Kaufmann, 2000, ISBN 1558604898
- Data Mining: Practical Machine Learning Tools and
Techniques with Java Implementations, Witten
and Frank, Morgan Kaufmann, 1999, ISBN 1558605525
36 Other books
- Marketing
- Mastering Data Mining: The Art and Science
of Customer Relationship Management, Berry and
Linoff, Wiley, 1999, ISBN 0471331236
- Data Mining Cookbook: Modeling Data for
Marketing, Risk and Customer Relationship
Management, Parr Rud, Wiley, 2000, ISBN
0471385646
- Accelerating Customer Relationships: Using CRM
and Relationship Technologies, Swift, Prentice
Hall, 2000, ISBN 0130889849
37 Other books
- MIS or Management
- Exploration Warehousing: Turning Business
Information into Business Opportunity, Terdeman,
Imhoff and Inmon, Wiley, 2000, ISBN 0471374733
38 Summary
- Market demand is growing for data mining classes
- Developing applied data mining courses is getting
easier
- It is too early to predict whether data mining
will remain a distinct elective or be
absorbed into existing curricula
39 Recommendations
- Focus on supplying skills where market demand
exists
- Exploit data mining's highly interactive nature
- Develop relationships with local organizations
- Do not rely on an under-powered computer lab
- Consider entering one of the annual data mining
competitions
40 Questions?
http://www.dataminingsummit.com
41 Thank you!
tnugent_at_spss.com
sales_at_spss.com
1.800.543.2185