Data Mining: Crossing the Chasm

Slides: 36
Provided by: RakeshA5

Transcript and Presenter's Notes

Title: Data Mining: Crossing the Chasm


1
Data Mining: Crossing the Chasm
  • Rakesh Agrawal
  • IBM Almaden Research Center

2
Thesis
  • The greatest challenge facing data mining is to
    make the transition from being an early market
    technology to mainstream technology
  • We have the opportunity to make this transition
    successful

3
Outline
  • Chasm in the technology adoption life cycle, à la
    Geoffrey Moore
  • Experience with Quest/Intelligent Miner
  • Ideas for successful chasm crossing
  • Geoffrey A. Moore. Crossing the Chasm. Harper
    Business. http://www.chasmgroup.com

4
Technology Adoption Life Cycle
[Figure: technology adoption life cycle curve, with the segments below]
  • Innovators: Techies. "Try it!"
  • Early Adopters: Visionaries. "Get ahead of the herd!"
  • Early Majority: Pragmatists. "Stick with the herd!"
  • Late Majority: Conservatives. "Hold on!"
  • Laggards: Skeptics. "No way!"

Psychographic profile of each group is different

5
Innovators: Technology Enthusiasts
  • Intrigued by any fundamental advance in
    technology
  • Like to alpha test new products
  • Can ignore the missing elements
  • Want access to top technologists
  • Want no-profit pricing (preferably free)

Gatekeepers to early adopters
6
Early Adopters: Visionaries
  • Driven by vision of dramatic competitive
    advantage via revolutionary breakthroughs
  • Great imagination for strategic applications
  • Not so price-sensitive
  • Want rapid time to market
  • Demand high degree of customization

Fund the development of early market
7
Early Majority: Pragmatists
  • Want sustainable productivity improvement
    through evolutionary change
  • Astute managers of mission-critical apps
  • Understand real-world issues and tradeoffs
  • Focus on proven applications; want to see the
    solution in production

Bulwark of the mainstream market
8
Late Majority: Conservatives
  • Want to stay even with the competition
  • Risk averse
  • Price sensitive
  • Need completely pre-assembled solutions

Extend technology life cycles
9
Laggards: Skeptics
  • Driven to maintain status quo
  • Good at debunking marketing hype
  • Disbelieve productivity-improvement arguments
  • Can be formidable opposition to early adoption of
    a technology

Retard the development of high-tech markets
10
Crack in the curve
[Figure: adoption curve with a chasm separating the early market from the mainstream market]
The greatest peril in the development of a
high-tech market lies in making the transition
from an early market dominated by a few
visionaries to a mainstream market dominated by
pragmatists.
11
Visionaries vs. Pragmatists
  • Adventurous vs. prudent
  • First strike capability vs. staying power
  • Early buy-in vs. wait-and-see
  • State of the art vs. industry standard
  • Think big vs. manage expectation
  • Spend big vs. spend to budget

12
Is data mining following this curve?
  • Yes!!!
  • My personal viewpoint based on Quest/Intelligent
    Miner experience

13
Quest
  • Started as a skunkworks project in the early nineties
  • Inspired by needs articulated by industry
    visionaries
  • Transaction data collected over a long period
  • Current tools/SQL don't cut it
  • About ready to throw the data away

14
Approach
  • Examine real applications
  • Identify operations that cut across applications
  • Design fast, scalable algorithms for each
    operation
  • Develop applications by composing operations

15
Operations
  • New operations: associations, sequential patterns,
    similar time series
  • Emphasis on completeness and scalability
  • Adopted from statistics/learning: classification,
    clustering, deviations
  • Emphasis on scalability

http://www.almaden.ibm.com/cs/quest
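
As an illustration of the associations operation and of what completeness could mean in practice, here is a minimal sketch of a level-wise frequent-itemset search. It is an assumption-laden toy, not Quest's implementation: the function name `frequent_itemsets`, the `min_support` parameter (an absolute count), and the basket data are all made up for this example. Completeness is read here as "every itemset that meets the support threshold is reported".

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Return {itemset: count} for every itemset that occurs in at
    least `min_support` transactions (level-wise search)."""
    transactions = [frozenset(t) for t in transactions]

    # Level 1: count single items.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_support}
    result = dict(current)

    k = 2
    while current:
        prev = set(current)
        # Join frequent (k-1)-itemsets into k-item candidates and keep
        # only those whose (k-1)-subsets are all frequent.
        candidates = set()
        for a in prev:
            for b in prev:
                union = a | b
                if len(union) == k and all(
                    frozenset(s) in prev for s in combinations(union, k - 1)
                ):
                    candidates.add(union)
        # Count the surviving candidates with one pass over the data.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        current = {s: n for s, n in counts.items() if n >= min_support}
        result.update(current)
        k += 1
    return result


# Hypothetical market-basket data.
baskets = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers"},
    {"bread", "milk", "beer"},
]
found = frequent_itemsets(baskets, min_support=3)
for itemset, count in sorted(found.items(), key=lambda kv: sorted(kv[0])):
    print(sorted(itemset), count)
```

The pruning step is what keeps the search tractable: a candidate is counted only if all of its subsets were already frequent, so no qualifying itemset is missed and most non-qualifying ones are never counted.
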
16
Bringing Quest to market
  • Visionaries who inspired Quest did not become
    first customers
  • Wanted evidence that the technology worked
  • Frustrating attempts to interest major IBM
    customers
  • Integration with existing applications
  • Too-far-out technology
  • Resistance from in-house analytic groups

17
First hits
  • Small information-based companies who provided
    data in exchange for free results
  • CIO who wanted to be seen as the technology
    pioneer in his industry
  • CIO who wanted the success story to feature in
    the company's annual report

Led to the formation of a group offering services
using Quest
18
Characteristics of engagements
  • Mostly associations and sequential patterns
  • Completeness a big plus
  • Unanticipated uses
  • Feedback for further development

19
Into the product land
  • Formation of a small out-of-plan product group
    to productize Quest
  • Facilitated by a closet mathematician
  • Successes of the services group used for market
    validation
  • Continued development and infusion of technology

20
Intelligent Miner
  • Serious product
  • Integrates technologies from various groups
  • Fast, scalable, runs on multiple platforms
  • Several early market success
    stories

http://www.software.ibm.com/data/iminer/
21
Are we in the chasm?
  • Perceived to be sophisticated technology, usable
    only by specialists
  • Long, expensive projects
  • Stand-alone, loosely coupled with data
    infrastructures
  • Difficult to infuse into existing
    mission-critical applications

22
Chasm Crossing
  • Personal speculations on some technical
    challenges
  • Do not imply IBM research/product directions

23
XML-based Data Mining Standard (1)
  • Model Building
  • A pair of standard DTDs for each operation
  • Interchangeable library of operator
    implementations

[Diagram: data specs and parameters (standard DTD) → operator library → model (standard DTD)]
Ack: Mattos, Pirahesh, Schwenkries
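
The model-building flow on this slide can be sketched roughly as follows. Everything concrete here is an assumption made for illustration: the element and attribute names (`data-spec`, `parameters`, `model`, `predict`), the `OPERATOR_LIBRARY` registry, and the trivial majority-class operator are hypothetical and do not come from any published DTD. The point is only that an operator consuming and producing agreed XML formats can be swapped for another implementation.

```python
import xml.etree.ElementTree as ET

# Hypothetical inputs following an assumed input DTD for "classification".
DATA_SPEC = """
<data-spec>
  <field name="age" type="numeric"/>
  <field name="buys" type="categorical"/>
</data-spec>
"""
PARAMS = """
<parameters operation="classification">
  <param name="target" value="buys"/>
</parameters>
"""


def majority_class_operator(data_spec_xml, params_xml, records):
    """Toy operator: the model always predicts the most common target value."""
    spec = ET.fromstring(data_spec_xml)
    params = ET.fromstring(params_xml)
    target = params.find("param[@name='target']").get("value")
    declared = {f.get("name") for f in spec.findall("field")}
    if target not in declared:
        raise ValueError(f"target field {target!r} is not in the data spec")

    counts = {}
    for record in records:
        counts[record[target]] = counts.get(record[target], 0) + 1
    label = max(counts, key=counts.get)

    # Emit the model in the (assumed) output format.
    model = ET.Element("model", operation=params.get("operation"))
    ET.SubElement(model, "predict", field=target, value=label)
    return ET.tostring(model, encoding="unicode")


# Interchangeable library: any function with the same signature can be
# registered under an operation name.
OPERATOR_LIBRARY = {"classification": majority_class_operator}


def build_model(data_spec_xml, params_xml, records):
    operation = ET.fromstring(params_xml).get("operation")
    return OPERATOR_LIBRARY[operation](data_spec_xml, params_xml, records)


records = [
    {"age": 25, "buys": "yes"},
    {"age": 40, "buys": "no"},
    {"age": 31, "buys": "yes"},
]
model_xml = build_model(DATA_SPEC, PARAMS, records)
print(model_xml)
```

Because callers see only the two XML formats, replacing `majority_class_operator` with a real classifier would not change `build_model` or its callers.
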
24
XML-based Data Mining Standard (2)
  • Model Deployment
  • Mapping XML object provides mapping between names
    and format in the model object and the data
    record
  • Model could have been developed on a different
    system

[Diagram: data record, mapping, and model (standard DTDs) → application library → result (standard DTD)]
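
A minimal sketch of the deployment side, under stated assumptions: the model and mapping formats below (`rule`, `default`, `map`, `model-field`, `record-field`) are invented for illustration, and the model is a single threshold rule. It shows how a mapping object lets a model built elsewhere be applied to a local record whose field names and value formats differ.

```python
import xml.etree.ElementTree as ET

# Hypothetical model, possibly built on a different system.
MODEL = """
<model operation="classification">
  <rule field="age" threshold="30" label="young"/>
  <default label="other"/>
</model>
"""

# Hypothetical mapping: the model calls the field "age", while the local
# record stores it as the string column "CUST_AGE".
MAPPING = """
<mapping>
  <map model-field="age" record-field="CUST_AGE" type="int"/>
</mapping>
"""

CONVERTERS = {"int": int, "float": float, "str": str}


def apply_model(model_xml, mapping_xml, record):
    """Translate the record into the model's names and types, then score it."""
    model = ET.fromstring(model_xml)
    mapping = ET.fromstring(mapping_xml)

    values = {}
    for m in mapping.findall("map"):
        convert = CONVERTERS[m.get("type")]
        values[m.get("model-field")] = convert(record[m.get("record-field")])

    # The first rule whose field value is below its threshold wins.
    for rule in model.findall("rule"):
        if values[rule.get("field")] < float(rule.get("threshold")):
            return rule.get("label")
    return model.find("default").get("label")


print(apply_model(MODEL, MAPPING, {"CUST_AGE": "27"}))
print(apply_model(MODEL, MAPPING, {"CUST_AGE": "45"}))
```
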
25
Implications
  • Standard interfaces for application developers to
    incorporate data mining
  • Coupling with relational databases
  • Mappings from DTDs to relational schemas
  • Implementation using existing infrastructure

26
Data Mining Benchmarks
  • UC Irvine repository
  • Generating synthetic benchmarks modeled after
    real data sets is a hard problem
  • How to map names into meaningful literals
  • How to preserve empirical distributions

Ack: Srikant, Ullman
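
To make the "preserve empirical distributions" question concrete, here is a small sketch that draws a synthetic column from the observed value frequencies of a real one. The function name `synthesize_column` and the sample data are assumptions for this example. It preserves only the marginal distribution of a single column; correlations between columns are lost, which is one reason the general problem is hard.

```python
import random
from collections import Counter


def synthesize_column(values, n, rng):
    """Draw n synthetic values with the same relative frequencies as `values`."""
    counts = Counter(values)
    categories = list(counts)
    weights = [counts[c] for c in categories]
    return rng.choices(categories, weights=weights, k=n)


# Hypothetical "real" column: 70% a, 20% b, 10% c.
real = ["a"] * 70 + ["b"] * 20 + ["c"] * 10

rng = random.Random(0)  # fixed seed so the run is repeatable
synthetic = synthesize_column(real, 10_000, rng)

share = Counter(synthetic)
for category in sorted(share):
    print(category, share[category] / len(synthetic))
```
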
27
Auto-focus data mining
  • Automatic parameter tuning
  • Automatic algorithm selection (à la join method
    selection in database query optimization)

Ack: Andreas Arning
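
By analogy with how a query optimizer picks a join method, automatic algorithm selection could be sketched as choosing the candidate with the lowest estimated cost. The cost formulas, the memory budget, and the algorithm names below are all hypothetical placeholders, not measurements or real product behavior.

```python
MEMORY_BUDGET_CELLS = 1_000_000  # assumed capacity, in data cells


def in_memory_cost(n_rows, n_features):
    """Cheap when the data fits in memory, unusable otherwise."""
    cells = n_rows * n_features
    return cells if cells <= MEMORY_BUDGET_CELLS else float("inf")


def disk_based_cost(n_rows, n_features):
    """Always usable, but assumed ten times slower per cell."""
    return 10 * n_rows * n_features


CANDIDATES = {
    "in_memory": in_memory_cost,
    "disk_based": disk_based_cost,
}


def choose_algorithm(n_rows, n_features, candidates=CANDIDATES):
    """Return (name, costs): the cheapest candidate and all estimates."""
    costs = {name: cost(n_rows, n_features) for name, cost in candidates.items()}
    best = min(costs, key=costs.get)
    return best, costs


small_choice, _ = choose_algorithm(1_000, 10)
large_choice, _ = choose_algorithm(10_000_000, 10)
print(small_choice, large_choice)
```

The user states only what to mine; the choice of how is made from data statistics, which is the same division of labor a declarative query language gives a database user.
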
28
Web: Greatest opportunity
  • Huge collection of data (e.g. Yahoo collecting
    50GB every day)
  • Universal digital distribution medium makes data
    mining results actionable in fundamentally new
    ways
  • But watch for the privacy pitfall

29
Privacy-preserving data mining
  • Technical vs. legislated solutions
  • Implication for data mining algorithms when some
    fields of a data record have been fudged
    according to the user's privacy sensitivity

Ack: R. Srikant
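
One possible reading of "fudged according to the user's privacy sensitivity" is that each user adds random noise to a field before releasing it, with more noise for more sensitive users. The sketch below makes that assumption explicit: the `fudge` function, the uniform noise, and the sensitivity levels are all hypothetical. It shows why aggregate mining can still work: individual values are distorted, but zero-mean noise largely cancels out in the average.

```python
import random


def fudge(value, sensitivity, rng):
    """Add zero-mean uniform noise in [-sensitivity, +sensitivity]."""
    return value + rng.uniform(-sensitivity, sensitivity)


rng = random.Random(42)  # fixed seed so the run is repeatable

# Hypothetical population: true ages plus a per-user sensitivity level
# (0 = releases the exact value, 20 = wants heavy distortion).
true_ages = [rng.randint(20, 60) for _ in range(20_000)]
sensitivities = [rng.choice([0, 5, 20]) for _ in true_ages]

reported = [fudge(age, s, rng) for age, s in zip(true_ages, sensitivities)]

true_mean = sum(true_ages) / len(true_ages)
reported_mean = sum(reported) / len(reported)
print(round(true_mean, 2), round(reported_mean, 2))
```

Statistics that depend on more than the mean, such as the shape of the distribution or a decision-tree split point, are not recovered this simply, which is where the implications for mining algorithms arise.
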
30
Personalization
  • Internet might provide for the first time tools
    necessary for users to capture information about
    themselves and to selectively release this
    information
  • Will we be providing these tools?
  • John Hagel, Marc Singer. Net Worth. Harvard
    Business School Press.

31
What about Association Rules?
  • Very long patterns
  • Separating wheat from chaff
  • Principled introduction of domain knowledge

32
What else?
  • Formal foundations of data mining

33
Summary
  • Closely couple data mining with database systems
  • Embed data mining into applications
  • Focus on web
  • Standard interfaces
  • Benchmarks
  • Auto-focusing
  • Personalization
  • Privacy

34
Concluding remarks
  • Data mining, a great technology
  • Combination of intriguing theoretical questions
    with large commercial interest in the technology
  • Poised for transitioning into mainstream
    technology
  • Will we rise to the challenge as a community?

35
Acknowledgments