Transcript and Presenter's Notes

Title: Machine Learning meets the Real World: Successes and new research directions


1
Machine Learning meets the Real World: Successes
and new research directions
  • Andrea Pohoreckyj Danyluk
  • Department of Computer Science
  • Williams College, Williamstown, MA
  • October 11, 2002

2
Data, data everywhere...
  • Scientific data collection routinely produces
    gigabytes of data per day
  • Telecommunications: AT&T produces 275 million
    call records
  • Web: Google handles 70 million searches
  • Retail: Wal-Mart records 20 million sales
    transactions

3
A wealth of information
  • Scientific data
  • Detection of oil spills from satellite images
  • Prediction of molecular bioactivity for drug
    design
  • Telecommunications
  • Fraud detection to distinguish between bad and
    normal usage of cell phones

4
A wealth of information
  • Web mining
  • Characterize killer pages
  • Retail
  • Determine better product placement
  • Direct mail
  • Predict who is most likely to donate to a charity

5
Machine learning success (Machine learning is
ubiquitous)
  • Scientific discovery
  • Detection of oil spills from satellite images
  • Telecommunications
  • Diagnosis of problems in the local loop
  • Printing
  • Determine causes of banding (printing cylinder
    problems)
  • Control
  • Self-steering vehicles

6
Why research in machine learning is so good today
  • Research in machine learning benefits from
  • Abundant data
  • Interest in fielding new applications
  • Even more data
  • Push on limits of our understanding, technology,
    etc.

7
Plan for this talk
  • Original
  • Discuss success stories and failures
  • Failures help identify new areas of research
  • New plan
  • One success story in detail
  • Lessons learned can identify new areas of
    research even when we succeed

8
Induction of decision trees
  • Not the only (or even the hottest) algorithms
  • Have been used in many contexts
  • Important for understanding our success story:
    local-loop network diagnosis

9
Inductive learning
  • Given a collection of observations of the form
    (⟨x⟩, f(⟨x⟩))
  • Find g(⟨x⟩) that approximates f(⟨x⟩)

10
Sample data
11
Predictive model, i.e., g(⟨x⟩)
12
Learning objectives
  • Learn a tree that is correct
  • Learn a tree that is compact
  • At every level in the tree, select a test that
    best differentiates examples of one class from
    another

13
TDIDT (Top-Down Induction of Decision Trees; a code sketch follows below)
  • If all examples are from the same class
  • The tree is a leaf with that class name
  • Else
  • Pick a test to make
  • Construct one edge for each possible test outcome
  • Partition the examples by test outcome
  • Build subtrees recursively
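Below is a minimal sketch of this recursion in Python, assuming purely symbolic attributes: examples are dicts mapping attribute names to values, labels is a parallel list of class names, and the test-selection step is passed in as a function (the gain criterion on the following slides is one way to supply it). The names and data layout are illustrative, not from the talk.

```python
from collections import Counter

def tdidt(examples, labels, attributes, choose_test):
    """Top-down induction of a decision tree over symbolic attributes."""
    # If all examples are from the same class, the tree is a leaf with that class name.
    if len(set(labels)) == 1:
        return labels[0]
    # If no attributes remain to test, fall back to the majority class.
    if not attributes:
        return Counter(labels).most_common(1)[0][0]
    # Pick a test to make.
    attr = choose_test(examples, labels, attributes)
    tree = {attr: {}}
    remaining = [a for a in attributes if a != attr]
    # Construct one edge per possible test outcome and partition the examples.
    for value in set(ex[attr] for ex in examples):
        subset = [(ex, y) for ex, y in zip(examples, labels) if ex[attr] == value]
        sub_examples = [ex for ex, _ in subset]
        sub_labels = [y for _, y in subset]
        # Build subtrees recursively.
        tree[attr][value] = tdidt(sub_examples, sub_labels, remaining, choose_test)
    return tree
```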

14
Which is better?
15
The Gain Criterion
  • Measure the information of the collection
  • Measure the information of each possible split
  • Choose the split with greatest information gain

16
Information (Entropy)
  • Let T be a set of examples
  • Let C1, C2, ..., Cn be the class labels
  • freq(Ci, T) = the number of examples in T that
    belong to class Ci
  • |T| = the number of examples in T
  • Information conveyed by selecting an example and
    announcing that it belongs to class Ci:
  • info = -log2( freq(Ci, T) / |T| )

17
Information (Entropy)
  • Let T be a set of examples
  • Info(T) =
    -Σi ( freq(Ci, T) / |T| ) · log2( freq(Ci, T) / |T| )

18
Entropy after a split
  • Let X be an attribute with n possible values
  • Let Tj be the examples that have value j for
    attribute X
  • Average entropy that results from splitting on X:
  • infoX(T) = Σj ( |Tj| / |T| ) · info(Tj),
    summed over the n possible values of X

19
Information Gain
  • Compute infoX(T) for every attribute X
  • Select the attribute that maximizes
  • gain(X) = info(T) - infoX(T)
    (see the code sketch below)
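A sketch of these formulas in Python, under the same dict-of-symbolic-attributes assumption as the TDIDT sketch above; the function names (info, info_after_split, gain, choose_test) are mine.

```python
import math
from collections import Counter

def info(labels):
    """Info(T) = -sum_i (freq(Ci,T)/|T|) * log2(freq(Ci,T)/|T|)."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())

def info_after_split(examples, labels, attr):
    """infoX(T) = sum_j (|Tj|/|T|) * info(Tj), one Tj per value of attribute X."""
    total = len(labels)
    by_value = {}
    for ex, y in zip(examples, labels):
        by_value.setdefault(ex[attr], []).append(y)
    return sum((len(sub) / total) * info(sub) for sub in by_value.values())

def gain(examples, labels, attr):
    """gain(X) = info(T) - infoX(T)."""
    return info(labels) - info_after_split(examples, labels, attr)

def choose_test(examples, labels, attributes):
    """Select the attribute that maximizes information gain."""
    return max(attributes, key=lambda a: gain(examples, labels, a))
```

Passing this choose_test to the tdidt sketch above yields trees grown greedily by the gain criterion.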

20
Which is better?
21
Scrubber (the success story)
  • Diagnoses problems in the local loop
  • Problem may be due to trouble in
  • Customer premises equipment
  • Facilities connecting customer to cable
  • Cable
  • Central office
  • Millions of troubles reported annually

22
MAX, 1990
  • Acts as Maintenance Administrator (MA)
  • Sequence of actions
  • Customer calls
  • Rep takes information, initiates tests
  • Trouble report sent to MA
  • MA puts trouble in dispatch queue for specific
    type of technician

23
Scrubber 2
  • Performed a task at a later point in the pipeline
  • Surveyed dispatch queues to determine whether a
    dispatch was appropriate
  • Dispatch not immediate
  • Many problems resolved exogenously

24
Scrubber 3
  • Scrubber 2 for a new application platform
  • Centralized knowledge server
  • Cover twice as large a network

25
Implementation difficulties
  • Original expert system shell no longer supported
  • Knowledge base evolved into opacity
  • Many tweaks over a decade
  • Many knowledge engineers
  • Most not available to work on Scrubber 3

26
Requirements
  • Level of performance at least as good as prior
    system
  • Overall accuracy
  • False positives and false negatives in range
  • Comprehensible
  • For understanding and acceptance by experts

27
Additional requirements (ours)
  • Improved performance
  • Improved extensibility

28
Phase I: Modeling Scrubber 2
  • Applied a decision tree learning algorithm
  • Input data
  • Trouble reports
  • Scrubber 2 diagnoses

29
Data
  • 26,000 trouble reports
  • 40 attributes (1/2 continuous, 1/2 symbolic)
  • Two classes
  • Dispatch
  • Don't -- i.e., call the customer to verify all is OK

30
Background knowledge
  • C4.5 selected
  • 17 of 40 attributes used

31
Phase I results
  • Decision trees with predictive accuracy of .99,
    with as few as 10,000 examples
  • Less than two days of work (easy!)

32
Phase II: Acceptance
  • Comprehensibility ≠ Readability
  • Need to observe rationality in learned knowledge
  • Original trees were on the order of 1,000 nodes
  • The simpler the model, the better it can be
    understood
  • Comprehensibility = Readability + Simplicity +
    Fidelity

33
Trading off simplicity and correctness
  • Pruning nodes sacrifices correctness
  • Appropriate when comprehensibility is an issue
  • (Langley and Schwabacher, 2001)
  • Note: this is not pruning to avoid overfitting
    (a rough sketch of this trade-off follows below)
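As a rough illustration of that trade-off, reusing the dict-based trees from the TDIDT sketch: cut every subtree below a chosen depth back to the most common class among its leaves. This is only a cartoon of pruning for comprehensibility (it ignores training-example counts and acceptable error ranges) and is not the procedure used for Scrubber; the function names are mine.

```python
from collections import Counter

def leaf_classes(tree):
    """Collect the class labels stored at the leaves of a dict-based tree."""
    if not isinstance(tree, dict):
        return [tree]
    classes = []
    for branches in tree.values():
        for subtree in branches.values():
            classes.extend(leaf_classes(subtree))
    return classes

def prune_to_depth(tree, max_depth):
    """Replace every subtree below max_depth with its most common leaf class.
    A crude stand-in for accuracy-aware pruning: it trades some correctness
    for a smaller, more readable tree."""
    if not isinstance(tree, dict):
        return tree
    if max_depth == 0:
        return Counter(leaf_classes(tree)).most_common(1)[0][0]
    return {attr: {value: prune_to_depth(subtree, max_depth - 1)
                   for value, subtree in branches.items()}
            for attr, branches in tree.items()}
```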

34
Phase II results
  • Used only the two most prominent attributes
  • New decision trees created
  • Still fell into the acceptable zone

35
Phase III: Working toward extensibility
  • Hoped to gain flexibility for
  • Local modifiability
  • Additional attribute values
  • Moved toward a probabilistic decision tree
  • Leaves labeled with probability estimates, not
    decisions
  • Stubby trees are easy to represent in tabular form
    (see the sketch below)
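One way to picture a stubby, probabilistic tree in tabular form, assuming (as in Phase II) that only two symbolic attributes matter: index a table by the two attribute values and store an estimated probability of dispatch in each cell. The attribute handling, class names, and 0.5 threshold below are illustrative assumptions, not details of the Scrubber system.

```python
from collections import defaultdict

def build_probability_table(examples, labels, attr_a, attr_b, positive="dispatch"):
    """Estimate P(dispatch) for each combination of two attribute values."""
    counts = defaultdict(lambda: [0, 0])   # (value_a, value_b) -> [positives, total]
    for ex, y in zip(examples, labels):
        key = (ex[attr_a], ex[attr_b])
        counts[key][0] += (y == positive)
        counts[key][1] += 1
    return {key: pos / total for key, (pos, total) in counts.items()}

def decide(table, ex, attr_a, attr_b, threshold=0.5, default="dispatch"):
    """Look up the estimated probability for this report; fall back to a
    default decision for attribute combinations never seen in training."""
    p = table.get((ex[attr_a], ex[attr_b]))
    if p is None:
        return default
    return "dispatch" if p >= threshold else "dont"
```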

36
Phase IIIb: More data
  • The focus on two attributes gave us access to an
    extensive data set
  • Many more trouble reports
  • Abridged (two-attribute) form had not been
    considered useful earlier

37
Phase III results
  • Simple diagnostic model
  • Greater empirical confidence -- important due to
    the small disjunct problem
  • Big, general rules cover approximately 50% of
    the data
  • Remaining 50% covered by small disjuncts

38
Summarizing the success story
  • C4.5 applied to induce Scrubber 2 model
  • Pruned model for comprehensibility/simplicity
  • Converted new model into probabilistic one
  • Used newly gained data for additional tuning and
    confidence
  • Small(?), simple model in a very short time

39
Lessons can be learned from success
  • Lesson 1: the importance of comprehensibility
  • Rationality
  • Readability
  • Simplicity

40
Lessons can be learned from success
  • Lesson 2: the need for algorithms to handle small
    data sets
  • Creative ways to engineer interesting features
    from few
  • Openness to alternative sources of data
  • Algorithms specifically tuned to handle small
    data sets
  • Langley has noted this to be an issue for
    scientific data -- but it is true for industrial
    data as well

41
Lessons can be learned from success
  • Lesson 3: the need to think about systematic
    error
  • Locally systematic error only looks like noise
    when there is enough data
  • Clearly related to the problem of small data sets
  • How do our algorithms hold up?

42
Lessons can be learned from success
  • Lesson 4: the need to think about the future
  • Learning results put into practice will be
    modified and extended
  • Must new models be learned?
  • Can improvement be incremental?

43
Lessons can be learned from success
  • Lesson 5: creative uses of the technology
  • Learning for the purposes of re-engineering isn't
    standard
  • New applications will serve to fuel new research

44
Further reading and acknowledgements
  • Carla Brodley et al., American Scientist,
    Jan./Feb. 1999
  • Pat Langley, various publications
  • Thanks to Foster Provost and many others at NYNEX
    / Bell Atlantic