Activity monitoring: Anomaly detection as on-line classification. Tom Fawcett, HP Laboratories, 1501 Page Mill Rd., Palo Alto, CA. Symposium on Machine Learning for Anomaly Detection, May 22-23, 2004

1
Activity monitoring: Anomaly detection as on-line classification
Tom Fawcett
HP Laboratories
1501 Page Mill Rd.
Palo Alto, CA
Symposium on Machine Learning for Anomaly Detection
May 22-23, 2004
2
Example: Intrusion detection
3
Example: Monitoring digital switch health
[Figure: switches S1, S2, S3, ..., Si; abnormal behavior culminating in hard failure]
4
Example: Monitoring business news
  • 1 May 20, 1999 VISX Inc. today announced that the
    Federal Trade Commission (FTC) has filed a notice
    of appeal of the decision issued earlier this
    month by the FTC administrative
  • 2 May 29 (Reuters) Federal advisers backed
    approval Thursday of a laser made by VISX Inc.
    used to correct nearsightedness with or without
    astigmatism.
  • 3 June 3 (PR Newswire) VISX, Inc. of Santa
    Clara, California, will become a component of the
    Nasdaq-100 Index, effective at the beginning of
    trading Thursday, June 10, 1999.
  • 4 June 4 (PR Newswire) --- Amazon.com, facing
    threats of legal action from The New York Times,
    has asked the U.S. District Court in Seattle to
    allow Amazon.com to continue advertising.
  • 5 June 5, 1999 --- Motorola today announced that
    its MPC923, MPC950 and MPC960 PowerPC processors
    have been officially certified by Microsoft
    Corporation to support the
  • 6 June 8, 1999 (PR Newswire) --- WebTV Networks,
    Inc. and EchoStar Communications Corp. at CES
    today announced the Microsoft WebTV Network Plus
    service for satellite and the EchoStar

5
Monitoring business news: VISX
  • 1 May 20, 1999 VISX Inc. today announced that the
    Federal Trade Commission (FTC) has filed a notice
    of appeal of the decision issued earlier this
    month by the FTC administrative
  • 2 May 29 (Reuters) Federal advisers backed
    approval Thursday of a laser made by VISX Inc.
    used to correct nearsightedness with or without
    astigmatism.
  • 3 June 3 (PR Newswire) VISX, Inc. of Santa
    Clara, California, will become a component of the
    Nasdaq-100 Index, effective at the beginning of
    trading Thursday, June 10, 1999.

[Figure: VISX stock price]
6
Commonalities of the domains
  • Temporal data comprise time series
  • Large number of entities (users, companies,
    accounts, devices)
  • Large volumes of data (commands, news stories,
    calls) on entity activity
  • General goal is to alert on interesting, rare
    events (intrusions, fraud, unusual business
    activity)

[Figure: onset of significant activity]
  • Detection goals
    • Identify as many interesting events as possible
    • Alert as soon as possible
    • Minimize false alarms

7
Activity monitoring problems
  • Topic detection and tracking
    • Allan, Papka & Lavrenko (SIGIR-98)
    • Crabtree & Soltysiak (IJCAI-97)
    • Allan (ed.), 2002
  • Fraud detection
    • Chan & Stolfo (KDD-98)
    • Cox et al. (DMKD-97)
    • Fawcett & Provost (KDD-97)
    • Burge & Shawe-Taylor (FDRM-97)
    • Ezawa & Norton (KDD-95)
  • News/event alerts
    • Yang, Pierce & Carbonell
    • Fawcett & Provost (KDD-99)
  • Epidemic/bio-terrorism detection
    • Wong et al. 2002, 2003
    • Shmueli 2004
  • Intrusion detection
    • Lee, Stolfo & Mok (KDD-99)
    • Lane & Brodley (KDD-98)
    • Ryan et al. (FDRM-97)
    • DuMouchel & Schonlau (KDD-98)
  • Network alarm monitoring
    • Sasisekharan et al. (KDD-94)
    • Weiss & Hirsh (KDD-98)
    • Klemettinen 99
  • Hardware fault detection
    • Dasgupta & Forrest 96
    • Smyth 92

8
Standard supervised learning approach
Onset of significant activity
9
Challenges for machine learning approaches
  • Very skewed class distributions: inherent
    asymmetry
  • Differing error costs
  • Imprecision in class and cost distributions
  • Temporal dependencies among alarms
    • Earlier is better than later
    • Several is (usually) no better than one
  • Solutions may use different representations
    • Different timescales, different granularity

w = 1 command; w = 1 login session; w = 1 process life
10
Formalism
  • D = set of data streams being monitored
  • Di = ⟨d1, d2, d3, ..., dn⟩ = sequence of data
    items in stream Di
  • α = alarm time
  • τ = onset of positive activity
  • Each episode has at most one τ

Benefit/cost of alarms:
s(τ, α, H, Di) = benefit of α if true positive
f(α, H, Di) = cost of α if false positive
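As a rough illustration of the formalism, the sketch below scores the alarms raised on a single stream. It assumes a step-function benefit (the first alarm inside a window after τ earns 1, later ones earn nothing, reflecting the alarm history H) and a unit false-alarm cost. The name `score_episode`, the `window` parameter, and the rule that any alarm outside the window counts as a false alarm are assumptions made for this example; the slides leave s and f general.

```python
from typing import Optional, Sequence, Tuple


def score_episode(
    alarms: Sequence[float],
    tau: Optional[float],
    window: float,
) -> Tuple[float, float]:
    """Return (benefit, false_alarm_cost) for one data stream.

    alarms: alarm times (alpha values), in any order.
    tau:    onset of positive activity, or None if the stream has none.
    window: assumed length of the interval after tau in which an alarm helps.
    """
    benefit = 0.0
    cost = 0.0
    for alpha in sorted(alarms):
        is_hit = tau is not None and 0 <= alpha - tau <= window
        if is_hit:
            # Alarm history: only the first hit earns benefit,
            # since several alarms are no better than one.
            if benefit == 0.0:
                benefit = 1.0
        else:
            # Assumed unit cost per false alarm (f = 1).
            cost += 1.0
    return benefit, cost
```

For example, with τ = 10 and a window of 5, the alarms at times 1 and 40 are charged as false alarms, the alarm at 12 earns the benefit, and the alarm at 13 adds nothing.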
11
Formalism
  • D = set of data streams being monitored
  • Di = ⟨d1, d2, d3, ..., dn⟩ = sequence of data
    items in stream i
  • α = alarm time
  • τ = onset of positive activity

Benefit/cost of alarms:
s(τ, α, H, Di) = benefit of α if true positive
f(α, H, Di) = cost of α if false positive
12
Detecting digital switch failures
[Figure: episode timeline marking the onset of observable switch abnormalities, the minimum advance notice, and the hard failure point]
13
How is this framework better?
→ More realistic evaluation of solution methods
  • Differing error costs
  • Skewed class distributions

AMOC analysis
  • Temporal dependencies among alarms
    • Earlier is better than later
    • Several is no better than one
  • Solutions may use different representations
    • Different timescales, granularities
  • Time and alarm history in s and f
  • AMOC normalizes with respect to time
    • (no definite notion of false positive max)

14
AMOC curves
Random alarms with different frequencies (.1/hr,
.2/hr, etc.)
s(τ, α) = 1 if 0 ≤ α − τ ≤ 5, 0 otherwise
f = 1
ROC curves vs AMOC curves
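To make the AMOC idea concrete, the sketch below computes one point of such a curve from a set of streams. It assumes the x-axis is false alarms per unit of monitored time and the y-axis is the average score over positive episodes, using the step-function s and f = 1 from this slide. The function name `amoc_point` and the stream layout (alarm times, τ or None, monitored duration) are hypothetical choices for the example, not the author's implementation.

```python
from typing import Optional, Sequence, Tuple

# One stream: (alarm times, tau or None, monitored duration).
Stream = Tuple[Sequence[float], Optional[float], float]


def amoc_point(streams: Sequence[Stream], window: float) -> Tuple[float, float]:
    """Return (false alarms per unit time, mean score per positive episode)."""
    total_score = 0.0
    false_alarms = 0
    total_time = 0.0
    positives = 0
    for alarms, tau, duration in streams:
        total_time += duration
        if tau is not None:
            positives += 1
        hits = [a for a in alarms if tau is not None and 0 <= a - tau <= window]
        if hits:
            # Several alarms are no better than one: score each episode once.
            total_score += 1.0
        # Every alarm outside the window is counted as a false alarm (f = 1).
        false_alarms += len(alarms) - len(hits)
    fa_rate = false_alarms / total_time if total_time > 0 else 0.0
    score = total_score / positives if positives else 0.0
    return fa_rate, score
```

Sweeping an alarm threshold and plotting the resulting points would trace out a curve. Because the false-alarm axis is a rate over time rather than a fraction of negatives, it has no fixed maximum, unlike the false positive axis of a ROC curve.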
15
Activity monitoring: Solution approaches
  • Fundamental problem characteristics
  • Asymmetry of classes: positive activity is
    inherently rare
    • Discriminating method: differentiates positive
      and normal activity
    • vs
    • Profiling method: models normal activity without
      reference to positive (i.e., learning from
      negative examples only)
  • Multi-level representation of data
    • Uniform modeling: models activity uniformly
      across all monitored entities
    • vs
    • Individual modeling: models each Di's activity
      individually
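A profiling method with individual modeling can be sketched very simply: fit a summary of one entity's normal activity, then flag values that fall far outside it. The example below assumes a single numeric feature per observation and a z-score threshold. The names `build_profile` and `is_anomalous` and the threshold of 3.0 are illustrative assumptions, not anything taken from the talk.

```python
from statistics import mean, pstdev
from typing import Sequence, Tuple


def build_profile(normal_values: Sequence[float]) -> Tuple[float, float]:
    """Summarize one entity's normal activity as (mean, standard deviation)."""
    return mean(normal_values), pstdev(normal_values)


def is_anomalous(value: float, profile: Tuple[float, float], threshold: float = 3.0) -> bool:
    """Flag a value that lies more than `threshold` deviations from the profile mean."""
    mu, sigma = profile
    if sigma == 0:
        # A constant profile: any departure from it is treated as anomalous.
        return value != mu
    return abs(value - mu) / sigma > threshold
```

The profile is learned from negative (normal) examples only, one per entity. A discriminating method would instead need labeled positive examples to learn what separates the two classes.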

16
Example: Monitoring business news
  • Goal: Scan news stories associated with
    businesses, alarm on stories that correlate with
    interesting behavior.
  • Interesting: 10% change in stock price (up or
    down) within 34.5 hours
  • Data: Yahoo stories and stock prices from 6000
    companies over 3 months
  • DC-1 system
    • Developed for cell phone fraud detection
    • Performs discriminating, individual modeling
[Figure: the Intel data stream, D_Intel, with two marked time points]
17
Example: Monitoring business news
Textual indicators for price spikes:
said it expects same period revenues increase over per share
fourth compared income quarter fiscal
earnings per diluted fiscal quarter ended expenses months ended
today reported consensus quarter earnings year ended repurchase
lower than shortfall Q1234 fourth-quarter first call
below analyst for quarter research and development
AMOC curve
s(τ, α) = 1 if 0 ≤ α − τ ≤ 34.5 hours, 0 otherwise
f = 1
18
Pitfalls in evaluation
Why performance may look better than it should
  • Evaluating too locally
    • Windows shouldn't overlap
    • Behavior may be episodic or local (bull market
      behavior)
  • → Need out-of-time sampling

[Figure: stream Di divided along the time axis into a Train segment followed by a Test segment]
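Out-of-time sampling can be sketched as a split on a time cutoff, with an optional gap so that windows built near the boundary do not overlap between the two sets. The function name `out_of_time_split`, the `(timestamp, payload)` event layout, and the `gap` parameter are assumptions made for this example.

```python
from typing import Any, List, Sequence, Tuple

Event = Tuple[float, Any]  # (timestamp, payload)


def out_of_time_split(
    events: Sequence[Event],
    cutoff: float,
    gap: float = 0.0,
) -> Tuple[List[Event], List[Event]]:
    """Train on events before `cutoff`; test on events at or after `cutoff + gap`.

    Events falling inside the gap are dropped, so a window ending just
    before the cutoff cannot share data with a test window.
    """
    train = [e for e in events if e[0] < cutoff]
    test = [e for e in events if e[0] >= cutoff + gap]
    return train, test
```

Setting `gap` to at least the longest window length used in feature construction is one way to honor the "windows shouldn't overlap" rule.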
19
Pitfalls in evaluation
  • Mixing events from a single account between train
    and test sets
    • Goal of evaluation is to determine how well
      system will work on new, unseen accounts.
    • Events within an account may be much more similar
      to each other than to events in other accounts
    • Mixing one account's examples between train and
      test sets may leak test info into training
  • Need out-of-account sampling

[Figure: accounts partitioned so that each account falls entirely in Train or entirely in Test]
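Out-of-account sampling amounts to splitting by account rather than by event, so that every event from a given account lands on the same side. The sketch below assumes events are `(account_id, payload)` pairs with string account IDs. The function name `out_of_account_split`, the `test_fraction` default, and the seeded shuffle are illustrative choices, not part of the talk.

```python
import random
from typing import Any, List, Sequence, Tuple

Event = Tuple[str, Any]  # (account_id, payload)


def out_of_account_split(
    events: Sequence[Event],
    test_fraction: float = 0.25,
    seed: int = 0,
) -> Tuple[List[Event], List[Event]]:
    """Assign whole accounts to train or test so no account appears in both."""
    accounts = sorted({account for account, _ in events})
    rng = random.Random(seed)  # seeded for a reproducible split
    rng.shuffle(accounts)
    n_test = max(1, round(len(accounts) * test_fraction))
    test_accounts = set(accounts[:n_test])
    train = [e for e in events if e[0] not in test_accounts]
    test = [e for e in events if e[0] in test_accounts]
    return train, test
```

Because the test accounts are never seen during training, the measured performance is closer to what the system would achieve on new, unseen accounts.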

20
Conclusions
  • This form of anomaly detection is inherently
    classification
    • Alarms → true positives, false positives, etc.
    • Classification methods can be brought to bear
    • But temporal aspects make standard classification
      metrics inappropriate
  • Activity monitoring domains are common in machine
    learning. Solution methods and strategies can be
    shared and adapted.

21
end
22
Activity monitoring: Learning methods
[Figure: data streams D1, D2, D3, D4, D5, ...]
Problem characteristics
  • Class asymmetry: Discriminating method vs Profiling
    method
  • Multi-level representation: Uniform
    modeling vs Individual modeling
23
Transforming τ: Circuit failure
[Figure: episode timeline marking the beginning of positive visible activity, degradation, the implicit lookahead interval, and hard failure (end of episode)]