Title: Activity monitoring: Anomaly detection as on-line classification Tom Fawcett HP Laboratories 1501 Page Mill Rd. Palo Alto, CA Symposium on Machine Learning for Anomaly Detection May 22-23, 2004
1 Activity monitoring: Anomaly detection as on-line classification
Tom Fawcett
HP Laboratories, 1501 Page Mill Rd., Palo Alto, CA
Symposium on Machine Learning for Anomaly Detection, May 22-23, 2004
2 Example: Intrusion detection
3 Example: Monitoring digital switch health
[Figure: monitored switches S1, S2, S3, ..., Si; abnormal behavior culminating in hard failure]
4 Example: Monitoring business news
- 1 May 20, 1999 VISX Inc. today announced that the Federal Trade Commission (FTC) has filed a notice of appeal of the decision issued earlier this month by the FTC administrative
- 2 May 29 (Reuters) Federal advisers backed approval Thursday of a laser made by VISX Inc. used to correct nearsightedness with or without astigmatism.
- 3 June 3 (PR Newswire) VISX, Inc. of Santa Clara, California, will become a component of the Nasdaq-100 Index, effective at the beginning of trading Thursday, June 10, 1999.
- 4 June 4 (PR Newswire) --- Amazon.com, facing threats of legal action from The New York Times, has asked the U.S. District Court in Seattle to allow Amazon.com to continue advertising.
- 5 June 5, 1999 --- Motorola today announced that its MPC923, MPC950 and MPC960 PowerPC processors have been officially certified by Microsoft Corporation to support the
- 6 June 8, 1999 (PR Newswire) --- WebTV Networks, Inc. and EchoStar Communications Corp. at CES today announced the Microsoft WebTV Network Plus service for satellite and the EchoStar
5 Monitoring business news: VISX
- 1 May 20, 1999 VISX Inc. today announced that the Federal Trade Commission (FTC) has filed a notice of appeal of the decision issued earlier this month by the FTC administrative
- 2 May 29 (Reuters) Federal advisers backed approval Thursday of a laser made by VISX Inc. used to correct nearsightedness with or without astigmatism.
- 3 June 3 (PR Newswire) VISX, Inc. of Santa Clara, California, will become a component of the Nasdaq-100 Index, effective at the beginning of trading Thursday, June 10, 1999.
[Figure: VISX stock price]
6 Commonalities of the domains
- Temporal data comprise time series
- Large number of entities (users, companies, accounts, devices)
- Large volumes of data (commands, news stories, calls) on entity activity
- General goal is to alert on interesting, rare events (intrusions, fraud, unusual business activity)
[Figure: onset of significant activity]
- Detection goals
- Identify as many interesting events as possible
- Alert as soon as possible
- Minimize false alarms
7 Activity monitoring problems
- Topic detection and tracking
- Allan, Papka & Lavrenko (SIGIR-98)
- Crabtree & Soltysiak (IJCAI-97)
- Allan (ed.), 2002
- Fraud detection
- Chan & Stolfo (KDD-98)
- Cox et al. (DMKD-97)
- Fawcett & Provost (KDD-97)
- Burge & Shawe-Taylor (FDRM-97)
- Ezawa & Norton (KDD-95)
- News/event alerts
- Yang, Pierce & Carbonell
- Fawcett & Provost (KDD-99)
- Epidemic/bio-terrorism detection
- Wong et al. 2002, 2003
- Shmueli 2004
- Intrusion detection
- Lee, Stolfo & Mok (KDD-99)
- Lane & Brodley (KDD-98)
- Ryan et al. (FDRM-97)
- DuMouchel & Schonlau (KDD-98)
- Network alarm monitoring
- Sasisekharan et al. (KDD-94)
- Weiss & Hirsh (KDD-98)
- Klemettinen 99
- Hardware fault detection
- Dasgupta & Forrest 96
- Smyth 92
8 Standard supervised learning approach
[Figure: onset of significant activity]
9 Challenges for machine learning approaches
- Very skewed class distributions: inherent asymmetry
- Differing error costs
- Imprecision in class and cost distributions
- Temporal dependencies among alarms
- Earlier is better than later
- Several is (usually) no better than one
- Solutions may use different representations
- Different timescales, different granularity
- 1 command
- 1 login session
- 1 process life
10 Formalism
- D: set of data streams being monitored
- Di = <d1, d2, d3, ..., dn>: sequence of data items in stream Di
- α: alarm time
- τ: onset of positive activity
- Each episode has at most one τ
Benefit/cost of alarms:
s(τ, α, H, Di): benefit of α if true positive
f(α, H, Di): cost of α if false positive
11 Formalism
- D: set of data streams being monitored
- Di = <d1, d2, d3, ..., dn>: sequence of data items in stream i
- α: alarm time
- τ: onset of positive activity
Benefit/cost of alarms:
s(τ, α, H, Di): benefit of α if true positive
f(α, H, Di): cost of α if false positive
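The benefit and cost functions can be made concrete with a small Python sketch. It assumes times are plain numbers, reads H as the list of earlier alarm times in the same episode, and uses a step-shaped benefit with a fixed detection window. The `window` parameter and the helper `score_episode` are illustrative assumptions, not part of the slides.

```python
from typing import Optional, Sequence, Tuple

def s(tau: float, alpha: float, history: Sequence[float], window: float) -> float:
    """Benefit of alarm `alpha` given onset `tau` (illustrative step function).

    Only the first alarm inside [tau, tau + window] earns credit, so that
    several alarms are no better than one.
    """
    in_window = 0 <= alpha - tau <= window
    already_hit = any(0 <= h - tau <= window for h in history)
    return 1.0 if in_window and not already_hit else 0.0

def f(alpha: float, history: Sequence[float]) -> float:
    """Cost of a false alarm; assumed constant here."""
    return 1.0

def score_episode(alarms: Sequence[float], tau: Optional[float],
                  window: float) -> Tuple[float, float]:
    """Return (total benefit, total false-alarm cost) for one stream."""
    benefit, cost = 0.0, 0.0
    history = []  # alarm times seen so far (the role played by H)
    for alpha in sorted(alarms):
        if tau is not None and 0 <= alpha - tau <= window:
            benefit += s(tau, alpha, history, window)
        else:
            cost += f(alpha, history)
        history.append(alpha)
    return benefit, cost
```

With an onset at time 10 and a window of 5, `score_episode([1, 12, 13, 30], 10, 5)` credits only the alarm at 12 and counts the alarms at 1 and 30 as false.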
12 Detecting digital switch failures
[Figure: timeline marking the onset of observable switch abnormalities, the minimum advance notice, and the hard failure point]
13 How is this framework better?
→ More realistic evaluation of solution methods
- Differing error costs, skewed class distributions → AMOC analysis
- Temporal dependencies among alarms (earlier is better than later; several is no better than one) → time and alarm history in s and f
- Solutions may use different representations (different timescales, granularities) → AMOC normalizes with respect to time (no definite notion of false positive max)
14 AMOC curves
Random alarms with different frequencies (.1/hr, .2/hr, etc.)
s(τ, α) = 1 if 0 ≤ α − τ ≤ 5; 0 otherwise
f = 1
[Figure: ROC curves vs AMOC curves]
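One way to trace such a curve is sketched below in Python, under stated assumptions: each stream supplies scored events, an optional onset time, and its monitored duration; an alarm is any event whose score reaches the threshold; benefit is 1 for at most one alarm inside the detection window and each false alarm costs 1. The function name `amoc_points` and the input layout are hypothetical. Each point pairs the false alarm rate per unit time with the average score over positive episodes.

```python
def amoc_points(streams, thresholds, window):
    """Compute (false alarms per unit time, mean score) for each threshold.

    streams: list of (events, tau, duration), where events is a list of
             (time, score) pairs, tau is the onset time or None, and
             duration is the length of time the stream was monitored.
    """
    points = []
    for th in thresholds:
        false_alarms = 0
        hits = 0
        positives = 0
        total_time = 0.0
        for events, tau, duration in streams:
            total_time += duration
            if tau is not None:
                positives += 1
            hit = False
            for t, score in events:
                if score < th:
                    continue  # no alarm raised on this event
                if tau is not None and 0 <= t - tau <= window:
                    hit = True  # at most one unit of benefit per episode
                else:
                    false_alarms += 1  # unit cost per false alarm
            hits += hit
        fa_rate = false_alarms / total_time
        mean_score = hits / positives if positives else 0.0
        points.append((fa_rate, mean_score))
    return points
```

Because the x-axis is a rate over monitored time rather than a fraction of negative examples, detectors that work at different granularities can be plotted on the same axes.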
15 Activity monitoring: Solution approaches
- Fundamental problem characteristics
- Asymmetry of classes: positive activity is inherently rare
- Discriminating method: differentiates positive and normal activity
- vs
- Profiling method: models normal activity without reference to positive (i.e., learning from negative examples only)
- Multi-level representation of data
- Uniform modeling: models activity uniformly across all monitored entities
- vs
- Individual modeling: models each Di's activity individually
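As a rough illustration of profiling combined with individual modeling, the Python sketch below builds one profile per entity from normal activity only and alarms on large deviations. The mean and standard deviation profile, the threshold `k`, and the account data are stand-ins chosen for brevity; the slides do not prescribe a particular profiling model.

```python
from statistics import mean, pstdev

def build_profile(normal_values):
    """Summarize one entity's normal activity as (mean, standard deviation)."""
    return mean(normal_values), pstdev(normal_values)

def profile_alarm(profile, value, k=3.0):
    """Alarm when a new value lies more than k standard deviations from normal."""
    mu, sigma = profile
    if sigma == 0:
        return value != mu
    return abs(value - mu) / sigma > k

# Individual modeling: one profile per monitored entity (hypothetical data).
normal_activity = {
    "acct_1": [10, 12, 11, 9, 13],
    "acct_2": [100, 90, 110, 95, 105],
}
profiles = {entity: build_profile(vals) for entity, vals in normal_activity.items()}
```

Here a value of 104 would alarm for `acct_1` but not for `acct_2`, which is the point of modeling each entity separately. A uniform model would instead fit a single profile to all entities, and a discriminating method would also need examples of positive activity.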
16 Example: Monitoring business news
- Goal: Scan news stories associated with businesses, alarm on stories that correlate with interesting behavior.
- Interesting: 10% change in stock price (up or down) within 34.5 hours
- Data: Yahoo stories and stock prices from 6000 companies over 3 months
- DC-1 system
- Developed for cell phone fraud detection
- Performs discriminating, individual modeling
[Figure: stream DIntel with two marked time points]
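The definition of interesting behavior can be turned into labels in several ways. The Python sketch below shows one possible reading, assuming a price series given as (time in hours, price) pairs sorted by time: a time point is marked when the price moves by at least 10% in either direction within the following 34.5 hours. The function name `spike_onsets` and the data layout are hypothetical, and the labeling actually used with DC-1 may differ.

```python
def spike_onsets(prices, change=0.10, horizon_hours=34.5):
    """Return times from which the price moves by >= `change` within the horizon.

    prices: list of (time_in_hours, price) pairs sorted by time.
    """
    onsets = []
    for i, (t0, p0) in enumerate(prices):
        for t1, p1 in prices[i + 1:]:
            if t1 - t0 > horizon_hours:
                break  # later quotes fall outside the horizon
            if abs(p1 - p0) / p0 >= change:  # up or down
                onsets.append(t0)
                break
    return onsets
```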
17 Example: Monitoring business news
Textual indicators for price spikes:
said it expects same period revenues increase over per share fourth compared income quarter fiscal earnings per diluted fiscal quarter ended expenses months ended today reported consensus quarter earnings year ended repurchase lower than shortfall Q1234 fourth-quarter first call below analyst for quarter research and development
[Figure: AMOC curve]
s(τ, α) = 1 if 0 ≤ α − τ ≤ 34.5 hours; 0 otherwise
f = 1
18 Pitfalls in evaluation
Why performance may look better than it should
- Evaluating too locally
- Windows shouldn't overlap
- Behavior may be episodic or local (bull market behavior)
- → Need out-of-time sampling
[Figure: stream Di divided into Train and Test periods]
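A minimal Python sketch of out-of-time sampling, assuming events are (timestamp, record) pairs. Training data ends before a cutoff and test data starts after an optional gap, so that windows straddling the cutoff cannot overlap. The function name and the `gap` parameter are illustrative assumptions.

```python
def out_of_time_split(events, cutoff, gap=0.0):
    """Split (timestamp, record) pairs into train and test sets by time.

    Train: timestamp < cutoff. Test: timestamp >= cutoff + gap.
    Events that fall inside the gap are discarded.
    """
    train = [(t, x) for t, x in events if t < cutoff]
    test = [(t, x) for t, x in events if t >= cutoff + gap]
    return train, test
```

Setting `gap` to at least the length of the scoring window keeps any one window from contributing to both sets.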
19 Pitfalls in evaluation
- Mixing events from a single account between train and test sets
- Goal of evaluation is to determine how well the system will work on new, unseen accounts.
- Events within an account may be much more similar to each other than to events in other accounts
- Mixing one account's examples between train and test sets may leak test info into training
- Need out-of-account sampling
[Figure: Train and Test account partitions]
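Out-of-account sampling can be sketched in Python as a split over account identifiers rather than over individual events. The sketch assumes events are (account_id, record) pairs; the function name, `test_fraction`, and `seed` are illustrative assumptions.

```python
import random

def out_of_account_split(events, test_fraction=0.3, seed=0):
    """Split (account_id, record) pairs so no account spans both sets."""
    accounts = sorted({acct for acct, _ in events})
    rng = random.Random(seed)  # fixed seed for a reproducible split
    rng.shuffle(accounts)
    n_test = max(1, round(len(accounts) * test_fraction))
    test_accounts = set(accounts[:n_test])
    train = [e for e in events if e[0] not in test_accounts]
    test = [e for e in events if e[0] in test_accounts]
    return train, test
```

Every event of a given account lands on one side only, so similarity among events within an account cannot leak test information into training.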
20 Conclusions
- This form of anomaly detection is inherently classification
- Alarms → true positives, false positives, etc.
- Classification methods can be brought to bear
- But temporal aspects make standard classification metrics inappropriate
- Activity monitoring domains are common in machine learning. Solution methods and strategies can be shared and adapted.
21 End
22 Activity monitoring: Learning methods
[Figure: data streams D1, D2, D3, D4, D5, ...]
Problem characteristics
- Class asymmetry: discriminating method vs profiling method
- Multi-level representation: uniform modeling vs individual modeling
23 Transforming tau: Circuit failure
[Figure: timeline from the beginning of positive visible activity, through degradation, to hard failure (end of episode), with an implicit lookahead interval]