Title: Path-State Modeling for Time Series Anomaly Detection


1
Path-State Modeling for Time Series Anomaly
Detection
  • Matt Mahoney

2
Outline
  • Review of time series anomaly detection
  • Gecko
  • Compression
  • Path modeling
  • Piecewise linear approximation of path
  • Fast testing using state
  • Experimental results on NASA valve data

3
Problem: How to Detect Anomalies in Time Series
Data
  • Normal: Marotta fuel valve solenoid current (used
    on Space Shuttle)
  • Abnormal: poppet partially blocked

4
Goal
  • Reduce human workload in specifying normal
    model
  • Editable rule based model (in SCL)
  • Real time testing (1K-10K samples per second)

5
Manual Method
  • Identify features (zero crossings, peaks)
  • Specify correct behavior using SCL rules


6
Gecko (Stan Salvador)
  • Identify model states (parabolic segments)
  • Multiple training series are averaged by dynamic
    time warping
  • Classify points (x,dx,d2x) using RIPPER
  • Construct linear state machine
  • Pass/fail test result

7
Compression Model
[Figure: panels labeled Normal, uncompressed; Abnormal, uncompressed; Normal, compressed; Abnormal, compressed; Normal 1; Normal 2; Normal 1 or 2; Abnormal]
8
TEK Compression Anomaly Scores
9
Goal Evaluation
                  Manual   Gecko   Compression
Reduce workload   No       Yes     Yes
Real time         Yes      Yes     Possible
Editable model    Yes      Yes     No
10
Problem with Gecko/RIPPER State Machine May
Underconstrain Model
  • Training: Segment 1: x = 0, dx = 0; Segment 2: 0 < x < 1, dx = 1
  • Test: Segment 1: x = 0, dx = 0; Segment 2: 0 < x < 1, dx = 3
  • [State diagram: State 1, State 2, Accept; rule shown: dx > 0.5]
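The underconstraint can be illustrated with a short Python sketch. It assumes, for illustration only, that the learned rule for State 2 is the single condition dx > 0.5 shown on the slide; the function name and the sample points are hypothetical.

```python
def in_state_2(x, dx):
    """Hypothetical State 2 rule taken from the slide: dx > 0.5.

    The rule never looks at how large dx is beyond the threshold,
    so a slope of 3 is treated the same as the training slope of 1.
    """
    return dx > 0.5

training_point = (0.5, 1.0)  # Segment 2 in training: 0 < x < 1, dx = 1
test_point = (0.5, 3.0)      # Segment 2 in test:     0 < x < 1, dx = 3

print(in_state_2(*training_point), in_state_2(*test_point))  # True True
```

Both points satisfy the rule, so a test series three times steeper than the training series would still be accepted.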
11
Path Model
[Figure: Training Path (scaled to unit cube) and Test Path (d^2 = 4), plotted as dx against x; axis ticks 1 to 3]
12
Path Model Example
[Figure: paths on axes x, dx, and d2x with anomaly score; panels labeled Training, Training, Normal, Too steep, Too low]
13
Example TEK Results
[Figure: anomaly score for TEK 0, TEK 1, TEK 10, TEK 11, and TEK 12; labels include (Training) and (Normal)]
14
Problems with Path Modeling
  • Testing is slow, O(n^2)
  • Compares n test points to n training points each
  • Model is complex (stores n points)
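A minimal Python sketch of the brute-force comparison these bullets describe. It assumes the per-point score is the squared Euclidean distance to the nearest training point; the slides do not spell out the formula, and the function names and sample data are hypothetical.

```python
def nearest_sq_dist(point, training):
    """Squared Euclidean distance from point to the closest training point."""
    return min(sum((p - q) ** 2 for p, q in zip(point, t)) for t in training)

def path_scores(test, training):
    """Return (maximum, total) score; costs O(len(test) * len(training))."""
    scores = [nearest_sq_dist(p, training) for p in test]
    return max(scores), sum(scores)

# Points are (x, dx) pairs; the model is simply the stored training points.
training = [(0.0, 0.0), (0.5, 1.0), (1.0, 1.0)]
test = [(0.0, 0.0), (0.5, 3.0), (1.0, 1.0)]

print(path_scores(test, training))  # (4.0, 4.0)
```

Every test point is compared with every stored training point, which is where both the quadratic test time and the n-point model size come from.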

15
Proposed Solution
  • Piecewise linear approximation of path
  • Editable (k segments, k << n)
  • Faster testing, O(kn)
  • State machine model (nearest segment)
  • Fast testing, O(n) (same as Gecko)
  • Local minima problem (same as Gecko)

16
Piecewise Approximation Algorithm
  • Repeat n - k times
  • Remove vertex with lowest cost dh^2
  • Run time is O(n log n) using doubly linked heap

[Figure: vertex with dimensions labeled h and d]
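The vertex removal loop can be sketched in Python as follows. The slide does not define d and h, so this sketch assumes d is the distance between a vertex's two neighbors and h is the vertex's distance from the line joining them. For brevity it rescans all vertices on every step instead of using the heap mentioned above, so it is slower than O(n log n). All names are hypothetical.

```python
import math

def removal_cost(a, b, c):
    """Assumed cost d * h**2 of removing b, whose neighbors are a and c."""
    ac = [q - p for p, q in zip(a, c)]
    ab = [q - p for p, q in zip(a, b)]
    d2 = sum(v * v for v in ac)
    if d2 == 0:
        return 0.0  # neighbors coincide, so d = 0
    t = sum(u * v for u, v in zip(ab, ac)) / d2          # projection of b onto ac
    h2 = sum((u - t * v) ** 2 for u, v in zip(ab, ac))   # squared height
    return math.sqrt(d2) * h2

def approximate(path, k):
    """Remove n - k lowest-cost interior vertices, leaving k vertices.

    The two end points are never removed.
    """
    pts = list(path)
    while len(pts) > max(k, 2):
        i = min(range(1, len(pts) - 1),
                key=lambda j: removal_cost(pts[j - 1], pts[j], pts[j + 1]))
        del pts[i]
    return pts

path = [(0, 0), (1, 0.01), (2, 0), (3, 1), (4, 0)]
print(approximate(path, 3))  # [(0, 0), (3, 1), (4, 0)]
```

The nearly collinear vertex (1, 0.01) goes first because its height is tiny; the sharp peak at (3, 1) survives.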
17
Test k: compare to all segments
[Figure: nearest segment (0-19) and anomaly score, with axes x and dx]
  • TEK 0: training
  • TEK 3: near normal
  • TEK 12: stuck poppet
  • TEK 16: late release
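A minimal Python sketch of comparing one test point with all k segments. It assumes the score is the squared distance to the nearest segment and the state is that segment's index; the helper names and the example path are hypothetical.

```python
def seg_sq_dist(p, a, b):
    """Squared distance from point p to the line segment from a to b."""
    ab = [q - r for r, q in zip(a, b)]
    ap = [q - r for r, q in zip(a, p)]
    denom = sum(v * v for v in ab)
    # Clamp the projection so the closest point stays on the segment.
    t = 0.0 if denom == 0 else max(0.0, min(1.0, sum(u * v for u, v in zip(ap, ab)) / denom))
    return sum((u - t * v) ** 2 for u, v in zip(ap, ab))

def nearest_segment(p, vertices):
    """Return (segment index, squared distance); costs O(k) per test point."""
    dists = [seg_sq_dist(p, vertices[i], vertices[i + 1])
             for i in range(len(vertices) - 1)]
    i = min(range(len(dists)), key=dists.__getitem__)
    return i, dists[i]

vertices = [(0, 0), (1, 0), (1, 1)]          # two segments
print(nearest_segment((0.5, 0.25), vertices))  # (0, 0.0625)
print(nearest_segment((2, 1), vertices))       # (1, 1.0)
```

Running this for each of n test points gives the O(kn) test time listed under Proposed Solution.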
18
Paths (not segmented)
[Figure: paths for TEK 0, TEK 3, TEK 12, and TEK 16 on axes x, dx, and d2x]
19
TEK 0 approximation with k = 20 segments
20
Test 2: compare only to current and next segment
(fails)
  • TEK 0: training
  • TEK 3: OK
  • TEK 12: local minima
  • TEK 16: local minima
21
Test 4: compare to 4 segments (previous, current, next 2);
succeeds
[Figure: panels labeled Training; OK; Skips past minimum; Transitions back]
22
Test 4 fails with k = 50
[Figure: panels labeled Training; OK; Not complete; Delayed completion]
23
Test 5 (previous, current, next 2, and one random
segment): succeeds
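Test 5 can be sketched in Python as below. The slides give only the candidate set (previous, current, next 2, and one random segment); the starting state of 0, the squared-distance score, and every name here are assumptions made for illustration.

```python
import random

def seg_sq_dist(p, a, b):
    """Squared distance from point p to the line segment from a to b."""
    ab = [q - r for r, q in zip(a, b)]
    ap = [q - r for r, q in zip(a, p)]
    denom = sum(v * v for v in ab)
    t = 0.0 if denom == 0 else max(0.0, min(1.0, sum(u * v for u, v in zip(ap, ab)) / denom))
    return sum((u - t * v) ** 2 for u, v in zip(ap, ab))

def score_with_state(test_points, vertices, rng=None):
    """Return (maximum, total) score, checking at most 5 segments per point."""
    rng = rng or random.Random(0)
    k = len(vertices) - 1          # number of segments
    state = 0                      # assumed starting segment
    scores = []
    for p in test_points:
        # Previous, current, next 2, and one random segment.
        candidates = {state - 1, state, state + 1, state + 2, rng.randrange(k)}
        valid = [c for c in candidates if 0 <= c < k]
        state = min(valid, key=lambda c: seg_sq_dist(p, vertices[c], vertices[c + 1]))
        scores.append(seg_sq_dist(p, vertices[state], vertices[state + 1]))
    return max(scores), sum(scores)

vertices = [(0, 0), (1, 0), (1, 1), (0, 1)]
on_path = [(0.5, 0), (1, 0.5), (0.5, 1)]
print(score_with_state(on_path, vertices))  # (0.0, 0.0)
```

The work per test point no longer depends on k, which gives the O(n) test time. The random candidate gives the state a chance to escape a local minimum that the fixed window alone cannot leave.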
24
Path Fitting (optimal if no sharp bends)
  • Repeat n - k times
  • Remove lowest cost vertex (cost dh^2)
  • Move adjacent vertices by h/4 toward removed
    vertex
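One step of path fitting might look like the Python sketch below. It rests on several assumptions that the slide does not confirm: h is the removed vertex's distance from the line through its neighbors, d is the distance between those neighbors, each neighbor moves a distance of h/4 straight toward the removed vertex, and the two end points of the path stay fixed. All names are hypothetical.

```python
import math

def height(a, b, c):
    """Distance h from b to the line through a and c (assumed meaning of h)."""
    ac = [q - p for p, q in zip(a, c)]
    ab = [q - p for p, q in zip(a, b)]
    d2 = sum(v * v for v in ac)
    t = sum(u * v for u, v in zip(ab, ac)) / d2 if d2 else 0.0
    return math.sqrt(sum((u - t * v) ** 2 for u, v in zip(ab, ac)))

def move_toward(p, target, dist):
    """Move p by at most dist along the straight line toward target."""
    length = math.dist(p, target)
    if length == 0.0:
        return tuple(p)
    step = min(dist, length) / length
    return tuple(pi + step * (ti - pi) for pi, ti in zip(p, target))

def fit_step(pts):
    """Remove the lowest-cost interior vertex and nudge its neighbors."""
    def cost(j):  # assumed cost d * h**2
        return math.dist(pts[j - 1], pts[j + 1]) * height(pts[j - 1], pts[j], pts[j + 1]) ** 2
    i = min(range(1, len(pts) - 1), key=cost)
    h = height(pts[i - 1], pts[i], pts[i + 1])
    out = list(pts)
    for j in (i - 1, i + 1):
        if 0 < j < len(pts) - 1:   # assumption: end points are not moved
            out[j] = move_toward(pts[j], pts[i], h / 4)
    del out[i]
    return out

pts = [(0.0, 0.0), (3.0, 0.0), (3.0, 4.0), (13.0, 4.0), (13.0, -20.0)]
result = fit_step(pts)
```

In this example the vertex (3, 0) has the lowest cost, with h = 2.4, so its interior neighbor (3, 4) moves 0.6 toward it, to roughly (3, 3.4).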

25
Vertex Removal vs. Path Fitting
  • TEK 0 self anomaly scores
  • Path fitting better for k > 50
  • Vertex removal better for k < 50

        Vertex removal          Path fitting
k       Maximum     Total       Maximum     Total
200     0.000008    0.000656    0.000005    0.000350
100     0.000057    0.005802    0.000019    0.003903
50      0.000345    0.027968    0.000542    0.025327
20      0.010298    0.601229    0.015872    0.961845
26
Path Modeling vs. Gecko
  • Data: Voltage Test 1 at 14V, 16V, 18V, ... to 32V
  • 10 x 20K points
  • 31 sets of 1-3 training files
  • Gecko
  • Transition threshold = 3
  • Error threshold = 10 or 20
  • Results: pass at 10 (P), pass at 20 (P/F), or fail (F)
  • Path Modeling
  • Filter delay: 2 x 50 samples per dimension
  • k = 50 segments
  • Test 5 (last, current, next 2, and random)
  • Results: maximum and total anomaly score

27
Typical Results
Test file                  Train   Maximum     Total        Gecko
V37898 V14 T21 R00s.txt            0.041018     58.254755
V37898 V16 T21 R00s.txt            0.021778     43.696323
V37898 V18 T21 R00s.txt            0.006596     26.814669
V37898 V20 T21 R00s.txt            0.000913      0.705107   P
V37898 V22 T21 R00s.txt            0.008819     48.095410   P/F
V37898 V24 T21 R00s.txt            0.006635     23.487464   P
V37898 V26 T21 R00s.txt            0.000361      0.593473   P
V37898 V28 T21 R00s.txt            0.009032     48.236476
V37898 V30 T21 R00s.txt            0.033475    194.134671
V37898 V32 T21 R00s.txt            0.076193    448.467580
28
Gecko Summary (Stan)
  • Gecko
  • 1 training file: correct behavior
  • 10 self: 10 P (100% correct)
  • 90 others: 3 P/F, 87 F (97-100% correct)
  • 2-3 training files: some generalization
  • 26 self: 23 P, 3 F (14V, 14V, 16V) (88% correct)
  • 14V is too different from the others
  • 22 between: 8 P, 6 P/F, 8 F (36-63% correct)
  • 162 others: 1 P/F, 161 F (99-100% correct)

29
Path Model Summary
  • Anomaly score proportional to training-test
    difference (correct)
  • Multiple training sets: no generalization
    (expected)

30
Run Time Performance
  • Tested on data set 1 (218 x 20K points)
  • 50 training files: 10^6 samples
  • 168 test files: 3.36 x 10^6 samples
  • 750 MHz Duron, tsad4.cpp, g++ -O 2.95.2
  • Read and filter 10^6 points: 23 sec
  • Approximate to k = 100 segments: 30 sec
  • Test k: 162 sec (500 ns per point per segment)

31
Summary
                  Path Model                         Gecko
Meets all goals   Yes                                Yes
Output            Numeric                            Pass/fail
Training speed    O(n log n)                         O(n^2) (DTW)
Test speed        O(n)                               O(n)
Parameters        Filter delay, number of segments   Transition and error thresholds
Local minima      Yes                                Yes
Generalization    No                                 Some
32
Future Work
  • Test path modeling with other data sets
  • UCR archive, http://www.cs.ucr.edu/~eamonn/TSDMA/
  • Power load profiles, http://www.delelect.com/pdfs/Del-Res.txt
  • Test with multiple dimensions
  • Generalization?

33
Thank You
  • Further Reading
  • http://cs.fit.edu/~mmahoney/nasa/