Title: Applications and Parameter Analysis of Temporal Chaos Game Representation
1Spatiotemporal Stream Mining Applied to Seismic
Data
Margaret H. Dunham CSE Department Southern
Methodist University Dallas, Texas 75275
USA mhd_at_engr.smu.edu
2Outline
- CTBTO Data
- CTBTO Modeling Requirements
- EMM
Work in Progress! Input/Feedback Needed!
3CTBTO Data
- As a Data Miner I must first understand your DATA
- Diverse: seismic, hydroacoustic, infrasound, radionuclide
- Spatial (source and sensor)
- Temporal
- STREAM Data
4From Sensors to Streams
- Stream data: data captured and sent by a set of sensors
- Real-time sequence of encoded signals which contain desired information
- Continuous, ordered (implicitly by arrival time or explicitly by timestamp or by geographic coordinates) sequence of items
- Stream data is infinite: the data keeps coming
5CTBTO Data Mining
- Data mining techniques must be defined based on your data and applications
- Can't use predefined fixed models and prediction/classification techniques
- Must not redo the massive number of algorithms already created
6CTBTO DM Requirements
- Model
- Handle different data types (seismic, hydroacoustic, etc.)
- Spatial and temporal (spatiotemporal)
- Hierarchical
- Scalable
- Online
- Dynamic
- Anomaly Detection
- Not just specific wave type or data values
- Relationships between arrival of waves/data
- Combined values of data from all sensors
7EMM (Extensible Markov Model)
- Time Varying Discrete First Order Markov Model
- Nodes are clusters of real world states.
- Overlap of learning and validation phases
- Learning
- Transition probabilities between nodes
- Node labels (centroid or medoid of cluster)
- Nodes are added and removed as data arrives
- Applications: prediction, anomaly detection
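The learning behavior listed on this slide can be illustrated with a minimal sketch. It is not the authors' implementation: the class name `EMM`, the method names, the use of Euclidean distance, the running-centroid update, and the threshold value are all assumptions made for illustration, and node removal is left out.

```python
import math
from collections import defaultdict


class EMM:
    """Illustrative Extensible Markov Model sketch (hypothetical API)."""

    def __init__(self, threshold):
        self.threshold = threshold            # max distance for joining a node
        self.centroids = []                   # node label: running cluster centroid
        self.counts = []                      # vectors absorbed by each node
        self.transitions = defaultdict(int)   # (from_node, to_node) -> count
        self.current = None                   # node of the most recent input

    def _nearest(self, x):
        """Return (index, distance) of the closest node, or (None, inf)."""
        best, best_d = None, math.inf
        for i, c in enumerate(self.centroids):
            d = math.dist(x, c)               # Euclidean distance (assumption)
            if d < best_d:
                best, best_d = i, d
        return best, best_d

    def learn(self, x):
        """Absorb one input vector and return the node it was assigned to."""
        node, d = self._nearest(x)
        if node is None or d > self.threshold:
            # Too far from every known state: add a new node.
            self.centroids.append([float(v) for v in x])
            self.counts.append(1)
            node = len(self.centroids) - 1
        else:
            # Update the centroid of the matched node incrementally.
            n = self.counts[node] + 1
            c = self.centroids[node]
            self.centroids[node] = [ci + (vi - ci) / n for ci, vi in zip(c, x)]
            self.counts[node] = n
        if self.current is not None:
            self.transitions[(self.current, node)] += 1
        self.current = node
        return node

    def transition_probability(self, i, j):
        """Estimate P(j | i) from the transition counts seen so far."""
        out = sum(c for (a, _), c in self.transitions.items() if a == i)
        return self.transitions.get((i, j), 0) / out if out else 0.0


# The six input vectors shown on the "EMM Creation/Learning" slide.
vectors = [
    (18, 10, 3, 3, 1, 0, 0), (17, 10, 2, 3, 1, 0, 0), (16, 9, 2, 3, 1, 0, 0),
    (14, 8, 2, 3, 1, 0, 0), (14, 8, 2, 3, 0, 0, 0), (18, 10, 3, 3, 1, 1, 0),
]
emm = EMM(threshold=2.0)                      # threshold chosen arbitrarily
path = [emm.learn(v) for v in vectors]
```

With this arbitrary threshold the six vectors fall into two nodes; a smaller threshold would create more nodes, a larger one fewer.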
8Research Objectives
- Apply proven spatiotemporal modeling technique to seismic data
- Construct EMM to model sensor data
- Local EMM at location or area
- Hierarchical EMM to summarize lower level models
- Represent all data in one vector of values
- EMM learns normal behavior
- Develop new similarity metrics to include all sensor data types (Fusion)
- Apply anomaly detection algorithms
9EMM Creation/Learning
<18,10,3,3,1,0,0>
<17,10,2,3,1,0,0>
<16,9,2,3,1,0,0>
<14,8,2,3,1,0,0>
<14,8,2,3,0,0,0>
<18,10,3,3,1,1,0>
10Input Data Representation
- Vector of sensor values (numeric) at precise time points or aggregated over time intervals
- Need not come from same sensor types
- Similarity/distance between vectors used to determine creation of new nodes in EMM
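As a rough illustration of "aggregated over time intervals", the sketch below averages timestamped readings into one vector per interval. The function name `aggregate_by_interval`, the `(timestamp, values)` input layout, and the choice of the mean as the aggregate are assumptions, not something the slides specify.

```python
from collections import defaultdict


def aggregate_by_interval(samples, interval):
    """Average (timestamp, values) samples into one vector per time interval.

    samples  : iterable of (timestamp, tuple_of_numeric_values)
    interval : interval length, in the same units as the timestamps
    Returns a list of (interval_start, vector) pairs in time order.
    """
    buckets = defaultdict(list)
    for t, values in samples:
        buckets[int(t // interval)].append(values)

    result = []
    for b in sorted(buckets):
        rows = buckets[b]
        # Column-wise mean over all samples that fell into this interval.
        vector = [sum(col) / len(rows) for col in zip(*rows)]
        result.append((b * interval, vector))
    return result


# Made-up readings from two sensors, aggregated into 10-second intervals.
samples = [(0, (1.0, 10.0)), (5, (3.0, 20.0)), (12, (5.0, 30.0))]
vectors = aggregate_by_interval(samples, 10)
```

Each resulting vector could then be passed to the EMM as one input; the components need not come from the same sensor type.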
11Anomaly Detection with EMM
- Objective: detect rare (unusual, surprising) events
- Advantages
- Dynamically learns what is normal
- Based on this learning, can predict what is not normal
- Do not have to indicate normal behavior a priori
- Applications
- Network Intrusion
- Data: IP traffic data, automobile traffic data
- Seismic
- Unusual Seismic Events
- Automatically Filter out normal events
[Figure: Minnesota DOT traffic data, weekdays vs. weekend. Detected unusual weekend traffic pattern.]
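One simple way to act on "dynamically learns what is normal" is to score each state transition against the counts learned so far and flag the improbable ones. The sketch below is an assumption-laden illustration, not the detection rule used in the EMM work: the function name, the `min_prob` cutoff, and the `warmup` period are all hypothetical.

```python
from collections import Counter


def flag_rare_transitions(states, min_prob=0.1, warmup=5):
    """Flag transitions whose learned probability is below min_prob.

    states : sequence of node ids visited, in arrival order
    Each transition is scored BEFORE it updates the counts, so learning
    and detection overlap. The first `warmup` transitions are never flagged.
    Returns a list of (step, from_node, to_node, probability).
    """
    pair_counts = Counter()   # (from, to) -> times seen
    out_counts = Counter()    # from -> total outgoing transitions seen
    flagged = []
    for step, (a, b) in enumerate(zip(states, states[1:]), start=1):
        if step > warmup:
            prob = pair_counts[(a, b)] / out_counts[a] if out_counts[a] else 0.0
            if prob < min_prob:
                flagged.append((step, a, b, prob))
        pair_counts[(a, b)] += 1
        out_counts[a] += 1
    return flagged


# Made-up state sequence: a long run of state 0 with one excursion to state 1.
states = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0]
unusual = flag_rare_transitions(states)
```

Here the jump into state 1 and the jump back out are flagged, while the routine 0-to-0 transitions are filtered out as normal.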
12EMM with Seismic Data
- Input: wave arrivals (all or one per sensor)
- Identify states and changes of states in seismic data
- Waveform would first have to be converted into a series of vectors representing the activity at various points in time
- Initial testing with RDG data
- Use amplitude, period, and wave type
13New Distance Measure
- Data: <amplitude, period, wave type>
- Different wave type: 100% difference
- For events of same wave type:
- 50% weight given to the difference in amplitude
- 50% weight given to the difference in period
- If the distance is greater than the threshold, a state change is required
- Δamplitude = |amplitude_new − amplitude_average| / amplitude_average
- Δperiod = |period_new − period_average| / period_average
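The weighting on this slide can be written out as a short function. This is a sketch under stated assumptions: the relative differences are taken as absolute values, percentages are expressed as fractions (100% = 1.0), and the names `seismic_distance` and `needs_state_change` and the threshold value are hypothetical.

```python
def relative_change(new, average):
    """Relative difference of a new value from a node's running average."""
    return abs(new - average) / average


def seismic_distance(event, node):
    """Distance between an event and a node, both (amplitude, period, wave_type).

    A different wave type counts as 100% difference (1.0). Otherwise the
    amplitude and period differences each contribute with 50% weight.
    """
    amplitude, period, wave_type = event
    avg_amplitude, avg_period, node_wave_type = node
    if wave_type != node_wave_type:
        return 1.0
    return (0.5 * relative_change(amplitude, avg_amplitude)
            + 0.5 * relative_change(period, avg_period))


def needs_state_change(event, node, threshold):
    """True when the event is too far from the current node."""
    return seismic_distance(event, node) > threshold


# Made-up example: same wave type, amplitude doubled, period unchanged.
d_same = seismic_distance((2.0, 1.0, "P"), (1.0, 1.0, "P"))
d_other = seismic_distance((2.0, 1.0, "S"), (1.0, 1.0, "P"))
```

Note that for events of the same wave type the value is not capped at 1.0, so a very large amplitude jump can exceed the "different wave type" distance; whether that is intended is not stated on the slide.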
14EMM with Seismic Data
States 1, 2, and 3 correspond to Noise, Wave A,
and Wave B respectively.
15Preliminary Testing
- RDG data, February 1, 1981: 6 earthquakes
- Find transition times close to known earthquakes
- 9 total nodes
- 652 total transitions
- Found all quakes
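The check "find transition times close to known earthquakes" could be automated along the following lines. The function name, the tolerance parameter, and all times in the example are made up for illustration; they are not the RDG results.

```python
from bisect import bisect_left


def quakes_matched(transition_times, quake_times, tolerance):
    """For each known quake time, report whether some EMM state transition
    occurred within `tolerance` time units of it.

    Returns a dict mapping quake_time -> True/False.
    """
    times = sorted(transition_times)
    matched = {}
    for q in quake_times:
        i = bisect_left(times, q)
        # Only the neighbors on either side of q can be the nearest transition.
        candidates = times[max(i - 1, 0): i + 1]
        nearest = min(candidates, key=lambda t: abs(t - q), default=None)
        matched[q] = nearest is not None and abs(nearest - q) <= tolerance
    return matched


# Made-up times in seconds: one quake near a transition, one not.
result = quakes_matched([10, 50, 200], [12, 120], tolerance=5)
found_all = all(result.values())
```

"Found all quakes" would correspond to `found_all` being true for the chosen tolerance.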
16EMM Nodes
Node  Average amplitude  Average period  Phase code
1     1.649 µm           0.119 sec       P (primary wave)
2     8.353 µm           0.803 sec       P (primary wave)
3     23.237 µm          0.898 sec       P (primary wave)
4     87.324 µm          0.997 sec       P (primary wave)
5     253.333 µm         1.282 sec       P (primary wave)
6     270.524 µm         0.96 sec        P (primary wave)
7     7.719 µm           20.4 sec        P (primary wave)
8     723.088 µm         1.962 sec       P (primary wave)
9     1938.772 µm        1.2 sec         P (primary wave)
17Hierarchical EMM
18Now What?
DATA NEEDED
Interest DM COMMUNITY
NOISE MAY NOT BE BAD
KDD CUP