Title: iTree: Exploring Time-Varying Data using Indexable Tree
- Yi Gu and Chaoli Wang
- Michigan Technological University
- Presented at IEEE Pacific Visualization Symposium
- 28 February 2013
- Sydney, Australia
TAC-based time-varying data visualization
- Time-activity curve (TAC)
  - Time-varying medical imaging data [Fang et al. 2007]
  - Importance analysis
  - Multiscale data clustering
  - Temporal sequencing
  - Trend identification
- What can iTree do for us?
  - Handle the ever-growing size and complexity of data (efficient data compaction)
  - Index and query TACs adaptively (effective data indexing)
  - Interact with space-time data (intuitive visual exploration)
Symbolic Aggregate approXimation (SAX)
[Figure from Keogh's SIGKDD 2007 tutorial slide]
First, convert the time series to a piecewise aggregate approximation (PAA) representation; then convert the PAA to symbols.
The conversion takes linear time [Lin et al. 2003].
A SAX word can be represented by symbols (e.g., a, b, c) or by bits (e.g., 00, 01, 10 or 02, 12, 22).
Example: baabccbc (word length 8, bit cardinality 2)
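A minimal Python sketch of the two-step conversion above, assuming the classic SAX setup of Lin et al. [2003]: the series is z-normalized and the breakpoints cut the standard normal distribution into equiprobable regions. The function names and the requirement that the series length be a multiple of the word length are assumptions of this sketch; iTree replaces the Gaussian breakpoints with transfer-function-based ones.

```python
from bisect import bisect_right
from statistics import NormalDist, mean, pstdev
from string import ascii_lowercase


def paa(series, w):
    """Average the series over w equal-length frames (assumes len(series) % w == 0)."""
    n = len(series)
    if w <= 0 or n % w != 0:
        raise ValueError("this sketch assumes len(series) is a multiple of w")
    k = n // w
    return [mean(series[i * k:(i + 1) * k]) for i in range(w)]


def gaussian_breakpoints(alphabet_size):
    """Classic SAX breakpoints: equiprobable cuts of N(0, 1)."""
    nd = NormalDist()
    return [nd.inv_cdf(i / alphabet_size) for i in range(1, alphabet_size)]


def sax_word(series, w, alphabet_size):
    """z-normalize, reduce to PAA, then map each PAA value to a letter."""
    mu, sigma = mean(series), pstdev(series)
    z = [(x - mu) / sigma if sigma > 0 else 0.0 for x in series]
    cuts = gaussian_breakpoints(alphabet_size)
    # bisect_right counts the breakpoints at or below each PAA value,
    # which is exactly the 0-based symbol index.
    return "".join(ascii_lowercase[bisect_right(cuts, v)] for v in paa(z, w))


if __name__ == "__main__":
    print(sax_word([1, 2, 3, 4, 5, 6, 7, 8], w=4, alphabet_size=4))  # abcd
```

Both steps make a single pass over the data, which matches the linear-time claim.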
SAX for time-varying volume data (1)
- Handle time-varying data
  - Use groups of voxels over time intervals, going through the data voxel by voxel for the 1st time step, then the 2nd, etc.
- Modify the original SAX/iSAX algorithms to
  - Better differentiate SAX words (effectiveness)
  - Improve computational performance (efficiency)
  - Make iSAX amenable to visual mapping (visualization)
- PAA conversion
  - Convert a TAC T of length n to a PAA C of length w
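For reference, the standard PAA definition [Lin et al. 2003], written for a TAC $T = t_1, \dots, t_n$ and its PAA $C = c_1, \dots, c_w$, assuming $n$ is divisible by $w$: each coefficient is the mean of one of $w$ equal-length frames.

```latex
c_i = \frac{w}{n} \sum_{j = \frac{n}{w}(i-1) + 1}^{\frac{n}{w} i} t_j, \qquad i = 1, \dots, w
```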
SAX for time-varying volume data (2)
- Transfer-function-based breakpoint identification
  - H: the histogram obtained by applying a logarithm to the original histogram and normalizing it
  - H': a new histogram obtained by multiplying H by the opacity value
[Figure panels: Before / After]
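The slide does not spell out how the breakpoints are read off H'. A minimal sketch, assuming (by analogy with the equiprobable Gaussian cuts of classic SAX) that the breakpoints divide the total mass of H' into equal parts; `tf_breakpoints`, its arguments, and the use of log(1 + count) are assumptions, not the authors' code.

```python
import math


def tf_breakpoints(counts, opacity, bin_edges, alphabet_size):
    """Hypothetical transfer-function-based breakpoints.

    counts:    original histogram (one count per bin)
    opacity:   transfer-function opacity per bin, in [0, 1]
    bin_edges: len(counts) + 1 increasing bin boundaries
    Returns alphabet_size - 1 breakpoints (bin edges); an edge may repeat
    when a single bin carries a lot of mass.
    """
    # H: logarithm of the original histogram, normalized to sum to 1
    logged = [math.log1p(c) for c in counts]
    total = sum(logged)
    if total == 0:
        raise ValueError("empty histogram")
    h = [v / total for v in logged]
    # H': H weighted by opacity; zero-opacity bins contribute no mass,
    # so breakpoints concentrate in the visible value ranges
    h_prime = [hv * op for hv, op in zip(h, opacity)]
    mass = sum(h_prime)
    if mass == 0:
        raise ValueError("transfer function is fully transparent")
    # Assumption: place breakpoints at equal fractions of the cumulative mass of H'
    targets = [k * mass / alphabet_size for k in range(1, alphabet_size)]
    breakpoints, cum, t = [], 0.0, 0
    for i, v in enumerate(h_prime):
        cum += v
        while t < len(targets) and cum >= targets[t]:
            breakpoints.append(bin_edges[i + 1])
            t += 1
    return breakpoints
```

For example, `tf_breakpoints([1, 1, 1, 1], [0, 0, 1, 1], [0, 1, 2, 3, 4], 2)` puts its single breakpoint at 3, in the middle of the opaque range [2, 4], rather than at 2 as an unweighted equal-mass split would.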
SAX for time-varying volume data (3)
- SAX word generation
  - Construct an alphabet F and transform C into an array of symbols Ĉ to form a SAX word
- Distance between two symbols
- Distance between two SAX words
  - The distance between two SAX words is a lower bound of the Euclidean distance defined on the PAA representation
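The symbol and word distances were shown as formulas on the original slide. A minimal sketch of the standard SAX versions from Lin et al. [2003], assuming symbols are 0-based indices and `breakpoints` is any sorted list of a - 1 cut values (Gaussian or transfer-function-based); the function names are illustrative. Equal or adjacent symbols get distance 0, which is what keeps the word distance a lower bound.

```python
import math


def symbol_dist(r, c, breakpoints):
    """Lower-bounding distance between two SAX symbols (0-based indices)."""
    if abs(r - c) <= 1:
        return 0.0  # equal or adjacent intervals may hold arbitrarily close values
    # Gap between the upper edge of the smaller symbol's interval
    # and the lower edge of the larger symbol's interval
    return breakpoints[max(r, c) - 1] - breakpoints[min(r, c)]


def sax_mindist(word_q, word_c, breakpoints, n):
    """MINDIST between two SAX words of length w for series of length n."""
    if len(word_q) != len(word_c):
        raise ValueError("words must have the same length")
    w = len(word_q)
    total = sum(symbol_dist(q, c, breakpoints) ** 2 for q, c in zip(word_q, word_c))
    return math.sqrt(n / w) * math.sqrt(total)


if __name__ == "__main__":
    cuts = [-1.0, 0.0, 1.0]  # toy breakpoints for a 4-letter alphabet
    print(sax_mindist([0, 3], [0, 0], cuts, n=8))  # 4.0
```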
SAX lower bounding
[Figure: exact (Euclidean) distance $D(Q,S)$ on the raw data Q and S vs. lower-bounding distance $D_{LB}(Q,S)$ on their approximate representations]
Lower bounding means that for all Q and S, we have $D_{LB}(Q,S) \le D(Q,S)$.
[Figure from Keogh's SIGKDD 2007 tutorial slide]
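Why this matters for pruning: in the standard SAX setting [Lin et al. 2003], writing $\bar{q}_i, \bar{s}_i$ for the PAA coefficients and $\hat{q}_i, \hat{s}_i$ for the SAX symbols of two length-$n$ series, the distances chain as follows, so any candidate whose lower bound already exceeds a query threshold can be discarded without touching the raw data.

```latex
D_{LB}(Q,S)
  = \sqrt{\tfrac{n}{w}} \sqrt{\sum_{i=1}^{w} \operatorname{dist}(\hat{q}_i, \hat{s}_i)^2}
  \;\le\; \sqrt{\tfrac{n}{w}} \sqrt{\sum_{i=1}^{w} (\bar{q}_i - \bar{s}_i)^2}
  \;\le\; \sqrt{\sum_{j=1}^{n} (q_j - s_j)^2}
  = D(Q,S)
```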
SAX construction time (in seconds)
A word length of 8 to 12 and 16 to 32 quantization levels give an appropriate tradeoff between quality and speed.
Constructing SAX takes less than 10 minutes, excluding I/O time.
iSAX for time-varying volume data (1)
- iSAX organizes SAX words hierarchically
  - A node represents a set of TACs with the same or similar SAX words
  - A node is split when the number of SAX words exceeds a certain threshold
- How to split?
  - The original iSAX splits the left-most symbol with the smallest bit cardinality
  - We split the symbol covering the largest value range
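The two splitting rules can be contrasted in a short sketch. It assumes each iSAX symbol is stored as a bit prefix plus its current bit cardinality, and that `edges` holds the 2^max_bits + 1 finest-level breakpoints over a finite data range (as with transfer-function-based breakpoints); `ISaxSymbol`, `value_range`, and the function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class ISaxSymbol:
    prefix: int  # leading bits of the symbol seen so far
    bits: int    # current bit cardinality of this symbol


def value_range(sym: ISaxSymbol, edges: Sequence[float], max_bits: int) -> float:
    """Width of the data-value interval covered by a (possibly coarse) symbol."""
    shift = max_bits - sym.bits
    lo = sym.prefix << shift          # first finest-level symbol it covers
    hi = (sym.prefix + 1) << shift    # one past the last one
    return edges[hi] - edges[lo]


def split_original(word: Sequence[ISaxSymbol]) -> int:
    """Original iSAX rule: left-most symbol with the smallest bit cardinality."""
    fewest = min(s.bits for s in word)
    return next(i for i, s in enumerate(word) if s.bits == fewest)


def split_by_range(word: Sequence[ISaxSymbol], edges: Sequence[float], max_bits: int) -> int:
    """Rule on the slide: the symbol covering the largest value range."""
    candidates = [i for i, s in enumerate(word) if s.bits < max_bits]
    if not candidates:
        raise ValueError("every symbol is already at full cardinality")
    # max() returns the first maximal element, i.e. the left-most on ties
    return max(candidates, key=lambda i: value_range(word[i], edges, max_bits))


def split(word: Sequence[ISaxSymbol], i: int) -> Tuple[List[ISaxSymbol], List[ISaxSymbol]]:
    """Create two children by adding one bit to symbol i."""
    s = word[i]
    left, right = list(word), list(word)
    left[i] = ISaxSymbol(s.prefix << 1, s.bits + 1)
    right[i] = ISaxSymbol((s.prefix << 1) | 1, s.bits + 1)
    return left, right
```

With `max_bits = 2`, `edges = [0, 1, 2, 3, 10]` and a node word of two 1-bit symbols `0` and `1`, the original rule picks symbol 0, while the range rule picks symbol 1 because it covers [2, 10] rather than [0, 2].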
Comparison
[Figure: original breakpoint identification and symbol splitting vs. our new breakpoint identification and symbol splitting]
iSAX for time-varying volume data (2)
- iSAX construction
  - Voxel IDs for each terminal node are saved into a file
  - The SAX word itself is used as the file name to facilitate search
- Out-of-core acceleration strategy
  - Partition all voxels or groups into at most $2^w$ buckets and save each non-empty bucket into a file
  - Choose the file with the largest voxel/group count and split it if the count is larger than a threshold $d_n$
  - Repeat until no file is larger than $d_n$
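The out-of-core strategy can be simulated in memory to show its control flow. In this sketch a dict stands in for the index files (each key is the SAX word that would serve as a file name), every voxel starts from a full-cardinality SAX word with `max_bits` bits per symbol, and an oversized bucket is refined by adding one bit to its left-most symbol with the fewest bits; that refinement rule and all names are assumptions, and a real implementation would stream buckets to and from disk.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Key = Tuple[Tuple[int, int], ...]  # ((prefix, bits), ...) per symbol


def key_for(word: Sequence[int], bits: Sequence[int], max_bits: int) -> Key:
    """Truncate each full-cardinality symbol to its current number of bits."""
    return tuple((s >> (max_bits - b), b) for s, b in zip(word, bits))


def build_buckets(words: Dict[int, Sequence[int]], max_bits: int, d_n: int) -> Dict[Key, List[int]]:
    """words maps voxel ID -> SAX word with symbols in [0, 2**max_bits)."""
    w = len(next(iter(words.values())))
    buckets: Dict[Key, List[int]] = defaultdict(list)
    # Initial pass: 1 bit per symbol gives at most 2**w non-empty buckets
    for vid, word in words.items():
        buckets[key_for(word, [1] * w, max_bits)].append(vid)
    while True:
        # Oversized buckets that can still be refined
        candidates = [k for k, v in buckets.items()
                      if len(v) > d_n and any(b < max_bits for _, b in k)]
        if not candidates:
            break
        key = max(candidates, key=lambda k: len(buckets[k]))  # largest "file" first
        bits = [b for _, b in key]
        i = min((j for j, b in enumerate(bits) if b < max_bits), key=lambda j: bits[j])
        bits[i] += 1
        for vid in buckets.pop(key):  # redistribute the split bucket
            buckets[key_for(words[vid], bits, max_bits)].append(vid)
    return dict(buckets)
```

A bucket whose symbols are all at full cardinality cannot be refined further, so in this sketch it may remain larger than `d_n`.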
iSAX for time-varying volume data (3)
- Approximate and exact search
  - Both take a PAA representation and a threshold d as input
  - Approximate search only compares the SAX word converted from the input PAA with each index file name, keeping the files whose distance is less than d
  - Exact search needs an additional step: compute the PAA-based distance to the input PAA and return those voxels whose distance is less than d
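A minimal sketch of the two search modes, assuming the index is a mapping from each file-name SAX word to the voxel IDs stored in that file, and that `word_dist` is a caller-supplied distance between SAX words; all names are hypothetical. As long as `word_dist` lower-bounds the PAA-based distance to every voxel in a file, filtering files by it cannot drop a voxel that the exact test would accept.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

Word = Tuple[int, ...]


def approximate_search(index: Dict[Word, List[int]], query_word: Word, d: float,
                       word_dist: Callable[[Word, Word], float]) -> List[int]:
    """Compare the query's SAX word with file names only; return whole files."""
    hits: List[int] = []
    for file_word, voxel_ids in index.items():
        if word_dist(query_word, file_word) < d:
            hits.extend(voxel_ids)
    return hits


def paa_dist(a: Sequence[float], b: Sequence[float], n: int) -> float:
    """PAA-based distance for series of original length n."""
    w = len(a)
    return math.sqrt(n / w) * math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def exact_search(index: Dict[Word, List[int]], query_word: Word, query_paa: Sequence[float],
                 paa_of: Dict[int, Sequence[float]], n: int, d: float,
                 word_dist: Callable[[Word, Word], float]) -> List[int]:
    """Approximate search as a filter, then refine with the PAA-based distance."""
    candidates = approximate_search(index, query_word, d, word_dist)
    return [v for v in candidates if paa_dist(query_paa, paa_of[v], n) < d]
```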
iTree (1)
- From the iSAX (internal) hierarchy to the iTree (external)
  - The number of non-empty children of the root is fairly large; solution: level promoting
  - iSAX has a large number of hierarchy levels with a small fanout (2); solution: sibling grouping
  - Sibling nodes are not arranged according to their similarity; solution: sibling reordering
- Resulting properties
  - The height of the iTree is determined by the maximal bit cardinality used to represent any symbol in the SAX words
  - The iTree is balanced: no node has an excessively large fanout
  - Neighboring sibling nodes have a higher degree of similarity in terms of spatial closeness and temporal trend
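The slide names sibling reordering but not its algorithm. As a stand-in only, here is a generic greedy nearest-neighbor ordering: starting from the first sibling, repeatedly append the most similar remaining sibling under a caller-supplied distance (for example, one combining spatial closeness and temporal trend). This is a swapped-in heuristic, not necessarily the reordering used in iTree, and `greedy_sibling_order` is a hypothetical name.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def greedy_sibling_order(siblings: Sequence[T], dist: Callable[[T, T], float]) -> List[T]:
    """Order siblings so that each one is followed by its nearest remaining neighbor."""
    if not siblings:
        return []
    remaining = list(siblings)
    order = [remaining.pop(0)]  # keep the first sibling as the anchor
    while remaining:
        last = order[-1]
        j = min(range(len(remaining)), key=lambda k: dist(last, remaining[k]))
        order.append(remaining.pop(j))
    return order
```

The greedy chain costs O(m^2) distance evaluations for m siblings, which is cheap once sibling grouping has bounded the fanout.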
iTree (2)
- iTree drawing and focus+context visualization
  - Hyperbolic layout [Lamping and Rao 1996]
    - Accommodates a large number of nodes
    - Allows focus+context interaction
  - A time ring is added to indicate the time dimension
- Query in multiple coordinated views (volume view, iTree view, and SAX view)
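In the hyperbolic browser of Lamping and Rao, nodes are laid out in the Poincaré disk and a change of focus is a transformation of the disk onto itself. A minimal sketch of one such map, the Möbius transformation z ↦ (z - c)/(1 - conj(c)·z) that sends a chosen focus point c to the center; node positions are complex numbers inside the unit disk, and `refocus` is a hypothetical helper, not part of iTree.

```python
from typing import List, Sequence


def refocus(points: Sequence[complex], focus: complex) -> List[complex]:
    """Map the unit disk onto itself so that `focus` moves to the origin.

    Points near the new focus spread out (magnified) while distant points
    are compressed toward the rim, giving the focus+context effect.
    """
    if abs(focus) >= 1:
        raise ValueError("focus must lie strictly inside the unit disk")
    return [(z - focus) / (1 - focus.conjugate() * z) for z in points]


if __name__ == "__main__":
    nodes = [0.5 + 0j, 0.9 + 0j, -0.9 + 0j]
    moved = refocus(nodes, 0.5 + 0j)
    # The focused node lands at the center; all nodes stay inside the disk
    print([round(abs(z), 3) for z in moved])
```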
iSAX/iTree construction time (in seconds)
Going from iSAX to iTree reduces the number of nodes by an order of magnitude.
Brute-force, approximate, and exact search time (in seconds)
Brute-force search uses no indexing scheme; it simply scans the PAA representation of the data to identify similar voxels.
The time cost of approximate search increases only slightly when going from the current time interval to all time steps, because it only uses the names of the index files for distance computation.
Summary
- iTree
  - A data organization, visual representation, and user interaction framework for time-varying data analysis and visualization
  - Applicable to big time-varying data sets
- Limitations
  - Breakpoint identification depends on the input transfer function
  - Blockwise TACs lead to block discontinuity in data classification
- Future work
  - Motif finding (locating previously unknown, frequently occurring patterns)
  - Time-varying multivariate data
- Acknowledgements
  - U.S. National Science Foundation