iTree: Exploring Time-Varying Data using Indexable Tree

Yi Gu and Chaoli Wang, Michigan Technological University. Presented at IEEE Pacific Visualization Symposium.

Transcript and Presenter's Notes


1
iTree: Exploring Time-Varying Data using
Indexable Tree
  • Yi Gu and Chaoli Wang
  • Michigan Technological University
  • Presented at IEEE Pacific Visualization Symposium
  • 28 February 2013
  • Sydney, Australia

2
TAC-based time-varying data visualization
  • Time-activity curve (TAC)
  • Time-varying medical imaging data [Fang et al.
    2007]
  • Importance analysis
  • Multiscale data clustering
  • Temporal sequencing
  • Trend identification
  • What can iTree do for us?
  • Handle ever-growing size and complexity
    (efficient data compacting)
  • Index and query TACs adaptively (effective data
    indexing)
  • Interact with space-time data (intuitive visual
    exploration)

3
(No Transcript)
4
Symbolic Aggregate ApproXimation (SAX)
Keogh's SIGKDD 2007 tutorial slide
First convert the time series to piecewise
aggregate approximation (PAA) representation,
then convert the PAA to symbols
It takes linear time [Lin et al. 2003]
A SAX word can be represented by symbols (e.g., a,
b, c) or bits (e.g., 00, 01, 10 or 0^2, 1^2, 2^2)
Example: baabccbc (word length 8, bit cardinality 2)
5
SAX for time-varying volume data (1)
  • Handle time-varying data
  • Group voxels over time intervals by going
    through the data voxel by voxel for the 1st time
    step, then the 2nd, etc.
  • Modify the original SAX/iSAX algorithms to
  • Better differentiate SAX words (effectiveness)
  • Improve computational performance (efficiency)
  • Make iSAX amenable for visual mapping
    (visualization)
  • PAA conversion
  • Convert a TAC T of length n to a PAA C of length w
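
The PAA conversion step can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `paa` is hypothetical, and the sketch assumes n is an exact multiple of w so that every segment has the same length.

```python
def paa(tac, w):
    """Reduce a time-activity curve of length n to w segment means.

    Assumption for this sketch: n is a positive multiple of w.
    """
    n = len(tac)
    if w <= 0 or n % w != 0:
        raise ValueError("this sketch requires n to be a positive multiple of w")
    seg = n // w  # number of time steps averaged into each PAA value
    return [sum(tac[i * seg:(i + 1) * seg]) / seg for i in range(w)]


tac = [1.0, 3.0, 2.0, 4.0, 6.0, 8.0, 5.0, 7.0]  # made-up TAC, n = 8
paa_values = paa(tac, 4)
print(paa_values)  # [2.0, 3.0, 7.0, 6.0]
```

Each PAA value is the mean of one segment, so the whole conversion takes a single pass over the curve.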

6
SAX for time-varying volume data (2)
  • Transfer function based breakpoint identification
  • H: histogram after logarithm and normalization
    of the original histogram
  • H': new histogram obtained by multiplying H by
    the opacity value
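
A rough sketch of how an opacity-weighted histogram could drive breakpoint selection is given below. The slide states only how the two histograms are formed; the use of log(1 + count), the normalization by the peak, and the rule of cutting the weighted histogram into parts of equal mass are all assumptions made for illustration, and the function names are hypothetical.

```python
import math


def weighted_histogram(counts, opacity):
    """Log-scale and normalize the histogram, then weight each bin by opacity."""
    logged = [math.log(1 + c) for c in counts]  # log(1 + c) avoids log(0); an assumption
    peak = max(logged)
    h = [v / peak if peak > 0 else 0.0 for v in logged]
    return [hv * op for hv, op in zip(h, opacity)]


def equal_mass_breakpoints(h_weighted, levels):
    """Bin indices that cut the weighted histogram into `levels` parts of
    roughly equal mass (an assumed rule, not taken from the slides)."""
    total = sum(h_weighted)
    cuts, running, k = [], 0.0, 1
    for i, v in enumerate(h_weighted):
        running += v
        while k < levels and running >= total * k / levels:
            cuts.append(i)
            k += 1
    return cuts


counts = [0, 9, 99, 999, 99, 9, 0, 0]                # made-up voxel counts per bin
opacity = [0.0, 0.0, 0.5, 1.0, 1.0, 1.0, 0.0, 0.0]   # made-up transfer function
h_weighted = weighted_histogram(counts, opacity)
cuts = equal_mass_breakpoints(h_weighted, 3)
print(cuts)  # [3, 4]
```

Bins with zero opacity contribute nothing to the weighted histogram, so under this rule the breakpoints concentrate in the value ranges the transfer function makes visible.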

(Figure panels: Before, After)
7
SAX for time-varying volume data (3)
  • SAX word generation
  • Construct an alphabet F and transform C into an
    array of symbols Ĉ to form a SAX word
  • Distance between two symbols
  • Distance between two SAX words
  • Distance between two SAX words is a lower bound
    of the Euclidean distance defined on the PAA
    representation
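
The sketch below illustrates SAX word generation and the two distances using the standard SAX definitions from Lin et al.: two symbols that are equal or adjacent have distance 0, otherwise the distance is the gap between their value intervals. The breakpoints are made up, symbols are encoded as integers rather than letters, and the slides' own distance formulas are not in the transcript, so this is an assumed formulation rather than the authors' exact one.

```python
import bisect
import math


def sax_word(paa_values, breakpoints):
    """Map each PAA value to the index of the interval it falls in."""
    return [bisect.bisect_right(breakpoints, v) for v in paa_values]


def symbol_dist(a, b, breakpoints):
    """Gap between the intervals of symbols a and b; 0 if equal or adjacent."""
    lo, hi = min(a, b), max(a, b)
    if hi - lo <= 1:
        return 0.0
    return breakpoints[hi - 1] - breakpoints[lo]


def word_dist(x, y, breakpoints):
    """Distance between two SAX words of equal length."""
    return math.sqrt(sum(symbol_dist(a, b, breakpoints) ** 2 for a, b in zip(x, y)))


breakpoints = [0.25, 0.5, 0.75]      # hypothetical breakpoints, 4 symbols
q = [0.1, 0.3, 0.9, 0.6]             # made-up PAA representations
s = [0.8, 0.35, 0.2, 0.55]
qw, sw = sax_word(q, breakpoints), sax_word(s, breakpoints)
lower = word_dist(qw, sw, breakpoints)
exact = math.sqrt(sum((a - b) ** 2 for a, b in zip(q, s)))
print(qw, sw)          # [0, 1, 3, 2] [3, 1, 0, 2]
print(lower <= exact)  # True
```

Because every value in a symbol's interval is at least the interval gap away from every value in a non-adjacent interval, the word distance can never exceed the PAA-based Euclidean distance.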

8
SAX lower bounding
Exact (Euclidean) distance D(Q,S)
Lower bounding distance DLB(Q,S)
(Figure: Q and S shown as raw data with D(Q,S), and as approximate representations with DLB(Q,S))
Lower bounding means that for all Q and S, we
have
DLB(Q,S) ≤ D(Q,S)
Keogh's SIGKDD 2007 tutorial slide
9
SAX construction (in sec)
A word length of 8 to 12 and 16 to 32
quantization levels are appropriate for the
quality and speed tradeoff
SAX construction takes less than 10 minutes,
excluding I/O time
10
iSAX for time-varying volume data (1)
  • iSAX organizes SAX words hierarchically
  • A node represents a set of TACs with the same or
    similar SAX words
  • Split a node when the number of SAX words exceeds
    a certain threshold
  • How to split?
  • The original iSAX chooses the symbol with the
    left-most smallest bit cardinality to split
  • We choose a symbol covering the largest value
    range to split
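
The two splitting rules can be contrasted with a small sketch. The node representation here is hypothetical: each symbol of a node's SAX word is described by its bit cardinality and by the (low, high) value interval it covers, and breaking ties toward the left-most symbol in the second rule is an assumption.

```python
def split_original(bit_cards):
    """Original iSAX rule: left-most symbol with the smallest bit cardinality."""
    return bit_cards.index(min(bit_cards))


def split_by_range(ranges):
    """Modified rule: symbol whose value interval is widest (ties go left)."""
    widths = [hi - lo for lo, hi in ranges]
    return widths.index(max(widths))


bit_cards = [2, 1, 1, 2]                                     # made-up cardinalities
ranges = [(0.0, 0.1), (0.1, 0.3), (0.3, 1.0), (0.2, 0.25)]   # made-up intervals
print(split_original(bit_cards))  # 1
print(split_by_range(ranges))     # 2
```

With non-uniform breakpoints, symbols of equal bit cardinality can cover very different value ranges, which is why the two rules can pick different symbols.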

11
Comparison
Original breakpoint identification and symbol
splitting
Our new breakpoint identification and symbol
splitting
12
(No Transcript)
13
iSAX for time-varying volume data (2)
  • iSAX construction
  • Voxel IDs for each terminal node are saved into a
    file
  • Use the SAX word itself as the file name to
    facilitate search
  • Out-of-core acceleration strategy
  • Partition all voxels or groups into at most 2^w
    buckets and save each non-empty bucket into a
    file
  • Choose the file with the largest voxel/group
    count to split if larger than a threshold dn
  • Continue this until no file is larger than dn

14
iSAX for time-varying volume data (3)
  • Approximate and exact search
  • Both take the PAA representation and a threshold
    d as input
  • Approximate search only compares each file name
    with the SAX word converted from the input PAA
    and returns the files whose distance is less
    than d
  • Exact search needs an additional step: compute
    the PAA-based distance to the input PAA and
    return those voxels whose distance is less
    than d
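
The difference between the two searches can be sketched with an in-memory dictionary standing in for the index files, keyed by SAX word in place of a file name. The breakpoints, voxel data and function names are made up for illustration, and the word distance is the standard SAX one, which may differ from the authors' definition.

```python
import bisect
import math

BREAKPOINTS = [0.25, 0.5, 0.75]  # hypothetical breakpoints, 4 symbols


def to_word(paa_values):
    return tuple(bisect.bisect_right(BREAKPOINTS, v) for v in paa_values)


def word_dist(x, y):
    total = 0.0
    for a, b in zip(x, y):
        lo, hi = min(a, b), max(a, b)
        if hi - lo > 1:  # equal or adjacent symbols contribute 0
            total += (BREAKPOINTS[hi - 1] - BREAKPOINTS[lo]) ** 2
    return math.sqrt(total)


def paa_dist(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def approximate_search(index, query_paa, d):
    """Compare only the index keys (the 'file names') with the query word."""
    qw = to_word(query_paa)
    return [vid for word, entries in index.items() if word_dist(word, qw) < d
            for vid, _ in entries]


def exact_search(index, query_paa, d):
    """Same filter, then check each candidate's PAA-based distance."""
    qw = to_word(query_paa)
    hits = []
    for word, entries in index.items():
        if word_dist(word, qw) < d:
            hits.extend(vid for vid, p in entries if paa_dist(p, query_paa) < d)
    return hits


voxels = {0: [0.1, 0.3, 0.9, 0.6], 1: [0.2, 0.4, 0.8, 0.7],
          2: [0.8, 0.35, 0.2, 0.55], 3: [0.1, 0.3, 0.55, 0.6]}
index = {}
for vid, p in voxels.items():
    index.setdefault(to_word(p), []).append((vid, p))

query = [0.15, 0.3, 0.85, 0.6]
print(approximate_search(index, query, 0.2))  # [0, 1, 3]
print(exact_search(index, query, 0.2))        # [0, 1]
```

Voxel 3 passes the approximate search because its word differs from the query word only by an adjacent symbol, but the exact search rejects it once the PAA-based distance is computed.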

15
iTree (1)
  • From iSAX (internal) hierarchy to iTree
    (external)
  • Number of non-empty children of the root is
    fairly large
  • Solution: level promoting
  • iSAX has a large number of hierarchy levels with
    a small fanout (2)
  • Solution: sibling grouping
  • Sibling nodes are not arranged according to their
    similarity
  • Solution: sibling reordering
  • Resulting properties
  • The height of the iTree is determined by the
    maximal bit cardinality for representing any
    symbol in the SAX words
  • The iTree is balanced: no node has an excessively
    large fanout
  • Neighboring sibling nodes have a higher degree of
    similarity in terms of spatial closeness and
    temporal trend

16
iTree (2)
  • iTree drawing and focus+context visualization
  • Hyperbolic layout [Lamping and Rao 1996]
  • Accommodate a large number of nodes
  • Allow focus+context interaction
  • Add the time ring to indicate the time dimension
  • Query in multiple coordinated views (volume view,
    iTree view and SAX view)

17
iSAX/iTree construction (in sec)
The number of nodes is reduced by an order of
magnitude from iSAX to iTree
18
Brute-force/approx./exact search (in sec)
Brute-force search does not use any indexing
scheme but simply goes over the PAA
representation of data for identifying similar
voxels
The time cost for approx. search does not
increase much from the current interval to all
time steps (it only uses the names of index
files for distance computation)
19
(No Transcript)
20
(No Transcript)
21
Summary
  • iTree
  • Data organization, visual representation and user
    interaction framework for time-varying data
    analysis and visualization
  • Applicable for tackling big time-varying data
    sets
  • Limitations
  • Breakpoint identification depends on input
    transfer function
  • Blockwise TACs lead to block discontinuity in
    data classification
  • Future work
  • Motif finding (locate previously unknown,
    frequently occurring patterns)
  • Time-varying multivariate data
  • Acknowledgements
  • U.S. National Science Foundation