Query, Analysis, and Visualization of Hierarchically Structured Data using Polaris

1
Query, Analysis, and Visualization of
Hierarchically Structured Data using Polaris
  • Chris Stolte, Diane Tang, Pat Hanrahan
  • July 2002

2
Motivation
  • Large databases have become very common
  • Corporate data warehouses
  • Amazon, Walmart, ...
  • Scientific projects
  • Human Genome Project
  • Sloan Digital Sky Survey
  • Need tools to extract meaning from these
    databases
  • Programmatic data mining/statistical analysis
  • Visual exploration and analysis

3
Hierarchical Structure
  • Challenge: these databases are very large
  • Queries cannot visit every record
  • Visualizations cannot display every record
  • Analysts have augmented databases with
    hierarchical structure
  • Provide meaningful levels of abstraction
  • Leveraged by both computer and analyst
  • Derived from semantics or programmatic analysis
  • Tools need to take advantage of these hierarchies

4
Contributions
  • Interactive tool for analysis of data warehouses
    with hierarchical structure
  • Based on Polaris
  • Rapid construction of table-based visualizations
  • Algebraic formalism
  • Analysis of flat relational databases
  • To support hierarchies, we need to extend
  • User interface
  • Algebraic formalism
  • Generation of data queries
  • C. Stolte, D. Tang, and P. Hanrahan. Polaris: A
    System for Query, Analysis, and Visualization of
    Multi-dimensional Relational Databases. In IEEE
    Transactions on Visualization and Computer
    Graphics, January 2002.

5
Outline
  • Review of Polaris
  • Demo
  • Formalism
  • Hierarchies and Data Cubes
  • Extensions to Polaris
  • Demo
  • Formalism
  • Discussion

6
Schema: Denormalized Relation
  • Ordinal fields (categorical): Market, State, Year, Quarter, Month,
    Product Type, Product
  • Quantitative fields (metrics): Profit, Sales, Payroll, Marketing,
    Inventory, Margin, COGS, ...
  • Hypothetical nation-wide coffee chain data (courtesy Visual Insights)

7
Demo I: Original Polaris

8
Polaris Review
  • Provide an interface for rapidly and
    incrementally generating table-based graphical
    displays
  • Users construct visualizations via a
    drag-and-drop interface
  • Queries are automatically generated
  • Interface is simple and expressive because it is
    built upon a formalism

9
Polaris Formalism
  • UI interpreted as visual specification that
    defines
  • table configuration
  • type of graphic in each pane
  • encoding of data as visual properties of marks
  • data transformations
  • Specification automatically compiled into the
    necessary queries and drawing commands

11
Specifying Table Configurations
  • Interface: define table configuration by dropping
    fields on shelves
  • Formalism: shelf content interpreted as
    expressions in table algebra

12
Table Algebra
  • Operands are the database fields
  • each operand interpreted as a set
  • quantitative and ordinal fields interpreted
    differently
  • Three operators
  • concatenation (+), cross (x), nest (/)

13
Table Algebra Operands
  • Ordinal fields: interpret domain as a set that
    partitions table into rows and columns
  • Quarter = {(Qtr1), (Qtr2), (Qtr3), (Qtr4)}
  • Quantitative fields: treat domain as single
    element set and encode spatially as axes
  • Profit = {(Profit[-410,650])}
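
The two set interpretations above can be sketched in a few lines of Python. This is only an illustration, not the Polaris implementation: the sample rows, the function names `ordinal_set` and `quantitative_set`, and the tuple layout are all assumptions made for the example.

```python
from typing import List, Tuple

# Hypothetical sample rows (quarter, profit); values are illustrative only.
ROWS = [("Qtr1", -410.0), ("Qtr2", 120.0), ("Qtr3", 650.0),
        ("Qtr4", 80.0), ("Qtr1", 30.0)]


def ordinal_set(values: List[str]) -> List[Tuple[str]]:
    """One single-element tuple per distinct domain value, in first-seen order."""
    entries: List[Tuple[str]] = []
    for value in values:
        if (value,) not in entries:
            entries.append((value,))
    return entries


def quantitative_set(name: str, values: List[float]) -> List[Tuple[str, float, float]]:
    """A single-element set: the field name together with the extent of its domain."""
    return [(name, min(values), max(values))]


quarter = ordinal_set([row[0] for row in ROWS])
profit = quantitative_set("Profit", [row[1] for row in ROWS])
```

Here `quarter` has four entries, so it would partition a table into four rows or columns, while `profit` has exactly one entry, which would become a single axis.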

14
Concatenation (+) operator
  • Ordered union of set interpretations

Profit + Sales = {(Profit[-310,620]), (Sales[0,1000])}

15
Cross (x) operator
  • Cross-product of set interpretations

Quarter x ProductType = {(Qtr1, Coffee), (Qtr1, Tea), (Qtr2, Coffee),
(Qtr2, Tea), (Qtr3, Coffee), (Qtr3, Tea), (Qtr4, Coffee), (Qtr4, Tea)}
ProductType x Profit
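
As a rough sketch of the concatenation and cross operators, the set interpretations can be held as ordered lists of tuples. The helper names `concat` and `cross`, and the choice to write a quantitative entry as a labelled string, are assumptions for illustration and not part of Polaris itself.

```python
from itertools import product

# Set interpretations as ordered lists of tuples (assumed representation).
quarter = [("Qtr1",), ("Qtr2",), ("Qtr3",), ("Qtr4",)]
product_type = [("Coffee",), ("Tea",)]
profit = [("Profit[-310,620]",)]
sales = [("Sales[0,1000]",)]


def concat(a, b):
    """Ordered union: entries of a, then entries of b not already present."""
    return a + [entry for entry in b if entry not in a]


def cross(a, b):
    """Cross product: each entry of a joined with each entry of b, in order."""
    return [x + y for x, y in product(a, b)]


profit_plus_sales = concat(profit, sales)
quarter_x_type = cross(quarter, product_type)
type_x_profit = cross(product_type, profit)
```

With these inputs `quarter_x_type` holds the eight (quarter, product type) pairs listed on the slide, and `type_x_profit` pairs each product type with the single Profit axis.
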
16
Nest (/) operator
  • Quarter x Month
  • would create twelve entries for each
    quarter, e.g., (Qtr1, December)
  • Quarter / Month
  • would only create three entries per quarter
  • based on tuples in the database, not semantics
  • can be expensive to compute
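
A minimal sketch of nest, assuming the records are available as (quarter, month) pairs: the operator keeps only the combinations that actually occur in the data. The row data and the function name `nest` are hypothetical.

```python
# Hypothetical records projected to (quarter, month); duplicates are expected.
FACT_ROWS = [
    ("Qtr1", "Jan"), ("Qtr1", "Feb"), ("Qtr1", "Mar"),
    ("Qtr2", "Apr"), ("Qtr1", "Jan"), ("Qtr2", "May"),
]


def nest(rows):
    """Keep each (a, b) pair that appears in at least one record, first-seen order."""
    pairs = []
    for pair in rows:
        if pair not in pairs:
            pairs.append(pair)
    return pairs


quarter_nest_month = nest(FACT_ROWS)
```

Because every record has to be scanned to find the pairs that exist, the cost grows with the size of the data, which is one way to read the "expensive to compute" remark above.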

17
Outline
  • Review of Polaris
  • Demo
  • Formalism
  • Hierarchies and Data Cubes
  • Extensions to Polaris
  • Demo
  • Formalism
  • Discussion

18
Data Cubes
  • Structure relation as n-dimensional cube

19
Hierarchies and Data Cubes
  • Each dimension in the cube is structured as a
    tree
  • Each level in tree corresponds to level of detail
  • Nodes correspond to domain values
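
One way to picture a dimension structured as a tree is a nested dictionary in which each depth is one level of detail. The `TIME` data, the `LEVELS` list, and `nodes_at_level` below are illustrative assumptions, not a data structure taken from the talk.

```python
# Hypothetical Time dimension: Year -> Quarter -> Month (leaves are empty dicts).
TIME = {
    "1998": {
        "Qtr1": {"Jan": {}, "Feb": {}, "Mar": {}},
        "Qtr2": {"Apr": {}, "May": {}, "Jun": {}},
    }
}
LEVELS = ["Year", "Quarter", "Month"]


def nodes_at_level(tree, depth):
    """Return the node values found `depth` steps below the top level (0 = top)."""
    if depth == 0:
        return list(tree)
    found = []
    for child in tree.values():
        found.extend(nodes_at_level(child, depth - 1))
    return found


years = nodes_at_level(TIME, LEVELS.index("Year"))
quarters = nodes_at_level(TIME, LEVELS.index("Quarter"))
months = nodes_at_level(TIME, LEVELS.index("Month"))
```

Moving from `years` to `months` corresponds to drilling down; moving the other way corresponds to rolling up.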

20
Hierarchies and Data Cubes
  • Some hierarchies known a priori
  • Provide semantic meaning
  • Time (day, month, year)
  • Location (city, state, country)
  • Can be automatically generated
  • Classification algorithms
  • Clustering
  • Enable analyst to reason at high level of
    abstraction then drill down
  • Interface must expose underlying hierarchical
    structure

21
Hierarchy Model
  • Our model assumes that hierarchies
  • Can be modeled using star or snowflake schema
  • Have uniform depth
  • Have homogeneous node types
  • Other models relax these constraints
  • Chose to focus on model commonly found in
    commercial data warehouse and data cube products

22
Outline
  • Review of Polaris
  • Demo
  • Formalism
  • Hierarchies and Data Cubes
  • Extensions to Polaris
  • Demo
  • Formalism
  • Discussion

23
Schema: Star Schema
  • Dimension tables
  • Time: Year, Quarter, Month
  • Location: Market, State
  • Products: Product Type, Product Name
  • Fact table
  • State, Month, Product
  • Measures: Profit, Sales, Payroll, Marketing, Inventory, Margin, COGS, ...

24
Demo II: Revised Polaris

25
Extending the Formalism
  • Redefine operands as dimension levels and
    measures, not simply database fields
  • Need to define set interpretation of a dimension
    level
  • Domain is not a single ordered list
  • Composed of node values at particular level in
    hierarchy
  • Node values are uniquely defined by the path from
    root node
  • Possible definitions?

26
Set Interpretation Option 1
  • Define set interpretation by listing each node
    value with unique path to root
  • 1998.Qtr1.Jan, ..., 1998.Qtr4.Dec
  • (+) Provides unique set interpretation
  • (-) Limits expressiveness
  • Any table including Months must include Year
  • Not possible to summarize across years (e.g.,
    Total Sales in January for all Years)
  • Not a standard projection of data cube but very
    useful

27
Set Interpretation Option 2
  • Define set interpretation by listing each node
    value without path to root
  • Jan, Feb, ..., Dec
  • Order by depth first traversal
  • Consolidate non-unique values
  • This works, but how do we leverage the known
    relationship between dimension levels?
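
The two candidate set interpretations can be compared side by side in a short sketch. The two-year `TIME` hierarchy and the function names are assumptions made for the example; only the idea (path-qualified values versus consolidated values in depth-first order) comes from the slides.

```python
# Hypothetical Time hierarchy covering two years, each with the same quarters.
QUARTERS = {
    "Qtr1": ["Jan", "Feb", "Mar"],
    "Qtr2": ["Apr", "May", "Jun"],
    "Qtr3": ["Jul", "Aug", "Sep"],
    "Qtr4": ["Oct", "Nov", "Dec"],
}
TIME = {year: QUARTERS for year in ("1998", "1999")}


def qualified_months(tree):
    """Option 1: each month listed with its full path from the root."""
    return [f"{year}.{quarter}.{month}"
            for year, quarters in tree.items()
            for quarter, months in quarters.items()
            for month in months]


def consolidated_months(tree):
    """Option 2: months without paths, depth-first order, repeats merged."""
    merged = []
    for quarters in tree.values():
        for months in quarters.values():
            for month in months:
                if month not in merged:
                    merged.append(month)
    return merged


option1 = qualified_months(TIME)
option2 = consolidated_months(TIME)
```

Option 1 yields 24 entries here, one per year and month, so a January total across all years has no entry of its own. Option 2 yields 12 entries, one of which is a single "Jan" that spans both years.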

28
Dot (.) Operator
  • Nest isn't aware of defined hierarchical
    relationships
  • Year / Month might work, if all data is present
  • Inefficient
  • New operator: Dot (.)
  • Nest computed using the dimension table rather
    than the fact table
  • Sufficient to provide support for aggregation,
    drill down, and roll up in algebra.
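
The difference between nest and dot can be sketched by running the same pair-extraction over two different tables. The table contents, column positions, and the helper `distinct_pairs` are hypothetical; the sketch only illustrates the idea of reading parent-child pairs from the dimension table instead of the fact table.

```python
# Hypothetical Time dimension table: (year, quarter, month), one row per month.
TIME_DIM = [
    ("1998", "Qtr1", "Jan"), ("1998", "Qtr1", "Feb"), ("1998", "Qtr1", "Mar"),
    ("1998", "Qtr2", "Apr"), ("1998", "Qtr2", "May"), ("1998", "Qtr2", "Jun"),
]

# Hypothetical fact rows (quarter, month, profit); several months have no records.
FACT = [("Qtr1", "Jan", 10.0), ("Qtr2", "Apr", 7.0), ("Qtr1", "Jan", 5.0)]


def distinct_pairs(rows, parent_col, child_col):
    """Distinct (parent, child) pairs from the given columns, first-seen order."""
    pairs = []
    for row in rows:
        pair = (row[parent_col], row[child_col])
        if pair not in pairs:
            pairs.append(pair)
    return pairs


# Quarter / Month: derived from whatever records happen to exist.
nest_pairs = distinct_pairs(FACT, 0, 1)
# Quarter . Month: derived from the declared hierarchy in the dimension table.
dot_pairs = distinct_pairs(TIME_DIM, 1, 2)
```

In this example `nest_pairs` misses the months that have no records, while `dot_pairs` lists every month under its quarter and only needs to read the much smaller dimension table.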

29
Generating Queries
  • Queries generated from specification.
  • Panes correspond to either a slice of a
    projection or an aggregation of a projection.
  • Multiple queries required if level-of-detail
    varies.
  • Algebraic manipulation can be used to determine
    minimal set of queries.
  • Interpreter generates SQL, MDX, or Rivet queries.
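
To make the query-generation step concrete, here is a small sketch that builds one SQL aggregation per level of detail and runs it against an in-memory SQLite database. The table names `fact` and `time_dim`, the column names, the use of `SUM`, and the function `build_query` are all assumptions for illustration; the actual Polaris interpreter is not shown in the slides.

```python
import sqlite3


def build_query(levels, measure):
    """Build a GROUP BY query over a hypothetical star schema (fact joined to time_dim)."""
    cols = ", ".join(levels)
    return (
        f"SELECT {cols}, SUM(f.{measure}) AS total "
        "FROM fact f JOIN time_dim t ON f.month = t.month "
        f"GROUP BY {cols} ORDER BY {cols}"
    )


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE time_dim (month TEXT PRIMARY KEY, quarter TEXT, year TEXT);
CREATE TABLE fact (month TEXT, profit REAL);
INSERT INTO time_dim VALUES
    ('Jan', 'Qtr1', '1998'), ('Feb', 'Qtr1', '1998'), ('Apr', 'Qtr2', '1998');
INSERT INTO fact VALUES ('Jan', 10), ('Jan', 5), ('Feb', 20), ('Apr', 7);
""")

# Two panes at different levels of detail need two separate queries.
by_quarter = conn.execute(build_query(["t.quarter"], "profit")).fetchall()
by_month = conn.execute(build_query(["t.quarter", "t.month"], "profit")).fetchall()
conn.close()
```

The quarter-level query returns one total per quarter, and the month-level query returns one total per (quarter, month) pair, which mirrors the point that several queries are required when the level of detail varies.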

30
Related Visualization Projects
  • Formalisms for Graphics
  • Wilkinson's Grammar of Graphics
  • Bertin's Semiology of Graphics
  • Mackinlay's APT
  • Visual Exploration of Databases
  • VQE, DeVise, Visage, DataSplash/Tioga-2, ...
  • Visualization and Data Mining
  • MineSet, ...

31
Data Mining and Visualization
  • Polaris not solely for visual analysis
  • Precursor to algorithmic analysis to identify
    areas of interest
  • Validate results and establish trust and
    understanding
  • Incorporate decision trees and classification
    algorithms into data warehouses as hierarchies

32
Summary
  • Extended Polaris to fully support and expose
    hierarchical structure of data cubes
  • Extended not only interface but underlying
    algebraic formalism

33
Future Work
  • Use underlying formalism as basis for other
    visualization tools
  • Interactive pan-and-zoom systems

34
Future Work
  • Visual presentation of metadata
  • Hierarchies are one example of rich,
    domain-specific metadata
  • As important to analysis as data itself
  • How to visualize this metadata?

35
Future Work
  • Interactive visualization
  • Prefetching and Caching