Polaris A System for Query, Analysis and Visualization of Multidimensional Relational Databases - PowerPoint PPT Presentation

1 / 61
About This Presentation
Title:

Polaris A System for Query, Analysis and Visualization of Multidimensional Relational Databases

Description:

Polaris. A System for Query, Analysis and Visualization of Multidimensional Relational Databases ... Polaris supports this exploratory process through its ... – PowerPoint PPT presentation

Number of Views:69
Avg rating:3.0/5.0
Slides: 62
Provided by: ugury8
Category:

less

Transcript and Presenter's Notes

Title: Polaris A System for Query, Analysis and Visualization of Multidimensional Relational Databases


1
Polaris A System for Query, Analysis and
Visualization of Multidimensional Relational
Databases
  • Ugur YENIER

2
Introduction
  • Need for data interfaces emerging with
  • Data warehousing
  • Scientific Computation
  • Business Analysis
  • Graphic Representations are more effective
  • Allowing multiple views of the same data
  • Easy discovery on massive data

3
Introduction
  • Meaning of data
  • Discover Structure
  • Find Patterns
  • Derive Casual Relationships
  • n-Dimensional Data Cubes
  • Cube Dimension Relational Schema Dimension

4
Introduction
  • Most popular method PIVOT TABLE
  • Allow data cube to be rotated, pivoted
  • Dimensions Rows or Columns
  • Remaining Dimensions are aggregated
  • Cross-tabulations and summaries are provided
  • Further exploit Graphs
  • Projections of data cubes in
  • Bar Charts
  • Scatter points
  • Parallel Coordinate Displays

5
Introduction
  • POLARIS
  • Interface for exploring multi-dimensional
    databases
  • Extends Pivot Table to directly generate rich
    graphical displays
  • Builds tables using algebraic formalism involving
    fields of the database
  • Each table contains layers and panes

6
Overview
  • Support interactive exploration of large
    multidimensional relational databases
  • A relational database may contain heterogeneous
    but interrelated tables
  • Field Characteristics
  • Nominal
  • Ordinal
  • Quantitative
  • Interval

7
Overview
  • Polaris Field Categorization
  • Intervals Quantitative
  • (Ordered) Nominal Ordinal
  • Dimension Product Name
  • Measure Product Prize, Size
  • Ordinal Fields ?Dimensions
  • Quantitative ? Measure

8
Overview
  • Target Specifications
  • Data-dense displays
  • Multiple display types
  • Exploratory Interface
  • Polaris meets specs. providing rapidly and
    incrementally generating table-based displays
  • Table Rows, Columns LAYERS

9
Overview
  • Each table axis may contain multiple nested
    dimensions
  • Each table entry (pane) consists a set of records
    represented with marks
  • Sample Polaris Interface

10
(No Transcript)
11
Interface Characteristics
  • Multivariate Multiple dimension of data can be
    explicitly encoded
  • Comparative Small-multiple displays to compare,
    exposing patterns and trends
  • Familiar Statisticians are accustomed to using
    tabular display of graphics

12
Visualization
  • Multiple data sources may be combined in a single
    visualization
  • Dimensions are displayed in x,y,z shelves
  • Record partitioning and layering
  • Grouping information
  • Graphic Type
  • Field mappings to retinal properties

13
Visualization
  • Selecting a mark pops up detail window displaying
    specified tuples
  • It is possible to draw a rubber band around a set
    of marks to brush
  • (will be discussed later)

14
Generating Graphics
  • There are three components
  • Specifications of the different table
    configurations
  • Type of graphics inside each pane
  • Details of the visual encodings

15
Table Algebra
  • Formal mechanism to specify table configurations
  • When a field is placed in a shelf, algebra
    expression is generated
  • x,y axes partition into rows and columns, z
    partitions to layers

16
Table Algebra
  • A,B,C representing ordinal fields
  • P,Q,R representing quantitative fields
  • Assignment of sets to symbols reflect difference
    in how two types of fields will be encoded in the
    structure of the tables
  • Ordinal fields into rows and columns
  • Quantitative fields into axes within the panes

17
Table Algebra
  • Valid expression is an ordered sequence of one or
    more symbols
  • Between each adjacent symbol there are operators
  • Operators (in order of precedence)
  • (X) Cross
  • (/) Nest
  • () Concatenation

18
Concatenation
  • Performs union operations

19
Cross
  • Performs Cartesian product operations

20
Nest
  • Similar to cross operator, but only creates set
    entries for which there exist records with those
    domain values.
  • Interpretation is B within A
  • For example, given the fields quarter and month,
    the expression quarter/month would be interpreted
    as months within each quarter

21
Table Algebra
  • Every expression in the algebra can be reduced to
    a single set
  • Each entry in the set being an ordered
    concatenation of zero of more ordinal values with
    zero or more quantitative field names
  • This set evaluation of an expression is
    normalized set form

22
Normalized Set Form
  • Table axis is partitioned into columns (rows or
    layers) so that there is a one-to-one
    correspondence between set entries in the set and
    columns

23
Normalized Set Form
24
Types of Graphics
  • Once the table configuration is specified, next
    step is to specify the type of graphic in each
    pane
  • Three graphic families
  • ordinal-ordinal
  • ordinal-quantitative
  • quantitative-quantitative
  • Each family contains a number of ways to mark
    records

25
Type of Graphics
  • Supported Polaris types
  • Rectangle
  • Circle
  • Glyph
  • Text
  • Gantt bar
  • Line
  • Polygon
  • Image

26
(No Transcript)
27
Types of Graphics
  • Dependent and independent dimensions are
    interpreted differently
  • By default dimensions are treated as independent
    dimensions
  • Aggregations affect the type of graphics

28
Ordinal-Ordinal
  • Axis variables are typically independent of each
    other
  • Task is focused on understanding patterns and
    trends in some function
  • ƒ(Ox,Oy) ? R
  • Typical example is studying sales and margin as a
    function of product type, month and state of
    items sold by a coffee chain

29
(No Transcript)
30
Ordinal-Quantitative
  • Typically bar chart, possibly clustered or
    stacked, the dot plot and Gantt Chart
  • Quantitative variable is often dependent on the
    ordinal variable and the aim is to understand or
    compare the properties of some function
  • ƒ(O) ? Q

31
Ordinal-Quantitative
  • Matrix of bar charts used to study several
    functions of the independent variables product
    and month

32
Ordinal-Quantitative
  • The cardinality of the record set does affect the
    structure of the graphics
  • When the cardinality of the record is set is one,
    the graphics are simple bar or dot plots
  • When the cardinality of the record is set to
    greater than one, the graphic is stacked bar chart

33
Ordinal-Quantitative
  • Major wars over the past 500 year shown as a
    Gantt chart
  • Additional layer in figure displays pictures of
    major scientists plotted as a function of the
    independent variables country of birth and date
    of birth

34
Quantitative- Quantitative
  • Used to understand the distribution of data as a
    function of one or both quantitative variables
    and to discover casual relationships between the
    two quantitative variables

35
Quantitative-Quantitative
  • Typical Map
  • Flight scheduling varies with the region of the
    country in which the flight originated
  • Number of flights between major airports has been
    plotted as a function of latitude and longitude
  • Plotted in two layers, the location plots and the
    geography of each state as a polygon

36
Visual Mappings
  • Each record in a pane is mapped to a mark
  • Two Components
  • Type of graphic and mark
  • Encoding of fields of the records into visual or
    retinal properties of the selected mark
  • Visual properties in Polaris are based on
    Bertin's retinal variables
  • Shape
  • Size
  • Orientation
  • Color (value and hue)
  • Texture (not supported in the current version of
    Polaris)

37
Retinal Properties
  • The different retinal properties that can be used
    to encode fields of the data and examples of the
    default mappings that are generated when a given
    type of data field is encoded in each of the
    retinal properties

38
Visual Mappings
  • Retinal properties of the display greatly
    enhances the data density and the variety of
    displays that can be generated
  • Analysts should not be required to construct the
    mappings
  • Instead, they should be able to simply specify
    that a field be encoded as a visual property
  • System should then generate an effective mapping
    from the domain of the field to the range of the
    visual property

39
DATA TRANSFORMATIONS AND VISUAL QUERIES
  • Rapidly change the table configuration, type of
    graphic, and visual encodings used to visualize a
    data set for interactive exploration
  • Resulting display is also manipulable
  • Analyst is able to sort, filter, and transform
    the data to uncover useful relationships and
    information
  • Also form ad hoc groupings and partitions that
    reflect this newly uncovered information

40
Data Transformations and Visual Queries
  • Polaris supports four features to perform visual
    queries
  • Deriving additional fields
  • Sorting and filtering
  • Brushing and tool tips
  • Undo and Redo

41
Deriving Additional Fields
  • The generated fields are aggregates or
    statistical summaries
  • Polaris currently provides five methods for
    deriving additional fields
  • Simple aggregation of quantitative measures
  • Counting of distinct values in ordinal dimensions
  • Discrete partitioning of quantitative measures
  • Ad hoc grouping within ordinal dimensions
  • Threshold aggregation

42
Deriving Additional Fields Simple Aggregation
  • Basic aggregation operations (that are applied to
    a single quantitative field)
  • Summation
  • Average
  • Minimum
  • Maximum
  • Right-Click and apply, change type
  • Easily extended to provide any statistical
    aggregate that can be generated from relational
    data

43
Deriving Additional FieldsCounting of Ordinal
Dimensions
  • Counting of distinct values for an ordinal field
    within the data set
  • Right-Click and apply
  • Applying the count operator changes the field
    type (to quantitative) and thus change the table
    configuration and graph type in each pane

44
Deriving Additional FieldsDiscrete Partitioning
  • Used to discretize a continuous domain
  • Polaris provides two discretization methods
  • Binning, allows the analyst to specify a regular
    bin size in which to aggregate the data, useful
    for creating graphs, such as histograms, in which
    there are many regularly sized bins
  • Partitioning, allows the user to individually
    specify the size and name of each bin, useful for
    encoding additional categorizations into the data
  • Right-Click and apply

45
Deriving Additional FieldsAd hoc Grouping
  • Ordinal version of quantitative partitioning,
    where the user can choose to group together
    different ordinal values
  • Allows the analyst to add own domain knowledge to
    the analysis and to change the groupings as the
    exploration uncovers additional patterns
  • Right-Click and apply

46
Deriving Additional FieldsThreshold Aggregation
  • It is derived from two source fields an ordinal
    field and a quantitative field
  • If the quantitative field is less than a certain
    threshold value for any values of an ordinal
    field, those values are aggregated together to
    form an "Other" category
  • Allows the user to specify threshold values below
    which the data is considered uninteresting
  • Right-Click and apply

47
Sorting and Filtering
  • Filtering allows the user to choose which values
    to display so that he can focus on and devote
    more screen space and attention to the areas of
    interest
  • For ordinal fields, a listbox with all possible
    values is shown and the user can check or uncheck
    each value to display it or not
  • For quantitative fields, a dynamic query slider
    allows the user to choose a new domain
  • Additionally, there are textboxes showing the
    chosen minimum and maximum values that the user
    can use to directly enter a new domain.

48
Sorting and Filtering
  • Sorting allows the user to uncover hidden
    patterns by changing the order of values within a
    field's domain or the ordering of tuples in the
    data
  • The ordering of tuples affects the drawing order
    of marks within a pane.
  • Polaris provides three ways for a user to sort
    the domain.
  • User can bring up the filter window and drag-and
    drop the values within that window to reorder the
    domain
  • If the field has been used to partition the table
    into rows or columns, the user can drag-and-drop
    the table row or column headers to reorder the
    domain values
  • Polaris provides programmatic sorting, allowing
    the user to sort one field based on the value in
    another field

49
Brushing and Tooltips
  • Analysts want to directly interact with the data,
    visually querying the data to highlight
    correlated marks or getting more details on
    demand
  • Brushing allows the user to choose a set of
    interesting data points by drawing a rubberband
    around them
  • Tooltips allow the user to get more details on
    demand.

50
Brushing
  • The user selects a single field whose values are
    then used to identify related marks and tuples
  • All marks corresponding to tuples sharing
    selected field values with the selected tuples
    are subsequently highlighted in all other panes
    or linked Polaris views
  • Allowing correlation between different
    projections of the same data set or relationships
    between distinct data sets.

51
Tooltips
  • If the user hovers over a data point or pane,
    additional details, such as specific field values
    for the tuple corresponding to the selected mark,
    are shown
  • Analysts can use tooltips to understand the
    relationship between the graphical marks and the
    underlying data

52
Undo and Redo
  • Unlimited undo and redo within an analysis sessio
  • Users can use the "Back" and "Forward" buttons on
    the top toolbar to either return to a previous
    visual specification or to move forward again.

53
GENERATING DATABASE QUERIES
54
Results
  • Throughout the analyses users want to see data
    and how they want to see it change continually
  • Analysts
  • form hypotheses
  • create new views to perform tests and experiments
  • Certain displays enable an understanding of
    overall trends, whereas others show causal
    relationships
  • As the analysts better understand the data, they
    may want to drill-down in the visible dimensions
    or display entirely different dimensions
  • Polaris supports this exploratory process through
    its visual interface
  • By formally categorizing the types of graphics,
    Polaris is able to provide a simple interface for
    rapidly generating a wide range of displays
  • This allows analysts to focus on the analysis
    task rather than the steps needed to retrieve and
    display the data

55
Discussions
  • Comparison with similar work is omitted in this
    presentation
  • Interpretation of visual specifications as
    database queries
  • Interactivity and performance of Polaris

56
Interpretation of Visual Specifications as
Database Queries
  • Polaris generates the SQL query for each table
    pane
  • Similar to CUBE operator generating the queries
    to create the cross-tab and Pivot Table displays
  • However the CUBE operator is not applicable for
    Polaris because it assumes that the sets of
    relations partitioned into each table pane do not
    overlap

57
Interactivity and Performance of Polaris
  • Polaris at its first implementations focuses on
    the techniques, semantics and formalism rather
    then the interactivity
  • It has been experienced that the query response
    time does not need to be real-time in order to
    maintain a feeling of exploration (several tens
    of seconds)

58
Interactivity and Performance of Polaris
  • Test Data
  • A subset of a packet trace of a mobile network
    over a 13 week period, approx. 6 million tuples
  • A subset of the data collected from Sloan Digital
    Sky Survey (approx. 650MB)
  • Both stored on MS SQL Server 2000
  • Paper does not provide numeric data on
    performance but the personal experiences of the
    testers

59
Conclusion
  • Polaris extends the well known Pivot Table
    interface to display relational query results
    using a rich inexpensive set of graphical
    displays
  • Succinct visual specification for describing
    table-based graphical displays of relational data
  • Interpretation of visual specifications as a
    precise sequence of relational database operations

60
Future Work
  • Performance evaluation
  • Hierarchical data cubes
  • Correspondence of marks to data tuples (dynamic
    mark generation)
  • Animation shelf to display sequencing data

61
  • Thank You
Write a Comment
User Comments (0)
About PowerShow.com