Title: Polaris A System for Query, Analysis and Visualization of Multidimensional Relational Databases
1Polaris A System for Query, Analysis and
Visualization of Multidimensional Relational
Databases
2Introduction
- Need for data interfaces emerging with
- Data warehousing
- Scientific Computation
- Business Analysis
- Graphic Representations are more effective
- Allowing multiple views of the same data
- Easy discovery on massive data
3Introduction
- Meaning of data
- Discover Structure
- Find Patterns
- Derive Casual Relationships
- n-Dimensional Data Cubes
- Cube Dimension Relational Schema Dimension
4Introduction
- Most popular method PIVOT TABLE
- Allow data cube to be rotated, pivoted
- Dimensions Rows or Columns
- Remaining Dimensions are aggregated
- Cross-tabulations and summaries are provided
- Further exploit Graphs
- Projections of data cubes in
- Bar Charts
- Scatter points
- Parallel Coordinate Displays
5Introduction
- POLARIS
- Interface for exploring multi-dimensional
databases - Extends Pivot Table to directly generate rich
graphical displays - Builds tables using algebraic formalism involving
fields of the database - Each table contains layers and panes
6Overview
- Support interactive exploration of large
multidimensional relational databases - A relational database may contain heterogeneous
but interrelated tables - Field Characteristics
- Nominal
- Ordinal
- Quantitative
- Interval
7Overview
- Polaris Field Categorization
- Intervals Quantitative
- (Ordered) Nominal Ordinal
- Dimension Product Name
- Measure Product Prize, Size
- Ordinal Fields ?Dimensions
- Quantitative ? Measure
8Overview
- Target Specifications
- Data-dense displays
- Multiple display types
- Exploratory Interface
- Polaris meets specs. providing rapidly and
incrementally generating table-based displays - Table Rows, Columns LAYERS
9Overview
- Each table axis may contain multiple nested
dimensions - Each table entry (pane) consists a set of records
represented with marks - Sample Polaris Interface
10(No Transcript)
11Interface Characteristics
- Multivariate Multiple dimension of data can be
explicitly encoded - Comparative Small-multiple displays to compare,
exposing patterns and trends - Familiar Statisticians are accustomed to using
tabular display of graphics
12Visualization
- Multiple data sources may be combined in a single
visualization - Dimensions are displayed in x,y,z shelves
- Record partitioning and layering
- Grouping information
- Graphic Type
- Field mappings to retinal properties
13Visualization
- Selecting a mark pops up detail window displaying
specified tuples - It is possible to draw a rubber band around a set
of marks to brush - (will be discussed later)
14Generating Graphics
- There are three components
- Specifications of the different table
configurations - Type of graphics inside each pane
- Details of the visual encodings
15Table Algebra
- Formal mechanism to specify table configurations
- When a field is placed in a shelf, algebra
expression is generated - x,y axes partition into rows and columns, z
partitions to layers
16Table Algebra
- A,B,C representing ordinal fields
- P,Q,R representing quantitative fields
- Assignment of sets to symbols reflect difference
in how two types of fields will be encoded in the
structure of the tables - Ordinal fields into rows and columns
- Quantitative fields into axes within the panes
17Table Algebra
- Valid expression is an ordered sequence of one or
more symbols - Between each adjacent symbol there are operators
- Operators (in order of precedence)
- (X) Cross
- (/) Nest
- () Concatenation
18Concatenation
- Performs union operations
19Cross
- Performs Cartesian product operations
20Nest
- Similar to cross operator, but only creates set
entries for which there exist records with those
domain values.
- Interpretation is B within A
- For example, given the fields quarter and month,
the expression quarter/month would be interpreted
as months within each quarter
21Table Algebra
- Every expression in the algebra can be reduced to
a single set - Each entry in the set being an ordered
concatenation of zero of more ordinal values with
zero or more quantitative field names - This set evaluation of an expression is
normalized set form
22Normalized Set Form
- Table axis is partitioned into columns (rows or
layers) so that there is a one-to-one
correspondence between set entries in the set and
columns
23Normalized Set Form
24Types of Graphics
- Once the table configuration is specified, next
step is to specify the type of graphic in each
pane - Three graphic families
- ordinal-ordinal
- ordinal-quantitative
- quantitative-quantitative
- Each family contains a number of ways to mark
records
25Type of Graphics
- Supported Polaris types
- Rectangle
- Circle
- Glyph
- Text
- Gantt bar
- Line
- Polygon
- Image
26(No Transcript)
27Types of Graphics
- Dependent and independent dimensions are
interpreted differently - By default dimensions are treated as independent
dimensions - Aggregations affect the type of graphics
28Ordinal-Ordinal
- Axis variables are typically independent of each
other - Task is focused on understanding patterns and
trends in some function -
- ƒ(Ox,Oy) ? R
- Typical example is studying sales and margin as a
function of product type, month and state of
items sold by a coffee chain
29(No Transcript)
30Ordinal-Quantitative
- Typically bar chart, possibly clustered or
stacked, the dot plot and Gantt Chart - Quantitative variable is often dependent on the
ordinal variable and the aim is to understand or
compare the properties of some function - ƒ(O) ? Q
31Ordinal-Quantitative
- Matrix of bar charts used to study several
functions of the independent variables product
and month
32Ordinal-Quantitative
- The cardinality of the record set does affect the
structure of the graphics - When the cardinality of the record is set is one,
the graphics are simple bar or dot plots - When the cardinality of the record is set to
greater than one, the graphic is stacked bar chart
33Ordinal-Quantitative
- Major wars over the past 500 year shown as a
Gantt chart - Additional layer in figure displays pictures of
major scientists plotted as a function of the
independent variables country of birth and date
of birth
34Quantitative- Quantitative
- Used to understand the distribution of data as a
function of one or both quantitative variables
and to discover casual relationships between the
two quantitative variables
35Quantitative-Quantitative
- Typical Map
- Flight scheduling varies with the region of the
country in which the flight originated - Number of flights between major airports has been
plotted as a function of latitude and longitude - Plotted in two layers, the location plots and the
geography of each state as a polygon
36Visual Mappings
- Each record in a pane is mapped to a mark
- Two Components
- Type of graphic and mark
- Encoding of fields of the records into visual or
retinal properties of the selected mark - Visual properties in Polaris are based on
Bertin's retinal variables - Shape
- Size
- Orientation
- Color (value and hue)
- Texture (not supported in the current version of
Polaris)
37Retinal Properties
- The different retinal properties that can be used
to encode fields of the data and examples of the
default mappings that are generated when a given
type of data field is encoded in each of the
retinal properties
38Visual Mappings
- Retinal properties of the display greatly
enhances the data density and the variety of
displays that can be generated - Analysts should not be required to construct the
mappings - Instead, they should be able to simply specify
that a field be encoded as a visual property - System should then generate an effective mapping
from the domain of the field to the range of the
visual property
39DATA TRANSFORMATIONS AND VISUAL QUERIES
- Rapidly change the table configuration, type of
graphic, and visual encodings used to visualize a
data set for interactive exploration - Resulting display is also manipulable
- Analyst is able to sort, filter, and transform
the data to uncover useful relationships and
information - Also form ad hoc groupings and partitions that
reflect this newly uncovered information
40Data Transformations and Visual Queries
- Polaris supports four features to perform visual
queries - Deriving additional fields
- Sorting and filtering
- Brushing and tool tips
- Undo and Redo
41Deriving Additional Fields
- The generated fields are aggregates or
statistical summaries - Polaris currently provides five methods for
deriving additional fields - Simple aggregation of quantitative measures
- Counting of distinct values in ordinal dimensions
- Discrete partitioning of quantitative measures
- Ad hoc grouping within ordinal dimensions
- Threshold aggregation
42Deriving Additional Fields Simple Aggregation
- Basic aggregation operations (that are applied to
a single quantitative field) - Summation
- Average
- Minimum
- Maximum
- Right-Click and apply, change type
- Easily extended to provide any statistical
aggregate that can be generated from relational
data
43Deriving Additional FieldsCounting of Ordinal
Dimensions
- Counting of distinct values for an ordinal field
within the data set - Right-Click and apply
- Applying the count operator changes the field
type (to quantitative) and thus change the table
configuration and graph type in each pane
44Deriving Additional FieldsDiscrete Partitioning
- Used to discretize a continuous domain
- Polaris provides two discretization methods
- Binning, allows the analyst to specify a regular
bin size in which to aggregate the data, useful
for creating graphs, such as histograms, in which
there are many regularly sized bins - Partitioning, allows the user to individually
specify the size and name of each bin, useful for
encoding additional categorizations into the data
- Right-Click and apply
45Deriving Additional FieldsAd hoc Grouping
- Ordinal version of quantitative partitioning,
where the user can choose to group together
different ordinal values - Allows the analyst to add own domain knowledge to
the analysis and to change the groupings as the
exploration uncovers additional patterns - Right-Click and apply
46Deriving Additional FieldsThreshold Aggregation
- It is derived from two source fields an ordinal
field and a quantitative field - If the quantitative field is less than a certain
threshold value for any values of an ordinal
field, those values are aggregated together to
form an "Other" category - Allows the user to specify threshold values below
which the data is considered uninteresting - Right-Click and apply
47Sorting and Filtering
- Filtering allows the user to choose which values
to display so that he can focus on and devote
more screen space and attention to the areas of
interest - For ordinal fields, a listbox with all possible
values is shown and the user can check or uncheck
each value to display it or not - For quantitative fields, a dynamic query slider
allows the user to choose a new domain - Additionally, there are textboxes showing the
chosen minimum and maximum values that the user
can use to directly enter a new domain.
48Sorting and Filtering
- Sorting allows the user to uncover hidden
patterns by changing the order of values within a
field's domain or the ordering of tuples in the
data - The ordering of tuples affects the drawing order
of marks within a pane. - Polaris provides three ways for a user to sort
the domain. - User can bring up the filter window and drag-and
drop the values within that window to reorder the
domain - If the field has been used to partition the table
into rows or columns, the user can drag-and-drop
the table row or column headers to reorder the
domain values - Polaris provides programmatic sorting, allowing
the user to sort one field based on the value in
another field
49Brushing and Tooltips
- Analysts want to directly interact with the data,
visually querying the data to highlight
correlated marks or getting more details on
demand - Brushing allows the user to choose a set of
interesting data points by drawing a rubberband
around them - Tooltips allow the user to get more details on
demand.
50Brushing
- The user selects a single field whose values are
then used to identify related marks and tuples - All marks corresponding to tuples sharing
selected field values with the selected tuples
are subsequently highlighted in all other panes
or linked Polaris views - Allowing correlation between different
projections of the same data set or relationships
between distinct data sets.
51Tooltips
- If the user hovers over a data point or pane,
additional details, such as specific field values
for the tuple corresponding to the selected mark,
are shown - Analysts can use tooltips to understand the
relationship between the graphical marks and the
underlying data
52Undo and Redo
- Unlimited undo and redo within an analysis sessio
- Users can use the "Back" and "Forward" buttons on
the top toolbar to either return to a previous
visual specification or to move forward again.
53GENERATING DATABASE QUERIES
54Results
- Throughout the analyses users want to see data
and how they want to see it change continually - Analysts
- form hypotheses
- create new views to perform tests and experiments
- Certain displays enable an understanding of
overall trends, whereas others show causal
relationships - As the analysts better understand the data, they
may want to drill-down in the visible dimensions
or display entirely different dimensions - Polaris supports this exploratory process through
its visual interface - By formally categorizing the types of graphics,
Polaris is able to provide a simple interface for
rapidly generating a wide range of displays - This allows analysts to focus on the analysis
task rather than the steps needed to retrieve and
display the data
55Discussions
- Comparison with similar work is omitted in this
presentation - Interpretation of visual specifications as
database queries - Interactivity and performance of Polaris
56Interpretation of Visual Specifications as
Database Queries
- Polaris generates the SQL query for each table
pane - Similar to CUBE operator generating the queries
to create the cross-tab and Pivot Table displays - However the CUBE operator is not applicable for
Polaris because it assumes that the sets of
relations partitioned into each table pane do not
overlap
57Interactivity and Performance of Polaris
- Polaris at its first implementations focuses on
the techniques, semantics and formalism rather
then the interactivity - It has been experienced that the query response
time does not need to be real-time in order to
maintain a feeling of exploration (several tens
of seconds)
58Interactivity and Performance of Polaris
- Test Data
- A subset of a packet trace of a mobile network
over a 13 week period, approx. 6 million tuples - A subset of the data collected from Sloan Digital
Sky Survey (approx. 650MB) - Both stored on MS SQL Server 2000
- Paper does not provide numeric data on
performance but the personal experiences of the
testers
59Conclusion
- Polaris extends the well known Pivot Table
interface to display relational query results
using a rich inexpensive set of graphical
displays - Succinct visual specification for describing
table-based graphical displays of relational data
- Interpretation of visual specifications as a
precise sequence of relational database operations
60Future Work
- Performance evaluation
- Hierarchical data cubes
- Correspondence of marks to data tuples (dynamic
mark generation) - Animation shelf to display sequencing data
61