Chapter 8: Trends in DBMS 8.1 Database Support for Field Entities 8.2 Content-based Retrieval 8.3 Introduction to Spatial Data Warehouses 8.4 Summary

About This Presentation

Title:

Chapter 8: Trends in DBMS 8.1 Database Support for Field Entities 8.2 Content-based Retrieval 8.3 Introduction to Spatial Data Warehouses 8.4 Summary

Description:

Chapter 8: Trends in DBMS 8.1 Database Support for Field Entities 8.2 Content-based Retrieval 8.3 Introduction to Spatial Data Warehouses 8.4 Summary – PowerPoint PPT presentation

Number of Views:97

Avg rating:3.0/5.0

Slides: 40

Provided by: Registe4

Category:

more less

Transcript and Presenter's Notes

Title: Chapter 8: Trends in DBMS 8.1 Database Support for Field Entities 8.2 Content-based Retrieval 8.3 Introduction to Spatial Data Warehouses 8.4 Summary

1
Chapter 8 Trends in DBMS8.1 Database Support
for Field Entities8.2 Content-based
Retrieval8.3 Introduction to Spatial Data
Warehouses8.4 Summary
2
Why Learn about Field Data-sets?

Field data is timely and abundant
Sensors (e.g. satellite based ones) provide
periodic snapshot of Earth
Most up-to-date data about current events (e.g.
fires, flood)
Field data are useful
In creating, revising and evaluating vector data
sets
Digital archival of fragile historical paper maps
To manually get details not captured in vector
interpretations
Example Location selection for a facility (e.g.
a grocery store)
Consider a set of Aerial photographs of different
locations
Vector interpretation includes roads, water
bodies, elevation
What other information can aerial imagery reveal
for construction planning?
trees (types and location), buildings,

3
What are Field Data-sets?

Field data set examples
Satellite images, aerial photographs
Digitized paper maps
Earth Science data-sets, e.g. rainfall,
temperature maps
Data types of Spatial field data sets
Images
satellite based, e.g. www.terradata.com
aerial photographs
measurements from a Geo-registered sensor
networks, e.g. weather
Video, i.e. time series of images
Audio data
Focus Primarily images though some discussion
will apply to other data types

4
Fields and Rasters a Sampling of Field Values

Definitions
Field a mapping from a spatial domain to a
value domain
Image a mapping from a rectangular grid to a
value domain
A rectangular grid is a collection of cells
called pixels
Raster is geo-registered image, i.e. grid axis
have absolute spatial locations
Fields are often approximated as rasters
Example Figure 8.1
Identify spatial domain, field, rectangular grid,
raster approximation
Fields can be approximated as images if relative
spatial locations are adequate

Figure 8.1
5
Computing with Field Data

Field data manipulated using operations of
map algebra
image algebra
An Algebra is a mathematical structure consisting
of
Operands and Operations
Map Algebra
Operand rasters
Operations Can be classified into four groups
Local, Focal, Zonal and Global
Image Algebra
Operand images
Operations crop, zoom, rotate

6
Local Operation

A local operation maps a raster into another
raster such that the value of a cell in the new
raster depends only on the value of that cell in
the original raster
Examples unary operation thresholding
binary operation point wise addition

Figure 8.2
7
Focal Operation

In a focal operation, the value of a cell in the
new raster is dependent on the values of the cell
and its neighboring cells in the original raster
Examples unary operations focal sum, gradient,

Neighborhoods Rook, Bishop and Queen
Figure 8.3
8
Zonal Operation

In a global operation, the value of a cell in the
new raster is a function of the location or
values of all cells in the original or another
raster
Examples zonal sum, zonal average, ...

Figure 8.4
9
Global Operation

In a zonal operation, the value of a cell in the
new raster is a function of the value of that
cell in the original layer and the values of
other cells which appear in the same zone
specified in another raster
Example distance from nearest facility

Figure 8.5
10
Image Operations Trim

Image Operations
Ignore the absolute locations of pixels.
Come from image processing literature
Ex. smoothing, low pass filter, high pass filter
Example A trim operation extracts an
axis-aligned subset of the original raster

Figure 8.6
11
Storage and Retrieval of Raster Data - 1

Traditional Approach
Store raster data in a file system
Use custom software to retrieve data-items of
interest
Example personal photographs stored on MS
Windows
Q? What attributes can one attach to digital
photographs ?
Q? Is there an easy way to retrieve all pictures
taken in San Francisco?
Limitations
Rigid schema
limited ability to add and manage additional
attributes
Canned Queries only
limited ability to support ad-hoc queries
Data quality
limited ability to identify duplicates or similar
data-items

12
Storage and Retrieval of Raster Data in a SDBMS

A database approach
Database tables store
raster data items
attributes (i.e. meta-data), e.g. creation date,
geo-location, subject, ...
Use SQL like query language to retrieve desired
data-items
retrieve all raster data-items overlapping with
city of San Francisco (Q1)
retrieve latest raster data-item within city of
Paris (Q2)
retrieve raster data-items similar to a given
image (Q3)
Pros
table schema definition allows user defined
attributes
improve ability to pose ad-hoc queries (Ex. Q1,
Q2)
improve data reliability and quality
example Query Q3 may be used for duplicate
reduction

13
Storage and Retrieval of Raster Data - Challenges

Challenges in database based approach
Storage size( raster data item) gt size (disk
blocks)
Retrieval raster has rich content
a picture is worth a thousand word!
Approaches to storage challenge
Delegate storage to DBMS
Use Binary Large Object (BLOB) data-type
create table my_picture(
image BLOB
creation_date date
place point
)
Do-it-yourself
divide a raster data-item into smaller slices
Q? Which way of slicing reduce disk I/Os for
common queries?

14
How is Raster Data Stored on Secondary Storage?

Slicing approaches
Linear, e.g. one row per disk block (see Figure
8.8(b))
Tiling - see Figure 8.8(c )
Tiling is preferred
For queries extracting rectangular sub-images
Example - terraserver.com

Figure 8.8
15
How is Raster Data Queried?

Retrieval challenge of rich content
Meta-data approach
Content based retrieval
Meta-data approach
Select a set of descriptive attributes
simpler SQL data types, e.g. numeric, string,
date, ...
example source, location, time stamp, subject,
resolution, ...
Store values of descriptive attributes for each
raster data-item
Allow SQL queries on the descriptive attributes
Limitation of meta-data approach
Restricts queries to content captured by
descriptive attributes
Does not support Similarity based queries
example Find all raster data-items similar to a
given raster data item

16
Content Based Retrieval (CBR)

Examples
Q1. Find all raster data-items similar to a given
raster data item
Q2. Locate a photograph of a river in Minnesota
with trees nearby
Q3. Find all images of state parks which have a
lake within them, are within a radius of one
hundred miles from Chicago, and are southwest of
Chicago
State of the Art
However, few robust implementations of CBR are
available as of 2002
several research prototypes address similarity
query Q1
Result quality is similar to those of web
searches (e.g. www.google.com)
some of the retrieved raster data-item are useful
many similar data item are not retrieved in the
result
usable in application domains such as publishing
Our goal is to understand a current approach to
similarity queries
involving spatial similarities

17
Content Based Retrieval (CBR)

Spatial Similarity
Consider a pair of raster images with common
objects (e.g. parks, lakes)
Spatial similarity between raster images can be
defined based on
similarity of spatial relationships (e.g.
topological, directional)
Q? Which pairs exhibit higher similarity?
P1 (inside, disjoint) or P2 (inside, covered
by)
P3 (disjoint, touch) or P4 (disjoint, inside)
P5 (north west, north) or P6 (west, east)
A graph framework for comparing spatial
relationships
Nodes spatial relationships
Edges connect most similar nodes
Similarity metric number of edge on shortest
path between 2 nodes
See Figures 8.9 and 8.10

18
Topological Relationship Similarity

Study Figure 8.9, pp.234
Nodes topological relationships
Edges most similar
Similarity measure path length
Inference from Model
P2 (inside, covered by) more similar than P1
(inside, disjoint)
Do you agree?
Review Figure 2.3 (pp.30)

Figure 8.9
19
Direction Relationship Similarity

Study Figure 8.10, pp.235
Nodes topological relationships Edges most
similar
Similarity measure path length
Inference P5 (north-west, north) more similar
than P6 (west, east)

Figure 8.10
20
Distance Similarity

Distance similarity is based on
Euclidean distance between the centroids of the
objects.
Example Image R is more similar to P than Q in
Figure 8.11 (pp.235)

Figure 8.11
21
A Computational Approach to CBR

Attribute Relation Graph (ARG)
Node objects in a raster
Edges relationships
Example raster of Figure 8.12(a)
ARG in Figure 8.12(b)
point object O3
rectangles O1, O2
edge (O1, O2) shows that they are disjoint, at 61
degree direction and 5.2 units distant
Vector representation of ARG
Lists objects and edge properties
Example in Figure 8.12

Figure 8.12
22
A Computational Approach to CBR

Steps
Represent each raster data item by its ARG vector
Map query raster data item by its ARG vector
Find most similar raster data-items in the
database by comparing ARG vector representations
use a distance metric
use a multi-dimensional index
Comment Result quality is similar to those of
web searches. Some of the retrieved raster
data-item are useful

Figure 8.13
23
Why are Data Warehouses Interesting?

Data Warehouse facilitate group decision making
Consider a dataset
1 measure (i.e. Sales)
3 dimensions (e.g. Company, Year, Region)
Analysis questions
Q1. Rank Regions by total sales.
Q2. Rank years by total sales.
Q3. Where are sales consistently growing?
Cross tabulates summaries reports used to analyze
the trends
Example

24
Generating Cross-tabulation Summaries

Traditional Approach
Use custom software pulling data out of a DBMS
Limitations redundant of work, inefficient use
of resources
Data Warehouse approach
Cross-tab. Can be generated using a set of simple
report
each report is generated from a SQL Select ...
group by statement
Example Fig. 8.19 (pp. 244) and Table 8.3 (pp.
245)
cross-tab example in last slide is a union of
SALES-L0-A, SALES-L1-A, SALES-L1-B and SALES-L2
table 8.3 shows SQL queries to compute each part
Advantage
rest of SQL is available for pre/post processing
of data
performance gains by eliminating unnecessary
copying of data

25
Example Data Warehouse
Figure 8.19
26
Cross-tabulation vs. Report Hierarchy

Spreadsheet view of a report
Views a report a N-dim. Spreadsheet
N number of dimension attributes
Each cell contains value of measure
Cross-tabulation view of a Report hierarchy
Example report hierarchy for SALES-L0-A,
SALES-L2-A, SALES-L1-B, SALES-L2, Figure 8.19
(pp.244)

27
What is a Data Warehouse?

Data Warehouse is a special purpose database
Primarily used for specialized data analysis
purposes
Facilitates generation and navigation of a
hierarchy of reports
Special purpose data-sets and queries
Data consists of
a few measure attributes
a set of dimension attributes
The measure attribute depends on dimension
attributes
Queries generate reports
report measure for selected values of dimensions
aggregate measure for given subset of dimensions
What is a spatial data warehouse?
Data warehouses with spatial measures or
dimensions
Example census data - census tract is a spatial
dimension
Example logistics data - route is a spatial
dimension

28
Data Warehouse Operations

Operations on a data warehouse
Roll-up, Drill-down
Slice, Dice
Pivot
Roll-up
Inputs A report R, A subset S of dimensions in R
Output A sequence of reports summarizing R
Example 1 RSALES-Base, S(Year, Region) in
Figure 8.19 (pp. 244)
Output consists of reports SALES-L0-A,
SALES-L1-B, SALES-L2
Example 2 RSALES-Base, S(Region, Year)
Output consists of reports SALES-L0-A,
SALES-L1-A, SALES-L2
Drill-down
Inputs A report R, A dimension D not in R
Output A reports detailing R on D
Example R SALES-L1-B, D Region in Figure
8.19 (pp.244)
Output report SALES-L0-A

29
Data Warehouse Operations

Slice, Dice
Reduce dimensions in a table (Figure 8.7, pp.232)
Inputs A report R, A value V for a dimension D
in R
Output A subset of R where DV
Example RSALES-L0-A, DYear, V1994 in Figure
8.19 (pp.244)
output Table 8.5 (pp.246)
includes tuple (ALL, 1994, America, 35)

Figure 8.7
30
Data Warehouse Operations

Pivot
For a spreadsheet view of reports
Transposes a spreadsheet
Example
Inputs A spreadsheet view of a report R
Output A transposed spreadsheet
Example R SALES-L0-A, Figure 8.19 (pp.244)

31
Logical Data Model of a DWH

Purpose of a logical data model
Specify a framework to specify computational
structure
Allow extension of SQL to model new needs
Cube operation
Input A fact table
Output A set of summary reports covering all
subsets of dimension columns
equivalent to union of all tables and reports in
Figure 8.19 (pp.244)
Example Figure 8.18, pp.243
SELECT Company, Year, Region, Sum(Sales) AS Sales
FROM SALES
GROUP BY CUBE Company, Year, Region

32
Figure 8.18
33
Physical Data Model of a DWH

Purpose Computationally efficient implementation
Ideas
Pre-computation -
pre-compute some of reports and use those to
compute other reports
New indexing methods, e.g. bit-map index
Query Processing Strategies
strategies for aggregate functions
new strategies for multi-table joins
Let us look at strategies for aggregate functions

34
DWH Physical Model Aggregate Function Strategies

Aggregate Functions
Compute summary statistics for a given set of
values
Examples sum, average, centroid (Table 8.1,
pp.238)
Strategies for efficient computation
Characterize easy to compute aggregate functions
3 categories
distributive
algebraic
holistic
First 2 categories can be computed easily in one
scan of the dataset

35
Definitions of Aggregate Function Categories

Notation
F,G,G1,G2,,Gn are aggregate functions where n is
small
S is a set of values, e.g. S(1,2,3,4)
P(S1,S2,,Sp) is a partition of S, e.g.
P(S1,S2), S1(1,2), S2(3,4)
Distributive (F) if there exists a G such that
F(S) G(F(S1),F(S2),,F(Sn))
Example sum is distributive
Illustration sum(1,2,3,4) sum(sum(1,2),sum(3,4)
)
Algebraic (F) if there exists G1,,Gn, (where n
is small) and
F(S) G(G1(S1),,Gn(S1),G2(S1),,Gn(S2),,G1(Sp),
, Gn(Sp))
Example average is distributive
Illustration average(1,2,3,4)count(1,2)averag
e(1,2)count(3,4)average / count(1,2)count(3,
4)