Chapter 8: Trends in DBMS 8.1 Database Support for Field Entities 8.2 Content-based Retrieval 8.3 Introduction to Spatial Data Warehouses 8.4 Summary - PowerPoint PPT Presentation

About This Presentation
Title:

Chapter 8: Trends in DBMS 8.1 Database Support for Field Entities 8.2 Content-based Retrieval 8.3 Introduction to Spatial Data Warehouses 8.4 Summary

Description:

Chapter 8: Trends in DBMS 8.1 Database Support for Field Entities 8.2 Content-based Retrieval 8.3 Introduction to Spatial Data Warehouses 8.4 Summary – PowerPoint PPT presentation

Number of Views:97
Avg rating:3.0/5.0

less

Transcript and Presenter's Notes

Title: Chapter 8: Trends in DBMS 8.1 Database Support for Field Entities 8.2 Content-based Retrieval 8.3 Introduction to Spatial Data Warehouses 8.4 Summary


1
Chapter 8 Trends in DBMS8.1 Database Support
for Field Entities8.2 Content-based
Retrieval8.3 Introduction to Spatial Data
Warehouses8.4 Summary
2
Why Learn about Field Data-sets?
  • Field data is timely and abundant
  • Sensors (e.g. satellite based ones) provide
    periodic snapshot of Earth
  • Most up-to-date data about current events (e.g.
    fires, flood)
  • Field data are useful
  • In creating, revising and evaluating vector data
    sets
  • Digital archival of fragile historical paper maps
  • To manually get details not captured in vector
    interpretations
  • Example Location selection for a facility (e.g.
    a grocery store)
  • Consider a set of Aerial photographs of different
    locations
  • Vector interpretation includes roads, water
    bodies, elevation
  • What other information can aerial imagery reveal
    for construction planning?
  • trees (types and location), buildings,

3
What are Field Data-sets?
  • Field data set examples
  • Satellite images, aerial photographs
  • Digitized paper maps
  • Earth Science data-sets, e.g. rainfall,
    temperature maps
  • Data types of Spatial field data sets
  • Images
  • satellite based, e.g. www.terradata.com
  • aerial photographs
  • measurements from a Geo-registered sensor
    networks, e.g. weather
  • Video, i.e. time series of images
  • Audio data
  • Focus Primarily images though some discussion
    will apply to other data types

4
Fields and Rasters a Sampling of Field Values
  • Definitions
  • Field a mapping from a spatial domain to a
    value domain
  • Image a mapping from a rectangular grid to a
    value domain
  • A rectangular grid is a collection of cells
    called pixels
  • Raster is geo-registered image, i.e. grid axis
    have absolute spatial locations
  • Fields are often approximated as rasters
  • Example Figure 8.1
  • Identify spatial domain, field, rectangular grid,
    raster approximation
  • Fields can be approximated as images if relative
    spatial locations are adequate

Figure 8.1
5
Computing with Field Data
  • Field data manipulated using operations of
  • map algebra
  • image algebra
  • An Algebra is a mathematical structure consisting
    of
  • Operands and Operations
  • Map Algebra
  • Operand rasters
  • Operations Can be classified into four groups
  • Local, Focal, Zonal and Global
  • Image Algebra
  • Operand images
  • Operations crop, zoom, rotate

6
Local Operation
  • A local operation maps a raster into another
    raster such that the value of a cell in the new
    raster depends only on the value of that cell in
    the original raster
  • Examples unary operation thresholding
  • binary operation point wise addition

Figure 8.2
7
Focal Operation
  • In a focal operation, the value of a cell in the
    new raster is dependent on the values of the cell
    and its neighboring cells in the original raster
  • Examples unary operations focal sum, gradient,

Neighborhoods Rook, Bishop and Queen
Figure 8.3
8
Zonal Operation
  • In a global operation, the value of a cell in the
    new raster is a function of the location or
    values of all cells in the original or another
    raster
  • Examples zonal sum, zonal average, ...

Figure 8.4
9
Global Operation
  • In a zonal operation, the value of a cell in the
    new raster is a function of the value of that
    cell in the original layer and the values of
    other cells which appear in the same zone
    specified in another raster
  • Example distance from nearest facility

Figure 8.5
10
Image Operations Trim
  • Image Operations
  • Ignore the absolute locations of pixels.
  • Come from image processing literature
  • Ex. smoothing, low pass filter, high pass filter
  • Example A trim operation extracts an
    axis-aligned subset of the original raster

Figure 8.6
11
Storage and Retrieval of Raster Data - 1
  • Traditional Approach
  • Store raster data in a file system
  • Use custom software to retrieve data-items of
    interest
  • Example personal photographs stored on MS
    Windows
  • Q? What attributes can one attach to digital
    photographs ?
  • Q? Is there an easy way to retrieve all pictures
    taken in San Francisco?
  • Limitations
  • Rigid schema
  • limited ability to add and manage additional
    attributes
  • Canned Queries only
  • limited ability to support ad-hoc queries
  • Data quality
  • limited ability to identify duplicates or similar
    data-items

12
Storage and Retrieval of Raster Data in a SDBMS
  • A database approach
  • Database tables store
  • raster data items
  • attributes (i.e. meta-data), e.g. creation date,
    geo-location, subject, ...
  • Use SQL like query language to retrieve desired
    data-items
  • retrieve all raster data-items overlapping with
    city of San Francisco (Q1)
  • retrieve latest raster data-item within city of
    Paris (Q2)
  • retrieve raster data-items similar to a given
    image (Q3)
  • Pros
  • table schema definition allows user defined
    attributes
  • improve ability to pose ad-hoc queries (Ex. Q1,
    Q2)
  • improve data reliability and quality
  • example Query Q3 may be used for duplicate
    reduction

13
Storage and Retrieval of Raster Data - Challenges
  • Challenges in database based approach
  • Storage size( raster data item) gt size (disk
    blocks)
  • Retrieval raster has rich content
  • a picture is worth a thousand word!
  • Approaches to storage challenge
  • Delegate storage to DBMS
  • Use Binary Large Object (BLOB) data-type
  • create table my_picture(
  • image BLOB
  • creation_date date
  • place point
  • )
  • Do-it-yourself
  • divide a raster data-item into smaller slices
  • Q? Which way of slicing reduce disk I/Os for
    common queries?

14
How is Raster Data Stored on Secondary Storage?
  • Slicing approaches
  • Linear, e.g. one row per disk block (see Figure
    8.8(b))
  • Tiling - see Figure 8.8(c )
  • Tiling is preferred
  • For queries extracting rectangular sub-images
  • Example - terraserver.com

Figure 8.8
15
How is Raster Data Queried?
  • Retrieval challenge of rich content
  • Meta-data approach
  • Content based retrieval
  • Meta-data approach
  • Select a set of descriptive attributes
  • simpler SQL data types, e.g. numeric, string,
    date, ...
  • example source, location, time stamp, subject,
    resolution, ...
  • Store values of descriptive attributes for each
    raster data-item
  • Allow SQL queries on the descriptive attributes
  • Limitation of meta-data approach
  • Restricts queries to content captured by
    descriptive attributes
  • Does not support Similarity based queries
  • example Find all raster data-items similar to a
    given raster data item

16
Content Based Retrieval (CBR)
  • Examples
  • Q1. Find all raster data-items similar to a given
    raster data item
  • Q2. Locate a photograph of a river in Minnesota
    with trees nearby
  • Q3. Find all images of state parks which have a
    lake within them, are within a radius of one
    hundred miles from Chicago, and are southwest of
    Chicago
  • State of the Art
  • However, few robust implementations of CBR are
    available as of 2002
  • several research prototypes address similarity
    query Q1
  • Result quality is similar to those of web
    searches (e.g. www.google.com)
  • some of the retrieved raster data-item are useful
  • many similar data item are not retrieved in the
    result
  • usable in application domains such as publishing
  • Our goal is to understand a current approach to
    similarity queries
  • involving spatial similarities

17
Content Based Retrieval (CBR)
  • Spatial Similarity
  • Consider a pair of raster images with common
    objects (e.g. parks, lakes)
  • Spatial similarity between raster images can be
    defined based on
  • similarity of spatial relationships (e.g.
    topological, directional)
  • Q? Which pairs exhibit higher similarity?
  • P1 (inside, disjoint) or P2 (inside, covered
    by)
  • P3 (disjoint, touch) or P4 (disjoint, inside)
  • P5 (north west, north) or P6 (west, east)
  • A graph framework for comparing spatial
    relationships
  • Nodes spatial relationships
  • Edges connect most similar nodes
  • Similarity metric number of edge on shortest
    path between 2 nodes
  • See Figures 8.9 and 8.10

18
Topological Relationship Similarity
  • Study Figure 8.9, pp.234
  • Nodes topological relationships
  • Edges most similar
  • Similarity measure path length
  • Inference from Model
  • P2 (inside, covered by) more similar than P1
    (inside, disjoint)
  • Do you agree?
  • Review Figure 2.3 (pp.30)

Figure 8.9
19
Direction Relationship Similarity
  • Study Figure 8.10, pp.235
  • Nodes topological relationships Edges most
    similar
  • Similarity measure path length
  • Inference P5 (north-west, north) more similar
    than P6 (west, east)

Figure 8.10
20
Distance Similarity
  • Distance similarity is based on
  • Euclidean distance between the centroids of the
    objects.
  • Example Image R is more similar to P than Q in
    Figure 8.11 (pp.235)

Figure 8.11
21
A Computational Approach to CBR
  • Attribute Relation Graph (ARG)
  • Node objects in a raster
  • Edges relationships
  • Example raster of Figure 8.12(a)
  • ARG in Figure 8.12(b)
  • point object O3
  • rectangles O1, O2
  • edge (O1, O2) shows that they are disjoint, at 61
    degree direction and 5.2 units distant
  • Vector representation of ARG
  • Lists objects and edge properties
  • Example in Figure 8.12

Figure 8.12
22
A Computational Approach to CBR
  • Steps
  • Represent each raster data item by its ARG vector
  • Map query raster data item by its ARG vector
  • Find most similar raster data-items in the
    database by comparing ARG vector representations
  • use a distance metric
  • use a multi-dimensional index
  • Comment Result quality is similar to those of
    web searches. Some of the retrieved raster
    data-item are useful

Figure 8.13
23
Why are Data Warehouses Interesting?
  • Data Warehouse facilitate group decision making
  • Consider a dataset
  • 1 measure (i.e. Sales)
  • 3 dimensions (e.g. Company, Year, Region)
  • Analysis questions
  • Q1. Rank Regions by total sales.
  • Q2. Rank years by total sales.
  • Q3. Where are sales consistently growing?
  • Cross tabulates summaries reports used to analyze
    the trends
  • Example

24
Generating Cross-tabulation Summaries
  • Traditional Approach
  • Use custom software pulling data out of a DBMS
  • Limitations redundant of work, inefficient use
    of resources
  • Data Warehouse approach
  • Cross-tab. Can be generated using a set of simple
    report
  • each report is generated from a SQL Select ...
    group by statement
  • Example Fig. 8.19 (pp. 244) and Table 8.3 (pp.
    245)
  • cross-tab example in last slide is a union of
  • SALES-L0-A, SALES-L1-A, SALES-L1-B and SALES-L2
  • table 8.3 shows SQL queries to compute each part
  • Advantage
  • rest of SQL is available for pre/post processing
    of data
  • performance gains by eliminating unnecessary
    copying of data

25
Example Data Warehouse
Figure 8.19
26
Cross-tabulation vs. Report Hierarchy
  • Spreadsheet view of a report
  • Views a report a N-dim. Spreadsheet
  • N number of dimension attributes
  • Each cell contains value of measure
  • Cross-tabulation view of a Report hierarchy
  • Example report hierarchy for SALES-L0-A,
    SALES-L2-A, SALES-L1-B, SALES-L2, Figure 8.19
    (pp.244)

27
What is a Data Warehouse?
  • Data Warehouse is a special purpose database
  • Primarily used for specialized data analysis
    purposes
  • Facilitates generation and navigation of a
    hierarchy of reports
  • Special purpose data-sets and queries
  • Data consists of
  • a few measure attributes
  • a set of dimension attributes
  • The measure attribute depends on dimension
    attributes
  • Queries generate reports
  • report measure for selected values of dimensions
  • aggregate measure for given subset of dimensions
  • What is a spatial data warehouse?
  • Data warehouses with spatial measures or
    dimensions
  • Example census data - census tract is a spatial
    dimension
  • Example logistics data - route is a spatial
    dimension

28
Data Warehouse Operations
  • Operations on a data warehouse
  • Roll-up, Drill-down
  • Slice, Dice
  • Pivot
  • Roll-up
  • Inputs A report R, A subset S of dimensions in R
  • Output A sequence of reports summarizing R
  • Example 1 RSALES-Base, S(Year, Region) in
    Figure 8.19 (pp. 244)
  • Output consists of reports SALES-L0-A,
    SALES-L1-B, SALES-L2
  • Example 2 RSALES-Base, S(Region, Year)
  • Output consists of reports SALES-L0-A,
    SALES-L1-A, SALES-L2
  • Drill-down
  • Inputs A report R, A dimension D not in R
  • Output A reports detailing R on D
  • Example R SALES-L1-B, D Region in Figure
    8.19 (pp.244)
  • Output report SALES-L0-A

29
Data Warehouse Operations
  • Slice, Dice
  • Reduce dimensions in a table (Figure 8.7, pp.232)
  • Inputs A report R, A value V for a dimension D
    in R
  • Output A subset of R where DV
  • Example RSALES-L0-A, DYear, V1994 in Figure
    8.19 (pp.244)
  • output Table 8.5 (pp.246)
  • includes tuple (ALL, 1994, America, 35)

Figure 8.7
30
Data Warehouse Operations
  • Pivot
  • For a spreadsheet view of reports
  • Transposes a spreadsheet
  • Example
  • Inputs A spreadsheet view of a report R
  • Output A transposed spreadsheet
  • Example R SALES-L0-A, Figure 8.19 (pp.244)

31
Logical Data Model of a DWH
  • Purpose of a logical data model
  • Specify a framework to specify computational
    structure
  • Allow extension of SQL to model new needs
  • Cube operation
  • Input A fact table
  • Output A set of summary reports covering all
    subsets of dimension columns
  • equivalent to union of all tables and reports in
    Figure 8.19 (pp.244)
  • Example Figure 8.18, pp.243
  • SELECT Company, Year, Region, Sum(Sales) AS Sales
  • FROM SALES
  • GROUP BY CUBE Company, Year, Region

32
Figure 8.18
33
Physical Data Model of a DWH
  • Purpose Computationally efficient implementation
  • Ideas
  • Pre-computation -
  • pre-compute some of reports and use those to
    compute other reports
  • New indexing methods, e.g. bit-map index
  • Query Processing Strategies
  • strategies for aggregate functions
  • new strategies for multi-table joins
  • Let us look at strategies for aggregate functions

34
DWH Physical Model Aggregate Function Strategies
  • Aggregate Functions
  • Compute summary statistics for a given set of
    values
  • Examples sum, average, centroid (Table 8.1,
    pp.238)
  • Strategies for efficient computation
  • Characterize easy to compute aggregate functions
  • 3 categories
  • distributive
  • algebraic
  • holistic
  • First 2 categories can be computed easily in one
    scan of the dataset

35
Definitions of Aggregate Function Categories
  • Notation
  • F,G,G1,G2,,Gn are aggregate functions where n is
    small
  • S is a set of values, e.g. S(1,2,3,4)
  • P(S1,S2,,Sp) is a partition of S, e.g.
    P(S1,S2), S1(1,2), S2(3,4)
  • Distributive (F) if there exists a G such that
  • F(S) G(F(S1),F(S2),,F(Sn))
  • Example sum is distributive
  • Illustration sum(1,2,3,4) sum(sum(1,2),sum(3,4)
    )
  • Algebraic (F) if there exists G1,,Gn, (where n
    is small) and
  • F(S) G(G1(S1),,Gn(S1),G2(S1),,Gn(S2),,G1(Sp),
    , Gn(Sp))
  • Example average is distributive
  • Illustration average(1,2,3,4)count(1,2)averag
    e(1,2)count(3,4)average / count(1,2)count(3,
    4)

36
Example Distributive Aggregate Function
  • Examples in cross-tabulation scenario (Figure
    8.14, pp.238)
  • Example 1 Min is distributive
  • Example 2 Count is distributive

Figure 8.14
37
Examples Algebraic Aggregate Functions
  • Examples in cross-tabulation scenario (Figure
    8.15, pp.239)
  • Average and Variance are algebraic

Figure 8.15
38
Discussion - Spatial Data Warehouse
  • Example
  • Consider the example in Figure 8.16, pp.241
  • A map interpretation may be attached to each
    report
  • each row has a spatial footprint, which can be
    aggregated by geometric-union
  • The collection of maps may be called a mapcube
  • Issues
  • What is needed in OGIS standard to support
    map-cube operation?
  • Hierarchical collection of maps in mapcube
  • what is an appropriate cartography to convey the
    relationship among maps?

39
Spatial Data Warehouses and Mapcube
Figure 8.16
40
Figure 8.17
41
Summary
  • Field data
  • Useful in many applications due to rich content
  • Represented as raster or image
  • Operations can be categorized into local, focal,
    zonal, and global
  • Field data storage and retrieval
  • Tiling is a preferred way to divide raster data
    into disk blocks
  • Meta-data based query is often used for retrieval
  • Content based retrieval may be used for
    similarity searches
  • Data warehouses support analysis e.g.
    cross-tabulation reports
  • SQL CUBE operator support generation of DWH
    reports
  • Distributive and Algebraic aggregate functions
    can be computed easily
Write a Comment
User Comments (0)
About PowerShow.com