High-Performance, Federated and Service-Oriented Geographic Information Systems

About This Presentation


Slides: 51

Provided by: asa2

Learn more at: http://grids.ucs.indiana.edu


Transcript and Presenter's Notes


1
High-Performance, Federated and Service-Oriented
Geographic Information Systems

Ahmet Sayar
(asayar_at_cs.indiana.edu)
Advisor: Prof. Geoffrey C. Fox

2
Outline

Motivations
Research Issues
Architecture: Federated Service-Oriented
Geographic Information System
Performance-enhancing designs: measurements and
analysis
Conclusions

3
Geographic Information Systems (GIS)

GIS is a system for creating, storing, sharing,
analyzing, manipulating and displaying geo-data
and associated attributes.
Inherently requires federation (see the figure)
Autonomy for scalability, flexibility and
extensibility
Distributed data access for geo-data resources
(databases, digital libraries, etc.)
Utilizing remote analysis, simulation or
visualization tools
Open Standards
OGC
ISO/TC-211

4
Motivations

Requirements for
Interoperable Service-oriented Geographic
Information Systems
Necessity of sharing and integrating
heterogeneous data and computation resources to
produce knowledge
Uniform data access/query, display and analysis
from a single access point
Responsive and interactive information systems
GIS applications require quick responses, e.g.
Emergency early warning systems
Homeland security and natural disasters

5
Research Issues

Interoperability
Defining component based Service-oriented GIS
data Grid framework
Adoption of Open Geographic Standards: data
model and services
Applying Web Service principles to GIS data
services
Integrating Web Service and Open Geographic
Standards
Federation
Capability-based federation of GIS Web Service
components
Unified data access/query, display from a single
access point through integrated data-views
Addressing high-performance support for
responsiveness
Streaming GIS Web Services and Pre-fetching
framework
Client-based caching
Parallel processing through attribute based query
decomposition

6
Web Service components and data-flow
Service-oriented GIS

WMS are data rendering services providing
human-comprehensible data (binary map images).
WFS are data services providing data in the common
data model GML (Geography Markup Language),
acting as mediator and annotation services.
WMS and WFS each have their own type of capability
metadata, defined by the Open Geographic specifications.
Inter-service communication is done through the
getCapability service interface.
UDDI-based registry services
Components are Web Services, and all control goes
through SOAP messages.
XML-based query language (standard schema)

Built over
Web Services standards (WS-I) and
Open Geographic Standards (OGC and ISO/TC-211)
Consists of two types of online services:
Web Map Services (WMS) and Web Feature Services
(WFS)
And two types of data:
Binary data: map images (provided by WMS)
Structured data: GML content (core data) and
presentation (attribute and geometry elements)
(provided by WFS)

[Figure: Relation of the components and data flow. The WMS (GML rendering) exposes getCapability, getMap and getFeatureInfo and returns binary map data; the WFS (mediator) exposes getCapability, getFeature and DescribeFeatureType and returns GML; both are described by WSDL.]
7
Capability-based Federation of Standard GIS Web
Service Components

Built over the proposed standard Web Service
components and common data models
Federation is done by aggregating GIS Web
Services capabilities metadata
Inspired by OGC's cascading WMS
Unified data access/query/display from a single
access point
Providing application-based hierarchical data
definitions
layer based data and service (WMS and WFS)
compositions
Capability is basically metadata about a
data service:
the server's information content and acceptable
request parameter values
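To make the aggregation concrete, here is a minimal Java sketch that composes application layers from the capabilities of several services, reusing the Pattern Informatics example from the back-up slides (WFS-1, WMS-2, WMS-1, WFS-3). The `Capability` record and the `Federator` class are hypothetical simplifications: real WMS/WFS capabilities are XML documents defined by the OGC specifications, and the actual federator's data structures are not shown in the slides.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified view of one service's capabilities:
// the service id (e.g. "WFS-1") and the data it advertises.
record Capability(String serviceId, String dataName) {}

final class Federator {
    // application layer name -> ordered "data (service)" entries composing that layer
    private final Map<String, List<String>> layers = new LinkedHashMap<>();

    // Compose one application layer from the capabilities of one or more services.
    void addLayer(String layerName, List<Capability> sources) {
        List<String> data = layers.computeIfAbsent(layerName, k -> new ArrayList<>());
        for (Capability c : sources) {
            data.add(c.dataName() + " (" + c.serviceId() + ")");
        }
    }

    // The aggregated hierarchy the federator would advertise as its own capabilities.
    Map<String, List<String>> hierarchy() {
        return layers;
    }

    public static void main(String[] args) {
        Federator pi = new Federator();
        pi.addLayer("State-boundary over Satellite", List.of(
                new Capability("WFS-1", "State-boundary"),
                new Capability("WMS-2", "Satellite-Image")));
        pi.addLayer("Google map", List.of(new Capability("WMS-1", "Google map")));
        pi.addLayer("Earthquake-Seismic", List.of(new Capability("WFS-3", "Earthquake-Seismic")));
        pi.hierarchy().forEach((layer, data) -> System.out.println(layer + " -> " + data));
    }
}
```

Running `main` prints one line per application layer with its data sources, i.e. the kind of layer-based hierarchy the federator exposes from a single access point.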

8
Why Capability Metadata?

Web Services provide key low-level capability but
do not define an information or data architecture.
These are left to domain-specific capabilities
metadata and a data description language (GML).
Machine- and human-readable information
Enables easy integration and federation
Enables developing application-based, standard,
interactive, re-usable tools
for data query, display and analysis
Seamless data access/query

9
High-performance Support for Responsive GIS

Designs, measurements and analysis

10
Performance Investigation

Interoperability requirements bring some
compliance costs:
Common data model (GML)
Web Services (SOAP protocol for communication)
Approaches to enhancing GIS system
responsiveness:
Data transfer and rendering
Streaming GIS Web Services (1)
Structured/annotated GML data rendering (2)
Federator-oriented approaches
Pre-fetching (3)
Client-based caching (4)
Query decomposition and parallel processing (5)
Testing with large-scale geo-science applications:
Earthquake forecasting (PI)
Virtual California (VC)
Aim: turning compliance requirements into
competitiveness

11
Conventional OGC-GIS Systems: Baseline Performance
Test

The naïve approach is characterized by
stateless services,
on-demand data access,
single-threaded processing and no caching.
Systems developed with Open Geographic Standards
have
a high degree of interoperability but poor
performance.

[Figure: Test setup]
12
(1) Streaming GIS Web-Services

Concern: transfer of large XML-structured
data
XML representations of data tend to be
significantly larger than binary representations.
Larger data sizes consume more network
bandwidth.
We still need XML for interoperability
reasons.
In the initial development of the proposed
Service-oriented GIS, we used GIS Web Services with
SOAP over HTTP as the transfer protocol.
However, this limited
performance.
We therefore investigated streaming data transfer over
topic-based publish-subscribe messaging systems
for exchanging SOAP messages and data payloads.

13
(1) Streaming GIS Web-Services (Cont)

Lines 1, 2 and 3 show the classic publish-find-bind
triangle of Web Services.
SOAP is used for negotiation (line 3) through a standard
getFeature request.
Publisher information is returned as a (topic, IP, port)
triple.
The publisher streams the data; the subscriber receives it.
The average performance gain is about 40%.
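The streaming exchange can be pictured with a small in-process sketch: after negotiation the client holds a (topic, IP, port) triple and consumes chunks from that topic as the publisher sends them. This is not the messaging system used in the work; `TopicBroker`, `PublisherInfo` and the `END` marker are hypothetical stand-ins built on `java.util.concurrent` queues, and the IP/port fields are carried but unused here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical publisher endpoint returned by the SOAP negotiation step.
// In a real deployment ip and port would locate the broker; this in-process sketch ignores them.
record PublisherInfo(String topic, String ip, int port) {}

// Minimal in-process stand-in for a topic-based publish-subscribe broker.
final class TopicBroker {
    static final String END = "<<END>>"; // hypothetical end-of-stream marker
    private final ConcurrentHashMap<String, BlockingQueue<String>> topics = new ConcurrentHashMap<>();

    private BlockingQueue<String> queue(String topic) {
        return topics.computeIfAbsent(topic, t -> new LinkedBlockingQueue<>());
    }

    // Publisher side: push one chunk (e.g. one GML feature) to the topic.
    void publish(String topic, String chunk) {
        queue(topic).add(chunk);
    }

    // Subscriber side: take chunks as they arrive until the end marker. Each chunk could be
    // handed to the renderer immediately, before the whole stream has been received.
    List<String> subscribe(PublisherInfo info) {
        List<String> received = new ArrayList<>();
        BlockingQueue<String> q = queue(info.topic());
        try {
            for (String chunk = q.take(); !END.equals(chunk); chunk = q.take()) {
                received.add(chunk);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // keep whatever was received so far
        }
        return received;
    }
}
```

Because the subscriber handles each chunk as soon as it arrives, rendering can begin on partially returned data rather than waiting for a complete SOAP response.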

14
(2) GML Data Processing

Processing XML data: parsing and rendering to
create map images.
Two well-known approaches are document models
(DOM) and push models (SAX).
We use the pull approach for XML processing:
Parses only what is asked for
No document validation (a major source of the
performance gain)
Doesn't build a complete object model in memory
(unlike DOM)
Contents are returned directly to the application
from calls to the parser (unlike SAX)

Data size   Total rendering time (1 GB allocated VM)
(KB)        DOM (dom4j)        Pull (Xpp)
1           469.22             15.59
10          494.06             72.81
100         625.54             183.06
1,000       760.20             270.47
5,000       1,422.91           671.74
10,000      3,557.44           1,025.67
100,000     out of memory      7,059.72
150,000     out of memory      11,047.89
200,000     out of memory      14,949.12
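A minimal pull-parsing sketch follows. The slides used the Xpp (XML Pull) parser; this sketch swaps in StAX (`javax.xml.stream`), the pull API bundled with the JDK, so it runs without extra libraries. It assumes GML 2-style `<gml:coordinates>` elements; other GML encodings (e.g. `posList`) would need different element names.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

final class GmlPullParser {
    // Collects the text of every <gml:coordinates> element by pulling parser events;
    // no in-memory tree is built and no schema validation is performed.
    static List<String> coordinates(String gml) {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        List<String> result = new ArrayList<>();
        try {
            XMLStreamReader reader = factory.createXMLStreamReader(new StringReader(gml));
            try {
                while (reader.hasNext()) {
                    // The application asks for the next event (pull) instead of receiving callbacks (SAX).
                    if (reader.next() == XMLStreamConstants.START_ELEMENT
                            && "coordinates".equals(reader.getLocalName())) {
                        result.add(reader.getElementText().trim());
                    }
                }
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("malformed GML", e);
        }
        return result;
    }
}
```

Everything except the requested elements is skipped as the reader advances, which is why memory use stays flat where the DOM column runs out of memory.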
15
Federator-Oriented Performance Enhancing Designs
16
(3) Pre-fetching

Getting the GML data before it is needed
The extension for the pre-fetching module is shown in
the grey region of the figure.
Overcomes the network bandwidth problem and
repeated data conversions.
This technique suits infrequently changing
archived data.
Otherwise, it might cause consistency problems.
Red curve: map rendering over the pre-fetched
data (ready-to-use GML data)
Black curve: map rendering through on-demand
fetching

PR runs a pre-defined task with a pre-defined
periodicity.
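The periodic pre-fetching task can be sketched with a `ScheduledExecutorService`. This is an illustration under assumptions, not the PR module itself: `fetchFromWfs` is a hypothetical supplier that would issue the getFeature request and return GML, and the period is left to the caller, since the slides only say it must be chosen carefully.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

final class Prefetcher implements AutoCloseable {
    private final AtomicReference<String> latestGml = new AtomicReference<>();
    private final ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
    private final Supplier<String> fetchFromWfs; // hypothetical: issues getFeature, returns GML

    Prefetcher(Supplier<String> fetchFromWfs) {
        this.fetchFromWfs = fetchFromWfs;
    }

    // Runs the pre-defined fetch task with a fixed periodicity, starting immediately.
    void start(long period, TimeUnit unit) {
        scheduler.scheduleAtFixedRate(this::refresh, 0, period, unit);
    }

    // One pre-fetching run. On failure the previous copy is kept, because an exception
    // escaping a scheduleAtFixedRate task would cancel all later runs.
    void refresh() {
        try {
            latestGml.set(fetchFromWfs.get());
        } catch (RuntimeException e) {
            // keep serving the last good copy
        }
    }

    // Ready-to-use GML for map rendering; may be up to one period stale.
    String currentData() {
        return latestGml.get();
    }

    @Override
    public void close() {
        scheduler.shutdownNow();
    }
}
```

For example, `new Prefetcher(wfsCall).start(1, TimeUnit.HOURS)` would refresh hourly. Map requests then read `currentData()` without waiting on the network, at the cost of data that can be up to one period old.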
17
(3) Pre-fetching vs. On-demand Fetching
Data size  Pre-fetching                On-demand fetching
(MB)       Avg. response  StdDev       Avg. response  StdDev
0.01       19,261.90      481.57       1,808.13       140.32
0.1        19,112.30      673.69       2,635.46       313.48
0.5        19,222.48      631.35       5,001.29       238.94
1          19,427.48      305.94       8,225.73       200.27
5          20,146.00      516.50       33,419.31      394.48
10         20,165.90      546.53       64,506.78      283.24
50         22,882.52      509.98       316,906.00     623.08
100        23,990.43      603.59       643,344.00     548.65

For 100 MB, pre-fetching is about 27 times faster than
conventional on-demand fetching (23,990.43 vs. 643,344.00).
The larger the data size, the higher the
performance gain.

18
(4) Client-based Caching

Each client has a separate cache area allocated.
Application of working-window and locality
principles to map image rendering
Clients are differentiated by the session-id
parameter assigned to each client and carried in the
header of queries.
Always keep the most recently used data (the client's last request)
Maintaining a working window for each client
adds some overhead.

19
Brief Architecture
Server-side
Create an identity card and update it at every request
from the client:

class FormerRequest {
    String uuid;           // unique user id
    String bbox;           // bounding box of the user's last request
    Double density;        // data size falling into a unit square
    Vector feature_data;   // geometry elements of the last request
}

Register it in the client table:
uuid-1 -> FormerRequest-1
uuid-2 -> FormerRequest-2
...
Set the identity in the message header.

Client-side
ClientWSStub binding;
binding = (ClientWSStub) (new ServiceLocator().WMSServices(servaddress));
String sessionID = session.getid();      // e.g. uuid-1
String channel_name = "getMapChannel";
// Add the session ID to the SOAP message header
binding.setHeader(service_address, channel_name, sessionID);
Map mymap = binding.getMap(request);
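On the server side, the client table can be sketched as a concurrent map from session id to the former request. The record mirrors the `FormerRequest` fields above, but uses `List` instead of `Vector` and a `String` placeholder for geometry elements; the `ClientTable` name and its methods are hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Simplified identity card with the FormerRequest fields from the slide
// (List instead of Vector; String is a placeholder for geometry elements).
record FormerRequest(String uuid, String bbox, double density, List<String> featureData) {}

// Client table keyed by the session id carried in the SOAP header.
final class ClientTable {
    private final Map<String, FormerRequest> table = new ConcurrentHashMap<>();

    // Called at every request: replaces the client's entry, so only the most recent
    // request is kept (a working window of one request per client).
    void update(String sessionId, String bbox, double density, List<String> featureData) {
        table.put(sessionId, new FormerRequest(sessionId, bbox, density, List.copyOf(featureData)));
    }

    // The client's former request, or null for a first-time query.
    FormerRequest former(String sessionId) {
        return table.get(sessionId);
    }
}
```

A `null` result is what would send a first-time query down the blind-partitioning path, while an existing entry supplies the cached bbox and density used for smart partitioning.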
20
Why Client-based Caching

Makes stateless GIS Web Services stateful
Allows the workload to be shared as evenly as possible for
the most efficient parallel processing.
Comparison with Google-like map servers
In large-scale applications it is impossible to
cache all the data
Limited storage and computation capabilities
Google-like map servers are fast because
they replace computation with storage:
they pre-make all images and cut them into tiles;
they formalize the accepted requests in terms of
parameters, and the responses in terms of tile
compositions.
BUT, this is good only for client-server based
applications.
It can't be applied to distributed, dynamic data
rendering and extensible applications.
They don't deal with feature-enriched maps
enabling attribute-based querying,
or with structured/annotated scientific data
rendering.

21
(5) Parallel Processing over Client-based Caching
Main query → cached-data extraction →
rectangulation into rectangles Ri → partitioning into
sub-queries ri → assigning separate threads
→ assembling the results

[Figure: a successive request overlapping the cached data, with the steps numbered 1 to 4]
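The thread-assignment and assembling steps of this pipeline might look like the following Java sketch, which runs each sub-query on a fixed-size thread pool and concatenates the partial results in sub-query order. `runSubQuery` is a hypothetical function standing in for a getFeature call on one rectangle; in the full pipeline the features extracted from the cache would be added to the merged list as well.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

final class ParallelQuery {
    // Runs every sub-query on a pool of worker threads and assembles the partial results.
    static <Q, R> List<R> runAll(List<Q> subQueries, Function<Q, List<R>> runSubQuery, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Callable<List<R>>> tasks = new ArrayList<>();
            for (Q q : subQueries) {
                tasks.add(() -> runSubQuery.apply(q));
            }
            List<R> merged = new ArrayList<>();
            for (Future<List<R>> f : pool.invokeAll(tasks)) { // waits for all sub-queries
                merged.addAll(f.get());                        // assemble in sub-query order
            }
            return merged;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for sub-queries", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("sub-query failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

The pool, task submission and merge are exactly the overheads the later slides measure, which is why small requests see no gain from more threads.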
22
Challenge Geo-Data Characteristic

A point is described by its location attribute,
the (x, y) coordinates.
Linestrings, polylines, polygons, etc. are defined
as sets of points.
The data falling into a queried region is
specified as a bounding box (bbox):
the coordinates of a rectangle (a, b, c, d).
Geo-data is unevenly distributed
and variable in size according to
location.
Ex. human population

Advanced techniques for workload sharing are needed!

23
Attribute-based Query Decomposition

Cached data extraction
Rectangulation over the remaining regions R1, R2, R3,
R4
Each rectangle goes through a partitioning process:
Blind partitioning
Used, e.g., for first-time queries
Uses a default number of partitions
Smart partitioning
Uses client-based caching
(the FormerRequest object)
All partitions are assigned to separate threads,
and the results are merged to create the final response.

[Figure: remaining rectangles R1 to R4 around the cached region, each partitioned further (e.g. partition into 4)]
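Rectangulation can be sketched as subtracting the cached bounding box from the query bounding box. The decomposition below (left and right strips, then bottom and top strips between them) is one possible way to obtain at most four disjoint rectangles R1 to R4; the slides do not show the exact cut order, so treat it as an assumption.

```java
import java.util.ArrayList;
import java.util.List;

// Axis-aligned bounding box (minx, miny, maxx, maxy).
record BBox(double minx, double miny, double maxx, double maxy) {
    double area() { return (maxx - minx) * (maxy - miny); }
}

final class Rectangulation {
    // Returns the parts of the query box NOT covered by the cached box as at most four
    // disjoint rectangles; only these need to be requested from the WFS.
    static List<BBox> remaining(BBox q, BBox c) {
        double ix1 = Math.max(q.minx(), c.minx()), iy1 = Math.max(q.miny(), c.miny());
        double ix2 = Math.min(q.maxx(), c.maxx()), iy2 = Math.min(q.maxy(), c.maxy());
        List<BBox> out = new ArrayList<>();
        if (ix1 >= ix2 || iy1 >= iy2) { // no overlap: the whole query must be fetched
            out.add(q);
            return out;
        }
        if (q.minx() < ix1) out.add(new BBox(q.minx(), q.miny(), ix1, q.maxy())); // left strip
        if (ix2 < q.maxx()) out.add(new BBox(ix2, q.miny(), q.maxx(), q.maxy())); // right strip
        if (q.miny() < iy1) out.add(new BBox(ix1, q.miny(), ix2, iy1));           // bottom strip
        if (iy2 < q.maxy()) out.add(new BBox(ix1, iy2, ix2, q.maxy()));           // top strip
        return out;
    }
}
```

For a query (0, 0, 10, 10) and a cache (5, 5, 20, 20), only the left strip (0, 0, 5, 10) and the bottom strip (5, 0, 10, 5) become sub-queries; the overlapping quarter is served from the cache.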
24
Smart Partitioning through Client-based Caching

Based on the locality principles.
Assumption: former and current requests have
similar data density.
Cached data area:
$CD\_size\_br2 = (maxxc - minxc)(maxyc - minyc)$
Main-query area:
$R\_size\_br2 = (maxx - minx)(maxy - miny)$
Thr: pre-defined threshold value, which varies from
data set to data set
Pn: the number of partitions calculated for a
rectangle

Determining the most efficient number of
partitions (Pn)
[Figure: cache rectangle with corners (minxc, minyc) and (maxxc, maxyc) overlapping the query rectangle with corners (minx, miny) and (maxx, maxy)]
If Pn > 2, cut the rectangle into Pn
equal-sized regions.
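The final cutting step can be sketched as splitting a rectangle into Pn vertical strips of equal width, matching the vertical partitioning shown for 5 partitions on slide 26. Computing Pn itself is not sketched, because its formula is not recoverable from the slides; `pn` is simply passed in, and the class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Axis-aligned bounding box (minx, miny, maxx, maxy).
record BBox(double minx, double miny, double maxx, double maxy) {}

final class Partitioner {
    // Cuts a rectangle into pn vertical strips of equal width when pn > 2 (as on the slide);
    // otherwise the rectangle is kept as a single partition.
    static List<BBox> vertical(BBox r, int pn) {
        List<BBox> parts = new ArrayList<>();
        if (pn <= 2) {
            parts.add(r);
            return parts;
        }
        double width = (r.maxx() - r.minx()) / pn;
        for (int i = 0; i < pn; i++) {
            double x1 = r.minx() + i * width;
            double x2 = (i == pn - 1) ? r.maxx() : x1 + width; // last strip absorbs rounding error
            parts.add(new BBox(x1, r.miny(), x2, r.maxy()));
        }
        return parts;
    }
}
```

Each resulting strip becomes one sub-query, so strips of equal area carry roughly equal work only under the slide's similar-density assumption.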
25
Assigning Partitions to Workers

Partitions are assigned to the worker nodes in
round-robin fashion.
We keep a pool of worker nodes for each feature
layer to which parallel processing is applied.
In the algorithm:
PN: number of partitions
WN: number of worker nodes in the pool
share: the number of partitions each worker is
supposed to get (PN / WN, rounded down)

Check whether there are still remaining partitions
waiting (rmg)

Assignments:
The first rmg worker nodes are assigned share + 1 partitions,
and the others (WN - rmg) are assigned share
partitions.
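The assignment rule can be checked with a short sketch. `shares` follows the slide (share = PN / WN rounded down, and the first rmg = PN mod WN workers get share + 1), and `assign` shows that dealing partitions out round-robin (partition p to worker p mod WN) produces exactly those counts. Method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

final class RoundRobin {
    // Number of partitions per worker: share = pn / wn, and the first
    // rmg = pn % wn workers receive one extra partition (share + 1).
    static int[] shares(int pn, int wn) {
        int share = pn / wn, rmg = pn % wn;
        int[] counts = new int[wn];
        for (int w = 0; w < wn; w++) {
            counts[w] = (w < rmg) ? share + 1 : share;
        }
        return counts;
    }

    // Round-robin dealing: partition p goes to worker p % wn, giving the same counts.
    static List<List<Integer>> assign(int pn, int wn) {
        List<List<Integer>> workers = new ArrayList<>();
        for (int w = 0; w < wn; w++) {
            workers.add(new ArrayList<>());
        }
        for (int p = 0; p < pn; p++) {
            workers.get(p % wn).add(p);
        }
        return workers;
    }
}
```

With PN = 5 and WN = 3, share is 1 and rmg is 2, so the workers receive 2, 2 and 1 partitions.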

26
Vertical partitioning in case of having 5
partitions
27
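Data Access Timings (No Cached Data)

$T_{\text{data access}} = T_{\text{query conversion}} + T_{\text{GML conversion}} + T_{\text{streaming}} + T_{\text{building GML}}$
where the query conversion is from getFeature to SQL, streaming moves the data from the WFS to the federator, and the GML is built at the federator.

[Figure: Federator, WFS and DB]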
28
Overhead and Response Timings (example case:
10-threaded parallel processing)

Performance does not increase in proportion
to the number of threads.
Overheads: query partitioning, sub-query
creation, map creation and map transfer.
There is no performance gain below a
threshold data size.

[Figure: browser with event-based dynamic map tools, the federator, WFS servers and a DB]
29
Partial Usage of Cached Data (example case: 1/2 cached)
Comparison of the response times
Data   Half cache, 10 threads   No cache, 10 threads   No cache, single thread
(MB)   Avg. time    StdDev      Avg. time    StdDev    Avg. time    StdDev
0.01   3,095.19     204.22      2,329.50     131.46    1,808.13     140.32
0.1    3,576.73     283.8       2,760.00     104.35    2,635.46     313.48
0.5    3,721.77     210.41      3,460.40     120.24    5,001.29     238.94
1      4,311.73     192.45      4,640.53     106.42    8,225.73     200.27
5      11,294.58    313.59      16,725.4     201.62    33,419.31    394.48
10     18,371.72    296.19      23,118.4     941.83    64,506.78    283.24

There is no performance gain for small data
sizes, due to the overheads.
For 10 MB, the proposed system is about 3.5 times
faster than the ordinary on-demand single-threaded
system (18,371.72 vs. 64,506.78).
The performance gain increases
as the data size increases, and
as the overlapping cached region increases:
100% overlap behaves like the pre-fetching case.

[Figure: test setup with the federator (Fedrtr), CT, several WFS instances and a DB]
30
Conclusions

Streaming data transfer techniques allow data
rendering even on partially returned data.
Pull parsing gives the best results for rendering
XML-encoded GML data, in part by eliminating the
requirement of data validation.
The federator's natural characteristics allowed us to
develop advanced caching and parallel processing
designs.
Pre-fetching and parallel-processing techniques
are mutually exclusive.
The best performance is achieved through
pre-fetching, but it can cause data inconsistency.
The triggering periodicity must be defined carefully.
The success of parallel-processing techniques depends
on how well the workload is shared among worker
nodes, given the unevenly distributed and
variable-sized characteristics of geo-data.
We saw that
applying working-window and locality
principles by means of client-based caching, together with
parallel processing through attribute-based query
decomposition,
considerably increased system
responsiveness.

31
Conclusions General Framework

Heterogeneous data sources are queried as a
single resource
Heterogeneous: autonomous local resources
control the definition of data
Single resource: removes the burden of
individually accessing each data source with
ad-hoc query languages
WFS-based mediation
Data and query conversions
Easy extension with new data and service
resources
Open Geographic and Web Service standards
No physical data integration
Data always stays at the local source
Easy maintenance of data and a high degree of
autonomy
Seamless interaction with the system through
integrated data views as multi-layered map images

32
Contributions

A federated Service-oriented Geographic
Information Systems framework
Integrating Web Services with Open Geographic
Standards to support interoperability at both
data and service levels
Production of knowledge from distributed data
sources in multi-layered map images.
Hierarchical data definitions through capability
metadata federations
Enabling unified interactive data access/query
and display.
Investigated performance-efficient designs and
carried out detailed benchmarking
Streaming GIS Web Services
Federator-oriented high-performance design
techniques
Pre-fetching
Client-based caching: working-window and
locality principles
Parallel processing through attribute-based query
decomposition

33
Acknowledgement

The work described in this presentation is part
of the QuakeSim project which is supported by the
Advanced Information Systems Technology Program
of NASA's Earth-Sun System Technology Office.
Galip Aydin: Web Feature Server (WFS)

34
Thanks!....
35
BACK-UP SLIDES
36
Capability-based Federation of the standard Web
Service Components

Built over the proposed standard Web Service
components and common data models
Unified data access/query/display from a single
access point
Providing application-based hierarchical data
definitions
layer based data and service (WMS and WFS)
compositions
Federation is done by aggregating GIS Web
Services capabilities metadata
Capability is basically metadata about a
data service:
the server's information content and acceptable
request parameter values

Application-based hierarchical data
Application: Pattern Informatics
Layer-1: State-boundary over Satellite
Data-1: State-boundary (WFS-1)
Data-2: Satellite-Image (WMS-2)
Layer-2: Google map (WMS-1)
Layer-3: Earthquake-Seismic
Data-1: Earthquake-Seismic (WFS-3)

Sample layers for PI (figure panels a, b, c and d):
NASA satellite layer
Earthquake-seismic layer
Google Map layer
State-boundaries layer

Events: move, zooming in/out, panning (drag-drop),
rectangular region, distance calculation,
attribute querying
37
Hierarchical Data: Integrated Data View
[Figure: integrated data view with (1) Google map layer, (2) state boundary lines layer and (3) seismic data layer]
Event-based interactive tools: query and data
analysis over integrated data views
38
(No Transcript)
39

Integrated views
Event-based querying through integrated views
WFS-based mediators
XML-based query language
Related work specific to federation (projects might
no longer be active):
MIX: Mediation of Information using XML
SRB/MCAT (SDSC)
TSIMMIS (Stanford Univ.)
XML-based standard queries for the standard
services
Capability gives the list of data provided, the
attributes they can be queried by, and the
constraints on the queries needed to create valid
requests such as getMap and getFeature.
We do syntactic and structural integration.

40
Hierarchical data / Integrated data-view for
IEISS Geo-science Application

Application-based hierarchical data
Application: IEISS
Layer-1: Gas-pipeline over Satellite
Data-1: Gas-pipeline (WFS-1)
Data-2: Satellite-Image (WMS-2)
Layer-2: Google map (WMS-1)
Layer-3: Electric-power
Data-1: Electric-power (WFS-3)

41
GetCapabilities Schema and Sample Request Instance
42
GetMap Schema and Sample Request Instance
43
(No Transcript)
44
Event-based Interactive Map Tools

<event_controller>
  <event name="init" class="Path.InitListener" next="map.jsp"/>
  <event name="REFRESH" class="Path.InitListener" next="map.jsp"/>
  <event name="ZOOMIN" class="Path.InitListener" next="map.jsp"/>
  <event name="ZOOMOUT" class="Path.InitListener" next="map.jsp"/>
  <event name="RECENTER" class="Path.InitListener" next="map.jsp"/>
  <event name="RESET" class="Path.InitListener" next="map.jsp"/>
  <event name="PAN" class="Path.InitListener" next="map.jsp"/>
  <event name="INFO" class="Path.InitListener" next="map.jsp"/>
</event_controller>
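A controller might load this configuration into a lookup table from event name to listener class and next page, for example with the JDK's DOM parser as sketched below. The `EventRoute` record and `EventConfig.load` are hypothetical; how the original tools instantiate `Path.InitListener` and forward to `map.jsp` is not shown in the slides.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Listener class and next page for one event name.
record EventRoute(String listenerClass, String nextPage) {}

final class EventConfig {
    // Reads <event name=".." class=".." next=".."/> entries into an event-name lookup table.
    static Map<String, EventRoute> load(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList events = doc.getElementsByTagName("event");
            Map<String, EventRoute> routes = new LinkedHashMap<>();
            for (int i = 0; i < events.getLength(); i++) {
                Element e = (Element) events.item(i);
                // trim() tolerates stray spaces inside attribute values
                routes.put(e.getAttribute("name"),
                        new EventRoute(e.getAttribute("class").trim(), e.getAttribute("next").trim()));
            }
            return routes;
        } catch (ParserConfigurationException | SAXException | IOException ex) {
            throw new IllegalArgumentException("invalid event controller configuration", ex);
        }
    }
}
```

A browser event such as ZOOMIN would then be looked up by name to find which listener handles it and which page to render next.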

45
Sample GML document
46
Sample GetFeature Request Instance
47
A simple template capabilities file for a WMS
48
Generalizing the Problem Domain

Query heterogeneous data sources as a single
resource
Heterogeneous: local resources control the definition
of the data
Single resource: removes the burden of
individually accessing each data source
Easy extension with new data and service
resources
No real integration of data
Data always stays at the local source
Easy maintenance of data
Seamless interaction with the system
Collaborative decision making

[Figure: client/user query → integrated view → federation services → mediators → data sources (DB, files: data in files, HTML, XML/relational databases, spatial sources/sensors)]
49
Generalization of the Proposed Architecture

The GIS-style information model can be redefined in
any application area, such as chemistry and
astronomy:
Application Specific Information Systems (ASIS).
We need to define:
Application Specific Language (ASL) → GML: expresses
domain-specific features and data semantics
Application Specific Feature Service (ASFS) → WFS: serves
data in the common language (ASL)
Application Specific Visualization Service (ASVS) → WMS:
visualizes information and provides a way of navigating
ASFS-compatible/mediated data resources
Capabilities metadata for ASVS and ASFS

We need to define an Application Specific
Federator that federates the capabilities of
distributed ASVS and ASFS to create an
application-based hierarchy of distributed data
and service resources.
Mediators: query and data format conversions
Data sources maintain their internal structure
Large degree of autonomy
No actual physical data integration

[Figure: unified data query/access/display; the federator (ASVS) performs capability federation and ASL rendering over mediators that expose standard service APIs]
50
Contributions (Systems Software)

Developing Web Map Server (WMS) in Open
Geographic Standards
Extended with Web Service Standards and
Streaming map creation capabilities
Developing GIS Federator
Provides application specific layer-structured
hierarchical data as a composition of distributed
standard GIS Web Service components
Enables uniform data access and query
Interactive map tools for data display, query and
analysis
Browser-based and event-based
Extended with AJAX (Asynchronous JavaScript and XML)
