Title: High-Performance, Federated and Service-Oriented Geographic Information Systems
1 High-Performance, Federated and Service-Oriented Geographic Information Systems
- Ahmet Sayar
- (asayar_at_cs.indiana.edu)
- Advisor: Prof. Geoffrey C. Fox
2 Outline
- Motivations
- Research Issues
- Architecture: Federated Service-Oriented Geographic Information System
- Performance-enhancing designs: measurements and analysis
- Conclusions
3 Geographic Information Systems (GIS)
- GIS is a system for creating, storing, sharing, analyzing, manipulating and displaying geo-data and associated attributes.
- Inherently requires federation (see the figure)
  - Autonomy for scalability, flexibility and extensibility
  - Distributed data access for geo-data resources (databases, digital libraries, etc.)
  - Utilizing remote analysis, simulation or visualization tools
- Open Standards
  - OGC
  - ISO/TC-211
4 Motivations
- Requirements for
  - Interoperable service-oriented Geographic Information Systems
  - Necessity for sharing and integrating heterogeneous data and computation resources to produce knowledge
  - Uniform data access/query, display and analysis from a single access point
  - Responsive and interactive information systems
- GIS applications require quick response
  - Emergency early-warning systems
  - Homeland security and natural disasters
5 Research Issues
- Interoperability
  - Defining a component-based, service-oriented GIS data Grid framework
  - Adoption of Open Geographic Standards: data model and services
  - Applying Web Service principles to GIS data services
  - Integrating Web Services and Open Geographic Standards
- Federation
  - Capability-based federation of GIS Web Service components
  - Unified data access/query and display from a single access point through integrated data views
- Addressing high-performance support for responsiveness
  - Streaming GIS Web Services and a pre-fetching framework
  - Client-based caching
  - Parallel processing through attribute-based query decomposition
6 Web Service components and data flow: Service-oriented GIS
- WMS are data rendering services providing human-comprehensible data (binary map images).
- WFS are data services providing data in the common data model GML (Geography Markup Language), behaving as mediator and annotation services.
- WMS and WFS each have their own type of capability metadata, defined by the Open Geographic specifications.
- Inter-service communication is done through the GetCapabilities service interface.
- UDDI-based registry services.
- Components are Web Services, and all control goes through SOAP messages.
- XML-based query language (standard schema)
- Built over
  - Web Services standards (WS-I) and
  - Open Geographic Standards (OGC and ISO/TC-211)
- Consists of two types of online services
  - Web Map Services (WMS) and Web Feature Services (WFS)
- And two types of data
  - Binary data: map images (provided by WMS)
  - Structured data: GML content (core data) and presentation (attribute and geometry elements) (provided by WFS)
[Figure: Relation of the components and data flow. The GIS client talks to a WMS (GML rendering; GetCapabilities, GetMap, GetFeatureInfo; returns binary data) and to a WFS (mediator; GetCapabilities, GetFeature, DescribeFeatureType; returns GML), both described by WSDL.]
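The getMap interface can be pictured with a concrete request. The components described here exchange XML requests over SOAP; the sketch below instead uses the HTTP GET key-value encoding that the OGC WMS 1.1.1 specification also defines, only because it is compact to show. The endpoint, layer name, SRS and output format are hypothetical values, not taken from this system.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

final class GetMapRequest {
    private GetMapRequest() {}

    /** Builds a WMS 1.1.1 GetMap URL using the standard key-value-pair encoding. */
    static String build(String endpoint, String layers, double minx, double miny,
                        double maxx, double maxy, int width, int height) {
        Map<String, String> params = new LinkedHashMap<>();   // keeps parameter order stable
        params.put("SERVICE", "WMS");
        params.put("VERSION", "1.1.1");
        params.put("REQUEST", "GetMap");
        params.put("LAYERS", layers);
        params.put("STYLES", "");                              // default styles
        params.put("SRS", "EPSG:4326");                        // assumed lat/lon reference system
        params.put("BBOX", minx + "," + miny + "," + maxx + "," + maxy);
        params.put("WIDTH", Integer.toString(width));
        params.put("HEIGHT", Integer.toString(height));
        params.put("FORMAT", "image/png");

        StringJoiner query = new StringJoiner("&");
        for (Map.Entry<String, String> e : params.entrySet()) {
            // Percent-encode values such as ',' ':' and '/'.
            query.add(e.getKey() + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        return endpoint + "?" + query;
    }
}
```

The bounding box is the same (minx, miny, maxx, maxy) rectangle that the federator later uses for caching and query decomposition.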
7 Capability-based Federation of Standard GIS Web Service Components
- Built over the proposed standard Web Service components and common data models
- Federation is done by aggregating GIS Web Services' capabilities metadata
  - Inspired by OGC's cascading WMS
- Unified data access/query/display from a single access point
- Providing application-based hierarchical data definitions
  - Layer-based data and service (WMS and WFS) compositions
- Capability is basically metadata about data and service
  - The server's information content and acceptable request parameter values
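A minimal sketch of capability aggregation, assuming each remote WMS or WFS capabilities document has already been parsed into a service type, a URL and a list of layer names. The records `ServiceCapability` and `LayerSource` and the class `CapabilityFederator` are hypothetical; a real capabilities document carries much more metadata (bounding boxes, formats, request parameters), which this sketch omits.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Simplified capability metadata of one remote service (hypothetical model). */
record ServiceCapability(String serviceType, String url, List<String> layers) {}

/** One entry in the federated capability: which service serves a layer. */
record LayerSource(String serviceType, String url) {}

final class CapabilityFederator {
    private CapabilityFederator() {}

    /**
     * Aggregates the layer lists of several WMS/WFS capabilities into one
     * layer-to-sources index, so all layers can be browsed from one access point.
     */
    static Map<String, List<LayerSource>> federate(List<ServiceCapability> capabilities) {
        Map<String, List<LayerSource>> index = new LinkedHashMap<>();
        for (ServiceCapability cap : capabilities) {
            for (String layer : cap.layers()) {
                index.computeIfAbsent(layer, k -> new ArrayList<>())
                     .add(new LayerSource(cap.serviceType(), cap.url()));
            }
        }
        return index;
    }
}
```

Because one layer may be offered by several services, the index keeps a list of sources per layer; a federator can then pick or combine them when composing a layer-based view.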
8 Why Capability Metadata
- Web Services provide key low-level capability but do not define an information or data architecture
  - These are left to domain-specific capabilities metadata and a data description language (GML)
- Machine- and human-readable information
  - Enables easy integration and federation
  - Enables developing application-based, standard, interactive, re-usable tools for data query, display and analysis
- Seamless data access/query
9 High-performance Support for Responsive GIS
- Designs, measurements and analysis
10 Performance Investigation
- Interoperability requirements bring some compliance costs
  - Common data model (GML)
  - Web Services (SOAP protocol for communication)
- Approaches to enhancing GIS system responsiveness
  - Data transfer and rendering
    - Streaming GIS Web Services (1)
    - Structured/annotated GML data rendering (2)
  - Federator-oriented approaches
    - Pre-fetching (3)
    - Client-based caching (4)
    - Query decomposition and parallel processing (5)
- Testing with large-scale geo-science applications
  - Earthquake forecasting (PI)
  - Virtual California (VC)
- Aim: turning compliance requirements into competitiveness
11 Conventional OGC-GIS Systems: Baseline Performance Test
- The naïve approach is characterized by
  - Stateless services
  - On-demand data access
  - Single-threaded processing and no caching
- Systems developed with Open Geographic Standards have
  - A high degree of interoperability but poor performance
[Figure: Test setup]
12 (1) Streaming GIS Web Services
- The concern is transferring large XML-structured data
  - XML representations of data tend to be significantly larger than binary representations
  - Larger data sizes consume more network bandwidth
  - We still need XML for interoperability reasons
- In the initial development of the proposed service-oriented GIS we used GIS Web Services with SOAP over HTTP as the transfer protocol
  - BUT this limited performance
- We investigated streaming data transfer
  - Topic-based publish-subscribe messaging systems for exchanging SOAP messages and data payloads
13 (1) Streaming GIS Web Services (Cont.)
- Lines 1, 2 and 3 show the classic publish-find-bind triangle of Web Services
- SOAP is used for negotiation (line 3) via a standard getFeature request
  - Publisher information is returned as a (topic, IP, port) triple
- The publisher streams; the subscriber receives
- The performance gain is 40% on average
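The publish-subscribe exchange can be sketched in-process. This is not the messaging system used in this work: a `LinkedBlockingQueue` per topic stands in for the broker, and the class name `TopicBroker`, the topic name and the end-of-stream marker are hypothetical. What it illustrates is that the subscriber can render each chunk as it arrives instead of waiting for the whole response.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Consumer;

/** In-memory stand-in for a topic-based broker (one subscriber per topic). */
final class TopicBroker {
    /** Hypothetical sentinel chunk marking the end of a stream. */
    static final String END_OF_STREAM = "__END_OF_STREAM__";

    private final Map<String, BlockingQueue<String>> topics = new ConcurrentHashMap<>();

    private BlockingQueue<String> queue(String topic) {
        return topics.computeIfAbsent(topic, t -> new LinkedBlockingQueue<>());
    }

    /** Publisher side: push one chunk (e.g. one GML feature) onto the topic. */
    void publish(String topic, String chunk) {
        queue(topic).offer(chunk);   // unbounded queue, so offer always succeeds
    }

    /** Publisher side: mark the stream as complete. */
    void close(String topic) {
        queue(topic).offer(END_OF_STREAM);
    }

    /** Subscriber side: hand each chunk to the renderer as soon as it arrives. */
    List<String> consume(String topic, Consumer<String> renderer) {
        BlockingQueue<String> q = queue(topic);
        List<String> received = new ArrayList<>();
        while (true) {
            String chunk;
            try {
                chunk = q.take();    // blocks until the publisher sends the next chunk
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for stream", e);
            }
            if (END_OF_STREAM.equals(chunk)) {
                return received;
            }
            received.add(chunk);
            renderer.accept(chunk);  // partial rendering on partially returned data
        }
    }
}
```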
14 (2) GML Data Processing
- Processing XML data: parsing and rendering to create map images
- Two well-known approaches are document models (DOM) and push models (SAX)
- We use a pull approach for XML processing
  - Parses only what is asked for
  - No support for document validation (a major performance gain)
  - Doesn't build a complete object model in memory (unlike DOM)
  - Contents are returned directly to the application from calls to the parser (unlike SAX)
Total rendering timings (1 GB allocated VM)
Data size (KB)   DOM (dom4j)     Pull (XPP)
1                469.22          15.59
10               494.06          72.81
100              625.54          183.06
1,000            760.20          270.47
5,000            1,422.91        671.74
10,000           3,557.44        1,025.67
100,000          OUT OF MEMORY   7,059.72
150,000          OUT OF MEMORY   11,047.89
200,000          OUT OF MEMORY   14,949.12
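A minimal sketch of pull parsing for GML, using the JDK's StAX API (`javax.xml.stream`) rather than the XPP parser measured above. It assumes GML 2-style `gml:coordinates` elements in the `http://www.opengis.net/gml` namespace and extracts only their text: no tree is built, DTD processing is switched off and no validation is performed. `GmlPullParser` is a hypothetical name.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

final class GmlPullParser {
    private static final String GML_NS = "http://www.opengis.net/gml";

    private GmlPullParser() {}

    /** Pulls the text of every gml:coordinates element, skipping everything else. */
    static List<String> coordinates(String gml) {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        factory.setProperty(XMLInputFactory.IS_NAMESPACE_AWARE, Boolean.TRUE);
        factory.setProperty(XMLInputFactory.SUPPORT_DTD, Boolean.FALSE);
        List<String> result = new ArrayList<>();
        try {
            XMLStreamReader reader = factory.createXMLStreamReader(new StringReader(gml));
            try {
                while (reader.hasNext()) {
                    // The application drives the cursor; nothing is kept in memory
                    // except the values it explicitly asks for.
                    if (reader.next() == XMLStreamConstants.START_ELEMENT
                            && GML_NS.equals(reader.getNamespaceURI())
                            && "coordinates".equals(reader.getLocalName())) {
                        result.add(reader.getElementText().trim());
                    }
                }
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("malformed GML", e);
        }
        return result;
    }
}
```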
15 Federator-Oriented Performance-Enhancing Designs
16 (3) Pre-fetching
- Getting the GML data before it is needed
- The extension for the pre-fetching module is shown in the grey region
- Overcomes the network bandwidth problem and repeated data conversions
- This technique is good for infrequently changing, archived data
  - Otherwise, it might cause consistency problems
- Red curve: map rendering over the pre-fetched data (ready-to-use GML data)
- Black curve: map rendering through on-demand fetching
PR runs a pre-defined task with a pre-defined periodicity.
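The periodic pre-fetching task can be sketched with a `ScheduledExecutorService`. The `Prefetcher` class and its fetch task are hypothetical stand-ins for the pre-fetching module; the period passed to `start` plays the role of the pre-defined periodicity.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

/** Keeps a ready-to-use copy of remote data, refreshed in the background. */
final class Prefetcher implements AutoCloseable {
    private final Supplier<String> fetchTask;   // hypothetical: fetch + convert the remote GML
    private final AtomicReference<String> latest = new AtomicReference<>();
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();

    Prefetcher(Supplier<String> fetchTask) {
        this.fetchTask = fetchTask;
    }

    /** Runs one fetch now and publishes the result atomically. */
    void refresh() {
        latest.set(fetchTask.get());
    }

    /** Starts periodic refreshing at the given periodicity. */
    ScheduledFuture<?> start(long period, TimeUnit unit) {
        return scheduler.scheduleAtFixedRate(() -> {
            try {
                refresh();
            } catch (RuntimeException e) {
                // Keep serving the previous copy; an uncaught exception would cancel later runs.
            }
        }, 0, period, unit);
    }

    /** Serves the latest pre-fetched copy, or null if nothing has been fetched yet. */
    String current() {
        return latest.get();
    }

    @Override
    public void close() {
        scheduler.shutdownNow();
    }
}
```

Map requests call `current()` and never wait on the network, which is why response time stays nearly flat as the data grows; the trade-off is that the served copy may be up to one period old.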
17 (3) Pre-fetching vs. On-demand Fetching
Data size   Pre-fetching               On-demand
(MB)        Avg. response   StdDev     Avg. response   StdDev
0.01        19,261.90       481.57     1,808.13        140.32
0.1         19,112.30       673.69     2,635.46        313.48
0.5         19,222.48       631.35     5,001.29        238.94
1           19,427.48       305.94     8,225.73        200.27
5           20,146.00       516.50     33,419.31       394.48
10          20,165.90       546.53     64,506.78       283.24
50          22,882.52       509.98     316,906.00      623.08
100         23,990.43       603.59     643,344.00      548.65
- For 100 MB, pre-fetching is about 27 times faster than conventional on-demand fetching (23,990.43 vs. 643,344.00 average response).
- The larger the data size, the higher the performance gain; for 1 MB and below, on-demand fetching still responds faster.
18 (4) Client-based Caching
- Each client has a separate caching area allocated
- Applies working-window and locality principles to map-image rendering
- Clients are differentiated by the session-id parameter assigned to each client in the header of its queries
- Always keeps the most recently used data (the client's last request)
- Brings some overhead for maintaining a working window for each client
19 Brief Architecture
Server-side
- Create an identity card; update it at every request from the client
- FormerRequest class
  - String uuid (unique user id)
  - String bbox (bounding box of the user's last request)
  - Double density (data size falling into a unit square)
  - Vector feature_data (geometry elements of the last request)
- Register to the client table
  uuid-1 -> FormerRequest-1
  uuid-2 -> FormerRequest-2
  ...
- Set identity in the message header
Client-side
  ClientWSStub binding;
  binding = (ClientWSStub) new ServiceLocator().WMSServices(servaddress);
  String sessionID = session.getId(); // uuid-1
  String channel_name = "getMapChannel";
  // Add SessionID to the SOAP message header
  binding.setHeader(service_address, channel_name, sessionID);
  Map mymap = binding.getMap(request);
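A server-side sketch of the client table, assuming the session id arrives in the SOAP header as on the client side. `FormerRequest` mirrors the fields listed on the slide (with a `List` in place of `Vector`); bounding the table and evicting the longest-idle client through a `LinkedHashMap` in access order is an added assumption, not something the slides specify. `ClientTable` is a hypothetical name.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Per-client "identity card": what the client asked for last. */
record FormerRequest(String uuid, String bbox, double density, List<String> featureData) {}

/** Server-side client table: one FormerRequest per session id. */
final class ClientTable {
    private final Map<String, FormerRequest> table;

    ClientTable(int maxClients) {
        // accessOrder = true: iteration runs from least to most recently accessed,
        // so the eldest entry is the client that has been idle longest.
        this.table = Collections.synchronizedMap(
            new LinkedHashMap<String, FormerRequest>(16, 0.75f, true) {
                @Override
                protected boolean removeEldestEntry(Map.Entry<String, FormerRequest> eldest) {
                    return size() > maxClients;
                }
            });
    }

    /** Called at every request: replace this client's identity card. */
    void update(FormerRequest request) {
        table.put(request.uuid(), request);
    }

    /** The client's previous request, or null on its first request. */
    FormerRequest lookup(String uuid) {
        return table.get(uuid);
    }

    int size() {
        return table.size();
    }
}
```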
20 Why Client-based Caching
- Makes stateless GIS Web Services stateful
- Allows the workload to be shared as evenly as possible for the most efficient parallel processing
- Comparison with Google-like map servers
  - In large-scale applications it is impossible to cache all the data
    - Limited storage and computation capabilities
  - Google-like map servers are fast because
    - They replace computation with storage
    - They pre-make all images and cut them into tiles
    - They formalize accepted requests in terms of parameters, and responses in terms of tile compositions
  - BUT this is good only for client-server applications
    - It can't be applied to distributed, dynamic data rendering and extensible applications
    - They don't deal with feature-enriched maps enabling attribute-based querying
    - Nor with structured/annotated scientific data rendering
21 (5) Parallel Processing over Client-based Caching
Main query -> cached-data extraction -> rectangulation (rectangles Ri) -> partitioning (sub-queries ri) -> assigning separate threads -> assembling the results
[Figure: steps 1-4 for a successive request overlapping the cached data]
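The last two steps of the pipeline (assigning sub-queries to separate threads and assembling the results) can be sketched with a fixed thread pool. `ParallelQuery` and the `fetch` function are hypothetical; in the system described here each sub-query would correspond to a getFeature request for one partition.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

final class ParallelQuery {
    private ParallelQuery() {}

    /** Runs each sub-query on a pool thread and concatenates results in sub-query order. */
    static <Q, R> List<R> run(List<Q> subQueries, Function<Q, List<R>> fetch, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<List<R>>> futures = new ArrayList<>();
            for (Q q : subQueries) {
                futures.add(pool.submit(() -> fetch.apply(q)));   // one task per partition
            }
            List<R> merged = new ArrayList<>();
            for (Future<List<R>> f : futures) {
                merged.addAll(f.get());                           // wait for and assemble each part
            }
            return merged;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("sub-query failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

Creating a pool per call keeps the sketch short; a long-running federator would more likely keep one pool per feature layer, as the worker-pool slide describes.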
22 Challenge: Geo-data Characteristics
- A point is described by its location attribute
  - (x, y) coordinates
- Linestrings, polylines, polygons, etc. are defined as sets of points
- The data set falling into a queried region is formulated as a bounding box (bbox)
  - Coordinates of a rectangle (a, b, c, d)
- Geo-data is characterized as unevenly distributed and variable-sized according to its location attributes
  - Ex. human population
- Need for advanced techniques for workload sharing!
23 Attribute-based Query Decomposition
- Cached data extraction
- Rectangulation over the remaining region: R1, R2, R3, R4
- Each rectangle goes through a partitioning process
  - Blind partitioning
    - Such as for first-time queries
    - Uses a default number of partitions
  - Smart partitioning
    - Client-based caching
    - FormerRequest object
- All partitions are assigned to separate threads, and the results are merged to create the final response
[Figure: remaining region split into rectangles R1-R4; each rectangle is partitioned into 4]
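A sketch of cached-data extraction and rectangulation for axis-aligned boxes. The split used here (full-height strips left and right of the cached overlap, then strips below and above it) is one possible decomposition, assumed for illustration; the slide's figure may cut the remaining area differently. `BBox` and `Rectangulation` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

/** Axis-aligned bounding box (minx, miny, maxx, maxy). */
record BBox(double minx, double miny, double maxx, double maxy) {
    double area() {
        return (maxx - minx) * (maxy - miny);
    }

    /** Intersection with another box, or null if they do not overlap. */
    BBox intersect(BBox o) {
        double x1 = Math.max(minx, o.minx), y1 = Math.max(miny, o.miny);
        double x2 = Math.min(maxx, o.maxx), y2 = Math.min(maxy, o.maxy);
        return (x1 < x2 && y1 < y2) ? new BBox(x1, y1, x2, y2) : null;
    }
}

final class Rectangulation {
    private Rectangulation() {}

    /** Splits the part of the query not covered by the cache into at most four rectangles. */
    static List<BBox> remaining(BBox query, BBox cached) {
        List<BBox> out = new ArrayList<>();
        BBox hit = query.intersect(cached);   // the part answered from the cache
        if (hit == null) {                    // no overlap: the whole query must be fetched
            out.add(query);
            return out;
        }
        if (hit.minx() > query.minx()) out.add(new BBox(query.minx(), query.miny(), hit.minx(), query.maxy()));
        if (hit.maxx() < query.maxx()) out.add(new BBox(hit.maxx(), query.miny(), query.maxx(), query.maxy()));
        if (hit.miny() > query.miny()) out.add(new BBox(hit.minx(), query.miny(), hit.maxx(), hit.miny()));
        if (hit.maxy() < query.maxy()) out.add(new BBox(hit.minx(), hit.maxy(), hit.maxx(), query.maxy()));
        return out;
    }
}
```

For a 10 x 10 query with a 2 x 2 cached square in the middle, this yields four rectangles covering 96 square units, exactly the uncached area; each of them is then partitioned and handed to worker threads.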
24 Smart Partitioning through Client-based Caching
- Based on the locality principles
- Assumption: former and current requests have similar data density
- Cached data area
  - $CD_{size} = (maxx_c - minx_c)(maxy_c - miny_c)$
- Main-query area
  - $R_{size} = (maxx - minx)(maxy - miny)$
- Thr: pre-defined threshold value, varying from data set to data set
- Pn: the number of partitions calculated for a rectangle
[Figure: Determining the most efficient number of partitions (Pn). Cache rectangle with corners (minxc, minyc) and (maxxc, maxyc) overlapping the query rectangle with corners (minx, miny) and (maxx, maxy).]
If Pn > 2, cut the rectangle into Pn equal-sized regions.
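The rule for computing Pn is not recoverable from this transcript, so the sketch below takes Pn as an input. It computes the cached-data and main-query areas defined above and cuts a rectangle into Pn equal-width vertical strips, matching the vertical partitioning with 5 partitions shown on the next slide. `Partitioner` is a hypothetical name, and the decision of when to partition (the Pn > 2 test) is left to the caller.

```java
import java.util.ArrayList;
import java.util.List;

final class Partitioner {
    private Partitioner() {}

    /** Area of a box from its corners, as in the cached-data and main-query areas. */
    static double area(double minx, double miny, double maxx, double maxy) {
        return (maxx - minx) * (maxy - miny);
    }

    /** Cuts a rectangle into pn equal-width vertical strips, each as {minx, miny, maxx, maxy}. */
    static List<double[]> verticalStrips(double minx, double miny, double maxx, double maxy, int pn) {
        if (pn < 1) throw new IllegalArgumentException("pn must be >= 1");
        List<double[]> strips = new ArrayList<>();
        double width = (maxx - minx) / pn;
        for (int i = 0; i < pn; i++) {
            double left = minx + i * width;
            // Use maxx exactly for the last strip so rounding never leaves a gap.
            double right = (i == pn - 1) ? maxx : minx + (i + 1) * width;
            strips.add(new double[] {left, miny, right, maxy});
        }
        return strips;
    }
}
```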
25 Assigning Partitions to Workers
- Partitions are assigned to the worker nodes in round-robin fashion
- We keep a pool of worker nodes for each feature layer to which parallel processing is applied
- According to the algorithm
  - PN: number of partitions
  - WN: number of worker nodes in the pool
  - share: the number of partitions each worker is supposed to get, $share = \lfloor PN / WN \rfloor$
  - rmg: the number of partitions still waiting after the even split, $rmg = PN \bmod WN$
- Assignments
  - The first rmg worker nodes are assigned share + 1 partitions
  - The others (WN - rmg) are assigned share partitions
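A sketch of the share/remainder rule, assuming share is PN divided by WN rounded down and rmg is the remainder. Dealing partitions round-robin (partition p to worker p mod WN) produces exactly these counts. `WorkerAssignment` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;

final class WorkerAssignment {
    private WorkerAssignment() {}

    /** Partitions per worker: the first rmg workers get share + 1, the rest get share. */
    static int[] counts(int pn, int wn) {
        if (wn <= 0 || pn < 0) throw new IllegalArgumentException("need wn > 0 and pn >= 0");
        int share = pn / wn;   // partitions every worker gets
        int rmg = pn % wn;     // partitions still waiting after the even split
        int[] counts = new int[wn];
        for (int w = 0; w < wn; w++) {
            counts[w] = (w < rmg) ? share + 1 : share;
        }
        return counts;
    }

    /** Round-robin dealing: partition p goes to worker p mod wn. */
    static List<List<Integer>> assign(int pn, int wn) {
        if (wn <= 0 || pn < 0) throw new IllegalArgumentException("need wn > 0 and pn >= 0");
        List<List<Integer>> perWorker = new ArrayList<>();
        for (int w = 0; w < wn; w++) perWorker.add(new ArrayList<>());
        for (int p = 0; p < pn; p++) perWorker.get(p % wn).add(p);
        return perWorker;
    }
}
```

With PN = 10 and WN = 4, share = 2 and rmg = 2, so the workers receive 3, 3, 2 and 2 partitions.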
26 Vertical partitioning with 5 partitions
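27 Data Access Timings (No Cached Data)
- $T_{data\,access} = T_{query\,conversion} + T_{GML\,conversion} + T_{streaming} + T_{building\,GML}$
  - Query conversion: getFeature to SQL
  - Streaming: the data from the WFS to the federator
  - Building GML: at the federator
[Figure: DB -> WFS -> Federator]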
28 Overhead and Response Timings: Example Case of 10-threaded Parallel Processing
- Performance does not increase in the same ratio as the number of threads
  - Overheads: query partitioning, sub-query creation, map creation and map transfer
- There is no performance gain below a threshold data size
[Figure: Browser with event-based dynamic map tools, Federator, WFS instances and DB]
29 Partial Usage of Cached Data (Ex. case: 1/2 cached)
Comparison of the response times
Data   Half cache, 10 threads   No cache, 10 threads    No cache, single thread
(MB)   Avg. time    StdDev      Avg. time    StdDev     Avg. time    StdDev
0.01   3,095.19     204.22      2,329.50     131.46     1,808.13     140.32
0.1    3,576.73     283.8       2,760.00     104.35     2,635.46     313.48
0.5    3,721.77     210.41      3,460.40     120.24     5,001.29     238.94
1      4,311.73     192.45      4,640.53     106.42     8,225.73     200.27
5      11,294.58    313.59      16,725.4     201.62     33,419.31    394.48
10     18,371.72    296.19      23,118.4     941.83     64,506.78    283.24
- There is no performance gain for small data sizes due to the overheads
- For 10 MB, the proposed system is about 3.5 times faster than the ordinary on-demand, single-threaded system (18,371.72 vs. 64,506.78)
- The performance gain increases
  - As the data size increases
  - As the overlapping cached region increases
    - 100% overlap -> looks like the pre-fetching case
[Figure: Federator (Fedrtr) with CT, WFS instances and DB]
30 Conclusions
- Streaming data transfer techniques allow data rendering even on partially returned data
- Pull parsing gives the best results for rendering XML-encoded GML data
  - By eliminating the requirement of data validation
- The federator's natural characteristics allowed us to develop advanced caching and parallel-processing designs
  - Pre-fetching and parallel-processing techniques are mutually exclusive
  - The best performance is achieved through pre-fetching, but it can cause data inconsistency
    - The triggering periodicity must be defined carefully
  - The success of parallel-processing techniques depends on how well we share the workload among worker nodes
    - Unevenly distributed and variable-sized geo-data characteristics
- We saw that
  - Application of working-window and locality principles by means of client-based caching, and
  - Parallel processing through attribute-based query decomposition
  - Helped us increase system responsiveness to a great extent
31 Conclusions: General Framework
- Heterogeneous data sources are queried as a single resource
  - Heterogeneous: autonomous local resources control the definition of data
  - Single resource: removes the burden of individually accessing each data source with ad-hoc query languages
- WFS-based mediation
  - Data and query conversions
- Easy extension with new data and service resources
  - Open Geographic and Web Service standards
- No physical data integration
  - Data always stays at the local source
  - Easy maintenance of data and a high degree of autonomy
- Seamless interaction with the system through integrated data views as multi-layered map images
32 Contributions
- A federated service-oriented Geographic Information Systems framework
  - Integrating Web Services with Open Geographic Standards to support interoperability at both the data and service levels
  - Production of knowledge from distributed data sources in multi-layered map images
  - Hierarchical data definitions through capability metadata federation
  - Enabling unified, interactive data access/query and display
- Investigated performance-efficient designs and did detailed benchmarking
  - Streaming GIS Web Services
  - Federator-oriented high-performance design techniques
    - Pre-fetching
    - Client-based caching: working-window and locality principles
    - Parallel processing through attribute-based query decomposition
33 Acknowledgement
- The work described in this presentation is part of the QuakeSim project, which is supported by the Advanced Information Systems Technology Program of NASA's Earth-Sun System Technology Office.
- Galip Aydin: Web Feature Server (WFS)
34 Thanks!
35 BACK-UP SLIDES
36 Capability-based Federation of the Standard Web Service Components
- Built over the proposed standard Web Service components and common data models
- Unified data access/query/display from a single access point
- Providing application-based hierarchical data definitions
  - Layer-based data and service (WMS and WFS) compositions
- Federation is done by aggregating GIS Web Services' capabilities metadata
- Capability is basically metadata about data and service
  - The server's information content and acceptable request parameter values
- Application-based hierarchical data
  - Application: Pattern Informatics
    - Layer-1: State-boundary over Satellite
      - Data-1: State-boundary (WFS-1)
      - Data-2: Satellite-Image (WMS-2)
    - Layer-2: Google map (WMS-1)
    - Layer-3: Earthquake-Seismic
      - Data-1: Earthquake-Seismic (WFS-3)
[Figure: Sample layers for PI, panels a, b, c and d]
- NASA satellite layer
- Earthquake-seismic layer
- Google Map layer
- State-boundaries layer
- Events: move, zooming in/out, panning (drag-drop), rectangular region, distance calculation, attribute querying
37 Hierarchical Data: Integrated Data View
[Figure: 1 Google map layer, 2 States boundary lines layer, 3 Seismic data layer]
Event-based interactive tools: query and data analysis over integrated data views
39
- Integrated views
  - Event-based querying through integrated views
- WFS-based mediators
  - XML-based query language
- Federation-specific related works (might not be active)
  - MIX: Mediation of Information using XML
  - SRB/MCAT (SDSC)
  - TSIMMIS (Stanford Univ.)
- XML-based standard queries for the standard services
  - Capability gives the list of data provided, the attributes they can be queried by, and the constraints on queries, so that valid requests such as getMap and getFeature can be created
- We do syntactic and structural integration
40 Hierarchical Data / Integrated Data View for the IEISS Geo-science Application
- Application-based hierarchical data
  - Application: IEISS
    - Layer-1: Gas-pipeline over Satellite
      - Data-1: Gas-pipeline (WFS-1)
      - Data-2: Satellite-Image (WMS-2)
    - Layer-2: Google map (WMS-1)
    - Layer-3: Electric-power
      - Data-1: Electric-power (WFS-3)
41 GetCapabilities Schema and Sample Request Instance
42 GetMap Schema and Sample Request Instance
44 Event-based Interactive Map Tools
<event_controller>
  <event name="init" class="Path.InitListener" next="map.jsp"/>
  <event name="REFRESH" class="Path.InitListener" next="map.jsp"/>
  <event name="ZOOMIN" class="Path.InitListener" next="map.jsp"/>
  <event name="ZOOMOUT" class="Path.InitListener" next="map.jsp"/>
  <event name="RECENTER" class="Path.InitListener" next="map.jsp"/>
  <event name="RESET" class="Path.InitListener" next="map.jsp"/>
  <event name="PAN" class="Path.InitListener" next="map.jsp"/>
  <event name="INFO" class="Path.InitListener" next="map.jsp"/>
</event_controller>
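A sketch of how such an event table might be loaded, mapping each event name to its listener class and next page. The `EventMapping` record and `EventControllerConfig` class are hypothetical; the framework's actual loading code is not shown in the slides. DOM is acceptable here because the file is tiny, unlike the large GML documents discussed earlier.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/** One <event> entry: the listener class to invoke and the page to show next. */
record EventMapping(String listenerClass, String nextPage) {}

final class EventControllerConfig {
    private EventControllerConfig() {}

    /** Reads <event name=".." class=".." next=".."/> entries into a name-to-mapping table. */
    static Map<String, EventMapping> parse(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList events = doc.getElementsByTagName("event");
            Map<String, EventMapping> table = new LinkedHashMap<>();
            for (int i = 0; i < events.getLength(); i++) {
                Element e = (Element) events.item(i);
                // trim() tolerates stray spaces around the class name.
                table.put(e.getAttribute("name"),
                          new EventMapping(e.getAttribute("class").trim(), e.getAttribute("next")));
            }
            return table;
        } catch (Exception ex) {
            throw new IllegalArgumentException("invalid event controller XML", ex);
        }
    }
}
```

A dispatcher would look up the incoming event name (for example ZOOMIN), invoke the listener, and forward to the `next` page.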
45 Sample GML document
46 Sample GetFeature Request Instance
47 A Template simple capabilities file for a WMS
48 Generalizing the Problem Domain
- Query heterogeneous data sources as a single resource
  - Heterogeneous: local resources control the definition of the data
  - Single resource: removes the burden of individually accessing each data source
- Easy extension with new data and service resources
- No real integration of data
  - Data always stays at the local source
  - Easy maintenance of data
- Seamless interaction with the system
  - Collaborative decision making
[Figure: Client/user query -> integrated view -> federation services -> mediators -> DB and files. Data in files, HTML, XML/relational databases, spatial sources/sensors.]
49 Generalization of the Proposed Architecture
- The GIS-style information model can be redefined in any application area, such as Chemistry and Astronomy
  - Application Specific Information Systems (ASIS)
- We need to define Application Specific
  - Language (ASL) -> GML: expressing domain-specific features and the semantics of data
  - Feature Service (ASFS) -> WFS: serving data in the common language (ASL)
  - Visualization Services (ASVS) -> WMS: visualizing information and providing a way of navigating ASFS-compatible/mediated data resources
  - Capabilities metadata for ASVS and ASFS
- We need to define Application Specific
  - Federator: federating the capabilities of distributed ASVS and ASFS to create an application-based hierarchy of distributed data and service resources
  - Mediators: query and data format conversions
- Data sources maintain their internal structure
- Large degree of autonomy
- No actual physical data integration
[Figure: Unified data query/access/display through a Federator (ASVS) performing capability federation and ASL rendering, connected through standard service APIs to mediators in front of the data sources]
50 Contributions (Systems Software)
- Developing a Web Map Server (WMS) in Open Geographic Standards
  - Extended with Web Service standards and
  - Streaming map creation capabilities
- Developing the GIS Federator
  - Provides application-specific, layer-structured hierarchical data as a composition of distributed standard GIS Web Service components
  - Enables uniform data access and query
- Interactive map tools for data display, query and analysis
  - Browser- and event-based
  - Extended with AJAX (Asynchronous JavaScript and XML)