High-Performance, Federated and Service-Oriented Geographic Information Systems - PowerPoint PPT Presentation

1
High-Performance, Federated and Service-Oriented
Geographic Information Systems
  • Ahmet Sayar
  • (asayar_at_cs.indiana.edu)
  • Advisor Prof. Geoffrey C. Fox

2
Outline
  • Motivations
  • Research Issues
  • Architecture: Federated Service-Oriented
    Geographic Information System
  • Performance enhancing designs - measurements and
    analysis
  • Conclusions

3
Geographic Information Systems (GIS)
  • GIS is a system for creating, storing, sharing,
    analyzing, manipulating and displaying geo-data
    and associated attributes.
  • Inherently requires federation (see the figure)
  • Autonomy for scalability, flexibility and
    extensibility
  • Distributed data access for geo-data resources
    (databases, digital libraries etc.)
  • Utilizing remote analysis, simulation or
    visualization tools.
  • Open Standards: OGC and ISO/TC-211

4
Motivations
  • Requirements for interoperable Service-oriented
    Geographic Information Systems
  • Necessity for sharing and integrating
    heterogeneous data and computation resources to
    produce knowledge.
  • Uniform data access/query, display and analysis
    from a single access point
  • Responsive and interactive information systems
  • GIS applications require quick response, e.g.
    emergency early warning systems, homeland
    security and natural disasters.

5
Research Issues
  • Interoperability
  • Defining a component-based Service-oriented GIS
    data Grid framework
  • Adoption of Open Geographic Standards (data
    model and services)
  • Applying Web Service principles to GIS data
    services
  • Integrating Web Service and Open Geographic
    Standards
  • Federation
  • Capability-based federation of GIS Web Service
    components
  • Unified data access/query, display from a single
    access point through integrated data-views
  • Addressing high-performance support for
    responsiveness
  • Streaming GIS Web Services and Pre-fetching
    framework
  • Client-based caching
  • Parallel processing through attribute based query
    decomposition

6
Web Service components and data-flow
Service-oriented GIS
  • WMS are data rendering services providing
    human-comprehensible data (binary map images)
  • WFS are data services providing data in a common
    data model, GML (Geography Markup Language),
    and behaving as mediator and annotation services.
  • WMS and WFS each have their own type of capability
    metadata defined by Open Geographic specs.
  • Inter-service communication is done through the
    getCapability service interface.
  • UDDI based registry services.
  • Components are Web Services and all control goes
    through SOAP messages
  • XML-based query language (standard schema)
  • Built over
  • Web Services standards (WS-I) and
  • Open Geographic Standards (OGC and ISO/TC-211)
  • Consists of two types of online services
  • Web Map Services (WMS) and Web Feature Services
    (WFS)
  • And two types of data
  • Binary data: map images (provided by WMS)
  • Structured data: GML content (core data) and
    presentation (attribute and geometry elements)
    (provided by WFS)

[Figure: Relation of the components and data flow in the GIS. WMS (GML rendering) exposes getCapability, getMap and getFeatureInfo and returns binary data; WFS (mediator) exposes getCapability, getFeature and DescribeFeatureType and returns GML; both are described by WSDL.]
7
Capability-based Federation of Standard GIS Web
Service Components
  • Built over the proposed standard Web Service
    components and common data models
  • Federation is done by aggregating GIS Web
    Services capabilities metadata
  • Inspired by OGC's cascading WMS
  • Unified data access/query/display from a single
    access point
  • Providing application-based hierarchical data
    definitions
  • layer based data and service (WMS and WFS)
    compositions
  • Capability is basically metadata about data and
    services: a server's information content and
    acceptable request parameter values
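
As a rough illustration of federating by aggregating capabilities metadata, the sketch below merges the data names advertised by several services into one lookup table. The `Capabilities` and `CapabilityFederator` types, the "first provider wins" rule and the example URLs are assumptions made for this sketch; they are not the OGC capabilities schema or the federator described in the slides.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical, heavily simplified stand-in for one service's capabilities metadata. */
final class Capabilities {
    final String serviceUrl;      // where the WMS or WFS can be reached
    final String serviceType;     // "WMS" or "WFS"
    final List<String> dataNames; // layer or feature-type names the service offers

    Capabilities(String serviceUrl, String serviceType, List<String> dataNames) {
        this.serviceUrl = serviceUrl;
        this.serviceType = serviceType;
        this.dataNames = dataNames;
    }
}

final class CapabilityFederator {
    /** Maps each advertised data name to "TYPE@url", the service that provides it. */
    static Map<String, String> federate(List<Capabilities> services) {
        Map<String, String> federated = new LinkedHashMap<>();
        for (Capabilities c : services) {
            for (String name : c.dataNames) {
                // Assumption: the first provider wins; a real federator needs a conflict policy.
                federated.putIfAbsent(name, c.serviceType + "@" + c.serviceUrl);
            }
        }
        return federated;
    }

    public static void main(String[] args) {
        List<Capabilities> services = new ArrayList<>();
        services.add(new Capabilities("http://example.org/wfs1", "WFS", List.of("State-boundary")));
        services.add(new Capabilities("http://example.org/wms2", "WMS", List.of("Satellite-Image")));
        System.out.println(federate(services));
    }
}
```

A client would then query the single federated table instead of asking each service for its capabilities separately.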

8
Why Capability metadata
  • Web Services provide key low level capability but
    do not define an information or data architecture
  • These are left to domain specific capabilities
    metadata and data description language (GML).
  • Machine and human readable information
  • Enables easy integration and federation
  • Enables developing application-based, standard,
    interactive, re-usable tools for data query,
    display and analysis
  • Seamless data access/query

9
High-performance Support for Responsive GIS
  • Designs, measurements and analysis

10
Performance Investigation
  • Interoperability requirements bring up some
    compliance costs
  • Common data model (GML)
  • Web Services (SOAP protocol for communication)
  • Approaches enhancing the GIS system's
    responsiveness
  • Data transfer and rendering
  • Streaming GIS Web Services (1)
  • Structured/annotated GML data rendering (2)
  • Federator-oriented approaches
  • Pre-fetching (3)
  • Client-based caching (4)
  • Query decomposition and parallel processing (5)
  • Testing with large scale Geo-science applications
  • Earthquake forecasting (PI),
  • Virtual California (VC)
  • Aim: Turning compliance requirements into
    competitiveness

11
Conventional OGC-GIS systems: Baseline Performance
Test
  • The naïve approach is characterized by
  • Stateless services
  • On-demand data access
  • Single-threaded processing and no caching
  • Systems developed with Open Geographic Standards
    have a high degree of interoperability but poor
    performance results

Test Setup
12
(1) Streaming GIS Web-Services
  • The concern is the transfer of large XML-structured
    data
  • XML representations of data tend to be
    significantly larger than binary representations
  • Larger data sizes consume more network
    bandwidth
  • We still need to use XML for interoperability
    reasons
  • In the initial development of the proposed
    Service-oriented GIS we used GIS Web Services and
    SOAP over HTTP as the transfer protocol.
  • However, this limited performance.
  • We investigated Streaming Data Transfer
  • Topic-based publish-subscribe messaging systems
    for exchanging SOAP messages and data payloads.

13
(1) Streaming GIS Web-Services (Cont)
  • Lines 1, 2 and 3 show the classic publish-find-bind
    triangle of Web Services
  • SOAP is used for negotiation (line 3): a standard
    getFeature request
  • Publisher information is returned as a
    (topic, IP, port) triple.
  • Publisher streams, subscriber receives.
  • The performance gain is about 40% on average
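
The negotiate-then-stream pattern on this slide can be sketched with an in-memory stand-in for the messaging system. Everything below is illustrative: `TopicBroker`, `PublisherInfo`, the topic name, the address and the port are hypothetical, and the `negotiate` method only marks the place where a real system would exchange a SOAP getFeature request.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

/** The (topic, IP, port) triple returned by the negotiation step. */
final class PublisherInfo {
    final String topic;
    final String ip;
    final int port;

    PublisherInfo(String topic, String ip, int port) {
        this.topic = topic;
        this.ip = ip;
        this.port = port;
    }
}

/** In-memory stand-in for a topic-based publish-subscribe broker. */
final class TopicBroker {
    private final Map<String, List<Consumer<String>>> subscribers = new HashMap<>();

    void subscribe(String topic, Consumer<String> handler) {
        subscribers.computeIfAbsent(topic, t -> new ArrayList<>()).add(handler);
    }

    void publish(String topic, String chunk) {
        for (Consumer<String> handler : subscribers.getOrDefault(topic, List.of())) {
            handler.accept(chunk);
        }
    }
}

final class StreamingSketch {
    /** Hypothetical negotiation; values are placeholders, not real endpoints. */
    static PublisherInfo negotiate(String getFeatureRequest) {
        return new PublisherInfo("getFeature-result", "127.0.0.1", 5045);
    }

    /** Streams the chunks over the negotiated topic and returns what the subscriber saw. */
    static List<String> run(List<String> gmlChunks) {
        TopicBroker broker = new TopicBroker();
        PublisherInfo info = negotiate("<GetFeature/>");
        List<String> received = new ArrayList<>();
        // The subscriber handles each chunk as it arrives, before the stream is complete.
        broker.subscribe(info.topic, received::add);
        for (String chunk : gmlChunks) {
            broker.publish(info.topic, chunk);
        }
        return received;
    }
}
```

The point of the design is that only the small negotiation message travels as a SOAP request and response, while the bulk GML payload flows over the topic.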

14
(2) GML Data Processing
  • Processing XML data: parsing and rendering to
    create map images.
  • Two well-known approaches are document models
    (DOM) and push models (SAX).
  • We use the pull approach for XML processing
  • Parses only what is asked for
  • No document validation (a major performance
    gain)
  • Doesn't build a complete object model in memory
    (unlike DOM)
  • Contents are returned directly to the application
    from calls to the parser (unlike SAX)

Total rendering timings (1 GB allocated VM)
Data Size (KB)   DOM (dom4j)    Pull (Xpp)
1                469.22         15.59
10               494.06         72.81
100              625.54         183.06
1,000            760.20         270.47
5,000            1,422.91       671.74
10,000           3,557.44       1,025.67
100,000          OUT OF MEM     7,059.72
150,000          OUT OF MEM     11,047.89
200,000          OUT OF MEM     14,949.12
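
To make the pull model concrete, here is a minimal sketch that extracts only coordinate text from a GML fragment. It uses the JDK's StAX API (`javax.xml.stream`) as a stand-in pull parser rather than the Xpp parser measured above, and the element name `coordinates` and the sample fragment are assumptions chosen for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

final class GmlPullSketch {
    /** Returns the text of every "coordinates" element and ignores everything else. */
    static List<String> coordinates(String gml) {
        List<String> result = new ArrayList<>();
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(gml));
            while (reader.hasNext()) {
                // The application asks for the next event; no callbacks, no in-memory tree.
                if (reader.next() == XMLStreamConstants.START_ELEMENT
                        && "coordinates".equals(reader.getLocalName())) {
                    result.add(reader.getElementText().trim());
                }
            }
            reader.close();
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("not well-formed XML", e);
        }
        return result;
    }

    public static void main(String[] args) {
        String gml = "<gml:featureMember xmlns:gml=\"http://www.opengis.net/gml\">"
                + "<gml:Point><gml:coordinates>-119.4,36.7</gml:coordinates></gml:Point>"
                + "</gml:featureMember>";
        System.out.println(coordinates(gml));
    }
}
```

Because the loop keeps only the strings it asked for, memory use grows with the extracted coordinates rather than with the size of the whole document.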
15
Federator-Oriented Performance Enhancing Designs
16
(3) Pre-fetching
  • Getting the GML data before it is needed
  • The extension for the Pre-fetching Module is shown
    in the grey region
  • Overcomes the network bandwidth problem and
    repeated data conversions.
  • This technique is good for infrequently changing
    archived data
  • Otherwise, it might cause consistency problems
  • Red curve: map rendering over the pre-fetched
    data (ready-to-use GML data)
  • Black curve: map rendering through on-demand
    fetching

PR runs a pre-defined task at a pre-defined
periodicity
17
(3) Pre-fetching vs. On-demand Fetching
Data Size (MB)   Pre-fetching                On-demand
                 Avg. Response   StdDev      Avg. Response   StdDev
0.01             19,261.90       481.57      1,808.13        140.32
0.1              19,112.30       673.69      2,635.46        313.48
0.5              19,222.48       631.35      5,001.29        238.94
1                19,427.48       305.94      8,225.73        200.27
5                20,146.00       516.50      33,419.31       394.48
10               20,165.90       546.53      64,506.78       283.24
50               22,882.52       509.98      316,906.00      623.08
100              23,990.43       603.59      643,344.00      548.65
  • For 100 MB, pre-fetching is roughly 27 times faster
    (23,990.43 vs. 643,344.00) than conventional
    on-demand fetching.
  • The larger the data size the higher the
    performance gains.
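
The ratios behind these two bullets can be recomputed directly from the tabulated averages. The sketch below only copies the numbers from the table; the class and method names are made up for the example.

```java
final class PrefetchSpeedup {
    // Data sizes (MB) and average response times copied from the table above.
    static final double[] SIZE_MB = {0.01, 0.1, 0.5, 1, 5, 10, 50, 100};
    static final double[] PREFETCH = {
        19261.90, 19112.30, 19222.48, 19427.48, 20146.00, 20165.90, 22882.52, 23990.43};
    static final double[] ON_DEMAND = {
        1808.13, 2635.46, 5001.29, 8225.73, 33419.31, 64506.78, 316906.00, 643344.00};

    /** A ratio above 1 means pre-fetching responded faster than on-demand fetching. */
    static double speedup(int row) {
        return ON_DEMAND[row] / PREFETCH[row];
    }

    /** Smallest tabulated data size at which pre-fetching wins, or -1 if it never does. */
    static double breakEvenSizeMb() {
        for (int row = 0; row < SIZE_MB.length; row++) {
            if (speedup(row) > 1.0) {
                return SIZE_MB[row];
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        for (int row = 0; row < SIZE_MB.length; row++) {
            System.out.printf("%6.2f MB  speedup %.2f%n", SIZE_MB[row], speedup(row));
        }
        System.out.println("pre-fetching first wins at " + breakEvenSizeMb() + " MB");
    }
}
```

In this table the pre-fetching column is slower for the rows up to 1 MB and faster from 5 MB onward, which is consistent with the gain growing with data size.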

18
(4) Client-based Caching
  • Each client has a separate caching area allocated.
  • Application of working-window and locality
    principles to map image rendering
  • Clients are differentiated according to the
    client-assigned session-id parameter in the
    header of queries.
  • Always keep the most recently used data (the
    client's last request)
  • Adds some overhead to maintain a working window
    for each client.

19
Brief Architecture
Server-side
Create identity card. Update at every request
from the client.
  • FormerRequest Class
      String uuid;          /* unique-user-id */
      String bbox;          /* bounding box of the user's last request */
      Double density;       /* data size per unit square */
      Vector feature_data;  /* geometry elements of the last request */

Register to client table
uuid-1 FormerRequest-1
uuid-2 FormerRequest-2
..
Set identity to message header
Client-side
ClientWSStub binding;
binding = (ClientWSStub) new ServiceLocator().WMSServices(servaddress);
String sessionID = session.getid(); // uuid-1
String channel_name = getMapChannel;
/* Add SessionID to the SOAP message's header */
binding.setHeader(service_address, channel_name, sessionID);
Map mymap = binding.getMap(request);
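
A minimal sketch of the server-side client table follows, assuming a map keyed by the session id carried in the message header. `CachedRequest` mirrors the FormerRequest fields listed on this slide but omits the geometry vector for brevity; `ClientTable` and its method names are hypothetical.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Simplified counterpart of the slide's FormerRequest (geometry data omitted). */
final class CachedRequest {
    final String uuid;     // unique user id taken from the message header
    final String bbox;     // bounding box of the user's last request
    final double density;  // data size per unit square in that request

    CachedRequest(String uuid, String bbox, double density) {
        this.uuid = uuid;
        this.bbox = bbox;
        this.density = density;
    }
}

/** One entry per client; each new request replaces the previous entry. */
final class ClientTable {
    private final Map<String, CachedRequest> table = new ConcurrentHashMap<>();

    /** Creates or refreshes the client's "identity card". */
    void update(String uuid, String bbox, double density) {
        table.put(uuid, new CachedRequest(uuid, bbox, density));
    }

    /** Returns the client's last request, or null for a first-time client. */
    CachedRequest lookup(String uuid) {
        return table.get(uuid);
    }
}
```

Keeping only the last request per client bounds the cache to one working window per session, at the cost of one table update on every request.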
20
Why Client-based Caching
  • Makes stateless GIS Web Services stateful
  • Allows sharing the workload as equally as possible
    for the most efficient parallel processing.
  • Comparing with Google-like Map Servers
  • In large scale applications it is impossible to
    cache the whole data
  • Limited storage and computation capabilities
  • Google-like map servers are fast because
  • They replace computation with storage.
  • They pre-make all images and cut them up into tiles
  • They formalize the accepted requests in terms of
    parameters, and responses in terms of the tile
    compositions.
  • BUT, this is good only for client-server based
    applications
  • It can't be applied to distributed dynamic data
    rendering and extensible applications.
  • They don't deal with feature-enriched maps
    enabling attribute-based querying,
  • or with structured/annotated scientific data
    rendering.

21
(5) Parallel Processing over Client-based Caching
Main query -> cached-data extraction ->
rectangulation (rectangles Ri) -> partitioning
(sub-queries ri) -> assigning separate threads
-> assembling the results
[Figure: a successive request overlapping the cached data, with steps 1 to 4 marked]
22
Challenge: Geo-Data Characteristics
  • Point data is described by a location attribute:
    (x, y) coordinates.
  • Linestrings, polylines, polygons etc. are defined
    as sets of points.
  • The data sets falling into a queried region are
    selected with a bounding box (bbox): the
    coordinates of a rectangle (a, b, c, d)
  • Geo-data is unevenly distributed and
    variable-sized according to its location
    attributes.
  • Ex. Human population
  • Advanced techniques are needed for workload
    sharing!

23
Attribute-based Query Decomposition
  • Cached data extraction
  • Rectangulation over the remaining area: R1, R2, R3,
    R4
  • Each rectangle goes through a partitioning process.
  • Blind partitioning
  • Used for cases such as first-time queries
  • Uses a default partitioning number
  • Smart partitioning
  • Uses client-based caching
  • and the FormerRequest object
  • All partitions are assigned to separate threads
    and results are merged to create the final response

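[Figure: rectangulation of the uncached part of the query into R1 to R4 for several cache/query overlaps, and an example rectangle partitioned into 4]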
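
One way to carry out the rectangulation step is to subtract the cached bounding box from the query bounding box, which leaves at most four rectangles. The sketch below is an assumption about how that could be done; the left/right/bottom/top ordering is chosen for the example and need not match the slide's R1 to R4 numbering.

```java
import java.util.ArrayList;
import java.util.List;

/** Axis-aligned bounding box (minx, miny, maxx, maxy). */
final class BBox {
    final double minx, miny, maxx, maxy;

    BBox(double minx, double miny, double maxx, double maxy) {
        this.minx = minx;
        this.miny = miny;
        this.maxx = maxx;
        this.maxy = maxy;
    }

    double area() {
        return (maxx - minx) * (maxy - miny);
    }
}

final class Rectangulation {
    /** Returns the parts of query q that are not covered by cached box c (0 to 4 boxes). */
    static List<BBox> uncovered(BBox q, BBox c) {
        // Intersection of the query with the cache.
        double ix0 = Math.max(q.minx, c.minx), iy0 = Math.max(q.miny, c.miny);
        double ix1 = Math.min(q.maxx, c.maxx), iy1 = Math.min(q.maxy, c.maxy);
        List<BBox> out = new ArrayList<>();
        if (ix0 >= ix1 || iy0 >= iy1) { // no overlap: the whole query must be fetched
            out.add(q);
            return out;
        }
        if (ix0 > q.minx) out.add(new BBox(q.minx, q.miny, ix0, q.maxy)); // left strip
        if (ix1 < q.maxx) out.add(new BBox(ix1, q.miny, q.maxx, q.maxy)); // right strip
        if (iy0 > q.miny) out.add(new BBox(ix0, q.miny, ix1, iy0));       // bottom, between strips
        if (iy1 < q.maxy) out.add(new BBox(ix0, iy1, ix1, q.maxy));       // top, between strips
        return out;
    }
}
```

The returned rectangles do not overlap each other or the cache, so each one can be turned into an independent sub-query.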
24
Smart Partitioning through Client-based Caching
  • Based on the locality principle.
  • Assumption: former and current requests have
    similar data density
  • Cached data area
  • CD_size_br2 = (maxxc - minxc) * (maxyc -
    minyc)
  • Main-query area
  • R_size_br2 = (maxx - minx) * (maxy - miny)
  • Thr: pre-defined threshold value, varying from
    data to data.
  • Pn: the number of partitions calculated for a
    rectangle

[Figure: Determining the most efficient number of partitions (Pn). The cache rectangle spans (minxc, minyc) to (maxxc, maxyc); the query rectangle spans (minx, miny) to (maxx, maxy).]
If Pn > 2, cut the rectangle into Pn
equal-sized regions.
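
The transcript does not preserve the formula used for Pn, so the following is only one plausible reading of the stated assumption that former and current requests have similar density: estimate the data size in a rectangle as cached density times rectangle area, then divide by Thr. The rule, the rounding and all names below are assumptions for illustration, not the author's algorithm.

```java
final class SmartPartitionSketch {
    /**
     * Assumed rule (not taken from the slides):
     *   density  = cachedDataSize / cachedArea
     *   expected = density * rectangleArea
     *   Pn       = ceil(expected / thr), at least 1
     */
    static int partitions(double cachedDataSize, double cachedArea,
                          double rectangleArea, double thr) {
        double density = cachedDataSize / cachedArea;
        double expected = density * rectangleArea;
        return Math.max(1, (int) Math.ceil(expected / thr));
    }

    /** Cuts [minx, maxx] into pn equal-width vertical strips; returns the pn + 1 cut positions. */
    static double[] verticalCuts(double minx, double maxx, int pn) {
        double[] cuts = new double[pn + 1];
        double width = (maxx - minx) / pn;
        for (int i = 0; i < pn; i++) {
            cuts[i] = minx + i * width;
        }
        cuts[pn] = maxx; // pin the last cut to avoid floating-point drift
        return cuts;
    }
}
```

For example, with a cached data size of 100 over an area of 50 (density 2), a rectangle of area 200 and Thr of 150, the expected size is 400 and this rule yields 3 partitions.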
25
Assigning Partitions to Workers
  • Partitions are assigned to the worker nodes in
    round-robin fashion.
  • We keep a pool of worker nodes for each feature
    layer to which parallel processing is applied.
  • According to the algorithm
  • PN: number of partitions
  • WN: number of worker nodes in the pool
  • share: the number of partitions each worker is
    supposed to get
  • rmg: the number of partitions still waiting
    after every worker has received its share
  • Assignments
  • The first rmg worker nodes are assigned share+1
    partitions
  • The others (WN-rmg) are assigned share
    partitions
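
The share and remainder rule above can be written out in a few lines. This sketch assumes share is the integer quotient PN / WN and rmg is the remainder PN mod WN, which is the reading that makes the two assignment bullets add up to PN; the class and method names are illustrative.

```java
final class PartitionAssigner {
    /** Returns how many of the pn partitions each of the wn workers receives. */
    static int[] shares(int pn, int wn) {
        int share = pn / wn; // partitions every worker gets
        int rmg = pn % wn;   // partitions left over after the even split
        int[] counts = new int[wn];
        for (int w = 0; w < wn; w++) {
            // The first rmg workers take one extra partition each.
            counts[w] = (w < rmg) ? share + 1 : share;
        }
        return counts;
    }

    /** Round-robin placement: partition p goes to worker p mod wn. */
    static int workerFor(int partition, int wn) {
        return partition % wn;
    }
}
```

With 5 partitions and 3 workers, share is 1 and rmg is 2, so the workers receive 2, 2 and 1 partitions, which is also what round-robin placement produces.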

26
Vertical partitioning in case of having 5
partitions
27
Data Access Timings-No Cached Data-
  • T(data access) = T(query conversion: getFeature to
    SQL) + T(GML conversion) + T(streaming the data
    from WFS to federator) + T(building GML at
    federator)

[Figure: Federator, WFS, DB]
28
Overhead and Response Timings (ex. case:
10-threaded parallel processing)
  • Performance does not increase in proportion
    to the number of threads
  • Overheads: query partitioning, sub-query
    creation, map creation and map transfer.
  • There is no performance gain below a
    threshold data size.

[Figure: Browser with event-based dynamic map tools, Federator, two WFS instances, DB]
29
Partial Usage of Cached Data (Ex. case: 1/2 cached)
Comparison of the response times
Data (MB)   Half cache, 10 threads   No cache, 10 threads    No cache, single thread
            Avg. Time   StdDev       Avg. Time   StdDev      Avg. Time   StdDev
0.01        3,095.19    204.22       2,329.50    131.46      1,808.13    140.32
0.1         3,576.73    283.8        2,760.00    104.35      2,635.46    313.48
0.5         3,721.77    210.41       3,460.40    120.24      5,001.29    238.94
1           4,311.73    192.45       4,640.53    106.42      8,225.73    200.27
5           11,294.58   313.59       16,725.4    201.62      33,419.31   394.48
10          18,371.72   296.19       23,118.4    941.83      64,506.78   283.24
  • There is no performance gain for small data
    sizes, due to the overheads.
  • For 10 MB, the proposed system is about 3.5 times
    faster (18,371.72 vs. 64,506.78) than the ordinary
    on-demand single-threaded system.
  • The performance gain increases
  • as the data size increases.
  • as the overlapped cached region increases
  • 100% overlap -> looks like the pre-fetching case

[Figure: Federator (Fedrtr) with CT, three WFS instances, DB]
30
Conclusions
  • Streaming data transfer techniques allow data
    rendering even on partially returned data.
  • Pull parsing gives the best results for rendering
    XML-encoded GML data, eliminating the
    requirement of data validation.
  • The federator's natural characteristics allowed us
    to develop advanced caching and parallel
    processing designs.
  • Pre-fetching and parallel-processing techniques
    are mutually exclusive.
  • Best performance outcomes are achieved through
    pre-fetching, but it can cause data inconsistency.
  • Triggering periodicity must be defined carefully.
  • The success of parallel-processing techniques
    depends on how well we share the workload among
    worker nodes.
  • Geo-data is unevenly distributed and
    variable-sized.
  • We saw that
  • application of working-window and locality
    principles by means of client-based caching, and
  • parallel processing through attribute-based query
    decomposition
  • helped us increase the system responsiveness to a
    greater extent.

31
Conclusions: General Framework
  • Heterogeneous data sources are queried as a
    single resource
  • Heterogeneous: autonomous local resources
    control the definition of data
  • Single resource: removes the burden of
    individually accessing each data source with
    ad-hoc query languages.
  • WFS-based mediation
  • Data and query conversions
  • Easy extension with new data and service
    resources
  • Open Geographic and Web Service standards
  • No physical data integration
  • Data always at local source
  • Easy maintenance of data and high degree of
    autonomy
  • Seamless interaction with the system through
    integrated data views as multi-layered map images

32
Contributions
  • A federated Service-oriented Geographic
    Information Systems framework
  • Integrating Web Services with Open Geographic
    Standards to support interoperability at both
    data and service levels
  • Production of knowledge from distributed data
    sources in multi-layered map images.
  • Hierarchical data definitions through capability
    metadata federations
  • Enabling unified interactive data access/query
    and display.
  • Investigated performance efficient designs and
    did detailed benchmarking
  • Streaming GIS Web Services
  • Federator-oriented high-performance design
    techniques
  • Pre-fetching
  • Client-based caching: working-window and
    locality principles
  • Parallel processing through attribute-based query
    decomposition

33
Acknowledgement
  • The work described in this presentation is part
    of the QuakeSim project which is supported by the
    Advanced Information Systems Technology Program
    of NASA's Earth-Sun System Technology Office.
  • Galip Aydin: Web Feature Server (WFS)

34
Thanks!....
35
BACK-UP SLIDES
36
Capability-based Federation of the standard Web
Service Components
  • Built over the proposed standard Web Service
    components and common data models
  • Unified data access/query/display from a single
    access point
  • Providing application-based hierarchical data
    definitions
  • layer based data and service (WMS and WFS)
    compositions
  • Federation is done by aggregating GIS Web
    Services capabilities metadata
  • Capability is basically metadata about data and
    services: a server's information content and
    acceptable request parameter values
  • Application-based hierarchical data
  • Application: Pattern Informatics
  • Layer-1: State-boundary over Satellite
  • Data-1: State-boundary (WFS-1)
  • Data-2: Satellite-Image (WMS-2)
  • Layer-2: Google map (WMS-1)
  • Layer-3: Earthquake-Seismic
  • Data-1: Earthquake-Seismic (WFS-3)

[Figure: map panels a, b, c and d]
Sample Layers for PI
  1. NASA satellite layer
  2. Earthquake-seismic layer
  3. Google Map Layer
  4. State-boundaries Layer

Events: Move, Zooming in/out, Panning (drag-drop),
Rectangular region, Distance calc., Attribute
querying
37
Hierarchical data: Integrated data-view
[Figure: integrated data view with three layers: 1 Google map layer, 2 States boundary lines layer, 3 seismic data layer]
Event-based Interactive Tools: Query and data
analysis over integrated data views
38
(No Transcript)
39
  • Integrated views
  • Event-based querying through integrated views.
  • WFS-based mediators
  • XML-based query language
  • Related work specific to federation (might
    not be active)
  • MIX: Mediation of Information using XML
  • SRB/MCAT (SDSC)
  • TSIMMIS (Stanford Univ)
  • XML-based standard queries for the standard
    services.
  • Capability gives the list of data provided, the
    attribute lists by which they can be queried, and
    the constraints on queries needed to create valid
    requests such as getMap and getFeature.
  • We do syntactical and structural integration.

40
Hierarchical data / Integrated data-view for
IEISS Geo-science Application
  • Application-based hierarchical data
  • Application: IEISS
  • Layer-1: Gas-pipeline over Satellite
  • Data-1: Gas-pipeline (WFS-1)
  • Data-2: Satellite-Image (WMS-2)
  • Layer-2: Google map (WMS-1)
  • Layer-3: Electric-power
  • Data-1: Electric-power (WFS-3)

41
GetCapabilities Schema and Sample Request Instance
42
GetMap Schema and Sample Request Instance
43
(No Transcript)
44
Event-based Interactive Map Tools
  • <event_controller>
  • <event name="init" class="Path.InitListener" next="map.jsp"/>
  • <event name="REFRESH" class="Path.InitListener" next="map.jsp"/>
  • <event name="ZOOMIN" class="Path.InitListener" next="map.jsp"/>
  • <event name="ZOOMOUT" class="Path.InitListener" next="map.jsp"/>
  • <event name="RECENTER" class="Path.InitListener" next="map.jsp"/>
  • <event name="RESET" class="Path.InitListener" next="map.jsp"/>
  • <event name="PAN" class="Path.InitListener" next="map.jsp"/>
  • <event name="INFO" class="Path.InitListener" next="map.jsp"/>
  • </event_controller>
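
As a hedged illustration of how such an event_controller mapping might be consumed, the sketch below reads the XML with the JDK's DOM parser and builds a lookup from event name to listener class and next page. `EventControllerSketch` and its `load` method are hypothetical; the slides do not show how the original map tools read this file.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

final class EventControllerSketch {
    /** Maps each event name to {listener class, next page}. */
    static Map<String, String[]> load(String xml) {
        try {
            NodeList events = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                    .getElementsByTagName("event");
            Map<String, String[]> routes = new LinkedHashMap<>();
            for (int i = 0; i < events.getLength(); i++) {
                Element e = (Element) events.item(i);
                // trim() tolerates stray spaces around the class name
                routes.put(e.getAttribute("name"),
                        new String[] {e.getAttribute("class").trim(), e.getAttribute("next")});
            }
            return routes;
        } catch (Exception ex) {
            throw new IllegalArgumentException("invalid event_controller XML", ex);
        }
    }

    public static void main(String[] args) {
        String xml = "<event_controller>"
                + "<event name=\"ZOOMIN\" class=\"Path.InitListener\" next=\"map.jsp\"/>"
                + "<event name=\"PAN\" class=\"Path.InitListener\" next=\"map.jsp\"/>"
                + "</event_controller>";
        load(xml).forEach((name, route) ->
                System.out.println(name + " -> " + route[0] + " -> " + route[1]));
    }
}
```

A dispatcher could then look up the incoming browser event by name, invoke the listed listener and forward to the listed page.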

45
Sample GML document
46
Sample GetFeature Request Instance
47
A Template simple capabilities file for a WMS
48
Generalizing the Problem Domain
  • Query heterogeneous data sources as a single
    resource
  • Heterogeneous: local resources control the
    definition of the data
  • Single resource: removes the burden of
    individually accessing each data source
  • Easy extension with new data and service
    resources
  • No real integration of data
  • Data always at local source
  • Easy maintenance of data
  • Seamless interaction with the system
  • Collaborative decision making

[Figure: A client/user query goes to an integrated view provided by federation services, which sit on mediators over the data sources: data in files, HTML, XML/relational databases, spatial sources/sensors.]
49
Generalization of the Proposed Architecture
  • The GIS-style information model can be redefined
    in any application area, such as Chemistry and
    Astronomy
  • Application Specific Information Systems (ASIS).
  • We need to define Application Specific
  • Language (ASL) -> GML: expressing domain-specific
    features and semantics of data
  • Feature Service (ASFS) -> WFS: serving data in the
    common language (ASL)
  • Visualization Services (ASVS) -> WMS: visualize
    information and provide a way of navigating
    ASFS-compatible/mediated data resources
  • Capabilities metadata for ASVS and ASFS.
  • We need to define Application Specific
  • Federator: federating the capabilities of
    distributed ASVS and ASFS to create an
    application-based hierarchy of distributed data
    and service resources.
  • Mediators: query and data format conversions
  • Data sources maintain their internal structure
  • Large degree of autonomy
  • No actual physical data integration

[Figure: Unified data query/access/display through the Federator (ASVS), which performs capability federation and ASL rendering; mediators sit behind standard service APIs.]
50
Contributions (Systems Software)
  • Developing Web Map Server (WMS) in Open
    Geographic Standards
  • Extended with Web Service Standards and
  • Streaming map creation capabilities
  • Developing GIS Federator
  • Provides application specific layer-structured
    hierarchical data as a composition of distributed
    standard GIS Web Service components
  • Enable uniform data access and query
  • Interactive map tools for data display, query and
    analysis.
  • Browser and event-based.
  • Extended with AJAX (Asynchronous JavaScript and XML)