Title: High Performance Federated Geographic Information Systems
1 High Performance Federated Geographic Information Systems
- Ahmet Sayar
- (asayar_at_cs.indiana.edu)
- Indiana University
- Department of Computer Science
- Advisor: Prof. Geoffrey C. Fox
2 Geographic Information Systems (GIS)
- GIS is a system for creating, storing, sharing, analyzing, manipulating and displaying geo-data and associated attributes.
- Distributed nature of geo-data: various client-server models, databases, HTTP, FTP
- Modern GIS requires
  - Distributed data access for spatial databases
  - Use of remote analysis, simulation or visualization tools
  - Analyses of spatial data in map-based formats
- The primary function of GIS is to display information as maps with potentially many different layers of information
Feature-enriched multi-layer maps. The data for each feature is collected from distributed resources and rendered.
3 Interoperability Standards
- Two major standards bodies: Open Geospatial Consortium (OGC) and ISO/TC211
- Their aim is to make geographic information and services neutral and available across any network, application, or platform
- OGC addresses semantic heterogeneity by defining standards for services and the data model
  - Web Map Services (WMS): rendering map images
  - Web Feature Services (WFS): serving data in a common data model
  - Geography Markup Language (GML): content and presentation
  - Domain-specific capability metadata defining data/service
4 Motivations
- Necessity for sharing and integrating heterogeneous data resources to produce knowledge
  - Problems in data and storage heterogeneities
  - Burden of individually accessing each data source
  - Data access/query does not scale as the data size increases
  - Distributed nature of data and ownership
  - Interoperability/compliance costs
- GIS requires large data movement, processing and rendering in a responsive manner
  - Decision making for building early-warning systems
  - Crisis management for homeland security, natural disasters, etc.
5 Research Issues
- Interoperability and extensibility
  - Adoption of domain-specific open standards: data model and services
  - Integrating Web Service principles into some features of GIS
  - Other GIS applications should be able to consume data without costly format conversions
- Federation
  - Capability metadata aggregation of standard GIS Web Service components
  - Unified data access/query/display from a single access point
  - Generalizing the proposed federated GIS system to general domains in terms of architectural principles and requirements
- Performance: data access/query optimizations
  - Adaptive load balancing and unpredictable workload estimation for range queries
  - Parallel data access/query via attribute-based query decomposition
6 Federated Geographic Information System
- Distributed service architecture combining metadata and data, enabling
  - Unified and transparent access to data sources
  - Distributed, fault-tolerant and responsive data access
- OGC open-standard components with standard service interfaces for serving data and metadata enabled us to develop such a framework
- The architecture is built over standard Web Services, and is based on the common data model and capability metadata defined by OGC standards.
  - Distributed data sources have metadata.
  - Metadata: capability, specific to GIS
  - Data is structured/annotated and includes metadata.
7 Federating Standard GIS Web Services
- Since the standard GIS Web Services have a standard service API and capability metadata, they can be composed by aggregating their capabilities.
  - Capability is a type of metadata (OGC-defined)
- Service/data federation through a Federator
  - Collects/harvests domain-specific standard capabilities
  - Provides a global view of distributed data sources
  - Enables heterogeneous data sources to be integrated into Geo-science Grid applications through a single point of access with standard Web Service interfaces
- Quality of services
  - Fine-grained dynamic information presentation
  - Enables more complex information creation by leveraging multiple data sources
  - Provides stateful access/query over stateless data services
  - Enables application of parallel data access
  - Just-in-time or late-binding federation
8 Geo-data Sets (in common data model)
- Geography Markup Language (GML)
  - XML encoding for the transport and storage of geographic information
  - GML allows geographic data and its attributes to be moved between disparate systems with ease
- Separation of content and presentation
  - Covers the spatial (geometric) and non-spatial (attributive) properties of geographic features
  - Enables display and query together
  - Can be processed by many XML tools in various development environments
- Each type of data set has its own schema
  - Composed from the standard Geometry schema (geometry.xsd) and Feature schema (feature.xsd)
- Common data model examples from other domains
  - Astronomy -> VOTable: tabular data representation in XML
  - Chemistry -> CML: chemical data representation in XML
9 Standard Data Components: WMS and WFS
- Provide data sets in standard formats with standard service interfaces
- Translate information into common data models with corresponding metadata
- WMS: geo-data rendering services, providing map images
  - GetCapabilities, GetMap, GetFeatureInfo
- WFS: data services, providing data in the common data model
  - GetCapabilities, GetFeature, DescribeFeatureType
- Common data model
  - WMS: image types (map images)
  - WFS: GML (XML-encoded)
- SkyServers in Astronomy serve the same purpose as WMS/WFS in Geo-science
  - Defined by IVOA open standards
  - Attribute-based uniform access to distributed heterogeneous resources
  - Standard data models (VOTable and FITS) are provided with standard service interfaces
10 Federator
- Enables unified data access/query/display over standard data components
- Aggregator of capability metadata of standard data components
  - Aggregates, composes and orchestrates WMS and WFS services
  - Expresses the compositions in its aggregated capability file
- The Federator is actually a Web Map Server (WMS), extended with federation and display services
  - Operates like a WMS to clients and like a client to the other WMS and WFS
  - Combines information from several resources (components)
  - Allows browsing of information from a single access point
  - Manages constraints across heterogeneous sites
- The Federator is like the Storage Resource Broker (SRB) developed by SDSC
  - SRB provides storage repository abstraction for transparent access to multiple types of storage resources.
  - SRB uses a central metadata catalog server (MCAT) for discovering data/services.
  - Our federator uses an aggregated capability metadata file kept on its local disk.
11 Capability Metadata (OGC-defined)
- Functions as service metadata, providing information about what the service offers
- Defines the actual operations that are supported by the service instance, the output formats offered for those operations, and the URL prefix for each operation.
- Clients determine whether they can work with that server based on its capabilities.
- All OGC services have a GetCapabilities service interface, and each service type has its own type of capability schema.
- Capability metadata are accessed online through the standard service interface GetCapabilities
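The capability lookup described above can be sketched in Python. This is a minimal illustration, not the presentation's implementation: the host `example.org`, the default version `1.1.1`, and the `capabilities` dictionary (a hypothetical, already-parsed summary mapping each operation to its output formats) are all assumptions. Only the `SERVICE`, `REQUEST` and `VERSION` key-value parameters follow the OGC request convention.

```python
from urllib.parse import urlencode


def get_capabilities_url(base_url: str, service: str = "WMS", version: str = "1.1.1") -> str:
    """Build a key-value-pair GetCapabilities request for an OGC service."""
    params = {"SERVICE": service, "REQUEST": "GetCapabilities", "VERSION": version}
    # Some servers already carry a query string in their URL prefix.
    separator = "&" if "?" in base_url else "?"
    return base_url + separator + urlencode(params)


def supports(capabilities: dict, operation: str, output_format: str) -> bool:
    """Client-side check: does the service offer this operation and format?

    `capabilities` is a hypothetical parsed summary: operation -> formats.
    """
    return output_format in capabilities.get(operation, [])
```

A client would fetch the URL, parse the returned XML into such a summary, and call `supports(...)` before issuing GetMap or GetFeature requests.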
12 Illustration of Standard Services' Capability Files (with major tag elements)
WMS capability file:
<Capabilities>
  <Service>
    <Name> <OnlineResource> <ContactInfo>
  </Service>
  <Capability>
    <Request>
      <GetCapabilities> <GetMap> <GetFeatureInfo>
    </Request>
    <LayerList>
      <Data-1 Satellite img> <Data-2 gas-pipeline> <Data-3 Google-map>
    </LayerList>
  </Capability>
</Capabilities>
WFS capability file:
<Capabilities>
  <Service>
    <Name> <OnlineResource> <ContactInfo>
  </Service>
  <Capability>
    <Request>
      <GetCapabilities> <GetFeature> <DescribeFeatureType>
    </Request>
    <DataList>
      <Data-1 gas-pipeline> <Data-2 electric-power> <Data-3 other-data>
    </DataList>
  </Capability>
</Capabilities>
13 Federator's Template Capability Metadata
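How a federator might aggregate the capability files of its component services can be sketched as follows. The XML here is a deliberately simplified stand-in modeled on the illustration slide, not the real OGC capability schema, and the service URLs, the `<Layer>` element and the `aggregate` function are hypothetical names introduced for the example.

```python
import xml.etree.ElementTree as ET

# Simplified, hypothetical capability documents (not the real OGC schema).
WMS_CAPS = """
<Capabilities>
  <Service><Name>WMS</Name><OnlineResource>http://wms.example.org/wms</OnlineResource></Service>
  <Capability>
    <LayerList><Layer>satellite-img</Layer><Layer>gas-pipeline</Layer><Layer>google-map</Layer></LayerList>
  </Capability>
</Capabilities>
"""

WFS_CAPS = """
<Capabilities>
  <Service><Name>WFS</Name><OnlineResource>http://wfs.example.org/wfs</OnlineResource></Service>
  <Capability>
    <LayerList><Layer>gas-pipeline</Layer><Layer>electric-power</Layer></LayerList>
  </Capability>
</Capabilities>
"""


def aggregate(capability_docs):
    """Map each layer name to the list of services that offer it."""
    federated = {}
    for doc in capability_docs:
        root = ET.fromstring(doc.strip())
        service = {
            "type": root.findtext("Service/Name"),
            "url": root.findtext("Service/OnlineResource"),
        }
        for layer in root.iterfind("Capability/LayerList/Layer"):
            federated.setdefault(layer.text, []).append(service)
    return federated
```

The resulting dictionary plays the role of the aggregated capability file: for a requested layer, the federator can look up which WMS or WFS to contact.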
14 Federator-oriented data access/query optimization for distributed map rendering
15 Performance Investigation
- Interoperability requirements: compliance costs
  - Using the XML-encoded common data model (GML)
  - Using Web Services: the XML-based standard SOAP protocol
  - Costly query/response conversions at the data resource (e.g. WFS)
    - XML queries to SQL
    - Relational objects to GML
- Variable-sized and unevenly distributed nature of geo-data
  - Examples: human population and earthquake seismicity data
  - NOT easy to perform load balancing and parallel processing
>> Unexpected workload distribution: the work is decomposed into independent work pieces, and the work pieces are of highly variable size
16 Adaptive Range Query Optimization
- Data is defined and queried in ranges (location)
- Dynamic nature of data
- Query approximation problem
  - Optimal partitioning of data is difficult to achieve because polygons, points and linestrings are neither distributed uniformly nor of similar size
  - The load they impose varies, depending on the query range
  - It is difficult to develop a fair partitioning strategy that is optimal for all range queries
17 Parallel Range Queries
18 Workload Estimation Table (WT)
- Aim: cutting the 2-dimensional query ranges into smaller pieces with approximately equal query sizes.
- Created once and synchronized/refined routinely with the DB
- Consideration of data-dense/sparse regions
- Each layer's data has its own distribution characteristics and WT
- WT consists of <key, value> = <bbox, size> pairs.
  - size <= pre-defined threshold query size
- Let's illustrate this with a sample scenario
  - The whole data range in the database is (0,0,1,1), with a data size of 32MB
  - Each point in the figure corresponds to 1MB
  - Max query size for each partition is 5MB (max 5 in each partition)
[Figure: the whole data in the database over the range (0,0) to (1,1), shown next to its WT partitioning. WT consists of <key, value> pairs: key = rectangle, value = query size.]
19 WT Creation/Refinement (two-level recursive binary cuts)
- PT(R, t, er) -> PT(R1, t, er) + PT(R2, t, er)
  - t: the max acceptable query size for a partition
  - er (error rate): the max acceptable degree of fluctuation in the partitions' query sizes
  - er = |size(R1) - size(R2)| / size(R2)
- PT(R, t, er)
  - (R1, size1), (R2, size2) = PTInBalance(R, er)
  - If ((size1 or size2) <= t)  /* sizes are almost the same */
    - Put the partitions into memory/disk as pairs
      - <R1, size1>
      - <R2, size2>
    - And return
  - else
    - PT(R1, t, er); PT(R2, t, er)
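The recursive cuts can be sketched in Python under several assumptions that are not in the slides: query size is approximated by counting points (one point per unit of data), each cut is made along the longer side of the rectangle, and the balanced cut is found by bisection on the cut position. The names `size`, `cut_in_balance` and `build_wt` are hypothetical; `cut_in_balance` stands in for PTInBalance. Unlike the pseudocode, the threshold is checked per rectangle at the start of each call, and a `max_depth` guard is added so that coincident points cannot cause unbounded recursion.

```python
from typing import List, Sequence, Tuple

BBox = Tuple[float, float, float, float]  # (minx, miny, maxx, maxy)
Point = Tuple[float, float]


def size(points: Sequence[Point], box: BBox) -> int:
    """Query size of a box: number of points in it (half-open on the max edges)."""
    minx, miny, maxx, maxy = box
    return sum(1 for x, y in points if minx <= x < maxx and miny <= y < maxy)


def cut_in_balance(points: Sequence[Point], box: BBox, er: float, max_iter: int = 20):
    """Cut the box in two along its longer side, moving the cut line by
    bisection until |size1 - size2| / size2 <= er or max_iter is reached."""
    minx, miny, maxx, maxy = box
    vertical = (maxx - minx) >= (maxy - miny)
    lo, hi = (minx, maxx) if vertical else (miny, maxy)
    for _ in range(max_iter):
        mid = (lo + hi) / 2
        if vertical:
            r1, r2 = (minx, miny, mid, maxy), (mid, miny, maxx, maxy)
        else:
            r1, r2 = (minx, miny, maxx, mid), (minx, mid, maxx, maxy)
        s1, s2 = size(points, r1), size(points, r2)
        if s2 > 0 and abs(s1 - s2) / s2 <= er:
            break
        # Shift the cut toward the heavier side.
        if s1 > s2:
            hi = mid
        else:
            lo = mid
    return (r1, s1), (r2, s2)


def build_wt(points: Sequence[Point], box: BBox, t: int, er: float,
             depth: int = 0, max_depth: int = 32) -> List[Tuple[BBox, int]]:
    """Return the WT as a list of <bbox, size> pairs, each with size <= t
    unless the depth guard stops the recursion first."""
    total = size(points, box)
    if total <= t or depth >= max_depth:
        return [(box, total)]
    (r1, _), (r2, _) = cut_in_balance(points, box, er)
    return (build_wt(points, r1, t, er, depth + 1, max_depth)
            + build_wt(points, r2, t, er, depth + 1, max_depth))
```

For the sample scenario of the previous slide, a call such as `build_wt(points, (0.0, 0.0, 1.0, 1.0), t=5, er=0.2)` would return rectangles holding at most 5 points each, with smaller rectangles in dense regions.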
20 WT Utilization in Parallel Queries
- Let's say the federator gets a query whose range is R
- R is positioned in the WT to find the most efficient partitions for parallel queries
- R overlaps with p5, p6, p7, p8, p9, and p10
- Instead of making one query in range R
  - Make 6 parallel queries
  - p5, p6, p7, p8, r1 and r2
  - R = p5 + p6 + p7 + p8 + r1 + r2
- There are still minor fluctuations
  - Inevitable partial overlapping (r1 and r2)
[Figure: WT (reflecting the distribution characteristics of data in DB) over the range (0,0) to (1,1), with partitions labeled p4 to p12, the query range R, and the partial overlaps r1 and r2.]
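Positioning R in the WT amounts to clipping R against every partition it overlaps and issuing one sub-query per clipped rectangle. The sketch below assumes the WT is a list of `<bbox, size>` pairs with boxes given as `(minx, miny, maxx, maxy)`; `intersect`, `plan_queries`, `run_parallel` and the `fetch` callback (a stand-in for a GetFeature call to a WFS) are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor


def intersect(a, b):
    """Return the overlap of two boxes, or None if they only touch or are disjoint."""
    minx, miny = max(a[0], b[0]), max(a[1], b[1])
    maxx, maxy = min(a[2], b[2]), min(a[3], b[3])
    if minx >= maxx or miny >= maxy:
        return None
    return (minx, miny, maxx, maxy)


def plan_queries(wt, query_range):
    """Split query_range into one sub-range per overlapping WT partition.

    Fully covered partitions come back unchanged (p5..p8 in the slide);
    partially covered ones are clipped (r1 and r2 in the slide).
    """
    boxes = []
    for box, _size in wt:
        clipped = intersect(box, query_range)
        if clipped is not None:
            boxes.append(clipped)
    return boxes


def run_parallel(fetch, boxes, max_workers=8):
    """Issue one fetch(box) per sub-range concurrently; results keep box order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fetch, boxes))
```

Because the clipped pieces inherit the WT's balance only approximately, the partially overlapped partitions are where the "minor fluctuations" mentioned above come from.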
21 Performance Evaluation over the Streaming GIS Web Services
- How do the number of WFS and the number of partitions together affect the performance?
- When the number of WFS is kept the same, how does the partition-threshold size in WT affect the number of parallel queries and the performance?
- Performance is evaluated with earthquake seismic data kept in relational tables in a MySQL database
- Replicated WFS and databases
- Servers/nodes are deployed on 2 quad-core processors running at 2.33 GHz with 8 GB of RAM.
22 (No Transcript)
23 Test-Case Scenario: Multiple Distinct WFS and WMS
- Federator federates
  - 1 WMS: satellite map images (NASA JPL Labs)
  - 2 WFS
    - Earthquake seismic data (Indiana University Community Grids Labs, CGL)
    - State boundary lines (United States Geological Survey, USGS)
- Measurements
  - Baseline test: sequential access to the sources
  - Parallel access/query via federator
[Figure: event-based dynamic map tools send GetMap requests to the Federator and receive a binary image. The Federator obtains satellite maps as binary images from the WMS (NASA-JPL, California), and GML from WFS-1 (earthquake seismic data, DB1, CGL Indiana) and WFS-2 (state boundary lines, DB2, USGS Colorado). Map layers: satellite map (JPL), earthquake data (CGL), state boundary lines (USGS).]
24 (No Transcript)
25
- Further improvement: applying the adaptive parallel query optimization technique to individual data sets.
- WT for state boundaries: partition_size = 2MB and error_rate = 1.0
  - Data sources: frameworkwfs.usgs.gov and gridfarm18.ucs.indiana.edu
- WT for earthquake seismic data: partition_size = 1MB and error_rate = 0.2
  - Data sources: gridfarm12.ucs.indiana.edu and gf.17.ucs.indiana.edu
26 Summary and Conclusions
- The Federator's natural characteristics allow advanced caching and parallel processing designs
  - Datasets inherently come from separate data sources
  - Individual dataset decomposition and parallel processing
- We parallelized the range queries by using data partitioning (to reduce synchronization) and dynamic load balancing (to improve speedup)
- The success of the parallel access/query depends on how well we share the workload among worker nodes.
  - WT not only decomposes the work among workers, but also takes the unevenly shared workloads into consideration.
  - WT optimizes the parallel queries by adaptively decomposing the workload
- Modular: extensible with any third-party OGC-compliant data service
- Enables the use of large data in Geo-science Grid applications in a responsive manner.
27 Generalizing the Problem Domain
- The GIS-style information model can be redefined in any application area such as Chemistry and Astronomy
  - Application Specific Information Systems (ASIS).
- Querying heterogeneous data sources as a single resource
  - Heterogeneous local resources control the definition of data
  - A single resource removes the hassle of individually accessing each data source
  - Easy extension with new data and service resources
- Data is always at its originating source
[Figure: a client/user query reaches an integrated view provided by federation services, which sit on top of mediators over DBs and files (data in files, HTML, XML/relational databases).]
28 Architectural Requirements
- Developing a GIS-like federated system requires
- Defining a core language (such as GML) expressing the primitives of the domain
  - The domain-specific encoding of common data defines the query and response constraints over the service and data provided
- Key service components (such as WMS and WFS), service interfaces and message formats defining service interactions
  - for serving data in the standard data model
  - for rendering the data in the common data model
- Capability files enabling inter-service communication to link services for the federation
  - These define service and data attributes, and their constraints and limitations, to enable clients to make valid queries and get expected results.
29 Generalization of the Proposed Architecture: ASIS
- We need to define application-specific
  - Language (ASL) -> GML: expressing domain-specific features, semantics of data
  - Feature Service (ASFS) -> WFS: serving data in the common language (ASL)
  - Visualization Services (ASVS) -> WMS: visualizes information and provides a way of navigating ASFS-compatible/mediated data resources
  - Capabilities metadata for ASVS and ASFS
  - Federator: federating the capabilities of distributed ASVS and ASFS to create an application-based hierarchy of distributed data sources
- Mediators: query and data format conversions
  - Data sources maintain their internal structure
  - No actual physical data integration
[Figure: unified data query/access/display through the Federator (an ASVS), which performs capability federation and ASL rendering and reaches ASVS and ASFS components through standard service APIs; mediators sit between the services and the data sources.]
30 Survey on Feasibility of Generalization
- GIS is a mature domain in terms of information system studies, experience and standards bodies, but many other fields are not.
- Comparison/matching of ASIS elements with selected science domains
  - The three selected domains are Geo-science, Astronomy and Chemistry
  - The comparison is based on data model, services and metadata counterparts
Domain      | Standards bodies
Geo-science | OGC and ISO/TC211
Astronomy   | IVOA
Chemistry   | None
31 Contributions
- A service-oriented architecture (SOA) providing a common platform to integrate Geo-data sources into Geo-science Grid applications seamlessly and responsively.
- Federated service-oriented GIS framework
  - Distributed service architecture to manage production of knowledge as integrated data-views in the form of multi-layer map images
  - Hierarchical data definitions through capability metadata federations
  - Unified interactive data access/query and display from a single access point.
- Blueprint architecture for generalization of GIS-like federated information systems, enabling attribute-based transparent data access/query
- Adaptive data access/query optimization and applications to distributed map rendering
  - Dynamic load balancing for sharing unpredictable workload
  - Parallel optimized range queries through partitioning
32 Contributions (Systems Software)
- Web Map Server (WMS) in Open Geographic Standards
  - Extended with Web Service standards, and
  - Streaming map creation capabilities
- GIS Federator
  - Extended from WMS
  - Provides application-specific and layer-structured hierarchical data as a composition of distributed GIS Web Service components
  - Enables uniform data access and query from a single access point.
- Interactive map tools for data display, query and analysis.
  - Browser- and event-based
  - Extended with AJAX (Asynchronous JavaScript and XML)
33 Acknowledgement
- The work described in this presentation is part of the QuakeSim project, which is supported by the Advanced Information Systems Technology Program of NASA's Earth-Sun System Technology Office.
- Galip Aydin: Web Feature Server (WFS)
34 Thanks!
35 BACK-UP SLIDES
36 Possible Future Research Directions
- Integrating a dynamic/adaptable resource discovery and capability aggregation service into the federator.
- Applying a distributed hard-disk approach (e.g. Hadoop) to handle large-scale workload estimation tables
- Layered WT for different zoom levels
  - Avoiding an unnecessary number of parallel queries
- Extending the system with Web 2.0 standards
- Handling/optimizing multiple range queries
  - Currently we handle only bbox ranges
37 Integrated Data-View: Multi-layered Map Images
- Query heterogeneous data sources as a single resource
  - Heterogeneous local resources control the definition of the data
  - A single resource removes the burden of individually accessing each data source
- Easy extension with new data and service resources
- No real integration of data
  - Data always at the local source
  - Easy maintenance of data
- Seamless interaction with the system
  - Collaborative decision making
[Figure: a client/user query reaches an integrated view provided by display and federation services; these receive GML from WMS and WFS components, which access DBs and files through mediators (data in files, HTML, XML/relational databases, spatial sources/sensors).]
38 Hierarchical Data: Integrated Data-View
[Figure: integrated map with three layers: 1 Google map layer, 2 state boundary lines layer, 3 seismic data layer.]
Event-based interactive tools: query and data analysis over integrated data views
39 GetCapabilities Schema and Sample Request Instance
40 GetMap Schema and Sample Request Instance
41 (No Transcript)
42 Event-based Interactive Map Tools
<event_controller>
  <event name="init" class="Path.InitListener" next="map.jsp"/>
  <event name="REFRESH" class="Path.InitListener" next="map.jsp"/>
  <event name="ZOOMIN" class="Path.InitListener" next="map.jsp"/>
  <event name="ZOOMOUT" class="Path.InitListener" next="map.jsp"/>
  <event name="RECENTER" class="Path.InitListener" next="map.jsp"/>
  <event name="RESET" class="Path.InitListener" next="map.jsp"/>
  <event name="PAN" class="Path.InitListener" next="map.jsp"/>
  <event name="INFO" class="Path.InitListener" next="map.jsp"/>
</event_controller>
43 Sample GML document
44 Sample GetFeature Request Instance
45 Sample GetFeature request to get feature data (GML) from WFS.
Partition list as bbox values for sample case
- Pn5 - Main query getMap bbox
110,35 -100,40
46 Map rendering from GML
47 Standard Query (GetFeature)
<?xml version="1.0" encoding="iso-8859-1"?>
<wfs:GetFeature outputFormat="GML2" xmlns:gml="http://www.opengis.net/gml">
  <wfs:Query typeName="global_hotspots">
    <wfs:PropertyName>LATITUDE</wfs:PropertyName>
    <wfs:PropertyName>LONGITUDE</wfs:PropertyName>
    <wfs:PropertyName>MAGNITUDE</wfs:PropertyName>
    <ogc:Filter>
      <ogc:BBOX>
        <ogc:PropertyName>coordinates</ogc:PropertyName>
        <gml:Box>
          <gml:coordinates>-124.85,32.26 -113.36,42.75</gml:coordinates>
        </gml:Box>
      </ogc:BBOX>
    </ogc:Filter>
  </wfs:Query>
  <wfs:Query typeName="global_hotspots">
    <ogc:Filter>
      <ogc:PropertyIsBetween>
        <ogc:Literal>MAGNITUDE</ogc:Literal>
Corresponding SQL query:
SELECT LATITUDE, LONGITUDE, MAGNITUDE FROM Earthquake-Seismic WHERE -124.85 < X < -113.36 AND 32.26 < Y < 42.75 AND 7 < MAGNITUDE < 10
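The XML-to-SQL conversion performed at the data resource can be sketched as a function that turns the bounding box and magnitude range of a GetFeature filter into a parameterized SQL query. This is an illustration under assumptions, not the WFS implementation from the presentation: the table name `earthquake_seismic` is a hypothetical SQL-safe stand-in, X and Y are assumed to map to the LONGITUDE and LATITUDE columns, and SQLite is used only so the example is self-contained (the slides mention MySQL).

```python
import sqlite3


def range_query(bbox, magnitude):
    """Build a parameterized SQL query from a bbox and a magnitude range.

    bbox is (minx, miny, maxx, maxy); magnitude is (low, high).
    Bounds are exclusive, mirroring the query on the slide.
    """
    minx, miny, maxx, maxy = bbox
    sql = (
        "SELECT LATITUDE, LONGITUDE, MAGNITUDE FROM earthquake_seismic "
        "WHERE LONGITUDE > ? AND LONGITUDE < ? "
        "AND LATITUDE > ? AND LATITUDE < ? "
        "AND MAGNITUDE > ? AND MAGNITUDE < ?"
    )
    return sql, (minx, maxx, miny, maxy, magnitude[0], magnitude[1])


def run_query(conn: sqlite3.Connection, bbox, magnitude):
    """Execute the range query and return the matching rows."""
    sql, params = range_query(bbox, magnitude)
    return conn.execute(sql, params).fetchall()
```

Calling `run_query(conn, (-124.85, 32.26, -113.36, 42.75), (7, 10))` corresponds to the request on the slide. Placeholders keep the values taken from the XML request out of the SQL text itself.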
48 Streaming data transfer
- XML encoding: the size of the geospatial data increases with GML encoding, which increases transfer times or may cause exceptions
- SOAP message creation overhead
- Strategy: streaming data flow extensions to GIS Web Services
  - The Web Service acts as a handshake protocol.
  - Data is transferred over publish-subscribe messaging systems.
  - Enables the client to render map images with partially returned data
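Rendering with partially returned data can be illustrated with a small producer/consumer sketch. Here an in-process `queue.Queue` stands in for a publish-subscribe topic, which is an assumption made only to keep the example self-contained; `publish`, `render_incrementally` and the `draw` callback are hypothetical names, and the actual messaging system used in the presentation is not shown.

```python
import queue
import threading

END = object()  # sentinel marking the end of the stream


def publish(chunks, topic: queue.Queue) -> None:
    """Data-service side: publish GML chunks as they become available."""
    for chunk in chunks:
        topic.put(chunk)
    topic.put(END)


def render_incrementally(topic: queue.Queue, draw) -> int:
    """Client side: draw each chunk as soon as it arrives.

    Returns the number of chunks drawn once the stream ends.
    """
    count = 0
    while True:
        item = topic.get()
        if item is END:
            break
        draw(item)
        count += 1
    return count
```

In use, the publisher would run in its own thread (for example `threading.Thread(target=publish, args=(chunks, topic))`) while the map client calls `render_incrementally(topic, draw)`, so the first features appear before the full response has been transferred.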
49 Motivating Use Cases
- Earthquake science applications
  - Pattern Informatics (PI)
    - Earthquake forecasting code developed by Prof. John Rundle (UC Davis) and collaborators; uses seismic archives.
  - Virtual California (VC)
    - Time series analysis code that can be applied to GPS and seismic archives, for both real-time and archival data.
- Interdependent Energy Infrastructure Simulation System (IEISS), Los Alamos National Laboratory (LANL)
  - Models infrastructure networks (e.g. electric power systems and natural gas pipelines) and simulates their physical behavior and the interdependencies between systems.