Title: High Performance Web Service Architecture for Sensors and Geographic Information Systems
1High Performance Web Service Architecture for
Sensors and Geographic Information Systems
2Geographic Information Systems
- A Geographic Information System (GIS) is a system for creating, storing, sharing, analyzing, manipulating and displaying spatial data and associated attributes.
- GIS history saw the evolution from mainframe GIS to desktop GIS to distributed GIS.
- Modern GIS require
  - Distributed data access for spatial databases
  - Use of remote analysis, simulation or visualization tools
3Traditional Distributed GIS Approach
- Problems with traditional approaches
  - Distributed nature of the geo-data: various client-server models, databases, HTTP, FTP, RDBs, XML DBs etc.
  - Data format problems, conversion overheads
  - Data processing issues: hardware and software requirements, COM/ActiveX, CORBA/IIOP frameworks
- These introduce three challenges
  - Assembling data from distributed repositories
  - Adoption of universal standards for format interoperability
  - Interoperable services for better utilization of computational resources
4Open Geographic Standards
- Open GIS standards bodies aim to make geographic information and services neutral and available across any network, application, or platform.
- Two major standards bodies: OGC and ISO/TC211, the former being the more widely used
- OGC specifications are widely accepted
  - Data format specs: GML, SensorML, O&M
  - Service specs: WFS, WMS, WCS
- OGC services are HTTP GET/POST based, with limited data transport capabilities (HTTP, FTP, files etc.)
- They are not Web Services: tightly coupled, point-to-point communication results in centralized, synchronous applications.
5Motivations
- Lack of service orchestration capabilities
  - Complex problems require GIS applications to collaborate.
  - Coupling data sources to scientific applications
  - Data transport requirements
- Proliferation of sensors
  - Ability to analyze data on the fly, continuous streaming support, scalable systems for addition of new sensors.
- High performance and high rate messaging
  - Real-time data access, rapid response systems, crisis management etc.
- From the Grid perspective
  - To apply general Grid/distributed computing principles to GIS
  - Investigate how to integrate with geophysical and other scientific applications
6Motivating Use Cases
- Pattern Informatics
  - Earthquake forecasting code developed by Prof. John Rundle (UC Davis) and collaborators; uses seismic archives.
- Regularized Dynamic Annealing Hidden Markov Method (RDAHMM)
  - Time series analysis code; can be applied to GPS and seismic archives as well as to real-time data.
- Interdependent Energy Infrastructure Simulation System (IEISS)
  - Models infrastructure networks (e.g. electric power systems and natural gas pipelines) and simulates their physical behavior and the interdependencies between systems.
- SOPAC GPS networks provide real-time messages.
7Research Issues 1
- Applying Web Service principles to GIS data services
  - Orchestration of services and workflows; simple services are not suitable for large data sets or where quick response is required
  - High performance support in GIS services.
- Interoperability
  - The system should bridge the GIS and Web Service communities by adapting standards from both.
  - Other GIS applications should be able to consume data without having to do costly format conversions.
8Research Issues 2
- Scalability
  - The system should be able to handle high volume and high rate data transport and processing.
  - Plugging in new sensors, data sources or geoprocessing applications should not degrade the system's overall performance.
- Flexibility and extensibility
  - How to develop real-time services to process sensor data on the fly.
  - Ability to add new filters without system failures.
- Quality of Service issues
  - Is the latency introduced by services in processing real-time sensor data acceptable?
9SOA for GIS Geophysical Data Grid
- We utilize Web Services to realize a Service Oriented Architecture, and OGC data formats and application interfaces for interoperability at both the data and application levels.
- GIS Data Grid properties
  - Based on its source, geospatial data can be seen as either archival or real-time data. The architecture provides standard control and access interfaces for both types.
  - Supports alternate transport and representation schemes; uses a topic-based messaging infrastructure for large volume data transport.
  - UDDI-based FTHPIS as services registry.
  - Streaming and non-streaming services to access archived data.
  - Real-time and near real-time services for accessing sensor metadata and sensor measurements.
10Geophysical Data Grid Architecture
[Figure: architecture diagram with two parts, the Real-Time Data Grid and the Archival Data Grid]
11GIS Grid 1 - Archival Data Services
- Web Feature Service (WFS) is the default OGC specification for vector data.
- We have built a Web Service version of WFS for accessing geospatial data on distributed databases.
- The first Web Service version of WFS has been successfully used in several scientific workflows with other services (WMS, HPSearch, FTHPIS).
- WFS can access multiple distributed databases and can query other WFSs for remote features.
- Problems with the Web Service version of the WFS
  - Request-response, not asynchronous
  - Performance: GI services are not designed to handle non-trivial data transfers (large data requests, SOAP overhead).
  - XML encoding: the size of the geospatial data increases with GML encoding, which increases transfer times or may cause exceptions.
12WFS Performance Improvements: Streaming WFS
- To improve performance of the WFS
  - Utilized a publish/subscribe messaging system for high performance data transfer. The service is similar to WFS, but separating the data and control channels allows one-to-many data distribution.
  - Used a streaming database connection (MySQL) for faster retrieval of the query results and lower GML creation overhead.
  - Binary XML frameworks are integrated to reduce XML payload size, which improves transfer times.
  - Binding data transfer to Grid messaging middleware reduces SOAP creation overhead.
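The separation of control and data channels can be illustrated with a minimal in-memory sketch. It is not the Streaming WFS or NaradaBrokering API: the `Broker` class, the `handle_get_feature` and `stream_features` functions, the topic naming scheme and the `END` marker are all hypothetical names chosen for illustration. The control call returns only a topic name, and the features then arrive as individual messages that any number of subscribers can receive.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List


class Broker:
    """Toy in-memory publish/subscribe broker (stand-in for real messaging middleware)."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Callable[[str], None]]] = defaultdict(list)

    def subscribe(self, topic: str, callback: Callable[[str], None]) -> None:
        self._subscribers[topic].append(callback)

    def publish(self, topic: str, message: str) -> None:
        # One publish call reaches every subscriber of the topic (one-to-many).
        for callback in self._subscribers[topic]:
            callback(message)


def handle_get_feature(request_id: str) -> str:
    """Control channel: the response carries only the result topic, not the data."""
    return f"wfs/results/{request_id}"  # hypothetical topic naming scheme


def stream_features(broker: Broker, topic: str, features: Iterable[str]) -> None:
    """Data channel: each feature is published as its own message."""
    for feature in features:
        broker.publish(topic, feature)
    broker.publish(topic, "END")  # hypothetical end-of-stream marker


broker = Broker()
topic = handle_get_feature("req-1")

# Two independent clients subscribe to the same result stream.
client_a: List[str] = []
client_b: List[str] = []
broker.subscribe(topic, client_a.append)
broker.subscribe(topic, client_b.append)

stream_features(broker, topic, ["<feature id='1'/>", "<feature id='2'/>"])
```

Because the response on the control channel is small and fixed-size, the cost of wrapping it in SOAP stays constant regardless of how many features the query returns.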
13WFS Interaction with services and data sources
14GIS Grid Example IEISS Integration
WMS: Ahmet Sayar; UDDI, Context Service: Mehmet Aktas
15Streaming WFS Performance
- We test the system for up to 10,000 features
- The tests reveal the performance of the streaming service with and without Binary XML integration
- We use the BNUX and Fast Infoset (FI) Binary XML frameworks for compressing the GML FeatureCollection documents
- The BNUX and FI timings include encoding and decoding costs
16GIS Grid 2 - Real-Time Data Services
- Sensors and sensor networks are being deployed for measuring various geophysical entities.
- Sensors and GIS are closely related: sensor measurements are used by GIS for statistical or analytical purposes.
- With the proliferation of sensors, data collection and processing paradigms are changing.
- Most scientific geo-applications are designed to work with archived data.
- Critical infrastructure systems and crisis management environments require fast and accurate access to real-time sources and a flexible/pluggable architecture for geoprocessing of the data.
17SensorGrid Architecture
- Major components
  - Real-Time filters
  - Grid Messaging Substrate
  - Information Service
- Filters can be run as Web Services to create workflows.
- Filter chains can be deployed for complex processing.
- Streaming messaging provides high-performance transfer options.
18Real-Time Filters
- Real-time data processing is supported by employing filters around a publish/subscribe messaging system.
- The filters are extended from a generic class to inherit publish and subscribe capabilities.
- They can be connected in parallel or in series as chains to solve complex problems.
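A minimal sketch of this filter pattern follows, assuming a toy in-memory broker in place of the real messaging system. The class names (`Broker`, `Filter`, `UpperCaseFilter`, `ExclaimFilter`, `LengthFilter`) and topic names are hypothetical and do not reflect the actual SensorGrid classes. Each filter inherits its subscribe and publish behavior from the generic class and only overrides `process`.

```python
from collections import defaultdict
from typing import Callable, Dict, List


class Broker:
    """Toy in-memory publish/subscribe broker."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Callable[[str], None]]] = defaultdict(list)

    def subscribe(self, topic: str, callback: Callable[[str], None]) -> None:
        self._subscribers[topic].append(callback)

    def publish(self, topic: str, message: str) -> None:
        for callback in self._subscribers[topic]:
            callback(message)


class Filter:
    """Generic filter: subscribes to one topic and publishes results to another."""

    def __init__(self, broker: Broker, in_topic: str, out_topic: str) -> None:
        self.broker = broker
        self.out_topic = out_topic
        broker.subscribe(in_topic, self.on_message)

    def process(self, message: str) -> str:
        raise NotImplementedError  # concrete filters supply the processing step

    def on_message(self, message: str) -> None:
        self.broker.publish(self.out_topic, self.process(message))


class UpperCaseFilter(Filter):
    def process(self, message: str) -> str:
        return message.upper()


class ExclaimFilter(Filter):
    def process(self, message: str) -> str:
        return message + "!"


class LengthFilter(Filter):
    def process(self, message: str) -> str:
        return str(len(message))


broker = Broker()

# Serial chain: raw -> UpperCaseFilter -> ExclaimFilter -> serial_out
UpperCaseFilter(broker, "raw", "upper")
ExclaimFilter(broker, "upper", "serial_out")

# Parallel branch: LengthFilter reads the same raw topic independently.
LengthFilter(broker, "raw", "parallel_out")

serial_out: List[str] = []
parallel_out: List[str] = []
broker.subscribe("serial_out", serial_out.append)
broker.subscribe("parallel_out", parallel_out.append)

broker.publish("raw", "gps")
```

In this sketch a chain is defined purely by topic names: a serial chain is formed when one filter's output topic is another's input topic, and parallel operation is formed when several filters share an input topic.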
19Filter Metadata and Chains
[Figure: filter chains in parallel operation and in serial operation]
20Use Case - GPS Sensors
- GPS station networks are a good example of scientific sensors. GPS measurements are used for determining post-seismic deformation, understanding long-term crustal movement etc.
- SOPAC GPS networks
  - 8 networks covering 80 stations produce 1 Hz high-resolution data.
  - Socket-based real-time access in the binary RYO format is available, but not utilized!
- We developed filters to provide real-time streaming access in multiple formats (RYO, ASCII, GML).
  - OHIO principle and chain of filters.
- We use publish/subscribe-based NaradaBrokering for managing real-time streams, with topics for hierarchical organization of the sensors.
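One way to picture hierarchical topics is as slash-separated paths, with one level per organizational unit. The sketch below is illustrative only: the path layout, the `topic_for` and `matches` helpers, and the ancestor-matching rule are assumptions, not documented NaradaBrokering behavior. `CRTN_01` and Parkfield appear elsewhere in this presentation; their use as path segments here is hypothetical.

```python
from typing import List


def topic_for(network: str, fmt: str) -> str:
    """Build a hierarchical topic name for one network and one data format."""
    return f"SOPAC/GPS/{network}/{fmt}"  # hypothetical layout


def matches(subscription: str, topic: str) -> bool:
    """True if `subscription` equals `topic` or is one of its ancestors."""
    sub_parts = subscription.strip("/").split("/")
    topic_parts = topic.strip("/").split("/")
    # Compare whole path segments so that CRTN_0 does not match CRTN_01.
    return topic_parts[: len(sub_parts)] == sub_parts


topics: List[str] = [
    topic_for(network, fmt)
    for network in ("CRTN_01", "PARKFIELD")
    for fmt in ("RYO", "ASCII", "GML")
]

# Select every stream of one network, regardless of format.
crtn_topics = [t for t in topics if matches("SOPAC/GPS/CRTN_01", t)]
```

Under this layout a client can select a single format of a single network, all formats of a network, or every GPS stream, by choosing how deep in the hierarchy it subscribes.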
21SOPAC Real-Time Filters for GPS Streams
22Application Integration with Real-Time Filters
- Station Monitor Filter records real-time positions for 10 minutes and calculates position changes
  - Graph Plotter Application creates a visual representation of the positions.
- RDAHMM Filter records real-time positions for 10 minutes and invokes the RDAHMM application, which determines state changes in the XYZ signal.
  - Graph Plotter Application creates a visual representation of the RDAHMM output.
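The record-then-compute step of a monitoring filter can be sketched as a time-bounded buffer. This is a sketch under assumptions: it uses a sliding 10-minute window, whereas the original filter may process fixed 10-minute batches, and the `StationMonitor` class and its methods are hypothetical names.

```python
from collections import deque
from typing import Deque, Optional, Tuple

Position = Tuple[float, float, float]  # (x, y, z)


class StationMonitor:
    """Keeps the most recent `window_seconds` of positions for one station."""

    def __init__(self, window_seconds: float = 600.0) -> None:
        self.window_seconds = window_seconds
        self._samples: Deque[Tuple[float, Position]] = deque()

    def add(self, timestamp: float, position: Position) -> None:
        self._samples.append((timestamp, position))
        # Drop samples older than the window relative to the newest sample.
        while timestamp - self._samples[0][0] > self.window_seconds:
            self._samples.popleft()

    def displacement(self) -> Optional[Position]:
        """Change between the oldest and newest position in the window."""
        if len(self._samples) < 2:
            return None
        (_, first), (_, last) = self._samples[0], self._samples[-1]
        return tuple(b - a for a, b in zip(first, last))


# Feed 1 Hz samples for 700 seconds with a synthetic linear drift.
monitor = StationMonitor(window_seconds=600.0)
for second in range(701):
    monitor.add(float(second), (float(second), 0.0, 2.0 * second))

change = monitor.displacement()
```

With 1 Hz input the buffer holds about 600 samples per station, so memory use stays bounded however long the filter runs.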
23AJAX and Real-Time positions on Google maps
24Recording and Replaying Sensor Streams
- Filters can be used to record and replay scenarios, such as earthquakes in the GPS case.
- We developed RYO Recorder and RYO Publisher filters.
  - The RYO Recorder creates daily archives of the GPS streams.
  - The RYO Publisher can be used to replay whole days or selected segments of the records.
- We replayed the 2004 Southern California Earthquake using the Parkfield GPS network archive
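The record and replay idea can be sketched with a timestamped archive. This is a simplified illustration, not the RYO Recorder or RYO Publisher code: RYO is a binary format, while the sketch stores text payloads one per line, and the `record` and `replay` functions are hypothetical. A real replay would also pace messages at the original 1 Hz rate, which is omitted here.

```python
import io
from typing import Callable, Iterable, Optional, TextIO, Tuple


def record(stream: Iterable[Tuple[float, str]], archive: TextIO) -> None:
    """Append each (timestamp, payload) message to the archive, one per line."""
    for timestamp, payload in stream:
        archive.write(f"{timestamp}\t{payload}\n")


def replay(
    archive: TextIO,
    publish: Callable[[str], None],
    start: Optional[float] = None,
    end: Optional[float] = None,
) -> int:
    """Publish archived payloads whose timestamps fall within [start, end]."""
    count = 0
    for line in archive:
        timestamp_text, payload = line.rstrip("\n").split("\t", 1)
        timestamp = float(timestamp_text)
        if start is not None and timestamp < start:
            continue
        if end is not None and timestamp > end:
            break  # assumes the archive is ordered by timestamp
        publish(payload)
        count += 1
    return count


# Record four messages, then replay only the segment from t=1 to t=2.
archive = io.StringIO()
record([(0.0, "m0"), (1.0, "m1"), (2.0, "m2"), (3.0, "m3")], archive)
archive.seek(0)

replayed = []
published_count = replay(archive, replayed.append, start=1.0, end=2.0)
```

Because `replay` only needs a `publish` callable, the replayed stream can be sent to the same topic as the live one, so downstream filters need no changes to process an archived event.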
25SensorGrid Performance Tests
- Two major goals: system stability and scalability
  - Ensuring stability of the distributed Filter Services for continuous operation.
  - Finding the maximum number of publishers (sensors) and clients that can be supported with a single broker.
  - Investigating whether the system scales to a large number of sensors and clients.
26Test Methodology
$T_{\text{transfer}} = (T_2 - T_1) + (T_4 - T_3)$
- The test system consists of a NaradaBrokering server and a three-filter chain for publishing, converting and receiving RYO messages.
- We take 4 timings for determining mean end-to-end delivery times of GPS measurements.
- The tests were run for at least 24 hours.
- GridFarm001-008 servers are used in these tests.
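Reading the formula on this slide as T_transfer = (T2 - T1) + (T4 - T3), the mean delivery time over a test run can be computed as in the sketch below. The meaning assigned to each timestamp is an assumption based on the three-filter chain (publish, convert, receive); the slide does not define T1 to T4, and the `Timings` type and function names are hypothetical.

```python
from statistics import mean
from typing import Iterable, NamedTuple


class Timings(NamedTuple):
    """Four timestamps for one message, in milliseconds (meanings are assumed)."""

    t1: float  # message leaves the publishing filter
    t2: float  # message arrives at the converting filter
    t3: float  # converted message leaves the converting filter
    t4: float  # converted message arrives at the receiving filter


def transfer_time(t: Timings) -> float:
    """Sum of the two broker hops: (T2 - T1) + (T4 - T3)."""
    return (t.t2 - t.t1) + (t.t4 - t.t3)


def mean_transfer_time(samples: Iterable[Timings]) -> float:
    """Mean end-to-end transfer time over all recorded messages."""
    return mean(transfer_time(s) for s in samples)


samples = [
    Timings(t1=0, t2=3, t3=5, t4=9),      # hops of 3 ms and 4 ms
    Timings(t1=10, t2=12, t3=20, t4=23),  # hops of 2 ms and 3 ms
]
average = mean_transfer_time(samples)
```

Under this reading the interval T3 - T2, the time spent inside the converting filter, is excluded, so the metric measures messaging overhead only and not format conversion cost.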
27Test 1 - System Stability
- The basic system with three filters and one broker.
- The figure shows average results for every 30 minutes.
- The average transfer time shows that continuous operation does not degrade system performance.
28Test 2 - Multiple Publishers
- We add more GPS networks by running more publishers.
- The results show that 1000 publishers can be supported with no performance loss. The ceiling of 1000 is an operating system limit.
29Test 3 - Multiple Clients
[Figure: result panels labeled "1000 Clients" and "Adding clients"]
- We add more clients by running multiple Simple Filters which subscribe to the same ASCII topic.
- The system can support as many as 1000 clients with a very small performance decrease.
30Extending Scalability
- The limit of the basic system appears to be 1000 clients or publishers.
- This is due to an operating system restriction on open file descriptors (1024 for Red Hat Linux).
- To overcome this limit we create NaradaBrokering networks by linking multiple brokers.
- We run 2 brokers to support 1500 clients.
- The number of brokers can be increased indefinitely, so we can potentially support any number of publishers and subscribers.
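The sizing arithmetic behind this slide can be written as a small helper. It is a back-of-the-envelope sketch, assuming each client or publisher consumes one connection and each broker handles about 1000 of them, as observed in the single-broker tests. The `brokers_needed` function is hypothetical, and connections used by broker-to-broker links are ignored.

```python
import math


def brokers_needed(connections: int, per_broker_capacity: int = 1000) -> int:
    """Smallest number of brokers able to hold the given number of connections."""
    if connections < 0:
        raise ValueError("connections must be non-negative")
    if per_broker_capacity <= 0:
        raise ValueError("per_broker_capacity must be positive")
    # At least one broker is always required, even with no clients connected.
    return max(1, math.ceil(connections / per_broker_capacity))


two_broker_case = brokers_needed(1500)  # the 1500-client configuration above
```

On Linux the descriptor limit of the current process can be inspected from Python with `resource.getrlimit(resource.RLIMIT_NOFILE)`, which is one way to choose a realistic `per_broker_capacity`.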
31Test 4 - Multiple Brokers
- Messages published to the first broker can be received from the second broker.
- We take timings on each broker.
- We connect 750 clients to each broker and run for 24 hours.
- The results show that the performance is very good and similar to the single broker test.
32Test 4 - Multiple Brokers
[Figure: two result panels, each labeled "750 Clients"]
33Real-Time Filters Test Results
- The RYO Publisher filter runs at 1 Hz and publishes a 24-hour archive of the CRTN_01 GPS network, which contains 9 GPS stations.
- The single broker configuration can support 1000 clients or publishers (1000 GPS networks, i.e. 9000 individual stations).
- The system can be scaled up by creating NaradaBrokering broker networks.
- Message order was preserved in all tests.
34Contributions
- An SOA approach to create a common platform to support both archival and real-time geospatial data in data-centric Grids.
- Merging Web Services and Open Geographic Standards to support interoperability at both data and application levels.
- We have shown that GIS services can be implemented as streaming services.
- Integration of Binary XML frameworks with the streaming services shows performance gains over long network distances.
- We have shown that Sensor Grids can be built on top of publish/subscribe middleware.
- Real-time continuous data support is realized in a service architecture.
- A scalable architecture implementation for a large number of sensor networks.