Title: HYDROSEEK and HYDROTAGGER: A Search Engine for Hydrologists (GIS in Water Resources Lecture)
- M. Piasecki
- November, 2007
Slide 2: Lecture Outline
- Demo of HydroSeek
  - What are the search criteria?
  - Functionality of the Engine Interface
- Data Sources
  - Common Sources
  - Common Problems (Completeness, Syntax, Semantics)
- Ontologies
  - Ontology details
  - Concept-to-data-variable tagging
- Architecture
  - Flow Chart
  - Technologies used
- Demo of HydroTagger
  - Why the Tagging?
  - Technologies
Slide 3: www.HydroSeek.org
Slide 4: HIS Goals
- Hydrologic Data Access System: better access to a large volume of high-quality hydrologic data
- Support for Observatories: synthesizing hydrologic data for a region
- Advancement of Hydrologic Science: data modeling and advanced analysis
- Hydrologic Education: better data in the classroom, basin-focused teaching
Slide 5: Observations Catalog
The catalog specifies which variables are measured at each site, over what time interval, and how many observations of each variable are available.
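As a rough illustration of what such a catalog holds, here is a minimal sketch that stores one record per (site, variable) pair and finds sites whose record overlaps a requested period. The field names (`site_code`, `variable`, `begin`, `end`, `value_count`) and the sample rows are hypothetical, not HydroSeek's actual schema.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class CatalogEntry:
    # One row per (site, variable) pair: what is measured, when, and how many values.
    site_code: str
    variable: str
    begin: date
    end: date
    value_count: int


def find_sites(catalog, variable, start, stop):
    """Return site codes whose record for `variable` overlaps [start, stop]."""
    return sorted(
        e.site_code
        for e in catalog
        if e.variable == variable and e.begin <= stop and e.end >= start
    )


# Hypothetical sample rows, for illustration only.
catalog = [
    CatalogEntry("SITE-A", "Discharge", date(1990, 1, 1), date(2007, 10, 1), 6483),
    CatalogEntry("SITE-B", "Discharge", date(1957, 1, 1), date(1977, 12, 31), 120),
    CatalogEntry("SITE-B", "Nitrogen", date(1977, 1, 1), date(2003, 12, 31), 300),
]

print(find_sites(catalog, "Discharge", date(2000, 1, 1), date(2005, 1, 1)))  # ['SITE-A']
```

Answering "which sites have this variable in this period" from the catalog alone avoids contacting every data source just to discover that it has nothing to offer.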
Slide 6: Objective
- Search multiple heterogeneous data sources simultaneously, regardless of semantic or structural differences between them
[Figure: What we are doing now]
Slide 7: What we would like to do
[Figure: a single generic request goes to a Semantic Mediator, which issues GetValues calls to the individual data sources]
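The mediator idea can be sketched as a component that translates one generic concept into each source's own variable name and then calls that source. This is a minimal sketch under stated assumptions: the class `SemanticMediator`, the sources `source_a`/`source_b`, and their variable names are hypothetical stand-ins for real services reached over the web.

```python
from typing import Callable, Dict, List, Tuple

Series = List[Tuple[str, float]]      # (timestamp, value) pairs
SourceCall = Callable[[str], Series]  # hypothetical per-source GetValues(variable)


class SemanticMediator:
    """Fans one generic request out to source-specific GetValues calls."""

    def __init__(self) -> None:
        self.sources: Dict[str, SourceCall] = {}
        # concept -> {source name -> that source's own variable name}
        self.mappings: Dict[str, Dict[str, str]] = {}

    def register_source(self, name: str, call: SourceCall) -> None:
        self.sources[name] = call

    def map_concept(self, concept: str, source: str, variable: str) -> None:
        self.mappings.setdefault(concept, {})[source] = variable

    def get_values(self, concept: str) -> Dict[str, Series]:
        # Translate the generic concept into each source's vocabulary, then call it.
        return {
            source: self.sources[source](variable)
            for source, variable in self.mappings.get(concept, {}).items()
        }


# Hypothetical sources that only answer to their own variable names.
def source_a(variable: str) -> Series:
    return [("2006-08-13", 1.2)] if variable == "TKN" else []


def source_b(variable: str) -> Series:
    return [("2006-08-14", 0.9)] if variable == "Kjeldahl nitrogen, total" else []


mediator = SemanticMediator()
mediator.register_source("A", source_a)
mediator.register_source("B", source_b)
mediator.map_concept("Total Kjeldahl Nitrogen", "A", "TKN")
mediator.map_concept("Total Kjeldahl Nitrogen", "B", "Kjeldahl nitrogen, total")
print(mediator.get_values("Total Kjeldahl Nitrogen"))
```

The client only ever names the concept; adding a source means registering one call and its name mappings, without changing the client.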
Slide 8: Data Sources
- USGS
- EPA
- CIMS
- TCEQ
- NADP
Slide 9: Content Aggregation
- Hypertext links to data providers: a quick, simple way of sharing knowledge on where to find the data
- Interactive map interface: WME, launched in January 2001
- Modernized STORET, launched in January 1999
- NWIS Web, launched in September 2000
Slide 10: Spatial Coverage
- STORET has 758 sites in Texas; TCEQ has 8,407.
- STORET has 47,602 sites in Florida; NWIS has 27,906.
- NWIS has 121,545 sites in Minnesota; STORET has 22,260.
Slide 11: Data Availability
Slide 12: Temporal Coverage
[Figure: Nitrogen observations by period: 1957-1977, 1977-2003, 2003-2007]
Slide 13: Interface Problem
- NWIS: 175 form elements on a single page
- STORET, NWIS, TCEQ, CIMS: one drop-down menu for all of them?
- String search across the parameter list? How about synonyms?
  - e.g., "Elevation, water surface" vs. "stage height"
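The synonym question can be made concrete with a small sketch: a plain substring search for "stage height" misses "Elevation, water surface", while expanding the query through a synonym list finds both. The `SYNONYMS` groups and parameter strings are hypothetical; in HydroSeek this role is played by the ontology rather than a hard-coded list.

```python
from typing import List, Set

# Hypothetical synonym groups (lower case), for illustration only.
SYNONYMS: List[Set[str]] = [
    {"stage height", "elevation, water surface"},
    {"total kjeldahl nitrogen", "ammonia + organic nitrogen"},
]


def expand(term: str) -> Set[str]:
    """Return the term plus every synonym in its group (case-insensitive)."""
    t = term.lower()
    out = {t}
    for group in SYNONYMS:
        if t in group:
            out |= group
    return out


def search(parameters: List[str], term: str) -> List[str]:
    """Substring search over a parameter list, widened by synonyms."""
    terms = expand(term)
    return [p for p in parameters if any(s in p.lower() for s in terms)]


params = [
    "Elevation, water surface, feet",
    "Stage height, feet",
    "Discharge, cubic feet per second",
]
print(search(params, "stage height"))
# ['Elevation, water surface, feet', 'Stage height, feet']
```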
Slide 14: Completeness Problem
Metadata catalog:
- Better query performance
- Freedom
- Fewer errors
[Figure: availability of geographic identifiers for stations in EPA STORET]
Slide 15: Heterogeneity Problem
- Syntax
  - e.g., date/time formats, Gregorian versus Julian
- Data format/structure
  - e.g., XML, HTML, tab-, tilde-, or comma-separated text, gzipped tarballs
- Semantics (more on the next slide)
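Syntactic heterogeneity is usually handled by normalizing every source into one representation before anything else happens. The sketch below is a minimal example under two assumptions: that "Julian" here means year plus day-of-year notation (e.g., `2007245`), and that delimited files carry simple date/value rows. The format list and the helper names `parse_date`, `detect_delimiter`, and `read_rows` are hypothetical.

```python
from datetime import datetime
from typing import List, Tuple

# Hypothetical set of date formats; "%Y%j" is year + day of year,
# assumed here to be what a "Julian" date means in a given source.
DATE_FORMATS = ["%Y-%m-%d", "%m/%d/%Y", "%Y%j"]


def parse_date(text: str) -> datetime:
    """Try each known format in turn and return the first that parses."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt)
        except ValueError:
            continue
    raise ValueError(f"unrecognized date: {text!r}")


def detect_delimiter(line: str) -> str:
    """Pick the first known delimiter (tab, tilde, comma) present in the line."""
    for d in ("\t", "~", ","):
        if d in line:
            return d
    raise ValueError("no known delimiter")


def read_rows(text: str) -> List[Tuple[datetime, float]]:
    """Parse date/value rows into (datetime, float) pairs."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    delim = detect_delimiter(lines[0])
    rows = []
    for ln in lines:
        date_text, value_text = ln.split(delim)
        rows.append((parse_date(date_text), float(value_text)))
    return rows


print(read_rows("2006-08-13~206\n08/14/2006~198\n2007245~180\n"))
```

Once every source is read into the same `(datetime, float)` form, the remaining differences between sources are semantic rather than syntactic.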
Slide 16: Issues with Semantics
- Hyponymy
  - Parameters "Groundwater level", "Stream stage", "Reservoir level" versus "Water level"
- Pseudo-hyponymy due to lack of metadata
  - Parameter "Manganese, 6N hydrochloric acid extracted, recoverable, dry weight, milligrams per kilogram" versus "Manganese, milligrams per kilogram"
- Synonymy
  - "Total Kjeldahl Nitrogen" vs. "Ammonia + Organic Nitrogen"
Slide 17: Search Strategy
- Search → Fine tune → Retrieve, rather than Search → Retrieve
- This avoids both the high-precision, low-recall problem and the low-precision, high-recall problem.
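To make the two failure modes concrete, the sketch below computes precision (relevant results retrieved divided by all results retrieved) and recall (relevant results retrieved divided by all relevant results) for two kinds of search. The variable names and result sets are hypothetical, built from the "water level" hyponyms above.

```python
from typing import Set, Tuple


def precision_recall(retrieved: Set[str], relevant: Set[str]) -> Tuple[float, float]:
    """Precision = hits / |retrieved|; recall = hits / |relevant|."""
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical example: four series are actually relevant to "water level".
relevant = {"groundwater level", "stream stage", "reservoir level", "water level"}

# Exact-name search: precise but misses the hyponyms (high precision, low recall).
print(precision_recall({"water level"}, relevant))  # (1.0, 0.25)

# Broad keyword search on "water": finds everything plus noise (low precision, high recall).
broad = relevant | {"water temperature", "water hardness", "water color", "water odor"}
print(precision_recall(broad, relevant))  # (0.5, 1.0)
```

The fine-tuning step sits between these extremes: the user starts from the broad, high-recall set and narrows it before any data are retrieved.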
Slide 18: Layered Ontology Model
Slide 19
[Figure: ontology layers: Core, Navigation, Compound]
Slide 20: Knowledge Base
- Supports classification of search results
- Associates entities in the ontology with measured variables in a relational database
- Helps resolve semantic heterogeneity between data repositories
Example relations:
- Escherichia coli (E. coli) is-a Indicator Organism
- Copper is-a Micronutrient; Micronutrient is-a Nutrient
- Copper isMeasuredIn Medium; Medium: Water, Soil
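The is-a links on this slide support classification by transitivity: because Copper is-a Micronutrient and Micronutrient is-a Nutrient, a search for "Nutrient" should also return Copper variables. A minimal sketch, using the slide's relations in hypothetical Python dictionaries (the real knowledge base is an ontology linked to a relational database):

```python
from typing import Dict, List, Set

# Relations from the slide; the dictionary layout itself is illustrative.
IS_A: Dict[str, List[str]] = {
    "E. coli": ["Indicator Organism"],
    "Copper": ["Micronutrient"],
    "Micronutrient": ["Nutrient"],
}
SYNONYM: Dict[str, str] = {"Escherichia coli": "E. coli"}


def ancestors(term: str) -> Set[str]:
    """All concepts reachable through is-a links (transitive closure)."""
    term = SYNONYM.get(term, term)
    seen: Set[str] = set()
    stack = [term]
    while stack:
        for parent in IS_A.get(stack.pop(), []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def classify(variables: List[str], concept: str) -> List[str]:
    """Return variables that are the concept itself or one of its descendants."""
    return [
        v for v in variables
        if SYNONYM.get(v, v) == concept or concept in ancestors(v)
    ]


print(classify(["Copper", "Escherichia coli", "Discharge"], "Nutrient"))  # ['Copper']
```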
Slide 22: Point Observations Information Model
Source: http://www.cuahsi.org/his/webservices.html
[Figure: hierarchy from data source down to values, with an example at each level and the web service methods GetSites, GetSiteInfo, GetVariables, GetVariableInfo, and GetValues:
- Data Source: USGS
- Network: Streamflow gages
- Sites: Neuse River near Clayton, NC
- Variables: Discharge, stage (daily or instantaneous)
- Values: Value, Time, Qualifier, Offset; e.g., 206 cfs, 13 August 2006]
- A data source operates an observation network.
- A network is a set of observation sites.
- A site is a point location where one or more variables are measured.
- A variable is a property describing the flow or quality of water.
- A value is an observation of a variable at a particular time.
- A qualifier is a symbol that provides additional information about the value.
- An offset allows specification of measurements at various depths in water.
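The hierarchy above maps naturally onto nested records. The sketch below is one hypothetical way to express it with dataclasses, populated with the slide's example (USGS, streamflow gages, Neuse River near Clayton, NC, 206 cfs on 13 August 2006). The class and field names are illustrative assumptions, not the CUAHSI schema.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional


@dataclass
class Value:
    value: float
    time: datetime
    qualifier: Optional[str] = None  # symbol giving extra information about the value
    offset: Optional[float] = None   # e.g., depth of the measurement in the water


@dataclass
class Variable:
    name: str
    units: str
    values: List[Value] = field(default_factory=list)


@dataclass
class Site:
    name: str
    variables: List[Variable] = field(default_factory=list)


@dataclass
class Network:
    name: str
    sites: List[Site] = field(default_factory=list)


@dataclass
class DataSource:
    name: str
    networks: List[Network] = field(default_factory=list)


# The slide's example, expressed in this hypothetical class layout.
discharge = Variable("Discharge", "cfs", [Value(206.0, datetime(2006, 8, 13))])
usgs = DataSource("USGS", [
    Network("Streamflow gages", [Site("Neuse River near Clayton, NC", [discharge])]),
])

first = usgs.networks[0].sites[0].variables[0].values[0]
print(first.value, first.time.date())  # 206.0 2006-08-13
```

Each level of nesting corresponds to one level of the model, so a client can walk from data source to values the same way the web service methods do.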
Slide 23: HydroSeek Web Services
- Most HydroSeek functions are available as web services (SOAP)
- Supports queries using Global Change Master Directory (GCMD) keywords
- Supports output in Geography Markup Language (GML) as well as WaterML
[Figure: components involved: Microsoft server (Virtual Earth map), San Diego Supercomputer Center server, Drexel server, native services, and WaterOneFlow services]
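To show what GML output involves, here is a minimal sketch that encodes one station as a feature with a GML point geometry, assuming GML 3's `gml:Point`/`gml:pos` encoding with latitude-then-longitude order for EPSG:4326. The wrapper elements `Station`, `name`, and `location`, the station name, and the coordinates are hypothetical placeholders, not HydroSeek's actual output format.

```python
import xml.etree.ElementTree as ET

GML_NS = "http://www.opengis.net/gml"
ET.register_namespace("gml", GML_NS)  # serialize with the conventional "gml" prefix


def station_to_gml(name: str, lat: float, lon: float) -> str:
    """Encode one station as a feature with a GML point (illustrative element names)."""
    station = ET.Element("Station")  # hypothetical feature element
    ET.SubElement(station, "name").text = name
    location = ET.SubElement(station, "location")
    point = ET.SubElement(location, f"{{{GML_NS}}}Point", srsName="EPSG:4326")
    # gml:pos holds space-separated coordinates in the CRS axis order.
    ET.SubElement(point, f"{{{GML_NS}}}pos").text = f"{lat} {lon}"
    return ET.tostring(station, encoding="unicode")


# Placeholder coordinates, not a real station location.
print(station_to_gml("Example station", 35.0, -78.0))
```

Because the geometry uses a standard vocabulary, a map client can read the point without knowing anything about the surrounding feature elements.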
Slide 24: GetStations
Slide 25: GetStationsByHU
Slide 26: GetStationCatalogueFiltered
Slide 27: GetStationCatalogue
Slide 28: Architecture Outline
- Allows searching multiple heterogeneous data sources simultaneously, regardless of semantic or structural differences between them
- Modular, extensible
[Figure: inside the CUAHSI HOD module]
Slide 29: WaterML and WaterOneFlow
[Figure: data repositories (STORET, NAM, and NWIS data) are served through the WaterOneFlow web service methods GetSiteInfo, GetVariableInfo, and GetValues, which return WaterML to the client; labels: Extract, Transform, Load]
- WaterML is an XML language for communicating water data.
- WaterOneFlow is a set of web services based on WaterML.
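On the client side, consuming WaterML amounts to parsing XML. The sketch below reads time/value pairs from a simplified, hypothetical fragment loosely modeled on a WaterML values response. Real documents carry much more metadata (site, variable, method, qualifiers), and the namespace URI shown is an assumption. The parser matches on local element names, so it does not depend on that URI.

```python
import xml.etree.ElementTree as ET
from datetime import datetime
from typing import List, Tuple

# Simplified, hypothetical response; only the slide's example value is included.
SAMPLE = """<timeSeriesResponse xmlns="http://www.cuahsi.org/waterML/1.0/">
  <timeSeries>
    <values>
      <value dateTime="2006-08-13T00:00:00">206</value>
    </values>
  </timeSeries>
</timeSeriesResponse>"""


def local_name(tag: str) -> str:
    # "{namespace}value" -> "value"
    return tag.rsplit("}", 1)[-1]


def parse_values(xml_text: str) -> List[Tuple[datetime, float]]:
    """Collect (dateTime, value) pairs from every <value> element."""
    root = ET.fromstring(xml_text)
    return [
        (datetime.fromisoformat(el.get("dateTime")), float(el.text))
        for el in root.iter()
        if local_name(el.tag) == "value"
    ]


print(parse_values(SAMPLE))
```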
Slide 30: The Database-Ontology Link
www.HydroTagger.org
Slide 31: The HydroSeek ODM needed an upgrade, i.e., additional tables.
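The slide does not list the added tables. One plausible shape for the database-ontology link is a table that ties each variable to an ontology concept. The sketch below uses SQLite with a simplified ODM-style `Variables` table and a hypothetical `VariableConceptTags` table, then finds the variables that still need tagging.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Simplified stand-in for an ODM-style Variables table.
CREATE TABLE Variables (
    VariableID   INTEGER PRIMARY KEY,
    VariableCode TEXT NOT NULL,
    VariableName TEXT NOT NULL
);
-- Hypothetical added table linking a variable to an ontology concept.
CREATE TABLE VariableConceptTags (
    VariableID INTEGER NOT NULL REFERENCES Variables(VariableID),
    Concept    TEXT NOT NULL,
    PRIMARY KEY (VariableID, Concept)
);
""")
conn.executemany(
    "INSERT INTO Variables (VariableID, VariableCode, VariableName) VALUES (?, ?, ?)",
    [(1, "V1", "Copper, dissolved"), (2, "V2", "Discharge")],
)
conn.execute("INSERT INTO VariableConceptTags VALUES (1, 'Copper')")

# Variables with no concept tag yet: candidates for the tagging interface.
untagged = conn.execute("""
    SELECT v.VariableName
    FROM Variables v
    LEFT JOIN VariableConceptTags t ON t.VariableID = v.VariableID
    WHERE t.VariableID IS NULL
""").fetchall()
print(untagged)  # [('Discharge',)]
```

Keeping tags in a separate table leaves the original observation tables untouched, and it lets one variable carry more than one concept.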
Slide 32: How does the Tagging work?
Step 1: Users must register on the website before they can use HydroTagger. When registering, select the testbed site you are affiliated with. Each testbed site needs ONE administrator, who can then admit additional users for that testbed site. Please send an email identifying the designated tagger site administrator so we can promote that person to that role.
Slide 33: How does the Tagging work?
Step 2: The Sniffer jumps into action and trawls through the testbed sites to find and identify new variable names (once a week, currently every Sunday night). It does so using the regular web services published through the WSDL (no hacking!). It returns (i) data-update information and (ii) the variable names used, and compares these to the names used by HydroSeek.
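The comparison at the end of Step 2 is essentially a set difference per testbed site. A minimal sketch, assuming the variable names have already been harvested through the sites' web services (the harvesting call and the weekly scheduling are omitted); the function name and the sample testbed names are hypothetical.

```python
from typing import Dict, Set


def find_new_variable_names(
    harvested: Dict[str, Set[str]],  # testbed site -> variable names it reports
    known: Set[str],                 # variable names HydroSeek already knows
) -> Dict[str, Set[str]]:
    """Return, per testbed site, the variable names HydroSeek has not seen yet."""
    new = {}
    for site, names in harvested.items():
        unseen = names - known
        if unseen:
            new[site] = unseen
    return new


harvested = {"testbed-1": {"Discharge", "Stage height"}, "testbed-2": {"Discharge"}}
print(find_new_variable_names(harvested, known={"Discharge"}))
# {'testbed-1': {'Stage height'}}
```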
Slide 34: How does the Tagging work?
Step 3: The Tagger then updates the HydroSeek catalogue (an amalgamation of all 10 testbed catalogues) with the newly found data entries. If it finds a new variable name (introduced during the data-loading process using the Data-Loader), it puts the name into a table and offers it to the HydroTagger GUI for semantic tagging.
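Step 3 can be pictured as two updates: merge each new entry into the amalgamated catalogue, and queue any variable name that has no semantic tag yet. The sketch below is a minimal, in-memory version; the catalogue key, the `pending` queue, and the sample entries are hypothetical stand-ins for the database tables HydroSeek actually uses.

```python
from typing import Dict, List, Set, Tuple

Key = Tuple[str, str, str]  # (testbed, site code, variable name)


def update_catalogue(
    catalogue: Dict[Key, int],                # amalgamated catalogue: key -> value count
    updates: List[Tuple[str, str, str, int]],  # newly found (testbed, site, variable, count)
    tagged: Set[str],                          # variable names that already have a concept tag
    pending: List[str],                        # names waiting in the tagging GUI
) -> None:
    for testbed, site, variable, count in updates:
        catalogue[(testbed, site, variable)] = count
        # Queue each untagged name once, however many sites report it.
        if variable not in tagged and variable not in pending:
            pending.append(variable)


catalogue: Dict[Key, int] = {}
pending: List[str] = []
update_catalogue(
    catalogue,
    [("tb1", "S1", "Discharge", 10), ("tb1", "S2", "Stage height", 5), ("tb2", "S9", "Stage height", 3)],
    tagged={"Discharge"},
    pending=pending,
)
print(pending)  # ['Stage height']
```

Catalogue updates do not wait for tagging. Newly found data become listed right away, and tagging only affects how they are classified in searches.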
Slide 35: Thank you. Questions?