HYDROSEEK and HYDROTAGGER A Search Engine for Hydrologists GIS in Water Resources Lecture - PowerPoint PPT Presentation

Slides: 33
Provided by: michaelp90
Transcript and Presenter's Notes

1
HYDROSEEK and HYDROTAGGER: A Search Engine for
Hydrologists. GIS in Water Resources Lecture
  • M. Piasecki
  • November, 2007

2
Lecture
  • Demo of HydroSeek
  • What are the search criteria?
  • Functionality of the Engine Interface
  • Data Sources
  • Common Sources
  • Common Problems (Completeness, Syntax, Semantics)
  • Ontologies
  • Ontology details
  • Concept-to-data variable tagging
  • Architecture
  • Flow Chart
  • Technologies used
  • Demo of HydroTagger
  • Why the Tagging?
  • Technologies

3
www.HydroSeek.org
4
HIS Goals
  • Hydrologic Data Access System: better access to
    a large volume of high-quality hydrologic data
  • Support for Observatories: synthesizing
    hydrologic data for a region
  • Advancement of Hydrologic Science: data modeling
    and advanced analysis
  • Hydrologic Education: better data in the
    classroom, basin-focused teaching

5
Observations Catalog
Specifies what variables are measured at each
site, over what time interval, and how many
observations of each variable are available
6
Objective
  • Search multiple heterogeneous data sources
    simultaneously regardless of semantic or
    structural differences between them

What we are doing now ..
7
What we would like to do ..
[Diagram: a generic GetValues request goes to a
Semantic Mediator, which issues a GetValues call
to each data source]
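A minimal sketch of the mediator idea on this slide, assuming hypothetical per-source adapters: one generic request is translated into each source's own variable name and dispatched as a separate GetValues call. The adapter functions, the concept-to-variable mapping, and the returned strings are illustrative placeholders, not the actual HydroSeek or WaterOneFlow API.

```python
from typing import Callable, Dict, List

# Hypothetical mapping from a generic concept to each source's own variable name.
# Which source uses which name is an assumption made for illustration only.
CONCEPT_TO_SOURCE_VARIABLE: Dict[str, Dict[str, str]] = {
    "water level": {
        "USGS": "Elevation, water surface",
        "TCEQ": "stage height",
    },
}


def make_fake_get_values(source: str) -> Callable[[str], List[str]]:
    """Stand-in for a per-source GetValues call; a real adapter would query the source."""
    def get_values(variable: str) -> List[str]:
        return [f"{source}:{variable}"]
    return get_values


# One adapter per data source named on the slides.
SOURCES: Dict[str, Callable[[str], List[str]]] = {
    name: make_fake_get_values(name) for name in ("USGS", "EPA", "TCEQ")
}


def mediate(concept: str) -> List[str]:
    """Translate one generic request into per-source GetValues calls."""
    results: List[str] = []
    for source, variable in CONCEPT_TO_SOURCE_VARIABLE.get(concept, {}).items():
        results.extend(SOURCES[source](variable))
    return results
```

Calling mediate("water level") fans the request out only to the sources that have a mapped variable for that concept; an unmapped concept returns an empty list.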
8
Data sources
USGS
EPA
CIMS
TCEQ
NADP
9
Content Aggregation
Hypertext links to data providers: a quick,
simple way of sharing knowledge on where to find
the data
Interactive Map Interface
WME launched in January 2001
Modernized STORET launched in January 1999
NWIS Web launched in September 2000
10
Spatial Coverage
  • STORET has 758 sites in Texas, TCEQ has 8407.
  • STORET has 47,602 sites in Florida, NWIS has
    27,906.
  • NWIS has 121,545 sites in Minnesota, STORET has 22,260.

11
Data Availability
12
Temporal Coverage
[Figure: temporal coverage of nitrogen data for
1957-1977, 1977-2003, and 2003-2007]
13
Interface Problem
  • NWIS: 175 form elements on a single page
  • STORET, NWIS, TCEQ, CIMS combined: a drop-down
    menu?
  • String search across the parameter list? How
    about synonyms?
  • E.g. "Elevation, water surface" vs. "stage height"

14
Completeness Problem Metadata Catalog
  • Better query performance
  • Freedom
  • Fewer errors

Availability of geographic identifiers for
stations in EPA STORET
15
Heterogeneity Problem
  • Syntax: e.g. date and time formats, Gregorian
    versus Julian
  • Data format/structure: e.g. XML, HTML,
    tab/tilde/comma-separated text, gzipped tarballs
  • Semantics (more ..)
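As a small illustration of the syntax problem, the sketch below normalizes several date notations into one representation. The list of accepted formats is an assumption, and "Julian" is taken here to mean the year plus day-of-year notation (for example 2006225), which is a common but loose use of the term.

```python
from datetime import date, datetime

# Assumed input formats. "%Y%j" is year plus day-of-year,
# the notation often loosely called a "Julian date".
FORMATS = ("%Y-%m-%d", "%m/%d/%Y", "%d %B %Y", "%Y%j")


def normalize_date(text: str) -> date:
    """Try each known format in turn and return a calendar date."""
    for fmt in FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date()
        except ValueError:
            continue
    raise ValueError(f"unrecognized date format: {text!r}")
```

With these formats, "2006-08-13", "08/13/2006", "13 August 2006" and "2006225" all resolve to the same date.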

16
Issues with Semantics
  • Hyponymy: parameters "Groundwater level",
    "Stream stage", "Reservoir level" versus
    "Water level"
  • Pseudo-hyponymy due to lack of metadata:
    parameter "Manganese, 6N hydrochloric acid
    extracted, recoverable, dry weight, milligrams
    per kilogram" versus "Manganese, milligrams per
    kilogram"
  • Synonymy: "Total Kjeldahl Nitrogen" vs.
    "Ammonia + Organic Nitrogen"
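One way to picture how a search could cope with synonymy and hyponymy is query expansion: a term is replaced by its canonical name, then widened with its synonyms and narrower terms. The sketch below uses only the examples from this slide; the two lookup tables and the function name are assumptions for illustration, not HydroSeek's actual vocabulary.

```python
from typing import Dict, List, Set

# Illustrative vocabulary built from the examples on this slide.
SYNONYMS: Dict[str, str] = {
    # synonym -> canonical term
    "ammonia + organic nitrogen": "total kjeldahl nitrogen",
}
NARROWER: Dict[str, List[str]] = {
    # broader term -> hyponyms
    "water level": ["groundwater level", "stream stage", "reservoir level"],
}


def expand_query(term: str) -> Set[str]:
    """Return the canonical term plus its synonyms and narrower terms."""
    key = term.strip().lower()
    canonical = SYNONYMS.get(key, key)
    terms = {canonical}
    terms.update(s for s, c in SYNONYMS.items() if c == canonical)
    terms.update(NARROWER.get(canonical, []))
    return terms
```

A search for "Water level" would then also match records labeled with any of the three narrower parameter names.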

17
Search Strategy
  • Search → Fine-tune → Retrieve, rather than
    Search → Retrieve
  • Avoids the "high precision, low recall" and
    "low precision, high recall" problems
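For reference, precision is the share of retrieved items that are relevant, and recall is the share of relevant items that were retrieved. The sketch below computes both for two made-up result sets (a very narrow search and a very broad one); the variable names in the example sets are invented for illustration.

```python
from typing import Set, Tuple


def precision_recall(retrieved: Set[str], relevant: Set[str]) -> Tuple[float, float]:
    """Return (precision, recall) for one set of search results."""
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Made-up example: the user wants every kind of water level record.
relevant = {"Groundwater level", "Stream stage", "Reservoir level"}

# Narrow search: everything returned is relevant, but most records are missed.
narrow = {"Stream stage"}

# Broad search: nothing is missed, but half of the results are noise.
broad = relevant | {"Water temperature", "Water hardness", "Sea level pressure"}
```

Here the narrow search has precision 1.0 with recall 1/3, and the broad search has precision 0.5 with recall 1.0, which is the trade-off the fine-tuning step is meant to soften.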

18
Layered Ontology Model
19
Core
Navigation
Compound
20
Knowledge Base
  • Supports classification of search results
  • Entities in the ontology are associated with
    measured variables in a relational database
  • Helps solving semantic heterogeneity issues
    between data repositories
  • OWL Ontologies

Escherichia coli / E. coli
E. coli is-a Indicator Organism
Copper is-a Micronutrient
Copper isMeasuredIn Medium
Medium: Water, Soil
Micronutrient is-a Nutrient
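The is-a statements above can be followed transitively, which is what lets a search for a broad concept find variables tagged with a narrower one. The sketch below stores the slide's is-a statements in a plain dictionary; this is a simplification for illustration, not how an OWL ontology is actually stored or queried.

```python
from typing import Dict, List

# The is-a statements from this slide, as child -> parent.
IS_A: Dict[str, str] = {
    "E. coli": "Indicator Organism",
    "Copper": "Micronutrient",
    "Micronutrient": "Nutrient",
}


def ancestors(concept: str) -> List[str]:
    """Follow is-a links upward and return every broader concept, nearest first."""
    chain: List[str] = []
    while concept in IS_A:
        concept = IS_A[concept]
        chain.append(concept)
    return chain


def concepts_under(ancestor: str) -> List[str]:
    """Return all concepts that are, directly or indirectly, a kind of `ancestor`."""
    return sorted(c for c in IS_A if ancestor in ancestors(c))
```

Because Copper is-a Micronutrient and Micronutrient is-a Nutrient, a query for "Nutrient" reaches Copper even though no statement links the two directly.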
21
(No Transcript)
22
Point Observations Information Model
http://www.cuahsi.org/his/webservices.html
Data Source: USGS
Network: Streamflow gages
Sites: Neuse River near Clayton, NC
Variables: Discharge, stage (daily or instantaneous)
Values: Value, Time, Qualifier, Offset
(e.g. 206 cfs, 13 August 2006)
Web service methods: GetSites, GetSiteInfo,
GetVariables, GetVariableInfo, GetValues
  • A data source operates an observation network
  • A network is a set of observation sites
  • A site is a point location where one or more
    variables are measured
  • A variable is a property describing the flow or
    quality of water
  • A value is an observation of a variable at a
    particular time
  • A qualifier is a symbol that provides
    additional information about the value
  • An offset allows specification of measurements
    at various depths in water
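The hierarchy described by these bullets can be written down as nested record types. The sketch below is one possible reading, using the slide's own example; the class and field names (including the `units` field) are assumptions for illustration and do not reproduce the ODM schema or the WaterML element names.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class Value:
    value: float
    time: date
    qualifier: Optional[str] = None   # symbol with extra information about the value
    offset: Optional[float] = None    # e.g. measurement depth in the water


@dataclass
class Variable:
    name: str
    units: str                        # assumed field; the slide gives "cfs" in its example
    values: List[Value] = field(default_factory=list)


@dataclass
class Site:
    name: str
    variables: List[Variable] = field(default_factory=list)


@dataclass
class Network:
    name: str
    sites: List[Site] = field(default_factory=list)


@dataclass
class DataSource:
    name: str
    networks: List[Network] = field(default_factory=list)


# The example from the slide: 206 cfs on 13 August 2006.
discharge = Variable("Discharge", "cfs", [Value(206.0, date(2006, 8, 13))])
site = Site("Neuse River near Clayton, NC", [discharge])
usgs = DataSource("USGS", [Network("Streamflow gages", [site])])
```

Walking from the data source down through network, site, and variable reaches the individual observation.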

23
Hydroseek Webservices
  • Most Hydroseek functions are available as web
    services (SOAP)
  • Support for queries using Global Change Master
    Directory (GCMD) keywords
  • Supports output in Geography Markup Language
    (GML) as well as WaterML

Microsoft Server
Virtual Earth Map
San Diego Supercomputer Center Server
Native Services
WaterOneFlow
Drexel Server
WaterOneFlow
WaterOneFlow
24
GetStations
25
GetStationsByHU
26
GetStationCatalogueFiltered
27
GetStationCatalogue
28
  • Allows searching multiple heterogeneous data
    sources simultaneously regardless of semantic or
    structural differences between them
  • Modular, extensible

Architecture Outline
Inside the CUAHSI HOD Module
29
WaterML and WaterOneFlow
Data Repositories: STORET, NAM, NWIS
EXTRACT, TRANSFORM, LOAD
WaterOneFlow Web Service: GetSiteInfo,
GetVariableInfo, GetValues
WaterML
Client
WaterML is an XML language for communicating
water data. WaterOneFlow is a set of web services
based on WaterML.
30
The Database-Ontology Link
www.HydroTagger.org
31
HydroSeek: ODM needed an upgrade, i.e.
additional tables.
32
How does the Tagging work?
Step 1: Users must register on the website
before they can use the HydroTagger. When
registering, select the testbed site you are
affiliated with. Each testbed site needs ONE
administrator, who can then admit additional users
for that specific testbed site. Please send an
email identifying the designated tagger site
administrator so we can promote that person to
the role.
33
How does the Tagging work?
Step 2: The Sniffer jumps into action and trawls
through the testbed sites to find and identify
new variable names (once a week, currently every
Sunday night). It does so by using the regular
web services published through the WSDL (no
hacking!). It returns i) data update
information and ii) the variable names used, and
compares these to those used by HydroSeek.
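The comparison at the end of this step amounts to a set difference between the names harvested from each testbed site and the names HydroSeek already knows. A minimal sketch, assuming the harvested names have already been collected into a dictionary keyed by site; the function name, the case-insensitive matching, and the site names in the usage example are assumptions, not the Sniffer's actual behavior.

```python
from typing import Dict, Iterable, Set


def find_new_variable_names(
    harvested: Dict[str, Iterable[str]], known: Set[str]
) -> Dict[str, Set[str]]:
    """Per testbed site, return harvested variable names the catalogue has not seen."""
    # Assumption: names are compared case-insensitively, ignoring surrounding spaces.
    known_normalized = {name.strip().lower() for name in known}
    new_names: Dict[str, Set[str]] = {}
    for site, names in harvested.items():
        unseen = {n.strip() for n in names if n.strip().lower() not in known_normalized}
        if unseen:
            new_names[site] = unseen
    return new_names
```

For example, with known names {"Discharge", "Stage height"}, a hypothetical site reporting ["discharge", "Total Kjeldahl Nitrogen"] yields only "Total Kjeldahl Nitrogen" as new.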
34
How does the Tagging work?
Step 3: The Tagger now updates the HydroSeek
catalogue (an amalgamation of all 10 testbed
catalogues) with the newly found data
entries. If it finds a new variable name
(introduced during the data-loading process using
the Data-Loader), it puts it into a table and
offers it up to the HydroTagger GUI for semantic
tagging.
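The "table of untagged names" in this step can be pictured as a small queue: new variable names go in without a concept, and the tagging interface fills the concept in later. The sketch below uses an in-memory SQLite table; the table name, column names, and functions are hypothetical and do not reflect the actual HydroSeek database design.

```python
import sqlite3
from typing import Iterable, List, Tuple


def queue_untagged(conn: sqlite3.Connection, site: str, names: Iterable[str]) -> None:
    """Store new variable names for later tagging; repeats are ignored."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS pending_tags ("
        "site TEXT, variable_name TEXT, concept TEXT, "
        "UNIQUE (site, variable_name))"
    )
    conn.executemany(
        "INSERT OR IGNORE INTO pending_tags (site, variable_name) VALUES (?, ?)",
        [(site, name) for name in names],
    )
    conn.commit()


def tag(conn: sqlite3.Connection, site: str, name: str, concept: str) -> None:
    """Record the ontology concept a user assigned to a variable name."""
    conn.execute(
        "UPDATE pending_tags SET concept = ? WHERE site = ? AND variable_name = ?",
        (concept, site, name),
    )
    conn.commit()


def untagged(conn: sqlite3.Connection) -> List[Tuple[str, str]]:
    """List the (site, variable name) pairs still waiting for a concept."""
    return conn.execute(
        "SELECT site, variable_name FROM pending_tags "
        "WHERE concept IS NULL ORDER BY site, variable_name"
    ).fetchall()
```

The uniqueness constraint means a weekly re-run can queue the same name again without creating duplicates or discarding a tag that was already assigned.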
35
Thank you. Questions?