Title: The Earth System Grid Discovery and Semantic Web Technologies
1The Earth System Grid Discovery and Semantic Web
Technologies
- Line Pouchard
- Oak Ridge National Laboratory
- Computer Science and Mathematics Division
- Luca Cinquini, Gary Strand
- National Center for Atmospheric Research
Semantic Web Technologies for Searching and
Retrieving Scientific Data, ISWC II, Sanibel
Island, FL, October 20, 2003
2- A geographically distributed team of climate and computer scientists
- Climate scientists are our target users
- 20-100 simultaneous users
- Scientists providing expertise and leadership to the Intergovernmental Panel on Climate Change (IPCC)
- A computing and data Grid collaboratory sponsored by the US Department of Energy.
- A distributed system for storage, access, and discovery of post-processing data resulting from climate simulations on supercomputers.
3ESG Collaboration Network
Grid and Network
Infrastructure
4Current Status of Climate Data
- Data sizes (estimated to be produced in the next 3-4 years for IPCC), types of storage, location of storage
- NCAR (Boulder, CO) 8.961 TB, NERSC (Berkeley, CA) 3.514 TB, ORNL (Oak Ridge, TN) 6.443 TB. Total 18.912 TB.
- Stored on mass storage archives, disk caches and tapes.
- Data replicated at 3 locations in the US.
- Data format conventions and simulation output formats
- Minimal metadata produced or associated by current simulations.
- Multiple output formats.
- Many complex standards.
- Discovery and retrieval
- Datasets are not described in detail.
- Metadata resides in the data manager's head.
- Largely manual access.
- Different access mechanisms at different sites.
Far from seamless automated data discovery and access
5ESG goals for search and retrieval
- Enable searches and downloads through a seamless process
- Data search across multiple sites and storage locations.
- Access to all ESG functionality from the desktop through a single point of entry (a Web data portal).
- Some degree of access control (authentication, certificates).
- Keep track of datasets, particularly on deep storage (archives, caches, tapes)
- Data formats
- Find related datasets: campaigns, ensembles
- Simulation model descriptions and configurations
- Related simulations: parent, child, sibling
- Browsable, searchable, and extensible metadata
- Several levels of users
- Easy-to-use, integrated tools (otherwise, no one will use them)
- Collaborate with other groups: CCLRC e-Science Center and the British Atmospheric Data Center.
6Discovery Ontology and Metadata Services
7Motivations for a prototype ontology
- Development of an ESG metadata schema
- Help structure and guide the development efforts
- Provide a context
- Trust
- Provenance and logistic information
- Data quality and curation
- Prepare for a federation of data sources and interoperability between metadata schemas: the ability to perform searches across these sources from a single point of entry.
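The single point of entry over federated sources can be sketched as a function that fans one query out to several metadata catalogs and merges the hits. This is a minimal illustration, not the ESG implementation: the `MetadataSource` class, the record fields, and the sample records are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class MetadataSource:
    """A hypothetical per-site metadata catalog (not an ESG API)."""
    site: str
    records: list = field(default_factory=list)

    def search(self, **criteria):
        # Return records whose fields match every criterion exactly.
        return [r for r in self.records
                if all(r.get(k) == v for k, v in criteria.items())]


def federated_search(sources, **criteria):
    """Query every source and tag each hit with the site it came from."""
    hits = []
    for source in sources:
        for record in source.search(**criteria):
            hits.append({**record, "site": source.site})
    return hits


# Illustrative records only; dataset names and values are made up.
ncar = MetadataSource("NCAR", [{"dataset": "run-a", "parameter": "temperature"}])
ornl = MetadataSource("ORNL", [{"dataset": "run-b", "parameter": "temperature"},
                               {"dataset": "run-c", "parameter": "precipitation"}])

results = federated_search([ncar, ornl], parameter="temperature")
print([(r["site"], r["dataset"]) for r in results])
```

Tagging each hit with its site keeps provenance visible to the user even though the query itself is site-agnostic.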
8ESG ontology concepts and relationships
- Datasets
- File names (tell a lot)
- Formats and conventions
- Coverage (space, time, multi-dimensional physical grids)
- Calendar years
- Parameters
- Related datasets
- Campaigns
- ESG Service
- Used_by
- Pedigree
- Participants, roles in ESG
- Provenance: traces origins and transformations
- Is_generated_by
- Storage location
- Scientific use: simulations
- has_parent, has_child, has_sibling
- Input_type
- hardware_type
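One way to make relationships such as is_generated_by and has_parent concrete is to store them as subject-predicate-object triples and derive has_sibling from shared parents. The sketch below assumes that representation for illustration only; the identifiers (`sim-1`, `dataset-1`) and helper functions are hypothetical, and nothing here is claimed to be how ESG stores its ontology.

```python
# Relationships as (subject, predicate, object) triples.
# All identifiers below are invented for illustration.
triples = [
    ("dataset-1", "is_generated_by", "sim-2"),
    ("dataset-2", "is_generated_by", "sim-3"),
    ("sim-2", "has_parent", "sim-1"),
    ("sim-3", "has_parent", "sim-1"),
]


def objects(subject, predicate):
    """All objects linked to `subject` by `predicate`."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def siblings(simulation):
    """Simulations sharing a parent with `simulation` (derived has_sibling)."""
    parents = set(objects(simulation, "has_parent"))
    return sorted({s for s, p, o in triples
                   if p == "has_parent" and o in parents and s != simulation})


def related_datasets(dataset):
    """Datasets generated by a sibling of the simulation behind `dataset`."""
    found = set()
    for sim in objects(dataset, "is_generated_by"):
        for sib in siblings(sim):
            found.update(s for s, p, o in triples
                         if p == "is_generated_by" and o == sib)
    return sorted(found)


print(siblings("sim-2"))
print(related_datasets("dataset-1"))
```

Deriving siblings from has_parent avoids storing a redundant relation that could drift out of sync.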
9Guiding principles for the development of an ESG
ontology
- Separate entities describing things from entities describing processes.
- Decouple concepts specific to a domain area from those common to other (Grid) projects.
- Keep terminology intuitive to users.
- Make explicit relationships between XML elements.
- Ontology tools were used to analyze current ESG schemas at every stage of development.
10[Class diagram: ESG ontology]
Legend: AbstractClass, Class; isA (inheritance); association
Classes and attributes (cardinality before each attribute name):
- Object: 1 id
- Person: 0,1 firstName; 0,1 lastName; 0,1 contact
- Institution: 0,1 name; 0,1 type; 0,1 contact
- Project: 0,n topic type; 0,1 funding
- Activity: 0,1 name; 0,1 description; 0,1 rights; 0,n date type; 0,n note; 0,n participant role; 0,n reference uri
- Service: 0,1 name; 0,1 description
- Simulation: 0,n simulationInput type; 0,n simulationHardware
- Dataset: 0,1 type; 0,1 conventions; 0,n date type; 0,n format type uri; 0,1 timeCoverage; 0,1 spaceCoverage
- Campaign, Investigation, Ensemble, Experiment, Analysis, Observation (no attributes shown)
Relationship labels: isA, isPartOf, worksFor, participant role, serviceId, generatedBy, hasParent, hasChild, hasSibling
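The cardinalities in the diagram map naturally onto optional and list-valued fields. The sketch below assumes that 0,1 means an optional attribute and 0,n a repeatable one, and models only a subset of the Simulation and Dataset attributes. Two further assumptions are made for illustration: that generatedBy links a Dataset to a Simulation, and that hasParent holds at most one parent. The sample values are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Simulation:
    id: str                                                        # 1 id
    simulation_input: List[str] = field(default_factory=list)     # 0,n
    simulation_hardware: List[str] = field(default_factory=list)  # 0,n
    has_parent: Optional["Simulation"] = None  # assumed single parent


@dataclass
class Dataset:
    id: str                                            # 1 id
    type: Optional[str] = None                         # 0,1
    conventions: Optional[str] = None                  # 0,1
    formats: List[str] = field(default_factory=list)   # 0,n
    time_coverage: Optional[str] = None                # 0,1
    space_coverage: Optional[str] = None               # 0,1
    generated_by: Optional[Simulation] = None  # assumed target class


def lineage(dataset):
    """Follow generatedBy, then hasParent links, back to the root simulation."""
    chain = []
    sim = dataset.generated_by
    while sim is not None:
        chain.append(sim.id)
        sim = sim.has_parent
    return chain


# Hypothetical instances, for illustration only.
control = Simulation(id="sim-control")
branch = Simulation(id="sim-branch", has_parent=control)
ds = Dataset(id="ds-1", formats=["netCDF"], generated_by=branch)

print(lineage(ds))
```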
12Discovery Services Architecture
[Architecture diagram]
Components: ESG Portal, Discovery Service, Metadata Catalog Service, Replica Location Services, Storage
Flow labels: Metadata, Searches, Logical File Names, Physical File Names, Storage Location, Download
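The labels in this architecture suggest a two-step lookup: a metadata search yields logical file names, which a replica service then resolves to physical file names and storage locations. The sketch below follows that reading. The dictionaries merely stand in for the Metadata Catalog Service and Replica Location Services, and every name, path, and URL scheme is hypothetical.

```python
# Stand-in for the Metadata Catalog Service: logical file name -> metadata.
metadata_catalog = {
    "lfn://esg/run-a/tas.nc": {"parameter": "temperature", "year": 2000},
    "lfn://esg/run-a/pr.nc": {"parameter": "precipitation", "year": 2000},
}

# Stand-in for Replica Location Services: logical -> physical file names.
replica_locations = {
    "lfn://esg/run-a/tas.nc": ["site-a:/archive/run-a/tas.nc",
                               "site-b:/cache/run-a/tas.nc"],
    "lfn://esg/run-a/pr.nc": ["site-a:/archive/run-a/pr.nc"],
}


def search_metadata(**criteria):
    """Return logical file names whose metadata match all criteria."""
    return sorted(lfn for lfn, meta in metadata_catalog.items()
                  if all(meta.get(k) == v for k, v in criteria.items()))


def locate(lfn):
    """Resolve one logical file name to its known physical replicas."""
    return list(replica_locations.get(lfn, []))


def discover(**criteria):
    """Search metadata, then resolve each hit to physical locations."""
    return {lfn: locate(lfn) for lfn in search_metadata(**criteria)}


plan = discover(parameter="temperature")
print(plan)
```

Keeping logical names separate from physical ones lets replicas move between archives, caches, and tapes without changing the metadata.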
13Leveraging Semantic Web efforts in Grid projects
- The Semantic Web
- Highlighted the need for sharing information based on content.
- Provided web-based languages for knowledge acquisition and reasoning.
- Offers directions for ontology reconciliation.
- There exist ontologies in the Earth Sciences.
- Challenges presented by ESG
- Real-life complexity.
- Scientists as beginners and expert users demand usability.
- Measures of success.
- Changing a scientist's work habits requires an immediate and visible payoff.
- Data sizes: scalability of the approach.