Community cyberinfrastructure and X-informatics - Assessment of convergence and innovation based on project experience - PowerPoint PPT Presentation

Slides: 67
Provided by: deborahlm

1
Community cyberinfrastructure and X-informatics -
Assessment of convergence and innovation based on
project experience
  • Peter Fox
  • High Altitude Observatory,
  • NCAR
  • Work performed in part with Deborah McGuinness
    (RPI), Rob Raskin (JPL), Krishna Sinha (VT), Luca
    Cinquini (NCAR), Patrick West (NCAR), Stephan
    Zednik (NCAR), Paulo Pinheiro da Silva (UTEP), Li
    Ding (RPI) and others

2
Outline
  • Background and inevitabilities
  • Informatics -> e-Science
  • Informatics methodology, e.g. Semantic Web as an
    approach and a technology
  • Virtual Observatories: use cases, some examples,
    and non-specialist use
  • Data ingest, integration, mining and where we are
    heading
  • Discussion

3
Background
  • Scientists should be able to access a global,
    distributed knowledge base of scientific data
    that
  • appears to be integrated
  • appears to be locally available
  • But data is obtained by multiple instruments,
    using various protocols, in differing
    vocabularies, using (sometimes unstated)
    assumptions, with inconsistent (or non-existent)
    meta-data. It may be inconsistent, incomplete,
    evolving, and distributed
  • And there exist(ed) significant levels of
    semantic heterogeneity, large-scale data, complex
    data types, legacy systems, inflexible and
    unsustainable implementation technology

4
But data has Lots of Audiences
Information products have
Information
More Strategic
Less Strategic
From Why EPO?, a NASA internal report on
science education, 2005
SCIENTISTS TOO
5
Shifting the Burden from the User to the Provider
6
The Astronomy approach: data-types as a service
Limited interoperability
  • VOTable
  • Simple Image Access Protocol
  • Simple Spectrum Access Protocol
  • Simple Time Access Protocol

VO App2
VO App3
VO App1
Open Geospatial Consortium: Web Feature,
Coverage and Mapping Services; Sensor Web
Enablement: Sensor Observation, Planning and
Analysis Services use the same approach
VO layer
DBn
DB2
DB3

DB1
7
Mind the Gap!
  • As a result of finding out who is doing what,
    sharing experience/ expertise, and substantial
    coordination
  • There is/ was still a gap between science and the
    underlying infrastructure and technology that is
    available
  • Informatics - information science includes the
    science of (data and) information, the practice
    of information processing, and the engineering of
    information systems. Informatics studies the
    structure, behavior, and interactions of natural
    and artificial systems that store, process and
    communicate (data and) information. It also
    develops its own conceptual and theoretical
    foundations. Since computers, individuals and
    organizations all process information,
    informatics has computational, cognitive and
    social aspects, including study of the social
    impact of information technologies. Wikipedia.
  • Cyberinfrastructure is the new research
    environment(s) that support advanced data
    acquisition, data storage, data management, data
    integration, data mining, data visualization and
    other computing and information processing
    services over the Internet.

8
Progression after progression
IT -> Cyber Infrastructure -> Cyber Informatics -> Core Informatics -> Science Informatics, aka Xinformatics -> Science, SBAs
9
Virtual Observatories
  • Make data and tools quickly and easily accessible
    to a wide audience.
  • Operationally, virtual observatories need to find
    the right balance of data/model holdings, portals
    and client software that researchers can use
    without effort or interference, as if all the
    materials were available on their local
    computer in the user's preferred language,
    i.e. appear to be local and integrated
  • Likely to provide controlled vocabularies that
    may be used for interoperation in appropriate
    domains along with database interfaces for access
    and storage -> thus part IT, part CI, part
    Informatics

10
VO API
Web Serv.
VO Portal
Query, access and use of data
  • Mediation Layer
  • Ontology - capturing concepts of Parameters,
    Instruments, Date/Time, Data Product (and
    associated classes, properties) and Service
    Classes
  • Maps queries to underlying data
  • Generates access requests for metadata, data
  • Allows queries, reasoning, analysis, new
    hypothesis generation, testing, explanation,
    etc.
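
The mapping step in the mediation layer can be sketched in a few lines. This is a minimal illustration, not the VSTO implementation: the concept names, the `CATALOG` contents, the table and column names and the `mediate` function are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AccessRequest:
    database: str  # which underlying holding to contact
    table: str     # provider-specific dataset name (hypothetical)
    column: str    # provider-specific column name (hypothetical)


# Hypothetical mapping from an ontology concept (a Parameter class)
# to the places where each provider actually stores it.
CATALOG = {
    "NeutralTemperature": [
        AccessRequest("DB1", "fpi_obs", "tn"),
        AccessRequest("DB2", "radar_derived", "temp_neutral"),
    ],
    "ElectronDensity": [
        AccessRequest("DB2", "radar_basic", "ne"),
    ],
}


def mediate(parameter: str) -> list:
    """Map a concept-level query to provider-level access requests."""
    return list(CATALOG.get(parameter, []))


for request in mediate("NeutralTemperature"):
    print(f"{request.database}: read {request.column} from {request.table}")
```

The user asks for a concept; only the mediation layer needs to know that two databases hold it under different names.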

Semantic mediation layer - VSTO - low level
Metadata, schema, data
DBn
DB2
DB3

DB1
11
Semantic Web Methodology and Technology
Development Process
  • Establish and improve a well-defined methodology
    vision for Semantic Technology based application
    development
  • Leverage controlled vocabularies, etc.

Adopt Technology Approach
Leverage Technology Infrastructure
Science/Expert Review Iteration
Rapid Prototype
Open World Evolve, Iterate, Redesign, Redeploy
Use Tools
Analysis
Use Case
Develop model/ ontology
Small Team, mixed skills
12
Science and technical use cases
  • Find data which represents the state of the
    neutral atmosphere anywhere above 100km and
    toward the arctic circle (above 45N) at any time
    of high geomagnetic activity.
  • Extract information from the use-case - encode
    knowledge
  • Translate this into a complete query for data -
    inference and integration of data from
    instruments, indices and models
  • Provide semantically-enabled, smart data query
    services via a SOAP web service for the Virtual
    Ionosphere-Thermosphere-Mesosphere Observatory
    that retrieve data, filtered by constraints on
    Instrument, Date-Time, and Parameter in any order
    and with constraints included in any combination.
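
A query service that accepts constraints "in any order and in any combination" can be sketched with optional keyword arguments. This is only an illustration under stated assumptions: the `Holding` type, the `find` function and the sample holdings are invented, and a real service would sit behind a web-service interface rather than a local list.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Holding:
    instrument: str
    parameter: str
    day: date


# Invented sample holdings, for illustration only.
HOLDINGS = [
    Holding("FabryPerot", "NeutralTemperature", date(2005, 1, 21)),
    Holding("FabryPerot", "NeutralWind", date(2005, 1, 22)),
    Holding("IncoherentScatterRadar", "ElectronDensity", date(2005, 1, 21)),
]


def find(holdings, instrument: Optional[str] = None,
         parameter: Optional[str] = None,
         start: Optional[date] = None,
         end: Optional[date] = None):
    """Keep holdings that satisfy every constraint that was supplied."""
    result = []
    for h in holdings:
        if instrument is not None and h.instrument != instrument:
            continue
        if parameter is not None and h.parameter != parameter:
            continue
        if start is not None and h.day < start:
            continue
        if end is not None and h.day > end:
            continue
        result.append(h)
    return result


# Constraints can be given alone, together, and in any order.
by_instrument = find(HOLDINGS, instrument="FabryPerot")
by_date_then_parameter = find(HOLDINGS, start=date(2005, 1, 21),
                              end=date(2005, 1, 21),
                              parameter="ElectronDensity")
print(len(by_instrument), len(by_date_then_parameter))
```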

13
(No Transcript)
14
But data has Lots of Audiences
More Strategic
Less Strategic
From Why EPO?, a NASA internal report on
science education, 2005
15
What is a Non-Specialist Use Case?
Someone should be able to query a virtual
observatory without having specialist knowledge.
A teacher accesses the internet, goes to an
Educational Virtual Observatory and enters a
search for "Aurora".
16
What should the User Receive?
The teacher receives four groupings of search
results:
1) Educational materials:
http://www.meted.ucar.edu/topics_spacewx.php and
http://www.meted.ucar.edu/hao/aurora/
2) Research, data and tools via research VOs, but
the search for brightness, or green/red line
emission, is mediated for them
3) Did you know? Aurora is a phenomenon of the
upper terrestrial atmosphere (ionosphere), also
known as Northern Lights
4) Did you mean? Aurora Borealis or
Aurora Australis, etc.
17
Semantic Information Integration: concept map for
educational use of science data in a lesson plan
18
(No Transcript)
19
Informatics issues for Virtual Observatories
  • Scaling to large numbers of data providers and
    redefining the roles/ relations among them
  • Branding and attribution (where did this data
    come from and who gets the credit, is it the
    correct version, is this an authoritative
    source?)
  • Provenance/derivation (propagating key
    information as it passes through a variety of
    services, copies of processing algorithms, ...)
  • Crossing discipline boundaries
  • Data quality, preservation, stewardship
  • Security, access to resources, policies

20
Provenance
  • Origin or source from which something comes, its
    intention for use, whom or what it was generated
    for, the manner of manufacture, history of
    subsequent owners, sense of place and time of
    manufacture, production or discovery documented
    in detail sufficient to allow reproducibility

21
Use cases
  • Who (person or program) added the comments to the
    science data file for the best vignetted,
    rectangular polarization brightness image from
    January 26, 2005 18:49:09 UT taken by the ACOS
    Mark IV polarimeter?
  • What were the cloud cover and atmospheric seeing
    conditions during the local morning of January
    26, 2005 at MLSO?
  • Find all good images on March 21, 2008.
  • Why are the quick look images from March 21,
    2008, 19:00 UT missing?
  • Why does this image look bad?
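
Questions such as "who added the comments?" become answerable once each processing step leaves a record behind. The sketch below is a hypothetical minimal data structure, not the representation used by the project: the class names, agent names and the comment text are invented, and only the acquisition timestamp is taken from the use case above.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class ProvenanceEvent:
    agent: str      # person or program responsible for the step
    action: str     # what was done, e.g. "acquired", "commented"
    when: datetime
    note: str = ""


@dataclass
class DataProduct:
    name: str
    history: list = field(default_factory=list)

    def record(self, agent, action, when, note=""):
        """Append one provenance event to this product's history."""
        self.history.append(ProvenanceEvent(agent, action, when, note))

    def who(self, action):
        """List the agents that performed a given action, in order."""
        return [e.agent for e in self.history if e.action == action]


image = DataProduct("polarization_brightness_image")
image.record("acquisition_program", "acquired",
             datetime(2005, 1, 26, 18, 49, 9, tzinfo=timezone.utc))
image.record("observer_A", "commented",  # invented agent and note
             datetime(2005, 1, 26, 19, 5, 0, tzinfo=timezone.utc),
             note="thin cloud at start of sequence")
print(image.who("commented"))
```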

22
(No Transcript)
23
(No Transcript)
24
(No Transcript)
25
Quick look browse
Yasukawa Computer crash
Yasukawa Computer crash
  • Yasukawa Rain, cloud

26
(No Transcript)
27
Visual browse
28
(No Transcript)
29
(No Transcript)
30
Search
31
(No Transcript)
32
A Better Way to Access Data
The Problem
Scientists only use data from a single instrument
because it is difficult to access, process, and
understand data from multiple instruments. A
typical data query might be:
"Give me the temperature, pressure, and water
vapor from the AIRS instrument from Jan 2005 to
Jan 2008"
"Search for MLS/Aura Level 2, SO2 Slant
Column Density from 2/1/2007"
A Solution
Using a simple process, SESDI allows data from
various sources to be registered in an ontology
so that it can be easily accessed and understood.
Scientists can use only the ontology components
that relate to their data. An SESDI query might
look like:
"Show all areas in California where
sulfur dioxide (SO2) levels were above normal
between Jan 2000 and Jan 2007"
This query will
pull data from all available sources registered
in the ontology and allow seamless data fusion.
Because the query is measurement related,
scientists do not need to understand the details
of the instruments and data types.
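
The difference between an instrument-based and a measurement-based query can be sketched as a lookup from a measured concept to every source registered for it. This is an assumption-laden illustration, not SESDI code: the `register` and `sources_for` functions and the concept names are hypothetical, while the source names are those mentioned in this presentation.

```python
from collections import defaultdict

# concept name -> list of registered data sources
registry = defaultdict(list)


def register(source: str, concept: str) -> None:
    """Register a data source under the concept it measures."""
    registry[concept].append(source)


def sources_for(concept: str) -> list:
    """Return every source registered for a concept (empty if none)."""
    return list(registry.get(concept, []))


register("AIRS", "Temperature")
register("AIRS", "Pressure")
register("AIRS", "WaterVapor")
register("MLS/Aura Level 2", "SulfurDioxide")
register("HVO vehicle-based traverses", "SulfurDioxide")

# The user asks about SO2, not about a particular instrument.
print(sources_for("SulfurDioxide"))
```

A query phrased in terms of sulfur dioxide reaches both the satellite and the ground-based source without the user naming either.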
33
Determine the statistical signatures of volcanic
forcings on the height of the tropopause
34
Detection and attribution relations
35
(No Transcript)
36
(No Transcript)
37
Leveraged VSTO semantic framework indicating how
volcano and atmospheric parameters and databases
can immediately be plugged in to the semantic
data framework to enable data integration.
38
Data Registration Framework
Data Discovery
Data Integration
Level 1: Data Registration at the Discovery
Level, e.g. volcano location and activity
Level 2: Data Registration at the Inventory
Level, e.g. list of datasets by types, times,
products
Level 3: Data Registration at the Item
Detail Level, e.g. access to individual quantities
Earth Sciences Virtual Database: a data warehouse
where the schema heterogeneity problem is solved
(schema-based integration)
Ontology-based Data Integration
A.K.Sinha, Virginia Tech, 2006
39
How to find the data?
  • Think about it the way the data providers do

40
SEDRE: Semantically Enabled Data Registration
Engine
  • SEDRE: an application that enables scientists to
    semantically register data sets for optimal
    querying and semantic integration
  • SEDRE enables mapping of heterogeneous data to
    concepts in domain ontologies
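
Mapping heterogeneous data to ontology concepts can be pictured as relabelling provider-specific column headers. The sketch below is not SEDRE itself: the concept names, the `register_row` function and the sample values are hypothetical, and the column abbreviations are the ones listed in the volcanic data example of this presentation.

```python
# Hypothetical mapping from column abbreviations to ontology concepts.
COLUMN_TO_CONCEPT = {
    "t/d": "SO2EmissionRate",
    "SD": "StandardDeviation",
    "WS": "WindSpeed",
    "WD": "WindDirection",
    "N": "NumberOfTraverses",
}


def register_row(row: dict, mapping: dict):
    """Relabel a data row by concept; report columns with no mapping."""
    registered, unmapped = {}, []
    for column, value in row.items():
        concept = mapping.get(column)
        if concept is None:
            unmapped.append(column)
        else:
            registered[concept] = value
    return registered, unmapped


# Invented values, for illustration only.
row = {"t/d": 100.0, "SD": 10.0, "N": 5, "Remarks": "none"}
registered, unmapped = register_row(row, COLUMN_TO_CONCEPT)
print(registered)
print(unmapped)
```

Reporting unmapped columns instead of dropping them silently gives the person registering the data a chance to extend the mapping.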

A. K. Sinha, A. Rezgui, Virginia Tech
41
Registering Atmospheric Data (2)
42
Discussion (1)
  • Taken together, the growing body of collected
    experience points to an emerging informatics core
    capability that is starting to take data
    intensive science into a new realm of
    realizability and, potentially, sustainability
  • Use cases
  • X-informatics
  • Core Informatics
  • Cyber Informatics
  • Evolvable technical infrastructure

43
Progression after progression
IT -> Cyber Infrastructure -> Cyber Informatics -> Core Informatics -> Science Informatics -> Science, Societal Benefit Areas, Edu
  • One example
  • CI: OPeNDAP server running over HTTP/HTTPS
  • Cyberinformatics: Data (product) and service
    ontologies, triple store
  • Core informatics: Reasoning engine (Pellet),
    OWL, CMAP, ...
  • Science (X) informatics: Use cases, science
    domain terms, concepts in an ontology

44
Discussion (2)
  • The data and information challenges are (almost)
    being identified as increasingly common
  • Data and information science is becoming the
    fourth column (along with theory, experiment
    and computation)
  • Semantics are a key ingredient for progress
    in informatics
  • A sustained involvement of key inter-disciplinary
    team members is very important -> leads to
    incentives, rewards, etc. and a balance of
    research and production

45
Summary
  • Informatics is playing a key role in filling the
    gap between science use and generation (across
    the spectrum of non-experts too) and the
    underlying cyberinfrastructure
  • This is evident from the emergence of
    Xinformatics (world-wide)
  • Our experience is in implementing informatics as
    semantics in Virtual Observatories (as a working
    paradigm) and Grid environments
  • VSTO is only one example of success
  • Data mining, data integration, smart search,
    provenance
  • Informatics is a profession and a community
    activity and requires efforts in all 3 sub-areas
    (science, core, cyber) and must be synergistic

46
More Information
  • Virtual Solar Terrestrial Observatory (VSTO)
    http://vsto.hao.ucar.edu, http://www.vsto.org
  • Semantically-Enabled Science Data Integration
    (SESDI) http://sesdi.hao.ucar.edu
  • Semantic Provenance Capture in Data Ingest
    Systems (SPCDIS) http://spcdis.hao.ucar.edu
  • SAM/Semantic Knowledge Integration Framework
    (SKIF) http://skif.hao.ucar.edu
  • Conferences: numerous
  • Journals: Earth Science Informatics
  • Texts: <empty>, a few are in progress
  • Courses
  • Semantic e-Science, fall 2008 course at RPI
  • Geoinformatics, at Purdue
  • Contact: Peter Fox, pfox@ucar.edu

47
Spare room
48
Translating the Use-Case - non-monotonic?
GeoMagneticActivity has ProxyRepresentation
GeophysicalIndex is a ProxyRepresentation (in Realm of
Neutral Atmosphere)
Kp is a GeophysicalIndex
  hasTemporalDomain daily
  hasHighThreshold xsd_number 8
Date/time when Kp > 8
Specification needed for query to CEDARWEB:
Instrument, Parameter(s), Operating Mode,
Observatory, Date/time, Return-type data
  • Input
  • Physical properties State of neutral atmosphere
  • Spatial
  • Above 100km
  • Toward arctic circle (above 45N)
  • Conditions
  • High geomagnetic activity
  • Action Return Data
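
The translation of "high geomagnetic activity" into concrete dates can be sketched as follows. This is a simplified illustration, not the reasoning VSTO performs: the function name and the Kp values are invented, and only the threshold of 8 and the daily temporal domain come from the slide.

```python
# From the slide: Kp hasHighThreshold 8, hasTemporalDomain daily.
KP_HIGH_THRESHOLD = 8


def high_activity_days(kp_by_day: dict, threshold: float = KP_HIGH_THRESHOLD):
    """Return the days (ISO strings) whose Kp exceeds the threshold."""
    return sorted(day for day, kp in kp_by_day.items() if kp > threshold)


# Invented daily Kp values, for illustration only.
kp_by_day = {
    "2005-01-20": 5.0,
    "2005-01-21": 8.3,
    "2005-01-22": 9.0,
    "2005-01-23": 8.0,
}

days = high_activity_days(kp_by_day)

# The qualitative condition becomes an explicit Date/time constraint
# that can be added to the rest of the query specification.
query_spec = {"Date/time": days, "Return-type": "data"}
print(query_spec)
```

A day with Kp exactly equal to 8 is excluded here because the slide states the condition as Kp > 8.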

49
VSTO - semantics and ontologies in an operational
environment: vsto.hao.ucar.edu, www.vsto.org
50
(No Transcript)
51
(No Transcript)
52
Semantic Web Services
53
Semantic Web Services
OWL document returned using VSTO ontology - can
be used either syntactically or semantically
54
Semantic Web Services
55
Semantic Web Services
56
VSTO achievements
  • Conceptual model and architecture developed by
    combined team: KR experts, domain experts, and
    software engineers
  • Semantic framework developed and built with a
    small, cohesive, carefully chosen team in a
    relatively short time (deployments in 1st year)
  • Production portal released, includes security,
    etc. with community migration (and so far
    endorsement)
  • VSTO ontology version 1.2 (vsto.owl) in
    production, 2.0 in preparation
  • Web Services encapsulation of semantic interfaces
    in use
  • Solar Terrestrial use-cases are driving the
    completion of the ontologies (e.g. instruments)
  • Using ontologies and the overall framework in
    other applications (volcanoes, climate, oceans,
    water, ...)

57
Semantic Web Basics
  • The triple: subject-predicate-object
  • Interferometer is-a optical instrument
  • Optical instrument has focal length
  • An ontology is a representation of this knowledge
  • W3C is the primary (but not sole) governing
    organization for languages, specifications, best
    practices, etc.
  • RDF - Resource Description Framework
  • OWL 1.0 - Web Ontology Language (OWL 1.1 on the
    way)
  • Encode the knowledge in triples in a
    triple-store; software is built to traverse the
    semantic network, which can be queried or
    reasoned upon
  • Put semantics between/in your interfaces, i.e.
    between layers and components in your
    architecture, i.e. between users and
    information, to mediate the exchange
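
The two example triples can be stored and traversed with very little code. The sketch below is a toy in-memory triple store in plain Python, not RDF or OWL tooling; the function names are hypothetical, and the rule that a class inherits the properties of its is-a ancestors is a simplification of what a real reasoner does.

```python
# The two example triples from the slide.
triples = {
    ("Interferometer", "is-a", "OpticalInstrument"),
    ("OpticalInstrument", "has", "FocalLength"),
}


def objects(subject: str, predicate: str) -> set:
    """All objects o such that (subject, predicate, o) is stored."""
    return {o for s, p, o in triples if s == subject and p == predicate}


def ancestors(cls: str) -> set:
    """Follow is-a links transitively to collect every superclass."""
    seen, stack = set(), [cls]
    while stack:
        current = stack.pop()
        for parent in objects(current, "is-a"):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def properties(cls: str) -> set:
    """Properties stated for a class or inherited from its ancestors."""
    found = set(objects(cls, "has"))
    for parent in ancestors(cls):
        found |= objects(parent, "has")
    return found


# Nothing states directly that an interferometer has a focal length;
# it follows from traversing the semantic network.
print(properties("Interferometer"))
```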

58
Semantic Web Benefits
  • Unified/abstracted query workflow: Parameters,
    Instruments, Date-Time
  • Decreased input requirements for query: in one
    case reducing the number of selections from eight
    to three
  • Generates only syntactically correct queries,
    which could not always be ensured in previous
    implementations without semantics
  • Semantic query support: by using background
    ontologies and a reasoner, our application has
    the opportunity to expose only coherent queries
    (portal and services)
  • Semantic integration: in the past users had to
    remember (and maintain codes) to account for
    numerous different ways to combine and plot the
    data, whereas now semantic mediation provides the
    level of sensible data integration required, now
    exposed as smart web services
  • understanding of coordinate systems,
    relationships, data synthesis, transformations,
    etc.
  • returns independent variables and related
    parameters
  • A broader range of potential users (PhD
    scientists, students, professional research
    associates and those from outside the fields)

59
Example 1: Registration of Volcanic Data
  • Location Codes
  • U - Above the 180° turn at Holei Pali (upper
    Chain of Craters Road)
  • L - Below Holei Pali (lower Chain of Craters
    Road)
  • UL - Individual traverses were made both above
    and below the 180° turn at Holei Pali
  • H - Highway 11

SO2 Emission from Kilauea east rift zone -
vehicle-based (Source: HVO)
Abbreviations: t/d = metric tonne (1000 kg)/day,
SD = standard deviation, WS = wind speed, WD = wind
direction east of true north, N = number of
traverses
60
Registering Volcanic Data (1)
61
Registering Volcanic Data (2)
  • No explicit lat/long data
  • Volcano identified by name
  • Volcano ontology framework will link name to
    location
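
Linking a volcano name to a location can be sketched as a lookup that enriches each record. This is a stand-in for the volcano ontology framework, not its implementation: the function names and the emission value are invented, and the Kilauea coordinates are rounded approximations used only for illustration.

```python
# Approximate, rounded coordinates (degrees north, degrees east);
# a real registry would take these from an authoritative source.
VOLCANO_LOCATIONS = {
    "kilauea": (19.4, -155.3),
}


def locate(name: str):
    """Look up a volcano by name, ignoring case and outer spaces."""
    return VOLCANO_LOCATIONS.get(name.strip().lower())


def add_location(record: dict) -> dict:
    """Return a copy of the record with lat/lon added when known."""
    enriched = dict(record)
    location = locate(record["volcano"])
    if location is not None:
        enriched["lat"], enriched["lon"] = location
    return enriched


# The emission value is invented, for illustration only.
record = {"volcano": "Kilauea", "so2_t_per_day": 100.0}
print(add_location(record))
```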

62
Example 2: Registration of Atmospheric Data
Satellite data for SO2 emissions
Abbreviation: SCD = Slant Column Density (in
Dobson Units (DU))
63
Registering Atmospheric Data (1)
64
SAM Project Objectives (S. Graves, R. Ramachandran)
  • To create a prototype Semantic Analysis and
    Mining framework (SAM) comprising:
  • Data mining and knowledge extraction web services
  • Linked ontologies describing the mining services,
    data and the problem domain
  • Web-based client
  • To allow users to discover and explore existing
    data and services, compose workflows for mining
    and invoke these workflows.
  • Semantic search
  • Automated web service invocation
  • Automated web service composition
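
Automated composition can be illustrated by chaining services whose output type matches the next service's input type. The sketch below is an assumption, not the SAM framework: the service names, the type names and the `compose` function are hypothetical, and a breadth-first search stands in for whatever planning the real system uses.

```python
from collections import deque

# Hypothetical service descriptions: name -> (input type, output type).
SERVICES = {
    "subset": ("RawGrid", "RegionGrid"),
    "normalize": ("RegionGrid", "NormalizedGrid"),
    "cluster": ("NormalizedGrid", "ClusterMap"),
}


def compose(have: str, want: str):
    """Find a shortest chain of services turning `have` into `want`.

    Returns a list of service names, or None if no chain exists.
    """
    queue = deque([(have, [])])
    seen = {have}
    while queue:
        data_type, path = queue.popleft()
        if data_type == want:
            return path
        for name, (input_type, output_type) in SERVICES.items():
            if input_type == data_type and output_type not in seen:
                seen.add(output_type)
                queue.append((output_type, path + [name]))
    return None


print(compose("RawGrid", "ClusterMap"))
```

In a semantic setting the input and output types would be ontology classes, so matching could also use subclass relations rather than exact names.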

65
Data Mining Ontology Design
Courtesy R. Ramachandran
66
Data Mining Ontology Snapshot
Courtesy R. Ramachandran
67
The Information Era: Interoperability
Modern information and communications
technologies are creating an interoperable
information era in which ready access to data and
information can be truly universal. Open access
to data and services enables us to meet the new
challenges of understanding the Earth and its space
environment as a complex system:
  • managing and accessing large data sets
  • higher space/time resolution capabilities
  • rapid response requirements
  • data assimilation into models
  • crossing disciplinary boundaries.

68
Virtual Observatories
  • Conceptual examples
  • In-situ: Virtual measurements
  • Related measurements
  • Remote sensing: Virtual, integrative measurements
  • Data integration
  • Managing virtual data products/ sets

69
Virtual Solar Terrestrial Observatory
  • A distributed, scalable education and research
    environment for searching, integrating, and
    analyzing observational, experimental, and model
    databases.
  • Subject matter covers the fields of solar,
    solar-terrestrial and space physics
  • Provides virtual access to specific data, model,
    tool and material archives containing items from
    a variety of space- and ground-based instruments
    and experiments, as well as individual and
    community modeling and software efforts bridging
    research and educational use
  • 3 year NSF-funded (OCI/SCI) project - completed
  • Several follow-on projects

70
Problem definition
  • Data is coming in faster, in greater volumes and
    outstripping our ability to perform adequate
    quality control
  • Data is being used in new ways and we frequently
    do not have sufficient information on what
    happened to the data along the processing stages
    to determine if it is suitable for a use we did
    not envision
  • We often fail to capture, represent and propagate
    manually generated information that needs to go
    with the data flows
  • Each time we develop a new instrument, we develop
    a new data ingest procedure and collect different
    metadata and organize it differently. It is then
    hard to use with previous projects
  • The task of event determination and feature
    classification is onerous and we don't do it
    until after we get the data

71
Building blocks
  • Data formats and metadata: IAU standard FITS,
    with SoHO keyword convention, JPEG, GIF
  • Ontologies: OWL-DL and RDF
  • The proof markup language (PML) provides an
    interlingua for capturing the information agents
    need to understand results and to justify why
    they should believe the results.
  • The Inference Web toolkit provides a suite of
    tools for manipulating, presenting, summarizing,
    analyzing, and searching PML in efforts to
    provide a set of tools that will let end users
    understand information and its derivation,
    thereby facilitating trust in and reuse of
    information.
  • Capturing semantics of data quality, event, and
    feature detection within suitable community
    ontology packages (SWEET, VSTO)