Shifting the Burden from the User to the Data Provider

Provided by: debor136
Learn more at: http://www.hdfeos.org

1
Shifting the Burden from the User to the Data
Provider
  • Peter Fox
  • High Altitude Observatory,
  • NCAR
  • With thanks to eGY and various NSF, DoE and NASA
    projects

2
Outline
  • Background, definitions
  • Informatics -> e-Science
  • Data has lots of uses
  • Virtual Observatories use cases
  • Data Framework Examples
  • Data ingest, integration, mining and
  • Discussion

3
Background
  • Scientists should be able to access a global,
    distributed knowledge base of scientific data
    that
  • appears to be integrated
  • appears to be locally available
  • But data is obtained by multiple instruments,
    using various protocols, in differing
    vocabularies, using (sometimes unstated)
    assumptions, with inconsistent (or non-existent)
    meta-data. It may be inconsistent, incomplete,
    evolving, and distributed
  • And there exist(ed) significant levels of
    semantic heterogeneity, large-scale data, complex
    data types, legacy systems, inflexible and
    unsustainable implementation technology

4
But data has Lots of Audiences
(Figure: information products and their audiences, ranked from More Strategic to Less Strategic, with the annotation SCIENTISTS TOO. From Why EPO?, a NASA internal report on science education, 2005)
5
The Information Era: Interoperability
Modern information and communications
technologies are creating an interoperable
information era in which ready access to data and
information can be truly universal. Open access
to data and services enables us to meet the new
challenges of understanding the Earth and its space
environment as a complex system
  • managing and accessing large data sets
  • higher space/time resolution capabilities
  • rapid response requirements
  • data assimilation into models
  • crossing disciplinary boundaries.

6
Shifting the Burden from the User to the Provider
7
Modern capabilities
8
Mind the Gap!
  • As a result of finding out who is doing what,
    sharing experience/expertise, and substantial
    coordination
  • There is/was still a gap between science and the
    underlying infrastructure and technology that is
    available
  • Informatics - information science includes the
    science of (data and) information, the practice
    of information processing, and the engineering of
    information systems. Informatics studies the
    structure, behavior, and interactions of natural
    and artificial systems that store, process and
    communicate (data and) information. It also
    develops its own conceptual and theoretical
    foundations. Since computers, individuals and
    organizations all process information,
    informatics has computational, cognitive and
    social aspects, including study of the social
    impact of information technologies. Wikipedia.
  • Cyberinfrastructure is the new research
    environment(s) that support advanced data
    acquisition, data storage, data management, data
    integration, data mining, data visualization and
    other computing and information processing
    services over the Internet.

9
Progression after progression
IT -> Cyber Infrastructure -> Cyber Informatics -> Core Informatics -> Science Informatics, aka Xinformatics -> Science, SBAs
10
Virtual Observatories
  • Conceptual examples
  • In-situ: Virtual measurements
  • Related measurements
  • Remote sensing: Virtual, integrative measurements
  • Data integration
  • Managing virtual data products/ sets

11
Virtual Observatories
  • Make data and tools quickly and easily accessible
    to a wide audience.
  • Operationally, virtual observatories need to find
    the right balance of data/model holdings, portals
    and client software that researchers can use
    without effort or interference, as if all the
    materials were available on their local
    computer in the user's preferred language,
    i.e. appear to be local and integrated
  • Likely to provide controlled vocabularies that
    may be used for interoperation in appropriate
    domains along with database interfaces for access
    and storage and smart tools for evolution and
    maintenance.

12
Early days of discipline specific VOs
(Diagram labels: VO1, VO2, VO3; DB1, DB2, DB3, ..., DBn)
13
The Astronomy approach: data-types as a service
Limited interoperability
  • VOTable
  • Simple Image Access Protocol
  • Simple Spectrum Access Protocol
  • Simple Time Access Protocol

Open Geospatial Consortium Web Feature/Coverage/Mapping Services and Sensor Web Enablement Sensor Observation/Planning/Analysis Services use the same approach
(Diagram labels: VO App1, VO App2, VO App3; VO layer; DB1, DB2, DB3, ..., DBn)
14
(Diagram labels: VO Portal; Web Serv.; VO API; Query, access and use of data)
  • Mediation Layer
  • Ontology - capturing concepts of Parameters,
    Instruments, Date/Time, Data Product (and
    associated classes, properties) and Service
    Classes
  • Maps queries to underlying data
  • Generates access requests for metadata, data
  • Allows queries, reasoning, analysis, new
    hypothesis generation, testing, explanation,
    etc.
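
A minimal sketch of the query-mapping step described above, assuming a hand-written lookup table in place of a real ontology. The names PARAMETER_MAP, AccessRequest and mediate, the parameter names, and the per-database field names are all hypothetical and are not taken from VSTO.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical mapping: ontology-level parameter -> local field name per database.
PARAMETER_MAP = {
    "NeutralTemperature": {"DB1": "tn", "DB2": "temp_neutral"},
    "ElectronDensity": {"DB1": "ne"},
}


@dataclass(frozen=True)
class AccessRequest:
    """One request the mediation layer would send to one underlying database."""
    database: str
    field: str
    start: str  # ISO 8601 date
    end: str    # ISO 8601 date


def mediate(parameter: str, start: str, end: str) -> List[AccessRequest]:
    """Translate one ontology-level query into per-database access requests."""
    local_names = PARAMETER_MAP.get(parameter, {})
    return [
        AccessRequest(database, field, start, end)
        for database, field in sorted(local_names.items())
    ]
```

The user names a concept once. The table, which stands in for the ontology here, decides which databases hold it and what each one calls it.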

(Diagram labels: Semantic mediation layer - VSTO - low level; Metadata, schema, data; DB1, DB2, DB3, ..., DBn)
15
Content: Coupling, Energetics and Dynamics of
Atmospheric Regions WEB
Community data archive for observations and
models of Earth's upper atmosphere and
geophysical indices and parameters needed to
interpret them. Includes browsing capabilities
by periods, > 310 instruments, models, > 820
parameters
16
Content: Mauna Loa Solar Observatory
Near real-time data products from Hawaii from a
variety of solar instruments. Source for space
weather, solar variability, and basic solar
physics. Other content used too - Center for
Integrated Space Weather Modeling
17
Semantic Web Methodology and Technology
Development Process
  • Establish and improve a well-defined methodology
    vision for Semantic Technology based application
    development
  • Leverage controlled vocabularies, etc.

(Diagram labels: Adopt Technology Approach; Leverage Technology Infrastructure; Science/Expert Review Iteration; Rapid Prototype; Open World: Evolve, Iterate, Redesign, Redeploy; Use Tools; Analysis; Use Case; Develop model/ontology; Small Team, mixed skills)
18
Science and technical use cases
  • Find data which represents the state of the
    neutral atmosphere anywhere above 100 km and
    toward the Arctic Circle (above 45N) at any time
    of high geomagnetic activity.
  • Extract information from the use-case - encode
    knowledge
  • Translate this into a complete query for data -
    inference and integration of data from
    instruments, indices and models
  • Provide semantically-enabled, smart data query
    services via a SOAP web service for the Virtual
    Ionosphere-Thermosphere-Mesosphere Observatory
    that retrieve data, filtered by constraints on
    Instrument, Date-Time, and Parameter in any order
    and with constraints included in any combination.
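
The last bullet can be sketched as a filter whose constraints are all optional keyword arguments, so they can be given in any order and any combination. This is only an illustration over an in-memory list, not the SOAP service itself. The Record type, the query function and the instrument and parameter names in the usage note are hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class Record:
    """One catalog entry: which instrument measured which parameter on which day."""
    instrument: str
    parameter: str
    day: date


def query(
    records: Iterable[Record],
    instrument: Optional[str] = None,
    parameter: Optional[str] = None,
    start: Optional[date] = None,
    end: Optional[date] = None,
) -> List[Record]:
    """Return the records that satisfy every constraint that was supplied."""
    result = []
    for rec in records:
        if instrument is not None and rec.instrument != instrument:
            continue
        if parameter is not None and rec.parameter != parameter:
            continue
        if start is not None and rec.day < start:
            continue
        if end is not None and rec.day > end:
            continue
        result.append(rec)
    return result
```

A call such as query(catalog, parameter="NeutralTemperature", start=date(2005, 1, 1)) and one that supplies only an instrument go through the same code path. Omitted constraints simply do not filter.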

19
VSTO - semantics and ontologies in an operational
environment: vsto.hao.ucar.edu, www.vsto.org
24
Semantic Web Benefits
  • Unified/abstracted query workflow: Parameters,
    Instruments, Date-Time
  • Decreased input requirements for query: in one
    case reducing the number of selections from eight
    to three
  • Generates only syntactically correct queries,
    which could not always be ensured in previous
    implementations without semantics
  • Semantic query support: by using background
    ontologies and a reasoner, our application has
    the opportunity to expose only coherent queries
    (portal and services)
  • Semantic integration: in the past, users had to
    remember (and maintain codes) to account for
    numerous different ways to combine and plot the
    data, whereas now semantic mediation provides the
    level of sensible data integration required, now
    exposed as smart web services
  • understanding of coordinate systems,
    relationships, data synthesis, transformations,
    etc.
  • returns independent variables and related
    parameters
  • A broader range of potential users (PhD
    scientists, students, professional research
    associates and those from outside the fields)
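
One way to picture "expose only coherent queries" is a portal that, once one selection is made, offers only the remaining choices consistent with it. The sketch below assumes a plain set of (instrument, parameter) pairs in place of background ontologies and a reasoner. MEASURES, the two functions and all instrument and parameter names are hypothetical.

```python
from typing import List, Optional, Set, Tuple

# Hypothetical background knowledge: which instrument measures which parameter.
MEASURES: Set[Tuple[str, str]] = {
    ("FabryPerot", "NeutralWind"),
    ("FabryPerot", "NeutralTemperature"),
    ("Radar", "ElectronDensity"),
}


def coherent_parameters(instrument: Optional[str] = None) -> List[str]:
    """Parameters the portal should offer, given an optional instrument choice."""
    return sorted(
        {p for i, p in MEASURES if instrument is None or i == instrument}
    )


def coherent_instruments(parameter: Optional[str] = None) -> List[str]:
    """Instruments the portal should offer, given an optional parameter choice."""
    return sorted(
        {i for i, p in MEASURES if parameter is None or p == parameter}
    )
```

Because each menu is derived from what has already been selected, the user cannot assemble a combination that no data source can answer.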

25
What is a Non-Specialist Use Case?
Someone should be able to query a virtual
observatory without having specialist knowledge
A teacher accesses the internet, goes to an educational
virtual observatory and enters a search for
Aurora.
26
What should the User Receive?
Teacher receives four groupings of search results:
1) Educational materials: http://www.meted.ucar.edu/topics_spacewx.php and http://www.meted.ucar.edu/hao/aurora/
2) Research, data and tools: via VSTO, VSPO and VITMO, knows to search for brightness, or green/red line emission
3) Did you know? Aurora is a phenomenon of the upper terrestrial atmosphere (ionosphere), also known as Northern Lights
4) Did you mean? Aurora Borealis or Aurora Australis, etc.
27
Semantic Information Integration: Concept map for
educational use of science data in a lesson plan
29
Issues for Virtual Observatories
  • Scaling to large numbers of data providers and
    redefining the role(s)/ relations with them
  • Crossing discipline boundaries
  • Security, access to resources, policies
  • Branding and attribution (where did this data
    come from and who gets the credit, is it the
    correct version, is this an authoritative
    source?)
  • Provenance/derivation (propagating key
    information as it passes through a variety of
    services, copies of processing algorithms, ...)
  • Data quality, preservation, stewardship

These are currently burden areas for users
30
Problem definition
  • Data is coming in faster, in greater volumes and
    outstripping our ability to perform adequate
    quality control
  • Data is being used in new ways and we frequently
    do not have sufficient information on what
    happened to the data along the processing stages
    to determine if it is suitable for a use we did
    not envision
  • We often fail to capture, represent and propagate
    manually generated information that needs to go
    with the data flows
  • Each time we develop a new instrument, we develop
    a new data ingest procedure and collect different
    metadata and organize it differently. It is then
    hard to use with previous projects
  • The task of event determination and feature
    classification is onerous and we don't do it
    until after we get the data

31
Use cases
  • Determine which flat field calibration was
    applied to the image taken on January 26, 2005
    around 21:00 UT by the ACOS Mark IV polarimeter.
  • Which flat-field algorithm was applied to the set
    of images taken during the period November 1,
    2004 to February 28, 2005?
  • How many different data product types can be
    generated from the ACOS CHIP instrument?
  • What images comprised the flat field calibration
    image used on January 26, 2007 for all ACOS CHIP
    images?
  • What processing steps were completed to obtain
    the ACOS PICS limb image of the day for January
    26, 2005?
  • Who (person or program) added the comments to the
    science data file for the best vignetted,
    rectangular polarization brightness image from
    January 26, 2005 18:49:09 UT taken by the ACOS
    Mark IV polarimeter?
  • What were the cloud cover and atmospheric seeing
    conditions during the local morning of January
    26, 2005 at MLSO?
  • Find all good images on March 21, 2008.
  • Why are the quick look images from March 21,
    2008, 19:00 UT missing?
  • Why does this image look bad?

32
Provenance
  • Origin or source from which something comes,
    intention for use, who/what generated for, manner
    of manufacture, history of subsequent owners,
    sense of place and time of manufacture,
    production or discovery, documented in detail
    sufficient to allow reproducibility
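
Questions such as "which calibration was applied" and "who added the comments" can only be answered if each processing step is recorded alongside the data product. The following is a minimal sketch of such a record, assuming a simple ordered history per product. ProcessingStep, DataProduct and their fields are hypothetical and do not describe the ACOS or MLSO pipelines.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ProcessingStep:
    """One event in a product's history."""
    name: str                 # e.g. "flat-field calibration"
    agent: str                # person or program that performed the step
    inputs: List[str] = field(default_factory=list)  # identifiers of inputs used


@dataclass
class DataProduct:
    """A data product that carries its own processing history."""
    identifier: str
    history: List[ProcessingStep] = field(default_factory=list)

    def record(self, step: ProcessingStep) -> None:
        """Append a step at the time it happens, preserving order."""
        self.history.append(step)

    def steps_named(self, name: str) -> List[ProcessingStep]:
        """Answer 'which <name> step was applied, and with what inputs?'"""
        return [step for step in self.history if step.name == name]

    def agents(self) -> List[str]:
        """Answer 'who touched this product?', in order of first appearance."""
        seen: List[str] = []
        for step in self.history:
            if step.agent not in seen:
                seen.append(step.agent)
        return seen
```

Recording the step at ingest time, rather than reconstructing it afterwards, is what moves this burden from the user to the data provider.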

36
Visual browse
39
Discussion (1)
  • Taken together, an emerging set of collected
    experience manifests an emerging informatics core
    capability that is starting to take data
    intensive science into a new realm of
    realizability and potentially, sustainability
  • Use cases (i.e. real users)
  • X-informatics
  • Core Informatics
  • Cyber Informatics
  • There are implications for data models

40
Progression after progression
IT -> Cyber Infrastructure -> Cyber Informatics -> Core Informatics -> Science Informatics -> Science, SBAs
  • Example
  • CI: OPeNDAP server running over HTTP/HTTPS
  • Cyberinformatics: Data (product) and service
    ontologies, triple store
  • Core informatics: Reasoning engine (Pellet), OWL
  • Science (X) informatics: Use cases, science
    domain terms, concepts in an ontology
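
To make the triple store and reasoning layers concrete, here is a toy sketch of subclass inference over a set of triples. It stands in for what an OWL reasoner does at much larger scale and is not Pellet or any real triple-store API. The predicate strings "type" and "subClassOf" and the function infer_types are assumptions made for illustration.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


def infer_types(triples: Set[Triple]) -> Set[Triple]:
    """Return the input plus every (x, 'type', C2) implied by
    (x, 'type', C1) and (C1, 'subClassOf', C2), repeated to a fixed point."""
    inferred = set(triples)
    changed = True
    while changed:
        changed = False
        # Iterate over snapshots so the set can grow inside the loops.
        for subject, predicate, obj in list(inferred):
            if predicate != "type":
                continue
            for cls, relation, parent in list(inferred):
                if relation == "subClassOf" and cls == obj:
                    new_triple = (subject, "type", parent)
                    if new_triple not in inferred:
                        inferred.add(new_triple)
                        changed = True
    return inferred
```

With such inference in place, a query for all instruments can also return an item that was only ever declared as a polarimeter, provided the ontology states that polarimeters are instruments.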

41
Discussion (2)
  • Data and information science is becoming the
    fourth column (along with theory, experiment
    and computation)
  • Semantics (of the data) are a very key ingredient
    -> may imply richer data models

42
Summary
  • Informatics is playing a key role in filling the
    gap between science (and the spectrum of
    non-expert) use and generation and the underlying
    cyberinfrastructure, i.e. in shifting the burden
  • This is evident due to the emergence of
    Xinformatics (world-wide)
  • Our experience is implementing informatics as
    semantics in Virtual Observatories (as a working
    paradigm) and Grid environments
  • VSTO is only one example of success
  • Data mining, data integration, smart search,
    provenance are close behind
  • Informatics is a profession and a community
    activity and requires efforts in all 3 sub-areas
    (science, core, cyber) and must be synergistic

43
More Information
  • Virtual Solar Terrestrial Observatory (VSTO)
    http://vsto.hao.ucar.edu, http://www.vsto.org
  • Semantically-Enabled Science Data Integration
    (SESDI) http://sesdi.hao.ucar.edu
  • Semantic Provenance Capture in Data Ingest
    Systems (SPCDIS) http://spcdis.hao.ucar.edu
  • Semantic Knowledge Integration Framework
    (SKIF/SAM) http://skif.hao.ucar.edu
  • Semantic Web for Earth and Environmental
    Terminology (SWEET) http://sweet.jpl.nasa.gov
  • Conferences: AGU 2008, EGU 2009, ISWC 2008, CIKM
    2008, ...
  • Peter Fox pfox@ucar.edu