SC 32/WG 2 Tutorial - PowerPoint PPT Presentation


Title: SC 32/WG 2 Tutorial


1
JTC1 SC32 N1649
  • SC 32/WG 2 Tutorial
  • Metadata Registry Standards
  • July 16, 2007

Bruce Bargmeyer, University of California,
Berkeley and Lawrence Berkeley National
Laboratory. Tel 1 510-495-2905, bebargmeyer_at_lbl.gov
2
Topics
  • Standards development: OMG, ISO (TC 37, JTC 1/SC
    32), W3C, OASIS
  • Align, Coordinate, Integrate:
    Standards, Recommendations, Specifications
  • Semantics: Challenges and Future Directions

3
Align, Coordinate, Integrate Standards
WG 2 doing OK internally
24707
11179 E3
19763
20944
4
Align, Coordinate, Integrate Standards
SC 32?
WG 1
WG 2
WG 3
WG 4
Clearwater meeting a step forward
5
Align, Coordinate, Integrate Standards/Recommendations/Specifications
for Semantic Computing
Semantic Web
Terminology
Object Management
ISO/IEC 11179 Metadata Registries
Graph
RDF
MOF ODM CWM IMM
Subject
Node
Predicate
Edge
Node
Object
W3C
OMG
ISO/IEC JTC 1/SC 32
ISO TC 37
6
Standards Development: Semantics Management and
Semantics Services, Semantic Computing
Align, Co-develop, Fast Track, PAS Submission
OMG
W3C
ISO/IEC JTC 1 SC 32
ISO TC 37
7
Standards Development: Semantics Management and
Semantics Services, Semantic Computing
Align, integrate, co-develop, Fast Track, PAS
Submission. Can we coordinate content?
OMG
ISO/IEC JTC 1 SC 32
W3C
W3C
8
A Success
Some text and figures are identical in the two
standards.
OMG
ISO/IEC 24707 OMG ODM
ISO/IEC JTC 1 SC 32
ISO/IEC 24707 Common Logic, OMG Ontology
Definition Metamodel
9
Standards Development: Semantics Management and
Semantics Services, Semantic Computing
Ongoing effort
ISO/IEC JTC 1 SC 32
ISO/IEC 11179 (Edition 3)
10
Standards Development: Semantics Management and
Semantics Services, Semantic Computing
Possible effort
OMG
RFP - MOF? IMM
11179 E3 proposals
11
Standards Development: Semantics Management and
Semantics Services, Semantic Computing
Hopeful?
OMG
IMM
ISO/IEC JTC 1 SC 32
ISO/IEC 11179 (Edition 3)
12
Other Possibilities
  • OASIS ebXML Registry
  • W3C Semantic Web Deployment WG
  • TC 37

13
The Ageless Information Problem (cf. Data,
Information, Knowledge, Wisdom)
  • Getting the information that we need, when we
    need it, without afflicting the excellent minds
    of humans with toil and drudgery
  • The litany
  • Too much or too little, irrelevant, not
    authoritative, out of date
  • Unknown quality, not trustable, lacks provenance,
    no certainty measures
  • Difficult to find, difficult to access, difficult
    to use
  • Meaning not clear, relationship to other
    information not clear
  • Data creators do not have the same understanding
    of the data as end users
  • Recorded data loses much real world meaning,
    context, relationships
  • Much of the meaning of data is buried in the
    processes used to manipulate the data (e.g., in
    computer code)
  • Need improvements in efficiency and effectiveness
  • Every time we solve it, we re-create it.

14
New Semantics Capabilities Proposed for ISO/IEC
11179 MDR (Edition 3)
  • Improve traditional data management/data
    administration
  • Use stronger semantics management and semantics
    services capabilities
  • Enable something new
  • Semantic computing

15
Semantic Computing: The Nub of It
  • Processing that takes meaning into account
  • Makes use of concept systems, e.g., thesauri
    and/or ontologies
  • Moves some of the meaning of data from computer
    code to managed semantics
  • Processing that uses (e.g., reasons across) the
    relations between things, not just computing about
    the things themselves
  • Processing that helps to take people out of the
    computation, reducing the human toil
  • Semantics grounding for data, data discovery,
    extraction, mapping, translation, formatting,
    validation, inferencing,
  • Delivering higher-level results that are more
    helpful for the user's thought and action

16
In The Epic Information Struggle We Have Made
Heroic Progress

Files
Computer Processing Cards Tape Disk
Machine Processing
17
In The Epic Information Struggle We Have Made
Heroic Progress
  • In structuring data and text --
  • Structured Data
  • Columns on cards, tape (possibly comma
    separated)
  • Hierarchical (DBMS)
  • Network
  • Table (relational DBMS)
  • Hierarchy (XML)
  • Graph (RDF)
  • Semi-structured text
  • nroff, troff, LaTeX
  • SGML
  • HTML
  • XML

18
In The Epic Information Struggle We Have Made
Heroic Progress
  • In documenting data and text (e.g., semantics
    management)
  • Data Standards
  • Code sets
  • (Meta)Data Standards
  • Data element definitions, valid values, value
    meanings
  • Metadata registries (MDR, ISO/IEC 11179)
  • Other standards as presented at this conference
  • Concept systems (or KOS)
  • Glossaries
  • Dictionaries
  • Thesauri
  • Taxonomies
  • Ontologies
  • Graphs

19
Semantic Management: Proposals for 11179 Edition 3
  • Improve data management through use of stronger
    semantics management
  • Databases
  • XML data
  • Other traditional data
  • Enable new wave of semantic computing
  • Take meaning of data into account
  • Process across relations as well as properties
  • May use reasoning engines, e.g., to draw
    inferences

20
Semantics Improve Data Management/Data
Administration
Conceptual Domain: Agent
Object Class: Chemopreventive Agent
Valid Values: Cyclooxygenase Inhibitor, Doxercalciferol, Eflornithine, Ursodiol
Data Element Concept: Chemopreventive Agent NSC Number
Value Domain: NSC Code
Classification Schemes: caDSRTraining
Property: NSCNumber
Representation: Code
Data Element: Chemopreventive Agent Name
Context: caCORE
Enterprise Vocabulary Services (EVS) Concepts
Unite NCI MDR
Source: Denise Warzel, National Cancer Institute
21
Semantic Computing Application: Find and process
non-explicit data
Analgesic Agent
For example: patient data on drugs contains
brand names (e.g., Tylenol, Anacin-3,
Datril, ...). However, we want to study patients
taking analgesic agents.
Non-Narcotic Analgesic
Analgesic and Antipyretic
Acetaminophen
Nonsteroidal Antiinflammatory Drug
Datril
Anacin-3
Tylenol
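
The drug example can be sketched in a few lines of Python. This is only an illustration: the slide lists the concepts but not how they are linked, so the is-a links below are an assumed arrangement, and the patient records and the names IS_A, ancestors and patients_taking are hypothetical.

```python
# Assumed "is-a" links (narrower concept -> broader concept). The slide
# names these concepts; the exact arrangement here is an illustration,
# not an authoritative drug vocabulary.
IS_A = {
    "Tylenol": "Acetaminophen",
    "Anacin-3": "Acetaminophen",
    "Datril": "Acetaminophen",
    "Acetaminophen": "Analgesic and Antipyretic",
    "Analgesic and Antipyretic": "Non-Narcotic Analgesic",
    "Nonsteroidal Antiinflammatory Drug": "Non-Narcotic Analgesic",
    "Non-Narcotic Analgesic": "Analgesic Agent",
}


def ancestors(term):
    """Return every broader concept reachable from term via is-a links."""
    found = []
    while term in IS_A:
        term = IS_A[term]
        found.append(term)
    return found


def patients_taking(concept, records):
    """Select patient ids whose recorded drug is the concept or a narrower one."""
    return [pid for pid, drug in records
            if drug == concept or concept in ancestors(drug)]


# Made-up patient records that hold brand names only.
records = [("p1", "Tylenol"), ("p2", "Datril"), ("p3", "Vitamin C")]
print(patients_taking("Analgesic Agent", records))  # ['p1', 'p2']
```

The patient data never mentions "analgesic"; the match comes from the managed concept system rather than from logic buried in application code.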
22
A Semantics Application: Specify and compute
across Relations, e.g., within a food web in an
Arctic ecosystem
An organism is connected to another organism for
which it is a source of food energy and material
by an arrow representing the direction of
biomass transfer.
Source: http://en.wikipedia.org/wiki/Food_web#Food_web (from SPIRE)
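
Computing across such relations amounts to walking a directed graph. A minimal sketch, assuming a handful of made-up organisms and links (the slide's figure is not reproduced in this transcript, so none of these edges come from SPIRE):

```python
from collections import deque

# Edges point from food source to consumer (direction of biomass transfer).
# The organisms and links are illustrative assumptions, not SPIRE data.
FOOD_WEB = {
    "phytoplankton": ["zooplankton"],
    "zooplankton": ["arctic cod"],
    "arctic cod": ["ringed seal", "seabird"],
    "ringed seal": ["polar bear"],
}


def dependents(source):
    """Breadth-first walk: every organism that receives biomass from source."""
    seen, queue = set(), deque([source])
    while queue:
        for consumer in FOOD_WEB.get(queue.popleft(), []):
            if consumer not in seen:
                seen.add(consumer)
                queue.append(consumer)
    return seen


print(sorted(dependents("arctic cod")))  # ['polar bear', 'ringed seal', 'seabird']
```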
23
Semantics Application: Combine Data, Metadata,
Concept Systems
Inference Search Query: find water bodies
downstream from Fletcher Creek where chemical
contamination was over 10 micrograms per liter
between December 2001 and March 2003
Concept system
Data
ID   Date       Temp   Hg
A    06-09-13   4.4    4
B    06-09-13   9.3    2
X    06-09-13   6.7    78
Metadata
Name   Datatype   Definition                      Units
ID     text       Monitoring Station Identifier   not applicable
Date   date       Date                            yy-mm-dd
Temp   number     Temperature (to 0.1 degree C)   degrees Celsius
Hg     number     Mercury contamination           micrograms per liter
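
A rough sketch of how the three sources might be combined to answer such a query. The data rows and units are from the slide. Everything in the concept system (which station sits on which water body, what flows into what, the names "Elk River", "Mill Lake" and "Bay", and that Hg counts as a chemical contaminant) is assumed for illustration, as are all function names.

```python
from datetime import date

# Data rows from the slide.
DATA = [
    {"ID": "A", "Date": "06-09-13", "Temp": 4.4, "Hg": 4},
    {"ID": "B", "Date": "06-09-13", "Temp": 9.3, "Hg": 2},
    {"ID": "X", "Date": "06-09-13", "Temp": 6.7, "Hg": 78},
]

# Metadata from the slide: units per measured column.
UNITS = {"Temp": "degrees Celsius", "Hg": "micrograms per liter"}

# Assumed concept system (not in the slide).
STATION_ON = {"A": "Fletcher Creek", "B": "Elk River", "X": "Mill Lake"}
FLOWS_INTO = {"Fletcher Creek": "Mill Lake", "Mill Lake": "Bay"}
CONTAMINANTS = {"Hg"}


def downstream(body):
    """Follow flows-into links to collect every water body below body."""
    out = []
    while body in FLOWS_INTO:
        body = FLOWS_INTO[body]
        out.append(body)
    return out


def parse_date(text):
    """Interpret yy-mm-dd per the metadata; assumes years 2000-2099."""
    yy, mm, dd = (int(part) for part in text.split("-"))
    return date(2000 + yy, mm, dd)


def contaminated_downstream(origin, limit, start, end):
    """Water bodies below origin with a contaminant reading over limit."""
    targets = set(downstream(origin))
    hits = set()
    for row in DATA:
        body = STATION_ON[row["ID"]]
        in_window = start <= parse_date(row["Date"]) <= end
        over = any(row[col] > limit for col in CONTAMINANTS
                   if UNITS[col] == "micrograms per liter")
        if body in targets and in_window and over:
            hits.add(body)
    return sorted(hits)


print(contaminated_downstream("Fletcher Creek", 10,
                              date(2006, 1, 1), date(2006, 12, 31)))  # ['Mill Lake']
print(contaminated_downstream("Fletcher Creek", 10,
                              date(2001, 12, 1), date(2003, 3, 31)))  # []
```

The sample rows are dated 2006, so the slide's own window (December 2001 to March 2003) returns nothing against them; the first call uses a 2006 window only to show a hit.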
24
Semantics Application: Use data from systems that
record the same facts with different terms
  • Reduce the human toil of drawing information
    together and performing analysis.

25
Challenge: Use data from systems that record the
same facts with different terms
Database Catalogs
Common Content
ISO 11179 Registries
UDDI Registries
Table Column
Data Element
Common Content
Common Content
Business Specification
Country Identifier
OASIS/ebXML Registries
CASE Tool Repositories
XML Tag
Attribute
Common Content
Common Content
Business Object
Coverage
Term Hierarchy
Ontological Registries
Common Content
26
Same Fact, Different Terms
Data Elements
ISO 3166 2-Alpha Code     DZ, BE, CN, DK, EG, FR, . . ., ZW
ISO 3166 3-Numeric Code   012, 056, 156, 208, 818, 250, . . ., 716
ISO 3166 English Name     Algeria, Belgium, China, Denmark, Egypt, France, . . ., Zimbabwe
ISO 3166 French Name      L'Algérie, Belgique, Chine, Danemark, Egypte, La France, . . ., Zimbabwe
ISO 3166 3-Alpha Code     DZA, BEL, CHN, DNK, EGY, FRA, . . ., ZWE
Name, Context, Definition, Unique ID: 4572, Value
Domain, Maintenance Org., Steward, Classification,
Registration Authority, Others
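
Once the representations are registered side by side, translating between them is a lookup. A minimal sketch using only the seven countries shown on the slide; the field names and the translate function are made up for the example.

```python
# Rows transcribed from the slide: one fact per row, five representations.
FIELDS = ("alpha2", "numeric", "english", "french", "alpha3")
ROWS = [
    ("DZ", "012", "Algeria", "L'Algérie", "DZA"),
    ("BE", "056", "Belgium", "Belgique", "BEL"),
    ("CN", "156", "China", "Chine", "CHN"),
    ("DK", "208", "Denmark", "Danemark", "DNK"),
    ("EG", "818", "Egypt", "Egypte", "EGY"),
    ("FR", "250", "France", "La France", "FRA"),
    ("ZW", "716", "Zimbabwe", "Zimbabwe", "ZWE"),
]

# Index every representation (case-insensitively) to its row.
INDEX = {value.casefold(): row for row in ROWS for value in row}


def translate(value, target):
    """Map any registered representation of a country to the target form."""
    row = INDEX.get(value.strip().casefold())
    if row is None:
        raise KeyError(f"unregistered value: {value!r}")
    return row[FIELDS.index(target)]


print(translate("Danemark", "alpha3"))  # DNK
print(translate("818", "english"))      # Egypt
```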
27
Challenge: Draw information together from a broad
range of studies, databases, reports, etc.
28
A semantics application: Information Extraction
and Use
Extraction Engine
Segment, Classify, Associate, Normalize, Deduplicate
Discover patterns, Select models, Fit
parameters, Inference, Report results
11179-3 (E3) XMDR
Actionable Information
Decision Support
29
Extraction Engines
  • Find concepts and relations between concepts in
    text, tables, data, audio, video, ...
  • Produce databases (relational tables, graph
    structures), and other output
  • Functions
  • Segment: find text snippets (boundaries
    important)
  • Classify: determine the database field for a text
    segment
  • Associate: determine which text segments belong together
  • Normalize: put information into standard
    form
  • Deduplicate: collapse redundant information
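
The five functions can be illustrated with a toy pipeline. This is a rule-based stand-in built on regular expressions, not a trained extraction engine; the input format ("drug dose" snippets separated by semicolons) and every function name are assumptions made for the example.

```python
import re


def segment(text):
    """Segment: cut the text into snippets at ';' boundaries."""
    return [s.strip() for s in text.split(";") if s.strip()]


def classify(token):
    """Classify: choose the database field for a token."""
    is_dose = re.fullmatch(r"\d+\s*mg", token, re.IGNORECASE)
    return "dose_mg" if is_dose else "drug"


def associate(snippet):
    """Associate: tokens found in one snippet form one record."""
    record = {}
    tokens = re.findall(r"\d+\s*mg|[A-Za-z][\w-]*", snippet, re.IGNORECASE)
    for token in tokens:
        record[classify(token)] = token
    return record


def normalize(record):
    """Normalize: lower-case the drug name, keep the dose as an integer."""
    dose = int(re.match(r"\d+", record["dose_mg"]).group())
    return (record["drug"].casefold(), dose)


def deduplicate(records):
    """Deduplicate: collapse repeats, keeping first-seen order."""
    return list(dict.fromkeys(records))


def extract(text):
    # Assumes every snippet contains one drug name and one dose.
    return deduplicate([normalize(associate(s)) for s in segment(text)])


print(extract("Tylenol 500 mg; TYLENOL 500mg; Anacin-3 325 mg"))
# [('tylenol', 500), ('anacin-3', 325)]
```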

30
Metadata Registries are Useful
  • Registered semantics
  • For training extraction engines
  • The Normalize function can make use of standard
    code sets that have mapping between
    representation forms.
  • The Classify function can interact with
    pre-established concept systems.
  • Provenance
  • High precision for proper nouns, less precision
    (e.g., 70%) for other concepts -> impacts
    downstream processing; need to track precision

31
Challenge: Gain Common Understanding of meaning
between Data Creators and Data Users
A common interpretation of what the data
represents
[Figure labels: Data Creation (EEA, USGS, DoD, EPA, Others . . .); Information systems; Users. Each source holds text and data. The text is a concept list in English (environ, agriculture, climate, human health, industry, tourism, soil, water, air) or Spanish (ambiente, agricultura, tiempo, salud huno, industria, turismo, tierra, agua, aero); the data are rows of bare numbers such as 123 345 445 670 248 591 308 and 3268 0825 1348 5038 2708 0000 2178.]
32
Practical Vocabulary Management
  • Vocabulary Management is essential for use of
    semantic technologies
  • Define concepts and relationships
  • Harmonize terminology, resolve conflicts
  • Collaborate with stakeholders
  • An approach
  • Select a domain of interest
  • Enter core concepts and relationships
  • Engage community in vocabulary review
  • Harmonize, validate and vet the vocabulary
  • Enter metadata describing enterprise data
  • Link concept system to metadata

33
Use eXtended MDR Capabilities
  • For vocabulary repository
  • Register, harmonize, validate, and vet
    definitions and relations
  • To register mappings between multiple
    vocabularies
  • To register mappings of concepts to data
  • To provide semantics services
  • To register and manage the provenance of data
  • 11179-3 (E3) is part of the infrastructure for
    semantics and data management.
  • These capabilities are proposed for ISO/IEC 11179
    Edition 3

34
11179 (E3) Use
  • Upside
  • Collaborative
  • Supports interaction with community of interest
  • Shared evolution and dissemination
  • Enables Review Cycle
  • Standards-based: don't lock semantics into
    proprietary technology
  • Foundation for strategic data centric
    applications
  • Lays the foundation for Ontology-based
    Information Management
  • Content is reusable for many purposes
  • Downside
  • Managing semantics is HARD WORK, no matter how
    friendly the tools
  • Needs integration with other components

35
Some Challenges
  • Data management and metadata management must
    evolve to address more complex data structures
    (relational, object, hierarchies, graphs)
  • Query capabilities
  • More than SQL, XQuery, SPARQL
  • Discovery mechanisms
  • More than Google
  • Access, mining, extraction
  • We need stronger semantics management

36
Metadata Registry Support for
  • Registering and mapping ontologies
  • Ontology Evolution
  • Registering Process Ontologies

37
Thank You
  • Acknowledgements
  • Karlo Berket, LBNL
  • Kevin Keck, LBNL
  • John McCarthy, LBNL
  • Harold Solbrig, Apelon
  • This material is based upon work supported by the
    National Science Foundation under Grant No.
    0637122, USEPA and USDOD. Any opinions, findings,
    and conclusions or recommendations expressed in
    this material are those of the author(s) and do
    not necessarily reflect the views of the National
    Science Foundation, USEPA or USDOD.
  • Bruce Bargmeyer
  • Lawrence Berkeley National Laboratory
  • Berkeley Water Center
  • University of California, Berkeley
  • Tel 1 510-495-2905
  • bebargmeyer_at_lbl.gov