XMDR Prototype Progress Report - PowerPoint PPT Presentation

1
XMDR Prototype Progress Report
  • John McCarthy and Kevin D. Keck
  • XMDR Project Quarterly Meeting
  • 13 July, 2005
  • UC Berkeley Faculty Club

2
Prototype Demonstration Outline
Build!
  • Review XMDR Prototype Purposes & Goals
  • with examples of how it is achieving each goal
  • next step goals from our last Quarterly Meeting
  • Describe metadata currently loaded into prototype
  • Demonstrate new XMDR text retrieval interface
  • with examples of XMDR metadata objects
  • v3 hot off the press this morning (so slides no
    longer match)!
  • Explain importance of XMDR's three levels of
    constraints
  • XML, RDF, OWL
  • Refinements and revisions since last quarterly
    meeting
  • Outline current challenges and future plans
  • Show new XMDR inference capabilities later?

3
What is XMDR?
Build!
  • A collaboration of various U.S. & international
    groups
  • to extend the ISO/IEC 11179 metadata registry
    standard
  • A multi-agency funded project
  • to develop semantic capabilities of 11179
    metadata registries
  • An open source prototype implementation & testbed
  • to demonstrate new eXtended Metadata Registry
    capabilities
  • e.g., for ontology lifecycle management &
    harmonization
  • to explore emerging semantic technologies
  • e.g., RDF, OWL, CL, ...
  • to assemble test metadata from diverse sources &
    structures
  • health, environment, geography, ...
  • terminologies, thesauri, ontologies, ...

4
Purposes of XMDR Prototype for ISO/IEC 11179
Registry Standard
Build!
  • Adapt & test emerging semantic technologies
  • Extend semantics management capabilities
  • Test uses of terminologies and ontologies
  • Explore different ways to represent relationships
  • Demonstrate implementation of proposed revisions
    to 11179 Parts 2 & 3 (Ver. 3)
  • Serve as model open source implementation
  • Help resolve registration & harmonization issues
    for different metadata standards, including ODM
    & MMF

5
XMDR Prototype Architecture: Initial Implemented
Modules
External Interface
RegistryStore
Registry
Java
WritableRegistryStore
Subversion
Authentication Service (defer)
RetrievalIndex
MetadataValidator (defer?): schema-driven syntax
checker
Jena, Xerces
LogicBasedIndex
FullTextIndex
Jena, OWI KS Racer
Lucene
MappingEngine (defer)
Ontology Editor
Tools in smaller, different font. Lines around
boxes for print version. Black print on pastel
blocks.
11179 OWL Ontology
Protégé
Composition (tight ownership)
Generalization
Aggregation (loose ownership)
6
Role of terminologies and ontologies in metadata
registries
  • Sources for concepts, concept definitions, object
    classes, properties, value meanings, ...
  • Sources of permissible values in value domains
  • Terminologies as classification schemes (e.g.,
    taxonomies)
  • Ontologies to specify semantic relationships
  • is-a, part-of, instance-of, ...
  • inheritance permits more compact definitions
  • semantic pathways for indexing
  • facilitates searching subclasses & inverses
  • Frameworks for integration of multiple schemas
  • Help connect metadata entities via shared terms
  • via automatic indexing of metadata words
  • via text values from specific metadata elements

7
Next Steps (from last meeting)
  • Load data from EDR & other sources
  • country codes as value domains from EDR
    versus conceptual domains from Lexgrid
  • need contractor support for full EDR
  • Implement advanced text search UI
  • Use JSP to display xml as html
  • maybe style sheets later?
  • Incorporate other data into page
  • Connect terminologies to other metadata
  • Defer validation and mapping modules

8
XML files for each metadata object were extracted
from EDR via scripts
  • Extract Script (Perl)
    - works on underlying HTML
    - follows each link
    - gets HTML file for each linked object
  • Conversion Script (Perl)
    - works on HTML files
    - creates XML file for each
  • First step: all xx items linked from Countries of
    the World
  • Next step: bulk load all metadata item instances
    from EDR
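The link-following part of the extract step could be sketched roughly as below. This is only an illustration in Python, not the Perl scripts themselves; the sample HTML, the `LinkCollector` class, and the `example.org` URL are hypothetical, since the slides do not show the actual EDR page layout.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect the absolute target of every <a href="..."> in a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the page they came from.
                self.links.append(urljoin(self.base_url, value))


def collect_links(html_text, base_url):
    """Return all link targets found in html_text, in document order."""
    parser = LinkCollector(base_url)
    parser.feed(html_text)
    return parser.links


# Hypothetical page fragment; not taken from EDR.
SAMPLE_PAGE = """
<html><body>
  <a href="item1.html">Afghanistan</a>
  <a href="/registry/item2.html">Albania</a>
  <p>No link here</p>
</body></html>
"""

links = collect_links(SAMPLE_PAGE, "http://example.org/registry/countries.html")
for link in links:
    print(link)
```

Each collected URL would then be fetched and handed to the conversion step, which is not sketched here.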
9
Harold Solbrig (Mayo Clinic) converted 7 sources
into standard Lexgrid format
Source Terminology              Concepts   Output
NCI Thesaurus_05.03f              41,694   Unicode Lexgrid File
DTIC_1.0                          13,387   Unicode Lexgrid File
GEMET_2001.0                       5,284   Unicode Lexgrid File
Mouse_1.32 anatomy                 2,415   Unicode Lexgrid File
EPA Terms of Environment           1,453   Unicode Lexgrid File
ISO3166_V-10 Country codes           289   Unicode Lexgrid File
ISO4217 Currency names               204   Unicode Lexgrid File
10
Lexgrid file for each terminology was converted
to XML concept files
Inputs: NCI Thesaurus, DTIC, GEMET, others
  • XSLT Conversion Script
  • takes Unicode Lexgrid files as input
  • creates XML file for each concept system
  • creates XML file for each concept in system

Outputs: NCI Thesaurus Concept System, DTIC Concept
System, GEMET Concept System
Need to show relationships
11
XMDR Prototype now contains an XML file for each
11179 item
Administered (number of items in parentheses)
  • Context for Administered Items (1), e.g., XMDR?
  • Concept Systems (7), e.g., GEMET, DTIC
  • Data Elements (499), e.g., Country Name
  • Data Element Concepts (274), e.g., Country Label
  • Conceptual Domains (1), e.g., Countries of the World
  • Representation Classes (2), e.g., Code
  • Value Domains (6), e.g., countries of the world
  • Relationship Types (0), e.g., ??

Other
12
Demonstration of new Advanced Text Search
Interface & results
  • http://erdos.lbl.gov/xmdr3/
  • Advanced text search modeled on Google
    - top right pull-down menu lists text fields
    - word (all vs one) vs phrase searches
    - stem and wild-card searches
    - other components
  • Results displayed as list of tuples
  • Individual item details displayed with links, etc.

13
XMDR Advanced Search Interface helps explore
registry contents
http://erdos.lbl.gov/xmdr2/
Search for "any(country (code name))"
More Results >>
XMDR Web Interface 0.4, LBNL
14
Lucene supports a variety of advanced text search
capabilities
  • Simple word search: country
  • Wild-card word search: coun*
  • Fuzzy or stem search (~): country~
  • Search specified field: definition:coun*
  • Search for links: link:http*
  • Distance (in words): "country world"~4
  • Phrases: "countries of the world"
  • Boolean operators: AND, "+", OR, NOT, "-"

http://lucene.apache.org/java/docs/queryparsersyntax.html
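To make the query types on this slide concrete, here is a toy matcher for two of them: a field-qualified wild-card search and a distance (proximity) search. It is a rough sketch in Python and does not use Lucene; the sample records and function names are hypothetical, and the proximity test (word positions differ by at most N) only approximates Lucene's own slop calculation.

```python
from fnmatch import fnmatchcase

# Hypothetical records standing in for indexed registry items.
docs = [
    {"name": "Country Name",
     "definition": "The name of a country of the world"},
    {"name": "Currency Code",
     "definition": "A code for a currency"},
]


def words(text):
    """Lower-case a text and split it into whitespace-separated words."""
    return text.lower().split()


def match_wildcard(doc, field, pattern):
    """True if any word of doc[field] matches a pattern such as 'coun*'."""
    return any(fnmatchcase(w, pattern.lower()) for w in words(doc.get(field, "")))


def match_within(doc, field, first, second, distance):
    """True if both words occur in doc[field] at most `distance` positions apart."""
    tokens = words(doc.get(field, ""))
    pos_first = [i for i, w in enumerate(tokens) if w == first]
    pos_second = [i for i, w in enumerate(tokens) if w == second]
    return any(abs(a - b) <= distance for a in pos_first for b in pos_second)


# Roughly: definition:coun*
wildcard_hits = [i for i, d in enumerate(docs)
                 if match_wildcard(d, "definition", "coun*")]

# Roughly: "country world"~4 restricted to the definition field
near_hits = [i for i, d in enumerate(docs)
             if match_within(d, "definition", "country", "world", 4)]

print(wildcard_hits, near_hits)
```

A real index would also handle punctuation, stemming, and scoring, none of which is modelled here.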
15
See Lucene web pages for more advanced text
search capabilities
  • Range search
  • Boosting a term
  • Grouping
  • Field grouping
  • Escape special characters

For more information, see
http://lucene.apache.org/java/docs/queryparsersyntax.html
16
XMDR Prototype supports Unicode for searching as
well as results
Show multilingual example: search for protein,
protein product, Cyrillic ___ ____
17
Lucene text indexing facilitates searching on
words & phrases
Word Index: birth, country, world
Name Index: birth, country
Other Indexes
Phrase searches done on results
<entity-type>Terminology</entity-type>
  <name>United Nations XXXX</name>
<entity-type>ValueDomain</entity-type>
  <name>Countries of the World</name>
<entity-type>ConceptualDomain</entity-type>
  <name>Country</name>
<entity-type>DataElement</entity-type>
  <name>Country of Birth</name>
  <conceptual domain>Country</conceptual domain>
  <value domain>Countries of the World</value domain>
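The idea of a per-field word index can be illustrated with a small inverted index. This is a simplified sketch in Python, not Lucene's data structure; the record list reuses the names shown on the slide, and `build_index` and `search_all` are hypothetical helper names.

```python
from collections import defaultdict

# Records modelled on the slide's examples.
records = [
    {"entity-type": "ValueDomain", "name": "Countries of the World"},
    {"entity-type": "ConceptualDomain", "name": "Country"},
    {"entity-type": "DataElement", "name": "Country of Birth"},
]


def build_index(records, field):
    """Map each lower-cased word of `field` to the set of record positions using it."""
    index = defaultdict(set)
    for record_id, record in enumerate(records):
        for word in record[field].lower().split():
            index[word].add(record_id)
    return index


def search_all(index, query):
    """Return the records whose indexed field contains every word of the query."""
    postings = [index.get(word, set()) for word in query.lower().split()]
    return set.intersection(*postings) if postings else set()


name_index = build_index(records, "name")

print(sorted(search_all(name_index, "country")))
print(sorted(search_all(name_index, "country birth")))
```

Because this sketch does no stemming, a search for "country" does not find "Countries of the World"; that gap is what stem and wild-card searches address.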
18
Reasoners use OWL ontologies to augment RDF
graph queries
RDF Query (rdql/nrdql/SPARQL)
Reasoner: Jena or Racer (memory)
result set includes subclasses, inverses, etc.
OWL 11179 Ontology
OWL built-in rules
11179 metadata (xml/rdf files)
19
XMDR RDF graph query facilities complement text
query capabilities
  • SQL-like queries
  • e.g.,
  • Span items that are only indirectly connected
  • e.g.,
  • Expand queries to subsumed classes in hierarchy
  • e.g., ConceptualDomain includes EnumeratedConc..
  • Transitivity
  • e.g., parts explosion
  • Shortest path
  • e.g.,
  • Least common ancestor
  • e.g.,
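Two of the graph operations listed above, expanding a query to subsumed classes and finding a least common ancestor, can be sketched over a small class hierarchy. This is an illustrative Python sketch, not the Jena or Racer implementation; the `SUBCLASS_OF` table is an assumed fragment built from class names that appear elsewhere in these slides.

```python
# Assumed fragment of the class hierarchy: child -> parent.
SUBCLASS_OF = {
    "EnumeratedValueDomain": "ValueDomain",
    "ValueDomain": "AdministeredItem",
    "ConceptualDomain": "AdministeredItem",
    "DataElement": "AdministeredItem",
}


def ancestors(cls):
    """Return the chain of superclasses of cls, nearest first (transitive)."""
    chain = []
    while cls in SUBCLASS_OF:
        cls = SUBCLASS_OF[cls]
        chain.append(cls)
    return chain


def descendants(cls):
    """Return every direct or indirect subclass of cls (query expansion)."""
    found = set()
    frontier = [cls]
    while frontier:
        current = frontier.pop()
        for child, parent in SUBCLASS_OF.items():
            if parent == current and child not in found:
                found.add(child)
                frontier.append(child)
    return found


def least_common_ancestor(a, b):
    """Return the lowest class that is a or b, or a superclass of both."""
    seen = [a] + ancestors(a)
    for candidate in [b] + ancestors(b):
        if candidate in seen:
            return candidate
    return None


print(sorted(descendants("AdministeredItem")))
print(least_common_ancestor("EnumeratedValueDomain", "ConceptualDomain"))
```

The same traversal applies to any transitive relationship, such as part-of for a parts explosion, by swapping in a different child-to-parent table.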

20
OWL, RDF & XML Schema are used to specify XMDR as
UML is used for the 11179 metamodel
11179 Relational Schema
Relational Metadata
UML: 11179 Metamodel
OWL XMDR Ontology annotations
XMDR XML Schema
Types & Cardinalities
Trang
XMDR's RELAX NG Schema
Triples: binary labeled relationships
RDF Spec
XML Schema Language spec
XML Objects
What things go in own files? Which property
direction is stored? Sequential ordering of
properties
21
XMDR's XML schema provides a number of important
benefits
  • Schema specifies what is required as well as what
    is legal
  • Divides metadata into files conforming to XML
    schema
  • Normalizes data (relational: one fact in one
    place)
  • Facilitates XSLT transformations by reducing
    degrees of freedom to a canonical encoding within
    the RDF standard
  • RELAX NG used to create and check XMDR schema
  • RNG validator enforces many OWL ontology
    constraints
  • Trang automatically translates into XML schema
    syntax
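The distinction between what is required and what is legal can be shown with a very small structural check. This Python sketch is not RELAX NG and is not the XMDR schema; the `REQUIRED` and `ALLOWED` tables are hypothetical rules chosen only to illustrate the two kinds of constraint.

```python
import xml.etree.ElementTree as ET

# Hypothetical rules: which child elements must appear, and which may appear.
REQUIRED = {"DataElement": {"name", "domain"}}
ALLOWED = {"DataElement": {"name", "domain", "meaning", "type", "example"}}


def check(xml_text):
    """Return a list of problems found in the root element's direct children."""
    root = ET.fromstring(xml_text)
    present = {child.tag for child in root}
    problems = []
    for tag in sorted(REQUIRED.get(root.tag, set()) - present):
        problems.append("missing required element: " + tag)
    for tag in sorted(present - ALLOWED.get(root.tag, set())):
        problems.append("element not allowed: " + tag)
    return problems


good = "<DataElement><name>Country Name</name><domain/></DataElement>"
bad = "<DataElement><name>Country Name</name><colour/></DataElement>"

print(check(good))
print(check(bad))
```

A real schema validator also checks ordering, cardinality, attributes, and data types, which this sketch leaves out.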

22
RDF provides a rich set of complementary benefits
for XMDR
  • All the advantages of XML plus
  • RDF provides more explicit semantics than XML
  • Users can employ a growing set of RDF tools
  • e.g., SPARQL query language, SWRL rule language,
    Jena inference
  • More powerful retrieval capabilities
  • Using many different RDF graph query tools
  • RDF's graph data model supports inference
  • e.g., inclusion of subsumed sub-classes
  • Results can be either
  • tuples (à la relational tables)
  • XML/RDF graphs (being developed for W3C's SPARQL)
  • Facilitates integrated use and management of
    multiple related concepts spanning different
    concept systems

23
OWL ontology specification adds even richer
semantics to XMDR
  • All the advantages of XML & RDF plus
  • Classes and subclasses (is-a relationships)
  • Union classes
  • Inverses
  • Same-as, same-property-as, same-class-as
  • Restriction classes (restrict range, cardinality,
    etc. of property based on type of subject)
  • and tools for creation, editing, visualization,
    and management (Protégé plug-ins)

24
Proposed XMDR OWL ontology revises & extends
ISO/IEC 11179 OWL ontology
  • Relationships as first class objects
  • Administered items
  • Specific set of relationship types
  • Relationships can have their own attributes
  • Concepts (more on this issue tomorrow!)
  • Notations (e.g., OWL, CL, Lexgrid, MOF, UML)
  • Axioms
  • Support for multiple languages per RFC 3066
  • 3 letter codes only if no 2 letter code
  • can specify even if ISO code not yet assigned
  • can be qualified by things other than country
  • e.g., variants of Chinese (traditional,
    simplified)
  • regions & locales
  • Variant alphabets (e.g., Hindi and Urdu)

25
XMDR Prototype example: dual-purpose RDF/XML file
(extract) for one GEMET term
<Concept rdf:about=""
    xml:base="http://erdos.lbl.gov/xmdr2/data/CS-GEMET_2001.0/13198.xml">
  <source rdf:resource="../CS-GEMET_2001.0.xml"/>
  <identifier rdf:parseType="Resource">
    <string rdf:datatype="http://www.w3.org/2001/XMLSchema#string">13198</string>
  </identifier>
  <terminologicalEntry rdf:parseType="Resource">
    <entryContext rdf:resource="CXT-default.xml"/>
    <section rdf:parseType="Resource">
      <language rdf:datatype="http://www.w3.org/2001/XMLSchema#string">en</language>
      <designation rdf:parseType="Resource">
        <name xml:lang="en">protein product</name>
      </designation>
      <definition rdf:parseType="Resource">
        <source rdf:resource="lgConsource"/>
        <text xml:lang="en">No definition needed.</text>
      </definition>
    </section>
Kevin: show new version. Note parts that illustrate
RDF and OWL.
26
Example multilingual term file: XMDR fragment
(continued)
    <section rdf:parseType="Resource">
      <language rdf:datatype="http://www.w3.org/2001/XMLSchema#string">de</language>
      <designation rdf:parseType="Resource">
        <name xml:lang="de">Eiweißerzeugnis</name>
      </designation>
    </section>
    <section rdf:parseType="Resource">
      <language rdf:datatype="http://www.w3.org/2001/XMLSchema#string">ru</language>
      <designation rdf:parseType="Resource">
        <name xml:lang="ru">???????? ???????</name>
      </designation>
    </section>
  </terminologicalEntry>
</Concept>
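Because the term file is ordinary XML as well as RDF, a generic XML parser can pull out the designations by language. The sketch below uses Python's standard library on a cut-down copy of the fragment; the default namespace `http://example.org/xmdr#` is a placeholder assumption, since the slides do not show the file's namespace declarations.

```python
import xml.etree.ElementTree as ET

XMDR_NS = "http://example.org/xmdr#"  # placeholder; real namespace not shown
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Cut-down version of the slide's fragment with assumed namespace declarations.
SAMPLE = """<Concept xmlns="http://example.org/xmdr#"
         xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         rdf:about="">
  <terminologicalEntry rdf:parseType="Resource">
    <section rdf:parseType="Resource">
      <designation rdf:parseType="Resource">
        <name xml:lang="en">protein product</name>
      </designation>
    </section>
    <section rdf:parseType="Resource">
      <designation rdf:parseType="Resource">
        <name xml:lang="de">Eiweißerzeugnis</name>
      </designation>
    </section>
  </terminologicalEntry>
</Concept>"""


def names_by_language(xml_text):
    """Map each xml:lang value to the text of the <name> element carrying it."""
    root = ET.fromstring(xml_text)
    result = {}
    for name in root.iter("{%s}name" % XMDR_NS):
        result[name.get(XML_LANG)] = name.text
    return result


names = names_by_language(SAMPLE)
for lang, text in sorted(names.items()):
    print(lang, text)
```

The same file could equally be loaded into an RDF toolkit; this sketch shows only the plain-XML reading of it.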
27
Relationships are implemented as LINKS to other
XML files
<DataElement ...
    xml:base="http://erdos.lbl.gov/xmdr2/data/DEALL.1.5394.1.xml">
  ...
  <name>Country Name</name>
  ...
  <type rdf:resource="RCDIS.1.12116.1.xml"/>     (RepresentationClass link)
  <domain rdf:resource="VDALL.1.15147.1.xml"/>   (ValueDomain link)
  <meaning rdf:resource="DCDIS.1.12800.1.xml"/>  (DataElementConcept link)
  <example rdf:datatype="xsd:string">United States</example>
</DataElement>
Metadata schema includes relationships that
specify which attributes can or must link to
other entity-types
Kevin: show HTML rendition
Index contains names of entities & links
<ValueDomain ...
    xml:base="http://erdos.lbl.gov/xmdr2/data/VDALL.1.15147.1.xml">
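The relative file names in the link attributes are resolved against the file's base URL. A minimal Python sketch of that resolution, using the values shown on the slides and standard URL joining (the variable names are illustrative only):

```python
from urllib.parse import urljoin

# Base URL and relative link targets as shown on the slide.
base = "http://erdos.lbl.gov/xmdr2/data/DEALL.1.5394.1.xml"
links = {
    "type": "RCDIS.1.12116.1.xml",     # RepresentationClass link
    "domain": "VDALL.1.15147.1.xml",   # ValueDomain link
    "meaning": "DCDIS.1.12800.1.xml",  # DataElementConcept link
}

# Resolve every relative target to an absolute URL.
resolved = {prop: urljoin(base, target) for prop, target in links.items()}

for prop, url in resolved.items():
    print(prop, url)

# A parent-directory reference, as in the GEMET concept file, resolves the same way.
concept_base = "http://erdos.lbl.gov/xmdr2/data/CS-GEMET_2001.0/13198.xml"
concept_system = urljoin(concept_base, "../CS-GEMET_2001.0.xml")
print(concept_system)
```

Resolving the domain link this way yields the same URL that the ValueDomain file declares as its own base, which is what lets one file's link identify another file.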
28
Protégé OWLViz Plug-in Displays old OWL Class
Hierarchy for 11179
Built on top of GraphViz
29
Protégé OWLViz Plug-in displays new OWL Class
Hierarchy for 11179
Built on top of GraphViz
30
Class hierarchy pictures is-a relationships
31
Is-a hierarchies can nest to arbitrary depth in
OWL
32
XMDR prototype development is providing valuable
new insights
  • XML-RDF files can serve dual purpose
  • novel idea to combine ontology & XML/RDF
    constraints
  • Independent modules facilitate testing many
    possible new tools, BUT
  • takes more time to implement & maintain
  • State of the art for open-source OWL reasoners
    not very advanced (none yet for OWL-DL)
  • OTHERS????

33
Unresolved Challenges and Issues for XMDR
Prototype
  • Complexity
  • Representation of Relationships
  • XML + RDF + OWL is a lot
  • Scalability & performance
  • Currently includes only 60,000 objects
  • maybe indexing and/or distributed registries will
    help?
  • RDF Issues
  • RDF queries yield tuples, not RDF objects (but
    W3C at work)
  • RDF tools won't create XMDR files (add wrapper
    constraints?)
  • User-friendly interface for RDF queries (later)
  • External data sources, ontologies, terminologies
  • Harmonization with ODM and MMF
  • XML/RDF objects: results display & browsing
  • Something like EDR UI with link labels & inverse
    refs

34
How can XMDR development help inform Data
Reference Model?
  • Identifying important entities, attributes, and
    relationships for the DRM
  • Suggesting tools & techniques
  • Others?

35
Next Steps & Priorities
  • Load data from EDR & other sources
  • Tribal data as conceptual as well as value
    domains
  • HL7 concepts, value domains, relations
  • See list for other priorities
  • Implement advanced RDF query UI
  • Connect terminology/ontology to items
  • Incorporate Common Logic, Web Services, etc.
  • Ontology Lifecycle Management (OLM)
  • form interface for registration & uploading
    metadata?

36
Other Topics? Extra Slides
37
RDF Queries look very different from text search
or SQL syntax!
Checkbox for with/without inference
QUERY (triple pattern: subject, predicate, object)
  SELECT ?t
  WHERE (<http://erdos.lbl.gov/xmdr/data/VDALL.1.15147.1.xml> rdf:type ?t)
  USING mdr FOR <http://hpcrd.lbl.gov/SDM/XMDR/ont/iso11179-3v2.owl#>
RESULT ?t
  <http://hpcrd.lbl.gov/SDM/XMDR/ont/iso11179-3v2.owl#EnumeratedValueDomain>
  <http://hpcrd.lbl.gov/SDM/XMDR/ont/iso11179-3v2.owl#ValueDomain>
  <http://hpcrd.lbl.gov/SDM/XMDR/ont/iso11179-3v2.owl#AdministeredItem>
  <owl:Thing>
  <rdfs:Resource>
Search results include sub-classes and inverses
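The effect of the inference checkbox on this query can be imitated with a handful of triples. This is a toy sketch in Python, not Jena or RDQL; which type is asserted directly on the item, the subclass triples, and the `#` separator in the ontology URL are assumptions made for illustration.

```python
# Assumed ontology namespace (the '#' separator is a guess) and the queried item.
ONT = "http://hpcrd.lbl.gov/SDM/XMDR/ont/iso11179-3v2.owl#"
ITEM = "http://erdos.lbl.gov/xmdr/data/VDALL.1.15147.1.xml"

# Assumed triples: one asserted type plus part of the class hierarchy.
triples = {
    (ITEM, "rdf:type", ONT + "EnumeratedValueDomain"),
    (ONT + "EnumeratedValueDomain", "rdfs:subClassOf", ONT + "ValueDomain"),
    (ONT + "ValueDomain", "rdfs:subClassOf", ONT + "AdministeredItem"),
}


def types_of(subject, triples, infer):
    """Answer 'SELECT ?t WHERE (subject rdf:type ?t)' with or without inference."""
    direct = {o for s, p, o in triples if s == subject and p == "rdf:type"}
    if not infer:
        return direct
    result = set(direct)
    frontier = list(direct)
    while frontier:
        cls = frontier.pop()
        for s, p, o in triples:
            # Every superclass of an asserted type is also a type of the subject.
            if s == cls and p == "rdfs:subClassOf" and o not in result:
                result.add(o)
                frontier.append(o)
    return result


without_inference = types_of(ITEM, triples, infer=False)
with_inference = types_of(ITEM, triples, infer=True)

print(len(without_inference), len(with_inference))
```

A full reasoner would also add built-in classes such as owl:Thing and rdfs:Resource, as in the slide's result list; the sketch follows only the explicit subclass triples.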
38
Protégé Editor was used to create OWL ontology
for 11179 metamodel
  • insert screenshot from Protégé

39
Some observations about using Unicode
  • RDFX and XML both support Unicode
  • (via 20944 bindings?)
  • Postgres had to be reconfigured for Unicode
  • Chinese, Japanese, and Korean (CJK) use white
    space in different ways than Roman alphabets
  • Lexers and stemmers are available for CJK
  • XMDR other metadata registry implementations
    should all support Unicode!
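The point about white space can be seen with a short Python sketch. Splitting on white space works for English but leaves a Chinese phrase as a single token; overlapping character bigrams are one common workaround, shown here as an assumption rather than as the approach XMDR adopted. The sample strings and function names are illustrative.

```python
def whitespace_tokens(text):
    """Split text on white space, as a simple Roman-alphabet tokenizer would."""
    return text.split()


def bigrams(text):
    """Return overlapping two-character tokens, ignoring white space."""
    chars = [c for c in text if not c.isspace()]
    return ["".join(chars[i:i + 2]) for i in range(len(chars) - 1)]


english = "countries of the world"
chinese = "世界各国名称"  # illustrative phrase with no white space between words

english_tokens = whitespace_tokens(english)
chinese_tokens = whitespace_tokens(chinese)
chinese_bigrams = bigrams(chinese)

print(english_tokens)
print(chinese_tokens)
print(chinese_bigrams)
```

Dictionary-based segmenters give better tokens than bigrams, at the cost of needing language-specific resources.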

40
Example RDF search results?
  • If Paul gets it working!
  • What to show? Query and results?
  • Checkbox for whether to include inference

41
XMDR differences with current 11179 can also be
shown in UML notation
  • Insert UML differences shown in diagrams

Standards discussion Wednesday will compare UML
models for XMDR and current 11179