Title: XMDR Prototype Progress Report
- John McCarthy and Kevin D. Keck
- XMDR Project Quarterly Meeting
- 15 March, 2005
- UC Berkeley Faculty Club
2. Purposes of XMDR Prototype for ISO/IEC 11179 Registry Standard
- Extend semantics management capabilities
- Explore uses of terminologies and ontologies
- Systematize representation of relationships
- Adapt & test emerging semantic technologies
- Help resolve registration & harmonization issues for different metadata standards
- Propose revisions to 11179 Parts 2 & 3 (Ver. 3)
- Show how proposed revisions to metadata registry standards can be implemented
- Demonstrate Reference Implementation (RI)
3. XMDR Prototype Architecture: Initial Implemented Modules
Diagram labels: Subversion; Java; Jena, Xerces; Lucene; Jena, OWI KS
4. XML files for each metadata object were extracted from EDR via scripts
- Extract script: works on the underlying HTML, follows each link, and gets the HTML file for each linked object
- Conversion script: works on the HTML files and creates an XML file for each
- First step: all xx items linked from Countries of the World
- Next step: bulk load all metadata item instances from EDR
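The conversion step can be sketched with Python's standard library. This is a minimal sketch, not the project's actual scripts, and it rests on loud assumptions: the page layout (one attribute per table row, label in a th cell, value in a td cell), the label-to-tag naming rule, and the file name VD-Countries.html are all hypothetical. The parser also records every link, which is what the extract script would follow.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET


class ItemPageParser(HTMLParser):
    """Collects (label, value) pairs from <th>/<td> rows plus every link href.

    Assumes a hypothetical layout: one attribute per <tr>, label in <th>,
    value in <td>. Real EDR pages may be structured differently.
    """

    def __init__(self):
        super().__init__()
        self.pairs = []    # (label, value) in document order
        self.links = []    # hrefs the extract script would follow
        self._cell = None  # "th" or "td" while inside a cell
        self._text = []
        self._label = None

    def handle_starttag(self, tag, attrs):
        if tag in ("th", "td"):
            self._cell, self._text = tag, []
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_data(self, data):
        if self._cell:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag != self._cell:
            return
        text = " ".join("".join(self._text).split())  # collapse whitespace
        if tag == "th":
            self._label = text
        elif self._label is not None:
            self.pairs.append((self._label, text))
            self._label = None
        self._cell = None


def to_tag(label):
    """Hypothetical naming rule: 'Registration Status' -> 'registrationStatus'."""
    words = label.split()
    return words[0].lower() + "".join(w.capitalize() for w in words[1:])


def html_to_xml(html, root_tag):
    """Convert one item page to an XML element; also return the links found."""
    parser = ItemPageParser()
    parser.feed(html)
    parser.close()
    root = ET.Element(root_tag)
    for label, value in parser.pairs:
        ET.SubElement(root, to_tag(label)).text = value
    return root, parser.links


page = """<table>
<tr><th>Name</th><td>Country Name</td></tr>
<tr><th>Registration Status</th><td>Standard</td></tr>
<tr><th>Value Domain</th><td><a href="VD-Countries.html">Countries of the World</a></td></tr>
</table>"""

root, links = html_to_xml(page, "DataElement")
print(ET.tostring(root, encoding="unicode"))
print(links)
```

Running the extract step over the returned links, and then this conversion over each fetched page, gives one XML file per metadata object.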
5. HTML to XML file conversion example
- Show HTML file and then its XML transform
6. XMDR Prototype contains an XML file for each metadata item instance
- Administered Items
- Classification Schemes
- Conceptual Domains
- Contexts (for Administered Items)
- Data Elements
- Data Element Concepts
- Object Classes
- Properties
- Representation Classes
- Value Domains
- Other items & relationships? What else?
7. XMDR Prototype example: dual-purpose RDF/XML file DC-Country_Label (extract)
<DataElement rdf:about=""
    xmlns:mdr="http://hpcrd.lbl.gov/SDM/XMDR/ont/iso11179-3v2.owl#"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    mdr:meaning="DC-Country_Label.xml"
    mdr:registrationAuthority="http://oaspub.epa.gov/edr">
  <identifier>
    <registrationAuthorityIdentifier>
      <internationalCodeDesignator>en-US</internationalCodeDesignator>
      <organizationIdentifier>EPA</organizationIdentifier>
    </registrationAuthorityIdentifier>
  </identifier>
  <administrationRecord>
    <registrationStatus>Standard</registrationStatus>
    <administrativeStatus>Final</administrativeStatus>
    <creationDate>????</creationDate>
  </administrationRecord>
  <steward mdr:organization="ORG-1044"
      mdr:contact="CON-20068"/>
  <terminologicalEntry mdr:entryContext="CXT-Standard">
    <component>
      <sectionLanguage>
        <language>eng</language>
        <countryIdentifier>USA</countryIdentifier>
      </sectionLanguage>
      <designation>
        <name>Country Name</name>
      </designation>
      <definition>
        <text>The name that represents a primary geopolitical unit of the world.</text>
      </definition>
    </component>
  </terminologicalEntry>
  <terminologicalEntry mdr:entryContext="CXT-XML_Tag_Final">
    <component>
      <sectionLanguage>
        <language>eng</language>
        <countryIdentifier>USA</countryIdentifier>
      </sectionLanguage>
      <designation>
        <name>CountryName</name>
      </designation>
    </component>
  </terminologicalEntry>
</DataElement>
boil down contents and add annotations
8. XMDR files serve a dual purpose: XML and OWL-compatible RDF
- Well-formed XML
- XML serialization of RDF
- Conforms with the 11179 OWL ontology
- Base tag includes an rdf:about attribute
- Literals encoded as element content
- URIs encoded as attribute values
- Striped syntax (resource, property, resource), using the abbreviated form for anonymous nodes
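The conventions above can be illustrated with a minimal Python sketch that writes one item as plain XML and then reads the same text back as RDF triples. Assumptions, labeled loudly: the mdr namespace is taken to end in "#", the subject URI is a made-up example.org address, and URI values are placed in rdf:resource attributes, because standard RDF/XML reads an ordinary property attribute on the base tag as a literal. The reader handles only this flat, one-level striped form; it is not a general RDF/XML parser.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
# Assumed: the 11179 ontology namespace URI ends in '#'.
MDR = "http://hpcrd.lbl.gov/SDM/XMDR/ont/iso11179-3v2.owl#"
ET.register_namespace("rdf", RDF)
ET.register_namespace("mdr", MDR)


def q(ns, local):
    """Clark notation used by ElementTree: {namespace}local."""
    return f"{{{ns}}}{local}"


def build_item(cls, about, literals, links):
    """Base tag carries rdf:about; literals become element content;
    URIs become rdf:resource attribute values on empty property elements."""
    root = ET.Element(q(MDR, cls), {q(RDF, "about"): about})
    for prop, text in literals:
        ET.SubElement(root, q(MDR, prop)).text = text
    for prop, uri in links:
        ET.SubElement(root, q(MDR, prop), {q(RDF, "resource"): uri})
    return root


def full_uri(clark):
    """'{ns}local' -> 'nslocal' (works because both namespaces end in '#')."""
    ns, local = clark[1:].split("}")
    return ns + local


def to_triples(root):
    """Read the same XML back as RDF triples (handles only this flat form)."""
    subject = root.get(q(RDF, "about"))
    triples = {(subject, RDF + "type", full_uri(root.tag))}
    for child in root:
        obj = child.get(q(RDF, "resource"), child.text)
        triples.add((subject, full_uri(child.tag), obj))
    return triples


item = build_item(
    "DataElement",
    "http://example.org/xmdr/DC-Country_Label.xml",  # hypothetical URI
    literals=[("name", "Country Name")],
    links=[("registrationAuthority", "http://oaspub.epa.gov/edr")],
)
text = ET.tostring(item, encoding="unicode")  # plain, well-formed XML
print(text)
for t in sorted(to_triples(ET.fromstring(text))):
    print(t)
```

The same string serves both readers: an XML tool sees ordinary elements, while the triple reader recovers a type triple, a literal triple, and a URI-valued triple.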
9. XML schema specifies constraints
- RELAX NG schema used to create the XML files
- Enforces the constraints of the 11179 OWL ontology
10. Relationships are implemented as LINKS to other XML files
<entity-type>DataElement</entity-type>
<name>Country of Birth</name>
<conceptual-domain>Country</conceptual-domain>
<value-domain>Countries of the World</value-domain>
Metadata schema includes relationships that specify which attributes can or must link to other entity-types
<entity-type>ConceptualDomain</entity-type>
<name>Country</name>
<entity-type>ValueDomain</entity-type>
<name>Countries of the World</name>
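The rule that the metadata schema says which attributes can or must link to which entity-types can be sketched as a link check over parsed records. Everything here is hypothetical: the LINK_SCHEMA table, the file names, and the representation of each XML file as a Python dict. The point is only that a link is the name of another file, and that the target's entity-type is checked against the schema.

```python
# Hypothetical link schema: (source entity-type, attribute) -> (target entity-type, required?)
LINK_SCHEMA = {
    ("DataElement", "conceptual-domain"): ("ConceptualDomain", True),
    ("DataElement", "value-domain"): ("ValueDomain", True),
}


def check_links(files, schema=LINK_SCHEMA):
    """files maps a file name to its parsed record (a dict of attribute -> value).

    A link attribute holds the name of another file. Returns a list of problems;
    an empty list means every link resolves to a file of the required entity-type.
    """
    problems = []
    for fname, rec in sorted(files.items()):
        for (src, attr), (target, required) in schema.items():
            if rec["entity-type"] != src:
                continue
            ref = rec.get(attr)
            if ref is None:
                if required:
                    problems.append(f"{fname}: missing required link '{attr}'")
            elif ref not in files:
                problems.append(f"{fname}: '{attr}' points to unknown file {ref}")
            elif files[ref]["entity-type"] != target:
                problems.append(
                    f"{fname}: '{attr}' must link to a {target}, "
                    f"not a {files[ref]['entity-type']}"
                )
    return problems


# Hypothetical file names for the three records above.
files = {
    "DE-CountryOfBirth.xml": {
        "entity-type": "DataElement",
        "name": "Country of Birth",
        "conceptual-domain": "CD-Country.xml",
        "value-domain": "VD-CountriesOfTheWorld.xml",
    },
    "CD-Country.xml": {"entity-type": "ConceptualDomain", "name": "Country"},
    "VD-CountriesOfTheWorld.xml": {"entity-type": "ValueDomain", "name": "Countries of the World"},
}
print(check_links(files))
```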
11. How can terminologies and ontologies help manage metadata?
- At the level of metadata instances in a registry, connect metadata entities via shared terms
  - via automatic indexing of metadata words
  - via text values from specific metadata elements
- At the level of the 11179 (or other) metamodel, ontologies can help specify formal relationships
  - is-a and part-of hierarchies, etc.
  - inheritance, aggregation
  - for automatic searching of sub-classes & inverses
  - to specify semantic pathways for indexing
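Connecting instances through shared terms via automatic indexing can be sketched in a few lines. This is a minimal sketch, assuming a tiny hypothetical stop list and item ids, and indexing only each item's name. Because it does no stemming, "Countries" is not linked to "Country"; a stemming analyzer would add that connection.

```python
import re
from collections import defaultdict

STOPWORDS = {"a", "an", "of", "the"}  # assumed minimal stop list


def terms(text):
    """Lower-cased words of a metadata text value, minus stop words."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}


def shared_term_links(items):
    """items maps an item id to its indexed text (e.g. its name).

    Returns {term: sorted ids} for every term that appears in two or more items;
    each such term is an automatically discovered connection between items.
    """
    index = defaultdict(set)
    for item_id, text in items.items():
        for term in terms(text):
            index[term].add(item_id)
    return {t: sorted(ids) for t, ids in sorted(index.items()) if len(ids) > 1}


items = {
    "DE-CountryOfBirth": "Country of Birth",
    "CD-Country": "Country",
    "VD-CountriesOfTheWorld": "Countries of the World",
}
print(shared_term_links(items))
```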
12. Protégé Editor was used to create an OWL ontology for the 11179 metamodel
- insert screenshot from Protégé
13. Protégé OWLViz Plug-in Displays OWL Class Hierarchy for 11179
Built on top of GraphViz
14. Class hierarchy includes simple and union is-a relationships
15. Is-a and union hierarchies can nest to arbitrary depth in OWL
16. ISO/IEC 11179 fragment is expressed as an OWL ontology using RDF syntax
<?xml version="1.0" encoding="ISO-8859-1"?>
<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
    xmlns:owl="http://www.w3.org/2002/07/owl#"
    xmlns="http://www.owl-ontologies.com/unnamed.owl#"
    xml:base="http://www.owl-ontologies.com/unnamed.owl">
  <owl:Ontology rdf:about=""/>
  <owl:Class rdf:ID="Registrar">
    <rdfs:subClassOf rdf:resource="http://www.w3.org/2002/07/owl#Thing"/>
    <rdfs:subClassOf>
      <owl:Restriction>
        <owl:cardinality rdf:datatype="http://www.w3.org/2001/XMLSchema#int">1</owl:cardinality>
        <owl:onProperty>
          <owl:ObjectProperty rdf:ID="contact"/>
        </owl:onProperty>
      </owl:Restriction>
    </rdfs:subClassOf>
    <rdfs:subClassOf>
      <owl:Restriction>
      ...
17. Lucene facilitates text indexing to search on words & phrases
Word Index: birth, country, world
Name Index: birth, country
Other Indexes
Phrase searches done on results
<entity-type>Terminology</entity-type>
<name>United Nations XXXX</name>
<entity-type>ValueDomain</entity-type>
<name>Countries of the World</name>
<entity-type>ConceptualDomain</entity-type>
<name>Country</name>
<entity-type>DataElement</entity-type>
<name>Country of Birth</name>
<conceptual-domain>Country</conceptual-domain>
<value-domain>Countries of the World</value-domain>
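The idea of separate word and name indexes, with phrase checks done only on the results of word lookups, can be sketched as a toy positional inverted index. This is not Lucene and not its API; the "word" field (all of a record's text joined together) and the record ids are assumptions for illustration. Because fields are concatenated, a phrase could in principle match across a field boundary in the "word" index.

```python
import re
from collections import defaultdict


def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())


class FieldIndex:
    """Toy positional inverted index, one per field: field -> term -> {doc: [positions]}.

    Every record is indexed under each of its own fields (e.g. a name index) and
    under a combined "word" field holding all of its text joined by spaces.
    """

    def __init__(self):
        self.postings = defaultdict(lambda: defaultdict(dict))

    def add(self, doc_id, record):
        fields = dict(record, word=" ".join(record.values()))
        for field, text in fields.items():
            for pos, term in enumerate(tokenize(text)):
                self.postings[field][term].setdefault(doc_id, []).append(pos)

    def docs(self, field, term):
        return set(self.postings[field].get(term, {}))

    def phrase(self, field, text):
        """Word lookups first; positions are checked only on those results."""
        words = tokenize(text)
        if not words:
            return set()
        hits = set()
        for doc in set.intersection(*(self.docs(field, w) for w in words)):
            plist = [self.postings[field][w][doc] for w in words]
            if any(all(start + i in plist[i] for i in range(len(words)))
                   for start in plist[0]):
                hits.add(doc)
        return hits


idx = FieldIndex()
idx.add("VD", {"entity-type": "ValueDomain", "name": "Countries of the World"})
idx.add("CD", {"entity-type": "ConceptualDomain", "name": "Country"})
idx.add("DE", {"entity-type": "DataElement", "name": "Country of Birth",
               "conceptual-domain": "Country", "value-domain": "Countries of the World"})
print(sorted(idx.docs("name", "country")))
print(sorted(idx.phrase("word", "countries of the world")))
```

Searching the name index for the phrase finds only the value domain itself, while the word index also finds the data element that links to it.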
18. Lucene text search capabilities, with examples
- Simple word search: country
- Wild-card word search: coun*
- Fuzzy or stem search (~): country~
- Search specified field: name:coun*
- Search for links: link:http)
- Distance (in words): "country world"~4
- Phrases: "countries of the world"
- Boolean operators: AND, "+", OR, NOT, "-"
Syntax reference: http://lucene.apache.org/java/docs/queryparsersyntax.html
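To make the wild-card, fuzzy, and distance examples concrete, here is a toy matcher in Python. It is a simplified stand-in, not Lucene's implementation: Lucene's fuzzy query uses a similarity threshold rather than the fixed edit count assumed here (max_edits=2 is an arbitrary choice), and its phrase "slop" counts position moves, whereas within() below just checks that two words occur at most n words apart.

```python
import re


def wildcard_match(pattern, term):
    """Lucene-style wildcards: '*' = any run of characters, '?' = one character."""
    regex = "".join(
        ".*" if c == "*" else "." if c == "?" else re.escape(c) for c in pattern
    )
    return re.fullmatch(regex, term) is not None


def edit_distance(a, b):
    """Levenshtein distance (insert, delete, substitute each cost 1)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def fuzzy_match(query, term, max_edits=2):
    """Simplified stand-in for 'country~': accept terms within max_edits edits."""
    return edit_distance(query, term) <= max_edits


def within(tokens, a, b, n):
    """Simplified stand-in for '"a b"~n': a and b occur at most n words apart."""
    pos_a = [i for i, t in enumerate(tokens) if t == a]
    pos_b = [i for i, t in enumerate(tokens) if t == b]
    return any(abs(i - j) <= n for i in pos_a for j in pos_b)


vocabulary = ["country", "county", "countries", "account"]
print([t for t in vocabulary if wildcard_match("coun*", t)])
print([t for t in vocabulary if fuzzy_match("contry", t)])
print(within("country of the world".split(), "country", "world", 4))
```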
19. More text search capabilities
- Range search
- Boosting a term
- Grouping
- Field grouping
- Escape special characters
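The five capabilities above correspond to pieces of Lucene query syntax, following the query syntax page cited for slide 18. The helpers below are a minimal sketch that only builds query strings; the field names creationDate and name and the term "nation" are hypothetical examples, and the special-character set is the one listed on that syntax page, escaped one character at a time.

```python
# Characters the Lucene query parser treats as syntax (&& and || escaped per character).
LUCENE_SPECIAL = set('+-&|!(){}[]^"~*?:\\')


def escape(text):
    """Escape special characters: prefix each one with a backslash."""
    return "".join("\\" + c if c in LUCENE_SPECIAL else c for c in text)


def range_query(field, low, high, inclusive=True):
    """Range search: [low TO high] includes the bounds, {low TO high} excludes them."""
    left, right = ("[", "]") if inclusive else ("{", "}")
    return f"{field}:{left}{low} TO {high}{right}"


def boost(term, factor):
    """Boosting a term: weighs matches on this term more heavily in ranking."""
    return f"{term}^{factor}"


def group(*clauses, op="OR"):
    """Grouping: parentheses control how Boolean operators combine."""
    return "(" + f" {op} ".join(clauses) + ")"


def field_group(field, *clauses):
    """Field grouping: apply one field to several clauses at once."""
    return f"{field}:({' '.join(clauses)})"


query = " AND ".join([
    group(boost("country", 4), "nation"),
    field_group("name", "+birth", '+"country of"'),
    range_query("creationDate", "20040101", "20050315"),
])
print(query)
print(escape("Country (ISO 3166-1)"))
```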
20. Advanced search interface can automatically generate syntax
Form columns: Field/Column; Operator; Value; Conjunctions
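How a form with Field/Column, Operator, Value, and Conjunction columns could generate query syntax is sketched below. The operator menu and its mapping to Lucene syntax are hypothetical, as are the field names; the sketch only shows that each form row becomes one field-qualified clause and that rows are joined by the chosen conjunction.

```python
def quote(value):
    """Phrase value: wrap in double quotes, escaping backslashes and quotes."""
    return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'


# Hypothetical operator menu for the form; each maps a typed value to Lucene syntax.
OPERATORS = {
    "contains": lambda v: f"({v})" if " " in v else v,
    "starts with": lambda v: v + "*",
    "similar to": lambda v: v + "~",
    "exact phrase": quote,
}


def build_query(rows):
    """rows: (conjunction, field, operator, value) per form line.

    The first row's conjunction is ignored; later rows are joined with
    AND, OR or NOT, which the Lucene query parser accepts between clauses.
    """
    parts = []
    for i, (conj, field, op, value) in enumerate(rows):
        if i > 0:
            if conj not in ("AND", "OR", "NOT"):
                raise ValueError(f"unknown conjunction: {conj}")
            parts.append(conj)
        parts.append(f"{field}:{OPERATORS[op](value)}")
    return " ".join(parts)


rows = [
    ("", "name", "starts with", "coun"),
    ("AND", "definition", "exact phrase", "geopolitical unit"),
    ("NOT", "name", "contains", "birth"),
]
print(build_query(rows))
```

Generating the string from menus keeps users from having to learn the colon, wildcard, and quoting rules themselves.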
21. Show Text Queries here
- show Text Query examples here
- Queries and results
22. Distinguish text vs. RDF queries
- What text queries do
- What RDF queries do
- What are the major differences?
23. Reasoners use OWL ontologies to augment RDF graph queries
Diagram labels:
- RDF Query (RDQL/ndql)
- Reasoner: Jena or Racer (memory)
- Result set includes subclasses, inverses, etc.
- OWL 11179 Ontology
- OWL built-in rules
- 11179 metadata (XML/RDF files)
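Why a reasoner's result set contains inverses (and other entailed triples) can be shown with a naive forward-chaining sketch over two OWL rules, owl:inverseOf and owl:TransitiveProperty. This is a toy, not how Jena or Racer work internally; the property names hasValueDomain, valueDomainOf, and partOf and all resource ids are hypothetical.

```python
def materialize(triples, inverse_of, transitive):
    """Naive forward chaining until no new triple appears (a fixpoint).

    inverse_of: {p: q} for "p owl:inverseOf q" (applied in both directions).
    transitive: properties declared owl:TransitiveProperty.
    """
    inverse = dict(inverse_of)
    inverse.update({q: p for p, q in inverse_of.items()})
    facts = set(triples)
    while True:
        new = set()
        for s, p, o in facts:
            if p in inverse:
                new.add((o, inverse[p], s))
            if p in transitive:
                new.update((s, p, o2) for s2, p2, o2 in facts if p2 == p and s2 == o)
        new -= facts
        if not new:
            return facts
        facts |= new


# Hypothetical asserted triples and property declarations.
asserted = {
    ("DE-CountryOfBirth", "hasValueDomain", "VD-CountriesOfTheWorld"),
    ("Item-1", "partOf", "Scheme-1"),
    ("Scheme-1", "partOf", "Scheme-2"),
}
inferred = materialize(asserted, {"hasValueDomain": "valueDomainOf"}, {"partOf"})
for t in sorted(inferred - asserted):
    print(t)
```

A query for everything that is valueDomainOf some data element now matches, even though no file states that triple directly.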
24. Uses of RDF queries
- Expand queries to other classes in a hierarchy
  - E.g., all data elements that have to do with infectious diseases
  - E.g., what attributes are there for images of carcinomas vs. basal cell
- Go from concepts back to data elements (DEs), even when the DEs contain only concepts (do index)
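Expanding a query down a hierarchy, as in the infectious-disease example, amounts to collecting a concept and all of its narrower concepts and then returning the data elements annotated with any of them. The hierarchy, concept names, and data element ids below are hypothetical; in the prototype such a hierarchy would come from a terminology or ontology.

```python
from collections import deque


def descendants(concept, narrower):
    """The concept itself plus every concept reachable via 'narrower' links (BFS)."""
    seen, queue = {concept}, deque([concept])
    while queue:
        for child in narrower.get(queue.popleft(), ()):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


def expand_query(concept, narrower, de_concepts):
    """Data elements annotated with the concept or any narrower concept."""
    targets = descendants(concept, narrower)
    return sorted(de for de, concepts in de_concepts.items() if concepts & targets)


# Hypothetical concept hierarchy and data element annotations.
NARROWER = {
    "InfectiousDisease": ["Tuberculosis", "Influenza"],
    "Influenza": ["AvianInfluenza"],
}
DE_CONCEPTS = {
    "DE-TBCaseCount": {"Tuberculosis"},
    "DE-FluOnsetDate": {"AvianInfluenza"},
    "DE-CountryOfBirth": {"Country"},
}
print(expand_query("InfectiousDisease", NARROWER, DE_CONCEPTS))
```

Neither matching data element mentions "infectious disease" in its own text, which is why a plain word search would miss them.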
25. RDF Queries are very different from text search or SQL!
QUERY:
SELECT ?t
WHERE (<http://erdos.lbl.gov/xmdr/data/VDALL.1.15147.1.xml> rdf:type ?t)
USING mdr FOR <http://hpcrd.lbl.gov/SDM/XMDR/ont/iso11179-3v2.owl#>
RESULT:
?t
<http://hpcrd.lbl.gov/SDM/XMDR/ont/iso11179-3v2.owl#EnumeratedValueDomain>
<http://hpcrd.lbl.gov/SDM/XMDR/ont/iso11179-3v2.owl#ValueDomain>
<http://hpcrd.lbl.gov/SDM/XMDR/ont/iso11179-3v2.owl#AdministeredItem>
<owl:Thing>
<rdfs:Resource>
Search includes sub-classes and inverses
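The five types in that result follow from subclass reasoning. The sketch below reproduces them under stated assumptions: the value domain file asserts only rdf:type EnumeratedValueDomain, the class hierarchy fragment is read off the result itself, and the mdr namespace ends in "#". It applies the RDFS rule that an instance of a class is an instance of each superclass, plus the rule that every resource has type rdfs:Resource.

```python
MDR = "http://hpcrd.lbl.gov/SDM/XMDR/ont/iso11179-3v2.owl#"

# Assumed fragment of the 11179 class hierarchy, read off the query result.
SUBCLASS_OF = {
    MDR + "EnumeratedValueDomain": [MDR + "ValueDomain"],
    MDR + "ValueDomain": [MDR + "AdministeredItem"],
    MDR + "AdministeredItem": ["owl:Thing"],
}


def inferred_types(asserted, subclass_of):
    """Every class reachable from the asserted types via rdfs:subClassOf,
    plus rdfs:Resource, of which every RDF resource is an instance."""
    types, stack = set(), list(asserted)
    while stack:
        cls = stack.pop()
        if cls not in types:
            types.add(cls)
            stack.extend(subclass_of.get(cls, ()))
    types.add("rdfs:Resource")
    return types


# Assumption: the file itself only asserts rdf:type EnumeratedValueDomain.
for t in sorted(inferred_types({MDR + "EnumeratedValueDomain"}, SUBCLASS_OF)):
    print(t)
```

Inference only walks upward: an item asserted to be a plain ValueDomain is not reported as an EnumeratedValueDomain.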
26. Registering/Loading Metadata
- Browse and upload file(s)
- Form interface
27. Lessons Learned to Date
- XML-RDF files can serve a dual purpose (may be the first time this has been done?)
- Independent modules facilitate testing many possible new tools, BUT take more time to implement & maintain
- State of the art for open-source OWL reasoners is not very advanced (none yet for OWL-DL)
28. Unresolved Issues
- Performance using files for objects
- Relationship representation
- XML objects: display & browsing
- User-friendly interface for RDF queries
- External data sources, ontologies, terminologies (maybe via indexing?)
- Harmonization with MMR
- Others?
29. Next Steps
- Load data from EDR & other sources
- Implement advanced text search UI
- Use style sheets to display XML as HTML
- Add validation and mapping modules
- Connect terminology/ontology to items