Controlled vocabularies as Linked Data on the Web

1
Controlled vocabularies as Linked Data on the Web
  • Rebecca Guenther
  • Network Development MARC Standards Office,
  • Library of Congress
  • rgue_at_loc.gov

Linked Data program July 13, 2009
2
Outline of presentation
  • Types of controlled vocabularies
  • Encoding formats for controlled vocabularies
  • What is SKOS?
  • id.loc.gov vocabulary services
  • Example of concept scheme in SKOS: ISO 639-2
  • Next steps

3
Credits
  • Ed Summers, Office of Strategic Initiatives:
    leading developer and creator of LCSH in SKOS
  • Clay Redding, Network Development MARC
    Standards Office, for developing web service and
    work on other controlled vocabularies

4
Why establish controlled vocabularies?
  • Control values that occur in metadata
  • Reduce ambiguity
  • Control synonyms
  • Make documentation available for reuse
  • Test and validate terms
  • Establish formal relationships among values where
    appropriate

5
Types of Controlled Vocabularies used in metadata
standards
  • Lists of enumerated values
  • Code lists (e.g. language, country)
  • Taxonomies
  • Formal Thesauri
  • Locally controlled enumerated lists

6
Enumerated lists
  • Simple list of terms used in a pull-down menu or
    Web site pick list
  • Values enumerated in an XML schema
  • Little additional information or structure about
    each value
  • Examples:
  • Enumerated value "MD5" for METS CHECKSUMTYPE
  • Enumerated value "born digital" in MODS
    digitalOrigin
  • Code and value from a MARC 21 fixed field, e.g.
    code "e" in Leader/06 is cartographic material
7
Code lists
  • Some established as ISO standards and used
    worldwide in many communities for many purposes
  • The standard generally standardizes the code, not
    a particular name for it
  • Codes are used as identifiers
  • Some examples
  • ISO 639 family (language codes)
  • MARC relator codes
  • MARC country codes
  • ISO 3166 country codes

8
Thesauri
  • A thesaurus is a controlled vocabulary with
    multiple types of relationships
  • Example
  • Rice
  • UF Paddy
  • BT Cereals
  • BT Plant products
  • NT Brown rice
  • RT Rice straw
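
The Rice entry above can be held in a small data structure with one field per relationship type. A sketch in Python, assuming a hypothetical ThesaurusEntry class and resolve helper (neither is part of any LC tool):

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ThesaurusEntry:
    """One preferred term and its relationships (hypothetical model)."""
    term: str
    used_for: List[str] = field(default_factory=list)  # UF: non-preferred synonyms
    broader: List[str] = field(default_factory=list)   # BT: broader terms
    narrower: List[str] = field(default_factory=list)  # NT: narrower terms
    related: List[str] = field(default_factory=list)   # RT: related terms


rice = ThesaurusEntry(
    term="Rice",
    used_for=["Paddy"],
    broader=["Cereals", "Plant products"],
    narrower=["Brown rice"],
    related=["Rice straw"],
)


def resolve(label: str, entries: List[ThesaurusEntry]) -> Optional[str]:
    """Map a preferred or non-preferred label to its preferred term."""
    for entry in entries:
        if label == entry.term or label in entry.used_for:
            return entry.term
    return None


print(resolve("Paddy", [rice]))  # prints Rice
```

Resolving "Paddy" to "Rice" is the synonym control that a thesaurus provides.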

9
Standards maintained at LC contain controlled
vocabularies
  • LCSH/NAF
  • Thesaurus of Graphic Materials
  • ISO 639-2 (language codes)
  • MARC (including code lists)
  • MODS
  • METS
  • PREMIS
  • MIX (XML schema for NISO Z39.87 Technical
    metadata for digital still images)
  • and some others

10
Representing information about controlled
vocabulary values
  • Data elements in metadata formats, e.g. MARC
    Authority format
  • XML schemas (sometimes as enumeration values)
  • RDF/XML and RDFS (Resource Description Framework)
  • SKOS
  • MADS (Metadata Authority Description Schema)

11
About SKOS
  • Simple Knowledge Organization System
  • RDF application used to express knowledge
    organization systems such as classifications,
    thesauri, taxonomies, and the concepts within
  • Allows distributed, decentralized management of
    KOS through Linked Data-inspired application.
  • All concepts and schemes require a URI

12
The SKOS data model (Classes)
  • ConceptSchemes (e.g., published vocabularies,
    thesauri, code lists, etc.)
  • Concepts (individual entries or terms within the
    larger vocabulary)
  • Collections (logical groupings of Concepts)

13
SKOS concepts
  • Labeling properties: prefLabel, altLabel,
    hiddenLabel, notation
  • Annotation properties: note, historyNote,
    scopeNote, changeNote, editorialNote, example,
    definition
  • Associative properties: broader, narrower,
    related, broadMatch, narrowMatch, closeMatch,
    exactMatch, minorMatch, majorMatch (match
    properties go to Concepts in external
    ConceptSchemes)
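
The thesaurus relationships from the earlier Rice example are conventionally mapped onto these properties: UF to altLabel, BT to broader, NT to narrower, RT to related. A sketch in Python that emits (subject, property, value) tuples; the function names and the example.org base URI are placeholders, not real identifiers:

```python
# Conventional mapping from thesaurus relationship codes to SKOS properties.
SKOS_PROPERTY = {
    "UF": "skos:altLabel",  # non-preferred label of the same concept
    "BT": "skos:broader",
    "NT": "skos:narrower",
    "RT": "skos:related",
}

BASE = "http://example.org/concept/"  # placeholder, not a real vocabulary


def uri_for(term):
    """Build a placeholder concept URI from a term."""
    return BASE + term.lower().replace(" ", "-")


def to_triples(term, relations):
    """Turn (code, target) pairs for one term into SKOS-style triples."""
    subject = uri_for(term)
    triples = [(subject, "skos:prefLabel", term)]
    for code, target in relations:
        prop = SKOS_PROPERTY[code]
        # Labels stay literal strings; the other relations point at concepts.
        value = target if code == "UF" else uri_for(target)
        triples.append((subject, prop, value))
    return triples


for triple in to_triples("Rice", [("UF", "Paddy"), ("BT", "Cereals"), ("NT", "Brown rice")]):
    print(triple)
```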

14
Advantages to using SKOS
  • SKOS has a defined element set which is
    particularly relevant for controlled vocabularies
  • Relationships between entries in a concept scheme
    can be expressed (broader, narrower, etc.)
  • Relationships between entries in different
    concept schemes can be expressed (exactMatch,
    related)
  • Having a dereferenceable URI for concepts and
    their concept schemes enhances the ability to
    provide web services for consumers of these
    standards

15
Reasons for developing a web service for
vocabularies
  • Facilitate development and maintenance process
    for vocabularies
  • Make controlled lists openly available
  • Provide comprehensive information about
    controlled values
  • Experiment with semantic web technologies and
    linked data
  • Expose vocabularies to wider communities

16
Introducing id.loc.gov
  • Library of Congress Authorities & Vocabularies
    service: http://id.loc.gov
  • Allows both human-oriented and programmatic
    access to LC-promulgated authorities and
    vocabularies.
  • First offering is Library of Congress Subject
    Headings, but more to come, e.g. Thesaurus of
    Graphic Materials, ISO 639-2, MARC code lists,
    etc.

17
Introducing id.loc.gov
  • Offers bulk data downloads in several RDF
    serializations (likely more to come)
  • Goals
  • Convey a clear policy regarding access, usage,
    distribution
  • Provide continuous updates to keep the data sets
    fresh
  • Only serves data values: authority and vocabulary
    data, not bibliographic

18
Some features of id.loc.gov
  • Provides resolvability by assigning RESTful URIs.
    Each vocabulary and data value within it
    possesses a resolvable URI
  • Known-label searches: use when you know the label
    but not the identifier (e.g. LCCN):
    http://id.loc.gov/authorities/label/orchids,
    http://id.loc.gov/authorities/label/orchidaceae
  • Visualizations
  • Default serialization is RDFa XHTML, which can
    be transformed by RDFa tools
  • Influenced by the Linked Data movement:
    implements SKOS, REST, and HTTP content
    negotiation
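
A known-label lookup with content negotiation could be prepared as follows. This is a sketch only: it builds the request without sending it, the URL pattern is the one shown on this slide, and application/rdf+xml is an assumed media type that the service may or may not honor. The name label_request is hypothetical.

```python
from urllib.parse import quote
from urllib.request import Request

LABEL_BASE = "http://id.loc.gov/authorities/label/"  # pattern from the slide


def label_request(label, media_type="application/rdf+xml"):
    """Build (but do not send) a known-label request for a given format."""
    url = LABEL_BASE + quote(label)  # percent-encode spaces and the like
    return Request(url, headers={"Accept": media_type})


req = label_request("orchids")
print(req.full_url)
print(req.get_header("Accept"))
```

Passing the request to urllib.request.urlopen would perform the lookup; the Accept header is how the client asks for one serialization rather than the default.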

19
Example: ISO 639-2 vocabulary
  • One in the family of ISO 639 language coding
    standards
  • Has a close relationship with other language
    coding standards (ISO 639-1 and -3, MARC)
  • LC is maintenance agency
  • The standard is the CODE, not the language name;
    multiple names are given

20
ISO 639-2 language code in SKOS
<rdf:Description rdf:about="http://www.loc.gov/standards/registry/vocabulary/iso639-2/por">
  <rdf:type rdf:resource="http://www.w3.org/2008/05/skos#Concept"/>
  <skos:prefLabel xml:lang="x-notation">por</skos:prefLabel>
  <skos:altLabel xml:lang="en-Latn">Portuguese</skos:altLabel>
  <skos:altLabel xml:lang="fr-Latn">portugais</skos:altLabel>
  <skos:notation rdf:datatype="xs:string">por</skos:notation>
  <skos:definition xml:lang="en-Latn">This Concept has not yet been defined.</skos:definition>
  <skos:inScheme rdf:resource="http://www.loc.gov/standards/registry/vocabulary/iso639-2"/>
  <vs:term_status>stable</vs:term_status>
  <skos:historyNote rdf:datatype="xs:dateTime">2006-07-19T08:41:54.000-05:00</skos:historyNote>
  <skos:exactMatch rdf:resource="http://www.loc.gov/standards/registry/vocabulary/iso639-1/pt"/>
  <skos:exactMatch rdf:resource="http://www.loc.gov/standards/registry/vocabulary/languages/por"/>
  <skos:changeNote rdf:datatype="xs:dateTime">2008-07-09T13:49:05.321-04:00</skos:changeNote>
</rdf:Description>
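
A record like the one on this slide can be read with a standard XML parser. A minimal Python sketch, assuming the fragment is wrapped in an rdf:RDF element that declares its namespaces; the slide shows only the prefixes, so the skos namespace URI here is inferred from the rdf:type value, and labels_by_language is a hypothetical helper:

```python
import xml.etree.ElementTree as ET

# Assumed binding for the skos prefix (inferred, not shown on the slide).
NS = {"skos": "http://www.w3.org/2008/05/skos#"}
# ElementTree exposes xml:lang under the fixed XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

RECORD = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:skos="http://www.w3.org/2008/05/skos#">
  <rdf:Description rdf:about="http://www.loc.gov/standards/registry/vocabulary/iso639-2/por">
    <skos:prefLabel xml:lang="x-notation">por</skos:prefLabel>
    <skos:altLabel xml:lang="en-Latn">Portuguese</skos:altLabel>
    <skos:altLabel xml:lang="fr-Latn">portugais</skos:altLabel>
  </rdf:Description>
</rdf:RDF>"""


def labels_by_language(rdf_xml):
    """Collect skos:altLabel values keyed by their xml:lang tag."""
    root = ET.fromstring(rdf_xml)
    return {
        label.get(XML_LANG): label.text
        for label in root.iterfind(".//skos:altLabel", NS)
    }


print(labels_by_language(RECORD))
```

A dedicated RDF library would handle other serializations as well; plain XML parsing is used here only to keep the sketch dependency-free.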
22
Next steps
  • Advocacy, user feedback etc. for LCSH
  • Implement update mechanism for changes
    received from LC CDS
  • Expand system to allow more vocabularies and
    Linked Data relationships
  • Name authorities

23
Next steps
  • MADS OWL Schema to enable identification of
    facets within name and subject authorities, e.g.
    Aeronautics--Soviet Union--History
  • Develop mechanisms to output all public
    documentation for controlled vocabularies;
    already working for ISO 639-5 (master data is
    SKOS)
  • http://www.loc.gov/standards/iso639-5/