Title: Metadata for Networked Resources


1
Metadata for Networked Resources
  • Welcome

2
DESIRE Project, September 20, 1999, Bristol, UK
Metadata for Networked Resources
Introduction to Metadata
3
What is the Problem?
  • 3.6 million Web sites
  • Five hundred million or more addressable pages on
    the Web
  • High consumer expectations conflicting with
    primitive tools and mechanisms
  • Uncertain quality, integrity, trust

4
A Critical Perspective on the Information
Landscape of the Web
  • The Web changes relationships among
  • authors
  • publishers
  • information intermediaries and distributors
  • users
  • Disseminating short-lived or dynamic resources is
    greatly simplified
  • Providing access to resources is more difficult

5
The Web as an Information System
  • Lower barriers to publication
  • rapid dissemination of information and ideas
  • less advantage to size or centralization
  • greatly expanded access
  • Manageability is reduced
  • resource discovery is chaotic
  • organization is haphazard
  • preservation is almost non-existent

6
The Web is missing much of what we associate
with a library...
  • Search systems are motivated by advertising
  • Index coverage is unpredictable and limited (1/3)
  • Too much recall, too little precision
  • Index spam abounds
  • Resources (and their names) are volatile
  • What about versions, editions, back issues?
  • Archiving is presently unsolved
  • Authority and quality of service are spotty
  • Managing Access Rights is hard

7
Metadata: Enabling higher quality information
services on the Web
  • Structured data about data
  • helps to impose order on chaos
  • enables automated discovery/manipulation
  • Many dimensions
  • richness
  • functionality
  • discipline
  • language/culture

8
Metadata takes many forms
[Diagram: forms of metadata. Labels: resource, document, rights, discovery, administration, management, content, security and authentication, archival, rating, status, products and services, database, process control, schemas, description]
9
Metadata Challenges
  • Accommodate multiple varieties of metadata
  • Tension between functionality and simplicity
  • Tension between extensibility and interoperability
  • Human and machine creation and use
  • Community-specific functionality, creation,
    administration, access

10
Warwick Framework: Containing Chaos
  • Conceptual Architecture for metadata from the
    Warwick Metadata Workshop (DC-2)
  • Conceptual architecture to support the
    specification, collection, encoding, and exchange
    of modular metadata
  • Provide context for metadata efforts (including
    Dublin Core)
  • avoids the black-hole of comprehensive element
    sets
  • encourages decentralized, community-based
    solutions

11
Modularization allows Distributed Management
  • Communities of expertise (not software vendors)
    are responsible for
  • Semantics
  • Registration
  • Administration
  • Access management
  • Authority of data
  • Sharing and Distribution

12
Modularization and Distribution Present New
Challenges
  • Preservation
  • Reliability
  • Integrity
  • Semantic Interaction

13
Resource Description Communities
A resource description community is characterized
by common semantic, structural, and syntactic
conventions for exchange of resource description
information
Libraries
MARC
AACR2
14
The Internet Commons embraces many Resource
Description Communities
15
Interoperability requires conventions about
  • Semantics
  • The meaning of the elements
  • Structure
  • human-readable
  • machine-parseable
  • Syntax
  • grammars to convey semantics and structure

16
Dublin Core Metadata
  • How to improve resource discovery on the Web?
  • simple resource description semantics
  • Build an interdisciplinary consensus about a core
    element set for resource discovery
  • simple and intuitive
  • cross-disciplinary
  • international
  • flexible

17
Dublin Core Workshop Series and Related Events
  • Chicago WWW Conference Oct, 1994
  • OCLC/NCSA Metadata Workshop Mar, 1995
  • OCLC/UKOLN Warwick Workshop Apr, 1996
  • W3C Indexing and Searching Workshop May, 1996
  • CNI/OCLC Image Metadata Workshop Sep, 1996
  • DC-4, Canberra, Australia Mar, 1997
  • DC-5, Helsinki, Finland Oct, 1997
  • DC-6, Washington, D.C. Nov, 1998
  • DC-7, Frankfurt, Germany Oct, 1999

18
The Dublin Core Metadata Element Set
  • Title
  • Author/Creator
  • Subject/Keywords
  • Description
  • Publisher
  • Other Contributor
  • Date
  • Resource Type
  • Format
  • Resource Identifier
  • Source
  • Language
  • Relation
  • Coverage
  • Rights Management

19
Central Characteristics of the
Dublin Core Metadata Element Set
  • Descriptive metadata for resource discovery
  • All elements optional
  • constraints are established at application level,
    not by the semantic specification
  • All elements repeatable
  • Extensible (a starting place for richer
    description)
  • Interdisciplinary (semantic interoperability)
  • International (21 languages, 4 continents)

20
A Maintenance Agency for the Dublin Core?
  • International consensus is the primary asset
  • Dublin Core Directorate
  • DC Policy Advisory Committee
  • Provide avenue of communication among major
    international stakeholders
  • DC Technical Advisory Committee
  • Working Group leaders

21
A Maintenance Agency for the Dublin Core
Initiative
Dublin Core Web Site
Dublin Core Directorate
DC Policy Advisory Committee
Stakeholder Communities
DC-General Dublin Core Mail Server
22
Dublin Core Working Groups (http://www.mailbase.ac.uk)
  • DC-General
  • DC-Data Model
  • DC-Internationalization
  • DC-Implementors
  • DC-Guides
  • DC-Standards
  • DC-Citation
  • DC-One2one
  • DC-Agents
  • DC-Coverage
  • DC-Date
  • DC-Format
  • DC-Relation
  • DC-SubDesc
  • DC-Title
  • DC-Type

23
Steps Toward Standardization
  • IETF informational RFCs of Dublin Core semantics
    and syntax
  • RFC 2413
  • IETF Informational Draft on DC in HTML
  • NISO standardization initiated
  • CEN standardization initiated
  • ISO standardization under discussion
  • The challenge is to establish a common path for
    disparate standards processes

24
What will be standardized?
  • Dublin Core Element Set 1.1 will be submitted
    for NISO and CEN standardization at the same time
  • Element Working Groups have reviewed and finalized
    element definitions
  • Format of element definitions brought into line
    with the ISO 11179 standard for expression of
    element semantics

25
Relationships to other Metadata Initiatives
  • MARC/AACR2
  • Z39.50
  • INDECS Project
  • IMS

26
MARC/AACR2
  • DC is strongly influenced by MARC/AACR2
  • Important differences in structure, detail, and
    focus
  • Substantial effort invested in cross walks
  • LC MARC Standards Office
  • Nordic Metadata Project
  • Australian Metadata Initiatives at NLA, DSTC
  • CORC project at OCLC
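
A crosswalk of the kind listed above can be pictured as a lookup table from MARC tags to Dublin Core elements. The sketch below is only an illustration: the five tag-to-element pairs are a simplified subset chosen for this example (not any project's published crosswalk), and the record layout as a list of (tag, value) pairs is an assumption.

```python
# Illustrative, simplified MARC-to-Dublin-Core mapping (assumed for this
# sketch; real crosswalks also deal with subfields and indicators).
MARC_TO_DC = {
    "100": "Creator",      # main entry, personal name
    "245": "Title",        # title statement
    "260": "Publisher",    # publication information
    "520": "Description",  # summary note
    "650": "Subject",      # topical subject heading
}


def crosswalk(marc_fields):
    """Map (tag, value) pairs to a dict of repeatable DC elements."""
    dc = {}
    for tag, value in marc_fields:
        element = MARC_TO_DC.get(tag)
        if element is None:
            continue  # MARC detail with no mapped DC element is dropped
        dc.setdefault(element, []).append(value)
    return dc


record = [
    ("100", "Smith, John"),
    ("245", "Metadata for networked resources"),
    ("650", "Metadata"),
    ("650", "Resource discovery"),
    ("300", "107 slides"),  # physical description: not mapped in this sketch
]
dc_record = crosswalk(record)
```

The dropped 300 field shows the "differences in structure, detail, and focus": the richer record loses information when it is reduced to the simpler element set.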

27
Z39.50 and Dublin Core
  • Dublin Core is the proposed Cross Domain
    attribute set
  • Creator/Contributor/Publisher are collapsed into
    a single abstract attribute (Name)
  • http://www.oclc.org/levan/docs/crossdomainattributeset.html

28
INDECS project
  • INDECS: Interoperability of Data in E-Commerce
    Systems
  • Rights Management Metadata: identification of
    common functional requirements for managing IP on
    the Internet
  • Substantial overlap with Resource Discovery
  • Data model based on IFLA FRBR model
  • http://www.indecs.org/

29
IMS
  • Instructional Management System
  • Extended semantics to support description of
    educational materials
  • Core semantics based on Dublin Core

30
DC Implementation Projects
  • 100 major implementation projects in 20
    countries
  • Government Information
  • Australian Government Locator Service
  • Danish Online Government Information
  • Finnish Online Government Information

31
Projects (continued)
  • Science and Mathematics
  • Environment Australia
  • Australian Geodynamics Cooperative Research
    Centre (AGCRC)
  • EULER (European Libraries and Electronic
    Resources in Mathematical Sciences)
  • Swedish EnviroNet
  • German Mathematical Society Preprint Project

32
Projects (continued)
  • Education
  • EDNA (Educational Network of Australia)
  • GEM (Gateway to Educational Materials)
  • German Education Resources Server
  • IMS (Instructional Management System)
  • DC discipline-specific elements

33
Projects (continued)
  • Humanities
  • AHDS Arts and Humanities Data Service
  • CIMI Metadata Testbed Project
  • SCRAN (Scottish Cultural Resources Access
    Network)

34
Projects (continued)
  • Libraries and Digital Libraries
  • CORC Project (OCLC)
  • Pandora Project (NLA)
  • The Nordic Metadata Project
  • BIBLINK (Europe)
  • ELISE (Electronic Image Service for Europe)
  • Florida International University Digital Library
  • University of Washington Digital Library
  • State Library of Queensland

35
Commerce
  • Intranets
  • e.g. Ford, Nokia, Boeing
  • Netscape's Open Directory Project

36
Why Consider the Dublin Core?
  • You have a rich standard, need a simple one
    (probably for cost reasons)
  • You want to reveal your data to other communities
    (via the Web) using commonly understood semantics
  • You want to provide unified access to databases
    with different underlying schemas
  • You need core description semantics and don't
    feel compelled to invent them anew

37
Additional Information on Dublin Core
  • Dublin Core Metadata Initiative Homepage
  • http://purl.org/dc
  • D-Lib Magazine (all workshop reports)
  • http://www.dlib.org

38
DESIRE Metadata Tools
39
A little light relief ?
  • Dublin Core in HTML
  • Some DESIRE metadata tools...
  • Dublin Core editors
  • DC-dot
  • Nordic DC generator
  • ROADS - metadata management
  • Web robots
  • Combine
  • Harvest

40
DC in HTML
  • <html><head>
  • <title>UKOLN Home Page</title>
  • <meta name="DC.Title" content="UKOLN: UK Office
    for Library and Information Networking">
  • <meta name="DC.Subject" content="national centre,
    network information support, library community,
    awareness, research, information services, public
    library networking, bibliographic management,
    distributed library systems, metadata, resource
    discovery, conferences, lectures, workshops">
  • <meta name="DC.Description" content="UKOLN is a
    national centre for support in network
    information management in the library and
    information communities. It provides awareness,
    research and information services">
  • <meta name="DC.Creator" content="UKOLN
    Information Services Group">
  • </head>
  • ...
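
A minimal sketch of how tags in this style could be generated from a record, assuming the record is held as a dict of element names to lists of values. The function name `dc_meta_tags` and the dict layout are hypothetical choices for this example, not part of any DESIRE tool.

```python
from html import escape


def dc_meta_tags(record):
    """Render a dict of DC element -> list of values as HTML <meta> tags."""
    lines = []
    for element, values in record.items():
        for value in values:  # every Dublin Core element is repeatable
            lines.append(
                '<meta name="DC.%s" content="%s">'
                % (escape(element, quote=True), escape(value, quote=True))
            )
    return "\n".join(lines)


tags = dc_meta_tags({
    "Title": ["UKOLN Home Page"],
    "Creator": ["UKOLN Information Services Group"],
})
```

Escaping the values matters because a title containing a double quote would otherwise end the `content` attribute early.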

41
Editors - DC-dot
  • Web-based DC creator and editor
  • Automatic generation of some metadata
  • Extraction of metadata from MS-Office, PDF and
    HTML files
  • Context sensitive help
  • Simple
  • Generates HTML <meta> tags and a variety of other
    formats
  • Can be integrated with browser
  • Validates existing HTML metadata
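
Extracting embedded Dublin Core from an HTML page can be sketched with Python's standard `html.parser` module. This is not how DC-dot itself is implemented (the slides do not say); the class name `DCMetaExtractor` and the sample page are assumptions made for the illustration.

```python
from html.parser import HTMLParser


class DCMetaExtractor(HTMLParser):
    """Collect (element, value) pairs from <meta name="DC.*"> tags."""

    def __init__(self):
        super().__init__()
        self.elements = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name") or ""
        if name.upper().startswith("DC."):
            # Strip the "DC." prefix and keep the element name.
            self.elements.append((name[3:], attributes.get("content") or ""))


page = """<html><head>
<title>UKOLN Home Page</title>
<meta name="DC.Title" content="UKOLN Home Page">
<meta name="DC.Creator" content="UKOLN Information Services Group">
<meta name="keywords" content="not Dublin Core">
</head><body></body></html>"""

extractor = DCMetaExtractor()
extractor.feed(page)
```

After `feed`, `extractor.elements` holds only the two DC-prefixed entries; the plain `keywords` tag is ignored.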

42
(No Transcript)
43
(No Transcript)
44
(No Transcript)
45
Editors - Nordic Template
  • Web-based DC creator and editor
  • More complex than DC-dot, e.g.
  • support for schemes
  • simple support for repeated elements

46
(No Transcript)
47
(No Transcript)
48
Metadata management
  • Potential problems
  • embedded metadata fairly static
  • hard to make bulk changes
  • hard to migrate to new metadata formats
  • so
  • store metadata separately
  • embed into Web-pages on-the-fly
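
The "store separately, embed on-the-fly" idea can be sketched as a lookup followed by a substitution at serving time. Everything named here is hypothetical: `METADATA_STORE` stands in for a separate metadata database, and the `<!-- DC-METADATA -->` marker is an invented placeholder rather than a real server directive.

```python
from html import escape

# Hypothetical stand-in for a separately managed metadata database,
# keyed by a resource ID.
METADATA_STORE = {
    "ukoln-home": {"Title": "UKOLN Home Page", "Creator": "UKOLN"},
}

# Invented marker for this sketch; not an actual server-side include.
PLACEHOLDER = "<!-- DC-METADATA -->"


def embed_metadata(html_text, resource_id):
    """Replace the placeholder with <meta> tags built from the store."""
    record = METADATA_STORE.get(resource_id, {})
    tags = "\n".join(
        '<meta name="DC.%s" content="%s">' % (escape(key), escape(value))
        for key, value in sorted(record.items())
    )
    return html_text.replace(PLACEHOLDER, tags)


template = "<html><head><title></title>\n<!-- DC-METADATA -->\n</head></html>"
page = embed_metadata(template, "ukoln-home")
```

Because the page only carries the marker, a bulk change or a move to a new metadata format touches the store and the rendering function, not every HTML file.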

49
DC-ROADS - Summary
  • Embed on-the-fly
  • Apache SSI script
  • Store metadata in ROADS database
  • ROADS Web-based tool to edit/update metadata
    records
  • Associate metadata with resource by assigning a
    unique ID (will be able to use the resource URL
    in the future)

50
DC-ROADS - authoring
Apache syntax for calling server-side script:
<!--#exec cmd="roads2metadc.pl" -->
HTML editor
<html> <head> <title></title>
<!--#exec cmd="roads2metadc.pl" --> </head> ...
ROADS database
ROADS editor
51
DC-ROADS - embedding
[Diagram: numbered arrows 1-6 connect a Web client or robot, the UKOLN Web server, the SSI script, and the ROADS database. The page held on the server contains the SSI call:]
<html> <head> <title></title>
<!--#exec cmd="roads2metadc.pl" --> </head> ...
52
Metadata embedded in page
Edit metadata button for authors
Link to metadata display for end users
53
DC Usage - Web Robots
  • Combine
  • Support for embedded Dublin Core
  • Used for the Nordic Web Index (NWI)
  • Index of all pages in the Nordic countries
  • Promoted in combination with the Nordic template
  • Searchable using Z39.50
  • Harvest
  • Support for embedded Dublin Core
  • Used as basis for AC/DC
  • UK academic Web index

54
References
  • DC-dot
  • http://www.ukoln.ac.uk/metadata/dcdot/
  • Nordic Metadata Template
  • http://www.lub.lu.se/cgi-bin/nmdc.pl
  • DC-ROADS: ROADS for metadata management
  • http://www.ukoln.ac.uk/metadata/roads/metadata-mgmt/
  • Combine
  • http://www.lub.lu.se/combine/
  • Nordic Web Index
  • http://nwi.ub2.lu.se/?lang=en
  • Harvest
  • http://www.tardis.ed.ac.uk/harvest/

55
Qualifying and Extending Metadata Semantics
56
Tom Baker's Theory of Pidgin Metadata
  • Pidgin languages result from the need for
    communication among groups who do not share a
    common language
  • simplification and hybridization
  • Creolization is the process of complexification
    of a pidgin language
  • Addition of semantic and syntactic nuance that
    supports the inherent complexity of natural
    language

57

Pidginization and Creolization
[Diagram labels: Museums, Metadata Creoles, Metadata Elements, Pidgin Metadata, Interoperability]
58
Extensibility (refined semantics)
  • Ukrainian Doll model
  • improve description precision with sub-structure
    (sub-elements and schemes)
  • should degrade gracefully to preserve
    interoperability
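
Graceful degradation can be sketched as stripping the refinement and keeping the base element. The dotted names used here (`Date.Created`, `Subject.LCSH`) are an assumed notation for this example only; the slides do not fix a syntax for sub-elements or schemes.

```python
def dumb_down(qualified):
    """Reduce names such as 'Date.Created' to their base element 'Date'."""
    simple = {}
    for name, values in qualified.items():
        base = name.split(".", 1)[0]  # drop everything after the first dot
        simple.setdefault(base, []).extend(values)
    return simple


qualified_record = {
    "Title": ["Metadata for Networked Resources"],
    "Date.Created": ["1999-09-20"],
    "Subject.LCSH": ["Metadata"],
    "Subject": ["resource discovery"],
}
simple_record = dumb_down(qualified_record)
```

An application that knows nothing about the qualifiers can still use `simple_record`, which is the sense in which refinement "should degrade gracefully".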

59
Modular Extensibility
  • Extensibility via modularity
  • additional elements to support local or
    discipline-specific requirements
  • complementary packages of metadata

60
What might Extensibility mean for Specific
Communities?
  • Basic elements can be thought of as a semantic
    framework
  • High-level descriptors to describe general
    characteristics of a resource or collection
  • Use of domain-specific schemes for further
    precision
  • Schemes to refine the semantics of Subject,
    Description, Format, Relation, Coverage.
  • Controlled vocabularies, thesauri, namespaces,
    and encoding rules

61
The Purpose of Qualifiers
  • Increase semantic specificity
  • Specification of encoding rules
  • Definition of substructure
  • Authority Control

62
Increase Semantic Specificity
  • DC-4: Qualifiers should refine, not extend, the
    semantics of elements
  • Additional detail is often required to support
    the needs of local or domain-specific
    applications
  • Controlled vocabularies provide for more
    effective classification and retrieval (LCSH,
    Dewey, MeSH, AAT.)
  • Enumerated lists of possible values
  • Formats and Types
  • Language Codes (ISO xxxx)

63
Specification of Encoding Rules
  • 2-4-1998
  • The fourth day of February?
  • The second day of April?
  • Schemes that define the parsing rules for a
    value
  • ISO 8601
  • 1998-04-02
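
The ambiguity is easy to reproduce with Python's standard `datetime` module: the same string parses to two different dates depending on the assumed field order, while the ISO 8601 form has only one reading. The variable names are just for this illustration.

```python
from datetime import date, datetime

ambiguous = "2-4-1998"

# The same characters, read under two different conventions.
as_month_first = datetime.strptime(ambiguous, "%m-%d-%Y").date()
as_day_first = datetime.strptime(ambiguous, "%d-%m-%Y").date()

# ISO 8601 fixes the order as year-month-day, so no convention is needed.
iso_value = date.fromisoformat("1998-04-02")
```

Here `as_month_first` is the fourth of February and `as_day_first` is the second of April; only the latter agrees with the ISO 8601 value on the slide.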

64
Define the Substructure of a Compound Value
  • Established schemas are essential for
    interpreting certain data
  • An agent may include additional structured
    information along with names
  • vCard
  • LCNA

65
Authority Control
  • Authority Records assure unique identity of
    people, places, corporate entities
  • Libraries have strong commitment to authority
    control
  • Other communities as well
  • Interested Party names (music industry)
  • Important for contractual purposes

66
Tradeoffs of Qualification
  • On the one hand: keep it simple
  • no sub-elements or substructure
  • interoperability is highest priority
  • simplicity promotes deployment
  • On the other: make it flexible
  • complexity of description is unavoidable
  • schemas will help bridge the complexity
  • query precision is more important than simplicity
  • All applications probably require some level of
    Qualification

67
An Introduction to RDF
68
To recap...
  • People and economies depend on information
  • Exchange of information has been hindered by
    incompatible hardware, software, protocols
  • e.g. library community: MARC, AACR2, Z39.50
  • Less of a problem before... big problem now. The
    Web forces us to recognize this.

69
How do we solve this...
  • Design enabling technologies / standards
  • W3C World Wide Web Consortium
  • Dublin Core Metadata Initiative
  • Recognize problem context
  • Multiple stakeholders and requirements
  • International community
  • Requirements will evolve
  • Make an assumption
  • Common architectural components (syntax,
    structure, semantics, protocols, etc) help

70
Common Syntax: XML
  • XML - eXtensible Markup Language
  • Markup Language - a mechanism to define tags and
    the structural relationship between them in
    documents
  • eXtensible - semantics not defined, no
    pre-coordinated set of tags.

HTML: <meta name="author" content="Smith, John">
XML: <author><fn>John</fn><ln>Smith</ln></author>
71
XML Continued...
  • W3C Recommendation, Feb 1998
  • Broad industry endorsement
  • Subset of SGML (ISO 8879)
  • lighter, stronger, able to leap tall buildings,
    etc.
  • Validation
  • Still able to benefit from SGML DTDs
  • XML Schema (in progress)
  • Notion of Well-formedness
  • <A><B></B></A>
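
Well-formedness can be checked without any DTD or schema: a parser either accepts the nesting or rejects it. A minimal sketch using Python's standard `xml.etree.ElementTree`; the helper name `is_well_formed` is made up for this example.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if the text parses as XML, False otherwise."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


nested = is_well_formed("<A><B></B></A>")       # properly nested tags
overlapping = is_well_formed("<A><B></A></B>")  # tags closed out of order
```

The first string is accepted and the second is rejected, with no validation against a document type involved in either case.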

72
Data Transmission Methods
73
XML for Describing Data
  • Oftentimes a common syntax isn't enough
  • Common structural representation for expressing
    statements is required
  • The author of a document is Eric

<author> <url> http://doc_url </url>
<name> Eric </name> </author>
<document> <author> <name> Eric
</name> </author> <url> http://doc_url
</url> </document>
<document href="http://doc_url" author="Eric" />
74
Common Structure: RDF
  • RDF: Resource Description Framework
  • W3C Recommendation, Feb 1999
  • Data Model
  • Designed to impose structural constraint on
    syntax to support consistent encoding, exchange
    and processing of metadata
  • Schema
  • Enables resource description communities to
    define (and share) vocabularies (museum, library,
    e-commerce)

75
RDF Continued...
  • RDF statement

[Diagram: resource URI:R --author--> "Eric"]
<rdf:Description rdf:about="http://uri_of_document" bib:author="Eric" />
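
The statement on this slide has three parts: a resource, a property, and a value. A minimal sketch of that shape as a Python tuple type follows; the class name `Statement` and its field names are illustrative choices, not terms defined by the slides.

```python
from typing import NamedTuple


class Statement(NamedTuple):
    """One RDF-style statement: a resource, a property, and a value."""
    resource: str
    property_type: str
    value: str


stmt = Statement("http://uri_of_document", "author", "Eric")


def describe(statement):
    """Spell the statement out as an English sentence."""
    return "The %s of %s is %s" % (
        statement.property_type, statement.resource, statement.value
    )


sentence = describe(stmt)
```

However the XML is laid out, an application that reduces it to this three-part form can treat "the author of a document is Eric" the same way every time.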
76
RDF Example 1
[Diagram: URI:R --title--> "RDF Presentation"; URI:R --creator--> "Eric Miller"]
<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/TR/REC-rdf-syntax"
         xmlns:dc="http://purl.org/dc/elements/1.0">
  <rdf:Description rdf:about="URI:R">
    <dc:title> RDF Presentation </dc:title>
    <dc:creator> Eric Miller </dc:creator>
  </rdf:Description>
</rdf:RDF>
77
RDF Example 3
[Diagram labels: URI:R, title, RDF Presentation, creator, URI:ERIC, Eric Miller]
78
RDF Example 2
[Diagram: URI:R --title--> "RDF Presentation"; URI:R --creator--> URI:ERIC]
<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/TR/REC-rdf-syntax"
         xmlns:dc="http://purl.org/dc/elements/1.0">
  <rdf:Description rdf:about="URI:R">
    <dc:title> RDF Presentation </dc:title>
    <dc:creator rdf:resource="URI:ERIC"/>
  </rdf:Description>
</rdf:RDF>
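
The difference between a literal value and a reference to another resource can be made concrete by reading such a description back into statements. The sketch below uses Python's standard `xml.etree.ElementTree` and handles only this flat form of RDF/XML. Note the assumptions: the two namespace URIs are the ones published in the final W3C and Dublin Core specifications, which differ from those on the slide, and the function name `statements` is invented for the example.

```python
import xml.etree.ElementTree as ET

# Namespace URIs from the published specifications (assumption: these
# differ from the URIs shown on the slide).
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC = "http://purl.org/dc/elements/1.1/"

doc = """<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="%s" xmlns:dc="%s">
  <rdf:Description rdf:about="URI:R">
    <dc:title>RDF Presentation</dc:title>
    <dc:creator rdf:resource="URI:ERIC"/>
  </rdf:Description>
</rdf:RDF>""" % (RDF, DC)


def statements(rdf_xml):
    """Return (subject, property, (kind, value)) for each child property."""
    root = ET.fromstring(rdf_xml)
    result = []
    for description in root.findall("{%s}Description" % RDF):
        subject = description.get("{%s}about" % RDF)
        for prop in description:
            target = prop.get("{%s}resource" % RDF)
            if target is not None:
                # The value is another resource, named by its URI.
                result.append((subject, prop.tag, ("resource", target)))
            else:
                # The value is a plain string.
                text = (prop.text or "").strip()
                result.append((subject, prop.tag, ("literal", text)))
    return result


triples = statements(doc)
```

The title comes back as a literal string, while the creator comes back as a pointer to URI:ERIC, about which further statements could be made.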
79
Description Vocabularies
URIR
msKgrip
John Smith
80
Common Semantics
  • Enabling technologies
  • XML provides flexible syntax, RDF provides common
    data model for representation and declaration
    mechanisms for semantics
  • Resource Description communities define
    vocabularies that satisfy community requirement
  • share and reuse vocabularies
  • Dublin Core Metadata Initiative is a prime example

81
Dublin Core Metadata Initiative
  • Simple element set designed for resource
    description
  • International, interdisciplinary community
    consensus
  • Semantic interface among resource description
    communities

82
More Info
  • W3C World Wide Web Consortium
  • http://www.w3.org/
  • XML home page
  • http://www.w3.org/XML/
  • RDF home page
  • http://www.w3.org/RDF/
  • Dublin Core Metadata Initiative
  • http://purl.org/dc/

83
  • Metadata building blocks

84
Building block for systems and services
  • Revealing information about a resource
  • Managing resources
  • Negotiating transactions
  • Providing discovery, locate, delivery services

85
Metadata is used for...
  • Supporting operations carried out on information
    objects
  • Enabling software and humans to initiate actions
    on resources

86
What does metadata describe?
  • papers, articles
  • information pages
  • images
  • sound
  • collections
  • user profiles
  • ... digital and physical manifestations

87
Diversity of services
  • Resource discovery services
  • Web site management
  • Content rating
  • Digital preservation
  • Rights management

88
Selective services
  • Added value descriptions
  • subject headings
  • subject classifications
  • summary descriptions
  • authority control
  • Selection
  • target audience
  • quality of resource
  • by subject area
  • by region

89
Benefits of shared approaches
  • Compatible technical solutions
  • Shared semantics (common metadata sets)
  • Shared syntax (HTML, RDF/XML)
  • Consistency of content (cataloguing rules)

90
Information gateways
  • Support activities
  • ROADS, DESIRE, IMesh
  • Range of associated information gateways
  • DutchESS
  • Finnish Virtual Library project
  • EELS
  • NOVAGate
  • SOSIG, EEVL, OMNI, BizEd ...
  • Internet Scout, etc.

91
Metadata creation
  • Who creates metadata?
  • Authors
  • Experts
  • Metadata creation agencies
  • Where?
  • Embedded in a resource
  • Linked to resource
  • Local database
  • Third party database

92
Collaborative metadata creation
  • Information providers
  • Publishers
  • Libraries
  • Service providers
  • information gateways (RNC, Nordic Web Index,
    AGLS)
  • bibliographic utilities (OCLC, BookData ...)

93
Description of BIBLINK Workspace
Publishers
BIBLINK Workspace A shared facility for storing
and manipulating BIBLINK workspace records
Third parties e.g. Identification agencies -
ISBN, ISSN, etc.
BIBLINK Workspace Administrator
National Bibliographic Agencies
94
Future options?
  • More complex creation models
  • Re-use of metadata
  • Enhancement of harvested metadata
  • Incremental additions to metadata
  • Targeted services
  • Facilitating personalised views
  • Providing structured environments

95
Metadata into the Mainstream
96
Semantic Web
"If HTML and the Web made all the online
documents look like one huge book, RDF, schema
and inference languages will make all the data
in the world look like one huge database." (Tim
Berners-Lee, 1999)
97
(No Transcript)
98
Data Transmission Methods
99
RDF as Building Blocks
100
Trusted Third Party metadata e.g.
Resource Referenced by Service in RDF
101
Trusted Third Party metadata e.g.
Resource Referenced by Service in RDF
102
Trusted Third Party metadata e.g.
Resource Referenced by Service in RDF
Embedded and Associated metadata e.g.
Site-Maps in RDF
103
Trusted Third Party metadata e.g.
Resource Referenced by Service in RDF
Embedded and Associated metadata e.g.
Site-Maps in RDF
104
Sitemaps / Channels described in RDF/DC
Search Results described in RDF/DC
105
Open Source / Open Standards
The consensus-building role played by the Dublin
Core within the metadata community is similar to
that played by the Mozilla Organization and
related initiatives in the 'open source' software
world. It should be possible to leverage the work
of the DC community to provide non-proprietary,
multilingual vocabularies for Mozilla-based
applications: http://www.mozilla.org/rdf/doc/vocabs.html
106
Panel Session
  • Nicky Ferguson, ILRT (Chair)
  • Eric Miller, OCLC
  • Carl Lagoze, Cornell
  • Rachel Heery, UKOLN
  • Dan Brickley, ILRT
  • Andy Powell, UKOLN
  • Debra Hiom, SOSIG, ILRT