Interoperability in the Cultural Heritage Domain

Provided by: csVu
Transcript and Presenter's Notes

1
Interoperability in the Cultural Heritage Domain
  • Lourens van der Meij
  • VU Amsterdam, KB
  • (some slides by A. Isaac)
  • October 3rd, 2008

2
Background
  • CATCH (NWO): Continuous Access To Cultural Heritage
  • Computer science research projects
  • Applied to Cultural Heritage (libraries, museums)
  • STITCH: SemanTic Interoperability To access Cultural
    Heritage
  • Interoperability of metadata
  • Exchanging (standardization)
  • Integrating (translating, linking)

3
Intention
  • Show through example applications that the following
    are important in the Cultural Heritage domain:
  • Integration of data, collections, and services
  • Interoperability
  • Data standardized so that it can be used across
    different applications
  • Functionality reusable via services
  • Creating mappings: semantic links between data
    from different sources

4
First
  • Illustrate integrated access to collections in
    the CH domain by looking at a use case.
  • Introduction of the use case
  • About vocabularies
  • Introduce the collections that will be integrated
  • Faceted browsing
  • What we want ->
  • Demo
  • Requirements, details

5
(Integrated) Access to collections
  • Collections (records) of books, pieces of art, ...
  • Electronic access, web portal.
  • STITCH focuses on semantic, structured access
    using the available knowledge sources, not
    full-text search
  • Records: metadata, information about the object
  • Author
  • Date
  • Subject
  • CH institutes often maintain knowledge
    structures (KOS), or vocabularies, to facilitate
    storage, access and maintenance.
  • Subject metadata and access through KOS are the
    focus of STITCH.

6
Vocabularies (Knowledge Structures, KOS)
  • Thesauri, classification systems, structuring
    collections, describing content, form, aspects of
    collection elements.
  • STITCH is a cooperation between VU Amsterdam (KRR
    group), the National Library (KB) and MPI Nijmegen.
  • Many vocabularies within the KB: on the order of
    10 vocabularies are maintained internally, and 20
    or more external vocabularies play a role. Why?
  • History
  • Specialized collections, particular views on the
    collection and theories on how access should be
    provided.
  • Examples of vocabularies in the demos.

7
Vocabularies
  • Many different (kinds of) vocabularies
  • Many different representations, data formats,
    methods of access.
  • Integrated access requires
  • standardized representation of vocabularies and
    collections
  • standardized access -> services
  • Providing links between elements of vocabularies,
    alignment of vocabularies
  • Next: example of integration

8
Illustration, use case STITCH
  • Integrated access to two collections
  • KB illuminated manuscripts (geïllumineerde
    manuscripten)
  • BnF Mandragore, illuminated manuscripts (manuscrits
    enluminés)
  • STITCH focus
  • Integration
  • Alignment, techniques (and standards)
  • Interoperability
  • RDF, SKOS
  • Those aspects will be discussed after the first
    demo.

9
KB Illustrated Manuscripts
10
KB Illustrated Manuscripts Iconclass
11
Mandragore
12
Mandragore
13
Faceted browsing
  • Access the collection using the structure of the
    vocabularies
  • Different dimensions: subject, author, ...
  • Use the hierarchy of a vocabulary, if it has one,
    to group objects together
  • Lions, giraffes, zebras -> animals. Distinguish
    them as a group.
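
The grouping idea on this slide can be sketched in a few lines of Python. This is only an illustration, not the STITCH browser: the `BROADER` links, the record ids and the function names are all hypothetical. Each record is counted under its own subject and under every broader concept above it.

```python
from collections import defaultdict

# Hypothetical "broader" links (child concept -> parent concept).
BROADER = {"lion": "animal", "giraffe": "animal", "zebra": "animal", "oak": "plant"}

# Hypothetical records, each indexed with a single subject concept.
RECORDS = [
    {"id": "ms-1", "subject": "lion"},
    {"id": "ms-2", "subject": "giraffe"},
    {"id": "ms-3", "subject": "zebra"},
    {"id": "ms-4", "subject": "oak"},
]


def with_ancestors(concept, broader):
    """Return the concept followed by its chain of broader concepts."""
    chain = [concept]
    # Stop at the top of the hierarchy, or if a link would create a cycle.
    while chain[-1] in broader and broader[chain[-1]] not in chain:
        chain.append(broader[chain[-1]])
    return chain


def facet(records, broader):
    """Map each concept to the ids of records indexed with it or with a narrower concept."""
    groups = defaultdict(list)
    for record in records:
        for concept in with_ancestors(record["subject"], broader):
            groups[concept].append(record["id"])
    return dict(groups)


FACETS = facet(RECORDS, BROADER)
```

Selecting the facet value "animal" then returns the lion, giraffe and zebra records as one group, while "lion" still returns only its own record.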

14
What we have
15
What we want
16
Demo
  • KB Illuminated Manuscripts
  • BnF Mandragore Manuscripts
  • http://galjas.cs.vu.nl:33333/MANDRA-SV-ICE-mandraNewNONE,
    amphibians
  • Wheat

17
Integrated Access
  • Integrated semantic access requires
  • standardized representation of vocabularies and
    collections
  • standardized access -> services
  • Providing links between elements of vocabularies.

18
Standardized representation
  • Use of semantic web techniques
  • Things are represented as resources (URIs),
    across any application and data set
  • Values as simple strings and numbers (literals),
    or URIs
  • Properties as typed, named links between URIs and
    other URIs or literals
  • Theory, reasoning methods.
  • Interoperability, some standardization
  • Still need standardization on how to represent CH
    objects (XML, Dublin Core), vocabularies (SKOS), and
    links between elements of vocabularies.

19
SKOS Example
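[Diagram: RDF graph, written out here as subject, property, value]
  • http://www.iconclass.nl/ rdf:type skos:ConceptScheme
  • http://www.iconclass.nl/s_11F rdf:type skos:Concept
  • http://www.iconclass.nl/s_11F skos:inScheme
    http://www.iconclass.nl/
  • http://www.iconclass.nl/s_11F skos:prefLabel
    "the Virgin Mary"@en
  • http://www.iconclass.nl/s_11F skos:prefLabel
    "la Vierge Marie"@fr
  • http://www.iconclass.nl/s_11F skos:broader
    http://www.iconclass.nl/s_11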
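
As a rough sketch, the graph on this slide can be held as plain (subject, property, value) tuples in Python. It assumes the reading of the diagram in which s_11F is the concept carrying both labels and s_11 is its broader concept. The helper names `objects` and `pref_label` are made up for this example; a real application would more likely use an RDF library.

```python
# Standard namespace URIs for RDF and SKOS.
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
SKOS = "http://www.w3.org/2004/02/skos/core#"
ICONCLASS = "http://www.iconclass.nl/"

# Language-tagged literals are written as (text, language) pairs.
TRIPLES = [
    (ICONCLASS, RDF + "type", SKOS + "ConceptScheme"),
    (ICONCLASS + "s_11F", RDF + "type", SKOS + "Concept"),
    (ICONCLASS + "s_11F", SKOS + "inScheme", ICONCLASS),
    (ICONCLASS + "s_11F", SKOS + "prefLabel", ("the Virgin Mary", "en")),
    (ICONCLASS + "s_11F", SKOS + "prefLabel", ("la Vierge Marie", "fr")),
    (ICONCLASS + "s_11F", SKOS + "broader", ICONCLASS + "s_11"),
]


def objects(triples, subject, prop):
    """Return all values of a property for one subject."""
    return [o for s, p, o in triples if s == subject and p == prop]


def pref_label(triples, concept, lang):
    """Return the preferred label in the given language, or None."""
    for text, label_lang in objects(triples, concept, SKOS + "prefLabel"):
        if label_lang == lang:
            return text
    return None
```

Because every thing and every property is a URI, the same two helpers work on any data set expressed this way, which is the interoperability argument of the previous slide.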
20
SKOS (Simple Knowledge Organization System)
  • SKOS offers building blocks to represent KOSs in
    RDF
  • Classes: Concept and ConceptScheme
  • Lexical properties (multilingual)
  • prefLabel
  • altLabel
  • Semantic relations
  • broader, narrower
  • related
  • Notes
  • scopeNote
  • definition

21
Vocabulary alignment
  • Aim: finding semantic correspondences between
    vocabulary elements
  • klassieke ruïnes (classical ruins) <-> landschap
    met ruïnes (landscape with ruins)
  • maagd Maria (Virgin Mary) <-> Heilige Moeder (Holy
    Mother)
  • Doing it (semi-)automatically
  • Vocabularies are big (tens of thousands of
    concepts)
  • They change
22
Automatic alignment techniques
  • Lexical
  • Labels of entities and textual definitions
  • Structural
  • Structure of the vocabularies
  • Background knowledge
  • Using a shared conceptual reference to find links
  • Extensional
  • Object information (e.g. book indexing)
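
A lexical technique, in its simplest form, compares normalized labels. The sketch below is an assumption about what such a matcher could look like, not the matcher used in STITCH. The concept ids are invented, and the `b:mary` entry is added purely so that one match exists; the other labels are the examples from the vocabulary alignment slide.

```python
import re


def normalize(label):
    """Lower-case a label, drop punctuation and collapse whitespace."""
    return " ".join(re.sub(r"[^\w\s]", " ", label.casefold()).split())


def lexical_matches(vocab_a, vocab_b):
    """Pair concepts from two vocabularies that share a normalized label."""
    index = {}
    for concept, labels in vocab_b.items():
        for label in labels:
            index.setdefault(normalize(label), set()).add(concept)
    matches = set()
    for concept, labels in vocab_a.items():
        for label in labels:
            for other in index.get(normalize(label), ()):
                matches.add((concept, other))
    return sorted(matches)


# Hypothetical concept ids; each concept has a list of labels.
VOCAB_A = {"a:ruins": ["klassieke ruïnes"], "a:mary": ["maagd Maria"]}
VOCAB_B = {
    "b:ruins": ["landschap met ruïnes"],
    "b:mother": ["Heilige Moeder"],
    "b:mary": ["Maagd  Maria."],
}
MATCHES = lexical_matches(VOCAB_A, VOCAB_B)
```

Only the pair with an identical normalized label is found. The two correspondences from the earlier slide (ruins, Holy Mother) share no label, which is why structural, background-knowledge and extensional techniques are needed as well.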

24
Extensional Statistical Alignment
  • Object information (e.g. book indexing)

[Figure: Thesaurus 1 and Thesaurus 2, with the concepts "Dutch Literature" and "Dutch", both indexing one collection of books]
25
Results
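  • 1: 9132.9 (1704 3479 976) Schilderijen -
    schilderkunst [paintings - art of painting]
  • 2: 8088.5 (1204 2330 767) Kwaliteitszorg -
    kwaliteitsmanagement [quality assurance - quality
    management]
  • 3: 6232.7 (820 1572 543) Personeelsmanagement -
    personeelsbeleid [personnel management - personnel
    policy]
  • 4: 5392.1 (1399 3271 622) Beeldende kunsten -
    beeldende kunst [visual arts - visual art]
  • 5: 5063.1 (4951 1152 613) Nederlands -
    Nederlandse taalkunde [Dutch - Dutch linguistics]
  • 17: 3421.8 (280 714 243) Diabetes mellitus -
    suikerziekte [diabetes mellitus - diabetes]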
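
The slides do not say how the scores above are computed. As an illustration of the extensional idea only, the sketch below computes the Jaccard overlap, one common measure of how strongly two concepts co-occur on the same books. It assumes that the three numbers in parentheses are the number of books indexed with the first concept, with the second concept, and with both. That reading is consistent with the figures (the third number is never larger than the other two) but is not stated in the source.

```python
def jaccard(count_a, count_b, count_both):
    """Share of books carrying both concepts among books carrying either one."""
    union = count_a + count_b - count_both
    return count_both / union if union > 0 else 0.0


# (label pair, books with concept A, books with concept B, books with both),
# copied from the result list under the assumption stated above.
ROWS = [
    ("Schilderijen - schilderkunst", 1704, 3479, 976),
    ("Kwaliteitszorg - kwaliteitsmanagement", 1204, 2330, 767),
    ("Personeelsmanagement - personeelsbeleid", 820, 1572, 543),
    ("Beeldende kunsten - beeldende kunst", 1399, 3271, 622),
    ("Nederlands - Nederlandse taalkunde", 4951, 1152, 613),
    ("Diabetes mellitus - suikerziekte", 280, 714, 243),
]

# Rank the pairs by Jaccard overlap, highest first.
RANKED = sorted(ROWS, key=lambda row: jaccard(*row[1:]), reverse=True)
```

Jaccard values lie between 0 and 1, so the scores on the slide come from a different measure, and the ordering differs too: under Jaccard the small but tightly overlapping diabetes pair ranks first. Which measure suits best depends on the intended use of the alignment.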

26
Alignment: no trivial solution
  • Current techniques are not reliable as the sole
    source of knowledge
  • What is a good alignment?
  • Evaluation criteria?
  • -> What will it be used for?
  • Usage scenarios
  • Integrated search
  • Reindexing
  • Thesaurus merging
  • Navigation -> faceted browsing

27
What next
  • Evaluation, lessons learned
  • What next ->
  • Second use case: reindexing
  • (Vocabulary service)
  • Conclusion

28
Why usage scenarios
  • Evaluation of an alignment depends on its use.
  • Real-world applications provide a test of the
    quality of alignments
  • Requirements on alignments depend on their use.
  • What kinds of links should be distinguished?
  • Optional demo: evaluation
  • http://localhost:33344/logineval
  • http://kits.cs.vu.nl:33344/logineval
  • Next: reindexing, nearest to a real-world
    application.

29
Situation at Dutch libraries, National Library (KB)
  • KB: two large collections
  • DEPOT (deposit collection: all Dutch-language
    publications)
  • Own scientific collection
  • Subject indexing using two completely different
    indexing systems: Brinkman, GOO
  • Common automation system for the Netherlands and
    Europe (OCLC-PICA)
  • Metadata of books contains lots of fields
  • A book or publication is given metadata by
    different libraries, using many different
    vocabularies.

30
Reindexing
  • KB has about 20 people indexing books daily;
    about 20,000 books are indexed per year.
  • Indexing: adding keywords and classification
    information to books. Even internally, indexing
    follows different vocabularies.
  • Some books come with indexing done by other
    libraries (public libraries, Biblion).
  • If Biblion indices, or combinations of them, could
    be translated to KB indices (Brinkman), that would
    mean less work for the KB.

31
WinIBW
  • OCLC (PICA): automation system for libraries in
    the Netherlands, also used elsewhere in Europe
  • Online Public Access Catalogue (OPAC)
  • WinIBW: internet access to the PICA system (local
    and central). Adding records, adding metadata,
    searching records.
  • Demo, closest to a real-world application.

32
Reindexing
  • Biblion -> Brinkman
  • Fietstochten (cycling tours), Kapellen (chapels),
    Beesel, Heiligenbeelden (statues of saints), ...
  • -> Brinkman?
  • Use alignment ...
  • Bibl:Fietstochten -> Brinkman?
  • Bibl:Kapellen -> Brinkman?
  • DEMO
  • (Example: z sel 3-10-2008 gd?
  • 79)
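
The reindexing step can be sketched as a lookup in an alignment table. Everything in the table below is hypothetical: the Brinkman identifiers are placeholders, the confidence values are invented, and the 0.95 cut-off is just one possible setting. Only the four Biblion terms come from the slide.

```python
# Hypothetical alignment: Biblion term -> list of (Brinkman term, confidence).
ALIGNMENT = {
    "Fietstochten": [("brinkman:example-cycling-tours", 0.97)],
    "Kapellen": [
        ("brinkman:example-chapels", 0.96),
        ("brinkman:example-churches", 0.60),
    ],
    "Heiligenbeelden": [("brinkman:example-statues-of-saints", 0.80)],
}


def reindex(biblion_terms, alignment, min_confidence=0.95):
    """Translate Biblion terms; return (accepted Brinkman terms, unresolved terms)."""
    accepted, unresolved = [], []
    for term in biblion_terms:
        candidates = [
            target
            for target, confidence in alignment.get(term, [])
            if confidence >= min_confidence
        ]
        if candidates:
            accepted.extend(candidates)
        else:
            # No mapping, or no mapping that is confident enough.
            unresolved.append(term)
    return accepted, unresolved


ACCEPTED, UNRESOLVED = reindex(
    ["Fietstochten", "Kapellen", "Beesel", "Heiligenbeelden"], ALIGNMENT
)
```

Terms left unresolved would still go to a human indexer, so a scheme like this reduces the indexing work without replacing it.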

41
Result
42
Reindexing
  • Under evaluation
  • Improvement
  • Use other metadata
  • Adapt scenario (pass records with 95% confidence)
  • Many other uses.

43
Sketch of vocabularies of importance to the KB
44
Integrated Access
  • Services through the internet
  • Protocols: SOAP, REST, ...
  • Collection access?
  • Vocabulary access, alignment access
  • http://eculture.cs.vu.nl:38080/vocreptags
  • http://localhost:8080/vocreptags

45
Lessons
  • Using semantic web techniques, interoperability
    and integration of collections can be made
    easier.
  • Aligning vocabularies is of use in different
    situations. The alignment methods need to be
    fine-tuned to the application they are meant for.
  • When introducing new techniques, interaction
    between the CH field and scientific institutes is
    very valuable.
  • Standardization of access to collections and
    vocabularies should be dealt with (a prototype has
    been developed).

46
Concepts
  • An ontology in both computer science and
    information science is a formal representation of
    a set of concepts within a domain and the
    relationships between those concepts. It is used
    to reason about the properties of that domain,
    and may be used to define the domain.
  • Metadata (meta data, or sometimes
    metainformation) is "data about data", of any
    sort in any media. An item of metadata may
    describe an individual datum, or content item, or
    a collection of data including multiple content
    items and hierarchical levels, for example a
    database schema.

47
Concepts
  • A library classification is a system of coding
    and organizing library materials (books, serials,
    audiovisual materials, computer files, maps,
    manuscripts, realia) according to their subject
    and allocating a call number to that information
    resource. Similar to classification systems used
    in biology, bibliographic classification systems
    group entities that are similar together
    typically arranged in a hierarchical tree
    structure.
  • In information technology, a thesaurus represents
    a database or list of semantically orthogonal
    topical search keys. In the field of Artificial
    Intelligence, a thesaurus may sometimes be
    referred to as an ontology.