Title: Interoperability in the Cultural Heritage Domain
1 Interoperability in the Cultural Heritage Domain
- Lourens van der Meij
- VU Amsterdam, KB
- (some of the slides by A. Isaac)
- October 3, 2008
2 Background
- CATCH (NWO)
- Continuous Access To Cultural Heritage
- Computer science research projects
- Applied to cultural heritage (libraries, museums)
- STITCH
- SemanTic Interoperability To access Cultural Heritage
- Interoperability
- Exchanging (standardization)
- Integrating (translating, linking)
- metadata
3 Intention
- Show through example applications that the following are important in the cultural heritage domain:
- Integration of data, collections, and services
- Interoperability
- Data standardized such that it can be used across different applications
- Functionality reusable via services
- Creating mappings: semantic links between data from different sources
4 First
- Illustrate integrated access to collections in the cultural heritage (CH) domain by looking at a use case
- Introduction of the use case
- About vocabularies
- Introduce the collections that will be integrated
- Faceted browsing
- What we want ->
- Demo
- Requirements, details
5 (Integrated) Access to collections
- Collections (records) of books, pieces of art, etc.
- Electronic access, web portal
- STITCH focuses on semantically structured access using the available knowledge sources, not on full-text search
- Records: metadata, information about the object
- Author
- Date
- Subject
- CH institutes often maintain knowledge structures (KOS), i.e. vocabularies, to facilitate storage, access, and maintenance
- Subject metadata and access through KOS are the focus of STITCH
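A record's metadata (author, date, subject) can be exchanged using a standard element set such as Dublin Core. The sketch below uses only Python's standard library to build a minimal XML record whose subject is a vocabulary (KOS) concept URI. The `<record>` wrapper element and all field values are hypothetical illustrations, not KB or STITCH data.

```python
import xml.etree.ElementTree as ET

# Dublin Core Metadata Element Set 1.1 namespace.
DC = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC)

def dc_record(title: str, creator: str, date: str, subject: str) -> ET.Element:
    """Build a minimal record; the <record> wrapper name is hypothetical."""
    record = ET.Element("record")
    for name, value in (("title", title), ("creator", creator),
                        ("date", date), ("subject", subject)):
        # "{namespace}local" is ElementTree's notation for a qualified name.
        ET.SubElement(record, f"{{{DC}}}{name}").text = value
    return record

# Hypothetical example values; the subject points to a KOS concept by URI.
rec = dc_record(
    title="Book of Hours",
    creator="anonymous",
    date="ca. 1450",
    subject="http://www.iconclass.nl/s_11F",
)
print(ET.tostring(rec, encoding="unicode"))
```

Using a concept URI rather than a free-text subject string is what later allows the record to be linked to the vocabulary and, through alignments, to other vocabularies.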
6 Vocabularies (Knowledge Structures, KOS)
- Thesauri and classification systems structure collections and describe the content, form, and other aspects of collection elements
- Many vocabularies are in use. (STITCH is a cooperation between VU Amsterdam (KRR group), the National Library (KB), and MPI Nijmegen.) Within the KB, on the order of 10 vocabularies are maintained internally, and 20 or more external vocabularies play a role. Why?
- History
- Specialized collections, particular views on the collection, and theories of how access should be provided
- Examples of vocabularies appear in the demos
7 Vocabularies
- Many different (kinds of) vocabularies
- Many different representations, data formats, and methods of access
- Integrated access requires
- standardized representation of vocabularies and collections
- standardized access -> services
- Providing links between elements of vocabularies, i.e. alignment of vocabularies
- Next: an example of integration
8 Illustration: STITCH use case
- Integrated access to two collections
- KB geïllumineerde manuscripten (illuminated manuscripts)
- BnF Mandragore, manuscrits enluminés (illuminated manuscripts)
- STITCH focus
- Integration
- Alignment, techniques (and standards)
- Interoperability
- RDF, SKOS
- These aspects will be discussed after the first demo.
9 KB Illustrated Manuscripts
10 KB Illustrated Manuscripts: Iconclass
11 Mandragore
12 Mandragore
13 Faceted browsing
- Access the collection using the structure of the vocabularies
- Different dimensions (facets): subject, author, ...
- Use the hierarchy of a vocabulary, where there is one, to group objects together
- Lions, giraffes, zebras -> animals: distinguish them as a group
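The hierarchy-based grouping can be sketched as follows: each object is counted under its own subject concepts and under every broader concept, so a facet such as "animals" also covers objects indexed only with "lions" or "zebras". The vocabulary fragment and the manuscripts are hypothetical, and the sketch assumes the broader links form a tree (no cycles).

```python
from collections import Counter

# Hypothetical vocabulary fragment: concept -> its broader concept.
BROADER = {
    "lions": "animals",
    "giraffes": "animals",
    "zebras": "animals",
    "animals": "nature",
    "wheat": "plants",
    "plants": "nature",
}

def ancestors(concept):
    """Yield the concept itself, then each broader concept up to the root.
    Assumes a tree: a cycle in BROADER would make this loop forever."""
    while concept is not None:
        yield concept
        concept = BROADER.get(concept)

def facet_counts(objects):
    """Count objects per concept, crediting each object once to every
    concept reachable from its subjects via broader links."""
    counts = Counter()
    for subjects in objects.values():
        reached = {c for s in subjects for c in ancestors(s)}
        counts.update(reached)
    return counts

# Hypothetical manuscripts indexed with vocabulary concepts.
manuscripts = {
    "ms1": {"lions"},
    "ms2": {"giraffes", "zebras"},
    "ms3": {"wheat"},
}
counts = facet_counts(manuscripts)
print(counts["animals"], counts["nature"])  # 2 3
```

Crediting each object once per concept, rather than once per subject, keeps ms2 from being counted twice under "animals".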
14 What we have
15 What we want
16 Demo
- KB Illuminated Manuscripts
- BnF Mandragore Manuscripts
- http://galjas.cs.vu.nl:33333/MANDRA-SV-ICE-mandraNewNONE
- Examples: amphibians, wheat
17 Integrated Access
- Integrated semantic access requires
- standardized representation of vocabularies and collections
- standardized access -> services
- Providing links between elements of vocabularies
18 Standardized representation
- Use of semantic web techniques
- Things are represented as resources, identified by URIs that hold across any application and data set
- Values as simple strings or numbers (literals), or URIs
- Properties as typed, named links between URIs, and between URIs and literals
- Theory, reasoning methods
- Interoperability, some standardization
- Still need standardization on how to represent CH objects (XML, Dublin Core), vocabularies (SKOS), and links between elements of vocabularies
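A minimal model of this representation in plain Python: URIs and literals are distinct term types, a graph is a set of (subject, property, object) triples, and two data sets combine by set union, linked through the URIs they share. The record URI under example.org and its title are hypothetical; the Dublin Core and SKOS namespaces are the standard ones. A real application would use an RDF library rather than this sketch.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class URI:
    value: str

@dataclass(frozen=True)
class Literal:
    value: str
    lang: Optional[str] = None  # language tag for multilingual labels, e.g. "en"

def match(graph, s=None, p=None, o=None):
    """Return triples matching a pattern; None acts as a wildcard."""
    return [(ts, tp, to) for (ts, tp, to) in graph
            if (s is None or ts == s) and (p is None or tp == p)
            and (o is None or to == o)]

DC = "http://purl.org/dc/elements/1.1/"
SKOS = "http://www.w3.org/2004/02/skos/core#"
ms = URI("http://example.org/collection/ms1")  # hypothetical record URI
mary = URI("http://www.iconclass.nl/s_11F")

collection = {  # data set 1: a collection record
    (ms, URI(DC + "title"), Literal("Book of Hours", "en")),
    (ms, URI(DC + "subject"), mary),
}
vocabulary = {  # data set 2: the vocabulary
    (mary, URI(SKOS + "prefLabel"), Literal("the Virgin Mary", "en")),
    (mary, URI(SKOS + "prefLabel"), Literal("la Vierge Marie", "fr")),
}
graph = collection | vocabulary  # merging is set union; the shared URI links them

# Follow record -> subject concept -> labels across the two data sets.
labels = sorted(
    label.value
    for _, _, subject in match(graph, s=ms, p=URI(DC + "subject"))
    for _, _, label in match(graph, s=subject, p=URI(SKOS + "prefLabel"))
)
print(labels)  # ['la Vierge Marie', 'the Virgin Mary']
```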
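19 SKOS Example
- An Iconclass concept and its concept scheme, as RDF triples:
- http://www.iconclass.nl/ rdf:type skos:ConceptScheme
- http://www.iconclass.nl/s_11F rdf:type skos:Concept
- http://www.iconclass.nl/s_11F skos:inScheme http://www.iconclass.nl/
- http://www.iconclass.nl/s_11F skos:prefLabel "the Virgin Mary"@en
- http://www.iconclass.nl/s_11F skos:prefLabel "la Vierge Marie"@fr
- http://www.iconclass.nl/s_11F skos:broader http://www.iconclass.nl/s_11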
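The example concept (Iconclass s_11F, "the Virgin Mary") can be written out in N-Triples, one of the standard RDF serializations. This sketch only performs the escaping these labels need (backslash and double quote); a full serializer must also escape newlines and other control characters. The Iconclass URIs are taken as given in the example.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
SKOS = "http://www.w3.org/2004/02/skos/core#"
IC = "http://www.iconclass.nl/"

def uri(u):
    """N-Triples IRI term."""
    return f"<{u}>"

def lit(text, lang):
    """Language-tagged literal; minimal escaping of backslash and quote only."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"@{lang}'

TRIPLES = [
    (uri(IC), uri(RDF_TYPE), uri(SKOS + "ConceptScheme")),
    (uri(IC + "s_11F"), uri(RDF_TYPE), uri(SKOS + "Concept")),
    (uri(IC + "s_11F"), uri(SKOS + "inScheme"), uri(IC)),
    (uri(IC + "s_11F"), uri(SKOS + "prefLabel"), lit("the Virgin Mary", "en")),
    (uri(IC + "s_11F"), uri(SKOS + "prefLabel"), lit("la Vierge Marie", "fr")),
    (uri(IC + "s_11F"), uri(SKOS + "broader"), uri(IC + "s_11")),
]

# One triple per line, each terminated by " .".
ntriples = "\n".join(f"{s} {p} {o} ." for s, p, o in TRIPLES)
print(ntriples)
```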
20 SKOS (Simple Knowledge Organization System)
- SKOS offers building blocks to represent KOSs in RDF
- Objects: Concept and ConceptScheme
- Lexical properties (multilingual)
- prefLabel
- altLabel
- Semantic relations
- broader, narrower
- related
- Notes
- scopeNote
- definition
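A small sketch of how these building blocks are typically used: preferred labels per language with a fallback, matching search text against both prefLabel and altLabel, and deriving skos:narrower from skos:broader (SKOS defines the two as inverse relations). The concept identifiers (`ex:...`) and labels are hypothetical.

```python
from collections import defaultdict

# Hypothetical concepts; the keys mirror SKOS property names.
CONCEPTS = {
    "ex:animals": {"prefLabel": {"en": "animals", "nl": "dieren"},
                   "altLabel": {}, "broader": []},
    "ex:lions": {"prefLabel": {"en": "lions", "nl": "leeuwen"},
                 "altLabel": {"en": ["lion"]}, "broader": ["ex:animals"]},
    "ex:zebras": {"prefLabel": {"en": "zebras"},
                  "altLabel": {}, "broader": ["ex:animals"]},
}

def pref_label(concept_id, lang, fallback="en"):
    """prefLabel in the requested language, else in the fallback language."""
    labels = CONCEPTS[concept_id]["prefLabel"]
    return labels.get(lang, labels.get(fallback))

def narrower_index():
    """Derive skos:narrower by inverting skos:broader."""
    index = defaultdict(list)
    for cid, concept in CONCEPTS.items():
        for parent in concept["broader"]:
            index[parent].append(cid)
    return index

def find_by_label(text):
    """Concepts whose prefLabel or altLabel (any language) equals text, ignoring case."""
    wanted = text.casefold()
    hits = []
    for cid, concept in CONCEPTS.items():
        labels = list(concept["prefLabel"].values())
        labels += [alt for alts in concept["altLabel"].values() for alt in alts]
        if any(label.casefold() == wanted for label in labels):
            hits.append(cid)
    return hits

print(pref_label("ex:zebras", "nl"))   # zebras (no Dutch label, falls back to English)
print(narrower_index()["ex:animals"])  # ['ex:lions', 'ex:zebras']
print(find_by_label("Leeuwen"), find_by_label("lion"))  # ['ex:lions'] ['ex:lions']
```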
21 Vocabulary alignment
- Aim: finding semantic correspondences between vocabulary elements
- klassieke ruïnes (classical ruins) <-> landschap met ruïnes (landscape with ruins)
- maagd Maria (Virgin Mary) <-> Heilige Moeder (Holy Mother)
- Doing it (semi-)automatically
- Vocabularies are big (tens of thousands of concepts)
- They change
22 Automatic alignment techniques
- Lexical
- Labels of entities and textual definitions
- Structural
- Structure of the vocabularies
- Background knowledge
- Using a shared conceptual reference to find links
- Extensional
- Object information (e.g. book indexing)
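A minimal lexical matcher, assuming labels are compared as normalized token sets (case and diacritics ignored) and scored with the Jaccard coefficient. This is one simple lexical technique, not necessarily the one used in STITCH, and the 0.2 threshold is an arbitrary choice. On the Dutch examples from the alignment slide it finds the ruins pair but, as expected of a purely lexical method, misses maagd Maria / Heilige Moeder, which share no words.

```python
import re
import unicodedata

def tokens(label):
    """Lower-case, strip diacritics (e.g. 'ï' -> 'i'), split into word tokens."""
    norm = unicodedata.normalize("NFKD", label.casefold())
    norm = "".join(ch for ch in norm if not unicodedata.combining(ch))
    return set(re.findall(r"[a-z0-9]+", norm))

def jaccard(a, b):
    """Shared tokens divided by all distinct tokens of the two labels."""
    ta, tb = tokens(a), tokens(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0

def lexical_alignment(labels1, labels2, threshold=0.2):
    """Candidate pairs (label1, label2, score) at or above threshold, best first."""
    pairs = [(a, b, jaccard(a, b)) for a in labels1 for b in labels2]
    return sorted((p for p in pairs if p[2] >= threshold), key=lambda p: -p[2])

vocab1 = ["klassieke ruïnes", "maagd Maria"]
vocab2 = ["landschap met ruïnes", "Heilige Moeder"]
candidates = lexical_alignment(vocab1, vocab2)
print(candidates)  # [('klassieke ruïnes', 'landschap met ruïnes', 0.25)]
```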
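24 Extensional Statistical Alignment
- Object information (e.g. book indexing)
- (Diagram: Thesaurus 1 and Thesaurus 2 are both used to index a collection of Dutch books; concepts such as "Dutch Literature" and "Dutch" are linked through the books they share.)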
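An extensional sketch: each concept is represented by the set of books indexed with it, and concept pairs from the two thesauri are scored by the overlap of those sets. The Jaccard score and the book data are assumptions for illustration (only "Dutch Literature" and "Dutch" come from the slide); the measure behind the scores on the results slide is not specified here.

```python
from itertools import product

# Hypothetical books indexed with concepts from two thesauri.
index_t1 = {"b1": {"Dutch Literature"}, "b2": {"Dutch Literature"},
            "b3": {"Poetry"}, "b4": {"Poetry"}}
index_t2 = {"b1": {"Dutch"}, "b2": {"Dutch"},
            "b3": {"Dutch", "Poems"}, "b4": {"Poems"}}

def extension(index):
    """Invert book -> concepts into concept -> set of books (its extension)."""
    ext = {}
    for book, concepts in index.items():
        for c in concepts:
            ext.setdefault(c, set()).add(book)
    return ext

def extensional_alignment(index1, index2):
    """Score concept pairs sharing at least one book, best first.
    Each entry: (jaccard, concept1, concept2, n1, n2, n_common)."""
    ext1, ext2 = extension(index1), extension(index2)
    scored = []
    for c1, c2 in product(ext1, ext2):
        common = ext1[c1] & ext2[c2]
        if common:
            score = len(common) / len(ext1[c1] | ext2[c2])
            scored.append((score, c1, c2, len(ext1[c1]), len(ext2[c2]), len(common)))
    return sorted(scored, reverse=True)

result = extensional_alignment(index_t1, index_t2)
for row in result:
    print(row)
```

Pairs sharing only a few books (here Poetry and Dutch, via b3) still appear, but with a low score; in practice a threshold or a significance test would filter them.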
25 Results
- 1: 9132.9 (1704 3479 976) Schilderijen - schilderkunst (paintings - painting)
- 2: 8088.5 (1204 2330 767) Kwaliteitszorg - kwaliteitsmanagement (quality assurance - quality management)
- 3: 6232.7 (820 1572 543) Personeelsmanagement - personeelsbeleid (personnel management - personnel policy)
- 4: 5392.1 (1399 3271 622) Beeldende kunsten - beeldende kunst (visual arts - visual art)
- 5: 5063.1 (4951 1152 613) Nederlands - Nederlandse taalkunde (Dutch - Dutch linguistics)
- 17: 3421.8 (280 714 243) Diabetes mellitus - suikerziekte (diabetes)
26 Alignment: No Trivial Solution
- Current techniques are not reliable as the sole source of knowledge
- What is a good alignment?
- Evaluation criteria?
- -> What will it be used for?
- Usage scenarios
- Integrated search
- Reindexing
- Thesaurus merging
- Navigation -> faceted browsing
27 What next
- Evaluation, lessons learned
- What next ->
- Second use case: reindexing
- (Vocabulary service)
- Conclusion
28 Why usage scenarios
- Evaluation of alignments depends on their use
- Real-world applications provide a test of the quality of alignments
- Requirements on alignments depend on their use
- What kinds of links should be distinguished?
- Optional demo: evaluation
- http://localhost:33344/logineval
- http://kits.cs.vu.nl:33344/logineval
- Next: reindexing, the closest to a real-world application
29 Situation at Dutch libraries, National Library (KB)
- The KB has two large collections
- DEPOT: deposit collection (all Dutch-language publications)
- Own scientific collection
- Subject indexing uses two completely different indexing systems: Brinkman, GOO
- Common automation system for the Netherlands and Europe (OCLC-Pica)
- Metadata of books contains many fields
- A book or publication is provided with metadata by different libraries, using many different vocabularies.
30 Reindexing
- The KB has about 20 people indexing books daily; about 20,000 books per year are indexed
- Even internally, indexing follows different vocabularies. Indexing means adding keywords and classification information to books
- Some books come with indexing already done by other libraries (public libraries, Biblion)
- If Biblion indices, or combinations of them, could be translated to KB indices (Brinkman), there would be less work for the KB
31 WinIBW
- OCLC (PICA) automation system for libraries in the Netherlands, also used within Europe
- Online Public Access Catalogue (OPAC)
- WinIBW: internet access to the Pica system (local and central) for adding records, adding metadata, and searching records
- Demo, closest to a real-world application
32 Reindexing
- Biblion -> Brinkman
- Fietstochten (cycling tours), Kapellen (chapels), Beesel, Heiligenbeelden (statues of saints) -> Brinkman?
- Use alignment...
- Bibl:Fietstochten -> Brinkman?
- Bibl:Kapellen -> Brinkman?
- DEMO
- (Example: z sel 3-10-2008 gd? - 79)
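A sketch of alignment-based reindexing: rules map a Biblion heading, or a combination of headings, to a Brinkman term with a confidence; suggestions at or above a threshold are accepted automatically and the rest go to an indexer for review. The rule table, the `brinkman:` target terms, and the confidences are invented for illustration and are not real Brinkman data; the 0.95 default only echoes the "95 confidence" figure mentioned under slide 42.

```python
# Hypothetical alignment rules: set of Biblion headings -> (Brinkman term, confidence).
# frozenset keys let a *combination* of Biblion headings map to one Brinkman term.
ALIGNMENT = {
    frozenset({"Fietstochten"}): ("brinkman:cycling", 0.97),
    frozenset({"Kapellen"}): ("brinkman:chapels", 0.96),
    frozenset({"Kapellen", "Heiligenbeelden"}): ("brinkman:devotional-art", 0.90),
    frozenset({"Beesel"}): ("brinkman:beesel", 0.80),
}

def suggest(biblion_headings, threshold=0.95):
    """Split suggested Brinkman terms into (auto-accepted, to-review) lists."""
    headings = set(biblion_headings)
    accepted, review = [], []
    for source, (target, conf) in ALIGNMENT.items():
        if source <= headings:  # every heading of the rule occurs in the record
            (accepted if conf >= threshold else review).append((target, conf))
    return sorted(accepted), sorted(review)

record = ["Fietstochten", "Kapellen", "Beesel", "Heiligenbeelden"]
auto, check = suggest(record)
print(auto)   # [('brinkman:chapels', 0.96), ('brinkman:cycling', 0.97)]
print(check)  # [('brinkman:beesel', 0.8), ('brinkman:devotional-art', 0.9)]
```

Keeping low-confidence suggestions for review, rather than discarding them, leaves the indexer in control while still saving work on the confident cases.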
41 Result
42 Reindexing
- Under evaluation
- Improvement
- Use other metadata
- Adapt scenario (pass records with 95% confidence)
- Many other uses
43 Sketch: vocabularies of importance to the KB
44 Integrated Access
- Services through the internet
- Protocols: SOAP, REST, ...
- Collection access?
- Vocabulary access, alignment access
- http://eculture.cs.vu.nl:38080/vocreptags
- http://localhost:8080/vocreptags
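As a sketch of REST-style vocabulary access, the WSGI application below answers `GET /concept?uri=...` with a JSON description of a SKOS-like concept. The path, the `uri` query parameter, and the JSON shape are hypothetical; they are not the interface of the `vocreptags` prototype. The app is called directly here so no server needs to run.

```python
import json
from urllib.parse import parse_qs
from wsgiref.util import setup_testing_defaults

# Hypothetical vocabulary content served by the sketch.
VOCAB = {
    "http://www.iconclass.nl/s_11F": {
        "prefLabel": {"en": "the Virgin Mary", "fr": "la Vierge Marie"},
        "broader": ["http://www.iconclass.nl/s_11"],
    },
}

def app(environ, start_response):
    """Hypothetical endpoint: GET /concept?uri=<concept URI> -> JSON."""
    if environ["PATH_INFO"] != "/concept" or environ["REQUEST_METHOD"] != "GET":
        start_response("404 Not Found", [("Content-Type", "text/plain")])
        return [b"not found"]
    uri = parse_qs(environ.get("QUERY_STRING", "")).get("uri", [""])[0]
    concept = VOCAB.get(uri)
    if concept is None:
        start_response("404 Not Found", [("Content-Type", "text/plain")])
        return [b"unknown concept"]
    body = json.dumps({"uri": uri, **concept}).encode("utf-8")
    start_response("200 OK", [("Content-Type", "application/json")])
    return [body]

# Call the app directly with a minimal test environ (fills in defaults like REQUEST_METHOD).
environ = {"PATH_INFO": "/concept", "QUERY_STRING": "uri=http://www.iconclass.nl/s_11F"}
setup_testing_defaults(environ)
status = []
body = b"".join(app(environ, lambda s, headers: status.append(s)))
print(status[0], json.loads(body)["prefLabel"]["fr"])  # 200 OK la Vierge Marie
```

To try it over HTTP, `wsgiref.simple_server.make_server("localhost", 8080, app).serve_forever()` would serve the same function locally; a production service would add content negotiation (for example returning RDF as well as JSON).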
45 Lessons
- Using semantic web techniques, interoperability and integration of collections can be made easier.
- Aligning vocabularies is useful in different situations. The alignment methods need to be fine-tuned to the application they are meant for.
- When introducing new techniques, interaction between the CH field and scientific institutes is very valuable.
- Standardization of access to collections and vocabularies should be addressed (a prototype has been developed).
46 Terms
- An ontology in both computer science and information science is a formal representation of a set of concepts within a domain and the relationships between those concepts. It is used to reason about the properties of that domain, and may be used to define the domain.
- Metadata (meta data, or sometimes metainformation) is "data about data", of any sort in any media. An item of metadata may describe an individual datum, or content item, or a collection of data including multiple content items and hierarchical levels, for example a database schema.
47 Terms
- A library classification is a system of coding and organizing library materials (books, serials, audiovisual materials, computer files, maps, manuscripts, realia) according to their subject and allocating a call number to that information resource. Similar to classification systems used in biology, bibliographic classification systems group entities that are similar together, typically arranged in a hierarchical tree structure.
- In information technology, a thesaurus represents a database or list of semantically orthogonal topical search keys. In the field of Artificial Intelligence, a thesaurus may sometimes be referred to as an ontology.