Title: Accessing Cultural Heritage Collections using Semantic Web Techniques
1Accessing Cultural Heritage Collections using
Semantic Web Techniques
- Antoine ISAAC
- STITCH Project
- SIKS Semantic Web Seminar, Utrecht
- April 11th, 2007
2Background
- CATCH @ NWO
- Continuous Access To Cultural Heritage
- 10 computer science projects applied to the CH field
- Personalization of access, image/text/audio analysis
- Integration of projects in CH institutes (museums, archives)
- STITCH
- SemanTic Interoperability To access Cultural Heritage
- Exchanging and integrating metadata
- Vrije Universiteit, Koninklijke Bibliotheek, Max Planck Institute
3Agenda
- Cultural Heritage and Semantic Web
- Two important issues
- Representing Cultural Heritage vocabularies on the Semantic Web
- Vocabulary alignment
- Demo
4Some Needs for CH Collections
- Representation of objects and knowledge about them
- Pointing at collection artifacts: books
- Describing them: creating metadata
- Specific metadata structures (metadata schemes)
- Controlled expert vocabularies (e.g. thesauri)
- Accessing artifacts using metadata
- E.g. search using information contained in thesauri
5KB Illustrated Manuscripts Iconclass vocabulary
6KB Illustrated Manuscripts
7Some Needs for CH Collections (2)
- Communicating data to the outside world
- Web portals
- Integrating different collections
- Virtual collections
- The European Library, http://www.theeuropeanlibrary.org
- Geheugen van Nederland, http://www.geheugenvannederland.nl
8(Biased) Semantic Web
- Pointing at resources: documents, knowledge objects
- Enabling structured assertions
- Metadata about entities present on the Web
- Using vocabularies with defined semantics
- Ontologies: formal definitions of shared conceptual vocabularies
- RDF Schema / OWL
<owl:Class rdf:about="Bird">
  <owl:disjointWith>
    <owl:Class rdf:about="Mammals"/>
  </owl:disjointWith>
  <rdfs:subClassOf>
    <owl:Class rdf:ID="Animals"/>
  </rdfs:subClassOf>
</owl:Class>
<Bird rdf:about="tweety"/>
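To make "defined semantics" concrete, here is a minimal sketch of what the Bird axioms on this slide license a machine to conclude. It is a toy, not an OWL reasoner: the dictionaries and the function names (`superclasses`, `inferred_types`, `can_also_be`) are assumptions made for this illustration.

```python
# Toy encoding of the axioms on the slide (illustrative, not an OWL reasoner).
SUBCLASS_OF = {"Bird": {"Animals"}}        # Bird rdfs:subClassOf Animals
DISJOINT_WITH = {("Bird", "Mammals")}      # Bird owl:disjointWith Mammals
ASSERTED_TYPES = {"tweety": {"Bird"}}      # tweety is a Bird


def superclasses(cls):
    """Return cls plus every class reachable through subClassOf."""
    seen, todo = set(), [cls]
    while todo:
        current = todo.pop()
        if current not in seen:
            seen.add(current)
            todo.extend(SUBCLASS_OF.get(current, ()))
    return seen


def inferred_types(individual):
    """Asserted types of the individual plus all their superclasses."""
    types = set()
    for cls in ASSERTED_TYPES.get(individual, ()):
        types |= superclasses(cls)
    return types


def can_also_be(individual, cls):
    """False if typing the individual as cls would break a disjointness axiom."""
    current = inferred_types(individual)
    candidate = superclasses(cls)
    return not any(
        (a in current and b in candidate) or (b in current and a in candidate)
        for a, b in DISJOINT_WITH
    )


print(sorted(inferred_types("tweety")))   # ['Animals', 'Bird']
print(can_also_be("tweety", "Mammals"))   # False
```

The point of the sketch is only that the subclass and disjointness statements are not documentation: they can be followed mechanically to derive new facts and to reject contradictory ones.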
9(Biased) Semantic Web
- Web-based resources allow division/sharing of:
- document
- vocabulary
- metadata
http://www.geo.org/voc/
(doc3, hasSubject, Amsterdam)
http://www.kb.nl/eDepot
http://www.ned.nl/doc3
different owners, different locations
10Cultural Heritage Collections and Semantic Web
- Categorizing/classifying things
- Structuring descriptions
- Web-based approach
- Semantic Web techniques are good candidates for
representing and exploiting Cultural Heritage
metadata
11Important line of research
- Long-term projects
- MuseumFinland, http://www.museosuomi.fi/
- eCulture, http://e-culture.multimedian.nl/
- Common portals to (many) collections
- Exploiting the data found in the original systems
- Metadata content: place, date, creator
- Semantics of vocabularies used to create this information
- E.g. hierarchical information
- A picture featuring a crow features a bird
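The crow/bird example can be sketched as a search that follows the vocabulary hierarchy. The data and names below (`BROADER`, `ANNOTATIONS`, `ancestors`, `search`) are invented for illustration and do not come from any of the portals mentioned.

```python
# Hypothetical toy data: a "broader" hierarchy and subject annotations.
BROADER = {"crow": "bird", "sparrow": "bird", "bird": "animal"}
ANNOTATIONS = {
    "picture1": {"crow"},
    "picture2": {"sparrow"},
    "picture3": {"castle"},
}


def ancestors(concept, broader=BROADER):
    """Return the concept followed by all of its broader concepts."""
    chain = [concept]
    while concept in broader:
        concept = broader[concept]
        chain.append(concept)
    return chain


def search(query, annotations=ANNOTATIONS):
    """Return items annotated with the query concept or a narrower one."""
    return sorted(
        item
        for item, subjects in annotations.items()
        if any(query in ancestors(subject) for subject in subjects)
    )


print(search("bird"))   # ['picture1', 'picture2']
print(search("crow"))   # ['picture1']
```

A query for "bird" retrieves the picture indexed with "crow" even though the word "bird" never appears in its metadata, which is the kind of gain the hierarchical information is meant to provide.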
13Agenda
- Cultural Heritage and Semantic Web
- Two important issues
- Representing Cultural Heritage vocabularies on the Semantic Web
- Vocabulary alignment
- Demo
14Representing CH vocabularies on the Semantic Web
- Similarities
- Both ontologies and thesauri bring concept hierarchies
- giving the intended meaning of a vocabulary through links between its items
- concept/term → owl:Class
- broader → rdfs:subClassOf
- scope notes → rdfs:comment
15Representing CH vocabularies on the Semantic Web
- Problems
- Thesauri designed for humans, no formal interpretation
- How to interpret a thesaurus in RDFS/OWL?
- If (Story of) Hercules is a class, what are its instances?
- Is Hercules shooting Nessus a subclass of Love-affairs of Hercules?
- Thesaurus hierarchy: subsumption, mereological relation, ...
16Representing CH vocabularies on the Semantic Web
Different approaches
- Ontologising
- Cleaning thesaurus by distinguishing roles, kinds, etc.
- Cleaning the hierarchical links
- Representing knowledge found in sources as such
- Informal knowledge represented in RDF/OWL formal framework
17SKOS
- Simple Knowledge Organization Systems
- (Future) W3C standard
- Model to represent controlled and structured vocabularies on the Semantic Web
- Compatible with community needs
- Core model for representing thesauri, classification schemes, etc.
18SKOS
- Building blocks (ontology) to create XML/RDF data about controlled vocabularies
- Classes: Concept and ConceptScheme
- Lexical properties
- prefLabel
- altLabel
- Semantic properties
- broader, narrower
- related
- Properties for notes and comments
- scopeNote
- definition
19SKOS Brinkman Trefwoorden (KB)
- 075607204 geneeskunde
- RT geneesmiddelen
- NT kindergeneeskunde
- 075607220 geneesmiddelen
- UF medicijnen
- 075611791 kindergeneeskunde
- BT geneeskunde
- noot: kinderen ouder dan 12 vallen niet onder kindergeneeskunde (note: children older than 12 do not fall under paediatrics)
- medicijnen
- USE geneesmiddelen
20SKOS Brinkman Trefwoorden (KB)
skos: http://www.w3.org/2004/02/skos/core#
bk: http://www.kb.nl/brinkman/
21SKOS Brinkman Trefwoorden (KB)
<skos:Concept rdf:about="http://www.kb.nl/brinkman/bk075607204">
  <skos:prefLabel>geneeskunde</skos:prefLabel>
  <skos:related rdf:resource="http://www.kb.nl/brinkman/bk075607220"/>
</skos:Concept>
<skos:Concept rdf:about="http://www.kb.nl/brinkman/bk075607220">
  <rdf:type rdf:resource="&skos;Concept"/>
  <skos:prefLabel>geneesmiddelen</skos:prefLabel>
  <skos:altLabel>medicijnen</skos:altLabel>
</skos:Concept>
<skos:Concept rdf:about="http://www.kb.nl/brinkman/bk075611791">
  <rdf:type rdf:resource="&skos;Concept"/>
  <skos:prefLabel>kindergeneeskunde</skos:prefLabel>
  <skos:broader rdf:resource="http://www.kb.nl/brinkman/bk075607204"/>
  <skos:scopeNote>kinderen ouder dan 12 vallen niet onder kindergeneeskunde</skos:scopeNote>
</skos:Concept>
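The step from the thesaurus entries (BT, NT, RT, UF, note) to SKOS statements can be sketched as a small conversion routine. This is an illustration under assumptions, not the conversion actually used for the Brinkman thesaurus: the `ENTRIES` layout, the `to_triples` function and the plain-tuple triple format are invented here, and the concept URIs simply copy the pattern visible on the slide.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
SKOS = "http://www.w3.org/2004/02/skos/core#"
BK = "http://www.kb.nl/brinkman/bk"   # URI pattern copied from the slide

# Thesaurus relation codes mapped to SKOS semantic properties.
CODE_TO_PROPERTY = {"BT": "broader", "NT": "narrower", "RT": "related"}

# Hypothetical in-memory form of the three entries shown on the slide.
ENTRIES = [
    {"id": "075607204", "term": "geneeskunde",
     "RT": ["geneesmiddelen"], "NT": ["kindergeneeskunde"]},
    {"id": "075607220", "term": "geneesmiddelen", "UF": ["medicijnen"]},
    {"id": "075611791", "term": "kindergeneeskunde", "BT": ["geneeskunde"],
     "note": "kinderen ouder dan 12 vallen niet onder kindergeneeskunde"},
]


def to_triples(entries):
    """Turn thesaurus entries into (subject, predicate, object) tuples."""
    uri_of = {entry["term"]: BK + entry["id"] for entry in entries}
    triples = []
    for entry in entries:
        subject = uri_of[entry["term"]]
        triples.append((subject, RDF_TYPE, SKOS + "Concept"))
        triples.append((subject, SKOS + "prefLabel", entry["term"]))
        for label in entry.get("UF", []):          # UF = non-preferred term
            triples.append((subject, SKOS + "altLabel", label))
        for code, prop in CODE_TO_PROPERTY.items():
            for target in entry.get(code, []):
                triples.append((subject, SKOS + prop, uri_of[target]))
        if "note" in entry:
            triples.append((subject, SKOS + "scopeNote", entry["note"]))
    return triples


for triple in to_triples(ENTRIES):
    print(triple)
```

Note that the non-preferred term "medicijnen" does not become a concept of its own; it ends up as an `altLabel` on "geneesmiddelen", which matches the USE/UF pairing in the thesaurus.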
22Agenda
- Cultural Heritage and Semantic Web
- Two important issues
- Representing Cultural Heritage vocabularies on the Semantic Web
- Vocabulary alignment
- Demo
23Cultural Heritage Interoperability Problems
- Problem: integrating different databases/metadata schemes/vocabularies
- Syntactic interoperability can be solved
- Common format: XML (RDF)
- Common vocabulary model (SKOS)
- How about conceptual heterogeneity?
24The semantic interoperability problem
- There is no standard thesaurus
- We don't really want it
- Different vocabularies for different expertise domains, traditions, tasks
- Consequence
- klassieke ruïnes (classical ruins) vs. landschap met ruïnes (landscape with ruins)
- maagd Maria (Virgin Mary) vs. Heilige Moeder (Holy Mother)
- Practical problem
- Searching for Heilige Moeder misses maagd Maria
- Unless we know both vocabularies
25Old situation
26Vocabulary alignment
- STITCH aim: find correspondences between vocabulary elements
- klassieke ruïnes ↔ landschap met ruïnes
- maagd Maria ↔ Heilige Moeder
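Once such correspondences exist, a query in one vocabulary can be expanded to reach items indexed with the other. The sketch below assumes, for simplicity, that every correspondence can be treated as an equivalence for search purposes; the collections, item identifiers and function names are hypothetical.

```python
# Hypothetical alignment: pairs of corresponding concepts from two vocabularies.
ALIGNMENT = [
    ("maagd Maria", "Heilige Moeder"),
    ("klassieke ruïnes", "landschap met ruïnes"),
]

# Hypothetical collections, each indexed with its own vocabulary.
COLLECTION_A = {"ms1": {"maagd Maria"}, "ms2": {"klassieke ruïnes"}}
COLLECTION_B = {"img7": {"Heilige Moeder"}, "img8": {"landschap met ruïnes"}}


def expand(term, alignment=ALIGNMENT):
    """Return the term plus every concept aligned with it."""
    terms = {term}
    for left, right in alignment:
        if left == term:
            terms.add(right)
        if right == term:
            terms.add(left)
    return terms


def search(term, *collections):
    """Search all collections for the term or any aligned concept."""
    wanted = expand(term)
    return sorted(
        item
        for collection in collections
        for item, subjects in collection.items()
        if subjects & wanted
    )


print(search("Heilige Moeder", COLLECTION_A, COLLECTION_B))   # ['img7', 'ms1']
```

Without the alignment, the same query would return only `img7`; the user would have to know both vocabularies to find `ms1`.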
27New situation
28Automatic alignment techniques
- Lexical
- Labels of entities and textual definitions
- Structural
- Structure of the formal definitions of entities, position in the hierarchy
- Statistical
- Object information (e.g. book indexing)
- Background knowledge
- Using a shared conceptual reference to find links
29Lexical alignment
- Use preferred labels, synonyms, notes
- Heuristic methods to discover equivalence and
specialization relations
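A very simple lexical heuristic is to normalise labels and propose an equivalence whenever two concepts share one. The sketch below covers only that case (not specialization relations or notes), and the vocabularies, identifiers and function names are made up for the example.

```python
import unicodedata


def normalise(label):
    """Lower-case the label, trim it and strip accents."""
    decomposed = unicodedata.normalize("NFKD", label.lower().strip())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def lexical_alignment(vocab_a, vocab_b):
    """Propose pairs of concepts that share at least one normalised label.

    Each vocabulary maps a concept id to its labels (preferred and synonyms).
    """
    index = {}
    for concept, labels in vocab_b.items():
        for label in labels:
            index.setdefault(normalise(label), set()).add(concept)
    pairs = set()
    for concept, labels in vocab_a.items():
        for label in labels:
            for other in index.get(normalise(label), ()):
                pairs.add((concept, other))
    return sorted(pairs)


# Hypothetical vocabularies: concept id -> [preferred label, synonyms...]
VOCAB_A = {"a1": ["geneesmiddelen", "medicijnen"], "a2": ["Ruïnes"]}
VOCAB_B = {"b1": ["Medicijnen"], "b2": ["ruines"], "b3": ["schilderkunst"]}

print(lexical_alignment(VOCAB_A, VOCAB_B))   # [('a1', 'b1'), ('a2', 'b2')]
```

The first match is found through a synonym rather than the preferred label, which is why using synonyms alongside preferred labels matters for recall.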
30Automatic Alignment Techniques
- Lexical
- Labels of entities and textual definitions
- Structural
- Structure of the formal definitions of entities, position in the hierarchy
- Statistical
- Object information (e.g. book indexing)
- Shared background knowledge
- Using a conceptual reference to deduce
correspondences
31Statistical alignment
32Statistic approach Koninklijke Bibliotheek case
- Situation: 2 overlapping collections indexed with different thesauri
- Comparison means: measuring overlap between concepts from the thesauri
- Using the sets of books indexed by these concepts
- Results
- 1 9132.9 Schilderijen - schilderkunst
- 2 8088.5 Kwaliteitszorg - kwaliteitsmanagement
- 3 6232.7 Personeelsmanagement - personeelsbeleid
- ...
- 17 3421.8 Diabetes mellitus - suikerziekte
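The idea of measuring overlap between the sets of books indexed by two concepts can be sketched with a Jaccard coefficient. This is a deliberately simple stand-in: the slide does not say which measure produced the scores listed above, and they are clearly not Jaccard values. The book identifiers and function names are invented.

```python
def jaccard(books_a, books_b):
    """Share of books indexed by either concept that are indexed by both."""
    union = books_a | books_b
    return len(books_a & books_b) / len(union) if union else 0.0


def rank_pairs(index_a, index_b, min_score=0.0):
    """Score every concept pair and return those above min_score, best first."""
    scored = [
        (jaccard(books_a, books_b), concept_a, concept_b)
        for concept_a, books_a in index_a.items()
        for concept_b, books_b in index_b.items()
    ]
    return sorted((row for row in scored if row[0] > min_score), reverse=True)


# Hypothetical indexing data: concept -> set of book identifiers.
INDEX_A = {"Schilderijen": {1, 2, 3, 4}, "Diabetes mellitus": {5, 6}}
INDEX_B = {"schilderkunst": {1, 2, 3, 7}, "suikerziekte": {5, 6, 8}}

for score, concept_a, concept_b in rank_pairs(INDEX_A, INDEX_B):
    print(f"{score:.2f} {concept_a} - {concept_b}")
```

With this toy data the two expected pairs come out on top and unrelated pairs drop out because they share no books. A measure that also rewards the amount of evidence (many shared books rather than a few) would be a natural refinement.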
33Agenda
- Cultural Heritage and Semantic Web
- Two important issues
- Representing Cultural Heritage vocabularies on the Semantic Web
- Vocabulary alignment
- Demo
34Demo
- KB Illuminated Manuscripts
- French National Library Mandragore Manuscripts
35Manuscripts, 2nd Collection BNF Mandragore
36Manuscripts, 2nd Collection BNF Mandragore
37Demo
- http://stitch.cs.vu.nl/rp33333/MANDRA-SV-ICE-mandraNewNONE, amphibians
- http://stitch.cs.vu.nl/rp33333/MANDRA-SV-MANDRA-mandraNewNONE, wheat
38Conclusion Semantic Web can help Cultural
Heritage
- Representation of collections and associated expert vocabularies
- Semantic integration through correspondences between different vocabularies
- New opportunities for exploiting cultural heritage information
39Thanks!
40Links
- Semantic Web at Vrije Universiteit
- http://www.cs.vu.nl/ai/kr/
- http://www.cs.vu.nl/bi/
- SKOS
- http://www.w3.org/2004/02/skos/
- Other Cultural Heritage and Semantic Web projects
- MuseumFinland, http://www.museosuomi.fi/
- eCulture, http://e-culture.multimedian.nl/