Title: Developing a Metadata Infrastructure for Information Access: What, Where, When and Who?
1Developing a Metadata Infrastructure for
Information Access: What, Where, When and Who?
Prof. Ray R. Larson, University of California,
Berkeley, School of Information
2Overview
- Metadata as Infrastructure
- What, Where, When and Who?
- What are Entry Vocabulary Indexes?
- Notion of an EVI
- How are EVIs Built
- Time Period Directories
- Mining Metadata for new metadata
- 4W Demo
- New Project: Bringing Lives to Light
3Metadata as Infrastructure
- The difference between memorization and
understanding lies in knowing the context and
relationships of whatever is of interest. When
setting out to learn about a new topic, a
well-tested practice is to follow the traditional
5Ws and the H: Who?, What?, When?, Where?,
Why?, and How?
4Metadata as Infrastructure
- The reference collections of paper-based
libraries provide a structured environment for
resources, with encyclopedias and subject
catalogs, gazetteers, chronologies, and
biographical dictionaries offering direct
support for at least What, Where, When, and Who.
- The digital environment does not yet provide an
effective, easily exploited infrastructure
comparable to the traditional reference library.
5What?
- Searching texts by topic, e.g. Dewey, LCSH, any
subject index, or category scheme applied to
documents.
- Two kinds of mapping in every search:
- Documents are assigned to topic categories, e.g.
Dewey.
- Queries have to map to topic categories, e.g.
Dewey's Relativ Index from ordinary words/phrases
to Decimal Classification numbers.
- Also mapping between topic systems, e.g. US
Patent classification and International Patent
Classification.
6What searches involve mapping to controlled
vocabularies
Thesaurus/ Ontology
Texts
7Building a Search Term Recommender
Start with a collection of documents.
8Classify and index with controlled vocabulary Or
use a pre-indexed collection.
9Problem: Controlled vocabularies can be difficult
for people to use.
pass mtr veh spark ign eng
10Solution: Entry Level Vocabulary Indexes.
Index
EVI
11What and Entry Vocabulary Indexes
- EVIs are a means of mapping from users'
vocabulary to the controlled vocabulary of a
collection of documents
12Building and Searching EVIs
13Technical Details
For noun phrases
Internet DB indexed with a controlled vocabulary.
Building an Entry Vocabulary Module (EVI)
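A minimal sketch of the counting step behind building an EVI from a pre-indexed collection, assuming each training record is simply a set of text terms plus the set of controlled-vocabulary classes assigned to it. The function name `contingency_tables`, the toy records, and the class labels "aircraft eng" and "grain" are hypothetical; only "pass mtr veh spark ign eng" comes from the slides.

```python
from collections import Counter

def contingency_tables(records):
    """Count, for every (term, class) pair that co-occurs, the four cells
    of a 2x2 table: a = term and class, b = term without class,
    c = class without term, d = neither."""
    n = 0
    term_count = Counter()
    class_count = Counter()
    pair_count = Counter()
    for terms, classes in records:
        n += 1
        terms, classes = set(terms), set(classes)  # count once per record
        term_count.update(terms)
        class_count.update(classes)
        pair_count.update((t, k) for t in terms for k in classes)
    tables = {}
    for (t, k), a in pair_count.items():
        b = term_count[t] - a
        c = class_count[k] - a
        d = n - a - b - c
        tables[(t, k)] = (a, b, c, d)
    return tables

# Toy training set (hypothetical): (terms in the text, assigned classes)
records = [
    ({"automobile", "engine"}, {"pass mtr veh spark ign eng"}),
    ({"automobile", "tire"}, {"pass mtr veh spark ign eng"}),
    ({"engine", "aircraft"}, {"aircraft eng"}),
    ({"wheat"}, {"grain"}),
]
tables = contingency_tables(records)
```

Each resulting tuple can then be scored with whatever association measure the EVI uses.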
14Association Measure
        C    ¬C
t       a    b
¬t      c    d
Where t is the occurrence of a term and C is the
occurrence of a class in the training set, and
¬t and ¬C mark their absence.
15Association Measure
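The formula on this slide did not survive in the transcript. As a sketch under that caveat, one widely used association measure for a 2x2 term-by-class table is the log-likelihood ratio statistic G²; treating it as the measure intended here is an assumption. The cells are a (term and class together), b (term without class), c (class without term) and d (neither); the function name is hypothetical.

```python
import math

def log_likelihood_ratio(a, b, c, d):
    """G^2 = 2 * sum(O * ln(O / E)) over the four cells, where E is the
    count expected if term and class were independent.
    Cells with O = 0 contribute nothing to the sum."""
    n = a + b + c + d
    row_totals = (a + b, c + d)   # term present, term absent
    col_totals = (a + c, b + d)   # class present, class absent
    observed = ((a, b), (c, d))
    total = 0.0
    for i in range(2):
        for j in range(2):
            o = observed[i][j]
            if o > 0:
                expected = row_totals[i] * col_totals[j] / n
                total += o * math.log(o / expected)
    return 2.0 * total

independent = log_likelihood_ratio(1, 1, 1, 1)   # no association
strong = log_likelihood_ratio(2, 0, 0, 2)        # term and class always co-occur
```

A larger value indicates a stronger departure from independence. The statistic does not by itself say whether the association is positive or negative, so a check such as a·d > b·c may be needed before a term is used as evidence for a class.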
16Alternatively
- Because the evidence terms gathered for each
class in an EVI can be treated as a document, you
can also apply IR ranking techniques and use the
top-ranked classes for classification or query
expansion
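The IR alternative can be sketched as follows: each class's evidence terms form a pseudo-document, and a query is ranked against those pseudo-documents. The TF-IDF weighting with cosine similarity used here is one common choice, not necessarily the one used in the project; the function name, toy evidence lists, and the class labels other than "pass mtr veh spark ign eng" are hypothetical.

```python
import math
from collections import Counter

def rank_classes(query_terms, class_evidence, top_k=3):
    """Rank controlled-vocabulary classes by cosine similarity between the
    query and each class's evidence terms, using TF-IDF weights."""
    n = len(class_evidence)
    df = Counter()
    for terms in class_evidence.values():
        df.update(set(terms))

    def weights(terms):
        tf = Counter(t for t in terms if t in df)  # ignore unseen terms
        return {t: f * math.log(1 + n / df[t]) for t, f in tf.items()}

    def cosine(u, v):
        dot = sum(w * v.get(t, 0.0) for t, w in u.items())
        norm = (math.sqrt(sum(w * w for w in u.values()))
                * math.sqrt(sum(w * w for w in v.values())))
        return dot / norm if norm else 0.0

    q = weights(query_terms)
    scored = [(cosine(q, weights(terms)), label)
              for label, terms in class_evidence.items()]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [(label, score) for score, label in scored[:top_k] if score > 0]

class_evidence = {
    "pass mtr veh spark ign eng": ["automobile", "car", "engine", "automobile"],
    "aircraft eng": ["aircraft", "engine", "turbine"],
    "grain": ["wheat", "corn"],
}
automobile_hits = rank_classes(["automobile"], class_evidence)
engine_hits = rank_classes(["engine"], class_evidence)
```

The top-ranked labels can be offered to the searcher as suggested index terms or added to the query.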
18EVI example
Index term: pass mtr veh spark ign eng
EVI 1
User Query: Automobile
Index term: automobiles OR internal
combustible engines
EVI 2
19But why stop there?
Index
EVI
20Which EVI do I use?
[Diagram labels: Index (x4), EVI (x3)]
21EVI to EVIs
[Diagram labels: Index (x4), EVI (x3), EVI2]
22Why not treat language the same way?
23Support for the Learner with a Query
Facet   Vocabulary                                  Displays
WHAT    Thesaurus (e.g. LCSH)                       Cross-references
WHERE   Gazetteer                                   Map
WHEN    Period directory                            Timeline
WHO     Biographical dictionary (e.g. Who's Who)    Personal relations
Any catalog: Archives, Libraries, Museums, TV,
Publishers
Any resource: Audio, Images, Texts, Numeric data,
Objects, Virtual reality, Webpages
24It is also difficult to move between different
media forms
Thesaurus/ Ontology
Texts
EVI
Numeric datasets
25Searching across data types
- Different media can be linked indirectly via
metadata, but often (e.g. for socio-economic
numeric data series) you also need to specify
WHERE to get correct results
26But texts associated with numeric data can be
mapped as well
Thesaurus/ Ontology
Texts
EVI
EVI
captions
Numeric datasets
27But there are also geographic dependencies
Thesaurus/ Ontology
Texts
EVI
EVI
captions
Maps/ Geo Data
Numeric datasets
28WHERE: Place names are problematic
- Variant forms: St. Petersburg, Санкт Петербург,
Saint-Pétersbourg, . . .
- Multiple names: Cluj, in Romania / Roumania /
Rumania, is also called Klausenburg and
Kolozsvar.
- Name changes: Bombay → Mumbai.
- Homographs: Vienna, VA, and Vienna, Austria
- 50 Springfields.
- Anachronisms: No Germany before 1870
- Vague, e.g. Midwest, Silicon Valley
- Unstable boundaries: 19th century Poland;
Balkans; USSR
- Use a gazetteer!
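A minimal sketch of what a gazetteer lookup offers for two of the problems above, multiple names and homographs. The `Place` and `Gazetteer` classes are hypothetical, and the coordinates are approximate values included only for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Place:
    name: str
    country: str
    lat: float   # approximate, illustrative only
    lon: float

class Gazetteer:
    """Maps every known name form to the place entries it may denote."""

    def __init__(self):
        self._by_name = {}

    def add(self, place, variants=()):
        # Index the preferred name and every variant, case-insensitively.
        for name in (place.name, *variants):
            self._by_name.setdefault(name.casefold(), []).append(place)

    def lookup(self, name):
        # A homograph returns several candidates; an unknown name returns none.
        return list(self._by_name.get(name.casefold(), []))

gaz = Gazetteer()
gaz.add(Place("Cluj", "Romania", 46.77, 23.60),
        variants=("Klausenburg", "Kolozsvar"))
gaz.add(Place("Vienna", "Austria", 48.21, 16.37))
gaz.add(Place("Vienna", "USA", 38.90, -77.27))

candidates = gaz.lookup("Vienna")       # two entries to disambiguate
same_place = gaz.lookup("Klausenburg")  # resolves to the Cluj entry
```

Returning a list rather than a single entry keeps the ambiguity visible, so an interface can ask the searcher which Vienna is meant or show both on the map.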
29WHERE. Geo-temporal search interface. Place
names found in documents. Gazetteer provided lat.
long. Places displayed on map.
Timebar?
30Zoom on map. Click on place for a list of
records. Click on record to display text.
31So geographic search becomes part of the
infrastructure
Thesaurus/ Ontology
Texts
EVI
Gazetteers
captions
Maps/ Geo Data
Numeric datasets
32WHEN: Search by time is also weakly supported
- Calendars are the standard for time
- But people use the names of events to refer to
time periods
- Named time periods resemble place names in being:
- Unstable: European War, Great War, First World
War
- Multiple: Second World War, Great Patriotic War
- Ambiguous: Civil war in different centuries in
England, USA, Spain, etc.
- Places have temporal aspects and periods have
geographical aspects: when the Stone Age was
varies by region
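A time period directory can be sketched in the same spirit as a gazetteer: each entry ties a period name to a place and a calendar range, so an ambiguous name is resolved by place and a year can be mapped back to period names. The `Period` record and both functions are hypothetical; the entries are a small hand-picked sample.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Period:
    name: str
    place: str
    start: int   # first calendar year
    end: int     # last calendar year

PERIODS = [
    Period("Civil War", "England", 1642, 1651),
    Period("Civil War", "USA", 1861, 1865),
    Period("Civil War", "Spain", 1936, 1939),
    Period("First World War", "Europe", 1914, 1918),
    Period("Great War", "Europe", 1914, 1918),
]

def find_periods(name, place=None, directory=PERIODS):
    """All entries with this name, optionally narrowed by place."""
    return [p for p in directory
            if p.name.casefold() == name.casefold()
            and (place is None or p.place == place)]

def periods_covering(year, place=None, directory=PERIODS):
    """All named periods whose date range includes the given year."""
    return [p for p in directory
            if p.start <= year <= p.end
            and (place is None or p.place == place)]

ambiguous = find_periods("Civil War")                 # three candidates
spanish = find_periods("civil war", place="Spain")    # one candidate
names_1916 = {p.name for p in periods_covering(1916)}
```

Once a name is resolved to a date range and a place, the range can drive an ordinary date-limited catalog search.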
33Vocabularies are the key! Want Kung-fu
movies? Use LCSH: Hand-to-hand fighting,
oriental, in motion pictures.
34Time period directories link via the place (or
time)
Thesaurus/ Ontology
Texts
EVI
Gazetteers
captions
Maps/ Geo Data
Numeric datasets
Time Period Directory
Time lines, Chronologies
35WHEN Time Period Directory Timeline
Link to Catalog Link to Wikipedia
36WHO Biographical Dictionary Complex
relationships
37Any document, object, or performance:
Connect it with its context and other resources.
Facet   Vocabulary                                  Displays
WHAT    Thesaurus (e.g. LCSH)                       Cross-references
WHERE   Gazetteer                                   Map
WHEN    Period directory                            Timeline
WHO     Biographical dictionary (e.g. Who's Who)    Personal relations
Any catalog: Archives, Libraries, Museums, TV,
Publishers
Any resource: Audio, Images, Texts, Numeric data,
Objects, Virtual reality, Webpages
38Demo of search interface
40Related places
41Potentially related people
42Potentially related periods
44Find out more about this area.
45Different Browsing Options!
48More information about the country of India
50ECAI Cultural Atlases: presenting history in its
geographical and chronological contexts
51Mongol Empire Video
52Demo Interface
- http://ecai.berkeley.edu/imls2004/imls4w/
53New Project: Bringing Lives to Light:
Biography in Context
Ray R. Larson, Michael Buckland, Fredric
Gey, University of California, Berkeley
54Overview
- Focussing on the Who in Who, What, Where and When
- Types of Biographical Markup
55WHEN, WHERE and WHO
- Catalog records found from a time period search
commonly include names of persons important at
that time. Their names can be forwarded to, e.g.,
biographies in the Wikipedia encyclopedia.
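As a sketch of the forwarding step, a personal name taken from a catalog record can be turned into a candidate Wikipedia address. The helper name is hypothetical, the inversion of "Last, First" headings is a simplifying assumption about catalog name forms, and nothing here checks that an article actually exists at the resulting address.

```python
from urllib.parse import quote

def wikipedia_url(catalog_name):
    """Build a candidate English Wikipedia URL from a personal name.
    Inverted catalog headings ("Last, First") are put in natural order."""
    if "," in catalog_name:
        last, first = (part.strip() for part in catalog_name.split(",", 1))
        name = f"{first} {last}"
    else:
        name = catalog_name.strip()
    # Wikipedia article titles use underscores in place of spaces.
    return "https://en.wikipedia.org/wiki/" + quote(name.replace(" ", "_"))

url_inverted = wikipedia_url("Goldberg, Emanuel")
url_direct = wikipedia_url("Wilhelm Ostwald")
```

Real catalog headings often carry dates or titles after the name, which would need to be stripped before this simple inversion works.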
56Place and time are broadly important across
numerous tools and genres including, e.g.,
language atlases, library catalogs, biographical
dictionaries, bibliographies, archival finding
aids, museum records, etc.
Biographical dictionaries are also heavy on place
and time:
Emanuel Goldberg. Born Moscow 1881. PhD under
Wilhelm Ostwald, Univ. of Leipzig, 1906.
Director, Zeiss Ikon, Dresden, 1926-33. Moved to
Palestine 1937. Died Tel Aviv, 1970.
Life as a series of episodes involving Activity
(WHAT), WHERE, WHEN, and WHO else.
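The idea of a life as a series of episodes can be sketched as a small record type with one field per facet. The `Episode` class and the helper functions are hypothetical; the data are the Emanuel Goldberg facts given on the slide.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass(frozen=True)
class Episode:
    what: str                      # activity (WHAT)
    where: Optional[str] = None    # place (WHERE)
    start: Optional[int] = None    # first year (WHEN)
    end: Optional[int] = None      # last year, if a span
    who: Tuple[str, ...] = ()      # other people involved (WHO else)

goldberg = [
    Episode("Born", "Moscow", 1881),
    Episode("PhD", "Univ. of Leipzig", 1906, who=("Wilhelm Ostwald",)),
    Episode("Director, Zeiss Ikon", "Dresden", 1926, 1933),
    Episode("Moved", "Palestine", 1937),
    Episode("Died", "Tel Aviv", 1970),
]

def episodes_in(episodes, year):
    """Episodes whose year or year span includes the given year."""
    return [e for e in episodes
            if e.start is not None
            and e.start <= year <= (e.end if e.end is not None else e.start)]

def places(episodes):
    """Places in the order they occur, ready to hand to a gazetteer."""
    return [e.where for e in episodes if e.where]

in_1930 = episodes_in(goldberg, 1930)
route = places(goldberg)
```

Each field then becomes a link target: `where` into a gazetteer, `start` and `end` into a time period directory, and `who` into other biographical entries.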
57A new form of biographical dictionary would link
to all
Biographical Dictionary
Thesaurus/ Ontology
Texts
EVI
Gazetteers
captions
Maps/ Geo Data
Numeric datasets
Time Period Directory
Time lines, Chronologies
58Projected Work
- Develop XML markup for Biographical Events
- Most likely to be adaptation and extension of
existing biographical event markup
- Example: EAC/EAD
- Harvest biographical resources
- Wikipedia, etc.
- Integrate as next generation of current interface
59EAC/EAD
<bioghist>
  <head>Biographical Note</head>
  <chronlist>
    <chronitem>
      <date>1892, May 7</date>
      <event>Born, <geogname>Glencoe, Ill.</geogname></event>
    </chronitem>
    <chronitem>
      <date>1915</date>
      <event>A.B., <corpname>Yale University, </corpname>New Haven, Conn.</event>
    </chronitem>
    <chronitem>
      <date>1916</date>
      <event>Married <persname>Ada Hitchcock</persname></event>
    </chronitem>
    <chronitem>
      <date>1917-1919</date>
      <event>Served in <corpname>United States Army</corpname></event>
    </chronitem>
  </chronlist>
</bioghist>
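A minimal sketch of harvesting events from markup like the example above, using Python's standard `xml.etree.ElementTree`. The function name and the output record layout are hypothetical, and the sketch assumes the fragment carries no XML namespace, as in the example.

```python
import xml.etree.ElementTree as ET

EAD_FRAGMENT = """
<bioghist>
  <head>Biographical Note</head>
  <chronlist>
    <chronitem>
      <date>1892, May 7</date>
      <event>Born, <geogname>Glencoe, Ill.</geogname></event>
    </chronitem>
    <chronitem>
      <date>1915</date>
      <event>A.B., <corpname>Yale University, </corpname>New Haven, Conn.</event>
    </chronitem>
    <chronitem>
      <date>1916</date>
      <event>Married <persname>Ada Hitchcock</persname></event>
    </chronitem>
    <chronitem>
      <date>1917-1919</date>
      <event>Served in <corpname>United States Army</corpname></event>
    </chronitem>
  </chronlist>
</bioghist>
"""

def chron_events(xml_text):
    """Return one record per <chronitem>: its date, the flattened event
    text, and the tagged places, organizations and persons."""
    root = ET.fromstring(xml_text)
    events = []
    for item in root.iter("chronitem"):
        event = item.find("event")
        if event is None:
            continue
        record = {
            "date": (item.findtext("date") or "").strip(),
            # Join all text inside <event> and normalise whitespace.
            "text": " ".join("".join(event.itertext()).split()),
        }
        for tag in ("geogname", "corpname", "persname"):
            record[tag] = [(el.text or "").strip(" ,")
                           for el in event.iter(tag)]
        events.append(record)
    return events

events = chron_events(EAD_FRAGMENT)
```

The tagged names are the hooks for the rest of the infrastructure: `geogname` values go to a gazetteer, `date` values to a time period directory, and `persname` values to other biographical entries.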
60Wikipedia data
62A Metadata Infrastructure
63Acknowledgements
- Electronic Cultural Atlas Initiative project
- This work is being supported by the
Institute of Museum and Library Services through
a National Leadership Grant for Libraries
- Contact: ray_at_ischool.berkeley.edu