Developing a Metadata Infrastructure for Information Access: What, Where, When and Who?

Author: aitao. Last modified by: Ray R. Larson. Created: 10/26/1999 9:01:01 PM. Presentation format: On-screen Show.

Slides: 64
Provided by: ait2



1
Developing a Metadata Infrastructure for
Information Access: What, Where, When and Who?
Prof. Ray R. Larson, University of California,
Berkeley, School of Information
2
Overview
  • Metadata as Infrastructure
  • What, Where, When and Who?
  • What are Entry Vocabulary Indexes?
  • Notion of an EVI
  • How are EVIs Built
  • Time Period Directories
  • Mining Metadata for new metadata
  • 4W Demo
  • New Project: Bringing Lives to Light

3
Metadata as Infrastructure
  • The difference between memorization and
    understanding lies in knowing the context and
    relationships of whatever is of interest. When
    setting out to learn about a new topic, a
    well-tested practice is to follow the traditional
    5Ws and the H: Who?, What?, When?, Where?,
    Why?, and How?

4
Metadata as Infrastructure
  • The reference collections of paper-based
    libraries provide a structured environment for
    resources, with encyclopedias and subject
    catalogs, gazetteers, chronologies, and
    biographical dictionaries, offering direct
    support for at least What, Where, When, and Who.
  • The digital environment does not yet provide an
    effective, and easily exploited, infrastructure
    comparable to the traditional reference library.

5
What?
  • Searching texts by topic, e.g. Dewey, LCSH, any
    subject index, or category scheme applied to
    documents.
  • Two kinds of mapping in every search:
  • Documents are assigned to topic categories, e.g.
    Dewey.
  • Queries have to map to topic categories, e.g.
    Dewey's Relativ Index, from ordinary words/phrases
    to Decimal Classification numbers.
  • Also mapping between topic systems, e.g. US
    Patent classification and International Patent
    Classification.

6
WHAT: searches involve mapping to controlled
vocabularies
Thesaurus/ Ontology
Texts
7
Building a Search Term Recommender
Start with a collection of documents.
8
Classify and index with controlled vocabulary Or
use a pre-indexed collection.
9
Problem: Controlled vocabularies can be difficult
for people to use.
pass mtr veh spark ign eng
10
Solution: Entry Level Vocabulary Indexes.
Index
EVI
11
What and Entry Vocabulary Indexes
  • EVIs are a means of mapping from users'
    vocabulary to the controlled vocabulary of a
    collection of documents

12
Building and Searching EVIs
13
Technical Details
For noun phrases
Internet DB indexed with a controlled vocabulary.
Building an Entry Vocabulary Module (EVI)
14
Association Measure
         C    ¬C
    t    a    b
   ¬t    c    d
Where t is the occurrence of a term and C is the
occurrence of a class in the training set (¬ marks
absence).
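The four cells can be counted directly from a pre-indexed collection. The following is a minimal sketch, assuming each training record is a pair (set of terms, set of assigned classes) and that a = term and class both present, b = term only, c = class only, d = neither. The function name, the sample records, and the class labels other than "pass mtr veh spark ign eng" are hypothetical.

```python
def contingency(records, term, cls):
    """Return (a, b, c, d) for one term/class pair.

    a: term present, class present    b: term present, class absent
    c: term absent,  class present    d: term absent,  class absent
    """
    a = b = c = d = 0
    for terms, classes in records:
        has_term = term in terms
        has_class = cls in classes
        if has_term and has_class:
            a += 1
        elif has_term:
            b += 1
        elif has_class:
            c += 1
        else:
            d += 1
    return a, b, c, d


# Hypothetical training records: (terms in the text, assigned classes).
records = [
    ({"automobile", "engine"}, {"pass mtr veh spark ign eng"}),
    ({"automobile", "tire"}, {"pass mtr veh spark ign eng"}),
    ({"engine", "turbine"}, {"gas turbines"}),
    ({"tire", "rubber"}, {"rubber goods"}),
]
counts = contingency(records, "automobile", "pass mtr veh spark ign eng")
```

Each (term, class) pair gets its own table, so a full build would repeat this count over every pair that co-occurs at least once.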
15
Association Measure
  • Maximum Likelihood ratio
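The formula on this slide did not survive transcription. As an illustration only, the sketch below computes the widely used log-likelihood ratio statistic (G², as popularized by Dunning) from the four cells of a 2x2 term/class table. Whether this is the exact form shown on the slide is an assumption, and the function name is hypothetical.

```python
import math


def log_likelihood_ratio(a, b, c, d):
    """G^2 statistic for a 2x2 table: 2 * sum(observed * ln(observed / expected)).

    a: term and class, b: term only, c: class only, d: neither.
    Expected counts assume the term and the class are independent.
    """
    n = a + b + c + d
    if n == 0:
        return 0.0
    observed = [a, b, c, d]
    expected = [
        (a + b) * (a + c) / n,
        (a + b) * (b + d) / n,
        (c + d) * (a + c) / n,
        (c + d) * (b + d) / n,
    ]
    # Cells with a zero count contribute nothing to the sum.
    return 2.0 * sum(o * math.log(o / e) for o, e in zip(observed, expected) if o > 0)


independent = log_likelihood_ratio(10, 10, 10, 10)
associated = log_likelihood_ratio(2, 0, 0, 2)
```

A higher score means the term and the class co-occur more often than chance would predict, so the class is a stronger candidate for that term.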

16
Alternatively
  • Because the evidence terms in EVIs can be
    considered a document, you can also use IR
    techniques and use the top-ranked classes for
    classification or query expansion
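A minimal sketch of this alternative, assuming each class's evidence terms are pooled into one pseudo-document and scored against the query with a simple TF-IDF weight. The weighting formula, the function name, and the sample data are illustrative assumptions, not the method used in the project.

```python
import math
from collections import Counter


def rank_classes(query, class_docs, top_k=3):
    """Rank classes by a length-normalized TF-IDF match with the query."""
    n = len(class_docs)
    doc_freq = Counter()
    for terms in class_docs.values():
        doc_freq.update(set(terms))
    scores = {}
    for cls, terms in class_docs.items():
        tf = Counter(terms)
        score = 0.0
        for word in query.lower().split():
            if tf[word]:
                idf = math.log((n + 1) / (doc_freq[word] + 1)) + 1.0
                score += tf[word] * idf
        if score > 0:
            scores[cls] = score / math.sqrt(len(terms))
    # Highest score first; ties broken alphabetically by class label.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]


# Hypothetical evidence terms gathered for each controlled-vocabulary class.
class_docs = {
    "pass mtr veh spark ign eng": ["automobile", "car", "engine", "automobile", "gasoline"],
    "gas turbines": ["engine", "turbine", "jet"],
    "rubber goods": ["tire", "rubber"],
}
ranked = rank_classes("automobile engine", class_docs)
```

The top-ranked class labels can then be offered to the searcher as suggested index terms or added to the query.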

18
EVI example
Index term: pass mtr veh spark ign eng
EVI 1
User Query: Automobile
Index term: automobiles OR internal
combustion engines
EVI 2
19
But why stop there?
Index
EVI
20
Which EVI do I use?
Index
EVI
Index
EVI
Index
EVI
Index
21
EVI to EVIs
Index
EVI
EVI2
Index
EVI
Index
EVI
Index
22
Why not treat language the same way?
23
Support for the Learner with a Query
Facet   Vocabulary                                  Displays
WHAT    Thesaurus (e.g. LCSH)                       Cross-references
WHERE   Gazetteer                                   Map
WHEN    Period directory                            Timeline
WHO     Biographical dictionary (e.g. Who's Who)    Personal relations
Any catalog: Archives, Libraries, Museums, TV,
Publishers
Any resource: Audio, Images, Texts, Numeric data,
Objects, Virtual reality, Webpages
24
It is also difficult to move between different
media forms
Thesaurus/ Ontology
Texts
EVI
Numeric datasets
25
Searching across data types
  • Different media can be linked indirectly via
    metadata, but often (e.g. for socio-economic
    numeric data series) you also need to specify
    WHERE to get correct results

26
But texts associated with numeric data can be
mapped as well
Thesaurus/ Ontology
Texts
EVI
EVI
captions
Numeric datasets
27
But there are also geographic dependencies
Thesaurus/ Ontology
Texts
EVI
EVI
captions
Maps/ Geo Data
Numeric datasets
28
WHERE: Place names are problematic
  • Variant forms: St. Petersburg, Санкт-Петербург,
    Saint-Pétersbourg, . . .
  • Multiple names: Cluj, in Romania / Roumania /
    Rumania, is also called Klausenburg and
    Kolozsvar.
  • Name changes: Bombay → Mumbai.
  • Homographs: Vienna, VA, and Vienna, Austria
  • 50 Springfields.
  • Anachronisms: no Germany before 1870
  • Vague, e.g. Midwest, Silicon Valley
  • Unstable boundaries: 19th-century Poland,
    Balkans, USSR
  • Use a gazetteer!
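A gazetteer addresses several of these problems by indexing every known name form to a place record. The following is a minimal sketch, assuming a tiny in-memory gazetteer; the class names are hypothetical and the coordinates are approximate values included only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Place:
    preferred: str
    country: str
    lat: float   # approximate, illustrative only
    lon: float   # approximate, illustrative only
    variants: tuple = ()


class Gazetteer:
    def __init__(self, places):
        self._index = {}
        for place in places:
            # Index the preferred name and every variant, ignoring case.
            for name in (place.preferred, *place.variants):
                self._index.setdefault(name.casefold(), []).append(place)

    def lookup(self, name):
        """Return all places known under this name (homographs give several)."""
        return list(self._index.get(name.casefold(), []))


gazetteer = Gazetteer([
    Place("St. Petersburg", "Russia", 59.94, 30.31,
          ("Санкт-Петербург", "Saint-Pétersbourg")),
    Place("Cluj", "Romania", 46.77, 23.59, ("Klausenburg", "Kolozsvar")),
    Place("Mumbai", "India", 19.08, 72.88, ("Bombay",)),
    Place("Vienna", "Austria", 48.21, 16.37),
    Place("Vienna", "USA", 38.90, -77.27),
])
vienna_hits = gazetteer.lookup("Vienna")
bombay_hits = gazetteer.lookup("Bombay")
```

A lookup that returns more than one record signals a homograph, which an interface could resolve by showing the candidates on a map. Anachronisms and unstable boundaries would additionally need date ranges on each name, which this sketch omits.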

29
WHERE: Geo-temporal search interface. Place
names found in documents. Gazetteer provides lat./long.
Places displayed on map.
Timebar?
30
Zoom on map. Click on place for a list of
records. Click on record to display text.
31
So geographic search becomes part of the
infrastructure
Thesaurus/ Ontology
Texts
EVI
Gazetteers
captions
Maps/ Geo Data
Numeric datasets
32
WHEN: Search by time is also weakly supported
  • Calendars are the standard for time
  • But people use the names of events to refer to
    time periods
  • Named time periods resemble place names in being:
  • Unstable: European War, Great War, First World
    War
  • Multiple: Second World War, Great Patriotic War
  • Ambiguous: Civil war in different centuries in
    England, USA, Spain, etc.
  • Places have temporal aspects; periods have
    geographical aspects: when the Stone Age was
    varies by region
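A time period directory can be modelled much like a gazetteer: each named period carries its name variants, a region, and a calendar range. The following is a minimal sketch under that assumption; the record layout and function names are hypothetical, the region labels are simplified, and the year ranges are the commonly cited ones.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Period:
    names: tuple   # preferred name first, then variants
    region: str
    start: int     # first calendar year
    end: int       # last calendar year


DIRECTORY = [
    Period(("First World War", "Great War", "European War"), "Europe", 1914, 1918),
    Period(("Second World War",), "Europe", 1939, 1945),
    Period(("Great Patriotic War",), "USSR", 1941, 1945),
    Period(("Civil War",), "England", 1642, 1651),
    Period(("Civil War",), "USA", 1861, 1865),
    Period(("Civil War",), "Spain", 1936, 1939),
]


def find_by_name(name, region=None):
    """Resolve a period name (any variant) to calendar ranges."""
    key = name.casefold()
    return [p for p in DIRECTORY
            if any(n.casefold() == key for n in p.names)
            and (region is None or p.region == region)]


def find_by_year(year, region=None):
    """List the named periods that cover a given calendar year."""
    return [p for p in DIRECTORY
            if p.start <= year <= p.end
            and (region is None or p.region == region)]


civil_wars = find_by_name("civil war")
spanish = find_by_name("Civil War", region="Spain")
in_1942 = find_by_year(1942)
```

Supplying a region disambiguates "Civil War", which mirrors the point that periods have geographical aspects.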

33
Vocabularies are the key! Want Kung-fu
movies? Use LCSH: Hand-to-hand fighting,
Oriental, in motion pictures.
34
Time period directories link via the place (or
time)
Thesaurus/ Ontology
Texts
EVI
Gazetteers
captions
Maps/ Geo Data
Numeric datasets
Time Period Directory
Time lines, Chronologies
35
WHEN: Time Period Directory. Timeline.
Link to Catalog. Link to Wikipedia.
36
WHO: Biographical Dictionary. Complex
relationships.
37
Any document, object, or performance
Connect it with its context and other resources.
Facet   Vocabulary                                  Displays
WHAT    Thesaurus (e.g. LCSH)                       Cross-references
WHERE   Gazetteer                                   Map
WHEN    Period directory                            Timeline
WHO     Biographical dictionary (e.g. Who's Who)    Personal relations
Any catalog: Archives, Libraries, Museums, TV,
Publishers
Any resource: Audio, Images, Texts, Numeric data,
Objects, Virtual reality, Webpages
38
Demo of search interface
40
Related places
41
Potentially related people
42
Potentially related periods
44
Find out more about this area.
45
Different Browsing Options!
48
More information about the country of India
50
ECAI Cultural Atlases: presenting history in its
geographical and chronological contexts
51
Mongol Empire Video
52
Demo Interface
  • http://ecai.berkeley.edu/imls2004/imls4w/

53
New Project: Bringing Lives to Light:
Biography in Context
Ray R. Larson, Michael Buckland, Fredric
Gey, University of California, Berkeley
54
Overview
  • Focussing on the Who in Who, What, Where and When
  • Types of Biographical Markup

55
WHEN, WHERE and WHO
  • Catalog records found from a time period search
    commonly include names of persons important at
    that time. Their names can be forwarded to, e.g.,
    biographies in the Wikipedia encyclopedia.

56
Place and time are broadly important across
numerous tools and genres including, e.g.,
language atlases, library catalogs, biographical
dictionaries, bibliographies, archival finding
aids, museum records, etc.
Biographical dictionaries are also heavy on place and time:
Emanuel Goldberg. Born Moscow, 1881. PhD under
Wilhelm Ostwald, Univ. of Leipzig, 1906.
Director, Zeiss Ikon, Dresden, 1926-33. Moved to
Palestine, 1937. Died Tel Aviv, 1970.
Life as a series of episodes involving Activity (WHAT),
WHERE, WHEN, and WHO else.
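The idea of a life as a series of episodes can be sketched as a small data structure. This is a minimal illustration, assuming one record per episode with WHAT, WHERE, WHEN, and WHO fields. The class and function names are hypothetical, and the data is the Emanuel Goldberg entry from the slide.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Episode:
    what: str                    # activity (WHAT)
    where: str                   # place (WHERE)
    start: int                   # first year (WHEN)
    end: Optional[int] = None    # last year, if the episode spans several
    who: Tuple[str, ...] = ()    # other people involved (WHO else)

    def covers(self, year):
        last = self.end if self.end is not None else self.start
        return self.start <= year <= last


GOLDBERG = [
    Episode("Born", "Moscow", 1881),
    Episode("PhD", "Univ. of Leipzig", 1906, who=("Wilhelm Ostwald",)),
    Episode("Director, Zeiss Ikon", "Dresden", 1926, 1933),
    Episode("Moved", "Palestine", 1937),
    Episode("Died", "Tel Aviv", 1970),
]


def episodes_in(life, year):
    """Episodes active in a given year (links WHO to WHEN)."""
    return [e for e in life if e.covers(year)]


def places(life):
    """Places in episode order (links WHO to WHERE)."""
    return [e.where for e in life]


def associates(life):
    """Other people named in any episode (links WHO to WHO else)."""
    return sorted({name for e in life for name in e.who})


active_1930 = episodes_in(GOLDBERG, 1930)
```

Each field is a hook into another directory: `where` into a gazetteer, `start` and `end` into a time period directory, and `who` into other biographical entries.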
57
A new form of biographical dictionary would link
to all
Biographical Dictionary
Thesaurus/ Ontology
Texts
EVI
Gazetteers
captions
Maps/ Geo Data
Numeric datasets
Time Period Directory
Time lines, Chronologies
58
Projected Work
  • Develop XML markup for Biographical Events
  • Most likely to be adaptation and extension of
    existing biographical event markup
  • Example: EAC/EAD
  • Harvest biographical resources
  • Wikipedia, etc.
  • Integrate as next generation of current interface

59
EAC/EAD
<bioghist>
  <head>Biographical Note</head>
  <chronlist>
    <chronitem>
      <date>1892, May 7</date>
      <event>Born, <geogname>Glencoe, Ill.</geogname></event>
    </chronitem>
    <chronitem>
      <date>1915</date>
      <event>A.B., <corpname>Yale University, </corpname>New Haven, Conn.</event>
    </chronitem>
    <chronitem>
      <date>1916</date>
      <event>Married <persname>Ada Hitchcock</persname></event>
    </chronitem>
    <chronitem>
      <date>1917-1919</date>
      <event>Served in <corpname>United States Army</corpname></event>
    </chronitem>
  </chronlist>
</bioghist>
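Markup of this kind can be harvested into WHEN / WHAT / WHERE / WHO records with a standard XML parser. The following is a minimal sketch using Python's xml.etree.ElementTree on a shortened copy of the example. It assumes un-namespaced element names as shown on the slide; real EAD files may declare a namespace, and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<bioghist>
  <head>Biographical Note</head>
  <chronlist>
    <chronitem>
      <date>1892, May 7</date>
      <event>Born, <geogname>Glencoe, Ill.</geogname></event>
    </chronitem>
    <chronitem>
      <date>1916</date>
      <event>Married <persname>Ada Hitchcock</persname></event>
    </chronitem>
  </chronlist>
</bioghist>"""


def parse_chronlist(xml_text):
    """Turn each <chronitem> into a dict of when/what/where/who values."""
    root = ET.fromstring(xml_text)
    rows = []
    for item in root.iter("chronitem"):
        event = item.find("event")
        if event is None:
            continue
        # Flatten mixed content to plain text and collapse whitespace.
        text = " ".join("".join(event.itertext()).split())
        rows.append({
            "when": item.findtext("date", default="").strip(),
            "what": text,
            "where": [g.text for g in event.iter("geogname")],
            "who": [p.text for p in event.iter("persname")],
        })
    return rows


rows = parse_chronlist(SAMPLE)
```

The tagged names are what make the events linkable: a geogname value can be sent to a gazetteer and a persname value to a biographical dictionary.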
60
Wikipedia data
62
A Metadata Infrastructure
63
Acknowledgements
  • Electronic Cultural Atlas Initiative project
  • This work is being supported by the
    Institute of Museum and Library Services through
    a National Leadership Grant for Libraries
  • Contact: ray@ischool.berkeley.edu