Modeling a Natural Language Gateway to Metadata-Enabled Resources

1
Modeling a Natural Language Gateway to
Metadata-Enabled Resources
  • Lynne C. Howarth
  • Faculty of Information Studies
  • University of Toronto, Canada

2
Acknowledgements
  • Funding provided by the Social Sciences and
    Humanities Research Council of Canada (SSHRC SRG
    410-99-1287)

3
Introduction
  • The development of technologies that enable
    access to information regardless of geographic or
    language barriers is a key factor for truly
    global sharing of knowledge.
  • Source: Oard, D., et al. 1999. Multilingual
    Information Discovery and AccesS (MIDAS). D-Lib
    Magazine, October 1999. Available at URL:
    http://www.dlib.org/dlib/october99/Ioard.html

4
Introduction - 2
  • End-user expectations for seamless cross-language
    retrieval are increasing with greater use of and
    access to Web resources (Large & Moukdad, 2000)
  • Cross-language retrieval systems in experimental
    stages (Peters & Braschler, 2001)
  • Need for systems that obviate the requirement of
    understanding underlying metadata structures and
    tagging (Buckland et al., 1999)
  • Current research focus in metadata arena has
    tended to be on syntax; less focus on semantics

5
Research Objectives
  • Building on previous research (Howarth, Cronin, &
    Hannaford, 2002; 2003)
  • To develop and refine a common set of labelled
    categories to serve as a natural language
    gateway to metadata-enabled resources,
    enhancing:
      • Semantic interoperability
      • Language interoperability
      • Multilingual access
      • Cross-domain searching

6
Crosswalks - 1
  • Identified and analysed the structure and content
    of eight metadata schemes from various domains:
      • Encoded Archival Description (EAD)
      • Dublin Core (DC) (qualified)
      • Government (now Global) Information Locator
        Service (GILS)
      • Text Encoding Initiative (TEI)
      • Visual Resources Association (VRA) Visual
        Document Description Categories
      • Consortium for Interchange of Museum Information
        (CIMI) (now ceased)
      • Digital Geospatial Metadata (DGM)
      • ONIX (Online Information Exchange) (publishing
        domain)

7
Crosswalks - 2
  • Using MARC21 as a baseline, and existing
    cross-schema crosswalks as benchmarks, created
    a master crosswalk incorporating all elements
    from each of the eight metadata schemas (previous
    slide)
  • Crosswalks used to identify:
      • elements that matched across all schemes
      • elements that corresponded between two systems or
        among three or more
      • elements that were clearly unique to a domain
  • Source: Cromwell-Kessler, W. (1998)
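
The three-way sort described on this slide can be sketched in a few lines of Python. This is an illustration only, not the project's actual master crosswalk: the three rows, the restriction to three schemes, and the names MASTER_CROSSWALK, SCHEMES, and classify are assumptions made for the example.

```python
# Hypothetical slice of a master crosswalk: each baseline concept maps to
# the element that carries it in each scheme. Rows are illustrative only.
MASTER_CROSSWALK = {
    "title": {"DC": "title", "EAD": "unittitle", "TEI": "title"},
    "alternate form": {"EAD": "altformavail"},
    "creator": {"DC": "creator", "EAD": "origination"},
}

# Only three of the eight schemes are used, to keep the example small.
SCHEMES = {"DC", "EAD", "TEI"}


def classify(crosswalk, schemes):
    """Group baseline concepts by how many schemes carry a matching element."""
    groups = {"all": [], "shared": [], "unique": []}
    for concept, elements in crosswalk.items():
        present = set(elements) & schemes
        if present == schemes:
            groups["all"].append(concept)      # matched across all schemes
        elif len(present) >= 2:
            groups["shared"].append(concept)   # two or more, but not all
        elif len(present) == 1:
            groups["unique"].append(concept)   # unique to one domain
    return groups


groups = classify(MASTER_CROSSWALK, SCHEMES)
```

With these sample rows, "title" lands in the "all" group, "creator" in "shared", and "alternate form" in "unique".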

8
Crosswalk Example
  • Top cells: schema-specific term
  • Bottom cells: like terms (cross-schema)

9
Categorization
  • Crosswalks used as a framework for deriving a set
    of common categories (n = 17; see Table 1)
  • Metatags from each schema assigned to one or more
    of the 17 categories; of 885 tags assessed:
      • 680 tags assigned 1:1 with a category
      • 75 tags assigned to > 1 category, e.g.
          • temporal keyword (DGM) > Date Time Period;
            Subject
          • altformavail (EAD) > Contact Information;
            Identifiers; Physical Format; Place
      • 130 tags not included in any category (DGM: 110;
        ONIX: 9; DC: 6; EAD: 2; others: 1 each; GILS: 0)
  • Categories were assigned natural language labels
    and definitions developed for each
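
As a quick arithmetic check on the tag counts above, a minimal Python sketch follows. The counts are those reported on the slide. The variable names are illustrative, and reading "others 1" as one tag each for TEI, VRA, and CIMI is an assumption (it is the reading under which the per-scheme figures sum to 130).

```python
# Tag counts as reported on the slide.
TAGS_ASSESSED = 885
tally = {
    "one category": 680,
    "more than one category": 75,
    "no category": 130,
}

# Uncategorized tags by scheme. Assumption: "others 1" means one tag each
# for the three schemes not named individually (TEI, VRA, CIMI).
uncategorized = {
    "DGM": 110, "ONIX": 9, "DC": 6, "EAD": 2,
    "TEI": 1, "VRA": 1, "CIMI": 1, "GILS": 0,
}

total = sum(tally.values())
# Share of all assessed tags in each group, as a percentage to one decimal.
shares = {label: round(100 * count / TAGS_ASSESSED, 1)
          for label, count in tally.items()}
uncategorized_total = sum(uncategorized.values())
```

The three groups account for all 885 tags, which works out to roughly 76.8%, 8.5%, and 14.7% of the tags assessed.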

10
Categories (see Table 1)
11
Exercises & Focus Group Testing
  • Potential clarity and utility of labeled
    categories tested using quantitative (assigned
    activities) and qualitative (focus group
    discussions) approaches
  • Categorization exercises - purpose:
      • Resolve any semantic ambiguities ("fuzzy" terms
        that defied ready assignment to any one category)
      • Refine category definitions to ensure that
        categories contain the kinds of concepts the end
        user expects
  • Once categories are validated in English, can
    broaden to multilingual environments

12
Participants - 1
  • Participants recruited from University of Toronto
    environment (total = 19)
  • Division of participants into two cohorts:
      • Experts (librarians) (n = 12)
      • Novices (students) (n = 7)
          • First-year students, with undergraduate degrees
            in the Social Sciences or Humanities
          • No completed courses in information
            retrieval/search strategies, or cataloguing

13
Category Matching Activities
  • All participants (n = 19) asked to match category
    names to definitions (could match > 1)
  • All participants asked level of agreement with:
      • "In general I found it easy to make the match"
      • "In general this category name represents what is
        described in the definition"
      • "In general, I found the definition helpful in
        clarifying the meaning of the category"
  • Each cohort given a randomly generated set of 28
    concepts (or subcategories, each with a
    definition) and asked to assign each to a
    category or categories (see Figure 1)

14
Example of Categorization Exercise (see also
Figure 1 in paper)
  • Point of Contact: Identifies an organization or
    person serving as the point of contact; also
    includes information on methods for making
    contact.
  • I would put this concept into the following
    category/categories (check (✓) as many as
    apply):
      • _____ Contact Information
      • _____ Names
      • _____ None of the above
  • And/or:
      • ____ I would suggest the category
        name(s) _______________ from the list
      • ____ I would like to suggest my own category
        name __________________
      • ____ I don't know

15
Procedures
  • After paper exercise completed, facilitator led
    discussion
  • Discussion focused on elements that participants
    either could not categorize or for which they
    assigned a new category name
  • Of particular note for this paper are:
      • Names category
      • Subject category
      • Title category

16
Findings: Activity 1
  • Matching elements to definitions:
      • 10.5% of participants did not link any definition
        to the "Title" category; definition described as
        "weird", "strange", "sterile"
      • For "Subject", categories "Summary
        Description" and "Genre Type" were deemed
        interchangeable by some participants
      • For "Names", categories "Roles" and "Contact
        Information" were considered interchangeable by
        some participants; element label described as
        "fuzzy"
  • Moderate to high ambiguity associated with key
    categories in search strategies

17
Findings: Activity 2
  • Likert scale questions (1 = strongly agree; 5 =
    strongly disagree) - rankings for categories
    "Names", "Subject", and "Title"
  • "In general I found it easy to make the match"
      • Subject: 5th (mean = 1.63)
      • Title: 9th (mean = 2.42)
      • Names: 11th (mean = 2.53) - total ranks out of
        13
  • "In general this category name represents what is
    described in the definition"
      • Subject: 5th (mean = 1.86)
      • Names: 10th (mean = 2.26)
      • Title: 11th (mean = 2.42) - total ranks out
        of 13

18
Findings: Activity 2
  • Likert scale questions (1 = strongly agree; 5 =
    strongly disagree) - rankings for categories
    "Names", "Subject", and "Title"
  • "In general, I found the definition helpful in
    clarifying the meaning of the category"
      • Subject: 6th (mean = 2.11)
      • Names: 7th (mean = 2.26)
      • Title: 8th (mean = 2.32) - total ranks out of
        12
  • Despite expectations of transparency, category
    labels were somewhat unclear, even confusing
  • Definitions marginally more helpful for "Names"
    and "Title" than for "Subject"
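
The rankings reported for Activity 2 order the categories by mean Likert score, where a lower mean indicates stronger agreement. A minimal Python sketch of that calculation follows. The response lists are invented for illustration and are not the study's data, and the function name rank_by_mean is an assumption.

```python
from statistics import mean

# Hypothetical responses on the 1-5 scale (1 = strongly agree,
# 5 = strongly disagree). These are NOT the study's data.
responses = {
    "Subject": [1, 2, 1, 2, 2],
    "Title": [2, 3, 2, 3, 2],
    "Names": [3, 2, 3, 2, 3],
}


def rank_by_mean(responses):
    """Return (rank, category, mean) tuples, strongest agreement first."""
    means = {category: mean(scores) for category, scores in responses.items()}
    ordered = sorted(means, key=means.get)  # ascending: lower mean ranks higher
    return [(rank, category, round(means[category], 2))
            for rank, category in enumerate(ordered, start=1)]


ranking = rank_by_mean(responses)
```

With these sample responses the order is Subject, Title, Names. Ties would need an explicit rule, which the slides do not specify.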

19
 
 
Table 2: Subcategory to Category Assignment by
Element Name and Participant Cohort

20
Findings: Activity 3
  • Correct matching of subcategories:
      • Novices successfully matched subcategory to
        category for 17.9% of 28 assigned terms
      • Experts successfully matched subcategory to
        category for only 7% of 28 assigned terms
  • Experts supplied more "write-in" suggestions
    than novices
  • Novices more likely than experts to make an
    educated guess or to respond "I don't know"
    than to offer a new category or indicate "none
    of the above"

21
Findings: Activity 3
  • Novices and experts alike perceived the
    category label and definition for "Subject" to be
    semantically unambiguous
  • "Names" and "Title" were perceived by both
    cohorts to have semantic flaws
  • Many subcategories associated with "Names"
    highly confusing; may require rethinking and
    refining

22
Q - What's in a Name? A - Ambiguity and Confusion
  • Focus group participants respond - 1
  • Well, it was interesting, challenging. It
    really makes you realize how much terminology
    we're all tied by and how troubling it really is
    (general agreement). I mean, people, you know
    like us that are allegedly finding information
    (laughs) and doing research all the time and
    we're going "what does this mean?", "I don't know
    what this is" so imagine the role of the user who
    is more baffled, presumably.

23
Q - What's in a Name? A - Arbitrariness and
Inconsistency
  • Focus group participants respond - 2
  • Well yeah. It brings to mind how arbitrary and
    difficult it is to assign those things, you know.
    And how subjective and why it is inconsistent,
    you know why you've got resources under one
    category because one person dealt with them and
    why, you know, someone else did something
    different and even though, you know, you can
    communicate and look them up and make things
    uniform, there's still lots of inconsistencies.
    It's human nature (general agreement/laughter).

24
Q - What's in a Name? A - The Importance of
Context
  • Focus group participants respond - 3
  • It is kind of hard, just looking at it sort of
    abstractly, sort of broken apart like this
    without being able to look at a few records or
    something, you know, because when you're actually
    using it, the context always does help. I mean
    that's part of understanding it, so you know,
    just because it's sometimes hard to understand
    then, how some of the things relate to one
    another cuz you don't know how they're going to
    be put together on the screen, that made it
    harder in some places.

25
Next Steps - 1
  • If misalignment of semantic congruence in
    monolingual environment, then likely problematic
    to map to multilingual, multicultural
    applications to create universal gateway
  • Categories ("Names", "Title", "Subject") that might
    be assumed to be more readily understood may pose
    particular challenges for language
    interoperability

26
Next Steps - 2
  • Matrix of Scenarios to be addressed:
      • English language query > retrieves monolingual
        English results
      • English language query > retrieves multilingual
        results
      • Other language query > retrieves monolingual
        results by language of query
      • Other language query > retrieves multilingual
        results
      • Multiple languages query > retrieves results in
        languages defined by query
      • Multiple languages query > retrieves
        multilingual results
  • Universal gateway based on a common set of
    derived categories that readily map to other
    languages? Useful? Possible?
  • Some reflections on research direction
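
The matrix of scenarios on this slide pairs three kinds of query with two kinds of result set. A minimal Python sketch that enumerates the six combinations follows. The labels and the names QUERY_TYPES, RESULT_SCOPES, and scenarios are shorthand chosen for the example, not terms from the project.

```python
from itertools import product

# Query side of the matrix: the three query types listed on the slide.
QUERY_TYPES = ["English", "Other language", "Multiple languages"]

# Result side: results limited to the language(s) of the query, or
# multilingual results. Labels are shorthand for this example.
RESULT_SCOPES = ["query language(s) only", "multilingual"]

# One scenario per (query type, result scope) pair: 3 x 2 = 6.
scenarios = [
    {"query": query, "results": scope}
    for query, scope in product(QUERY_TYPES, RESULT_SCOPES)
]

for number, scenario in enumerate(scenarios, start=1):
    print(f"{number}. {scenario['query']} query > {scenario['results']} results")
```

Listing the combinations this way makes it easy to confirm that every query type is tested against both result scopes.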

27
For More Information . . .
  • Please visit the project Web site at
    http://www.fis.utoronto.ca/special/metadata/
  • E-mail: metadata@fis.utoronto.ca

28
Thank-you!
  • Questions?