Taxonomy as Content Outline, Site Map and Search Aid SLA NWR Vancouver October 6, 2006

1
Taxonomy as Content Outline, Site Map and Search
Aid SLA NWR Vancouver October 6, 2006
  • Marjorie M.K. Hlava
  • President
  • 505-998-0800
  • mhlava_at_accessinn.com

2
Presentation outline
  • Background
  • Standards
  • Afterthought or critical?
  • A taxonomy tale: Case study
  • Additional Sample displays

3
Presentation Goals
  • Practical guide case study
  • Content specific taxonomy
  • Outline of the content
  • A search aid for finding the content stored in a
    site
  • Content viewed in a list
  • Other graphical representations
  • Taxonomy based on the thesaurus
  • Access the knowledge stored in a CMS

4
Background
  • Thesaurus
  • TT, BT, NT = Taxonomy
  • Add information object at the last node
  • Synonyms (USE, Non-Preferred) = Synonym Ring
  • Above + RT = Thesaurus
  • Thesaurus + RDF + IO = Semantic web

5
Charting the difference
  • Authority Files
  • People
  • Places
  • Things
  • Thesaurus
  • Concepts
  • Methods
  • Processes
  • Ontologies
  • Link Instances
  • Classes
  • Semantic Web / Topic Maps
  • Link the description
  • To the actual item (object)

6
Taxonomy / thesaurus
  • Main Term (MT)
  • Top Term (TT)
  • Broader Terms (BT)
  • Narrower Terms (NT)
  • Narrower Term Instance
  • Related Terms (RT)
  • See also (SA)
  • Non-Preferred Term (NP)
  • Used for (UF), See (S)
  • Scope Note (SN)
  • History (H)
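
The term-record fields listed above map naturally onto a small record type. The following is a minimal sketch in Python, assuming hypothetical field names (`TermRecord`, `broader`, `used_for`, and so on) and an invented sample term; it is not the storage format of any particular thesaurus tool.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class TermRecord:
    """One thesaurus term record; field names are illustrative only."""
    main_term: str                                      # MT
    top_term: Optional[str] = None                      # TT
    broader: List[str] = field(default_factory=list)    # BT
    narrower: List[str] = field(default_factory=list)   # NT
    related: List[str] = field(default_factory=list)    # RT
    used_for: List[str] = field(default_factory=list)   # UF (non-preferred synonyms)
    scope_note: str = ""                                # SN
    history: str = ""                                   # H


# Invented sample data, for illustration only.
record = TermRecord(
    main_term="Volcanoes",
    top_term="Science",
    broader=["Earth science"],
    narrower=["Shield volcanoes"],
    related=["Earthquakes"],
    used_for=["Volcanos"],
    scope_note="Use for works on volcanic landforms and eruptions.",
)

print(record.main_term, "BT:", record.broader, "UF:", record.used_for)
```

Keeping BT/NT, RT and UF as separate lists mirrors the distinction between hierarchical, associative and equivalence relationships.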

TAXONOMY
ONTOLOGY
THESAURUS
7
The Standards
  • NISO Z39.19
  • ANSI
  • ISO
  • BSI
  • W3C
  • OMB E-Gov Section 207
  • KM working groups

8
ISO TC 46 SC 6 or 9
  • Controlled vocabulary and other information
    standards
  • ISO 5127 Information and Documentation -
    Vocabulary
  • ISO 2788:1986 Guidelines for the establishment
    and development of monolingual thesauri
  • BS 5723:1987
  • ISO 5964:1985 Guidelines for the establishment
    and development of multilingual thesauri
  • BS 6723:1985
  • NEW - BS 8723 - Parts 1-4, Stella Dexter
    Clarke

9
British Standards - BS 8723
  • Structured vocabularies for information retrieval
    - Guide
  • Part 1: General
  • Part 2: Thesauri
  • Part 3: Vocabularies other than thesauri
  • Part 4: Interoperability between vocabularies
  • Part 5: Interoperability with applications

10
ISO TC 37
  • Scope of ISO TC 37: Standardization of
    principles, methods and applications relating to
    terminology and other language resources.
  • TC 37/SC 1 - Principles and methods
  • TC 37/SC 2 - Terminography and lexicography
  • TC 37/SC 3 - Computer applications for
    terminology
  • TC 37/SC 4 - Language resource management

11
Sample Standards
  • Principles of concept-oriented terminology and
    data categories
  • ISO 704:2000 Terminology work - Principles and
    methods
  • ISO 860:1996 Terminology work - Harmonization
    of concepts and terms
  • ISO 1087-1:2000 Terminology work - Vocabulary -
    Part 1: Theory and application
  • ISO 1087-2:2000 Terminology work - Vocabulary -
    Part 2: Computer applications
  • ISO 10241:1992 Preparation and layout of
    international terminology standards
  • ISO 12200:1999 Computer applications in
    terminology - Machine-readable terminology
    interchange format (MARTIF) - Negotiated
    interchange
  • ISO 12616:2002 Translation-oriented
    terminography
  • ISO/TR 12618:1994 Computer aids in terminology -
    Creation and use of terminological databases
    and text corpora
  • ISO 12620:1999 Computer applications in
    terminology - Data categories used to create
    glossaries
12
W3C
  • OWL - Web Ontology Language
  • RDF - Resource Description Framework
  • Topic Maps
  • SKOS - Simple Knowledge Organization System
  • Which community to serve?
  • Build on the current standard
  • Might make this link next

13
Other things to watch
  • Other W3C and ISO areas
  • Support groups
  • Blogs
  • Communities of Practice
  • SIMILE
  • Web 2.0 activities
  • WSDL - Web Services Description Language

14
Other Relevant ISO and W3C Standards
  • Markup Languages
  • Metadata Resources
  • Character Coding
  • Access Protocols and Interoperability
  • Content Creation, Manipulation, and Maintenance
  • Authoring Standards
  • Text and Content Markup
  • Translation Standards
  • Terminology and Lexicography Standards
  • ISO TC 37 Standards
  • Terminology Interchange Standards
  • Controlled Language Standards
  • Taxonomy and Ontology Standards
  • Corpus Management Standards  
  • Locale-Related Standards
  • For translation, terminology and applied
    linguists go to
  • http://appling.kent.edu/ResourcePages/LTStandards/Chart/standards.chart.htmOntology

15
SIMILE
  • Semantic Interoperability of Metadata and
    Information in unLike Environments
  • Forming a data reference for open source
    taxonomies

16
Don't reinvent the wheel
  • Use the standards

17
Case study
  • One production group
  • One database
  • One thesaurus
  • One taxonomy
  • One style guide
  • Two web portals
  • Directory service - database
  • E-commerce platform and portal
  • Two online services
  • Two CD products
  • One MARC cataloging authority

18
Production to Portal
  • Background
  • The production platform
  • Integrated Tools for
  • Content Management
  • Indexing
  • Taxonomy Development
  • The Portal displays
  • New Search options

19
Our Mission
  • NICEM was established on, and remains committed
    to, the principle that instructional media offer
    tremendous potential for improving learning.

20
What Is NICEM?
  • National Information Center for Educational Media
  • Established 1963
  • Searchable by title, date, age level, subject
    area, media type and over 130 languages
  • 664,000 items

21
What Is NICEM?
  • 5,700 producers of non-print media
  • 16,000 distributors of non-print media
  • US MARC Cataloging Authority for non-print media
  • 460,000 unit title records
  • Output XML or MARC records
  • Online
  • TLC (MARC)
  • Silver Platter (BRS Format, left tagged ASCII)
  • NICEMnet.com (XML output)

22
NICEM Record
  • Main record fields
  • Series record fields
  • PD fields
  • E-commerce fields
  • Pick lists / authority fields
  • 86 fields total

23
NICEM Thesaurus
  • 22 top terms supporting education curriculum
  • 5708 main terms
  • Standard Z39.19 term record set
  • BT, NT
  • Related terms
  • Synonyms
  • Notes

24
MediaSleuth Output
  • Same DBMS.
  • Additional fields for purchase info
  • Price
  • Item number etc
  • Different interface
  • Also take away fields
  • No P/D information
  • Different export

25
What Is MediaSleuth?
  • The e-commerce platform of NICEM
  • 96,000 items from 156 P/Ds
  • Easy ordering online
  • Virtual Cart
  • Bonus Bucks

26
We Needed Integrated content management
  • Database management system
  • Indexing terms to describe content
  • System to apply indexing terms
  • for targeted document retrieval
  • Treat once for multiple outputs

27
Integrated tools for content management
  • Components: database system, thesaurus tool,
    indexing tool
  • Establish rules for term use; suggest indexing
    terms
  • Search thesaurus; validate term entry; block
    invalid terms; record candidates
  • Validate terms; add, change, and delete terms
    and rules
28
Interdependence
In the integrated system, all parts interconnect,
rely on each other, and must work together.
The right hand must know
what the left hand...
and the other left hand
are doing.
29
DBMS wish list
  • Easy data entry for editors
  • Fully customized database
  • Numerous data fields and room to grow
  • Free text entry with unlimited field length
  • Controlled vocabulary for selected fields
  • Branching structures from multiple fields
  • Systematic collection of candidate terms
  • Platform independence
  • Remote access for offsite editors
  • XML tagging to convert to various
    output formats

30
XML Intranet System for DBMS
31
XIS provides NICEM flexible fields
  • Branching data

Title
Distributor A
video filmstrip
video audio
Distributor B
video laser disc software
Distributor C
Windows Mac
  • Unlimited text length
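
Branching data of this kind (one title, several distributors, several formats per distributor) is a natural fit for nested XML. The sketch below uses Python's standard `xml.etree.ElementTree`; the element and attribute names (`title`, `distributor`, `format`, `name`) and the sample values are assumptions for illustration, since the actual XIS schema is not shown in the slides.

```python
import xml.etree.ElementTree as ET

# Hypothetical element names; the real XIS schema is not shown in the slides.
title = ET.Element("title", name="Sample Title")
branches = {
    "Distributor A": ["video", "filmstrip"],
    "Distributor B": ["video", "laser disc", "software"],
}
for distributor, formats in branches.items():
    node = ET.SubElement(title, "distributor", name=distributor)
    for media in formats:
        # Each format is a repeatable child, so a branch can grow freely.
        ET.SubElement(node, "format").text = media

xml_text = ET.tostring(title, encoding="unicode")
print(xml_text)
```

Because each branch is a repeatable child element, adding a distributor or a format does not require changing the record layout.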

32
XIS stores ideas for new terms
  • to maintain your thesaurus
  • fill gaps in concept coverage
  • add terms as new concepts arise
33
XML export file
34
NICEM needed a thesaurus tool
  • Restructure flat file into hierarchy
  • Map from old terms to new
  • Expand thesaurus coverage
  • Easy to navigate hierarchy
  • User friendly, easy to maintain
  • Form associations and interconnections
  • RTs, Use/UFs, Scope Notes, etc.
  • Comply with ANSI/NISO, ISO standards
  • Integrate with DBMS
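
Restructuring a flat term list into a hierarchy can be sketched as follows. This assumes the flat file can be read as (term, broader term) pairs, which is a simplification; the sample terms and the function names `build_hierarchy` and `render` are invented for illustration.

```python
from collections import defaultdict

# Hypothetical flat file: (term, broader term) pairs; None marks a top term.
flat_terms = [
    ("Science", None),
    ("Earth science", "Science"),
    ("Volcanoes", "Earth science"),
    ("Biology", "Science"),
    ("Arts", None),
]


def build_hierarchy(pairs):
    """Group terms under their broader term, giving BT -> [NT, ...]."""
    children = defaultdict(list)
    for term, broader in pairs:
        children[broader].append(term)
    return children


def render(children, parent=None, depth=0):
    """Return the hierarchy as indented lines, top terms first."""
    lines = []
    for term in sorted(children.get(parent, [])):
        lines.append("  " * depth + term)
        lines.extend(render(children, term, depth + 1))
    return lines


tree = build_hierarchy(flat_terms)
outline = render(tree)
print("\n".join(outline))
```

The same parent-to-children mapping can drive a drill-down taxonomy view, since each click only needs the narrower terms of one node.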

35
Thesaurus Term Record view
Taxonomy view
36
Thesaurus Master connects to DBMS
39
Search thesaurus through DBMS
Define your search parameters -- just the term, or
also search Related Terms, Non-Preferred Terms,
Scope Notes, and more.
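
A search with selectable scope, as described above, might be sketched like this. The in-memory `THESAURUS` dictionary, its field names, and the sample entries are assumptions made for the example; they do not describe how Thesaurus Master stores or searches its data.

```python
# Hypothetical in-memory thesaurus; the entries are invented for illustration.
THESAURUS = {
    "Volcanoes": {
        "related": ["Earthquakes"],
        "non_preferred": ["Volcanos", "Volcanic mountains"],
        "scope_note": "Landforms built by eruptions of magma.",
    },
    "Earthquakes": {
        "related": ["Volcanoes"],
        "non_preferred": ["Seismic events"],
        "scope_note": "Sudden ground movement along faults.",
    },
}


def search_thesaurus(query, fields=()):
    """Return preferred terms whose name, or any selected field, contains query."""
    needle = query.lower()
    hits = []
    for term, record in THESAURUS.items():
        haystack = [term]
        for name in fields:
            value = record.get(name, [])
            # Scope notes are single strings; the other fields are lists.
            haystack.extend([value] if isinstance(value, str) else value)
        if any(needle in text.lower() for text in haystack):
            hits.append(term)
    return hits


print(search_thesaurus("seismic"))                            # term only
print(search_thesaurus("seismic", fields=("non_preferred",)))  # widened search
```

Searching the term alone finds nothing for "seismic", while widening the search to non-preferred terms leads the user to the preferred term.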
40
Search thesaurus for choices, select term to view
full record,
or view hierarchy
41
Thesaurus Master blocks invalid terms in DBMS
Thesaurus tool must accept manual entry, but
prevent invalid terms and typos.
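
One way to accept manual entry while blocking invalid terms is to check each entry against the controlled vocabulary before it is stored. In this sketch the vocabulary, the USE-reference mapping, and the `validate_entry` function are hypothetical; mapping a non-preferred entry to its preferred term and recording rejected entries as candidates are assumptions about sensible behavior, not a description of the actual tool.

```python
# Hypothetical controlled vocabulary: preferred terms plus USE references.
PREFERRED = {"Volcanoes", "Earthquakes"}
USE_REFERENCES = {"Volcanos": "Volcanoes"}   # non-preferred -> preferred

candidate_terms = []   # rejected entries kept for thesaurus editors to review


def validate_entry(entry):
    """Return the preferred term for a manual entry, or None if it is blocked."""
    cleaned = entry.strip()
    if cleaned in PREFERRED:
        return cleaned
    if cleaned in USE_REFERENCES:
        return USE_REFERENCES[cleaned]
    candidate_terms.append(cleaned)   # block it, but remember it as a candidate
    return None


print(validate_entry("Volcanos"))    # mapped to the preferred term
print(validate_entry("Vulcanoes"))   # typo: blocked and recorded
```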
42
NICEM needed an indexing tool
  • Basic requirement for an indexing tool
  • Suggest terms that are
  • valid
  • correctly formatted
  • conceptually appropriate

Avoid suggesting any terms that do not meet
these criteria.
and more...
43
and NICEM wanted...
  • Faster, more consistent production
  • Memory prompt for forgotten terms
  • Facilitate training on thesaurus
  • Index all relevant concepts
  • Index concepts deeply, specifically
  • Smarter indexing than simple
    term recognition or co-occurrence
  • Integrate with DBMS

44
Machine Aided Indexer (M.A.I.) met NICEM's needs
How M.A.I. works
  • Scans selected fields
  • Text words prompt rules
  • Match rule conditions?
  • Suggests the indexing term
  • Tracks M.A.I. suggestions and editors' choices
  • Presents comparative statistics for review
  • Enables rule changes for improved future
    performance
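
The workflow above (scan text, let text words prompt rules, test the rule conditions, suggest a term, then compare suggestions with the editor's choices) can be illustrated with a toy rule base. This is a simplified sketch under stated assumptions: the rule format, the names `RULES`, `suggest_terms` and `hit_rate`, and the sample text are all invented, and the real M.A.I. rule language is not shown in the slides.

```python
import re

# Hypothetical rule base: a text word prompts a rule; the rule suggests an
# indexing term only if its condition (other words in the text) is met.
RULES = [
    {"trigger": "eruption", "requires_any": {"lava", "volcano"}, "suggest": "Volcanoes"},
    {"trigger": "fault", "requires_any": {"quake", "seismic"}, "suggest": "Earthquakes"},
]


def suggest_terms(text):
    """Scan text and return indexing terms whose rule conditions match."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    suggestions = []
    for rule in RULES:
        if rule["trigger"] in words and words & rule["requires_any"]:
            if rule["suggest"] not in suggestions:
                suggestions.append(rule["suggest"])
    return suggestions


def hit_rate(suggested, chosen):
    """Share of suggested terms that the editor actually kept."""
    if not suggested:
        return 0.0
    return len(set(suggested) & set(chosen)) / len(suggested)


abstract = "Footage of the eruption shows lava reaching the sea."
suggested = suggest_terms(abstract)
editor_choice = ["Volcanoes"]            # the editor makes the final decision
print(suggested, hit_rate(suggested, editor_choice))
```

Tracking the hit rate per rule is one simple way to decide which rules need editing to improve future suggestions.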

45
M.A.I. connects to DBMS
46
M.A.I. suggests thesaurus terms.
Highlight terms and hit Select to index.
47
M.A.I. gives editors choice
M.A.I. is an aide, an assistant for the editor, a
memory prompt. M.A.I. suggests indexing terms
based on the rules in its knowledge base.
But the editor makes the decision, based on
human understanding, analysis, and
interpretation of the text.
The editor then teaches M.A.I. to recognize
the set of clues in text that prompted use of an
indexing term.
48
Rules governing M.A.I.'s term suggestions can be
simple or complex
50
Editors can write rules that consider
  • style of text
  • sentence length
  • proximity of target words (four degrees)
  • capitalization of target words or initial
    letters
  • position of target word in sentence
  • rejection of indexing term if a specific word
    is present
  • idiomatic word usage
  • flexible mix and match combinations of
    target words in text to clarify meaning
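
Three of the conditions above (proximity, capitalization, and rejection when a specific word is present) can be combined in a single rule. The sketch below is illustrative only: the rule itself, the `max_gap` parameter, and the function names are assumptions, not M.A.I. syntax.

```python
import re


def tokens(text):
    """Split text into word tokens, keeping their original capitalization."""
    return re.findall(r"[A-Za-z]+", text)


def within(words, first, second, max_gap):
    """True if `first` and `second` occur within max_gap tokens of each other."""
    lowered = [w.lower() for w in words]
    first_at = [i for i, w in enumerate(lowered) if w == first]
    second_at = [i for i, w in enumerate(lowered) if w == second]
    return any(abs(i - j) <= max_gap for i in first_at for j in second_at)


def suggest_mercury_planet(text):
    """Hypothetical rule: suggest 'Mercury (Planet)' for capitalized 'Mercury'
    near 'orbit', unless 'thermometer' is present anywhere in the text."""
    words = tokens(text)
    lowered = {w.lower() for w in words}
    if "thermometer" in lowered:            # rejection word
        return None
    if "Mercury" not in words:              # capitalization condition
        return None
    if within(words, "mercury", "orbit", max_gap=5):   # proximity condition
        return "Mercury (Planet)"
    return None


print(suggest_mercury_planet("The orbit of Mercury takes 88 days."))
print(suggest_mercury_planet("Mercury in a thermometer expands with heat."))
```

Combining conditions this way is what lets a rule separate the planet from the metal, which plain term recognition cannot do.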

51
Other things to do with production
  • Automatic application?
  • Spider setting internally
  • External web crawls use all aliases
  • Web harvesting of popular sites

52
Sailing on to the portal
53
Taxonomy descriptors become subject metadata
  • Selected descriptors are XML-tagged and stored
    with document
  • Descriptors available as webpage metadata
  • Put in the HTML Header
  • Metatags enable precise document retrieval
  • Term equivalence enables query expansion in
    search (MAIQuery)
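
Writing descriptors into the HTML header could look like the following sketch. The `name="subject"` attribute is an assumption (a site might equally use `keywords` or a Dublin Core name), and the descriptors are invented.

```python
from html import escape


def subject_meta_tags(descriptors):
    """Render taxonomy descriptors as HTML <meta> tags for the page header."""
    tags = []
    for term in descriptors:
        # Escape the term so characters such as & or " cannot break the markup.
        tags.append('<meta name="subject" content="%s">' % escape(term, quote=True))
    return "\n".join(tags)


# Invented descriptors for illustration.
print(subject_meta_tags(["Volcanoes", "Earth science"]))
```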

54
Accommodating Learning Styles
  • Visual
  • Auditory
  • Kinesthetic / Tactile
  • Visual (spatial)
  • Aural (auditory-musical)
  • Verbal (linguistic)
  • Physical (kinesthetic)
  • Logical (mathematical)
  • Social (interpersonal)
  • Solitary (intrapersonal)
  • Active and Reflective
  • Visual and Verbal
  • Sensing and Intuitive
  • Sequential and Global

55
Designing for everyone
  • Structure of the Corpus
  • User Context and Search Task
  • User-Interface Design
  • Mobile Search

56
Structure of the Corpus
  • Specific domains
  • Easier than the whole internet
  • DLESE Digital Library for Earth Science
    Education
  • Domain specific taxonomy
  • Specific branches
  • Dynamic classifications

58
User Context and Search Task
  • What about unique interfaces for different tasks?
  • Culture effects
  • Embedded search
  • Search history

59
User-Interface Design
  • Combine search and browse
  • Providing confidence in search
  • Help build the query
  • Predicting the user's queries

60
Mobile Search
  • Small screen design
  • Offline queries
  • Location search
  • Person, things search
  • Conceptual search

64
The Portal View - MediaSleuth
  • Use all options for search
  • Traditional Search
  • Taxonomy
  • Rule Base

67
Value of Category search
  • Searchers find info 50% faster using browsable
    categories than using list returned from free
    text search
  • Advantage is even stronger when the answer is not
    in the top 20 returns
  • Searchers prefer browsable category search
  • Chen, H., and Dumais, S.

68
Our assumptions for Enterprise Taxonomy Management
  • Consistent application across entire site
  • Synonyms are used interchangeably
  • User doesn't need to know the taxonomy
  • Pop up view is helpful
  • Site map for construction and browsing
  • Allows hidden sections for internal use

69
NavTree View
MAIQuery
70
Content Outline
Top terms in the taxonomy: click each to drill down
71
Click taxonomy category to see associated titles
73
MAIQuery uses the rule base to expand your
search query
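
Query expansion through term equivalence can be sketched as a lookup that turns one query term into an OR query over the preferred term and its synonyms. The `EQUIVALENTS` table, the `expand_query` function, and the OR-query syntax are assumptions for illustration; the slides do not show how MAIQuery itself builds the expanded query.

```python
# Hypothetical equivalence table: preferred term -> non-preferred synonyms.
EQUIVALENTS = {
    "volcanoes": ["volcanos", "volcanic mountains"],
    "films": ["movies", "motion pictures"],
}


def expand_query(query):
    """Expand a query term into an OR query over its equivalent terms."""
    key = query.strip().lower()
    for preferred, synonyms in EQUIVALENTS.items():
        if key == preferred or key in synonyms:
            terms = [preferred] + synonyms
            return " OR ".join('"%s"' % t for t in terms)
    return '"%s"' % key   # unknown terms pass through unchanged


print(expand_query("movies"))
```

With this approach the searcher does not need to know which synonym the taxonomy prefers.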
74
(No documents in Microorganisms category in 1,000
document sample)
75
Integrated software tools provide
  • Cross-checks
  • Interconnection
  • Cooperation
  • Validation
  • Feedback
  • Coordination
  • Seamless integration
  • Error prevention

The right hand knows what the left hands are
doing.
76
Other display options
  • Grokker
  • Maps
  • Some newer ones follow

77
xrefer Research Mapping
78
This extract, expressed as an RDF graph using the
SKOS Core Vocabulary, looks like
(RDF graph figure not captured in transcript)
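
Purely as an illustration of SKOS Core output, and not a reconstruction of the slide's extract, a thesaurus entry can be serialized along these lines. The sketch emits N-Triples using the SKOS namespace and the properties `prefLabel`, `altLabel` and `broader`; the `http://example.org/thesaurus/` base URI, the function name, and the sample concept are placeholders, and labels containing quotation marks would need escaping that is omitted here.

```python
SKOS = "http://www.w3.org/2004/02/skos/core#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
BASE = "http://example.org/thesaurus/"   # placeholder namespace


def concept_to_ntriples(slug, pref_label, alt_labels=(), broader=()):
    """Serialize one concept as N-Triples lines using SKOS Core properties."""
    subject = "<%s%s>" % (BASE, slug)
    lines = [
        "%s <%s> <%sConcept> ." % (subject, RDF_TYPE, SKOS),
        '%s <%sprefLabel> "%s"@en .' % (subject, SKOS, pref_label),
    ]
    for label in alt_labels:          # UF / non-preferred terms
        lines.append('%s <%saltLabel> "%s"@en .' % (subject, SKOS, label))
    for parent in broader:            # BT links to other concepts
        lines.append("%s <%sbroader> <%s%s> ." % (subject, SKOS, BASE, parent))
    return lines


triples = concept_to_ntriples(
    "volcanoes", "Volcanoes", alt_labels=["Volcanos"], broader=["earth-science"]
)
print("\n".join(triples))
```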
79
Another way to search
80
Data Harmony View - VxInsight
83
Framework (logical view)
Metamodel of Services
Data processed by Services
Webserver (Cocoon2)
Support Services
Service creation and assembly
Transaction management
Java-classes (framework application)
84
Questions? Comments?
Thanks for your attention!
Marjorie M.K. Hlava
NICEM / MediaSleuth
A division of Access Innovations
Call 505-998-0800
Email mhlava_at_nicem.com