Title: Taxonomy as Content Outline, Site Map and Search Aid SLA NWR Vancouver October 6, 2006
1Taxonomy as Content Outline, Site Map and Search Aid
SLA NWR Vancouver, October 6, 2006
- Marjorie M.K. Hlava
- President
- 505-998-0800
- mhlava_at_accessinn.com
2Presentation outline
- Background
- Standards
- Afterthought or critical?
- A taxonomy tale: case study
- Additional Sample displays
3Presentation Goals
- Practical guide case study
- Content specific taxonomy
- Outline of the content
- A search aid for finding the content stored in a site
- Content viewed in a list
- Other graphical representations
- Taxonomy based on the thesaurus
- Access the knowledge stored in a CMS
4Background
- Thesaurus
- TT, BT, NT = Taxonomy
- Add information object at the last node
- Synonyms (USE, Non-Preferred) = Synonym Ring
- Above + RT = Thesaurus
- Thesaurus + RDF + IO = Semantic web
5Charting the difference
- Authority Files
- People
- Places
- Things
- Thesaurus
- Concepts
- Methods
- Processes
- Ontologies
- Link Instances
- Classes
- Semantic Web / Topic Maps
- Link the description
- To the actual item (object)
6Taxonomy / thesaurus
- Main Term (MT)
- Top Term (TT)
- Broader Terms (BT)
- Narrower Terms (NT)
- Narrower Term Instance
- Related Terms (RT)
- See also (SA)
- NonPreferred Term (NP)
- Used for (UF), See (S)
- Scope Note (SN)
- History (H)
TAXONOMY
ONTOLOGY
THESAURUS
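The term record fields listed on this slide can be pictured as a small data structure. The following is only a sketch under assumptions: the class name, the sample terms and the reciprocity check are illustrative and do not come from the NICEM thesaurus or from any particular thesaurus tool.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class TermRecord:
    """One thesaurus term record, using the field codes from the slide."""
    main_term: str                                        # MT
    top_term: Optional[str] = None                        # TT
    broader: List[str] = field(default_factory=list)      # BT
    narrower: List[str] = field(default_factory=list)     # NT
    related: List[str] = field(default_factory=list)      # RT / SA
    used_for: List[str] = field(default_factory=list)     # UF (non-preferred terms)
    scope_note: str = ""                                  # SN
    history: str = ""                                     # H


def check_reciprocity(records: Dict[str, TermRecord]) -> List[str]:
    """Return a message for every BT/NT link that lacks its reverse link."""
    problems = []
    for rec in records.values():
        for bt in rec.broader:
            parent = records.get(bt)
            if parent is None or rec.main_term not in parent.narrower:
                problems.append(f"{rec.main_term} BT {bt} has no matching NT")
        for nt in rec.narrower:
            child = records.get(nt)
            if child is None or rec.main_term not in child.broader:
                problems.append(f"{rec.main_term} NT {nt} has no matching BT")
    return problems


# Hypothetical sample terms, for illustration only.
records = {
    "Science": TermRecord("Science", top_term="Science", narrower=["Biology"]),
    "Biology": TermRecord("Biology", top_term="Science", broader=["Science"],
                          narrower=["Microorganisms"], used_for=["Life science"]),
    "Microorganisms": TermRecord("Microorganisms", top_term="Science"),
}
print(check_reciprocity(records))
```

In this sample the last record deliberately omits its BT, so the check reports one broken link. A taxonomy view would use only the TT, BT and NT fields of such a record; the thesaurus view adds RT, UF and the notes.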
7The Standards
- NISO Z39.19
- ANSI
- ISO
- BSI
- W3C
- OMB E-Gov Section 207
- KM working groups
8ISO TC 46 SC 6 or 9
- Controlled vocabulary and other information standards
- ISO 5127 Information and Documentation Vocabulary
- ISO 2788:1986 Guidelines for the establishment and development of monolingual thesauri (BS 5723:1987)
- ISO 5964:1985 Guidelines for the establishment and development of multilingual thesauri (BS 6723:1985)
- NEW: BS 8723, Parts 1-4 (Stella Dextre Clarke)
9British Standards - BS 8723
- Structured vocabularies for information retrieval: Guide
- Part 1: General
- Part 2: Thesauri
- Part 3: Vocabularies other than thesauri
- Part 4: Interoperability between vocabularies
- Part 5: Interoperability with applications
10ISO TC 37
- Scope of ISO TC 37: Standardization of principles, methods and applications relating to terminology and other language resources.
- TC 37/SC 1 - Principles and methods
- TC 37/SC 2 - Terminography and lexicography
- TC 37/SC 3 - Computer applications for terminology
- TC 37/SC 4 - Language resource management
11Sample Standards
- Principles of concept-oriented terminology and data categories
- ISO 704:2000 Terminology work - Principles and methods
- ISO 860:1996 Terminology work - Harmonization of concepts and terms
- ISO 1087-1:2000 Terminology work - Vocabulary - Part 1: Theory and application
- ISO 1087-2:2000 Terminology work - Vocabulary - Part 2: Computer applications
- ISO 10241:1992 Preparation and layout of international terminology standards
- ISO 12200:1999 Computer applications in terminology - Machine-readable terminology interchange format (MARTIF) - Negotiated interchange
- ISO 12616:2002 Translation-oriented terminography
- ISO/TR 12618:1994 Computer aids in terminology - Creation and use of terminological databases and text corpora
- ISO 12620:1999 Computer applications in terminology - Data categories used to create glossaries
12W3C
- OWL Web Ontology Language
- RDF Resource Description Framework
- Topic Maps
- SKOS - Simple Knowledge Organization System
- Which community to serve?
- Build on the current standard
- Might make this link next
13Other things to watch
- Other W3C and ISO areas
- Support groups
- Blogs
- Communities of Practice
- SIMILE
- Web 2.0 activities
- WSDL Web Services Description Language
14Other Relevant ISO and W3C Standards
- Markup Languages
- Metadata Resources
- Character Coding
- Access Protocols and Interoperability
- Content Creation, Manipulation, and Maintenance
- Authoring Standards
- Text and Content Markup
- Translation Standards
- Terminology and Lexicography Standards
- ISO TC 37 Standards
- Terminology Interchange Standards
- Controlled Language Standards
- Taxonomy and Ontology Standards
- Corpus Management Standards
- Locale-Related Standards
- For translation, terminology and applied linguistics, go to
- http://appling.kent.edu/ResourcePages/LTStandards/Chart/standards.chart.htm
- Ontology
15SIMILE
- Semantic Interoperability of Metadata and Information in unLike Environments
- Forming a data reference for open source taxonomies
16Don't reinvent the wheel
17Case study
- One production group
- One database
- One thesaurus
- One taxonomy
- One style guide
- Two web portals
- Directory service - database
- E-commerce platform and portal
- Two online services
- Two CD products
- One MARC cataloging authority
18Production to Portal
- Background
- The production platform
- Integrated Tools for
- Content Management
- Indexing
- Taxonomy Development
- The Portal displays
- New Search options
19Our Mission
- NICEM was established on, and remains committed
to, the principle that instructional media offer
tremendous potential for improving learning.
20What Is NICEM?
- National Information Center for Educational Media
- Established 1963
- Searchable by title, date, age level, subject area, media type and over 130 languages
- 664,000 items
21What Is NICEM?
- 5,700 producers of non-print media
- 16,000 distributors of non-print media
- US MARC Cataloging Authority for non-print media
- 460,000 unit title records
- Output XML or MARC records
- Online
- TLC (MARC)
- SilverPlatter (BRS Format, left tagged ASCII)
- NICEMnet.com (XML output)
22NICEM Record
- Main record fields
- Series record fields
- P/D (producer/distributor) fields
- E-commerce fields
- Pick lists / authority fields
- 86 fields total
23NICEM Thesaurus
- 22 top terms supporting education curriculum
- 5708 main terms
- Standard Z39.19 term record set
- BT, NT
- Related terms
- Synonyms
- Notes
24MediaSleuth Output
- Same DBMS.
- Additional fields for purchase info
- Price
- Item number etc
- Different interface
- Also take away fields
- No P/D information
- Different export
25What Is MediaSleuth?
- The e-commerce platform of NICEM
- 96,000 items from 156 P/Ds
- Easy ordering online
- Virtual Cart
- Bonus Bucks
26We Needed Integrated content management
- Database management system
- Indexing terms to describe content
- System to apply indexing terms for targeted document retrieval
- Treat once for multiple outputs
27Integrated tools for content management
Database system
Establish rules for term use; suggest indexing terms
Search thesaurus; validate term entry; block invalid terms; record candidates
Thesaurus tool
Indexing tool
Validate terms; add terms and rules; change terms and rules; delete terms and rules
28Interdependence
In the integrated system, all parts interconnect,
rely on each other, and must work together.
The right hand must know
what the left hand...
and the other left hand
are doing.
29DBMS wish list
- Easy data entry for editors
- Fully customized database
- Numerous data fields and room to grow
- Free text entry with unlimited field length
- Controlled vocabulary for selected fields
- Branching structures from multiple fields
- Systematic collection of candidate terms
- Remote access for offsite editors
- XML tagging to convert to various output formats
30XML Intranet System for DBMS
31XIS provides NICEM flexible fields
Title
Distributor A
video filmstrip
video audio
Distributor B
video laser disc software
Distributor C
Windows Mac
32XIS stores ideas for new terms
to maintain your thesaurus
fill gaps in concept coverage
add terms as new concepts arise
33XML export file
34NICEM needed a thesaurus tool
- Restructure flat file into hierarchy
- Map from old terms to new
- Expand thesaurus coverage
- Easy to navigate hierarchy
- User friendly, easy to maintain
- Form associations and interconnections
- RTs, Use/UFs, Scope Notes, etc.
- Comply with ANSI/NISO, ISO standards
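Restructuring a flat term file into a hierarchy comes down to recording each term's broader term and then walking the tree from the top terms. The following minimal sketch assumes a flat list of (term, broader term) pairs; the sample terms and function names are hypothetical and are not taken from the NICEM thesaurus.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

# Hypothetical flat file: (term, broader term), with None marking a top term.
FLAT_TERMS: List[Tuple[str, Optional[str]]] = [
    ("Science", None),
    ("Biology", "Science"),
    ("Microorganisms", "Biology"),
    ("Chemistry", "Science"),
    ("Arts", None),
]


def build_children(flat: List[Tuple[str, Optional[str]]]) -> Dict[Optional[str], List[str]]:
    """Group terms under their broader term, sorted alphabetically."""
    children: Dict[Optional[str], List[str]] = defaultdict(list)
    for term, broader in flat:
        children[broader].append(term)
    for kids in children.values():
        kids.sort()
    return children


def outline(children: Dict[Optional[str], List[str]],
            parent: Optional[str] = None, depth: int = 0) -> List[str]:
    """Return the hierarchy as indented lines, top terms first."""
    lines: List[str] = []
    for term in children.get(parent, []):
        lines.append("  " * depth + term)
        lines.extend(outline(children, term, depth + 1))
    return lines


print("\n".join(outline(build_children(FLAT_TERMS))))
```

The sketch does not guard against circular BT references; a production tool would need to detect them before walking the tree.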
35Thesaurus Term Record view
Taxonomy view
36Thesaurus Master connects to DBMS
39Search thesaurus through DBMS
Define your search parameters -- just the term,
or
also search Related Terms, Non-Preferred Terms,
Scope Notes, and more.
40Search thesaurus for choices, select term to view
full record,
or view hierarchy
41Thesaurus Master blocks invalid terms in DBMS
Thesaurus tool must accept manual entry, but
prevent invalid terms and typos.
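The behaviour described here (accept manual entry, block invalid terms and typos, record candidates) can be sketched as a lookup against the preferred terms and the USE references. This is an illustration under assumptions, not the Thesaurus Master implementation: the vocabulary, the function name and the candidate list are all hypothetical.

```python
from typing import Dict, List, Optional

# Hypothetical vocabulary, for illustration only.
PREFERRED_TERMS = ["Biology", "Microorganisms", "Videodiscs"]
USE_REFERENCES: Dict[str, str] = {        # non-preferred term -> preferred term
    "life science": "Biology",
    "laser discs": "Videodiscs",
}


def _normalize(text: str) -> str:
    """Collapse whitespace and ignore case so trivial variants still match."""
    return " ".join(text.split()).casefold()


def validate_entry(entry: str, candidates: List[str]) -> Optional[str]:
    """Return the preferred term for a typed entry, or None if it is blocked.

    Blocked entries are appended to `candidates` for later editorial review.
    """
    key = _normalize(entry)
    for term in PREFERRED_TERMS:
        if _normalize(term) == key:
            return term
    if key in USE_REFERENCES:
        return USE_REFERENCES[key]
    candidates.append(" ".join(entry.split()))
    return None


candidate_log: List[str] = []
for typed in ["  biology", "Laser Discs", "Biolgy"]:
    print(repr(typed), "->", validate_entry(typed, candidate_log))
print("candidates:", candidate_log)
```

In this sketch a misspelling such as "Biolgy" is blocked and logged rather than silently corrected, which keeps the decision with the editor.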
42NICEM needed an indexing tool
- Basic requirement for an indexing tool
- Suggest terms that are
- valid
- correctly formatted
- conceptually appropriate
Avoid suggesting any terms that do not meet
these criteria.
and more...
43and NICEM wanted...
- Faster, more consistent production
- Memory prompt for forgotten terms
- Facilitate training on thesaurus
- Index all relevant concepts
- Index concepts deeply, specifically
- Smarter indexing than simple term recognition or co-occurrence
44Machine Aided Indexer (M.A.I.) met NICEM's needs
How M.A.I. works
- Suggests the indexing term
- Tracks M.A.I. suggestions and editors' choices
- Presents comparative statistics for review
- Enables rule changes for improved future
performance
45M.A.I. connects to DBMS
46M.A.I. suggests thesaurus terms.
Highlight terms and hit Select to index.
47M.A.I. gives editors choice
M.A.I. is an aide, an assistant for the editor, a
memory prompt. M.A.I. suggests indexing terms
based on the rules in its knowledge base.
But the editor makes the decision, based on
human understanding, analysis, and
interpretation of the text.
The editor then teaches M.A.I. to recognize
the set of clues in text that prompted use of an
indexing term.
48Rules governing M.A.I.'s term suggestions can be
simple or complex
50Editors can write rules that consider
- proximity of target words (four degrees)
- capitalization of target words or initial
letters
- position of target word in sentence
- rejection of indexing term if a specific word is present
- flexible mix and match combinations of target words in text to clarify meaning
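To make the rule conditions above concrete, here is a minimal sketch of a rule matcher covering three of them: proximity, capitalization and rejection. It is not M.A.I.'s rule syntax or engine. The `Rule` class, the token-window notion of proximity (the slide's "four degrees" are not modelled) and the sample rules are all assumptions made for illustration, and sentence position is left out.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Rule:
    target: str                      # word in the text that triggers the rule
    index_term: str                  # thesaurus term to suggest
    near: Optional[str] = None       # word that must appear close to the target
    window: int = 5                  # maximum distance in tokens for `near`
    capitalized: bool = False        # target must start with a capital letter
    reject_if: Optional[str] = None  # word that vetoes the suggestion


def tokenize(text: str) -> List[str]:
    return re.findall(r"[A-Za-z0-9']+", text)


def rule_fires(rule: Rule, tokens: List[str]) -> bool:
    lowered = [t.lower() for t in tokens]
    if rule.reject_if and rule.reject_if.lower() in lowered:
        return False
    for i, tok in enumerate(lowered):
        if tok != rule.target.lower():
            continue
        if rule.capitalized and not tokens[i][0].isupper():
            continue
        if rule.near is None:
            return True
        start, end = max(0, i - rule.window), i + rule.window + 1
        if rule.near.lower() in lowered[start:end]:
            return True
    return False


def suggest_terms(text: str, rules: List[Rule]) -> List[str]:
    """Return suggested index terms in rule order, without duplicates."""
    tokens = tokenize(text)
    suggestions: List[str] = []
    for rule in rules:
        if rule_fires(rule, tokens) and rule.index_term not in suggestions:
            suggestions.append(rule.index_term)
    return suggestions


# Hypothetical rules, for illustration only.
RULES = [
    Rule("mercury", "Planets", near="orbit", capitalized=True),
    Rule("mercury", "Chemical elements", near="thermometer"),
    Rule("cells", "Cytology", reject_if="battery"),
]
print(suggest_terms("The orbit of Mercury is eccentric.", RULES))
```

The capitalization test here is naive: a target word at the start of a sentence is capitalized regardless of meaning, which is one reason a rule might also need to consider position in the sentence.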
51Other things to do with production
- Automatic application?
- Spider setting internally
- External web crawls use all aliases
- Web harvesting of popular sites
52Sailing on to the portal
53Taxonomy descriptors become subject metadata
- Selected descriptors are XML-tagged and stored with document
- Descriptors available as webpage metadata
- Put in the HTML header
- Metatags enable precise document retrieval
- Term equivalence enables query expansion in search (MAIQuery)
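Placing descriptors in the HTML header can be sketched as follows. The choice of a combined `keywords` tag plus one `DC.subject` tag per descriptor is an assumption made for illustration; the slides do not say which meta names the portal uses, and the function name is hypothetical.

```python
from html import escape
from typing import List


def subject_meta_tags(descriptors: List[str]) -> str:
    """Render taxonomy descriptors as meta tags for a page's <head>."""
    combined = escape(", ".join(descriptors), quote=True)
    tags = [f'<meta name="keywords" content="{combined}">']
    for descriptor in descriptors:
        value = escape(descriptor, quote=True)
        tags.append(f'<meta name="DC.subject" content="{value}">')
    return "\n".join(tags)


# Hypothetical descriptors, for illustration only.
print(subject_meta_tags(["Biology", "Arts & crafts"]))
```

Escaping matters because descriptors can contain ampersands or quotation marks, which would otherwise break the attribute value.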
54Accommodating Learning Styles
- Visual
- Auditory
- Kinesthetic / Tactile
- Visual (spatial)
- Aural (auditory-musical)
- Verbal (linguistic)
- Physical (kinesthetic)
- Logical (mathematical)
- Social (interpersonal)
- Solitary (intrapersonal)
- Active and Reflective
- Visual and Verbal
- Sensing and Intuitive
- Sequential and Global
55Designing for everyone
- Structure of the Corpus
- User Context and Search Task
- User-Interface Design
- Mobile Search
56Structure of the Corpus
- Specific domains
- Easier than the whole internet
- DLESE: Digital Library for Earth System Education
- Domain specific taxonomy
- Specific branches
- Dynamic classifications
58User Context and Search Task
- What about unique interfaces for different tasks?
- Culture effects
- Embedded search
- Search history
59User-Interface Design
- Combine search and browse
- Providing confidence in search
- Help build the query
- Predicting the user's queries
60Mobile Search
- Small screen design
- Offline queries
- Location search
- Person, things search
- Conceptual search
64The Portal View - MediaSleuth
- Use all options for search
- Traditional Search
- Taxonomy
- Rule Base
67Value of Category search
- Searchers find info 50% faster using browsable categories than using the list returned from a free-text search
- Results even stronger when results are not in the top 20 returns
- Searchers prefer browsable category search
- Chen, H., and Dumais, S.
68Our assumptions for Enterprise Taxonomy Management
- Consistent application across entire site
- Synonyms are used interchangeably
- User doesn't need to know the taxonomy
- Pop up view is helpful
- Site map for construction and browsing
- Allows hidden sections for internal use
69s
NavTree View
MAIQuery
70Content Outline
Top Terms in the Taxonomy: click each to drill down
71Click taxonomy category to see associated titles
73MAIQuery: use the rule base to expand your search query
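Query expansion through term equivalence can be sketched as follows. The slides do not describe how MAIQuery applies its rule base, so this example only illustrates the simpler idea of expanding a query to a preferred term and its synonyms; the equivalence table and function names are hypothetical.

```python
from typing import Dict, List

# Hypothetical equivalence table: preferred term -> non-preferred synonyms.
EQUIVALENTS: Dict[str, List[str]] = {
    "Videodiscs": ["laser discs", "laserdiscs"],
    "Biology": ["life science"],
}


def expand_query(query: str, equivalents: Dict[str, List[str]] = EQUIVALENTS) -> List[str]:
    """Return the preferred term and its synonyms for a query, preferred term first.

    A query that matches nothing in the table is returned unchanged.
    """
    wanted = query.strip().casefold()
    for preferred, synonyms in equivalents.items():
        group = [preferred] + synonyms
        if wanted in (g.casefold() for g in group):
            return group
    return [query.strip()]


def to_boolean(terms: List[str]) -> str:
    """Join expanded terms into a quoted OR expression for a search engine."""
    return " OR ".join(f'"{t}"' for t in terms)


print(to_boolean(expand_query("laser discs")))
```

With expansion of this kind, a searcher who types a synonym still retrieves documents indexed under the preferred term, which is why the user does not need to know the taxonomy.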
74(No documents in Microorganisms category in 1,000
document sample)
75Integrated software tools provide
The right hand knows what the left hands are
doing.
76Other display options
- Grokker
- Maps
- Some newer ones follow
77 xrefer Research Mapping
78This extract, expressed as an RDF graph using the
SKOS Core Vocabulary, looks like
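As an illustration of the mapping from thesaurus fields to SKOS properties, the sketch below serializes one hypothetical term record as Turtle. It is not the extract shown on the slide. The `example.org` namespace, the sample term and the helper names are assumptions; the SKOS properties used (`prefLabel`, `altLabel`, `broader`, `narrower`, `related`, `scopeNote`) are part of the SKOS vocabulary.

```python
from typing import Dict, List, Tuple

BASE = "http://example.org/thesaurus/"   # hypothetical namespace
SKOS_NS = "http://www.w3.org/2004/02/skos/core#"


def slug(label: str) -> str:
    """Naive label-to-identifier conversion, adequate for simple labels only."""
    return label.strip().lower().replace(" ", "-")


def literal(text: str) -> str:
    """Quote a string as an English-language Turtle literal."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"@en'


def to_skos_turtle(term: Dict) -> str:
    """Serialize one term record (prefLabel, UF, BT, NT, RT, SN) as SKOS Turtle."""
    pairs: List[Tuple[str, str]] = [
        ("a", "skos:Concept"),
        ("skos:prefLabel", literal(term["prefLabel"])),
    ]
    pairs += [("skos:altLabel", literal(uf)) for uf in term.get("UF", [])]
    pairs += [("skos:broader", f"<{BASE}{slug(bt)}>") for bt in term.get("BT", [])]
    pairs += [("skos:narrower", f"<{BASE}{slug(nt)}>") for nt in term.get("NT", [])]
    pairs += [("skos:related", f"<{BASE}{slug(rt)}>") for rt in term.get("RT", [])]
    if term.get("SN"):
        pairs.append(("skos:scopeNote", literal(term["SN"])))
    subject = f"<{BASE}{slug(term['prefLabel'])}>"
    body = " ;\n    ".join(f"{pred} {obj}" for pred, obj in pairs)
    return f"@prefix skos: <{SKOS_NS}> .\n\n{subject}\n    {body} .\n"


# Hypothetical term record, for illustration only.
print(to_skos_turtle({
    "prefLabel": "Biology",
    "UF": ["Life science"],
    "BT": ["Science"],
    "NT": ["Microorganisms"],
}))
```

The `slug` helper would produce invalid IRIs for labels containing characters such as angle brackets or quotation marks, so a real export would need proper IRI encoding.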
79Another way to search
80Data Harmony View - VxInsight
83Framework (logical view)
Metamodel of Services
Data processed by Services
Webserver (Cocoon2)
Support Services
Service creation and assembly
Transaction management
Java-classes (framework application)
84Questions? Comments?
Thanks for your attention!
Marjorie M.K. Hlava
NICEM / MediaSleuth
A division of Access Innovations
Call 505-998-0800
Email mhlava_at_nicem.com