Title: Quality Taxonomies


1
Quality Taxonomies
  • Jim Nisbet
  • Senior Vice President of Technology
  • Semio Corporation
  • Knowledge Technologies 2001
  • March 5th, 2001

2
Ontology / Taxonomy
  • Root Ontology
  • Static Discovery
  • Taxonomy Generation
  • Dynamic Discovery

3
What is Quality?
  • Best value for the money
  • By this definition, a costly product is expected
    to deliver high performance, while a low-cost
    product or service is expected to deliver less.
    For example, a loose demo delivery is both
    predictable and acceptable, because its quality
    is low conformance at low cost.

4
What is Quality?
  • Good Quality is Nominal Conformance
  • Taxonomy Quality is defined as Taxonomy
    Conformance to
      • Valid requirements
      • Explicitly documented development standards
      • Implicit characteristics that are expected of
        all professionally developed taxonomies, such
        as the desire for good maintainability

5
Standards
  • ISO 2788-1986
      • International Organization for Standardization.
        Documentation: Guidelines for the Establishment
        and Development of Monolingual Thesauri. 2nd ed.
        n.p.: ISO, 1986. (ISO 2788-1986(E)). (Available
        in the U.S. from American National Standards
        Institute)
  • ISO 5964-1985
      • International Organization for Standardization.
        Documentation: Guidelines for the Establishment
        and Development of Multilingual Thesauri. n.p.:
        ISO, 1985. (ISO 5964-1985(E)). (Available in the
        U.S. from American National Standards Institute)
  • ANSI/NISO Z39.19-1993
      • National Information Standards Organization.
        Guidelines for the Construction, Format, and
        Management of Monolingual Thesauri. Bethesda, MD:
        NISO Press, 1994. 69p. (ANSI/NISO Z39.19-1993)
  • SEMIO Quality Plan v1 2000
  • ISO/IEC 13250 Topic Maps
  • RDF
      • Please refer to RDF at http://www.w3.org/RDF and
        XML at http://www.w3.org/XML

6
Project Plan
  1. Kick-off
  2. Requirements Review
  3. Lexicon Review
  4. Taxonomy Review
  5. Tags Review
  6. Final Review

7
1. Kick-off
  • Objectives
      • Purpose
      • Scope
      • Scale
      • Users
      • Conditions of receipt
  • Roles
      • Supplier
      • Customer
      • Admin
      • KE
      • Experts
      • Users
  • Planning
  • Training and Transfer

8
2. Requirements Review
  • Sources
  • Lexicon
  • Ontology
  • Install

9
Sources
  • Dispersion (Multiplicity, Size, Homogeneity)
  • Refresh
  • Access

10
Typical Patterns
  • Disparity
      • Adjust sources
      • Adjust crawl strategy
      • Isolate communities / taxonomies

11
Lexicon
  • Vocabularies, etc.
  • Substitutions: Acronyms, Synonyms, etc.
  • Preferred Keywords: Brand Names, etc.
  • Banned Keywords
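
A minimal sketch of how the lexicon requirements above (substitutions, banned keywords) might be applied to extracted terms. The function name `normalize_terms` and all sample terms are hypothetical; the slides do not describe Semio's actual mechanism.

```python
def normalize_terms(terms: list[str],
                    substitutions: dict[str, str],
                    banned: set[str]) -> list[str]:
    """Apply substitutions, drop banned keywords, and remove duplicates."""
    result = []
    for term in terms:
        key = term.lower()
        # Map acronyms and synonyms onto their preferred form.
        key = substitutions.get(key, key)
        if key in banned or key in result:
            continue
        result.append(key)
    return result


# Hypothetical sample data, for illustration only.
terms = ["KM", "knowledge management", "click here", "Taxonomy"]
substitutions = {"km": "knowledge management"}
banned = {"click here"}

normalized = normalize_terms(terms, substitutions, banned)
print(normalized)  # ['knowledge management', 'taxonomy']
```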

12
Typical Patterns
  • Lack of requirements
      • Use Librarian Resources

13
Ontology
  • Thesaurus?
  • Is the information domain analysis complete,
    consistent, and accurate?
  • Is the partitioning of the problem complete?

14
Typical Patterns
  • Directory versus Taxonomy
      • Isolate directory branches
  • Thesaurus versus Taxonomy
      • Put an ontology on top of the thesaurus
      • Check as early as possible that the thesaurus
        generics match the extracted lexicon
  • Very high-level design for top-category
    requirements
      • Plan to work bottom-up
      • See also Taxonomy (functions, combinations, etc.)

15
Install
  • Implementation / Integration
  • Are external and internal interfaces properly
    defined?
  • Are all requirements traceable to the system
    level?
  • Has prototyping been conducted for the
    user/customer?
  • Is performance achievable within the constraints
    imposed by other system elements?
  • Are requirements consistent with schedule,
    resources, and budget?

16
Typical Patterns
  • Scale
  • Security
  • Missing Documents

17
3. Lexicon Review
  • Coverage
      • Extracted words / Words
      • (Extracted Index / Index)
  • Sources benchmarking
      • Coverage
      • Extraction quality
      • Topic distribution
  • Structure
      • Most Frequent Phrases
      • Most Productive Generics
      • Substitutions
      • Exceptions
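
The coverage ratio on this slide (extracted words / words) can be sketched as below. The tokenizer, the function name `lexicon_coverage`, and the sample text are assumptions made for illustration; the slides do not specify how words are counted.

```python
import re


def lexicon_coverage(text: str, lexicon: set[str]) -> float:
    """Share of word tokens in `text` that appear in the extracted lexicon."""
    # Assumption: a word is a lowercase run of letters, digits or apostrophes.
    words = re.findall(r"[a-z0-9']+", text.lower())
    if not words:
        return 0.0
    covered = sum(1 for word in words if word in lexicon)
    return covered / len(words)


# Hypothetical sample data.
sample = "Federal funds cover the liability of federal agencies"
lexicon = {"federal", "funds", "liability"}

coverage = lexicon_coverage(sample, lexicon)
print(coverage)  # 0.5 (4 of 8 tokens are in the lexicon)
```

Running the same measurement per source gives one way to compare sources against each other.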

18
Typical Patterns
  • Low level of frequency / quality for the most
    meaningful content
      • Increase size of value corpus
      • Filter and re-import lexicon

19
4. Taxonomy Review
  • Taxonomy Operation
      • Correctness
      • Reliability
      • Usability
      • Integrity
      • Efficiency
  • Taxonomy Revision
      • Maintainability
      • Flexibility
      • Testability
  • Taxonomy Transition
      • Portability
      • Reusability
      • Interoperability

20
Folk Taxonomies Design
  • The Berlin and Kay model: Taxonomy, Nomenclature,
    Terminology
  • Ranks, from most general to most specific
      • Unique Beginner
      • Life Form
      • Generic
      • Specific
      • Varietal

21
Correctness
  • Accuracy
  • Completeness
  • Consistency

22
Accuracy
  • Precision
  • Recall
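
A minimal sketch of precision and recall for a single taxonomy category, using the standard information-retrieval definitions. The document identifiers and the reference judgment set are hypothetical.

```python
def precision_recall(assigned: set[str], relevant: set[str]) -> tuple[float, float]:
    """Precision and recall of the documents assigned to one category.

    assigned: documents the taxonomy placed in the category
    relevant: documents a reviewer judged to belong there
    """
    true_positives = len(assigned & relevant)
    precision = true_positives / len(assigned) if assigned else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical review of one category.
assigned = {"doc1", "doc2", "doc3", "doc4"}
relevant = {"doc2", "doc3", "doc5"}

precision, recall = precision_recall(assigned, relevant)
print(precision)  # 0.5 (2 of 4 assigned documents are relevant)
print(round(recall, 3))  # 0.667 (2 of 3 relevant documents were assigned)
```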

23
Completeness
  • Taxonomy
  • Maps
  • Lexicon
  • Collection

24
Concentration Works Against Quality
  • Tagging Coverage
  • Ontology Coverage
  • Hook Coverage
  • Map Coverage
  • Lexical Coverage
  • Collection Coverage

25
Consistency: Typical Patterns
  • Objectivization
  • Hyperonymy
  • Speciation
  • Necessity

26
Objectivization
  • Avoid functional categories
  • Don't mix functions / objects
  • Exhaust scripts
  • Match idiomatic phrases
  • Employment
      • Firing
      • Hiring
      • Salaries

27
Genericity
  • Parts
      • Air Conditioning
      • Belts and Hoses
      • Body
      • Brake System
      • Chassis
      • Engine
      • Exhaust System
      • Fuel System
      • Glass
      • Ignition
  • Avoid meronymy
  • Don't mix meronymy / hyperonymy
  • Exhaust prototypes

28
Speciation
(WordNet)
  • Person
      • Unwelcome person
          • Unpleasant person
              • Selfish person
                  • Opportunist
                      • Backscratcher
  • Avoid strings of categories
  • Avoid (non-idiom) properties for categories

29
Necessity
  • Avoid non-productive categories
  • Avoid combinations of categories

30
Nomenclature (Design Structure) Quality Index
  • Depth
  • Width
  • Balance
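
One possible reading of depth, width, and balance, sketched for a taxonomy stored as a parent-to-children mapping. The slide does not define these measures. Here depth is the deepest leaf level, width is the largest number of categories on one level, and balance is the ratio of the shallowest to the deepest leaf level, all of which are assumptions. The sample tree is hypothetical.

```python
from collections import deque


def structure_index(tree: dict[str, list[str]], root: str) -> dict[str, float]:
    """Depth, width and balance of a taxonomy tree (assumed to have no cycles)."""
    level_sizes: dict[int, int] = {}
    leaf_depths: list[int] = []
    queue = deque([(root, 0)])
    while queue:
        node, depth = queue.popleft()
        level_sizes[depth] = level_sizes.get(depth, 0) + 1
        children = tree.get(node, [])
        if not children:
            leaf_depths.append(depth)
        for child in children:
            queue.append((child, depth + 1))
    deepest = max(leaf_depths)
    return {
        "depth": deepest,
        "width": max(level_sizes.values()),
        # 1.0 means every leaf sits at the same level.
        "balance": min(leaf_depths) / deepest if deepest else 1.0,
    }


# Hypothetical sample taxonomy.
taxonomy = {
    "Employment": ["Firing", "Hiring", "Salaries"],
    "Hiring": ["Interviews", "Offers"],
}

index = structure_index(taxonomy, "Employment")
print(index)  # {'depth': 2, 'width': 3, 'balance': 0.5}
```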

31
Complexity Index
  • Cyclomatic complexity increases with the number of
    Cross References within the Taxonomy, giving an
    indication of complexity and difficulty of
    testing.
  • Taxonomy Complexity Index combines
      • autonomy
      • closure
      • similarity
      • typicality
      • commonality
      • redundancy
      • stability
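
The effect of cross references can be sketched with McCabe's cyclomatic complexity, V(G) = E - N + 2P, applied to the taxonomy graph. Treating each cross reference as one extra edge is an assumption, and the sample hierarchy is hypothetical. The composite Taxonomy Complexity Index is not defined on the slide, so it is not implemented here.

```python
def cyclomatic_complexity(hierarchy: dict[str, list[str]],
                          cross_refs: list[tuple[str, str]]) -> int:
    """V(G) = E - N + 2P, assuming a single connected component (P = 1)."""
    nodes = set(hierarchy)
    for children in hierarchy.values():
        nodes.update(children)
    for source, target in cross_refs:
        nodes.update((source, target))
    # Edges are parent-child links plus cross references.
    edges = sum(len(children) for children in hierarchy.values()) + len(cross_refs)
    return edges - len(nodes) + 2


# Hypothetical sample taxonomy.
hierarchy = {"Root": ["A", "B"], "A": ["A1", "A2"]}

tree_only = cyclomatic_complexity(hierarchy, [])
with_refs = cyclomatic_complexity(hierarchy, [("A1", "B"), ("A2", "B")])
print(tree_only)  # 1
print(with_refs)  # 3
```

A pure tree always scores 1, and under this reading each cross reference raises the score by one.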

32
Maturity index
  • The IEEE standard 982.1-1988 suggests a taxonomy
    maturity index to provide an indication of the
    stability of the taxonomy.
  • Maturity Index combines
      • number of modules in the current ontology /
        taxonomy
      • number of modules in the current ontology /
        taxonomy that have been changed
      • number of modules added to the current ontology /
        taxonomy
      • number of modules deleted from the previous
        version of the ontology / taxonomy
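
The four counts above combine as in the software maturity index of IEEE 982.1-1988, (M_T - (F_a + F_c + F_d)) / M_T. The sketch below applies that formula. Reading "module" as a taxonomy category or branch is an assumption, and the sample counts are hypothetical.

```python
def maturity_index(total: int, changed: int, added: int, deleted: int) -> float:
    """Maturity index: (M_T - (F_a + F_c + F_d)) / M_T.

    total:   modules in the current ontology / taxonomy (M_T)
    changed: current modules that have been changed (F_c)
    added:   modules added to the current version (F_a)
    deleted: modules deleted from the previous version (F_d)
    """
    if total <= 0:
        raise ValueError("total must be a positive module count")
    return (total - (added + changed + deleted)) / total


# Hypothetical release: 200 categories, 10 changed, 6 added, 4 deleted.
smi = maturity_index(total=200, changed=10, added=6, deleted=4)
print(smi)  # 0.9
```

The index approaches 1.0 as the taxonomy stabilizes between versions.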

33
5. Tags Review
  • Document coverage
  • Concepts coverage

<tagset>
  <document>
    <docurl>http://www.TaxSource.com</docurl>
    <tag>
      <tagname>Liability</tagname>
      <weight>1.289</weight>
    </tag>
    <tag>
      <tagname>Federal Funds</tagname>
      <weight>0.746</weight>
    </tag>
  </document>
</tagset>
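
A minimal sketch of how document coverage and concept coverage might be computed from a tag set like the one on this slide. The XML is the slide's example with its angle brackets restored. The function name `tag_coverage`, the collection size, and the concept list are hypothetical.

```python
import xml.etree.ElementTree as ET

TAGSET_XML = """<tagset>
  <document>
    <docurl>http://www.TaxSource.com</docurl>
    <tag><tagname>Liability</tagname><weight>1.289</weight></tag>
    <tag><tagname>Federal Funds</tagname><weight>0.746</weight></tag>
  </document>
</tagset>"""


def tag_coverage(xml_text: str, total_documents: int,
                 concepts: set[str]) -> tuple[float, float]:
    """Return (document coverage, concept coverage) for a tag set."""
    root = ET.fromstring(xml_text)
    tagged_documents = 0
    used_concepts: set[str] = set()
    for document in root.iter("document"):
        names = {tag.findtext("tagname", "").strip() for tag in document.iter("tag")}
        names.discard("")
        if names:
            tagged_documents += 1
        used_concepts |= names
    document_coverage = tagged_documents / total_documents
    concept_coverage = len(used_concepts & concepts) / len(concepts)
    return document_coverage, concept_coverage


# Hypothetical collection of 4 documents and a 4-concept taxonomy.
concepts = {"Liability", "Federal Funds", "Income Tax", "Audit"}
document_coverage, concept_coverage = tag_coverage(TAGSET_XML, 4, concepts)
print(document_coverage)  # 0.25
print(concept_coverage)   # 0.5
```
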
34
6. Final Review
  • Receipt
  • Maintenance

35
Quality Taxonomies
  • Jim Nisbet
  • niz_at_semio.com
  • Knowledge Technologies 2001