Title: Standards for Controlled Vocabularies

Slide 1: Standards for Controlled Vocabularies
- I. IFLA Guidelines - 2005
- II. U.S. Standard (NISO Z39.19 - 2005)
- III. British Standards (BS 8723 - 2005)

Marcia Lei Zeng, for the IFLA 2006 Classification and Indexing Section program, Seoul, Korea

Slide 2: I. IFLA Guidelines for Multilingual Thesauri
- IFLA Classification and Indexing Section
- April 2005: released for comments, http://www.ifla.org/VII/s29/pubs/Draft-multilingualthesauri.pdf

Slide 3: IFLA Classification and Indexing Section WG on Guidelines for Multilingual Thesauri
- Chair: Gerhard J.A. Riesthuis (Netherlands)
- Members:
  - Lois Mai Chan (USA)
  - Patrice Landry (Switzerland)
  - Pia Leth (Sweden)
  - Ia McIlwaine (United Kingdom)
  - Martin Kunz (Germany)
  - Dorothy McGarry (USA)
  - Max Naudi (France)
  - Marcia Lei Zeng (USA)
 
Slide 4: Three approaches in the development of multilingual thesauri
- building a new thesaurus from the bottom up
  - starting with one language and adding another language or languages
  - starting with more than one language simultaneously
- combining existing thesauri
  - merging two or more existing thesauri into one new (multilingual) information retrieval language to be used in indexing and retrieval
  - linking existing thesauri and subject heading languages to each other, using the existing thesauri and/or subject heading languages both in indexing and retrieval
- translating a thesaurus into one or more other languages

Slide 5: Semantic structure of multilingual thesauri (1)
- symmetrical
  - all different language versions of a multilingual thesaurus have to be identical
  - each descriptor must have one and only one equivalent in every language and be related in the same way to other descriptors in the given language
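
The symmetry requirement can be illustrated with a small check. This is only a sketch: the data layout (shared concept ids mapped to a descriptor and its broader concepts per language), the sample terms, and the function name `is_symmetrical` are assumptions made for illustration and are not taken from the IFLA guidelines. Only broader-term links are modeled.

```python
# Hypothetical layout: language -> {concept id: (descriptor, broader concept ids)}.
# The concept ids and sample terms are invented for illustration.
thesaurus = {
    "en": {"c1": ("Cereals", set()), "c2": ("Rice", {"c1"})},
    "de": {"c1": ("Getreide", set()), "c2": ("Reis", {"c1"})},
}


def is_symmetrical(versions):
    """Check that all language versions share one identical structure."""
    if not versions:
        return True
    reference = next(iter(versions.values()))
    for entries in versions.values():
        # Every language must cover exactly the same concepts.
        if entries.keys() != reference.keys():
            return False
        # One and only one descriptor per concept: none empty, none reused.
        labels = [descriptor for descriptor, _ in entries.values()]
        if not all(labels) or len(labels) != len(set(labels)):
            return False
        # Each concept must be related to the same concepts in every language.
        for concept_id, (_, broader) in entries.items():
            if broader != reference[concept_id][1]:
                return False
    return True
```

With the sample data the check passes; removing concept `c2` from one language version makes it fail, which is the situation a non-symmetrical thesaurus allows.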

Slide 6: Example: a symmetrical thesaurus (last version's interface)
http://www.fao.org/agrovoc/

Slide 7: Semantic structure of multilingual thesauri (2)
- non-identical and non-symmetrical structure
  - the number of descriptors in each language is not necessarily the same
  - the way descriptors are related to each other can be different for the different languages

Slide 8: Example: a non-symmetrical thesaurus
HEREIN thesaurus interlingua
http://www.european-heritage.net/sdx/herein/thesaurus/introduction.xsp

Slide 10: Each exists in its own language and structure. (PDF version)

Slide 12: Semantic problems
- Semantic problems pertain to equivalence relations between terms used as preferred and non-preferred terms in information retrieval languages.
- Equivalence relations exist not only within each separate language involved, but also between the languages (intra-language equivalence and inter-language equivalence).
- Intra-language homonymy and inter-language homonymy are also considered semantic questions.
- Additional problems pertaining to semantics involve the scope, form and choice of thesaurus terms.

Slide 13: Examples of homographs in multiple languages
The fact that "cranes" is a homograph in English does not necessarily mean that the equivalent terms in other languages are also homographs. The Dutch term "kranen" is a homograph, too, but with the meanings "cranes" (lifting equipment) and "taps".
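
One common way to handle such homographs is to attach a qualifier to each sense of a term. The following is a minimal sketch under stated assumptions: the `senses` table, the qualifier wording, the Dutch entry "kraanvogels" for the bird sense, and the helper names are all illustrative and are not taken from the guidelines.

```python
# Hypothetical sense table: (language, term) -> list of meanings (qualifiers).
senses = {
    ("en", "cranes"): ["birds", "lifting equipment"],
    ("nl", "kranen"): ["lifting equipment", "taps"],
    ("nl", "kraanvogels"): ["birds"],
}


def is_homograph(language, term):
    """A term is treated as a homograph when it has more than one sense."""
    return len(senses.get((language, term), [])) > 1


def qualified(language, term):
    """Return display forms, adding a qualifier only when one is needed."""
    meanings = senses.get((language, term), [])
    if len(meanings) <= 1:
        return [term]
    return [f"{term} ({meaning})" for meaning in meanings]
```

In this sketch both "cranes" and "kranen" need qualifiers, but for different pairs of meanings, so the qualified forms have to be worked out separately for each language.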

Slide 14: Structural problems
- Structural problems involve hierarchical and associative relations between the terms.
- An important question in this respect is whether the structure should be the same or different for each language.
- In most, if not all, cases of linking, the structure will most likely not be the same in all the information retrieval languages involved.

Slide 15: Contents covered by the guidelines
- Building multilingual thesauri starting from scratch
  - Structure
  - Morphology and Semantics
- Starting from existing thesauri
  - Merging
  - Linking
- Glossary
- Appendix
  - An example of a non-symmetrical thesaurus
 
Slide 16: II. U.S. Standard for Controlled Vocabularies: NISO Z39.19
- NISO Z39.19-2005: Guidelines for the Construction, Format, and Management of Monolingual Controlled Vocabularies
- Some of the slides are based on
  - Emily Fayen's SLA presentation (June 2004), Margie Hlava's talk at the 2005 Data Harmony User Group meeting, and Marcia Zeng, NKOS Meeting in Denver, 2005

Slide 17: A little bit of history
- ANSI/NISO Z39.19, Guidelines for the Construction, Format, and Management of Monolingual Thesauri, 1993
- The most frequently requested NISO Standard
- In spite of its age, the Standard is still relevant
- 1999: NISO Workshop on Electronic Thesauri, http://www.niso.org/news/events_workshop/thes99rpt.html
- 2002: NISO initiates revision of Z39.19
- 2004: Z39.19-1993 reaffirmed
- 2005: New standard Z39.19-2005 published
 
Slide 18: Scope
- Expand beyond thesaurus
- Make more user-friendly
- Explain important concepts
- Explain principles of vocabulary control
- Include electronic information environment
- Include additional user search methods
  - Browse
  - Navigate
  - Keyword searching
- Expand beyond A&I services
- Include Web applications
 
Slide 19: The Team
- Emily Gallup Fayen, Project Leader, MuseGlobal, Inc.
- Vivian Bliss, Microsoft
- Carol Brent, ProQuest
- John Dickert, DTIC
- Lynn El-Hoshy, Library of Congress
- Marjorie Hlava, Access Innovations
- Stephen Hearn, ALA
- Sabine Kuhn, Chemical Abstracts Service
- Pat Kuhr, H.W. Wilson Company
- Diane McKerlie, DMA Consulting
- Peter Morville, Semantic Studios
- Stuart Nelson, National Library of Medicine
- Allan Savage, National Library of Medicine
- Diane Vizine-Goetz, OCLC
- Marcia Lei Zeng, Special Libraries Association
 
Slide 20: Z39.19 Chapters
- Contents
  - 1 Introduction
  - 2 Scope
  - 3 Referenced Standards
  - 4 Definitions, Abbreviations, and Acronyms
  - 5 Controlled Vocabularies: Purpose, Concepts, Principles, and Structure
  - 6 Term Choice, Scope, and Form
  - 7 Compound Terms
  - 8 Relationships
  - 9 Displaying Controlled Vocabularies
  - 10 Interoperability
  - 11 Construction, Testing, Maintenance, and Management Systems

Slide 21: What's new?
- Earlier coverage
  - Coverage: documents
  - Types of vocabularies: thesauri
  - Post-coordinated
  - Printed formats
  - Monolingual vocabularies
- Added
  - Coverage: content objects
  - Types of vocabularies: lists, synonym rings, taxonomy
  - Pre-coordinated
  - Web format
  - Multilingual vocabularies (general)
  - Interoperability
  - Facet analysis
 
Slide 22: Types of vocabulary control, based on the important principles

Slide 23: Lists
- A list is a simple group of terms
- Example
  - Alabama
  - Alaska
  - Arkansas
  - California
  - Colorado
  - ...
- Frequently used in Web site pick lists and pull-down menus

Slide 24: Source: The J. Paul Getty Museum's implementation of The Museum System software by Gallery Systems

Slide 25: Synonym Rings
- A synonym ring is a list of synonyms or near synonyms that are used interchangeably for retrieval purposes

Slide 26: Synonym Rings: Examples
- Synonym rings are usually found as sets of lists that allow users to access all content containing any of the terms.
- e.g., cholesterol
  - Cholesterol
  - Blood Cholesterol
  - Serum Cholesterol
  - Good Cholesterol
  - Bad Cholesterol
  - LDL
  - ...
- Frequently used in systems where the content is not indexed or the indexing vocabulary is not controlled

Slide 27: An example from International SEMATECH: a search for "Silicon" would look like this
Your search was submitted as SILICON or SI

Slide 28: Synonym Rings are used
- to expand queries for content objects.
- in systems where the underlying content objects are left in their unstructured natural language format.
- in conjunction with search engines, to provide a minimal amount of control over the diversity of the language found in the texts of the underlying documents.
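
Query expansion with a synonym ring can be sketched in a few lines. This is an illustration only: the ring contents are taken from the slide examples, while the `SYNONYM_RINGS` structure, the `expand_query` function, and the uppercase "or" output format (chosen to mimic the SEMATECH message) are assumptions, not the behavior of any particular search engine.

```python
# Hypothetical synonym rings: each ring is a set of interchangeable terms.
SYNONYM_RINGS = [
    {"silicon", "si"},
    {"cholesterol", "blood cholesterol", "serum cholesterol"},
]


def expand_query(term, rings=SYNONYM_RINGS):
    """Return an OR query covering every member of the ring containing `term`."""
    key = term.strip().lower()
    for ring in rings:
        if key in ring:
            # Keep the user's term first, then the other ring members in order.
            others = sorted(ring - {key})
            return " or ".join(t.upper() for t in [key, *others])
    # Terms outside every ring are passed through unchanged.
    return key.upper()
```

Because no term in a ring is preferred over the others, the expansion works the same way whichever member the user types.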

Slide 29: Taxonomies
- A taxonomy is a set of preferred terms, all connected by a hierarchy or polyhierarchy
- Example
  - Chemistry
    - Organic chemistry
      - Polymer chemistry
        - Nylon
- Frequently used in web navigation systems
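
A taxonomy used for web navigation can be sketched as a table of parent links. The sketch assumes that the example terms nest from Chemistry down to Nylon; the `PARENT` table and the `path_to_root` function are hypothetical names. A single parent per term is used here, so a polyhierarchy would need a list of parents instead.

```python
# Hypothetical parent links for the example taxonomy (child -> parent).
PARENT = {
    "Organic chemistry": "Chemistry",
    "Polymer chemistry": "Organic chemistry",
    "Nylon": "Polymer chemistry",
}


def path_to_root(term, parent=PARENT):
    """Return the navigation path from the top term down to `term`."""
    path = [term]
    # Follow parent links upward until a term with no parent is reached.
    while path[-1] in parent:
        path.append(parent[path[-1]])
    return list(reversed(path))
```

A path such as the one returned for "Nylon" is what a site would typically show as a breadcrumb trail.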
 
Slide 30: Thesauri
- A thesaurus is a controlled vocabulary with multiple types of relationships
- Example
  - Rice
    - UF paddy
    - BT Cereals
    - BT Plant products
    - NT Brown rice
    - RT Rice straw
 
Slide 31: Thesauri (cont.)
- Relationship types
  - Equivalence (Use/Used For): indicates the preferred term in a synonym relationship
  - Hierarchy (NT/BT): indicates broader and narrower terms
  - Associative (RT/RT): almost unlimited types of relationships may be used between related terms
- The thesaurus is the most complex format for controlled vocabularies and is widely used.
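
The reciprocal nature of these relationship types can be sketched with a small builder that stores each statement together with its inverse. The relationship codes and the Rice record come from the slides; the `RECIPROCAL` table, the `build_thesaurus` function, and the tuple layout are assumptions made for illustration.

```python
from collections import defaultdict

# Reciprocal of each relationship code: BT <-> NT, RT <-> RT, UF <-> USE.
RECIPROCAL = {"BT": "NT", "NT": "BT", "RT": "RT", "UF": "USE", "USE": "UF"}


def build_thesaurus(statements):
    """Store each (term, relation, target) statement and its reciprocal."""
    entries = defaultdict(lambda: defaultdict(set))
    for term, relation, target in statements:
        entries[term][relation].add(target)
        entries[target][RECIPROCAL[relation]].add(term)
    return entries


# The Rice record from the slide, written as (term, relation, target).
rice = build_thesaurus([
    ("Rice", "UF", "paddy"),
    ("Rice", "BT", "Cereals"),
    ("Rice", "BT", "Plant products"),
    ("Rice", "NT", "Brown rice"),
    ("Rice", "RT", "Rice straw"),
])
```

After building, the non-preferred term "paddy" carries a USE reference back to "Rice", and "Cereals" lists "Rice" as a narrower term, without either statement having been entered by hand.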

Slide 32: Interoperability
- One of the most important issues from the 1999 workshop
- Question: how to
  - compare indexes?
  - perform searches?
  - merge databases that have been developed using different controlled vocabularies?

Slide 33: Interoperability (cont.)
- Factors Affecting Interoperability
- Multilingual Controlled Vocabularies
- Searching
- Indexing
- Merging Databases
- Merging Controlled Vocabularies
- Achieving Interoperability
- Storage and Maintenance of Relationships among Terms in Multiple Controlled Vocabularies

Slide 34: III. The British Standard
- BS 8723: Structured Vocabularies for Information Retrieval - Guide
- Slides based on the presentation by
  - Stella G Dextre Clarke, Alan Gilchrist, Leonard Will
  - at ISKO 2004, London
 
Slide 35: Existing BSI/ISO thesaurus standards
- ISO 2788-1986: Guidelines for the establishment and development of monolingual thesauri
  - British equivalent: BS 5723:1987
- ISO 5964-1985: Guidelines for the establishment and development of multilingual thesauri
  - British equivalent: BS 6723:1985
 
Slide 36: What needs updating?
- Printed versus electronic application
- Guidance on management software
- Interoperability
- Mapping between thesauri and other types of vocabulary
- Formats/protocols for data exchange with downstream applications
- Applicability to end-user applications, not just to those for information professionals

Slide 37: BS 8723: Structured Vocabularies for Information Retrieval - Guide
- Part 1 - Definitions, symbols and abbreviations
- Part 2 - Thesauri
  - Parts 1 and 2 correspond to ISO 2788 and supersede BS 5723.
- Part 3 - Vocabularies other than thesauri
- Part 4 - Interoperability between vocabularies
  - Part 4 will supersede BS 6723, which is equivalent to ISO 5964.
- Part 5 - Interoperation between vocabularies and other components of information storage and retrieval systems

Slide 38
- Part 1: Definitions, symbols and abbreviations
  - provides definitions, symbols, abbreviations and other conventions applying to all the parts.
- Part 2: Thesauri
  - is designed for situations in which human indexers analyse documents and express their subjects using thesaurus terms, before searchers retrieve the documents with the same vocabulary.

Slide 39: Part 3: Vocabularies other than thesauri
- Now in writing
  - Classification schemes
  - Business classification schemes for records management
  - Taxonomies
  - Subject heading schemes
  - Ontologies
- Planned
  - Classification schemes
  - Taxonomies
  - Subject heading lists
  - Ontologies
  - Semantic nets
  - Search thesauri
 
Slide 40: Part 4: Interoperability between vocabularies
- Huge demand for accessing information that has been indexed with another language and/or vocabulary. The buzzword is "mapping".
- Part 4 includes multilingual thesauri as a special case of mapping between vocabularies.
- Part 4 applies to situations in which more than one language or vocabulary is in use, but access to all resources is needed through the one vocabulary chosen by the user.

Slide 41: Part 4 (cont.)
- BS 8723 Part 4 has a wider scope than BS 6723, which was concerned only with multilingual thesauri.
- It covers all of the previous ground and extends the scope to
  - thesauri in different dialects of one language
  - different thesauri in a single language
  - situations where a thesaurus interoperates with one or more different types of structured vocabulary, such as classification schemes
  - situations where not all the interoperating vocabularies have the same status and/or function.
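
Access to all resources through the one vocabulary chosen by the user can be sketched as a mapping lookup followed by a search in each target collection. Everything named here is hypothetical: the vocabulary names, the terms and class mark, the `MAPPINGS` table, and both functions are invented for illustration and do not reflect the data model of BS 8723.

```python
# Hypothetical mappings from the user's vocabulary to two other vocabularies.
# Keys are (source vocabulary, term); values list equivalent targets.
MAPPINGS = {
    ("user_vocab", "Rice"): {
        "thesaurus_b": ["Riz"],
        "classification_c": ["C.12"],
    },
}


def translate_query(term, source="user_vocab", mappings=MAPPINGS):
    """Map one user term to its equivalents in every target vocabulary."""
    return mappings.get((source, term), {})


def search_all(term, indexes, source="user_vocab"):
    """Search each collection using the term mapped into its own vocabulary."""
    hits = []
    for vocabulary, targets in translate_query(term, source).items():
        index = indexes.get(vocabulary, {})
        for target in targets:
            hits.extend(index.get(target, []))
    # De-duplicate documents found through more than one vocabulary.
    return sorted(set(hits))
```

A usage example: with `indexes = {"thesaurus_b": {"Riz": ["doc1"]}, "classification_c": {"C.12": ["doc2"]}}`, calling `search_all("Rice", indexes)` collects documents from both collections even though neither was indexed with the user's term.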

Slide 42: Part 5: Interoperability with applications
- Vocabularies must work with
  - Search engines
  - Content Management Systems
  - Web publishing software, etc.
- Part 5 sets out the protocols and formats needed for the exchange of vocabulary data.

Slide 43: Standards are available at
- IFLA Guidelines for Multilingual Thesauri
  - http://www.ifla.org/VII/s29/pubs/Draft-multilingualthesauri.pdf
- US Standard: NISO Z39.19
  - http://www.niso.org/standards/standard_detail.cfm?std_id=814
  - Tutorial: http://www.slis.kent.edu/mzeng/Z3919/index.html
- British Standard: BS 8723
  - These documents may be ordered from BSI Customer Services
  - tel +44 (0)208-996-9001 or
  - email orders_at_bsi-global.com
 
Slide 44: http://www.slis.kent.edu/mzeng/Z3919/index.html