Terminology Organization in Terminology Management Systems

1
Terminology Organization in Terminology
Management Systems
  • Angela Boll, Marina Kaneva, Claudia Himmler,
    Chiara Huber, Annika Meinhardt, Patrick Johnson

2
COMPILATION OF TERMINOLOGY
  • the most practical way to process lexical data is
    by computer
  • benefits: speed, flexibility and storage capacity
  • growing trend towards the automation of
    terminological data processing
  • from now on, all aspects of terminology
    compilation, storage and retrieval will be
    assisted by or directly carried out by computers

3
PRINCIPLES OF COMPILATION
  • automation fundamentally affects the compilation
    of terminology
  • necessity to evolve completely new principles for
    compilation

4
PRINCIPLES OF COMPILATION
  • systematic terminology compilation is now firmly
    corpus-based
  • text corpora reinforce the principle that
    terminology compilation is an ongoing and
    repeated activity

5
PRINCIPLES OF COMPILATION
  • many technical texts can now be preserved in or
    converted into a suitable format for
    terminological analysis
  • texts which are to be processed by translators
    can be analysed and compared with current
    machine-readable terminology holdings and a
    machine-readable general dictionary in order to
    produce a listing of items not contained in
    either
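
The comparison described on this slide can be sketched as a set difference. This is only a minimal illustration: the tokenizer is naive and handles single words only, and the two word lists (`term_holdings`, `general_dictionary`) are hypothetical sample data, not part of the presentation.

```python
import re

def unknown_items(text, term_holdings, general_dictionary):
    """Return the sorted word forms in `text` found in neither word list."""
    # Naive tokenisation: lower-cased alphabetic strings, internal hyphens allowed.
    tokens = set(re.findall(r"[a-z]+(?:-[a-z]+)*", text.lower()))
    known = {w.lower() for w in term_holdings} | {w.lower() for w in general_dictionary}
    return sorted(tokens - known)

# Hypothetical sample data, for illustration only.
term_holdings = {"printer", "toner"}
general_dictionary = {"the", "uses", "a", "and"}

print(unknown_items("The printer uses a fuser and toner",
                    term_holdings, general_dictionary))  # ['fuser']
```

A production system would also need to recognise multi-word terms and inflected forms before comparing.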

6
PRINCIPLES OF COMPILATION
  • running text can be used totally independently of
    user requirements
  • terminology compilation is becoming increasingly
    text-oriented

7
PRINCIPLES OF COMPILATION
  • the second major innovation affecting principles
    of compilation is the division which is now
    possible between
  • the raw data as they are found in the corpus,
  • the database which contains all the information
    that is collected in suitably structured form,
    and
  • all the various subsets of information which are
    created for specific purposes and uses

8
PRINCIPLES OF COMPILATION

9
PRINCIPLES OF COMPILATION
  • the terminologist now has appropriate tools which
    lift his work from a craft to a scientifically
    supported activity
  • automatic processing and computer-assisted
    terminology compilation are therefore
    qualitatively superior to conventional methods
  • the terminologist is freed from the limitations
    of the past with respect to the size of
    individual records and the total quantity of
    records

10
PRINCIPLES OF COMPILATION
  • however, there is also a danger: private term
    collections of individual translators can become
    widely known
  • instead, there should be only one major database
    of terminological information for each language
    community, to which all users would refer and
    contribute
  • communication across all industrial and
    institutional barriers would be facilitated

11
The nature and type of terminological information
  • Information for the construction of a
    terminological record is varied and subject to
    change
  • This affects the nature of the database system
  • Items of information in the database must be
    considered independent of each other
  • Information can be entered at different times and
    from different sources

12
The nature and type of terminological information
  • Full bibliographical information for each item is
    provided separately
  • Limitation of human manipulation of lexical data
    to the specific interpretative tasks the computer
    cannot perform
  • The concept is explained by indicating related
    linguistic forms: antonyms, broader and narrower
    generic terms (refer to a whole class of terms),
    broader and narrower partitive terms (relate to
    a part of a whole)

13
The nature and type of terminological information
  • Exemplification of the usage of technical terms:
    example sentences (context) and usage notes
  • The meaning of terms is semantically more
    changeable than that of items of the general
    lexicon of a language

14
The nature and type of terminological information
  • In conceptually-based terminological data banks
    definitions are given in one language only
  • Bilingual terminology is directional and
    non-reversible -> translation equivalents cannot
    be converted into entries of the source language
  • Translation equivalents do not refer to an
    authentic concept because they introduce new
    concepts

15
Methodological considerations
  • Terminologists do not need to be concerned with
    how the data is stored in the computer, thanks to
    the modern techniques of computational
    linguistics
  • A computer can store a multi-dimensional semantic
    network
  • The physical size limitations of non-magnetic
    media no longer apply
  • Definitions can be as long as is necessary to
    properly define the term

16
Methodological considerations
  • Terminology compilation can be distributed
    physically and temporally
  • Information can be collected and stored in stages
  • As long as each item of data satisfies the
    controls (e.g. bibliographical reference) -> as
    much data as is available can be entered at any
    time
  • Information can be collected on a distributed
    basis -> work can be distributed among various
    people and locations -> this is particularly
    important for the compilation of multilingual
    terminology
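
The idea of admitting data in stages, provided each item passes the controls, might look like the following sketch. The single control shown (a non-empty bibliographical reference) follows the slide's example; the field names `term` and `source` and the sample items are assumptions made for illustration.

```python
def passes_controls(item):
    """Accept an item only if it carries a non-empty bibliographical reference."""
    source = item.get("source")
    return isinstance(source, str) and bool(source.strip())

def enter_items(database, items):
    """Append the items that pass the controls; return the rejected ones."""
    rejected = []
    for item in items:
        if passes_controls(item):
            database.append(item)
        else:
            rejected.append(item)
    return rejected

# Hypothetical items arriving at different times or from different people.
database = []
rejected = enter_items(database, [
    {"term": "laser printer", "source": "user manual, p. 3"},
    {"term": "toner"},  # no source: fails the control
])
print(len(database), [item["term"] for item in rejected])  # 1 ['toner']
```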

17
Quality of data
  • Computer usage for input control and validation
    resulted in a trend to terminology of a higher
    quality
  • Increased dangers of spreading terminology of low
    quality
  • Increase in quality is very important
  • Far-reaching effect of computerised terminology
    processing on terminology spreading

18
Quality of data
  • Distinction between original source texts and
    translated texts
  • Terms taken from texts in their original language
    are genuine terms and as such have full validity
  • Terms taken from translated texts may either be
    valid terms or translation equivalents

19
Quality of data
  • Trend towards the use of genuine original texts
    for extraction of terms and contexts
  • There is no exact match of concepts for many
    terms across languages
  • Several possible equivalents together with
    context and usage information are needed for a
    correct choice

20
Principles of data collection
  • Set of basic principles for the compilation of
    terminological data
  • Certain consistency of criteria
  • Sources must be stated
  • Distinction between original and translated texts
  • Linguistic behaviour of terms should be
    documented by contexts so that all relevant
    textual variants are covered

21
Terminological Data Banks-A Definition-
  • Automated collection of vocabularies of special
    areas that serve a particular user group
  • Used for large translation services
  • Enhanced but still conventional glossaries
    transferred to a new medium

22
Terminological Data Banks-A Definition-
  • Designed to give response to the same questions a
    good dictionary is supposed to answer
  • But these questions only elicit direct responses
    from the various parts of the conventional
    dictionary

23
Terminological Data Banks-A Definition-
  • Examples
    ENTRY PART   QUESTION                                     ANSWER
    equivalent   what is the French word for laser printer?   imprimante laser
    gender       what is the gender of imprimante?            feminine

24
Terminological Data Banks-A Definition-
  • These responses are not sufficient for a wide
    range of dictionary users
  • Answers may be ambiguous
  • Full potential of a lexical database was not
    exploited by existing term banks

25
Terminological Data Banks-A Definition-
  • Reasons
  • Information was not unified in a suitable manner
    in order to retrieve it
  • Lack of coherent structure
  • Existing system failed to exploit new and
    additional techniques for ordering and
    representing the data

26
Terminological Data Banks-A Definition-
  • There was an increasing demand for a system that
    can answer complex queries
  • Example
    QUERY                                         SEARCH OF FIELD
    what do you call a machine that performs X?   definition or conceptual links
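
A query of this kind searches the definition field rather than the entry term. The sketch below assumes a small in-memory list of records with invented definitions; the function name `search_field` is hypothetical and simple keyword matching stands in for whatever retrieval technique a real term bank would use.

```python
# Invented sample records, for illustration only.
records = [
    {"term": "printer",
     "definition": "machine that produces text or images on paper"},
    {"term": "scanner",
     "definition": "machine that converts a paper document into a digital image"},
]

def search_field(records, field, *keywords):
    """Return entry terms whose `field` contains every keyword (case-insensitive)."""
    hits = []
    for record in records:
        text = record.get(field, "").lower()
        if all(keyword.lower() in text for keyword in keywords):
            hits.append(record["term"])
    return hits

# "What do you call a machine that produces ...?"
print(search_field(records, "definition", "machine", "produces"))  # ['printer']
```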

27
Terminological Data Banks-A Definition-
  • a collection, stored in a computer, of
    special language vocabularies, including
    nomenclatures, standardised terms and phrases,
    together with the information required for their
    identification, which can be used as a mono- or
    multilingual dictionary for direct consultation,
    as a basis for dictionary production, as a
    control instrument for consistency of usage and
    term creation and as an ancillary tool in
    information and documentation.

28
Terminological Data Banks-A Definition-
  • Term banks are supposed to be used by people
    with varying degrees of expertise and different
    purposes

29
Semantic Networks
  • Complex storage of data to represent
    terminological relationships
  • First developed in artificial intelligence
    research for the formal representation of human
    knowledge
  • Have no intrinsic meaning: they are basically
    directed graphs
  • They have superficial similarity

30
Semantic Networks
  • Example

31
Semantic Networks
  • The relationships between concepts are expressed
    through abbreviations
  • Generic relationship: "is a type of" -> isa
  • Partitive relationship: "is a part of" /
    "consists of" -> ispart-of / has-part
  • Nodes: different concepts
  • Arcs: labelled links
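
A network of this kind can be held as a list of labelled arcs between concept nodes. The following is a minimal sketch under that assumption; the concepts are invented examples and the helper names (`targets`, `broader_terms`) are hypothetical.

```python
# Each arc is (source node, label, target node); the concepts are invented.
arcs = [
    ("laser printer", "isa", "printer"),
    ("printer", "isa", "output device"),
    ("printer", "has-part", "paper tray"),
]

def targets(node, label):
    """Nodes reached from `node` by one arc carrying `label`."""
    return [t for s, l, t in arcs if s == node and l == label]

def broader_terms(node):
    """Follow 'isa' arcs upwards to collect all broader generic concepts."""
    found = []
    frontier = [node]
    while frontier:
        current = frontier.pop()
        for parent in targets(current, "isa"):
            if parent not in found:  # also guards against cycles
                found.append(parent)
                frontier.append(parent)
    return found

print(broader_terms("laser printer"))   # ['printer', 'output device']
print(targets("printer", "has-part"))   # ['paper tray']
```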

32
Semantic Networks
  • A wide variety of relationships between concepts
  • To create semantic networks it is necessary to
    define a specific number of relationships and a
    coherent internal structure
  • System must allow only one single method of
    description for each type of relationship
  • Networks have to be subject field-specific

33
Semantic Networks
  • The end-user poses questions to the system in
    the form of network fragments
  • The fragments are matched against the network
    database
  • Variable nodes in the fragments are bound to the
    value they must have in order to make the match
    perfect
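
Matching a fragment with variable nodes against the network can be sketched as simple pattern matching over arcs. The representation below (arcs as triples, variables written with a leading `?`) is an assumption chosen for illustration, and the sketch handles one-arc fragments only.

```python
# Invented sample network; each arc is (source node, label, target node).
network = [
    ("laser printer", "isa", "printer"),
    ("inkjet printer", "isa", "printer"),
    ("printer", "has-part", "paper tray"),
]

def is_variable(item):
    return item.startswith("?")

def match_fragment(fragment, network):
    """Match one (source, label, target) fragment; return the variable bindings
    of every arc that matches it exactly."""
    results = []
    for arc in network:
        bindings = {}
        for pattern, value in zip(fragment, arc):
            if is_variable(pattern):
                # A variable must keep the same value wherever it recurs.
                if bindings.setdefault(pattern, value) != value:
                    break
            elif pattern != value:
                break
        else:
            results.append(bindings)
    return results

# "Which concepts are a type of printer?"
print(match_fragment(("?x", "isa", "printer"), network))
# [{'?x': 'laser printer'}, {'?x': 'inkjet printer'}]
```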

34
Semantic Networks
  • The success of term banks depends on several
    factors
  • The semantics of the network arcs must be
    carefully defined
  • System must be easy to implement and
    user-friendly
  • Danger of over-complicated system that is too
    detailed

35
Compilation of Terminology: Terminological
information
  • What terms are used in a terminological tool?
  • The selection of the most effective terms is
    assisted by reference to terminological
    information which is collected in
    dictionaries/glossaries/term banks
  • Principal factor of effectiveness: type and
    quality of information

36
Terminological information
  • International consensus on basic categories for
    terminological records
  • entry term
  • a reference number
  • a subject field
  • a definition
  • an indication of the usage

37
Terminological information
  • Customary to add indication of the sources of the
    term(s), definition, context or any foreign
    language equivalents
  • It is up to the user to decide on appropriateness
    of terms

38
[Diagram: structure of a term record]
  • Corpora of raw data containing definitions,
    terms, contexts
  • Source information: origin, type, No., page
  • Conceptual Specification
  • Linguistic Specification
  • Pragmatic Specification
  • FL equivalent Specification
  • Field labels shown in the diagram: language,
    term, equiv. term, definition, context,
    grammatical information, links to other concept,
    usage note or example, synonyms, scope notes,
    abbreviation, usage, subject field, variants,
    date type
  • Housekeeping information: pool number, record
    number, terminologist

39
Terminological information: Basic data categories
  • What information is included in a
    multifunctional term record?
  • The information is complex and consists of a
    number of subsets which can be compiled and
    processed quite separately.

40
Terminological information: Basic data categories
  • Into which categories is the term record
    structured?
  • 1. source information: links the term record to
    the raw data files
  • 2. entry term: either linguistic item or a label
    of a concept, or both
  • 3. semantic and conceptual specification:
    definition, a subject attribution, scope notes,
    set of links to other concepts
  • 4. linguistic specification: e.g. variants,
    abbreviations

41
Terminological information: Basic data categories
  • 5. pragmatic specification: examples of the
    context in which the term occurs, usage notes
  • 6. housekeeping or administrative information:
    record number, name of terminologist, dates of
    first processing and of up-dating of the record
  • 7. foreign language equivalent specification: in
    translation-orientated databases
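
The seven categories could be modelled as one record type with a group of fields per category. This is a sketch only: the field names, types and defaults are assumptions, not a schema taken from any particular term bank.

```python
from dataclasses import dataclass, field

@dataclass
class TermRecord:
    # 2. entry term
    entry_term: str
    # 1. source information: links back to the raw data files
    sources: list = field(default_factory=list)
    # 3. semantic and conceptual specification
    definition: str = ""
    subject_field: str = ""
    scope_notes: list = field(default_factory=list)
    concept_links: list = field(default_factory=list)
    # 4. linguistic specification
    variants: list = field(default_factory=list)
    abbreviations: list = field(default_factory=list)
    # 5. pragmatic specification
    contexts: list = field(default_factory=list)
    usage_notes: list = field(default_factory=list)
    # 6. housekeeping or administrative information
    record_number: int = 0
    terminologist: str = ""
    first_processed: str = ""
    updated: list = field(default_factory=list)
    # 7. foreign language equivalents, keyed by language code
    equivalents: dict = field(default_factory=dict)

record = TermRecord(entry_term="laser printer", record_number=1)
record.equivalents["fr"] = "imprimante laser"
print(record.entry_term, record.equivalents)
```

Because every group has its own fields with empty defaults, each subset can be filled in separately and at a different time.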

42
Terminological information: Basic data categories
  • Now let us take a closer look at the information
    categories
  • Entry Term
  • - most common search item
  • - presented in the most relevant form
    (e.g. singular for nouns)
  • - the distinction between concept- and
    term-orientation affects the treatment of
    homographs/synonyms -> it must be decided whether
    the entry term represents the concept or is the
    linguistic form

43
Terminological information: Basic data categories
  • - In concept-orientated term banks primary
    importance is placed on the definition of the
    concept, and all terms matching the definition
    are grouped together -> imposes a difficult
    choice of the order in which terms are listed
  • - Exclusive concept orientation (e.g. NORMATERM)
    is feasible in mono- and bilingual term banks
    which deal with subject fields of similar
    conceptual structures

44
Terminological information: Basic data categories
  • - For multilingual term banks explanatory notes
    are required which indicate in every case the
    scope and degree to which a term matches the
    concept defined in another language
  • - Three types of entry
  • 1. simple, compound or complex terms
  • 2. phrases, regardless of lexicalisation
  • 3. sentences

45
Terminological information: Basic data categories
  • Conceptual Specification
  • Definition
  • - first item that links the entry term to the
    concept
  • - can be in a style specific to the term bank, or
    extracted from an authoritative source
  • - term banks can be classified by the way an
    entry is identified or explained
  • - there are two major schools of thought

46
Terminological information: Basic data categories
  • - The first refers to a definition which is
    strictly limited in its validity to the range of
    texts which represent the source material for the
    term collection
  • - In the second there is no restricted corpus ->
    no single valid definition in the first place

47
Terminology information: Basic data categories
  • Relationships
  • - most controversial and least defined category
    of information; it may indicate no more than the
    most obvious broader term
  • - the information could be a reference to another
    record
  • Subject Field
  • - terminology is divided by subject field
    before being ordered in any other way
  • - because of the large quantities of terms it
    is advisable to introduce a classification of
    terms by subject areas

48
Terminological information: Basic data categories
  • Scope Note
  • - can be considered a further specification
    of subject or register
  • - is intended to indicate a special field of
    application
  • Linguistic specification
  • Grammatical Information
  • - can consist of spelling, pronunciation,
    gender for nouns, parts of speech (e.g. n, v,
    adj.), principal parts of verbs (e.g. infinitive,
    past)

49
Terminology information: Basic data categories
  • Language
  • - is important in term banks where it is
    combined with an indication of the country where
    the term is used
  • Parallel information categories to the entry
    term
  • - usually have no separate record but are listed
    in an index with a reference to the record of the
    entry term
  • - comprise information such as spelling, expanded
    forms, reduced forms or synonyms
  • - several overlapping categories exist:
    variants, full synonyms, abbreviated forms
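
An index of parallel forms that points back to the record of the entry term might be built as follows. The record layout and the sample entry are assumptions for illustration; `build_index` is a hypothetical helper.

```python
# Invented sample data: record number -> record.
records = {
    1: {
        "entry_term": "cathode ray tube",
        "abbreviations": ["CRT"],
        "variants": ["cathode-ray tube"],
    },
}

def build_index(records):
    """Map every entry term and parallel form to the number of its record."""
    index = {}
    for number, record in records.items():
        forms = [record["entry_term"]]
        for category in ("abbreviations", "variants", "synonyms"):
            forms.extend(record.get(category, []))
        for form in forms:
            index[form.lower()] = number
    return index

index = build_index(records)
# Looking up the abbreviation leads to the record of the entry term.
print(records[index["crt"]]["entry_term"])  # cathode ray tube
```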

50
Terminology information: Basic data categories
  • Pragmatic specification
  • Context
  • Gives examples of the way that the entry term is
    used in a language
  • Is considered a successful way of showing any
    unusual features of word form, inflection or
    collocation
  • The context should complete the definition and
    the usage note

51
Terminology information: Basic data categories
  • Usage Note
  • - gives information about the way the entry
    term is used in context
  • - covers what cannot be provided in the form of
    examples of a real context, e.g. collocational
    restrictions of formal variants
  • - usage labels: colloquial, slang, mandatory,
    firm-specific, standardised, translation
  • General language dictionaries use further usage
    labels such as archaic, informal, taboo,
    derogatory, offensive, vulgar, but these are
    only rarely found in terminology

52
Terminology information: Basic data categories
  • Quality Label
  • - term banks show in many different ways
    whether a term is standardised or not, and
    whether a term in a foreign language or borrowed
    from a foreign language can be considered
    established usage
  • Synonyms
  • terms that differ from the entry term (by usage,
    context and subject field)
  • are usually full entry terms themselves and
    appear as a cross-reference in the term bank
    structure

53
Terminology information: Basic data categories
  • Source Reference Specification
  • Sources
  • printed dictionaries rarely give an indication
    of the source
  • term banks: the source of every relevant item of
    information is recorded
  • needed for entry term, definition, context,
    translation equivalents, possibly also for
    synonyms
  • Can determine the selection criteria according to
    which information is collected

54
Terminology information: Basic data categories
  • Can consist of
  • source origin
  • reliable sources in the UK:
  • BSI - British Standards Institution
  • CEC - Commission of the European Communities
  • HMSO - Her Majesty's Stationery Office
  • ISO - International Organisation for
    Standardisation
  • IEC - International Electrotechnical Commission
  • - The origin of a term may be its best
    indication of quality and usage.
  • - detailed reference of the source (e.g. year
    of publication -> acceptability of a term)

55
Terminology information: Basic data categories
  • Source type
  • - Indication of the type of document of the
    source
  • article in specialist literature
  • contracts and legal usage
  • government circulars to the general public
  • journalistic publications
  • manuals
  • patents
  • publicity material
  • research reports
  • standards
  • dictionary words -> should be avoided

56
Terminology information: Basic data categories
  • Sources of definition and contexts
  • should show different areas of usage
  • Source for the foreign language equivalent
    should match the source of the entry term to make
    it suitable
  • Source reference code or number
  • large databases: a separate source reference file
    gives the full bibliographical details for
    written sources
  • databases of raw data: the reference can point
    directly into the specific file
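
Keeping bibliographical details in a separate source reference file means a term record stores only a short code. The sketch below assumes an in-memory mapping for that file; the code `S001`, its details and the helper `resolve_source` are all invented for illustration.

```python
# Hypothetical source reference file: code -> bibliographical details.
source_file = {
    "S001": {"origin": "ISO", "type": "standard", "year": 1990},
}

# A term record stores only (source code, page) for each item of information.
record = {
    "entry_term": "laser printer",
    "term_source": ("S001", 12),
}

def resolve_source(reference, source_file):
    """Expand a (code, page) reference into full bibliographical details."""
    code, page = reference
    details = dict(source_file[code])  # copy, so the file itself is not modified
    details["page"] = page
    return details

print(resolve_source(record["term_source"], source_file))
```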

57
Terminology information: Basic data categories
  • Housekeeping information (or administrative
    information)
  • Record number
  • Consists of a number for the entry, possibly with
    some subcategories
  • Can be used to identify possible subsets of the
    database by topic (e.g. the terminology of a
    particular product, manual, congress or set of
    documents, which can cut across subject field
    divisions)
  • Such subsets are often the basis on which parts
    of the database are isolated for separate use

58
Terminology information: Basic data categories
  • Author of Record
  • for checking the work
  • author: either terminologist or committee
  • Date of Record
  • date of the production of the first record
    and of any up-dates

59
Terminological information: Methods of compilation
  • Methods of compilation
  • No fully acknowledged and general methodology
  • Terminology compilation must become
    user-oriented!
  • Serious terminology compilation is firmly
    corpus-based -> relies on the analysis of textual
    evidence
  • Compilation can be a discontinuous process as
    long as items of information which are connected
    and have an effect on each other are compiled at
    the same time
  • Compilation must be seen as an ongoing revision
    and up-dating process

60
Terminological information: Methods of compilation
  • Term bank software should provide a facility
    for prompting terminologists when building up
    terminological records.
  • some form of expert system is required to control
    the work of terminologists
  • If machines themselves are to be end-users of
    terminological databases, there must be greater
    precision and explicitness of identification in
    the compilation of data.
  • Methods to be applied in the regular compilation
    of terminology depend on
  • 1. nature of data available
  • 2. purpose of compilation

61
Terminological information: Methods of compilation
  • Methods change with the degree of automatic
    support available -> rapid advances in the design
    of automatic tools
  • no specific model is possible
  • - in most cases:
  • 1. a corpus of text is put together in
    machine-readable form (criteria:
    representativeness, completeness, relevance)
  • 2. corpus is fully indexed
  • 3. terms are isolated and extracted
  • 4. terms are sorted automatically and variously
    grouped
  • 5. terms are matched with definitions

62
Terminological information: Methods of compilation
  • 6. the provisional file is enlarged and
    corrected
  • 7. terms are placed in relationship to other
    terms
  • 8. terms are attributed to particular subject
    fields if required
  • 9. a term record is created which contains
    only the term with its linguistic variants
  • 10. the term record is completed with the
    addition of the house-keeping information
  • - The amount and diversity of data collected
    in the term record varies according to the range
    of purposes of the database.
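
Steps 2 to 4 of the list (indexing the corpus, isolating candidate terms, sorting them) can be sketched in a few lines. The extraction rule used here, a frequency threshold combined with a general word list, is an assumption chosen for brevity and is not the method prescribed by the slides; the corpus and word list are invented.

```python
import re
from collections import defaultdict

def index_corpus(texts):
    """Step 2: map each word form to the (text number, position) pairs where it occurs."""
    index = defaultdict(list)
    for text_no, text in enumerate(texts):
        for position, word in enumerate(re.findall(r"[a-z]+", text.lower())):
            index[word].append((text_no, position))
    return index

def extract_candidates(index, general_words, min_count=2):
    """Steps 3-4: keep frequent words absent from a general word list, sorted."""
    return sorted(
        word for word, hits in index.items()
        if len(hits) >= min_count and word not in general_words
    )

# Invented sample corpus and general word list.
corpus = ["The fuser heats the toner.", "Replace the fuser when the toner smears."]
general_words = {"the", "when", "replace", "heats", "smears"}

index = index_corpus(corpus)
print(extract_candidates(index, general_words))  # ['fuser', 'toner']
print(index["fuser"])                            # [(0, 1), (1, 2)]
```

The positions kept in the index make it possible to return to the source text later, for example to pull out contexts for a candidate term.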

63
IATE (iate.europa.eu)
  • IATE (Inter-Active Terminology for Europe)
  • it is the EU inter-institutional terminology
    database system
  • IATE has been used in the EU institutions and
    agencies since summer 2004 for the collection,
    distribution and shared management of EU-specific
    terminology

64
About IATE
  • EU institutions and agencies involved:
  • European Commission
  • Parliament
  • Council
  • Court of Justice
  • Court of Auditors
  • Economic and Social Committee
  • Committee of the Regions
  • European Central Bank
  • European Investment Bank
  • Translation Centre for the Bodies of the EU

65
About IATE
  • The project was launched in 1999 with the
    objective of
  • - providing a web-based infrastructure for
    all EU terminology resources
  • - enhancing the availability of the information
  • - standardising the information

66
About IATE
  • existing terminology databases of the European
    Commission, Council, Parliament and Translation
    Centre have been imported into IATE
  • -> single new, highly interactive and
    accessible interinstitutional database
  • -> approximately 1.4 million multilingual
    entries

67
SDL MultiTerm 2007
  • SDL MultiTerm captures, creates, manages and
    distributes terminology
  • Designed for companies (e.g. marketing) which
    spend significant time and resources creating
    words that position their brand, company and
    products in the market

68
SDL MultiTerm 2007
  • Concept-based terminology management
  • Web- and server-based access
  • Different search types (e.g. Fuzzy Search)
  • Customisable data entry definitions and layouts
  • Supports all languages worldwide (Unicode)
  • Cross-references to easily link entries to each
    other
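
To give a feel for what a fuzzy search does, the sketch below uses `difflib.get_close_matches` from the Python standard library. It illustrates the general idea of approximate matching only and makes no claim about how SDL MultiTerm implements its search; the term list and the `fuzzy_search` wrapper are invented.

```python
from difflib import get_close_matches

# Invented sample term list.
terms = ["laser printer", "line printer", "plotter", "scanner"]

def fuzzy_search(query, terms, cutoff=0.6):
    """Return up to three terms similar to `query`, best match first."""
    return get_close_matches(query, terms, n=3, cutoff=cutoff)

# A misspelt query still finds the intended entry.
print(fuzzy_search("lazer printr", terms))
```

Raising `cutoff` towards 1.0 makes the search stricter; lowering it returns more distant matches.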

69
SDL MultiTerm 2007
  • Benefits
  • - delivers accurate and approved terminology
    with real-time verification during the
    translation process
  • - quickly builds corporate glossaries
  • - improves publication quality

70
Bibliography
  • Sager, J. C. 1990. A Practical Course in
    Terminology Processing. Amsterdam: John Benjamins.
  • Quah, C. K. 2006. Translation and Technology.
    Basingstoke (UK): Palgrave Macmillan.
  • IATE, iate.europa.eu
  • SDL MultiTerm 2007, www.sdl.com