Title: Sociopolitical Domain as a Bridge from General Words to Terms of Specific Domains
1. Sociopolitical Domain as a Bridge from General Words to Terms of Specific Domains
Natalia V. Loukachevitch, Boris V. Dobrov
2. General Words and Terms in Automatic Text Processing
- Texts in electronic collections contain both general words and terms
- Two different research domains: lexicology and terminology
- Wüster (founder of the Vienna school of terminology): terminologists start from a concept, whereas lexicologists start from the form of a linguistic expression
3. Wüster: difference between lexicological and terminological approaches
- Terminological research starts from the concept, which has to be precisely delimited
- In terminology, concepts are considered to be independent of their designations
- Terminologists talk about concepts, while linguists talk about word meanings
4. Construction of Wordnets and Terminology Research
- Development of wordnets
- Construction of hierarchical semantic networks
- Search for similar synsets across different languages
- Building the top ontology of language-independent concepts
- Approaches to the study of general words and terms are converging
5. Theory of Terminology: Properties of an Ideal Term
- The term must relate directly to the concept and express the concept clearly
- There should be no synonyms, whether absolute, relative or apparent
- The contents of terms should be precise and not overlap in meaning with other terms
- The meaning of the term should be independent of context
6. Theory of terminology: a serious difference between a general word and a term
- Biunivocal relationship between concepts and terms in each special field of knowledge
- For a terminology, nothing could be better than no synonymy, no homonymy and no polysemy
- A huge gap between general words and terms
- BUT!
7. Term Formation and Words of General Language
- The general sense of a word and its terminological senses can be really different: function as a general word, function in mathematics, function in biology
- Cruse: senses of a lexical form are antagonistic to one another; that is to say, they cannot be brought into play simultaneously without oddness
8. A word and a term are very similar in meaning
- arson: Law. the malicious burning of another's house or property, or in some statutes, the burning of one's own house or property, as to collect insurance (Random House Unabridged Dictionary)
- A general dictionary uses a very strict definition
9. How to distinguish terminological and general senses
- Teacher in court accused of school arson
  A teacher charged with setting fire to a West Yorkshire school has appeared in court. Amina Ditta, 23, of Scholemoor Road, Bradford, has faced the city's magistrates court charged with one count of arson. The charge relates to an incident last Wednesday at Atlas Primary School in Manningham, where Ms Ditta was employed. She spoke only to confirm her personal details and was represented by barrister Mr Narinda Sekhon. She was granted conditional bail to return to court on June 12.
- (http://www.ananova.com/news/story/sm_579296.html)
10. Traditional point of view: definitions
- Traditional terminologists: definitions of terms are strict in comparison to glosses of general words
- Contemporary point of view: the degree of vagueness in term definitions is lower, but in many cases it is inevitable
- Taxation in Russian legislation: new construction vs. repair
11. How many general and terminological senses are so close? - 1
- Building: relatively permanent enclosed construction over a plot of land, having a roof and usually windows and often more than one level, used for any of a wide variety of activities, as living, entertaining, or manufacturing (Unabridged Webster dictionary)
- Domains
  - Construction industry
  - Domain of public utilities
- It is impossible to separate senses
- Practically all denotations are the same
12. How many general and terminological senses are so close? - 2
- Transportation means, job positions, technical devices, food, agricultural plants and animals, other natural objects, art work and others
  - produced by professionals
  - we use them in everyday life
- Social, political and economic processes
  - planned or restricted by professionals
  - our life is influenced by them
13. General words and terminologies
- The intersection is significant
- Of the words in general dictionaries, 40-50 percent belong to the intersection area
- We call this intersection area the sociopolitical domain, the domain of social life: it describes the everyday life of contemporary society
14. The sociopolitical domain and domains in WordNet
- Many researchers have proposed sets of domains for WordNet and EuroWordNet
- The sociopolitical domain is approximately equal to the sum of the proposed domains
- A synset is related to the sociopolitical domain if there is a professional domain (not a science) that has a term with a very similar sense (± vagueness)
- Emotions and feelings do not belong to the sociopolitical domain
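The membership rule on this slide can be sketched in code. This is only an illustration under stated assumptions: the `Synset` structure, the similarity scores, the `threshold` value and the sample domain labels are all hypothetical and do not come from the slides, which give the rule only in words.

```python
from dataclasses import dataclass, field

# Hypothetical labels: the slides do not enumerate sciences or semantic fields.
SCIENCES = {"mathematics", "biology"}
EXCLUDED_FIELDS = {"emotion"}  # emotions and feelings are excluded by the rule


@dataclass
class Synset:
    lemma: str
    semantic_field: str = ""
    # domain name -> assumed similarity (0..1) between the general sense
    # and the closest term of that domain
    similar_terms: dict = field(default_factory=dict)


def in_sociopolitical_domain(synset, threshold=0.8):
    """True if some professional (non-science) domain has a very similar term."""
    if synset.semantic_field in EXCLUDED_FIELDS:
        return False
    return any(
        similarity >= threshold and domain not in SCIENCES
        for domain, similarity in synset.similar_terms.items()
    )


arson = Synset("arson", similar_terms={"law": 0.95})
function = Synset("function", similar_terms={"mathematics": 0.9})
joy = Synset("joy", semantic_field="emotion", similar_terms={"psychology": 0.9})
```

Under these assumed scores, `arson` falls inside the domain, while `function` (similar only to a term of a science) and `joy` (an emotion) fall outside it. The threshold stands in for the tolerated vagueness.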
15. Multiword terms from specific domains
- Many multiword terms from professional domains are understandable to native speakers:
  - Multinational country
  - Single member constituency
  - Amicable agreement
  - Global market
  - Criminal omission
- Special criteria for inclusion of multiword expressions
16. Features of Sociopolitical Domain - 1
- Texts of various genres (official documents, international treaties, legislative documents, newspaper articles) are related to the sociopolitical domain
- Development of a unified linguistic resource for automatic processing of such varied texts
- A broad basis for the development of domain-specific resources
17. Features of Sociopolitical Domain - 2
- Inclusion of multiword terms facilitates disambiguation procedures
- Ambiguity within the domain is much lower than in the whole resource, and distinctions between senses are more definite and more important: it is possible to use different disambiguation procedures within the sociopolitical domain and outside it
- Procedures for identifying lexical cohesion and lexical chains can also differ for synsets inside and outside the sociopolitical area, because concepts in the sociopolitical domain are more thematically definite (privatization vs. creation)
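One way multiword terms can help disambiguation is by matching the longest known expression first, so that an ambiguous word such as "market" is never looked up on its own when it occurs inside "global market". The following is a minimal sketch of that idea, not the authors' procedure; the `LEXICON` contents and concept labels are hypothetical.

```python
# Hypothetical mini-lexicon: entry (tuple of lowercase tokens) -> concept label.
LEXICON = {
    ("global", "market"): "GLOBAL MARKET",
    ("single", "member", "constituency"): "SINGLE MEMBER CONSTITUENCY",
    ("market",): "MARKET (ambiguous)",
    ("member",): "MEMBER (ambiguous)",
}
MAX_LEN = max(len(entry) for entry in LEXICON)


def match_terms(tokens):
    """Greedy longest-match lookup: multiword entries win over single words."""
    tokens = [t.lower() for t in tokens]
    found = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate span first, then shorter ones.
        for n in range(min(MAX_LEN, len(tokens) - i), 0, -1):
            key = tuple(tokens[i:i + n])
            if key in LEXICON:
                found.append(LEXICON[key])
                i += n
                break
        else:
            i += 1  # token is not in the lexicon; skip it
    return found
```

For example, `match_terms("the global market grows".split())` yields only the multiword concept, whereas `match_terms(["market"])` falls back to the ambiguous single-word entry, which would still need a separate disambiguation step.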
18. Experience of Work in Sociopolitical Domain
- Project: University Information System RUSSIA (www.cir.ru), 800 thousand Russian documents (after 1991)
- Russian thesaurus on sociopolitical life (since 1994): a concept-based network of 30 thousand concepts, 75 thousand words and terms
- Automatic text processing (since 1995): text categorization, automatic conceptual indexing, text summarization
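A concept-based network of this kind maps several words and terms (text entries) to one concept and links concepts hierarchically. The sketch below illustrates how conceptual indexing over such a structure might work. The entries, concept names and the choice to credit ancestor concepts are assumptions for illustration and are not taken from the thesaurus itself.

```python
from collections import Counter

# Hypothetical fragment: text entry -> concept, and concept -> parent concept.
ENTRY_TO_CONCEPT = {
    "tax": "TAX",
    "taxes": "TAX",
    "levy": "TAX",
    "income tax": "INCOME TAX",
    "bank": "BANK",
}
PARENT = {"INCOME TAX": "TAX", "TAX": "TAXATION", "BANK": "BANKING"}


def ancestors(concept):
    """Return the chain of parent concepts, nearest first."""
    chain = []
    while concept in PARENT:
        concept = PARENT[concept]
        chain.append(concept)
    return chain


def conceptual_index(entries):
    """Count concepts for recognized text entries, crediting ancestors too."""
    counts = Counter()
    for entry in entries:
        concept = ENTRY_TO_CONCEPT.get(entry.lower())
        if concept is None:
            continue  # entry is not described in the thesaurus fragment
        counts[concept] += 1
        for ancestor in ancestors(concept):
            counts[ancestor] += 1  # assumed roll-up along the hierarchy
    return counts


index = conceptual_index(["Taxes", "levy", "income tax", "bank", "weather"])
```

Here the synonyms "taxes" and "levy" are counted under the same concept, and the more specific "income tax" also contributes to its parent concepts, so a categorization step could work with concept frequencies rather than raw words.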
19. University Information System RUSSIA (www.cir.ru): 800,000 documents / 7.5 GB
20. Socio-Political Domain vs. Lexicon
[Diagram; axis: Levels of Hierarchy; regions: Sciences, Lexicon, Socio-Political Domain]
- Lexicon: 110,000 text entries, 50,000 concepts
- Socio-Political Domain: 75,000 text entries, 30,000 concepts
21. Specific Domains vs. Socio-Political
[Diagram; axis: Levels of Hierarchy; regions: Socio-Political Domain, Elections, Geography, Industrial Production]
22. Interrelations between Socio-Political Domains
[Diagram; axis: Levels of Hierarchy; regions: Socio-Political Domain, Taxation, Law, Accounting, Banking]
23. Sciences vs. Socio-Political Domain
[Two diagrams: Social Sciences and Socio-Political Domain; Natural Sciences and Socio-Political Domain]
24. Specific applications of Sociopolitical thesaurus
- Terms of economics and sociology were included: automatic text categorization of scientific papers (700 categories of the JEL (Journal of Economic Literature) subject headings)
- Terms of non-production spheres were added: automatic text categorization of Russian legislation (3000 categories of a commercial subject headings system)
25. Conclusions - 1
- The border between the general-language lexicon and the terminologies of specific domains is not sharp and abrupt
- It looks more like a broad strip that contains general-language senses practically coinciding with concepts of social subdomains, as well as concepts of specific domains understandable to native speakers
26. Conclusions - 2
- A detailed description of the concepts, terms and words from this transition area, called the sociopolitical domain, can be naturally added to a wordnet semantic network
- It can facilitate the solution of such problems as lexical disambiguation and identification of text structure, enhance the coverage of domain-specific texts by wordnet synsets, and improve the effectiveness of wordnet use in various automatic text processing applications