Development of Ontology and Thesaurus for Agriculture in China - PowerPoint PPT Presentation

1
Development of Ontology and Thesaurus for
Agriculture in China
  • Li Shijing, He Chunpei
  • Scientech Documentation and Information Centre
  • Chinese Academy of Agricultural Sciences
  • Beijing 100081, China

2
1. Chinese Language Agricultural Thesaurus
  • passed the ministry-level appraisal in 1992
  • printed in hard copy in July 1994
  • won the State Third Prize of Scientech Progress,
    awarded by the State Science and Technology
    Commission, in 1995

3
1.1 Main achievements in the development of
Agricultural Thesaurus
  • Selection of terms
  • AT is a tool for document indexing and information
    retrieval, and a component of the State
    Thesaurus Database
  • We expanded the sources of terms to include
    agricultural information retrieval systems,
    relevant thesauri, relevant authority
    dictionaries, textbooks, monographs, periodicals,
    related classification schemes, and terms
    recommended by agricultural experts and users.

4
  • Capacity: the Agricultural Thesaurus (AT) covers
    240 domains in 45 disciplines and contains 63,000
    entries (51,000 descriptors and 12,000
    non-descriptors), making it the largest
    agricultural thesaurus in the world. AT meets the
    demands of the rapid development of modern
    agricultural science worldwide.

5
1.2 Enhanced compatibility
  • (1) AT is compatible with both foreign and
    domestic thesauri.
  • (2) AT is compatible with the Chinese
    Classification Scheme.
  • (3) AT is compatible with the State Descriptor
    Database.

6
1.3 Key technologies and innovation points
  • (1) Thesaurus mode and display design: AT adopts
    an integrated classification-and-subject mode
    with full display of interterm relations.

7
  • (2) Construction of the descriptor reference
    system. This is the key to thesaurus development.
    AT has the following innovations:
  • Only three kinds of relations are established:
    used for, related term and narrower term.
  • Reference relations are used to narrow down large
    word families, which improves AT's compatibility
    and inter-conversion capacity, as well as
    thesaurus quality and search efficiency.
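
The three relation kinds named above can be sketched as a small data structure. This is a minimal illustration, not AT's actual database design: the class and method names are hypothetical, the sample terms are invented, and the idea that inverse links (such as broader term) are derived rather than stored is one possible reading of the slide, not something it states.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    """One descriptor with the three relation kinds named above."""
    term: str
    used_for: set = field(default_factory=set)  # UF: non-descriptors this term replaces
    related: set = field(default_factory=set)   # RT: associative links
    narrower: set = field(default_factory=set)  # NT: more specific descriptors


class Thesaurus:
    def __init__(self):
        self.entries = {}    # descriptor -> Entry
        self.preferred = {}  # non-descriptor -> descriptor

    def entry(self, term):
        return self.entries.setdefault(term, Entry(term))

    def add_used_for(self, descriptor, non_descriptor):
        self.entry(descriptor).used_for.add(non_descriptor)
        self.preferred[non_descriptor] = descriptor

    def add_related(self, a, b):
        # Related-term links are stored in both directions.
        self.entry(a).related.add(b)
        self.entry(b).related.add(a)

    def add_narrower(self, broader, narrower):
        self.entry(broader).narrower.add(narrower)
        self.entry(narrower)  # make sure the narrower term has its own entry

    def resolve(self, term):
        """Map a non-descriptor to its descriptor; descriptors map to themselves."""
        return self.preferred.get(term, term)

    def broader_of(self, term):
        """Broader terms are derived by inverting the stored NT links."""
        return sorted(t for t, e in self.entries.items() if term in e.narrower)


# Illustrative terms only; they are not taken from AT.
at = Thesaurus()
at.add_used_for("maize", "corn")
at.add_narrower("maize", "sweet corn")
at.add_related("maize", "maize flour")

print(at.resolve("corn"))           # maize
print(at.broader_of("sweet corn"))  # ['maize']
```

Storing only one direction of each hierarchical link keeps the records small, at the cost of a scan (or a separate index) when the inverse is needed.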

8
  • (3) Thesaurus compatibility improvement
    technology
  • Innovation point: descriptors and classification
    names were unified, making AT compatible with
    different kinds of retrieval languages.

9
  • (4) Compilation method
  • Innovation point: the traditional method was
    changed to one that takes the integrated domain
    as the compilation unit, increasing efficiency
    by 3 times.

10
  • (5) Computer automation of database creation,
    table compilation, management and printing,
    with the following innovations:

11
Innovations
  • The third model was used to design the thesaurus
    database, solving the difficult problem of
    handling interterm relations.

12
Innovations
  • A special Chinese character database was
    established, enabling comprehensive sorting of
    the Grade I and Grade II character databases.

13
Innovations
  • A maintenance system was established for the
    compilation system and the Chinese character
    database.

14
Innovations
  • Multiple systems are linked. We can generate a
    MINISIS system database and also output the
    machine-readable thesaurus on a laser-jet
    printer.

15
Innovations
  • A thesaurus output system was established that
    is both selective and general-purpose. It can
    generate thesauri in multiple structures.

16
1.4 Users' Comments
  • (1) AT has wide domain coverage, a broad
    collection of terms, good standardization and a
    high usage rate.
  • (2) AT has a rational overall structure, which
    gives an expanded view when querying.
  • (3) AT has comprehensive interterm relations and
    adopts a full display mode, which is convenient
    for lookup, improving the success rate on first
    reading and saving time in indexing.
  • (4) AT's typesetting is compact, with page-header
    indicators, which makes lookup convenient.

17
1.5 Social and Economic Benefits
  • AT has been included in the State Descriptor
    Database, which will bring into full play the
    role of AT in modernizing agricultural
    information work. With the support of the
    computer compilation system, we have produced
    machine-readable and hard-copy editions of AT.
    This improved work efficiency by 300% and saved
    75% of the production cost.

18
Major social and economic benefits
  • (1) AT provides users with a unified, standard
    tool for sharing information resources.

19
Major social and economic benefits
  • (2) Because its terms are standardized, AT
    provides a good software environment and data
    preparation for standardization work, such as
    the Agricultural Terms Database.

20
Major social and economic benefits
  • (3) Because AT is a multilingual thesaurus, it
    can be used over networks for international
    sharing of information resources.

21
Major social and economic benefits
  • (4) AT can be used in any machine translation
    system, thus promoting automatic indexing of
    agricultural documents and machine translation.

22
Major social and economic benefits
  • (5) AT provides a good basis for a common
    vocabulary shared by agricultural institutions,
    research institutions and universities. AT can
    also be used in the office automation of these
    institutions and universities.

23
Major social and economic benefits
  • (6) On the basis of the classification tables of
    AT, we can research and develop classification
    schemes for different agricultural domains, thus
    making an even greater contribution to
    agricultural information retrieval languages and
    theory.

24
Major social and economic benefits
  • (7) We can develop an agricultural and biological
    vocabulary compendium on the basis of the
    biological name classification database and the
    agricultural descriptor database, so as to
    standardize agricultural terms and biological
    names.

25
Major social and economic benefits
  • (8) AT has been used by 30 academies of
    agricultural sciences and more than 40
    agricultural universities for subject indexing
    and information queries when creating
    agricultural information databases. AT has made
    a great contribution to the standardization of
    indexing and querying in the modern networked
    age.

26
2. Application of Agricultural Thesaurus
  • Study on a computer translation system

27
  • SDIC/CAAS has developed a computer translation
    system on the basis of AT.
  • The system has a dictionary database with a total
    of over 300,000 words.
  • The system was released to the public in 2000
    and is now widely used by clients in various
    economic sectors.

28
  • The machine translation system covers the
    following domains:
  • Crop industry, animal science, biology,
    biotechnology, plant pathology, animal pathology,
    soil science, soil conservation, food technology
    and biological classification.
  • It also contains words from chemical engineering,
    computer science, genetics and 72 domains related
    to agriculture.

29
2.1 Content of the study
  • The main study contents include:
  • Collecting common agricultural English vocabulary
    and creating an English-Chinese agricultural
    electronic dictionary

30
  • The main study contents include:
  • Studying and compiling agricultural language
    material databases for translating both from
    English into Chinese and from Chinese into
    English.

31
  • The main study contents include:
  • Selecting machine translation software packages
    through comparative studies.

32
2.2 Significance of the study
  • It is an important means of making use of
    international agricultural information resources.
    It has helped bring Chinese agricultural
    scientech information to the world and supported
    the formation of a Chinese agricultural
    information industry.

33
2.2 Significance of the study
  • It is of great importance for improving the
    efficiency of English-Chinese and Chinese-English
    translation so as to meet the demand for
    translation services.

34
2.2 Significance of the study
  • The study has not only promoted agricultural
    term studies around the world but also opened up
    a new area in computer application.

35
3. Application of Agricultural Thesaurus
  • Translation of the AGROVOC multilingual
    thesaurus

36
  • FAO published the first edition of AGROVOC in
    1982, followed by the second, third and
    fourth editions in 1988, 1995 and 1999,
    respectively.
  • FAO issued the network edition in 2000.
  • The fourth edition was available in four
    languages: English, French, Spanish and Arabic.

37
  • AGROVOC has a total of 16,607 descriptors
    and 10,758 non-descriptors, which are used for
    indexing information materials produced within
    the international cooperative information systems
    AGRIS and CARIS, and for data retrieval from
    those systems.

38
  • We first used the AT database to screen the
    descriptors in AGROVOC, and found that more than
    50% of the descriptors were already in AT.
  • Using the Agricultural Thesaurus therefore
    reduced the translation load by half.
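
The screening step above can be sketched as a lookup of each AGROVOC descriptor in an index built from AT. This is only an illustration under stated assumptions: it assumes AT records carry an English equivalent for each Chinese descriptor, the function names `normalize` and `screen` are hypothetical, and the sample terms are invented rather than taken from either thesaurus.

```python
def normalize(term):
    """Case- and whitespace-insensitive matching key."""
    return " ".join(term.lower().split())


def screen(agrovoc_descriptors, at_english_index):
    """Split descriptors into those already covered by AT and those still to translate."""
    covered, to_translate = {}, []
    for descriptor in agrovoc_descriptors:
        chinese = at_english_index.get(normalize(descriptor))
        if chinese is None:
            to_translate.append(descriptor)
        else:
            covered[descriptor] = chinese
    return covered, to_translate


# Hypothetical sample data, not real AT or AGROVOC records.
at_english_index = {"rice": "水稻", "wheat": "小麦", "soil": "土壤"}
agrovoc_descriptors = ["Rice", "Wheat", "Soil", "Agroforestry"]

covered, to_translate = screen(agrovoc_descriptors, at_english_index)
coverage = len(covered) / len(agrovoc_descriptors)
print(f"{coverage:.0%} already in AT; {len(to_translate)} left to translate")
```

Exact matching after normalization is the simplest possible rule; a real screening pass would probably also need to handle spelling variants and singular/plural forms.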

39
  • There are about 20 agricultural domains in
    AGROVOC. All the descriptors and non-descriptors
    need to be divided according to their specialties
    and distributed to the translators before
    translation.

40
  • It is impossible to divide the terms by specialty
    based only on the English terms and their term
    codes. Using the AGROVOC term relations, we found
    the top term of every word tree and created a new
    database that includes every word tree.

41
  • The translator can translate the terms starting
    from the first level and working through the
    narrower terms down to the last level.
  • At every hierarchical level we also list the
    related terms, but translators need not
    translate them, because each of them appears as
    a term in its own right at some hierarchical
    level.
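
The word-tree approach described above (find the top term of each tree, then work down level by level) can be sketched as follows. This is a minimal illustration, not the authors' actual procedure or database layout: the function names are hypothetical, the `narrower` mapping is invented sample data, and a breadth-first traversal is assumed as one natural way to present terms "from the first level to the last".

```python
from collections import deque


def top_terms(narrower):
    """Terms that have narrower terms but are never narrower than anything."""
    children = {c for kids in narrower.values() for c in kids}
    return sorted(t for t in narrower if t not in children)


def walk_levels(top, narrower):
    """Yield (level, term) from the top term down, one level at a time."""
    seen = {top}
    queue = deque([(1, top)])
    while queue:
        level, term = queue.popleft()
        yield level, term
        for child in narrower.get(term, []):
            if child not in seen:  # guards against cycles and repeats
                seen.add(child)
                queue.append((level + 1, child))


# Hypothetical broader -> narrower links, not real AGROVOC data.
narrower = {
    "crops": ["cereals", "legumes"],
    "cereals": ["maize", "rice"],
    "maize": ["sweet corn"],
    "soils": ["clay soils"],
}

for top in top_terms(narrower):
    for level, term in walk_levels(top, narrower):
        print("  " * (level - 1) + term)
```

Each tree produced this way can be handed to a translator with the matching specialty, which is the division of work the slides describe.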

42
  • Because some terms belong to more than one word
    tree, there are overlapping terms in our
    database.
  • The total number of overlapping terms is about
    2,000; each may be allocated to one or several
    translators.

43
  • When the same term was translated in different
    word trees, we found only very small differences
    among the translations and could select the
    most appropriate one.
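
Detecting overlapping terms and the cases where their translations differ can be sketched as two small grouping steps. This is an assumption-laden illustration: the function names are hypothetical, the trees and Chinese translations are invented sample data, and the final choice of the most appropriate translation is left to a human reviewer, as the slides describe.

```python
from collections import defaultdict


def overlapping_terms(trees):
    """Map each term that occurs in more than one word tree to those trees."""
    where = defaultdict(list)
    for top, members in trees.items():
        for term in members:
            where[term].append(top)
    return {t: sorted(tops) for t, tops in where.items() if len(tops) > 1}


def conflicting_translations(translations):
    """Map each term to its distinct translations when trees disagree."""
    seen = defaultdict(set)
    for _tree, term, chinese in translations:
        seen[term].add(chinese)
    return {t: sorted(v) for t, v in seen.items() if len(v) > 1}


# Hypothetical data, not real AGROVOC trees or translations.
trees = {
    "crops": {"crops", "cereals", "maize"},
    "feeds": {"feeds", "maize", "silage"},
}
translations = [
    ("crops", "maize", "玉米"),
    ("feeds", "maize", "玉蜀黍"),
    ("crops", "cereals", "谷物"),
]

print(overlapping_terms(trees))  # {'maize': ['crops', 'feeds']}
print(conflicting_translations(translations))
```

Only terms with more than one distinct translation need review, so the list a reviewer sees is usually much shorter than the full set of overlapping terms.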

44
4. Outlook on information retrieval in networks
  • SDIC/CAAS is a government institution and a
    national research organization in the field of
    agricultural information research in China.

45
  • SDIC/CAAS is looking for both research partners
    and funding for agricultural ontology
    development. We are willing to cooperate with any
    international or national organization.

46
  • China is willing to cooperate with any
    international or national organization so as to
    achieve better results in networked information
    retrieval.
