Where do we stand? MT development, research, and deployment in Asia

Learn more at: http://www.eamt.org

1
Where do we stand? MT development, research, and
deployment in Asia
  • Key-Sun Choi (KAIST)
  • AAMT
  • http://www.asianlp.org/
  • http://www.afnlp.org/
  • http://korterm.org/

2
Contents
  • China
  • Japan
  • India
  • Malaysia
  • Thailand
  • Taiwan
  • Korea
  • UNL
  • Associations related to MT

3
MT in China: 1980s-1990s
  • To translate scientific documents
  • From Russian and Western languages
  • Supported by the government
  • No private companies in the early stage
  • TRANS-STAR
  • 30,000 words/hour on a 386 PC
  • Basic dictionary: 40,000 entries
  • 10 specialized technical dictionaries,
    including 350,000 entries
  • Subject fields: computers, economics,
    telecommunications, ceramics, thermal power
    industry, printing machine industry,
    automobile/tractor industry, petroleum
    prospecting, geology, chemical industry

4
MT in China: Present, English-to-Chinese
  • GAOLI
  • Developed jointly by Beijing GAOLI Computer Co. Ltd.
    and the Linguistics Institute of CASS
  • Basic lexical dictionary: 60,000 entries, in which
    the usage and grammatical function of every word are
    described in detail
  • Translation accuracy: 80%
  • Readability of translated text: 80-90%
  • 863-IMT/EC
  • By the Institute of Computer Technology, Academia
    Sinica
  • Commercialized, with very good economic
    returns

5
MT in China: Present, Chinese-to-English
  • SINO-TRANS
  • By the company CSS (China National Software
    Technology Service Co.) in 1993
  • Basic dictionary: 40,000 entries
  • Two special-subject technical dictionaries: naval
    ships and boats (9,312 entries), rocket-gun
    (33,773 entries)
  • Linguistic rules: 1,000 rules

6
MT in China: Present, English-to-Chinese
terminology
  • TONGYI system
  • By the Tianjin DATONG computer software company
  • Windows platform
  • Different special-subject dictionaries:
  • a. Commonly used scientific terms: 200,000
    entries
  • b. Terms covering 22 different subjects (e.g.
    machine building, telecommunication, aviation,
    medicine): 3,000,000 entries
  • Good market strategy and service
  • Cooperation with enterprises

7
MT in China: Present, English-to-Chinese
Internet browsing, more user interface
  • YIWANG
  • By the SUNSHINE company of Shenzhen
  • Highest translation speed: 100 sentences per
    second
  • Internet browsing
  • YIBA
  • By the YAXINCHENG software technical company
  • Three translation modes: online, automatic, interface
  • Open to users to revise the dictionary and rules
  • Rich special-subject dictionaries: 30 subjects
    (e.g. computer, telecommunication, medicine)

8
MT in China: Present, English-to-Japanese
  • E-to-J
  • By the JEC company in Beijing
  • Technique of transformation from phrase tree
    (P-tree) to dependency tree (D-tree)
  • Closely integrated with a word processor
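
The slide names a phrase-tree to dependency-tree transformation but does not describe how the E-to-J system performs it. As a rough illustration only, the sketch below uses the common head-rule approach: each phrase picks one child as its head, and the lexical heads of the remaining children become dependents of that head word. The `Node` class, the `HEAD_RULES` table, and the example sentence are all hypothetical and are not taken from the presentation.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    """A phrase-tree node; leaves carry a word, inner nodes carry children."""
    label: str
    children: list = field(default_factory=list)
    word: Optional[str] = None


# Hypothetical head rules: phrase label -> preferred head child labels.
HEAD_RULES = {"S": ["VP"], "VP": ["V"], "NP": ["N"]}


def head_child(node: Node) -> Node:
    """Pick the head child by rule, falling back to the rightmost child."""
    for wanted in HEAD_RULES.get(node.label, []):
        for child in node.children:
            if child.label == wanted:
                return child
    return node.children[-1]


def to_dependencies(node: Node, deps: list) -> str:
    """Return the lexical head of `node`, appending (head, dependent) pairs.

    Words are used directly as identifiers, so repeated words would clash;
    a fuller version would use token positions instead.
    """
    if node.word is not None:
        return node.word
    head = head_child(node)
    head_word = to_dependencies(head, deps)
    for child in node.children:
        if child is not head:
            deps.append((head_word, to_dependencies(child, deps)))
    return head_word


# P-tree for "the dog chased a cat".
tree = Node("S", [
    Node("NP", [Node("Det", word="the"), Node("N", word="dog")]),
    Node("VP", [
        Node("V", word="chased"),
        Node("NP", [Node("Det", word="a"), Node("N", word="cat")]),
    ]),
])

dependencies = []
root = to_dependencies(tree, dependencies)
print(root, dependencies)
```

Here the verb becomes the root of the D-tree and each noun governs its determiner. Real systems need far richer head rules than this three-entry table.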

9
MT in China: Present, Example-based MT
experimental systems
  • Japanese-Chinese EBMT
  • Computer department of Qinghua University,
    1996
  • Corpus of aligned Japanese and Chinese
    sentences
  • The example unit is the sentence
  • Similarity is calculated at the word level
  • DAYA EBMT
  • Harbin Polytechnic University
  • Machine-aided translation system; the human factor is
    very important
  • Corpus is aligned at the sentence level
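
The slide says only that examples are sentences and that similarity is computed on words; it does not name the measure. The sketch below assumes a Dice coefficient over word multisets, which is one simple choice, and retrieves the stored translation of the closest example. The function names and the toy corpus are hypothetical, and the words are assumed to be already segmented.

```python
from collections import Counter


def word_similarity(a: list, b: list) -> float:
    """Dice coefficient over word multisets: 2 * overlap / (len(a) + len(b))."""
    if not a and not b:
        return 1.0
    overlap = sum((Counter(a) & Counter(b)).values())
    return 2.0 * overlap / (len(a) + len(b))


def best_example(source: list, examples: list) -> tuple:
    """Return (similarity, stored translation) for the closest example."""
    return max(
        ((word_similarity(source, src), tgt) for src, tgt in examples),
        key=lambda pair: pair[0],
    )


# Hypothetical sentence-aligned corpus: (segmented source, target sentence).
EXAMPLES = [
    (["I", "read", "a", "book"], "target sentence 1"),
    (["I", "buy", "a", "ticket"], "target sentence 2"),
    (["she", "reads", "the", "newspaper"], "target sentence 3"),
]

score, translation = best_example(["I", "read", "a", "newspaper"], EXAMPLES)
print(score, translation)
```

In a machine-aided setting like the one described for DAYA, the retrieved translation would be shown to a human translator to adapt rather than used as is.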

10
MT in China: Government Funding, 1990s
  • Hi-Tech 863 funding
  • 863-IMT/EC system (English-Chinese)
  • SUNSHINE YIWANG system.
  • 905 Chinese Language Processing Project
  • completed in 1998.

11
MT in China: Users' English Level
  • English level of users of the
    TONGYI MT software
  • Higher level: 16.5%
  • Middle level: 49.5%
  • Lower level: 34.1%
  • So MT software must be oriented to ordinary
    people

12
MT in China: Potential Users
  • Proportion of enterprise users of the TONGYI MT
    software
  • Small enterprises: 31.3%
  • Medium- and large-scale enterprises: 68.7%
  • So MT software must be oriented to
  • large- and medium-scale enterprises,
  • but we should not ignore small enterprises, which
    also have translation demand

13
MT in China: Regional Distribution
  • Regional distribution of MT software users
  • Translation demand is concentrated in the big
    cities and developing regions
  • Beijing 18.7%
  • Liaoning 7.9%, Jiangsu 7.5%
  • Zhejiang 6.5%, Hubei 6.5%, Shanghai 6.1%
  • Sichuan 4.7%, Guangdong 4.7%
  • Henan 3.3%, Heilongjiang 3.3%
  • Hebei 2.8%, Shanxi 2.3%, Jilin 2.3%
  • Yunnan 1.9%, Neimeng 1.5%, Gansu 1.4%
  • Guizhou 0.5%, Anhui 0.5%

14
MT in China - Future and Strategies
(1) Terminology Data Bank
  • MT software combined with a terminology data bank
  • 1990: sub-committee on computer-aided terminology
    of China set up
  • This sub-committee is attached to the State
    Language Commission (SLC) of China
  • A series of national standards for terminology
    data banks
  • Terminology data bank creation
  • Chinese-English: since 1995, by ISTIC (Institute
    of Scientific and Technical Information of China)
  • Remarkable data banks

15
MT in China - Future and Strategies (2) Language
Corpus Processing
  • Corpus construction
  • Scale of 25 million Chinese characters (1999)
  • Automatic segmentation of Chinese written text in
    the corpus (97.68%, closed test)
  • Automatic phrase bracketing and syntactic
    annotation for the Chinese corpus

16
MT in China - Future and Strategies
(3) Speech-to-speech translation
  • Chinese speech into Chinese text
  • The "SIDA-863A" system can recognize
  • 398 basic Chinese syllables;
  • the recognition rate reaches 93%,
  • the response time is less than 0.1 second,
  • and the input speed reaches 80 Chinese
    characters per minute

17
MT in China - Future and Strategies (4) Combined
with OCR and Internet
  • Internet MT
  • SUNSHINE YIWANG, YAXIN YIBA, TONGYI, etc.
  • The advantages of MT software on the Internet are:
  • Higher translation speed, real-time translation
  • Low price
  • Large machine dictionary
  • Possibility of adding new words

18
MT in China: New National Project
  • 973 project, from 2001
  • Supported by the Chinese government
  • For creative research in
  • natural language processing, including machine
    translation
  • Automatic speech-to-speech translation system
    (English-Chinese)
  • Under development at the Institute of Automation
    of Academia Sinica

19
MT in China: Survey Source
  • Prof. Feng, Zhiwei
  • Secretary-general and deputy chairman of the
  • sub-committee on computer-aided terminology of
    China
  • under the State Language Commission (SLC) of
    China
  • Invited professor, KAIST (Sep/2001 - Aug/2002)
  • Dr. Liu, Qun
  • Institute of Computer Technology, Academia
    Sinica, Beijing

20
MT in Japan - 1
  • More than 10 companies
  • For English, Chinese, Korean
  • Waiting for the new breakthrough
  • Internet
  • eLearning
  • Co-work with special-domain related companies
  • Technology transfer
  • Collaboration tools are ready for the market
  • For translators: collaboration workbench over the
    network
  • Well-organized user interface

21
MT in Japan - 2
  • Leading Systems
  • Cross-lingual patent retrieval
  • Prime
  • NTT/ALT
  • Japanese-to-English
  • Japanese-to-Malay
  • Japanese-to-Chinese
  • Speech Translation
  • ATR C-Star

22
UNL in UN University
  • Through Universal Networking Language
  • With Hindi, Japanese, Persian, Indonesian-Malay,
    Thai, Chinese, Mongolian, Korean in the Asian region
  • Other regions: major European languages and
    English
  • Possible Users
  • ITU mail translation

23
MT in Malaysia
  • No commercial product yet,
  • but there is work in academic sectors
  • For application to:
  • Internet
  • eLearning
  • eCommerce
  • Universiti Sains Malaysia
  • Computer Aided Translation Unit
  • Prof. Tang Enya Kong and Prof. Yusoff Zaharin

24
MT in India
  • 18 constitutional languages with 10 different
    scripts
  • Their script grammars and language grammars are
    quite similar
  • They share 40 to 80 percent of their vocabulary
  • Less than 5 percent of people can work in
    English

25
MT in India: 1990-2001, government effort for IT
  • TDIL (Technology Development for Indian
    Languages)
  • 1990-1991
  • Development of corpora, OCR, text-to-speech, and
    machine translation; standards for keyboard and
    internal code for information interchange
  • 2000-2001
  • Seven major initiatives:
  • Knowledge Resources, Knowledge Tools, Translation
    Support Systems, Human Machine Interface Systems,
    Localisation, Standardization, and Language
    Technology Human Resource Development
  • Thirteen Resource Centres for Indian Language
    Technology Solutions (RC-ILTS)
    were supported, covering all 18 Indian languages

26
MT in India: Future, "Digital Unite and Knowledge
for All"
  • Indian Language Technology Vision 2010 has been
    prepared
  • with the vision statement "Digital Unite and
    Knowledge for All"
  • Growing popularity of the Internet
  • Content creation, localisation, on-line gisting
    and summarisation, e-learning, and cross-lingual
    information retrieval are being promoted to
    ensure information access in cyberspace in Indian
    languages
  • Source: Dr. Om Vikas
  • Senior Director and Head, Computer Development
    Division, Ministry of Information Technology

27
MT in Thailand: Government, 1996
  • IT-2000
  • To build a national information infrastructure
    (NII)
  • To invest in people, concentrating on
    transferring IT knowledge to children
  • To build a Government Information Network (GINET)
  • Internet users in Thailand (2000): 2.3M of 66M
  • Internet users by age:
    Age      <10  10-14  15-19  20-29  30-39  40-49  50-59  60-69  70+  Total
    Freq      18    124    261  1,238    572    187     32     27    2  2,461
    Percent  0.7      5   10.6   50.3   23.2    7.6    1.3    1.1  0.1    100
  • Most Thai Internet users know English and
    other Internet languages at a basic or low
    intermediate level

28
MT in Thailand: PARSIT
  • Web-based Thai-English machine translation
  • Since 1998, in cooperation with NEC (Japan)
  • Very popular among Thai users
  • Translates English to Thai with an accuracy of
    60%
  • 20 percent mistranslation might be due to
    differences in expressions, slang, and sentence
    structures
  • http://www.suparsit.com/
  • 300,000 hits/month
  • 25,000 users/month

29
MT in Thailand: Dictionary
  • A web-based dictionary: Lexitron
  • Thai-English and English-Thai dictionary

30
MT in Thailand: Future
  • To develop the PARSIT translation system
  • for Thai-to-English
  • and for other target languages
  • Other language programs, such as OCR research,
    speech research, and language research
  • Thai full-text search engine

31
MT in Thailand: eASEAN
  • eASEAN Plan
  • Multilingual Machine Translation Proposal
  • Thailand, Cambodia, Laos, Vietnam, Japan, Korea,
    English
  • Source:
  • Dr. Virach Sornlertlamvanich virach_at_nectec.or.th
  • Dr. Prayong THITITHANANON (Rajabhat Institute
    Ubon Ratchathani, Thailand)

32
MT in Taiwan
  • Prof. Su, Keh-Ih
  • Machine translation
  • localization

33
MT in Korea: Commercial Products
  • English-to-Korean (Korean-to-English)
  • Enguide: LNI Soft
  • E-Tran2001: NLP Lab (Seoul National University)
  • EZ Reader: Language and Computer
  • ClickWorld: ClickQ
  • Transmate: IBM Korea
  • Japanese-to/from-Korean
  • Unisoft
  • Changmyung
  • Translation Memory
  • Localization companies develop their own for in-house use
  • ITI

34
MT in Korea: Test suite for E-to-K
  • KAIST (http://korterm.kaist.ac.kr/ksurimal)
  • Supported by the Ministry of Science and Technology
  • Exhaustive evaluation
  • A variety of sentences (5,000 from high school
    textbooks, 10,000 from Internet e-business sites)
  • To identify the R&D direction

35
Problematic Part of System A
  • [Figure: chart of problem categories for System A;
    legend: serious, average]
  • Category labels in the chart: Part of Speech (article,
    pronoun, noun, adverb, adjective, verb, preposition,
    relatives, conjunction, mark); Partial Structure
    (infinitive, tense, gerund, participle, idioms);
    Structural Part; Sentence Structure (number, sentence
    type, comparative, subjunctive mood, special
    construction, negation, speech, ellipsis, lists,
    insertion, inversion); multiple part of speech;
    relation and scope of modification; Semantic Part;
    collocation (VN, VPrep., NV, NN, NPrep., Adv.N,
    Adv.Prep., etc.); ambiguous word (N, V); idioms;
    phrase (NP, VP, PP, AP); sentence; natural expression;
    different meaning between singular and plural

36
MT in Korea
  • Caption/EK and KE - ETRI
  • Real-time translation of captions in TV news
  • CNN for English-Korean
  • KBS for Korean-English
  • Chinese-Korean MT
  • Pohang University of Science & Tech.
  • KAIST
  • ETRI (Korean-to-Chinese)
  • Companies: Konan Tech.
  • Japanese-Korean MT (technology transfer)
  • Pohang University of Science & Tech.

37
Online language populations (June 2001)
  • English 45%, Japanese 9.8%, Chinese 8.4%
  • German 6.2%, Korean 4.7%, Spanish 4.5%
  • Italian 3.6%, French 3.4%, Portuguese 2.5%
  • Dutch 2%, Russian 1.9%
  • Source: GlobalReach, Global Internet Statistics (by
    Language)
  • http://www.glreach.com/globstats/index.php3

38
Organizations in Asia
  • AAMT
  • AFNLP (Asia Federation of NLP Associations)
  • http://asianlp.org/
  • http://afnlp.org/
  • Eafterm (East Asia Terminology Forum)
  • http://eafterm.org/
  • Language Resource Sharing and Management
  • Jan/2001: workshop in Tokyo, invited by Japan
  • Prof. Tanaka, Hozumi (Chair, GSK)
  • Nov/2001: workshop at NLPRS-2001, Tokyo
  • ISO TC37/SC4 (Language Resource Management): being
    organized

39
MT Status in Asia
  • Thank you.