Provided by: tdilM
Title: Contents


1
  • Contents
  • Towards Global Village - Vasudhaiva Kutumbakam
  • Catching up the knowledge wave
  • Threads Uniting Linguistic Diversity
  • A B C Technology Development Phases
  • Language Technology Mission
  • Promoting Competitive and Collaborative
    Technology Development
  • Achievements
  • Setting up Technology Development Centers
  • Status of technologies developed at Resource
    Centres
  • Language Technology Handshakes
  • Whither in seven initiatives
  • Large Mass pacing up slow - challenges
  • World Scenario of Multilingual Computing
  • Beacon to Steps Ahead
  • Summing up

2
  • Technology Shrinks Distances
  • 1st Revolution with invention of writing system
    (5000 years ago)
  • 2nd Revolution with invention of written book
    (1300 BC, China)
  • 3rd Revolution with Gutenberg's invention of the
    printing press (1450 AD) and
  • 4th Revolution is the new information revolution
    since the 1950s.
  • Chip performance will double every 18 months
    (Moore's law)
  • Storage doubles every 9 months.
  • Communication bandwidth will triple every 12
    months (Gilder's law)
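
The three growth rules on this slide compound over time. The following is a minimal sketch of that arithmetic, assuming each rule holds exactly over a ten-year horizon; the function name `growth_factor` and the 120-month horizon are illustrative choices, not part of the presentation.

```python
def growth_factor(multiplier: float, period_months: float, horizon_months: float) -> float:
    """Total growth when a quantity is multiplied by `multiplier`
    once every `period_months`, over `horizon_months`."""
    return multiplier ** (horizon_months / period_months)


HORIZON = 120  # ten years in months (assumed horizon for illustration)

chip = growth_factor(2, 18, HORIZON)       # doubling every 18 months
storage = growth_factor(2, 9, HORIZON)     # doubling every 9 months
bandwidth = growth_factor(3, 12, HORIZON)  # tripling every 12 months

print(f"chip performance: about {chip:,.0f}x")
print(f"storage:          about {storage:,.0f}x")
print(f"bandwidth:        about {bandwidth:,.0f}x")
```

The point of the sketch is only that small differences in the doubling period lead to very different totals after a decade.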

3
  • Technology Shrinks Distances
  • Prof Raj Reddy of Carnegie Mellon University
    predicts that 10 years from now, at the same
    cost, we shall get 100 times the processing
    power, 1000 times the storage, and 10,000 times
    the bandwidth. Computing and communication (ICT)
    will be affordable, easy to use and pervasive.
  • Ray Kurzweil, an informatics guru, predicts:
  • Within 10 years, a 1000-dollar computer will be
    able to perform more than one trillion
    calculations a second.
  • Within the first quarter of the next century, a
    similarly priced computer will match the human
    brain, and a few years later, a thousand dollars
    will buy rich kids the computational capacity of
    one thousand human brains.

4
  • Future Direction: Information Interspace
  • The Interspace represents the third wave in the
    ongoing evolution of the Global Information
    Infrastructure, driven by rapid advances in
    computing and Information Technology.
  • The technological progress of knowledge exchange
    has occurred in three waves, each building on the
    previous one.
  • The wave pattern roughly describes four
    distinct phases of functionality:
  • fundamental research (trough),
  • development of prototype systems (ascent),
  • emergence of commercial systems (crest), and
  • mass propagation (descent).

5
Paradigm Shift in Computer Processing: Data ->
Information -> Knowledge
  • Within the next decade, computing technology
    will transform the Internet into Interspace.
  • Concept Navigation will become a standard function
    in the Interspace just as document browsing is in
    the Internet.

Evolution of Global Information Infrastructure
(e-mail -> document browsing -> concept
navigation)
6
  • Networking with Humane Sensitivity?
  • na hi jnaanen sadrasham pavitramih vidyate
  • Nothing is as pious as knowledge. - Shrimad
    Bhagavad Gita, 4.38
  • A people become poor and enslaved when they are
    robbed of the tongue left them by their
    ancestors: they are lost forever. - Ignazio
    Buttitta, Sicilian poet
  • Distances will shrink. Information will flow.
    Will people have an innate sense of
    communication across linguistic and cultural
    diversities? The challenge ahead is Vasudhaiva
    Kutumbakam, that is, "the whole earth is a family":
    whether emerging technologies will imbibe
    family-like bonds of love, kindness,
    sensitivity and cooperation, establishing peace
    and harmony.

7
  • Knowledge defies economic principle of scarcity.
  • Knowledge is not scarce in traditional sense.
    The more you use it and pass it on, the more it
    proliferates. It is "infinitely expansible" or
    "non-rival in consumption". It can be replicated
    cheaply and consumed over and over again.
    Knowledge is more difficult to measure than
    traditional inputs such as steel or labour.
  • However future prosperity of rich economies will
    depend both on their ability to innovate and on
    their ability to adjust to change.
  • The economist Brian Arthur argues that increasing
    returns will magnify the market leader's
    advantage.

8
  • Is there a gain in knowledge or a loss of knowledge?
  • From an estimated 10,000 world languages in 1900,
    about 6,700 languages survived in 2000. Two
    percent of the world's languages are becoming
    extinct every year.
  • With the loss of a language, we lose art and
    ideas, scientific information and technological
    innovation capacity.
  • UNESCO study (1999) of 65 languages: 49 of the
    languages (75%) had experienced a real decline in
    the number of works translated from these
    languages into other languages. Of the world's 140
    most published authors, 90 were English writers in
    1994, compared to 64 in 1980.
  • The proportion for English rose from 43% in 1980 to
    over 57% in 1994.
  • The share held by the top four translated languages
    (English, Spanish, French and German) rose from
    65 percent in 1980 to 81 percent in 1994.
  • There is a collapse in authorship, translation and
    quality in other languages.
  • World-level literacy is improving. More people
    can read than ever before, but fewer people
    create stories.
  • There is a shift from being creators to being
    consumers at a time when technology could have
    amplified our creative capacities.
  • Cultural Erosion!

9
Language-wise world population Estimate (H Tanaka
1999)
10
  • Linguistic Scenario in India
  • The eighteen constitutional Indian languages are
    listed below with their scripts in
    parentheses: Hindi (Devanagari), Konkani
    (Devanagari), Marathi (Devanagari), Nepali
    (Devanagari), Sanskrit (Devanagari), Sindhi
    (Devanagari/Urdu), Kashmiri (Devanagari/Urdu),
    Assamese (Assamese), Manipuri (Manipuri), Bangla
    (Bengali), Oriya (Oriya), Gujarati (Gujarati),
    Punjabi (Gurumukhi), Telugu (Telugu), Kannada
    (Kannada), Tamil (Tamil), Malayalam (Malayalam)
    and Urdu (Urdu). There are 10 Indic scripts in
    vogue.
  • Indian languages owe their origin to Sanskrit.
    They have in common a rich cultural heritage and
    treasure of knowledge. Indic scripts have
    originated from the Brahmi script. Less than 5% of
    people can read or write English. Over 95% of the
    population is normally deprived of the benefits
    of English-based Information Technology.
  • Characteristics of Indian Languages
  • What You Speak Is What You Write (WYSIWYW)
  • Script grammar - transformation rules
  • Relatively word order free
  • Common phonetic based alphabet
  • Common concept terms (from Sanskrit)

11
  • Technology Development for Indian Languages
    (TDIL)
  • A B C Technology Development Phases
  • India, aware of the technological changes and the
    local constraints, has taken up development of
    Language Technology in three phases:
  • 1976-1990: A-Technology Phase
  • Focus was on Adaptation Technologies: abstraction
    of requisite technological designs and competence
    building in R&D institutions.
  • 1991-2000: B-Technology Phase
  • Focus was on developing Basic Technologies:
    generic information processing tools, interface
    technologies and cross-compatibility conversion
    utilities. The TDIL (Technology Development for
    Indian Languages) programme was initiated.
  • 2001-2010: C-Technology Phase
  • Focus is on developing Creative Technologies in
    the context of convergence of computing,
    communication and content technologies.
    Collaborative technology development is being
    encouraged.

Government spending during 1991-2002 was about
US$ 4 million.
12
  • TDIL Vision 2010
  • Vision statement
  • Digital unite and knowledge for all.
  • Mission statement
  • Communicating without language barrier and moving
    up the knowledge chain.
  • Major Initiatives (TDIL Vision)
  • Knowledge Resources
  • (Parallel Corpora, Multilingual Dictionaries,
    lexical resources)
  • Knowledge Tools
  • (Portals, Language Processing Tools, Translation
    Memory Tools)
  • Translation Support Systems
  • (Machine Translation, Cross Language Information
    Retrieval)
  • Human Machine Interface System
  • (OCR, Voice Recognition Systems, Text-to-Speech
    System)

13
(No Transcript)
14
  • Resource Centres for Indian Language Technology
    Solutions
  • IIT, Kanpur: Hindi, Nepali
  • IIT, Mumbai: Marathi, Konkani
  • IIT, Guwahati: Assamese, Nepali
  • IISc, Bangalore: Kannada, Sanskrit (cognitive
    models)
  • ISI, Calcutta: Bengali
  • UOH, Hyderabad: Telugu
  • Anna Univ., Chennai: Tamil
  • MS Univ., Baroda: Gujarati
  • Utkal Univ., Bhubaneshwar: Oriya
  • TIET, Patiala: Punjabi
  • ERDC, Trivandrum: Malayalam
  • C-DAC, Pune: Urdu, Sindhi, Kashmiri
  • JNU, New Delhi: Foreign languages (Japanese,
    Chinese) and Sanskrit (language learning
    systems)

15
  • Core objectives of Resource Centres RC-ILTS
  • To build a repository of all knowledge tools and
    products for computer based processing in Indian
    Languages.
  • To develop niche technologies for providing IT
    localization solutions.
  • Collaborative developments in association with
    industry.
  • Technology dissemination through
  • Specialized training programmes
  • IT localization clinics
  • Interactions with state governments for e-Gov,
    e-HealthCare, ...

16
  • Content Creation and IT Localisation Network
    (CoIL-Net)
  • Objectives
  • To bridge the existing digital divide in the
    economically backward Hindi-speaking states of
    MP, Chhattisgarh, UP, Uttaranchal, Bihar,
    Jharkhand, and Rajasthan, which have lower than
    national average levels of technical and IT
    education facilities, as identified by the
    National Task Force on IT & SD.

17
  • Implementing Agencies: CoIL-Net
  • C-DAC, Pune: Core Technology Development for
    Hindi
  • IGNCA, New Delhi: Digital Library for Regional
    Heritage
  • IETE, New Delhi: IT based material in Hindi
  • IIT, Kanpur: Hindi to English Machine
    Translation
  • IIITM, Gwalior: IT localization solutions for
    MP
  • Banasthali Vidyapith: IT localization
    solutions for Rajasthan
  • BHU, Varanasi: IT localization solutions for
    UP
  • BIT, Ranchi: IT localization solutions for
    Jharkhand
  • Roorkee University: IT localization solutions
    for Uttaranchal

18
Anusrijan (Transcreation)
  • Anusrijan (Transcreation): generating modern
    knowledge in local languages
  • Over 25 Mn S&T research papers are added per
    year, almost none in Indian languages.
  • Anusrijan is to bring out books/monographs on
    emerging IT in Hindi and other Indian
    languages (a model for other S&T areas).
  • To organize Jnananudyog training programmes for
    IT entrepreneurship generation, especially in
    ITES areas.

19
  • Information Dissemination through
  • Quarterly TDIL Newsletter: VishwaBharat@tdil.gov.in
    (Issues 1 to 7, Jan '01 - Oct '02).
    Information on language technology.
  • TDIL Web Site: http://tdil.mit.gov.in
  • The site contains information on various TDIL
    activities and achievements, and provides access to
    a variety of content and downloadables in Hindi
    and other Indian languages.
  • Free Downloads: Indian language keyboard drivers,
    fonts, basic word processors, spell
    checkers, corpora, dictionaries, IT
    glossary, classic works.
  • Hindi e-mail
  • FAQ on Indian language technologies
  • Samadhan Seva to answer users' queries
  • Jnana Nidhi Seva to access content,
    dictionaries, classic works

20
  • Innovation Networking Management
  • COILTech
  • Language Technology Business Meet
  • ZOPP Workshop
  • Peer-review
  • Focus on Productizing, test evaluation
  • Open source technology
  • Language Technology Marketing and IPR

21
  • Consortium on Innovation Language
    Technology (COILTech)
  • The MAIT Consortium on Innovation Language
    Technology (COILTech), since its inception in
    September 2001, has been actively co-ordinating
    various activities with the industry and the TDIL
    (Technology Development for Indian Languages)
    Programme. The consortium today has active
    participation from both Indian and MNC companies.
  • Broad Objectives
  • To promote industry participation in
    collaborative R&D in language technology.
  • To coordinate Open Source Software supporting
    Indian languages.
  • To evolve consensus on standards, benchmarks, and
    certification of LT products.
  • To collectively interface with government and
    academia.
  • To conduct market surveys, organize technology
    shows, promote technology transfers and expand
    market collectively.

22
Indian Language Software Market
Projection Estimate   2002 (0.6%)   2008 (2%)
Conservative (20%)    Rs. 1.2 Bn    Rs. 4 Bn ($80 Mn)
Moderate (40%)        Rs. 2.4 Bn    Rs. 8 Bn ($160 Mn)
Optimistic (80%)      Rs. 4.8 Bn    Rs. 16 Bn ($320 Mn)
Basis: IL software on PCs (20%-40%-80%), PC
penetration (0.6% - 2%), average Rs. 1000 per IL
software per PC.
IDC predicts that global demand for GIL will
increase to $8 Bn in 2008, compared with the present
demand of $4.5 Bn in 2002. The market for voice
portals will be $6 Bn.
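
The projection figures on this slide can be reproduced with a short calculation. The sketch below assumes a population of about one billion, which is not stated on the slide but makes the quoted rupee figures come out; the function name `market_rs_bn` is hypothetical.

```python
POPULATION = 1_000_000_000  # assumption: roughly one billion people
PRICE_RS = 1000             # average Rs. 1000 per IL software per PC (from the slide)


def market_rs_bn(pc_penetration: float, il_share: float) -> float:
    """Market size in Rs. billion for a given PC penetration and
    share of PCs carrying Indian-language (IL) software."""
    pcs = POPULATION * pc_penetration
    return pcs * il_share * PRICE_RS / 1e9


scenarios = [("Conservative", 0.20), ("Moderate", 0.40), ("Optimistic", 0.80)]
for label, share in scenarios:
    y2002 = market_rs_bn(0.006, share)  # 0.6% PC penetration in 2002
    y2008 = market_rs_bn(0.02, share)   # 2% PC penetration in 2008
    print(f"{label:12s} 2002: Rs. {y2002:.1f} Bn   2008: Rs. {y2008:.1f} Bn")
```

Under that population assumption, every rupee figure in the projection follows from penetration times share times unit price.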
23
OCR Evaluation
  • The testing was based on the parameters indicated
    in the product specifications. The same are
    indicated below
  • Accuracy
  • Speed
  • Noise Reduction
  • Skew angle detection and correction
  • File Format Support
  • Configuration Testing
  • Installation
  • The input documents for scanning were selected to
    validate the product specifications. Six
    different books (of different sizes, with
    different fonts and different paper quality)
    were selected. Only two-tone (black and white)
    books of offset/laser print quality and
    photocopied pages from these books were used as
    input documents. For testing the "Noise Reduction
    Feature", photocopied documents with salt-and-pepper
    noise and with blurs and smudges were used.
  • Evaluation of OCR for Punjabi, Bengali and
    Devanagari has been completed (November 2002).
    Testing for the rest is going on.

24
Status of Technologies developed at Resource
Centers
Name of the Technology/Product
TH(n): Technology Handshake to n parties
(α): alpha version; (β): beta version
25
Language Technology Handshakes
At the Language Technology Business Meet 2001,
organised by the Ministry of Information
Technology on 7-8 November 2001, 43 Technology
Handshakes were signed by 13 companies for
transfer/collaborative development of technology
for Localised Linux, OCR, MAT, TTS, ASR, Spell
Checker, Morph Analyser and Encyclopaedia.
26
  • Major Achievements
  • Encoding Standards
  • The 8-bit ISCII (Indian Script Code for
    Information Interchange) standard was
    developed in 1988 and revised in 1991.
    The Unicode Indic blocks are based on ISCII-1988.
  • Dept. of IT is a voting member of the Unicode
    consortium.
  • Feedback on revision of UNICODE 3.0 for all
    Indian languages has been finalised. (Ref
    VishwaBharat@tdil, issues 4, 5 and 6, 2002)
  • Vagvarna: Unicode for Vedic Sanskrit has also been
    proposed (240 code points).
  • It is proposed to organize the International
    UNICODE Conference 2003 in India.
  • Font code standards
  • Transliteration standards: Gurumukhi to Urdu and
    Hindi to Urdu transliteration schemes
  • Standard of display codes in the form of INSFOC
    (Indian Standard for Font Code) is ready.
  • Scheme for Indian Script to Roman Transliteration
    (INSROT) is ready.
  • Lexware formats
  • Standard of multilingual lexicon format has also
    been proposed.
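
Because the Unicode blocks for the major Indic scripts follow the common ISCII layout, a letter usually sits at the same offset in each script's 128-code-point block. The sketch below illustrates that property only; it is not the INSROT or any TDIL transliteration scheme, the function name `shift_script` is hypothetical, and some slots are unassigned in some scripts, so the result is a rough mapping rather than a linguistically correct transliteration.

```python
import unicodedata

# Start of each script's Unicode block (from the Unicode Standard).
BLOCK_START = {
    "devanagari": 0x0900, "bengali": 0x0980, "gurmukhi": 0x0A00,
    "gujarati": 0x0A80, "oriya": 0x0B00, "tamil": 0x0B80,
    "telugu": 0x0C00, "kannada": 0x0C80, "malayalam": 0x0D00,
}
BLOCK_SIZE = 0x80  # each of these blocks spans 128 code points


def shift_script(text: str, source: str, target: str) -> str:
    """Map characters from one Indic block to the same offset in another."""
    src, dst = BLOCK_START[source], BLOCK_START[target]
    out = []
    for ch in text:
        cp = ord(ch)
        if src <= cp < src + BLOCK_SIZE:
            candidate = chr(cp - src + dst)
            # Keep the original character when the target slot is unassigned.
            out.append(candidate if unicodedata.name(candidate, "") else ch)
        else:
            out.append(ch)  # characters outside the source block pass through
    return "".join(out)


print(shift_script("\u0915\u092e\u0932", "devanagari", "gujarati"))
```

A real transliterator would add per-script exception tables on top of this offset rule.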

27
  • Knowledge Resources
  • One Million Pages Parallel Corpora (Gyan Nidhi)
  • NBT Books Collection status
    Language   Books   Pages
    Hindi      678     65000
    English    547     60000
    Gujarati   203     22000
    Punjabi    319     35000
    Marathi    269     30000
    Bangla     227     25000
    Oriya      197     21700
    Tamil      271     29800
    Kannada    212     23300
    Telugu     252     27700
    Total      3175    339500
  • NBT: 400,000 pages. The rest is from sources such as
    Champak, Grihshobha, Chandamama, Sahitya Akademi.
  • Language Resource in Public Domain
  • 3 Mn Corpora for all languages are ready and in
    public domain.
  • Geeta Reader, Shabdika, an encyclopedia in Hindi and
    some classic works in Indian languages are
    available in the public domain.

28
  • KU Resources
  • Million Books Universal Digital
    Library (UDL) Programme
  • under Indo-US cooperation
  • Objective
  • To digitize 1 million books (less than 1% of all
    books in all languages ever published) by 2005.
  • To provide a test bed that will support other
    research domains such as scanning techniques and
    optical character recognition.
  • To supplement the formal education system by
    making knowledge available to anyone who can read
    and has access.
  • Participating Organizations in India
  • IISc, Bangalore; IIIT, Hyderabad;
  • IIIT, Allahabad; TTD, Tirupathi;
  • Maharashtra Industrial Development Corporation
    (MIDC); Goa University; University of
    Pune; SASTRA, Tanjavur; AK College of Engineering,
    Krishna Koil; Directorate of Public Libraries
    (Govt. of Andhra Pradesh); Anna University
  • Expected Number of Scanners: 150 (reached 49)

29
Knowledge Tools
Software and Tools for Language Technology in the
Public Domain
Software and tools developed by the Resource
Centres under the TDIL Programme are placed in the
public domain for widespread proliferation. These
include fonts with keyboard driver, code/font
conversion utilities, e-mail client, multilingual
WP, keyboard interface, spellchecker, morph
analyzer, corpora, INDIX (Indian Language
Interface Support on LINUX), etc. These are
available on the TDIL web-site
http://tdil.mit.gov.in.
SIMPUTER
The Simple Inexpensive Multi-lingual ComPUTER has
been designed to enable use of Smartcard and
Text-to-Speech. Information Markup Language for
Internet applications (IMLi) is XML based, and the
IMLi browser supports Indian languages. Its
features include Linux OS, 32-bit CPU, 32 MB
D-RAM, 320 x 240 display, soft-modem, touch
panel, MP3 player, and stylus/tap-a-tap input. Its
price is estimated at about US$ 200. This may
become a means of bridging the Digital Divide.
30
  • Translation Support Systems
  • Mantra: Machine-aided Translation System
    (English to Hindi) for Government
    notifications, at C-DAC.
  • Anusaraka: provides rapid translation as a language
    accessor from other Indian languages to Hindi, at
    UoH-IITK and IIIT Hyderabad.
  • Matra: Machine-aided translation system (E to
    H). A prototype Vaakya system for a web-based
    translation service for English news stories to
    Hindi has been developed, at NCST, Mumbai.
  • Anubharati (H -> E) is a machine-aided
    translation system, a nascent prototype from
    Hindi to English, at IIT, Kanpur.
  • Anglabharati (E -> H) (at IIT Kanpur and
    ERDCI/N), a Machine-aided Translation System
    (English to Hindi), is being developed for the
    officialese, health and agriculture domains. An
    on-line Machine Aided Translation system
    integrated with TTS is
    available on http://anglahindi.iitk.ac.in. It
    has total 5 X (30,000 root words). It follows a
    hybrid of rule-based and example-based
    approaches (IIT-K).

31
  • Human Machine Interface Systems
  • A Continuous Speech Recognition system for Hindi is
    being developed at IBM. It has been
    successfully tested after training with
    pre-recorded speech of the Hon'ble Prime Minister
    of India.
  • Text-to-Speech for Hindi, "Vaachak", produces
    acceptable speech (at Lucknow).
  • An alpha version of "Hindi Vani" software, which
    is PC-based Unlimited Vocabulary Text-to-Speech
    Conversion Software for Hindi, is available. The
    quality of speech is being improved in terms of
    pitch, tone and intonation, with on-line screen
    reading capabilities. The Speech Technologies
    group at IIT Madras is also developing
    technologies for Indian languages.
  • Line and dot matrix printers were enabled for
    printing Devanagari. Bilingual computer-compatible
    electronic tele-printers were
    manufactured.
  • The Gist terminal was developed, which allows use of
    Indic scripts in a UNIX environment.
  • Optical Character Recognition software for Hindi,
    Marathi, Bangla, Oriya, Punjabi, Telugu, and
    Tamil has been developed with accuracy above
    97%. Development for other languages is in
    progress.

32
  • Optical Character Recognition technology in
    Indian Languages(OCR)
  • OCRs for Indian languages are in the advanced
    stages of development.
  • For Hindi, Marathi, Punjabi, Telugu, Tamil, Oriya
    and Bangla, OCR performance at character level was
    recorded above 97%.
  • The OCRs were tested over 500 pages, for 3-5 fonts
    of font-size 12-32.
  • Independent Testing of OCR by STQC
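
The slides do not define how character-level OCR performance was measured. One common convention, shown here purely as an assumption, is one minus the edit distance between the OCR output and the ground-truth text, divided by the length of the ground truth. The function names below are hypothetical.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance: minimum insertions, deletions and
    substitutions needed to turn `hyp` into `ref`."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        curr = [i]
        for j, h in enumerate(hyp, 1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def char_accuracy(reference: str, ocr_output: str) -> float:
    """Character-level accuracy relative to the reference text (assumed metric)."""
    if not reference:
        raise ValueError("reference text must not be empty")
    return max(0.0, 1 - edit_distance(reference, ocr_output) / len(reference))


print(char_accuracy("abcd", "abxd"))
```

Python strings are sequences of code points, so for Indic text a combined glyph such as a consonant with a vowel sign counts as more than one character under this sketch.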

33
  • Language Technology HRD
  • Trainers' Training Programmes in NLP. Modular IT
    curricula in language studies and linguistics, and
    IT-enriched curricula for functional Hindi at BA
    and MA levels, have been prepared.
  • IT curricula designed for Secondary and Senior
    Secondary schools of CBSE introduce Indian
    languages.
  • NLP Training Programmes are being offered during
    Summer and Winter (RCs - ILTS, IIIT/Hyderabad)
  • Masters Programme in Computational Linguistics
    is being worked out.

34
  • Localization
  • There are three related terms in vogue -
    Globalisation, Internationalisation and
    Localisation.
  • Globalisation facilitates free trade across
    borders. New markets (round the clock, all over
    the world), new technologies (interaction across
    separation in time and space) and new players
    (Multi National Corporations) become important.
    Technologies are no longer homogeneous; they are
    heterogeneous, and new meanings can be given in
    different cultural settings.
  • Internationalisation is an intermediate attempt
    towards localization by way of translation and
    enablement. This may include local to Unicode
    code-conversion, example-base in the context of
    globalization, and globalization considerations
    in user interface design.
  • Localisation may be defined as the technological
    fusion of language and culture.
  • LISA (Localisation Industry Standards
    Association) is promoting GIL activities.

35
Six aspects of localization are: Infrastructure,
Input/output, Linguistic, Design and Content,
Commercial, Legal.
Six linguistic issues are: sentence structure,
word wrapping, compound nouns, agreement,
differing perspectives, and message expansion.
Localisation activities may include website
localisation, software localisation, translation,
cross-language applications, translation memory
software, dictionary management software,
code-conversion, on-the-fly (text and speech)
translation, business transactions,
voice-activated telephoning, navigation, remote
diagnosis, and localisation of voice portals.
Customers' buying criteria for localisation tools
are: quality, non-proprietary products, faster
turnaround, interoperability, preserving
linguistic and cultural diversity, and
cross-lingual functionality integrated into other
enterprise applications.
36
Culture needs diversity and thrives on
difference. According to Mahatma Gandhi,
"Dominance and exclusivity cannot ultimately
benefit anybody, not even big players, because no
culture can live if it attempts to be exclusive".
Hence there is need for localisation,
internationalisation and globalisation.
37
Currently, localisation is top-down and US-driven;
global icons suppress local content and create an
'accidental' web. But localisation must be
bottom-up and local-to-global, must fundamentally
change, look for new markets and cost models,
must use reusable components, and must handle
locale-specific issues such as date, time,
color-schemes, hand-signals, gestures, sound,
historical data, product names and
acronyms. Developing nations/communities should
not remain mere recipients of content localised
by others. They must become localisers, must
localise their own content and then make it
accessible to all.
38
(No Transcript)
39
Large Mass Pacing up slow - challenges
India ranks top among countries which harness IT
most for economic development. The digital divide
is also highest in India (UNESCO report).
Economic Indicators
Country   Population  GNP     Per Capita  PPP    Rank  R&D
          (Mn)        ($ Bn)  GNP ($)     ($)          (% GNP)
World     6,054       31171   5150        6980   -     -
India     1027        471     460         2390   86    0.73
America   282         9646    34260       34260  1     2.63
Japan     127         4337    34210       26460  7     2.80
Germany   82          2058    25050       25010  13    2.41
France    59          1430    25500       25000  20    20
Britain   60          1464    24500       23550  18    1.95
China     1261        1065    840         3940   68    0.66
Source: Tata Economic Services 2001
40
Low IT affordability persists. Hence, need for
Innovative IT Solutions.
41
  • Is the technology to divide or to unite?
  • Latin alphabet users, 39% of the global
    population, enjoy 84% of access to the Internet.
  • Hanzi users (CJK), 22% of the global population,
    enjoy 13% of Internet access.
  • Arabic script users, 9% of the population, have
    1.2% of Internet access.
  • Users of Brahmi-origin scripts in South-east Asia
    and of Indic scripts make up 22% of the world
    population but have just 0.3% of Internet access.
  • More than 80% of content on the Internet is in English.
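
One way to compare the script groups on this slide is the ratio of each group's share of Internet access to its share of world population, both read as percentages. The sketch below uses the slide's figures; the name `representation_index` and the idea of a single index are illustrative assumptions, with 1.0 meaning access proportional to population.

```python
# (population share %, Internet access share %) per script group, from the slide
SHARES = {
    "Latin": (39, 84),
    "Hanzi (CJK)": (22, 13),
    "Arabic": (9, 1.2),
    "Brahmi-origin / Indic": (22, 0.3),
}


def representation_index(population_pct: float, access_pct: float) -> float:
    """Access share divided by population share; 1.0 would be proportional."""
    return access_pct / population_pct


for script, (pop, access) in SHARES.items():
    print(f"{script:22s} {representation_index(pop, access):.3f}")
```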

42
Digital Divide -- Difference in perceptions
Perception         Developed Countries        Developing Countries
Why discussed?     Desire to capture larger   Fear of lagging behind
                   markets                    in economic race
Policy             Information explosion      Localization
Tech. Dev.         IPR-centric                Open source technology
Results            Increasing use of English  Preservation of local
                   and thrust of western      language and culture
                   culture
Consumer nature    Substitute the old;        Upgrade the old
                   consumerism-centric
Low cost PC        $400                       less than $40
Reason: PPP (15:1) 34260 (USA)                2400 (India)
        GNP (75:1) 24260                      460
Focus              Digital divide             Digital unite
                   Access to information      Share the knowledge
                   Wider control              Small is beautiful
43
(No Transcript)
44
Speech to Speech Translation
Approach 1 (Cascading approach): S2S = S2T, T2T,
T2S. Constrained on accuracy to about 40 - 60%
(0.7 x 0.8 x 0.9).
Approach 2 (NLU based, concept based): S2S = S2C,
C2C, C2S. Accuracy up to 70 - 80%
(0.9 x 0.95 x 0.9).
The roadmap includes S2T, T2S, Machine
Translation, OCR
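
The accuracy ranges quoted for the two approaches match the product of the three per-stage figures given in parentheses. The sketch below makes that explicit, assuming stage errors are independent so that end-to-end accuracy is the product of stage accuracies; the function name `pipeline_accuracy` is hypothetical.

```python
from math import prod


def pipeline_accuracy(stage_accuracies: list[float]) -> float:
    """End-to-end accuracy of a pipeline whose stages fail independently
    (an assumption): the product of the per-stage accuracies."""
    return prod(stage_accuracies)


# Stage figures as given on the slide.
cascading = pipeline_accuracy([0.7, 0.8, 0.9])       # S2T, T2T, T2S
concept_based = pipeline_accuracy([0.9, 0.95, 0.9])  # S2C, C2C, C2S

print(f"cascading:     {cascading:.3f}")
print(f"concept based: {concept_based:.3f}")
```

This also shows why a cascade is bounded by its weakest stage: no end-to-end figure can exceed the smallest factor.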
45
S2S Translation
Trends in Machine Translation
  • 1970s: Narrow domain, rule-based approach
  • 1980s: Practical MT systems - example-based
    approach, interlingua and transfer methods
  • 1990s: Multilingual MT, simultaneous
    interpretation, example-based revisited,
    corpus-based and statistics-based approaches
  • 2000s: MT through NL understanding and language
    resources
Trend in Speech technology
  • There is an ongoing shift from speech component
    research to research on integrated speech
    systems. Together with speech, the other
    modalities that constitute full natural
    human-human communication (e.g. gesture, lip
    movements, facial expression, gaze, bodily
    posture) are leading towards multimodal
    interactive systems.
  • 1970s: Speech synthesis systems used rule-based
    formant synthesis. (Formants are the resonant
    frequencies of the vocal tract transfer
    function.)
  • 1990s: Concatenative speech synthesis systems use
    small pieces of pre-recorded speech.
  • There is a trend towards cross-project
    collaboration, synergy, critical mass, and
    deployable, scalable technologies.
46
  • Basic Technologies targeted by 2005
  • Context-sensitive summarization (responsive to
    the user's specific needs)
  • Answering questions by making logical inferences
    from database content
  • Speech synthesis with several styles and emotions
    in major ILs
  • Continuous speech understanding in workstations
    with standard dictionaries (5000 w) in major ILs
  • Controlled languages with syntactic and semantic
    verification for specific domains
  • General speaker identification, robust speech
    recognition in hard-to-model noise conditions and
    real speaker-independent recognition
  • Natural Speech (with facial expressions)
    Understanding and generation
  • Basic Technologies targeted by 2010
  • Unlimited-vocabulary spoken multilingual
    conversation
  • Unlimited-vocabulary spoken translation systems
  • Unlimited on-line understanding and generation of
    integrated natural speech, lips, facial
    expression and gesture communication
  • Fully natural interactive communication

47
  • Open Source Software
  • Objectives
  • Evolving and updating standards for multilingual
    support and building up a standards database.
  • Ensuring service support implementation strategy
  • Consortium of OSS researchers in academia and
    industry.
  • Towards cooperation with major LT initiatives
  • Initiative B@bel of UNESCO
  • Human Language Technology (HLT) program of the
    European Union
  • Translingual Information Detection, Extraction
    and Summarization (TIDES)
  • Universal Networking Language (UNL)
  • Country programmes of France, Germany, China,
    Japan, Russia, etc.

48
  • Summing Up
  • 18 Indian languages and 10 Indic scripts
  • Over 95% of Indians cannot benefit from
    English-based IT.
  • In 1990-91, the Government launched the programme on
    TDIL (Technology Development for Indian Languages)
  • 13 RCs-ILTS, 7 CoIL-Net centers.
  • Innovation networking management of multi-lingual
    projects.
  • Voting member of the UNICODE consortium.
  • COILTech (Consortium on Innovation Language
    Technology)
  • New Initiatives
  • Global Village (Vasudhaiva Kutumbakam) with
    Cooperation (saha Veeryam) and Humane Sensitivity
    (Sarve bhavantu sukhinah).