Information Research and Development Division iTech - PowerPoint PPT Presentation

About This Presentation
Slides: 31
Provided by: virachsorn

Transcript and Presenter's Notes

1
Information Research and Development Division (iTech)
  • [Thai text lost]
  • virach_at_nectec.or.th
  • 27 December 2000

2
Milestone
  • Linguistics and Knowledge Science Laboratory (LINKS), 1992
  • Software Laboratory (SWL)
  • Software and Language Engineering Laboratory (SLL), 1997
  • Information Research and Development Laboratory (iTech), 2000
3
IT Market in Thailand
Source: ATCI, 2542
4
Software Type
Source: ATCI, 2542
5
Knowledge, Information, Data
[Figure labels: Data, Information, Knowledge]
6
Knowledge, Information, Data
7
iTech
Program
Sub-program
8
(No Transcript)
9
I1 Scope
  • [Thai text lost; the surviving fragments refer to I1 and I2-I5]

10
I1 Achievement
  • [Thai text lost]
  • C Library [Thai text lost] double-array trie
  • C Framework [Thai text lost] three-tier web application [Thai text lost] XML
  • [Thai text lost] web/CD-ROM [Thai text lost] XML
  • [Thai text lost]
  • [Thai text lost] 3 [Thai text lost] (TrueType, PostScript)
  • [Thai text lost] NECTEC18 (Bitmap)

11
I1 Achievement
  • Linux TLE 3.0 (RedHat 6.2, Helix GNOME 1.2, TLE)
  • Thai Locale for GNU C Library
  • Thai support for GNU Emacs
  • Thai LaTeX (Babel-based)
  • swath (Smart Word Analysis for THai)
  • LEXiTRON klexitron (Kaiwal)
  • TIS620-0/1/2 convention for Thai X fonts
    (experimental)
  • Thai shaping in X servers (experimental)

12
I1 Ongoing Projects (2001)
  • [Thai text lost] (Open Source)
  • Linux TLE 4.0
  • [Thai text lost] StarOffice
  • Project Hosting
  • Thai Library
  • [Thai text lost]
  • [Thai text lost] TrueType, PostScript, OpenType
  • [Thai text lost]
  • [Thai text lost]

13
I1 Target in 5 years
  • Thai language common library
  • Font library
  • Open source software for users and developers
  • Standardization

14
I2 Scope
  • Standard annotated speech corpus
  • For synthesis and recognition
  • Speech synthesis
  • Grapheme-to-phoneme
  • Prosody analysis
  • Speaker identification
  • Speaker verification
  • Voice command
  • Continuous speech recognition

15
I2 Achievement
  • Thai Text-to-Speech Synthesis
    • Thai text-to-speech synthesis module V1.0
    • Thai text-to-speech synthesis module V2.0
  • Speaker Recognition in Thai
    • Prototype of text-dependent speaker identification system

16
I2 Ongoing Projects (2001)
  • Thai speech corpus for synthesis and recognition
  • Thai text-to-speech synthesis phase II
  • Speaker identification on phone

17
I2 Target in 5 years
  • Speech corpus
    • 400 prosodically and phonetically balanced sentences for speech synthesis
    • Phonetically balanced sentences and sentences covering a 5K-word vocabulary for speech recognition
  • Text-to-speech
    • Corpus-based probabilistic prosody generation
  • Speaker recognition
    • Text-dependent speaker verification
  • Speech recognition
    • Large-vocabulary continuous speech recognition

18
I3 Scope
  • [Thai text lost]
  • Transformation
  • Restoration/Enhancement
  • Compression
  • Segmentation
  • Recognition/Identification
  • Representation/Description
  • Simulation

19
I3 Achievement
  • [Thai text lost] 1.0
  • [Thai text lost] 2.0
  • [Thai text lost]

20
I3 Ongoing Projects (2001)
  • Thai Optical Character Recognition (Thai OCR), Phase II
  • Document Image Analysis
  • Thai Online Handwritten Character Recognition

21
I3 Target in 5 years
  • [Thai text lost] 98
  • [Thai text lost] 95
  • [Thai text lost] 90
  • [Thai text lost]

22
I4 Scope
  • R&D on algorithms and software for text processing
  • Text corpus
  • Search engine
  • Machine translation
  • Information retrieval/extraction
  • Text summarization
  • Lexicon/thesaurus

23
I4 Achievement
  • ParSit E-T: cross-language web navigation
  • UNL: Thai decoder for UNL, language-independent database
  • LEXiTRON: corpus-based Thai-English electronic dictionary
  • EZKey: Thai-English input supporting system (patent)
  • ORCHID: Thai POS-tagged corpus

24
I4 Ongoing Projects (2001)
  • ParSit
  • UNL
  • Search engine for Thai
  • LEXiTRON II
  • Text corpus

25
I4 Target in 5 years
  • ParSit (E-T/T-E)
  • UNL language center
  • LEXiTRON: very large corpus-based dictionary
  • Search engine for Thai in very large database

26
I5 Scope
  • R&D on Managing Very Large-Scale Information on the Internet
  • R&D on Knowledge, Information and Data Management
  • R&D on Annotated Systems: XML, Annotated Content and Intelligent Content
  • R&D on Database and Software for Bioinformatics

27
I5 Two Aspects in I3
  • Knowledge, Information and Data (KID) Structural Aspect
    • KID Representation Standardization
    • KID Integration
    • Building Intelligent Databases (e.g., Bioinformatics, Medical Database)
  • KID Functional Aspect
    • Large-Scale KID Management
    • Heterogeneous Database Integration
    • Intelligent Application

28
I5 Research on KID Structure
  • Basic Research
    • XML and Its Standardization
    • Semantic Annotation - UNL and MMA
    • Indexing Methods
    • Data Compression
    • User Model
  • Application Research
    • KID Management Tools
    • Annotation Tools
    • KID Visualization and Pattern Presentation

29
I5 Research on KID Function
  • Basic Research
    • Statistical, AI and NLP Methods
    • Querying System
    • Inference Mechanism (such as on XML)
    • Pattern Visualization
  • Application Research
    • Information Retrieval and Extraction Systems
    • Data and Text Mining Systems
    • Decision Support Systems

30
I5 Integration Figure in I3
[Figure labels: Record; Web; Unstructured Data (Textual Database, Multimedia Database); Structured Data (Relational Database, Object-Oriented Database); Transformation; Extraction; Structured Data (Annotated Data), e.g. XML; Data Mining; Information Retrieval; Information Extraction; Knowledge Discovery; Decision Support]
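
The labels in the integration figure suggest that records from both structured and unstructured sources are converted into annotated data such as XML. The following is only a minimal sketch of that idea in Python, not the division's actual method: the function names `record_to_xml` and `annotate_text`, the `record` and `term` tag names, and the sample inputs are all hypothetical, and only the standard library is used.

```python
import re
import xml.etree.ElementTree as ET


def record_to_xml(record, tag="record"):
    """Transformation: wrap a flat, relational-style record (a dict) in XML.

    Assumes every field name is a valid XML element name.
    """
    elem = ET.Element(tag)
    for field, value in record.items():
        child = ET.SubElement(elem, field)
        child.text = str(value)
    return elem


def annotate_text(text, terms, tag="record"):
    """Extraction: mark known terms in free text with <term> elements.

    `terms` maps a literal term to a label (a hypothetical, tiny lexicon).
    """
    elem = ET.Element(tag)
    if not terms:
        elem.text = text
        return elem
    # Try longer terms first so overlapping entries resolve predictably.
    ordered = sorted(terms, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(t) for t in ordered))
    elem.text = ""
    last = None  # most recently added <term>; later plain text is its tail
    pos = 0
    for match in pattern.finditer(text):
        before = text[pos:match.start()]
        if last is None:
            elem.text += before
        else:
            last.tail = before
        last = ET.SubElement(elem, "term", {"type": terms[match.group()]})
        last.text = match.group()
        pos = match.end()
    rest = text[pos:]
    if last is None:
        elem.text += rest
    else:
        last.tail = rest
    return elem


if __name__ == "__main__":
    row = {"id": 1, "title": "sample"}
    print(ET.tostring(record_to_xml(row), encoding="unicode"))
    note = "A Thai corpus for retrieval"
    lexicon = {"Thai": "language", "corpus": "resource"}
    print(ET.tostring(annotate_text(note, lexicon), encoding="unicode"))
```

Once both kinds of source share one annotated representation, the same querying or mining code can run over either, which is presumably the point of the integration shown in the figure.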