The Development of E2T and T2E Active Reading via Web

1
The Development of E2T and T2E Active Reading
via Web
  • Asanee Kawtrakul and Teams
  • Kasetsart University, Bangkok, Thailand
  • ak_at_vivaldi.cpe.ku.ac.th
  • Fifth Agricultural Ontology Service (AOS)
    Workshop
  • 29 April 2004, Beijing, China

2
Outline
  • Motivation
  • Objectives
  • System Overview
  • Methodologies
  • Example
  • Conclusion and Future work

3
Acknowledgement
  • KURDI (Kasetsart University Research and
    Development Institute)

4
Collaboration
  • Library Institute of Kasetsart University, which
    provided the thesaurus and the agricultural corpus

5
Motivation
  • Valuable data is scattered throughout the
    organization in multiple languages
  • Good information is collected by many
    individuals in unstructured formats
  • Digested information enables quicker
    decision-making

6
Proposed project
  • Summarization
    • From unstructured to structured format
    • Only the gist of the information
  • Translation
    • English to Thai (E2T)
    • Thai to English (T2E)

7
Objectives
  • To develop a system for summarizing agricultural
    information and translating it from English to
    Thai using statistical and frame-based
    approaches (E2T)
  • To support information discovery and web-based
    information exchange in the agricultural
    domain (T2E)

8
E2T
9
Summarization (Input)
  • Let us focus on Canada's agricultural products.
    In 1998, there were 1,216 registered commercial
    egg producers in Canada. Ontario produced 39.8%
    of all eggs in Canada, Quebec was second with
    16.6%. The western provinces have a combined egg
    production of 35.6% and the eastern provinces
    have a combined production of 8.0%.

Courtesy of Agriculture and Agri-Food Canada,
http://www.agr.ca/cb
10
Summarization (Cube)
11
Other Output
12
Related Work
  • Frame
    • Knowledge representation in the form of slots
      and fillers
    • Consists of attributes and their values

[Figure: a frame shown as attributes paired with their values]
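As a rough illustration of the slot-and-filler idea, the Python sketch below models a frame as a named set of attribute slots that are filled with values. The class name Frame, the template production_share_frame, and its slot names are hypothetical choices for this example, not the project's actual frame definitions.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Frame:
    """A slot-and-filler frame: a named concept whose slots (attributes) hold values."""
    name: str
    slots: Dict[str, Optional[str]] = field(default_factory=dict)

    def fill(self, attribute: str, value: str) -> None:
        # Only attributes declared in the frame template may be filled.
        if attribute not in self.slots:
            raise KeyError(f"frame {self.name!r} has no slot {attribute!r}")
        self.slots[attribute] = value

    def is_complete(self) -> bool:
        # A frame is complete once every declared slot has a filler.
        return all(value is not None for value in self.slots.values())


def production_share_frame() -> Frame:
    # Hypothetical template: the slot names are illustrative, not taken from the slides.
    return Frame("ProductionShare",
                 {"product": None, "region": None, "year": None, "share_percent": None})


if __name__ == "__main__":
    frame = production_share_frame()
    frame.fill("product", "eggs")
    frame.fill("region", "Ontario")
    frame.fill("year", "1998")
    print(frame.is_complete())  # False: share_percent is still empty
    frame.fill("share_percent", "39.8")
    print(frame.is_complete())  # True
```

Rejecting undeclared attributes keeps extraction errors visible instead of silently adding new slots.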
13
Methodologies
  • Integration of NLP techniques and a data cube
    structure
  • The gist of the information is extracted and
    summarized into frames, then translated into the
    target language
  • The data cube structure supports efficient data
    access management and powerful decision making
  • Focus on one case:
    • Agricultural summary articles, which share a
      broadly similar structure
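The data cube idea can be sketched with a dictionary keyed by dimension values. This is a minimal in-memory sketch, assuming three hypothetical dimensions (region, product, year) and one numeric measure: roll_up sums over the dimensions that are dropped, and slice_cube fixes one or more of them. A production system would normally back the cube with a database; everything here stays in memory.

```python
from collections import defaultdict
from typing import Dict, Iterable, Sequence, Tuple

# Hypothetical cube layout: three dimensions and one numeric measure.
DIMS = ("region", "product", "year")
Key = Tuple[str, str, int]


def build_cube(records: Iterable[Tuple[str, str, int, float]]) -> Dict[Key, float]:
    """Store each (region, product, year) cell; repeated cells are summed."""
    cube: Dict[Key, float] = defaultdict(float)
    for region, product, year, value in records:
        cube[(region, product, year)] += value
    return dict(cube)


def roll_up(cube: Dict[Key, float], keep: Sequence[str]) -> Dict[tuple, float]:
    """Sum the measure over every dimension that is not listed in `keep`."""
    positions = [DIMS.index(d) for d in keep]
    totals: Dict[tuple, float] = defaultdict(float)
    for key, value in cube.items():
        totals[tuple(key[p] for p in positions)] += value
    return dict(totals)


def slice_cube(cube: Dict[Key, float], **fixed) -> Dict[Key, float]:
    """Keep only the cells whose named dimensions equal the given values."""
    return {key: value for key, value in cube.items()
            if all(key[DIMS.index(d)] == v for d, v in fixed.items())}


if __name__ == "__main__":
    # Shares (%) from the egg-production example in slide 9.
    cube = build_cube([
        ("Ontario", "eggs", 1998, 39.8),
        ("Quebec", "eggs", 1998, 16.6),
        ("Western provinces", "eggs", 1998, 35.6),
        ("Eastern provinces", "eggs", 1998, 8.0),
    ])
    print(slice_cube(cube, region="Quebec"))
    print({k: round(v, 1) for k, v in roll_up(cube, ["product", "year"]).items()})
```

Rolling the egg example up over region gives a total share of 100.0 for eggs in 1998, which is a quick sanity check on the extracted figures.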

14
Why Are NLP Techniques Needed?
  • NP analysis
    • To extract the named entities that activate a
      frame
    • To improve indexing performance
  • Word sense disambiguation, e.g. "pound":
    • The basic monetary unit of the United Kingdom
    • A unit of mass and weight
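The "pound" example can be resolved by comparing the surrounding words with a short description of each sense. The sketch below is a simplified Lesk-style overlap count, swapped in here for illustration; the slides do not say which disambiguation method the system uses, and the sense signatures in POUND_SENSES are hand-written assumptions.

```python
import re
from typing import Dict, Set

# Hypothetical sense signatures for "pound"; a real system would draw these
# from a dictionary or thesaurus rather than a hand-written list.
POUND_SENSES: Dict[str, Set[str]] = {
    "currency": {"price", "cost", "costs", "paid", "sterling", "exchange",
                 "money", "pence", "bank", "uk"},
    "weight": {"weight", "weigh", "weighs", "mass", "ounce", "ounces",
               "kg", "kilogram", "kilograms", "heavy", "bag", "sack"},
}


def disambiguate(sentence: str, senses: Dict[str, Set[str]] = POUND_SENSES) -> str:
    """Simplified Lesk: pick the sense whose signature shares the most words with the context."""
    context = set(re.findall(r"[a-z]+", sentence.lower()))
    scores = {sense: len(context & signature) for sense, signature in senses.items()}
    # Ties (including no overlap at all) go to the first sense listed.
    return max(scores, key=scores.get)


if __name__ == "__main__":
    print(disambiguate("The sack of rice weighs one pound."))    # weight
    print(disambiguate("The ticket cost ten pounds sterling."))  # currency
```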

15
System Overview
Graphical User Interface
16
Gathering Module
17
Indexing and Clustering Module
18
Summarization Module
Sentence Structures
19
Summarization (Input)
  • (Same input text as slide 9.)
20
Summarization (Filtering)
Let us focus on Canada's agricultural products.
In 1998, there were 1,216 registered commercial egg producers in Canada.
Ontario produced 39.8% of all eggs in Canada.
Quebec was second with 16.6%.
The western provinces have a combined egg production of 35.6%.
The eastern provinces have a combined production of 8.0%.
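Once the text is filtered into simple sentences, each sentence shape can be matched against a template whose named groups become frame slots. This is a sketch under the assumption that templates are regular expressions; the four patterns in TEMPLATES are hypothetical and written only to cover this one example, not the system's actual sentence templates.

```python
import re
from typing import Dict, List

# Hypothetical surface templates, one per sentence shape in the filtered text.
# Named groups become frame slots; the first matching template wins.
TEMPLATES = [
    re.compile(r"^In (?P<year>\d{4}), there were (?P<producers>[\d,]+) "
               r"registered commercial egg producers in (?P<country>\w+)\.$"),
    re.compile(r"^(?P<region>[A-Z]\w+) produced (?P<share>[\d.]+)% of all eggs in \w+\.$"),
    re.compile(r"^(?P<region>[A-Z]\w+) was second with (?P<share>[\d.]+)%\.?$"),
    re.compile(r"^The (?P<region>\w+ provinces) have a combined (?:egg )?"
               r"production of (?P<share>[\d.]+)%\.?$"),
]

FILTERED = [
    "Let us focus on Canada's agricultural products.",
    "In 1998, there were 1,216 registered commercial egg producers in Canada.",
    "Ontario produced 39.8% of all eggs in Canada.",
    "Quebec was second with 16.6%.",
    "The western provinces have a combined egg production of 35.6%.",
    "The eastern provinces have a combined production of 8.0%.",
]


def fill_frames(sentences: List[str]) -> List[Dict[str, str]]:
    frames = []
    for sentence in sentences:
        for template in TEMPLATES:
            match = template.match(sentence.strip())
            if match:
                frames.append(match.groupdict())
                break
        # Sentences that match no template (e.g. the topic sentence) are dropped.
    return frames


if __name__ == "__main__":
    for frame in fill_frames(FILTERED):
        print(frame)
```

Dropping unmatched sentences mirrors the "only the gist" goal: the topic sentence carries no slot values and produces no frame.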
21
Summarization (Templates)
22
Summarization (Frames)
23
Summarization (Cube)
24
Translation Module
Visualization Tool
25
Translation Result
26
Web-based User Interface
  • To query the price history of agricultural
    products, including chronological and
    statistical data

27
Output
28
Current State of the E2T System
  • Parser: shallow parsing
  • Translation: English to Thai
  • Summarization and translation: frame-based
  • Text to relational database

29
Parser
Big dog loves small cat.
สุนัข ใหญ่ รัก แมว เล็ก /sulnakh yail rakh määwm lekh/
(word for word: dog big love cat small)
30
T2E
31
Input and Output
  • Input characteristics (SL)
    • Web pages must be HTML files
    • Web pages are displayed in Thai
  • Output characteristics (TL)
    • The system displays the English output in a
      new pop-up window

32
Why Translate Only Tables?
  • The survey found that agricultural web pages
    fall into three types:
    • Full text
    • Tables with context
    • Tables only (approx. 50%)

33
Table Characteristics (cont.)
[Figure: table regions labeled Unit, Heading (outside table), Pure Texts, and Numeric]
34
Table Characteristics (cont.)
[Figure: examples of a unit outside the table and a unit inside the table]
35
Input Format Example
  • Input in frame format

Department of Internal Trade (DIT)
Office of Agricultural Economics (OAE)
36
Tables only
[Figure: example page with regions labeled Picture, Bullet, and Agricultural Economics News]
37
System overview
38
Input Webpage
[Diagram: Internet; Web Robot; HTML File]
39
Table Analysis
[Diagram: HTML File; Tag with position anchor; Text with position anchor]
40
Position Anchor (Table Analysis)
  • Letters denote the position of the data in each
    table cell:
    • T stands for table
    • R stands for row
    • C stands for column
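A position anchor of the form T<table>R<row>C<column> can be generated while walking the HTML. The sketch below uses Python's standard html.parser module and assumes flat (non-nested) tables with no colspan or rowspan; the class TableAnchorParser and the function anchor_cells are hypothetical names, not part of the described system.

```python
from html.parser import HTMLParser
from typing import List, Tuple


class TableAnchorParser(HTMLParser):
    """Collects (anchor, text) pairs, where the anchor is T<table>R<row>C<column> (1-based)."""

    def __init__(self) -> None:
        super().__init__()
        self.table = self.row = self.col = 0
        self.in_cell = False
        self.buffer: List[str] = []
        self.cells: List[Tuple[str, str]] = []

    def _close_cell(self) -> None:
        # Collapse whitespace and record the cell under its position anchor.
        if self.in_cell:
            text = " ".join("".join(self.buffer).split())
            self.cells.append((f"T{self.table}R{self.row}C{self.col}", text))
            self.in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.table += 1
            self.row = 0
        elif tag == "tr":
            self._close_cell()  # tolerate a missing </td>
            self.row += 1
            self.col = 0
        elif tag in ("td", "th"):
            self._close_cell()
            self.col += 1
            self.in_cell = True
            self.buffer = []

    def handle_endtag(self, tag):
        if tag in ("td", "th", "tr", "table"):
            self._close_cell()

    def handle_data(self, data):
        if self.in_cell:
            self.buffer.append(data)


def anchor_cells(html: str) -> List[Tuple[str, str]]:
    parser = TableAnchorParser()
    parser.feed(html)
    parser.close()
    return parser.cells


if __name__ == "__main__":
    page = "<table><tr><td>Item</td><td>1999</td></tr><tr><td>Rice</td><td>24,245</td></tr></table>"
    print(anchor_cells(page))
```

Closing a cell on the next <td> or <tr> is a concession to older hand-written HTML, where closing tags are often omitted.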

41
Keyword Definition Example (Table Analysis)
The result will be:
T1R1C1 [Thai text]   T1R1C2 1999        T1R1C3 2000
T1R2C1 [Thai text]   T1R2C2 24,245      T1R2C3 28,356
T2R1C1 [Thai text]   T2R1C2 1999
T2R2C1 [Thai text]   T2R2C2 2,172,000
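Going the other way, anchored text such as the result above can be regrouped into rows once its cells have been translated. This sketch assumes each anchor is followed by whitespace and then its cell text; parse_anchored and to_rows are hypothetical helpers, and the English labels Item, Rice and Rubber are placeholders for Thai cell text.

```python
import re
from collections import defaultdict
from typing import Dict, List

# An anchor followed by its text, up to the next anchor or the end of the string.
ANCHORED = re.compile(r"T(\d+)R(\d+)C(\d+)\s+(.*?)(?=\s+T\d+R\d+C\d+|\s*$)")


def parse_anchored(text: str) -> Dict[int, Dict[int, Dict[int, str]]]:
    """Map table -> row -> column -> cell text."""
    tables: Dict[int, Dict[int, Dict[int, str]]] = defaultdict(lambda: defaultdict(dict))
    for t, r, c, value in ANCHORED.findall(text):
        tables[int(t)][int(r)][int(c)] = value
    return tables


def to_rows(table: Dict[int, Dict[int, str]]) -> List[List[str]]:
    """Rebuild a table as rows, filling missing cells with ''."""
    rows = []
    for r in sorted(table):
        width = max(table[r])
        rows.append([table[r].get(c, "") for c in range(1, width + 1)])
    return rows


if __name__ == "__main__":
    text = "T1R1C1 Item T1R1C2 1999 T1R1C3 2000\nT1R2C1 Rice T1R2C2 24,245 T1R2C3 28,356"
    for row in to_rows(parse_anchored(text)[1]):
        print(row)
```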
42
Chunk-level Translation
[Diagram: Text with Keyword; Translated File]
43
Phrase Chunker (cont.) (Chunk-level Translation)
Rules: np → n vp; vp → aux? v n
[Thai text, segmented into chunks 1, 2, 3]
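A rule-driven phrase chunker can be sketched as pattern matching over part-of-speech tags. The sketch below uses two simplified patterns, np → n and vp → aux? v n, which are assumptions loosely based on the rules on the slide; it also assumes the Thai input is already segmented and tagged, and the example sentence ("Thailand will export rice") is chosen only for illustration.

```python
from typing import List, Optional, Sequence, Tuple

Token = Tuple[str, str]  # (word, part-of-speech tag)

# Hypothetical, simplified rule set; a tag ending in "?" is optional.
# Rules are tried in order at each position.
RULES: List[Tuple[str, List[str]]] = [
    ("VP", ["aux?", "v", "n"]),
    ("NP", ["n"]),
]


def match(pattern: Sequence[str], tags: Sequence[str], start: int) -> Optional[int]:
    """Return the index just past the match of `pattern` at `start`, or None."""
    i = start
    for element in pattern:
        optional = element.endswith("?")
        tag = element.rstrip("?")
        if i < len(tags) and tags[i] == tag:
            i += 1
        elif not optional:
            return None
    return i


def chunk(tokens: List[Token]) -> List[Tuple[str, List[str]]]:
    """Greedy left-to-right chunking; unmatched tokens become single-word chunks."""
    tags = [tag for _, tag in tokens]
    chunks, i = [], 0
    while i < len(tokens):
        for label, pattern in RULES:
            end = match(pattern, tags, i)
            if end is not None and end > i:
                chunks.append((label, [word for word, _ in tokens[i:end]]))
                i = end
                break
        else:
            chunks.append((tags[i].upper(), [tokens[i][0]]))
            i += 1
    return chunks


if __name__ == "__main__":
    # "Thailand will export rice", pre-segmented and pre-tagged.
    sentence = [("ประเทศไทย", "n"), ("จะ", "aux"), ("ส่งออก", "v"), ("ข้าว", "n")]
    print(chunk(sentence))
```

Optional elements are matched greedily without backtracking, which is enough for short patterns like these.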
44
Phrase Chunker (Chunk level Translation)
45
Chunk level Translation (cnt.)
  • Handle named entities!
  • NEs cannot be translated word by word
  • e.g. กองควบคุมพืชและวัสดุการเกษตร
    • Chunker → AGRICULTURAL PLANT AND MATERIAL
      CONTROL DIVISION
    • NE extraction → AGRICULTURAL REGULATORY
      DIVISION

46
Table Characteristics (Unit Conversion)
[Figure: 1. unit outside table; 2. unit inside table]
47
Unit conversion (cont.)
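Unit conversion can apply either a unit found inside a cell or, failing that, a unit stated outside the table. The sketch below assumes a hypothetical conversion table (rai and ngan to hectares, tonnes to kilograms, baht per tonne to baht per kilogram) and that the caller passes only numeric data cells, since header cells such as years would otherwise be converted too. Decimal is used so that factors like 0.16 do not introduce floating-point noise.

```python
import re
from decimal import Decimal
from typing import Dict, List, Optional, Tuple

# Hypothetical conversion table: source unit -> (target unit, multiplication factor).
CONVERSIONS: Dict[str, Tuple[str, Decimal]] = {
    "rai": ("hectare", Decimal("0.16")),          # 1 rai = 1,600 m^2
    "ngan": ("hectare", Decimal("0.04")),         # 1 ngan = 400 m^2
    "tonne": ("kg", Decimal("1000")),
    "baht/tonne": ("baht/kg", Decimal("0.001")),
}

# A number with optional thousands separators, optionally followed by a unit.
CELL = re.compile(r"^\s*(\d[\d,]*(?:\.\d+)?)\s*([a-z/]+)?\s*$")


def convert_cells(cells: List[str], outside_unit: Optional[str] = None) -> List[str]:
    """Convert numeric cells; a unit inside the cell overrides the unit outside the table."""
    converted = []
    for cell in cells:
        match = CELL.match(cell)
        if not match:
            converted.append(cell)  # non-numeric text is left for translation
            continue
        number, inside_unit = match.groups()
        unit = inside_unit or outside_unit
        if unit not in CONVERSIONS:
            converted.append(cell)  # unknown or missing unit: leave unchanged
            continue
        target, factor = CONVERSIONS[unit]
        value = Decimal(number.replace(",", "")) * factor
        converted.append(f"{value.normalize():f} {target}")
    return converted


if __name__ == "__main__":
    print(convert_cells(["24,245", "2 tonne", "n/a"], outside_unit="rai"))
    # ['3879.2 hectare', '2000 kg', 'n/a']
```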
48
Sentence Generation
(Same rules and chunked Thai example as slide 43.)
49
Sentence Generation (cont.)
NP [Thai text]  VP [Thai text]
Transfer rules (Thai → English): np → n vp; np → adjp n; vp → v n; adjp → adj np
NP: [Thai text]
NP: goods importing [Thai text]
NP: goods importing price
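One transfer rule that Thai-English translation needs is reordering noun phrases, since Thai adjectives follow the noun while English adjectives precede it. The sketch below applies only that one rule to the sentence from the Parser slide, using a hypothetical five-word LEXICON; it does no verb agreement, so it produces "love" where the slide's English has "loves".

```python
from typing import Dict, List, Tuple

# Hypothetical bilingual lexicon: Thai word -> (English word, part of speech).
LEXICON: Dict[str, Tuple[str, str]] = {
    "สุนัข": ("dog", "n"),
    "ใหญ่": ("big", "adj"),
    "รัก": ("love", "v"),
    "แมว": ("cat", "n"),
    "เล็ก": ("small", "adj"),
}


def transfer(tokens: List[str]) -> str:
    """Look up each word, then apply the single transfer rule n adj (Thai) -> adj n (English)."""
    tagged = [LEXICON[word] for word in tokens]
    output, i = [], 0
    while i < len(tagged):
        english, pos = tagged[i]
        if pos == "n" and i + 1 < len(tagged) and tagged[i + 1][1] == "adj":
            output += [tagged[i + 1][0], english]  # move the adjective before its noun
            i += 2
        else:
            output.append(english)
            i += 1
    return " ".join(output)


if __name__ == "__main__":
    print(transfer(["สุนัข", "ใหญ่", "รัก", "แมว", "เล็ก"]))  # big dog love small cat
```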
50
Result
Active Reading
51
Available Web sites
  • Department of Internal Trade
    • http://www.dit.go.th/
  • Office of the Rubber Replanting Aid Fund
    • http://www.thailandrubber.thaigov.net/menu5.php
  • http://www.talaadthai.com/pricebase/default.asp
  • http://www.rubberthai.com/price/price_index.htm
  • http://www.thaifruitnews.com/

52
Multilinguality Extension
53
54
Structure of ML-Dictionary (New Version)
  • Main language: English (vocabulary and POS)
  • A separate table for each language
  • Words that have the same meaning are linked
    together by an ID attribute
  • Supports 10 languages:
    • Bahasa Indonesia, Chinese, English, French,
      Italian, Japanese, Korean, Tagalog, Thai, and
      Vietnamese
  • UTF-8 character encoding
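The dictionary layout described above (an English main table plus one table per language, joined on a shared ID) can be sketched with Python's built-in sqlite3 module. Table and column names here are assumptions, only two of the ten languages are created, and language names are checked against a fixed list before being placed into SQL, because table names cannot be passed as query parameters.

```python
import sqlite3
from typing import Dict, List

# Hypothetical subset of the ten supported languages.
LANGUAGES = ("thai", "french")


def create_schema(conn: sqlite3.Connection) -> None:
    # Main table: English vocabulary with part of speech.
    conn.execute("CREATE TABLE english (id INTEGER PRIMARY KEY, word TEXT NOT NULL, pos TEXT)")
    # One table per language; rows share the English entry's id.
    for lang in LANGUAGES:
        conn.execute(f"CREATE TABLE {lang} (id INTEGER NOT NULL REFERENCES english(id), "
                     "word TEXT NOT NULL)")


def _check(lang: str) -> None:
    # Table names cannot be bound as parameters, so only whitelisted names reach the SQL.
    if lang not in LANGUAGES:
        raise ValueError(f"unsupported language: {lang}")


def add_entry(conn: sqlite3.Connection, word: str, pos: str,
              translations: Dict[str, str]) -> int:
    for lang in translations:
        _check(lang)
    entry_id = conn.execute("INSERT INTO english (word, pos) VALUES (?, ?)",
                            (word, pos)).lastrowid
    for lang, target in translations.items():
        conn.execute(f"INSERT INTO {lang} (id, word) VALUES (?, ?)", (entry_id, target))
    return entry_id


def lookup(conn: sqlite3.Connection, word: str, lang: str) -> List[str]:
    _check(lang)
    rows = conn.execute(f"SELECT t.word FROM english AS e JOIN {lang} AS t ON t.id = e.id "
                        "WHERE e.word = ?", (word,)).fetchall()
    return [row[0] for row in rows]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_schema(conn)
    add_entry(conn, "rice", "noun", {"thai": "ข้าว", "french": "riz"})
    print(lookup(conn, "rice", "thai"))  # ['ข้าว']
```

Keeping English as the hub means adding an eleventh language needs only one new table, with no changes to the existing ones.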

55
User Interface Example
  • Interface for adding new vocabulary

56
User Interface Example (cont.)
  • Interface for querying vocabulary

57
Current Results Based on FAOSTAT
  • English: 23,207 terms
  • French: 1,482 terms
  • Thai: 23,097 terms
  • Vietnamese: 175 terms
  • Japanese: 108 terms
  • Bahasa Indonesia: 13 terms
  • Chinese, Italian, Korean, and Tagalog: 0 terms

58
Future work
  • Web-based Multilingual Active Reading System for
    Information Exchange
    • Language configuration
    • Active reading assistant
    • Table translator with a larger multilingual
      dictionary

59
Thank you