Development of a German-English Translator

1
Development of a German-English Translator
  • Felix Zhang
  • TJHSST Computer Systems Research Lab 2007-2008
  • Period 5

2
Summary of previous quarters
  • Developed a functional translator for simple
    German sentences to simple English sentences
  • Entirely rule-based; no statistical methods yet
  • Geared very specifically towards German
    (language-dependent)

3
Scope for 4th quarter
  • Statistical methods: part-of-speech tagging and
    morphological analysis, using information
    available in the TIGER corpus
  • Accuracy testing: reliability of statistical
    methods vs. rule-based

4
Finishing Rule-Based Translation
  • Convert translation into user-readable format
  • Punctuation and capitalization
  • fzhang@ltsp1 /research python proj.py
  • Part of speech tags: 'den', 'art', 'kurzen',
    'adj', 'Mann', 'nou', 'machen', 'ver',
    'die', 'art', 'kleinen', 'adj', 'Kinder',
    'nou'
  • Morphological analysis: 'kurzen', 'adj',
    'akk', 'mas', 'dat', 'pl', 'Mann',
    'nou', 'akk', 'mas', 'dat', 'pl',
    'machen', 'ver', '1', 'pl', '3', 'pl',
    'pres', 'kleinen', 'adj', 'nom', 'pl',
    'akk', 'pl', 'Kinder', 'nou', 'nom',
    'pl', 'akk', 'pl'
  • Disambiguated after noun-verb agreement:
    'kurzen', 'adj', 'akk', 'mas', 'dat',
    'pl', 'Mann', 'nou', 'akk', 'mas',
    'dat', 'pl', 'machen', 'ver', '3',
    'pl', 'pres', 'kleinen', 'adj', 'nom',
    'pl', 'akk', 'pl', 'Kinder', 'nou',
    'nom', 'pl'
  • Lemmatized: 'kurzen', 'kurz', 'Mann',
    'Mann', 'Man', 'machen', 'machen',
    'kleinen', 'klein', 'Kinder', 'Kind'
  • Root translated: 'den', 'the', 'kurzen',
    'short', 'Mann', 'man', 'machen', 'make',
    'die', 'the', 'kleinen', 'small', 'Kinder',
    'child'
  • NP-chunked English: 'the', 'art', 'short',
    'adj', 'man', 'nou', 'akk', 'mas', 'dat',
    'pl', 'make', 'ver', '3', 'pl', 'pres',
    'the', 'art', 'small', 'adj', 'child',
    'nou', 'nom', 'pl'
  • Assigned an element type:
  • 'the', 'art', 'short', 'adj', 'man',
    'nou', 'akk', 'mas', 'dat', 'pl', 'dobj',
    'make', 'ver', '3', 'pl', 'pres', 'mverb',
    'the', 'art', 'small', 'adj', 'child',
    'nou', 'nom', 'pl', 'sub'
  • Assigned priority:
  • '5', 'the', 'art', 'short', 'adj', 'man',
    'nou', 'akk', 'mas', 'dat', 'pl', 'dobj',
    '2', 'make', 'ver', '3', 'pl', 'pres',
    'mverb', '1', 'the', 'art', 'small', 'adj',
    'child', 'nou', 'nom', 'pl', 'sub'
  • Rearranged to English structure:
  • '1', 'the', 'art', 'small', 'adj',
    'child', 'nou', 'nom', 'pl', 'sub', '2',
    'make', 'ver', '3', 'pl', 'pres', 'mverb',
    '5', 'the', 'art', 'short', 'adj', 'man',
    'nou', 'akk', 'mas', 'dat', 'pl', 'dobj'
  • Inflected:
  • The small childs make the short man.
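
The "assigned priority" and "rearranged" steps above can be pictured as a sort over chunks. This is only a minimal sketch: the tuple layout (priority, words, role) and the function name `rearrange` are hypothetical, since the slide shows the program's output but not its internal data structures.

```python
# Hypothetical chunk layout: (priority, words, role).
# The priorities and roles are taken from the slide's example output;
# the representation itself is an assumption.
chunks = [
    (5, ["the", "short", "man"], "dobj"),
    (2, ["make"], "mverb"),
    (1, ["the", "small", "child"], "sub"),
]


def rearrange(chunks):
    """Order chunks by ascending priority (here: subject, verb, object)."""
    return sorted(chunks, key=lambda chunk: chunk[0])


ordered = rearrange(chunks)
roles = [role for _, _, role in ordered]
words = [word for _, chunk_words, _ in ordered for word in chunk_words]

print(roles)
print(" ".join(words))
```

The sketch stops before inflection, so the words are still uninflected roots at this point.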

5
Statistical Methods
  • Trivial methods for part-of-speech tagging and
    morphological analysis
  • Each word receives the tag that most frequently
    occurs with it in the corpus
  • Theoretically, should still achieve reasonable
    levels of accuracy
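
A most-frequent-tag baseline of the kind described above might look like the following. It is a sketch under assumptions: the function names, the toy training pairs, and the fallback tag for unseen words are all illustrative and are not taken from the original program or from the TIGER corpus.

```python
from collections import Counter, defaultdict


def train(tagged_words):
    """Map each word to the tag it occurs with most often."""
    counts = defaultdict(Counter)
    for word, tag in tagged_words:
        counts[word][tag] += 1
    return {word: tags.most_common(1)[0][0] for word, tags in counts.items()}


def tag(words, model, default="NN"):
    """Tag each word; unseen words get a default tag (an assumption)."""
    return [(word, model.get(word, default)) for word in words]


# Toy training data, invented for illustration only.
toy_corpus = [
    ("der", "ART"), ("Mann", "NN"), ("der", "ART"),
    ("der", "PRELS"), ("in", "APPR"),
]

model = train(toy_corpus)
tagged = tag(["der", "Mann", "Europa"], model)
print(tagged)
```

Because each word always receives the same tag regardless of its neighbours, this approach cannot resolve ambiguity from context.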

6
Testing
  • Check all 746,660 words in the TIGER corpus
  • Problem: running time
  • Match: the actual part of speech / linguistic
    properties of the word in the corpus equal the
    part of speech, etc. that the program assigns
    based on maximum likelihood
  • Predictions from research: 90% accuracy for part
    of speech tags
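
The match test above amounts to comparing two tag sequences position by position. A minimal sketch, assuming the gold and predicted tags are available as parallel lists; the function name and the sample tags are illustrative only:

```python
def accuracy(gold_tags, predicted_tags):
    """Fraction of positions where the predicted tag equals the gold tag."""
    if len(gold_tags) != len(predicted_tags):
        raise ValueError("tag sequences must have the same length")
    if not gold_tags:
        return 0.0
    matches = sum(g == p for g, p in zip(gold_tags, predicted_tags))
    return matches / len(gold_tags)


# Invented sample: four of the five predictions match the gold tags.
gold = ["NE", "ADV", "VAFIN", "PIAT", "NN"]
predicted = ["NE", "ADV", "VAFIN", "PIAT", "ADJA"]

score = accuracy(gold, predicted)
print(score)
```

A single pass like this is linear in the number of words, provided each prediction is a constant-time lookup.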

7
Accuracy: Part of speech
  • Reaches about 90 percent accuracy, as predicted
  • Match NE NE Nato
  • 656882
  • Match ADV ADV allein
  • 656883
  • Match VAFIN VAFIN sein
  • 656884
  • Match PIAT PIAT kein
  • 656885
  • Match NN NN Zukunftskonzept
  • 656886
  • Match APPR APPR für
  • 656887
  • Match ART ART der
  • 656888
  • Match NN NN Sicherheit
  • 656889
  • Match APPR APPR in
  • 656890

8
Accuracy: Morphological Analysis
  • Lower accuracy: properties are more
    context-dependent
  • Match -- -- allein
  • 550403
  • Match 3.Sg.Pres.Ind 3.Sg.Pres.Ind ist
  • 550404
  • Match Nom.Sg.Neut Nom.Sg.Neut Zukunftskonzept
  • 550405
  • Match -- -- für
  • 550406
  • Match Acc.Sg.Fem Acc.Sg.Fem die
  • 550407
  • Match Acc.Sg.Fem Acc.Sg.Fem Sicherheit
  • 550408
  • Match -- -- in
  • 550409
  • Match Dat.Sg.Neut Dat.Sg.Neut Europa
  • 550410
  • Total matches: 550410
  • Total words: 746660
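
The two totals above give the morphological accuracy directly, which works out to roughly 74 percent. The short calculation below uses only the figures on the slide; the variable names are illustrative.

```python
total_matches = 550410
total_words = 746660

morph_accuracy = total_matches / total_words
morph_error = 1 - morph_accuracy

print(f"accuracy: {morph_accuracy:.1%}, error: {morph_error:.1%}")
```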

9
Problems
  • No good way to compare effectiveness of
    rule-based vs. statistical translations
  • Accuracy not high enough
  • 10% error is too large in a 700,000-word corpus
  • 27% is even worse
  • Slow run times

10
Future Research
  • Combine statistical and rule-based methods
    in a hybrid program
  • Find a way to compare the accuracy of the two
    techniques
  • More advanced (and more accurate) statistical
    methods: context-based