Developing Statistic-based and Rule-based Grammar Checkers for Chinese ESL Learners

1
Developing Statistic-based and Rule-based Grammar
Checkers for Chinese ESL Learners
  • Howard Chen
  • Department of English
  • National Taiwan Normal University
  • hjchen_at_ntnu.edu.tw

2
The Need to Provide Feedback on Second Language
Writing
  • More and more tests ask ESL/EFL students to
    demonstrate their writing abilities.
  • SLA researchers suggest that learners need more
    practice and corrective feedback.
  • However, who can provide them with useful
    feedback on meaning and form?

3
Use the Existing Grammar Checkers?
  • Teachers are the best feedback providers.
  • However, they have too many essays to correct.
  • Microsoft grammar checker
  • General impression from ESL/EFL learners: it is
    NOT very useful.
  • Two new commercial packages: Vantage MyAccess
    and ETS Criterion
  • Their feedback for ESL learners is not very
    accurate or comprehensive (perhaps because they
    do not target any particular L1 group and are
    aimed mainly at native speakers).

4
A More Thorough Review of E-rater (ETS Criterion)
  • Japanese college researcher Junko Otoshi (2005)
    from Ritsumeikan University
  • Used 28 Japanese adult students' TOEFL writing
    essays to explore what Criterion can and cannot
    do with regard to providing feedback on the
    essays.
  • Criterion's critique function was compared with
    human instructors' error feedback, focusing on
    five error categories: verbs, word choice, nouns,
    articles, and sentence structure.

5
Errors Marked by Criterion and Human Instructors
(Means)
  • Error Type            Criterion   Human Instructors
  • Verbs                 0.47        0.84
  • Nouns                 0.00        0.94
  • Articles              0.07        2.00
  • Word Choice           0.11        2.32
  • Sentence Structure    0.32        6.31

6
Rather Disappointing Results and Possible Reasons
  • The results revealed that Criterion experienced
    difficulties in detecting errors in all of the
    five categories.
  • Does it aim for higher precision and accept lower
    recall? (A more conservative approach)
  • The size of the reference corpus?
  • Another program, MyAccess, has similar problems,
    though the general impression from review reports
    was that it can detect more errors.

7
Trying to Combine Different Approaches: Plan A
and Plan B for Grammar Checkers
  • With funding from the NSC in Taiwan, we planned
    to develop two grammar checkers.
  • Different approaches: parser, rules, statistics
  • Plan A: we will use ngrams to help identify
    errors.
  • Plan B: we will use a rule-based grammar
    checker to identify errors.
  • If possible, Plan A and Plan B will be merged,
    and the combined system should be able to
    capture more errors.
  • In this paper, we will only discuss Plan A.

8
What Is the Ngram (Statistical) Checker?
  • We will not write specific grammar rules.
  • The computer counts all the word strings
    (2-word and 3-word) that occur in a very large
    native-speaker corpus (language model building).
  • All of these are saved to a large database.
  • Then, when students write and submit an essay to
    the ngram checker, the system can quickly detect
    the word strings that do not exist in the native
    corpus.
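
The idea on this slide can be sketched in a few lines of Python. This is only a minimal illustration, not the NTNU implementation: the three-sentence "native corpus", the tokenizer, and the function names (tokenize, build_model, unseen_ngrams) are all assumptions made for the example.

```python
from typing import Iterable, List, Set, Tuple

Ngram = Tuple[str, ...]
PUNCT = ".,;:!?\"'()"

def tokenize(text: str) -> List[str]:
    # Lowercase, split on whitespace, and trim surrounding punctuation.
    words = (w.strip(PUNCT).lower() for w in text.split())
    return [w for w in words if w]

def ngrams(tokens: List[str], n: int) -> List[Ngram]:
    # All contiguous n-word strings in the token list.
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def build_model(corpus: Iterable[str], sizes=(2, 3)) -> Set[Ngram]:
    # Collect every 2-word and 3-word string attested in the native corpus.
    seen: Set[Ngram] = set()
    for sentence in corpus:
        tokens = tokenize(sentence)
        for n in sizes:
            seen.update(ngrams(tokens, n))
    return seen

def unseen_ngrams(sentence: str, model: Set[Ngram], sizes=(2, 3)) -> List[Ngram]:
    # Word strings in the learner sentence that never occur in the corpus.
    tokens = tokenize(sentence)
    return [g for n in sizes for g in ngrams(tokens, n) if g not in model]

# Toy stand-in for a large native corpus (hypothetical data).
native = ["He cut his finger.", "He enjoys eating.", "I have some books."]
model = build_model(native)
flagged = unseen_ngrams("He cutted his finger.", model)
print(flagged)
```

With a corpus this small almost everything is "unseen"; the approach only becomes useful when the reference corpus is very large.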

9
Advantages of the Ngram-based Checker
  • The key idea is simple but powerful.
  • No need to write rules
  • More robust in detecting errors.
  • A large and suitable corpus might make this very
    useful. (ETS used a 30-million-word news corpus.)

10
The Procedure of Developing an Ngram Checker
(corpora and tools)
  • 1. Find a suitable, large corpus (e.g., BNC,
    Wikipedia, and Google)
  • 2. Extract the ngrams (NLP tools: SRI tool)
  • 3. Build a large ngram database
  • 4. Develop and test different highlighting
    methods
  • 5. Highlight the possibly problematic ngrams in
    learners' writing
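
Steps 2 to 5 might look roughly like the following sketch. It is an assumption-laden toy: it uses Python's built-in sqlite3 module in place of whatever database the project used, bigrams only, a hypothetical min_freq threshold, and square brackets as the "highlighting" method.

```python
import sqlite3
from collections import Counter

def tokens_of(sentence):
    # Very rough tokenizer: lowercase, drop periods, split on whitespace.
    return sentence.lower().replace(".", " ").split()

def build_database(corpus):
    # Steps 2 and 3: extract bigrams and store their counts in a table.
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE bigram (w1 TEXT, w2 TEXT, freq INTEGER, "
               "PRIMARY KEY (w1, w2))")
    counts = Counter()
    for sentence in corpus:
        toks = tokens_of(sentence)
        counts.update(zip(toks, toks[1:]))
    db.executemany("INSERT INTO bigram VALUES (?, ?, ?)",
                   [(w1, w2, c) for (w1, w2), c in counts.items()])
    return db

def frequency(db, w1, w2):
    # Corpus frequency of a bigram; 0 if it was never seen.
    row = db.execute("SELECT freq FROM bigram WHERE w1 = ? AND w2 = ?",
                     (w1, w2)).fetchone()
    return row[0] if row else 0

def highlight(db, sentence, min_freq=1):
    # Steps 4 and 5: bracket both words of every bigram below the threshold.
    toks = tokens_of(sentence)
    suspect = set()
    for i, (w1, w2) in enumerate(zip(toks, toks[1:])):
        if frequency(db, w1, w2) < min_freq:
            suspect.update((i, i + 1))
    return " ".join(f"[{t}]" if i in suspect else t
                    for i, t in enumerate(toks))

# Hypothetical miniature "native corpus".
corpus = ["These reasons are too simple.",
          "I have three answers.",
          "The reasons are simple."]
db = build_database(corpus)
print(highlight(db, "These reason are too simple."))
```

Bracketing both words of a rare bigram also marks neighbours of the real error ("these" and "are" here), which is one reason different highlighting methods need to be compared.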

11
Grammar Checker Online
  • The links
  • http://140.122.83.250:4000/main (BNC)
  • http://140.122.83.250/search.php (Google)
  • http://140.122.83.245/ngram-check/ (BNC)

12
The Web Interface of Ngram Checker
13
(No Transcript)
14
(No Transcript)
15
A Simple Example
16
Evaluating Checker Performance: Any Standard
Way of Evaluating Checkers?
  • What kinds of errors should be used to test the
    grammar checker?
  • Fair assessment: same set of sentences.
  • How many sentences?
  • Many different categories and errors
  • Lexical factors.
  • NLP researchers: F-measure, precision, and
    recall
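
For reference, the standard definitions of precision, recall, and F-measure can be computed as below. The sketch assumes errors are identified by token position; the gold and flagged sets are made-up numbers, not results from the checkers discussed here.

```python
def precision_recall_f(flagged, gold, beta=1.0):
    """Return (precision, recall, F-beta) for flagged vs. gold error positions."""
    flagged, gold = set(flagged), set(gold)
    true_pos = len(flagged & gold)  # errors the checker flagged correctly
    precision = true_pos / len(flagged) if flagged else 0.0
    recall = true_pos / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    b2 = beta ** 2
    f = (1 + b2) * precision * recall / (b2 * precision + recall)
    return precision, recall, f

# Hypothetical example: 4 real errors, checker flags 3 positions, 2 correct.
gold = {2, 5, 9, 12}
flagged = {2, 5, 7}
p, r, f = precision_recall_f(flagged, gold)
print(f"precision={p:.2f} recall={r:.2f} F1={f:.2f}")
```

A conservative checker that flags little tends to score high on precision and low on recall, which is why both numbers are usually reported together.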

17
Testing with the CLEC Corpus from China
  • The size of the Chinese Learner English Corpus
    (CLEC):
  • A 1-million-word, error-tagged learner corpus.
  • With about 60 error types.
  • We selected some sentences (10 sentences) from
    the learner corpus and ran them through our
    ngram checkers.

18
1. Form
19
2. Verb Phrases (Tense)
20
3. Noun Phrases
21
4. Pronouns
22
5. Adjective Phrases
23
6. Prepositions: seems to be a difficult area
24
7. Conjunct Errors
25
8. Word Errors
26
9. Collocation Errors
27
10. Sentence Structure Errors
28
The Strengths of NTNU Ngram Checkers
  • Ngrams are good at detecting errors in local
    or adjacent domains. The checkers can indeed
    find many errors in CLEC:
  • Spelling
  • Word forms
  • Verb phrases
  • Noun phrases
  • Adjective phrases
  • Collocations

29
The Weaknesses of Ngram Checkers
  • They failed to catch the following effectively:
  • Tense errors
  • Conjunct errors
  • Fragments
  • Pronoun errors
  • Preposition errors
  • Run-on sentences
  • Missing words
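
One plausible reason for misses like these is that the evidence for the error lies outside the 2- or 3-word window. The toy example below (invented sentences, trigram window only) shows a tense error that depends on "Last year" at the start of the sentence going undetected, while a local error in the same sentence is caught.

```python
def trigrams(sentence):
    # Contiguous 3-word strings, lowercased, final period dropped.
    tokens = sentence.lower().rstrip(".").split()
    return [tuple(tokens[i:i + 3]) for i in range(len(tokens) - 2)]

# Hypothetical native sentences; both are correct English.
native = ["Last year my brother and I visited Taipei.",
          "Every year my brother and I visit Taipei."]
attested = {g for s in native for g in trigrams(s)}

def unseen(sentence):
    return [g for g in trigrams(sentence) if g not in attested]

# Tense error: "visit" should agree with "Last year", five words earlier.
nonlocal_error = unseen("Last year my brother and I visit Taipei.")

# Local error: the extra "to" sits right next to the verb.
local_error = unseen("Last year my brother and I visited to Taipei.")

print(nonlocal_error)  # every trigram is attested, so nothing is flagged
print(local_error)
```

Every trigram of the first learner sentence occurs somewhere in the corpus, so a pure lookup has nothing to flag; a rule that links the time adverbial to the verb tense would be needed.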

30
The Poor Performance of Ngram Checkers for Tense
and Conjunct Errors
31
Rule-based Checkers Can Perform Better for Some
Nonlocal Errors
32
Wintertree Grammar Checker
33
BUT the Ngram Checker Performed Better for Local Errors
  • I have some book. The informations are so rich.
    These researches are excellent. He is new
    friend. He cutted his finger. He enjoys to eat.
    He wants jumping into the river. I cannot
    decided about this. These reason are too simple.
    I has three answers.

34
What Can We Do to Improve Feedback from Ngram
Checkers?
  • Only highlighting and no detailed feedback?
  • We are facing a bigger challenge.
  • How can we recommend correct usage? How can we
    find correct examples for students?
  • If students only see the errors highlighted, they
    might still fail to correct the errors.
  • For agreement errors, tense errors, and confusable
    words, students might be able to self-correct.
  • However, for some tense errors, collocation
    errors, or preposition errors, learners might
    need more specific suggestions.
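
One way such suggestions could be generated is to look up which words most often appear in the problem slot in the native corpus. The sketch below is an assumption for illustration, not the method used in the project: it ranks the words that precede a given noun by bigram frequency, using a made-up four-sentence corpus and a hypothetical helper suggest_left_collocates.

```python
from collections import Counter

def bigram_counts(corpus):
    # Count adjacent word pairs across all sentences.
    counts = Counter()
    for sentence in corpus:
        tokens = sentence.lower().rstrip(".").split()
        counts.update(zip(tokens, tokens[1:]))
    return counts

def suggest_left_collocates(counts, right_word, top_k=3):
    # Words most frequently found immediately before right_word.
    # Note: no part-of-speech filtering, so non-verbs can appear too.
    candidates = Counter({w1: c for (w1, w2), c in counts.items()
                          if w2 == right_word})
    return [w for w, _ in candidates.most_common(top_k)]

# Hypothetical native sentences.
corpus = ["Exercise can improve life.",
          "Good habits improve life.",
          "Friends enrich life.",
          "Prices increase quickly."]
counts = bigram_counts(corpus)

# A learner wrote "increase life"; the pair is unattested in this corpus.
suggestions = suggest_left_collocates(counts, "life")
print(counts[("increase", "life")], suggestions)
```

In practice the candidate list would need filtering (for example by part of speech) and example sentences from the corpus to be useful as feedback.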

35
Find the Proper Collocates: increase and improve
life
36
Confusion between accept and receive: your apology
37
Future Directions for Improvement
  • Test with many different errors and find the
    strengths and limitations of ngram-based checkers
    and rule-based checkers
  • Use a tagged learner corpus to find error
    patterns in learner language
  • Feedback on the major error patterns can be
    added to ngram-based checkers
  • Better integration of the rule-based system and
    ngram checkers

38
  • Thanks for your attention
  • Questions and Discussions
  • hjchen_at_ntnu.edu.tw
  • National Taiwan Normal University