An overview of corpus-based studies of Chinese learners' English

Provided by: huns7

1
An overview of corpus-based studies of Chinese
learners' English

2
Acknowledgements
  • A large part of this presentation has been drawn
    from Prof. Wen Qiufang's presentation at the 2006
    International Conference on CALL. Some ideas also
    draw heavily on discussions with friends.
  • I am, however, solely responsible for any
    possible misinterpretations of Prof. Wen's ideas.

3
Topics Addressed
  • Learner corpora in mainland China
  • Corpus-based studies of interlanguage in mainland
    China
  • Problems and challenges

4
Learner Corpora in Mainland China
  • Corpora are becoming mainstream. (Svartvik 1996)
  • Today, the corpus is considered the default
    resource for almost anyone working in
    linguistics. No introspection can claim credence
    without verification through real language data.
    (Teubert 2005)

5
Learner Corpora in Mainland China
  • Chinese Learner English Corpus (CLEC)
  • First learner corpus in China
  • Components: ST-2, ST-3, ST-4, ST-5, ST-6
  • Error-tagged
  • Followed by a boom of research papers
  • Diagram label: COLEC

6
Learner Corpora in Mainland China
  • College Learners' Spoken English Corpus (COLSEC)
  • Transcriptions from CET
  • 0.7 million tokens
  • Manually tagged for mispronunciation and other
    oral language features
  • Tasks: teacher-student conversations,
    student-student discussions, teacher-student
    discussions
  • LINDSEI-China

7
Learner Corpora in Mainland China
  • Spoken and Written English Corpus of Chinese
    Learners (SWECCL)
  • SECCL
  • WECCL
  • Versions: 1.0; 2.0 (2 + 2 million tokens, under
    construction)

8
Learner Corpora in Mainland China
  • Spoken and Written English Corpus of Chinese
    Learners (SWECCL)
  • SECCL (1.18 million tokens, about 230 hours of
    speech)
  • Collected from TEM-4 (1996-2002)
  • Transcripts (1 million tokens)
  • Digital sound files
  • With test scores for in-depth studies (2
    independent scores)
  • Number of subjects: 6 groups from each year
    (1996-2002), 42 groups in total, about 1,400
    students
  • POS-tagged with CLAWS7

9
Learner Corpora in Mainland China
  • Spoken and Written English Corpus of Chinese
    Learners (SWECCL)
  • SECCL (1.18 million tokens, about 230 hours of
    speech)
  • Tasks

10
Learner Corpora in Mainland China
  • Spoken and Written English Corpus of Chinese
    Learners (SWECCL)
  • WECCL (1.46 million tokens)
  • Timed and untimed writing (by English majors)
  • Argumentative and narrative writing
  • Collected from students across 4 years at
    university
  • Essays in the corpus address a variety of topics
    (important!)
  • POS tagged with CLAWS7
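
A minimal sketch of how POS-tagged text of this kind might be counted, assuming the tags are stored in CLAWS-style horizontal format (each token written as word_TAG). The sample sentence and the helper names `split_token` and `tag_frequencies` are invented for illustration; they are not taken from WECCL, and the actual file layout of the corpus may differ.

```python
from collections import Counter


def split_token(token):
    """Split a 'word_TAG' token into (word, tag).

    The split is made at the last underscore, so words that contain
    underscores keep them. Tokens without a tag return (token, None).
    """
    word, sep, tag = token.rpartition("_")
    if not sep:
        return token, None
    return word, tag


def tag_frequencies(text):
    """Count how often each POS tag occurs in whitespace-separated text."""
    counts = Counter()
    for token in text.split():
        _, tag = split_token(token)
        if tag is not None:
            counts[tag] += 1
    return counts


# Invented example sentence with CLAWS7-style tags (not corpus data).
sample = "I_PPIS1 like_VV0 reading_VVG books_NN2 ._."
freqs = tag_frequencies(sample)

for tag, count in freqs.most_common():
    print(tag, count)
```

Tag counts like these are a common starting point for comparing learner groups, for example across the four university years covered by the corpus.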

11
Learner Corpora in Mainland China
  • SWECCL 2.0 (2007)
  • WECCL: two million tokens
  • SECCL: two million tokens

12
Learner Corpora in Mainland China
  • SECCL 2.0
  • 2003-2006 National Spoken English Test for
    second-year English majors (Band 4)
  • 2000-2006 National Spoken English Test for
    fourth-year English majors (Band 8), Task 3
  • A set of longitudinal data (2001-2004)

13
Learner Corpora in Mainland China
  • SECCL 2.0
  • New task in SECCL 2.0: make a comment on a given
    topic
  • More careful transcription and proofreading
  • Better sound quality
  • More compatible sound format

14
Learner Corpora in Mainland China
  • Longitudinal dataset in SECCL 2.0
  • 56 students
  • 40 hours of speech

15
Learner Corpora in Mainland China
  • Tasks in the longitudinal dataset in SECCL 2.0
  • Reading aloud
  • Retelling a story
  • Talking on a given topic (narrative)
  • Talking on a given topic (argumentative)
  • Conversation (role play)
  • Discussion on a given topic

16
Learner Corpora in Mainland China
  • Bilingual Corpus of Chinese English Learners
    (BICCEL), under construction
  • Collected from TEM-8 and other sources
  • Spoken and written
  • The first parallel learner corpus
  • Can better justify claims about L1 transfer
  • 2 million tokens upon completion

17
Learner Corpora in Mainland China
  • Bilingual Corpus of Chinese English Learners
    (BICCEL)
  • Spoken, E-C: 0.5 million
  • Spoken, C-E: 0.5 million
  • Written, E-C: 0.5 million
  • Written, C-E: 0.5 million

18
Corpus-based studies of interlanguage in mainland
China
  • Sources
  • China National Knowledge Infrastructure (CNKI)
    (online journals)
  • Digital dissertation database

22
Corpus-based studies of interlanguage in mainland
China
  • Conferences and workshops:
  • International Conference on Corpus Linguistics,
    25-27 October 2003, Shanghai
  • First National Symposium on Corpus Linguistics
    and ELT Education, 11-13 October 2004
  • Corpus Linguistics Week, 12-16 December 2005,
    Henan Normal University
  • Workshop on the use of corpus in teaching and
    research, 17-19 March 2006, Beijing

23
Corpus-based studies of interlanguage in mainland
China
  • A growing number of corpus linguists, corpus
    activists, and corpus enthusiasts.

24
Problems and challenges
  • Data: representativeness, tagging, openness
  • Reliability of methods
  • Subjects: learners at the tertiary level
  • Aspects of interlanguage studied: lexical,
    discoursal, etc.

25
Problems and challenges
  • Conclusions: shallow and unconvincing
  • Research issues: predominantly error analysis
  • Researchers: corpus-based studies are
    enthusiastically pursued by a small number of
    researchers who tend to cluster together
  • Thanks to www.corpus4u.com

26
Many thanks!