Title: An overview of corpus-based studies of Chinese learners' English
1An overview of corpus-based studies of Chinese
learners' English
2Acknowledgements
- A large part of this presentation has been drawn
from Prof. Wen Qiufang's presentation at the 2006
International Conference on CALL. Some ideas also
draw heavily on discussions with some of my friends.
- I am, however, solely responsible for any
possible misinterpretations of Prof. Wen's ideas.
3Topics Addressed
- Learner corpora in mainland China
- Corpus-based studies of interlanguage in mainland
China
- Problems and challenges
4Learner Corpora in Mainland China
- Corpora are becoming mainstream. (Svartvik 1996)
- Today, the corpus is considered the default
resource for almost anyone working in
linguistics. No introspection can claim credence
without verification through real language data.
(Teubert 2005)
5Learner Corpora in Mainland China
- Chinese Learners English Corpus (CLEC)
- First learner corpus in China
- Components
- ST-2
- ST-3
- ST-4
- ST-5
- ST-6
- Error-tagged
- Followed by a boom of research papers
COLEC
6Learner Corpora in Mainland China
- College Learners Spoken English Corpus (COLSEC)
- Transcriptions from CET
- 0.7 million tokens
- Manually tagged for mispronunciation and other
oral language features
- Tasks
- Teacher-student conversations
- Student-student discussions
- Teacher-student discussions
- LINDSEI-China
7Learner Corpora in Mainland China
- Spoken and Written English Corpus of Chinese
Learners (SWECCL)
- SECCL
- WECCL
- 1.0
- 2.0 (2 × 2 million tokens, under construction)
8Learner Corpora in Mainland China
- Spoken and Written English Corpus of Chinese
Learners (SWECCL)
- SECCL (1.18 million tokens, about 230 hours of
speech)
- Collected from TEM-4 (1996-2002)
- Transcripts (1 million tokens)
- Digital sound files
- With test scores for in-depth studies (2
independent scores)
- Number of subjects
- 6 groups from each year (1996-2002)
- 42 groups, about 1400 students
- POS tagged with CLAWS7
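The subject figures on this slide can be cross-checked with a quick calculation. The sketch below uses only the numbers given above (6 groups per year, 1996-2002, about 1400 students). The variable names are illustrative, and the average group size is only a rough derived figure, since the slide does not say that groups were equal in size.

```python
# Figures taken from the slide: 6 groups per year, TEM-4 sittings 1996-2002.
first_year, last_year = 1996, 2002
groups_per_year = 6
approx_students = 1400  # "about 1400" on the slide, so treat as approximate

years = last_year - first_year + 1          # inclusive year count
total_groups = groups_per_year * years      # should match the 42 groups stated
avg_group_size = approx_students / total_groups  # rough average, not a stated figure

print(f"{years} years x {groups_per_year} groups = {total_groups} groups")
print(f"average group size: about {avg_group_size:.0f} students")
```

The product agrees with the 42 groups reported, which suggests the year range is inclusive of both 1996 and 2002.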
9Learner Corpora in Mainland China
- Spoken and Written English Corpus of Chinese
Learners (SWECCL)
- SECCL (1.18 million tokens, about 230 hours of
speech)
- Tasks
10Learner Corpora in Mainland China
- Spoken and Written English Corpus of Chinese
Learners (SWECCL)
- WECCL (1.46 million tokens)
- Timed and untimed writing (by English majors)
- Argumentative and narrative writing
- Collected from students across 4 years at
university
- Essays in the corpus address a variety of topics
(important!)
- POS tagged with CLAWS7
11Learner Corpora in Mainland China
SWECCL
- WECCL: two million tokens
- SECCL: two million tokens
12Learner Corpora in Mainland China
- 2003-2006 National Spoken English Test for
second-year English majors (Band 4)
- 2000-2006 National Spoken English Test for
fourth-year English majors, Band 8 (Task 3)
- A set of longitudinal data (2001-2004)
13Learner Corpora in Mainland China
- New task in SECCL 2.0
- Make a comment on a given topic
- More careful transcription and proof-reading
- Better sound quality
- More compatible sound format
14Learner Corpora in Mainland China
- Longitudinal dataset in SECCL 2.0
- 56 students
- 40 hours of speech
15Learner Corpora in Mainland China
- Tasks in the longitudinal dataset in SECCL 2.0
- Reading aloud
- Retelling a story
- Talking on a given topic (narrative)
- Talking on a given topic (argumentative)
- Conversation (role play)
- Discussion on a given topic
16Learner Corpora in Mainland China
- Bilingual Corpus of Chinese English Learners
(BICCEL), under construction
- Collected from TEM-8 and other sources
- Spoken and written
- The first parallel learner corpus
- Can better justify claims about L1 transfer
- 2 million tokens upon completion
17Learner Corpora in Mainland China
- Bilingual Corpus of Chinese English Learners
(BICCEL)
BICCEL
- Spoken: E-C (0.5 million), C-E (0.5 million)
- Written: E-C (0.5 million), C-E (0.5 million)
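The component labels on this slide appear to describe a 2 × 2 design (spoken/written by E-C/C-E), with 0.5 million tokens per cell. The sketch below tabulates that reading and checks it against the 2 million tokens planned on completion. The 2 × 2 layout and the gloss of E-C and C-E as translation directions are assumptions made for illustration, not details confirmed by the slides.

```python
from itertools import product

# Assumed reading of the slide: two modes crossed with two directions.
modes = ["spoken", "written"]
directions = ["E-C", "C-E"]   # presumably English-to-Chinese and Chinese-to-English
tokens_per_cell = 500_000     # 0.5 million per sub-corpus, as listed on the slide

# Build the planned design as a mapping from (mode, direction) to token count.
design = {(m, d): tokens_per_cell for m, d in product(modes, directions)}
total = sum(design.values())

for (mode, direction), size in design.items():
    print(f"{mode:8s}{direction}: {size:,} tokens")
print(f"total: {total:,} tokens")
```

Under this reading the four sub-corpora sum to the 2 million tokens stated for the completed corpus.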
18Corpus-based studies of interlanguage in mainland
China
- Sources
- China National Knowledge Infrastructure
(CNKI) (online journals)
- Digital dissertation database
19Corpus-based studies of interlanguage in mainland
China
20Corpus-based studies of interlanguage in mainland
China
21Corpus-based studies of interlanguage in mainland
China
22Corpus-based studies of interlanguage in mainland
China
- Conferences and workshops
- The International Conference on Corpus
Linguistics, 25-27 October 2003, Shanghai
- The First National Symposium on Corpus
Linguistics and ELT Education, 11-13 October 2004
- Corpus Linguistics Week, 12-16 December 2005,
Henan Normal University
- Workshop on the Use of Corpora in Teaching and
Research, 17-19 March 2006, Beijing
23Corpus-based studies of interlanguage in mainland
China
A growing number of corpus linguists, corpus
activists, and corpus enthusiasts.
24Problems and challenges
- Data: representativeness, tagging, openness
- Reliability of methods
- Subjects: learners at the tertiary level
- Aspects of interlanguage studied: lexical,
discoursal, etc.
25Problems and challenges
- Conclusions: shallow, unconvincing conclusions
- Research issues: predominantly error analysis
- Researchers: corpus-based studies are
enthusiastically pursued by a small number of
researchers who are inclined to cluster together
- Thanks to www.corpus4u.com
26100 thanks!