A Preliminary Study of the 2000 Basic English Word List in Taiwan



1
A Preliminary Study of the 2000 Basic English
Word List in Taiwan
  • Vincent Su

2
Outline
  • I. Introduction
  • II. Research Questions
  • III. Instruments
  • IV. Discussion
  • V. Conclusion

3
I. Introduction
  • 1. Importance of Word List
  • 2. Basic Criteria for Making a Word List
  • 3. Well-known Word Lists

4
1. Importance of Word List
  • Four kinds of word list (Nation, 2001)
  • high-frequency words
  • academic words
  • technical words
  • low-frequency words
  • High-frequency words
  • Common core or start-up vocabulary for beginners.
  • This small number of high frequency words makes
    up most of the words learners meet.
  • With these words, beginners can feel empowered to
    do things with the language.
  • Master the high-frequency words: many studies
    have suggested that second language learners need
    first to concentrate on the high-frequency words,
    and the return for learning them is very great
    (Nation, 1993; Meara, 1995; Nation & Waring, 1997;
    Waring, 2000; Nation, 2001).

5
2. Basic Criteria for Making a Word List
  • When making a list of high-frequency words, both
    frequency and range must be considered.
  • Frequency: as not all words are equally useful,
    one measure of usefulness is word frequency, that
    is, how often the word occurs in normal use of the
    language (Nation, 1997).
  • Range: measured by counting how many different
    texts or subcorpora each word occurs in. A word
    with wide range occurs in many different texts or
    subcorpora (Nation, 2001, p. 16).
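The two measures can be made concrete with a short sketch. It assumes a simple regular-expression tokenizer and treats each subcorpus as one string; the function names and the sample subcorpora are hypothetical and are not taken from RANGE.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Hypothetical tokenizer: lower-case, keep runs of letters/apostrophes.
    return re.findall(r"[a-z']+", text.lower())


def frequency_and_range(subcorpora: dict[str, str]) -> dict[str, tuple[int, int]]:
    """Map each word to (frequency, range).

    frequency: total occurrences across all subcorpora
    range:     number of subcorpora in which the word occurs at least once
    """
    freq: Counter = Counter()
    rng: Counter = Counter()
    for text in subcorpora.values():
        counts = Counter(tokenize(text))
        freq.update(counts)          # add this subcorpus's counts
        rng.update(counts.keys())    # +1 for each subcorpus the word appears in
    return {word: (freq[word], rng[word]) for word in freq}


# Hypothetical mini-corpus with three subcorpora.
stats = frequency_and_range({
    "news": "The dog barked. The dog ran.",
    "fiction": "A dog slept.",
    "academic": "The data suggest a trend.",
})
print(stats["dog"], stats["barked"])  # (3, 2) (1, 1)
```

A word such as "barked" may be frequent inside one subcorpus yet have a range of 1, which is why both measures are considered together.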

6
3. Well-known Word Lists
  • Table 1. Well-known English Word Lists

7
Word Lists in Taiwan
  • Table 2. English Word Lists in Taiwan

8
II. Research Questions
  • 1. Word form and size: the GSL vs. the TBEWL
  • What is the size of the lists in word families
    and types?
  • What is the distribution of words across the 26
    letters of the alphabet?
  • What is the distribution of content and function
    words?
  • 2. Overlap: Does the GSL contain most of the
    TBEWL?
  • 3. Coverage: Does the TBEWL provide better
    coverage than the GSL?
  • 4. Range: Do most of the words in the lists occur
    in a range of texts?

9
III. Instruments
  • 1. RANGE and FREQUENCY programs
  • 2. Taiwan Basic English Word List (TBEWL)
  • 3. The General Service List (GSL)

10
1. RANGE and FREQUENCY programs
  • RANGE: two programs,
    http://www.vuw.ac.nz/lals/staff/paul-nation/RANGE32.zip
  • Range32 (Vocabulary profile)
  • Here is a sample table from RANGE.
  • What common vocabulary is found in all these
    texts?
  • How large a vocabulary is needed to read this
    text?
  • If a learner has a vocabulary of 2,000 words, how
    much of the vocabulary in the text will be
    familiar to the learner?
  • What are the words in the text which the learner
    is not likely to know?
  • How well does the course book prepare learners
    for the vocabulary in newspapers?
  • How rich a vocabulary do second language learners
    use in their free writing?
  • What is needed to run RANGE?
  • base word lists (BASEWRD1.txt, BASEWRD2.txt,
    BASEWRD3.txt, etc.)
  • text files in ASCII (DOS) format.

[Diagram: RANGE base word lists vs. texts]
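RANGE's core calculation, comparing a text against its base word lists, can be sketched as follows: every token is looked up in the lists and the share of tokens falling in each list is reported. This is not RANGE's own code; it assumes the base lists are already expanded into sets of word types and uses a simple tokenizer. The function name and sample lists are hypothetical.

```python
import re
from collections import Counter


def range_style_profile(text: str, base_lists: list[set[str]]) -> dict[str, float]:
    """Percentage of tokens in `text` that fall in each base list.

    base_lists[0] plays the role of base list one, base_lists[1] of list two, ...
    The lists are assumed not to overlap, so each token is counted in at most one list.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return {}
    counts: Counter = Counter()
    for tok in tokens:
        for i, words in enumerate(base_lists, start=1):
            if tok in words:
                counts[f"list {i}"] += 1
                break
        else:  # no base list contains the token
            counts["not in the lists"] += 1
    return {label: round(100 * n / len(tokens), 2) for label, n in counts.items()}


# Hypothetical base lists of word types.
profile = range_style_profile(
    "The cat sat on the mat",
    [{"the", "on", "sat"}, {"cat"}],
)
print(profile)  # {'list 1': 66.67, 'list 2': 16.67, 'not in the lists': 16.67}
```

Requiring non-overlapping lists keeps the percentages summing to 100, which is one reason the TBEWL 1st and 2nd 1000 base files are built with no shared word types.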
11
1. RANGE and FREQUENCY programs
  • Frequency32
  • Runs on an ASCII text.
  • Makes a frequency list of all the words in a
    single text.
  • Runs on only one text at a time.
  • Output
  • Order: an alphabetical list or a frequency-ordered
    list.
  • Rank order of the words, their raw frequency, and
    the cumulative percentage frequency.
  • Here is some sample output from FREQUENCY.
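The frequency-ordered output described above (rank, raw frequency, cumulative percentage) can be approximated in a few lines. This is a hypothetical sketch, not FREQUENCY itself; in particular, the tokenizer and the tie-breaking order are assumptions.

```python
import re
from collections import Counter


def frequency_list(text: str) -> list[tuple[int, str, int, float]]:
    """Rows of (rank, word, raw frequency, cumulative % of tokens), most frequent first."""
    tokens = re.findall(r"[a-z']+", text.lower())
    total = len(tokens)
    rows = []
    running = 0
    # most_common() keeps ties in order of first appearance in the text
    for rank, (word, n) in enumerate(Counter(tokens).most_common(), start=1):
        running += n
        rows.append((rank, word, n, round(100 * running / total, 2)))
    return rows


for row in frequency_list("To be or not to be"):
    print(row)
# (1, 'to', 2, 33.33)
# (2, 'be', 2, 66.67)
# (3, 'or', 1, 83.33)
# (4, 'not', 1, 100.0)
```

A word's own share is its cumulative percentage minus the previous row's. This is the same subtraction used in the FREQUENCY sample output (14.29 minus 11.28).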

12
2. Taiwan Basic English Word List (TBEWL)
  • Date of announcement
  • 21 January, 2003 by Ministry of Education
  • The TBEWL was based on many word lists and
    corpora:
  • (1) Word lists from the curricular standards used
    by elementary and junior high schools in Taiwan,
    Korea, Japan, and Shanghai
  • (2) Word lists from the Taiwan college entrance
    examination center
  • (3) The most frequent word lists from the U.S.,
    U.K., South Africa, and Japan
  • (4) The Collins COBUILD corpus

13
2. Taiwan Basic English Word List (TBEWL)
  • The TBEWL was adjusted according to
  • (1) the life experience of elementary and junior
    high school students in Taiwan
  • (2) standards for English language learning
  • (3) the environment for foreign language learning
  • The TBEWL includes
  • (1) the most basic 1000 words (TBEWL 1000)
  • (2) the most useful 2000 words (TBEWL 2000),
    including the first 1000 words.

14
Making the TBEWL 1st and 2nd 1000 Words
  • Make an individual headword list
  • Delete the derivational, inflectional, or
    synonymous words given in parentheses
  • Separate compounds (ice cream) and phrases (get up)
    into individual words
  • Make word families of the TBEWL
  • A common basis for comparison: the word family
  • Dictionaries
  • Cambridge Advanced Learner's Dictionary
  • Collins COBUILD Advanced Learner's English
    Dictionary
  • WordNet 2.1
  • Add the letters of the alphabet to the TBEWL
    1st 1000 word families
  • Test the TBEWL 1st and 2nd 1000 words as the base
    word files (basewrd1.txt and basewrd2.txt) in
    RANGE (no overlap between the 1st and 2nd 1000
    word types)
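The first preparation steps can be sketched as follows: drop parenthesised forms, split compounds and phrases into single words, and keep the 1st and 2nd 1000 type sets disjoint. The entry format and sample entries are hypothetical illustrations of the steps listed, not the actual TBEWL source files.

```python
import re


def headwords(entries: list[str]) -> set[str]:
    """Turn raw word-list entries into a set of single-word headwords.

    - anything in parentheses is dropped, e.g. "bookstore (bookshop)" -> bookstore
    - compounds and phrases are split, e.g. "ice cream" -> ice, cream
    """
    words = set()
    for entry in entries:
        entry = re.sub(r"\([^)]*\)", " ", entry)  # delete parenthesised forms
        words.update(w.lower() for w in re.findall(r"[A-Za-z'-]+", entry))
    return words


def split_levels(first_entries: list[str], second_entries: list[str]) -> tuple[set[str], set[str]]:
    """Build 1st and 2nd 1000 type sets with no overlap between them."""
    first = headwords(first_entries)
    second = headwords(second_entries) - first  # a type may appear in one level only
    return first, second


first, second = split_levels(
    ["ice cream", "get up", "bookstore (bookshop)"],  # hypothetical entries
    ["cream cheese", "upstairs"],
)
print(sorted(first))   # ['bookstore', 'cream', 'get', 'ice', 'up']
print(sorted(second))  # ['cheese', 'upstairs']
```

Removing first-level types from the second level reflects the "no overlap" condition needed before the two sets are used as separate RANGE base files.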

15
Making the TBEWL 1st and 2nd 1000 Word Families
  • Table 3. TBEWL 1st and 2nd 1000

16
3. The General Service List (GSL)
  • Why does this study choose the GSL?
  • It has a similar purpose and a similar number of
    words.
  • The GSL still remains the best of the available
    lists (Nation, 1993, 1997; Nation & Hwang, 1995;
    Nation & Waring, 1997; Coxhead, 2000) because of
  • (1) its information about the frequency of
    meanings
  • (2) West's careful application of criteria other
    than frequency and range
  • (3) its coverage of 78% to 92% (82% mean coverage)
    of various kinds of written text
  • (4) its use as the basis for many series of graded
    readers
  • Criteria to select these words (West, 1953)
  • Frequency
  • Ease or difficulty of learning (cost)
  • Necessity
  • Cover
  • Stylistic level
  • Intensive and emotional words

17
Making the GSL 1st and 2nd 1000 Word Families
  • Table 4. GSL 1st and 2nd 1000
  • The letters of the alphabet, numbers, days of the
    week, and months of the year are added in making
    the GSL 1st 1000 word families.
  • Adapted from the base word files in RANGE.

18
IV. Discussion
  • 1. Word form and size
  • Word family and types
  • Word distribution in 26 letters of alphabet
  • Content and function words
  • 2. Overlap: Does the GSL contain most of the
    TBEWL?
  • 3. Coverage: Does the TBEWL provide better
    coverage than the GSL?
  • 4. Range: Do most of the words in the lists occur
    in a range of texts?

19
1. Word form and size
  • Table 5. Word family and types in GSL and TBEWL
  • The TBEWL has slightly more words in its original
    word list because it includes some compounds (air
    conditioner, hair dresser, ice cream, ...) and
    phrases (a little, a few, ...).
  • After extending to word families, the GSL has
    more word families with more types, because the
    TBEWL's original list already includes numbers,
    months, and days of the week, and has more
    function words and more food, animal, and insect
    words, which are not included in the GSL.

20
1. Comparison of Word Forms
  • Table 6. Word distribution in 26 letters of
    alphabet
  • The distribution across the letters of the
    alphabet is similar in the GSL and the TBEWL.
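A distribution like the one in Table 6 can be computed by counting headwords by their initial letter. The function and toy word list below are hypothetical; they are not the study's data.

```python
from collections import Counter
from string import ascii_lowercase


def initial_letter_distribution(words: list[str]) -> dict[str, float]:
    """Percentage of words beginning with each letter a-z (unused letters get 0.0)."""
    counts = Counter(w[0].lower() for w in words if w and w[0].isalpha())
    total = sum(counts[c] for c in ascii_lowercase)
    return {c: round(100 * counts[c] / total, 1) if total else 0.0
            for c in ascii_lowercase}


dist = initial_letter_distribution(["apple", "ant", "bee", "cat"])  # toy list
print(dist["a"], dist["b"], dist["z"])  # 50.0 25.0 0.0
```

Sorting the result by value gives the most and least common initial letters, which is the comparison the study draws between the two lists.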

21
1. Comparison of Word Forms
  • Table 7. Distribution of Function and Content
    Words
  • The TBEWL has more function words because it
    already includes numbers and pronouns.

22
2. Overlap: Does the GSL contain most of the
TBEWL?
  • Table 8. TBEWL 2001 words in GSL 1952 Words

23
2. Overlap: Does the GSL contain most of the
TBEWL?
  • Table 9. TBEWL 2001 Words in GSL 7827 Word Types
  • Overlap: 1506 (75.3%) words
  • TBEWL not in GSL: 495 (24.7%) words
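The overlap figures are a set intersection and a set difference divided by the size of the TBEWL. The sketch below uses dummy placeholder sets sized like the study (2001 TBEWL words, 1506 of them among the GSL types) only to check the arithmetic; the function name is hypothetical and the words are not the real lists.

```python
def overlap_report(tbewl: set[str], gsl_types: set[str]) -> dict[str, tuple[int, float]]:
    """Count TBEWL words found / not found among GSL word types, with percentages."""
    n = len(tbewl)
    found = tbewl & gsl_types
    missing = tbewl - gsl_types
    return {
        "overlap": (len(found), round(100 * len(found) / n, 1)),
        "not in GSL": (len(missing), round(100 * len(missing) / n, 1)),
    }


# Dummy sets with the study's sizes; the words themselves are placeholders.
tbewl = {f"w{i}" for i in range(2001)}
gsl_types = {f"w{i}" for i in range(1506)} | {"extra"}
print(overlap_report(tbewl, gsl_types))
# {'overlap': (1506, 75.3), 'not in GSL': (495, 24.7)}
```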

24
TBEWL not in GSL: 495 (24.7%) words
  • High-frequency words: according to the BNC lemma
    frequencies per million words, some of these
    words are highly frequent, such as
  • area (585), affect (133), assume (112), available
    (272), contact (140), contract (175), couple
    (152), create (217), and design (266), which may be
    worth including in the list.
  • Low-frequency words: many of these words are of
    rather low frequency, and some are not even
    included in the BNC frequency list, such as words
    in the food category and culture-specific words.
    Most of them relate to daily life or to local
    culture. They are a special feature of the TBEWL,
    and they may lower its coverage compared with the
    GSL in the next section.
  • Low-frequency words (not in the BNC list of words
    occurring 10 or more times per million) include
    alphabet, armchair, badminton, bakery, banana,
    barbecue, bark, baseball, basement, basketball,
    blackboard, blouse, bookcase, bookstore, brunch,
    buffet, bug, bun, burger, cabbage, cafeteria,
    campus, candy, carrot, cartoon, centimeter,
    cereal, chess, chopsticks, chubby, clap,
    classmate, closet, cockroach, coke, comic,
    conditioner, congratulation, considerate, cookie,
    couch, cowboy, crab, crayon, cute, dentist,
    dessert, dial, diligent, dinosaur, dizzy, dodge,
    doughnut, downtown, dresser, drugstore, dumb,
    dumpling, earrings, seafood, semester, shrimp,
    shorts, skate, ski, sneakers, steak, sweater,
    swimsuit, Thanksgiving, pork, wok

25
TBEWL not in GSL
  • (1) Culture specific words
  • chopsticks, dumpling, typhoon, Halloween,
    Thanksgiving, cowboy
  • (2) Food and drink
  • bakery, banana, barbecue, beef, beer, brunch,
    buffet, bun, burger, cabbage, cafeteria, candy,
    carrot, cereal, chocolate, coke, cookie, crab,
    delicious, dessert, doughnut, dumpling, mango,
    noodle, peach, pear, pork, pumpkin, seafood,
    shrimp, steak
  • (3) Sports and games
  • badminton, baseball, basketball, bike, chess,
    skate, ski, tennis
  • (4) School
  • biology, blackboard, bookcase, bookstore, campus,
    chart, chemistry, classmate, classroom, crayon,
    debate, eraser, quiz, semester, textbook, workbook

26
TBEWL not in GSL
  • (5) Animals and insects
  • ant, bark, bee, bug, butterfly, cockroach,
    dinosaur, dolphin, dragon, eagle, mosquito,
    panda, shark, spider, tiger, wolf
  • (6) House and apartments
  • balcony, bathroom, bench, blanket, carpet,
    ceiling, closet, conditioner, couch, decorate
  • (7) Clothing and accessories
  • blouse, earrings, pajamas, pants, scarf, shorts,
    sneakers, sweater, swimsuit, t-shirt, trousers,
    vest, wallet
  • (8) Countries and proper names
  • China, Chinese, America, Taiwan, ROC, USA, MRT
  • (9) Computers and technology
  • computer, e-mail, Internet

27
3. Coverage: Does the TBEWL provide better
coverage than the GSL?
  • Coverage refers to the percentage of tokens in a
    text that are accounted for (covered by) a
    particular word list. The corpora used in the
    comparison are
  • (1) VOA corpus: a 1,300,000-token corpus of VOA
    written scripts.
  • (2) Literature corpus: a 4,290,000-token corpus of
    fiction, stories, and fairy tales from Project
    Gutenberg and The Baldwin Online Children's
    Literature Project.
  • (3) Academic corpus: a 632,000-token corpus of
    English academic texts from thesis abstracts and
    online journals such as The Journal of Community
    Informatics, TESL-EJ, The Internet TESL Journal,
    Language Learning and Technology, ReCALL, Reading
    in a Foreign Language, and Working Papers in
    TESOL and Applied Linguistics.
  • (4) Examination and textbook corpus: a 30,900-token
    corpus of (A) English examination texts from
    sample GEPT elementary-level tests and the Basic
    Competence Test for Junior High School Students,
    2001-2005, and (B) worksheets (reading, writing,
    activities, examinations, grammar, cultural
    supplements) from a Longman junior high school
    English textbook (Lessons 1-6 in Books 1-3).
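Coverage as defined above counts a running word as covered if it belongs to any word family in the list. A minimal sketch, assuming each family is given as a headword plus a set of member forms and using a simple tokenizer; the function name and the toy list are hypothetical.

```python
import re


def coverage(texts: list[str], families: dict[str, set[str]]) -> float:
    """Percentage of running words (tokens) in `texts` covered by a word-family list.

    A token counts as covered if it is a headword or any member of a headword's family.
    """
    forms = set()
    for head, members in families.items():
        forms.add(head)
        forms.update(members)
    tokens = [t for text in texts for t in re.findall(r"[a-z']+", text.lower())]
    if not tokens:
        return 0.0
    covered = sum(1 for t in tokens if t in forms)
    return round(100 * covered / len(tokens), 2)


families = {"add": {"added", "adding", "addition"}, "the": set()}  # toy list
print(coverage(["The addition was added.", "Add the rest."], families))  # 71.43
```

Pooling the tokens of all texts before dividing gives a corpus-level figure, so large texts weigh more than small ones.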

28
3. Coverage: Does the TBEWL provide better
coverage than the GSL?
  • Table 10. Percentage coverage of a range of
    corpora by the lists from the GSL and TBEWL
  • The GSL provides slightly better coverage in most
    of the corpora, except the Exam-Textbook corpus.
  • The TBEWL has better coverage in the Examination
    and Textbook corpus, which shows its local color.

29
4. Range: Do most of the words in the lists occur
in a range of texts?
  • Using the same corpora, the range analysis checks
    whether all the words in the lists are actually
    used. That is, does every word family (headword)
    in the lists occur in the various corpora? Some
    words in the lists may seem useful but do not
    occur.
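The range check above asks, for each corpus, what share of the list's word families occur at least once. A minimal sketch, assuming families are given as headword plus member forms; the function name, toy list, and toy corpora are hypothetical.

```python
import re


def family_range(families: dict[str, set[str]], corpora: dict[str, str]) -> dict[str, float]:
    """Per corpus, the percentage of word families with at least one form occurring in it."""
    result = {}
    for name, text in corpora.items():
        types = set(re.findall(r"[a-z']+", text.lower()))
        hits = sum(1 for head, members in families.items() if ({head} | members) & types)
        result[name] = round(100 * hits / len(families), 1)
    return result


families = {"add": {"added"}, "cat": {"cats"}, "dog": set(), "eat": {"ate"}}  # toy list
print(family_range(families, {"A": "He added two cats.", "B": "The dog slept."}))
# {'A': 50.0, 'B': 25.0}
```

A family counts as present if any of its forms occurs, so "added" alone is enough for the ADD family.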

30
4. Range: Do most of the words in the lists occur
in a range of texts?
  • Table 11. Percentage of word families in the
    lists occurring in various corpora
  • The GSL consists of 1986 (998 + 988) word
    families. The TBEWL consists of 1963 (985 + 978)
    word families.
  • The TBEWL lists are fractionally better than the
    GSL in the VOA and Exam-Textbook corpora, while the
    GSL is better in the Literature and Academic
    corpora.
  • Overall, the TBEWL lists show a slightly better
    distribution across the different corpora.

31
V. Conclusion
  • 1. Comparison of Word Forms
  • Although the TBEWL has slightly more words in its
    list, it yields slightly fewer word families and
    types once its words are expanded into word
    families.
  • The word distribution across the 26 letters of
    the alphabet is similar in both lists. The three
    most common initial letters are S, C, and P, and
    the two least common are X and Z.
  • The TBEWL has slightly more function words than
    the GSL.
  • 2. There are 1552 (76%) TBEWL words also found in
    the GSL, while 499 (24%) TBEWL words are not in
    the GSL.
  • 3. The GSL provides slightly better coverage in
    most of the corpora, except the Exam-Textbook
    corpus.
  • 4. In range, the TBEWL lists are generally
    fractionally better than the GSL.

32
V. Conclusion
  • 1. Answers to the Research Questions
  • 2. Findings of the Study
  • 3. Limitation of the Study
  • 4. Implications for Vocabulary Teaching and
    Learning

33
1. Answers to the Research Questions
  • (1) Comparison of Word Forms and Size
  • A. Although the TBEWL has slightly more words in
    its list, it yields slightly fewer word families
    and types once its words are expanded into word
    families.
  • B. The word distribution across the 26 letters of
    the alphabet is similar in both lists. The three
    most common initial letters are S, C, and P, and
    the two least common are X and Z.
  • C. The TBEWL has slightly more function words
    than the GSL.

34
1. Answers to the Research Questions
  • (2) There are 1506 (75.3%) TBEWL words also found
    in the GSL, while 495 (24.7%) TBEWL words are not
    in the GSL.
  • (3) The GSL provides slightly better coverage in
    most of the corpora, except the Exam-Textbook
    corpus.
  • (4) In range, the TBEWL lists are generally
    fractionally better than the GSL across the four
    corpora.

35
2. Findings of the Study
  • The GSL shows better coverage and range in most of
    the corpora, which suggests it is still a very
    good word list in spite of its age.
  • Compared with the GSL, the TBEWL has fewer word
    families (-23) and types (-564), but it shows
    similar coverage and range across different
    corpora. The TBEWL seems to be a workable word
    list.
  • Although 495 TBEWL words are not found in the
    GSL, these words relate to daily life and
    culture, such as food and drink, sports, school,
    animals and insects, house and apartments, and
    clothing and accessories. In spite of their low
    frequencies, they may be useful for beginners and
    for students up to junior high school level in
    Taiwan.
  • The TBEWL shows much better coverage and range in
    the Exam-Textbook corpus, which indicates that it
    is a good word list with local Taiwanese color.

36
3. Limitation of the Study
  • Limitation of the Range software
  • A. Compounds and contractions cannot be
    identified by the Range software.
  • B. Homographs were counted under the same word
    family (e.g., May, the month, and may, the
    auxiliary verb, were counted under the same
    headword).
  • Limitation of Corpus
  • A. Only four kinds of corpora were collected in
    this study.
  • B. Each kind of corpus covers only a limited range
    of texts.
  • Future research
  • Future studies should include larger corpora with
    a wider range, especially the textbook corpus, so
    that the comparison of coverage and range will be
    more stable and convincing.

37
4. Implications for Vocabulary Teaching and
Learning
  • Teachers can see the differences between the two
    word lists, know which words are special to the
    TBEWL, and plan how to teach these words.
  • The RANGE software, with the TBEWL 1st and 2nd
    1000 as its base word files, can be used to
    calculate the coverage of any text. This gives
    teachers a convenient tool for checking whether a
    text is suitable for their students.
  • Besides the headwords, teachers should also
    introduce the word family of each headword,
    including its prefixes, suffixes, derivations,
    and inflections.
  • Teachers and students should not focus only on
    the word list. Although decontextualized learning
    of vocabulary is effective, learning from context
    is still necessary to broaden word knowledge.
    Students also need to learn how to use the words
    productively to deepen their word knowledge.

38
Thanks
Comments and Discussion
su@ntjcpa.edu.tw
http://www.opensource.idv.tw/
39
Word family
  • A word family consists of a headword, its
    inflected forms, and its closely related derived
    forms. That is, it includes both closely related
    inflected and derived forms even if the part of
    speech is not the same. Here is an example:
  • ADD
  • ADDED
  • ADDING
  • ADDITION
  • ADDITIONAL
  • ADDITIVE
  • ADDITIONS
  • ADDS
  • Major problem: what should be included in a word
    family and what should not.
  • Bauer, Laurie and Paul Nation (1993). Word
    Families. International Journal of Lexicography,
    6(4), 253-279.
  • A word family consists of a base word and all its
    derived and inflected forms. These are important
    as it is believed that they can be understood by
    learners without them having to learn each form
    separately. For example, in the Vocabprofile
    programme the word family grouped under the
    headword ABLE includes abler, ablest, ably, and
    unable. For more information on the
    classification criteria used, see Bauer & Nation
    (1993).

ADMIT, ADMISSION, ADMITTEDLY, ADMITS, ADMITTED, ADMITTING
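A word family works as a lookup from any inflected or derived form back to its headword, and that lookup can be sketched as follows. The input layout is an assumption loosely modelled on RANGE's base word files, with a headword at the start of a line and its members on the following indented lines. Check it against the actual files before relying on it. The function name is hypothetical.

```python
def parse_families(listing: str) -> dict[str, str]:
    """Map every form in a family listing to its headword.

    Assumed layout: an unindented line starts a family (the headword);
    indented lines that follow are members of that family. Only the
    first whitespace-separated token on each line is used.
    """
    form_to_head: dict[str, str] = {}
    head = None
    for line in listing.splitlines():
        if not line.strip():
            continue                      # skip blank lines
        word = line.split()[0].lower()
        if not line[0].isspace():         # unindented: a new family begins
            head = word
            form_to_head[word] = head
        elif head is not None:            # indented: member of the current family
            form_to_head[word] = head
    return form_to_head


listing = "ADD\n\tADDED\n\tADDITION\n\tADDS\nABLE\n\tUNABLE\n"
families = parse_families(listing)
print(families["addition"], families["unable"])  # add able
```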
40
Sample Output from RANGE
  • This shows that 54 of the running words in the
    text are in base list one and these 54 words make
    up 72% of the total running words in the text. In
    the word list column, one, two, and three refer to
    each of the base lists.

41
Sample output from FREQUENCY
  • In the example, the word type "a" is the third
    most frequent word. It occurs 108 times in the
    text, and together with "the" and "of" it covers
    14.29% of the text. On its own it covers 3.01%
    (14.29% minus 11.28%) of the text. See the
    beginning of this set of instructions to see how
    to run FREQUENCY.

42
Web Vocabulary Profilers: http://132.208.224.131/vp/
43
Word Frequency Text Profiler: http://www.edict.com.hk/textanalyser/
44
GSL & AWL vs. TBEWL
  • Table 10. Distribution of TBEWL 2000 in GSL
    Headwords and Family Words

45
  • Types are different from tokens in that the exact
    same word form represents only one type. Thus
    Hamlet's famous "To be or not to be" counts as six
    tokens but only four types, due to the repetitions.
  • A token is a string of letters making up an
    individual word. Thus Hamlet's famous "To be or
    not to be" counts as six tokens irrespective of
    the repetitions.
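The token/type distinction can be checked directly on the Hamlet line. This is a minimal sketch with a hypothetical tokenizer that ignores case, so "To" and "to" count as one type.

```python
import re


def tokens_and_types(text: str) -> tuple[int, int]:
    """Return (number of tokens, number of types); case is ignored."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return len(tokens), len(set(tokens))


print(tokens_and_types("To be or not to be"))  # (6, 4)
```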

46
  • Nation and Hwang (1995)
  • Replacing 452 of the words in the GSL with 250
    words of higher frequency across a range of genres
    results in only about a 1% gain in coverage (from
    82.3% to 83.4%).
  • Nation (1993)
  • This list is rather old, based on work done in
    the 1930s and 1940s. However, it still remains the
    most useful one available, as the relative
    frequency of the various meanings of each word is
    given.
  • Older series of graded readers are based on this
    list.