Harnessing Corpora for real and virtual ELT purposes - PowerPoint PPT Presentation

About This Presentation
Title:

Harnessing Corpora for real and virtual ELT purposes

Description:

The Longman Grammars of English (Quirk, Greenbaum, Svartvik, Leech and others) ... BIBER, D., S. JOHANSSON, G. LEECH, S. CONRAD & E. FINEGAN. 1999. ... – PowerPoint PPT presentation

Slides: 21
Provided by: Beli153
Learn more at: http://web.letras.up.pt
Transcript and Presenter's Notes

Title: Harnessing Corpora for real and virtual ELT purposes


1
Harnessing Corpora for real and virtual ELT
purposes
  • IFELT
  • Belinda Maia
  • FLUP
  • 10/11.2003

2
What is a corpus?
  • CORPUS (13c, from Latin corpus, 'body'; plural
    corpora)
  • A body of texts, utterances or other specimens
    considered more or less representative of a
    language, stored as an electronic database.
  • A corpus may store many millions of
    running words
  • A corpus can be tagged to identify and classify
    words and other formations
  • A corpus can be searched using concordancing
    programmes

3
An example of concordancing (from the BNC)
  • A0R 2231 Maybe with twists of bacon.
  • A35 256 This substantial, 15-minute orchestral
    movement was inspired by three paintings of
    Innocent X by Francis Bacon, themselves based on
    Velasquez.
  • A6N 1311 They could cook vegetables and meat
    simply, deal with eggs and bacon and porridge,
    and they were able to bake and housekeep,
    learning as they went along.
  • AAX 286 Sir Richard Body, MP Hirohito, shy god
    who liked bacon & eggs.
  • ABB 67 Remembering bacon and ham, the versatility
    of the pig can be stretched to pies, sandwiches
    and ham, egg and chips.
  • ABB 236 The Smoked Trout & Parma Ham Mousse (see
    p18) is merely decorated with slices of the ham
    and the Carbonnade of Beef is enriched by using
    diced ham instead of bacon.
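
The kind of keyword-in-context (KWIC) listing shown above can be approximated in a few lines of Python. This is only a minimal sketch, not how the BNC or any concordancing programme is implemented; the function name `kwic`, the `width` parameter and the sample text are assumptions made for illustration.

```python
import re

def kwic(text, keyword, width=30):
    """Return one keyword-in-context line for every match of `keyword`."""
    # Whole-word, case-insensitive match (illustrative choice).
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    lines = []
    for m in pattern.finditer(text):
        left = text[max(0, m.start() - width):m.start()]
        right = text[m.end():m.end() + width]
        # Right-align the left context so the keyword lines up in a column.
        lines.append(f"{left:>{width}} [{m.group(0)}] {right}")
    return lines

# Sample sentences borrowed from the BNC lines on this slide.
sample = (
    "Maybe with twists of bacon. "
    "They could cook vegetables and meat simply, "
    "deal with eggs and bacon and porridge."
)

for line in kwic(sample, "bacon"):
    print(line)
```

Aligning the search word in a central column is what lets a learner scan the words to its left and right for recurring patterns.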

4
An example of concordancing (with Wordsmith)

5
Tagging
  • Example courtesy Catherine Ball at
    http://www.georgetown.edu/faculty/ballc/corpora/tutorial2.html#RTFToC16
  • A01 2 '_' stop_VB electing_VBG life_NN peers_NNS '_' ._.
    A01 3 by_IN Trevor_NP Williams_NP ._.
    A01 4 a_AT move_NN to_TO stop_VB \0Mr_NPT Gaitskell_NP from_IN
    A01 4 nominating_VBG any_DTI more_AP labour_NN
    A01 5 life_NN peers_NNS is_BEZ to_TO be_BE made_VBN at_IN a_AT meeting_NN
    A01 5 of_IN labour_NN \0MPs_NPTS tomorrow_NR ._.
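
Once a text is tagged in this word_TAG style, the tags can be used to pull out classes of words. The following is a rough sketch under the assumption that every tagged token has the form `word_TAG` and that line references such as `A01 5` contain no underscore; the helper name `parse_tagged` is hypothetical.

```python
def parse_tagged(line):
    """Split 'word_TAG' tokens into (word, tag) pairs."""
    pairs = []
    for token in line.split():
        if "_" not in token:
            # Skip line references such as 'A01' and '5'.
            continue
        # Split on the LAST underscore so '._.' becomes ('.', '.').
        word, tag = token.rsplit("_", 1)
        pairs.append((word, tag))
    return pairs

# One line taken from the tagged example on this slide.
tagged = "A01 5 life_NN peers_NNS is_BEZ to_TO be_BE made_VBN at_IN a_AT meeting_NN"

pairs = parse_tagged(tagged)
# In this tagset the common-noun tags start with 'NN' (NN, NNS).
nouns = [word for word, tag in pairs if tag.startswith("NN")]
print(nouns)
```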

6
Types of Corpora
  • Monolingual corpora - in which the texts are all
    in the same language
  • Parallel and/or aligned corpora - in which
    originals and translations are aligned so that
    both texts appear on the screen together and you
    can see how the translator has translated the
    original.
  • Comparable corpora - in which a selection of
    original texts has been made in two or more
    languages dealing with the same subject or genre.

7
Types of Corpora
  • Specialized corpora - texts on specialized
    subjects for the extraction of terminology and
    complementary explanatory material - definitions,
    explanations etc.
  • Concurrent corpora - used to describe texts taken
    from newspapers on the same subject on
    approximately the same dates.
  • 'Do-it-yourself' or disposable corpora - small
    specialized corpora for the purpose of teaching
    translation or language

8
Corpora and Lexicography
  • COBUILD: Collins Publishers & University of
    Birmingham, 1980s
  • Corpora work that revolutionised lexicography
  • TODAY - All serious lexicography uses corpora -
    e.g.
  • Oxford English Dictionary: http://www.oed.com/
  • Academia das Ciências de Lisboa

9
Corpora and Grammar
  • The Longman Grammars of English (Quirk,
    Greenbaum, Svartvik, Leech and others)
  • Based on corpora: the classical corpora now
    available on CD-ROM through ICAME
  • http://www.hd.uib.no/icame.html
  • BIBER, D., S. JOHANSSON, G. LEECH, S. CONRAD & E.
    FINEGAN. 1999. Longman Grammar of Spoken and
    Written English. Harlow: Pearson Education Ltd.

10
The corpora debate
  • The bigger the corpus, the better
  • The carefully chosen representative corpora
  • Chomsky: the average educated speaker was a
    better source
  • Big corpora are not necessarily representative,
    e.g. the Hansard corpus
  • Any selection of texts is a selection

11
Yet
  • Very large corpora exist and are very useful
  • Much research work nowadays is done with small
    selected corpora for studying
  • different registers
  • special subjects

12
Using official corpora - EN
  • British National Corpus at
    http://sara.natcorp.ox.ac.uk/lookup.html - 50 examples of any word or
    expression for free on-line
  • CD-ROM of 100 million words available
  • The COBUILD project:
    http://titania.cobuild.collins.co.uk/form.html
  • 40 examples on-line

13
Using official corpora - PT
  • AC/DC, CetemPúblico: Portuguese monolingual
    corpora
  • COMPARA: aligned English/Portuguese corpus
  • All at http://www.linguateca.pt

14
Language Learning/Teaching and corpora
  • How can a language teacher use corpora?
  • Why should a language learner need to know about
    corpora?
  • What can be learnt?

15
How can a language teacher use corpora?
  • The teacher can
  • find an enormous amount of material for use in
    class, for exercises
  • check on real usage and compare it to textbooks
    used
  • BUT
  • Must be aware that corpora sometimes prove the
    textbook wrong!

16
What can be learnt?
  • Corpora as reference material for
  • Lexical work
  • Syntactic study
  • Textual analysis
  • Observing language in action
  • Learning about a wide variety of areas

17
The student
  • Can be trained to search autonomously for
    information of all kinds
  • Finding texts that supply real knowledge
  • Finding texts that serve as models for style and
    register
  • Finding correct collocations of individual words
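
As an illustration of the collocation point, a student could count which words occur within a couple of positions of a chosen word. This is a simplified sketch only: the function name `collocates`, the `span` parameter and the sample text are assumptions, and the window ignores sentence boundaries, which a real collocation tool would normally handle more carefully.

```python
import re
from collections import Counter

def collocates(text, node, span=2):
    """Count words appearing within `span` words either side of `node`."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter()
    for i, word in enumerate(words):
        if word == node:
            # Words to the left and right of the node, excluding the node itself.
            window = words[max(0, i - span):i] + words[i + 1:i + 1 + span]
            counts.update(window)
    return counts

# Illustrative text loosely based on the BNC 'bacon' examples.
sample = (
    "They could deal with eggs and bacon and porridge. "
    "He liked bacon and eggs. Diced ham instead of bacon."
)

neighbours = collocates(sample, "bacon")
print(neighbours.most_common(3))
```

Raw counts like these are dominated by frequent words such as "and", so in practice a learner would look past them to content words such as "eggs".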

18
Do-it-yourself corpora
  • Suggestion
  • Train students to make and use their own corpora
    by
  • Collecting texts off the Internet
  • Using the Find function in Word
  • Broadening their vocabulary
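
For students comfortable with a little scripting, the same idea can be taken one step beyond the Find function in Word. The sketch below assumes the collected texts have been saved as plain `.txt` files in one folder; the function names (`build_corpus`, `word_frequencies`, `find`) and the demo file names are hypothetical, and the demo writes two tiny files to a temporary folder so that it runs on its own.

```python
import re
import tempfile
from collections import Counter
from pathlib import Path

def build_corpus(folder):
    """Read every .txt file in `folder` into a {filename: text} dict."""
    return {p.name: p.read_text(encoding="utf-8")
            for p in sorted(Path(folder).glob("*.txt"))}

def word_frequencies(corpus):
    """Count how often each word occurs across the whole corpus."""
    counts = Counter()
    for text in corpus.values():
        counts.update(re.findall(r"[a-z']+", text.lower()))
    return counts

def find(corpus, phrase):
    """Return (filename, line) pairs containing `phrase`, ignoring case."""
    hits = []
    for name, text in corpus.items():
        for line in text.splitlines():
            if phrase.lower() in line.lower():
                hits.append((name, line.strip()))
    return hits

if __name__ == "__main__":
    # Demo with two tiny made-up texts in a temporary folder.
    with tempfile.TemporaryDirectory() as folder:
        Path(folder, "recipes.txt").write_text(
            "Fry the bacon.\nServe with eggs.\n", encoding="utf-8")
        Path(folder, "news.txt").write_text(
            "Bacon prices rose again.\n", encoding="utf-8")
        corpus = build_corpus(folder)
        print(word_frequencies(corpus).most_common(3))
        print(find(corpus, "bacon"))
```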

19
Useful sites
  • Catherine N. Ball
  • Tutorial: Concordances and Corpora
  • http://www.georgetown.edu/faculty/ballc/corpora/tutorial.html
  • Tim Johns: Data-driven learning at
    http://web.bham.ac.uk/johnstf/

20
Useful sites
  • Concordance the whole Web at
    http://www.webcorp.org.uk/
  • And, of course, Google at
  • http://www.google.com