Compiling a Monolingual Dictionary for Native Speakers

1
Compiling a Monolingual Dictionary for Native
Speakers
  • Patrick Hanks
  • Formerly chief editor, Current English
    Dictionaries, Oxford University Press
  • Editor, Collins English Dictionary; managing
    editor, Cobuild (1st edition).
  • Ljubljana
  • February 6, 2009

2
Talk Outline
  • L1 dictionaries and their users
  • Words and their histories
  • Research: getting the words in
  • Macrostructure: the lexical item
  • Words, multiword expressions, idioms, affixes
  • Abbreviations? Names?
  • Microstructure
  • Lemma, pronunciation, meaning, use, ...
  • The future of L1 dictionaries
  • Print? CD-Rom? On-line? Hypertext links?

3
Typology of L1 English dictionaries
  • British
  • Historical principles: Oxford English Dictionary
    (multivolume)
  • Synchronic principles: Collins, Chambers, (N)ODE
    (each is 1 volume)
  • American
  • Historical principles: Merriam-Webster's
    Unabridged (multivolume), Merriam-Webster's
    Collegiate (1 vol.)
  • Synchronic principles: American Heritage
  • Australian: Macquarie (synchronic principles)

4
What's the difference between historical
principles and synchronic principles?
  • Historical principles place the earliest meaning
    of a word first
  • camera, noun. Latin camera 'vaulted room'. 1686.
    1. a small room. 2. the treasury of the papal
    curia. 3. a darkened box or room with a screen in
    it, onto which an image is projected (camera
    obscura).... 4. an apparatus for taking
    photographs or making films.
  • Synchronic principles place the current meaning
    first.
  • camera, noun. an apparatus for taking photographs
    or making films. From Latin camera 'small room'.
  • camera obscura, noun. a darkened box or room with
    a screen in it, onto which an image is projected.
    ... Latin, 'dark room'.
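The two ordering principles can be sketched as two sort keys over the same list of senses. This is a minimal illustration, not how any published dictionary is compiled: the class name Sense, the field names, and above all the corpus_freq figures are assumptions invented for the example. Only the glosses and their historical order come from the camera entry above.

```python
from dataclasses import dataclass


@dataclass
class Sense:
    gloss: str
    attested_order: int  # position in the historical sequence shown on the slide
    corpus_freq: int     # HYPOTHETICAL frequency, invented for illustration only


CAMERA = [
    Sense("a small room", 1, 2),
    Sense("the treasury of the papal curia", 2, 1),
    Sense("a darkened box or room with a screen in it (camera obscura)", 3, 5),
    Sense("an apparatus for taking photographs or making films", 4, 500),
]


def historical_order(senses):
    """Earliest attested meaning first."""
    return sorted(senses, key=lambda s: s.attested_order)


def synchronic_order(senses):
    """Most frequent current meaning first (frequency stands in for 'current')."""
    return sorted(senses, key=lambda s: s.corpus_freq, reverse=True)


for label, ordering in (("historical", historical_order), ("synchronic", synchronic_order)):
    print(label)
    for number, sense in enumerate(ordering(CAMERA), start=1):
        print(f"  {number}. {sense.gloss}")
```

A real synchronic dictionary would probably also drop or relocate obsolete senses, as the slide does by giving camera obscura its own entry; the sketch only reorders them.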

5
The instability of word meaning
  • The synchronic/historical distinction affects
    many words.
  • field: 'enclosed land'. Old English feld 'open
    country'
  • gay: 'homosexual'. Meant 'cheerful' until about
    1965
  • intercourse: 'sex act'. Meant 'conversation' until
    C20
  • kind: 'considerate and friendly'. Old English
    'noble, well-bred'
  • magazine: 1. periodical publication. 2. holder
    for cartridges on a gun or revolver. Arabic
    'storehouse'
  • sock: Latin soccus 'light shoe worn by a comic
    actor'
  • size: 'dimension, magnitude'. From assizes
    'session of a local law court'; a size loaf was a
    loaf of court-approved dimensions
  • Today's exploitation may become tomorrow's norm.

6
Word histories
  • Modern British and American dictionaries, even
    dictionaries on synchronic principles, have a
    C19 model of word history
  • They tell the semantic development (how each
    word developed its modern meaning(s)), including
    changes that took place in the L1, as well as
    the morphological development of etymons since IE
  • Also, discuss cognates (not just false friends),
    semantic equivalents, and the origins of idioms
  • English magazine, French magasin
  • English crane, French grue, Czech jeřáb
  • kick the bucket

7
Getting the words in (1)
  • Building on existing dictionaries
  • Lexicography is accretive
  • Danger of mindlessly copying errors and
    out-of-date information
  • The Oxford reading program
  • 150 years of research to find millions of
    citations
  • But not a balanced corpus
  • Directed reading: research specialist areas
  • Searching corpus data
  • low yield for new words
  • high yield for phraseology, collocation, usage
  • Trawling the internet. Problems:
  • sorting the new words from the rubbish
  • many new words are in fact multiword
    expressions
  • They are hard to find by web crawling programs

8
Getting the words in (2)
  • Building on existing dictionaries
  • Lexicography is accretive
  • Danger of mindlessly copying errors and
    out-of-date information
  • How to keep the lexicographers awake?
  • The Oxford reading program: huge expense
  • Directed reading: research specialist areas
  • Searching corpus data: low yield
  • Trawling the internet. Problems:
  • sorting the new words from the rubbish
  • many new words are in fact multiword expressions

9
Why do people want a dictionary of their native
language?
  • There are no good recent studies of L1 dictionary
    use in English
  • Academic studies of dictionary use are mostly of
    bilingual and foreign learners' dictionaries
  • e.g. Atkins and Varantola (1997) studied
    dictionary use in translation tasks and language
    learning, but not native speaker use
  • L1 and L2 dictionaries are quite different
  • Foreign learners want to know what every native
    speaker knows already
  • Native speakers have a much broader spectrum of
    needs: peripheral, not central usage

10
Informal feedback from marketing departments (1)
  • People use an L1 dictionary mainly:
  • for correct spelling (English is problematic)
  • In Slovenian, maybe for correct morphology?
  • for guidance on correct usage and word choice,
    e.g.
  • uninterested vs. disinterested, refute vs.
    deny, bored with or bored of
  • Is it wrong to split an infinitive (e.g. to
    boldly go)?
  • for instant cultural reference information, e.g.
  • What's the scientific name for a thrush?
  • Is your scapula your collarbone or your shoulder
    blade?
  • What's the capital of Chile?
  • for browsing, e.g. Why is a madrigal called a
    madrigal?

11
Informal feedback from marketing departments (2)
  • An L1 dictionary is also used:
  • as a source of information about rare words and
    senses
  • What does nook-shotten in Shakespeare mean?
    What is a predator, and can you use it to
    describe a person? Is a penguin a predator? What
    are chinos? What is an ohm? What is a joule, and
    why is it so called?
  • for word games (e.g. Scrabble): Is aa an English
    word?
  • People want to have an authoritative inventory of
    their language, even if (in practice) they never
    look at it
  • They also want fun words, e.g. cutpurse,
    mosstrooper, yegg, snakehead, tsotsi, rudeboy,
    grifter (various criminals)
  • And new words which provide journalistic copy

12
The role of corpus data
  • Corpora show how each word is used
  • providing an essential source of information for
    collocations and syntagmatics (studied
    statistically)
  • a framework, a solid empirical foundation for a
    dictionary
  • but don't stop there!
  • Other kinds of information must be slotted into
    this framework, e.g.
  • Etymologies and word histories
  • Guidance on correct usage
  • Scientific and technical definitions
  • Consistency of sets (e.g. all the terminology
    of cricket)
  • A corpus cannot be the only source for lexical
    data
  • Lexicographers reading newspapers, watching TV,
    note how things are said (the words used), not
    what is said (content of the message)

13
The dictionary as inventory
  • An L1 dictionary should contain all the words in
    the language
  • but is this possible? The lexicon is constantly
    growing
  • and all the meanings of each word
  • but word meaning is imprecise and fluid, not
    fixed
  • guidance on how each word is used (syntagmatics)
  • By examples of usage, rather than by abstract
    formulations in the technical language of
    linguistics
  • Dictionaries are for people, not for linguists!

14
Researching lexical items: collecting evidence
  • Building on existing dictionaries
  • Lexicography is accretive
  • Danger of mindlessly copying errors and
    out-of-date information
  • How to keep the lexicographers awake?
  • The Oxford reading program: huge expense
  • Directed reading: research specialist areas
  • Searching corpus data: low yield for new words
  • Trawling the internet. Problems:
  • sorting the new words from the rubbish
  • Many so-called new words are in fact multiword
    expressions

15
Some 2006 new words from Macmillan English
Dictionary
  • blogosphere, noun. the imaginary place on the
    Internet where people's blogs go so that other
    people can read them and react to them: software
    that tracks mood swings across the blogosphere
    and pinpoints the events behind them...
  • chav, noun. someone, especially a working-class
    person who is not well educated, dresses in
    designer clothes and wears a lot of gold
    jewellery but whose appearance shows bad taste.
  • air kiss, career gapper, Chelsea tractor, chick
    lit, civil partnership, designer baby, green
    audit, hissy fit, intelligent design

16
Even Homer nods (especially when copying)
  • dord, n. density.
  • actually copied from another dictionary
  • D. or d. density.
  • Example from an American dictionary of the 1960s,
    cited by David Crystal
  • intercourse, noun. 1. communication or dealings
    between individuals or groups: everyday social
    intercourse. 2. short for SEXUAL INTERCOURSE.
  • NODE (1998), ODE (2005)
  • Sense 2 is the usual sense of the modern word; it
    should be the main definition, not a mere
    cross-reference.

17
Terminology of special fields
  • Science, technology, sports, pastimes, slang
  • How far should an L1 dictionary go in covering
    these?
  • strobila, strobilus, strobilation, googly, chav
  • chav is a coelacanth among slang words: very
    ancient, but only recently discovered. The
    etymology is Romany
  • Native speakers who do not know these words
    rightly expect to find them in a dictionary.
  • But a dictionary is not a term bank.

18
L1 dictionary macrostructure
  • The lexical item
  • words
  • multiword expressions
  • idioms and phrasal verbs
  • where to put them? E.g. bite the dust: at dust
    or bite?
  • prefixes and suffixes; combining forms
  • e.g. un-, -ation, -oholic, brachy-, -algia
  • abbreviations?
  • names?

19
Microstructure
  • Lemma (inflected forms)
  • Pronunciation
  • Wordclass and subcategorization
  • Selectional preferences and phraseology
  • Syntax and syntagmatics
  • Definitions
  • Guidance on correct usage
  • Etymology and word histories
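The components listed above can be pictured as fields of a single entry record. The sketch below is one possible layout, not a schema used by any dictionary publisher: the class and field names (Entry, SenseRecord, patterns, and so on) are assumptions made for illustration. The sample values for frighten are taken from the phraseology shown later in the talk.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class SenseRecord:
    definition: str
    examples: List[str] = field(default_factory=list)  # corpus examples of usage
    usage_note: Optional[str] = None                    # guidance on correct usage


@dataclass
class Entry:
    lemma: str
    wordclass: str
    inflections: List[str] = field(default_factory=list)
    pronunciation: Optional[str] = None                 # IPA or respelling, if given
    patterns: List[str] = field(default_factory=list)   # phraseology / syntagmatics
    senses: List[SenseRecord] = field(default_factory=list)
    etymology: Optional[str] = None


frighten = Entry(
    lemma="frighten",
    wordclass="verb",
    inflections=["frightens", "frightened", "frightening"],
    patterns=[
        "frighten someone off/away",
        "frighten someone into doing something",
    ],
    senses=[SenseRecord(definition="cause to feel fear")],
)

print(frighten.lemma, frighten.wordclass, len(frighten.patterns))
```

Optional fields default to None or an empty list, which mirrors the open questions in the talk: an entry need not carry a pronunciation or an etymology to be valid.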

20
The lemma
  • strong, stronger, strongest
  • strongly
  • strength
  • strengthen
  • emblazon (but emblazoned is 100 times commoner)
  • frightened, frightening (forms of the verb, or
    adjectives in their own right?)
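The emblazon case suggests checking, for each lemma, which inflected form actually dominates in the corpus before choosing the headword form. A minimal sketch follows, assuming a precomputed table of form frequencies and a form-to-lemma mapping. The counts are hypothetical and chosen only to mirror the "100 times commoner" remark; the function names are likewise illustrative.

```python
from collections import defaultdict

# HYPOTHETICAL form frequencies, invented for illustration only.
FORM_COUNTS = {
    "emblazon": 1,
    "emblazoned": 100,
    "strong": 50,
    "stronger": 20,
    "strongest": 10,
}

# Assumed mapping from each form to its lemma.
FORM_TO_LEMMA = {
    "emblazon": "emblazon",
    "emblazoned": "emblazon",
    "strong": "strong",
    "stronger": "strong",
    "strongest": "strong",
}


def forms_by_lemma(form_counts, form_to_lemma):
    """Group form frequencies under their lemma."""
    grouped = defaultdict(dict)
    for form, count in form_counts.items():
        grouped[form_to_lemma[form]][form] = count
    return dict(grouped)


def commonest_form(forms):
    """Return the form with the highest frequency."""
    return max(forms, key=forms.get)


grouped = forms_by_lemma(FORM_COUNTS, FORM_TO_LEMMA)
for lemma, forms in grouped.items():
    print(lemma, "->", commonest_form(forms))
```

Where the commonest form differs from the lemma, as with emblazoned, an editor might consider a separate entry or at least an explicit note.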

21
Pronunciation
  • Should a printed L1 dictionary text give guidance
    on pronunciation at all?
  • More useful in English than in Slovenian?
  • Use the International Phonetic Alphabet or some
    sort of spelling-rewrite system?
  • Why give pronunciations only for headwords? Why
    not also for inflections?
  • An electronic product can be multimedia, so
    hypertext links to a spoken representation seem
    an obvious answer
  • But in which dialect?

22
Dictionary definitions
  • What is a word meaning? Does it exist?
  • "A text is a unique deployment of meaningful
    units, and its particular meaning is not
    adequately accounted for by any organized
    concatenation of the fixed meanings of each unit.
    This is because some aspects of textual meaning
    arise from the particular combination of choices
    ..." (J. Sinclair, 2004)
  • Not least because the meaning of each unit is not
    fixed!
  • Dictionaries can't account for everything in the
    meaning of a text. But they can account for some
    things. (An elephant is not a toothpick.)

23
Writing definitions of technical terms
  • Stipulations by scientific committees and other
    classifying systems
  • Stipulations, not natural language!
  • Need both
  • Examples: second, spider
  • Interface between the lexicographer and the
    scientist (the user of the term)

24
technical definitions (1)
  • second, noun. a sixtieth of a minute of time,
    which as the SI unit of time is defined in terms
    of the natural periodicity of the radiation of a
    caesium-133 atom.
  • informal: a very short time: his eyes met
    Charlotte's for a second.
  • (N)ODE

25
technical definitions (2)
  • spider: an eight-legged predatory arachnid with
    an unsegmented body consisting of a fused head
    and thorax and a rounded abdomen. Spiders have
    fangs which inject poison into their prey, and
    most kinds spin webs in which to capture insects.
  • Order Araneae, class Arachnida.
  • (N)ODE

26
Word sketch for spider
  • object_of: catch, watch, eat, find, put, see,
    get, come
  • subject_of: scuttle, crawl, spin, climb, bite,
    feed, wait, live, run, go, come
  • a_modifier: trap-door, bird-eating, tarantula,
    jumping, sedentary, poisonous, giant, hairy,
    gigantic, tiny, black, huge, white, great, large,
    little, small, female
  • n_modifier: Insy, Winsy, bola, orb, raft, fen,
    crab, widow, wolf, hunting, forest, sea, house
  • modifies: mite, catcher, monkey, web, venom,
    crab, rider, climb, silk, leg, affair, plant,
    woman, family, system
  • and/or: scorpion, cockroach, beetle, insect, fly,
    caterpillar, octopus, boar, crab, wolf, web,
    mite, spider, snake, bug, bird
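A word sketch of this kind groups a headword's collocates by grammatical relation and ranks them by frequency or by an association score. The sketch below shows the idea on a handful of hypothetical parsed triples; the data, the function names, and the choice of score are all assumptions. The score shown is logDice, one association measure used in the Sketch Engine, and is not necessarily the statistic behind the figures on the slide.

```python
import math
from collections import Counter, defaultdict

# HYPOTHETICAL (relation, headword, collocate) triples, as a parser might emit them.
TRIPLES = [
    ("subject_of", "spider", "spin"),
    ("subject_of", "spider", "spin"),
    ("subject_of", "spider", "crawl"),
    ("a_modifier", "spider", "hairy"),
    ("subject_of", "beetle", "crawl"),
]


def sketch(triples, headword):
    """Count collocates of `headword`, grouped by grammatical relation."""
    table = defaultdict(Counter)
    for relation, head, collocate in triples:
        if head == headword:
            table[relation][collocate] += 1
    return table


def log_dice(f_xy, f_x, f_y):
    """logDice association score: 14 + log2(2 * f_xy / (f_x + f_y))."""
    return 14 + math.log2(2 * f_xy / (f_x + f_y))


spider = sketch(TRIPLES, "spider")
for relation, collocates in spider.items():
    ranked = ", ".join(f"{word} ({count})" for word, count in collocates.most_common())
    print(f"{relation}: {ranked}")
```

Raw counts favour very common words such as go or come; an association score such as logDice is one way to push distinctive collocates such as scuttle towards the top of each list.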

27
Corpus-based profile for spider
  • Many thousands of species of spiders are known
    (funnel-web, web-building, orb-weaving,
    bird-eating, ground-dwelling, giant, huge, large,
    tiny, poisonous, black widow, camel, redback,
    trapdoor, wolf, whitetail, crab, tarantula,
    etc.).
  • Some species of spiders hunt prey.
  • Spiders bite.
  • Some species of spiders are poisonous.
  • Many species of spiders spin webs, with threads
    of strong silk.
  • Spiders lurk in the centre of their webs.
  • Spiders control what is going on in their webs.
  • Spiders have eight legs.
  • Their legs are thin, hairy, and long in
    proportion to body size.
  • Spiders have eight eyes.
  • Spiders spend a lot of time being motionless.
  • Spiders' movement is sudden.
  • Spiders crawl.
  • Spiders scuttle.
  • Spiders are swift and agile.
  • Spiders can run up walls.
  • Many people have a dread of (hate) spiders.
  • People kill spiders.
  • English people are much concerned with trying to
    get spiders out of the bath.

28
The virtues of brevity
  • Avoid verbosity!
  • Even if in the dictionary of the future space is
    unlimited, dictionary entries should be brief,
    concise, and to the point.
  • Lumping and splitting
  • Ockham's razor
  • Menu-driven hierarchies of information

29
Lexical syntagmatics
  • Convention
  • A dictionary can show the relations between
    typical, normal phraseology and typical, normal
    meaning, e.g.
  • frighten, verb.
  • Something frightens a person or animal: cause to
    feel fear
  • .. frighten someone off/away
  • .. frighten someone into doing something
  • .. frighten the children upstairs into bed
  • .. frighten someone out of their skin/wits
  • .. frighten the life (living daylight) out of
    someone

30
Selecting examples of usage
  • No invented examples!
  • Intuitions and usage are inverse variables.
  • Plenty of corpus evidence to choose from.
  • Beware of distortion through shortening
  • Choose natural, normal examples, not boundary
    cases.

31
The need to get on with it!
  • The lexicon of a language is large. Dictionary
    compilation is a huge task.
  • The editor must make policy decisions and
    everyone must stick to them
  • There is no time for agonizing.
  • Anyway, agonizing is often counterproductive.
  • When compiling, compilers should do their honest
    best.
  • A system must be set up for spotting obvious
    errors and accidental infelicities of wording
  • Lexicographers read and check each other's work

32
The future of L1 dictionaries
  • The medium
  • Print? CD-Rom? On-line?
  • On-line dictionaries of the future will be
    locations that summarize and interface
  • Menu-driven information hierarchies
  • The message
  • Hypertext links to pre-processed corpus evidence,
    a grammar, an encyclopedia, other reference
    sources, other data of all kinds: the dictionary
    will a) summarize, b) typify, and c) interface
  • Corpus-based syntagmatics (dogs bark, wolves
    howl, lions roar, cats miaow)
  • Multimedia (sound, photos, film clips. Smell?
    taste? touch?)
  • Links to scientific taxonomies, e.g. Linnaean
    classification of flora and fauna

33
Conclusions (1)
  • L1 lexicographers are not linguists
  • A self-indulgent belief
  • Linguistics is fatal for good lexicography
  • Lexicographers should know a bit about
    linguistics, but they need to know about a lot of
    other things too
  • Lexicography is a team game. Renaissance man
    is dead (as far as dictionary writing is
    concerned)
  • What are they, then?
  • Inventory clerks? Public servants? Cultural,
    social, and literary historians? Creative
    writers? Hack journalists?
  • All of these and more.
  • A lexicographer is a lexicographer!

34
Conclusions (2)
  • Evidence
  • Corpus shows word usage, both regular and
    irregular
  • Other research is needed for terminology, names,
    word histories, and attitudes to the
    correctness of controversial expressions
  • Interpretation
  • Definitions should explain, not merely define
  • Authoritative pronouncements must be based on
    evidence, not merely opinion
  • But public attitudes to correctness need to be
    reported objectively as well as evaluated
  • Explain all normal, central uses and meanings
  • Don't try to cover all possibilities!
  • If you do, the language will defeat you, for word
    meaning and use are infinitely flexible