Using A Corpus of Spontaneous Speech

Provided by: ellen95
1
Using A Corpus of Spontaneous Speech
2
Structure of the Class
  • Introduction
  • Some basic questions
  • What is a corpus?
  • What are corpora for?
  • What are corpora especially good for?
  • What is coding?
  • An unusual corpus: a designed corpus of
    spontaneous task-oriented speech
  • Some exercises: using a coded corpus to learn
    about spontaneous speech
  • Do people speak in whole sentences?
  • Do people take turns speaking?
  • What are the units of dialogue?
  • When do people look at one another?
  • When are people disfluent?

3
1.a. Some basic questions
  • What is a corpus?
  • A corpus is a collection of language materials
    produced by real people engaged in real
    activities, varying in:
  • Genre
  • Single genre (Canadian Hansard, Wall Street
    Journal)
  • Sampling across genres (British National Corpus)
  • Modality of communication
  • Written (Novels of Jane Austen)
  • Spoken (Switchboard)
  • Purpose of communication
  • Chat (Switchboard, Phone Home)
  • Task Completion (MIT restaurant and hotel finder)

4
1.a. Some basic questions
  • What are corpora for?
  • Discovering how language is used: norming
  • Lexicography: words and contexts for dictionaries
  • Pedagogy: getting students used to the language
    as they will hear or read it
  • Socio-linguistics and historical linguistics
  • Neurolinguistics
  • Speech and language technologies: to train
    computational systems for
  • Automatic summarization or indexing of text
  • Automatic comprehension of text
  • Automatic recognition of speech
  • Automatic production of speech
  • Human-computer dialogue
  • Conducting large-scale experiments: comparison of
    groups or contexts

5
1.a. Some basic questions
  • What are corpora especially good for?
  • Studying spontaneous production
  • Studying big samples
  • Studying very rare phenomena
  • Allowing use of powerful statistics instead of
    weak observations
  • Sharing data and saving time

6
1.a. Some basic questions
  • What is coding?
  • Labelling parts of a corpus in a well-formed
    system of analysis
  • Examples
  • Linguistic analyses
  • Part of Speech (Noun, Verb, Adjective, etc.)
  • Segment of text (paragraph, document, title,
    quotation)
  • Source (Speaker)
  • Conditions of production (telephone,
    face-to-face)
  • Simultaneous factors: noise, direction of speaker
    gaze, gesture

7
1.a. Some basic questions
  • What is coding? contd.
  • N.B. Coding is a stringent test of linguistic
    theories for:
  • consistency,
  • coverage,
  • interpretability
  • Coding tools: software designed to
  • help linguists and psychologists code corpora
  • allow only legitimate choices of codes
  • Raw coding can be horrible to read,
  • so coding is now used as an instruction to display
    linguistic material in a particular way.
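
The two jobs of a coding tool named above (allowing only legitimate codes, and displaying coded material readably) can be sketched in a few lines of Python. This is only an illustration: the tag sets, the CodedToken class and the render function are hypothetical and are not taken from any real coding tool or from the Map Task scheme.

```python
from dataclasses import dataclass

# Hypothetical closed sets of codes; a real coding scheme defines its own.
POS_TAGS = {"noun", "verb", "adjective", "adverb", "other"}
CONDITIONS = {"telephone", "face-to-face"}


@dataclass(frozen=True)
class CodedToken:
    text: str       # the word as transcribed
    pos: str        # part-of-speech code
    speaker: str    # source of the word
    condition: str  # condition of production

    def __post_init__(self):
        # Reject any code that is not in the closed set.
        if self.pos not in POS_TAGS:
            raise ValueError(f"illegal part-of-speech code: {self.pos!r}")
        if self.condition not in CONDITIONS:
            raise ValueError(f"illegal condition code: {self.condition!r}")


def render(tokens):
    """Display coded material in a readable word/code form."""
    return " ".join(f"{t.text}/{t.pos}" for t in tokens)


tokens = [
    CodedToken("green", "adjective", "giver", "face-to-face"),
    CodedToken("bay", "noun", "giver", "face-to-face"),
]
print(render(tokens))
```

Constructing a token with a misspelled code such as "nounn" raises a ValueError, which is the sense in which the tool allows only legitimate choices.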

8
1.b. An unusual corpus: a designed corpus of
spontaneous task-oriented speech
  • Name: The HCRC Map Task Corpus
  • Authors: Anne Anderson, Matthew Aylett, Ellen
    Gurman Bard, Matthew Bull, Jean Carletta,
    Gwyneth Doherty-Sneddon, Simon Garrod, Amy Isard,
    Stephen Isard, Jacqueline Kowtko, Robin Lickley,
    David McKelvie, Jim Miller, Alison Newlands,
    Cathy Sotillo, Paul Taylor, Henry Thompson
  • Type: unscripted dialogue
  • Task: route communication
  • One speaker has a route pre-printed on a map,
  • The other has a similar map without a route
  • Neither can see the other's map
  • Either can say anything
  • No hand gestures allowed
  • Both know that the maps differ but not how or where
  • Size: 128 dialogues
  • Speakers: 64 undergraduates at Glasgow University
    in Scotland
  • Design: orderly division of examples among
    conditions which may be compared

9
CORPUS DESIGN, contd.
15
CORPUS DESIGN, contd.
  • /t/-deletion: east lake
  • glottalization: slate mountain
  • /d/-deletion: gold mine
  • nasal assimilation: green bay

16
2. Some exercises: using a coded corpus to learn
about spontaneous speech
  • This is a list of exercises. You do not have to
    finish them all today. The website for the
    corpus can be accessed from any web browser, so
    you can examine it at any time.
  • Getting started
  • Do people speak in whole sentences?
  • Do people take turns when they speak?
  • What are the units of dialogue?
  • When do people look at one another?
  • When are people disfluent?

17
2. Some exercises: using a coded corpus to learn
about spontaneous speech
  • Getting started
  • Go to the introduction page for the corpus
  • http://www.hcrc.ed.ac.uk/amyi/maptask/demo_instructions.html
  • Read it. You will need to know how to select a
    particular one of the 128 dialogues to look at
    more closely.
  • Note that the speakers are Scottish.
  • You may find some of their words or expressions
    strange if you are used to English or American
    versions of English.
  • Use the maps to help you decipher the landmark
    names

18
2. Some exercises: using a coded corpus to learn
about spontaneous speech
  • Introduction - getting started, contd.
  • Follow the link to the demo page
  • http://www.hcrc.ed.ac.uk/amyi/maptask/demo.html
  • This page will allow you to look at
    transcriptions of dialogues and learn something
    about them.
  • Choose a dialogue to look at. The best ones to
    use are in quad 3, e.g., q3ec1 or q3nc4. You can
    choose either an e dialogue (where speakers can
    see each other) or an n dialogue (where they
    can't). You can choose any dialogue from 1-8.
  • Enter the code number of a dialogue in the
    Dialogue Name box.
  • Hit the Process button to make the dialogue
    appear.
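
The dialogue names used above (q3ec1, q3nc4) can be taken apart mechanically. The sketch below assumes the pattern q, quad number, e or n, a literal c, dialogue number, and further assumes that quads run from 1 to 8 like the dialogue numbers; neither assumption is stated on the slides, although together they give 128 names, the size of the corpus. The function names are hypothetical.

```python
import re
from itertools import product

# Assumed pattern: q<quad><e|n>c<dialogue>, e.g. q3ec1 or q3nc4.
# The quad range 1-8 is an assumption, not something the slides state.
NAME_RE = re.compile(r"^q([1-8])([en])c([1-8])$")


def parse_dialogue_name(name):
    """Split a dialogue name into quad, eye-contact condition and number."""
    match = NAME_RE.match(name)
    if match is None:
        raise ValueError(f"not a recognised dialogue name: {name!r}")
    quad, condition, number = match.groups()
    return {
        "quad": int(quad),
        "eye_contact": condition == "e",  # e: speakers can see each other
        "dialogue": int(number),
    }


def all_dialogue_names(quads=range(1, 9), dialogues=range(1, 9)):
    """Enumerate every name the assumed pattern allows."""
    return [f"q{q}{c}c{d}" for q, c, d in product(quads, "en", dialogues)]


print(parse_dialogue_name("q3nc4"))
print(len(all_dialogue_names()))
```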

19
2. Some exercises: using a coded corpus to learn
about spontaneous speech
  • Introduction - getting started, contd.
  • The dialogue is divided into turns, stretches of
    speech by an individual speaker.
  • At the top of the page are buttons labelled giver
    map and follower map. Press each of these to see
    the maps that the participants used. Can you
    spot the differences?

20
2. Some exercises: using a coded corpus to learn
about spontaneous speech
  • Do people speak in whole sentences?
  • Your first exercise is to examine the contents of
    the turns on the first page of your dialogue and
    answer the question above.
  • If you are not sure that you have enough
    information to answer, what should you do?

21
2. Some exercises: using a coded corpus to learn
about spontaneous speech
  • Do people take turns when they speak?
  • Use the back button to return to the demo page.
    If that doesn't work, pull down the GO menu and
    select the demo page.
  • Keep the same dialogue but now click on overlap
    under Display Highlight. Remember to click on
    Process.
  • What you will see next are the same turns but
    with an indication of when people were speaking
    at the same time. Did your speakers often speak
    at the same time? Scroll through the dialogue to
    see.
  • Take another dialogue and check the overlaps
    again. Do the speakers differ from the ones you
    first looked at? (Hint: if you keep the same quad
    number, for example q3, and dialogue number, for
    example 4, but just change e to n, you will find
    a different pair of speakers dealing with the
    same map.)
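
If the turns carried start and end times, the overlaps highlighted on the demo page could also be counted automatically. The following is a minimal sketch under that assumption; the Turn class, the times and the function name are invented for illustration and do not reflect the corpus file format.

```python
from dataclasses import dataclass


@dataclass
class Turn:
    speaker: str
    start: float  # seconds from the start of the dialogue
    end: float


def overlaps(turns):
    """Return (earlier_turn, later_turn, seconds) for every pair of turns
    by different speakers that share some stretch of time."""
    found = []
    ordered = sorted(turns, key=lambda t: t.start)
    for i, a in enumerate(ordered):
        for b in ordered[i + 1:]:
            if b.start >= a.end:
                break  # turns are sorted, so no later turn can overlap a
            if a.speaker != b.speaker:
                shared = min(a.end, b.end) - b.start
                found.append((a, b, shared))
    return found


# Hypothetical timings, not taken from any real dialogue.
turns = [
    Turn("giver", 0.0, 2.5),
    Turn("follower", 2.0, 3.0),
    Turn("giver", 3.0, 5.0),
]
for a, b, seconds in overlaps(turns):
    print(a.speaker, b.speaker, seconds)
```

Running the same count on an e dialogue and its n counterpart gives one way to compare the two pairs of speakers.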

22
2. Some exercises: using a coded corpus to learn
about spontaneous speech
  • What are the units of dialogue?
  • Language is often organized in hierarchies:
    sentences are composed of phrases, and phrases
    are composed of words.
  • For larger units of text, we have chapters,
    sections or paragraphs. What are the units of
    dialogue? The Map Task coding posits 3 levels.
    We will discuss 2 of them here.
  • To find out what they are, start at the
    introduction page
  • http://www.hcrc.ed.ac.uk/amyi/maptask/demo_instructions.html

23
2. Some exercises: using a coded corpus to learn
about spontaneous speech
  • What are the units of dialogue? contd.
  • What is a Dialogue Move?
  • Follow the link to Dialogue Moves to answer
  • Hint: Assume that conversation is a game in
    which one kind of move must be followed by
    another, a question by an answer, for example.
    If we classify utterances by their functions in
    terms of getting and giving information, which
    kinds of utterances demand to be followed by
    other particular kinds?
  • Look at the list of Initiating Moves. Do you
    agree that they need to be followed by certain
    kinds of responses?
  • Look at the list of Response Moves. Do any of
    these seem to you to be good replies to any
    Initiating Moves?
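
The idea that one kind of move calls for another can be written down as a small lookup table. In the sketch below the move labels are given from memory of the published Map Task coding scheme and should be checked against the Dialogue Moves page; the table of expected responses is purely an illustrative guess, not part of the scheme.

```python
# Move labels as recalled from the Map Task coding scheme (an assumption;
# verify them against the Dialogue Moves page).
INITIATING = {"instruct", "explain", "check", "align", "query-yn", "query-w"}
RESPONSE = {"acknowledge", "reply-y", "reply-n", "reply-w", "clarify"}

# Hypothetical expectations about which responses fit which initiating move.
EXPECTED = {
    "instruct": {"acknowledge"},
    "explain": {"acknowledge"},
    "check": {"reply-y", "reply-n", "clarify"},
    "align": {"reply-y", "reply-n"},
    "query-yn": {"reply-y", "reply-n"},
    "query-w": {"reply-w"},
}


def unexpected_followups(moves):
    """List (index, initiating_move, next_move) wherever an initiating move
    is followed by something other than one of its expected responses."""
    flagged = []
    for i in range(len(moves) - 1):
        first, second = moves[i], moves[i + 1]
        if first in EXPECTED and second not in EXPECTED[first]:
            flagged.append((i, first, second))
    return flagged


# An invented sequence of coded moves.
moves = ["instruct", "acknowledge", "query-yn", "reply-y", "query-w", "instruct"]
print(unexpected_followups(moves))
```

Flagged pairs are not necessarily errors; they are the places where a real dialogue departs from the simple question-and-answer picture, which is what the exercise asks you to look for.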

24
2. Some exercises: using a coded corpus to learn
about spontaneous speech
  • What are the units of dialogue? contd.
  • What is a Dialogue Game?
  • Follow the link to Dialogue Games to answer
  • Now go back to the Demo page
  • http://www.hcrc.ed.ac.uk/amyi/maptask/demo.html
  • to examine a dialogue coded for Games and Moves.
  • Choose a dialogue by typing its number into the
    Dialogue Name box.
  • Ask for Dialogue Games as the Display option and
    overlap as the Highlight option.
  • Does the grouping of moves into games make sense
    to you?
  • Look at the overlaps. When do your speakers most
    often overlap their speech: early or late in
    Games?
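
The "early or late in Games" question can be answered by counting. The sketch below assumes each game has a start and end time and each event (here, the start of an overlap) has a time; the numbers and function names are invented, and splitting each game at its midpoint is just one simple choice.

```python
def relative_position(event_time, game_start, game_end):
    """Position of an event within a game, from 0.0 (start) to 1.0 (end)."""
    if game_end <= game_start:
        raise ValueError("a game must end after it starts")
    if not game_start <= event_time <= game_end:
        raise ValueError("the event lies outside the game")
    return (event_time - game_start) / (game_end - game_start)


def early_late_counts(games, event_times):
    """Count events falling in the first or second half of the game that
    contains them. Events outside every game are ignored."""
    counts = {"early": 0, "late": 0}
    for t in event_times:
        for start, end in games:
            if start <= t < end:
                half = "early" if relative_position(t, start, end) < 0.5 else "late"
                counts[half] += 1
                break
    return counts


# Hypothetical game boundaries and overlap onsets, in seconds.
games = [(0.0, 10.0), (10.0, 14.0)]
overlap_onsets = [1.0, 2.0, 9.0, 10.5, 13.0, 20.0]
print(early_late_counts(games, overlap_onsets))
```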

25
2. Some exercises: using a coded corpus to learn
about spontaneous speech
  • When do people look at one another?
  • Visual communication is important. But when does
    it occur?
  • Go back to the Demo page
  • http://www.hcrc.ed.ac.uk/amyi/maptask/demo.html
  • to examine a dialogue coded for Games and Moves.
    You must choose a dialogue coded for the
    direction of the speakers' gaze. Try q3. And
    let's start with one where they can actually see
    each other, q3ec4, for example.
  • Choose Dialogue Games as the Display option and
    gaze as the Highlight option.
  • Look for the color code at the top of the page.
  • Do your speakers look at each other
    simultaneously? When do they most often overlap
    their speech: early or late in Games?
  • To see whether the speakers were just looking up
    at random, or whether they expected to see
    something, examine the dialogue using the same
    map, but with a barrier between speakers, e.g.,
    q3nc4 to compare with q3ec4.
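
Whether speakers look at each other simultaneously is a question about two sets of time intervals. Assuming the gaze coding gives, for each speaker, a sorted list of non-overlapping (start, end) intervals during which that speaker is looking up, the moments of mutual gaze are the intersections. The interval values and function names below are invented for illustration.

```python
def mutual_gaze(giver_looks, follower_looks):
    """Intersect two sorted lists of non-overlapping (start, end) intervals."""
    result = []
    i = j = 0
    while i < len(giver_looks) and j < len(follower_looks):
        start = max(giver_looks[i][0], follower_looks[j][0])
        end = min(giver_looks[i][1], follower_looks[j][1])
        if start < end:
            result.append((start, end))
        # Move past whichever interval finishes first.
        if giver_looks[i][1] < follower_looks[j][1]:
            i += 1
        else:
            j += 1
    return result


def total_duration(intervals):
    """Sum of the lengths of a list of (start, end) intervals."""
    return sum(end - start for start, end in intervals)


# Hypothetical gaze intervals in seconds.
giver_looks = [(1.0, 3.0), (5.0, 6.0)]
follower_looks = [(2.0, 5.5)]
shared = mutual_gaze(giver_looks, follower_looks)
print(shared, total_duration(shared))
```

Computing the same totals for the dialogue with a barrier gives a baseline for how often speakers look up when there is nothing to see.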

26
2. Some exercises: using a coded corpus to learn
about spontaneous speech
  • When are people disfluent?
  • Disfluency is an error in speaking which the
    speaker corrects. How disfluent are fluent
    speakers of a language?
  • Go to the introduction page
  • http://www.hcrc.ed.ac.uk/amyi/maptask/demo_instructions.html
  • Click on Disfluency to learn about how
    disfluencies are classified.
  • Go back to the Demo page to choose a dialogue.
  • http://www.hcrc.ed.ac.uk/amyi/maptask/demo.html
  • Ask for Dialogue Games as the Display option and
    disfluency as the Highlight option.
  • Look at the disfluency codings. When do your
    speakers most often become disfluent? Early or
    late in Games?
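
One simple way to put a number on "how disfluent" is a rate per 100 words. The sketch below assumes each turn is available as its transcribed text plus a count of coded disfluencies; the example turns are invented, and counting words by splitting on spaces is a simplification.

```python
def disfluency_rate(turns):
    """Disfluencies per 100 words over a list of (text, disfluency_count)."""
    words = sum(len(text.split()) for text, _ in turns)
    disfluencies = sum(count for _, count in turns)
    if words == 0:
        raise ValueError("no words to count")
    return 100.0 * disfluencies / words


# Invented turns with hand-assigned disfluency counts (20 words, 2 disfluencies).
turns = [
    ("go round the the east lake", 1),
    ("right okay", 0),
    ("then up to to the gold mine", 1),
    ("have you got slate mountain", 0),
]
print(disfluency_rate(turns))
```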