Investigating speech, thought and writing presentation in a corpus of spoken British English

1
Investigating speech, thought and writing
presentation in a corpus of spoken British English
  • An AHRB-funded project under the supervision of
    Mick Short, Elena Semino and Tony McEnery

Research Assistants: John Heywood and Dan McIntyre
2
Project outline
  • To compare speech, thought and writing
    presentation in spoken and written English.
  • To build a new corpus of 260,000 words of spoken
    British English to compare with the STWP Written
    English Corpus (1995-99).
  • To investigate the presentation of speech,
    thought and writing in the STWP Spoken Corpus by
    tagging with the Leech and Short (1981) category
    set.
  • To further test and adapt the Leech and Short
    (1981) model of STP.
  • The project is funded until February 2003.

3
Construction of the corpus
  • 120 texts - approximately 260,000 words.
  • Texts rich in STWP taken from the British
    National Corpus (BNC) and the Centre for North
    West Regional Studies (CNWRS) oral history
    archives at Lancaster University.
  • CNWRS interview tapes digitised to be
    time-aligned with text.

4
Number and distribution of NWRS files in the corpus
NWRS Archive: Family and Social Life Archive; Childhood and Schooling Archive
Speakers: Male, Female
Periods: 1890-1940, 1940-1970
Records: 7, 7, 8, 8, 15, 15
i.e. 60 files with an equal balance of male
and female speakers in each age-range
5
Number and distribution of BNC files in the corpus
BNC spoken data: Spoken Demographic; Spoken Context-Governed
Age range  Male     Female
0-14       5 files  5 files
15-24      5 files  5 files
25-34      5 files  5 files
35-44      5 files  5 files
45-59      5 files  5 files
60+        5 files  5 files
i.e. 60 files with an equal balance of male
and female speakers in each age-range
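The arithmetic behind this sampling grid can be checked with a minimal sketch. The variable names are illustrative, and the final age band is assumed to mean 60 and over; the file counts come from the slides.

```python
from itertools import product

# Design values taken from the slides; names are hypothetical.
SEXES = ["Male", "Female"]
AGE_RANGES = ["0-14", "15-24", "25-34", "35-44", "45-59", "60+"]  # "60+" is an assumption
FILES_PER_CELL = 5
NWRS_FILES = 60  # total stated on the NWRS slide

# One cell per (sex, age range) combination, each holding the same number of files.
grid = {cell: FILES_PER_CELL for cell in product(SEXES, AGE_RANGES)}
bnc_files = sum(grid.values())

print(len(grid), "cells,", bnc_files, "BNC files")
print(bnc_files + NWRS_FILES, "texts in total")
```

Twelve cells of five files give the 60 BNC files, which together with the 60 NWRS files account for the 120 texts stated under "Construction of the corpus".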
6
The development of the tag-set
Leech and Short (1981)
NRA NRSA NRS/IS FIS NRS/DS FDS
NRTA NRT/IT FIT NRT/DT FDT
  • The STWP Written Project (1995)
  • 3 main genres: Fiction, Biography and
    Autobiography, and Newspaper Journalism, each
    divided into Serious/Popular sections.

N NV NRSA-P NRS/IS FIS NRS/DS FDS
N NI NRTA-P NRT/IT FIT NRT/DT FDT
N NW NRWA-P NRW/IW FIW NRW/DW FDW
embedded, hypothetical, inferred, quote
7
The development of the tag-set: new tags
The STWP Spoken Project (2001): BNC spoken
demographic data and NWRS oral history interviews
RM
A RV RSA-P RS/IS FIS RS/DS FDS
A RI RTA-P RT/IT FIT RT/DT FDT
A RN RWA-P RW/IW FIW RW/DW FDW
embedded, negative / absence, hypothetical,
inferred, quote, reiterated, interrogative,
imperative, uncompleted, 2 / 3 / 4
8
A 15-field tag-set: 5 main categories
FIELD  CHARACTER               VALUE
1      x, A, F                 Anything!, Free
2      x, , R, I, D            Representation, Indirect, Direct
3      x, S, T, W, V, I, N, M  Speech, Thought, Writing, Voice, Internal state, WritiNg, Mention
4      x, A                    Act
5      x, P                    toPic
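To make the field layout concrete, here is a minimal sketch of how the first five fields might be decoded. It assumes that "x" marks an unused field and that a tag such as RSA-P would be spelled "xRSAP" in this layout. Both are assumptions, and `decode_main` is a hypothetical helper, not part of the project's tooling.

```python
# Lookup tables transcribed from fields 1-5 of the slide.
# Assumption: "x" means the field is not used in a given tag.
MAIN_FIELDS = [
    {"A": "Anything!", "F": "Free"},                           # field 1
    {"R": "Representation", "I": "Indirect", "D": "Direct"},   # field 2
    {"S": "Speech", "T": "Thought", "W": "Writing", "V": "Voice",
     "I": "Internal state", "N": "WritiNg", "M": "Mention"},   # field 3
    {"A": "Act"},                                              # field 4
    {"P": "toPic"},                                            # field 5
]

def decode_main(tag):
    """Return the values named by the first five characters of a tag."""
    if len(tag) < 5:
        raise ValueError("expected at least five characters")
    values = []
    # zip stops after the five main fields, so longer tags are accepted.
    for position, (char, table) in enumerate(zip(tag, MAIN_FIELDS), start=1):
        if char == "x":
            continue  # unused field
        if char not in table:
            raise ValueError(f"unexpected character {char!r} in field {position}")
        values.append(table[char])
    return values

print(decode_main("xRSAP"))  # hypothetical spelling of RSA-P
print(decode_main("FISxx"))  # hypothetical spelling of FIS
```

Keeping one table per field means the same letter can carry different values in different positions, as with "A" (Anything! in field 1, Act in field 4) and "I" (Indirect in field 2, Internal state in field 3).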
9
A 15-field tag-set: 10 category attributes
FIELD  CHARACTER        VALUE
6      x, , 1, 2, 3, 4  odd interesting borderline cases, no.s repeated (-ing or -ed) adjacent categories
7      xe               embedded
8      xxg/a            negative action etc., e.g. 'we weren't allowed to go'; absence, e.g. 'I didn't say anything'
9      xxxh             hypothetical
10     xxxxi            inferred
11     xxxxxq           quote
12     xxxxxxr          iterative
13     xxxxxxxv/p       interrogative, imperative
14     xxxxxxxxu        uncompleted
15     xexxxxxxx2       level of embedding (2, 3, 4)
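The character patterns above suggest a 10-character attribute string, one position per field 6-15, with "x" as a placeholder. The following minimal sketch decodes such a string under that reading. It also assumes that the letters in "g/a" and "v/p" pair with their values in the order listed, and it leaves field 6 undecoded because its values are not clear from the slide. `decode_attributes` is a hypothetical helper.

```python
# Fields 7-15 as listed on the slide; field 6 is deliberately omitted.
# Assumption: g = negative, a = absence, v = interrogative, p = imperative.
ATTRIBUTES = {
    7: {"e": "embedded"},
    8: {"g": "negative", "a": "absence"},
    9: {"h": "hypothetical"},
    10: {"i": "inferred"},
    11: {"q": "quote"},
    12: {"r": "iterative"},
    13: {"v": "interrogative", "p": "imperative"},
    14: {"u": "uncompleted"},
    15: {"2": "embedding level 2", "3": "embedding level 3",
         "4": "embedding level 4"},
}

def decode_attributes(suffix):
    """Decode the 10-character attribute part (fields 6-15) of a tag."""
    if len(suffix) != 10:
        raise ValueError("expected 10 characters, one per field 6-15")
    found = []
    for field, table in ATTRIBUTES.items():
        char = suffix[field - 6]  # index 0 holds field 6
        if char == "x":
            continue  # attribute not set
        if char not in table:
            raise ValueError(f"unexpected character {char!r} in field {field}")
        found.append(table[char])
    return found

print(decode_attributes("xexxxxxxx2"))  # the pattern shown for field 15
```

Under these assumptions, the pattern shown for field 15 reads as an embedded category at embedding level 2.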
10
Issues arising
  • Technical issues
      • Legibility.
      • Comparability between NWRS and BNC data.
  • Tagging issues
      • Comparability between written and spoken corpora.
      • What counts as STWP?
      • Functional and formal criteria.
      • Embedding.
      • Repetition (e.g. he said he said well he said).
      • Report of mention.
      • Reading, hearing, listening and singing dogs!