Corpus analysis (2) - PowerPoint PPT Presentation

About This Presentation
Title:

Corpus analysis (2)

Description:

Title: Corpus Linguistics: the basics Author: Richard Xiao Last modified by: Richard Xiao Created Date: 12/28/2007 8:36:17 PM Document presentation format – PowerPoint PPT presentation

Slides: 43
Provided by: Richard2128
Learn more at: http://www.lancaster.ac.uk
Transcript and Presenter's Notes

Title: Corpus analysis (2)


1
Corpus analysis (2)
  • Corpus Linguistics
  • Richard Xiao
  • lancsxiaoz_at_googlemail.com

2
Outline of the session
  • Lecture
    • Keyword
    • Reference corpus
    • Key keyword
  • Practical
    • WST keyword
    • AntConc keyword
    • Wmatrix keyword / key concept
    • Extra keyword analysis with CQPweb

3
What is a keyword?
  • Keywords are those words whose frequency is
    exceptionally high (positive keywords) or low
    (negative keywords) in comparison with a
    reference corpus
  • Keywords usually refer to positive keywords
  • But negative keywords are equally interesting
    (see Xiao and McEnery 2005)
  • They appear at the very end of your listing, in a
    different colour in WordSmith
  • They are omitted automatically from a keywords
    database for key keyword analysis and a keyword
    plot

4
Why keyword analysis?
  • Indicating the aboutness (Scott 1999) of a
    particular text or corpus
  • Content analysis, discourse analysis
  • Also revealing the salient features which are
    functionally related to a particular genre (Xiao
    and McEnery 2005)
  • Genre analysis, stylistic analysis

5
How to do keyword analysis
  • Make a wordlist of the target corpus
  • Locate or make a word list of a reference corpus
  • Scott (2005), "In search of a bad reference
    corpus"
  • http://www.methodsnetwork.ac.uk/redist/pdf/es1_05scott.pdf
  • The reference corpus is usually larger than the
    target corpus
  • The appropriateness of a reference corpus depends
    on your research questions!
  • Compare the frequency of each item in the two
    wordlists to extract keywords (done
    automatically)
  • Analyse and interpret the keywords: you will do it!
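
The comparison step above can be sketched in a few lines of Python. This is only an illustrative sketch, not what any of the tools in this session runs internally: it assumes log-likelihood as the keyness statistic (one commonly used option), a cut-off of 3.84 (the chi-square critical value for p < 0.05 with one degree of freedom), and two invented toy wordlists. The function names and the counts are hypothetical.

```python
import math
from collections import Counter

def log_likelihood(a, b, c, d):
    """Log-likelihood for one word.
    a, b: frequency in target / reference; c, d: corpus sizes in tokens."""
    e1 = c * (a + b) / (c + d)  # expected frequency in the target corpus
    e2 = d * (a + b) / (c + d)  # expected frequency in the reference corpus
    ll = 0.0
    if a > 0:
        ll += a * math.log(a / e1)
    if b > 0:
        ll += b * math.log(b / e2)
    return 2 * ll

def keywords(target, reference, threshold=3.84):
    """Return (word, signed keyness) pairs, positive keywords first."""
    c = sum(target.values())
    d = sum(reference.values())
    rows = []
    for word in set(target) | set(reference):
        a, b = target[word], reference[word]  # Counter gives 0 if missing
        ll = log_likelihood(a, b, c, d)
        if ll < threshold:
            continue  # difference too small to count as key
        # Negative keywords are relatively rarer in the target corpus.
        sign = 1 if a / c > b / d else -1
        rows.append((word, sign * ll))
    rows.sort(key=lambda row: row[1], reverse=True)
    return rows

# Invented toy wordlists (assumption: not taken from any real corpus).
target = Counter({"deficit": 30, "the": 50, "of": 20})
reference = Counter({"deficit": 10, "the": 500, "of": 490})

toy_keywords = keywords(target, reference)
for word, score in toy_keywords:
    print(f"{word}\t{score:.2f}")
```

In this toy data "deficit" comes out as a positive keyword, "of" as a negative keyword at the end of the list, and "the" is dropped because its relative frequency is the same in both wordlists.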

6
Keywords in the party speeches
  • Target corpus: just one text
  • David Cameron's speech at the Conservative
    conference (10 October 2012, Manchester)
  • http://www.bbc.co.uk/news/uk-politics-15189614
  • Local copy available (David_speech Unicode text)
    - download and unzip the file into a folder
  • www.fass.lancs.ac.uk/projects/corpus/data/workshop3texts.zip
  • Reference corpus
  • The 100-million-word BNC: download and unzip
    (local copy available)
  • www.lexically.net/downloads/version4/BNC_World.zip
  • Tool
  • WST Keyword

7
Wordlist of David's speech
8
Creating keyword list
9
Keyword extraction in progress
Warning: It can take time if you have loaded two
large wordlists
10
Keywords in David's speech
What do these keywords tell us?
Negative keyword
11
Keyword Plot view
12
What company do keywords keep?
13
Why marriage?
14
Key clusters
Similar to word clusters, but only keywords are
used.
15
Key keywords
  • A key keyword is one which is "key" in more than
    one of a number of related texts
  • The more texts it is "key" in, the more "key key"
    it is
  • Can avoid extracting keywords which are unusually
    frequent in only a small number of files
  • Can be created automatically, and are as simple
    to extract as keywords
  • n.b. Negative keywords are omitted automatically
    from a key keyword list
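
As a rough illustration of the idea above, key keywords can be found by counting, for each word, the number of texts in which it is a (positive) keyword. The sketch below assumes the per-text keyword lists have already been extracted; the text names, the words, and the `min_texts` parameter are all invented for the example.

```python
from collections import Counter

def key_keywords(keyword_lists, min_texts=2):
    """Count in how many texts each word is key; keep words key in
    at least `min_texts` texts, most widely key first."""
    counts = Counter()
    for kws in keyword_lists.values():
        counts.update(set(kws))  # each text counts a word at most once
    kept = [(w, n) for w, n in counts.items() if n >= min_texts]
    return sorted(kept, key=lambda item: (-item[1], item[0]))

# Hypothetical positive keywords for three related texts.
per_text_keywords = {
    "text_a": ["tax", "jobs", "growth"],
    "text_b": ["tax", "jobs"],
    "text_c": ["tax", "schools"],
}

toy_key_keywords = key_keywords(per_text_keywords)
print(toy_key_keywords)
```

Here "growth" and "schools" are each key in only one text, so they do not survive as key keywords.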

16
Making a batch wordlist
Specify a folder where you can write
17
Batch making keyword lists
18
Batch making keyword lists
Specify a folder where you can write
19
Making a KW database
20
Key keywords
An "associate" is a keyword that appears in the
same text
key coverage of the corpus
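
The notion of an "associate" can be sketched as follows, assuming a set of per-text keyword lists is already available. This is a simplified illustration and not necessarily how WordSmith computes associates; the text names, words, and the function name are hypothetical.

```python
from collections import Counter

def associates(keyword_lists, node):
    """For a given key keyword `node`, count the other keywords that
    are key in the same texts as `node`."""
    counts = Counter()
    for kws in keyword_lists.values():
        kws = set(kws)
        if node in kws:
            counts.update(kws - {node})
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))

# Hypothetical positive keywords for three related texts.
per_text = {
    "text_a": ["tax", "jobs", "growth"],
    "text_b": ["tax", "jobs"],
    "text_c": ["schools", "jobs"],
}

toy_associates = associates(per_text, "tax")
print(toy_associates)
```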
21
Keyword in AntConc
target corpus
reference corpus
22
Keyword in AntConc
Key words in David's speech (in relation to Ed's
speech)
23
Wmatrix Keywords and key concepts
  • POS and semantic tagging
  • Keyword / key concept analysis in Cameron's
    speech in comparison with Miliband's speech
  • Copy and paste the speeches into two separate
    text files
  • http://www.bbc.co.uk/news/uk-politics-15189614
  • http://www.labour.org.uk/ed-milibands-speech-to-labour-party-conference
  • Save the two texts as David_speech.txt and
    Ed_speech.txt
  • www.fass.lancs.ac.uk/projects/corpus/data/workshop3texts.zip

24
Wmatrix Keywords and key concepts
  • Log in using your zhejiangxx account
  • http://ucrel.lancs.ac.uk/wmatrix3.html

25
Tagging Wizard
26
Tagging in progress
27
Tagging result
28
Labour frequency list
29
KWIC concordance
30
My folders
Upload and tag Ed's speech and click on "My
folders"
Warning: Your folder view may look different!
31
Open the David_speech folder and select Ed_speech in
the "Keyword compared to" dropdown box
32
Keyword list: available to download!
33
Keyword cloud: even more interesting!
34
David's key concepts ("Key concepts compared to")
35
Keyword analysis in online corpora
  • Using Lancaster's CQPweb to compare British
    English (LOB/FLOB) and American English
    (Brown/Frown)
  • Log in to CQPweb
  • http://cqpweb.lancs.ac.uk
  • Similar analysis can be done at BFSU's CQPweb
    corpus hub (different corpora)
  • http://124.193.83.252/cqp/
  • Account ID/pass: test

36
Creating subcorpora
37
Creating subcorpus BrE
38
Creating subcorpus AmE
39
Making wordlists
40
Wordlist available now
41
Computing keywords
You can make adjustments to the statistical
measure, cut-off point, and minimum frequency
according to your research purposes.
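
To make the effect of these settings concrete, here is a small sketch of applying a cut-off point and a minimum frequency to an existing keyword list. It assumes log-likelihood scores and uses the standard chi-square critical values for one degree of freedom; the rows, the function name, and the default settings are invented for illustration and are not CQPweb's own defaults.

```python
# Chi-square critical values (1 degree of freedom) for common p levels.
CRITICAL = {0.05: 3.84, 0.01: 6.63, 0.001: 10.83, 0.0001: 15.13}

def filter_keywords(rows, p=0.01, min_freq=3):
    """Keep (word, frequency, keyness) rows that meet both the
    significance cut-off and the minimum frequency."""
    cutoff = CRITICAL[p]
    return [r for r in rows if r[1] >= min_freq and abs(r[2]) >= cutoff]

# Invented rows: (word, frequency in target corpus, signed keyness).
toy_rows = [
    ("colour", 120, 85.2),
    ("whilst", 2, 12.4),     # significant but too infrequent
    ("towards", 40, 5.1),    # frequent but below the p < 0.01 cut-off
    ("program", 15, -30.7),  # negative keyword, kept via abs()
]

strict = filter_keywords(toy_rows, p=0.01, min_freq=3)
loose = filter_keywords(toy_rows, p=0.05, min_freq=1)
print([r[0] for r in strict])
print([r[0] for r in loose])
```

A stricter cut-off or a higher minimum frequency shortens the list; which setting is appropriate depends on the research question.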
42
Keywords in BrE and AmE