Language modeling for speaker recognition - PowerPoint PPT Presentation

Author: Dan Gillick. Created: 1/15/2004.
Transcript and Presenter's Notes


1
Language modeling for speaker recognition
Dan Gillick January 20, 2004

2
Outline
  • Author identification
  • Trying to beat Doddington's idiolect modeling
    strategy (speaker recognition)
  • My next project

3
Author ID (undergrad. thesis)
  • Problem
  • train models for each of k authors
  • given some test text written by 1 of those
    authors, identify the correct author
  • Variations
  • different kinds of models
  • different size test samples
  • different k
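The identification task above reduces to scoring the test text under each of the k author models and taking the argmax. A minimal sketch (function and parameter names are my own, not from the thesis):

```python
def identify_author(test_text, models):
    """Score the test text under each author's model and pick the argmax.

    `models` maps an author name to a scoring function that returns a
    (log-)probability-like score for a piece of text.
    """
    return max(models, key=lambda author: models[author](test_text))
```

Any of the model families discussed in the following slides (character n-grams, key-lists) can plug in as the per-author scoring function.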

4
Character n-gram models
  • What?
  • 27 tokens: a-z, <space>
  • some text generated from such a trigram model
  • you orthad gool of anythilly
  • uncand or prafecaustiont and to hing that put
    ably
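A character trigram model of the kind that generated the text above can be sketched as follows. This is a simplified illustration, not the thesis code; the alphabet is the 27 tokens (a-z plus space), and the function names are my own:

```python
import random
from collections import defaultdict

def train_char_trigrams(text):
    """Count character trigrams over the 27-token alphabet (a-z plus space)."""
    text = "".join(c for c in text.lower() if c.isalpha() or c == " ")
    counts = defaultdict(lambda: defaultdict(int))
    for i in range(len(text) - 2):
        # key: previous two characters; value: counts of the next character
        counts[text[i:i + 2]][text[i + 2]] += 1
    return counts

def generate(counts, n=40, seed="th"):
    """Sample n characters from the trigram distribution, one at a time."""
    out = seed
    for _ in range(n):
        nxt = counts.get(out[-2:])
        if not nxt:
            break  # unseen history: stop generating
        chars, weights = zip(*nxt.items())
        out += random.choices(chars, weights=weights)[0]
    return out
```

Sampling from such a model produces pronounceable nonsense like the examples above, because each character is conditioned only on the two preceding characters.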

5
Character n-gram models
  • Why?
  • very simple
  • data sparseness less troublesome than with word
    n-grams
  • supposed to be state-of-the-art or at least close
    to it
  • (Khmelev, D. and Tweedie, F.J. "Using Markov Chains
    for the Identification of Writers." Literary and
    Linguistic Computing, 16(4): 299-307, 2001.)

6
Character n-grams Setup
  • task: pick the correct author from 10 possible
    authors
  • training data: 3 novels for each author
  • test data: text from a held-out novel
  • jack-knifing: 4 novels for each of 20 authors

7
Character n-grams Results
  • task: picking 1 author from 10 possible authors
  • training data size: 3 novels

8
Character n-gram models
  • Why does it work?
  • captures some word choice information
  • picks up word endings (-ing, -tion, -ly, etc.)
  • not hurt much by data sparseness issues

9
Key-list models
  • Incentive
  • ought to be able to beat character n-grams
  • develop a new modeling method more focused on
    that which differentiates between authors
    (characters and words are both useful for topic
    recognition, but that doesn't mean they are best
    for author recognition)

10
Key-list models
  • Idea
  • convert the text stream into a stream of only
    authorship-relevant symbols (I called these lists
    of symbols key-lists)
  • each symbol is a regular expression to allow for
    broad definitions (/tion/ captures any
    nounification)
  • text not accounted for by the key-list is
    represented by <short>, <med>, or <long> markers
  • build n-gram models from these new streams

11
Key-list models
Sample key-list
Regular Expression                      Description
(\w)(,)(\s)                             comma
(\w)(\.)(\s)                            period
(\b)(of|for|to|around|after)(\b)        common prepositions
(\b)(was|were \wed)(\b)                 passive voice
(\b)(is|was|will|are|were|am)(\b)       "is" conjugations
(\b)(\wing)(\b)                         ends in -ing
(\b)(\wly)(\b)                          adverb
(\b)(and|but|or|not|if|then|else)(\b)   logical
(\b)(as)(\b)                            as
(\b)(would|should|could)(\b)            modal verbs
  • sample trigram: <comma> <short> <period>
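The conversion from raw text to a key-list symbol stream might look like the sketch below, using a hypothetical subset of the table above. The gap-length thresholds for <short>/<med>/<long> are invented for illustration; the thesis may bucket gaps differently:

```python
import re

# Hypothetical subset of a key-list: (symbol, compiled regex) pairs.
KEY_LIST = [
    ("<comma>", re.compile(r",")),
    ("<period>", re.compile(r"\.")),
    ("<prep>", re.compile(r"\b(of|for|to|around|after)\b")),
    ("<ing>", re.compile(r"\b\w+ing\b")),
]

def to_key_stream(text):
    """Replace regex matches with symbols; bucket leftover gaps by length."""
    spans = []
    for symbol, pattern in KEY_LIST:
        for m in pattern.finditer(text):
            spans.append((m.start(), m.end(), symbol))
    spans.sort()
    stream, pos = [], 0
    for start, end, symbol in spans:
        if start < pos:
            continue  # skip a match overlapping one already emitted
        gap = start - pos
        if gap > 0:
            # text not covered by any key becomes a length marker
            stream.append("<short>" if gap < 10 else "<med>" if gap < 30 else "<long>")
        stream.append(symbol)
        pos = end
    return stream
```

The resulting symbol streams are then modeled with ordinary n-grams, exactly as the character streams were.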

12
Key-list models Results
  • task: picking 1 author from 10 possible authors
  • training data size: 3 novels

13
Key-list models Results
  • Some other interesting results
  • key-lists with just punctuation (as well as
    <short>, <med>, <long>) performed almost as well
    as the best key-lists
  • all key-lists were outperformed by the best
    n-letter model when test data size < 10,000
    chars., but all key-list models eventually
    surpassed the n-letter models

14
Key-list models
  • Things I didn't do
  • vary amount of training data
  • spend a long time trying different key-lists
  • combine key-list results with each other or with
    the character results
  • a lot of other stuff
  • The thesis is available on the web
    http://www.dgillick.com/resource/thesis.pdf

15
Outline
  • Author identification
  • Trying to beat Doddington's idiolect modeling
    strategy (speaker recognition)
  • My next project

16
G. Doddington's LM strategy
  • create LMs with a limited vocabulary of the most
    commonly occurring 2000 bigrams
  • to smooth out zeroes, boost each bigram prob. by
    0.001
  • score by calculating:
  • logprob(test | target) - logprob(test | bkg)
  • logprobs are joint probabilities:
  • logprob(A B) = logprob(A) + logprob(B | A)
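The scoring step above can be sketched as follows. This is a simplified illustration: the restriction to the 2000-bigram vocabulary is assumed to happen when the `probs` dictionaries are built, and the function names are my own:

```python
import math

def bigram_logprob(words, probs, boost=0.001):
    """Joint log-probability of a word sequence under a bigram model.

    Every bigram probability is boosted by `boost` (0.001 in the
    slides) so that unseen bigrams do not produce log(0).
    """
    lp = 0.0
    for w1, w2 in zip(words, words[1:]):
        lp += math.log(probs.get((w1, w2), 0.0) + boost)
    return lp

def score(test_words, target_probs, bkg_probs):
    """Detection score: target log-prob minus background log-prob."""
    return (bigram_logprob(test_words, target_probs)
            - bigram_logprob(test_words, bkg_probs))
```

A positive score indicates the test conversation side looks more like the target speaker than like the background population.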

17
G. Doddington's LM Setup
  • Switchboard 1 data
  • collected in the early '90s from all over the US
  • 2,400 (5 min.) conversations among 543 speakers
  • corpus divided into 6 splits and tested using
    jack-knifing through the splits
  • manual transcripts provided by Miss. State
  • Task
  • 8 conversation sides used as training data to
    build models for each target speaker
  • 1 conversation side used as test data
  • background model built from 3 splits of held-out
    data
  • jack-knifing allowed for almost 10,000 trials

18
G. Doddington's LM Results
  • Notes
  • these results are my own attempt to replicate the
    original experiments
  • SRI reported EER 8.65 for this same experiment

19
Adapted bigram models
  • Incentive
  • adapting target models from a much larger
    background model should yield better estimates of
    probabilities in the language models
  • Specifically
  • use same 2000 bigram vocabulary
  • target probabilities are a mixture of training
    probabilities and background probabilities
  • mixture weight is 2:1 target data : bkg. data
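The adaptation described above can be sketched as a simple linear interpolation: a 2:1 mixture puts weight 2/3 on the target's training estimates and 1/3 on the background estimates. The names and the normalization details are my own simplification:

```python
def adapt_probs(target_counts, bkg_probs, weight=2 / 3):
    """Interpolate target bigram estimates with background probabilities.

    `target_counts` maps bigram -> raw count in the target's training data;
    `bkg_probs` maps bigram -> probability under the background model.
    `weight` is the share given to the target data (2/3 for a 2:1 mix).
    """
    total = sum(target_counts.values()) or 1
    vocab = set(target_counts) | set(bkg_probs)
    return {bg: weight * (target_counts.get(bg, 0) / total)
                + (1 - weight) * bkg_probs.get(bg, 0.0)
            for bg in vocab}
```

Bigrams unseen in the target's 8 conversation sides fall back toward their background probability instead of toward zero, which is the motivation given on the slide.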

20
Adapted bigram models Results
  • Notes
  • nearly identical performance
  • combination of the 2 systems yields almost no
    improvement
  • why isn't the adapted version better?

21
Can anything improve on 8.68?
  • Trigrams?
  • use same count threshold to make a list of the
    top 700 trigrams ("a lot of", "I don't know" were
    among the most common)
  • Character models?
  • worked well for authorship
  • included all character combinations (no limited
    vocabulary)
  • tried bigram and trigram models

22
Scores and combinations
adapt. word bigrams                 EER 8.89
adapt. word trigrams                EER 11.88
adapt. char. bigrams                EER 13.73
adapt. char. trigrams               EER 17.92
adapted words                       EER 8.46
adapted characters                  EER 13.24
adapted words + adapted characters  EER 7.89
GD bigrams                          EER 8.68
23
Final Comparison
24
What about less training data?
  • 1 conversation-side training
  • character models might provide more of an
    advantage with less data?
  • not so.
  • GD: EER 22.5
  • adapted character: EER 30
  • adapted word: EER 20
  • maybe these character models pick up on the topic
    of that 1 conversation
  • haven't tried any other training data sizes

25
Outline
  • Author identification
  • Trying to beat GD's result
  • My next project

26
Key-lists for speaker recognition
  • key-list n-grams picked up on phrasing (comma and
    period were valuable tokens)
  • automatic transcripts don't have punctuation, but
    they do have pause and duration information
  • use reg. exps. and duration info. to capture
    idiosyncratic speaker phrasing
  • capture other speech information in key-lists?
    (energy, f0, etc.)

27
Acknowledgements
  • Thanks to
  • Anand and Luciana at SRI for trying to help me
    replicate their results
  • Barbara for providing advice
  • Barry and Kofi for helping with computers and
    stuff
  • George