1
Audio Features & Machine Learning
  • E.M. Bakker

2
Features for Speech Recognition and Audio Indexing
  • Parametric Representations
  • Short Time Energy
  • Zero Crossing Rates
  • Level Crossing Rates
  • Short Time Spectral Envelope
  • Spectral Analysis
  • Filter Design
  • Filter Bank Spectral Analysis Model
  • Linear Predictive Coding (LPC)

3
Methods
  • Vector Quantization
  • Finite code book of spectral shapes
  • The code book codes for typical spectral shape
  • Method for all spectral representations (e.g.
    Filter Banks, LPC, ZCR, etc. )
  • Ensemble Interval Histogram (EIH) Model
  • Auditory-Based Spectral Analysis Model
  • More robust to noise and reverberation
  • Expected to be an inherently better representation
    of the relevant spectral information because it
    models the mechanics of the human cochlea

4
Pattern Recognition
(Block diagram)
Speech/Audio → Parameter Measurements → Test Pattern / Query Pattern
→ Pattern Comparison (against Reference Patterns)
→ Decision Rules → Recognized Speech/Audio
5
Pattern Recognition
6
Spectral Analysis Models
  • Pattern Recognition Approach
  • Parameter Measurement → Pattern
  • Pattern Comparison
  • Decision Making
  • Parameter Measurements
  • Bank of Filters Model
  • Linear Predictive Coding Model

7
Band Pass Filter
  • Note that the bandpass filter can be defined as
  • a convolution with a filter response function in
    the time domain, or equivalently
  • a multiplication with a filter response function
    in the frequency domain

8
Bank of Filters Analysis Model
9
Bank of Filters Analysis Model
  • Speech Signal s(n), n = 0, 1, …
  • Digital, with Fs the sampling frequency of s(n)
  • Bank of q Band Pass Filters BPF1, …, BPFq
  • Spanning a frequency range of, e.g., 100-3000 Hz
    or 100 Hz - 16 kHz
  • BPFi(s(n)) = xn(e^jωi), where ωi = 2πfi/Fs is
    the normalized frequency fi, i = 1, …, q.
  • xn(e^jωi) is the short time spectral
    representation of s(n) at time n, as seen through
    BPFi with centre frequency ωi, i = 1, …, q.
  • Note: each BPFi independently processes s to
    produce the spectral representation xn
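A minimal Python sketch of such a front end, assuming a bank of Butterworth band-pass filters and log short-time energies per band; the band edges, filter order, and frame sizes are illustrative choices, not values from the slides:

```python
import numpy as np
from scipy.signal import butter, sosfilt

def filter_bank_analysis(s, fs, bands, frame_len=400, hop=160):
    """Pass s through q band-pass filters and return per-band log short-time energy.

    bands: list of (low_hz, high_hz) tuples spanning e.g. 100 Hz - 3 kHz.
    Output shape: (num_frames, q), i.e. the spectral representation per frame.
    """
    outputs = []
    for low, high in bands:
        sos = butter(4, [low, high], btype="bandpass", fs=fs, output="sos")
        outputs.append(sosfilt(sos, s))                       # each BPF_i processes s independently
    outputs = np.stack(outputs, axis=1)                       # (num_samples, q)

    frames = []
    for start in range(0, len(s) - frame_len + 1, hop):
        frame = outputs[start:start + frame_len]
        frames.append(np.log(np.sum(frame ** 2, axis=0) + 1e-10))  # log energy per band
    return np.array(frames)

# Example: 8 filters spanning 100-3000 Hz on a 1-second test tone sampled at 8 kHz
fs = 8000
t = np.arange(fs) / fs
s = np.sin(2 * np.pi * 440 * t)
edges = np.linspace(100, 3000, 9)
bands = list(zip(edges[:-1], edges[1:]))
print(filter_bank_analysis(s, fs, bands).shape)               # (num_frames, 8)
```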

10
Bank of Filters Front End Processor
11
Typical Speech Wave Forms
12
MFCCs
Speech/Audio → Preemphasis → Windowing → Fast Fourier Transform
→ Mel-Scale Filter Bank → Log(·) → Discrete Cosine Transform
→ MFCCs (first 12 most significant coefficients)
MFCCs are calculated using the formula
  Ci = Σ (k = 1, …, N)  Xk cos( i (k - 1/2) π / N ),   i = 1, …, P
  • Where
  • Ci is the i-th cepstral coefficient
  • P is the order (12 in our case)
  • K is the number of discrete Fourier
    transform magnitude coefficients
  • Xk is the k-th order log-energy output
    from the Mel-Scale filter bank
  • N is the number of filters
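A compact numpy-only sketch of this pipeline for a single frame; the pre-emphasis coefficient, FFT size, number of filters, and triangular mel filter construction are common choices assumed here, not values specified on the slide:

```python
import numpy as np

def hz_to_mel(f):  return 2595 * np.log10(1 + f / 700)
def mel_to_hz(m):  return 700 * (10 ** (m / 2595) - 1)

def mfcc_frame(frame, fs, n_filters=26, n_ceps=12, n_fft=512):
    # 1. Preemphasis
    frame = np.append(frame[0], frame[1:] - 0.97 * frame[:-1])
    # 2. Windowing (Hamming)
    frame = frame * np.hamming(len(frame))
    # 3. FFT magnitude spectrum
    mag = np.abs(np.fft.rfft(frame, n_fft))
    # 4. Mel-scale filter bank (triangular filters, equally spaced on the mel scale)
    mel_edges = np.linspace(hz_to_mel(0), hz_to_mel(fs / 2), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_edges) / fs).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for m in range(n_filters):
        lo, ce, hi = bins[m], bins[m + 1], bins[m + 2]
        fbank[m, lo:ce] = (np.arange(lo, ce) - lo) / max(ce - lo, 1)
        fbank[m, ce:hi] = (hi - np.arange(ce, hi)) / max(hi - ce, 1)
    # 5. Log of filter-bank energies X_k
    X = np.log(fbank @ mag + 1e-10)
    # 6. Discrete cosine transform; keep the first n_ceps coefficients C_i
    i = np.arange(1, n_ceps + 1)[:, None]
    k = np.arange(1, n_filters + 1)[None, :]
    return np.cos(i * (k - 0.5) * np.pi / n_filters) @ X

fs = 16000
frame = np.sin(2 * np.pi * 300 * np.arange(400) / fs)   # one 25 ms frame of a test tone
print(mfcc_frame(frame, fs).shape)                      # (12,)
```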
13
Linear Predictive Coding Model
14
Filter Response Functions
15
Some Examples of Ideal Band Filters
16
Perceptually Based Critical Band Scale
17
Short Time Fourier Transform
  • Xn(e^jω) = Σm s(m) w(n - m) e^-jωm
  • s(m): the signal
  • w(n - m): a fixed low pass window
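A short numpy sketch of this transform; the `stft` helper, hop sizes, and the 10 kHz test signal (the following slides use 500 samples = 50 ms, which implies Fs = 10 kHz) are illustrative assumptions:

```python
import numpy as np

def stft(s, win_len, hop):
    """Short-time Fourier transform with a Hamming low-pass window w(n-m)."""
    w = np.hamming(win_len)
    frames = [s[m:m + win_len] * w
              for m in range(0, len(s) - win_len + 1, hop)]
    return np.array([np.fft.rfft(f) for f in frames])   # rows: time n, columns: frequency

fs = 10000                                    # 500 samples = 50 ms  =>  Fs = 10 kHz
s = np.sin(2 * np.pi * 200 * np.arange(fs) / fs)
print(stft(s, win_len=500, hop=250).shape)    # long window: fine frequency resolution
print(stft(s, win_len=50,  hop=25).shape)     # short window: fine time resolution
```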

18
Short Time Fourier Transform: Long Hamming Window
500 samples (50msec)
Voiced Speech
19
Short Time Fourier Transform: Short Hamming
Window 50 samples (5 msec)
Voiced Speech
20
Short Time Fourier Transform: Long Hamming Window
500 samples (50msec)
Unvoiced Speech
21
Short Time Fourier Transform: Short Hamming
Window 50 samples (5 msec)
Unvoiced Speech
22
Short Time Fourier Transform: Linear Filter
Interpretation
23
Linear Predictive Coding (LPC) Model
  • Speech Signal s(n), n = 0, 1, …
  • Digital, with Fs the sampling frequency of s(n)
  • Spectral Analysis on Blocks of Speech with an all
    pole modeling constraint
  • LPC of analysis order p
  • s(n) is blocked into frames (n, m)
  • Again consider xn(e^jω), the short time spectral
    representation of s(n) at time n (where ω = 2πf/Fs
    is the normalized frequency f).
  • Now the spectral representation xn(e^jω) is
    constrained to be of the form σ/A(e^jω), where
    A(e^jω) is the p-th order polynomial with
    z-transform
  • A(z) = 1 + a1 z^-1 + a2 z^-2 + … + ap z^-p
  • The output of the LPC parametric conversion on
    block (n, m) is the vector (a1, …, ap).
  • It specifies parametrically the spectrum of an
    all-pole model that best matches the signal
    spectrum over the period of time in which the
    frame of speech samples was accumulated (p-th
    order polynomial approximation of the signal).
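A hedged sketch of extracting the vector (a1, …, ap) for one frame via the autocorrelation method; the Toeplitz solve and the order p = 12 are standard choices assumed here, and the sign convention of A(z) varies between texts:

```python
import numpy as np
from scipy.linalg import solve_toeplitz

def lpc(frame, p=12):
    """All-pole (LPC) coefficients a_1..a_p for one frame, autocorrelation method."""
    frame = frame * np.hamming(len(frame))
    r = np.correlate(frame, frame, mode="full")[len(frame) - 1:]  # autocorrelation r[0], r[1], ...
    a = solve_toeplitz((r[:p], r[:p]), r[1:p + 1])                # Toeplitz normal equations R a = r
    return a   # predictor coefficients; A(z) = 1 - a1 z^-1 - ... (sign convention varies)

fs = 8000
rng = np.random.default_rng(0)
frame = np.sin(2 * np.pi * 500 * np.arange(240) / fs) + 0.01 * rng.standard_normal(240)
a = lpc(frame)             # one 30 ms frame of a noisy test tone
print(a.shape)             # (12,) -- the vector (a_1, ..., a_p) output per block
```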

24
Vector Quantization
  • Data represented as feature vectors.
  • VQ: a training set is used to determine a set of
    code words that constitute a code book.
  • Code words are centroids, computed using a
    similarity or distance measure d.
  • The code words together with d divide the space
    into Voronoi regions.
  • A query vector falls into a Voronoi region and
    will be represented by the respective code word.

25
Vector Quantization
  • Distance measures d(x,y)
  • Euclidean distance
  • Taxi cab distance
  • Hamming distance
  • etc.

26
Vector Quantization
  • Clustering the Training Vectors
  • Initialize: choose M arbitrary vectors of the L
    vectors of the training set. This is the initial
    code book.
  • Nearest neighbor search: for each training
    vector, find the code word in the current code
    book that is closest and assign that vector to
    the corresponding cell.
  • Centroid update: update the code word in each
    cell using the centroid of the training vectors
    that are assigned to that cell.
  • Iteration: repeat steps 2-3 until the average
    distance falls below a preset threshold.
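A minimal Python sketch of this clustering loop (a generalized Lloyd / k-means style procedure) with Euclidean distance; M, the threshold, and the random initialization are illustrative:

```python
import numpy as np

def train_codebook(training_vectors, M, threshold=1e-4, max_iter=100, seed=0):
    """Codebook training: initialize, nearest-neighbour search, centroid update, iterate."""
    rng = np.random.default_rng(seed)
    L = len(training_vectors)
    codebook = training_vectors[rng.choice(L, size=M, replace=False)]   # 1. initialize
    prev_dist = np.inf
    for _ in range(max_iter):
        # 2. nearest-neighbour search: assign each training vector to its closest code word
        d = np.linalg.norm(training_vectors[:, None, :] - codebook[None, :, :], axis=2)
        cells = np.argmin(d, axis=1)
        # 3. centroid update: each code word becomes the centroid of its cell
        for m in range(M):
            if np.any(cells == m):
                codebook[m] = training_vectors[cells == m].mean(axis=0)
        # 4. iterate until the average distance stops improving
        avg_dist = d[np.arange(L), cells].mean()
        if prev_dist - avg_dist < threshold:
            break
        prev_dist = avg_dist
    return codebook

# Example: 1000 random 12-dimensional feature vectors (e.g. MFCC frames), 16 code words
X = np.random.default_rng(1).standard_normal((1000, 12))
print(train_codebook(X, M=16).shape)          # (16, 12)
```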

27
Vector Classification
  • For an M-vector code book CB with codes
  • CB = { yi | 1 ≤ i ≤ M },
  • the index m of the best codebook entry for a
    given vector v is
  • m = arg min d(v, yi)
  •     1 ≤ i ≤ M

28
VQ for Classification
  • A code book CBk = { yki | 1 ≤ i ≤ M } can be used
    to define a class Ck.
  • Example: Audio Classification
  • Classes: crowd, car, silence, scream,
    explosion, etc.
  • Determine the class by using VQ code books CBk,
    one for each of the classes.
  • VQ is very often used as a baseline method for
    classification problems.
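A small sketch of this baseline classifier: a query sequence is assigned to the class whose codebook yields the lowest average quantization distortion. The class names and the random stand-in codebooks are purely illustrative:

```python
import numpy as np

def quantization_distortion(x, codebook):
    """Average distance of the frames in x to their nearest code words in one codebook."""
    d = np.linalg.norm(x[:, None, :] - codebook[None, :, :], axis=2)   # d(v, y_i)
    return d.min(axis=1).mean()                                        # best entry per frame

def classify(x, codebooks):
    """Assign the feature sequence x to the class k whose codebook CB_k fits it best."""
    return min(codebooks, key=lambda k: quantization_distortion(x, codebooks[k]))

# Example with random stand-in codebooks for two classes
rng = np.random.default_rng(2)
codebooks = {"crowd": rng.standard_normal((16, 12)),
             "silence": rng.standard_normal((16, 12)) * 0.1}
x = rng.standard_normal((50, 12)) * 0.1            # a quiet-looking query sequence
print(classify(x, codebooks))                      # likely "silence"
```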

29
Sound, DNA Sequences!
  • DNA: a helix-shaped molecule whose constituents
    are two parallel strands of nucleotides
  • DNA is usually represented by sequences of these
    four nucleotides
  • This assumes only one strand is considered; the
    second strand is always derivable from the first
    by pairing A's with T's and C's with G's, and
    vice versa
  • Nucleotides (bases)
  • Adenine (A)
  • Cytosine (C)
  • Guanine (G)
  • Thymine (T)

30
Biological Information From Genes to Proteins
31
From Amino Acids to Protein Functions
Example DNA sequence:
CGCCAGCTGGACGGGCACACCATGAGGCTGCTGACCCTCCTGGGCCTTCTG
Example amino acid sequence (one sequence, wrapped):
TDQAAFDTNIVTLTRFVMEQGRKARGTGEMTQLLNSLCTAVKAIST
AVRKAGIAHLYGIAGSTNVTGDQVKKLDVLSNDLVINVLKSSFATCVLVT
EEDKNAIIVEPEKRGKYVVCFDPLDGSSNIDCLVSIGTIFGIYRKNSTDE
PSEKDALQPGRNLVAAGYALYGSATML
DNA / amino acid sequence → 3D structure → protein functions
DNA (gene) → pre-RNA → RNA → Protein
(transcription by RNA-polymerase, splicing by the spliceosome,
translation by the ribosome)
32
Motivation for Markov Models
  • There are many cases in which we would like to
    represent the statistical regularities of some
    class of sequences
  • genes
  • proteins in a given family
  • Sequences of audio features
  • Markov models are well suited to this type of
    task

33
A Markov Chain Model
  • Transition probabilities
  • Pr(xi = a | xi-1 = g) = 0.16
  • Pr(xi = c | xi-1 = g) = 0.34
  • Pr(xi = g | xi-1 = g) = 0.38
  • Pr(xi = t | xi-1 = g) = 0.12

34
Definition of Markov Chain Model
  • A Markov chain1 model is defined by
  • a set of states
  • some states emit symbols
  • other states (e.g., the begin state) are silent
  • a set of transitions with associated
    probabilities
  • the transitions emanating from a given state
    define a distribution over the possible next
    states
  • 1 Markov, A. A., "Extension of the Law of Large
    Numbers to Quantities Depending on Each Other",
    Izvestiya Fiziko-matematicheskogo Obshchestva pri
    Kazanskom Universitete, 2nd series, vol. 15
    (1906), pp. 135-156.

35
Markov Chain Models Properties
  • Given some sequence x of length L, we can ask how
    probable the sequence is given our model
  • For any probabilistic model of sequences, we can
    write this probability as
  • Pr(x) = Pr(xL, xL-1, …, x1)
    = Pr(xL | xL-1, …, x1) Pr(xL-1 | xL-2, …, x1) … Pr(x1)
  • Key property of a (1st order) Markov chain: the
    probability of each xi depends only on the value
    of xi-1, so that
  • Pr(x) = Pr(x1) Pr(x2 | x1) … Pr(xL | xL-1)

36
The Probability of a Sequence for a Markov Chain
Model
Pr(cggt) = Pr(c) Pr(g|c) Pr(g|g) Pr(t|g)
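A tiny Python sketch of this computation; only the g-row of the transition table is given on the earlier slide, so the remaining rows and the initial distribution below are placeholders:

```python
# Transition probabilities Pr(x_i | x_{i-1}).  The g-row comes from the earlier slide;
# the remaining rows and the initial distribution are illustrative placeholders.
trans = {
    "g": {"a": 0.16, "c": 0.34, "g": 0.38, "t": 0.12},
    "c": {"a": 0.25, "c": 0.25, "g": 0.25, "t": 0.25},   # placeholder
    "a": {"a": 0.25, "c": 0.25, "g": 0.25, "t": 0.25},   # placeholder
    "t": {"a": 0.25, "c": 0.25, "g": 0.25, "t": 0.25},   # placeholder
}
initial = {"a": 0.25, "c": 0.25, "g": 0.25, "t": 0.25}   # placeholder Pr(x_1)

def sequence_probability(x):
    """First-order Markov chain: Pr(x) = Pr(x_1) * prod_i Pr(x_i | x_{i-1})."""
    p = initial[x[0]]
    for prev, cur in zip(x, x[1:]):
        p *= trans[prev][cur]
    return p

print(sequence_probability("cggt"))   # Pr(c) * Pr(g|c) * Pr(g|g) * Pr(t|g)
```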
37
Example Application
  • CpG islands
  • CG di-nucleotides are rarer in eukaryotic genomes
    than expected given the marginal probabilities of
    C and G
  • but the regions upstream of genes are richer in
    CG di-nucleotides than elsewhere: the CpG islands
  • useful evidence for finding genes
  • Application: predict CpG islands with Markov
    chains
  • one Markov chain to represent CpG islands
  • another Markov chain to represent the rest of the
    genome

38
Markov Chains for Discrimination
  • Suppose we want to distinguish CpG islands from
    other sequence regions
  • Given sequences from CpG islands, and sequences
    from other regions, we can construct
  • a model to represent CpG islands
  • a null model to represent the other regions
  • We can then score a test sequence x by the
    log-odds ratio
  • score(x) = log( Pr(x | CpG model) / Pr(x | null model) )
39
Markov Chains for Discrimination
  • Why can we use
    score(x) = log( Pr(x | CpG) / Pr(x | null) )?
  • According to Bayes' rule:
  • Pr(CpG | x) = Pr(x | CpG) Pr(CpG) / Pr(x)
  • If we are not taking into account the prior
    probabilities Pr(CpG) and Pr(null) of the two
    classes, then from Bayes' rule it is clear that
    we just need to compare Pr(x | CpG) and
    Pr(x | null), as is done in our scoring
    function score().
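A hedged sketch of this discrimination scheme: two first-order chains are estimated by counting transitions in labelled training sequences (with pseudocounts), and a query is scored by the log ratio of its probabilities under the two chains. The toy training strings are illustrative, not real genomic data:

```python
from math import log

ALPHABET = "acgt"

def estimate_chain(sequences, pseudocount=1.0):
    """Maximum-likelihood (plus pseudocount) transition probabilities from training sequences."""
    counts = {p: {c: pseudocount for c in ALPHABET} for p in ALPHABET}
    for seq in sequences:
        for prev, cur in zip(seq, seq[1:]):
            counts[prev][cur] += 1
    return {p: {c: counts[p][c] / sum(counts[p].values()) for c in ALPHABET}
            for p in ALPHABET}

def log_prob(x, chain):
    return sum(log(chain[p][c]) for p, c in zip(x, x[1:]))   # ignores Pr(x_1) for simplicity

def score(x, cpg_chain, null_chain):
    """score(x) = log Pr(x | CpG model) - log Pr(x | null model)."""
    return log_prob(x, cpg_chain) - log_prob(x, null_chain)

# Toy training data (illustrative only)
cpg_chain = estimate_chain(["cgcgcgtacgcgcg", "gcgcgcgcgatcgc"])
null_chain = estimate_chain(["atatatgcattata", "ttatacatgatata"])
print(score("cgcgcg", cpg_chain, null_chain) > 0)    # a CpG-like query scores positive
```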

40
Higher Order Markov Chains
  • The Markov property specifies that the
    probability of a state depends only on the
    probability of the previous state
  • But we can build more memory into our states by
    using a higher order Markov model
  • In an n-th order Markov model
  • The probability of the current state depends on
    the previous n states.

41
Selecting the Order of a Markov Chain Model
  • But the number of parameters we need to estimate
    grows exponentially with the order
  • for modeling DNA we need on the order of 4^(n+1)
    parameters for an n-th order model
  • The higher the order, the less reliable we can
    expect our parameter estimates to be
  • estimating the parameters of a 2nd order Markov
    chain from the complete genome of E. coli (5.44 x
    10^6 bases), we'd see each (3-base) word about
    85,000 times on average (divide by 4^3)
  • estimating the parameters of a 9th order chain,
    we'd see each (10-base) word about 5 times on
    average (divide by 4^10 ≈ 10^6)

42
Higher Order Markov Chains
  • An n-th order Markov chain over some alphabet A
    is equivalent to a first order Markov chain over
    the alphabet A^n of n-tuples
  • Example: a 2nd order Markov model for DNA can be
    treated as a 1st order Markov model over the
    alphabet
  • CA, CC, CG, CT
  • GA, GC, GG, GT
  • TA, TC, TG, TT
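A short sketch of this equivalence: a 2nd-order chain is estimated as a 1st-order chain whose states are the 16 dinucleotides; the pseudocount and the toy training string are illustrative:

```python
from itertools import product

ALPHABET = "acgt"

def estimate_second_order(sequences, pseudocount=1.0):
    """2nd-order chain = 1st-order chain whose states are the 16 pairs AA, AC, ..., TT."""
    states = ["".join(p) for p in product(ALPHABET, repeat=2)]
    counts = {s: {c: pseudocount for c in ALPHABET} for s in states}
    for seq in sequences:
        for i in range(2, len(seq)):
            counts[seq[i - 2:i]][seq[i]] += 1            # count for Pr(x_i | x_{i-2} x_{i-1})
    return {s: {c: counts[s][c] / sum(counts[s].values()) for c in ALPHABET}
            for s in states}

chain = estimate_second_order(["acgtacgtacgg"])
print(chain["ac"]["g"])        # probability of g following the pair "ac"
```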

43
A Fifth Order Markov Chain
Pr(gctaca) = Pr(gctac) Pr(a | gctac)
44
Hidden Markov Model A Simple HMM
Model 2
Model 1
Given the observed sequence AGGCT, which state
emits each item?
45
Tutorial on HMM
  • L.R. Rabiner, "A Tutorial on Hidden Markov Models
    and Selected Applications in Speech Recognition",
  • Proceedings of the IEEE, Vol. 77, No. 2,
    pp. 257-286, February 1989.

46
HMM for Hidden Coin Tossing
(Diagram: hidden coin states, each emitting H or T)
Observed sequence: H H T T H T H H T T H
47
Hidden State
  • We'll distinguish between the observed parts of a
    problem and the hidden parts
  • In the Markov models we've considered previously,
    it is clear which state accounts for each part of
    the observed sequence
  • In the model above, there are multiple states
    that could account for each part of the observed
    sequence
  • this is the hidden part of the problem

48
Learning and Prediction Tasks (in general, i.e.,
applies to both MMs and HMMs)
  • Learning
  • Given: a model, a set of training sequences
  • Do: find model parameters that explain the
    training sequences with relatively high
    probability (the goal is to find a model that
    generalizes well to sequences we haven't seen
    before)
  • Classification
  • Given: a set of models representing different
    sequence classes, and a test sequence
  • Do: determine which model/class best explains the
    sequence
  • Segmentation
  • Given: a model representing different sequence
    classes, and a test sequence
  • Do: segment the sequence into subsequences,
    predicting the class of each subsequence

49
Algorithms for Learning Prediction
  • Learning
  • correct path known for each training sequence →
    simple maximum likelihood or Bayesian estimation
  • correct path not known → Forward-Backward
    algorithm + ML or Bayesian estimation
  • Classification
  • simple Markov model → calculate probability of
    the sequence along the single path for each model
  • hidden Markov model → Forward algorithm to
    calculate probability of the sequence along all
    paths for each model
  • Segmentation
  • hidden Markov model → Viterbi algorithm to find
    the most probable path for the sequence

50
The Parameters of an HMM
  • Transition Probabilities akl
  • Probability of a transition from state k to state l
  • Emission Probabilities ek(b)
  • Probability of emitting character b in state k
  • Note: HMMs can also be formulated using an
    emission probability associated with a transition
    from state k to state l.

51
An HMM Example
Transition probabilities: Σ pi = 1
Emission probabilities: Σ pi = 1
52
Three Important Questions (see also L.R. Rabiner (1989))
  • How likely is a given sequence?
  • The Forward algorithm
  • What is the most probable path for generating a
    given sequence?
  • The Viterbi algorithm
  • How can we learn the HMM parameters given a set
    of sequences?
  • The Forward-Backward (Baum-Welch) algorithm

53
How Likely is a Given Sequence?
  • The probability that a given path π is taken and
    the sequence x is generated:
  • Pr(x1, …, xL, π) = a0π1 · Π (i = 1, …, L) eπi(xi) aπi,πi+1

54
How Likely is a Given Sequence?
  • The probability over all paths is
  • Pr(x) = Σπ Pr(x, π)
  • but the number of paths can be exponential in the
    length of the sequence...
  • the Forward algorithm enables us to compute this
    efficiently

55
The Forward Algorithm
  • Define fk(i) to be the probability of being in
    state k having observed the first i characters of
    sequence x of length L
  • We want to compute fN(L), the probability of
    being in the end state having observed all of
    sequence x
  • Can be defined recursively
  • Compute using dynamic programming

56
The Forward Algorithm
  • fk(i): the probability of being in state
    k having observed the first i characters of
    sequence x
  • Initialization
  • f0(0) = 1 for the start state; fk(0) = 0 for the
    other states
  • Recursion
  • For emitting states (i = 1, …, L):
    fl(i) = el(xi) Σk fk(i-1) akl
  • For silent states:
    fl(i) = Σk fk(i) akl
  • Termination
  • Pr(x) = fN(L) = Σk fk(L) akN
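A minimal Python sketch of the forward algorithm for an HMM with no silent states other than the begin/end state 0; the 2-state toy model (states, a, e) is hypothetical, not the model drawn on the next slide:

```python
import numpy as np

def forward(x, a, e, states):
    """Forward algorithm: f_k(i) = Pr(x_1..x_i, state at step i = k), summed over all paths.

    a[k][l]: transition probability k -> l (state 0 is the silent begin/end state).
    e[k][symbol]: emission probability of `symbol` in state k.
    """
    L = len(x)
    f = np.zeros((len(states) + 1, L + 1))
    f[0, 0] = 1.0                                    # initialization: f_0(0) = 1
    for i in range(1, L + 1):                        # recursion over the sequence
        for l in states:
            f[l, i] = e[l][x[i - 1]] * sum(f[k, i - 1] * a[k][l]
                                           for k in [0] + states)
    return sum(f[k, L] * a[k][0] for k in states)    # termination: Pr(x) = sum_k f_k(L) a_k0

# Hypothetical 2-state HMM over {A, C, G, T}
states = [1, 2]
a = {0: {1: 0.5, 2: 0.5, 0: 0.0},
     1: {1: 0.2, 2: 0.7, 0: 0.1},
     2: {1: 0.3, 2: 0.6, 0: 0.1}}
e = {1: {"A": 0.4, "C": 0.1, "G": 0.2, "T": 0.3},
     2: {"A": 0.1, "C": 0.4, "G": 0.4, "T": 0.1}}
print(forward("TAGA", a, e, states))                 # Pr(TAGA) under this toy model
```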

57
Forward Algorithm Example
Given the sequence x = TAGA
58
Forward Algorithm Example
  • Initialization
  • f0(0) = 1, f1(0) = … = f5(0) = 0
  • Computing other values
  • f1(1) = e1(T)·(f0(0) a01 + f1(0) a11)
    = 0.3·(1·0.5 + 0·0.2) = 0.15
  • f2(1) = 0.4·(1·0.5 + 0·0.8) = 0.2
  • f1(2) = e1(A)·(f0(1) a01 + f1(1) a11)
    = 0.4·(0·0.5 + 0.15·0.2) = 0.012
  • Pr(TAGA) = f5(4) = f3(4) a35 + f4(4) a45

59
Three Important Questions
  • How likely is a given sequence?
  • What is the most probable path for generating a
    given sequence?
  • How can we learn the HMM parameters given a set
    of sequences?

60
Finding the Most Probable Path: The Viterbi
Algorithm
  • Define vk(i) to be the probability of the most
    probable path accounting for the first i
    characters of x and ending in state k
  • We want to compute vN(L), the probability of the
    most probable path accounting for all of the
    sequence and ending in the end state
  • Can be defined recursively
  • Again we can use dynamic programming to
    compute vN(L) and find the most probable path
    efficiently

61
Finding the Most Probable Path: The Viterbi
Algorithm
  • Define vk(i) to be the probability of the most
    probable path π accounting for the first i
    characters of x and ending in state k
  • The Viterbi Algorithm
  • Initialization (i = 0)
  • v0(0) = 1, vk(0) = 0 for k > 0
  • Recursion (i = 1, …, L)
  • vl(i) = el(xi) · maxk( vk(i-1) · akl )
  • ptri(l) = argmaxk( vk(i-1) · akl )
  • Termination
  • Pr(x, π) = maxk( vk(L) · ak0 )
  • πL = argmaxk( vk(L) · ak0 )
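A matching sketch of the Viterbi recursion and traceback, reusing the same hypothetical 2-state toy model as in the forward sketch:

```python
import numpy as np

def viterbi(x, a, e, states):
    """Viterbi: v_l(i) = e_l(x_i) * max_k v_k(i-1) a_kl, with back-pointers for traceback."""
    L = len(x)
    v = np.zeros((len(states) + 1, L + 1))
    ptr = np.zeros((len(states) + 1, L + 1), dtype=int)
    v[0, 0] = 1.0                                               # initialization: v_0(0) = 1
    for i in range(1, L + 1):                                   # recursion
        for l in states:
            cand = [(v[k, i - 1] * a[k][l], k) for k in [0] + states]
            best, ptr[l, i] = max(cand)                         # keep argmax as back-pointer
            v[l, i] = e[l][x[i - 1]] * best
    best, last = max((v[k, L] * a[k][0], k) for k in states)    # termination
    path = [last]
    for i in range(L, 1, -1):                                   # traceback of the best path
        path.append(ptr[path[-1], i])
    return best, path[::-1]

# Same hypothetical 2-state HMM as in the forward sketch
states = [1, 2]
a = {0: {1: 0.5, 2: 0.5, 0: 0.0},
     1: {1: 0.2, 2: 0.7, 0: 0.1},
     2: {1: 0.3, 2: 0.6, 0: 0.1}}
e = {1: {"A": 0.4, "C": 0.1, "G": 0.2, "T": 0.3},
     2: {"A": 0.1, "C": 0.4, "G": 0.4, "T": 0.1}}
prob, path = viterbi("TAGA", a, e, states)
print(prob, path)      # probability of the best path and the state sequence itself
```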

62
Three Important Questions
  • How likely is a given sequence?
  • What is the most probable path for generating a
    given sequence?
  • How can we learn the HMM parameters given a set
    of sequences?

63
Learning Without Hidden State
  • Learning is simple if we know the correct path
    for each sequence in our training set
  • estimate parameters by counting the number of
    times each parameter is used across the training
    set

64
Learning With Hidden State
  • If we don't know the correct path for each
    sequence in our training set, consider all
    possible paths for the sequence
  • Estimate parameters through a procedure that
    counts the expected number of times each
    parameter is used across the training set

65
Learning Parameters The Baum-Welch Algorithm
  • Also known as the Forward-Backward algorithm
  • An Expectation Maximization (EM) algorithm
  • EM is a family of algorithms for learning
    probabilistic models in problems that involve
    hidden states
  • In this context, the hidden state is the path
    that best explains each training sequence

66
Learning Parameters The Baum-Welch Algorithm
  • Algorithm sketch
  • initialize parameters of model
  • iterate until convergence
  • calculate the expected number of times each
    transition or emission is used
  • adjust the parameters to maximize the likelihood
    of these expected values
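A self-contained sketch of this loop in the Rabiner-style formulation (initial distribution pi, transition matrix A, emission matrix B, no explicit end state, no numerical scaling), which differs slightly from the begin/end-state notation used on the earlier slides; all model sizes and toy sequences are illustrative:

```python
import numpy as np

def baum_welch(seqs, n_states, n_symbols, n_iter=50, seed=0):
    """Baum-Welch (EM): iterate expected-count E-steps and re-estimation M-steps."""
    rng = np.random.default_rng(seed)
    pi = rng.random(n_states); pi /= pi.sum()                       # initialize parameters
    A = rng.random((n_states, n_states)); A /= A.sum(axis=1, keepdims=True)
    B = rng.random((n_states, n_symbols)); B /= B.sum(axis=1, keepdims=True)

    for _ in range(n_iter):
        pi_acc = np.zeros(n_states)
        A_acc = np.zeros((n_states, n_states))
        B_acc = np.zeros((n_states, n_symbols))
        for x in seqs:
            L = len(x)
            # E-step: forward and backward variables
            f = np.zeros((L, n_states)); b = np.zeros((L, n_states))
            f[0] = pi * B[:, x[0]]
            for t in range(1, L):
                f[t] = (f[t - 1] @ A) * B[:, x[t]]
            b[L - 1] = 1.0
            for t in range(L - 2, -1, -1):
                b[t] = A @ (B[:, x[t + 1]] * b[t + 1])
            px = f[L - 1].sum()
            # expected state occupancies and transition/emission counts
            gamma = f * b / px
            pi_acc += gamma[0]
            for t in range(L - 1):
                A_acc += np.outer(f[t], B[:, x[t + 1]] * b[t + 1]) * A / px
            for t in range(L):
                B_acc[:, x[t]] += gamma[t]
        # M-step: re-estimate parameters from the expected counts
        pi = pi_acc / pi_acc.sum()
        A = A_acc / A_acc.sum(axis=1, keepdims=True)
        B = B_acc / B_acc.sum(axis=1, keepdims=True)
    return pi, A, B

# Toy usage: sequences over a 4-symbol alphabet (e.g. a,c,g,t encoded as 0..3)
seqs = [np.array([0, 1, 2, 3, 2, 1, 0]), np.array([2, 2, 1, 0, 3, 3])]
pi, A, B = baum_welch(seqs, n_states=2, n_symbols=4)
print(A.round(2), B.round(2))
```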

67
Computational Complexity of HMM Algorithms
  • Given an HMM with S states and a sequence of
    length L, the complexity of the Forward, Backward
    and Viterbi algorithms is O(S²L)
  • This assumes that the states are densely
    interconnected
  • Given M sequences of length L, the complexity of
    Baum-Welch on each iteration is O(M S²L)

68
Markov Models Summary
  • We considered models that vary in terms of order,
    hidden state
  • Three DP-based algorithms for HMMs Forward,
    Backward and Viterbi
  • We discussed three key tasks learning,
    classification and segmentation
  • The algorithms used for each task depend on
    whether there is hidden state in the problem or
    not, i.e., whether the correct path is known

69
Summary
  • Markov chains and hidden Markov models are
    probabilistic models in which the current state
    depends only on the previous state
  • Given a sequence of symbols, x, the forward
    algorithm finds the probability of obtaining x in
    the model
  • The Viterbi algorithm finds the most probable
    path (corresponding to x) through the model
  • The Baum-Welch algorithm learns or adjusts the
    model parameters (transition and emission
    probabilities) to best explain a set of training
    sequences.