1
Audio Features & Machine Learning
  • E.M. Bakker

2
Features for Speech Recognition and Audio Indexing
  • Parametric Representations
  • Short Time Energy
  • Zero Crossing Rates
  • Level Crossing Rates
  • Short Time Spectral Envelope
  • Spectral Analysis
  • Filter Design
  • Filter Bank Spectral Analysis Model
  • Linear Predictive Coding (LPC)

3
Methods
  • Vector Quantization
  • Finite code book of spectral shapes
  • The code book codes for typical spectral shape
  • Method for all spectral representations (e.g.
    Filter Banks, LPC, ZCR, etc. )
  • Ensemble Interval Histogram (EIH) Model
  • Auditory-Based Spectral Analysis Model
  • More robust to noise and reverberation
  • Expected to be an inherently better representation
    of the relevant spectral information because it
    models the mechanics of the human cochlea

4
Pattern Recognition
(Block diagram)
Speech/Audio → Parameter Measurements → Test Pattern / Query Pattern
→ Pattern Comparison (against Reference Patterns)
→ Decision Rules → Recognized Speech/Audio
5
Pattern Recognition
6
Spectral Analysis Models
  • Pattern Recognition Approach
  • Parameter Measurement → Pattern
  • Pattern Comparison
  • Decision Making
  • Parameter Measurements
  • Bank of Filters Model
  • Linear Predictive Coding Model

7
Band Pass Filter
  • Note that the bandpass filter can be defined as
  • a convolution with a filter response function in
    the time domain, or equivalently
  • a multiplication with a filter response function
    in the frequency domain

8
Bank of Filters Analysis Model
9
Bank of Filters Analysis Model
  • Speech Signal s(n), n = 0, 1, …
  • Digital, with Fs the sampling frequency of s(n)
  • Bank of q Band Pass Filters BPF1, …, BPFq
  • Spanning a frequency range of, e.g., 100-3000 Hz
    or 100 Hz - 16 kHz
  • BPFi(s(n)) = xn(e^jωi), where ωi = 2πfi/Fs is
    the normalized frequency fi, i = 1, …, q.
  • xn(e^jωi) is the short time spectral
    representation of s(n) at time n, as seen through
    BPFi with centre frequency ωi, i = 1, …, q.
  • Note: each BPFi independently processes s to
    produce the spectral representation xn
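A minimal Python sketch of such a front end, assuming a bank of Butterworth band-pass filters and log short-time energies per band; the band edges, filter order, and frame sizes are illustrative choices, not values from the slides:

```python
import numpy as np
from scipy.signal import butter, sosfilt

def filter_bank_analysis(s, fs, bands, frame_len=400, hop=160):
    """Pass s through q band-pass filters and return per-band log short-time energy.

    bands: list of (low_hz, high_hz) tuples spanning e.g. 100 Hz - 3 kHz.
    Output shape: (num_frames, q), i.e. the spectral representation per frame.
    """
    outputs = []
    for low, high in bands:
        sos = butter(4, [low, high], btype="bandpass", fs=fs, output="sos")
        outputs.append(sosfilt(sos, s))                       # each BPF_i processes s independently
    outputs = np.stack(outputs, axis=1)                       # (num_samples, q)

    frames = []
    for start in range(0, len(s) - frame_len + 1, hop):
        frame = outputs[start:start + frame_len]
        frames.append(np.log(np.sum(frame ** 2, axis=0) + 1e-10))  # log energy per band
    return np.array(frames)

# Example: 8 filters spanning 100-3000 Hz on a 1-second test tone sampled at 8 kHz
fs = 8000
t = np.arange(fs) / fs
s = np.sin(2 * np.pi * 440 * t)
edges = np.linspace(100, 3000, 9)
bands = list(zip(edges[:-1], edges[1:]))
print(filter_bank_analysis(s, fs, bands).shape)               # (num_frames, 8)
```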

10
Bank of Filters Front End Processor
11
Typical Speech Wave Forms
12
MFCCs
Speech/Audio → Preemphasis → Windowing → Fast Fourier Transform
→ Mel-Scale Filter Bank → Log(·) → Discrete Cosine Transform
→ MFCCs (first 12 most significant coefficients)
MFCCs are calculated using the formula
  Ci = Σ (k = 1, …, N)  Xk cos( i (k - 1/2) π / N ),   i = 1, …, P
  • Where
  • Ci is the i-th cepstral coefficient
  • P is the order (12 in our case)
  • K is the number of discrete Fourier
    transform magnitude coefficients
  • Xk is the k-th order log-energy output
    from the Mel-Scale filter bank
  • N is the number of filters
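A compact numpy-only sketch of this pipeline for a single frame; the pre-emphasis coefficient, FFT size, number of filters, and triangular mel filter construction are common choices assumed here, not values specified on the slide:

```python
import numpy as np

def hz_to_mel(f):  return 2595 * np.log10(1 + f / 700)
def mel_to_hz(m):  return 700 * (10 ** (m / 2595) - 1)

def mfcc_frame(frame, fs, n_filters=26, n_ceps=12, n_fft=512):
    # 1. Preemphasis
    frame = np.append(frame[0], frame[1:] - 0.97 * frame[:-1])
    # 2. Windowing (Hamming)
    frame = frame * np.hamming(len(frame))
    # 3. FFT magnitude spectrum
    mag = np.abs(np.fft.rfft(frame, n_fft))
    # 4. Mel-scale filter bank (triangular filters, equally spaced on the mel scale)
    mel_edges = np.linspace(hz_to_mel(0), hz_to_mel(fs / 2), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_edges) / fs).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for m in range(n_filters):
        lo, ce, hi = bins[m], bins[m + 1], bins[m + 2]
        fbank[m, lo:ce] = (np.arange(lo, ce) - lo) / max(ce - lo, 1)
        fbank[m, ce:hi] = (hi - np.arange(ce, hi)) / max(hi - ce, 1)
    # 5. Log of filter-bank energies X_k
    X = np.log(fbank @ mag + 1e-10)
    # 6. Discrete cosine transform; keep the first n_ceps coefficients C_i
    i = np.arange(1, n_ceps + 1)[:, None]
    k = np.arange(1, n_filters + 1)[None, :]
    return np.cos(i * (k - 0.5) * np.pi / n_filters) @ X

fs = 16000
frame = np.sin(2 * np.pi * 300 * np.arange(400) / fs)   # one 25 ms frame of a test tone
print(mfcc_frame(frame, fs).shape)                      # (12,)
```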
13
Linear Predictive Coding Model
14
Filter Response Functions
15
Some Examples of Ideal Band Filters
16
Perceptually Based Critical Band Scale
17
Short Time Fourier Transform
  • Xn(e^jω) = Σm s(m) w(n - m) e^-jωm
  • s(m): the signal
  • w(n - m): a fixed low pass window
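A short numpy sketch of this transform; the `stft` helper, hop sizes, and the 10 kHz test signal (the following slides use 500 samples = 50 ms, which implies Fs = 10 kHz) are illustrative assumptions:

```python
import numpy as np

def stft(s, win_len, hop):
    """Short-time Fourier transform with a Hamming low-pass window w(n-m)."""
    w = np.hamming(win_len)
    frames = [s[m:m + win_len] * w
              for m in range(0, len(s) - win_len + 1, hop)]
    return np.array([np.fft.rfft(f) for f in frames])   # rows: time n, columns: frequency

fs = 10000                                    # 500 samples = 50 ms  =>  Fs = 10 kHz
s = np.sin(2 * np.pi * 200 * np.arange(fs) / fs)
print(stft(s, win_len=500, hop=250).shape)    # long window: fine frequency resolution
print(stft(s, win_len=50,  hop=25).shape)     # short window: fine time resolution
```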

18
Short Time Fourier Transform: Long Hamming Window
500 samples (50msec)
Voiced Speech
19
Short Time Fourier Transform: Short Hamming
Window 50 samples (5 msec)
Voiced Speech
20
Short Time Fourier Transform: Long Hamming Window
500 samples (50msec)
Unvoiced Speech
21
Short Time Fourier Transform: Short Hamming
Window 50 samples (5 msec)
Unvoiced Speech
22
Short Time Fourier Transform: Linear Filter
Interpretation
23
Linear Predictive Coding (LPC) Model
  • Speech Signal s(n), n = 0, 1, …
  • Digital, with Fs the sampling frequency of s(n)
  • Spectral Analysis on Blocks of Speech with an all
    pole modeling constraint
  • LPC of analysis order p
  • s(n) is blocked into frames (n, m)
  • Again consider xn(e^jω), the short time spectral
    representation of s(n) at time n (where ω = 2πf/Fs
    is the normalized frequency f).
  • Now the spectral representation xn(e^jω) is
    constrained to be of the form σ/A(e^jω), where
    A(e^jω) is the p-th order polynomial with
    z-transform
  • A(z) = 1 + a1 z^-1 + a2 z^-2 + … + ap z^-p
  • The output of the LPC parametric conversion on
    block (n, m) is the vector (a1, …, ap).
  • It specifies parametrically the spectrum of an
    all-pole model that best matches the signal
    spectrum over the period of time in which the
    frame of speech samples was accumulated (p-th
    order polynomial approximation of the signal).
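A hedged sketch of extracting the vector (a1, …, ap) for one frame via the autocorrelation method; the Toeplitz solve and the order p = 12 are standard choices assumed here, and the sign convention of A(z) varies between texts:

```python
import numpy as np
from scipy.linalg import solve_toeplitz

def lpc(frame, p=12):
    """All-pole (LPC) coefficients a_1..a_p for one frame, autocorrelation method."""
    frame = frame * np.hamming(len(frame))
    r = np.correlate(frame, frame, mode="full")[len(frame) - 1:]  # autocorrelation r[0], r[1], ...
    a = solve_toeplitz((r[:p], r[:p]), r[1:p + 1])                # Toeplitz normal equations R a = r
    return a   # predictor coefficients; A(z) = 1 - a1 z^-1 - ... (sign convention varies)

fs = 8000
rng = np.random.default_rng(0)
frame = np.sin(2 * np.pi * 500 * np.arange(240) / fs) + 0.01 * rng.standard_normal(240)
a = lpc(frame)             # one 30 ms frame of a noisy test tone
print(a.shape)             # (12,) -- the vector (a_1, ..., a_p) output per block
```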

24
Vector Quantization
  • Data represented as feature vectors.
  • VQ: a training set is used to determine a set of
    code words that constitute a code book.
  • Code words are centroids, computed using a
    similarity or distance measure d.
  • The code words together with d divide the space
    into Voronoi regions.
  • A query vector falls into a Voronoi region and
    will be represented by the respective code word.

25
Vector Quantization
  • Distance measures d(x,y)
  • Euclidean distance
  • Taxi cab distance
  • Hamming distance
  • etc.

26
Vector Quantization
  • Clustering the Training Vectors
  • Initialize: choose M arbitrary vectors of the L
    vectors of the training set. This is the initial
    code book.
  • Nearest neighbor search: for each training
    vector, find the code word in the current code
    book that is closest and assign that vector to
    the corresponding cell.
  • Centroid update: update the code word in each
    cell using the centroid of the training vectors
    that are assigned to that cell.
  • Iteration: repeat steps 2-3 until the average
    distance falls below a preset threshold.
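A minimal Python sketch of this clustering loop (a generalized Lloyd / k-means style procedure) with Euclidean distance; M, the threshold, and the random initialization are illustrative:

```python
import numpy as np

def train_codebook(training_vectors, M, threshold=1e-4, max_iter=100, seed=0):
    """Codebook training: initialize, nearest-neighbour search, centroid update, iterate."""
    rng = np.random.default_rng(seed)
    L = len(training_vectors)
    codebook = training_vectors[rng.choice(L, size=M, replace=False)]   # 1. initialize
    prev_dist = np.inf
    for _ in range(max_iter):
        # 2. nearest-neighbour search: assign each training vector to its closest code word
        d = np.linalg.norm(training_vectors[:, None, :] - codebook[None, :, :], axis=2)
        cells = np.argmin(d, axis=1)
        # 3. centroid update: each code word becomes the centroid of its cell
        for m in range(M):
            if np.any(cells == m):
                codebook[m] = training_vectors[cells == m].mean(axis=0)
        # 4. iterate until the average distance stops improving
        avg_dist = d[np.arange(L), cells].mean()
        if prev_dist - avg_dist < threshold:
            break
        prev_dist = avg_dist
    return codebook

# Example: 1000 random 12-dimensional feature vectors (e.g. MFCC frames), 16 code words
X = np.random.default_rng(1).standard_normal((1000, 12))
print(train_codebook(X, M=16).shape)          # (16, 12)
```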

27
Vector Classification
  • For an M-vector code book CB with codes
  • CB = { yi | 1 ≤ i ≤ M },
  • the index m of the best codebook entry for a
    given vector v is
  • m = arg min d(v, yi)
  •     1 ≤ i ≤ M

28
VQ for Classification
  • A code book CBk = { yki | 1 ≤ i ≤ M } can be used
    to define a class Ck.
  • Example: Audio Classification
  • Classes: crowd, car, silence, scream,
    explosion, etc.
  • Determine the class by using VQ code books CBk,
    one for each of the classes.
  • VQ is very often used as a baseline method for
    classification problems.
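A small sketch of this baseline classifier: a query sequence is assigned to the class whose codebook yields the lowest average quantization distortion. The class names and the random stand-in codebooks are purely illustrative:

```python
import numpy as np

def quantization_distortion(x, codebook):
    """Average distance of the frames in x to their nearest code words in one codebook."""
    d = np.linalg.norm(x[:, None, :] - codebook[None, :, :], axis=2)   # d(v, y_i)
    return d.min(axis=1).mean()                                        # best entry per frame

def classify(x, codebooks):
    """Assign the feature sequence x to the class k whose codebook CB_k fits it best."""
    return min(codebooks, key=lambda k: quantization_distortion(x, codebooks[k]))

# Example with random stand-in codebooks for two classes
rng = np.random.default_rng(2)
codebooks = {"crowd": rng.standard_normal((16, 12)),
             "silence": rng.standard_normal((16, 12)) * 0.1}
x = rng.standard_normal((50, 12)) * 0.1            # a quiet-looking query sequence
print(classify(x, codebooks))                      # likely "silence"
```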

29
Sound, DNA Sequences!
  • DNA: a helix-shaped molecule whose constituents
    are two parallel strands of nucleotides
  • DNA is usually represented by sequences of these
    four nucleotides
  • This assumes only one strand is considered; the
    second strand is always derivable from the first
    by pairing A's with T's and C's with G's, and
    vice versa
  • Nucleotides (bases)
  • Adenine (A)
  • Cytosine (C)
  • Guanine (G)
  • Thymine (T)

30
Biological Information From Genes to Proteins
31
From Amino Acids to Protein Functions
Example DNA sequence:
CGCCAGCTGGACGGGCACACCATGAGGCTGCTGACCCTCCTGGGCCTTCTG
Example amino acid sequence (one sequence, wrapped):
TDQAAFDTNIVTLTRFVMEQGRKARGTGEMTQLLNSLCTAVKAIST
AVRKAGIAHLYGIAGSTNVTGDQVKKLDVLSNDLVINVLKSSFATCVLVT
EEDKNAIIVEPEKRGKYVVCFDPLDGSSNIDCLVSIGTIFGIYRKNSTDE
PSEKDALQPGRNLVAAGYALYGSATML
DNA / amino acid sequence → 3D structure → protein functions
DNA (gene) → pre-RNA → RNA → Protein
(transcription by RNA-polymerase, splicing by the spliceosome,
translation by the ribosome)
32
Motivation for Markov Models
  • There are many cases in which we would like to
    represent the statistical regularities of some
    class of sequences
  • genes
  • proteins in a given family
  • Sequences of audio features
  • Markov models are well suited to this type of
    task

33
A Markov Chain Model
  • Transition probabilities
  • Pr(xi = a | xi-1 = g) = 0.16
  • Pr(xi = c | xi-1 = g) = 0.34
  • Pr(xi = g | xi-1 = g) = 0.38
  • Pr(xi = t | xi-1 = g) = 0.12

34
Definition of Markov Chain Model
  • A Markov chain1 model is defined by
  • a set of states
  • some states emit symbols
  • other states (e.g., the begin state) are silent
  • a set of transitions with associated
    probabilities
  • the transitions emanating from a given state
    define a distribution over the possible next
    states
  • 1 Markov, A. A., "Extension of the Law of Large
    Numbers to Quantities Depending on Each Other",
    Izvestiya Fiziko-matematicheskogo Obshchestva pri
    Kazanskom Universitete, 2nd series, vol. 15
    (1906), pp. 135-156.

35
Markov Chain Models Properties
  • Given some sequence x of length L, we can ask how
    probable the sequence is given our model
  • For any probabilistic model of sequences, we can
    write this probability as
  • Pr(x) = Pr(xL, xL-1, …, x1)
    = Pr(xL | xL-1, …, x1) Pr(xL-1 | xL-2, …, x1) … Pr(x1)
  • Key property of a (1st order) Markov chain: the
    probability of each xi depends only on the value
    of xi-1, so that
  • Pr(x) = Pr(x1) Pr(x2 | x1) … Pr(xL | xL-1)

36
The Probability of a Sequence for a Markov Chain
Model
Pr(cggt) = Pr(c) Pr(g|c) Pr(g|g) Pr(t|g)
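A tiny Python sketch of this computation; only the g-row of the transition table is given on the earlier slide, so the remaining rows and the initial distribution below are placeholders:

```python
# Transition probabilities Pr(x_i | x_{i-1}).  The g-row comes from the earlier slide;
# the remaining rows and the initial distribution are illustrative placeholders.
trans = {
    "g": {"a": 0.16, "c": 0.34, "g": 0.38, "t": 0.12},
    "c": {"a": 0.25, "c": 0.25, "g": 0.25, "t": 0.25},   # placeholder
    "a": {"a": 0.25, "c": 0.25, "g": 0.25, "t": 0.25},   # placeholder
    "t": {"a": 0.25, "c": 0.25, "g": 0.25, "t": 0.25},   # placeholder
}
initial = {"a": 0.25, "c": 0.25, "g": 0.25, "t": 0.25}   # placeholder Pr(x_1)

def sequence_probability(x):
    """First-order Markov chain: Pr(x) = Pr(x_1) * prod_i Pr(x_i | x_{i-1})."""
    p = initial[x[0]]
    for prev, cur in zip(x, x[1:]):
        p *= trans[prev][cur]
    return p

print(sequence_probability("cggt"))   # Pr(c) * Pr(g|c) * Pr(g|g) * Pr(t|g)
```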
37
Example Application
  • CpG islands
  • CG di-nucleotides are rarer in eukaryotic genomes
    than expected given the marginal probabilities of
    C and G
  • but the regions upstream of genes are richer in
    CG di-nucleotides than elsewhere: the CpG islands
  • useful evidence for finding genes
  • Application: predict CpG islands with Markov
    chains
  • one Markov chain to represent CpG islands
  • another Markov chain to represent the rest of the
    genome

38
Markov Chains for Discrimination
  • Suppose we want to distinguish CpG islands from
    other sequence regions
  • Given sequences from CpG islands, and sequences
    from other regions, we can construct
  • a model to represent CpG islands
  • a null model to represent the other regions
  • We can then score a test sequence x by the
    log-odds ratio
  • score(x) = log( Pr(x | CpG model) / Pr(x | null model) )
39
Markov Chains for Discrimination
  • Why can we use
    score(x) = log( Pr(x | CpG) / Pr(x | null) )?
  • According to Bayes' rule:
  • Pr(CpG | x) = Pr(x | CpG) Pr(CpG) / Pr(x)
  • If we are not taking into account the prior
    probabilities Pr(CpG) and Pr(null) of the two
    classes, then from Bayes' rule it is clear that
    we just need to compare Pr(x | CpG) and
    Pr(x | null), as is done in our scoring
    function score().
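A hedged sketch of this discrimination scheme: two first-order chains are estimated by counting transitions in labelled training sequences (with pseudocounts), and a query is scored by the log ratio of its probabilities under the two chains. The toy training strings are illustrative, not real genomic data:

```python
from math import log

ALPHABET = "acgt"

def estimate_chain(sequences, pseudocount=1.0):
    """Maximum-likelihood (plus pseudocount) transition probabilities from training sequences."""
    counts = {p: {c: pseudocount for c in ALPHABET} for p in ALPHABET}
    for seq in sequences:
        for prev, cur in zip(seq, seq[1:]):
            counts[prev][cur] += 1
    return {p: {c: counts[p][c] / sum(counts[p].values()) for c in ALPHABET}
            for p in ALPHABET}

def log_prob(x, chain):
    return sum(log(chain[p][c]) for p, c in zip(x, x[1:]))   # ignores Pr(x_1) for simplicity

def score(x, cpg_chain, null_chain):
    """score(x) = log Pr(x | CpG model) - log Pr(x | null model)."""
    return log_prob(x, cpg_chain) - log_prob(x, null_chain)

# Toy training data (illustrative only)
cpg_chain = estimate_chain(["cgcgcgtacgcgcg", "gcgcgcgcgatcgc"])
null_chain = estimate_chain(["atatatgcattata", "ttatacatgatata"])
print(score("cgcgcg", cpg_chain, null_chain) > 0)    # a CpG-like query scores positive
```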

40
Higher Order Markov Chains
  • The Markov property specifies that the
    probability of a state depends only on the
    probability of the previous state
  • But we can build more memory into our states by
    using a higher order Markov model
  • In an n-th order Markov model
  • The probability of the current state depends on
    the previous n states.

41
Selecting the Order of a Markov Chain Model
  • But the number of parameters we need to estimate
    grows exponentially with the order
  • for modeling DNA we need on the order of 4^(n+1)
    parameters for an n-th order model
  • The higher the order, the less reliable we can
    expect our parameter estimates to be
  • estimating the parameters of a 2nd order Markov
    chain from the complete genome of E. coli (5.44 x
    10^6 bases), we'd see each (3-base) word about
    85,000 times on average (divide by 4^3)
  • estimating the parameters of a 9th order chain,
    we'd see each (10-base) word about 5 times on
    average (divide by 4^10 ≈ 10^6)

42
Higher Order Markov Chains
  • An n-th order Markov chain over some alphabet A
    is equivalent to a first order Markov chain over
    the alphabet A^n of n-tuples
  • Example: a 2nd order Markov model for DNA can be
    treated as a 1st order Markov model over the
    alphabet
  • CA, CC, CG, CT
  • GA, GC, GG, GT
  • TA, TC, TG, TT
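A short sketch of this equivalence: a 2nd-order chain is estimated as a 1st-order chain whose states are the 16 dinucleotides; the pseudocount and the toy training string are illustrative:

```python
from itertools import product

ALPHABET = "acgt"

def estimate_second_order(sequences, pseudocount=1.0):
    """2nd-order chain = 1st-order chain whose states are the 16 pairs AA, AC, ..., TT."""
    states = ["".join(p) for p in product(ALPHABET, repeat=2)]
    counts = {s: {c: pseudocount for c in ALPHABET} for s in states}
    for seq in sequences:
        for i in range(2, len(seq)):
            counts[seq[i - 2:i]][seq[i]] += 1            # count for Pr(x_i | x_{i-2} x_{i-1})
    return {s: {c: counts[s][c] / sum(counts[s].values()) for c in ALPHABET}
            for s in states}

chain = estimate_second_order(["acgtacgtacgg"])
print(chain["ac"]["g"])        # probability of g following the pair "ac"
```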

43
A Fifth Order Markov Chain
Pr(gctaca) = Pr(gctac) Pr(a | gctac)
44
Hidden Markov Model A Simple HMM
Model 2
Model 1
Given the observed sequence AGGCT, which state
emits each item?
45
Tutorial on HMM
  • L.R. Rabiner, "A Tutorial on Hidden Markov Models
    and Selected Applications in Speech Recognition",
  • Proceedings of the IEEE, Vol. 77, No. 2,
    pp. 257-286, February 1989.

46
HMM for Hidden Coin Tossing
(Diagram: hidden coin states, each emitting H or T)
Observed sequence: H H T T H T H H T T H
47
Hidden State
  • We'll distinguish between the observed parts of a
    problem and the hidden parts
  • In the Markov models we've considered previously,
    it is clear which state accounts for each part of
    the observed sequence
  • In the model above, there are multiple states
    that could account for each part of the observed
    sequence
  • this is the hidden part of the problem

48
Learning and Prediction Tasks (in general, i.e.,
applies to both MMs and HMMs)
  • Learning
  • Given: a model, a set of training sequences
  • Do: find model parameters that explain the
    training sequences with relatively high
    probability (the goal is to find a model that
    generalizes well to sequences we haven't seen
    before)
  • Classification
  • Given: a set of models representing different
    sequence classes, and a test sequence
  • Do: determine which model/class best explains the
    sequence
  • Segmentation
  • Given: a model representing different sequence
    classes, and a test sequence
  • Do: segment the sequence into subsequences,
    predicting the class of each subsequence

49
Algorithms for Learning Prediction
  • Learning
  • correct path known for each training sequence →
    simple maximum likelihood or Bayesian estimation
  • correct path not known → Forward-Backward
    algorithm + ML or Bayesian estimation
  • Classification
  • simple Markov model → calculate probability of
    the sequence along the single path for each model
  • hidden Markov model → Forward algorithm to
    calculate probability of the sequence along all
    paths for each model
  • Segmentation
  • hidden Markov model → Viterbi algorithm to find
    the most probable path for the sequence

50
The Parameters of an HMM
  • Transition Probabilities akl
  • Probability of a transition from state k to state l
  • Emission Probabilities ek(b)
  • Probability of emitting character b in state k
  • Note: HMMs can also be formulated using an
    emission probability associated with a transition
    from state k to state l.

51
An HMM Example
Transition probabilities: Σ pi = 1
Emission probabilities: Σ pi = 1
52
Three Important Questions (see also L.R. Rabiner (1989))
  • How likely is a given sequence?
  • The Forward algorithm
  • What is the most probable path for generating a
    given sequence?
  • The Viterbi algorithm
  • How can we learn the HMM parameters given a set
    of sequences?
  • The Forward-Backward (Baum-Welch) algorithm

53
How Likely is a Given Sequence?
  • The probability that a given path π is taken and
    the sequence x is generated:
  • Pr(x1, …, xL, π) = a0π1 · Π (i = 1, …, L) eπi(xi) aπi,πi+1

54
How Likely is a Given Sequence?
  • The probability over all paths is
  • Pr(x) = Σπ Pr(x, π)
  • but the number of paths can be exponential in the
    length of the sequence...
  • the Forward algorithm enables us to compute this
    efficiently

55
The Forward Algorithm
  • Define fk(i) to be the probability of being in
    state k having observed the first i characters of
    sequence x of length L
  • We want to compute fN(L), the probability of
    being in the end state having observed all of
    sequence x
  • Can be defined recursively
  • Compute using dynamic programming

56
The Forward Algorithm
  • fk(i): the probability of being in state
    k having observed the first i characters of
    sequence x
  • Initialization
  • f0(0) = 1 for the start state; fk(0) = 0 for the
    other states
  • Recursion
  • For emitting states (i = 1, …, L):
    fl(i) = el(xi) Σk fk(i-1) akl
  • For silent states:
    fl(i) = Σk fk(i) akl
  • Termination
  • Pr(x) = fN(L) = Σk fk(L) akN
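A minimal Python sketch of the forward algorithm for an HMM with no silent states other than the begin/end state 0; the 2-state toy model (states, a, e) is hypothetical, not the model drawn on the next slide:

```python
import numpy as np

def forward(x, a, e, states):
    """Forward algorithm: f_k(i) = Pr(x_1..x_i, state at step i = k), summed over all paths.

    a[k][l]: transition probability k -> l (state 0 is the silent begin/end state).
    e[k][symbol]: emission probability of `symbol` in state k.
    """
    L = len(x)
    f = np.zeros((len(states) + 1, L + 1))
    f[0, 0] = 1.0                                    # initialization: f_0(0) = 1
    for i in range(1, L + 1):                        # recursion over the sequence
        for l in states:
            f[l, i] = e[l][x[i - 1]] * sum(f[k, i - 1] * a[k][l]
                                           for k in [0] + states)
    return sum(f[k, L] * a[k][0] for k in states)    # termination: Pr(x) = sum_k f_k(L) a_k0

# Hypothetical 2-state HMM over {A, C, G, T}
states = [1, 2]
a = {0: {1: 0.5, 2: 0.5, 0: 0.0},
     1: {1: 0.2, 2: 0.7, 0: 0.1},
     2: {1: 0.3, 2: 0.6, 0: 0.1}}
e = {1: {"A": 0.4, "C": 0.1, "G": 0.2, "T": 0.3},
     2: {"A": 0.1, "C": 0.4, "G": 0.4, "T": 0.1}}
print(forward("TAGA", a, e, states))                 # Pr(TAGA) under this toy model
```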

57
Forward Algorithm Example
Given the sequence x = TAGA
58
Forward Algorithm Example
  • Initialization
  • f0(0) = 1, f1(0) = … = f5(0) = 0
  • Computing other values
  • f1(1) = e1(T)·(f0(0) a01 + f1(0) a11)
    = 0.3·(1·0.5 + 0·0.2) = 0.15
  • f2(1) = 0.4·(1·0.5 + 0·0.8) = 0.2
  • f1(2) = e1(A)·(f0(1) a01 + f1(1) a11)
    = 0.4·(0·0.5 + 0.15·0.2) = 0.012
  • Pr(TAGA) = f5(4) = f3(4) a35 + f4(4) a45

59
Three Important Questions
  • How likely is a given sequence?
  • What is the most probable path for generating a
    given sequence?
  • How can we learn the HMM parameters given a set
    of sequences?

60
Finding the Most Probable Path: The Viterbi
Algorithm
  • Define vk(i) to be the probability of the most
    probable path accounting for the first i
    characters of x and ending in state k
  • We want to compute vN(L), the probability of the
    most probable path accounting for all of the
    sequence and ending in the end state
  • Can be defined recursively
  • Again we can use dynamic programming to
    compute vN(L) and find the most probable path
    efficiently

61
Finding the Most Probable Path: The Viterbi
Algorithm
  • Define vk(i) to be the probability of the most
    probable path π accounting for the first i
    characters of x and ending in state k
  • The Viterbi Algorithm
  • Initialization (i = 0)
  • v0(0) = 1, vk(0) = 0 for k > 0
  • Recursion (i = 1, …, L)
  • vl(i) = el(xi) · maxk( vk(i-1) · akl )
  • ptri(l) = argmaxk( vk(i-1) · akl )
  • Termination
  • Pr(x, π) = maxk( vk(L) · ak0 )
  • πL = argmaxk( vk(L) · ak0 )
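A matching sketch of the Viterbi recursion and traceback, reusing the same hypothetical 2-state toy model as in the forward sketch:

```python
import numpy as np

def viterbi(x, a, e, states):
    """Viterbi: v_l(i) = e_l(x_i) * max_k v_k(i-1) a_kl, with back-pointers for traceback."""
    L = len(x)
    v = np.zeros((len(states) + 1, L + 1))
    ptr = np.zeros((len(states) + 1, L + 1), dtype=int)
    v[0, 0] = 1.0                                               # initialization: v_0(0) = 1
    for i in range(1, L + 1):                                   # recursion
        for l in states:
            cand = [(v[k, i - 1] * a[k][l], k) for k in [0] + states]
            best, ptr[l, i] = max(cand)                         # keep argmax as back-pointer
            v[l, i] = e[l][x[i - 1]] * best
    best, last = max((v[k, L] * a[k][0], k) for k in states)    # termination
    path = [last]
    for i in range(L, 1, -1):                                   # traceback of the best path
        path.append(ptr[path[-1], i])
    return best, path[::-1]

# Same hypothetical 2-state HMM as in the forward sketch
states = [1, 2]
a = {0: {1: 0.5, 2: 0.5, 0: 0.0},
     1: {1: 0.2, 2: 0.7, 0: 0.1},
     2: {1: 0.3, 2: 0.6, 0: 0.1}}
e = {1: {"A": 0.4, "C": 0.1, "G": 0.2, "T": 0.3},
     2: {"A": 0.1, "C": 0.4, "G": 0.4, "T": 0.1}}
prob, path = viterbi("TAGA", a, e, states)
print(prob, path)      # probability of the best path and the state sequence itself
```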

62
Three Important Questions
  • How likely is a given sequence?
  • What is the most probable path for generating a
    given sequence?
  • How can we learn the HMM parameters given a set
    of sequences?

63
Learning Without Hidden State
  • Learning is simple if we know the correct path
    for each sequence in our training set
  • estimate parameters by counting the number of
    times each parameter is used across the training
    set

64
Learning With Hidden State
  • If we don't know the correct path for each
    sequence in our training set, consider all
    possible paths for the sequence
  • Estimate parameters through a procedure that
    counts the expected number of times each
    parameter is used across the training set

65
Learning Parameters The Baum-Welch Algorithm
  • Also known as the Forward-Backward algorithm
  • An Expectation Maximization (EM) algorithm
  • EM is a family of algorithms for learning
    probabilistic models in problems that involve
    hidden states
  • In this context, the hidden state is the path
    that best explains each training sequence

66
Learning Parameters The Baum-Welch Algorithm
  • Algorithm sketch
  • initialize parameters of model
  • iterate until convergence
  • calculate the expected number of times each
    transition or emission is used
  • adjust the parameters to maximize the likelihood
    of these expected values
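A self-contained sketch of this loop in the Rabiner-style formulation (initial distribution pi, transition matrix A, emission matrix B, no explicit end state, no numerical scaling), which differs slightly from the begin/end-state notation used on the earlier slides; all model sizes and toy sequences are illustrative:

```python
import numpy as np

def baum_welch(seqs, n_states, n_symbols, n_iter=50, seed=0):
    """Baum-Welch (EM): iterate expected-count E-steps and re-estimation M-steps."""
    rng = np.random.default_rng(seed)
    pi = rng.random(n_states); pi /= pi.sum()                       # initialize parameters
    A = rng.random((n_states, n_states)); A /= A.sum(axis=1, keepdims=True)
    B = rng.random((n_states, n_symbols)); B /= B.sum(axis=1, keepdims=True)

    for _ in range(n_iter):
        pi_acc = np.zeros(n_states)
        A_acc = np.zeros((n_states, n_states))
        B_acc = np.zeros((n_states, n_symbols))
        for x in seqs:
            L = len(x)
            # E-step: forward and backward variables
            f = np.zeros((L, n_states)); b = np.zeros((L, n_states))
            f[0] = pi * B[:, x[0]]
            for t in range(1, L):
                f[t] = (f[t - 1] @ A) * B[:, x[t]]
            b[L - 1] = 1.0
            for t in range(L - 2, -1, -1):
                b[t] = A @ (B[:, x[t + 1]] * b[t + 1])
            px = f[L - 1].sum()
            # expected state occupancies and transition/emission counts
            gamma = f * b / px
            pi_acc += gamma[0]
            for t in range(L - 1):
                A_acc += np.outer(f[t], B[:, x[t + 1]] * b[t + 1]) * A / px
            for t in range(L):
                B_acc[:, x[t]] += gamma[t]
        # M-step: re-estimate parameters from the expected counts
        pi = pi_acc / pi_acc.sum()
        A = A_acc / A_acc.sum(axis=1, keepdims=True)
        B = B_acc / B_acc.sum(axis=1, keepdims=True)
    return pi, A, B

# Toy usage: sequences over a 4-symbol alphabet (e.g. a,c,g,t encoded as 0..3)
seqs = [np.array([0, 1, 2, 3, 2, 1, 0]), np.array([2, 2, 1, 0, 3, 3])]
pi, A, B = baum_welch(seqs, n_states=2, n_symbols=4)
print(A.round(2), B.round(2))
```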

67
Computational Complexity of HMM Algorithms
  • Given an HMM with S states and a sequence of
    length L, the complexity of the Forward, Backward
    and Viterbi algorithms is O(S²L)
  • This assumes that the states are densely
    interconnected
  • Given M sequences of length L, the complexity of
    Baum-Welch on each iteration is O(M S²L)

68
Markov Models Summary
  • We considered models that vary in terms of order,
    hidden state
  • Three DP-based algorithms for HMMs Forward,
    Backward and Viterbi
  • We discussed three key tasks learning,
    classification and segmentation
  • The algorithms used for each task depend on
    whether there is hidden state in the problem or
    not, i.e., whether the correct path is known

69
Summary
  • Markov chains and hidden Markov models are
    probabilistic models in which the current state
    depends only on the previous state
  • Given a sequence of symbols, x, the forward
    algorithm finds the probability of obtaining x in
    the model
  • The Viterbi algorithm finds the most probable
    path (corresponding to x) through the model
  • The Baum-Welch algorithm learns or adjusts the
    model parameters (transition and emission
    probabilities) to best explain a set of training
    sequences.