SIGNAL PROCESSING TOOLS FOR SPEECH RECOGNITION

1
SIGNAL PROCESSING TOOLS FOR SPEECH RECOGNITION
Presented by Richard Duncan, Group??? Lab??? Company???
in collaboration with Hualin Gao, Richard Duncan, Julie A. Baca, and Joseph Picone
Human and Systems Engineering, Center for Advanced Vehicular Systems
Mississippi State University
2
WHICH TWO ARE THE SAME PHONEME?
We need to extract meaningful information from the signal for a speech
recognition system to model.
3
WHICH TWO ARE THE SAME PHONEME?
(a) ow
(b) aa
(c) ow
4
WHAT IS AN ACOUSTIC FRONT-END?
It encapsulates the signal processing of a speech
recognition system. It computes a sequence of
feature vectors from an audio stream. These
vectors are then processed by HMMs, neural
networks, or other classifiers.
5
  • WHY REINVENT THE WHEEL?
  • A Front-end has many areas of complexity
  • Run-time efficiency
  • File I/O
  • Data management (framing)
  • DSP algorithm complexity
  • Algorithm re-use
  • Our system shields the researcher or student from
    these mundane issues so that he or she can focus on
    the algorithms

6
  • DATA FRAMING

[Figure: two consecutive frames and their analysis windows. Each window
combines new data with data shared with the neighboring window.]
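The framing idea can be sketched in a few lines of C++. This is only an illustration, assuming a fixed frame shift and a window at least as long as the shift, so that consecutive windows share samples. The name `frameSignal` and its parameters are hypothetical and are not part of the ISIP classes.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: split a signal into overlapping analysis windows.
// frame_shift is the number of new samples per frame; window_len is the
// number of samples each window covers (window_len >= frame_shift).
std::vector<std::vector<float>> frameSignal(const std::vector<float>& signal,
                                            std::size_t frame_shift,
                                            std::size_t window_len) {
  assert(frame_shift > 0 && window_len >= frame_shift);
  std::vector<std::vector<float>> windows;
  // Only emit windows that fit entirely inside the signal.
  for (std::size_t start = 0; start + window_len <= signal.size();
       start += frame_shift) {
    const auto first = signal.begin() + static_cast<std::ptrdiff_t>(start);
    windows.emplace_back(first,
                         first + static_cast<std::ptrdiff_t>(window_len));
  }
  return windows;
}
```

With a shift of 2 and a window of 4, each window reuses the last two samples of the previous one, which is the shared data in the figure. A real front-end would keep those samples in a buffer rather than copy them.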
7
FEATURES OF ISIP FOUNDATION CLASSES
  • Efficient memory management and tracking
  • System and I/O libraries that abstract details of
    the operating system
  • Math classes that provide basic linear algebra
    and efficient matrix manipulations
  • Generic data structures
  • Built-in unit tests to verify component
    correctness.

8
  • DESIGN REQUIREMENTS
  • A library of standard algorithms provides basic
    digital signal processing (DSP) functions
  • New algorithms can be added without modifying
    existing classes
  • A block diagram tool allows rapid prototyping
    without programming or recompiling
  • The same system is used for offline feature
    extraction, recognition, and general DSP work.

9
  • BASIC DIGITAL PROCESSING FUNCTIONS

This example shows how to use the basic digital signal processing
functions. It computes the energy of an input vector in dB using the SUM
algorithm:

    // declare an Energy object, input vector, and output vector
    Energy egy;
    VectorFloat output;
    VectorFloat input(L"0, 1, 2");

    // choose algorithm
    egy.setAlgorithm(Energy::SUM);

    // choose implementation
    egy.setImplementation(Energy::DB);

    // compute the energy of input data
    egy.compute(output, input);
10
  • ADDING NEW ALGORITHMS
  • Interface contract allows extensibility to new
    algorithms
  • All algorithms are classes that implement this
    interface
  • Most have a default implementation.
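
A minimal sketch of such an interface contract follows. The class and method names (`AlgorithmBase`, `Gain`, `leadingPad`, `trailingPad`, `apply`) are hypothetical and only loosely modeled on the slides; they are not the ISIP API. The sketch shows the two points above: every algorithm implements one interface, and most methods come with a default.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical interface; names are illustrative, not the ISIP API.
class AlgorithmBase {
 public:
  virtual ~AlgorithmBase() = default;
  // Default implementations: no extra context is needed.
  virtual int leadingPad() const { return 0; }
  virtual int trailingPad() const { return 0; }
  // Every algorithm must provide its own processing step.
  virtual bool apply(std::vector<float>& output,
                     const std::vector<float>& input) const = 0;
};

// A new algorithm is added by deriving a class; existing classes are untouched.
class Gain : public AlgorithmBase {
 public:
  explicit Gain(float gain) : gain_(gain) {
    assert(gain > 0.0f);  // this sketch only allows positive gains
  }
  bool apply(std::vector<float>& output,
             const std::vector<float>& input) const override {
    output.resize(input.size());
    for (std::size_t i = 0; i < input.size(); ++i) output[i] = gain_ * input[i];
    return true;
  }

 private:
  float gain_;
};
```

Code that holds an `AlgorithmBase` reference can run `Gain` without knowing its concrete type, which is what lets new algorithms be added without modifying existing classes.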

11
  • ADDING NEW ALGORITHMS

    boolean Energy::init();
    const String& className() const { return CLASS_NAME; }
    int GetLeadingPad() const { return 0; }
    int GetTrailingPad() const { return 0; }

    bool Apply(Vector<AlgorithmData>& output,
               Vector<AlgorithmData>& input) {
      // determine what channel to operate on
      if (algorithm_d == SUM) {
        computeSum(output(0).makeVectorFloat(),
                   input(0).getVectorFloat());
      }
    }
12
  • ADDING NEW ALGORITHMS

    boolean Energy::computeSum(VectorFloat& output_a,
                               const VectorFloat& input_a) {
      // compute the sum of squares
      Float e = input_a.sumSquare();

      // compute the scale factor according to the specified implementation
      float scaled_energy = scale(e, input_a.length());

      // the length of the output vector should be 1 as it only contains
      // the energy
      output_a.setLength(1);

      // assign the value of energy to the output
      output_a(0) = Integral::max(floor_d, scaled_energy);

      // exit gracefully
      return true;
    }
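
For readers without the ISIP classes at hand, the same computation can be sketched in standard C++. The function name `energyDb` is hypothetical. The dB convention is also an assumption, since the slides do not show what `scale` does: here the value is 10*log10 of the sum of squares, with no normalization by the frame length.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical stand-alone version of the energy computation.
// Assumption: the dB value is 10*log10(sum of squares), with no
// normalization by the frame length, clamped from below at floor_db.
float energyDb(const std::vector<float>& input, float floor_db) {
  assert(!input.empty());
  // compute the sum of squares
  float sum_sq = 0.0f;
  for (float x : input) sum_sq += x * x;
  // log10(0) is -infinity, so the floor also covers silent frames
  const float db = 10.0f * std::log10(sum_sq);
  return std::max(floor_db, db);
}
```

For the input 0, 1, 2 used earlier, the sum of squares is 5, so this sketch returns about 6.99 dB. An all-zero frame returns the floor value.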
13
  • DEFINITIONS
  • Algorithm
      • Input and output are arrays of floating-point
        numbers
      • Corresponds to a basic DSP principle
  • Recipe
      • Collection of algorithms that are run serially:
        the output of algorithm A(n-1) is the input to
        algorithm A(n)
      • Named inputs and outputs
      • Allows reuse of processing blocks between systems
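
The two definitions can be sketched as follows. This is a simplification under stated assumptions: an algorithm is modeled as a plain function from one float array to another, and the names `Vec`, `Algorithm`, and `runRecipe` are hypothetical. Named inputs and outputs are left out.

```cpp
#include <cassert>
#include <functional>
#include <vector>

using Vec = std::vector<float>;
// Hypothetical algorithm type: maps one float array to another.
using Algorithm = std::function<Vec(const Vec&)>;

// Run the algorithms serially: the output of step n-1 feeds step n.
Vec runRecipe(const std::vector<Algorithm>& recipe, Vec data) {
  for (const Algorithm& step : recipe) {
    assert(step != nullptr);  // every slot in the recipe must hold a callable
    data = step(data);
  }
  return data;
}
```

A recipe built from a squaring step followed by a summing step reproduces the sum-of-squares energy. The same two steps could be reused in a different recipe.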

14
HIERARCHY OF ALGORITHM CLASSES
15
  • FRONT-END CONFIGURATION TOOL
  • Design a front-end by creating a block diagram
  • Allows rapid prototyping of ideas
  • New modules can easily be added to the system
  • The resulting parameter file is then the input to a
    full speech recognition system

16-34
  • FRONT-END CONFIGURATION TOOL
    (slides 16 through 34 repeat this title; no slide
    text survives in the transcript)

35
RESPONSIBILITIES OF THE UTILITY
  • Parses the file containing the recipe created in
    the configuration tool
  • Synchronizes different paths along the block flow
    diagram contained in the recipe
  • Prepares input and output data buffers for each
    algorithm
  • Schedules the sequence of required signal
    processing operations
  • Processes data through the recipe
  • Manages large collections of data files.
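
The slides do not say how the utility schedules operations. One common approach, shown here purely as an assumption, is a topological sort of the block diagram, so that each block runs only after all of its inputs are available. The name `scheduleBlocks` and the adjacency-list representation are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <queue>
#include <vector>

// Hypothetical scheduler: edges[i] lists the blocks that consume block i's
// output. Returns an order in which every block runs after its inputs.
std::vector<int> scheduleBlocks(const std::vector<std::vector<int>>& edges) {
  const std::size_t n = edges.size();
  std::vector<int> pending(n, 0);  // number of inputs not yet computed
  for (const auto& outs : edges) {
    for (int to : outs) {
      assert(to >= 0 && static_cast<std::size_t>(to) < n);
      ++pending[static_cast<std::size_t>(to)];
    }
  }
  std::queue<int> ready;
  for (std::size_t i = 0; i < n; ++i) {
    if (pending[i] == 0) ready.push(static_cast<int>(i));
  }
  std::vector<int> order;
  while (!ready.empty()) {
    const int block = ready.front();
    ready.pop();
    order.push_back(block);
    for (int to : edges[static_cast<std::size_t>(block)]) {
      if (--pending[static_cast<std::size_t>(to)] == 0) ready.push(to);
    }
  }
  // An order shorter than n means the diagram contains a cycle.
  return order;
}
```

In a diagram where block 0 feeds blocks 1 and 2, and both feed block 3, block 3 is scheduled last. That is the point where the two paths have to be synchronized.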

36
  • VERIFICATION STRATEGY
  • Correctness: the implementation of each
    algorithm is verified manually or against other
    tools such as MATLAB.
  • Usability: we assessed and improved the usability
    of our tools through extensive user testing
    conducted over the course of several training
    sessions.
  • Speech recognition experiments: the correctness
    of the tools was also verified by speech
    recognition experiments.

37
  • STATE-OF-THE-ART FEATURES
  • Mel-frequency cepstral coefficients (MFCCs)
  • Cepstral mean subtraction
  • Energy normalization
  • 1st and 2nd order differential features
  • These features are used by most commercial speech
    recognition systems.
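
Two of these features are simple enough to sketch directly. The code below is an illustration under stated assumptions, not the ISIP implementation: the mean is taken over the whole utterance, the differential uses a one-frame span on each side, and edge frames are repeated at the boundaries. The names `Frames`, `subtractMean`, and `delta` are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

using Frames = std::vector<std::vector<float>>;  // frames[t][k]

// Cepstral mean subtraction: remove each coefficient's mean over the utterance.
void subtractMean(Frames& frames) {
  assert(!frames.empty());
  const std::size_t dim = frames[0].size();
  std::vector<float> mean(dim, 0.0f);
  for (const auto& f : frames) {
    assert(f.size() == dim);  // all frames must have the same length
    for (std::size_t k = 0; k < dim; ++k) mean[k] += f[k];
  }
  for (float& m : mean) m /= static_cast<float>(frames.size());
  for (auto& f : frames) {
    for (std::size_t k = 0; k < dim; ++k) f[k] -= mean[k];
  }
}

// First-order differential features with a one-frame span:
// d[t] = (c[t+1] - c[t-1]) / 2, repeating the edge frames at the boundaries.
// Calling it again on its own output gives second-order features.
// Assumes all frames have the same length.
Frames delta(const Frames& frames) {
  assert(!frames.empty());
  const std::size_t n = frames.size();
  Frames out(n, std::vector<float>(frames[0].size(), 0.0f));
  for (std::size_t t = 0; t < n; ++t) {
    const auto& prev = frames[t == 0 ? 0 : t - 1];
    const auto& next = frames[std::min(t + 1, n - 1)];
    for (std::size_t k = 0; k < out[t].size(); ++k) {
      out[t][k] = 0.5f * (next[k] - prev[k]);
    }
  }
  return out;
}
```

Production systems often compute differentials with a regression over several frames on each side. The one-frame version here is the simplest case of that idea.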

38
EXPERIMENTAL RESULTS
39
  • CONCLUSION
  • The front-end performs signal processing for
    speech recognition systems
  • The ISIP front-end is implemented on an
    extensible library of basic DSP building blocks
  • A block diagram interface is used to configure
    the front-end data flow
  • The tools' usability was optimized through
    multiple training sessions with new users
  • The system's correctness was verified through
    speech recognition experiments.