A COMPARISON OF COMMERCIAL SPEECH RECOGNITION COMPONENTS FOR USE IN POLICE CRUISERS
Transcript and Presenter's Notes


1
A COMPARISON OF COMMERCIAL SPEECH RECOGNITION
COMPONENTS FOR USE IN POLICE CRUISERS
  • 3rd Annual Intelligent Vehicle Systems Symposium
  • Andrew L. Kun
  • Brett Vinciguerra
  • June 11, 2003

2
Outline of Presentation
  • Introduction - What, Why and How?
  • Background
  • Speech Recognition Evaluation Program Software
  • Testing
  • Results and Discussion
  • Conclusion

3
Project54 Overview
  • UNH / NHSP / DOJ (University of New Hampshire, New Hampshire State Police, U.S. Department of Justice)
  • Integrates
  • Controls
  • Standard Interface

4
(No Transcript)
5
(No Transcript)
6
Introduction
  • What was the goal of this research?
  • Compare SR engine and microphone combinations
  • Accuracy and efficiency
  • Quantitatively

7
Introduction
  • Why was this research important?
  • Limit distraction
  • Limit frustration
  • Standard Process

8
Introduction
  • How was this goal accomplished?
  • 16 combinations (4 engines x 4 mics) evaluated
  • Speech Recognition Evaluation Program (SREP)
  • Simulates
  • Classifies
  • Calculates

9
Introduction
  • Accuracy
  • % of correct commands versus total commands
  • Efficiency
  • Penalizes false recognitions
  • Weighted scoring

10
Outline of Presentation
  • Introduction - What, Why and How?
  • Background
  • Speech Recognition Evaluation Program Software
  • Testing
  • Results
  • Discussion
  • Conclusion

11
SR ENGINE OPTIONS
  • Speed of Speech
  • Discrete
  • Continuous
  • Type of Application
  • Command-and-control
  • Dictation
  • User-Dependency
  • Speaker dependent
  • Speaker independent
  • Field of Application
  • PC
  • Telephone
  • Noise robust
  • Grammar File

12
Comparing SR Engines
  • Field test
  • Simulated tests
  • Speaker source
  • Background noise
  • Number of speakers

13
Accuracy Ratings
  • Not consistent
  • Different conditions
  • Hyde's Law
  • "Because speech recognizers have an accuracy of
    98%, tests must be arranged to prove it."

14
Component Requirements
  • Speech Recognition Engine
  • Must be SAPI 4.0 compliant
  • Microphone
  • Must be far-field
  • Mountable on dashboard
  • Cancel noise
  • Array
  • Directional

15
Outline of Presentation
  • Introduction - What, Why and How?
  • Background
  • Speech Recognition Evaluation Program Software
  • Testing
  • Results and Discussion
  • Conclusion

16
(No Transcript)
17
(No Transcript)
18
(Flowchart: SREP nested test loops: LOOP ENGINES → LOOP BACKGROUND → LOOP COMMANDS)
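
A minimal Python sketch of the nesting the flowchart implies; all names
here (run_tests, mix, recognize) are hypothetical, not from the actual
SREP code:

def run_tests(engines, backgrounds, commands, mix, recognize):
    """mix(cmd, bg) -> test waveform; recognize(engine, wav) -> hypothesis."""
    results = []
    for engine in engines:              # LOOP ENGINES
        for bg in backgrounds:          # LOOP BACKGROUND
            for cmd in commands:        # LOOP COMMANDS
                hypothesis = recognize(engine, mix(cmd, bg))
                results.append((engine, bg, cmd, hypothesis))
    return results
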
19
Obtaining Sound Files
  • Laptop w/ SoundBlaster
  • Earthworks M30BX
  • Background recorded on patrol
  • Speech commands in lab
  • Microsoft Audio Collection Tool
  • 5 Speakers (4 male, 1 female)
  • 40 phrases

20
Processing Sound Files
  • Matlab script
  • Signal strength = variance(signal) + mean(signal)²
  • Set volume and signal-to-noise ratio
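
A Python/NumPy sketch of this processing step (the original was a Matlab
script; these function names are illustrative): signal strength is the
mean-square value, and each command is rescaled to hit the desired SNR
relative to the background recording.

import numpy as np

def signal_strength(x):
    """Mean-square strength: variance(signal) + mean(signal)^2."""
    return np.var(x) + np.mean(x) ** 2

def scale_to_snr(command, background, snr_db):
    """Rescale the command so its strength sits snr_db above the background."""
    target = signal_strength(background) * 10.0 ** (snr_db / 10.0)
    gain = (target / signal_strength(command)) ** 0.5   # amplitude gain
    return command * gain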

21

22
Control File Structure
  • Background Noises
  • WAV filename
  • Desired SNR
  • Signal strength
  • Description of file
  • Voice Commands
  • WAV filename
  • Number of loops
  • Signal strength
  • Phrase
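
The slides list the fields but not the file syntax; a hedged Python sketch
of equivalent records (dataclass names and example values are hypothetical):

from dataclasses import dataclass

@dataclass
class BackgroundNoise:
    wav_filename: str        # e.g. "patrol_idle.wav" (illustrative)
    desired_snr_db: float    # SNR at which commands are mixed in
    signal_strength: float   # precomputed mean-square strength
    description: str         # free-text description of the recording

@dataclass
class VoiceCommand:
    wav_filename: str
    num_loops: int           # how many times to replay the command
    signal_strength: float
    phrase: str              # expected transcription, e.g. "LIGHTS"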

23
Outline of Presentation
  • Introduction - What, Why and How?
  • Background
  • Speech Recognition Evaluation Program Software
  • Testing
  • Results and Discussion
  • Conclusion

24
PRODUCTS TESTED
  • Four microphones
  • A, B, C and D.
  • Four SR engines
  • 1, 2, 3, and 4.
  • 16 unique combinations
  • A1 through D4

25
(No Transcript)
26
SR ENGINES
  • SR Engine 1
  • Microsoft SR Engine 4.0
  • SR Engine 2
  • Microsoft SR Engine 4.0 (telephone mode)
  • SR Engine 3
  • Dragon NaturallySpeaking 4.0
  • SR Engine 4
  • IBM ViaVoice 8.01

27
PREPARATION
  • Freshly installed engines
  • Minimum training
  • Default settings
  • Microphone Set-up Wizard

28
TEST SCENARIO
  • Identical conditions
  • 42-phrase grammar
  • 10 speech commands
  • 5 speakers
  • 6 background noises
  • 3 SNR levels
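  • Assuming full coverage, that is 5 × 6 × 3 = 90 test conditions, or 900 command utterances per engine-microphone combination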

29
(No Transcript)
30
Outline of Presentation
  • Introduction - What, Why and How?
  • Background
  • Speech Recognition Evaluation Program Software
  • Testing
  • Results and Discussion
  • Conclusion

31
ACCURACY BY ENGINE
32
ACCURACY BY MIC
33
RANKED ACCURACY
34
Efficiency Score
  • Specific to Project54
  • False recognitions

35
Efficiency Score
  • SAID → HEARD
  • LIGHTS → LIGHTS
  • LIGHTS → LIGHTS
  • LIGHTS → LIGHTS
  • LIGHTS → LIGHTS
  • LIGHTS → LIGHTS
  • LIGHTS → LIGHTS
  • LIGHTS → LIGHTS
  • LIGHTS → LIGHTS
  • LIGHTS → LIGHTS
  • LIGHTS → LIGHTS
  • LOSS = 0 (all commands correctly recognized)

36
Efficiency Score
  • SAID → HEARD
  • LIGHTS → LIGHTS
  • LIGHTS → LIGHTS
  • LIGHTS → LIGHTS
  • LIGHTS → UNRECOGNIZED
  • LIGHTS → LIGHTS
  • LIGHTS → LIGHTS
  • LIGHTS → LIGHTS
  • LIGHTS → LIGHTS
  • LIGHTS → LIGHTS
  • LIGHTS → LIGHTS
  • LOSS = 1 (one command unrecognized)

37
Efficiency Score
  • SAID → HEARD
  • LIGHTS → LIGHTS
  • LIGHTS → LIGHTS
  • LIGHTS → LIGHTS
  • LIGHTS → SIREN ON
  • SIREN OFF → SIREN OFF
  • LIGHTS → LIGHTS
  • LIGHTS → LIGHTS
  • LIGHTS → LIGHTS
  • LIGHTS → LIGHTS
  • LIGHTS → LIGHTS
  • LOSS = 1.5 (a false recognition forces an extra corrective command)

38
Efficiency Score
  • Scoring system
  • Correctly recognized: 1.5 points
  • Unrecognized: 0.5 points
  • Falsely recognized: 0 points
  • Eff. = ((correct × 1.5) + (unrec. × 0.5)) / 13.5 × 100%
  • Extreme scores
  • All correct → Eff. = 100%
  • All unrecognized → Eff. = 33%
  • All falsely recognized → Eff. = 0%
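
A small Python check of the scoring formula (the slide's 13.5 normalizer
equals the all-correct score for nine scored commands; this sketch derives
the maximum from the counts instead of hard-coding it):

def efficiency(correct, unrecognized, false_recognized):
    """Project54 efficiency score, as a percentage."""
    score = correct * 1.5 + unrecognized * 0.5   # false recognitions score 0
    max_score = (correct + unrecognized + false_recognized) * 1.5
    return 100.0 * score / max_score

print(efficiency(9, 0, 0))   # all correct            -> 100.0
print(efficiency(0, 9, 0))   # all unrecognized       -> 33.3
print(efficiency(0, 0, 9))   # all falsely recognized -> 0.0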

39
RANKED EFFICIENCY
40
WINNER
  • Accuracy
  • Configuration C2: accuracy = 70.3%
  • Efficiency
  • Configuration C2: efficiency = 72.4%
  • Logical choices
  • Microphone C
  • SR Engine 2

41
WHY LOW ACCURACIES?
  • Speakers' SR experience
  • Limited training
  • Training Environment
  • Default settings
  • Microphone and speaker placement
  • SNR
  • Absolute scores not important

42
Outline of Presentation
  • Introduction - What, Why and How?
  • Background
  • Speech Recognition Evaluation Program Software
  • Testing
  • Results and Discussion
  • Conclusion

43
CONCLUSION
  • The main goal of this research was
  • SR engine and microphone combinations
  • Accuracy and efficiency
  • Quantitatively

44
CONCLUSION
  • This research was important in order to
  • Limit distraction
  • Limit frustration

45
CONCLUSION
  • The goal was reached by
  • Evaluating 16 combinations (4 engines x 4 mics)
  • Speech Recognition Evaluation Program (SREP)
  • Simulated
  • Classified
  • Calculated

46
CONCLUSION
  • Configuration C2
  • Most accurate
  • Most efficient

SR ENGINE 2 = Microsoft SR Engine 4.0 (telephone mode)
47
CURRENT STATUS
  • 9 vehicles on road
  • 300 in production
  • Now supports non-SAPI 4.0 engines
  • Evaluating new engines

48
MORE INFORMATION
  • www.project54.unh.edu
  • andrew.kun@unh.edu
  • brettv@unh.edu