Speech Recognition Application

Slides: 15
Provided by: Tru3

1
Speech Recognition Application
  • Voice Enabled Phone Directory
  • Yousef Rabah

2
Why Speech Enabled Phone Directory
  • Growing Technology
  • Easy Access
  • Mainly used for
    • Educational purposes
    • People with certain disabilities
    • Mobile use

3
Problem
  • Automatic, speech-interactive phone directory assistance

4
Automatic Speech Recognition - Sphinx
  • Speaker Dependent vs. Independent
  • Acoustic modeling
  • Isolated vs. Continuous
  • HMM Probabilities, Parameters, Training
  • Language Model
    • Unigrams
    • Bigrams: P(word2 | word1)
  • Phonemes
  • Lexicon Structure
    • ZERO   Z IH R OW
    • TWO    T UW
    • H A HEIGH H
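The language-model and lexicon bullets above can be illustrated with a minimal Python sketch. It is not Sphinx code: the toy corpus of spelled names, the function names, and the plain maximum-likelihood estimate without smoothing are all assumptions made for illustration. Only the two lexicon lines are taken from the slide.

```python
from collections import Counter

# Lexicon lines copied from the slide: a word followed by its phonemes.
LEXICON_LINES = ["ZERO Z IH R OW", "TWO T UW"]

# Hypothetical toy corpus of spelled names, one letter per token.
TOY_CORPUS = [["S", "A", "M"], ["S", "U", "E"], ["A", "M", "Y"]]


def parse_lexicon(lines):
    """Map each word to its list of phonemes."""
    lexicon = {}
    for line in lines:
        word, *phonemes = line.split()
        lexicon[word] = phonemes
    return lexicon


def train_ngrams(sentences):
    """Return unigram and bigram probability functions (unsmoothed MLE)."""
    unigram_counts = Counter()
    history_counts = Counter()  # how often a word is followed by another word
    bigram_counts = Counter()
    for words in sentences:
        unigram_counts.update(words)
        history_counts.update(words[:-1])
        bigram_counts.update(zip(words, words[1:]))
    total = sum(unigram_counts.values())

    def unigram_prob(word):
        return unigram_counts[word] / total if total else 0.0

    def bigram_prob(word1, word2):
        # P(word2 | word1) = count(word1 word2) / count(word1 as a history)
        seen = history_counts[word1]
        return bigram_counts[(word1, word2)] / seen if seen else 0.0

    return unigram_prob, bigram_prob


lexicon = parse_lexicon(LEXICON_LINES)
unigram_prob, bigram_prob = train_ngrams(TOY_CORPUS)

print(lexicon["ZERO"])        # ['Z', 'IH', 'R', 'OW']
print(bigram_prob("S", "A"))  # 0.5
```

A real recognizer would smooth these estimates so that unseen word pairs do not get probability zero; the sketch leaves that out to keep the counting visible.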

5
Input / Output
  • 24003 samples in file /usr/local/share/sphinx3/model/lm/an4/hell.raw
  • INFO live.c(239) live_nfeatvec 13
  • INFO main_live_pretend.c(92) PARTIAL HYP
  • INFO live.c(239) live_nfeatvec 12
  • INFO main_live_pretend.c(92) PARTIAL HYP A(2)
  • INFO live.c(239) live_nfeatvec 13
  • INFO main_live_pretend.c(92) PARTIAL HYP EIGHTH
  • INFO live.c(239) live_nfeatvec 12
  • INFO main_live_pretend.c(92) PARTIAL HYP H
  • INFO live.c(239) live_nfeatvec 13
  • INFO main_live_pretend.c(92) PARTIAL HYP H E
  • INFO live.c(239) live_nfeatvec 12
  • INFO main_live_pretend.c(92) PARTIAL HYP H E
  • INFO live.c(239) live_nfeatvec 13
  • INFO main_live_pretend.c(92) PARTIAL HYP H E L
  • INFO live.c(239) live_nfeatvec 12
  • INFO main_live_pretend.c(92) PARTIAL HYP H E L
  • INFO live.c(239) live_nfeatvec 13
  • INFO main_live_pretend.c(92) PARTIAL HYP H E L OH
  • Backtrace (null)
  • LatID  SFrm  EFrm      AScr      LScr  Type
  •   254     0    45   -391470    -74100    -1
  •   594    46    81   -472155   -148846     0  H
  •  1291    82   102   -288621   -148846     0  E
  •  1850   103   126   -235274   -148846     0  L
  •  2599   127   147   -430694   -148846     0  L
  •  2650   148   148         0   -148846     0
  •           0   148  -1818214   -818330  (Total)
  • FWDVIT H E L L (null)
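As a reading aid for the backtrace above, here is a minimal Python sketch that parses the segment rows, joins the recognized letters, and re-adds the acoustic (AScr) and language (LScr) scores to check them against the (Total) row. The row values are copied from the slide. The parser, its function name, and its field names are assumptions for illustration and are not part of Sphinx.

```python
# Segment rows copied from the slide's backtrace (header and total omitted).
BACKTRACE_ROWS = """\
254 0 45 -391470 -74100 -1
594 46 81 -472155 -148846 0 H
1291 82 102 -288621 -148846 0 E
1850 103 126 -235274 -148846 0 L
2599 127 147 -430694 -148846 0 L
2650 148 148 0 -148846 0
"""


def parse_backtrace(text):
    """Turn whitespace-separated backtrace rows into a list of dicts."""
    segments = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 6:
            continue  # skip blank or malformed rows
        lat_id, start, end, ascr, lscr, seg_type = map(int, fields[:6])
        segments.append({
            "lat_id": lat_id,
            "start_frame": start,
            "end_frame": end,
            "ascr": ascr,
            "lscr": lscr,
            "type": seg_type,
            # Rows without a trailing word show no label on the slide.
            "word": fields[6] if len(fields) > 6 else None,
        })
    return segments


segments = parse_backtrace(BACKTRACE_ROWS)
hypothesis = " ".join(s["word"] for s in segments if s["word"])
total_ascr = sum(s["ascr"] for s in segments)
total_lscr = sum(s["lscr"] for s in segments)

print(hypothesis)              # H E L L
print(total_ascr, total_lscr)  # -1818214 -818330
```

The sums match the (Total) row, and the joined words match the FWDVIT line. The final result, H E L L, differs from the last partial hypothesis, H E L OH.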

6
Difficulties
  • Hardware issues
  • ASR software issues
  • Letter phonemes
  • Time

7
Solution
  • 4-stage process

8
Solution
  • Database (PostgreSQL)
    • Names
    • Phone numbers
    • Fast access
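A directory table of names and phone numbers might look like the following Python sketch. The deck names PostgreSQL but does not show its schema, so everything here is an assumption: the `people` table, its columns, the index, and the helper functions are hypothetical, and Python's built-in `sqlite3` stands in for PostgreSQL so the example runs without a server. The SAM SMITH entry comes from the deck's own example; the second entry is invented.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for PostgreSQL.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE people ("
    " id INTEGER PRIMARY KEY,"
    " name TEXT NOT NULL,"
    " phone TEXT NOT NULL)"
)
# An index on the name column is one plausible reading of "fast access".
conn.execute("CREATE INDEX people_name_idx ON people (name)")


def add_person(conn, name, phone):
    """Store a directory entry; names are kept in upper case."""
    conn.execute(
        "INSERT INTO people (name, phone) VALUES (?, ?)",
        (name.upper(), phone),
    )


def find_by_prefix(conn, prefix):
    """Return (name, phone) rows whose name starts with the prefix.

    The prefix is passed as a bound parameter; LIKE wildcards inside
    it are not escaped in this sketch.
    """
    cursor = conn.execute(
        "SELECT name, phone FROM people WHERE name LIKE ? ORDER BY name",
        (prefix.upper() + "%",),
    )
    return cursor.fetchall()


add_person(conn, "Sam Smith", "765-973-2145")  # entry from the deck's example
add_person(conn, "Sue Jones", "555-0100")      # hypothetical entry

print(find_by_prefix(conn, "sam"))  # [('SAM SMITH', '765-973-2145')]
```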

9
Solution
  • Architecture of application
    • db.pm
    • people.pm
    • people.pl
    • record.pl
    • wav_to_raw.pl
    • get_speech.pl
    • display_speech.pm
    • display_speech.pl
    • VEPD.pm
    • VEPD.pl
  • Example
    • PC: Press the space bar before and after you speak
    • User: S AH EM
    • PC: Decoded as SAM?
    • Results: 1
    • 1. SAM SMITH 765-973-2145
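The example dialog can be sketched in Python as three small steps: join the decoded letters into a query, match it against the directory, and print a numbered result list. This is not the author's Perl code. The function names, the match-on-start-of-name rule, and the second directory entry are assumptions for illustration; only the SAM SMITH entry comes from the slide.

```python
# In-memory stand-in for the directory; the second entry is hypothetical.
DIRECTORY = [
    ("SAM SMITH", "765-973-2145"),
    ("SARA JONES", "555-0100"),
]


def letters_to_query(hypothesis):
    """Join space-separated decoded letters, e.g. 'S A M' -> 'SAM'."""
    return "".join(hypothesis.split()).upper()


def search(directory, query):
    """Return entries in which any part of the name starts with the query."""
    return [
        (name, phone)
        for name, phone in directory
        if any(part.startswith(query) for part in name.split())
    ]


def format_results(matches):
    """Render matches as a count line followed by a numbered list."""
    lines = [f"Results: {len(matches)}"]
    lines += [
        f"{i}. {name} {phone}"
        for i, (name, phone) in enumerate(matches, start=1)
    ]
    return "\n".join(lines)


query = letters_to_query("S A M")
matches = search(DIRECTORY, query)
print(f"Decoded as {query}?")
print(format_results(matches))
# Decoded as SAM?
# Results: 1
# 1. SAM SMITH 765-973-2145
```

Matching on the start of each name part lets a spelled first name or surname find the entry; an exact-match rule would be stricter and is equally plausible.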

10
Solution

11
Results
  • A first step towards a hands-free, speech-enabled phone directory
  • Speaker independent
  • Application features
    • Adding a user
    • Retrieving a user (via speech)
    • Manual search
    • Viewing the current phone directory

12
Possible Future Enhancement
  • ASR enabled for
    • Adding users
    • Phone search
  • Word recognition (instead of letters)
  • More accurate ASR (as the technology matures)
  • Graphical interface (via Perl/Tk)
  • Communication through VoiceXML

13
Special Thanks
  • To friends and family
  • Jim Rogers
  • Hassan Halta
  • Skylar Thompson
  • Kushboo Goel
  • Rabah family
  • El-Shabab el-taybeh

14
Questions/Comments