Title: Avoiding the Pitfalls of Speech Application Rollouts Through Testing and Production Management
2Avoiding the Pitfalls of Speech Application Rollouts Through Testing and Production Management
- Rob Edmondson, Senior Field Engineer
- Empirix, Inc.
3Overview
You are about to deploy a new call center with a new PBX, a speech-enabled IVR deployed on a VXML architecture, post-routing CTI, and 200 agent stations with IP phones.
- What is the customer-perceived latency for your IVR to respond to callers' speech inputs?
- What is the average host connection latency for the IVR?
- What percentage of callers' utterances are recognized the first time?
- What percentage of calls fail to be completed in the IVR because of application errors?
- How many calls fail to be routed to the correct agent or skill group?
- What is the average time it takes for a screen pop to occur?
- What percentage of screen pops have missing or incorrect information?
- What percentage of screen pops never happen?
- What is the voice quality for the agent and caller?
- What is the impact on other users of your CRM system?
At 5 Calls/Minute?
At 30 Calls/Minute?
At Maximum call load?
4Why Should We Care?
5Business Goals Driving Self-Service
...While Quality Strategies Focused on Agents.
[Chart, vertical axis 0 to 100: share of calls Handled by Agents versus Handled by Self-Service, by industry: Utilities, Telecom, Mortgage, Credit Card, Stock/Mutual, Retail Banking, Health Insurance. Source: Enterprise Integration Group, 2004]
6Speech Application Quality: Design and Delivery
Quality Evaluation Matrix (four quadrants)
- Easy to Use, Unpredictable Behavior
- Easy to Use, Behaves as Designed
- Difficult to Use, Behaves as Designed
- Difficult to Use, Unpredictable Behavior
7Common Questions When Deploying Speech
Design
- Can I just speechify my DTMF apps?
- Should I allow DTMF input?
- What voice should we use?
- How personal should the application be?
- Should I allow barge-in?
- Which utterances should I allow for a recognition state?
- How do I handle error conditions?
- When do I transfer to an agent?
Delivery
- How do I test speech?
- Do I have enough speech/TTS resources?
- Do I need to test with different accents?
- How do I do usability testing?
- Will VoIP impact my speech recognition accuracy?
- How do I verify TTS quality?
- How do I make sure it's working after we go into production?
8Speech Testing
- Recognition Testing: Evaluates recognizer performance. Callers generate utterances by talking to the application, using test scripts.
  - male and female speakers
  - different dialects
  - different noise conditions
- Accuracy is measured by comparing the recognition results to a transcription of the utterances.
- Barge-in, speaker verification, subscriber profiles, and dynamic grammars should also be tested for accuracy with a variety of speakers and calling conditions.
- Usability Testing: Conducted early in the design process. It is also helpful at this stage to validate the performance of an application against the metrics laid out in the requirements phase.
- Application Testing
  - Dialog Traversal: creates and executes a series of test cases to cover all possible paths through the dialog, to verify that
    - the right prompts are played
    - each state in the call flow is reached correctly
    - the universal, error, and help behaviors are operational
  - System Load: simulates a high inbound call volume to ensure that
    - expected caller capacity can be handled
    - proper load balancing occurs across the system
- Tuning and Monitoring
  - Ongoing analysis of real caller interactions. This occurs during
    - Pilot deployment (beta)
    - Post-deployment
    - Ongoing monitoring
Source: Nuance Project Method, Introduction to the Nuance System, v8.5, pg 72
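The dialog traversal idea above (test cases that cover the paths through the dialog) can be sketched as a graph walk. This is only an illustration: the call flow, its state names, and the `enumerate_paths` helper are hypothetical and are not taken from the Nuance material. Because error and help loops make the number of paths unbounded, the sketch lists only loop-free paths that end in a terminal state.

```python
# Hypothetical call flow: each dialog state maps to the states it can lead to.
call_flow = {
    "Welcome": ["GetPizzaSize", "Help"],
    "Help": ["GetPizzaSize"],
    "GetPizzaSize": ["Confirm", "Error"],
    "Error": ["GetPizzaSize", "Agent"],
    "Confirm": ["Goodbye"],
    "Agent": [],
    "Goodbye": [],
}


def enumerate_paths(flow, start):
    """Depth-first listing of loop-free paths from start to a terminal state."""
    paths = []

    def walk(state, path):
        path = path + [state]
        if not flow[state]:  # terminal state: one complete test case
            paths.append(path)
            return
        for nxt in flow[state]:
            if nxt not in path:  # do not revisit a state on the same path
                walk(nxt, path)

    walk(start, [])
    return paths


paths = enumerate_paths(call_flow, "Welcome")
for number, path in enumerate(paths, start=1):
    print(f"Test case {number}: " + " -> ".join(path))

# Check that every dialog state is reached by at least one test case.
covered = {state for path in paths for state in path}
print("States not covered:", sorted(set(call_flow) - covered))
```

Each printed path is a candidate test case: place the call, follow the path, and verify the prompt played at every state.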
9Testing During the Lifecycle
[Diagram: lifecycle phases (Requirements, Design, Implementation, Testing, Deployment) shown alongside testing activities (Usability, Recognition, Application, Performance, Tuning)]
10Usability Testing: A Key to Success
"Usability testing is sometimes confused with quality assurance (QA), but the two are very different. QA usually measures a product's performance against its specifications. For example, QA on an automobile would ensure that the components function as specified, that the gaps between the doors and the body are within tolerances, and so forth. QA testing would not determine whether a vehicle is easy for people to operate, but usability testing would. In a speech application, QA ensures that the appropriate prompts do in fact play at the right times in the right order. This kind of testing is important, because designers generally shouldn't assume that an application will work to spec. QA testing can tell us a great deal about a system's functionality. But it can't tell us if the target population for the application can use it or will like to use it."
- Blade Kotelly, The Art and Business of Speech Recognition, pg. 122
"Usability testing is just as important for simple DTMF applications as it is for complex NL (natural language) applications. In general, the more control the user has over the application, the more testing will be required and the more valuable this testing will be. The subject is a complex one, and both designers and developers are encouraged to develop formal, documented test plans early in the product life cycle."
- Bruce Balentine and David P. Morgan, How to Build a Speech Recognition Application, 2nd Edition, pg. 294
11Recognition Testing - Useful Metrics
- First-time recognition rate
  - For a known good input prompt, what percentage of the time is the expected prompt heard back?
- Timeout and rejection rates
  - For timeout and invalid input tests, how often is the correct behavior observed?
- Barge-in detection rate
  - When barging in at an acceptable time, what percentage of the time is the speech detected?
- Menu response latency
  - How long after the end of the input utterance does it take for the next prompt to begin?
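The four metrics above reduce to simple ratios and time differences over a set of scripted test calls. A minimal sketch follows, assuming each test call is logged as a record with a test kind, a pass/fail flag, and two timestamps. The `ScriptedCall` layout and the sample values are hypothetical, not output from any particular test tool.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class ScriptedCall:
    """One scripted test call against a dialog state (hypothetical record layout)."""
    kind: str                    # "recognition", "timeout", "reject", or "barge_in"
    passed: bool                 # True if the expected prompt or behavior was observed
    input_end_s: float = 0.0     # when the input utterance ended, in seconds
    prompt_start_s: float = 0.0  # when the next prompt began, in seconds


def rate(calls, kind):
    """Percentage of calls of one kind that showed the expected behavior."""
    selected = [c for c in calls if c.kind == kind]
    if not selected:
        return None  # no tests of this kind were run
    return 100.0 * sum(c.passed for c in selected) / len(selected)


def response_latencies(calls):
    """Menu response latency: end of input utterance to start of the next prompt."""
    return [c.prompt_start_s - c.input_end_s
            for c in calls if c.kind == "recognition" and c.passed]


# Made-up sample results for illustration only.
calls = [
    ScriptedCall("recognition", True, 2.0, 2.4),
    ScriptedCall("recognition", True, 2.0, 2.5),
    ScriptedCall("recognition", False, 2.0, 3.2),
    ScriptedCall("recognition", True, 2.0, 2.3),
    ScriptedCall("timeout", True),
    ScriptedCall("reject", True),
    ScriptedCall("reject", False),
    ScriptedCall("barge_in", True),
]

print("First-time recognition rate (%):", rate(calls, "recognition"))
print("Timeout handling rate (%):", rate(calls, "timeout"))
print("Rejection handling rate (%):", rate(calls, "reject"))
print("Barge-in detection rate (%):", rate(calls, "barge_in"))
latencies = response_latencies(calls)
print("Latency min/avg/max (s):", min(latencies), mean(latencies), max(latencies))
```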
12Dialog State Testing Dashboard
Dialog State: GetPizzaSize

First Time Recognition Rate
FileName      Comments    RawData  Pct
Large.vce     Male        100/100  100
Medium.vce    Female      98/100   98
Personal.vce  Cell phone  96/100   96

Error Handling
Error     RawData  Pct
Timeout1  50/50    100
Timeout2  50/50    100
Reject1   46/50    92
Reject2   40/46    87

Response Time Data
Min      Avg       Max
0.3 sec  0.45 sec  1.2 sec

Tester Comments
- Dialog state performs very well
- Still need to test universal behaviors (Help, Main Menu)
- Used clip nothing.wav for Reject tests; around 10% of calls came up with Medium instead of correct rejection

Barge In Success Data
Pause  RawData  Pct
0.5    0/50     0
2.0    50/50    100
4.0    50/50    100
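The Pct columns on the dashboard follow directly from the RawData counts. As a rough sketch, the conversion and a simple pass mark could look like this. The `pct` helper and the 90 percent threshold are assumptions for illustration; only the raw counts come from the Error Handling rows of the dashboard.

```python
def pct(raw):
    """Convert a 'passed/attempted' string such as '46/50' to a whole-number percentage."""
    passed, attempted = (int(part) for part in raw.split("/"))
    if attempted == 0:
        raise ValueError("no attempts recorded")
    return round(100 * passed / attempted)


# Raw counts copied from the Error Handling rows of the GetPizzaSize dashboard.
error_handling = {
    "Timeout1": "50/50",
    "Timeout2": "50/50",
    "Reject1": "46/50",
    "Reject2": "40/46",
}

THRESHOLD = 90  # hypothetical pass mark, not from the presentation

for name, raw in error_handling.items():
    value = pct(raw)
    flag = "" if value >= THRESHOLD else "  <-- below threshold"
    print(f"{name:<10}{raw:>7}{value:>5}{flag}")

below = [name for name, raw in error_handling.items() if pct(raw) < THRESHOLD]
```

With these counts only Reject2 falls under the assumed pass mark, which matches the tester comment about incorrect rejections.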
13Application Testing
14Performance Testing - System Overview
[Diagram components: Callers, Agents, Telephony Infrastructure, IVR/Speech Platform (VoiceXML, MRCP), Application Infrastructure]
15Example configuration and vendors
Components and example vendors
- ASR, TTS: Nuance, Scansoft, IBM, Microsoft, Loquendo, ...
- Web Server: BEA, IBM, Sun, Oracle, Microsoft, OpenSource
- VoiceXML Platform: Nortel, Avaya, Genesys, IVB, Edify, IBM, Aspect, Syntellect, Nuance, VoiceGenie, ...
- Call Control/Media Server: Excel, AudioCodes, Voxeo, IVB, Cisco, Genesys, Avaya, ...
- Gateway: Cisco, Avaya, Nortel, VegaStream, ...
- CTI Server: Genesys, Avaya, Nortel, Cisco, Apropos, ...
- Network/PBX: Avaya, Nortel, Intertel, NEC, Cisco, Siemens, ...
- ACD: Avaya, Nortel, Cisco, Apropos, II, Siemens, ...
Protocols and interfaces shown between components: VoiceXML 2.0, MRCP, CCXML, SIP, H.323, JTAPI, T1, E1 PRI
16Performance Testing
Load Test Objectives
- Application can handle expected load
- Find System bottlenecks
- Find pre-failure indicators
- Understand recovery procedures
Call Rate (CPH)  Correct Classification Rate (%)
1000             98
2000             98
3000             98
5100             97
6300             65
6600             45
Considerations
- Component load tests to isolate specific pieces
- Test lab or production?
- Emulate real-world call patterns
- Iterative testing allows find and fix
- Go beyond what you expect in production
- Compare recognition rates at increasing load levels
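Comparing recognition rates at increasing load is a way to locate the point where the system starts to degrade. One possible sketch, using the call rate table from this slide: take the lightest load as the baseline and report the highest call rate that stays within a chosen tolerance of it. The function name and the 2-point tolerance are assumptions, not part of the original test procedure.

```python
# (calls per hour, correct classification rate in percent), from the table on this slide
results = [(1000, 98), (2000, 98), (3000, 98), (5100, 97), (6300, 65), (6600, 45)]


def max_sustainable_load(results, baseline_drop=2):
    """Highest tested call rate before the rate falls more than
    `baseline_drop` points below the rate seen at the lightest load."""
    ordered = sorted(results)   # ascending by call rate
    baseline = ordered[0][1]    # rate at the lightest tested load
    sustainable = None
    for cph, rate in ordered:
        if baseline - rate > baseline_drop:
            break               # first load level where quality degrades
        sustainable = cph
    return sustainable


print("Highest load within tolerance (CPH):", max_sustainable_load(results))
```

With this data the knee falls between 5100 and 6300 CPH, so a follow-up run with smaller steps in that range would narrow it down.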
17Performance Testing
- Key Metrics
  - Customer-perceived latency at each step
    - The time from the end of caller input to the beginning of the next response, which is dead air to the caller
  - Time to complete the call (call length)
  - Transactional completion rate
  - First-time recognition rate
  - All of these metrics relative to call load
- Why are these important?
  - Direct measures of callers' quality of experience
  - Cost implications to the enterprise
  - Cost of variability
  - Self-service versus assisted help
  - Quantify an otherwise subjective idea
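Customer-perceived latency, as defined above, is the gap between the end of caller input and the start of the next response, and it is most useful when reported per step and per load level. A minimal sketch under assumed data: the step names, load levels, and timestamps below are invented for illustration, and the tuple layout is hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical load-test measurements:
# (call load in calls/minute, dialog step, end of caller input, start of next response)
# Times are in seconds from the start of the call.
steps = [
    (5, "GetAccount", 10.0, 10.5),
    (5, "GetPin", 20.0, 20.5),
    (30, "GetAccount", 10.0, 11.0),
    (30, "GetPin", 20.0, 22.0),
]


def perceived_latency_by_load(steps):
    """Average dead air (response start minus input end) per (load, step)."""
    buckets = defaultdict(list)
    for load, step, input_end, response_start in steps:
        buckets[(load, step)].append(response_start - input_end)
    return {key: mean(values) for key, values in buckets.items()}


def completion_rate(completed, attempted):
    """Transactional completion rate as a percentage of attempted calls."""
    return 100.0 * completed / attempted


latency = perceived_latency_by_load(steps)
for (load, step), seconds in sorted(latency.items()):
    print(f"{load:>3} calls/min  {step:<12}{seconds:.2f} s")
print("Completion rate (%):", completion_rate(completed=190, attempted=200))
```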
18Performance Test Case Study
19Performance Test Case Study
20Performance Test Case Study
21Production Management
- Tuning/Monitoring Vendor Tools
- Application Monitoring
- 3rd party tools for device/application monitoring
- Proactive call transactions
- Key Metrics for Customer Experience
- Latencies
- Transactional errors
- Speech recognition success rates
22Customer Perceived Latencies
23Transaction Failures By Time of Day
excludes retry calls
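A time-of-day view like this one can be built by counting failed monitoring transactions per hour and leaving out retries. The sketch below assumes a simple log of (timestamp, succeeded, is_retry) tuples. The log layout and the sample entries are hypothetical and do not come from the production data behind the chart.

```python
from collections import Counter
from datetime import datetime

# Hypothetical monitoring log: (timestamp, succeeded, is_retry)
transactions = [
    ("2005-03-01 09:05", True, False),
    ("2005-03-01 09:20", False, False),
    ("2005-03-01 09:21", False, True),   # retry of the failed call, excluded
    ("2005-03-01 14:02", False, False),
    ("2005-03-01 14:40", False, False),
]


def failures_by_hour(transactions):
    """Count failed transactions per hour of day, excluding retry calls."""
    counts = Counter()
    for stamp, succeeded, is_retry in transactions:
        if succeeded or is_retry:
            continue
        hour = datetime.strptime(stamp, "%Y-%m-%d %H:%M").hour
        counts[hour] += 1
    return dict(sorted(counts.items()))


hourly = failures_by_hour(transactions)
for hour, count in hourly.items():
    print(f"{hour:02d}:00  {count} failure(s)")
```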
24Review Common Questions
Design
- Can I just speechify my DTMF apps?
- Should I allow DTMF input?
- What voice should we use?
- How personal should the application be?
- Should I allow barge-in?
- Which utterances should I allow for a recognition state?
- How do I handle error conditions?
- When do I transfer to an agent?
Delivery
- How do I test speech?
- Do I have enough speech/TTS resources?
- Do I need to test with different accents?
- How do I do usability testing?
- Will VoIP impact my speech recognition accuracy?
- How do I verify TTS quality?
- How do I make sure it's working after we go into production?
25Review
You are about to deploy a new call center with a new PBX, a speech-enabled IVR deployed on a VXML architecture, post-routing CTI, and 200 agent stations with IP phones.
- What is the customer-perceived latency for your IVR to respond to callers' speech inputs?
- What is the average host connection latency for the IVR?
- What percentage of callers' utterances are recognized the first time?
- What percentage of calls fail to be completed in the IVR because of application errors?
- How many calls fail to be routed to the correct agent or skill group?
- What is the average time it takes for a screen pop to occur?
- What percentage of screen pops have missing or incorrect information?
- What percentage of screen pops never happen?
- What is the voice quality for the agent and caller?
- What is the impact on other users of your CRM system?
At 5 Calls/Minute?
At 30 Calls/Minute?
At Maximum call load?
26- Rob Edmondson
- Empirix, Inc.
- redmondson_at_empirix.com
- 916-781-9873