Avoiding the Pitfalls of Speech Application Rollouts Through Testing and Production Management

About This Presentation

Slides: 27

Provided by: JGUS

Transcript and Presenter's Notes

1
(No Transcript)
2
Avoiding the Pitfalls of Speech Application
Rollouts Through Testing and Production Management

Rob Edmondson, Senior Field Engineer
Empirix, Inc.

3
Overview
You are about to deploy a new call center, with a
new PBX, speech-enabled IVR deployed on VXML
architecture, post-routing CTI, and 200 agent
stations with IP phones.

What is the customer-perceived latency for your
IVR to respond to callers' speech inputs?
What is the average host connection latency for
the IVR?
What percentage of callers' utterances are
recognized the first time?
What percentage of calls fail to be completed in
the IVR because of application errors?

How many calls fail to be routed to the correct
agent or skill group?
What is the average time it takes for screen pop
to occur?
What percentage of screen pops have missing or
incorrect information?
What percentage of screen pops never happen?
What is the voice quality for the agent and
caller?
What is the impact on other users of your CRM
system?

At 5 Calls/Minute?
At 30 Calls/Minute?
At Maximum call load?
4
Why Should We Care?
5
Business Goals Driving Self-Service
...While Quality Strategies Focused on Agents.
[Chart: share handled by agents vs. handled by self-service (scale 0 to 100), by industry: Utilities, Telecom, Mortgage, Credit Card, Stock/Mutual, Retail Banking, Health Insurance]
Source: Enterprise Integration Group, 2004
6
Speech Application Quality Design and Delivery
Quality Evaluation Matrix

Easy to Use
Unpredictable Behavior

Easy to Use
Behaves as Designed

Difficult to Use
Behaves as Designed

Difficult to Use
Unpredictable Behavior

7
Common Questions When Deploying Speech

Design
Can I just speechify my DTMF apps?
Should I allow DTMF input?
What voice should we use?
How personal should the application be?
Should I allow barge-in?
Which utterances should I allow for a recognition
state?
How do I handle error conditions?
When do I transfer to an agent?

Delivery
How do I test speech?
Do I have enough speech/TTS resources?
Do I need to test with different accents?
How do I do usability testing?
Will VoIP impact my speech recognition accuracy?
How do I verify TTS quality?
How do I make sure it's working after we go into
production?

8
Speech Testing

Recognition Testing: evaluates recognizer
performance. Callers generate utterances by
talking to the application, using test scripts.
Callers should include:
male and female speakers
different dialects
different noise conditions
Accuracy is measured by comparing the
recognition results to a transcription of the
utterances.
Barge-in, speaker verification, subscriber
profiles, and dynamic grammars should also be
tested for accuracy with a variety of speakers
and calling conditions.
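
The accuracy comparison described above can be sketched in a few lines. This is only an illustration, assuming each test call is logged as a (condition, transcription, recognition result) tuple; the function name, the tuple layout, and the sample data are hypothetical and do not come from the presentation.

```python
from collections import defaultdict

def recognition_accuracy(results):
    """Return {condition: fraction correct} for (condition, transcript, recognized) tuples."""
    totals = defaultdict(int)
    correct = defaultdict(int)
    for condition, transcript, recognized in results:
        totals[condition] += 1
        # Compare case-insensitively, ignoring surrounding whitespace.
        if transcript.strip().lower() == recognized.strip().lower():
            correct[condition] += 1
    return {c: correct[c] / totals[c] for c in totals}

# Hypothetical sample data, not taken from the presentation.
sample = [
    ("male, quiet", "large", "large"),
    ("male, quiet", "medium", "medium"),
    ("female, cell phone", "medium", "Medium"),
    ("female, cell phone", "personal", "medium"),
]
accuracy = recognition_accuracy(sample)
```

Grouping by condition keeps results for different speakers, dialects, and noise levels separate, so a weak condition is not hidden by a good overall average.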

Usability Testing: conducted early in the design
process, and also helpful at this stage to
validate the performance of an application
against the metrics laid out in the requirements
phase.

Application Testing
Dialog Traversal: creates and executes a series
of test cases to cover all possible paths through
the dialog, to verify that
the right prompts are played
each state in the call flow is reached correctly
the universal, error, and help behaviors
are operational
System Load: simulates a high inbound call
volume to ensure that
expected caller capacity can be handled
proper load balancing occurs across the system
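
One way to generate dialog traversal test cases is to enumerate every path through the call flow. The sketch below is a minimal illustration, assuming the call flow is available as a mapping from each state to the states it can lead to. Apart from GetPizzaSize, the state names are hypothetical, and a transition back to a state already on the path (such as a re-prompt) is treated as the end of that test case so that enumeration terminates.

```python
def dialog_paths(flow, state, path=()):
    """Yield each path from `state` until no unvisited next state remains."""
    path = path + (state,)
    # Drop transitions that loop back to a state already on this path.
    next_states = [n for n in flow.get(state, []) if n not in path]
    if not next_states:
        yield path
        return
    for nxt in next_states:
        yield from dialog_paths(flow, nxt, path)

# Hypothetical call flow; only GetPizzaSize appears in the presentation.
flow = {
    "Welcome": ["GetPizzaSize", "Help"],
    "GetPizzaSize": ["Confirm", "Timeout", "Reject"],
    "Help": ["GetPizzaSize"],
    "Timeout": ["GetPizzaSize"],
    "Reject": ["GetPizzaSize"],
    "Confirm": [],
}
paths = list(dialog_paths(flow, "Welcome"))
```

Each resulting tuple is one test case: the sequence of states a scripted call should reach, including the error and help branches.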

Tuning and Monitoring: ongoing analysis of real
caller interactions. This occurs during
Pilot deployment (beta)
Post-Deployment
Ongoing Monitoring

Source: Nuance Project Method, Introduction to the Nuance System, v8.5, pg. 72
9
Testing During the Lifecycle
[Diagram labels: Requirements, Usability, Design, Recognition, Implementation, Application, Testing, Performance, Deployment, Tuning]
10
Usability Testing: A Key to Success
"Usability testing is sometimes confused with
quality assurance (QA), but the two are very
different. QA usually measures a product's
performance against its specifications. For
example, QA on an automobile would ensure that
the components function as specified, that the
gaps between the doors and the body are within
tolerances, and so forth. QA testing would not
determine whether a vehicle is easy for people to
operate, but usability testing would. In a speech
application, QA ensures that the appropriate
prompts do in fact play at the right times in the
right order. This kind of testing is important,
because designers generally shouldn't assume that
an application will work to spec. QA testing
can tell us a great deal about a system's
functionality. But it can't tell us if the target
population for the application can use it or
will like to use it." - Blade Kotelly, The Art
and Business of Speech Recognition, pg. 122
"Usability testing is just as important for
simple DTMF applications as it is for complex NL
(natural language) applications. In general, the
more control the user has over the application,
the more testing will be required and the more
valuable this testing will be. The subject is a
complex one, and both designers and developers
are encouraged to develop formal, documented test
plans early in the product life cycle." - Bruce
Balentine and David P. Morgan, How to Build a
Speech Recognition Application, 2nd Edition, pg.
294
11
Recognition Testing - Useful Metrics

First-time recognition rate
For a known good input prompt, what percentage of
the time is the expected prompt heard back?
Timeout and rejection rates
For timeout and invalid input tests, how often is
the correct behavior observed?
Barge-in detection rate
When barging in at an acceptable time, what
percentage of the time is the speech detected?
Menu response latency
How long after the end of the input utterance does
it take for the next prompt to begin?
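
Two of these metrics can be computed directly from test call logs. The sketch below assumes each test attempt is recorded as a dictionary with a flag for whether the expected prompt was heard and timestamps (in seconds) for the end of the input and the start of the next prompt. The field names and sample values are hypothetical.

```python
from statistics import mean

def first_time_recognition_rate(attempts):
    """Fraction of attempts where the expected prompt was heard after the first input."""
    return sum(1 for a in attempts if a["expected_prompt_heard"]) / len(attempts)

def response_latency_stats(attempts):
    """Return (min, average, max) seconds from end of input to start of next prompt."""
    latencies = [a["prompt_start"] - a["input_end"] for a in attempts]
    return min(latencies), mean(latencies), max(latencies)

# Hypothetical log records, timestamps in seconds from the start of each call.
attempts = [
    {"expected_prompt_heard": True, "input_end": 10.0, "prompt_start": 10.5},
    {"expected_prompt_heard": True, "input_end": 20.0, "prompt_start": 20.25},
    {"expected_prompt_heard": False, "input_end": 30.0, "prompt_start": 31.0},
    {"expected_prompt_heard": True, "input_end": 40.0, "prompt_start": 40.25},
]
rate = first_time_recognition_rate(attempts)
lat_min, lat_avg, lat_max = response_latency_stats(attempts)
```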

12
Dialog State Testing Dashboard
Dialog State: GetPizzaSize

First Time Recognition Rate
FileName      Comments    RawData  Pct
Large.vce     Male        100/100  100%
Medium.vce    Female      98/100   98%
Personal.vce  Cell phone  96/100   96%

Error Handling
Error     RawData  Pct
Timeout1  50/50    100%
Timeout2  50/50    100%
Reject1   46/50    92%
Reject2   40/46    87%

Response Time Data
Min      Avg       Max
0.3 sec  0.45 sec  1.2 sec

Tester Comments
Dialog state performs very well
Still need to test universal behaviors (Help,
Main Menu)
Used clip nothing.wav for Reject tests; around
10% of calls came up with "Medium" instead of
correct rejection

Barge In Success Data
Pause  RawData  Pct
0.5    0/50     0%
2.0    50/50    100%
4.0    50/50    100%
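
The Pct figures on the dashboard follow from the RawData counts. As a quick check, the sketch below converts a "passed/total" string into a rounded percentage. The helper name is hypothetical and rounding to the nearest whole percent is an assumption, although it reproduces the values shown.

```python
def pct(raw):
    """Convert a 'passed/total' count such as '40/46' to a whole-number percentage."""
    passed, total = (int(n) for n in raw.split("/"))
    return round(100 * passed / total)

# Raw counts as listed on the dashboard.
dashboard = {"Reject1": "46/50", "Reject2": "40/46", "Personal.vce": "96/100"}
percentages = {name: pct(raw) for name, raw in dashboard.items()}
```

Note that Reject2 is measured over 46 calls rather than 50, so its percentage is not directly comparable with tests that ran the full 50.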
13
Application Testing
14
Performance Testing - System Overview
Telephony Infrastructure
Agents
Callers
Application Infrastructure
IVR/ Speech Platform
VoiceXML
MRCP
15
Example configuration and vendors
[Architecture diagram; component labels with the example vendors listed for each]
ASR, TTS: Nuance, Scansoft, IBM, Microsoft, Loquendo,
Web Server: BEA, IBM, Sun, Oracle, Microsoft, OpenSource
VoiceXML Platform: Nortel, Avaya, Genesys, IVB, Edify, IBM, Aspect, Syntellect, Nuance, VoiceGenie,
Call Control/Media Server: Excel, AudioCodes, Voxeo, IVB, Cisco, Genesys, Avaya,
Gateway: Cisco, Avaya, Nortel, VegaStream,
CTI Server: Genesys, Avaya, Nortel, Cisco, Apropos,
Network/PBX: Avaya, Nortel, Intertel, NEC, Cisco, Siemens, ..
ACD: Avaya, Nortel, Cisco, Apropos, II, Siemens, ..
Interfaces shown: VoiceXML 2.0, MRCP, CCXML, SIP, H.323, JTAPI, T1, E1 PRI
16
Performance Testing
Load Test Objectives

Application can handle expected load
Find system bottlenecks
Find pre-failure indicators
Understand recovery procedures

Considerations

Component load tests to isolate specific pieces
Test lab or production?
Emulate real-world call patterns
Iterative testing allows find and fix
Go beyond what you expect in production

Compare recognition rates at increasing load
levels

Call Rate (CPH)  Correct Classification Rate
1000             98%
2000             98%
3000             98%
5100             97%
6300             65%
6600             45%
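
A load test table like the one on this slide can be reduced to a single number: the highest call rate the system sustained before recognition quality dropped. The sketch below uses the call rates and classification rates from the slide. The function name and the 95 percent acceptance threshold are assumptions chosen for illustration.

```python
def max_sustainable_rate(results, min_pct):
    """Highest call rate reached before the first result that falls below min_pct."""
    sustainable = None
    for cph, pct in sorted(results):
        if pct < min_pct:
            break
        sustainable = cph
    return sustainable

# (calls per hour, correct classification rate in percent), from the slide.
load_results = [(1000, 98), (2000, 98), (3000, 98), (5100, 97), (6300, 65), (6600, 45)]
knee = max_sustainable_rate(load_results, min_pct=95)  # 95 is an assumed threshold
```

With these figures the result is 5100 calls per hour. The sharp drop between 5100 and 6300 is the kind of bottleneck the load test objectives are meant to expose, and the slight dip from 98 to 97 just before it is a candidate pre-failure indicator.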

17
Performance Testing

Key Metrics
Customer-perceived latency at each step
The time from the end of caller input to the
beginning of the next response, which is dead
air to the caller
Time to complete the call (call length)
Transactional completion rate
First-time recognition rate
All of these metrics relative to call load
Why are these important?
Direct measures of callers' quality of experience
Cost implications to the enterprise
Cost of variability
Self-service versus assisted help
Quantify an otherwise subjective idea
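
Reporting these metrics relative to call load means grouping test calls by the load level at which they were placed. The following is a minimal sketch, assuming each call record carries the load level in calls per hour, the measured latency in seconds, and a completion flag. All field names and sample values are hypothetical.

```python
from collections import defaultdict
from statistics import mean

def metrics_by_load(calls):
    """Return {cph: {'avg_latency_s': ..., 'completion_rate': ...}} per load level."""
    by_load = defaultdict(list)
    for call in calls:
        by_load[call["cph"]].append(call)
    report = {}
    for cph, group in sorted(by_load.items()):
        report[cph] = {
            "avg_latency_s": mean(c["latency_s"] for c in group),
            # True counts as 1, so the sum is the number of completed calls.
            "completion_rate": sum(c["completed"] for c in group) / len(group),
        }
    return report

# Hypothetical call records, not taken from the presentation.
calls = [
    {"cph": 1000, "latency_s": 0.5, "completed": True},
    {"cph": 1000, "latency_s": 0.5, "completed": True},
    {"cph": 6000, "latency_s": 1.0, "completed": True},
    {"cph": 6000, "latency_s": 3.0, "completed": False},
]
report = metrics_by_load(calls)
```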

18
Performance Test Case Study
19
Performance Test Case Study
20
Performance Test Case Study
21
Production Management

Tuning/Monitoring Vendor Tools
Application Monitoring
3rd party tools for device/application monitoring
Proactive call transactions
Key Metrics for Customer Experience
Latencies
Transactional errors
Speech recognition success rates
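
A proactive call transaction is a scripted test call placed against the production system, with its outcome checked against the key metrics above. The sketch below shows only the checking step, assuming the test call result is already available as a dictionary. The field names and the 2-second latency limit are hypothetical, not values from the presentation.

```python
def check_transaction(result, max_latency_s=2.0):
    """Return a list of alert messages for one proactive test call (empty if healthy)."""
    alerts = []
    if not result["completed"]:
        alerts.append("transaction failed")
    if not result["recognized"]:
        alerts.append("speech not recognized")
    if result["latency_s"] > max_latency_s:
        alerts.append(
            f"latency {result['latency_s']:.1f}s exceeds {max_latency_s:.1f}s"
        )
    return alerts

# Hypothetical results from two scripted test calls.
healthy = {"completed": True, "recognized": True, "latency_s": 0.5}
slow = {"completed": True, "recognized": True, "latency_s": 3.5}
healthy_alerts = check_transaction(healthy)
slow_alerts = check_transaction(slow)
```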

22
Customer Perceived Latencies
23
Transaction Failures By Time of Day
excludes retry calls
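
A report of this kind can be built by counting failed transactions per hour of day while skipping retries. This is only a sketch, assuming each call record has an ISO 8601 start time, a failure flag, and a flag marking retry calls. The record layout and the sample data are hypothetical.

```python
from collections import Counter
from datetime import datetime

def failures_by_hour(calls):
    """Count failed calls per hour of day, leaving out calls flagged as retries."""
    counts = Counter()
    for call in calls:
        if call["failed"] and not call["is_retry"]:
            counts[datetime.fromisoformat(call["start"]).hour] += 1
    return dict(sorted(counts.items()))

# Hypothetical call records, not taken from the presentation.
calls = [
    {"start": "2005-01-10T09:15:00", "failed": True, "is_retry": False},
    {"start": "2005-01-10T09:16:30", "failed": True, "is_retry": True},
    {"start": "2005-01-10T09:40:00", "failed": True, "is_retry": False},
    {"start": "2005-01-10T14:05:00", "failed": True, "is_retry": False},
    {"start": "2005-01-10T14:20:00", "failed": False, "is_retry": False},
]
hourly = failures_by_hour(calls)
```

Excluding retries keeps one frustrated caller who redials several times from inflating the failure count for that hour.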
24
Review Common Questions

Design
Can I just speechify my DTMF apps?
Should I allow DTMF input?
What voice should we use?
How personal should the application be?
Should I allow barge-in?
Which utterances should I allow for a recognition
state?
How do I handle error conditions?
When do I transfer to an agent?

Delivery
How do I test speech?
Do I have enough speech/TTS resources?
Do I need to test with different accents?
How do I do usability testing?
Will VoIP impact my speech recognition accuracy?
How do I verify TTS quality?
How do I make sure it's working after we go into
production?

25
Review
You are about to deploy a new call center, with a
new PBX, speech-enabled IVR deployed on VXML
architecture, post-routing CTI, and 200 agent
stations with IP phones.

What is the customer-perceived latency for your
IVR to respond to callers' speech inputs?
What is the average host connection latency for
the IVR?
What percentage of callers' utterances are
recognized the first time?
What percentage of calls fail to be completed in
the IVR because of application errors?

How many calls fail to be routed to the correct
agent or skill group?
What is the average time it takes for screen pop
to occur?
What percentage of screen pops have missing or
incorrect information?
What percentage of screen pops never happen?
What is the voice quality for the agent and
caller?
What is the impact on other users of your CRM
system?
