Avoiding the Pitfalls of Speech Application Rollouts Through Testing and Production Management
2
Avoiding the Pitfalls of Speech Application
Rollouts Through Testing and Production Management
  • Rob Edmondson, Senior Field Engineer
  • Empirix, Inc.

3
Overview
You are about to deploy a new call center with a
new PBX, a speech-enabled IVR built on a VoiceXML
architecture, post-routing CTI, and 200 agent
stations with IP phones.
  • What is the customer-perceived latency for your
    IVR to respond to callers' speech inputs?
  • What is the average host-connection latency for
    the IVR?
  • What percentage of callers' utterances are
    recognized the first time?
  • What percentage of calls fail to complete in
    the IVR because of application errors?
  • How many calls fail to be routed to the correct
    agent or skill group?
  • What is the average time it takes for a screen
    pop to occur?
  • What percentage of screen pops have missing or
    incorrect information?
  • What percentage of screen pops never happen?
  • What is the voice quality for the agent and
    caller?
  • What is the impact on other users of your CRM
    system?

At 5 Calls/Minute?
At 30 Calls/Minute?
At Maximum call load?
4
Why Should We Care?
5
Business Goals Driving Self-Service...
...While Quality Strategies Focused on Agents
[Chart: percentage of calls handled by agents vs. handled by self-service, by industry: Utilities, Telecom, Mortgage, Credit Card, Stock/Mutual, Retail Banking, Health Insurance. Source: Enterprise Integration Group, 2004]
6
Speech Application Quality Design and Delivery
Quality Evaluation Matrix (four quadrants)

  • Easy to Use, Unpredictable Behavior
  • Easy to Use, Behaves as Designed
  • Difficult to Use, Behaves as Designed
  • Difficult to Use, Unpredictable Behavior
7
Common Questions When Deploying Speech
Design
Delivery
  • Can I just speechify my DTMF apps?
  • Should I allow DTMF input?
  • What voice should we use?
  • How personal should the application be?
  • Should I allow barge-in?
  • Which utterances should I allow for a recognition
    state?
  • How do I handle error conditions?
  • When do I transfer to an agent?
  • How do I test speech?
  • Do I have enough speech/TTS resources?
  • Do I need to test with different accents?
  • How do I do usability testing?
  • Will VoIP impact my speech recognition accuracy?
  • How do I verify TTS quality?
  • How do I make sure it's working after we go into
    production?

8
Speech Testing
  • Recognition Testing: evaluates recognizer
    performance. Callers generate utterances by
    talking to the application, using test scripts with
  • male and female speakers
  • different dialects
  • different noise conditions
  • Accuracy is measured by comparing the
    recognition results to a transcription of the
    utterances.
  • Barge-in, speaker verification, subscriber
    profiles, and dynamic grammars should also be
    tested for accuracy with a variety of speakers
    and calling conditions.

Usability Testing: conducted early in the design
process, and also helpful at this stage to
validate the performance of an application
against the metrics laid out in the requirements
phase.
  • Application Testing
  • Dialog Traversal: creates and executes a series
    of test cases covering all possible paths through
    the dialog to verify that
  • the right prompts are played
  • each state in the call flow is reached correctly
  • the universal, error, and help behaviors are
    operational
  • System Load: simulates a high inbound call
    volume to ensure that
  • the expected caller capacity can be handled
  • proper load balancing occurs across the system
  • Tuning and Monitoring: ongoing analysis of real
    caller interactions, occurring during
  • pilot deployment (beta)
  • post-deployment
  • ongoing monitoring

Nuance Project Method (Introduction to the Nuance System, v8.5, p. 72)
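Dialog traversal lends itself to automation: enumerate every path through the call-flow graph, then turn each path into one test case. A minimal sketch in Python; the flow structure, state names, and utterances here are all hypothetical:

```python
def traversal_paths(flow, start, end):
    """Enumerate every path through a dialog call flow so each path can
    become one dialog-traversal test case. `flow` maps a dialog state to
    a list of (caller_utterance, next_state) edges."""
    paths = []

    def walk(state, path):
        if state == end:
            paths.append(path)
            return
        for utterance, nxt in flow.get(state, []):
            if nxt not in (s for s, _ in path):  # skip cycles
                walk(nxt, path + [(nxt, utterance)])

    walk(start, [(start, None)])
    return paths

# A toy pizza-ordering call flow (hypothetical states and utterances).
flow = {
    "Welcome": [("order", "GetSize"), ("help", "Help")],
    "Help":    [("continue", "GetSize")],
    "GetSize": [("large", "Confirm"), ("medium", "Confirm")],
    "Confirm": [("yes", "Done")],
}
cases = traversal_paths(flow, "Welcome", "Done")
```

Each returned path lists the states visited and the utterance that reaches each one, which is exactly the script a test harness needs to drive the call and verify the prompt played at every step.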
9
Testing During the Lifecycle
[Diagram mapping testing activities onto lifecycle phases: Requirements, Design, Implementation, and Deployment, paired with Usability, Recognition, Application Testing, Performance, and Tuning]
10
Usability Testing: A Key to Success
"Usability testing is sometimes confused with
quality assurance (QA), but the two are very
different. QA usually measures a product's
performance against its specifications. For
example, QA on an automobile would ensure that
the components function as specified, that the
gaps between the doors and the body are within
tolerances, and so forth. QA testing would not
determine whether a vehicle is easy for people to
operate, but usability testing would. In a speech
application, QA ensures that the appropriate
prompts do in fact play at the right times in the
right order. This kind of testing is important,
because designers generally shouldn't assume that
an application will work to spec. QA testing
can tell us a great deal about a system's
functionality. But it can't tell us if the target
population for the application can use it or
will like to use it." - Blade Kotelly, The Art
and Business of Speech Recognition, p. 122
"Usability testing is just as important for
simple DTMF applications as it is for complex NL
(natural language) applications. In general, the
more control the user has over the application,
the more testing will be required and the more
valuable this testing will be. The subject is a
complex one, and both designers and developers
are encouraged to develop formal, documented test
plans early in the product life cycle." - Bruce
Balentine and David P. Morgan, How to Build a
Speech Recognition Application, 2nd Edition, p.
294
11
Recognition Testing - Useful Metrics
  • First-time recognition rate
  • For a known good input, what percentage of
    the time is the expected prompt heard back?
  • Timeout and rejection rates
  • For timeout and invalid-input tests, how often is
    the correct behavior observed?
  • Barge-in detection rate
  • When barging in at an acceptable time, what
    percentage of the time is the speech detected?
  • Menu response latency
  • How long after the end of the input utterance does
    it take for the next prompt to begin?
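All four metrics can be tallied mechanically from raw test-call records. A minimal Python sketch, with the record fields and names all hypothetical:

```python
from dataclasses import dataclass

@dataclass
class TestResult:
    """One scripted test call against a single dialog state (hypothetical record)."""
    kind: str             # "recognition", "timeout", "rejection", or "barge_in"
    expected_prompt: str  # prompt we expect to hear back
    heard_prompt: str     # prompt actually observed
    latency_s: float      # end of input utterance to start of next prompt

def rate(results, kind):
    """Fraction of test calls of a given kind where the expected prompt was heard."""
    subset = [r for r in results if r.kind == kind]
    if not subset:
        return None
    return sum(1 for r in subset if r.heard_prompt == r.expected_prompt) / len(subset)

def report(results):
    """Summarize the slide's four metrics over a batch of test calls."""
    latencies = [r.latency_s for r in results]
    return {
        "first_time_recognition": rate(results, "recognition"),
        "timeout_handling": rate(results, "timeout"),
        "rejection_handling": rate(results, "rejection"),
        "barge_in_detection": rate(results, "barge_in"),
        "avg_menu_latency_s": sum(latencies) / len(latencies),
    }
```

A real harness would also split these rates by speaker, dialect, and noise condition, but the aggregation logic is the same.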

12
Dialog State Testing Dashboard
Dialog State: GetPizzaSize

First Time Recognition Rate
  FileName      Comments    RawData   Pct
  Large.vce     Male        100/100   100
  Medium.vce    Female      98/100    98
  Personal.vce  Cell phone  96/100    96

Error Handling
  Error     RawData  Pct
  Timeout1  50/50    100
  Timeout2  50/50    100
  Reject1   46/50    92
  Reject2   40/46    87

Response Time Data
  Min      Avg       Max
  0.3 sec  0.45 sec  1.2 sec

Barge In Success Data
  Pause (sec)  RawData  Pct
  0.5          0/50     0
  2.0          50/50    100
  4.0          50/50    100

Tester Comments
  • Dialog state performs very well
  • Still need to test universal behaviors (Help,
    Main Menu)
  • Used clip "nothing.wav" for Reject tests; around
    10% of calls came up with "Medium" instead of the
    correct rejection
13
Application Testing
14
Performance Testing - System Overview
[Diagram: callers and agents reach the application infrastructure (IVR/speech platform) through the telephony infrastructure, via VoiceXML and MRCP]
15
Example Configuration and Vendors
  • ASR/TTS (MRCP): Nuance, ScanSoft, IBM, Microsoft, Loquendo, ...
  • Web Server (VoiceXML 2.0): BEA, IBM, Sun, Oracle, Microsoft, open source, ...
  • VoiceXML Platform (CCXML, SIP): Nortel, Avaya, Genesys, IVB, Edify, IBM,
    Aspect, Syntellect, Nuance, VoiceGenie, ...
  • Call Control/Media Server (SIP, H.323): Excel, AudioCodes, Voxeo, IVB,
    Cisco, Genesys, Avaya, ...
  • Gateway: Cisco, Avaya, Nortel, VegaStream, ...
  • Network/PBX (T1, E1 PRI): Avaya, Nortel, Intertel, NEC, Cisco, Siemens, ...
  • CTI Server (JTAPI): Genesys, Avaya, Nortel, Cisco, Apropos, ...
  • ACD: Avaya, Nortel, Cisco, Apropos, II, Siemens, ...
16
Performance Testing
Load Test Objectives
  • Application can handle the expected load
  • Find system bottlenecks
  • Find pre-failure indicators
  • Understand recovery procedures

Call Rate (CPH)   Correct Classification Rate
1000              98%
2000              98%
3000              98%
5100              97%
6300              65%
6600              45%

Considerations
  • Component load tests to isolate specific pieces
  • Test lab or production?
  • Emulate real-world call patterns
  • Iterative testing allows find-and-fix cycles
  • Go beyond what you expect in production
  • Compare recognition rates at increasing load
    levels
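Comparing recognition rates at increasing load levels boils down to finding the knee where quality collapses, as in the call-rate table on this slide. A small sketch in Python, with the slide's figures expressed as fractions:

```python
def max_safe_load(load_results, floor=0.95):
    """Return the highest call rate (calls/hour) at which the correct-
    classification rate stayed at or above `floor` -- the knee before
    recognition quality degrades under load. None if no rate qualifies."""
    safe = [cph for cph, rate in load_results if rate >= floor]
    return max(safe) if safe else None

# Numbers mirror the call-rate table on this slide, rates as fractions.
ramp = [(1000, 0.98), (2000, 0.98), (3000, 0.98),
        (5100, 0.97), (6300, 0.65), (6600, 0.45)]
```

Here `max_safe_load(ramp)` reports 5100 CPH as the last healthy point, which is the figure an iterative find-and-fix cycle would target.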

17
Performance Testing
  • Key Metrics
  • Customer-perceived latency at each step
  • The time from the end of caller input to the
    beginning of the next response, which is "dead
    air" to the caller
  • Time to complete the call (call length)
  • Transactional completion rate
  • First-time recognition rate
  • All of these metrics relative to call load
  • Why are these important?
  • Direct measures of callers' quality of experience
  • Cost implications to the enterprise
  • Cost of variability
  • Self-service versus assisted help
  • Quantify an otherwise subjective idea
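Because every metric here is measured relative to call load, results are usually bucketed by load tier before summarizing. A hedged sketch in Python; the tier values and latency measurements below are invented for illustration:

```python
from statistics import mean, quantiles

def latency_by_load(samples):
    """Summarize customer-perceived latency ("dead air") per call-load tier.
    `samples` is a list of (calls_per_minute_tier, dead_air_seconds) pairs;
    returns the mean and ~95th-percentile latency for each tier."""
    tiers = {}
    for tier, secs in samples:
        tiers.setdefault(tier, []).append(secs)
    summary = {}
    for tier, vals in sorted(tiers.items()):
        # quantiles() needs at least two points; fall back to the lone value.
        p95 = quantiles(vals, n=20)[-1] if len(vals) > 1 else vals[0]
        summary[tier] = {"mean_s": mean(vals), "p95_s": p95}
    return summary

# Hypothetical measurements at 5 and 30 calls/minute.
samples = [(5, 0.3), (5, 0.5), (30, 1.0), (30, 1.4), (30, 1.2)]
report = latency_by_load(samples)
```

Tracking the 95th percentile alongside the mean captures the "cost of variability": callers judge the system by its worst dead-air pauses, not its average.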

18
Performance Test Case Study
19
Performance Test Case Study
20
Performance Test Case Study
21
Production Management
  • Tuning/Monitoring Vendor Tools
  • Application Monitoring
  • 3rd party tools for device/application monitoring
  • Proactive call transactions
  • Key Metrics for Customer Experience
  • Latencies
  • Transactional errors
  • Speech recognition success rates
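Proactive call transactions can be as simple as a scripted test call plus threshold checks on the key metrics above. A sketch of one monitoring pass; `place_test_call` and `alert` are hypothetical hooks a monitoring harness would supply:

```python
def monitor_cycle(place_test_call, alert, max_latency_s=2.0):
    """One proactive monitoring pass: place a scripted test call against the
    production system and raise an alert for each problem found."""
    result = place_test_call()  # e.g. {"completed": True, "recognized": True, "latency_s": 0.6}
    problems = []
    if not result.get("completed"):
        problems.append("transaction failed")
    if not result.get("recognized"):
        problems.append("utterance not recognized")
    if result.get("latency_s", 0.0) > max_latency_s:
        problems.append("latency over threshold")
    for p in problems:
        alert(p)
    return problems
```

Run on a schedule, a loop like this surfaces latency creep, transactional errors, and recognition failures before real callers report them.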

22
Customer Perceived Latencies
23
Transaction Failures By Time of Day
(excludes retry calls)
24
Review Common Questions
Design
Delivery
  • Can I just speechify my DTMF apps?
  • Should I allow DTMF input?
  • What voice should we use?
  • How personal should the application be?
  • Should I allow barge-in?
  • Which utterances should I allow for a recognition
    state?
  • How do I handle error conditions?
  • When do I transfer to an agent?
  • How do I test speech?
  • Do I have enough speech/TTS resources?
  • Do I need to test with different accents?
  • How do I do usability testing?
  • Will VoIP impact my speech recognition accuracy?
  • How do I verify TTS quality?
  • How do I make sure it's working after we go into
    production?

25
Review
You are about to deploy a new call center with a
new PBX, a speech-enabled IVR built on a VoiceXML
architecture, post-routing CTI, and 200 agent
stations with IP phones.
  • What is the customer-perceived latency for your
    IVR to respond to callers' speech inputs?
  • What is the average host-connection latency for
    the IVR?
  • What percentage of callers' utterances are
    recognized the first time?
  • What percentage of calls fail to complete in
    the IVR because of application errors?
  • How many calls fail to be routed to the correct
    agent or skill group?
  • What is the average time it takes for a screen
    pop to occur?
  • What percentage of screen pops have missing or
    incorrect information?
  • What percentage of screen pops never happen?
  • What is the voice quality for the agent and
    caller?
  • What is the impact on other users of your CRM
    system?

At 5 Calls/Minute?
At 30 Calls/Minute?
At Maximum call load?
26
  • Rob Edmondson
  • Empirix, Inc.
  • redmondson@empirix.com
  • 916-781-9873