Abstract
Satisfying callers’ goals and expectations is the primary objective of every customer care contact center. However, quantifying how successfully interactive voice response (IVR) systems satisfy callers’ goals and expectations has historically proven to be a most difficult task. Such difficulties in assessing automated customer care contact centers can be traced to two assumptions made by most stakeholders in the call center industry:
1. Performance can be effectively measured by deriving statistics from call logs; and
2. The overall performance of an IVR can be expressed by a single numeric value.
This chapter introduces an IVR assessment framework which confronts these misguided assumptions head-on and shows how they can be overcome. Our new framework for measuring the performance of IVR-driven call centers incorporates both objective and subjective measures. Using the concepts of hidden and observable measures, we demonstrate how it is possible to produce reliable and meaningful performance metrics which provide insights into multiple aspects of IVR performance.
Notes
1. During a side conversation at the AVIxD workshop held in August 2009 in New York, Mark Stallings told of a GI stationed in Iraq who called an American IVR while rounds of explosions tore through the air.
2. To make things even more complicated, there are cases where the distinction between observable and hidden becomes fuzzy: dialog systems may acknowledge that they do not know the facts with certainty and, therefore, work with beliefs, i.e., with probability distributions over the observable facts. For instance, a system may ask a caller for his first name, but instead of accepting the first-best hypothesis of the speech recognizer (e.g., “Bob”), it keeps the entire n-best list and the associated confidence scores (e.g., “Bob”: 50%; “Rob”: 20%; “Snob”: 10%; etc.). Such spoken dialog systems are referred to as belief systems [2, 28, 29].
3. This can be crucial considering that speech recognition applied to real-world spoken dialog systems can produce word error rates of 30% or higher even after careful tuning [5].
4. See Sect. 7.3 for more details on grammars used in IVRs.
5. “A 19-minute call takes 19 minutes to listen to” is one of ISCA and IEEE Fellow Roberto Pieraccini’s famous aphorisms.
6. To give an example: we recently heard a call in which the caller said “Cannot send e-mail” to a call-routing application and was forwarded to an automated Internet troubleshooting application. This application supposedly fixed the problem by successfully walking the caller through the steps of sending an e-mail to himself. The caller was then asked whether there was anything else he needed help with, and he said “yes.” He was connected back to the call router, asked to describe the reason for his call, and again said “Cannot send e-mail.” Instead of recognizing that the troubleshooting application had obviously not fixed the problem the first time, the router sent him there again, and he went through the same steps as before. Eventually, the caller, understanding that he was caught in an infinite loop, requested human-agent assistance. Here, the caller’s opt-out was directly related to the application’s logical flaw.
References
Acomb, K., Bloom, J., Dayanidhi, K., Hunter, P., Krogh, P., Levin, E., and Pieraccini, R. (2007). Technical Support Dialog Systems: Issues, Problems, and Solutions. In Proc. of the HLT-NAACL, Rochester, USA.
Bohus, D. and Rudnicky, A. (2005). Constructing Accurate Beliefs in Spoken Dialog Systems. In Proc. of the ASRU, San Juan, Puerto Rico.
Danieli, M. and Gerbino, E. (1995). Metrics for Evaluating Dialogue Strategies in a Spoken Language System. In Proc. of the AAAI Spring Symposium on Empirical Methods in Discourse Interpretation and Generation, Torino, Italy.
Evanini, K., Hunter, P., Liscombe, J., Suendermann, D., Dayanidhi, K., and Pieraccini, R. (2008). Caller Experience: A Method for Evaluating Dialog Systems and Its Automatic Prediction. In Proc. of the SLT, Goa, India.
Evanini, K., Suendermann, D., and Pieraccini, R. (2007). Call Classification for Automated Troubleshooting on Large Corpora. In Proc. of the ASRU, Kyoto, Japan.
Gorin, A., Riccardi, G., and Wright, J. (1997). How May I Help You? Speech Communication, 23(1/2).
Hone, K. and Graham, R. (2000). Towards a Tool for the Subjective Assessment of Speech System Interfaces (SASSI). Natural Language Engineering, 6(3/4).
Kamm, C., Litman, D., and Walker, M. (1998). From Novice to Expert: The Effect of Tutorials on User Expertise with Spoken Dialogue Systems. In Proc. of the ICSLP, Sydney, Australia.
Knight, S., Gorrell, G., Rayner, M., Milward, D., Koeling, R., and Lewin, I. (2001). Comparing Grammar-Based and Robust Approaches to Speech Understanding: A Case Study. In Proc. of the Eurospeech, Aalborg, Denmark.
Levin, E. and Pieraccini, R. (2006). Value-Based Optimal Decision for Dialog Systems. In Proc. of the SLT, Palm Beach, Aruba.
McGlashan, S., Burnett, D., Carter, J., Danielsen, P., Ferrans, J., Hunt, A., Lucas, B., Porter, B., Rehor, K., and Tryphonas, S. (2004). VoiceXML 2.0. W3C Recommendation. http://www.w3.org/TR/2004/REC-voicexml20-20040316.
Melin, H., Sandell, A., and Ihse, M. (2001). CTT-Bank: A Speech Controlled Telephone Banking System – An Initial Evaluation. Technical report, KTH, Stockholm, Sweden.
Merriam-Webster (1998). Merriam-Webster’s Collegiate Dictionary. Merriam-Webster, Springfield, USA.
Minker, W. and Bennacef, S. (2004). Speech and Human-Machine Dialog. Springer, New York, USA.
Noeth, E., Boros, M., Fischer, J., Gallwitz, F., Haas, J., Huber, R., Niemann, H., Stemmer, G., and Warnke, V. (2001). Research Issues for the Next Generation Spoken Dialogue Systems Revisited. In Proc. of the TSD, Zelezna Ruda, Czech Republic.
Papineni, K., Roukos, S., Ward, T., and Zhu, W. J. (2002). BLEU: A Method for Automatic Evaluation of Machine Translation. In Proc. of the ACL, Philadelphia, USA.
Polifroni, J., Hirschman, L., Seneff, S., and Zue, V. (1992). Experiments in Evaluating Interactive Spoken Language Systems. In Proc. of the DARPA Workshop on Speech and Natural Language, Harriman, USA.
Quinlan, J. (1993). C4.5: Programs for Machine Learning. Morgan Kaufmann, San Francisco, USA.
Rabiner, L. (1989). A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition. Proc. of the IEEE, 77(2).
Raux, A., Langner, B., Black, A., and Eskenazi, M. (2005). Let’s Go Public! Taking a Spoken Dialog System to the Real World. In Proc. of the Interspeech, Lisbon, Portugal.
Shriberg, E., Wade, E., and Prince, P. (1992). Human-Machine Problem Solving Using Spoken Language Systems (SLS): Factors Affecting Performance and User Satisfaction. In Proc. of the DARPA Workshop on Speech and Natural Language, Harriman, USA.
Suendermann, D., Hunter, P., and Pieraccini, R. (2008a). Call Classification with Hundreds of Classes and Hundred Thousands of Training Utterances and No Target Domain Data. In Proc. of the PIT, Kloster Irsee, Germany.
Suendermann, D., Liscombe, J., Dayanidhi, K., and Pieraccini, R. (2009a). A Handsome Set of Metrics to Measure Utterance Classification Performance in Spoken Dialog Systems. In Proc. of the SIGdial Workshop on Discourse and Dialogue, London, UK.
Suendermann, D., Liscombe, J., and Pieraccini, R. (2010). How to Drink from a Fire Hose: One Person Can Annoscribe 693 Thousand Utterances in One Month. In Proc. of the SIGDial, 11th Annual Meeting of the Special Interest Group on Discourse and Dialogue, Tokyo, Japan.
Suendermann, D., Liscombe, J., Evanini, K., Dayanidhi, K., and Pieraccini, R. (2008b). C5. In Proc. of the SLT, Goa, India.
Suendermann, D., Liscombe, J., Evanini, K., Dayanidhi, K., and Pieraccini, R. (2009c). From Rule-Based to Statistical Grammars: Continuous Improvement of Large-Scale Spoken Dialog Systems. In Proc. of the ICASSP, Taipei, Taiwan.
Williams, J. (2006). Partially Observable Markov Decision Processes for Spoken Dialogue Management. PhD thesis, Cambridge University, Cambridge, UK.
Williams, J. (2008). Exploiting the ASR N-Best by Tracking Multiple Dialog State Hypotheses. In Proc. of the Interspeech, Brisbane, Australia.
Young, S., Schatzmann, J., Weilhammer, K., and Ye, H. (2007). The Hidden Information State Approach to Dialog Management. In Proc. of the ICASSP, Hawaii, USA.
© 2010 Springer Science+Business Media, LLC
Suendermann, D., Liscombe, J., Pieraccini, R., Evanini, K. (2010). “How am I Doing?”: A New Framework to Effectively Measure the Performance of Automated Customer Care Contact Centers. In: Neustein, A. (eds) Advances in Speech Recognition. Springer, Boston, MA. https://doi.org/10.1007/978-1-4419-5951-5_7
Print ISBN: 978-1-4419-5950-8
Online ISBN: 978-1-4419-5951-5