Reinforcement Learning for Spoken Dialogue Systems: Comparing Strengths & Weaknesses for Practical Deployment

Transcript and Presenter's Notes



1
Reinforcement Learning for Spoken Dialogue Systems: Comparing Strengths & Weaknesses for Practical Deployment
  • Tim Paek
  • Microsoft Research
  • Dialogue on Dialogues Workshop '06

2
Reinforcement Learning for SDS
  • Dialogue manager (DM) in spoken dialogue systems
  • Selects actions based on observations / beliefs
  • Typically hand-crafted and knowledge intensive
  • RL seeks to formalize and optimize action selection
  • Once dialogue dynamics are represented as a Markov
    Decision Process (MDP), derive the optimal policy
  • MDP in a nutshell
  • Input: tuple (S, A, T, R) → Output: policy
  • Objective function
  • Value function (a sketch of both follows this slide)
  • To what extent can RL really automate
    hand-crafted DM?
  • Is RL practical for speech application
    development?
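
In standard MDP notation, the objective and value functions referred to above can be sketched as follows (the discount factor \gamma and horizon T are assumptions, not taken from the slide):

    J(\pi)     = \mathbb{E}_{\pi}\left[ \sum_{t=0}^{T} \gamma^{t} R(s_t, a_t) \right]                        % objective: expected cumulative reward
    V^{\pi}(s) = \mathbb{E}_{\pi}\left[ \sum_{t=0}^{T} \gamma^{t} R(s_t, a_t) \,\middle|\, s_0 = s \right]   % value of state s under policy \pi
    \pi^{*}    = \arg\max_{\pi} V^{\pi}(s) \quad \text{for all } s \in S                                     % the derived optimal policy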

3
Strengths & Weaknesses
  • Objective Function
    Strengths: optimization framework; explicitly defined and adjustable; can serve as an evaluation metric
    Weaknesses: unclear what dialogues can and cannot be modeled using a specifiable objective function; not easily adjusted
  • Reward Function
    Weaknesses: overall behavior is very sensitive to changes in the reward; mostly hand-crafted and tuned; not easily adjusted
  • State Space & Transition Function
    Strengths: once modeled as an MDP or POMDP, well-studied algorithms exist for deriving a policy (see the sketch after this slide)
    Weaknesses: state space kept small due to algorithmic limitations; selection is still mostly manual; no best practices; no domain-independent state variables; Markov assumption; not easily adjusted
  • Policy
    Strengths: guaranteed to be optimal with respect to the data; can function as a black box
    Weaknesses: removes control from developers; not easily adjusted; no theoretical insight
  • Evaluation
    Strengths: can be rapidly trained and tested with user models
    Weaknesses: real user behavior may differ; testing on the same user model is cheating; a hand-crafted comparison policy should be tuned to the same objective function
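
To make the "well-studied algorithms" point concrete, here is a minimal value-iteration sketch on a toy dialogue MDP; the states, actions, transition probabilities, and rewards are invented for illustration and are not taken from the talk.

# Minimal value iteration on a toy dialogue MDP (all numbers are illustrative).
# States: has the system understood the user's goal? Actions: ask again or confirm.
GAMMA = 0.95

states = ["unclear", "understood", "done"]
actions = ["ask", "confirm"]

# T[s][a] -> list of (next_state, probability); R[s][a] -> immediate reward
T = {
    "unclear":    {"ask":     [("understood", 0.7), ("unclear", 0.3)],
                   "confirm": [("done", 0.2), ("unclear", 0.8)]},
    "understood": {"ask":     [("understood", 1.0)],
                   "confirm": [("done", 0.9), ("unclear", 0.1)]},
    "done":       {"ask":     [("done", 1.0)],
                   "confirm": [("done", 1.0)]},
}
R = {
    "unclear":    {"ask": -1.0, "confirm": -2.0},
    "understood": {"ask": -1.0, "confirm": 10.0},
    "done":       {"ask": 0.0,  "confirm": 0.0},
}

# Bellman backups until the value function (approximately) converges
V = {s: 0.0 for s in states}
for _ in range(200):
    V = {s: max(R[s][a] + GAMMA * sum(p * V[s2] for s2, p in T[s][a])
                for a in actions)
         for s in states}

# The derived policy is greedy with respect to the converged values
policy = {s: max(actions,
                 key=lambda a: R[s][a] + GAMMA * sum(p * V[s2] for s2, p in T[s][a]))
          for s in states}
print(policy)  # expected: ask while unclear, confirm once understood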
4
Opportunities
  • Objective Function: open up the black boxes and optimize ASR / SLU using the same objective function as the DM
  • Reward Function: inverse reinforcement learning; adapt the reward function / policy based on user type / behavior (similar to adapting mixed initiative)
  • State Space & Transition Function: learn which state-space variables are important for local / long-term reward; apply more efficient POMDP methods; identify domain-independent state variables; identify best practices
  • Policy: online policy learning (explore vs. exploit; see the sketch after this slide); identify domain-independent, reusable error-handling mechanisms
  • Evaluation: close the gap between the user model and real user behavior
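
As one way the explore-vs.-exploit opportunity could look inside a running DM, here is a minimal epsilon-greedy sketch; the action names, the turn-level reward signal, and the epsilon value are assumptions for illustration.

import random
from collections import defaultdict

EPSILON = 0.1  # fraction of turns spent exploring rather than exploiting

q = defaultdict(float)      # q[(state, action)]: running estimate of reward
counts = defaultdict(int)   # visit counts for the incremental average

def choose_action(state, actions):
    """Mostly pick the best-known action, occasionally explore a random one."""
    if random.random() < EPSILON:
        return random.choice(actions)                 # explore
    return max(actions, key=lambda a: q[(state, a)])  # exploit

def update(state, action, reward):
    """Incremental running-average update of the reward estimate."""
    counts[(state, action)] += 1
    q[(state, action)] += (reward - q[(state, action)]) / counts[(state, action)]

# Hypothetical usage inside the dialogue loop:
#   a = choose_action(current_state, ["ask", "confirm", "route_to_operator"])
#   ... execute a, observe a turn-level reward r (e.g. task progress) ...
#   update(current_state, a, r)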
5
Any comments?