Spoken Dialogue Systems - PowerPoint PPT Presentation

Title:

Spoken Dialogue Systems

Slides: 30
Provided by: mihair
Transcript and Presenter's Notes

1
(Spoken) Dialogue Systems
2
Dialogue structure
  • Dialogue Acts (shallow view)
  • Austin actions, Searle speech acts, Dialogue Acts
  • DAMSL (Core and Allen, 1997)
  • Predicting Dialogue Acts (Stolcke et al., 2000)
  • Automated discourse analysis
  • Grosz & Sidner theory
  • Dialogue Managers
  • Finite state (McTear 1998)
  • Form-based (Bohus and Rudnicky 2003)
  • Plan-based (Larsson & Traum, 2000; Rich et al.,
    2001)
  • Probabilistic (Horvitz & Paek, 1999)

3
Dialogue strategy
  • Dialogue as Markov Decision Process
  • State space, cost function, parameters
  • Reinforcement Learning
  • Learn entire dialogue structure (Levin,
    Pieraccini, Eckert, 2000)
  • Learn initiative and confirmation strategy
    (Scheffler & Young, 2002; Singh, Litman, Kearns,
    Walker, 2002)
  • Issues
  • State space design
  • Simulated user
  • Cost function
  • No user adaptation
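
To make the "dialogue as Markov Decision Process" view concrete, here is a minimal sketch of a toy slot-filling dialogue. Everything in it is an illustrative assumption: the state is the number of filled slots, the two actions (`ask_one`, `ask_all`) and their success probabilities are invented, and the cost function is simply one unit per turn. The model is solved by value iteration with a known transition model, which is a simpler stand-in for the reinforcement learning used in the cited work.

```python
# Toy dialogue MDP. All states, actions, and probabilities are hypothetical.
# State = number of slots filled so far; state 2 is terminal (task complete).
TERMINAL = 2
COST_PER_TURN = 1.0  # assumed cost function: every system turn costs 1

# TRANSITIONS[state][action] = list of (probability, next_state)
TRANSITIONS = {
    0: {
        "ask_one": [(0.9, 1), (0.1, 0)],  # directed question, reliable
        "ask_all": [(0.5, 2), (0.5, 0)],  # open question, riskier
    },
    1: {
        "ask_one": [(0.9, 2), (0.1, 1)],
        "ask_all": [(0.5, 2), (0.5, 1)],
    },
}


def expected_cost(outcomes, values):
    """Cost of one turn plus the expected cost-to-go of the next state."""
    return COST_PER_TURN + sum(p * values[nxt] for p, nxt in outcomes)


def value_iteration(transitions, tol=1e-10, max_iter=10_000):
    """Return (expected cost-to-go per state, greedy policy per state)."""
    values = {state: 0.0 for state in transitions}
    values[TERMINAL] = 0.0
    for _ in range(max_iter):
        delta = 0.0
        for state, actions in transitions.items():
            best = min(expected_cost(o, values) for o in actions.values())
            delta = max(delta, abs(best - values[state]))
            values[state] = best
        if delta < tol:
            break
    policy = {
        state: min(actions, key=lambda a: expected_cost(actions[a], values))
        for state, actions in transitions.items()
    }
    return values, policy


values, policy = value_iteration(TRANSITIONS)
for state in sorted(policy):
    print(state, policy[state], round(values[state], 3))
```

With these made-up numbers the resulting policy differs by state, which is the point of the MDP formulation: the best strategy depends on where the dialogue currently is. The "Issues" bullets show up directly in the sketch, since the state space, the transition model (a stand-in for a simulated user), and the cost function were all chosen by hand.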

4
Dialogue strategy (continued)
  • Other approaches
  • Utility-based (Paek & Horvitz, 2004)
  • Minimize support cost
  • Dempster-Shafer theory of evidence (Chu-Carroll,
    2000)
  • Value of Information (Horvitz & Paek, 1999)

5
Errors in interaction
  • Channel Errors
  • Detection (Litman, Hirschberg, Swerts, 2000)
  • Recovery
  • Modality switch (Oviatt et al., 2000)
  • Speak-and-spell (Filisko & Seneff, 2004)
  • Task errors
  • Detection & Recovery
  • System mistakes (Horvitz & Paek, 1999)
  • User mistakes (Garland, Lesh, Rich, 2003)

6
Issues in SDS
  • Interacting information sources
  • Speech repairs (Heeman & Allen, 1999)
  • Dialogue act tagging (Stolcke et al., 2000)
  • Speak-and-spell (Filisko & Seneff, 2004)
  • Portability
  • Hub-based architectures (Allen et al., 2000;
    Pellom, Ward, Pradhan, 2000)
  • Task independent components/models
  • PARADISE framework (Walker et al., 2000; Walker
    et al., 2001)
  • Dialogue managers (Bohus & Rudnicky, 2003)
  • Monologue trained components (speech repairs)

7
Issues in SDS (continued)
  • Computational issues
  • Limits the complexity of models used
  • Buying time
  • Semi-supervised offline analysis of system
    behavior
  • Standards
  • Tools

8
Theoretical models for NLP
9
Maximum Entropy
  • Introduction
  • Constraints and feature functions
  • "Assume nothing else" principle
  • Generalized Iterative Scaling algorithm
  • Joint and conditional distributions
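
For reference, the standard conditional maximum entropy model behind these bullets can be written as follows. The notation is an assumption of this note, not taken from the slides: f_i are the feature functions, λ_i their weights, and p̃ the empirical distribution of the training data.

```latex
% Conditional maximum entropy model: exponential form with a
% per-context normalizing constant Z_\lambda(x).
p_\lambda(y \mid x) = \frac{1}{Z_\lambda(x)}
    \exp\Big( \sum_i \lambda_i f_i(x, y) \Big),
\qquad
Z_\lambda(x) = \sum_{y'} \exp\Big( \sum_i \lambda_i f_i(x, y') \Big)

% Constraints: the model expectation of each feature must match
% its empirical expectation.
\sum_{x, y} \tilde{p}(x)\, p_\lambda(y \mid x)\, f_i(x, y)
    = \sum_{x, y} \tilde{p}(x, y)\, f_i(x, y)
\quad \text{for all } i
```

Among all distributions satisfying the constraints, the one with the highest entropy has this exponential form. Generalized Iterative Scaling is one procedure for fitting the weights λ_i.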

10
Maximum Entropy (continued)
  • Advantages
  • Combine multiple knowledge sources
  • Local
  • Word prefix, suffix, capitalization (POS tagging:
    Ratnaparkhi, 1996)
  • Word POS, POS class, suffix (WSD: Chao & Dyer,
    2002)
  • Token prefix, suffix, capitalization,
    abbreviation (sentence boundary detection: Reynar &
    Ratnaparkhi, 1997)
  • Global
  • N-grams (Rosenfeld, 1997)
  • Word window
  • Document title (Pakhomov, 2002)
  • Structurally related words (Chao & Dyer, 2002)
  • Sentence length, conventional lexicon (Och & Ney,
    2002)
  • Combine dependent knowledge sources

11
Maximum Entropy (continued)
  • Advantages
  • Add additional knowledge sources
  • Implicit smoothing
  • Disadvantages
  • Computational
  • Expected value at each iteration
  • Normalizing constant
  • Overfitting
  • Feature selection
  • Cutoffs
  • Basic Feature Selection (Berger et al., 1996)

12
Bayesian Belief Networks
  • Take advantage of independencies
  • Network structure design
  • Hand coded
  • DA tagging (Keizer, Akker, Nijholt, 2002)
  • Dialogue modeling (Wai, Meng, Pieraccini, 2001;
    Horvitz & Paek, 1999)
  • Argument generation (Zukerman, McConachy, Korb,
    1998)
  • Human sentence processing (Narayanan & Jurafsky,
    1998)
  • External resources
  • WordNet: selectional restriction, WSD, QA
  • Adjacency pairs (Galley, McKeown, Hirschberg,
    Shriberg, 2004)
  • Learned from data

13
BBN (continued)
  • Parameters
  • Hand-coded
  • Explaining-away effect (noisy-OR nodes)
  • Learned from data
  • Computational issues
  • Restrict network size
  • Focusing mechanism (Zukerman et al., 1998)
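
A noisy-OR node is one common way to keep hand-coded parameters manageable: instead of a full conditional probability table, each parent gets a single "strength", the probability that it alone triggers the effect. The sketch below shows the standard noisy-OR formula; the cause names, strengths, and the optional `leak` term are hypothetical values chosen for illustration.

```python
def noisy_or(cause_strengths, active, leak=0.0):
    """P(effect = true | parents) for a noisy-OR node.

    cause_strengths: dict mapping each parent to the probability that it
                     triggers the effect on its own.
    active:          dict mapping each parent to True/False.
    leak:            probability that the effect occurs with no active parent.
    """
    p_effect_absent = 1.0 - leak
    for cause, strength in cause_strengths.items():
        if active.get(cause, False):
            # Each active parent independently fails to trigger the effect.
            p_effect_absent *= 1.0 - strength
    return 1.0 - p_effect_absent


# Hypothetical example: two possible causes of a misrecognized utterance.
STRENGTHS = {"background_noise": 0.8, "out_of_vocabulary": 0.5}

p_both = noisy_or(STRENGTHS, {"background_noise": True, "out_of_vocabulary": True})
p_noise_only = noisy_or(STRENGTHS, {"background_noise": True})
p_none = noisy_or(STRENGTHS, {})
print(p_both, p_noise_only, p_none)
```

A node with n binary parents needs only n numbers (plus the leak) rather than 2^n table entries, which is why the construction suits hand-coded networks.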

14
Hidden Markov Models
  • Introduction
  • Extensions
  • Nth order HMMs
  • Time duration HMMs, empty observation

15
HMMs (continued)
  • State space design
  • Classic view
  • POS tags (Brants, 2000)
  • Word classes (Law & Chan, 1996)
  • Sentence words (Vogel, Ney, Tillmann, 1996)
  • Topic (Barzilay Lee, 2004)
  • Structural states
  • Text Chunking (Skut & Brants, 1998)
  • Named Entity Recognition (Bikel et al., 1999;
    Zhou & Su, 2002)

16
HMMs (continued)
  • Transition and observation probabilities
  • Hand coded
  • Deterministic for structural states
  • Learned from data
  • Back-off smoothing and deleted interpolation
  • Smoothed counts
  • Use sub-models
  • Unknown observations
  • Word suffix (Skut & Brants, 1998)
  • Special unknown observation (Bikel et al., 1999)
  • Decoding
  • Viterbi versus Beam-search
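
As an illustration of the decoding step, here is a minimal sketch of Viterbi decoding for a first-order HMM with POS tags as states. The three-tag model and all of its probabilities are invented for the example. Log-probabilities are used to avoid underflow, and an unseen word simply gets emission probability zero (no unknown-word handling or smoothing is attempted). The function assumes a non-empty observation sequence.

```python
import math


def safe_log(p):
    """log(p), with log(0) mapped to negative infinity."""
    return math.log(p) if p > 0 else float("-inf")


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most probable state sequence for a non-empty observation list."""
    # best[t][s]: log-probability of the best path that ends in state s at time t
    best = [{
        s: safe_log(start_p[s]) + safe_log(emit_p[s].get(observations[0], 0.0))
        for s in states
    }]
    back = [{}]  # back[t][s]: predecessor of s on that best path
    for t in range(1, len(observations)):
        best.append({})
        back.append({})
        for s in states:
            prev, score = max(
                ((r, best[t - 1][r] + safe_log(trans_p[r][s])) for r in states),
                key=lambda pair: pair[1],
            )
            best[t][s] = score + safe_log(emit_p[s].get(observations[t], 0.0))
            back[t][s] = prev
    # Follow the back-pointers from the best final state.
    path = [max(states, key=lambda s: best[-1][s])]
    for t in range(len(observations) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return path[::-1]


# Hypothetical toy model.
STATES = ["DET", "NOUN", "VERB"]
START = {"DET": 0.8, "NOUN": 0.1, "VERB": 0.1}
TRANS = {
    "DET": {"DET": 0.05, "NOUN": 0.9, "VERB": 0.05},
    "NOUN": {"DET": 0.1, "NOUN": 0.3, "VERB": 0.6},
    "VERB": {"DET": 0.5, "NOUN": 0.4, "VERB": 0.1},
}
EMIT = {
    "DET": {"the": 0.9, "a": 0.1},
    "NOUN": {"dog": 0.5, "barks": 0.2},
    "VERB": {"barks": 0.6, "dog": 0.1},
}

tags = viterbi(["the", "dog", "barks"], STATES, START, TRANS, EMIT)
print(tags)
```

Viterbi is exact but costs time proportional to the sequence length times the square of the number of states. Beam search keeps only the highest-scoring partial paths at each step, trading exactness for speed when the state space is large.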

17
Extensions - MEMM
  • HMMs
  • Generative model
  • Models joint distribution
  • Cannot handle multiple non-independent features
    of the current observation
  • Maximum Entropy Markov Models
  • Conditional model
  • Models conditional distribution
  • Can handle multiple non-independent features of
    the current observation
  • Applications
  • Text segmentation (McCallum, Freitag, Pereira,
    2000)
  • Stateless ME < HMM < Features HMM < MEMM
  • Restore capitalization (Chelba and Acero, 2004)

18
Extensions - CRF
  • MEMMs have the label bias problem
  • Solution: Conditional Random Fields (Lafferty,
    McCallum, Pereira, 2001)
  • Condition on the entire observation sequence
  • Undirected graphical model
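
The difference between the two models is easiest to see in where the normalization happens. The formulas below are the standard textbook forms, written in notation assumed for this note (feature functions f_k, weights λ_k, label sequence y_1..y_T, observation sequence x).

```latex
% MEMM: a product of locally normalized distributions, one per position.
p_{\text{MEMM}}(y_{1:T} \mid x_{1:T}) = \prod_{t=1}^{T}
    \frac{\exp\big( \sum_k \lambda_k f_k(y_{t-1}, y_t, x_t) \big)}
         {Z(y_{t-1}, x_t)}

% Linear-chain CRF: a single global normalization over whole label sequences.
p_{\text{CRF}}(y_{1:T} \mid x) = \frac{1}{Z(x)}
    \exp\Big( \sum_{t=1}^{T} \sum_k \lambda_k f_k(y_{t-1}, y_t, x, t) \Big),
\qquad
Z(x) = \sum_{y'_{1:T}}
    \exp\Big( \sum_{t=1}^{T} \sum_k \lambda_k f_k(y'_{t-1}, y'_t, x, t) \Big)
```

Because each MEMM factor must sum to one over the successors of y_{t-1}, a state with few outgoing transitions passes on its probability mass almost regardless of the observation, which is the label bias problem. The CRF's global Z(x) removes that constraint.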


19
Extensions - CRF (continued)
  • Applications
  • POS (Lafferty, McCallum, Pereira, 2001)
  • Information Extraction (research papers) (Peng &
    McCallum, 2004)
  • Shallow Parsing (NP chunking) (Sha & Pereira, 2003)
  • Relational Markov Networks (Taskar, Abbeel,
    Koller, 2002)
  • Extension of CRF
  • Model multiple types of entities and the
    relations between them
  • Collective classification instead of individual
    classification

20
Recommendation Systems
21
Introduction
  • Automatic information filtering
  • Active user, Active item
  • Ratings
  • RS vs. Information Retrieval
  • Classification
  • Item-centered
  • Memory-based
  • Model-based
  • User-centered
  • Collaborative filtering (memory-based,
    model-based)
  • Demographic
  • Hybrid

22
Item-centered RS
  • Memory-based
  • Neighborhood (of items) formation
  • Cosine similarity (Billsus & Pazzani, 2000),
    adjusted cosine
  • Correlation similarity (Pearson)
  • Combining neighborhood (items) ratings
  • Weighted sum (weight = similarity)
  • Linear regression (Sarwar et al., 2001)
  • Model-based
  • Mean-vector (Pazzani, 1999; Good et al., 1999)
  • Naïve Bayes (Billsus & Pazzani, 2000), Ripper
    (Good et al., 1999)
  • Utility-based (Faltings et al., 2004; Jameson et
    al., 1995; Linden et al., 1997)
  • Knowledge-based RS (Burke, 2001, 2002)

Content based
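
The memory-based, item-centered recipe on this slide (form a neighborhood of items by similarity, then combine ratings with a similarity-weighted sum) can be sketched as follows. This follows the item-to-item rating-vector formulation; the rating data and user names are hypothetical, every item the user has rated is treated as a neighbor, and plain cosine is used rather than adjusted cosine.

```python
import math

# RATINGS[user][item] = rating on a 1-5 scale. Hypothetical toy data.
RATINGS = {
    "ann": {"A": 5, "B": 4, "C": 1},
    "bob": {"A": 4, "B": 5, "C": 2},
    "cat": {"A": 1, "B": 2, "C": 5},
    "dan": {"A": 5, "C": 1},  # dan has not rated item B
}


def item_vector(ratings, item):
    """Ratings given to `item`, keyed by user."""
    return {user: r[item] for user, r in ratings.items() if item in r}


def cosine(v, w):
    """Cosine similarity over the users who rated both items."""
    common = set(v) & set(w)
    if not common:
        return 0.0
    dot = sum(v[u] * w[u] for u in common)
    norm_v = math.sqrt(sum(v[u] ** 2 for u in common))
    norm_w = math.sqrt(sum(w[u] ** 2 for u in common))
    return dot / (norm_v * norm_w)


def predict(ratings, user, item):
    """Similarity-weighted average of the user's own ratings of other items."""
    target = item_vector(ratings, item)
    numerator = denominator = 0.0
    for other_item, rating in ratings[user].items():
        sim = cosine(target, item_vector(ratings, other_item))
        numerator += sim * rating
        denominator += abs(sim)
    return numerator / denominator if denominator else None


prediction = predict(RATINGS, "dan", "B")
print(round(prediction, 2))
```

With non-negative ratings, plain cosine is never negative, so even a dissimilar item pulls the prediction toward its rating. Adjusted cosine addresses this by subtracting each user's mean rating before comparing items.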
23
User-centered RS
  • Collaborative filtering (CF) (Goldberg et al.,
    1992)
  • Memory-based
  • Neighborhood (of users) formation
  • Cosine similarity, Pearson coefficient
  • Default voting, Inverse user frequency (Breese et
    al., 1998)
  • Combining neighborhood (users) ratings
  • Weighted sum
  • Significance weighting, rating normalization
    (Herlocker et al., 1999)
  • Case amplification (Breese et al., 1998)
  • Computational cost
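
For the user-centered case, a minimal sketch of memory-based collaborative filtering with Pearson weights and rating normalization (each neighbor's rating is taken relative to that neighbor's mean) might look like this. The data is hypothetical, all other users serve as the neighborhood, and default voting, significance weighting, and case amplification are left out.

```python
import math

# RATINGS[user][item] = rating on a 1-5 scale. Hypothetical toy data.
RATINGS = {
    "ann": {"A": 5, "B": 4, "C": 1, "D": 4},
    "bob": {"A": 4, "B": 5, "C": 2, "D": 5},
    "cat": {"A": 1, "B": 2, "C": 5, "D": 1},
    "dan": {"A": 5, "B": 4, "C": 1},  # active user; item D is unrated
}


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def pearson(a, b):
    """Pearson correlation over the items both users have rated."""
    common = [item for item in a if item in b]
    if len(common) < 2:
        return 0.0
    mean_a = mean(a[i] for i in common)
    mean_b = mean(b[i] for i in common)
    cov = sum((a[i] - mean_a) * (b[i] - mean_b) for i in common)
    var_a = sum((a[i] - mean_a) ** 2 for i in common)
    var_b = sum((b[i] - mean_b) ** 2 for i in common)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def predict(ratings, user, item):
    """Active user's mean plus the weighted mean deviation of the neighbors."""
    user_mean = mean(ratings[user].values())
    numerator = denominator = 0.0
    for other, other_ratings in ratings.items():
        if other == user or item not in other_ratings:
            continue
        weight = pearson(ratings[user], other_ratings)
        numerator += weight * (other_ratings[item] - mean(other_ratings.values()))
        denominator += abs(weight)
    if denominator == 0:
        return user_mean
    return user_mean + numerator / denominator


prediction = predict(RATINGS, "dan", "D")
print(round(prediction, 2))
```

A negatively correlated neighbor still contributes: a below-average rating from a user with opposite tastes raises the prediction. The loop over all users for every prediction is the computational cost the slide refers to.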

24
User-centered RS (continued)
  • Collaborative filtering (continued)
  • Model-based
  • BBN (Breese et al., 1998)
  • Clustering users (Breese et al., 1998),
    users and items (Ungar & Foster, 1998)
  • Cast as a machine learning problem (Billsus &
    Pazzani, 1998)
  • Demographic
  • Demographic information (Krulwich, 1997)
  • User's web page (Pazzani, 1999)

25
Hybrid RS
  • Complementarity between CF and content-based RS
  • Complex ratings, new item problem, serendipitous
    predictions
  • Hybrid RS
  • Pseudo users: filterbots
  • General (Sarwar et al., 1998)
  • Personal (Good et al., 1999): mean-vector based
  • Collaboration via content (Pazzani, 1999)
  • Item-similarity (O'Sullivan et al., 2004)
  • CF in knowledge-based RS (Burke, 2002)
  • Latent variable models (Schein, Popescul, Ungar,
    Pennock, 2002)
  • Ensemble methods (Pazzani, 1999)

26
Issues in RS
  • Cold-start problem
  • New user problem
  • Utility-based, knowledge-based, demographic
  • CF: What items to rate?
  • A gauge set (Goldberg et al., 2001)
  • Specific items: Popularity & Entropy (Rashid et
    al., 2002)
  • New item problem
  • Content-based, hybrid

27
Issues in RS (continued)
  • Evaluation
  • Baseline
  • Cross-validation
  • Temporal issues
  • Training data size
  • Metrics
  • Mean absolute error
  • List of results
  • Precision, Recall, F-measure
  • Expected utility of a ranked list
  • Coverage
  • ROC curve
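
Two of the metric families listed here are simple enough to state in code: mean absolute error for predicted ratings, and precision, recall, and F-measure for a returned list of results. The sketch below uses the standard definitions; the function names and example data are this note's own.

```python
def mean_absolute_error(predicted, actual):
    """Average |prediction - true rating| over items present in both dicts."""
    shared = [item for item in actual if item in predicted]
    if not shared:
        raise ValueError("no items to compare")
    return sum(abs(predicted[i] - actual[i]) for i in shared) / len(shared)


def precision_recall_f1(recommended, relevant):
    """Precision, recall, and balanced F-measure for a recommendation list."""
    recommended, relevant = set(recommended), set(relevant)
    hits = len(recommended & relevant)
    precision = hits / len(recommended) if recommended else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical example data.
mae = mean_absolute_error({"a": 4.0, "b": 2.0}, {"a": 5.0, "b": 2.0})
precision, recall, f1 = precision_recall_f1(["a", "b", "c", "d"], ["a", "c", "e"])
print(mae, precision, recall, f1)
```

Mean absolute error only scores items for which a prediction exists, so it is usually reported together with coverage, the fraction of items for which the system can make a prediction at all.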

28
Issues in RS (continued)
  • Interaction with the user
  • Interests change over time (Billsus & Pazzani,
    1998)
  • Interests change from interaction with the data
    (Hirashima et al., 1997)
  • Implicit interest indicators
  • Web-browsing (Claypool et al., 2001)
  • TV programs (O'Sullivan et al., 2004)
  • Presentation media (Billsus & Pazzani, 2000)
  • User involvement in correcting profiles (Wærn,
    2004)
  • Interaction principles in utility-based RS

29
Issues in RS (continued)
  • Domain characteristics
  • Item properties
  • Dynamic domains
  • Domains with very similar items
  • Presentation media (what to present)
  • Hypermedia (Mobasher, Dai, Luo, Nakagawa, 2002)
  • Computational issues
  • PCA based RS (Goldberg et al., 2001)
  • Distribute load between server and client
  • Security/Privacy