Investigating Linguistic Knowledge In A Maximum Entropy Token-Based Language Model
Jia Cui, Yi Su, Keith Hall, Frederick Jelinek (CLSP, Johns Hopkins University)
Example (figure): a sentence in the bigram METLM.
Abstract
We propose METLM (maximum entropy token-based language model), a novel language model capable of incorporating various types of linguistic information encoded in the form of tokens, i.e., (word, label) tuples. Using tokens as hidden states, our model is effectively a hidden Markov model (HMM) with maximum entropy transition distributions. We investigate different types of labels with a wide range of linguistic implications. These models outperform Kneser-Ney smoothed n-gram models both in perplexity on standard datasets and in word error rate for a large-vocabulary speech recognition system.
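As a rough illustration of the token-as-hidden-state view (a sketch, not the authors' implementation), the snippet below scores a sentence under a bigram token model by summing over every (word, label) sequence compatible with the words. The label inventory and the constant `transition_prob` are placeholders for the maximum entropy transition distribution.

```python
from itertools import product

# Hypothetical label inventory per word (in the poster, labels come from POS
# tags or induced word classes); all entries here are illustrative only.
LABELS = {
    "<s>":     ["<s>"],
    "stocks":  ["NNS"],
    "kept":    ["VBD", "VBN"],
    "falling": ["VBG"],
    "</s>":    ["</s>"],
}

def transition_prob(prev_token, token):
    """Stand-in for the maximum entropy transition P(token | previous token).
    A real METLM computes a normalized exponential of weighted features."""
    return 0.1  # placeholder constant so the sketch runs end to end

def sentence_prob(words):
    """Sum over all token (word, label) sequences consistent with the words."""
    words = ["<s>"] + words + ["</s>"]
    label_choices = [LABELS[w] for w in words]
    total = 0.0
    for labels in product(*label_choices):
        tokens = list(zip(words, labels))
        p = 1.0
        for prev, cur in zip(tokens, tokens[1:]):
            p *= transition_prob(prev, cur)
        total += p
    return total

print(sentence_prob(["stocks", "kept", "falling"]))
```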
ME Training With Labeled Training Data
Data Sparseness and Sharing
"Colin plays chess" returns no Google results, yet it is a perfectly possible sentence.
Example token sequence: But/CC stocks/NNS kept/VBD falling/VBG. Feature instantiations:
  • WW: kept falling
  • W-T: falling-VBG
  • WT: kept VBG
  • TWT: NNS kept VBG
  • AA: WT, TW, TT, W-T, T
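The feature abbreviations above suggest word/label templates over adjacent tokens. A minimal sketch of how such token-bigram features might be instantiated follows; the exact templates and names are assumptions, not the poster's definitions.

```python
def bigram_features(prev_token, token):
    """Instantiate word/label features for one token bigram (illustrative only).

    Each token is a (word, label) pair, e.g. ("kept", "VBD"). The feature
    names mirror the abbreviations used on the poster, but the templates
    here are guesses.
    """
    (pw, pt), (w, t) = prev_token, token
    return [
        ("WW",  pw, w),   # previous word, current word
        ("WT",  pw, t),   # previous word, current label
        ("TW",  pt, w),   # previous label, current word
        ("TT",  pt, t),   # previous label, current label
        ("W-T", w,  t),   # current word tied to its own label
    ]

# Example bigram from the token sequence But/CC stocks/NNS kept/VBD falling/VBG
for feature in bigram_features(("kept", "VBD"), ("falling", "VBG")):
    print(feature)
```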
(Diagram: shared structure among the phrases "Colin plays chess", "he plays chess", and "Colin takes basketball".)
Data sharing depends on knowledge:
  • Lexical: "Colin plays" and "he plays" share the word "plays".
  • Syntactic: "takes" and "plays" are both VERBS.
  • Semantic: "basketball" and "chess" are both SPORTS.
Word/label-based features therefore do not increase data sparseness.
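The sharing argument can be made concrete with counts. In the toy sketch below (invented data and labels), the word bigram "plays basketball" is unseen, but the corresponding label bigram pools evidence from several phrases, so a label-based feature still fires.

```python
from collections import Counter

# Toy corpus of (word, label) token bigrams; the labels are invented.
corpus = [
    (("he", "PRON"),    ("plays", "VERB")),
    (("plays", "VERB"), ("chess", "SPORT")),
    (("Colin", "NAME"), ("takes", "VERB")),
    (("takes", "VERB"), ("basketball", "SPORT")),
    (("Colin", "NAME"), ("plays", "VERB")),
]

word_bigrams  = Counter((pw, w) for ((pw, _), (w, _)) in corpus)
label_bigrams = Counter((pt, t) for ((_, pt), (_, t)) in corpus)

# The word bigram ("plays", "basketball") was never observed ...
print(word_bigrams[("plays", "basketball")])   # 0
# ... but the label bigram ("VERB", "SPORT") pools evidence from
# "plays chess" and "takes basketball", so a label feature still fires.
print(label_bigrams[("VERB", "SPORT")])        # 2
```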
Word Classes/Labels
  • PI-CLS: word classification using the algorithm proposed by Brown et al., 1992.
  • PD-CLS: position-dependent word classes, classifying words at three positions simultaneously; classes generated by Ahmad Emami.
  • Proximity-based word classes: word distances computed by Dekang Lin (stock, C1: cost, currency, credit, salary, refund, hourly).
  • Dependency-based word classes: word distances computed by Dekang Lin (stock, C2: bond, stock, cash, capacity, decoration).
  • Topic-based word classes: distances computed by Yonggang Deng (stock, C3: indexes, exchange, Chicago, crash, broker, unfolded).
Perplexity Experiments
  • Data: Treebank WSJ, 24 sections
  • Development: sections 0-19, 41K sentences, 1M words
  • Held-out: sections 20-21, 4.3K sentences, 110K words
  • Test: sections 22-23, 4.2K sentences, 106K words
  • 10K vocabulary
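For reference, perplexity on a test set is computed from per-word log probabilities; a minimal sketch (toy numbers, not the paper's results):

```python
import math

def perplexity(word_log_probs):
    """Perplexity = exp(-(1/N) * sum of natural-log word probabilities)."""
    n = len(word_log_probs)
    return math.exp(-sum(word_log_probs) / n)

# Toy example: three words with probabilities 0.1, 0.2, and 0.05 under the model.
print(perplexity([math.log(0.1), math.log(0.2), math.log(0.05)]))
```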

(Perplexity results table, METLM vs. the Kneser-Ney baseline, not reproduced.)
Experiments on an ASR System
Fisher data, 22M training data, dialog speech, 4167 reference sentences. Lattice rescoring over 2.7M predictions. Dominant POS tags are used as the word labels. Basic features: AA, WTW, WWT, WWTW, WTWW.
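Lattice rescoring replaces each hypothesis's language-model score with the new model's score before re-ranking. The n-best sketch below uses made-up scores and weights to show the idea; it is not the system's actual rescoring pipeline.

```python
def rescore_nbest(hypotheses, new_lm_score, lm_weight=10.0):
    """Re-rank ASR hypotheses by combining the acoustic score with a new LM.

    `hypotheses` is a list of (words, acoustic_score) pairs; `new_lm_score`
    returns a log probability for a word sequence. All weights are illustrative.
    """
    rescored = [(words, acoustic + lm_weight * new_lm_score(words))
                for words, acoustic in hypotheses]
    return max(rescored, key=lambda pair: pair[1])

# Toy usage: a made-up unigram LM overturns the acoustic preference.
toy_unigram_logp = {"but": -1.0, "stocks": -2.0, "stock": -2.0,
                    "kept": -2.0, "skipped": -6.0, "falling": -2.0}
toy_lm = lambda words: sum(toy_unigram_logp.get(w, -8.0) for w in words)

nbest = [(["but", "stocks", "kept", "falling"], -120.0),
         (["but", "stock", "skipped", "falling"], -118.0)]
print(rescore_nbest(nbest, toy_lm))
```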
Conclusions
  • Addresses data sharing in both history and future contexts.
  • Enables training on unlabeled data.
  • Effectively applies syntactic word labels to improve WER.
  • Provides a platform for integrating different word labels.
  • Computationally expensive.