Fast Methods for Kernel-based Text Analysis

Transcript and Presenter's Notes

1
Fast Methods for Kernel-based Text Analysis
  • Taku Kudo 工藤 拓
  • Yuji Matsumoto 松本 裕治
  • NAIST (Nara Institute of Science and Technology)

41st Annual Meeting of the Association for
Computational Linguistics, Sapporo, Japan
2
Background
  • Kernel methods (e.g., SVM) have become popular
  • Prior knowledge can be incorporated independently
    of the machine learning algorithm by supplying a
    task-dependent kernel (a generalized dot product)
  • High accuracy

3
Problem
  • Kernel-based text analyzers are too slow for real
    NL applications (e.g., QA or text mining) because
    of their inefficiency at test time
  • Some kernel-based parsers run at only 2 - 3
    seconds per sentence

4
Goals
  • Build fast but still accurate kernel-based
    text analyzers
  • Make it possible to apply them to a wider range
    of NL applications

5
Outline
  • Polynomial Kernel of degree d
  • Fast Methods for Polynomial kernel
  • PKI
  • PKE
  • Experiments
  • Conclusions and Future Work

7
Kernel Methods
Training data: (X_1, y_1), …, (X_L, y_L)
  • No need to represent examples as explicit
    feature vectors
  • Complexity of testing is O(L·|X|) (see the
    decision function below)
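For reference, the testing cost above refers to evaluating the standard kernelized decision function over all L support examples (a generic SVM-style form; the bias term b is an assumption, not shown on the slide):

$$
f(X) = \operatorname{sgn}\!\Big(\sum_{j=1}^{L} \alpha_j \, K(X_j, X) + b\Big)
$$

Here each α_j may absorb the label y_j, and evaluating K(X_j, X) for set-valued examples costs O(|X|), giving O(L·|X|) per test example.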

8
Kernels for Sets (1/3)
  • Focus on the special case where examples
    are represented as sets
  • The instances in NLP are usually
    represented as sets (e.g., bag-of-words)

(Figure: a feature set F and training examples represented as subsets of F)
9
Kernels for Sets (2/3)
  • Simple definition: count the features shared by the two sets
  • Extended with combinations (subsets) of features:
    2nd order, 3rd order, … (sketched below)
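A sketch of what these definitions amount to, under the assumption that the kernel counts shared features and, in its d-th order form, shared feature subsets (notation mine, not verbatim from the slide):

$$
K(X_1, X_2) = |X_1 \cap X_2|, \qquad
K_d(X_1, X_2) = \sum_{k=1}^{d} \bigl|\{\, s \subseteq X_1 \cap X_2 : |s| = k \,\}\bigr|
  = \sum_{k=1}^{d} \binom{|X_1 \cap X_2|}{k}
$$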
10
Kernels for Sets (3/3)
Dependent (+1) or independent (-1)?
Example: "I ate a cake" (POS: PRP VBD DT NN), with one word as the head and another as the modifier
11
Polynomial Kernel of degree d
Implicit form: K(X1, X2) = (|X1 ∩ X2| + 1)^d
12
Example (Cubic Kernel d3 )
Implicit form: K(X1, X2) = (|X1 ∩ X2| + 1)^3
Subsets of size up to 3 are used as new features (see the worked expansion below)
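A worked expansion of the cubic kernel in terms of shared subsets, with n = |X_1 ∩ X_2| (the coefficient derivation is mine, but it is consistent with the toy example and the expansion-table weights later in the slides):

$$
(n+1)^3 \;=\; 1 \;+\; 7\binom{n}{1} \;+\; 12\binom{n}{2} \;+\; 6\binom{n}{3}
$$

so every shared subset of size 1, 2, or 3 acts as a new feature with weight 7, 12, or 6 respectively, plus a constant 1 for the empty subset.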
13
Outline
  • Polynomial Kernel of degree d
  • Fast Methods for Polynomial kernel
  • PKI
  • PKE
  • Experiments
  • Conclusions and Future Work

14
Toy Example
Feature set F = {a, b, c, d, e}
Support vectors (L = 3):
  j   α_j   X_j
  1   1     {a, b, c}
  2   0.5   {a, b, d}
  3   -2    {b, c, d}
Kernel: K(X_j, X) = (|X_j ∩ X| + 1)^3
Test example: X = {a, c, e}
15
PKB (Baseline)
K(X_j, X) = (|X_j ∩ X| + 1)^3
Support vectors: X_1 = {a, b, c}, X_2 = {a, b, d}, X_3 = {b, c, d}, with α = (1, 0.5, -2)
Test example: X = {a, c, e}
f(X) = 1·(2+1)^3 + 0.5·(1+1)^3 - 2·(1+1)^3 = 27 + 4 - 16 = 15
  • Complexity is always O(L·|X|)
16
PKI (Inverted Representation)
K(X_j, X) = (|X_j ∩ X| + 1)^3
Inverted index (feature → support vectors containing it), B = average list size:
  a → {1, 2}
  b → {1, 2, 3}
  c → {1, 3}
  d → {2, 3}
Test example: X = {a, c, e}
Look up only the features of X and count, per support vector, how many features are shared:
f(X) = 1·(2+1)^3 + 0.5·(1+1)^3 - 2·(1+1)^3 = 15
  • Average complexity is O(B·|X| + L) (see the sketch below)
  • Efficient if the feature space is sparse
  • Suitable for many NL tasks
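A minimal Python sketch of the PKI idea, using the toy example from the previous slides (function and variable names are illustrative, not from the paper):

```python
from collections import defaultdict

def build_inverted_index(support_vectors):
    """Map each feature to the indices of the support vectors that contain it."""
    index = defaultdict(list)
    for j, sv in enumerate(support_vectors):
        for feature in sv:
            index[feature].append(j)
    return index

def pki_classify(x, alphas, index, d=3):
    """f(X) = sum_j alpha_j * (|X_j ∩ X| + 1)^d, computed via the inverted index."""
    counts = defaultdict(int)
    for feature in x:                      # O(B|X|): touch only SVs sharing a feature of X
        for j in index.get(feature, ()):
            counts[j] += 1
    # SVs sharing nothing with X still contribute alpha_j * (0 + 1)^d = alpha_j
    return sum(a * (counts.get(j, 0) + 1) ** d for j, a in enumerate(alphas))

svs = [{"a", "b", "c"}, {"a", "b", "d"}, {"b", "c", "d"}]
alphas = [1.0, 0.5, -2.0]
index = build_inverted_index(svs)
print(pki_classify({"a", "c", "e"}, alphas, index))  # 27 + 4 - 16 = 15.0
```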

17
PKE (Expanded Representation)
  • Convert the kernel into a linear form by calculating the weight vector w in advance
  • The mapping Φ_d projects X into the space of its subsets

18
PKE (Expanded Representation)
K(X_j, X) = (|X_j ∩ X| + 1)^3, expanded over the shared subsets of X_j and X
19
PKE in Practice
  • It is hard to calculate the Expansion Table exactly
  • Use an Approximated Expansion Table instead
  • Subsets with small |w| can be removed, since w
    represents their contribution to the final
    classification
  • Use a subset mining (a.k.a. basket mining)
    algorithm for efficient construction

20
Subset Mining Problem
Transaction database:
  id   set
  1    {a, c, d}
  2    {a, b, c}
  3    {a, b, d}
  4    {b, c, e}
Results (minimum support σ = 2):
  a: 3, b: 3, c: 3, d: 2, {a,b}: 2, {b,c}: 2, {a,c}: 2, {a,d}: 2
  • Extract all subsets that occur in no less than σ
    sets of the transaction database
  • With a small σ and no size constraints, the problem is NP-hard
  • Efficient algorithms have been proposed
    (e.g., Apriori, PrefixSpan); a brute-force sketch follows below
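A brute-force sketch of the mining task on this toy database, just to make the definition concrete (real miners such as Apriori or PrefixSpan prune the search space instead of enumerating every candidate):

```python
from itertools import combinations

def frequent_subsets(transactions, sigma, max_size=3):
    """Return every subset occurring in at least `sigma` transactions."""
    items = sorted(set().union(*transactions))
    results = {}
    for k in range(1, max_size + 1):
        for subset in combinations(items, k):
            support = sum(1 for t in transactions if set(subset) <= t)
            if support >= sigma:
                results[subset] = support
    return results

db = [{"a", "c", "d"}, {"a", "b", "c"}, {"a", "b", "d"}, {"b", "c", "e"}]
print(frequent_subsets(db, sigma=2))
# {('a',): 3, ('b',): 3, ('c',): 3, ('d',): 2,
#  ('a', 'b'): 2, ('a', 'c'): 2, ('a', 'd'): 2, ('b', 'c'): 2}
```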

21
Feature Selection as Mining
Treat the support vectors as a transaction database, weighted by α_j:
  j   α_j   X_j
  1   1     {a, b, c}
  2   0.5   {a, b, d}
  3   -2    {b, c, d}
  • The approximated expansion table can be built
    efficiently with subset mining (a sketch follows below)
  • σ controls the rate of approximation
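A brute-force sketch of building the approximated expansion table from these support vectors (the paper uses a subset-mining search rather than full enumeration; the cubic coefficients (1, 7, 12, 6) follow the expansion worked out earlier, and `sigma` here thresholds |w(s)| directly, which is a simplification):

```python
from itertools import combinations

def build_expansion_table(support_vectors, alphas, sigma, coeffs=(1, 7, 12, 6)):
    """w(s) = sum_j alpha_j * coeffs[|s|] over all support vectors X_j with s ⊆ X_j."""
    table = {}
    for sv, a in zip(support_vectors, alphas):
        for k in range(1, len(coeffs)):            # the empty-set (bias) term is omitted here
            for s in combinations(sorted(sv), k):
                table[s] = table.get(s, 0.0) + a * coeffs[k]
    return {s: w for s, w in table.items() if abs(w) >= sigma}

svs = [{"a", "b", "c"}, {"a", "b", "d"}, {"b", "c", "d"}]
alphas = [1.0, 0.5, -2.0]
table = build_expansion_table(svs, alphas, sigma=7.0)
# e.g. table[("a",)] == 10.5 and table[("c", "d")] == -24.0, matching the TRIE slide
```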

22
Outline
  • Polynomial Kernel of degree d
  • Fast Methods for Polynomial kernel
  • PKI
  • PKE
  • Experiments
  • Conclusions and Future Work

23
Experimental Settings
  • Three NL tasks
  • English Base-NP Chunking (EBC)
  • Japanese Word Segmentation (JWS)
  • Japanese Dependency Parsing (JDP)
  • Kernel Settings
  • Quadratic kernel is applied to EBC
  • Cubic kernel is applied to JWS and JDP

24
Results (English Base-NP Chunking)
Method           Time (sec./sent.)   Speedup ratio   F-score
PKB              0.164               1.0             93.84
PKI              0.020               8.3             93.84
PKE (σ=0.01)     0.0016              105.2           93.79
PKE (σ=0.005)    0.0016              101.3           93.85
PKE (σ=0.001)    0.0017              97.7            93.84
PKE (σ=0.0005)   0.0017              96.8            93.84
25
Results (Japanese Word Segmentation)
Method           Time (sec./sent.)   Speedup ratio   Accuracy (%)
PKB              0.85                1.0             97.94
PKI              0.49                1.7             97.94
PKE (σ=0.01)     0.0024              358.2           97.93
PKE (σ=0.005)    0.0028              300.1           97.95
PKE (σ=0.001)    0.0034              242.6           97.94
PKE (σ=0.0005)   0.0035              238.8           97.94
26
Results (Japanese Dependency Parsing)
Method           Time (sec./sent.)   Speedup ratio   Accuracy (%)
PKB              0.285               1.0             89.29
PKI              0.0226              12.6            89.29
PKE (σ=0.01)     0.0042              66.8            88.91
PKE (σ=0.005)    0.0060              47.8            89.05
PKE (σ=0.001)    0.0086              33.3            89.26
PKE (σ=0.0005)   0.0090              31.8            89.29
27
Results
  • 2 - 12 fold speed-up with PKI
  • 30 - 300 fold speed-up with PKE
  • Accuracy is preserved when an appropriate σ is set

28
Comparison with related work
  • XQK [Isozaki et al. 02]
    • Same concept as PKE
    • Designed only for the quadratic kernel
    • Exhaustively creates the expansion table
  • PKE
    • Designed for general polynomial kernels
    • Uses subset mining algorithms to create the
      expansion table

29
Conclusions
  • Proposed two fast methods for the polynomial
    kernel of degree d
    • PKI (inverted representation)
    • PKE (expanded representation)
  • 2-12 fold speed-up with PKI, 30-300 fold speed-up
    with PKE
  • Accuracy is preserved

30
Future Work
  • Examine the effectiveness on general machine
    learning datasets
  • Apply PKE to other convolution kernels
    • Tree Kernel [Collins 00]
    • Dot product between trees
    • Feature space is the set of all sub-trees
    • Apply a sub-tree mining algorithm [Zaki 02]

31
English Base-NP Chunking
Extract non-overlapping noun phrases from text:
[NP He] reckons [NP the current account deficit]
will narrow to [NP only 1.8 billion] in [NP
September] .
  • BIO representation (seen as a tagging task; tagged example below)
  • B: beginning of a chunk
  • I: non-initial token of a chunk
  • O: outside any chunk
  • Pair-wise method for the 3-class problem
  • Training: WSJ 15-18, test: WSJ 20 (standard data set)
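For example, the bracketed sentence above would be tagged as follows (my rendering of the BIO scheme, not taken from the slide):

He/B reckons/O the/B current/I account/I deficit/I will/O narrow/O to/O only/B 1.8/I billion/I in/O September/B ./O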

32
Japanese Word Segmentation
"Taro made Hanako read a book"
Sentence: 太 郎 は 花 子 に 本 を 読 ま せ た
Boundaries are placed between adjacent characters; each boundary is classified as +1 (word boundary) or -1 (no boundary)
  • Characters and their relative positions around the
    boundary are used as features
  • The Japanese character types are also used
  • Training: KUC 01-08, test: KUC 09

33
Japanese Dependency Parsing
私は ケーキを 食べる   (I-top  cake-acc.  eat)
"I eat a cake"
  • Identify the correct dependency relations
    between two bunsetsu (base phrases)
  • Linguistic features related to the modifier
    and the head (word, POS, POS-subcategory,
    inflections, punctuation, etc.)
  • Binary classification (+1: dependent, -1:
    independent)
  • Cascaded Chunking Model [Kudo et al. 02]
  • Training: KUC 01-08, test: KUC 09

34
Kernel Methods (1/2)
Suppose a learning task with L training examples (X_i, y_i)
  • X: the example to be classified
  • X_i: the training examples
  • α_i: the weight of each example
  • Φ: a function that maps examples into another
    vector space (see the kernel identity below)
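The identity behind the Φ bullet is the usual kernel trick (standard form, not copied from the slide):

$$
K(X_1, X_2) = \Phi(X_1) \cdot \Phi(X_2)
$$

so the decision function can be evaluated with kernel values alone, without ever forming Φ(X) explicitly.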

35
PKE (Expanded Representation)
If we calculate w(s) in advance for all subsets s (I(·) is the indicator
function), classification reduces to a sum of precomputed weights over the
subsets of X (formula below)
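One way to write this precomputation, consistent with the cubic coefficients c_3(k) ∈ {1, 7, 12, 6} worked out earlier (the notation is assumed, not verbatim from the slide):

$$
w(s) \;=\; \sum_{j=1}^{L} \alpha_j \, c_d(|s|)\, I\bigl(s \subseteq X_j\bigr),
\qquad
f(X) \;=\; \sum_{s \subseteq X} w(s)
$$

On the toy example, the subsets of X = {a, c, e} with nonzero weight are ∅, {a}, {c}, and {a, c}, and their weights sum to -0.5 + 10.5 - 7 + 12 = 15, matching the kernel computation.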
36
TRIE representation
Expansion table (subset → weight w):
  a: 10.5, d: -10.5, {a,b}: 12, {a,c}: 12, {b,c}: -12, {b,d}: -18, {c,d}: -24, {b,c,d}: -12
The subsets are stored in a TRIE (prefix tree):
  • Redundant structures (shared prefixes) are compressed
  • Classification can be done by simply traversing
    the TRIE (sketched below)
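A minimal Python sketch of the traversal idea, assuming the expansion table is stored as a prefix tree over lexicographically sorted subsets (class and function names are illustrative):

```python
class TrieNode:
    def __init__(self):
        self.children = {}   # feature -> TrieNode
        self.weight = 0.0    # w(s) for the subset spelled out by the path to this node

def insert(root, subset, weight):
    node = root
    for feature in sorted(subset):
        node = node.children.setdefault(feature, TrieNode())
    node.weight = weight

def classify(root, x):
    """Sum w(s) over every stored subset s ⊆ X by depth-first traversal of the TRIE."""
    features = sorted(x)
    total, stack = 0.0, [(root, 0)]
    while stack:
        node, start = stack.pop()
        total += node.weight
        for i in range(start, len(features)):        # descend only along features of X
            child = node.children.get(features[i])
            if child is not None:
                stack.append((child, i + 1))
    return total

# Usage with two entries of the expansion table shown above
root = TrieNode()
insert(root, ("a",), 10.5)
insert(root, ("c", "d"), -24.0)
print(classify(root, {"a", "c", "e"}))   # only {"a"} is a subset of X, so 10.5
```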

37
Kernel Methods
Training data: (X_1, y_1), …, (X_L, y_L)
  • No need to represent examples as explicit
    feature vectors
  • Complexity of testing is O(L·|X|)