
Transcript and Presenter's Notes

Title: Kernels for Relation Extraction


1
Kernels for Relation Extraction
  • William Cohen
  • 3-6-2007

2
Announcements
  • No office hours this week due to open houses
  • So I have more time to chat with prospective
    students.
  • Next Tuesday's lecture will summarize
  • Sarawagi & Cohen, NIPS 2004

3
The kernel perceptron
(Figure: the kernel perceptron algorithm. On each instance x_i,
predict using kernel values; if mistake: v_{k+1} = v_k + y_i x_i.)
Mathematically the same as before, but allows
use of the kernel trick.
4
The kernel perceptron
(Figure: the kernel perceptron algorithm, repeated from the
previous slide.)
Mathematically the same as before, but allows
use of the kernel trick.
Other kernel methods (SVMs, Gaussian processes)
aren't constrained to a limited set (+1/-1/0) of
weights on the K(x,v) values.
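
Since the hypothesis v is just a sum of (signed) training instances, the inner product v·x expands into kernel evaluations, and v never needs to be formed explicitly. A minimal sketch in Python (names and structure are mine, not from the slides; with K(a, b) = a·b it reduces to the ordinary perceptron):

import numpy as np

def kernel_perceptron_train(X, y, K, epochs=5):
    # Dual form: alpha[i] accumulates y[i] each time x[i] is
    # misclassified, so that implicitly v = sum_i alpha[i] * x[i].
    n = len(X)
    alpha = np.zeros(n)
    for _ in range(epochs):
        for i in range(n):
            # v . x[i] expands into a weighted sum of kernel values
            score = sum(alpha[j] * K(X[j], X[i])
                        for j in range(n) if alpha[j] != 0)
            if y[i] * score <= 0:
                alpha[i] += y[i]   # mistake: v_{k+1} = v_k + y_i x_i
    return alpha

def kernel_perceptron_predict(x, X, alpha, K):
    score = sum(alpha[j] * K(X[j], x)
                for j in range(len(X)) if alpha[j] != 0)
    return 1 if score > 0 else -1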
5
Kernels vs Structured Output Spaces
  • Two kinds of structured learning
  • HMMs, CRFs, VP-trained HMMs, structured SVMs,
    stacked learning, ...: the output of the learner
    is structured.
  • E.g., for a linear-chain CRF, the output is a
    sequence of labels, a string in Y^n.
  • Bunescu & Mooney (EMNLP 2005, NIPS 2005): the
    input to the learner is structured.
  • EMNLP paper: structure derived from a dependency
    graph.

New!
6
Dependency graphs for sentences
CFG dependency parsers → dependency trees
Context-sensitive formalisms → dependency DAGs
7
(No transcript for this slide.)
8
Disclaimer: this is a shortest path, not the
shortest path.
9
x = x1 × x2 × x3 × x4 × x5, where each xi is a set
of features for position i on the path.
Number of features in Φ(x): |x1|·|x2|·|x3|·|x4|·|x5|
= 4·1·3·1·4 = 48.
K(x1 ... xn, y1 ... ym) = Φ(x1 ... xn) · Φ(y1 ... ym)
= ∏ from i=1 to n of |xi ∩ yi| if m = n, and 0 otherwise.
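
For intuition, this kernel can be evaluated without ever building the 48-dimensional Φ. A small sketch, assuming the set-per-position path representation above (the example words and feature values are illustrative):

def sp_kernel(x, y):
    # Shortest-path kernel: paths of different lengths never match;
    # otherwise multiply the counts of shared features per position.
    if len(x) != len(y):
        return 0
    k = 1
    for xi, yi in zip(x, y):
        k *= len(xi & yi)
        if k == 0:
            break
    return k

x = [{"protesters", "NNS", "Noun", "PERSON"}, {"->"},
     {"seized", "VBD", "Verb"}]
y = [{"troops", "NNS", "Noun", "PERSON"}, {"->"},
     {"raided", "VBD", "Verb"}]
print(sp_kernel(x, y))  # 3 * 1 * 2 = 6 shared feature combinations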

10
Results
-CCG, -CFG: context-sensitive CCG parser vs. the
Collins (CFG) parser. S1, S2: one multi-class SVM
vs. two SVMs (binary, then multi-class). Correct
entity output is assumed.
11
Now the NIPS paper
  • Similar representation for relation instances
    x1 ... xn, where each xi is a set.
  • But instead of informative dependency-path
    elements, the xi's just represent adjacent tokens.
  • To compensate, use a richer kernel.

12
Subsequence kernel
  • Φ(x1 ... xn) = the set of all sparse subsequences
    u of x1 ... xn, with each u downweighted according
    to sparsity.
  • Relaxation of the old kernel:
  • We don't have to match everywhere, just at
    selected locations.
  • For every position we decide to match at, we pay
    a penalty of λ.
  • To pick a feature inside Φ(x1 ... xn):
  • Pick a subset of locations i = i1, ..., ik, and then
  • Pick a feature value in each location.
  • In the preprocessed vector x, weight every feature
    for i by λ^length(i) = λ^(ik - i1 + 1).
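
For short sequences this feature map can be enumerated outright. A brute-force sketch of Φ and of the kernel as its dot product (exponential in k and for illustration only; the DP on the next slides computes the same thing efficiently):

from itertools import combinations, product

def phi(x, lam, k):
    # Explicit feature map over sparse subsequences of length k:
    # each feature u (one value chosen per selected location) gets
    # weight lam ** (i_k - i_1 + 1), summed over all matches of u.
    feats = {}
    for locs in combinations(range(len(x)), k):
        span = locs[-1] - locs[0] + 1
        for u in product(*(x[i] for i in locs)):
            feats[u] = feats.get(u, 0.0) + lam ** span
    return feats

def subseq_kernel_bruteforce(s, t, lam, k):
    # K(s, t) = phi(s) . phi(t)
    fs, ft = phi(s, lam, k), phi(t, lam, k)
    return sum(w * ft[u] for u, w in fs.items() if u in ft)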

13
Subsequence kernel
K(s, t) = Φ(s) · Φ(t) = Σ over u, over matches i of u in s,
and over matches j of u in t, of λ^(l(i) + l(j))
(two equivalent formulations were shown on the slide).
14
Dynamic programming computation
(Recurrence equations not transcribed. Annotations: only
counts u that align with the last char of s and t; skipping
position i in s vs. including position i; not aligned with
the end of s vs. aligned with the end of s.)
15
Dynamic programming computation
(Recurrence equations not transcribed. Annotations: only
counts u that align with the last char of s and t; matching
at the last positions of s and t vs. skipping the last
position in t; not aligned with the end of s vs. aligned
with the end of s.)
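
A runnable sketch of this dynamic program, written after the standard gap-weighted subsequence kernel recurrences (Lodhi et al. 2002) rather than the slides' exact notation; the match argument generalizes character equality to the set-overlap counts used for relation instances:

import numpy as np

def subseq_kernel(s, t, p, lam, match=lambda a, b: float(a == b)):
    # Gap-weighted subsequence kernel for subsequences of length p.
    # For sequences of feature sets, pass match=lambda a, b: float(len(a & b)).
    n, m = len(s), len(t)
    # DPS[i, j]: weighted sum over subsequences of the current length
    # whose last matched positions are exactly s[i-1] and t[j-1]
    DPS = np.zeros((n + 1, m + 1))
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            DPS[i, j] = lam ** 2 * match(s[i - 1], t[j - 1])
    kern = DPS.sum()                      # value for length 1
    for _ in range(2, p + 1):
        # DP[i, j]: shorter subsequences ending anywhere in s[:i], t[:j],
        # decayed by lam for every position skipped past their ends
        DP = np.zeros((n + 1, m + 1))
        for i in range(1, n):
            for j in range(1, m):
                DP[i, j] = (DPS[i, j]
                            + lam * DP[i - 1, j]            # skip a position in s
                            + lam * DP[i, j - 1]            # skip a position in t
                            - lam ** 2 * DP[i - 1, j - 1])  # double-counted skips
        new_DPS = np.zeros((n + 1, m + 1))
        for i in range(2, n + 1):
            for j in range(2, m + 1):
                # aligned with both ends: match here, extend a shorter one
                new_DPS[i, j] = (lam ** 2 * match(s[i - 1], t[j - 1])
                                 * DP[i - 1, j - 1])
        DPS = new_DPS
        kern = DPS.sum()                  # value for this length
    return kern

# Common length-2 subsequences of "cat" and "cart" at lam = 0.5:
# "ca" (0.5^4) + "at" (0.5^5) + "ct" (0.5^7) = 13/128
print(subseq_kernel("cat", "cart", p=2, lam=0.5))  # 0.1015625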
16
Additional details
  • Special domain-specific tricks for combining the
    subsequences that match in the fore, aft, and
    between sections of a relation-instance pair
    (see the sketch after this list).
  • Subsequences are of length less than 4.
  • Is DP needed for this now?
  • Count fore-between, between-aft, and between
    subsequences separately.
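
A hypothetical sketch of that combination, reusing subseq_kernel from the previous sketch; the section names, the dict layout, and the plain sum over sections and lengths are my assumptions, and the paper's exact weighting of the fore and aft context lengths is omitted:

def relation_kernel(x, y, lam=0.5):
    # x and y are relation instances split into three sections of
    # per-token feature sets (hypothetical layout); count fore-between,
    # between, and between-aft subsequences separately and sum.
    sections = ("fore_between", "between", "between_aft")
    overlap = lambda a, b: float(len(a & b))
    return sum(subseq_kernel(x[sec], y[sec], p, lam, match=overlap)
               for sec in sections
               for p in (1, 2, 3))   # subsequences of length < 4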

17
Results
Protein-protein interaction