Title: Learning Ensembles of First-Order Clauses for Recall-Precision Curves: A Case Study in Biomedical Information Extraction
1. Learning Ensembles of First-Order Clauses for Recall-Precision Curves: A Case Study in Biomedical Information Extraction
- Mark Goadrich, Louis Oliphant and Jude Shavlik
- Department of Computer Sciences
- University of Wisconsin-Madison, USA
- 6 September 2004
2. Talk Outline
- Link Learning and ILP
- Our Gleaner Approach
- Aleph Ensembles
- Biomedical Information Extraction
- Evaluation and Results
- Future Work
3. ILP Domains
- Object Learning
- Trains, Carcinogenesis
- Link Learning
- Binary predicates
4. Link Learning
- Large skew toward negatives
- 500 relational objects yield 500 x 500 = 250,000 candidate links
- 5,000 positive links means 245,000 negative links
- Difficult to measure success
- Always-negative classifier is 98% accurate
- ROC curves look overly optimistic
- Enormous quantity of data
- 4,285,199,774 web pages indexed by Google
- PubMed includes over 15 million citations
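The skew arithmetic on this slide can be checked directly; a minimal sketch of the slide's own numbers (variable names ours):

```python
# 500 relational objects give 500 * 500 = 250,000 candidate links.
objects = 500
total_links = objects * objects
positives = 5_000
negatives = total_links - positives  # 245,000

# An always-negative classifier gets every negative right and every
# positive wrong, so its accuracy equals the fraction of negatives.
accuracy = negatives / total_links
print(negatives, accuracy)  # 245000 0.98
```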
5. Our Approach
- Develop fast ensemble algorithms focused on recall and precision evaluation
- Key Ideas of Gleaner
- Keep wide range of clauses
- Create separate theories for different recall ranges
- Evaluation
- Area Under Recall-Precision Curve (AURPC)
- Time = number of clauses considered
6. Gleaner - Background
- Focus evaluation on positive examples
- Recall = TP / (TP + FN)
- Precision = TP / (TP + FP)
- Rapid Random Restart (Zelezny et al., ILP 2002)
- Stochastic selection of starting clause
- Time-limited local heuristic search
- We store a variety of clauses (based on recall)
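The two evaluation measures above are standard; written out directly (definitions only, not code from the talk):

```python
def recall(tp, fn):
    """Fraction of the true positive links that were recovered."""
    return tp / (tp + fn)

def precision(tp, fp):
    """Fraction of predicted-positive links that are correct."""
    return tp / (tp + fp)

# Example on a skewed link-learning task: 500 of 1,000 positives found,
# at the cost of 500 false positives.
print(recall(500, 500), precision(500, 500))  # 0.5 0.5
```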
7. Gleaner - Learning
- Create B Bins
- Generate Clauses
- Record Best
- Repeat for K seeds
[Plot: recorded clauses in recall-precision space; axes Recall vs. Precision]
8. Gleaner - Combining
- Combine K clauses per bin
- If at least L of K clauses match, call example positive
- How to choose L?
- L = 1: high recall, low precision
- L = K: low recall, high precision
- Our method
- Choose L such that ensemble recall matches bin b
- Bin b's precision should be higher than any clause in it
- We should now have a set of high-precision rule sets spanning the space of recall levels
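The "at least L of K" vote can be sketched as follows; the toy clauses and all names are ours, not the talk's implementation:

```python
def l_of_k_positive(example, clauses, L):
    """Call an example positive if at least L of the K bin clauses match it."""
    matches = sum(1 for clause in clauses if clause(example))
    return matches >= L

# Toy bin of K = 3 "clauses" over integer examples.
clauses = [lambda x: x > 0, lambda x: x % 2 == 0, lambda x: x < 100]

# L = 1 is the most permissive vote (high recall), L = K the strictest
# (high precision); Gleaner tunes L per bin to hit that bin's recall.
print(l_of_k_positive(4, clauses, L=3))   # True: all three clauses match
print(l_of_k_positive(-3, clauses, L=2))  # False: only "x < 100" matches
```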
9. How to Use Gleaner
- Generate Curve
- User Selects Recall Bin
- Return Classifications with Precision Confidence
[Plot: selected bin at Recall 0.50, Precision 0.70]
10. Aleph Ensembles
- We compare to ensembles of theories
- Algorithm (Dutra et al., ILP 2002)
- Use K different initial seeds
- Learn K theories containing C clauses
- Rank examples by the number of theories that cover them
- Need to balance C for high performance
- Small C leads to low recall
- Large C leads to converging theories
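The ranking step above can be sketched as follows; the toy "theories" are just predicates and everything here is an illustrative assumption, not the Aleph ensemble code:

```python
def rank_by_theory_votes(examples, theories):
    """Score each example by how many of the K theories classify it
    positive; sorting by this score traces out a recall-precision curve."""
    def votes(x):
        return sum(1 for theory in theories if theory(x))
    return sorted(examples, key=votes, reverse=True)

# Toy K = 3 "theories" over integers.
theories = [lambda x: x > 2, lambda x: x > 5, lambda x: x > 8]
print(rank_by_theory_votes([1, 4, 7, 10], theories))  # [10, 7, 4, 1]
```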
11. Aleph Ensembles (100 theories)
12. Biomedical Information Extraction
- Given: Medical journal abstracts tagged with protein localization relations
- Do: Construct a system to extract protein localization phrases from unseen text
- Example: "NPL3 encodes a nuclear protein with an RNA recognition motif and similarities to a family of proteins involved in RNA metabolism."
13. Biomedical Information Extraction
- Hand-labeled dataset (Ray & Craven '01)
- 7,245 sentences from 871 abstracts
- Examples are phrase-phrase combinations
- 1,810 positive, 279,154 negative
- 1.6 GB of background knowledge
- Structural, Statistical, Lexical and Ontological
- In total, 200 distinct background predicates
14. Evaluation Metrics
- Two dimensions
- Area Under Recall-Precision Curve (AURPC)
- All curves standardized to cover full recall range
- Averaged AURPC over 5 folds
- Number of clauses considered
- Rough estimate of time
- Both are "stop anytime" parallel algorithms
[Plot: recall-precision axes, each from 0 to 1.0]
15. AURPC Interpolation
- Convex interpolation in RP space?
- Precision interpolation is counterintuitive
- Example: 1,000 positive, 9,000 negative

  TP     FP     TP Rate  FP Rate  Recall  Prec
  500    500    0.50     0.06     0.50    0.50
  750    4750   0.75     0.53     0.75    0.14
  1000   9000   1.00     1.00     1.00    0.10

[Figure panels: Example Counts, RP Curves, ROC Curves]
16. AURPC Interpolation
17. Experimental Methodology
- Performed five-fold cross-validation
- Variation of parameters
- Gleaner (20 recall bins)
- seeds: 25, 50, 75, 100
- clauses: 1K, 10K, 25K, 50K, 100K, 250K, 500K
- Ensembles (0.75 minacc, 35,000 nodes)
- theories: 10, 25, 50, 75, 100
- clauses per theory: 1, 5, 10, 15, 20, 25, 50
18. Results - Testfold 5 at 1,000,000 clauses
[Plot: recall-precision curves for Gleaner and the Aleph ensembles]
19. Results - Gleaner vs. Aleph Ensembles
20. Conclusions
- Gleaner
- Focuses on recall and precision
- Keeps wide spectrum of clauses
- Good results in few CPU cycles
- Aleph ensembles
- Early stopping helpful
- Require more CPU cycles
- AURPC
- Useful metric for comparison
- Interpolation unintuitive
21. Future Work
- Improve Gleaner performance over time
- Explore alternate clause combinations
- Better understanding of AURPC
- Search for clauses that optimize AURPC
- Examine more ILP link-learning datasets
- Use Gleaner with other ML algorithms
22. Take-Home Message
- Definition of Gleaner
- One who gathers grain left behind by reapers
- Gleaner and ILP
- Many clauses constructed and evaluated in ILP hypothesis search
- We need to make better use of those that aren't the highest-scoring ones
- Thanks! Questions?
23. Acknowledgements
- USA NLM Grant 5T15LM007359-02
- USA NLM Grant 1R01LM07050-01
- USA DARPA Grant F30602-01-2-0571
- USA Air Force Grant F30602-01-2-0571
- Condor Group
- David Page
- Vitor Santos Costa, Ines Dutra
- Soumya Ray, Marios Skounakis, Mark Craven
- Dataset available at (URL in proceedings): ftp://ftp.cs.wisc.edu/machine-learning/shavlik-group/datasets/IE-protein-location
24. Deleted Scenes
- Aleph Learning
- Clause Weighting
- Sample Gleaner Recall-Precision Curve
- Sample Extraction Clause
- Gleaner Algorithm
- Director Commentary (on / off)
25. Aleph - Learning
- Aleph learns theories of clauses (Srinivasan, v4, 2003)
- Pick a positive seed example and saturate it
- Use heuristic search to find the best clause
- Pick a new seed from the uncovered positives and repeat until a threshold of positives is covered
- A theory produces one recall-precision point
- Learning complete theories is time-consuming
- Can produce a ranking with theory ensembles
26. Clause Weighting
- Single Theory Ensemble
- Rank by how many clauses cover examples
- Weight clauses using tuneset statistics
- CN2 (average precision of matching clauses)
- Lowest False Positive Rate Score
- Cumulative
- F1 score
- Recall
- Precision
- Diversity
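The CN2-style weighting above can be sketched as follows; the representation of clauses as (predicate, tune-set precision) pairs is our assumption for illustration:

```python
def cn2_score(example, weighted_clauses):
    """CN2-style weight: average tune-set precision of the clauses
    that match the example (0.0 if none match)."""
    precs = [prec for clause, prec in weighted_clauses if clause(example)]
    return sum(precs) / len(precs) if precs else 0.0

# Toy clauses with hypothetical tune-set precisions.
weighted = [(lambda x: x > 0, 0.9), (lambda x: x % 2 == 0, 0.5)]
print(cn2_score(4, weighted))   # 0.7: both clauses match
print(cn2_score(-2, weighted))  # 0.5: only the evenness clause matches
```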
27. Clause Weighting
28. Further Results
29. Biomedical Information Extraction
- Example sentence fragment: "NPL3 encodes a nuclear protein with"
[Figure: sentence annotated with marked location and alphanumeric features]
30. Sample Extraction Clause
- P = Protein, L = Location, S = Sentence
- 29% Recall, 34% Precision on testset 1
- Clause conditions (from the slide graphic):
- contains alphanumeric
- contains marked location
- contains no between halfX verb
- contains alphanumeric
31. Gleaner Algorithm
- Create B equal-sized recall bins
- For K different seeds
- Generate rules using Rapid Random Restart
- Record best rule (precision x recall) found for each bin
- For each recall bin B
- Find threshold L of K clauses such that the recall of "at least L of K clauses match" equals the recall for this bin
- Find recall and precision on the testset using each bin's "at least L of K" decision process
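The learning loop above can be sketched end to end; `toy_search` stands in for Rapid Random Restart, and all names here are ours for illustration, not the authors' implementation:

```python
import random

def gleaner_learn(generate_clauses, seeds, num_bins=20):
    """For each recall bin, record the best clause (by precision * recall)
    found by each seed's search, giving up to K = len(seeds) clauses/bin."""
    bins = [dict() for _ in range(num_bins)]  # bin -> {seed: (score, clause)}
    for seed in seeds:
        for clause, rec, prec in generate_clauses(seed):
            b = min(int(rec * num_bins), num_bins - 1)
            score = prec * rec
            if seed not in bins[b] or score > bins[b][seed][0]:
                bins[b][seed] = (score, clause)
    return bins

# Toy stand-in for Rapid Random Restart: emit random
# (clause, recall, precision) triples for a given seed.
def toy_search(seed):
    rng = random.Random(seed)
    for i in range(100):
        yield f"clause_{seed}_{i}", rng.random(), rng.random()

bins = gleaner_learn(toy_search, seeds=range(5), num_bins=4)
print([len(b) for b in bins])  # clauses kept per recall bin
```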