Customizing Gene Taggers for BeeSpace

About This Presentation

Slides: 18

Provided by: jin144

Tags: beespace | customizing | gene | kex | taggers

Transcript and Presenter's Notes

Title: Customizing Gene Taggers for BeeSpace

1
Customizing Gene Taggers for BeeSpace

2
Entity Recognition in BeeSpace

3
Input and Output

Input: free text (with simple XML tags)
<?xml version="1.0" encoding="UTF-8"?>
<Document id="1">We have cloned and sequenced a cDNA encoding Apis mellifera ultraspiracle (AMUSP) and examined its responses to JH.</Document>
Output: tagged text (XML format)
<?xml version="1.0" encoding="UTF-8"?>
<Document id="1"><Sent><NP>We</NP> have <VP>cloned</VP> and <VP>sequenced</VP> <NP>a cDNA encoding <Gene>Apis mellifera ultraspiracle</Gene></NP> (<Gene>AMUSP</Gene>) and <VP>examined</VP> <NP>its responses to JH</NP>.</Sent></Document>
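A consumer of the tagged output would typically pull out the gene mentions. The sketch below is not part of the presentation; it assumes the output is well-formed XML with the element names shown on the slide (Document, Sent, NP, VP, Gene), and the helper name extract_genes is hypothetical.

```python
import xml.etree.ElementTree as ET

# Tagged output from the slide, with attribute quoting and closing tags restored.
TAGGED = (
    '<Document id="1"><Sent><NP>We</NP> have <VP>cloned</VP> and '
    '<VP>sequenced</VP> <NP>a cDNA encoding '
    '<Gene>Apis mellifera ultraspiracle</Gene></NP> '
    '(<Gene>AMUSP</Gene>) and <VP>examined</VP> '
    '<NP>its responses to JH</NP>.</Sent></Document>'
)


def extract_genes(xml_text):
    """Return the text of every <Gene> element, in document order."""
    root = ET.fromstring(xml_text)
    return ["".join(gene.itertext()).strip() for gene in root.iter("Gene")]


genes = extract_genes(TAGGED)
print(genes)
```

Using itertext() rather than .text keeps the full mention even if a tagger nests other elements inside a Gene element.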

4
Challenges

5
Automatic Gene Recognition: Characteristics of Gene Names

6
Existing Tools

KeX (Fukuda)
Based on hand-crafted rules
Recognizes proteins and other entities
Requires human effort; not easy to modify
ABNER and YAGI (Settles)
Based on conditional random fields (CRFs) to learn the rules
ABNER identifies and classifies different entities, including proteins, DNAs, RNAs, and cells
YAGI recognizes genes and gene products
No training

7
Existing Tools (cont.)

8
Comparison of Existing Tools

9
Comparison of Existing Tools (cont.)

10
Comparison of Existing Tools (cont.)

11
Lessons Learned

12
Customization

13
Customization (cont.)

Exploit more features such as global context
Occurrences of the same word/phrase should be
tagged all positive or all negative
Differentiate between domain-independent features
and domain-specific features
E.g., the prefix "Am" is domain-specific for Apis mellifera
Features can be weighted based on their
contribution across domains
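The global-context idea above (all occurrences of the same phrase get the same tag) can be illustrated with a simple post-processing step. This is only a sketch under assumptions: the slides do not say how consistency would be enforced, so majority voting over local decisions is a stand-in, and the function name and input format are hypothetical.

```python
from collections import defaultdict


def enforce_global_consistency(mentions):
    """Give every occurrence of a phrase the majority label.

    mentions: list of (phrase, is_gene) pairs from a local tagger.
    Ties resolve to non-gene, an arbitrary choice for this sketch.
    """
    votes = defaultdict(lambda: [0, 0])  # phrase -> [non-gene votes, gene votes]
    for phrase, is_gene in mentions:
        votes[phrase][int(is_gene)] += 1
    return [(phrase, votes[phrase][1] > votes[phrase][0]) for phrase, _ in mentions]


# Hypothetical local decisions: one occurrence of AMUSP was missed.
local = [("AMUSP", True), ("AMUSP", False), ("AMUSP", True), ("JH", False)]
print(enforce_global_consistency(local))
```

A learned model could instead use the document-level counts as features, which is closer to "exploiting global context" than a hard post-hoc vote.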

14
Maximum Entropy Model for Gene Tagging

Given an observation (a token or a noun phrase), together with its context, denoted as x
Predict $y \in \{\text{gene}, \text{non-gene}\}$
Maximum entropy model
$P(y \mid x) = K \exp\left(\sum_i \lambda_i f_i(x, y)\right)$
Typical f
y = gene and the candidate phrase starts with a capital letter
y = gene and the candidate phrase contains digits
Estimate $\lambda_i$ with training data
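To make the model concrete, here is a minimal sketch of how the probability could be computed with the two example features from the slide. It assumes x is just the candidate phrase as a string (context is ignored), treats K as the constant that makes the probabilities sum to 1 over both labels, and uses made-up weights; none of the names or values come from the presentation.

```python
import math

LABELS = ("gene", "non-gene")


def features(x, y):
    """Binary features f_i(x, y) mirroring the two examples on the slide."""
    is_gene = y == "gene"
    return [
        1.0 if is_gene and x[:1].isupper() else 0.0,               # starts with a capital
        1.0 if is_gene and any(c.isdigit() for c in x) else 0.0,   # contains digits
    ]


def predict_proba(x, weights):
    """P(y|x) = K * exp(sum_i lambda_i * f_i(x, y)), with K normalizing over labels."""
    scores = {
        y: math.exp(sum(w * f for w, f in zip(weights, features(x, y))))
        for y in LABELS
    }
    k = 1.0 / sum(scores.values())
    return {y: k * s for y, s in scores.items()}


weights = [0.8, 1.2]  # hypothetical lambda values, not estimated from data
probs = predict_proba("AMUSP", weights)
print(probs)
```

In practice the weights would be estimated from labeled training data, for example by maximizing the conditional log-likelihood; that step is omitted here.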

15
Plan Customization with Feature Adaptation

16
Issues to Discuss
