Gene Expression Rule Modeling - PowerPoint PPT Presentation

Title: Gene Expression Rule Modeling

1
Gene Expression Rule Modeling

Jonathan Rudolph
CS Advisor: Carolina Ruiz
BBT Advisor: Elizabeth Ryder
Worcester Polytechnic Institute
April 18, 2006

2
Problem

The process involved in gene expression is not
fully understood.
DNA contains all genes
Different cell-types express different genes
Promoter Region determines transcription
Motifs: short sequences on the promoter region

figure taken from http://www.garlandscience.com
3
Process overview

Gene Expression dataset
Rule Mining Process
M1first M5second => Neural
Rule Modeling Process
M1first M5second => Neural
M8first M8second => Muscle
Apply model to test case
Model Evaluations

4
Software overview
Items in blue added by this MQP

Dharmesh Thakkar's MS thesis
Rule Mining Process
Added constraints to Keith Pray's MS thesis
Rule Modeling Process
Senthil Palanisamy's MS thesis
Associative Classification
extended to handle Gene Expression
CBA model
TopN-CBA model
Per-Class CBA model
Model Evaluations
Added new ways to evaluate E-measure (error) for
unclassified instances.

5
Dataset

Model species: C. elegans, a common roundworm
Genome fully sequenced
Neural gene expression mapped

Pictures courtesy of WormAtlas, www.wormatlas.org
6
Annotated Promoter Regions: motifs with orientation
genes
Cell Types
ASH,ASI,ASK HSN,PHA,ADL,ASK ASE,PHA,ASI,ASK
ALM,HSN,PHA,CAN ALM,HSN ALM ALM ALM,PHA
ALM,HSN,CAN ALM,HSN,CAN ASH,ASI,HSN,ASL,CAN,ASK
ALM,ASE ASH,ADL ALM ASE ALM ASH,ASI,PHA,ADL,ASE,ASK
PHA,ADL
Annotated by MAST http://meme.sdsc.edu/

Dataset
36 motifs of length 8, 10, or 12
80 promoter regions
4 cell expression patterns

Dataset created from information available at
www.wormbase.org. Thanks also to the C. elegans
Consortium.
7
Rule Mining

Observe connections between attributes
Association Rule mining is applicable
Antecedent (A) => Consequent (C)
M1first M5second => Neural (Supp 0.25, Conf 0.5)
Support
Fraction of all instances that contain A U C
Confidence
Of the instances that contain A, the fraction that
contain A U C
P-value
Probability of obtaining the result by chance,
assuming the null hypothesis is true.
E-measure
Error measure: a value produced by comparing two
sets.
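
The support and confidence definitions above can be sketched in a few lines of Python. This is a minimal illustration, not the MQP's code: the toy instances and the representation of each instance as a set of items (motif-position tokens plus an expression label) are assumptions made for the example.

```python
from typing import FrozenSet, Sequence

Instance = FrozenSet[str]


def support(instances: Sequence[Instance], a: Instance, c: Instance) -> float:
    """Fraction of all instances that contain every item of A and of C."""
    both = a | c
    return sum(1 for inst in instances if both <= inst) / len(instances)


def confidence(instances: Sequence[Instance], a: Instance, c: Instance) -> float:
    """Of the instances that contain A, the fraction that also contain C."""
    covered = [inst for inst in instances if a <= inst]
    if not covered:
        return 0.0
    return sum(1 for inst in covered if c <= inst) / len(covered)


# Hypothetical toy dataset: four promoter regions (invented for illustration).
instances = [
    frozenset({"M1first", "M5second", "Neural"}),
    frozenset({"M1first", "M5second", "Muscle"}),
    frozenset({"M8first", "M8second", "Muscle"}),
    frozenset({"M2first", "Neural"}),
]
A = frozenset({"M1first", "M5second"})
C = frozenset({"Neural"})

supp = support(instances, A, C)      # 1 of 4 instances contains A and C
conf = confidence(instances, A, C)   # 1 of the 2 instances containing A
```

With this toy data the rule M1first M5second => Neural gets Supp 0.25 and Conf 0.5, the same figures as the slide's example.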

8
One rule: M21first M12second M10third => exprALM
taken from Dharmesh Thakkar's Visualization module
9
Model Creation

If we mine, and have many rules, how do we pick
the best?
Modeling
selecting rules that work well together
Descriptive Power
Use as few rules as possible to classify our
instances
Use rules that represent our data
Predictive power
Verify by evaluating those rules on test data

10
Models: what's out there?

Simple: a model of All Rules
lots of detailed descriptive power
makes many predictions
more errors, higher E-measure
A little better: Top N Rules
Small number of rules, sorted by confidence
For large N, same model as All Rules
More Sophisticated
CBA
TopN-CBA
Per-Class CBA

11
(No Transcript)
12
CBA

Sort rules by Confidence, Support, order of
generation
Pass through instances once per rule
If any instances are covered, mark rule and
remove instances.
If a rule applies to no instances, it is not
marked, and not used in the model.
Repeat until no more rules or no more instances.
Evaluate the set of marked rules incrementally:
one rule, two, three, etc.
When the average E-measure for the model of n
rules is worse than the E-measure for n-1 rules,
return the first n-1 rules, in order, as our
completed model.
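
The steps on this slide can be sketched roughly as follows. This is a simplified reading of the slide, not the original CBA implementation or the MQP's code: the `Rule` tuple, the function name `cba_select`, and the `avg_error` callback (standing in for the average E-measure of a candidate model) are all hypothetical, and input order is assumed to be generation order.

```python
from typing import Callable, FrozenSet, List, NamedTuple, Sequence


class Rule(NamedTuple):
    antecedent: FrozenSet[str]
    predicted: str
    confidence: float
    support: float


def cba_select(
    rules: Sequence[Rule],
    instances: Sequence[FrozenSet[str]],
    avg_error: Callable[[List[Rule]], float],
) -> List[Rule]:
    # Sort by confidence, then support; sorted() is stable, so ties keep
    # their input order (assumed here to be the order of generation).
    ordered = sorted(rules, key=lambda r: (-r.confidence, -r.support))

    remaining = list(instances)
    marked: List[Rule] = []
    for rule in ordered:
        if not remaining:
            break  # no more instances to cover
        uncovered = [i for i in remaining if not rule.antecedent <= i]
        if len(uncovered) < len(remaining):
            marked.append(rule)    # rule covered at least one instance
            remaining = uncovered  # remove the covered instances

    # Grow the model one rule at a time; stop once the error gets worse.
    for n in range(2, len(marked) + 1):
        if avg_error(marked[:n]) > avg_error(marked[: n - 1]):
            return marked[: n - 1]
    return marked


# Hypothetical toy data, invented for illustration.
r1 = Rule(frozenset({"M1first"}), "Neural", 0.9, 0.3)
r2 = Rule(frozenset({"M8first"}), "Muscle", 0.8, 0.2)
r3 = Rule(frozenset({"M9first"}), "Neural", 0.7, 0.1)  # covers nothing
r4 = Rule(frozenset({"M2first"}), "Neural", 0.6, 0.1)
toy_instances = [
    frozenset({"M1first", "M5second"}),
    frozenset({"M8first"}),
    frozenset({"M2first"}),
]
# Made-up error values keyed by model size, standing in for E-measure.
toy_error = lambda model: {1: 0.5, 2: 0.3, 3: 0.4}[len(model)]

model = cba_select([r4, r3, r2, r1], toy_instances, toy_error)
```

Here r3 is never marked because it covers no instance, and the three-rule model is rejected because its made-up error (0.4) is worse than the two-rule model's (0.3), so `model` ends up as r1 followed by r2.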

13
CBA

Sophisticated model: Classification Based on
Associations (CBA)
(Liu, Hsu, Ma '98, National Univ. Singapore)
Accurate rules that cover the most instances
first
Only rules that decrease errors
Not designed for strong descriptive power

14
New Model: TopN-CBA

Problem: Existing models provide either
descriptive or predictive power. We want a model
that does both.
Approach: Start with a CBA generated model,
include 1 additional rule from each predicted
class.

15
TopN-CBA

generate CBA model (1)
separate model by predicted type (blue, red and
green circles) (2)
gather n best rules from each predicted type,
including CBA model rules (for this example,
n = 3) (3)
return list of top n rules for each prediction (4)
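
One possible reading of steps (2) to (4) is sketched below. The slide does not say how "best" is ranked or what happens when the CBA model already holds n or more rules for a class, so this sketch assumes ranking by confidence then support, and keeps every CBA rule while topping each class up to n. The `Rule` tuple and the name `top_n_cba` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, List, NamedTuple, Sequence


class Rule(NamedTuple):
    antecedent: FrozenSet[str]
    predicted: str
    confidence: float
    support: float


def top_n_cba(
    cba_model: Sequence[Rule], all_rules: Sequence[Rule], n: int
) -> Dict[str, List[Rule]]:
    # (2) Separate the CBA model by predicted type.
    by_class: Dict[str, List[Rule]] = defaultdict(list)
    for rule in cba_model:
        by_class[rule.predicted].append(rule)

    # (3) Top up each predicted type to n rules with the best remaining
    # rules (assumed ranking: confidence, then support).
    ranked = sorted(all_rules, key=lambda r: (-r.confidence, -r.support))
    for rule in ranked:
        group = by_class.get(rule.predicted)  # only types CBA predicted
        if group is not None and len(group) < n and rule not in group:
            group.append(rule)

    # (4) Return the list of rules for each prediction.
    return dict(by_class)


# Hypothetical rules, invented for illustration.
a = Rule(frozenset({"M1first"}), "Neural", 0.90, 0.30)
b = Rule(frozenset({"M2first"}), "Neural", 0.80, 0.20)
c = Rule(frozenset({"M3first"}), "Neural", 0.70, 0.20)
d = Rule(frozenset({"M8first"}), "Muscle", 0.85, 0.25)
e = Rule(frozenset({"M9first"}), "Muscle", 0.60, 0.10)
f = Rule(frozenset({"M4first"}), "Neural", 0.50, 0.10)

result = top_n_cba(cba_model=[a, d], all_rules=[f, e, d, c, b, a], n=3)
```

With n = 3, the Neural group grows from the single CBA rule a to a, b, c, while Muscle ends with only d and e because no third Muscle rule exists.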

16
TopN-CBA model
17
(No Transcript)
18
(No Transcript)
19
New Model: Per-Class CBA

Problem: TopN-CBA still adds some redundant
rules.
Approach: Separate the rules by class before
selecting, then select separately for each class.
Combine the partial models.
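
The separate, select, combine structure of this approach might look like the sketch below. The selection step is passed in as a function because the slide does not spell it out; `best_two` is only a stand-in for a real per-class CBA selection, and all names and rules here are hypothetical.

```python
from collections import defaultdict
from typing import Callable, Dict, FrozenSet, List, NamedTuple, Sequence


class Rule(NamedTuple):
    antecedent: FrozenSet[str]
    predicted: str
    confidence: float


def per_class_model(
    rules: Sequence[Rule],
    select: Callable[[List[Rule]], List[Rule]],
) -> List[Rule]:
    # Separate the rules by predicted class before selecting.
    by_class: Dict[str, List[Rule]] = defaultdict(list)
    for rule in rules:
        by_class[rule.predicted].append(rule)

    # Select separately for each class, then combine the partial models.
    model: List[Rule] = []
    for cell_type in sorted(by_class):
        model.extend(select(by_class[cell_type]))
    return model


def best_two(class_rules: List[Rule]) -> List[Rule]:
    """Stand-in selector: keep the two most confident rules of a class."""
    return sorted(class_rules, key=lambda r: -r.confidence)[:2]


# Hypothetical rules, invented for illustration.
alm1 = Rule(frozenset({"M21first"}), "ALM", 0.9)
alm2 = Rule(frozenset({"M12second"}), "ALM", 0.7)
alm3 = Rule(frozenset({"M10third"}), "ALM", 0.8)
hsn1 = Rule(frozenset({"M5first"}), "HSN", 0.6)

combined = per_class_model([alm1, alm2, alm3, hsn1], best_two)
```

Because selection runs inside each class, a low-confidence class such as HSN in this toy data still contributes a rule to the combined model.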

20
(No Transcript)
21
Model Analysis

Best: Per-Class CBA,
CBA better predictor
Rules from each cell-type
few rules (1-50)
Applicable for
Brief models
Summarize each cell-type
Prediction for low support rules
Note
smaller number of motifs, lower minsupport gave
us better models all around.

22
Problem revisited

How will this help us understand gene expression?
Before
mine for rules in C. elegans
Now
Variety of models
25 best rules instead of 14,000
Ahead
Conduct lab experiments in the locations and
motifs nominated by the model

23
Questions?
24
Weka System Mining
Lift = (Confidence of a rule) / (Confidence
assuming Consequent is independent of Antecedent)
Lift > 1 => interesting
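
As a rough worked example of the lift ratio: if the consequent were independent of the antecedent, the rule's confidence would simply equal the fraction of instances containing the consequent, so lift can be computed as confidence divided by that fraction. The sketch below is plain Python rather than Weka's API, and the toy instances are invented.

```python
from typing import FrozenSet, Sequence

Instance = FrozenSet[str]


def fraction_containing(instances: Sequence[Instance], items: Instance) -> float:
    """Fraction of instances that contain all of the given items."""
    return sum(1 for inst in instances if items <= inst) / len(instances)


def lift(instances: Sequence[Instance], a: Instance, c: Instance) -> float:
    # Confidence of A => C: P(A and C) / P(A).
    conf = fraction_containing(instances, a | c) / fraction_containing(instances, a)
    # Confidence expected under independence is just P(C).
    return conf / fraction_containing(instances, c)


# Hypothetical toy dataset, invented for illustration.
toy = [
    frozenset({"M1first", "Neural"}),
    frozenset({"M1first", "Neural"}),
    frozenset({"M8first", "Muscle"}),
    frozenset({"M8first", "Muscle"}),
]
rule_lift = lift(toy, frozenset({"M1first"}), frozenset({"Neural"}))
```

In this toy data the rule's confidence is 1.0 while only half of the instances are Neural, so the lift is 2.0, which is above 1 and would count as interesting by the slide's criterion.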
25
E-measure
developed by Senthil Palanisamy

E-measure is a metric for combining Precision and
Recall
It is a value between 0 and 1
An E-measure of 0 indicates two identical sets.
An E-measure of 1 indicates two sets that have no
items in common.
It is calculated from Precision, Recall, and beta
beta is a way of changing the E-measure to give
a greater penalty to either a low Precision or
a low Recall
a beta of 1 gives equal penalty
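
The slides do not give the formula itself. A minimal sketch, assuming the standard van Rijsbergen form E = 1 - (1 + beta^2) * P * R / (beta^2 * P + R), which matches the properties listed above, is shown below. The function name and the handling of empty sets are assumptions, not taken from the thesis.

```python
from typing import Set


def e_measure(predicted: Set[str], actual: Set[str], beta: float = 1.0) -> float:
    """E-measure between a predicted set and an actual set (0 = identical)."""
    if not predicted and not actual:
        return 0.0  # assumption: two empty sets count as identical
    common = len(predicted & actual)
    if common == 0:
        return 1.0  # nothing in common
    precision = common / len(predicted)
    recall = common / len(actual)
    b2 = beta * beta
    # In this form, beta > 1 weights Recall more heavily and beta < 1
    # weights Precision more heavily; beta = 1 treats them equally.
    f = (1 + b2) * precision * recall / (b2 * precision + recall)
    return 1.0 - f


same = e_measure({"ALM", "HSN"}, {"ALM", "HSN"})      # identical sets
disjoint = e_measure({"ALM"}, {"PHA", "CAN"})         # no items in common
partial = e_measure({"ALM", "HSN"}, {"ALM"})          # P = 0.5, R = 1.0
```

For the partial case, Precision is 0.5 and Recall is 1.0, so with beta = 1 the E-measure comes out to one third, between the two extremes.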