Gene Expression Rule Modeling - PowerPoint PPT Presentation

Transcript and Presenter's Notes


1
Gene Expression Rule Modeling
  • Jonathan Rudolph
  • CS Advisor: Carolina Ruiz
  • BBT Advisor: Elizabeth Ryder
  • Worcester Polytechnic Institute
  • April 18, 2006

2
Problem
  • The process involved in gene expression is not
    fully understood.
  • DNA contains all genes
  • Different cell types express different genes
  • The promoter region determines transcription
  • Motifs: short sequences on the promoter region

Figure taken from http://www.garlandscience.com
3
Process overview
  • Gene Expression dataset
  • Rule Mining Process
  • M1 first, M5 second → Neural
  • Rule Modeling Process
  • M1 first, M5 second → Neural
  • M8 first, M8 second → Muscle
  • Apply model to test case
  • Model Evaluations

4
Software overview
Items in blue (on the original slide) were added by this MQP
  • Dharmesh Thakkar's MS thesis
  • Rule Mining Process
  • Added constraints to Keith Pray's MS thesis
  • Rule Modeling Process
  • Senthil Palanisamy's MS thesis
  • Associative classification, extended to handle gene expression
  • CBA model
  • TopN-CBA model
  • Per-Class CBA model
  • Model Evaluations
  • Added new ways to evaluate the E-measure (error) for
    unclassified instances.

5
Dataset
  • The model species C. elegans is a common
    roundworm
  • Genome fully sequenced
  • Neural gene expression mapped

Pictures courtesy of WormAtlas (www.wormatlas.org)
6
[Figure: annotated promoter regions, showing motifs with orientation for each gene and the cell types in which each gene is expressed (ADL, ALM, ASE, ASH, ASI, ASK, ASL, CAN, HSN, PHA)]
Annotated by MAST (http://meme.sdsc.edu/)
  • Dataset
  • 36 motifs of length 8, 10, or 12
  • 80 promoter regions
  • 4 cell expression patterns

Dataset created from information available at
www.wormbase.org. Thanks also to the C. elegans
Consortium.
7
Rule Mining
  • Observe connections between attributes
  • Association rule mining is applicable
  • Antecedent → Consequent
  • M1 first, M5 second → Neural (Supp 0.25,
    Conf 0.5)
  • Support: fraction of instances that contain A ∪ C
  • Confidence: of the instances that contain A, the
    fraction that contain A ∪ C
  • P-value: probability of obtaining the result by
    chance, assuming the null hypothesis is true
  • E-measure: an error measure, produced by comparing
    two sets
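The support and confidence definitions above can be checked on a toy example. This is a minimal Python sketch, assuming each instance is represented as a set of items; the item names (`M1_first`, `M5_second`, ...) and the four instances are hypothetical, chosen so that the example rule reproduces the slide's Supp 0.25 and Conf 0.5.

```python
from typing import FrozenSet, List, Set

# Hypothetical representation: an instance is the set of items it contains,
# i.e. motif-position attributes (e.g. "M1_first") plus its class label.
Instance = FrozenSet[str]


def support(instances: List[Instance], antecedent: Set[str], consequent: Set[str]) -> float:
    """Fraction of all instances that contain every item of A ∪ C."""
    items = antecedent | consequent
    hits = sum(1 for inst in instances if items <= inst)
    return hits / len(instances)


def confidence(instances: List[Instance], antecedent: Set[str], consequent: Set[str]) -> float:
    """Of the instances that contain A, the fraction that also contain C."""
    with_a = [inst for inst in instances if antecedent <= inst]
    if not with_a:
        return 0.0  # the antecedent never occurs
    with_both = [inst for inst in with_a if consequent <= inst]
    return len(with_both) / len(with_a)


if __name__ == "__main__":
    # Hypothetical toy data, not the C. elegans dataset.
    data = [
        frozenset({"M1_first", "M5_second", "Neural"}),
        frozenset({"M1_first", "M5_second", "Muscle"}),
        frozenset({"M8_first", "M8_second", "Muscle"}),
        frozenset({"M1_first", "Neural"}),
    ]
    rule_a, rule_c = {"M1_first", "M5_second"}, {"Neural"}
    print(support(data, rule_a, rule_c), confidence(data, rule_a, rule_c))  # 0.25 0.5
```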

8
One rule: M21 first, M12 second, M10 third →
exprALM
(taken from Dharmesh Thakkar's Visualization module)
9
Model Creation
  • If mining produces many rules, how do we pick
    the best ones?
  • Modeling: selecting rules that work well together
  • Descriptive power
  • Use as few rules as possible to classify our
    instances
  • Use rules that represent our data
  • Predictive power
  • Verify by evaluating those rules on test data

10
Models: what's out there?
  • Simple: a model of all rules
  • Lots of detailed descriptive power
  • Makes many predictions
  • More errors, higher E-measure
  • A little better: Top N rules
  • A small number of rules, sorted by confidence
  • For large N, the same model as All Rules
  • More sophisticated:
  • CBA
  • TopN-CBA
  • Per-Class CBA
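The Top N model can be sketched in a few lines, assuming rules are ranked by confidence alone as the slide states. The `Rule` type and its fields are illustrative assumptions, not the project's actual data structures.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class Rule:
    antecedent: FrozenSet[str]  # hypothetical motif-position items
    consequent: str             # predicted cell type
    confidence: float


def top_n_rules(rules: List[Rule], n: int) -> List[Rule]:
    """Keep the n highest-confidence rules.

    Python's sort is stable (also with reverse=True), so rules with equal
    confidence keep the order in which they were generated.
    """
    ranked = sorted(rules, key=lambda r: r.confidence, reverse=True)
    return ranked[:n]
```

Calling `top_n_rules` with `n` at least as large as the rule list returns every rule, which is why Top N becomes the All Rules model for large N.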

11
(No Transcript)
12
CBA
  • Sort rules by confidence, then support, then
    order of generation
  • Pass through the remaining instances once per rule
  • If the rule covers any instances, mark the rule
    and remove the covered instances.
  • If a rule applies to no instances, it is not
    marked and is not used in the model.
  • Repeat until there are no more rules or no more
    instances.
  • Evaluate the set of marked rules incrementally:
    one rule, two, three, etc.
  • When the average E-measure of the model with n
    rules is worse than the E-measure with n-1 rules,
    return the first n-1 rules, in order, as the
    completed model.
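The CBA selection steps above can be sketched in Python. This follows the slide's description rather than the original CBA paper or the MQP's code, and several details are assumptions: a model's prediction for an instance is the union of the consequents of the rules whose antecedents it contains, the per-instance E-measure uses the standard van Rijsbergen formula with beta = 1, and an unclassified instance scores 1.0 (the worst value). All type and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Set


@dataclass(frozen=True)
class Rule:
    antecedent: FrozenSet[str]  # hypothetical motif-position items, e.g. {"M1_first"}
    consequent: str             # predicted cell type
    support: float
    confidence: float
    gen_order: int              # position in the rule miner's output


@dataclass(frozen=True)
class Instance:
    items: FrozenSet[str]       # motif-position items found in one promoter region
    labels: FrozenSet[str]      # cell types in which the gene is actually expressed


def e_measure(predicted: Set[str], actual: Set[str], beta: float = 1.0) -> float:
    """Van Rijsbergen-style E-measure: 0 = identical sets, 1 = nothing in common."""
    overlap = len(predicted & actual)
    if overlap == 0:
        # Assumption: an unclassified instance (empty prediction) gets the worst score.
        return 1.0
    precision = overlap / len(predicted)
    recall = overlap / len(actual)
    b2 = beta * beta
    return 1.0 - (1 + b2) * precision * recall / (b2 * precision + recall)


def predict(model: List[Rule], inst: Instance) -> Set[str]:
    """Union of the consequents of every rule whose antecedent the instance contains."""
    return {r.consequent for r in model if r.antecedent <= inst.items}


def average_e(model: List[Rule], instances: List[Instance]) -> float:
    return sum(e_measure(predict(model, i), set(i.labels)) for i in instances) / len(instances)


def cba(rules: List[Rule], instances: List[Instance]) -> List[Rule]:
    # Step 1: sort by confidence, then support (both descending), then generation order.
    ordered = sorted(rules, key=lambda r: (-r.confidence, -r.support, r.gen_order))

    # Step 2: one pass over the remaining instances per rule; a rule that covers
    # at least one instance is marked and its covered instances are removed.
    remaining = list(instances)
    marked: List[Rule] = []
    for rule in ordered:
        if not remaining:
            break  # no more instances to cover
        uncovered = [i for i in remaining if not (rule.antecedent <= i.items)]
        if len(uncovered) < len(remaining):
            marked.append(rule)
            remaining = uncovered

    # Step 3: grow the model one marked rule at a time; stop as soon as adding
    # rule n makes the average E-measure worse than with n-1 rules.
    best = marked[:1]
    best_e = average_e(best, instances) if best else 1.0
    for n in range(2, len(marked) + 1):
        candidate_e = average_e(marked[:n], instances)
        if candidate_e > best_e:
            break
        best, best_e = marked[:n], candidate_e
    return best
```

A rule that covers no remaining instance (for example one whose antecedent never occurs) is never marked, so it cannot enter the model.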

13
CBA
  • Sophisticated model: Classification Based on
    Associations (CBA)
  • (Liu, Hsu, Ma '98, National University of Singapore)
  • Accurate rules that cover the most instances come
    first
  • Only rules that decrease errors are kept
  • Not designed for strong descriptive power

14
New Model: TopN-CBA
  • Problem: Existing models provide either
    descriptive or predictive power. We want a model
    that does both.
  • Approach: Start with a CBA-generated model and
    include one additional rule from each predicted
    class.

15
TopN-CBA
  • Generate a CBA model (1)
  • Separate the model by predicted type (blue, red,
    and green circles) (2)
  • Gather the n best rules for each predicted type,
    including the CBA model rules (in this example,
    n = 3) (3)
  • Return the list of top n rules for each
    prediction (4)
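A sketch of the TopN-CBA steps, assuming the CBA model has already been built and that rules are ranked with the same confidence/support/generation-order key CBA uses. How CBA rules count toward n is an assumption here: every CBA rule for a class is kept, and the best remaining rules for that class are added until the class has n rules. All names are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Tuple


@dataclass(frozen=True)
class Rule:
    antecedent: FrozenSet[str]  # hypothetical motif-position items
    consequent: str             # predicted cell type
    support: float
    confidence: float
    gen_order: int


def rank_key(r: Rule) -> Tuple[float, float, int]:
    # Confidence, then support (both descending), then generation order.
    return (-r.confidence, -r.support, r.gen_order)


def topn_cba(cba_model: List[Rule], all_rules: List[Rule], n: int) -> List[Rule]:
    # (2) Separate the CBA model by predicted type (rule consequent).
    by_class: Dict[str, List[Rule]] = defaultdict(list)
    for r in cba_model:
        by_class[r.consequent].append(r)

    result: List[Rule] = []
    for cls, cba_rules in by_class.items():
        # (3) Keep the CBA rules, then fill up to n with the best other rules for this class.
        chosen = list(cba_rules)
        candidates = sorted(
            (r for r in all_rules if r.consequent == cls and r not in cba_rules),
            key=rank_key,
        )
        for r in candidates:
            if len(chosen) >= n:
                break
            chosen.append(r)
        # (4) Emit this class's rules in rank order.
        result.extend(sorted(chosen, key=rank_key))
    return result
```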

16
TopN-CBA model
17
(No Transcript)
18
(No Transcript)
19
New Model: Per-Class CBA
  • Problem: TopN-CBA still adds some redundant
    rules.
  • Approach: Separate the rules by class before
    selecting, then select separately for each class.
    Combine the partial models.
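One way to sketch Per-Class CBA: group the rules by predicted class, select rules for each class independently, then concatenate the partial models. To stay short, the default selector `coverage_pass` performs only CBA's sort-and-cover pass, without the incremental E-measure check, and each class is assumed to be evaluated only against the instances expressed in that class. Both choices, and all names, are assumptions rather than the MQP's implementation.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, List


@dataclass(frozen=True)
class Rule:
    antecedent: FrozenSet[str]  # hypothetical motif-position items
    consequent: str             # predicted cell type
    support: float
    confidence: float
    gen_order: int


@dataclass(frozen=True)
class Instance:
    items: FrozenSet[str]
    labels: FrozenSet[str]      # cell types in which the gene is expressed


Selector = Callable[[List[Rule], List[Instance]], List[Rule]]


def coverage_pass(rules: List[Rule], instances: List[Instance]) -> List[Rule]:
    """CBA's sort-and-cover pass only (no incremental E-measure pruning)."""
    ordered = sorted(rules, key=lambda r: (-r.confidence, -r.support, r.gen_order))
    remaining = list(instances)
    chosen: List[Rule] = []
    for rule in ordered:
        if not remaining:
            break
        uncovered = [i for i in remaining if not (rule.antecedent <= i.items)]
        if len(uncovered) < len(remaining):
            chosen.append(rule)
            remaining = uncovered
    return chosen


def per_class_cba(rules: List[Rule], instances: List[Instance],
                  select: Selector = coverage_pass) -> List[Rule]:
    # Separate the rules by predicted class before any selection happens.
    by_class: Dict[str, List[Rule]] = defaultdict(list)
    for r in rules:
        by_class[r.consequent].append(r)

    model: List[Rule] = []
    for cls in sorted(by_class):
        # Assumption: each class is selected against the instances expressed in that class.
        class_instances = [i for i in instances if cls in i.labels]
        model.extend(select(by_class[cls], class_instances))
    return model
```

Because every class is selected on its own, each cell type contributes rules to the combined model. A selector that also applies CBA's incremental E-measure check can be passed as `select`.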

20
(No Transcript)
21
Model Analysis
  • Best: Per-Class CBA
  • (CBA alone is the better predictor)
  • Rules from each cell type
  • Few rules (1-50)
  • Applicable for:
  • Brief models
  • Summarizing each cell type
  • Prediction with low-support rules
  • Note:
  • A smaller number of motifs and a lower minimum
    support gave us better models all around.

22
Problem revisited
  • How will this help us understand gene expression?
  • Before: mine for rules in C. elegans
  • Now: a variety of models
  • The 25 best rules instead of 14,000
  • Ahead: conduct lab experiments on the locations
    and motifs nominated by the model

23
Questions?
24
Weka System Mining
Lift = confidence of a rule / confidence assuming the
consequent is independent of the antecedent:
  $\mathrm{Lift}(A \rightarrow C) = \dfrac{\mathrm{conf}(A \rightarrow C)}{P(C)} = \dfrac{P(C \mid A)}{P(C)}$
Lift > 1 → interesting
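Lift can be computed directly from instance counts. This is a minimal sketch; the function and parameter names are hypothetical and are not part of Weka's API.

```python
def lift(n_total: int, n_antecedent: int, n_consequent: int, n_both: int) -> float:
    """Lift of the rule A -> C from instance counts.

    n_total: all instances; n_antecedent: instances containing A;
    n_consequent: instances containing C; n_both: instances containing A and C.
    """
    confidence = n_both / n_antecedent          # P(C | A)
    independent_conf = n_consequent / n_total   # P(C): confidence if C were independent of A
    return confidence / independent_conf
```

For example, `lift(8, 2, 2, 2)` gives 4.0: the consequent appears in a quarter of all instances but in every instance that contains the antecedent, so the rule is four times more confident than chance.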
25
E-measure
Developed by Senthil Palanisamy
  • The E-measure is a metric that combines precision
    and recall
  • It is a value between 0 and 1
  • An E-measure of 0 indicates two identical sets.
  • An E-measure of 1 indicates two sets that have no
    items in common.
  • It is calculated from precision, recall, and beta
  • Beta changes the E-measure to give a greater
    penalty to either low precision or low recall
  • A beta of 1 gives equal penalty
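The slide does not give the formula, so this sketch assumes the standard van Rijsbergen form, E = 1 - (1 + β²)PR / (β²P + R), which matches the properties listed: 0 for identical sets, 1 for disjoint sets, and equal weighting at β = 1. It is an illustration, not necessarily the exact formula in Senthil Palanisamy's implementation; it assumes the actual set is non-empty and treats an empty overlap as E = 1.

```python
from typing import Set


def e_measure(predicted: Set[str], actual: Set[str], beta: float = 1.0) -> float:
    """E = 1 - (1 + beta^2) * P * R / (beta^2 * P + R), assuming a non-empty actual set."""
    overlap = len(predicted & actual)
    if overlap == 0:
        return 1.0  # no items in common (also covers an empty prediction)
    precision = overlap / len(predicted)
    recall = overlap / len(actual)
    b2 = beta ** 2
    return 1.0 - (1 + b2) * precision * recall / (b2 * precision + recall)
```

With β > 1 recall is weighted more heavily, so a prediction that misses items is penalized more; with β < 1 a prediction that includes extra items is penalized more.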