A Bayesian Statistical Approach to Modeling Gene Regulatory Pathways in Human Placental Data - PowerPoint PPT Presentation

Provided by: elinorve

1
A Bayesian Statistical Approach to Modeling Gene
Regulatory Pathways in Human Placental Data

Elinor Velasquez
Dept. of Biology
San Francisco State University

2
Outline of talk

Introduction
The experimental approach: obtaining placenta data
The experimental approach: modeling gene regulatory networks
Results from experiments
Conclusions and future work
Acknowledgements

3
Introduction

4
Overall goal
To use a bioinformatics model to better
understand the human placenta
http://www.biotechnologycenter.org/hio/assets/hisimages/placenta/placenta44.jpg
5
The human placenta
http://www.uchsc.edu/winnlab/index.html
6
The basal plate in the placenta
Site of known anatomical abnormalities in
preeclampsia
http://www.uchsc.edu/winnlab/projects.html
7
EGFR pathway

EGFR: cell-surface receptor for epidermal growth
factors
Potentially important gene for the placenta

British Journal of Cancer (2006) 94, 184-188
8
EGFR regulates gene expression

EGFR
CSPG2
DCN
ANGPT2
9
Causal relationships

EGFR
CSPG2
DCN
ANGPT2
10
Example of a gene regulatory network
[Figure: network diagram with nodes Gene 1 through Gene 6]
11
Definition of a Bayesian network

There are nodes (disks)
There are edges (arrows) between some of the
nodes
Causality is implied by the edges
The network is acyclic: following the arrows never
leads back to the starting node

[Figure: the same network of Gene 1 through Gene 6]
12
The experimental approach: obtaining placenta data

13
Data collected from microarrays

Data come from 36 experiments conducted by
Virginia Winn et al. at the SJ Fisher lab, UCSF
Gene expression profiling experiments

cRNA hybridization
45,000 dots (25-mer oligo probe sets) representing
the human genome
14

Traditional spotted arrays
15
What is a probe set?

Several oligonucleotides designed to hybridize to
various parts of the mRNA generated from a single
gene

[Figure: probe set, mRNA, and gene]
16

Affymetrix GeneChips
17
Microarray data

Virginia Winn et al. centered the normalized log2
intensity values on the median value of each
probe set.

Each probe set has 36 data points across 5 time segments:
Segment 1: x1 ... x6
Segment 2: y1 ... y9
Segment 3: z1 ... z6
Segment 4: w1 ... w6
Segment 5: s1 ... s9
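The median centering and the 36-point layout above can be sketched in Python. This is a minimal illustration, not the preprocessing code used by Winn et al.: it assumes one probe set's normalized log2 intensities arrive as a list in time order, reads the segment sizes (6, 9, 6, 6, 9) off the x, y, z, w, s labels, and uses hypothetical function names and made-up values.

```python
from statistics import median

# Segment sizes read off the slide: x1..x6, y1..y9, z1..z6, w1..w6, s1..s9.
SEGMENT_SIZES = (6, 9, 6, 6, 9)


def median_center(log2_values):
    """Subtract the median: positive = above median (red), negative = below (green)."""
    m = median(log2_values)
    return [v - m for v in log2_values]


def split_segments(values, sizes=SEGMENT_SIZES):
    """Split one probe set's 36 values into its 5 time segments, in order."""
    if len(values) != sum(sizes):
        raise ValueError(f"expected {sum(sizes)} values, got {len(values)}")
    segments, start = [], 0
    for size in sizes:
        segments.append(values[start:start + size])
        start += size
    return segments


# Made-up log2 intensities for one probe set (not real data).
raw = [float(i % 7) for i in range(36)]
segments = split_segments(median_center(raw))
print([len(s) for s in segments])  # [6, 9, 6, 6, 9]
```

Centering first and splitting second keeps the median taken over all 36 placentas, matching the description of centering per probe set rather than per time segment.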
18
Microarray data

Red denotes up-regulated and green denotes
down-regulated expression relative to the
median value.
Genes differentially expressed in the basal plate
of placentas: each row contains data from a single
basal plate cRNA sample, and each column corresponds
to a single probe set.

http://www.uchsc.edu/winnlab/index.html
19
Summary of data used in bioinformatics experiments
Gene: EGFR

36 placentas
45,000 probe sets
Time-series data from 14-16 weeks to term

20
The experimental approach: modeling gene
regulatory networks

21
Outline of bioinformatics experimental design

[Figure: network of probe sets PS 1 to PS 4]

Step 1. Create a naïve Bayesian network using the
probe set data
Step 2. Score the naïve Bayesian network
Step 3. Randomly add/delete an edge and rescore
the Bayesian network
Step 4. Continue until best score reached
Step 5. Combine probe sets to create the gene
regulatory network

22
Four probe sets (Three genes)

23
Define naïve Bayesian network

Choose a root node
All other nodes branch off the root node
PS1 is the parent node

[Figure: naïve network with PS 1 as the root and PS 2, PS 3, PS 4 as its children]
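A naïve network is simple to build programmatically. The sketch below assumes a network is stored as a list of (parent, child) edges; `naive_network` is a hypothetical helper, not part of any package named in this talk.

```python
def naive_network(nodes, root):
    """Return the edges of a naive network: the root is the parent of every other node."""
    if root not in nodes:
        raise ValueError(f"root {root!r} is not one of the nodes")
    return [(root, child) for child in nodes if child != root]


edges = naive_network(["PS1", "PS2", "PS3", "PS4"], root="PS1")
print(edges)  # [('PS1', 'PS2'), ('PS1', 'PS3'), ('PS1', 'PS4')]
```

With n probe sets the naïve network always has n - 1 edges, so the 40-probe-set version used later has 39.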
24
Step 1. Create a naïve Bayesian network using
probe set data
[Figure: naïve network over PS1 to PS4]

Use data from one time segment
Choose Weeks 23-24 data (6 placentas)
Choose 4 probe sets

25
Placenta data for Weeks 23-24
PS1 corresponds to probe set 201984 (EGFR).
PS2 corresponds to probe set 236034 and PS3 to
probe set 211148; both correspond to ANGPT2.
PS4 corresponds to probe set 204620 (CSPG2).
26
Step 2. Score the naïve Bayesian network

We want to score this network

[Figure: naïve network with PS1 as the root of PS2, PS3, and PS4]
27
The network score is a function of conditional
probabilities

Conditional probability, Prob(N | Pa(N)), is
the probability of child node N given its parent Pa(N).
Example: given that the parent node PS1 has an
expression value of 10, what is the
probability that its child node, PS4, has an
expression value of 8?

[Figure: PS1 → PS4]
28
Conditional probability

EGFR (PS1) is the parent node and
has value 10.
CSPG2 (PS4) is the child node and has
value 8 in two of the six placentas.
Conditional probability = 2/6 = 1/3

[Figure: PS1 → PS4]
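The counting behind a conditional probability can be sketched as follows. It assumes the expression values have already been discretized, so that exact matches such as "value 10" make sense, and it estimates P(child | parent) as the fraction of samples with the given parent value that also have the given child value. The six values below are hypothetical; the parent is set to 10 in every placenta so that the result reproduces the slide's 2/6.

```python
from fractions import Fraction


def conditional_probability(parent_values, child_values, parent_value, child_value):
    """Estimate P(child = child_value | parent = parent_value) by counting samples."""
    children_given_parent = [
        c for p, c in zip(parent_values, child_values) if p == parent_value
    ]
    if not children_given_parent:
        return Fraction(0)  # parent value never observed: no evidence
    hits = sum(1 for c in children_given_parent if c == child_value)
    return Fraction(hits, len(children_given_parent))


# Hypothetical discretized values for six placentas (not the real data).
egfr = [10, 10, 10, 10, 10, 10]  # parent, PS1
cspg2 = [8, 7, 8, 9, 6, 7]       # child, PS4
print(conditional_probability(egfr, cspg2, 10, 8))  # 1/3
```

Using `Fraction` keeps the arithmetic exact, so 2/6 is reported in lowest terms rather than as a rounded decimal.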
29
Score for a Bayesian network

The score of the naïve network equals the
product of all the nonzero conditional
probabilities associated with the network:

$$P(N_1, N_2, N_3, N_4) = \prod_{i=1}^{4} P(N_i \mid \mathrm{Pa}(N_i))$$
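One common way to turn this product into a number is sketched below, under assumptions: discretized data, conditional probabilities estimated by counting, and the product taken over every sample as well as every node. This is our reading of the formula, not necessarily the exact calculation in the talk. The sketch sums logarithms because a product of many probabilities quickly becomes too small to represent accurately; `log_score` is a hypothetical name.

```python
import math
from collections import Counter


def log_score(data, parents):
    """Sum over nodes and samples of log P(x_i | pa(x_i)).

    data:    list of samples, each a dict mapping node -> discretized value.
    parents: dict mapping node -> tuple of its parents (empty for the root).
    Probabilities are estimated by counting, so every factor used is nonzero.
    """
    total = 0.0
    for node, pa in parents.items():
        # How often each (parent values, node value) combination occurs.
        joint = Counter((tuple(s[p] for p in pa), s[node]) for s in data)
        # How often each parent configuration occurs.
        marginal = Counter(tuple(s[p] for p in pa) for s in data)
        for (pa_values, _), n in joint.items():
            # Each of the n matching samples contributes log(n / marginal).
            total += n * math.log(n / marginal[pa_values])
    return total


# Hypothetical discretized data: four samples, two probe sets.
samples = [{"PS1": 0, "PS4": 0}, {"PS1": 0, "PS4": 0},
           {"PS1": 1, "PS4": 1}, {"PS1": 1, "PS4": 1}]
with_edge = {"PS1": (), "PS4": ("PS1",)}
without_edge = {"PS1": (), "PS4": ()}
print(round(math.exp(log_score(samples, with_edge)), 4))     # 0.0625
print(round(math.exp(log_score(samples, without_edge)), 4))  # 0.0039
```

On this toy data, keeping the edge PS1 → PS4 gives the higher score because PS4 is fully predictable from PS1.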
30
Score for the naïve Bayesian network

$P(N_1, N_2, N_3, N_4) = 1/39366 \approx 2.54 \times 10^{-5}$

[Figure: naïve network with PS1 as the root of PS2, PS3, and PS4]
31
Step 3. Randomly add/delete an edge and rescore
the Bayesian network

[Figure: modified network over PS1 to PS4]

The score becomes $1/78732 \approx 1.27 \times 10^{-5}$.
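The random move in Step 3 can be sketched as below, assuming the network is a set of (parent, child) edges. Before adding an edge, the sketch checks that it would not create a directed cycle, since a Bayesian network must stay acyclic. The equal chance of adding or deleting is an arbitrary choice made for illustration, and the function names are hypothetical.

```python
import random


def creates_cycle(edges, new_edge):
    """Return True if adding new_edge = (u, v) would close a directed cycle."""
    u, v = new_edge
    # Adding u -> v closes a cycle exactly when u is already reachable from v.
    stack, seen = [v], set()
    while stack:
        node = stack.pop()
        if node == u:
            return True
        if node not in seen:
            seen.add(node)
            stack.extend(child for parent, child in edges if parent == node)
    return False


def random_move(nodes, edges, rng):
    """Return a new edge set with one random edge added or deleted."""
    edges = set(edges)
    addable = [
        (a, b)
        for a in nodes
        for b in nodes
        if a != b and (a, b) not in edges and not creates_cycle(edges, (a, b))
    ]
    # Delete if nothing can be added; add if there are no edges;
    # otherwise choose add or delete with equal probability.
    if edges and (not addable or rng.random() < 0.5):
        edges.remove(rng.choice(sorted(edges)))
    elif addable:
        edges.add(rng.choice(addable))
    return edges


rng = random.Random(1)
naive = {("PS1", "PS2"), ("PS1", "PS3"), ("PS1", "PS4")}
print(sorted(random_move(["PS1", "PS2", "PS3", "PS4"], naive, rng)))
```

After each move the new network would be rescored and compared with the previous one.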
32
Step 4. Continue until best score reached

Because the score is a probability, a higher
score is better.
The naïve network scores higher than the modified
network ($2.54 \times 10^{-5}$ versus $1.27 \times 10^{-5}$),
so we pick it as our final network.

[Figure: the naïve network over PS1 to PS4]
33
Step 5. Combine probe sets to create the gene
regulatory network

[Figure: gene regulatory network with EGFR, CSPG2, and ANGPT2]
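Step 5 can be sketched as a relabeling of edges using the probe-set-to-gene mapping given for the Weeks 23-24 data (PS1 is EGFR, PS2 and PS3 are ANGPT2, PS4 is CSPG2). The merging rule here, dropping duplicate gene edges and edges between two probe sets of the same gene, is an assumption about how probe sets are combined; the names are hypothetical.

```python
PROBE_SET_TO_GENE = {
    "PS1": "EGFR",    # probe set 201984
    "PS2": "ANGPT2",  # probe set 236034
    "PS3": "ANGPT2",  # probe set 211148
    "PS4": "CSPG2",   # probe set 204620
}


def combine_probe_sets(edges, probe_to_gene):
    """Map probe-set edges to gene edges, dropping duplicates and self-loops."""
    gene_edges = set()
    for parent, child in edges:
        g_parent, g_child = probe_to_gene[parent], probe_to_gene[child]
        if g_parent != g_child:  # two probe sets of one gene give no gene edge
            gene_edges.add((g_parent, g_child))
    return sorted(gene_edges)


naive = [("PS1", "PS2"), ("PS1", "PS3"), ("PS1", "PS4")]
print(combine_probe_sets(naive, PROBE_SET_TO_GENE))
# [('EGFR', 'ANGPT2'), ('EGFR', 'CSPG2')]
```

Merging can in principle produce a cycle between genes even when the probe-set network is acyclic; this sketch does not check for that.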
34
40 probe sets (26 genes)

35
Gene regulatory pathway for 26 genes

Step 1. Create a naïve Bayesian network using 40
probe sets for each time segment
Step 2. Score the naïve Bayesian network
Step 3. Randomly add/delete an edge and rescore
the Bayesian network
Step 4. Continue until best score reached
Step 5. Combine probe sets to create the
gene regulatory network for the placenta
36
Step 1. Create a naïve Bayesian network using
40 probe sets for each time segment

37
Create a naïve Bayesian network

[Figure: naïve network, showing probe sets PS 1 through PS 9]
38
Step 2. Score the naïve Bayesian network

39
Score for a Bayesian network

The score of the naïve network equals the
product of all the nonzero conditional
probabilities associated with the network:

$$P(N_1, \ldots, N_{40}) = \prod_{i=1}^{40} P(N_i \mid \mathrm{Pa}(N_i))$$
40

Step 3. Randomly add/delete an edge and rescore
the Bayesian network
Step 4. Continue until best score reached

41
With four probe sets, at least two Bayesian
networks were constructed

[Figure: two Bayesian networks over PS1 to PS4]
42
Exhaustive search

To be certain that we have the best-scoring
network, we need to construct all possible
networks from our naïve networks
With four probe sets, we constructed only one
network other than the naïve network
How do we construct all possible networks?

43
How do we construct all possible networks?

1 probe set: 1 Bayesian network
2 probe sets: 2 possible Bayesian networks
3 probe sets: 12 possible Bayesian networks
4 probe sets: 144 possible Bayesian networks
5 probe sets: 4800 possible Bayesian networks!
6 probe sets: ??
And so on
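For comparison, the number of labeled directed acyclic graphs on n nodes, that is, every possible Bayesian network structure with no further restriction, follows Robinson's recurrence. These totals are larger than the figures above, which may count a more restricted family of networks; either way the growth is far too fast for exhaustive search.

```python
from functools import lru_cache
from math import comb


@lru_cache(maxsize=None)
def count_dags(n):
    """Number of labeled DAGs on n nodes, via Robinson's recurrence."""
    if n == 0:
        return 1
    return sum(
        (-1) ** (k + 1) * comb(n, k) * 2 ** (k * (n - k)) * count_dags(n - k)
        for k in range(1, n + 1)
    )


for n in range(1, 7):
    print(n, count_dags(n))
# 1 1
# 2 3
# 3 25
# 4 543
# 5 29281
# 6 3781503
```

The term with index k chooses k nodes that have no parents, which is why the recurrence alternates in sign (inclusion-exclusion over those sets).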

44
Welcome to Modern Heuristics

Step 1. Representation of a model
Step 2. The scoring function
Step 3. Defining the search problem
Step 4. Consider local optima

[Figure: score plotted against local change]
45
Step 1. Representation of the model

The model is a gene regulatory pathway.
We are going to assume a Bayesian model for our
probe sets.
The number of possible pathways is so large as to
forbid an exhaustive search for the best Bayesian
network.

[Figure: network of probe sets PS 1 to PS 4]
46
Step 2. The scoring function

For a fair coin, p(X = heads) = ½.
What happens if the coin is unfairly weighted?
We need to rethink probability by averaging over
the coin's unknown bias θ:

$$p(X) = \int p(X \mid \theta)\, r(\theta)\, d\theta$$

r(θ) is a weight function.
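To make the weighted coin concrete, suppose the unknown probability of heads θ is weighted by a Beta(a, b) density r(θ). Then the probability of heads is the integral of θ r(θ) over [0, 1], which equals a / (a + b). The Beta weight and the parameter values are illustrative assumptions; the sketch checks the integral numerically with the midpoint rule.

```python
import math


def beta_pdf(theta, a, b):
    """Beta(a, b) density, used here as the weight function r(theta)."""
    norm = math.gamma(a + b) / (math.gamma(a) * math.gamma(b))
    return norm * theta ** (a - 1) * (1 - theta) ** (b - 1)


def prob_heads(a, b, steps=100_000):
    """Midpoint-rule approximation of the integral of theta * r(theta) on [0, 1]."""
    h = 1.0 / steps
    return sum(
        (i + 0.5) * h * beta_pdf((i + 0.5) * h, a, b) * h for i in range(steps)
    )


print(round(prob_heads(1, 1), 4))  # 0.5  (uniform weight: fair on average)
print(round(prob_heads(8, 2), 4))  # 0.8  (weight favoring heads)
```

The Beta family is the two-outcome special case of the Dirichlet distribution, which is the weight function used for the network score.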

47
Step 2. The scoring function

The scoring function is a probability
Assume a Dirichlet distribution as the weight
function used to weight the network's
conditional probabilities.

www.wikipedia.com
48
Step 2. The scoring function

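The probability of a fixed network equals the product
of conditional probabilities times the Dirichlet
distribution:

$$P(N) = \prod_{i=1}^{40} P(N_i \mid \mathrm{Pa}(N_i))\, D(N_i)$$

such that

$$D(N_i) \propto \prod_{k} \theta_{ik}^{\alpha_{ik} - 1}$$

where the θ_ik are the conditional probabilities of node N_i and the α_ik are the Dirichlet parameters.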
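Integrating a node's conditional probabilities against a Dirichlet weight has a closed form, the Bayesian-Dirichlet marginal likelihood used by scores such as K2 and BDeu. The sketch below computes its logarithm for a single node; it is the standard textbook form, offered as an assumption about what the slide's formula computes rather than as the talk's implementation. Here `counts[j][k]` is the number of samples in which the node is in state k while its parents are in configuration j, and `alpha[j][k]` is the matching Dirichlet parameter. Summing the result over all 40 nodes gives a log score for the whole network, assuming every structure is equally likely a priori.

```python
from math import exp, lgamma


def log_bd_node(counts, alpha):
    """Log Bayesian-Dirichlet marginal likelihood for one node.

    counts[j][k]: samples with parent configuration j and node state k.
    alpha[j][k]:  Dirichlet parameter for the same cell (all > 0).
    """
    total = 0.0
    for n_j, a_j in zip(counts, alpha):
        total += lgamma(sum(a_j)) - lgamma(sum(a_j) + sum(n_j))
        for n_jk, a_jk in zip(n_j, a_j):
            total += lgamma(a_jk + n_jk) - lgamma(a_jk)
    return total


# One root node with two states and a uniform Dirichlet(1, 1) weight.
print(round(exp(log_bd_node([[1, 0]], [[1, 1]])), 6))  # 0.5
print(round(exp(log_bd_node([[2, 0]], [[1, 1]])), 6))  # 0.333333
```

With a uniform weight, one observed head gets probability 1/2 and two heads in a row get 1/3, the same values the weighted-coin integral gives for a uniform weight.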
49
Step 3. Defining the search problem

What it means to search:
a. Construct a first network (use a naïve
Bayesian network)
b. Score the first network using the scoring
function
c. Perform the hill-climbing algorithm.

50
Step 3. Defining the search problem: the
hill-climbing algorithm

Randomly choose a node
Search in the neighborhood of that node for the
best-scoring network

51
Step 4. Consider local optima

Hill climbing is a traditional local search
technique
It can get stuck at local maxima
Step 4 is to keep choosing random nodes

[Figure: score plotted against local change, with the randomly chosen node as the origin. From http://content.answers.com/]
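Steps 3 and 4 can be sketched together. Under our reading of the slides, each iteration picks a random node, considers every network that differs by adding or deleting one edge touching that node (keeping the network acyclic), and moves to the best one if it improves the score; the climb stops after `max_idle` consecutive picks without improvement. Rerunning the climb with different random node orders and keeping the best result is one common safeguard against local maxima. All names are hypothetical, and `toy_score` is a stand-in that simply rewards agreement with a made-up target network; a real run would plug in a network score such as the Dirichlet-weighted probability.

```python
import random


def has_path(edges, start, goal):
    """Return True if goal is reachable from start along directed edges."""
    stack, seen = [start], set()
    while stack:
        node = stack.pop()
        if node == goal:
            return True
        if node not in seen:
            seen.add(node)
            stack.extend(child for parent, child in edges if parent == node)
    return False


def neighbors(nodes, edges, node):
    """Yield networks one edge addition or deletion away, on edges touching node."""
    for other in nodes:
        if other == node:
            continue
        for edge in ((node, other), (other, node)):
            if edge in edges:
                yield edges - {edge}
            elif not has_path(edges, edge[1], edge[0]):  # stay acyclic
                yield edges | {edge}


def hill_climb(nodes, start, score, rng, max_idle=50):
    """Pick random nodes; move to the best neighbor while the score improves."""
    best = frozenset(start)
    best_score, idle = score(best), 0
    while idle < max_idle:
        node = rng.choice(nodes)
        candidate = max(neighbors(nodes, best, node), key=score, default=None)
        if candidate is not None and score(candidate) > best_score:
            best, best_score, idle = candidate, score(candidate), 0
        else:
            idle += 1
    return best, best_score


def hill_climb_restarts(nodes, start, score, restarts=10, seed=0):
    """Repeat the climb with different random node orders; keep the best run."""
    rng = random.Random(seed)
    runs = [hill_climb(nodes, start, score, rng) for _ in range(restarts)]
    return max(runs, key=lambda run: run[1])


# Toy score for illustration only: agreement with a hypothetical target network.
TARGET = frozenset({("PS1", "PS2"), ("PS1", "PS3"), ("PS1", "PS4")})


def toy_score(edges):
    return -len(set(edges) ^ TARGET)


nodes = ["PS1", "PS2", "PS3", "PS4"]
best, best_score = hill_climb_restarts(nodes, frozenset(), toy_score)
print(sorted(best), best_score)
# [('PS1', 'PS2'), ('PS1', 'PS3'), ('PS1', 'PS4')] 0
```

Restricting each step to edges around one node keeps the neighborhood small (at most 2(n - 1) candidates) at the cost of more iterations.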
52
Software

Weka software package written by members of the
University of Waikato, New Zealand,
http//www.cs.waikato.ac.nz/ml/people.html
DEAL, R package, written by Susanne G. Bøttcher,
Claus Dethlefsen, http//www.math.auc.dk/novo/deal
BayesNet Toolbox, Matlab package, written by
Kevin Murphy, http//www.cs.ubc.ca/murphyk/Softwa
re/BNT/bnt.html
ExpressionNet, written by Jingchun Zhu,
http//expressionnet.sourceforge.net/

53
Results from experiments

54
26 genes

55

[Figure: Ingenuity network of the genes COL5A2, COL5A1, COL3A1, DCN, COL1A2, CSPG2, SPP1, INHBA, ANGPT2, BAMBI, SFRP1, P4HA1, IGFBP1, RAP2B, PLAU, CCNG2, MRC2, GLB1, ATP5E, ADAM9, EGFR, ERG, USP6NL, PECAM1, IL2RB, CECAM1, CYP19A1]
56
Results for 26 genes

40 probe sets (26 genes)
The data come from five time intervals:
1. 14-16 gestational weeks
2. 18-19 gestational weeks
3. 21 gestational weeks
4. 23-24 gestational weeks
5. 37-40 gestational weeks

57

[Figure: gene regulatory network of the 26 genes, time segment 14-16 weeks]
58

[Figure: gene regulatory network of the 26 genes, time segment 18-19 weeks]
59

[Figure: gene regulatory network of the 26 genes, time segment 21 weeks]
60

[Figure: gene regulatory network of the 26 genes, time segment 23-24 weeks]
61

[Figure: gene regulatory network of the 26 genes, time segment 37-40 weeks]
62
How to display data

One of the most pressing questions in
bioinformatics research is how to display the
data effectively
We propose two solutions:
1. An interaction map
2. Geometrical considerations

63
An interaction map for 26 genes
64
Geometrical considerations

We illustrate with the gene EGFR
EGFR is the epidermal growth factor receptor
Functions on the cell surface
Activated by binding of its specific ligands
Involved in many pathways in animal models

65
Gene egfr regulated by

66
Genes on a dodecahedron: gene regulatory network
for EGFR

[Figure: genes of the EGFR network placed on a dodecahedron. Shown: CSPG2, CCNG2, COL1A2, PLAU, INHBA, DCN. On the back side: PECAM1, ANGPT2, IGFBP1, MRC2, SPP1, USP6NL. Adapted from http://www.math.cornell.edu/mec/2003-2004/geometry/platonic/dodecahedron.jpg]
67
Conclusions

We can predict gene regulatory networks using
Bayesian networks as an intermediate step
When we keep the arrows (edge directions) in the
network, we can show putative causal relationships
between the genes
Interaction maps and geometric layouts are novel
ways to display gene behavior

68
Future Directions

A three-dimensional viewer with numerical values
will be implemented for use with the Weka software
Use molecular genetics techniques to validate a
portion of the results
Design a genetic programming algorithm
(evolutionary algorithm) to create a Bayesian
network

69
Acknowledgements

San Francisco State University
Leticia Márquez-Magaña, Chris Smith, Frank
Bayliss, Juan Castellon, Ernesto Flores, Rebecca
Garcia, Alba Gutierrez, Jainee Lewis, Rebecca
Mendez, Cylyn Cruz, Jasmin Reyes, Jackie
Robinson, Peter Thorsen, My family
UC San Francisco
Susan Fisher, Matthew Gormley
M.B.R.S.-R.I.S.E. Grant 5 - R25-GM59298
