A Bayesian Statistical Approach to Modeling Gene Regulatory Pathways in Human Placental Data

1
A Bayesian Statistical Approach to Modeling Gene
Regulatory Pathways in Human Placental Data
  • Elinor Velasquez
  • Dept. of Biology
  • San Francisco State University

2
Outline of talk
  • Introduction
  • The experimental approach: Obtaining placenta
    data
  • The experimental approach: Modeling gene
    regulatory networks
  • Results from experiments
  • Conclusions and future work
  • Acknowledgements

3
Introduction

4
Overall goal
To use a bioinformatics model to better understand
the human placenta
http://www.biotechnologycenter.org/hio/assets/hisimages/placenta/placenta44.jpg
5
The human placenta
http://www.uchsc.edu/winnlab/index.html
6
The basal plate in the placenta
Site of known anatomical abnormalities in
preeclampsia
http://www.uchsc.edu/winnlab/projects.html
7
EGFR pathway
  • EGFR, cell surface receptor for epidermal growth
    factors
  • Potentially important gene for the placenta

British Journal of Cancer (2006) 94, 184-188
8
EGFR regulates gene expression

[Figure: EGFR, CSPG2, DCN, ANGPT2]
9
Causal relationships

[Figure: EGFR, CSPG2, DCN, ANGPT2]
10
Example of a gene regulatory network
[Figure: network with nodes Gene 1 to Gene 6]
11
Definition of a Bayesian network
  • There exist nodes (disks)
  • There are edges (arrows) between some of the
    nodes
  • Causality is implied by the edges
  • Acyclic
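
The acyclic requirement can be checked mechanically. The following is a minimal sketch, assuming the network is stored as a map from each node to its list of parents; the edges among the six genes are hypothetical and are not taken from the slide's figure. It relies on `graphlib` from the Python standard library (Python 3.9 or later).

```python
from graphlib import CycleError, TopologicalSorter


def is_acyclic(parents):
    """Return True if the node -> list-of-parents map contains no directed cycle."""
    try:
        # static_order() is lazy, so consume it to force the cycle check.
        tuple(TopologicalSorter(parents).static_order())
    except CycleError:
        return False
    return True


# Hypothetical edges among the six example genes (not taken from the slides).
example_network = {
    "Gene 1": [],
    "Gene 2": ["Gene 1"],
    "Gene 3": ["Gene 1"],
    "Gene 4": ["Gene 2", "Gene 3"],
    "Gene 5": ["Gene 3"],
    "Gene 6": ["Gene 4", "Gene 5"],
}

print(is_acyclic(example_network))
```

A map in which two nodes list each other as parents would fail this check and therefore would not qualify as a Bayesian network.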

[Figure: network with nodes Gene 1 to Gene 6]
12
The experimental approach: Obtaining placenta data

13
Data collected from microarrays
  • Data comes from 36 experiments conducted by
    Virginia Winn et al. at the SJ Fisher lab, UCSF
  • Gene expression profiling experiments

[Figure: cRNA hybridization; 45,000 dots (25-mer oligo probe sets) representing the human genome]
14

Traditional spotted arrays
15
What is a probe set?
  • Several oligonucleotides designed to hybridize to
    various parts of the mRNA generated from a single
    gene

[Figure: probe set, mRNA, gene]
16

Affymetrix GeneChips
17
Microarray data
  • The normalized log 2 intensity values were
    centered to the median value of each probe set,
    by Virginia Winn et al.
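
Median centering of one probe set can be sketched as below. This is only an illustration of the operation named on the slide, not the normalization pipeline that was actually run; the intensity values are made up.

```python
from statistics import median


def center_to_median(values):
    """Subtract the probe set's median from each of its log2 intensity values."""
    m = median(values)
    return [v - m for v in values]


# Made-up normalized log2 intensities for a single probe set.
log2_intensities = [7.0, 9.0, 8.0, 10.0, 8.5]
centered = center_to_median(log2_intensities)
print(centered)
```

After centering, positive values lie above the probe set's median and negative values lie below it.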

5 time segments, 36 data points per probe set:
  • Segment 1: x1 ... x6
  • Segment 2: y1 ... y9
  • Segment 3: z1 ... z6
  • Segment 4: w1 ... w6
  • Segment 5: s1 ... s9
18
Microarray data
  • Red denotes upregulated expression and green
    denotes downregulated expression relative to
    the median value
  • Genes differentially expressed in the basal plate
    of placentas. Each row contains data from a single
    basal plate cRNA sample and each column
    corresponds to a single probe set.


http://www.uchsc.edu/winnlab/index.html
19
Summary of data used in bioinformatics experiments
Gene egfr
  • 36 placentas
  • 45,000 probe sets
  • Time-series data from 14-16 weeks to term

20
The experimental approach: Modeling gene
regulatory networks

21
Outline of bioinformatics experimental design

[Figure: network over PS 1, PS 2, PS 3, PS 4]
  • Step 1. Create a naïve Bayesian network using the
    probe set data
  • Step 2. Score the naïve Bayesian network
  • Step 3. Randomly add/delete an edge and rescore
    the Bayesian network
  • Step 4. Continue until best score reached
  • Step 5. Combine probe sets to create the gene
    regulatory network

22
Four probe sets (Three genes)

23
Define naïve Bayesian network
  • Choose a root node
  • All other nodes branch off of the root node
  • PS1 is the parent node
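
A naïve network of this shape is easy to build in code. The sketch below assumes the network is stored as a map from each node to its list of parents; the function name `naive_network` is illustrative.

```python
def naive_network(root, nodes):
    """Parent map for a star-shaped network: `root` is the only parent of every other node."""
    return {node: ([] if node == root else [root]) for node in nodes}


probe_sets = ["PS1", "PS2", "PS3", "PS4"]
network = naive_network("PS1", probe_sets)
print(network)
```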

[Figure: network with PS 1 as the root and PS 2, PS 3, PS 4 as its children]
24
Step 1. Create a naïve Bayesian network using
probe set data
[Figure: naïve network over PS1, PS2, PS3, PS4]
  • Use data from one time segment
  • Choose Weeks 23-24 data (6 placentas)
  • Choose 4 probe sets

25
Placenta data for Weeks 23-24
  • PS1 corresponds to 201984, which corresponds to EGFR
  • PS2 corresponds to 236034 and PS3 corresponds to
    211148; PS2 and PS3 both correspond to ANGPT2
  • PS4 corresponds to 204620, which corresponds to CSPG2
26
Step 2. Score the naïve Bayesian network
  • We want to score this network

[Figure: naïve network over PS1, PS2, PS3, PS4]
27
The network score is a function of conditional
probabilities
  • Conditional probability, Prob(N | Pa(N)), is
    the probability of child node N given the parent
    of N
  • Example: Given that the parent node PS1 has an
    associated expression value of 10, what is the
    probability that its child node, PS4, has an
    expression value of 8?

[Figure: PS1 and PS4]
28
Conditional probability

  • EGFR (PS1) is the parent node and has value 10.
  • CSPG2 (PS4) is the child node and has value 8
    in two of the six samples.
  • Conditional probability: 2/6

[Figure: PS1 and PS4]
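
The counting behind a conditional probability such as 2/6 can be sketched as follows. The six samples are made up for illustration, and the sketch assumes discretized expression values in which the parent PS1 takes the value 10 in every sample, so that the denominator is 6.

```python
from fractions import Fraction


def conditional_probability(samples, child, child_value, parent, parent_value):
    """Estimate Prob(child = child_value | parent = parent_value) by counting."""
    with_parent = [s for s in samples if s[parent] == parent_value]
    if not with_parent:
        raise ValueError("parent value never observed in the samples")
    hits = sum(1 for s in with_parent if s[child] == child_value)
    return Fraction(hits, len(with_parent))


# Made-up discretized expression values for six placentas.
samples = [
    {"PS1": 10, "PS4": 8},
    {"PS1": 10, "PS4": 8},
    {"PS1": 10, "PS4": 7},
    {"PS1": 10, "PS4": 9},
    {"PS1": 10, "PS4": 6},
    {"PS1": 10, "PS4": 7},
]

p = conditional_probability(samples, "PS4", 8, "PS1", 10)
print(p)  # Fraction reduces 2/6 to 1/3
```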
29
Score for a Bayesian network
  • The score of the naive network equals the
    product of all the nonzero conditional
    probabilities associated with the network
  • $P(N_1, N_2, N_3, N_4) = \prod_{i=1}^{4} P(N_i \mid \mathrm{pa}(N_i))$
30
Score for the naïve Bayesian network
  • $P(N_1, N_2, N_3, N_4) = 1/39366 \approx 2.54 \times 10^{-5}$

[Figure: naïve network over PS1, PS2, PS3, PS4]
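
The score as a product of nonzero conditional probabilities can be sketched as below. The list of factors passed to `network_score` is hypothetical, since the slides do not give the individual conditional probabilities. The final lines only check the arithmetic: a score of about 2.54 x 10^-5 corresponds to the fraction 1/39366, and a score of 1/78732 is exactly half of it.

```python
from fractions import Fraction
from math import prod


def network_score(conditional_probabilities):
    """Multiply the nonzero conditional probabilities of a network."""
    nonzero = [p for p in conditional_probabilities if p != 0]
    return prod(nonzero, start=Fraction(1))


# Hypothetical factors; the zero entry is skipped, as the slide prescribes.
example = [Fraction(1, 2), Fraction(1, 3), Fraction(0), Fraction(1, 6)]
print(network_score(example))  # 1/36

# Arithmetic check of the two scores quoted for the four-probe-set example.
naive_score = Fraction(1, 39366)
modified_score = Fraction(1, 78732)
print(float(naive_score), float(modified_score))
```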
31
Step 3. Randomly add/delete an edge and rescore
the Bayesian network
  • The score becomes $1/78732 \approx 1.27 \times 10^{-5}$.

[Figure: modified network over PS1, PS2, PS3, PS4]
32
Step 4. Continue until best score reached
  • Since the score is a probability, we want the
    score to be high.
  • The naïve network is the better choice between
    the two networks, so we pick it as our final
    network.

[Figure: naïve network over PS1, PS2, PS3, PS4]
33
Step 5. Combine probe sets to create the gene
regulatory network

[Figure: gene regulatory network over EGFR, CSPG2, ANGPT2]
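
One way to combine probe sets into genes is sketched below. It uses the probe-set-to-gene assignments given for the Weeks 23-24 example and the edges of the naïve network. The merging rule (take the union of edges and drop edges between probe sets of the same gene) is an assumption, because the slides do not state how the combination was done.

```python
# Probe-set-to-gene assignments from the Weeks 23-24 example.
probe_to_gene = {"PS1": "EGFR", "PS2": "ANGPT2", "PS3": "ANGPT2", "PS4": "CSPG2"}

# Edges (parent, child) of the naïve network rooted at PS1.
probe_edges = [("PS1", "PS2"), ("PS1", "PS3"), ("PS1", "PS4")]


def collapse_to_genes(edges, mapping):
    """Map probe-set edges to gene edges, merging duplicates (assumed rule)."""
    gene_edges = set()
    for parent, child in edges:
        gene_parent, gene_child = mapping[parent], mapping[child]
        if gene_parent != gene_child:  # skip edges within a single gene
            gene_edges.add((gene_parent, gene_child))
    return sorted(gene_edges)


gene_network = collapse_to_genes(probe_edges, probe_to_gene)
print(gene_network)
```

With these inputs the two ANGPT2 probe sets collapse into one node, leaving EGFR as the parent of ANGPT2 and CSPG2.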
34
40 probe sets (26 genes)

35
Gene regulatory pathway for 26 genes

  • Step 1. Create a naïve Bayesian network using 40
    probe sets for each time segment
  • Step 2. Score the naïve Bayesian network
  • Step 3. Randomly add/delete an edge and rescore
    the Bayesian network
  • Step 4. Continue until best score reached
  • Step 5. Combine probe sets to create the gene
    regulatory network for the placenta
36
Step 1. Create a naïve Bayesian network using
40 probe sets for each time segment

37
Create a naïve Bayesian network

[Figure: naïve Bayesian network; visible node labels PS 1 to PS 9]
38
Step 2. Score the naïve Bayesian network

39
Score for a Bayesian network
  • The score of the naive network equals the
    product of all the nonzero conditional
    probabilities associated with the network
  • $P(N_1, \ldots, N_{40}) = \prod_{i=1}^{40} P(N_i \mid \mathrm{pa}(N_i))$
40
  • Step 3. Randomly add/delete an edge and rescore
    the Bayesian network
  • Step 4. Continue until best score reached

41
With four probe sets, at least two Bayesian
networks were constructed

[Figure: two networks, each over PS1, PS2, PS3, PS4]
42
Exhaustive search
  • To be certain that we have the best scoring
    network, we need to construct all possible
    networks from our naïve networks
  • With four probe sets, we constructed only one
    network other than the naïve network
  • How to construct all possible networks?

43
How do we construct all possible networks?
  • 1 probe set: 1 Bayesian network
  • 2 probe sets: 2 possible Bayesian networks
  • 3 probe sets: 12 possible Bayesian networks
  • 4 probe sets: 144 possible Bayesian networks
  • 5 probe sets: 4800 possible Bayesian networks!
  • 6 probe sets: ??
  • And so on
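
How fast the search space grows can be made concrete. The sketch below assumes that "all possible networks" means every directed acyclic graph on n labelled nodes, which may not be the family counted on the slide, so its figures need not match the ones listed above. Under that assumption the count follows Robinson's recurrence, a(n) = sum over k = 1..n of (-1)^(k+1) C(n, k) 2^(k(n-k)) a(n-k), with a(0) = 1.

```python
from functools import lru_cache
from math import comb


@lru_cache(maxsize=None)
def count_dags(n):
    """Number of directed acyclic graphs on n labelled nodes (Robinson's recurrence)."""
    if n == 0:
        return 1
    return sum(
        (-1) ** (k + 1) * comb(n, k) * 2 ** (k * (n - k)) * count_dags(n - k)
        for k in range(1, n + 1)
    )


counts = [count_dags(n) for n in range(1, 7)]
print(counts)  # [1, 3, 25, 543, 29281, 3781503]
```

Whatever family of networks is counted, the growth is steep enough that scoring every network for 40 probe sets is out of reach, which is the motivation for the heuristic search that follows.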

44
Welcome to Modern Heuristics
  • Step 1. Representation of a model
  • Step 2. The scoring function
  • Step 3. Defining the search problem
  • Step 4. Consider local optima

[Figure: score plotted against local change]
45
Step 1. Representation of the model
  • The model is a gene regulatory pathway.
  • We are going to assume a Bayesian model for our
    probe sets
  • The number of possible pathways is so large as to
    forbid an exhaustive search for the best Bayesian
    network.

[Figure: network over PS 1, PS 2, PS 3, PS 4]
46
Step 2. The scoring function
  • The fair coin: $p(X = \text{heads}) = 1/2$
  • What happens if the coin is unfairly weighted?
  • We need to re-think probability
  • $p(X) = \int p(x)\, r(x)\, dx$
  • $r(x)$ is a weight function.


47
Step 2. The scoring function
  • The scoring function is a probability
  • Assume the network has a Dirichlet distribution
    which is the weight function used to weight the
    conditional probabilities.
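
For reference, the standard Dirichlet density over a probability vector $(\theta_1, \ldots, \theta_K)$ with parameters $\alpha_k > 0$ is given below. The slides do not spell out how the parameters are chosen or how the vector is tied to each node, so the symbols $K$, $\theta_k$ and $\alpha_k$ are generic notation rather than the presentation's own.

```latex
f(\theta_1, \ldots, \theta_K; \alpha_1, \ldots, \alpha_K)
  = \frac{1}{B(\alpha)} \prod_{k=1}^{K} \theta_k^{\alpha_k - 1},
\qquad
B(\alpha) = \frac{\prod_{k=1}^{K} \Gamma(\alpha_k)}
                 {\Gamma\!\left(\sum_{k=1}^{K} \alpha_k\right)},
\qquad
\theta_k \ge 0, \quad \sum_{k=1}^{K} \theta_k = 1 .
```

Up to the normalizing constant $B(\alpha)$, the density is a product of powers $\theta_k^{\alpha_k - 1}$, which is the form a weight on conditional probabilities takes.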

www.wikipedia.com
48
Step 2. The scoring function
  • Probability of a fixed network equals product
    of conditional probabilities times the Dirichlet
    distribution

$P(N) = \prod_{i=1}^{40} P(N_i \mid \mathrm{pa}(N_i))\, D(N_i)$
such that
$D(N_i) \propto \prod_{k} \theta_k^{\alpha_k - 1}(N_i)$
49
Step 3. Defining the search problem
  • What it means to search
  • a. Construct a first network (Use a naïve
    Bayesian network)
  • b. Score the first network using the scoring
    function
  • c. Perform the Hill-climbing algorithm.

50
Step 3. Defining the search problem: The
Hill-climbing Algorithm
  • Randomly choose a node
  • Search in the neighborhood of that node for the
    best scoring network

51
Step 4. Consider local optima
  • Hill climbing is a traditional search technique
  • It can get caught on local maxima
  • Step 4 is to keep choosing random nodes.
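
The search loop can be sketched as follows: pick a random node, try every single-edge change that touches it, and keep the best acyclic result if it improves the score. This is a minimal illustration and not the implementation in Weka or the other packages. The scoring function `toy_score` and its `TARGET` edge set are hypothetical stand-ins for the Bayesian score, chosen only so that the example runs.

```python
import random


def has_cycle(nodes, edges):
    """Depth-first check for a directed cycle among (parent, child) edges."""
    children = {n: [c for p, c in edges if p == n] for n in nodes}
    state = {n: 0 for n in nodes}  # 0 = unvisited, 1 = in progress, 2 = done

    def visit(n):
        if state[n] == 1:
            return True
        if state[n] == 2:
            return False
        state[n] = 1
        if any(visit(c) for c in children[n]):
            return True
        state[n] = 2
        return False

    return any(visit(n) for n in nodes)


def hill_climb(nodes, start_edges, score, rounds=100, seed=0):
    """Repeatedly choose a random node and move to its best-scoring neighbour."""
    rng = random.Random(seed)
    current = frozenset(start_edges)
    current_score = score(current)
    for _ in range(rounds):
        node = rng.choice(nodes)
        others = [n for n in nodes if n != node]
        moves = [(node, o) for o in others] + [(o, node) for o in others]
        # Toggle one edge: delete it if present, add it if absent.
        neighbours = [current ^ {edge} for edge in moves]
        valid = [g for g in neighbours if not has_cycle(nodes, g)]
        if not valid:
            continue
        best = max(valid, key=score)
        best_score = score(best)
        if best_score > current_score:
            current, current_score = best, best_score
    return current, current_score


# Hypothetical scoring: negative count of edges that differ from a made-up target.
TARGET = frozenset({("PS1", "PS2"), ("PS2", "PS3"), ("PS1", "PS4")})


def toy_score(edges):
    return -len(frozenset(edges) ^ TARGET)


nodes = ["PS1", "PS2", "PS3", "PS4"]
naive_edges = frozenset(("PS1", other) for other in nodes[1:])
final_edges, final_score = hill_climb(nodes, naive_edges, toy_score)
print(sorted(final_edges), final_score)
```

Because only improving moves are accepted, the search can stall on a local maximum of a real score. Choosing a new random node in every round is what gives it a chance to find an improving move elsewhere in the network.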

[Figure: score plotted against local change; the randomly chosen node is the origin. From http://content.answers.com/]
52
Software
  • Weka software package written by members of the
    University of Waikato, New Zealand,
    http://www.cs.waikato.ac.nz/ml/people.html
  • DEAL, R package, written by Susanne G. Bøttcher and
    Claus Dethlefsen, http://www.math.auc.dk/novo/deal
  • BayesNet Toolbox, Matlab package, written by
    Kevin Murphy,
    http://www.cs.ubc.ca/murphyk/Software/BNT/bnt.html
  • ExpressionNet, written by Jingchun Zhu,
    http://expressionnet.sourceforge.net/

53
Results from experiments

54
26 genes

55

[Figure: Ingenuity network. Node labels: COL5A2, COL5A1, COL3A1, DCN, COL1A2, CSPG2, SPP1, INHBA, ANGPT2, BAMBI, SFRP1, P4HA1, IGFBP1, RAP2B, PLAU, CCNG2, MRC2, GLB1, ATP5E, ADAM9, EGFR, ERG, USP6NL, PECAM1, IL2RB, CECAM1, CYP19A1]
56
Results for 26 genes
  • 40 probe sets (26 genes)
  • Data comes from five different time intervals
  • 1. 14-16 gestational weeks
  • 2. 18-19 gestational weeks
  • 3. 21 gestational weeks
  • 4. 23-24 gestational weeks
  • 5. 37-40 gestational weeks

57

[Figure: network for time segment 14-16 weeks, drawn over the same gene labels as on the Ingenuity network slide]
58

[Figure: network for time segment 18-19 weeks, drawn over the same gene labels as on the Ingenuity network slide]
59

[Figure: network for time segment 21 weeks, drawn over the same gene labels as on the Ingenuity network slide]
60

[Figure: network for time segment 23-24 weeks, drawn over the same gene labels as on the Ingenuity network slide]
61

[Figure: network for time segment 37-40 weeks, drawn over the same gene labels as on the Ingenuity network slide]
62
How to display data
  • One of the most pressing questions in
    bioinformatics research is how to display the
    data effectively
  • We have two solutions
  • 1. An interaction map
  • 2. Geometrical considerations

63
An interaction map for 26 genes
64
Geometrical considerations
  • Will illustrate with the gene egfr
  • egfr encodes the epidermal growth factor receptor
  • Functions on the cell surface
  • Activated by binding of its specific ligands
  • Responsible for many pathways in animal models

65
Gene egfr regulated by

66
Genes on a dodecahedron: Gene regulatory network
for egfr
[Figure: dodecahedron with gene labels CSPG2, CCNG2, COL1A2, PLAU, INHBA, DCN. On backside: PECAM1, ANGPT2, IGFBP1, MRC2, SPP1, USP6NL]
Adapted from http://www.math.cornell.edu/mec/2003-2004/geometry/platonic/dodecahedron.jpg
67
Conclusions
  • We can predict gene regulatory networks using
    Bayesian networks as an intermediate step
  • When we leave the arrows in the network, we are
    able to show causal relationships between the genes
  • Interaction maps and use of geometry are novel
    ways to display gene behavior

68
Future Directions
  • A three-dimensional viewer with numerical values
    will be implemented to use with the Weka software
  • Use molecular genetics techniques to validate a
    portion of the results
  • Design a genetic programming algorithm
    (evolutionary algorithm) to create a Bayesian
    network

69
Acknowledgements
  • San Francisco State University
  • Leticia Márquez-Magaña, Chris Smith, Frank
    Bayliss, Juan Castellon, Ernesto Flores, Rebecca
    Garcia, Alba Gutierrez, Jainee Lewis, Rebecca
    Mendez, Cylyn Cruz, Jasmin Reyes, Jackie
    Robinson, Peter Thorsen, My family
  • UC San Francisco
  • Susan Fisher, Matthew Gormley
  • M.B.R.S.-R.I.S.E. Grant 5 - R25-GM59298
