Title: Explore Biological Pathways from Noisy Array Data by Directed Acyclic Boolean Networks
1Explore Biological Pathways from Noisy Array Data
by Directed Acyclic Boolean Networks
- Lei M. Li
- University of Southern California
- Email lilei_at_hto.usc.edu
- Henry Horng-Shing Lu
- National Chiao Tung University
- Email hslu_at_stat.nctu.edu.tw
- Journal of Computational Biology (2005)
2Biochips DNA Microarrays
- Experiment Designs
- Image Processing
- Normalization
- Gene Selection
- Clustering
- Classification
- Pathway/Network Analysis
http://microarray.vai.org/Quality_control/images/HA68_156_TestHyb.jpg
3Cell Cycle
4Pathway Reconstruction
- Explore pathways of biological elements from microarray or other array data?
- Structure of pathway: sub-structure
- Beyond similarity scores
- Computational implementation
- Statistical significance
- Control of false positives and false negatives
5Boolean Networks
- Kauffman (1969, 1974, 1977, 1979), Kauffman and Glass (1973)
- Gene Networks (Liang et al., 1998; Akutsu et al., 2000a-b, 2003; Shmulevich et al., 2002a-c, 2003a-b, 2004; Kim et al., 2002; Datta et al., 2003, 2004; Hashimoto et al., 2004; Li and Lu, 2005)
- Random Boolean Networks (http://www.activewebs.ch/schwarzer/rbn/index.htm)
- Probabilistic Boolean Networks (http://www2.mdanderson.org/app/ilya/PBN/PBN.htm)
- SPAN: s-p-scores Associated with a Network (Li and Lu, 2005)
6An Example of Boolean Networks
[Figure: example Boolean network; node labels 1, 2, 3]
From Boolean to Probabilistic Boolean Networks as Models of Genetic Regulatory Networks, Shmulevich et al., Proceedings of the IEEE, 2002.
7Dynamics of Boolean Networks
Time Step t 1 1 0 1 0
011 001 010 011 011
From Boolean to Probabilistic Boolean Networks as Models of Genetic Regulatory Networks, Shmulevich et al., Proceedings of the IEEE, 2002.
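The time-step dynamics on this slide can be illustrated with a minimal sketch of a synchronous Boolean network. The three update rules below are hypothetical, chosen only for illustration; they are not the rules of the network in the cited figure.

```python
from typing import List, Tuple

State = Tuple[int, int, int]

def step(state: State) -> State:
    """Apply one synchronous update (hypothetical rules, illustration only)."""
    x1, x2, x3 = state
    return (
        x2,        # gene 1 copies gene 2
        x1,        # gene 2 copies gene 1
        x1 & x2,   # gene 3 is on only when genes 1 and 2 are both on
    )

def trajectory(start: State, n_steps: int) -> List[State]:
    """Return the start state followed by n_steps successive states."""
    states = [start]
    for _ in range(n_steps):
        states.append(step(states[-1]))
    return states

traj = trajectory((0, 1, 1), 5)
for t, state in enumerate(traj):
    print(t, "".join(map(str, state)))
```

Under these assumed rules, the state 011 moves to 100 and then alternates between 010 and 100.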
8Attractors and Basins: Proliferation, Apoptosis, Differentiation
From Boolean to Probabilistic Boolean Networks as Models of Genetic Regulatory Networks, Shmulevich et al., Proceedings of the IEEE, 2002.
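Attractors and their basins can be found by following every start state until it repeats. The sketch below does this for a small network; the update rules are hypothetical and serve only as an illustration, not as the network in the cited figure.

```python
from itertools import product
from typing import Dict, FrozenSet, List, Tuple

State = Tuple[int, int, int]

def step(state: State) -> State:
    """One synchronous update (hypothetical rules, illustration only)."""
    x1, x2, x3 = state
    return (x2, x1, x1 & x2)

def attractor_of(start: State) -> FrozenSet[State]:
    """Follow the trajectory until a state repeats; return the cycle reached."""
    seen: List[State] = []
    state = start
    while state not in seen:
        seen.append(state)
        state = step(state)
    # 'state' is the first repeated state, so the cycle starts at its first visit.
    return frozenset(seen[seen.index(state):])

# Group all 2^3 states by the attractor they fall into (the basin of attraction).
basins: Dict[FrozenSet[State], List[State]] = {}
for s in product((0, 1), repeat=3):
    basins.setdefault(attractor_of(s), []).append(s)

for attractor, basin in basins.items():
    print(sorted(attractor), "<-", basin)
```

With these assumed rules there are two fixed points (000 and 111) and one cycle of length two (010, 100).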
9Dynamical Patterns
http://www.geocities.com/jaap_bax/boolean.html
10Java Demo
- An Introduction to Complex Systems, Torsten Reil, Department of Zoology, University of Oxford (http://users.ox.ac.uk/quee0818/complexity/complexity.html)
11Challenging Issues
- Random Boolean Networks (http://www.activewebs.ch/schwarzer/rbn/index.htm)
- Probabilistic Boolean Networks (http://www2.mdanderson.org/app/ilya/PBN/PBN.htm)
- Reconstruction of Boolean Networks
- SPAN: s-p-scores Associated with a Network (Li and Lu, 2005)
12Directed Acyclic Boolean (DAB) Networks
- Objects: binary elements and their Boolean duals; A and its dual, the on and off status of a gene
- Relations
- Prerequisite
- Similarity AB
- Negative-similarity
- Not trivial in the presence of measurement errors
13Graph Representation
- Representation: directed graph
- Ground set: two vertices for one element, A and its dual
- Directed relation: A is prerequisite for B
- Undirected relation: A-B, A is similar to B
- Redundancy: covering pairs
- Acyclic and no conflict: never
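A minimal sketch of this graph representation, under stated assumptions: each element contributes two vertices (itself and its dual), a directed edge (u, v) is read as "u is prerequisite for v", and only the acyclicity condition is checked. The names `ground_set` and `is_acyclic` are hypothetical helpers, not taken from the paper.

```python
from collections import defaultdict, deque
from typing import Dict, Iterable, List, Tuple

Vertex = Tuple[str, bool]   # (element name, True = the element, False = its dual)
Edge = Tuple[Vertex, Vertex]

def ground_set(elements: Iterable[str]) -> List[Vertex]:
    """Two vertices per element: the element and its Boolean dual."""
    return [(e, polarity) for e in elements for polarity in (True, False)]

def is_acyclic(vertices: List[Vertex], edges: List[Edge]) -> bool:
    """Kahn's algorithm: the graph is acyclic iff every vertex can be removed."""
    indegree: Dict[Vertex, int] = {v: 0 for v in vertices}
    successors = defaultdict(list)
    for u, v in edges:
        successors[u].append(v)
        indegree[v] += 1
    queue = deque(v for v in vertices if indegree[v] == 0)
    removed = 0
    while queue:
        u = queue.popleft()
        removed += 1
        for v in successors[u]:
            indegree[v] -= 1
            if indegree[v] == 0:
                queue.append(v)
    return removed == len(vertices)

vertices = ground_set(["A", "B", "C"])
chain = [(("A", True), ("B", True)), (("B", True), ("C", True))]
print(is_acyclic(vertices, chain))                                  # chain A, B, C
print(is_acyclic(vertices, chain + [(("C", True), ("A", True))]))   # closes a cycle
```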
14An Example
Diagram
The table of state values
15How Many DAB Networks?
- Example
- A DAB corresponds to a subset of on-off states
- How many feasible DAB networks?
- Super-exponential
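The correspondence between a network and a subset of on-off states can be sketched by enumeration. The code assumes one particular reading of the relation: "A is prerequisite for B" means B can be on only when A is on. The function name `feasible_states` and the example chain are hypothetical.

```python
from itertools import product
from typing import List, Sequence, Tuple

def feasible_states(
    elements: Sequence[str],
    prerequisites: Sequence[Tuple[str, str]],
) -> List[Tuple[int, ...]]:
    """List the on-off states that satisfy every (a, b) pair, read as 'b on implies a on'."""
    index = {e: i for i, e in enumerate(elements)}
    states = []
    for state in product((0, 1), repeat=len(elements)):
        # state[a] >= state[b] fails only when b is on while a is off.
        if all(state[index[a]] >= state[index[b]] for a, b in prerequisites):
            states.append(state)
    return states

# Hypothetical chain: A is prerequisite for B, and B is prerequisite for C.
table = feasible_states(["A", "B", "C"], [("A", "B"), ("B", "C")])
print(table)
```

Under this reading, the chain leaves 4 of the 8 possible states feasible.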
16Sampling from DAB Networks
- Sample with replacement from the table of states
- Exhaustive sample
- Arrange them in a binary array
- How to identify pairwise relations?
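One way to look for pairwise relations in an error-free binary array is to tabulate each pair of columns in a 2 by 2 table and see which cells stay empty. The sketch below assumes that a single empty cell indicates a triangular (prerequisite-type) pattern and that two empty cells on one diagonal indicate a diagonal (similarity-type) pattern, which would give 4 + 2 = 6 patterns. The function names and labels are hypothetical.

```python
from itertools import combinations
from typing import FrozenSet, List, Sequence, Tuple

Cell = Tuple[int, int]

def two_by_two(data: Sequence[Sequence[int]], i: int, j: int) -> List[List[int]]:
    """Count how often columns i and j take each of the four value combinations."""
    table = [[0, 0], [0, 0]]
    for row in data:
        table[row[i]][row[j]] += 1
    return table

def zero_pattern(table: List[List[int]]) -> Tuple[str, FrozenSet[Cell]]:
    """Classify a table by its empty cells (assumes no measurement error)."""
    empty = frozenset((a, b) for a in (0, 1) for b in (0, 1) if table[a][b] == 0)
    if len(empty) == 1:
        return "triangular", empty
    if empty in (frozenset({(0, 1), (1, 0)}), frozenset({(0, 0), (1, 1)})):
        return "diagonal", empty
    return "none", empty

# Hypothetical exhaustive sample of a state table over elements A, B, C.
data = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (1, 1, 1)]
for i, j in combinations(range(3), 2):
    print((i, j), zero_pattern(two_by_two(data, i, j)))
```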
17Six Patterns
18Sampling and Design Issues
- How will this count strategy work if we sample only a fraction of the state space?
- Assumption: no measurement error
- No false negative pairwise relations
- False positive pairwise relations can happen
- Selection bias: a big problem!
- Design: random if not exhaustive
19Misclassification Errors
- An exhaustive sample from the state space with replacement
- Data: a binary array
- An example of binary array data
20Problem and Strategy
- Data: array with measurement error
- Goal: reconstruct the DAB networks
- Strategy
- Consider each pair of elements
- Find the most likely relation
- Assign a significance score to this relation: the s-p-score
- Put together pairwise relations by ranking their s-p-scores
21Pairwise Relations in 2 by 2 Tables
22Misclassified Data
Counts with error
23Probability Splitting
Observations with errors
24Diagonal Models and s-scores
- EM Algorithm
- Expectation: redistribute the count in each cell
- Asymptotics: classical results
25Triangular Models and p-scores
- Computation: EM algorithm
- The full model is saturated with parameters
- Will the likelihood method work? How and Why?
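The EM computation for a misclassified 2 by 2 table can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: each of the two observed bits is assumed to be flipped independently with a common probability p, and a model is specified by its `support`, the set of true cells allowed to have nonzero probability (two cells for a diagonal model, three for a triangular model). The function name `em_fit` and the example counts are hypothetical.

```python
import math
from typing import Dict, Iterable, Tuple

Cell = Tuple[int, int]
CELLS = [(0, 0), (0, 1), (1, 0), (1, 1)]

def hamming(true_cell: Cell, observed: Cell) -> int:
    """Number of bits that must be flipped to turn true_cell into observed."""
    return int(true_cell[0] != observed[0]) + int(true_cell[1] != observed[1])

def emission(true_cell: Cell, observed: Cell, p: float) -> float:
    """Probability of the observed cell given the true cell and flip probability p."""
    d = hamming(true_cell, observed)
    return (p ** d) * ((1.0 - p) ** (2 - d))

def em_fit(counts: Dict[Cell, int], support: Iterable[Cell],
           n_iter: int = 500, p_init: float = 0.1):
    """Estimate true-cell probabilities and p; return (pi, p, log-likelihood)."""
    support = list(support)
    n = sum(counts.values())
    pi = {t: 1.0 / len(support) for t in support}
    p = p_init
    for _ in range(n_iter):
        new_pi = {t: 0.0 for t in support}
        expected_flips = 0.0
        for o in CELLS:
            n_o = counts.get(o, 0)
            weights = {t: pi[t] * emission(t, o, p) for t in support}
            total = sum(weights.values())
            if n_o == 0 or total == 0.0:
                continue
            # E-step: redistribute the observed count over the possible true cells.
            for t in support:
                share = n_o * weights[t] / total
                new_pi[t] += share
                expected_flips += share * hamming(t, o)
        # M-step: update the cell probabilities and the per-bit flip probability.
        pi = {t: new_pi[t] / n for t in support}
        p = expected_flips / (2.0 * n)
    loglik = 0.0
    for o in CELLS:
        n_o = counts.get(o, 0)
        if n_o > 0:
            loglik += n_o * math.log(sum(pi[t] * emission(t, o, p) for t in support))
    return pi, p, loglik

# Hypothetical counts: mostly on the main diagonal, with a few off-diagonal cells.
counts = {(0, 0): 45, (1, 1): 45, (0, 1): 5, (1, 0): 5}
pi_hat, p_hat, ll = em_fit(counts, support=[(0, 0), (1, 1)])
print(pi_hat, round(p_hat, 4), round(ll, 2))
```

If the slides are read correctly, the estimate of p under a diagonal support would play the role of an s-score, and under a triangular support that of a p-score.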
26Model Selection and s-p-scores
- Model selection
- The smaller the score, the more support for the null hypothesis
- Between the two diagonal models, select the smaller s-score
- Among the four triangular models, select the smallest p-score
- s-p-score: the score of the model that minimizes BIC
- SPAN: s-p-scores Associated with a Network
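The final selection step can be sketched with the standard BIC formula, BIC = -2 log L + k log n. The fitted values below are made-up numbers for illustration, and the parameter counts (2 for a diagonal model, 3 for a triangular model) are an assumption based on counting free cell probabilities plus the misclassification probability.

```python
import math
from typing import List, NamedTuple

class Fit(NamedTuple):
    name: str        # which candidate model this is
    score: float     # estimated misclassification probability under the model
    loglik: float    # maximized log-likelihood
    n_params: int    # number of free parameters (assumed, see lead-in)

def bic(fit: Fit, n: int) -> float:
    """Bayesian information criterion for a fit on n observations."""
    return -2.0 * fit.loglik + fit.n_params * math.log(n)

def select(fits: List[Fit], n: int) -> Fit:
    """Return the candidate with the smallest BIC."""
    return min(fits, key=lambda f: bic(f, n))

# Hypothetical fits for one pair of elements observed on n = 76 samples.
candidates = [
    Fit("best diagonal", score=0.20, loglik=-120.0, n_params=2),
    Fit("best triangular", score=0.05, loglik=-100.0, n_params=3),
]
chosen = select(candidates, n=76)
print(chosen.name, chosen.score)
```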
27Statistical Inferences
- Statistical tests: type I and II errors.
- Good controls of false negatives and positives.
- The s-p-scores play the role of test statistics.
- We expect relations with smaller s-p-scores to be real, and those with larger scores to be false.
- Irregularity
28Control of False Negatives
- We expect the p-score to be a good estimate of p under the null hypothesis
- Accuracy of estimates
- The asymptotics of the MLE work under the null hypothesis
- Fisher information matrix
29Asymptotic Variances
- Irregularity: one singularity point
- Fix: eliminate house-keeping and silent genes
30Control of False Positives
- Why not likelihood ratio test?
- Stein's lemma: the chance of type II error does not go to zero
31Reconstruction of DAB Networks
- The s-p-score does not make much sense if we have only two elements and no prior knowledge of p.
- The scores are more meaningful if we have many elements.
- Rank the s-p-scores in ascending order.
- Watch list of pairwise relations.
- How to determine the threshold? Know biology.
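The ranking step can be sketched as a sort followed by a cut-off. The scores and the threshold below are made-up values for illustration; as the slide notes, the actual threshold has to come from biological knowledge.

```python
from typing import Dict, List, Tuple

Pair = Tuple[str, str]

def watch_list(scores: Dict[Pair, float], threshold: float) -> List[Tuple[Pair, float]]:
    """Rank pairwise relations by ascending score and keep those at or below the threshold."""
    ranked = sorted(scores.items(), key=lambda item: item[1])
    return [(pair, score) for pair, score in ranked if score <= threshold]

# Hypothetical s-p-scores for four pairs of elements.
scores = {
    ("A", "B"): 0.04,
    ("A", "C"): 0.21,
    ("B", "C"): 0.07,
    ("C", "D"): 0.35,
}
selected = watch_list(scores, threshold=0.10)
print(selected)
```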
32Simulation
Simulate a data set of 76 samples with flipping probability p = 0.05
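A simulation of this kind can be sketched as follows: draw states with replacement from a table of feasible states, then flip each bit independently with probability 0.05. The state table, the seed, and the function name `simulate` are hypothetical; the network used in the talk is not reproduced here.

```python
import random
from typing import List, Sequence, Tuple

def simulate(states: Sequence[Tuple[int, ...]], n_samples: int,
             flip_prob: float, seed: int = 0) -> List[Tuple[int, ...]]:
    """Sample states with replacement and flip each bit with probability flip_prob."""
    rng = random.Random(seed)
    data = []
    for _ in range(n_samples):
        state = rng.choice(states)
        noisy = tuple(bit ^ 1 if rng.random() < flip_prob else bit for bit in state)
        data.append(noisy)
    return data

# Hypothetical table of feasible states for three elements.
state_table = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (1, 1, 1)]
data = simulate(state_table, n_samples=76, flip_prob=0.05)
print(len(data), data[:5])
```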
33Reconstruction from Simulation
Estimated DAB networks
True DAB networks
34Yeast MAPK pathway (Roberts et al., 2000, Science)
35Conclusion and Discussion
- DAB: directed acyclic Boolean networks
- SPAN: s-p-scores, an exploratory tool with the control of false positives and negatives
- Future studies
- Discretization
- Relations involving more than two elements
- Time course data
- Dynamics and evolution
- Integrations with other methods
- Incorporation of related data and evidences
36Acknowledgements
- Professors Wing H. Wong, Michael Waterman, and Simon Tavaré.
- Institute of Pure and Applied Mathematics, UCLA.
- CEGS grant from NIH in USA for Lei M. Li.
- Grants from National Science Council in Taiwan
for Henry H.-S. Lu.