Inference of Transcriptional Regulation Network with Gene Expression Data - PowerPoint PPT Presentation

1
Inference of Transcriptional Regulation Network
with Gene Expression Data
  • Andrew Kwon

3
Role of Proteins
  • Both functional and structural
  • Main agents of cellular functions
  • Each protein has a specific function
  • The amount of each protein in the cell must be
    controlled carefully
  • Elaborate Regulatory Network

4
Gene Regulatory Network
  • Fundamental mechanism by which protein production
    and cellular functions are controlled
  • Complex input-output system made of proteins and
    genes for controlling cellular functions
  • Important for understanding many problems,
    including medical ones

5
Cell Cycle
  • After a certain amount of growth, the cell divides
    into two identical cells
  • Need to duplicate cellular components and divide
    them equally between the daughter cells
  • Different regulators act in different parts and
    stages in concert to control cell cycle

6
Types of Regulation
  • Activation
  • Increase in protein A leads to increase in gene
    B's transcription
  • Inhibition
  • Increase in protein A leads to decrease in gene
    B's transcription
  • Not a simple binary relationship
  • Many genes could act on a particular gene at once
    - Complexes
  • Feedback and Self-Regulation

7
Example of Regulatory Network
S phase control in yeast
9
Microarray
  • Each spot contains a specific probe designed for
    a single cDNA
  • When more cDNA binds to a spot, the red intensity
    increases
  • Allows study of gene expression on a large scale

10
Which Genes Are Related?
  • Goal: find out which pairs of genes have a
    direct regulatory relationship

11
Correlation Method
  • Standard correlation coefficient
  • Widely used method for sequence similarity
    comparisons
  • Tests for degree of linear relationship between
    two variables
  • Cannot take into account the time delay involved
    in gene regulation
  • Strongly favours global over local similarities
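
The effect of a time delay on the correlation coefficient can be illustrated with a minimal sketch. The sinusoidal profiles, the 12-point period, and the name `pearson` are assumptions made for illustration; they are not taken from the presentation's data.

```python
import numpy as np

def pearson(x, y):
    """Standard (Pearson) correlation coefficient of two equal-length profiles."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc = x - x.mean()
    yc = y - y.mean()
    return float((xc @ yc) / np.sqrt((xc @ xc) * (yc @ yc)))

# Hypothetical profiles: gene B repeats gene A's curve `lag` time points later.
t = np.arange(24)
gene_a = np.sin(2 * np.pi * t / 12)
for lag in (0, 1, 2, 3):
    gene_b = np.sin(2 * np.pi * (t - lag) / 12)
    print(lag, round(pearson(gene_a, gene_b), 2))
```

The coefficient falls steadily as the lag grows even though the two curves have exactly the same shape, which is the weakness the last two bullets describe.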

12
Edge Detection Method (1)
  • By Filkov et al.
  • Focus on improving local similarity detection
  • Scan through gene expression curves and determine
    where major edges occur, and remove spurious
    edges
  • Construct primary edges using local minima and
    maxima
  • Filter out those edges whose height does not meet
    the predetermined threshold
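
The first two steps (primary edges from local extrema, then a height filter) might be sketched as follows. This is only an illustration of the idea, not Filkov et al.'s implementation; the function name `primary_edges`, the treatment of endpoints as edge boundaries, and the example threshold are assumptions.

```python
def primary_edges(profile, min_height):
    """Return (start, end, height) edges between consecutive local extrema."""
    n = len(profile)
    # The two endpoints plus every interior local minimum or maximum
    # act as edge boundaries.
    extrema = [0]
    for i in range(1, n - 1):
        left = profile[i] - profile[i - 1]
        right = profile[i + 1] - profile[i]
        if left * right < 0:  # slope changes sign at i
            extrema.append(i)
    extrema.append(n - 1)

    edges = []
    for start, end in zip(extrema, extrema[1:]):
        height = profile[end] - profile[start]
        if abs(height) >= min_height:  # drop spurious (small) edges
            edges.append((start, end, height))
    return edges

# Hypothetical expression curve with a small wiggle near its peak.
print(primary_edges([0, 1, 3, 2.9, 3.1, 1, 0], min_height=1.0))
```

Only the large rise and the large fall survive the filter in this example; the small wiggle between them is discarded as spurious.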

13
Edge Detection Method (2)
  • Group those edges with similar direction
  • Now left with edges depicting the major features
    only
  • Compare the edge profiles between two genes by
    summing up closely located edges from two genes
    with the same direction

14
Edge Detection Method (3)
  • Scoring Formula
  • d: agreement of slopes of edges (-1 or 1)
  • n: number of edges
  • a, b: the two genes being compared
  • ?: gap between edges
  • ?max: maximum allowable time difference between
    two edges

15
Edge Detection Method (4)
  • Does not differentiate between the direction of
    regulation
  • Cannot be used to find inhibitory relationships
  • Allows for negative time delays between two
    corresponding edges on the basis that there is
    not enough data resolution
  • Detects strong local matches only

16
Bayesian Networks
  • Consists of two parts
  • Directed Acyclic Graph (Structure of GRN)
  • Set of parameters for the DAG (Statistical
    Hypothesis)
  • DAG represents the causal relations among a set
    of random variables (gene expression levels)
  • X causes Y if and only if there is a direct edge
    from X to Y
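
For reference, the DAG and its parameters together define the joint distribution through the standard Bayesian network factorization, where Pa(X_i) denotes the parents of X_i in the graph. The three-gene chain A → B → C on the second line is a hypothetical example, not a network from the presentation.

```latex
P(X_1, \dots, X_n) = \prod_{i=1}^{n} P\bigl(X_i \mid \mathrm{Pa}(X_i)\bigr)

P(A, B, C) = P(A)\, P(B \mid A)\, P(C \mid B)
```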

17
Bayesian Networks (2)
  • Must learn the network using observed data
  • Perform a series of conditional independence
    tests and construct the most likely set of DAGs
    based on the results
  • Assign a score to each DAG based on the sample
    data, and search for the highest scoring one

18
Bayesian Networks (3)
  • Need large sample size for accuracy
  • Representing Time
  • Representing time in the Bayesian network
    increases the number of variables dramatically
  • Dynamic Bayesian Network
  • High complexity

19
Event Method
  • Need a method that balances between global and
    local similarity
  • Need to make use of temporal evidence
  • Need to account for directionality of regulation
  • Need to be computationally efficient

20
Hypotheses on Regulation
  • Hypothesis 1: A activates B
  • Rise in expression of A followed by rise in
    expression of B
  • Fall in expression of A followed by fall in
    expression of B
  • Hypothesis 2: A inhibits B
  • Rise in A followed by fall in B
  • Fall in A followed by rise in B
  • Time delay between 2 corresponding events

21
Events
  • Directional changes in expression profile
  • State of gene expression at an instant
  • 3 possible states
  • Rise, Constant, Fall (R, C, F)
  • Event state/type determined by the slope of the
    expression profile

22
Event Conversion
  • Microarray data is quite noisy
  • Perform smoothing to reduce noise before
    calculating slopes
  • Select the flat region around slope of 0
  • Classify into R, C, F based on the slope values
  • Any value falling in the flat region → C
  • Result: 2 event strings
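
The conversion steps above can be sketched roughly as follows. The moving-average smoother, the odd `window` size, the width `flat` of the flat region, and the function name `to_event_string` are all assumptions; the presentation does not say which smoother or threshold was used.

```python
import numpy as np

def to_event_string(profile, window=3, flat=0.1):
    """Smooth a profile, take slopes, and label each step R, C, or F."""
    x = np.asarray(profile, dtype=float)
    # Moving-average smoothing (an assumed choice); `window` should be odd
    # so that the smoothed profile keeps the original length.
    kernel = np.ones(window) / window
    padded = np.pad(x, window // 2, mode="edge")
    smooth = np.convolve(padded, kernel, mode="valid")
    slopes = np.diff(smooth)
    # Slopes inside the flat region around 0 become C; others R or F.
    return "".join(
        "C" if abs(s) <= flat else ("R" if s > 0 else "F") for s in slopes
    )

# Hypothetical profile: rises, plateaus, then falls.
print(to_event_string([0, 1, 2, 2, 2, 1, 0], window=3, flat=0.4))
```

A wider flat region turns more of the small slopes into C, so the threshold directly controls how much of the profile is treated as uncertain.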

23
Event String Alignment
  • Need to best match 2 event strings with noise and
    time delay in mind
  • Use the Needleman-Wunsch global sequence
    alignment algorithm
  • Handling of time delay
  • Events that do not occur at the same time may
    still be related to each other
  • No negative time delay

24
Scoring Matrix (1)
  • Scoring Method for Event Method

Event   R          C   F
R       S(dT)      0   -βS(dT)
C       0          0   0
F       -βS(dT)    0   αS(dT)
  • 0 < S(dT) ≤ 1, 0 ≤ α ≤ 1, 0 ≤ β ≤ 1
  • dT: time delay between two events
  • If dT < 0, match penalty = ∞
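
A minimal sketch of how such a scoring matrix could drive a Needleman-Wunsch style alignment of two event strings is given below. Several details are assumptions because the transcript does not specify them: the form of S(dT) (here 1/(1+dT)), the gap penalty (here 0), measuring dT as the index offset between the aligned events, and treating β as a positive penalty magnitude. The names `delay_weight`, `match_score`, and `event_alignment_score` are hypothetical.

```python
import math

ALPHA, BETA = 0.7, 0.3  # example weights; the sign convention is an assumption

def delay_weight(dt):
    """Assumed form of S(dT): 1 at zero delay, decaying for longer delays."""
    return 1.0 / (1.0 + dt)

def match_score(ea, eb, dt):
    """Score for aligning event `ea` of gene A with event `eb` of gene B."""
    if dt < 0:
        return -math.inf  # negative time delay is not allowed
    if ea == "C" or eb == "C":
        return 0.0  # any match with C is neutral
    if ea != eb:
        return -BETA * delay_weight(dt)  # R-F or F-R
    return delay_weight(dt) if ea == "R" else ALPHA * delay_weight(dt)

def event_alignment_score(a, b, gap=0.0):
    """Global alignment score of event strings a (regulator) and b (target)."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0.0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = score[i - 1][0] + gap
    for j in range(1, cols):
        score[0][j] = score[0][j - 1] + gap
    for i in range(1, rows):
        for j in range(1, cols):
            dt = (j - 1) - (i - 1)  # how far B's event lags A's event
            score[i][j] = max(
                score[i - 1][j - 1] + match_score(a[i - 1], b[j - 1], dt),
                score[i - 1][j] + gap,
                score[i][j - 1] + gap,
            )
    return score[-1][-1]

print(event_alignment_score("RC", "CR"))  # B rises one step after A
print(event_alignment_score("CR", "RC"))  # B rises one step before A
```

Under these assumptions the score is asymmetric: a rise in B that follows a rise in A is rewarded, while the reverse ordering earns nothing, which is how the method can express the direction of regulation.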
25
Scoring Matrix (2)
  • R-R matches weighted more than F-F matches
  • Decreases in mRNA levels less indicative
  • Any match with C assigned neutral score of 0
  • C: region of uncertainty
  • Could be due to any number of reasons
  • Penalty for R-F matches
  • Scores are a function of time delay dT

26
Example
27
Event vs. Correlation
  • Event scores high, but correlation scores low
  • Time delay lowers the correlation coefficient

28
Event vs. Edge Detection
  • Event scores high, edge detection scores low
  • Bolded edges: what edge detection finds
  • Only edges A and B are close enough to be added
    to score

29
Spellman's Data Sets
  • Snapshots of yeast cellular mRNA levels at
    regular time intervals using cDNA microarrays
  • 4 separate data sets based on different cell
    arresting methods used
  • α-arrest, elutriation, CDC15, CDC28 temp.
    sensitive mutants
  • Yeast genome: 6200 genes
  • Too many: need to reduce search space

30
Selecting Genes to Study
  • Want to restrict to genes related to cell-cycle
    regulation
  • Filkov et al. searched for known transcriptional
    regulation pairs in the Yeast Proteome Database
  • 888 transcriptional regulations
  • 486 genes
  • 647 activations, 241 inhibitions

31
Pre-Processing Data
  • Microarray data by Spellman contains many missing
    points
  • Experimental errors
  • Use linear interpolation to fill in for the
    missing points
  • If the ratio of the missing points to valid
    points is greater than the threshold, ignore the
    gene data in question
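
A minimal sketch of this pre-processing step, assuming missing points are stored as NaN. The threshold value of 0.5 and the name `fill_missing` are assumptions; the presentation does not state the threshold that was used.

```python
import numpy as np

def fill_missing(profile, max_missing_ratio=0.5):
    """Linearly interpolate NaN entries; return None if too many are missing."""
    x = np.array(profile, dtype=float)  # copy, so the input is not modified
    missing = np.isnan(x)
    valid = ~missing
    # Ratio of missing points to valid points, as described above.
    if valid.sum() == 0 or missing.sum() / valid.sum() > max_missing_ratio:
        return None  # the gene is ignored
    idx = np.arange(len(x))
    x[missing] = np.interp(idx[missing], idx[valid], x[valid])
    return x

nan = float("nan")
print(fill_missing([1.0, nan, 3.0, 4.0]))
print(fill_missing([nan, nan, 3.0]))
```

Note that `np.interp` holds the nearest valid value constant for missing points at either end of the profile, since there is nothing to interpolate between there.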

32
Analysis of the Test Set (1)
  • α and CDC28 data sets analyzed

Data Set   ORFs   Genes
α          4489   348
CDC28      6103   458
  • Need to compare each gene with all the others
  • > 120,000 comparisons for alpha
  • > 200,000 comparisons for CDC28

33
Analysis of the Test Set (2)
  • Correlation and edge detection methods: no
    directionality of regulation
  • Only ½ as many comparisons as the event method
  • To make comparison possible, remove
    directionality aspect from the event method as
    well
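
The comparison counts follow from simple pair counting: a directional method scores every ordered pair of genes, while a direction-free method needs only the unordered pairs. A short check using the gene counts reported for the two data sets (348 and 458); the function names are illustrative.

```python
def ordered_pairs(n):
    """Comparisons when (A, B) and (B, A) are scored separately."""
    return n * (n - 1)

def unordered_pairs(n):
    """Comparisons when the direction of regulation is ignored."""
    return n * (n - 1) // 2

for name, genes in (("alpha", 348), ("CDC28", 458)):
    print(name, ordered_pairs(genes), unordered_pairs(genes))
# alpha 120756 60378
# CDC28 209306 104653
```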

34
Analysis Results (1)
  • Overlapping results among 3 methods (all results)

Methods               Alpha   CDC28
Event / Correlation   3367    2916
Event / Edge          2081    3362
Correlation / Edge    1989    2252
  • α = 0.7, -β = 0.3 used for scoring matrix
  • Top-10,000 rankings
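
An overlap count of this kind could be computed as in the sketch below, assuming each method's results are held as a dictionary mapping a gene pair to its score. The function name `top_k_overlap` and the toy scores are hypothetical.

```python
def top_k_overlap(scores_a, scores_b, k):
    """Count gene pairs that appear in the top-k rankings of both methods."""
    top_a = set(sorted(scores_a, key=scores_a.get, reverse=True)[:k])
    top_b = set(sorted(scores_b, key=scores_b.get, reverse=True)[:k])
    return len(top_a & top_b)

# Toy scores for three gene pairs under two methods (not real data).
event = {("g1", "g2"): 9.0, ("g1", "g3"): 5.0, ("g2", "g3"): 1.0}
corr = {("g1", "g2"): 0.9, ("g1", "g3"): 0.1, ("g2", "g3"): 0.8}
print(top_k_overlap(event, corr, k=2))
```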

35
Analysis Results (2)
  • Overlapping results among 3 methods (true
    positive results only)

Methods               Alpha   CDC28
Event / Correlation   11      9
Event / Edge          0       0
Correlation / Edge    0       0
  • α = 0.7, -β = 0.3 used for scoring matrix
  • Top-10,000 rankings

36
Analysis Results (3)
  • Roughly 1/3 or fewer of the top-10,000 results by
    any 2 methods overlap
  • Event method finds significantly different pairs
    from the other methods
  • Very little overlap between true positives
  • Consistent with the fact the 3 methods employ
    different search strategies
  • Local vs. global similarity

37
True positive distribution for top-k results
  • Plots: CDC28 data set, Alpha data set
  • 0 < k < 10,000

38
Effects of Time Delay (1)
  • Perform time-shifting experiments and see how
    score changes

Gene 1    Gene 2      Correlation   Edge    Event
YDR225W   YDR224C     0.94          0.30    13.41
YDR225W   YDR224C-1   0.46          0.05    12.92
YDR225W   YDR224C-2   -0.24         -0.46   11.98
YMR199W   YPL256C     0.82          0.78    8.92
YMR199W   YPL256C-1   0.40          0.39    8.64
YMR199W   YPL256C-2   -0.19         -0.06   9.24
39
Effects of Time Delay (2)
  • Correlation coefficients drop rapidly as time
    delay is introduced
  • Supports assertion that correlation cannot handle
    time delay gracefully
  • Unexpected drop in edge detection scores
  • Probably due to problem in finding significant
    edges to compare

40
Effects of Scoring Matrix Parameters
  • True positives for Event Method

α     -β    Alpha Act.   Alpha Inh.   CDC28 Act.   CDC28 Inh.
0.7   0.7   62           20           72           20
0.7   0.5   62           20           72           20
0.7   0.3   71           20           93           24
0.5   0.7   62           21           73           26
0.5   0.5   62           21           73           25
0.5   0.3   72           22           92           24
0.3   0.7   62           16           72           24
0.3   0.5   62           16           72           24
0.3   0.3   71           20           87           21
41
Problems with Results
  • Many genes shared identical expression curves,
    incl. unrelated genes
  • Poor resolution of data
  • Edge detection method
  • Too many scores of 0
  • Simply cannot find enough edges
  • Significance of scores doubtful

42
More Notes on Edge
  • Cumulative Distribution Function for Edge
  • Zero scores make up the vertical column

43
Synthetic Data Sets (1)
  • Spellman's data sets are not enough to test the
    algorithms properly
  • 4 different data sets
  • Constant time delay
  • Irregular time delay
  • Partial matching
  • Differential weighting of events

44
Synthetic Data Sets (2)
  • Each data set consists of equal number of gene
    profiles and random profiles
  • Gene profiles: gene_i
  • Random profiles: random_i
  • gene_i and gene_ix are related
  • Better match if x is smaller

45
Synthetic Data Sets (3)
  • Avg. No. of True Positives

Data Set                 Correlation   Event
Constant Time Delay      31.6          39.8
Irregular Time Delay     27.2          33.8
Partial Matching         44.6          40.6
Differential Weighting   36.2          45.0
  • Event method superior except in partial matching
  • Could not test edge detection method
  • Could not produce non-zero scores

46
Summary
  • Event Method: finds potential regulatory pairs
    from gene expression data
  • Based on key features of gene expression
  • Computationally efficient
  • Performs comparably to correlation and edge
    detection methods in finding true positives from
    Spellman's data sets
  • Outperforms correlation on 3 of the 4 synthetic
    data sets

47
Future Work (1)
  • Limitation of real-world data
  • Obtain data with better resolution
  • Integrate data with other a priori knowledge
  • Narrow down focus to transcription factors
  • More realistic synthetic data
  • Realistic modeling of artificial regulatory
    network

48
Future Work (2)
  • Transitive Closure
  • It would make sense to remove E13 from the pair
    rankings in order to accommodate other potential
    pairs

  • If E12 and E23 have higher scores than E13, Node
    3 would be only conditionally dependent on Node 1
  • (Figure: graph of nodes 1, 2, 3)

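
The pruning idea on this slide could be sketched as follows, assuming pair scores are stored in a dictionary keyed by (regulator, target). This is proposed future work in the presentation, so the function `prune_transitive` and its exact rule (both indirect edges must score strictly higher than the direct one) are only one possible reading.

```python
def prune_transitive(edge_scores):
    """Drop edge (i, k) when some j has higher-scoring edges (i, j) and (j, k)."""
    nodes = {n for edge in edge_scores for n in edge}
    kept = dict(edge_scores)
    for (i, k), s_ik in edge_scores.items():
        for j in nodes - {i, k}:
            s_ij = edge_scores.get((i, j))
            s_jk = edge_scores.get((j, k))
            if s_ij is not None and s_jk is not None and s_ij > s_ik and s_jk > s_ik:
                kept.pop((i, k), None)  # E13 is explained by E12 and E23
                break
    return kept

# Toy scores for the three-node case on this slide (not real data).
print(prune_transitive({(1, 2): 9.0, (2, 3): 8.0, (1, 3): 5.0}))
```

Removing the direct edge frees a slot in the pair rankings for another candidate pair, which is the motivation given above.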
49
Future Work (3)
  • Improvement of event method
  • Different number of event types
  • Global regulatory network
  • Combine pairings by event method to form
    potential networks
  • Other uses for event method
  • Different types of data, such as proteins
  • Adaptation to other fields may be possible