Computational Systems Biology of Cancer - PowerPoint PPT Presentation

Slides: 61
Provided by: csNyuEdu3
Learn more at: https://cs.nyu.edu

Transcript and Presenter's Notes

Title: Computational Systems Biology of Cancer


1
(I)
Computational Systems Biology of Cancer
2
Bud Mishra
  • Professor of Computer Science, Mathematics and
    Cell Biology
  • Courant Institute, NYU School of Medicine, Tata
    Institute of Fundamental Research, and Mt. Sinai
    School of Medicine

3
(No Transcript)
4
War on Cancer
  • "Reports that say that something hasn't happened
    are always interesting to me, because as we know,
    there are known knowns; there are things we know
    we know. We also know there are known unknowns;
    that is to say, we know there are some things we
    do not know. But there are also unknown unknowns:
    the ones we don't know we don't know."
  • US Secretary of Defense, Mr. Donald Rumsfeld,
    quoted completely out of context.

5
Introduction: Cancer and Genomics
  • What we know and what we do not
  • Cancer is a disease of the genome.

6
Outline
  • Genomics
  • Genome Modification and Repair
  • Segmental Duplications and Models

7
Genomics
  • Genome
  • Hereditary information of an organism is encoded
    in its DNA and enclosed in a cell (unless it is a
    virus). All the information contained in the DNA
    of a single organism is its genome.
  • A DNA molecule can be thought of as a very long
    sequence of nucleotides or bases
  • Σ = {A, T, C, G}

8
Complementarity
  • DNA is a double-stranded polymer and should be
    thought of as a pair of sequences over Σ.
  • However, there is a relation of complementarity
    between the two sequences:
  • A ↔ T, C ↔ G
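The complementarity relation can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides: the function names are made up for the example, and the reversal step assumes the standard fact that the two strands run antiparallel, which the slide does not state.

```python
# Complement map for the four-letter DNA alphabet: A pairs with T, C with G.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement(strand):
    """Return the base-by-base complement of a strand."""
    return "".join(COMPLEMENT[base] for base in strand)


def reverse_complement(strand):
    """Return the partner strand read in its own 5'-to-3' direction.

    Assumes antiparallel strands, so the complement is reversed.
    """
    return complement(strand)[::-1]
```

Applying `reverse_complement` twice returns the original strand, which is one way to see that a single strand fully determines its partner.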

9
DNA Structure.
  • The four nitrogenous bases of DNA are arranged
    along the sugar-phosphate backbone in a
    particular order (the DNA sequence), encoding all
    genetic instructions for an organism.
  • Adenine (A) pairs with thymine (T), while
    cytosine (C) pairs with guanine (G).
  • The two DNA strands are held together by weak
    bonds between the bases.

10
Structure and Components
  • Complementary base pairs
  • (A-T and C-G)
  • Cytosine and thymine are smaller (lighter)
    molecules, called pyrimidines
  • Guanine and adenine are bigger (bulkier)
    molecules, called purines.
  • Adenine and thymine allow only for double
    hydrogen bonding, while cytosine and guanine
    allow for triple hydrogen bonding.
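As a rough illustration of the bonding rule above, the sketch below counts the hydrogen bonds holding a duplex together, given one strand. The names `HYDROGEN_BONDS`, `hydrogen_bond_count` and `gc_fraction` are hypothetical helpers introduced for this example only.

```python
# Bases grouped by ring size, as on the slide.
PURINES = {"A", "G"}        # bigger (bulkier) molecules
PYRIMIDINES = {"C", "T"}    # smaller (lighter) molecules

# Hydrogen bonds contributed by the pair that each base forms:
# A-T pairs have two, C-G pairs have three.
HYDROGEN_BONDS = {"A": 2, "T": 2, "C": 3, "G": 3}


def hydrogen_bond_count(strand):
    """Total hydrogen bonds in the duplex formed by a strand and its complement."""
    return sum(HYDROGEN_BONDS[base] for base in strand)


def gc_fraction(strand):
    """Fraction of positions that are C or G (the triple-bonded pairs)."""
    if not strand:
        return 0.0
    return sum(base in "CG" for base in strand) / len(strand)
```

For example, `hydrogen_bond_count("ATGC")` is 2 + 2 + 3 + 3 = 10, and a higher `gc_fraction` means more triple-bonded pairs per unit length.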

11
Inert and Rigid
  • Thus the chemical (hydrogen bonding) and the
    mechanical (purine to pyrimidine) constraints on
    the pairing lead to the complementarity and make
    the double-stranded DNA both chemically inert and
    mechanically quite rigid and stable.

12
The Central Dogma
  • The central dogma (due to Francis Crick in 1958)
    states that these information flows are all
    unidirectional.
  • The central dogma states that once "information"
    has passed into protein it cannot get out again.

13
The Central Dogma
  • The transfer of information from nucleic acid
    to nucleic acid, or from nucleic acid to protein,
    may be possible, but transfer from protein to
    protein, or from protein to nucleic acid is
    impossible. Information means here the precise
    determination of sequence, either of bases in the
    nucleic acid or of amino acid residues in the
    protein.

14
The New Synthesis
15
Cancer Initiation and Progression
  • Mutations, Translocations, Amplifications,
    Deletions
  • Epigenomics (Hyper- and Hypo-Methylation)
  • Alternate Splicing
  • Proliferation, Motility, Immortality, Metastasis,
    Signaling
16
Multi-step Nature of Cancer
  • Cancer is a stepwise process, typically requiring
    accumulation of mutations in a number of genes.
  • 6-7 independent mutations typically occur over
    several decades
  • Conversion of proto-oncogenes to oncogenes
  • Inactivation of tumor suppressor genes

17
Amplifications and Deletions
Mutation in a TSG
Epigenomics
Conversion of a Proto-Oncogene
Deletion of a TSG
Deletion of a TSG
18
P53 Gene (TSG)
19
The Cancer Genome Atlas
  • Obtain a comprehensive description of the genetic
    basis of human cancer.
  • Identify and characterize all the sites of
    genomic alteration associated at significant
    frequency with all major types of cancers.

20
The Cancer Genome Atlas
  • Increase the effectiveness of research to
    understand
  • tumor initiation and progression,
  • susceptibility to carcinogenesis,
  • development of cancer therapeutics,
  • approaches for early detection of tumors
  • the design of clinical trials.

21
Specific Goals
  • Identify all genomic alterations significantly
    associated with all major cancer types.
  • Such knowledge will propel work by thousands of
    investigators in cancer biology, epidemiology,
    diagnostics and therapeutics.

22
To Achieve This Goal
  • Create a large collection of appropriate,
    clinically annotated samples from all major types
    of cancer and
  • Characterize each sample in terms of
  • All regions of genomic loss or amplification,
  • All mutations in the coding regions of all human
    genes,
  • All chromosomal rearrangements,
  • All regions of aberrant methylation, and
  • Complete gene expression profile, as well as
    other appropriate technologies.

23
Biomedical Rationale
  • Cancer is a heterogeneous collection of
    heterogeneous diseases.
  • For example, prostate cancer can be an indolent
    disease remaining dormant throughout life or an
    aggressive disease leading to death.
  • However, we have no clear understanding of why
    such tumors differ.

24
Biomedical Rationale
  • Cancer is fundamentally a disease of genomic
    alteration.
  • Cancer cells typically carry many genomic
    alterations that confer on tumors their
    distinctive abilities (such as the capacity to
    proliferate and metastasize, ignoring the normal
    signals that block cellular growth and migration)
    and liabilities (such as unique dependence on
    certain cellular pathways, which potentially
    render them sensitive to certain treatments that
    spare normal cells).

25
History
  • 1960s
  • The genetic basis of cancer was clear from
    cytogenetic studies that showed consistent
    translocations associated with specific cancers
    (notably the so-called Philadelphia chromosome in
    chronic myelogenous leukemia).
  • 1970s
  • Specific cancer-causing mutations were recognized
    through the recombinant DNA revolution.
  • Identification of the first vertebrate and
    human oncogenes and the first tumor suppressor
    genes.
  • These discoveries have elucidated the cellular
    pathways governing processes such as cell-cycle
    progression, cell-death control, signal
    transduction, cell migration, protein
    translation, protein degradation and
    transcription.
  • For no human cancer do we have a comprehensive
    understanding of the events required.

26
Scientific Foundation for a Human Cancer Genome
Project
  • Gene resequencing.
  • Specific gene classes (such as kinases and
    phosphatases) in particular cancer types.
  • Epigenetic changes.
  • Loss of function of tumor suppressor genes by
    epigenetic modification of the genome such as
    DNA methylation and histone modification.
  • Genomic loss and amplification.
  • Consistent association with genomic loss or
    amplification in many specific regions,
    indicating that these regions harbor key
    cancer-associated genes.
  • Chromosome rearrangements.
  • Activating kinase pathways through fusion proteins
    or inactivating differentiation programs through
    gene disruption.
  • Hematological malignancies: a single
    stereotypical translocation in some diseases
    (such as CML) and as many as 20 important
    translocations in others (such as AML).
  • Adult solid tumors have not been as well
    characterized, in part owing to technical
    hurdles.

27
Human Genome Structure
28
EBD
  • J.B.S. Haldane (1932)
  • A redundant duplicate of a gene may acquire
    divergent mutations and eventually emerge as a
    new gene.
  • Susumu Ohno (1970)
  • Natural selection merely modified, while
    redundancy created.

29
Evolution by Duplication
30
Human Condition
31
Mer-scape
  • Overlapping words of different sizes are analyzed
    for their frequencies along the whole human
    genome
  • Red: 24-mers
  • Green: 21-mers
  • Blue: 18-mers
  • Gray: 15-mers
  • To the very left is a ubiquitous human
    transposon Alu. The high frequency is indicative
    of its repetitive nature.
  • To the very right is the beginning of a gene. The
    low frequency is indicative of its uniqueness in
    the whole genome.
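A mer-frequency profile of this kind can be sketched as follows. This is a toy illustration under stated assumptions: the function names are hypothetical, the whole sequence is held in memory, and the example uses 3-mers on a short string rather than 15- to 24-mers on a genome.

```python
from collections import Counter


def kmer_counts(sequence, k):
    """Count every overlapping word of length k in the sequence."""
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


def mer_profile(sequence, k):
    """For each position, report how often the k-mer starting there occurs
    anywhere in the sequence (high = repetitive, 1 = unique)."""
    counts = kmer_counts(sequence, k)
    return [counts[sequence[i:i + k]] for i in range(len(sequence) - k + 1)]


# Toy input: a tandem repeat (ACG x3) followed by a unique tail.
toy = "ACGACGACGTTAGC"
profile = mer_profile(toy, 3)
# profile == [3, 2, 2, 3, 2, 2, 3, 1, 1, 1, 1, 1]
```

The repeated region stands out with counts above 1, while the unique tail stays at 1, mirroring the contrast between the Alu element and the gene start described above.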

32
Doublet Repeats
  • Serendipitous discovery of a new, uncataloged
    class of short duplicate sequences: doublet
    repeats.
  • Almost always < 100 bp
  • (Top) The distance between the two loci of a
    doublet is plotted versus the chromosomal
    position of the first locus.
  • (Bottom) Distribution of doublets (black) and
    segmental duplications (red) across human
    chromosome 2

33
Segmental Duplications
  • 3.5% to 5% of the human genome is found to contain
    segmental duplications, with length > 5 kb or 1 kb and
    identity > 90%.
  • These duplications are estimated to have emerged
    about 40 Mya under the neutral assumption.
  • The duplications are mostly interspersed
    (non-tandem), and happen both inter- and
    intra-chromosomally.
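The length and identity criteria above can be read as a simple filter over pairwise alignments. The sketch below assumes a hypothetical `Alignment` record (it is not a format used by the slides or by any particular tool) and uses the 1 kb cutoff as the default.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alignment:
    """Hypothetical record for one pairwise alignment between two loci."""
    chrom_a: str
    chrom_b: str
    length_bp: int
    identity: float  # fraction of matching bases, between 0 and 1


def is_segmental_duplication(aln, min_length_bp=1000, min_identity=0.90):
    """Apply the slide's thresholds: length > 1 kb (or 5 kb) and identity > 90%."""
    return aln.length_bp > min_length_bp and aln.identity > min_identity


def duplication_kind(aln):
    """Label a duplication as intra- or inter-chromosomal."""
    return "intra-chromosomal" if aln.chrom_a == aln.chrom_b else "inter-chromosomal"
```

Passing `min_length_bp=5000` gives the stricter 5 kb variant of the same definition.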

(Figure: Human. From Bailey et al., 2002)
34
The Model
35
The Mathematical Model
Divergence intervals:
  • 0 ≤ δ < ε
  • ε ≤ δ < 2ε
  • (k-1)ε ≤ δ < kε
Duplication mechanisms:
  • h_1: proportion of duplications by repeat
    recombination
  • h_1^{++}: proportion of duplications by
    recombination of the specific repeat
  • h_1^{--}: proportion of duplications by
    recombination of other repeats
  • h_0: proportion of duplications by other
    repeat-unrelated mechanism
  • h_0^{++}: proportion of h_0 with common specific
    repeat in the flanking regions
  • h_0^{+-}: proportion of h_0 with no common
    specific repeat in the flanking regions
  • h_0^{--}: proportion of h_0 with no specific
    repeat in the flanking regions
Parameters:
  • α: mutation rate in duplicated sequences
  • β: insertion rate of the specific repeat
  • γ: mutation rate in the specific repeat
  • δ: divergence level of duplications
  • ε: divergence interval of duplications
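The divergence intervals on this slide amount to binning each duplication by its divergence level. The sketch below assumes the garbled inequalities read "(k-1) x interval <= divergence < k x interval", so that bin k is half-open on the right; the function names and the toy values are illustrative only.

```python
import math
from collections import Counter


def divergence_bin(divergence, interval):
    """Return k such that (k-1)*interval <= divergence < k*interval."""
    if divergence < 0 or interval <= 0:
        raise ValueError("divergence must be >= 0 and interval must be > 0")
    return math.floor(divergence / interval) + 1


def divergence_histogram(divergences, interval):
    """Count duplications per divergence bin."""
    return Counter(divergence_bin(d, interval) for d in divergences)
```

With an interval of 0.01, divergences of 0.001 and 0.004 fall in bin 1, 0.012 in bin 2, and 0.035 in bin 4. Values that sit exactly on a bin edge are subject to floating-point rounding, which a production version would need to handle.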
36
Model Fitting
  • The model parameters (α_Alu, β_Alu, γ_Alu, α_L1,
    β_L1, γ_L1) are estimated from the mutation and
    insertion rates reported in the literature.
  • The relative strengths of the alternative
    hypotheses can be estimated by model fitting to
    the real data.
  • h_1^{Alu} = 0.76, h_1^{++,Alu} = 0.3;
    h_1^{L1} = 0.76, h_1^{++,L1} = 0.35.

37
Pólya's Urn
38
Repetitive Random Eccentric GOD
  • Genome Organizing Devices (GOD)
  • Pólya's Urn Model
  • Fs: functions deciding probability distributions
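For readers unfamiliar with the urn scheme, here is a minimal simulation of the classic Pólya urn, in which a colour is drawn with probability proportional to its current count and then reinforced. This is the textbook rule, used here as an assumption; the slide's F functions, which decide the probability distributions, are not specified in the transcript and may differ.

```python
import random


def polya_urn(initial_counts, draws, reinforcement=1, seed=None):
    """Simulate a classic Polya urn and return the final colour counts.

    At each draw, colour i is chosen with probability
    counts[i] / sum(counts), and `reinforcement` extra balls of that
    colour are added ("the rich get richer").
    """
    rng = random.Random(seed)
    counts = list(initial_counts)
    for _ in range(draws):
        colour = rng.choices(range(len(counts)), weights=counts)[0]
        counts[colour] += reinforcement
    return counts
```

Read with colours as sequence families and balls as copies, each draw is a duplication event that favours families that are already abundant.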

39
DNA polymerase stuttering (replication slippage)
  • Insertion (duplication)
  • Deletion

(Figure panels, for both insertion and deletion: normal replication by DNA polymerase; polymerase pausing and dissociation; 3′ realignment and polymerase reloading)
40
Transposons cause deletions and duplications
  • Transposon actions in genomic DNA
  • Duplication
  • Deletion

(Figure labels: donor DNA; transposon; DNA intermediates; target DNA; transposase cuts in target DNA; IS; transposon looping out; transposon inserted; transposon deleted; DNA is repaired, resulting in a duplication of the transposon and target site)
41
DNA mismatch repair mechanism prevents duplications and deletions
(Figure labels: mismatch recognition on daughter strand; DNA α-loop formation by translocation through the proteins; degradation of the mismatched daughter strand in the α-loop; refilling the gap by DNA polymerase; corrected daughter strand)
42
Graph Model
  • Graph description
  • G = (V, E); G is a directed multi-graph
  • V = {V_i}, all n-mers, i = 1, ..., 4^n
  • E = {(V_i, V_j)}, when V_i represents the n-mer
    immediately upstream of the n-mer represented by
    V_j in the genomic sequence
  • k_i = incoming (or outgoing) degree of node i
    (V_i) = copy number of the n-mer represented by
    V_i
  • During the graph evolution, at each iteration,
    one of the following happens: deletion (with
    probability p_0), duplication (with probability
    p_1) or substitution (with probability q), where
    p_0 + p_1 + q = 1.
  • For an arbitrary node V_i, the probability that
    each of the above events happens is as follows:

A. Deletion: with probability p_0
B. Duplication: with probability p_1
C. Substitution: with probability q
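The graph description above can be made concrete by building the n-mer multigraph from a sequence. The sketch below covers only the construction, since the per-node event probabilities are not recoverable from the transcript. The function names are hypothetical, and the sequence is treated as linear, so the first and last n-mers each lose one incoming or outgoing edge.

```python
from collections import Counter


def build_mer_graph(sequence, n):
    """Build the directed multigraph of n-mers.

    Returns (edges, copy_number): `edges` maps (upstream, downstream)
    n-mer pairs to their multiplicity, and `copy_number` maps each
    n-mer to how often it occurs in the sequence.
    """
    mers = [sequence[i:i + n] for i in range(len(sequence) - n + 1)]
    edges = Counter(zip(mers, mers[1:]))
    copy_number = Counter(mers)
    return edges, copy_number


def degrees(edges):
    """Compute in- and out-degree per node, counting edge multiplicity."""
    in_degree, out_degree = Counter(), Counter()
    for (upstream, downstream), multiplicity in edges.items():
        out_degree[upstream] += multiplicity
        in_degree[downstream] += multiplicity
    return in_degree, out_degree


edges, copy_number = build_mer_graph("ACGACGACG", 3)
in_degree, out_degree = degrees(edges)
```

For this toy sequence the 3-mer CGA has copy number 2 and both degrees equal to 2, matching the slide's identification of node degree with copy number. The boundary n-mer ACG has copy number 3 but degree 2 on each side, an artifact of the linear ends.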
43
Model Fitting
44
Model-Fitted Parameters
  • The substitution rate, q, increases with the
    sizes of mers.
  • The ratio between duplication and deletion rate,
    p1/p0, increases with sizes of mers.
  • The substitution rate, q, tends to decrease as
    genome size increases. In particular, q is
    much smaller in eukaryotic genomes than in
    prokaryotic genomes.

45
J.B.S. Haldane
  • "If I were compelled to give my own appreciation
    of the evolutionary process, I should say this:
    In the first place it is very beautiful. In that
    beauty, there is an element of tragedy... In an
    evolutionary line rising from simplicity to
    complexity, then often falling back to an
    apparently primitive condition before its end, we
    perceive an artistic unity...
  • To me at least the beauty of evolution is far
    more striking than its purpose."
  • J.B.S. Haldane, The Causes of Evolution, 1932.

46
Human Cancer Genome
47
Cancer
48
A Challenge
  • At present, description of a recently diagnosed
    tumor in terms of its underlying genetic lesions
    remains a distant prospect. Nonetheless, we look
    ahead 10 or 20 years to the time when the
    diagnosis of all somatically acquired lesions
    present in a tumor cell genome will become a
    routine procedure.
  • Douglas Hanahan and Robert Weinberg
  • Cell, Vol. 100, 57-70, 7 Jan 2000

49
Karyotyping
50
CGH: Comparative Genomic Hybridization
  • Equal amounts of biotin-labeled tumor DNA and
    digoxigenin-labeled normal reference DNA are
    hybridized to normal metaphase chromosomes
  • The tumor DNA is visualized with fluorescein and
    the normal DNA with rhodamine
  • The signal intensities of the different
    fluorochromes are quantitated along the single
    chromosomes
  • The over- and underrepresented DNA segments are
    quantified by computation of tumor/normal ratio
    images and average ratio profiles
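The ratio computation in the last step can be sketched as below. This is an illustration only: the intensity values are invented, and the log2 thresholds of +0.3 and -0.3 are hypothetical cutoffs chosen for the example, not values from the slides.

```python
import math


def log2_ratios(tumor, normal):
    """Per-locus log2(tumor / normal) signal ratio along a chromosome."""
    return [math.log2(t / n) for t, n in zip(tumor, normal)]


def call_loci(ratios, gain_threshold=0.3, loss_threshold=-0.3):
    """Label each locus from its log2 ratio (thresholds are assumptions)."""
    calls = []
    for r in ratios:
        if r >= gain_threshold:
            calls.append("amplification")
        elif r <= loss_threshold:
            calls.append("deletion")
        else:
            calls.append("balanced")
    return calls


# Made-up intensities for three loci.
ratios = log2_ratios([2.0, 1.0, 0.5], [1.0, 1.0, 1.0])
calls = call_loci(ratios)
```

Working on the log2 scale makes a doubling and a halving symmetric around zero, which is why ratio profiles are usually plotted this way.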

Amplification
Deletion
51
CGH: Comparative Genomic Hybridization
52
Microarray Analysis of Cancer Genome
  • Representations are reproducible samplings of DNA
    populations in which the resulting DNA has a new
    format and reduced complexity.
  • We array probes derived from low complexity
    representations of the normal genome
  • We measure differences in gene copy number
    between samples ratiometrically
  • Since representations have a lower nucleotide
    complexity than total genomic DNA, we obtain a
    stronger specific hybridization signal relative
    to non-specific hybridization and noise

53
Copy Number Fluctuation
(Figure panels: A1, B1, C1, A2, B2, C2, A3, B3, C3)
54
Measuring gene copy number differences between
complex genomes
  • Compare the genomes of diseased and normal
    samples
  • Error Control
  • The use of representations augmenting microarrays
  • Representations reproducibly sample the genome
    thereby reducing its complexity. This increases
    the signal-to-noise ratio and improves
    sensitivity
  • Statistical modeling of the sources of noise
  • Bayesian Analysis
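To give a flavour of what a Bayesian treatment of noisy copy-number measurements might look like, the sketch below computes a posterior over integer copy-number states from one observed log2 ratio. Everything here is an assumption made for illustration: the Gaussian noise model, the noise level `sigma`, the prior in `DEFAULT_PRIOR`, and the restriction to states 1 through 4. It is not the method presented in the talk.

```python
import math

# Hypothetical prior over copy-number states (2 = normal diploid).
DEFAULT_PRIOR = {1: 0.1, 2: 0.7, 3: 0.1, 4: 0.1}


def gaussian_pdf(x, mean, sigma):
    """Density of a normal distribution at x."""
    z = (x - mean) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def copy_number_posterior(log2_ratio, sigma=0.25, prior=None):
    """Posterior P(copy number | observed log2 ratio) via Bayes' rule.

    Assumes the expected log2 ratio for copy number c is log2(c / 2)
    and that measurement noise is Gaussian with standard deviation sigma.
    """
    prior = DEFAULT_PRIOR if prior is None else prior
    unnormalised = {
        c: p * gaussian_pdf(log2_ratio, math.log2(c / 2.0), sigma)
        for c, p in prior.items()
    }
    total = sum(unnormalised.values())
    return {c: weight / total for c, weight in unnormalised.items()}
```

A measurement near 0 favours the diploid state, while one near log2(1.5) shifts the posterior toward three copies despite the strong diploid prior. Increasing `sigma` flattens the posterior, which is the signal-to-noise issue the slide is concerned with.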

55
Nattering Nabobs of Negativism
  • Some scientists are concerned about the cost and
    the possibility that a project of this scale
    could take money away from smaller ones
  • Craig Venter, who led a private project to
    determine the human DNA blueprint in competition
    with the human genome project, said it would make
    more sense to look at specific families of genes
    known to be involved in cancer.
  • Lee Hood, president of the Institute for Systems
    Biology, has called the premise of the Cancer
    Genome Project naïve, suggesting that the
    signal-to-noise issues its researchers are likely
    to encounter will be absolutely enormous.

56
Challenges
  • ArrayCGH data is noisy!
  • Better computational biology algorithms
  • Better statistical modeling: in some technologies,
    the noise is systematic (e.g., Affy SNP chips)
  • Better bio-technologies (GRIN: Genomics,
    Robotics, Informatics, Nanotechnology)
  • No efficient technology for epigenomics,
    translocations or de novo mutations.
  • Algorithms for multi-locus association studies
  • Systems view of cancer by integrating data from
    multiple sources
  • Genomic, Epigenomic, Transcriptomic, Metabolomic,
    and Proteomic.
  • Regulatory, Metabolic and Signaling pathways

57
Mishra's Mystical 3Ms
  • Rapid and accurate solutions
  • Bioinformatic, statistical, systems, and
    computational approaches.
  • Approaches that are scalable, agnostic to
    technologies, and widely applicable
  • Promises, challenges and obstacles

Measure
Mine
Model
58
Discussions
  • Q&A

59
Answer to Cancer
  • If I know the answer I'll tell you the answer,
    and if I don't, I'll just respond, cleverly.
  • US Secretary of Defense, Mr. Donald Rumsfeld.

60
To be continued
  • Break