Identification of Copy Number Variants using Genome Graphs

Dhawal Verma. Advisor: Dr. Hesham Ali


1
Identification of Copy Number Variants using
Genome Graphs
Dhawal Verma
Advisor: Dr. Hesham Ali
2
Introduction
  • The genome of an organism offers great insight
    into its:
      • phylogenetic history
      • interaction with the environment
      • internal functions
  • Even within the same species, the genomes of two
    individuals differ. Although these genomic
    variations are relatively small, they account for
    the observed variation in:
      • phenotypes (heterozygosity)
      • susceptibility to various diseases

3
Motivation
  • Heterozygosity is of major interest to
    researchers of genetic variation in natural
    populations.
  • Heterozygosity is the state of having different
    alleles at one or more corresponding chromosomal
    loci.
  • It is often one of the first parameters reported
    for a data set, and it can tell us a great deal
    about the structure and even the history of a
    population.

4
Motivation
  • Role in diseases
      • Structural variants (SVs) and copy number
        variants (CNVs) have been associated with
        susceptibility or resistance to disease.
      • Gene copy number can be elevated in cancer
        cells.
      • Copy number variation has also been associated
        with autism, schizophrenia and idiopathic
        learning disability.

5
Visualization of Genome
  • Genome = a book
  • Written in the 4 nucleotide letters: A, T, G, C
  • 23 chromosome pairs = 23 chapters
  • Genes = stories in each chapter

6
Genome
A T G C
7
Genomic Structural Variation
  • Every genome differs from every other. However,
    just as different books differ from one another,
    the words used in each book come from a known
    dictionary.
  • Just as a word placed at a different position in
    a sentence gives the sentence a different
    meaning, the same gene placed at a different
    position in a genome produces a distinct feature
    and causes variation between genomes.

8
Genomic Structural Variation
  • Until fairly recently, single nucleotide
    polymorphisms (SNPs) were thought to be the main
    source of variation in the human genome.
  • SNPs are variations that involve a change in just
    one nucleotide.
  • THE RAT CAN RUN FAST
  • THE CAT CAN RUN FAST
  • High-throughput genome scanning technologies
    revealed that there are other forms of genomic
    variation beyond single base-pair substitutions.
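
The RAT/CAT sentence pair above can be checked mechanically. The following is a minimal sketch in Python that treats each sentence as a sequence and reports every single-character substitution. The function name `substitutions` is an illustrative assumption, and real SNP calling works on aligned sequencing reads rather than on two equal-length strings.

```python
def substitutions(ref: str, alt: str) -> list:
    """Return (position, ref_char, alt_char) for each mismatching position.

    Assumes the two sequences are already aligned and of equal length,
    which is what a substitution-only (SNP-like) comparison requires.
    """
    if len(ref) != len(alt):
        raise ValueError("sequences must have equal length")
    return [(i, r, a) for i, (r, a) in enumerate(zip(ref, alt)) if r != a]


# The slide's analogy: one letter changes, everything else is identical.
print(substitutions("THE RAT CAN RUN FAST", "THE CAT CAN RUN FAST"))
# [(4, 'R', 'C')]
```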

9
Structural Variants
  • Structural variant (SV) is an umbrella term for
    a group of genomic alterations involving
    segments of DNA typically larger than 1 kb.
  • The structural variation may be:
      • Quantitative (CNVs: indels and duplications)
      • Positional (translocations)
      • Orientational (inversions)

10
Copy Number Variants (CNVs)
  • CNVs are defined as chromosomal segments, at
    least 1000 bases (1 kb) in length, that vary in
    number of copies from human to human.
  • CNVs are large chunks of DNA that are deleted,
    copied, flipped or otherwise rearranged in
    combinations that can be unique for each
    individual.
  • Example:
      • YOU CAN RUN FAST
      • YOU CAN RUN RUN RUN FAST
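
The RUN example can be turned into a toy copy-number comparison. This is only a sketch of the analogy, with each word standing in for a DNA segment. The function `copy_number_change` is a hypothetical helper, not part of any CNV-calling tool.

```python
def copy_number_change(reference: list, donor: list, segment: str) -> int:
    """Return donor copies minus reference copies of `segment`.

    Positive values indicate a gain, negative values a loss, and zero
    no change in copy number for that segment.
    """
    return donor.count(segment) - reference.count(segment)


reference = "YOU CAN RUN FAST".split()
donor = "YOU CAN RUN RUN RUN FAST".split()

print(copy_number_change(reference, donor, "RUN"))  # 2 (a gain of two copies)
print(copy_number_change(reference, donor, "CAN"))  # 0 (no change)
```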

11
SNP v CNV
  • SNPs typically occur as two alleles, whereas
    approximately 5% of the human genome is defined
    as structurally variant in the normal population,
    involving more than 800 independent genes.
  • Of the total amount of variation between two
    human individuals:
      • CNVs and SVs >>> SNPs

12
Primitive methods for detection of CNVs
  1. Whole-genome array comparative genomic
    hybridization (aCGH), which tests the relative
    frequencies of probe DNA segments between two
    genomes.
  2. SNP arrays, which measure the intensity of probe
    signals at known SNP loci.

13
Limitations of the methods
  • The size and breakpoint resolution of any
    prediction is correlated with the density of the
    probes on the array, which is limited by:
      • the density of the array itself (for aCGH)
      • the density of known SNP loci (for SNP arrays)
  • The limited resolution of arrays for high copy
    count segments and the lack of unique probes make
    it difficult to identify CNVs in repetitive
    regions.

14
Research Proposal
  • An effective computational method for the
    identification of copy number variants in
    genomes.
  • Model
      • Next-generation sequencing data can be modeled
        as a graph that we call a genome graph.
  • Algorithm
      • By mapping the donor genome graph against the
        reference genome graph and combining two
        existing methods, depth of coverage (DOC) and
        paired-end mapping (PEM), we can overcome
        their individual limitations and detect CNVs
        with higher sensitivity and specificity.

15
Research Proposal
  • Our literature survey indicates that the
    paired-end mapping (PEM) method is used
    specifically for detecting SVs and the depth of
    coverage (DOC) method for detecting CNVs.
  • CNVs are generally considered a subset of SVs.
  • By integrating the two methods, we can use PEM
    signatures at a higher magnification level.
  • The complexity can also be reduced by using
    bi-directional genome graphs.

16
Genome Graphs
  • With the advent of next-generation sequencing
    data, which provides as much as 40x coverage of a
    human genome, a special class of graphs known as
    genome graphs emerged.
  • The vertices represent either the reads or their
    substrings (k-mers, i.e. length-k strings over
    the letters A, T, G and C).
  • The edges represent overlaps between them (a
    suffix of one vertex is a prefix of the next).
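
The vertex and edge definitions above can be sketched in a few lines of Python. This is one possible construction, a de Bruijn-style graph in which vertices are k-mers and an edge joins two k-mers that are consecutive in a read (so they overlap by k-1 letters). The names `kmers` and `build_kmer_graph` are assumptions for illustration, and the presentation does not specify which graph construction is actually used.

```python
from collections import defaultdict


def kmers(read: str, k: int) -> list:
    """Return all length-k substrings of a read, in order."""
    return [read[i:i + k] for i in range(len(read) - k + 1)]


def build_kmer_graph(reads: list, k: int) -> dict:
    """Build a directed graph: vertex = k-mer, edge = (k-1)-letter overlap.

    Consecutive k-mers of a read overlap by construction: the last k-1
    letters of the left k-mer are the first k-1 letters of the right one.
    """
    graph = defaultdict(set)
    for read in reads:
        parts = kmers(read, k)
        for kmer in parts:
            graph[kmer]  # touch the key so isolated vertices are kept
        for left, right in zip(parts, parts[1:]):
            graph[left].add(right)
    return dict(graph)


graph = build_kmer_graph(["ATGCA", "GCATT"], k=3)
for vertex in sorted(graph):
    print(vertex, "->", sorted(graph[vertex]))
```

The two reads share the k-mer GCA, so the graph joins them into a single path ATG, TGC, GCA, CAT, ATT.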

17
Genome Graphs
  • A genome graph can be unidirectional or
    bi-directional.
  • A bi-directional genome graph models the
    double-strandedness of DNA.
  • Bi-directional graphs help reduce the complexity
    of the algorithm: in a unidirectional graph two
    complementary walks must be searched, whereas in
    a bi-directional graph a single walk yields both
    the sequence and its complement.
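
One common way to make a single vertex stand for both strands is to store each k-mer under a canonical form, the lexicographically smaller of the k-mer and its reverse complement. The sketch below uses that convention as an assumption. It illustrates the idea of one walk covering both strands and is not necessarily the bi-directional graph structure used in this work.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the sequence of the opposite DNA strand, read 5' to 3'."""
    return seq.translate(_COMPLEMENT)[::-1]


def canonical(kmer: str) -> str:
    """Pick one representative for a k-mer and its reverse complement."""
    return min(kmer, reverse_complement(kmer))


def canonical_vertices(read: str, k: int) -> set:
    """Return the set of canonical k-mers contained in a read."""
    return {canonical(read[i:i + k]) for i in range(len(read) - k + 1)}


read = "ATGCA"
# A read and its reverse complement map to the same set of vertices,
# so only one walk needs to be searched instead of two.
print(canonical_vertices(read, 3) == canonical_vertices(reverse_complement(read), 3))
# True
```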

18
Depth of Coverage method
  • Depth of coverage (DOC)
      • The density of reads mapping to a region.
      • Several recent studies have shown that, by
        comparing the DOC within a sliding window of
        the genome to what is expected in the
        reference genome, it is possible to detect
        changes in copy number.
  • Limitations
      • Very complicated.
      • It is difficult to separate true changes in
        copy number from segments that are over- or
        under-sampled by the sequencing technology.
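
The window comparison described above can be sketched as follows. For simplicity this sketch uses fixed, non-overlapping windows rather than a true sliding window, and the gain and loss ratio thresholds (1.5 and 0.5) are illustrative assumptions, not values taken from the presentation or from any published method.

```python
def window_depths(read_starts: list, genome_length: int, window: int) -> list:
    """Count reads whose start position falls in each fixed-size window."""
    n_windows = (genome_length + window - 1) // window
    counts = [0] * n_windows
    for start in read_starts:
        if 0 <= start < genome_length:
            counts[start // window] += 1
    return counts


def call_copy_number_changes(observed: list, expected: list,
                             gain: float = 1.5, loss: float = 0.5) -> list:
    """Label each window by its observed/expected read-depth ratio."""
    calls = []
    for obs, exp in zip(observed, expected):
        if exp == 0:
            calls.append("unknown")  # no expectation to compare against
        elif obs / exp >= gain:
            calls.append("gain")
        elif obs / exp <= loss:
            calls.append("loss")
        else:
            calls.append("normal")
    return calls


print(window_depths([0, 1, 5, 6, 7, 12], genome_length=15, window=5))
# [2, 3, 1]
print(call_copy_number_changes([10, 20, 4, 10], [10, 10, 10, 10]))
# ['normal', 'gain', 'loss', 'normal']
```

A simple ratio threshold like this cannot tell a real copy-number change from a region the sequencer over- or under-sampled, which is the limitation noted above.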

19
Depth of Coverage
In a genome graph, an increase or decrease in the
number of vertices between two known vertices of
the reference genome indicates a possible CNV.
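
As a rough illustration of this idea, the sketch below represents the reference and donor genomes as walks (ordered lists of vertex labels) and counts the vertices lying between two anchor vertices in each. The walk representation and the helper `vertices_between` are assumptions made for the example. The presentation does not describe how anchors are chosen or how walks are extracted from the graph.

```python
def vertices_between(walk: list, start: str, end: str) -> int:
    """Count vertices strictly between the first `start` and the next `end`."""
    i = walk.index(start)
    j = walk.index(end, i + 1)
    return j - i - 1


# Hypothetical walks: the donor repeats the segment B, C once more.
reference_walk = ["A", "B", "C", "D"]
donor_walk = ["A", "B", "C", "B", "C", "D"]

ref_count = vertices_between(reference_walk, "A", "D")
donor_count = vertices_between(donor_walk, "A", "D")
print(donor_count - ref_count)  # 2: more vertices in the donor suggests a gain
```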
20
Paired End Mapping method
  • PEM method
      • Two paired reads (called mate pairs) are
        generated at an approximately known distance in
        the donor genome.
      • The reads are mapped to a reference genome, and
        mate pairs that map at a distance significantly
        different from the expected length (termed
        discordant) suggest structural variants.
  • Limitations
      • Difficulty in detecting larger insertions and
        variation within areas of segmental
        duplication.
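
The discordance test described above can be sketched as a comparison between the mapped span of a mate pair and the expected distance. The `MatePair` type, the `classify_pair` function and the tolerance value are all illustrative assumptions. In practice the tolerance is usually derived from the spread of the library's insert-size distribution, and orientation is checked as well.

```python
from dataclasses import dataclass


@dataclass
class MatePair:
    left_pos: int   # reference position where the left read maps
    right_pos: int  # reference position where the right read maps


def classify_pair(pair: MatePair, expected: int, tolerance: int) -> str:
    """Classify a mate pair by its mapped span on the reference.

    A span much longer than expected suggests the donor is missing
    sequence between the reads (deletion-like). A span much shorter
    suggests the donor has extra sequence there (insertion-like).
    """
    span = pair.right_pos - pair.left_pos
    if span > expected + tolerance:
        return "deletion-like"
    if span < expected - tolerance:
        return "insertion-like"
    return "concordant"


pairs = [MatePair(100, 420), MatePair(100, 600), MatePair(100, 200)]
print([classify_pair(p, expected=300, tolerance=50) for p in pairs])
# ['concordant', 'deletion-like', 'insertion-like']
```

An insertion longer than the expected distance cannot place both reads of a pair on either side of it, which is one way to see the limitation on larger insertions noted above.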

21
PEM signatures in Genome Graphs
22
PEM signatures v DOC signatures
  • In contrast to most PEM signatures, DOC
    signatures can be used to detect very large
    events.
  • The larger the event, the stronger the signature.
  • However, DOC signatures cannot accurately
    identify the smaller events that PEM signatures,
    even with low coverage, are able to detect.

23
Next Steps
  • While inversions do not cause any change in copy
    number, a region that is deleted (SV) corresponds
    to a loss (CNV). Similarly, a region containing a
    tandem duplication is annotated both as having an
    insertion (SV) and as exhibiting a gain (CNV). In
    this way, any PEM method for SV detection can be
    viewed as a method for detecting a subset of
    CNVs.
  • The depth of coverage method is used extensively
    for detecting CNVs, whereas the PEM technique is
    mainly used for detecting SVs.
  • Our hypothesis is that PEM techniques can be used
    to improve both the sensitivity and specificity
    of depth-of-coverage-based methods within a
    probabilistic graph-theoretic framework.

24
THANK YOU!