1
Identification of Copy Number Variants using
Genome Graphs
Dhawal Verma
Advisor: Dr. Hesham Ali
2
Introduction

The genome of an organism offers great insight
into its
phylogenetic history
interaction with the environment
internal functions
Even within the same species, the genomes of two
individuals differ. Although the genomic
variations are relatively small, they account for
the observed variations in
Phenotypes (Heterozygosity)
Susceptibility towards various diseases.

3
Motivation

Heterozygosity is of major interest to
researchers of genetic variation in natural
populations.
It refers to the state of having different
alleles at one or more corresponding chromosomal
loci.
It is often one of the first "parameters" that
one presents in a data set. It can tell us a
great deal about the structure and even history
of a population.

4
Motivation

Role in diseases
Structural variants (SVs) and copy number variants
(CNVs) have been associated with susceptibility
or resistance to disease.
Gene copy number can be elevated in cancer cells.
Copy number variation has also been associated
with autism, schizophrenia and idiopathic
learning disability.

5
Visualization of Genome

Genome: a book
Written in the 4 nucleotide letters A, T, G, C
23 chromosomes: 23 chapters
Genes: stories in each chapter

6
Genome
A T G C
7
Genomic Structural Variation

Every genome differs from every other. However,
just as different books draw their words from a
known dictionary, different genomes are built
from a shared set of genes.
Just as the position of a word in a sentence
changes its meaning, the position of a gene in a
genome can give rise to a distinct feature and
causes variation between genomes.

8
Genomic Structural Variation

Until fairly recently, single nucleotide
polymorphisms (SNPs) were thought to be the main
source of variation in the human genome.
SNPs are variations that involve a change in just
one nucleotide.
THE RAT CAN RUN FAST
THE CAT CAN RUN FAST
High-throughput genome scanning technologies
revealed that there are other forms of genomic
variation beyond single base-pair substitutions.

9
Structural Variants

Structural variant is an umbrella term for a
group of genomic alterations involving segments
of DNA typically larger than 1 kb.
The structural variation may be
Quantitative (CNVs: indels and duplications)
Positional (translocations)
Orientational (inversions).

10
Copy Number Variants (CNVs)

CNVs are defined as chromosomal segments at least
1000 bases (1 kb) in length that vary in number
of copies from human to human.
CNVs are large chunks of DNA that are deleted,
copied, flipped or otherwise rearranged in
combinations that can be unique for each
individual.
YOU CAN RUN FAST
YOU CAN RUN RUN RUN FAST

11
SNP v CNV

SNPs always occur in two alleles, while
approximately 5% of the human genome is defined
as structurally variant in the normal population,
involving more than 800 independent genes.
Of the total amount of variation between two
human individuals:
CNVs and SVs >>> SNPs

12
Primitive methods for detection of CNVs

Whole-genome array comparative genomic
hybridization (aCGH), which tests the relative
frequencies of probe DNA segments between two
genomes.
SNP arrays, which measure the intensity of probe
signals at known SNP loci.

13
Limitations of the methods

The size and breakpoint resolution of any
prediction is correlated with the density of the
probes on the array, which is limited by
the density of the array itself (for aCGH)
the density of known SNP loci (for SNP arrays).
The limited resolution of arrays for high copy
count segments and the lack of unique probes make
it difficult to identify CNVs in repetitive
regions.

14
Research Proposal

An effective computational method for the
identification of Copy Number Variants in
genomes.
Model
Next generation sequencing data can be modeled as
a graph that we call a Genome Graph.
Algorithm
By mapping the reference genome graph to the
donor graph and combining two existing methods,
depth of coverage (DOC) and paired-end mapping
(PEM), we can overcome their individual
limitations and detect CNVs with higher
sensitivity and specificity.

15
Research Proposal

Our literature survey indicates that the PEM
method is used specifically for detecting SVs and
the DOC method for detecting CNVs.
CNVs are generally considered a subset of SVs.
By integrating the two methods, we can use PEM
signatures at a higher magnification level.
The complexity can also be reduced by using
bi-directional genome graphs.

16
Genome Graphs

With the advent of Next Generation Sequencing
data that provides as much as 40x coverage for a
human genome, a special class of graphs known as
Genome graphs emerged.
The vertices represent either the reads or their
substrings (k-mers, strings of length k over the
letters A, T, G and C).
The edges represent overlaps between them (a
suffix of one read is a prefix of the next).
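
The vertex and edge definitions above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `kmers` and `build_kmer_graph`, the toy reads, and the choice k = 3 are hypothetical and do not come from the slides. Vertices are k-mers, and an edge joins two k-mers that are adjacent in a read, so they overlap by k-1 letters.

```python
from collections import defaultdict

def kmers(read, k):
    """Return every length-k substring of a read, in order."""
    return [read[i:i + k] for i in range(len(read) - k + 1)]

def build_kmer_graph(reads, k):
    """Build a graph whose vertices are k-mers.

    An edge u -> v is added when u and v are consecutive k-mers in a
    read, i.e. the (k-1)-suffix of u equals the (k-1)-prefix of v.
    """
    graph = defaultdict(set)
    for read in reads:
        ks = kmers(read, k)
        for km in ks:
            graph[km]  # touching the key registers vertices with no outgoing edge
        for u, v in zip(ks, ks[1:]):
            graph[u].add(v)
    return {u: sorted(vs) for u, vs in graph.items()}

# Two toy reads that overlap on "GCA" (illustrative data only).
reads = ["ATGCA", "GCATT"]
graph = build_kmer_graph(reads, 3)
for vertex, successors in graph.items():
    print(vertex, "->", successors)
```

Because both reads contain the k-mer GCA, they meet at a single vertex, which is how overlapping reads are stitched together in such a graph.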

17
Genome Graphs

A genome graph can be unidirectional or
bi-directional.
A bi-directional genome graph models the
double-strandedness of DNA.
Bi-directional graphs help reduce the complexity
of the algorithm: in a unidirectional graph two
complementary walks must be searched, while in a
bi-directional graph a single walk yields both
the sequence and its complement.
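
A small sketch of why one walk is enough, assuming the common convention (not stated in the slides) that a k-mer and its reverse complement share one vertex. The helper names `reverse_complement`, `canonical` and `spell_walk` are hypothetical.

```python
# Translation table mapping each nucleotide to its complement.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the sequence read from the opposite DNA strand."""
    return seq.translate(COMPLEMENT)[::-1]

def canonical(kmer):
    """Pick one representative for a k-mer and its reverse complement,
    so both strands map to the same vertex."""
    return min(kmer, reverse_complement(kmer))

def spell_walk(walk):
    """Spell the sequence of a walk of k-mers that overlap by k-1 letters."""
    return walk[0] + "".join(km[-1] for km in walk[1:])

walk = ["ATG", "TGC", "GCA"]
forward = spell_walk(walk)              # sequence on one strand
reverse = reverse_complement(forward)   # the same walk read on the other strand
print(forward, reverse)
```

Here the complementary strand is derived from the one walk already found, instead of being searched for as a second walk.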

18
Depth of Coverage method

Depth of Coverage
The density of reads mapping to the region
Several recent studies have shown that by
comparing the DOC within a sliding window of the
genome to what is expected in the reference
genome, it is possible to detect changes in copy
number
Limitations
Very complicated.
Difficult to separate true changes in copy number
from segments that are over- or under-sampled by
the sequencing technology.
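
The comparison of observed and expected depth can be illustrated as follows. This is a simplified sketch, not the method used in the cited studies: it uses fixed, non-overlapping windows instead of a true sliding window, and the thresholds `gain=1.5` and `loss=0.5`, the function names, and the read start positions are all assumptions chosen for illustration.

```python
def window_depths(read_starts, genome_length, window):
    """Count the reads starting in each fixed, non-overlapping window."""
    counts = [0] * ((genome_length + window - 1) // window)
    for pos in read_starts:
        counts[pos // window] += 1
    return counts

def copy_number_calls(donor, reference, gain=1.5, loss=0.5):
    """Label each window by the ratio of donor depth to reference depth.

    The gain/loss thresholds are illustrative assumptions, not values
    taken from the presentation.
    """
    calls = []
    for d, r in zip(donor, reference):
        if r == 0:
            calls.append("unknown")  # no expected depth to compare against
            continue
        ratio = d / r
        if ratio >= gain:
            calls.append("gain")
        elif ratio <= loss:
            calls.append("loss")
        else:
            calls.append("normal")
    return calls

# Hypothetical read start positions on a 40-base toy genome.
reference_depth = window_depths([0, 5, 12, 18, 21, 27, 33, 38], 40, 10)
donor_depth = window_depths([1, 6, 11, 13, 16, 19, 22, 26, 35], 40, 10)
calls = copy_number_calls(donor_depth, reference_depth)
print(calls)
```

A plain ratio like this cannot tell a real copy number change from a window that was over- or under-sampled, which is the limitation noted above.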

19
Depth of Coverage
In a genome graph, an increase or decrease in the
number of vertices between two known vertices of
the reference genome indicates a CNV.
20
Paired End Mapping method

PEM method
Two paired reads (called matepairs) are generated
at an approximately known distance in the donor
genome.
The reads are mapped to a reference genome, and
matepairs mapping at a distance significantly
different from the expected length (termed
discordant) suggest structural variants.
Limitations
Difficulty in detecting larger insertions and
variation within areas of segmental duplication.
21
PEM signatures in Genome Graphs
22
PEM signatures v DOC signatures

In contrast to most PEM signatures, DOC
signatures can be used to detect very large
events.
The larger the event, the stronger the signature.
However, they are not able to accurately identify
smaller events that PEM signatures, even with low
coverage, are able to detect.

23
Next Steps

While inversions do not cause any changes in copy
number, an area that is deleted (SV) will
correspond to a loss (CNV). Similarly, a region
containing a tandem duplication will be annotated
both as having an insertion (SV) and as
exhibiting a gain (CNV). In this way, any PEM
method for SV detection can be viewed as a method
for detecting a subset of CNVs.
The depth of coverage method is used extensively
for detecting CNVs, while the PEM technique is
mainly used for detecting SVs.
Our hypothesis is that PEM techniques can be used
to improve both the sensitivity and specificity
of depth of coverage based methods using a
probabilistic graph-theoretic framework.