Discovery of higher-order functional domains in the human genome

1
Discovery of higher-order functional domains in
the human genome
  • Bob Thurman
  • Noble Lab, Department of Genome Sciences
  • University of Washington
  • Seattle
  • 11 October, 2006
  • ASHG 2006, New Orleans

2
Idea
  • Idea of functional domains has been around for a
    long time.
  • Now there are a variety of functional datasets
    available in nearly continuous fashion across
    the genome (ENCODE).
  • Apply modern computational techniques to these
    datasets to delineate large-scale functionally
    active and inactive regions of the genome.

[Figure: RNA, H3K27me3, H3ac, and TR50 tracks across ENm005 (1.7 Mb), with domains labeled A (active) and I (inactive)]
3
The tools
  • Use wavelets to smooth disparate datasets out to
    a common scale.
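
A minimal sketch of wavelet smoothing, assuming a Haar wavelet (the slides do not say which wavelet was used) and an evenly sampled track. Discarding the detail coefficients of the first few levels and reconstructing leaves a piecewise-constant signal; each extra level doubles the scale. The function names are hypothetical and this is not the HMMseg implementation.

```python
import math

SQRT2 = math.sqrt(2.0)

def haar_forward(x):
    """One Haar level: return (approximation, detail) coefficients."""
    approx = [(x[i] + x[i + 1]) / SQRT2 for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) / SQRT2 for i in range(0, len(x), 2)]
    return approx, detail

def haar_inverse(approx, detail):
    """Invert one Haar level."""
    out = []
    for a, d in zip(approx, detail):
        out.append((a + d) / SQRT2)
        out.append((a - d) / SQRT2)
    return out

def haar_smooth(signal, levels):
    """Smooth by zeroing the detail coefficients of the first `levels` levels."""
    if len(signal) % (2 ** levels) != 0:
        raise ValueError("signal length must be a multiple of 2**levels")
    approx = list(signal)
    for _ in range(levels):
        approx, _discarded = haar_forward(approx)
    for _ in range(levels):
        approx = haar_inverse(approx, [0.0] * len(approx))
    return approx
```

With this choice, `haar_smooth(track, 1)` averages neighboring pairs of samples and `haar_smooth(track, 2)` averages blocks of four, so tracks sampled at different resolutions can be brought to a shared scale by picking the number of levels per track.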

[Figure: data smoothed at the 1.6 kb and 6.4 kb scales]
4
The tools
  • Use Hidden Markov Models (HMMs) to segment
    regions based on data.
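
A minimal sketch of HMM segmentation for a single track, assuming two states (A for active, I for inactive), Gaussian emissions, and a fixed probability of staying in the same state. All parameter values and names here are illustrative assumptions; in practice the parameters would be estimated from the data. The Viterbi algorithm then picks the most likely state path.

```python
import math

def log_gauss(x, mean, sd):
    """Log density of a normal distribution."""
    return -0.5 * math.log(2 * math.pi * sd * sd) - (x - mean) ** 2 / (2 * sd * sd)

def viterbi_two_state(values, means, sds, stay_prob=0.9):
    """Most likely A/I path for `values` under a two-state Gaussian HMM."""
    states = ("A", "I")
    log_stay = math.log(stay_prob)
    log_switch = math.log(1.0 - stay_prob)

    def trans(prev, cur):
        return log_stay if prev == cur else log_switch

    # Uniform initial distribution over the two states.
    score = {s: math.log(0.5) + log_gauss(values[0], means[s], sds[s])
             for s in states}
    backpointers = []
    for x in values[1:]:
        new_score, pointers = {}, {}
        for s in states:
            best_prev = max(states, key=lambda p: score[p] + trans(p, s))
            pointers[s] = best_prev
            new_score[s] = (score[best_prev] + trans(best_prev, s)
                            + log_gauss(x, means[s], sds[s]))
        score = new_score
        backpointers.append(pointers)

    # Trace back from the best final state.
    state = max(states, key=lambda s: score[s])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path
```

A high `stay_prob` penalizes switching, so an isolated noisy value does not split a domain; only a sustained change in the signal produces a new segment.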

5
The procedure
6
Results: single-track segmentations
[Figure: H3ac segmentation, ENm005 (1.7 Mb), with domains labeled A and I]
7
Concordance of segmentations
  • TR50 generally concordant with everything except
    RNA
  • H3ac generally concordant with everything except
    H3K27me3
  • H3K27me3 not very concordant with anything except
    TR50
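
The slides do not define concordance. One simple reading, assumed here, is the fraction of positions at which two segmentations assign the same state. A sketch with a hypothetical function name:

```python
def concordance(seg_a, seg_b):
    """Fraction of positions where two equal-length segmentations agree."""
    if len(seg_a) != len(seg_b) or len(seg_a) == 0:
        raise ValueError("segmentations must be non-empty and of equal length")
    matches = sum(1 for a, b in zip(seg_a, seg_b) if a == b)
    return matches / len(seg_a)
```

Applying this to every pair of tracks gives a small concordance matrix from which statements like the ones above can be read off.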

8
Simultaneous four-track segmentation
[Figure: four-track segmentation with domains labeled A and I, and its concordance with the single-track segmentations]
9
Enrichment of annotated genomic features
[Figure: enrichment of annotated genomic features. Labels: Repeats; All; LTR; SINE Alus; LINEs (L1); LINEs (L2); Simple repeats; DNA transposons; Conserved Elements (ENCODE MSA); strict; moderate; loose; non-exonic; EST overlap; CpG Islands; mRNA Tx Starts; Gencode Tx Starts; Spliced EST Tx Starts]
10
Conserved non-coding sequence
  • Against expectations, active domains are somewhat
    depleted in CNS (18% depleted relative to random
    expectation).
  • Does adding a CNS track add anything? No: even in
    terms of CN elements, there is little difference.
    In addition, there is very poor single-track
    concordance with other data types.
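
Enrichment or depletion relative to random expectation can be sketched as an observed/expected ratio. This assumes the expectation is that a feature's bases fall in a state in proportion to the share of the region that state covers; the slides do not spell out the exact null model, and the function name is hypothetical.

```python
def enrichment(labels, feature_mask, state="A"):
    """Observed/expected count of feature positions inside `state`.

    labels: per-position state labels (e.g. "A" or "I").
    feature_mask: per-position truthy values marking the feature.
    Values below 1 indicate depletion, above 1 enrichment.
    """
    if len(labels) != len(feature_mask):
        raise ValueError("labels and feature_mask must have equal length")
    in_state = sum(1 for lab in labels if lab == state)
    feature_total = sum(1 for f in feature_mask if f)
    observed = sum(1 for lab, f in zip(labels, feature_mask)
                   if f and lab == state)
    expected = feature_total * in_state / len(labels)
    if expected == 0:
        raise ValueError("expected count is zero; ratio undefined")
    return observed / expected
```

Under this definition a ratio of 0.82 would correspond to an 18% depletion.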

11
Cell-type differences
  • Data for two different cell types (GM06990 and
    HeLaS3) is available for H3ac and RNA expression.
  • Single-track segmentation concordance between
    cell types

12
Future work
  • Scale up!
  • More data types
  • More cell types
  • Model organisms: whole-genome functional data
    already available

13
Acknowledgments
  • John Stamatoyannopoulos
  • Bill Noble and the Noble lab
  • Nathan Day
  • Andrew Hemmaplardh
  • HMMseg, a Java program for multi-variate HMM
    segmentation with optional wavelet smoothing, to
    be released soon

14
FIN
15
The data
  • ENCODE project: identify all functional elements
    of the genome.
  • Pilot phase looks at 1% (30 Mb) of the genome,
    comprising the ENCODE regions.
  • Functional datasets collected in common cell line
    HeLaS3.
  • Histone modifications H3ac (Sanger) and H3K27me3
    (UCSD)
  • RNA transcription levels (Affymetrix)
  • DNA replication timing TR50 (University of
    Virginia)

16
4-state segmentation stats
17
Gene Ontology (GO) analysis
  • Any classes of genes over-represented in
    active/inactive states?
  • over-representation of genes involved in signal
    transduction (particularly olfactory
    G-protein-coupled receptors) within repressed
    domains
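
The slides do not say which statistical test was used for the GO analysis. A common choice for over-representation, assumed here, is the one-sided hypergeometric test: given a gene population, a GO class, and the genes falling in one state, how likely is it to see at least the observed number of class members by chance? The function name is hypothetical.

```python
from math import comb

def hypergeom_pvalue(population, annotated, sample, hits):
    """P(X >= hits) when drawing `sample` genes without replacement
    from `population` genes, of which `annotated` belong to the class."""
    upper = min(annotated, sample)
    total = comb(population, sample)
    tail = sum(comb(annotated, k) * comb(population - annotated, sample - k)
               for k in range(hits, upper + 1))
    return tail / total
```

A small p-value for a class among genes in repressed domains would indicate over-representation of that class there; in practice the p-values would also need correction for testing many GO classes.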

18
Enrichment of annotated genomic features
[Figure panels: Affy RNA, H3K27me3, TR50]
19
Enrichment of annotated genomic features
[Figure panel: H3ac, with domains labeled A and I]
20
High-confidence regions
  • Intersection of 4 individual segmentations.
  • 10.3Mb total (over 1/3 of ENCODE), and almost all
    of that (except for 1.5kb) is concordant with the
    4-track segmentation.
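
The intersection of individual segmentations can be sketched as keeping only the positions where every track assigns the same state. This assumes all segmentations are given as equal-length per-position label sequences; the function name is hypothetical.

```python
def high_confidence(segmentations):
    """Per-position consensus label, or None where the tracks disagree."""
    lengths = {len(seg) for seg in segmentations}
    if len(lengths) != 1:
        raise ValueError("all segmentations must have the same length")
    consensus = []
    for column in zip(*segmentations):
        consensus.append(column[0] if len(set(column)) == 1 else None)
    return consensus
```

The share of non-None positions, multiplied by the region length, gives the total size of the high-confidence regions.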

21
Outline
  • Continuous genome-wide data and scale
  • Tools for analysis
  • Wavelets and HMMs
  • Results
  • Definition of functional domains

22
Continuous genome-wide data and scale
  • A wide variety of nearly continuous genomic data
    is now available in 30 Mb (1%) of the human
    genome, comprising the ENCODE regions.
  • We focus on the issue of scale. Use wavelets to
    normalize datasets collected on widely disparate
    scales and to elucidate trends in single and
    combined datasets at large scales.

23
Segment regions using wavelets and Hidden Markov
Models (HMMs)
Approximate scale of 60kb gives minimum segment
size of 10kb.