EXPression ANalyzer and DisplayER

1
EXPression ANalyzer and DisplayER
  • Adi Maron-Katz
  • Igor Ulitsky
  • Chaim Linhart
  • Amos Tanay
  • Rani Elkon

  • Seagull Shavit
  • Roded Sharan
  • Israel Steinfeld
  • Yossi Shiloh
  • Ron Shamir

Ron Shamir's Computational Genomics Group
2
Schedule
  • 10:15-11:10 Expander
  • 11:10-11:30 Amadeus
  • 11:30-11:45 Spike
  • 11:45-12:10 Matisse, FAME
  • 13:10-15:00 Hands-on

3
  • EXPANDER: an integrative package for analysis of
    gene expression data
  • Built-in support for 11 organisms:
    human, mouse, rat, chicken, zebrafish, fly,
    worm, Arabidopsis, yeast (S. cerevisiae, S. pombe), E. coli
  • Demonstration on a dataset collected in our
    labs

4
What can it do?
  • Low-level analysis
  • Missing data estimation (KNN or manual)
  • Data adjustments (merge conditions, divide by
    base, take log)
  • Normalization: quantile, loess
  • Filtering: fold change, variation, t-test
  • Standardization: mean = 0, std = 1
  • High-level gene partition analysis
  • Clustering
  • Biclustering
  • Network-based clustering

5
What can it do? (II)
  • Ascribing biological meaning to patterns
  • Functional analysis (enriched Gene Ontology
    terms)
  • Promoter analysis (over-represented
    transcription factor binding sites)
  • Chromosomal location analysis
  • miRNA targets enrichment analysis
  • Custom annotations enrichment analysis
  • Signaling pathway enrichment analysis and
    visualization

6
Input data
Normalization/ Filtering
Visualization utilities
Links to public annotation databases
Grouping (Clustering/ Biclustering/ Network
based clustering)
Functional enrichment (TANGO)
Promoter signals (PRIMA)
Location enrichment
miRNA Targets enrichment (FAME)
7
EXPANDER Data
  • Input data
  • Expression matrix (probes as rows, conditions as columns)
  • One-channel data (e.g., Affymetrix)
  • Dual-channel data, in which the data are log(R/G) (e.g.,
    cDNA microarrays)
  • .cel files
  • ID conversion file: maps probes to genes
  • Gene sets data: defines gene groups

8
EXPANDER Data (II)
  • Data definitions
  • Defining condition subsets
  • Data type and scale (log)
  • Data adjustments
  • Missing value estimation (KNN or arbitrary)
  • Merging conditions
  • Divide by base
  • Log data (base 2)
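
The "divide by base" and "log data (base 2)" adjustments can be sketched as follows. This is a minimal illustration rather than EXPANDER's own code: the function name, the choice of the first column as the base condition, and the sample values are assumptions, and all expression values are assumed to be positive and on a linear scale.

```python
import math

def divide_by_base_and_log(matrix, base_col=0):
    """Divide each probe's values by its value in the base condition,
    then take log2 of the ratios (hypothetical helper)."""
    adjusted = []
    for row in matrix:
        base = row[base_col]  # assumed positive, linear-scale intensity
        adjusted.append([math.log2(value / base) for value in row])
    return adjusted

# Rows are probes, columns are conditions; column 0 plays the role of the base.
expr = [
    [100.0, 200.0, 400.0],
    [80.0, 40.0, 80.0],
]
log_ratios = divide_by_base_and_log(expr)
```

After this adjustment, a value of 1 means a two-fold increase relative to the base condition and a value of -1 means a two-fold decrease.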

9
EXPANDER Preprocessing
  • Normalization: removal of systematic biases from
    the analyzed chips
  • Implemented methods: quantile, lowess
  • Visualization: box plots, scatter plots (simple,
    M vs. A)
  • Filtering: focus downstream analysis on the set
    of responding genes
  • Fold change
  • Variation
  • Statistical tests (t-test)
  • SAM (Significance Analysis of Microarrays)
  • Standardization: mean = 0, STD = 1 (visualization)
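
As a rough illustration of two of the steps above, the sketch below shows a textbook form of quantile normalization and a per-gene standardization to mean 0 and standard deviation 1. It is not EXPANDER's implementation: the function names are hypothetical, ties are broken by row order, and the population standard deviation is an assumption.

```python
from statistics import mean, pstdev

def quantile_normalize(matrix):
    """Give every condition (column) the same distribution of values.
    Ties are broken by row order, which is a simplification."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    columns = [[row[j] for row in matrix] for j in range(n_cols)]
    # order[j] lists the row indices of column j from smallest to largest value.
    order = [sorted(range(n_rows), key=lambda i, c=col: c[i]) for col in columns]
    # Reference distribution: mean of the k-th smallest values across columns.
    reference = [
        mean(columns[j][order[j][k]] for j in range(n_cols)) for k in range(n_rows)
    ]
    result = [[0.0] * n_cols for _ in range(n_rows)]
    for j in range(n_cols):
        for rank, i in enumerate(order[j]):
            result[i][j] = reference[rank]
    return result

def standardize(row):
    """Shift and scale one gene's profile to mean 0 and std 1.
    Assumes the profile is not constant (non-zero std)."""
    m, s = mean(row), pstdev(row)
    return [(value - m) / s for value in row]

chips = [[5.0, 4.0], [2.0, 1.0], [3.0, 6.0]]  # rows: probes, columns: chips
normalized = quantile_normalize(chips)
profile = standardize([1.0, 2.0, 3.0])
```

After quantile normalization both columns contain the same set of values (1.5, 3.5, 5.5), so chip-wide intensity differences no longer drive the comparison between conditions.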

11
Cluster Analysis
  • Partition the responding genes into distinct
    sets, each with a particular expression pattern
  • Identify major patterns → reduce dimensionality
    of the problem
  • co-expression → co-function
  • co-expression → co-regulation
  • Partition the genes to achieve:
  • High homogeneity within clusters
  • High separation between clusters
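
One way to make the two criteria above concrete is to average the similarity of gene pairs inside the same cluster (homogeneity) and of gene pairs in different clusters (used to judge separation). The sketch below uses Pearson correlation as the similarity measure; that choice, the function names, and the toy profiles are assumptions, not taken from the slides.

```python
from itertools import combinations
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant profiles."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def within_and_between_similarity(profiles, labels):
    """Average similarity of same-cluster pairs and of different-cluster pairs.
    Assumes at least one pair of each kind exists."""
    within, between = [], []
    for i, j in combinations(range(len(profiles)), 2):
        r = pearson(profiles[i], profiles[j])
        (within if labels[i] == labels[j] else between).append(r)
    return sum(within) / len(within), sum(between) / len(between)

profiles = [[1, 2, 3], [2, 4, 6], [3, 2, 1], [6, 4, 2]]
labels = [0, 0, 1, 1]  # two clusters: rising genes and falling genes
homogeneity, between_similarity = within_and_between_similarity(profiles, labels)
```

A good partition has high within-cluster similarity and low between-cluster similarity; in this toy case the two averages are close to 1 and -1.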

12
Cluster Analysis (II)
  • Implemented algorithms
  • CLICK, K-means, SOM, Hierarchical
  • Visualization
  • Mean expression patterns
  • Heat-maps
  • Chromosomal positions
  • Network sub-graph

13
Example study: responses to ionizing radiation
Ionizing radiation → double-strand breaks
14
Example study: experimental design
  • Genotypes: Atm-/- and wild-type (w.t.) control mice
  • Tissue: lymph node
  • Treatment: ionizing radiation
  • Time points: 0, 30 min, 120 min
  • Microarrays: Affymetrix U74Av2 (12k probesets)

15
Test case - Data Analysis
  • Dataset: six conditions (2 genotypes, 3 time
    points)
  • Normalization
  • Filtering step: define the set of responding
    genes
  • Genes whose expression level changed by at
    least 1.75-fold
  • 700 genes met this criterion
  • The set contains genes with various response
    patterns, so we applied CLICK to this set of genes
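
The filtering step can be sketched as a fold-change filter. The slides do not say how the 1.75-fold change is measured, so this example assumes the ratio between a gene's highest and lowest value across conditions, on positive linear-scale data; the function name and the sample values are hypothetical.

```python
def responding_genes(expression, min_fold=1.75):
    """Return the genes whose max/min ratio across conditions reaches min_fold.
    Assumes positive, linear-scale expression values."""
    return [
        gene
        for gene, values in expression.items()
        if max(values) / min(values) >= min_fold
    ]

expression = {
    "geneA": [100.0, 180.0, 90.0],   # 2.0-fold change
    "geneB": [100.0, 110.0, 105.0],  # 1.1-fold change
    "geneC": [50.0, 60.0, 100.0],    # 2.0-fold change
}
responders = responding_genes(expression)
```

Only the genes that pass the filter are handed to the clustering step, which keeps flat profiles from diluting the response patterns.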

16
Major Gene Clusters: Irradiated Lymph Node
Atm-dependent early responding genes
17
Major Gene Clusters: Irradiated Lymph Node
Atm-dependent 2nd wave of responding genes
19
Ascribe functional meaning to clusters
  • Gene Ontology (GO) annotations for human, mouse,
    rat, chicken, fly, worm, Arabidopsis, zebrafish,
    yeast (S. cerevisiae and S. pombe) and E. coli.
  • TANGO: applies statistical tests that seek
    over-represented GO functional categories in the
    clusters.
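
A common way to test whether a GO category is over-represented in a cluster is the hypergeometric tail probability. The sketch below shows that generic test only; the slides do not spell out TANGO's procedure, and the function name and the numbers in the example are assumptions.

```python
from math import comb

def hypergeom_enrichment_p(population, annotated, cluster_size, hits):
    """P(X >= hits) when cluster_size genes are drawn without replacement
    from `population` genes, of which `annotated` carry the GO term."""
    total = comb(population, cluster_size)
    upper = min(annotated, cluster_size)
    tail = sum(
        comb(annotated, k) * comb(population - annotated, cluster_size - k)
        for k in range(hits, upper + 1)
    )
    return tail / total

# Toy numbers: 10 genes on the chip, 5 annotated with the term,
# a cluster of 5 genes, all 5 of them annotated.
p_all_hits = hypergeom_enrichment_p(10, 5, 5, 5)
p_no_requirement = hypergeom_enrichment_p(10, 5, 5, 0)
```

Because many GO categories are tested for every cluster, and the categories overlap heavily, raw p-values from such a test still need a multiple-testing correction.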

20
Enriched GO Functional Categories
  • Hierarchical structure → highly dependent
    categories.
  • Problems:
  • High redundancy
  • Multiple testing corrections assume independent
    tests
  • TANGO

21
Functional Enrichment - Visualization
22
Functional Categories
Cell cycle control (p < 1x10^-6)
23
Functional Categories
Cell cycle control (p < 5x10^-6), Apoptosis (p = 0.001)
25
Clues are in the promoters
Identify Transcriptional Regulators
[Figure: a hidden layer (ATM and the transcription factors p53, TF-A, TF-B, TF-C and NEW, with links marked "?") above an observed layer of genes g1 to g13]
26
Reverse engineering of transcriptional networks
  • Infers regulatory mechanisms from gene expression
    data
  • Assumption:
  • co-expression → transcriptional co-regulation →
    common cis-regulatory promoter elements
  • Step 1: Identification of co-expressed genes
    using microarray technology (clustering algorithms)
  • Step 2: Computational identification of
    cis-regulatory elements that are over-represented
    in promoters of the co-expressed genes

27
PRIMA: general description
  • Input
  • Target set (e.g., co-expressed genes)
  • Background set (e.g., all genes on the chip)
  • Analysis
  • Identify transcription factors whose binding site
    signatures are enriched in the Target set with
    respect to the Background set.
  • TF binding site models: TRANSFAC DB
  • Default: from -1000 bp to 200 bp relative to the TSS
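
The idea of comparing binding-site hits between a target set and a background set can be sketched as below. This is a toy illustration, not PRIMA's algorithm: it assumes the binding-site model is a position weight matrix, scans one strand only, and uses an invented three-base motif, threshold and promoter sequences.

```python
from math import log2

BASES = "ACGT"

def pwm_score(pwm, site, background=0.25):
    """Log-odds score of one candidate site under a position weight matrix."""
    return sum(log2(col[BASES.index(base)] / background) for col, base in zip(pwm, site))

def has_hit(pwm, promoter, threshold):
    """True if any window of the promoter scores at or above the threshold."""
    width = len(pwm)
    return any(
        pwm_score(pwm, promoter[i:i + width]) >= threshold
        for i in range(len(promoter) - width + 1)
    )

def hit_fraction(pwm, promoters, threshold):
    """Fraction of promoters containing at least one hit."""
    return sum(has_hit(pwm, p, threshold) for p in promoters) / len(promoters)

# Hypothetical motif "AGT": each column holds probabilities for A, C, G, T.
toy_pwm = [
    [0.7, 0.1, 0.1, 0.1],
    [0.1, 0.1, 0.7, 0.1],
    [0.1, 0.1, 0.1, 0.7],
]
target = ["CCAGTCC", "AGTAAAA", "CCCCCCC"]
background = ["CCAGTCC", "CCCCCCC", "GGGGGGG", "TTTTTTT"]
target_fraction = hit_fraction(toy_pwm, target, threshold=4.0)
background_fraction = hit_fraction(toy_pwm, background, threshold=4.0)
```

A transcription factor becomes a candidate regulator when its hit fraction in the target set is clearly higher than in the background set; a statistical test is then needed to decide whether the difference is significant.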

28
Promoter Analysis - Visualization
29
PRIMA - Results
30
PRIMA Results
NF-κB     5.1    3.8x10^-8
p53       4.2    9.6x10^-7
STAT-1    3.2    5.4x10^-6
Sp-1      1.7    6.5x10^-4
31
Biclustering
  • Clustering becomes too restrictive on large
    datasets
  • It seeks a global partition of genes according to
    similarity in their expression across ALL
    conditions
  • Relevant knowledge can be revealed by identifying
    genes with a common pattern across a subset of the
    conditions
  • A novel algorithmic approach is needed:
    biclustering

32
Biclustering: SAMBA (Statistical-Algorithmic
Method for Bicluster Analysis)
A. Tanay, R. Sharan, R. Shamir, RECOMB 02
  • Bicluster (module): a subset of genes with
    similar behavior in a subset of conditions
  • Computationally challenging: has to consider
    many combinations of sub-conditions

33
Biclustering Visualization
34
Network based clustering
  • Goal: to identify modules using gene expression
    data and interaction networks.
  • Input: GE data + interactions file (.sif).
  • MATISSE (Module Analysis via Topology of
    Interactions and Similarity SEts).
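
To show how an interactions file might feed into network-based clustering, the sketch below reads lines in the simple interaction format used by Cytoscape (source node, interaction type, then one or more target nodes) and checks whether a candidate gene set is connected in the network. Whether EXPANDER's .sif files follow exactly this layout is an assumption, and the connectivity check is only a small ingredient of the idea, not the MATISSE algorithm. The function names and the example interactions are hypothetical.

```python
from collections import defaultdict, deque

def parse_sif(lines):
    """Build an undirected adjacency map from 'source type target...' lines."""
    adjacency = defaultdict(set)
    for line in lines:
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank lines and isolated-node lines
        source, targets = fields[0], fields[2:]
        for target in targets:
            adjacency[source].add(target)
            adjacency[target].add(source)
    return adjacency

def is_connected(adjacency, genes):
    """True if the genes form one connected subnetwork (breadth-first search)."""
    genes = set(genes)
    if not genes:
        return False
    start = next(iter(genes))
    seen, queue = {start}, deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency.get(node, ()):
            if neighbor in genes and neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return seen == genes

sif_lines = ["ATM pp TP53", "TP53 pp MDM2", "CHEK2 pp ATM", "BRCA1 pp RAD51"]
network = parse_sif(sif_lines)
connected_module = is_connected(network, ["ATM", "TP53", "MDM2"])
split_module = is_connected(network, ["ATM", "RAD51"])
```

A module in this setting is a gene set that is both connected in the interaction network and similar in expression, so a check like this would be combined with an expression similarity score.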

35
Network-based clustering: visualization
  • Similar to clustering visualization (gene list,
    mean patterns, heat maps, etc.).
  • Interactions map

37
Location analysis
  • Goal: detect genes that are located in the same
    area and are co-expressed.
  • Search for over-represented chromosomal areas
    within gene groups.
  • Statistical test.
  • Redundancy filter
  • Ignoring known gene clusters

38
Location analysis: visualization
  • Enrichment analysis visualization
  • Positions view with color assignments

40
miRNA Analysis
  • Goal: detect microRNAs whose binding sites are
    over- or under-represented in the 3' UTRs of
    gene groups.
  • FAME algorithm
  • Empirical tests using a sampling technique
    (random permutations).
  • Accounting for biases in the 3' UTR sequences
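
The notion of an empirical test based on random sampling can be sketched as follows. This is a generic over-representation test and not the FAME algorithm: it draws random gene groups of the same size and ignores the 3' UTR biases mentioned above. The function name, parameters and toy gene identifiers are assumptions.

```python
import random

def empirical_p_value(gene_group, targets, all_genes, n_samples=1000, seed=0):
    """Estimate how often a random gene group of the same size contains
    at least as many miRNA targets as the observed group."""
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    observed = len(set(gene_group) & targets)
    at_least = 0
    for _ in range(n_samples):
        sample = rng.sample(all_genes, len(gene_group))
        if len(targets.intersection(sample)) >= observed:
            at_least += 1
    # Add-one correction so the estimate is never exactly zero.
    return (at_least + 1) / (n_samples + 1)

all_genes = [f"g{i}" for i in range(100)]
targets = set(all_genes[:10])                # hypothetical targets of one miRNA
enriched_group = all_genes[:10]              # every gene is a target
unrelated_group = all_genes[50:60]           # no gene is a target
p_enriched = empirical_p_value(enriched_group, targets, all_genes)
p_unrelated = empirical_p_value(unrelated_group, targets, all_genes)
```

The smallest p-value such a test can report is limited by the number of random samples, so more samples are needed when many miRNAs are tested and a correction for multiple testing is applied.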

41
Thank you
42
Expression Data Input File
[Figure: example input matrix with conditions as columns and probes as rows]
43
ID Conversion File
44
Gene Sets File
45
Normalization: Box plots
46
Standardization of Expression Levels
47
Cluster Analysis Visualization (I)
48
Cluster Analysis - Visualization (II)
49
Positions visualization