Canadian Bioinformatics Workshops - PowerPoint PPT Presentation

1
Canadian Bioinformatics Workshops
  • www.bioinformatics.ca

2
Module Title of Module
3
Module 6
  • David Wishart
  • Informatics and Statistics for Metabolomics
  • June 16-17, 2011

4
A Typical Metabolomics Experiment
5
2 Routes to Metabolomics
Quantitative (Targeted) Methods
Chemometric (Profiling) Methods
6
Metabolomics Data Workflow
Chemometric Methods
  • Data Integrity Check
  • Spectral alignment or binning
  • Data normalization
  • Data QC/outlier removal
  • Data reduction & analysis
  • Compound ID

Targeted Methods
  • Data Integrity Check
  • Compound ID and quantification
  • Data normalization
  • Data QC/outlier removal
  • Data reduction & analysis

7
Data Integrity/Quality
  • LC-MS and GC-MS have a high number of false
    positive peaks
  • Problems with adducts (LC), extra derivatization
    products (GC), isotopes, breakdown products
    (ionization issues), etc.
  • Not usually a problem with NMR
  • Check using replicates and adduct calculators

MZedDB: http://maltese.dbs.aber.ac.uk:8888/hrmet/index.html
HMDB: http://www.hmdb.ca/search/spectra?type=ms_search
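
As a rough illustration of what an adduct calculator does, the sketch below computes the expected m/z of a few common positive-mode adducts from a neutral monoisotopic mass and labels observed peaks that match. The function names, the short adduct table, the tolerance and the glucose example are assumptions made for illustration; this is not the MZedDB or HMDB interface.

```python
# Hypothetical adduct calculator (illustration only, not an MZedDB/HMDB API).
# Each entry: (number of molecules M, mass shift in Da for a 1+ ion).
ADDUCTS = {
    "[M+H]+": (1, 1.007276),
    "[M+NH4]+": (1, 18.033823),
    "[M+Na]+": (1, 22.989218),
    "[M+K]+": (1, 38.963158),
    "[2M+H]+": (2, 1.007276),
}


def adduct_mz(neutral_mass, adduct):
    """Expected m/z of a singly charged adduct of a neutral monoisotopic mass."""
    n_molecules, shift = ADDUCTS[adduct]
    return n_molecules * neutral_mass + shift


def annotate_peaks(observed_mz, neutral_mass, tol=0.005):
    """Label observed peaks that lie within tol Da of an adduct of neutral_mass."""
    labels = {}
    for mz in observed_mz:
        for name in ADDUCTS:
            if abs(mz - adduct_mz(neutral_mass, name)) <= tol:
                labels[mz] = name
    return labels


# Example: glucose (monoisotopic mass 180.06339 Da) with three observed peaks.
glucose_labels = annotate_peaks([181.0707, 203.0526, 250.0], 180.06339)
```

Peaks that can be explained as adducts of an already identified compound are candidates for merging rather than separate features; replicate injections remain the other check named on the slide.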
8
Data/Spectral Alignment
  • Important for LC-MS and GC-MS studies
  • Not so important for NMR (pH variation)
  • Many programs available (XCMS, ChromA, MZmine)
  • Most based on time warping algorithms

http://mzmine.sourceforge.net/
http://bibiserv.techfak.uni-bielefeld.de/chroma
http://metlin.scripps.edu/download/
9
Binning (3000 pts to 14 bins)
(xi, yi): x = 232.1 (AOC), y = 10 (bin #)
bin1 bin2 bin3 bin4 bin5 bin6 bin7 bin8...
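
The binning idea on this slide can be sketched as follows: the x axis is cut into equal-width bins and the intensities falling into each bin are summed. This is a simplified sketch under assumptions (equal-width bins, plain summation instead of true peak integration, a hypothetical `bin_spectrum` helper, a synthetic flat spectrum); real binning tools typically also exclude solvent regions.

```python
def bin_spectrum(points, n_bins):
    """Sum the intensities of (x, y) points into n_bins equal-width bins.

    Assumes the points span a non-zero x range.
    """
    xs = [x for x, _ in points]
    lo, hi = min(xs), max(xs)
    width = (hi - lo) / n_bins
    bins = [0.0] * n_bins
    for x, y in points:
        # The point at x == hi would index one past the end, so clamp it.
        idx = min(int((x - lo) / width), n_bins - 1)
        bins[idx] += y
    return bins


# Synthetic example: 3000 points reduced to 14 bins.
spectrum = [(i * 0.001, 1.0) for i in range(3000)]
binned = bin_spectrum(spectrum, 14)
```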
10
Data Normalization/Scaling
Same or different?
  • Can scale to sample or scale to feature
  • Scaling to whole sample controls for dilution
  • Normalize to integrated area, probabilistic
    quotient method, internal standard, sample
    specific (weight or volume of sample)
  • Choice depends on sample circumstances
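
Two of the sample-wise options listed above, normalization to integrated area and the probabilistic quotient method, can be sketched in a few lines. The function names are hypothetical, the reference spectrum is assumed to be the feature-wise median of all samples, and all values are assumed to be positive; treat this as an illustration of the idea rather than any tool's implementation.

```python
from statistics import median


def normalize_to_sum(sample, total=100.0):
    """Scale one sample (row) so that its intensities add up to `total`."""
    s = sum(sample)
    return [v * total / s for v in sample]


def pqn(samples):
    """Probabilistic quotient normalization (sketch).

    1. Normalize every sample to a constant sum.
    2. Build a reference spectrum from the feature-wise medians (assumption).
    3. Divide each sample by the median of its quotients to the reference.
    """
    scaled = [normalize_to_sum(s) for s in samples]
    reference = [median(column) for column in zip(*scaled)]
    result = []
    for s in scaled:
        quotients = [v / r for v, r in zip(s, reference) if r > 0]
        factor = median(quotients)
        result.append([v / factor for v in s])
    return result


# The second sample is a two-fold dilution of the first one.
normalized = pqn([[10.0, 20.0, 30.0, 40.0], [5.0, 10.0, 15.0, 20.0]])
```

After normalization the two rows are identical, which is the dilution correction that scaling to the whole sample is meant to provide.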

11
Data Normalization/Scaling
  • Can scale to sample or scale to feature
  • Scaling to feature(s) helps manage outliers
  • Several feature scaling options available: log
    transformation, auto-scaling, Pareto scaling,
    probabilistic quotient, and range scaling

MetaboAnalyst: http://www.metaboanalyst.ca
Dieterle F et al. Anal Chem. 2006 Jul 1;78(13):4281-90.
12
Data QC, Outlier Removal & Data Reduction
  • Data filtering (remove solvent peaks, noise
    filtering, false positives, outlier removal --
    needs justification)
  • Dimensional reduction or feature selection to
    reduce number of features or factors to consider
    (PCA or PLS-DA)
  • Clustering to find similarity

13
MetaboAnalyst
  • Web server designed to handle large sets of
    LC-MS, GC-MS or NMR-based metabolomic data
  • Supports both univariate and multivariate data
    processing, including t-tests, ANOVA, PCA, PLS-DA
  • Identifies significantly altered metabolites,
    produces colorful plots, provides detailed
    explanations & summaries
  • Links sig. metabolites to pathways via SMPDB

http://www.metaboanalyst.ca
14
Metabolite concentrations
MS / NMR peak lists
GC/LC-MS raw spectra
MS / NMR spectra bins
  • Peak detection
  • Retention time correction

Baseline filtering
Peak alignment
  • Resources & utilities
      • Peak searching
      • Pathway mapping
      • Name conversion
      • Lipidomics
      • Metabolite set libraries
  • Data integrity check
  • Missing value imputation
  • Data normalization
      • Row-wise normalization (4)
      • Column-wise normalization (4)
  • Statistical analysis
      • Univariate analysis
      • Dimension reduction
      • Feature selection
      • Cluster analysis
      • Classification
  • Pathway analysis
      • Enrichment analysis
      • Topology analysis
      • Interactive visualization
  • Time-series / two-factor
      • Visualization
      • Two-way ANOVA
      • ASCA
      • Temporal comparison

15
MetaboAnalyst Overview
  • Raw data processing
      • Using MetaboAnalyst
  • Data reduction & statistical analysis
      • Using MetaboAnalyst
  • Functional enrichment analysis
      • Using MSEA in MetaboAnalyst
  • Metabolic pathway analysis
      • Using MetPA in MetaboAnalyst

16
Example Datasets
17
Example Datasets
18
Metabolomic Data Processing
19
Common Tasks
  • Purpose: to convert various raw data forms into
    data matrices suitable for statistical analysis
  • Supported data formats:
      • Concentration tables (Targeted Analysis)
      • Peak lists (Untargeted)
      • Spectral bins (Untargeted)
      • Raw spectra (Untargeted)

20
Data Upload
21
Alternatively
22
Data Set Selected
  • Here we will be selecting a data set from dairy
    cattle fed different proportions of cereal grains
    (0, 15, 30, 45)
  • The rumen was analyzed by NMR spectroscopy
    using quantitative metabolomic techniques
  • High grain diets are thought to be stressful on
    cows

23
Data Integrity Check
24
Data Normalization
25
Data Normalization
  • At this point, the data has been transformed to a
    matrix with the samples in rows and the variables
    (compounds/peaks/bins) in columns
  • MetaboAnalyst offers three types of
    normalization: row-wise normalization,
    column-wise normalization and combined
    normalization
  • Row-wise normalization aims to make each sample
    (row) comparable to the others (e.g. urine
    samples with different dilution effects)

26
Data Normalization
  • Column-wise normalization aims to make each
    variable (column) comparable to each other. This
    procedure is useful when variables are of very
    different orders of magnitude. Four methods have
    been implemented for this purpose: log
    transformation, autoscaling, Pareto scaling and
    range scaling
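
The four column-wise methods named above can be written down compactly for a single variable (column). The sketch below uses their textbook definitions; the function names are hypothetical, the log base of 2 is an assumption, and MetaboAnalyst's own implementation may differ in detail (for example in how zeros or negative values are handled).

```python
import math
from statistics import mean, stdev


def log_transform(column):
    """Log transformation (base 2 assumed); requires positive values."""
    return [math.log2(v) for v in column]


def autoscale(column):
    """Mean-center and divide by the standard deviation."""
    m, sd = mean(column), stdev(column)
    return [(v - m) / sd for v in column]


def pareto_scale(column):
    """Mean-center and divide by the square root of the standard deviation."""
    m, sd = mean(column), stdev(column)
    return [(v - m) / math.sqrt(sd) for v in column]


def range_scale(column):
    """Mean-center and divide by the range (max - min)."""
    m, value_range = mean(column), max(column) - min(column)
    return [(v - m) / value_range for v in column]


example = [2.0, 4.0, 6.0]
scaled = {
    "auto": autoscale(example),
    "pareto": pareto_scale(example),
    "range": range_scale(example),
}
```

Autoscaling gives every variable unit variance, while Pareto scaling shrinks large variables less aggressively, so it keeps more of the original data structure.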

27
Normalization Result
28
Quality Control
  • Dealing with outliers
      • Detected mainly by visual inspection
      • May be corrected by normalization
      • May be excluded
  • Noise reduction
      • More of a concern for spectral bins / peak lists
      • Usually improves downstream results

29
Visual Inspection
  • What does an outlier look like?

30
Outlier Removal
31
Noise Reduction
32
Noise Reduction (cont.)
  • Characteristics of noise
      • Low intensities
      • Low variances (default)
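
A noise filter built on these two characteristics can be sketched as a ranking step: score every variable, then drop the lowest-scoring fraction. The helper name, the 25% default and the toy table are assumptions for illustration and do not reproduce MetaboAnalyst's settings.

```python
from statistics import mean, pvariance


def filter_noise(table, names, drop_fraction=0.25, score=pvariance):
    """Drop the lowest-scoring fraction of variables (columns).

    table: list of samples (rows), each a list of values per variable.
    score: function applied to each column, e.g. pvariance or mean.
    """
    columns = list(zip(*table))
    scores = [score(column) for column in columns]
    n_drop = int(len(names) * drop_fraction)
    ranked = sorted(range(len(names)), key=lambda j: scores[j])
    keep = sorted(ranked[n_drop:])  # restore the original column order
    kept_names = [names[j] for j in keep]
    kept_table = [[row[j] for j in keep] for row in table]
    return kept_names, kept_table


# Toy table: variables A and D barely change and have low intensity.
toy = [
    [1.0, 10.0, 5.0, 0.1],
    [1.0, 20.0, 6.0, 0.1],
    [1.1, 30.0, 4.0, 0.2],
]
by_variance, _ = filter_noise(toy, ["A", "B", "C", "D"], drop_fraction=0.5)
by_intensity, _ = filter_noise(toy, ["A", "B", "C", "D"], drop_fraction=0.5, score=mean)
```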

33
Data Reduction and Statistical Analysis

34
Common tasks
  • To detect interesting patterns
  • To identify important features
  • To assess differences between the phenotypes
  • Classification / prediction

35
(No Transcript)
36
ANOVA
37
View Individual Compounds
38
Questions
  • Q: Which compounds show a significant difference
    between all the neighboring groups (0-15, 15-30,
    and 30-45)?
  • Q: For Uracil, are groups 15, 30 and 45
    significantly different from each other?

39
Template Matching
  • Looking for compounds showing interesting
    patterns of change
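
One simple way to look for a pattern of change is to correlate each compound's profile across the groups with a template such as 1-2-3-4 (steady increase) and rank the compounds by that correlation. The sketch below assumes Pearson correlation on per-group means; the compound names and values are made up, and profiles must not be constant.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def match_template(profiles, template):
    """Rank compounds by how well their profile follows the template."""
    scores = {name: pearson(values, template) for name, values in profiles.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical group means for groups 0, 15, 30 and 45.
profiles = {
    "compound_A": [1.0, 2.0, 3.0, 4.0],   # steady increase
    "compound_B": [4.0, 3.0, 2.0, 1.0],   # steady decrease
    "compound_C": [3.0, 2.0, 1.0, 4.0],   # falls, then rises in the last group
}
ranking = match_template(profiles, [1, 2, 3, 4])
```

Changing the template, for example to 3-2-1-4, ranks compounds that fall over the first three groups and rise in the last one at the top.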

40
Template Matching (cont.)
41
Question
  • Q: Identify compounds that decrease in the first
    three groups but increase in the last group.

42
PCA Scores Plot
43
PCA Loading Plot
44
Question
  • Q: Identify compounds that contribute most to the
    separation between groups 15 and 45.

45
PLS-DA Score Plot
46
Determine # of Components
47
Important Compounds
48
Model Validation
49
Questions
  • Q: What does p < 0.01 mean?
  • Q: How many permutations need to be performed if
    you want to claim a p value < 0.0001?
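
To make the link between permutations and p-values concrete, here is a minimal label-permutation test on the difference between two group means. It is a generic sketch, not the PLS-DA permutation test used for model validation; the function name and the "+1" correction are assumptions (one common convention among several).

```python
import random


def permutation_p_value(group_a, group_b, n_perm=1000, seed=0):
    """Empirical p-value for the absolute difference in group means."""
    rng = random.Random(seed)
    n_a = len(group_a)
    observed = abs(sum(group_a) / n_a - sum(group_b) / len(group_b))
    pooled = list(group_a) + list(group_b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # reassign group labels at random
        perm_a, perm_b = pooled[:n_a], pooled[n_a:]
        diff = abs(sum(perm_a) / len(perm_a) - sum(perm_b) / len(perm_b))
        if diff >= observed:
            hits += 1
    # The observed labelling counts as one permutation, so p is never 0.
    return (hits + 1) / (n_perm + 1)


low = list(range(1, 9))        # 1..8
high = list(range(101, 109))   # 101..108
p_separated = permutation_p_value(low, high, n_perm=200)
```

With this estimator the smallest p-value that can be reported is 1 / (n_perm + 1), so the resolution of the p-value is set by the number of permutations performed.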

50
Heatmap Visualization
51
Heatmap Visualization (cont.)
52
Question
  • Q: Identify compounds with a low concentration in
    groups 0 and 15 that increase in groups 30 and 45.
  • Q: Which compound is the only one significantly
    increased in group 45?

53
Download Results
54
Analysis Report
55
Metabolite Set Enrichment Analysis

56
Metabolite Set Enrichment Analysis (MSEA)
  • Web tool designed to handle lists of metabolites
    (with or without concentration data)
  • Modeled after Gene Set Enrichment Analysis (GSEA)
  • Supports over-representation analysis (ORA),
    single sample profiling (SSP) and quantitative
    enrichment analysis (QEA)
  • Contains a library of 6300 pre-defined metabolite
    sets, including 85 pathway sets & 850 disease sets

http://www.msea.ca
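
Over-representation analysis is commonly framed as a hypergeometric test: given a list of compounds of interest, how likely is it to see at least the observed number of members of a metabolite set by chance? The sketch below shows that calculation; the function name and the numbers are illustrative assumptions, and MSEA's own implementation and multiple-testing correction are not reproduced here.

```python
from math import comb


def ora_p_value(n_universe, n_set, n_query, n_hits):
    """P(X >= n_hits) under the hypergeometric distribution.

    n_universe: compounds in the reference library
    n_set:      compounds in the metabolite set being tested
    n_query:    compounds in the uploaded list
    n_hits:     uploaded compounds that belong to the set
    """
    upper = min(n_set, n_query)
    total = comb(n_universe, n_query)
    tail = sum(
        comb(n_set, k) * comb(n_universe - n_set, n_query - k)
        for k in range(n_hits, upper + 1)
    )
    return tail / total


# Hypothetical numbers: 5 of 20 uploaded compounds fall in a 30-member set
# drawn from a library of 1000 compounds.
p_example = ora_p_value(1000, 30, 20, 5)
```

Because many metabolite sets are tested at once, the raw p-values from such a test would normally be adjusted for multiple testing before interpretation.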
57
Enrichment Analysis
  • Purpose: to test whether there are biologically
    meaningful groups of metabolites that are
    significantly enriched in your data
  • Biologically meaningful:
      • Pathways
      • Disease
      • Localization
  • Currently supports only human metabolomic data

58
MSEA
  • Accepts 3 kinds of input files:
  • 1) a list of metabolite names only
  • 2) a list of metabolite names + concentration data
    from a single sample
  • 3) a concentration table with a list of
    metabolite names + concentrations for multiple
    samples/patients

59
Start with a Compound List
60
Upload Compound List
61
Compound Name Standardization
62
Name Standardization (cont.)
63
Select a Metabolite Set Library
64
Result
65
Result (cont.)
66
The Matched Metabolite Set
67
Single Sample Profiling
68
Single Sample Profiling (cont.)
69
Concentration Comparison
70
Concentration Comparison (cont.)
71
Quantitative Enrichment Analysis
72
Data Set Selected
  • Here we are using a collection of metabolites
    identified by NMR (compound list +
    concentrations) from the urine of 77 lung and
    colon cancer patients, some of whom were
    suffering from cachexia (muscle wasting)

73
Result
74
The Matched Metabolite Set
75
Question
  • Q: Are these metabolites increased or decreased
    in the cachexia group?

76
Metabolic Pathway Analysis with MetPA

77
Pathway Analysis
  • Purpose: to extend and enhance metabolite set
    enrichment analysis for pathways by
      • Considering the pathway structures
      • Supporting pathway visualization
  • Currently supports 15 organisms

78
Data Upload
79
Data Set Selected
  • Here we are using a collection of metabolites
    identified by NMR (compound list +
    concentrations) from the urine of 77 lung and
    colon cancer patients, some of whom were
    suffering from cachexia (muscle wasting)

80
Normalization
81
Pathway Libraries
82
Network Topology Analysis
83
Which Node is More Important?
High degree centrality
High betweenness centrality
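
The two importance measures can give different answers for the same network. The sketch below computes both on a small made-up graph in which node X bridges two triangles: c and d have the highest degree, but X lies on the most shortest paths. The graph, the function names and the unnormalized scores are illustrative assumptions; betweenness is computed with Brandes' algorithm for unweighted, undirected graphs.

```python
from collections import deque


def degree_centrality(graph):
    """Number of direct neighbors of each node."""
    return {node: len(neighbors) for node, neighbors in graph.items()}


def betweenness_centrality(graph):
    """Unnormalized betweenness (Brandes), unweighted undirected graph."""
    score = {v: 0.0 for v in graph}
    for source in graph:
        order = []                           # nodes in order of discovery
        preds = {v: [] for v in graph}       # predecessors on shortest paths
        n_paths = {v: 0 for v in graph}      # number of shortest paths
        dist = {v: -1 for v in graph}
        n_paths[source], dist[source] = 1, 0
        queue = deque([source])
        while queue:                         # breadth-first search
            v = queue.popleft()
            order.append(v)
            for w in graph[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    n_paths[w] += n_paths[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in graph}
        for w in reversed(order):            # accumulate dependencies
            for v in preds[w]:
                delta[v] += n_paths[v] / n_paths[w] * (1 + delta[w])
            if w != source:
                score[w] += delta[w]
    # Every undirected pair was counted once from each endpoint.
    return {v: s / 2 for v, s in score.items()}


# Two triangles (a, b, c) and (d, e, f) joined through node X.
network = {
    "a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b", "X"],
    "X": ["c", "d"],
    "d": ["X", "e", "f"], "e": ["d", "f"], "f": ["d", "e"],
}
degree = degree_centrality(network)
betweenness = betweenness_centrality(network)
```

In this toy network X has a degree of only 2 but the highest betweenness, which is the kind of contrast the slide title asks about.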
84
Pathway Visualization
85
Pathway Visualization (cont.)
86
Question
  • Q: Which pathway do you think is likely to be
    affected the most? Why?

87
Result
88
Not Everything Was Covered
  • Clustering (K-means, SOM)
  • Classification (SVM, randomForests)
  • Time-series data analysis
  • Two factor data analysis
  • Peak searching
  • .