Canadian Bioinformatics Workshops


1
Canadian Bioinformatics Workshops

www.bioinformatics.ca

2
3
Module 6

David Wishart
Informatics and Statistics for Metabolomics
June 16-17, 2011

4
A Typical Metabolomics Experiment
5
2 Routes to Metabolomics
Quantitative (Targeted) Methods
Chemometric (Profiling) Methods
6
Metabolomics Data Workflow
Chemometric Methods
Data Integrity Check
Spectral alignment or binning
Data normalization
Data QC/outlier removal
Data reduction analysis
Compound ID

Targeted Methods
Data Integrity Check
Compound ID and quantification
Data normalization
Data QC/outlier removal
Data reduction analysis

7
Data Integrity/Quality

LC-MS and GC-MS produce a high number of false-positive peaks
Problems with adducts (LC), extra derivatization
products (GC), isotopes, breakdown products
(ionization issues), etc.
Not usually a problem with NMR
Check using replicates and adduct calculators

MZedDB: http://maltese.dbs.aber.ac.uk:8888/hrmet/index.html
HMDB: http://www.hmdb.ca/search/spectra?type=ms_search
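To make the adduct check concrete, here is a minimal Python sketch of what an adduct calculator does: it adds standard monoisotopic mass shifts for a few common singly charged ESI adducts to a compound's neutral mass and reports which adducts match an observed peak. The shift table, the 10 ppm tolerance and the function names (`adduct_mz`, `explain_peak`) are illustrative assumptions, not the interface of MZedDB or HMDB.

```python
# Standard monoisotopic mass shifts (Da) for a few common singly charged ESI
# adducts; extend this table to match your own instrument and method.
PROTON = 1.007276
ADDUCT_SHIFTS = {
    "[M+H]+": PROTON,
    "[M+NH4]+": 18.033823,
    "[M+Na]+": 22.989218,
    "[M+K]+": 38.963158,
    "[M-H]-": -PROTON,
}


def adduct_mz(neutral_mass):
    """Expected m/z of each adduct for a neutral monoisotopic mass."""
    return {name: neutral_mass + shift for name, shift in ADDUCT_SHIFTS.items()}


def explain_peak(observed_mz, neutral_mass, ppm_tol=10.0):
    """Adduct names whose expected m/z lies within ppm_tol of an observed peak."""
    hits = []
    for name, mz in adduct_mz(neutral_mass).items():
        if abs(observed_mz - mz) / mz * 1e6 <= ppm_tol:
            hits.append(name)
    return hits


if __name__ == "__main__":
    glucose = 180.063388  # monoisotopic mass of C6H12O6
    for name, mz in adduct_mz(glucose).items():
        print(f"{name:9s} {mz:.4f}")
    # A peak at 203.0526 is explained as the sodium adduct of glucose.
    print(explain_peak(203.0526, glucose))
```

A peak that matches the [M+Na]+ of a compound already assigned from its [M+H]+ peak is most likely an adduct rather than a separate metabolite.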
8
Data/Spectral Alignment

Important for LC-MS and GC-MS studies
Less important for NMR (peak shifts arise mainly from pH variation)
Many programs are available (XCMS, ChromA, MZmine)
Most are based on time-warping algorithms

http://mzmine.sourceforge.net/
http://bibiserv.techfak.uni-bielefeld.de/chroma
http://metlin.scripps.edu/download/
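As an illustration of the time-warping idea, the sketch below implements textbook dynamic time warping (DTW) between two 1-D chromatogram intensity traces and returns the warping path that maps points of one trace onto the other. It is a simplified stand-in: the programs listed above use their own, more elaborate alignment methods, and the function name `dtw_path` is hypothetical.

```python
def dtw_path(ref, query):
    """Dynamic time warping between two 1-D intensity traces.

    Returns (total_cost, path), where path is a list of (i, j) pairs that
    map ref[i] onto query[j].
    """
    n, m = len(ref), len(query)
    inf = float("inf")
    # cost[i][j]: minimal cumulative cost of aligning ref[:i] with query[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(ref[i - 1] - query[j - 1])
            cost[i][j] = d + min(cost[i - 1][j - 1],  # match both points
                                 cost[i - 1][j],      # stretch the query
                                 cost[i][j - 1])      # stretch the reference
    # Trace back from the end to recover the warping path.
    i, j, path = n, m, []
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        candidates = {(i - 1, j - 1): cost[i - 1][j - 1],
                      (i - 1, j): cost[i - 1][j],
                      (i, j - 1): cost[i][j - 1]}
        i, j = min(candidates, key=candidates.get)
    path.reverse()
    return cost[n][m], path


if __name__ == "__main__":
    reference = [0, 0, 1, 5, 1, 0, 0]
    shifted = [0, 1, 5, 1, 0, 0, 0]  # same peak, eluting one scan earlier
    total, path = dtw_path(reference, shifted)
    print(total, path)
```

Real chromatograms run to thousands of scans, so practical tools restrict how far the warping may stray from the diagonal; this unconstrained O(n*m) version only shows the recurrence.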
9
Binning (3000 pts to 14 bins)
Figure: a spectrum divided into bins (bin1, bin2, bin3, ... bin8, ...); each bin becomes one data point (xi, yi), e.g. x = 232.1 (AOC) and y = 10 (bin #)
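The binning step in the figure can be sketched as follows, assuming evenly spaced spectral points so that summing the intensities within a bin is proportional to its area under the curve. The helper name `bin_spectrum` and the equal-count bin boundaries are assumptions; NMR binning in practice often uses fixed-width chemical-shift windows instead.

```python
import math


def bin_spectrum(intensities, n_bins):
    """Reduce a spectrum to n_bins values by summing the points in each bin.

    With evenly spaced points, the sum inside a bin is proportional to the
    area under the curve for that region. Bins are contiguous and differ in
    width by at most one point.
    """
    n = len(intensities)
    if not 0 < n_bins <= n:
        raise ValueError("n_bins must be between 1 and the number of points")
    bins = []
    for b in range(n_bins):
        start, stop = b * n // n_bins, (b + 1) * n // n_bins
        bins.append(sum(intensities[start:stop]))
    return bins


if __name__ == "__main__":
    # Synthetic 3000-point spectrum with two Gaussian peaks.
    spectrum = [math.exp(-((i - 800) / 40) ** 2)
                + 0.5 * math.exp(-((i - 2100) / 60) ** 2)
                for i in range(3000)]
    print([round(v, 2) for v in bin_spectrum(spectrum, 14)])
```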
10
Data Normalization/Scaling
Same or different?

Can scale to sample or scale to feature
Scaling to whole sample controls for dilution
Options: normalize to integrated area, probabilistic quotient method, internal standard, or a sample-specific factor (weight or volume of sample)
Choice depends on sample circumstances
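A minimal sketch of two of the sample-wise options above: total (integrated) area normalization and the probabilistic quotient method of Dieterle et al. (2006). The function names are hypothetical, and the reference profile defaults to the feature-wise median across samples, which is one common choice; the original method also recommends an integral normalization before the quotients are computed.

```python
from statistics import median


def total_area_normalize(sample):
    """Scale one sample (row) so that its intensities sum to 1."""
    total = sum(sample)
    return [x / total for x in sample]


def pqn_normalize(samples, reference=None):
    """Probabilistic quotient normalization of a list of samples (rows).

    reference: reference profile; defaults to the feature-wise median of samples.
    """
    if reference is None:
        reference = [median(col) for col in zip(*samples)]
    normalized = []
    for row in samples:
        # Quotient of every feature against the reference; features that are
        # zero in the reference are skipped to avoid dividing by zero.
        quotients = [x / r for x, r in zip(row, reference) if r != 0]
        dilution = median(quotients)  # most probable dilution factor
        normalized.append([x / dilution for x in row])
    return normalized
```

Because each sample is divided by its median quotient against the reference, a urine sample that is uniformly twice as dilute is rescaled back onto the reference, while a handful of genuinely changed metabolites barely moves the median.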

11
Data Normalization/Scaling

Can scale to sample or scale to feature
Scaling to feature(s) helps manage outliers
Several feature scaling options are available: log transformation, auto-scaling, Pareto scaling and range scaling

MetaboAnalyst: http://www.metaboanalyst.ca
Dieterle F et al. Anal Chem. 2006 Jul 1;78(13):4281-90.
12
Data QC, Outlier Removal and Data Reduction

Data filtering (remove solvent peaks, noise filtering, false positives; outlier removal needs justification)
Dimensional reduction or feature selection to reduce the number of features or factors to consider (PCA or PLS-DA)
Clustering to find similarity
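For the dimensional-reduction step, PCA can be sketched in a few lines with NumPy by taking the singular value decomposition of the mean-centred data matrix (samples in rows, features in columns). This is a generic PCA, not MetaboAnalyst's code, and the function name is hypothetical; PLS-DA additionally uses the class labels and is not shown.

```python
import numpy as np


def pca(X, n_components=2):
    """PCA by singular value decomposition of the mean-centred data matrix.

    X: array with samples in rows and features in columns, already normalized.
    Returns (scores, loadings, explained_variance_ratio) for the first components.
    """
    Xc = X - X.mean(axis=0)  # centre every feature on zero
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]  # sample coordinates
    loadings = Vt[:n_components].T                   # feature weights per PC
    explained = s**2 / np.sum(s**2)
    return scores, loadings, explained[:n_components]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(20, 5))
    X[:10, 0] += 5.0  # feature 0 separates the first 10 samples from the rest
    scores, loadings, ratio = pca(X)
    print("explained variance ratio:", ratio)
    print("feature driving PC1:", int(np.argmax(np.abs(loadings[:, 0]))))
```

The scores give each sample's coordinates for a scores plot; the loadings show how strongly each feature contributes to each component, which is what a loading plot displays.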

13
MetaboAnalyst

Web server designed to handle large sets of
LC-MS, GC-MS or NMR-based metabolomic data
Supports both univariate and multivariate data processing, including t-tests, ANOVA, PCA and PLS-DA
Identifies significantly altered metabolites, produces colorful plots, and provides detailed explanations and summaries
Links sig. metabolites to pathways via SMPDB

http://www.metaboanalyst.ca
14
Metabolite concentrations
MS / NMR peak lists
GC/LC-MS raw spectra
MS / NMR spectra bins

Peak detection
Retention time correction

Baseline filtering
Peak alignment

Resources and utilities
Peak searching
Pathway mapping
Name conversion
Lipidomics
Metabolite set libraries

Data integrity check
Missing value imputation

Data normalization
Row-wise normalization (4)
Column-wise normalization (4)

Statistical analysis
Univariate analysis
Dimension reduction
Feature selection
Cluster analysis
Classification

Pathway analysis
Enrichment analysis
Topology analysis
Interactive visualization

Time-series / two-factor
Visualization
Two-way ANOVA
ASCA
Temporal comparison

15
MetaboAnalyst Overview

Raw data processing (using MetaboAnalyst)
Data reduction and statistical analysis (using MetaboAnalyst)
Functional enrichment analysis (using MSEA in MetaboAnalyst)
Metabolic pathway analysis (using MetPA in MetaboAnalyst)

16
Example Datasets
17
Example Datasets
18
Metabolomic Data Processing
19
Common Tasks

Purpose: to convert various raw data forms into data matrices suitable for statistical analysis
Supported data formats:
Concentration tables (Targeted Analysis)
Peak lists (Untargeted)
Spectral bins (Untargeted)
Raw spectra (Untargeted)

20
Data Upload
21
Alternatively
22
Data Set Selected

Here we will select a data set from dairy cattle fed different proportions of cereal grain (0, 15, 30 and 45)
Rumen fluid was analyzed by NMR spectroscopy using quantitative metabolomic techniques
High-grain diets are thought to be stressful for cows

23
Data Integrity Check
24
Data Normalization
25
Data Normalization

At this point, the data has been transformed to a
matrix with the samples in rows and the variables
(compounds/peaks/bins) in columns
MetaboAnalyst offers three types of normalization: row-wise normalization, column-wise normalization and combined normalization
Row-wise normalization aims to make the samples (rows) comparable to each other (e.g. urine samples with different dilution effects)

26
Data Normalization

Column-wise normalization aims to make the variables (columns) comparable to each other. This procedure is useful when variables differ by orders of magnitude. Four methods have been implemented for this purpose: log transformation, autoscaling, Pareto scaling and range scaling
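The four column-wise methods can be sketched as below, each applied to one feature (column) at a time. The formulas follow the common definitions (autoscaling divides by the standard deviation, Pareto scaling by its square root, range scaling by max minus min); the log offset of 1 used to tolerate zeros is an assumption, and MetaboAnalyst's own log transformation may handle zeros differently.

```python
import math
from statistics import mean, stdev


def log_transform(col, offset=1.0):
    """log10(x + offset); the offset is an assumption that lets zeros through."""
    return [math.log10(x + offset) for x in col]


def autoscale(col):
    """Mean-centre and divide by the standard deviation (unit variance)."""
    m, sd = mean(col), stdev(col)
    return [(x - m) / sd for x in col]


def pareto_scale(col):
    """Mean-centre and divide by the square root of the standard deviation."""
    m, sd = mean(col), stdev(col)
    return [(x - m) / math.sqrt(sd) for x in col]


def range_scale(col):
    """Mean-centre and divide by the range (max - min)."""
    m = mean(col)
    return [(x - m) / (max(col) - min(col)) for x in col]


def scale_columns(matrix, scaler):
    """Apply a column-wise scaler to a samples-by-features matrix (list of rows)."""
    columns = [scaler(list(col)) for col in zip(*matrix)]
    return [list(row) for row in zip(*columns)]


if __name__ == "__main__":
    data = [[1, 10], [2, 20], [3, 30]]  # two features on very different scales
    print(scale_columns(data, autoscale))
```

Pareto scaling sits between no scaling and autoscaling: large features are still shrunk, but less aggressively, so low-intensity noise is not inflated to the same weight as real signals.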

27
Normalization Result
28
Quality Control

Dealing with outliers
Detected mainly by visual inspection
May be corrected by normalization
May be excluded
Noise reduction
More of a concern for spectral bins / peak lists
Usually improves downstream results

29
Visual Inspection

What does an outlier look like?

30
Outlier Removal
31
Noise Reduction
32
Noise Reduction (cont.)

Characteristics of noise
Low intensities
Low variances (default)
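Filtering on the noise characteristics above can be sketched as dropping a fixed fraction of features that score lowest on variance or, by passing `statistics.mean` as the measure, on mean intensity. The 10% default fraction and the function name `filter_features` are assumptions; the actual tool may use a different spread measure, such as the interquartile range, for its default filter.

```python
from statistics import mean, pvariance


def filter_features(matrix, names, fraction=0.1, measure=pvariance):
    """Drop the given fraction of features that score lowest on `measure`.

    matrix: list of rows (samples) by columns (features); names: feature names.
    measure: function applied to each column, e.g. pvariance (low variance)
             or mean (low intensity).
    """
    columns = list(zip(*matrix))
    values = [measure(col) for col in columns]
    n_drop = int(len(columns) * fraction)
    ranked = sorted(range(len(columns)), key=lambda k: values[k])
    dropped = set(ranked[:n_drop])
    keep = [k for k in range(len(columns)) if k not in dropped]
    filtered = [[row[k] for k in keep] for row in matrix]
    return filtered, [names[k] for k in keep]


if __name__ == "__main__":
    data = [[10, 5, 100], [10, 6, 200], [10, 7, 300]]
    print(filter_features(data, ["a", "b", "c"], fraction=0.34))
    print(filter_features(data, ["a", "b", "c"], fraction=0.34, measure=mean))
```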

33
Data Reduction and Statistical Analysis

34
Common tasks

To detect interesting patterns
To identify important features
To assess differences between phenotypes
Classification / prediction

35
36
ANOVA
37
View Individual Compounds
38
Questions

Q: Which compounds show significant differences between all neighboring groups (0-15, 15-30 and 30-45)?
Q: For uracil, are groups 15, 30 and 45 significantly different from each other?
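These questions rely on a one-way ANOVA across the diet groups. A minimal sketch of the F statistic, assuming each group is a list of one compound's concentrations, is shown below; the p value then comes from the F distribution with (k - 1, n - k) degrees of freedom (for example via `scipy.stats.f_oneway`), and pairwise comparisons such as 0-15 or 15-30 need a post-hoc test on top. The function name is hypothetical.

```python
from statistics import mean


def one_way_anova_f(groups):
    """F statistic and degrees of freedom for a one-way ANOVA.

    groups: list of lists, each holding one compound's values for one group.
    """
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    group_means = [mean(g) for g in groups]
    k, n = len(groups), len(all_values)
    # Variation of the group means around the grand mean (weighted by size).
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    # Variation of individual values around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within
```

A large F means the group means differ by much more than the spread within each group, which is what flags a compound as significant in the ANOVA step.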

39
Template Matching

Looking for compounds showing interesting
patterns of change
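Template matching can be sketched as ranking compounds by the Pearson correlation between their profile and a template pattern across the groups. In this simplified version each compound's profile is its group means; the compound names and values are hypothetical, and the actual tool may correlate individual sample values with the template instead.

```python
import math
from statistics import mean


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def match_template(profiles, template):
    """Rank compounds by correlation of their group-mean profile with a template."""
    scores = {name: pearson(values, template) for name, values in profiles.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical group means for groups 0, 15, 30 and 45.
    profiles = {
        "compound_A": [1.0, 2.0, 3.0, 4.0],
        "compound_B": [4.0, 3.0, 2.0, 1.0],
        "compound_C": [3.0, 2.0, 1.0, 4.0],
    }
    # Template: decreasing over the first three groups, then increasing.
    for name, r in match_template(profiles, [3, 2, 1, 4]):
        print(f"{name}: r = {r:.2f}")
```

Correlation rewards the shape of the change rather than its size, so a low-abundance compound can match a template as well as an abundant one.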

40
Template Matching (cont.)
41
Question

Q: Identify compounds that decrease across the first three groups but increase in the last group.

42
PCA Scores Plot
43
PCA Loading Plot
44
Question

Q: Identify the compounds that contribute most to the separation between groups 15 and 45.

45
PLS-DA Score Plot
46
Determine the Number of Components
47
Important Compounds
48
Model Validation
49
Questions

Q: What does p < 0.01 mean?
Q: How many permutations need to be performed if you want to claim p < 0.0001?
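For the permutation questions, the sketch below shows the logic on a simple two-group statistic (absolute difference of means): shuffle the group labels many times and count how often the shuffled statistic is at least as extreme as the observed one. With the (b + 1) / (n_perm + 1) convention used here, the smallest attainable p value is 1 / (n_perm + 1). For PLS-DA model validation the statistic would instead come from refitting the model on each permuted labelling; the function names and defaults are assumptions.

```python
import random


def mean_difference(a, b):
    """Test statistic: absolute difference between two group means."""
    return abs(sum(a) / len(a) - sum(b) / len(b))


def permutation_p_value(group_a, group_b, n_perm=1000, seed=0):
    """Label-permutation p value using the (b + 1) / (n_perm + 1) convention."""
    rng = random.Random(seed)
    observed = mean_difference(group_a, group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    at_least_as_extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # randomly reassign samples to the two groups
        if mean_difference(pooled[:n_a], pooled[n_a:]) >= observed:
            at_least_as_extreme += 1
    return (at_least_as_extreme + 1) / (n_perm + 1)


if __name__ == "__main__":
    print(permutation_p_value([10, 11, 12, 13, 14], [0, 1, 2, 3, 4], n_perm=999))
```

Under this convention, even when no permutation beats the observed statistic, 1,000 permutations can only support p of about 0.001; a claim of p < 0.0001 needs at least 10,000.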

50
Heatmap Visualization
51
Heatmap Visualization (cont.)
52
Question

Q: Identify compounds with low concentrations in groups 0 and 15 that increase in groups 30 and 45.
Q: Which compound is the only one significantly increased in group 45?

53
Download Results
54
Analysis Report
55
Metabolite Set Enrichment Analysis

56
Metabolite Set Enrichment Analysis (MSEA)

Web tool designed to handle lists of metabolites
(with or without concentration data)
Modeled after Gene Set Enrichment Analysis (GSEA)
Supports over-representation analysis (ORA), single sample profiling (SSP) and quantitative enrichment analysis (QEA)
Contains a library of 6300 predefined metabolite sets, including 85 pathway sets and 850 disease sets

http://www.msea.ca
57
Enrichment Analysis

Purpose: to test whether biologically meaningful groups of metabolites are significantly enriched in your data
Biologically meaningful groups:
Pathways
Disease
Localization
Currently supports only human metabolomic data
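Over-representation analysis is commonly computed with a hypergeometric test: given a background of N metabolites, a metabolite set of size K and a user list of n metabolites, it asks how likely it is to see k or more set members in the list by chance. The sketch below computes that one-sided p value exactly with `math.comb`; the function and parameter names are hypothetical, and MSEA may apply its own background definition and multiple-testing correction.

```python
from math import comb


def ora_p_value(n_hits, set_size, list_size, background_size):
    """One-sided hypergeometric p value for over-representation analysis.

    Probability of observing n_hits or more members of a metabolite set of
    size set_size in a list of list_size metabolites drawn at random from a
    background of background_size metabolites.
    """
    total = comb(background_size, list_size)
    upper = min(set_size, list_size)
    p = sum(comb(set_size, k) * comb(background_size - set_size, list_size - k)
            for k in range(n_hits, upper + 1))
    return p / total


if __name__ == "__main__":
    # Hypothetical numbers: 4 of 30 listed metabolites fall in a 20-member set
    # drawn from a 1000-metabolite background.
    print(ora_p_value(4, 20, 30, 1000))
```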

58
MSEA

Accepts 3 kinds of input files:
1) a list of metabolite names only
2) a list of metabolite names plus concentration data from a single sample
3) a concentration table with metabolite names and concentrations for multiple samples/patients

59
Start with a Compound List
60
Upload Compound List
61
Compound Name Standardization
62
Name Standardization (cont.)
63
Select a Metabolite Set Library
64
Result
65
Result (cont.)
66
The Matched Metabolite Set
67
Single Sample Profiling
68
Single Sample Profiling (cont.)
69
Concentration Comparison
70
Concentration Comparison (cont.)
71
Quantitative Enrichment Analysis
72
Data Set Selected

Here we are using a collection of metabolites identified by NMR (compound list plus concentrations) from the urine of 77 lung and colon cancer patients, some of whom were suffering from cachexia (muscle wasting)

73
Result
74
The Matched Metabolite Set
75
Question

Q: Are these metabolites increased or decreased in the cachexia group?

76
Metabolic Pathway Analysis with MetPA

77
Pathway Analysis

Purpose: to extend and enhance metabolite set enrichment analysis for pathways by:
Considering the pathway structures
Supporting pathway visualization
Currently supports 15 organisms

78
Data Upload
79
Data Set Selected

Here we are using a collection of metabolites identified by NMR (compound list plus concentrations) from the urine of 77 lung and colon cancer patients, some of whom were suffering from cachexia (muscle wasting)

80
Normalization
81
Pathway Libraries
82
Network Topology Analysis
83
Which Node is More Important?
High degree centrality
High betweenness centrality
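The two measures can be sketched on a pathway represented as an undirected graph (a dict mapping each compound to its set of neighbours): degree counts direct connections, while betweenness, computed here with Brandes' algorithm, counts how many shortest paths between other nodes pass through a node. The graph representation and function names are assumptions, and MetPA's own measures may differ in detail, for example by using directed graphs or normalized values.

```python
from collections import deque


def degree_centrality(graph):
    """Number of direct neighbours of each node (graph: node -> set of neighbours)."""
    return {node: len(neighbours) for node, neighbours in graph.items()}


def betweenness_centrality(graph):
    """Unnormalized betweenness centrality of an unweighted, undirected graph.

    Brandes' algorithm: one breadth-first search per source node counts the
    shortest paths, then dependencies are accumulated in reverse BFS order.
    Each undirected path is seen from both ends, so totals are halved.
    """
    bc = dict.fromkeys(graph, 0.0)
    for s in graph:
        order, preds = [], {v: [] for v in graph}
        sigma = dict.fromkeys(graph, 0)   # number of shortest paths from s
        dist = dict.fromkeys(graph, -1)   # BFS distance from s (-1 = unseen)
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in graph[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(graph, 0.0)
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return {node: value / 2 for node, value in bc.items()}


if __name__ == "__main__":
    # Hypothetical four-compound chain A - B - C - D.
    chain = {"A": {"B"}, "B": {"A", "C"}, "C": {"B", "D"}, "D": {"C"}}
    print(degree_centrality(chain))
    print(betweenness_centrality(chain))
```

The two measures can disagree: a compound with many neighbours inside one dense cluster has a high degree, while a compound linking two otherwise separate parts of a pathway can have few neighbours but high betweenness.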
84
Pathway Visualization
85
Pathway Visualization (cont.)
86
Question

Q: Which pathway do you think is likely to be affected the most? Why?

87
Result
88
Not Everything Was Covered

Clustering (K-means, SOM)
Classification (SVM, random forests)
Time-series data analysis
Two factor data analysis
Peak searching
.
