Considering uncertainties in multivariate curve resolution alternating least squares strategies

1
Considering uncertainties in multivariate curve
resolution alternating least squares strategies
  • Romà Tauler, rtaqam_at_iiqab.csic.es Department of
    Environmental Chemistry. IIQAB CSIC. Jordi
    Girona, 18, 08034 Barcelona
  • Peter Wentzell, peter.wentzell_at_dal.ca
  • Department of Chemistry, Dalhousie University,
    Halifax, NS B3H 4J3, Canada

2
  • MOTIVATIONS OF THIS WORK
  • The effect of data uncertainties on traditional
    alternating least squares strategies in
    multivariate curve resolution is investigated.
  • Examples of application to the resolution of
    environmental patterns in contamination studies,
    taking uncertainties into account, will be given.
  • P.D. Wentzell, T.K. Karakach, S. Roy, M.J.
    Martinez, C.P. Allen and M. Werner-Washburne,
    "Multivariate Curve Resolution of Time Course
    Microarray Data", BMC Bioinformatics, 7, 343
    (2006).

3
  • OUTLINE
  • Introduction to Multivariate Curve Resolution by
    Alternating Weighted Least Squares (MCR-AWLS)
  • Testing the MCR-AWLS method
  • Application of MCR-AWLS method to resolution and
    apportionment of air particulate contamination
    sources

4
Chemometric data models
Bilinear models for resolution of two-way data
[Figure: data matrix D with I rows (samples) and J columns (variables); d_ij is one of its elements]
  • d_ij is the data measurement (response) of
    variable j in sample i
  • n = 1,...,N indexes the components (species,
    sources...); N is the number of components
  • c_in is the concentration of component n in
    sample i
  • s_nj is the response of component n at variable j
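The equation for this slide did not survive in the transcript. The symbols defined above fit the usual element-wise form of a bilinear model, so the following is offered as a reconstruction rather than the authors' exact notation; the residual term e_ij is an assumption taken from the E matrix named on the flowchart slide.

```latex
d_{ij} = \sum_{n=1}^{N} c_{in}\, s_{nj} + e_{ij},
\qquad i = 1,\dots,I,\quad j = 1,\dots,J
\qquad\Longleftrightarrow\qquad
\mathbf{D} = \mathbf{C}\,\mathbf{S}^{T} + \mathbf{E}
```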
5
An algorithm to solve bilinear models:
Multivariate Curve Resolution Alternating
Least Squares (MCR-ALS)
C and S^T are obtained by iteratively solving the
two alternating LS equations
  • Optional constraints (local rank,
    non-negativity, unimodality, closure, ...) are
    applied at each iteration
  • Initial estimates of C or S^T are needed
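
The alternating scheme described on this slide can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes NumPy, uses pseudoinverses for the two least-squares steps, and imposes non-negativity by simple clipping (a crude stand-in for a proper non-negative least-squares solver). The function name `mcr_als`, the stopping rule and the toy data are all hypothetical.

```python
import numpy as np

def mcr_als(D, St_init, n_iter=200, tol=1e-12):
    """Sketch of MCR-ALS for D ~ C @ St, non-negativity imposed by clipping."""
    St = np.asarray(St_init, dtype=float).copy()
    prev_ssq = np.inf
    for _ in range(n_iter):
        # Step 1: least-squares estimate of C given St, then constrain
        C = np.clip(D @ np.linalg.pinv(St), 0.0, None)
        # Step 2: least-squares estimate of St given C, then constrain
        St = np.clip(np.linalg.pinv(C) @ D, 0.0, None)
        # Stop when the residual sum of squares no longer changes
        ssq = float(np.sum((D - C @ St) ** 2))
        if abs(prev_ssq - ssq) <= tol * max(ssq, 1.0):
            break
        prev_ssq = ssq
    return C, St

# Toy check on noise-free data, starting from the true St (made-up data)
rng = np.random.default_rng(0)
C_true = rng.random((20, 3))
St_true = rng.random((3, 15))
D = C_true @ St_true
C_hat, St_hat = mcr_als(D, St_true)
residual = float(np.linalg.norm(D - C_hat @ St_hat))
```

Starting from the true profiles on noise-free data, the fit is essentially exact; with other initial estimates the fit can be equally good while the profiles differ, which is the rotational ambiguity discussed later in the talk.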

6
Flowchart of MCR-ALS (http://www.ub.es/gesq/mcr/mcr.htm)
References: Journal of Chemometrics, 1995, 9, 31; 2001, 15,
749. Chemomet. Intel. Lab. Systems, 1995, 30, 133;
2005, 76, 101. Analytica Chimica Acta, 2003,
500, 195-210.
D = C S^T + E (bilinear model)
Flowchart steps:
  • Data matrix D
  • SVD or PCA: estimation of the number of components
  • Initial estimation
  • ALS optimization with CONSTRAINTS: data matrix
    decomposition according to a bilinear model
  • Resolved concentration profiles C, resolved
    spectra profiles S^T, and residuals E
  • Results of the ALS optimization procedure: fit
    and diagnostics
7
Introduction
Chemometrical Methods
MCR-ALS Quality Assessment
Rotational ambiguities
When two or more components overlap (in the
spectra or in the concentration profiles),
rotational ambiguities appear. Resolved profiles
can be an unknown linear combination of the true
(sought) profiles. If this ambiguity is present,
the concentration and spectra profiles cannot be
represented by a single unique profile; each
profile has to be represented as a band of
feasible solutions.
[Figure: band of feasible solutions. Tmax gives the maximum of the feasible band boundary; Tmin gives the minimum of the feasible band boundary.]
  • Boundaries of the feasible bands of the MCR-ALS
    solutions can be calculated using a constrained
    non-linear optimization procedure.
  • J. of Chemometrics, 2001, 15, 627-646
  • Analytica Chimica Acta, 2007, 595, 289-298

8
MCR-ALS Quality Assessment
  • Propagation of experimental noise into the
    MCR-ALS solutions
  • Experimental noise is propagated into the MCR-ALS
    solutions and causes uncertainties in the
    obtained results.
  • To estimate these uncertainties for non-linear
    models like MCR-ALS, computer-intensive
    resampling methods can be used.

[Figure: profiles with noise added; mean, max and min profiles; confidence range profiles]
(J. of Chemometrics, 2004, 18, 327-340;
J. Chemometrics, 2006, 20, 4-67)
9
Including uncertainties in data: MCR-AWLS
(Alternating Weighted Least Squares algorithm)
  • In some circumstances (e.g. in the analysis of
    environmental data tables, microarray data,
    etc.), measurement errors can be high and not
    uniformly distributed
  • These measurement errors need to be incorporated
    into the MCR-ALS data analysis to obtain optimal
    solutions under these circumstances
  • Effect of experimental errors on MCR-ALS
    estimations: weighting schemes and use of
    errors-in-variables methods. P.D. Wentzell et
    al., J. of Chemometrics, 11 (1997), 339-366
  • P.D. Wentzell et al., BMC Bioinformatics, 7 (2006)
    343

10
MCR-ALS Model and solutions
LS objective function without considering
experimental uncertainties
MCR-ALS unconstrained solutions
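The formulas on this slide were images and are missing from the transcript. The standard unweighted forms are given here as a reconstruction, assuming the bilinear model D = C S^T + E and writing ^+ for the Moore-Penrose pseudoinverse:

```latex
\min_{\mathbf{C},\,\mathbf{S}^{T}} \; \sum_{i=1}^{I}\sum_{j=1}^{J} e_{ij}^{2}
  = \bigl\lVert \mathbf{D} - \mathbf{C}\,\mathbf{S}^{T} \bigr\rVert_{F}^{2},
\qquad
\mathbf{C} = \mathbf{D}\,(\mathbf{S}^{T})^{+},
\qquad
\mathbf{S}^{T} = \mathbf{C}^{+}\,\mathbf{D}
```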
11
MCR-AWLS Model and solutions
P.D. Wentzell et al. BMC Bioinformatics, 7 (2006)
343
LS objective function considering experimental
uncertainties (errors)
MCR-AWLS unconstrained solutions (solved by rows
or by columns)
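The weighted formulas are also missing from the transcript. For uncorrelated errors, the usual weighted least-squares forms are sketched below. The notation is an assumption: σ_ij is taken to be the standard deviation of measurement d_ij, **d**_i and **c**_i are the i-th rows of D and C, and **d**_j and **s**_j are the j-th columns of D and S^T. The cited Wentzell et al. paper should be consulted for the authors' exact formulation.

```latex
\min_{\mathbf{C},\,\mathbf{S}^{T}} \; \sum_{i=1}^{I}\sum_{j=1}^{J}
  \frac{e_{ij}^{2}}{\sigma_{ij}^{2}}

% by rows (one sample at a time), with W_i = diag(1/\sigma_{i1}^2, ..., 1/\sigma_{iJ}^2):
\mathbf{c}_{i} = \mathbf{d}_{i}\,\mathbf{W}_{i}\,\mathbf{S}\,
  \bigl(\mathbf{S}^{T}\mathbf{W}_{i}\,\mathbf{S}\bigr)^{-1}

% by columns (one variable at a time), with W_j = diag(1/\sigma_{1j}^2, ..., 1/\sigma_{Ij}^2):
\mathbf{s}_{j} = \bigl(\mathbf{C}^{T}\mathbf{W}_{j}\,\mathbf{C}\bigr)^{-1}
  \mathbf{C}^{T}\mathbf{W}_{j}\,\mathbf{d}_{j}
```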
12
MCR Model and ALS solutions
Including uncertainties σi,j:
unconstrained AWLS solution (by rows or columns)
Without including uncertainties:
unconstrained ALS solution
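
To make the contrast between the two solutions concrete, here is a small sketch of the row-wise weighted update next to the ordinary one. It assumes NumPy and uncorrelated errors with known standard deviations; `awls_rows` and the toy matrices are hypothetical names and data, not taken from the talk.

```python
import numpy as np

def awls_rows(D, St, sigma):
    """Row-wise weighted LS estimate of C for D ~ C @ St (uncorrelated errors)."""
    S = St.T                                  # J x N
    C = np.empty((D.shape[0], St.shape[0]))
    for i in range(D.shape[0]):
        w = 1.0 / sigma[i] ** 2               # diagonal of W_i
        A = S.T @ (w[:, None] * S)            # S^T W_i S   (N x N)
        b = S.T @ (w * D[i])                  # S^T W_i d_i (N,)
        C[i] = np.linalg.solve(A, b)
    return C

# Toy data (made up): noise-free D and arbitrary positive uncertainties
rng = np.random.default_rng(0)
C_true = rng.random((6, 2))
St = rng.random((2, 8))
D = C_true @ St
sigma = 0.1 + rng.random((6, 8))
C_weighted = awls_rows(D, St, sigma)

# With equal uncertainties the weighted update reduces to ordinary LS
D_noisy = D + 0.01 * rng.standard_normal(D.shape)
C_equal = awls_rows(D_noisy, St, np.ones_like(D_noisy))
C_ols = D_noisy @ np.linalg.pinv(St)
```

On noise-free data any weighting recovers the same C; the weights only matter once the noise differs from element to element.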
13
Different weighting alternatives in MCR-AWLS
  • Traditional MCR-ALS, without weighting, on the
    PCA-projected data (D_PCA) or directly on D
    (experimental)
  • MCR-AWLS weighting from externally estimated
    uncertainties in variables (without correlation)
  • MCR-AWLS with weights equal to standard
    deviations of variables (as in variable
    scaling)
  • MCR-AWLS with weights proportional to variable
    intensities
  • MCR-AWLS weighting recalculated iteratively from
    residuals
  • MCR-AWLS weighting using asymmetric least squares
    principles (i.e. to promote positive or negative
    residuals)
  • ...

14
  • OUTLINE
  • Introduction to Multivariate Curve Resolution by
    Alternating Weighted Least Squares (MCR-AWLS)
  • Testing the MCR-AWLS method
  • Application of MCR-AWLS method to resolution and
    apportionment of air particulate contamination
    sources

15
  • Example: simulation of an environmental two-way
    data set following a bilinear model
  • X = G F^T + E
  • Environmental factors: G and F^T
  • Map of variables
  • F^T(N,NC) → F^T(nr. of sources, nr. of variables)
  • Chemical composition of the different sources
  • This will identify/define/describe the major
    contamination sources/patterns
  • Map of samples
  • G(NR,N) → G(nr. of samples, nr. of sources)
  • Distribution/contribution of the sources on the
    samples
  • This will indicate the
    geographical/temporal/compartment contribution
    and distribution of the contamination sources
    defined by F^T

16
Matrix F^T: 4 factor loadings for 50 variables
f = lognrnd(0.01,1,4,50)
Map of variables: composition profiles
Correlation between factor loadings:
   1.0000   0.6990  -0.0247  -0.0488
   0.6990   1.0000  -0.0230  -0.1578
  -0.0247  -0.0230   1.0000   0.1781
  -0.0488  -0.1578   0.1781   1.0000
Correlation between the 1st and 2nd loadings!
Each factor has a very positively
skewed distribution of values!
[Figure: F^T(4,50) loadings (normalized)]
17
Matrix G: 4 factor scores for 30 samples
g = lognrnd(0.01,1,30,4)
Map of samples: contribution profiles
Correlation between factor scores:
   1.0000  -0.1027   0.0483  -0.0107
  -0.1027   1.0000   0.0900   0.1030
   0.0483   0.0900   1.0000   0.1370
  -0.0107   0.1030   0.1370   1.0000
Little correlation among scores!
[Figure: G(30,4) scores]
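
The two lognrnd calls on these slides can be mimicked in plain Python as a rough sketch. It assumes that lognrnd(mu, sigma, m, n) draws an m-by-n matrix whose logarithm is normal with mean mu and standard deviation sigma, which matches `random.lognormvariate`. The seed and helper names are arbitrary, the normalization of the loadings mentioned on the slide is not reproduced, and the numbers will not match the slides.

```python
import random

def lognormal_matrix(n_rows, n_cols, mu, sigma, rng):
    """Matrix of log-normal draws; mu and sigma refer to the underlying normal."""
    return [[rng.lognormvariate(mu, sigma) for _ in range(n_cols)]
            for _ in range(n_rows)]

def matmul(A, B):
    """Plain-Python matrix product of A (m x k) and B (k x n)."""
    cols_B = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols_B]
            for row in A]

rng = random.Random(0)                         # fixed seed, for reproducibility only
FT = lognormal_matrix(4, 50, 0.01, 1.0, rng)   # sources x variables
G = lognormal_matrix(30, 4, 0.01, 1.0, rng)    # samples x sources
Y = matmul(G, FT)                              # noise-free data, 30 x 50
```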
18
Y + E = X
lof (%) = 14, R2 = 98.0, mean(S/N) = 21.7
HOMOSCEDASTIC NOISE CASE
Noise structure:
  r = 0.01 max(max(Y)) = 3.21
  S = I .* r
  E = S .* N(0,1)
SVD (first singular values):
              1      2      3      4     5
  X           818.1  348.9  112.9  66.1  37.0
  Y (G F^T)   815.2  346.6  104.1  62.9  0.0
  E           39.4   36.6
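
The talk does not define lof and R2. The sketch below uses the definitions commonly quoted for MCR-ALS (lack of fit as the relative residual norm in percent, R2 as percent of explained variance about zero); treat them as an assumption. They are at least consistent with the lof/R2 pairs quoted on these slides, for example lof = 14 with R2 = 98.0 and lof = 12.6 with R2 = 98.4.

```python
import math

def _sums(D, D_hat):
    """Residual and total sums of squares for two equally shaped matrices."""
    num = sum((d - h) ** 2 for rd, rh in zip(D, D_hat) for d, h in zip(rd, rh))
    den = sum(d ** 2 for rd in D for d in rd)
    return num, den

def lack_of_fit(D, D_hat):
    """lof (%) = 100 * sqrt(sum(e_ij^2) / sum(d_ij^2))."""
    num, den = _sums(D, D_hat)
    return 100.0 * math.sqrt(num / den)

def r2_percent(D, D_hat):
    """R2 (%) = 100 * (1 - sum(e_ij^2) / sum(d_ij^2))."""
    num, den = _sums(D, D_hat)
    return 100.0 * (1.0 - num / den)

def r2_from_lof(lof):
    """Under these definitions the two measures are tied together."""
    return 100.0 * (1.0 - (lof / 100.0) ** 2)
```

For instance, `r2_from_lof(14.0)` gives 98.04 and `r2_from_lof(40.0)` gives 84.0, matching the theoretical homoscedastic case and the unweighted high-noise case reported in the talk.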
19
[Figure: recovered F^T profiles. Red: max and min bands. Blue: true F^T. Panels: initial estimates from true and from purest.]
20
[Figure: recovered G profiles. Red: max and min bands. Blue: true G. Panels: initial estimates from true and from purest.]
21
MCR-ALS results quality assessment
Data fitting:
  - lof
  - R2
Profiles recovery:
  - r2 (similarity)
  - recovery angles, measured by the inverse cosine
    of r2 and expressed in sexagesimal degrees
r2     1  0.99  0.95  0.90  0.80  0.70  0.60  0.50  0.40  0.30  0.20  0.10  0.00
angle  0  8.1   18    26    37    46    53    60    66    72    78    84    90
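
The table above can be reproduced with a few lines of Python. The sketch assumes the similarity is the cosine between the true and the recovered profile vectors (uncentered); the function names are illustrative.

```python
import math

def angle_from_similarity(r):
    """Recovery angle in degrees for a similarity (cosine) value r."""
    return math.degrees(math.acos(r))

def recovery_angle(true_profile, recovered_profile):
    """Angle in degrees between two profile vectors."""
    dot = sum(a * b for a, b in zip(true_profile, recovered_profile))
    norm_t = math.sqrt(sum(a * a for a in true_profile))
    norm_r = math.sqrt(sum(b * b for b in recovered_profile))
    cosine = max(-1.0, min(1.0, dot / (norm_t * norm_r)))  # guard rounding
    return angle_from_similarity(cosine)

# Reproduce a few table entries
table = {r: round(angle_from_similarity(r), 1) for r in (0.99, 0.95, 0.90, 0.50)}
```

A perfectly recovered profile gives 0 degrees, and an orthogonal one gives 90 degrees.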
22
No noise and homoscedastic noise cases: results
Recovery angles (degrees)
System      init    method  lof   R2    f1    f2    f3    f4    g1    g2    g3    g4
No noise    true    ALS     0     100   0     0     0     0     0     0     0     0
No noise    purest  ALS     0     100   1.8   11    7.9   5.0   5.9   9.1   13    2.8
max band    -       Bands   0     100   3.1   13    7.5   5.5   8.2   18    10    1.7
min band    -       Bands   0     100   2.1   3.7   3.9   3.9   5.2   8.1   14    3.0
Homo noise  true    ALS     12.6  98.4  3.0   12    8.7   2.1   4.8   12    9.0   2.4
Homo noise  purest  ALS     12.6  98.4  3.0   17    8.5   5.0   7.1   12    16    3.7
Homo noise  -----   Theor   14.0  98.0  ----  ----  ----  ----  ----  ----  ----  ----
Homo noise  -----   PCA     12.6  98.4  ----  ----  ----  ----  ----  ----  ----  ----
23
No noise and homoscedastic noise cases: results
  • Only non-negativity and normalization constraints
    were used
  • Data fitting is perfect in the no-noise case,
    but the solutions differ slightly when
    different initial estimates are used, owing to
    rotation ambiguity
  • For environmental profiles, rotation ambiguity
    effects are present, but the recovery angles for
    the band boundaries are always below 20 degrees
  • Data fitting in the homoscedastic noise case
    reflects the noise level, although with a slight
    tendency to overfit
  • Rotation ambiguity effects are mixed with
    incipient noise propagation effects, giving
    slightly worse recoveries than in the no-noise
    case, but the recovery angles remain below 20
    degrees, within the range of the rotation
    ambiguity max/min bands
  • PCA also slightly overfits: it fits better than
    the theoretical value even for random noise (lof
    is also 12.6 and R2 is 98.4)

24
Y + E = X
HETEROSCEDASTIC NOISE CASE: low (L), medium (M), high (H)
lof (%) = 12, 25, 44; R2 = 99, 94, 80; mean(S/N) = 17, 10, 3
Noise structure:
  r = 5, 10, 20
  S = r .* R(0,1)   (random numbers in the interval 0-1)
  E = S .* N(0,1)   (normally distributed)
SVD (first singular values):
              1     2     3     4     5
  X, L        814   348   111   67    33
  X, M        829   340   118   82    64
  X, H        823   347   154   135   130
  Y (G F^T)   815   347   104   63    0
  E, L        36    34
  E, M        71    69
  E, H        145   134
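
The two noise structures used in the simulations (constant S = r in the homoscedastic case, S = r .* R(0,1) in the heteroscedastic case) can be sketched in plain Python as below. The function name, the seed and the demo matrix are hypothetical, and R(0,1) is read as uniform random numbers in the interval 0-1, as the slide annotation suggests.

```python
import random

def add_noise(Y, r, heteroscedastic, rng):
    """Return X = Y + E and the standard deviations S, with E = S .* N(0,1).

    Homoscedastic: every element of S equals r.
    Heteroscedastic: each element of S is r times a uniform(0,1) draw.
    """
    X, S = [], []
    for row in Y:
        s_row = [r * rng.random() if heteroscedastic else r for _ in row]
        X.append([y + s * rng.gauss(0.0, 1.0) for y, s in zip(row, s_row)])
        S.append(s_row)
    return X, S

rng = random.Random(1)                       # fixed seed, for reproducibility only
Y_demo = [[10.0] * 5 for _ in range(4)]      # made-up noise-free data
X_het, S_het = add_noise(Y_demo, 5.0, True, rng)
X_hom, S_hom = add_noise(Y_demo, 5.0, False, rng)
```

The matrix S returned here is what a weighted algorithm would use as the known uncertainties, for example as weights 1/S^2.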
25
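  • [Figure: recovered F^T profiles, no weighting.
    Red: max and min bands. Blue: true F^T. Panels:
    initial estimates from true and from purest.]

26
  • [Figure: recovered F^T profiles, with weighting.
    Red: max and min bands. Blue: true F^T. Panels:
    initial estimates from true and from purest.]

Weighting improves recoveries.
27
  • [Figure: recovered G profiles, no weighting.
    Red: max and min bands. Blue: true G. Panels:
    initial estimates from true and from purest.]

28
  • [Figure: recovered G profiles, with weighting.
    Red: max and min bands. Blue: true G. Panels:
    initial estimates from true and from purest.]

Weighting gives an overall improvement in recovery.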
29
Heteroscedastic noise case: results
Recovery angles (degrees)
Case    init    w            lof exp  R2 exp  f1    f2    f3    f4    g1    g2    g3    g4
low     purest  ALS          10.7     98.8    3.1   14    9.0   3.8   7.0   10    15    4.3
low     purest  WALS         12.0     98.6    2.6   12    15    4.3   7.8   15    15    3.7
low     ----    Theoretical  12.0     98.6    ----  ----  ----  ----  ----  ----  ----  ----
low     ----    PCA          10.7     98.8    ----  ----  ----  ----  ----  ----  ----  ----
medium  purest  ALS          22.3     95.0    7.7   22    22    5.7   7.2   21    24    4.5
medium  purest  WALS         24.0     94.2    6.6   22    18    5.7   7.4   14    17    5.5
medium  ----    Theoretical  25.0     93.6    ----  ----  ----  ----  ----  ----  ----  ----
medium  ----    PCA          22.0     95.1    ----  ----  ----  ----  ----  ----  ----  ----
high    purest  ALS          40.0     84.0    12    33    38    10    15    38    34    9.0
high    purest  WALS         43.1     81.4    12    26    25    6.0   5.0   27    16    3.0
high    ----    Theoretical  44.2     80.4    ----  ----  ----  ----  ----  ----  ----  ----
high    ----    PCA          40.8     83.4    ----  ----  ----  ----  ----  ----  ----  ----
30
Heteroscedastic high noise, w0, c2
lof DW   lof Dexp   R2 DW   R2 Dexp   rmsdif(s)   rmsdif(c)   Niter   flag
12.15    40.0       98.5    84.0      0.00001     0.0139      201     0
31
Heteroscedastic high noise, w2, c2
lof DW   lof Dexp   R2 DW     R2 Dexp   rmsdif(s)   rmsdif(c)   Niter   flag
0.92     44.6       99.9916   80.0750   0.0003      0.1385      340     1
32
Heteroscedastic (non-correlated) noise case: results
  • Low and medium levels of heteroscedastic noise
    do not seem to affect much the parameters (G and
    F^T) of the bilinear model estimated by ordinary
    MCR-ALS (without weighting)
  • Worse results are obtained by MCR-ALS (without
    weighting) when the heteroscedastic noise
    contributions are high.
  • In these cases the weighting approach
    (MCR-AWLS) produces better estimates of the
    model parameters
  • A tendency to overfit is observed for unweighted
    MCR-ALS and PCA. This problem is only partly
    solved by MCR-AWLS.
  • Further research is needed to check the quality
    of the residuals in both cases. The residuals
    are expected to behave better in the case of
    MCR-AWLS.

33
  • OUTLINE
  • Introduction to Multivariate Curve Resolution by
    Alternating Weighted Least Squares (MCR-AWLS)
  • Testing the MCR-AWLS method
  • Application of MCR-AWLS method to resolution and
    apportionment of air particulate contamination
    sources

34
Figure 1. Geographical location of the Llodio site
and plot of samples taken during the whole year
2001
35
[Figure: plot of variables. Labeled species: SO42-, NH4, Ctotal, NO3-, Ca, Fe, Cl-, Zn, Na, Al2O3]
36
Plot of variables and uncertainties
[Figure: raw data and scaled data (labeled species: SO42-, Ctotal, NH4, Zn); uncertainties (a proportional and a constant part), raw and scaled]
37
Principal Components Analysis Model
Developed: 10-Jul-2007 16:03:40.39
X-block: xscaled, 87 by 34
Included: [1-87] [1-34]
Preprocessing: None
Num. PCs: 6
Cross validation: random samples w/ 9 splits and 20 iterations
RMSEC: 0.540988
RMSECV: 4.90579
Percent Variance Captured by PCA Model
Principal   Eigenvalue   % Variance   % Variance
Component   of           Captured     Captured
Number      Cov(X)       This PC      Total
---------   ----------   ----------   ----------
1           6.81e+001    72.79        72.79
2           5.80e+000    6.21         79.00
3           3.42e+000    3.66         82.65
4           2.43e+000    2.60         85.25
5           2.04e+000    2.18         87.43
6           1.68e+000    1.80         89.23
38
[Figure: ALS vs AWLS on raw data. Blue: ALS, R2 = 99.3; black: AWLS, R2 = 88.1. Labeled species: Ctotal, SO42-, NH4, Cl-, Na, Zn, Fe]
39
[Figure: ALS (blue) vs AWLS (black) on scaled data. Resolved sources: Crustal, Steel, Valley, Marine, Pigment, Traffic. Labeled species: Ctotal, SO42-, NH4, Cl-, Na, Mg, K, Ca, Sr, Ba, Ti, Cr, Mn, Fe, Co, Ni, Cu, Zn, Ge, As, Se, Rb, Mo, Cd, Sn, La, Tl, Pb]
40
Loadings comparison (correlation) considering
different methods
41
[Figure: scores, ALS (blue) vs AWLS (black) on scaled data. Annotation: MCR-ALS fails?]
42
Scores comparison (correlation) considering
different methods
43
Conclusions: Llodio experimental data
  • Scaling as a data pretreatment makes it easier to
    distinguish minor contributions, but whether or
    not uncertainties are used for weighting then has
    a larger effect and becomes more critical
  • For major contamination sources, conclusions are
    similar whether or not uncertainties and
    weighting are considered
  • Including uncertainties has a more important
    effect on the interpretation of minor
    contamination sources, especially if these
    uncertainties are large

44
  • ACKNOWLEDGEMENTS
  • P. Wentzell, Dalhousie University
  • Llodio data: M. Viana and X. Querol from IJA-CSIC
  • Research project CTQ-15052-C02-01

45
  • Recent advances on the MCR-ALS method
  • Hybrid soft-/hard- (grey) bilinear models
    (kinetic and equilibrium chemical reactions,
    profile response shapes...)
  • Extension of MCR-ALS to multiway data analysis
    (MA-MCR-ALS including PARAFAC, Tucker3 and mixed
    models...)
  • Spectroscopic image analysis using MCR-ALS
  • Calculation of feasible band boundaries (rotation
    ambiguity)
  • Error propagation in MCR-ALS solutions
  • Alternating Weighted Least Squares (AWLS)
  • Applications
  • Environmental: contamination sources resolution
    and apportionment
  • Bioanalytical: polynucleotides, proteins, DNA
    microarrays...
  • Analytical: hyphenated methods (LC-DAD, LC-MS,
    GC-MS, FIA-DAD, ...), multidimensional
    spectroscopies (2D-NMR, EEM, ...)
  • On-line spectroscopic monitoring of
    (bio)chemical processes and reactions...
  • New user interface: http://www.ub.es/gesq/mcr/mcr.htm
  • J. Jaumot et al., Chemometrics and Intelligent
    Laboratory Systems, 2005, 76(1), 101-110