Linear Techniques for Regression and Classification on Functional Data

1
Linear Techniques for Regression and
Classification on Functional Data
Gilbert Saporta
Chaire de Statistique Appliquée, CEDRIC
Conservatoire National des Arts et Métiers
292 rue Saint Martin, F 75141 Paris Cedex 03
saporta_at_cnam.fr
http://cedric.cnam.fr/saporta

Joint work with D. Costanzo (U. Calabria) and C. Preda (U. Lille 2)
2
Outline

1. Introduction
2. OLS regression on functional data
3. PLS functional regression
4. Clusterwise regression
5. Discrimination
6. Anticipated prediction
7. Conclusion and perspectives

3
1. Introduction

Very high-dimensional data: an infinite number of variables
Regression on functional data
Example 1: Y = amount of crop, $X_t$ = temperature curves, p = ∞

R.A. Fisher (1924): The Influence of Rainfall on the Yield of Wheat at Rothamsted. Philosophical Transactions of the Royal Society, B 213, 89-142
4

Example 2: Growth index of 84 shares at the Paris stock exchange during 60 minutes

How to predict $X_t$ from t = 55 to t = 60, for a new share, knowing $X_t$ from t = 0 to t = 55?
5
Discrimination on functional data

Example 3: Kneading curves for cookies (Danone Vitapole)

After smoothing with cubic B-splines (Lévéder et al., 2004)

How to predict the quality of the cookies?
7

Linear combination: $\hat Y = \int_0^T \beta(t)\,X_t\,dt$
Integral regression (Fisher 1924) instead of a finite sum $\hat Y = \sum_{j=1}^{p} \beta_j X_j$

Discrimination on functional data
Particular case of regression when the response is binary
Anticipation
Determine an optimal time t < T such that the prediction based on [0, t] is almost as good as the prediction using all the data on [0, T]

9
2. OLS regression on functional data

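Response Y, predictor $X_t$, $t \in [0, T]$ (with zero mean)
2.1 The OLS problem
Minimizing $E\left(Y - \int_0^T \beta(t)\,X_t\,dt\right)^2$
leads to the normal, or Wiener-Hopf, equations
$\mathrm{cov}(X_t, Y) = \int_0^T C(t,s)\,\beta(s)\,ds$
where $C(t,s) = \mathrm{cov}(X_t, X_s) = E(X_t X_s)$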

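2.2 Karhunen-Loève decomposition (functional PCA)
$X_t = \sum_{i \ge 1} f_i(t)\,\xi_i$
Factor loadings $f_i$: $\int_0^T C(t,s)\,f_i(s)\,ds = \lambda_i f_i(t)$
Principal components: $\xi_i = \int_0^T f_i(t)\,X_t\,dt$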

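Picard's theorem: $\beta$ is unique if and only if $\sum_{i \ge 1} \mathrm{cov}^2(Y, \xi_i) / \lambda_i^2 < \infty$
Generally not true, especially when n is finite, since p > n: a perfect fit is obtained when minimizing $\frac{1}{n} \sum_{i=1}^{n} \left(y_i - \int_0^T \beta(t)\,x_i(t)\,dt\right)^2$

Even if $\beta$ is unique, the Wiener-Hopf equation is not an ordinary integral equation: the solution is more frequently a distribution than a function
Constrained solutions are needed (cf. Green & Silverman 1994; Ramsay & Silverman 1997).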

2.3 Regression on principal components
Rank q approximation: $\hat Y_q = \sum_{i=1}^{q} \frac{\mathrm{cov}(Y, \xi_i)}{\lambda_i}\,\xi_i$

Numerical computations
Solve integral equations in the general case
For step functions: finite number of variables and of units; the operators are matrices, but of very large size
Approximation by discretisation of time

Which principal components?
First q?
q best correlated with Y?
Principal components are computed irrespective of
the response
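
The rank q regression on principal components of section 2.3 can be sketched on curves discretised on a regular time grid. This is only an illustration under assumptions that are not in the slides: the names `functional_pcr` and `predict`, the grid step `dt`, and the Riemann-sum approximation of the integrals are hypothetical choices.

```python
import numpy as np

def functional_pcr(X, y, q, dt=1.0):
    """Regress y on the first q principal components of discretised curves.

    X: (n, p) array, row i = curve i sampled on p equally spaced time points.
    dt: grid step (assumed regular); integrals are replaced by Riemann sums.
    """
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    n = X.shape[0]
    C = Xc.T @ Xc / n                          # discretised covariance C(t, s)
    # Discretised eigen-equation: sum_s C(t, s) f(s) dt = lambda f(t)
    eigval, eigvec = np.linalg.eigh(C * dt)
    order = np.argsort(eigval)[::-1][:q]       # keep the q largest eigenvalues
    lam = eigval[order]
    f = eigvec[:, order] / np.sqrt(dt)         # loadings with sum f_i(t)^2 dt = 1
    xi = Xc @ f * dt                           # principal components xi_i
    c = xi.T @ yc / n                          # cov(Y, xi_i)
    beta = f @ (c / lam)                       # beta_q(t) = sum_i cov(Y, xi_i) / lambda_i * f_i(t)
    return beta, x_mean, y_mean

def predict(beta, x_mean, y_mean, X_new, dt=1.0):
    """Prediction: mean of y plus the Riemann sum of beta(t) x(t) dt."""
    return y_mean + (X_new - x_mean) @ beta * dt
```

Choosing q smaller than the number of time points gives the rank q approximation; the sketch keeps the first q components, which is only one of the two selection rules raised above.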

16
3. Functional PLS regression

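Use PLS components instead of principal components.
First PLS component: $t_1 = \int_0^T w_1(t)\,X_t\,dt$, with $w_1(t) = \mathrm{cov}(X_t, Y) \big/ \sqrt{\int_0^T \mathrm{cov}^2(X_s, Y)\,ds}$
Further PLS components: as usual, by iterating on the residuals

Order q approximation of Y by $X_t$: $\hat Y_{PLS(q)} = c_1 t_1 + \dots + c_q t_q = \int_0^T \hat\beta_{PLS(q)}(t)\,X_t\,dt$
Convergence theorem: $\lim_{q \to \infty} E\left(\hat Y_{PLS(q)} - \hat Y\right)^2 = 0$, where $\hat Y$ is the approximation of Y given by the OLS regression on $X_t$
q has to be finite in order to get a formula!
Usually q is selected by cross-validation (Preda & Saporta, 2005a)

The first PLS component is easily interpretable: its coefficients have the same sign as $r(y; x_t)$
No integral equation to solve
PLS fits better than PCR: $R^2(Y; \hat Y_{PLS(q)}) \ge R^2(Y; \hat Y_{PCR(q)})$
Same proof as in De Jong, 1993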
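
A minimal sketch of PLS regression on discretised curves, assuming a regular grid with step `dt` and a single response (a PLS1 algorithm with deflation of the curves). The function name `functional_pls` and the Riemann-sum treatment of the integrals are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def functional_pls(X, y, q, dt=1.0):
    """PLS1 regression of y on curves X (n, p) sampled on a regular grid.

    Returns the coefficient function beta (length p) and the means used
    for centring, so that y_hat = y_mean + sum_t beta(t) (x(t) - x_mean(t)) dt.
    """
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xk, yc = X - x_mean, y - y_mean
    n = X.shape[0]
    W, P, c = [], [], []
    for _ in range(q):
        w = Xk.T @ yc / n                    # proportional to cov(X_t, Y) on the residual curves
        w = w / np.sqrt(np.sum(w ** 2) * dt) # normalise: sum w(t)^2 dt = 1
        t = Xk @ w * dt                      # component t = int w(t) X_t dt
        tt = t @ t
        p_load = Xk.T @ t / tt               # regression of each X_t on the component
        c.append(yc @ t / tt)                # regression of Y on the component
        Xk = Xk - np.outer(t, p_load)        # deflation: continue on the residuals
        W.append(w)
        P.append(p_load)
    W, P, c = np.array(W).T, np.array(P).T, np.array(c)
    # Express the q-component fit as a single coefficient function of the centred curves
    beta = W @ np.linalg.solve(P.T @ W * dt, c)
    return beta, x_mean, y_mean
```

With `q = 1` the returned `beta` is a positive multiple of the empirical covariance between the curve values and `y`, which is the interpretability property stated above.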

19
4. Clusterwise regression

4.1 Model
G: variable with K categories (sub-populations)
$E(Y \mid X = x, G = i) = \alpha_i + \beta_i' x$, $i = 1, \dots, K$

4.2 OLS and clusterwise regression
Residual variance of the global regression = within-cluster residual variance + variance due to the difference between local (clusterwise) and global (OLS) regressions

4.3 Estimation (Charles, 1977)
The number of clusters K needs to be known
Alternated least squares:
For a given partition, estimate a linear regression for each cluster
Reallocate each point to the closest regression line (or surface)
Equivalent to ML for the fixed regressors, fixed partition model (Hennig, 2000)
4.4 Optimal K
AIC, BIC, cross-validation
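
The alternated least squares scheme of section 4.3 can be sketched as follows. The function name `clusterwise_regression`, the random initial partition, the iteration cap and the handling of empty clusters are assumptions made for the illustration; the slides do not specify them.

```python
import numpy as np

def clusterwise_regression(X, y, K, n_iter=100, seed=0):
    """Alternate between per-cluster least squares fits and reallocation.

    X: (n, d) regressors, y: (n,) response, K: number of clusters (known).
    Returns the labels, the last fitted coefficients (K, d + 1) and the
    criterion (sum of squared residuals to the closest regression).
    """
    rng = np.random.default_rng(seed)
    n = len(y)
    A = np.column_stack([np.ones(n), X])          # add an intercept column
    labels = rng.integers(K, size=n)              # random initial partition (assumption)
    for _ in range(n_iter):
        coefs = np.full((K, A.shape[1]), np.nan)  # NaN marks an empty cluster
        for k in range(K):
            members = labels == k
            if members.any():
                # Step 1: linear regression within cluster k
                coefs[k] = np.linalg.lstsq(A[members], y[members], rcond=None)[0]
        resid2 = (y[:, None] - A @ coefs.T) ** 2
        resid2 = np.where(np.isnan(resid2), np.inf, resid2)
        # Step 2: reallocate each point to the closest regression surface
        new_labels = resid2.argmin(axis=1)
        criterion = resid2.min(axis=1).sum()
        if np.array_equal(new_labels, labels):    # partition is stable: stop
            break
        labels = new_labels
    return labels, coefs, criterion
```

Each step can only decrease the criterion, so the procedure stops at a local optimum; in practice several random starts would be compared, and K would be chosen with one of the criteria listed in 4.4.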

22
4.5 Clusterwise functional PLS regression

OLS functional regression is not adequate for estimation within each cluster
Our proposal: estimate the local models with functional PLS regression
Is the clusterwise algorithm still consistent?
Proof in Preda & Saporta, 2005b

Prediction
Allocate a new observation to a cluster (nearest neighbour or another classification technique)
Use the corresponding local model
May be generalised to the case where Y is itself a random vector

24
4.6 Application to stock market data

Growth index during 1 hour (between 10h and 11h) of 84 shares at the Paris Stock Exchange
Goal: predict a new share between 10h55 and 11h using data between 10h and 10h55

Exact computations need 1366 variables (the number of intervals on which the 85 curves are constant)
Discretisation into 60 intervals
Comparison between PCR and PLS

Crash of share 85 not detected!

Clusterwise PLS
Four clusters (17, 32, 10, 25)
Number of PLS components for each cluster: 1, 3, 2, 2 (cross-validation)

Share 85 classified into cluster 1

29
5. Functional linear discrimination

LDA: linear combinations $\int_0^T \beta(t)\,X_t\,dt$ maximizing the ratio
between-group variance / within-group variance
For 2 groups: Fisher's LDF via a regression between a coded Y and $X_t$, e.g. as in Preda & Saporta (2005a)

PLS regression with q components gives an approximation of $\beta(t)$ and of the score $d_T(x) = \int_0^T \beta(t)\,x_t\,dt$
For more than 2 groups: PLS2 regression between k-1 indicators of Y and $X_t$
First PLS component given by the first eigenvector of the product of Escoufier operators $W_X W_Y$
(Preda & Saporta, 2002; Barker & Rayens, 2003)

31
Quality measures

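For k = 2: ROC curve and AUC
For a specific threshold s, x is classified into G1 if $d_T(x) > s$
Sensitivity, or true positive rate: $P(d_T(x) > s \mid Y = 1) = 1 - \beta$ (here $\beta$ is the type II error rate, not the coefficient function)
1 - specificity, or 1 - true negative rate: $P(d_T(x) > s \mid Y = 0) = \alpha$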

32
ROC curve

Perfect discrimination: the ROC curve coincides with the left and top edges of the unit square
For identical conditional distributions: the ROC curve coincides with the diagonal

The ROC curve is invariant under any monotone increasing transformation of the score
Area under the ROC curve: a global measure of performance, allowing (partial) model comparisons
With X1 drawn from G1 and X2 from G2: AUC = P(X1 > X2)
AUC is estimated by the proportion of concordant pairs, $n_c / (n_1 n_2)$
$n_c$: Wilcoxon-Mann-Whitney statistic
$U + W = n_1 n_2 + 0.5\,n_1 (n_1 + 1)$, AUC = $U / (n_1 n_2)$
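
As a small illustration of estimating the AUC by concordant pairs, the sketch below counts the pairs directly and then recovers the same value from ranks. The function names are hypothetical, and the rank version uses one common convention, U = R - n_pos (n_pos + 1) / 2 with R the rank sum of the positive-class scores, and assumes no tied scores.

```python
import numpy as np

def auc_concordant(pos, neg):
    """Proportion of (positive, negative) pairs ranked correctly; ties count 1/2."""
    diff = np.asarray(pos)[:, None] - np.asarray(neg)[None, :]
    return ((diff > 0).sum() + 0.5 * (diff == 0).sum()) / diff.size

def auc_ranks(pos, neg):
    """Same quantity from the rank sum of the positive scores (no ties assumed)."""
    scores = np.concatenate([pos, neg])
    ranks = np.empty(len(scores))
    ranks[np.argsort(scores)] = np.arange(1, len(scores) + 1)
    n_pos, n_neg = len(pos), len(neg)
    U = ranks[:n_pos].sum() - n_pos * (n_pos + 1) / 2   # number of concordant pairs
    return U / (n_pos * n_neg)

pos = np.array([0.9, 0.8, 0.4])        # scores of the positive group
neg = np.array([0.7, 0.3, 0.2, 0.1])   # scores of the negative group
print(auc_concordant(pos, neg), auc_ranks(pos, neg))   # both equal 11/12
```

Here 11 of the 12 pairs are concordant (only the pair 0.4 versus 0.7 is not), so both estimates equal 11/12.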

34
6. Anticipated prediction

Find t < T such that the analysis on [0, t] gives predictions almost as good as with [0, T]
Solution
When increasing s from 0 to T, look for the first value such that AUC(s) does not differ significantly from AUC(T)

A bootstrap procedure
Stratified resampling of the data
For each replication b, $AUC_b(s)$ and $AUC_b(T)$ are computed
Student's t-test or Wilcoxon test on the B paired differences $\delta_b = AUC_b(s) - AUC_b(T)$
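
The bootstrap procedure can be sketched as below for curves discretised on p time points. Several parts are assumptions made only to keep the example self-contained: the linear score is a simple difference-of-class-means projection standing in for the PLS score, the AUC is computed on each bootstrap sample itself, and the critical value `t_crit = 2.01` (roughly the two-sided 5% Student value for B = 50) is a placeholder. All function names are hypothetical.

```python
import numpy as np

def auc(scores, y):
    """AUC as the proportion of concordant (positive, negative) pairs."""
    diff = scores[y == 1][:, None] - scores[y == 0][None, :]
    return ((diff > 0).sum() + 0.5 * (diff == 0).sum()) / diff.size

def linear_score(X, y):
    """Stand-in linear score (NOT the PLS score): projection on the
    difference between the two class mean curves."""
    w = X[y == 1].mean(axis=0) - X[y == 0].mean(axis=0)
    return X @ w

def bootstrap_auc_differences(X, y, s, B=50, seed=0):
    """B paired differences delta_b = AUC_b(s) - AUC_b(T).

    X: (n, p) curves; s: number of leading time points kept (s = p means T).
    """
    rng = np.random.default_rng(seed)
    idx_pos, idx_neg = np.flatnonzero(y == 1), np.flatnonzero(y == 0)
    deltas = np.empty(B)
    for b in range(B):
        # Stratified resampling: draw with replacement within each class
        idx = np.concatenate([rng.choice(idx_pos, len(idx_pos)),
                              rng.choice(idx_neg, len(idx_neg))])
        Xb, yb = X[idx], y[idx]
        deltas[b] = auc(linear_score(Xb[:, :s], yb), yb) - auc(linear_score(Xb, yb), yb)
    return deltas

def paired_t_statistic(deltas):
    """Student statistic for the mean of the paired differences."""
    sd = deltas.std(ddof=1)
    return 0.0 if sd == 0 else deltas.mean() / (sd / np.sqrt(len(deltas)))

def earliest_time(X, y, t_crit=2.01, B=50, seed=0):
    """First s for which AUC(s) does not differ significantly from AUC(T)."""
    for s in range(1, X.shape[1] + 1):
        # Same seed for every s, so all s are compared on the same resamples
        if abs(paired_t_statistic(bootstrap_auc_differences(X, y, s, B, seed))) < t_crit:
            return s
    return X.shape[1]
```

Reusing the same seed for every s keeps the comparison paired across time points. A Wilcoxon signed-rank test on the same differences would be the nonparametric alternative mentioned above.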

36
Applications

Simulated data
Two classes with equal priors
W(t): Brownian motion

37
38

With B = 50

Kneading curves
After T = 480 s of kneading one gets cookies whose quality is Y
115 observations: 50 good, 40 bad and 25 adjustable
241 equally spaced measurements
Smoothing with cubic B-splines, 16 knots

Performance for Y = {good, bad}
Repeat 100 times the split into learning and test samples of size (60, 30)
Average error rate:
0.142 with principal components
0.112 with PLS components
Average AUC: 0.746
$\beta(t)$

Anticipated prediction
B = 50
t = 186
The recording period of the dough resistance can be reduced to less than half of the current one

42
7. Conclusions and perspectives

PLS regression is an efficient and simple way to obtain linear predictions for functional data
We have proposed a bootstrap procedure for the problem of anticipated prediction

Work in progress
On-line forecasting: instead of using the same anticipated decision time t for all data, we could adapt t to each new trajectory given its incoming measurements
Clusterwise discrimination
Comparison with functional logistic regression (Aguilera et al., 2006)

44
References

Aguilera A.M., Escabias M., Valderrama M.J. (2006) Using principal components for estimating logistic regression with high-dimensional multicollinear data, Computational Statistics & Data Analysis, 50, 1905-1924.
Barker M., Rayens W. (2003) Partial least squares for discrimination, Journal of Chemometrics, 17, 166-173.
Charles C. (1977) Régression typologique et reconnaissance des formes. Ph.D., Université Paris IX.
Costanzo D., Preda C., Saporta G. (2006) Anticipated prediction in discriminant analysis on functional data for binary response. In COMPSTAT2006, p. 821-828, Physica-Verlag.
Hennig C. (2000) Identifiability of models for clusterwise linear regression, Journal of Classification, 17, 273-296.
Lévéder C., Abraham C., Cornillon P.A., Matzner-Lober E., Molinari N. (2004) Discrimination de courbes de pétrissage. Chimiometrie 2004, 37-43.
Preda C., Saporta G. (2005a) PLS regression on a stochastic process, Computational Statistics & Data Analysis, 48, 149-158.
Preda C., Saporta G. (2005b) Clusterwise PLS regression on a stochastic process, Computational Statistics & Data Analysis, 49, 99-108.
Preda C., Saporta G., Lévéder C. (2007) PLS classification of functional data, Computational Statistics.
Ramsay J.O., Silverman B.W. (1997) Functional data analysis, Springer.