# Linear Techniques for Regression and Classification on Functional Data
1
Linear Techniques for Regression and Classification on Functional Data
Gilbert Saporta
Chaire de Statistique Appliquée, CEDRIC, Conservatoire National des Arts et Métiers
292 rue Saint Martin, F-75141 Paris Cedex 03
saporta@cnam.fr, http://cedric.cnam.fr/saporta

Joint work with D. Costanzo (U. Calabria) and C. Preda (U. Lille 2)
2
Outline
• 1. Introduction
• 2. OLS regression on functional data
• 3. PLS functional regression
• 4. Clusterwise regression
• 5. Discrimination
• 6. Anticipated prediction
• 7. Conclusion and perspectives

3
1. Introduction
• Very high-dimensional data: an infinite number of variables
• Regression on functional data
• Example 1: Y = amount of crop
• Xt = temperature curves
• p = ∞

R.A. Fisher (1924): The Influence of Rainfall on the Yield of Wheat at Rothamsted, Philosophical Transactions of the Royal Society, B 213, 89-142
4
• Example 2: growth index of 84 shares at the Paris stock exchange during 60 minutes

How to predict Xt from t = 55 to t = 60 for a new share, knowing Xt from t = 0 to t = 55?
5
Discrimination on functional data
• Example 3: kneading (dough resistance) curves for cookie production (Danone Vitapole data)

6
• After smoothing with cubic B-splines (Lévéder et al., 2004)

How to predict the quality of the cookies?
7
• Linear combination: Ŷ = ∫[0,T] β(t) Xt dt
• Integral regression (Fisher, 1924)
• an integral instead of a finite sum Σj βj Xj
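On a discretized time grid, Fisher's integral regression is just a Riemann sum. A minimal numpy sketch; the coefficient function β(t) and the trajectory x(t) below are hypothetical, chosen only so the integral has a closed form:

```python
import numpy as np

# Discretize [0, T]: the integral regression
#     y_hat = integral_0^T beta(t) x(t) dt
# becomes a finite (Riemann) sum over the grid.
T, m = 1.0, 100
t = np.linspace(0.0, T, m, endpoint=False)   # left endpoints of the grid
dt = T / m

beta = np.sin(2 * np.pi * t)    # hypothetical coefficient function
x = np.exp(-t)                  # one hypothetical trajectory x(t)

y_hat = np.sum(beta * x) * dt   # finite sum approximating the integral
print(y_hat)
```

With m = 100 grid points the sum is already close to the exact integral 2π(1 − e⁻¹)/(1 + 4π²) ≈ 0.098.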

8
• Discrimination on functional data
• A particular case of regression where the response is binary
• Anticipation
• Determine an optimal time t < T giving a prediction based on [0, t] almost as good as the prediction using all the data on [0, T]

9
2. OLS regression on functional data
• Y and Xt with zero mean
• 2.1 The OLS problem
• Minimizing E[(Y − ∫[0,T] β(t) Xt dt)²]
• leads to the normal, or Wiener-Hopf, equations: ∫[0,T] C(t,s) β(s) ds = E(Y Xt) for all t
• where C(t,s) = cov(Xt, Xs) = E(Xt Xs)

10
• 2.2 Karhunen-Loève decomposition (functional PCA)
• Xt = Σi fi(t) ξi, where the ξi are the principal components (uncorrelated, with variances λi) and the fi the eigenfunctions of C
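The Karhunen-Loève expansion can be mimicked on a time grid: the covariance operator becomes a matrix, its eigenvectors play the role of the eigenfunctions, and the principal components are the projections of the curves onto them. A sketch on synthetic curves (all data below are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

# n synthetic curves on an m-point grid, built from two smooth modes.
n, m = 200, 50
t = np.linspace(0.0, 1.0, m)
X = (rng.standard_normal((n, 1)) * np.sin(np.pi * t)
     + rng.standard_normal((n, 1)) * np.cos(np.pi * t)
     + 0.05 * rng.standard_normal((n, m)))
X = X - X.mean(axis=0)          # center: zero-mean process

# Discretized covariance operator C(t_j, t_k).
C = X.T @ X / n

# Karhunen-Loeve on the grid: eigenfunctions of C and principal
# components xi_i = <f_i, X> (here a matrix product).
eigvals, eigvecs = np.linalg.eigh(C)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]
scores = X @ eigvecs            # principal components, uncorrelated
print(eigvals[:3])
```

The score columns are empirically uncorrelated, with variances equal to the eigenvalues, exactly as the ξi in the expansion.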

11
• Picard's theorem: β is unique if and only if Σi cov²(Y, ξi) / λi² < ∞
• Generally not true, especially when n is finite, since p > n; a perfect fit is obtained when minimizing the empirical criterion

12
• Even if β is unique, the Wiener-Hopf equation is not an ordinary integral equation: the solution is more often a distribution than a function
• Constrained solutions are needed (cf. Green & Silverman 1994, Ramsay & Silverman 1997)

13
• 2.3 Regression on principal components
• Rank-q approximation: Ŷq = Σ(i≤q) [cov(Y, ξi) / λi] ξi
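Principal-component regression then amounts to an OLS fit of Y on the first q components. A sketch on synthetic data in which Y genuinely depends on two functional modes (all names and values are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(1)

n, m, q = 300, 40, 2
t = np.linspace(0, 1, m)
# Hypothetical curves built from two smooth modes with random loadings A.
A = rng.standard_normal((n, 2))
X = A[:, :1] * np.sin(np.pi * t) + A[:, 1:] * np.cos(np.pi * t)
X = X - X.mean(axis=0)
y = 2.0 * A[:, 0] - 1.0 * A[:, 1] + 0.1 * rng.standard_normal(n)
y = y - y.mean()

# Functional PCA on the discretized curves.
eigvals, eigvecs = np.linalg.eigh(X.T @ X / n)
order = np.argsort(eigvals)[::-1]
pcs = X @ eigvecs[:, order[:q]]             # first q principal components

# Rank-q approximation: OLS of Y on the q components.
coef, *_ = np.linalg.lstsq(pcs, y, rcond=None)
y_hat = pcs @ coef
r2 = 1 - np.sum((y - y_hat) ** 2) / np.sum(y ** 2)
print(r2)
```

Because the signal lives entirely in the first two components, the rank-2 fit recovers almost all the variance of Y.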

14
• Numerical computations
• Solve integral equations in the general case
• For step functions: a finite number of variables and of units; operators are matrices, but of very high dimension
• Approximation by discretisation of time

15
• Which principal components?
• The first q?
• The q best correlated with Y?
• Principal components are computed irrespective of the response

16
3. Functional PLS regression
• Use PLS components instead of principal components
• First PLS component: t1 = ∫[0,T] w(t) Xt dt with weight function w(t) ∝ cov(Xt, Y)
• Further PLS components obtained as usual, by iterating on the residuals
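The first PLS component needs no eigen-equation: its weight function is simply proportional to cov(Xt, Y). A sketch on synthetic data (further components, obtained by deflating X and Y, are omitted here):

```python
import numpy as np

rng = np.random.default_rng(2)

n, m = 300, 40
t = np.linspace(0, 1, m)
# Hypothetical curves: one smooth mode with loadings a, plus noise.
a = rng.standard_normal((n, 1))
X = a * np.sin(np.pi * t) + 0.1 * rng.standard_normal((n, m))
X = X - X.mean(axis=0)
y = a[:, 0] + 0.1 * rng.standard_normal(n)
y = y - y.mean()

# First PLS weight function: w(t) proportional to cov(X_t, Y),
# normalized; the component is t1 = integral w(t) X_t dt.
w = X.T @ y / n                     # cov(X_t, Y) on the grid
w = w / np.linalg.norm(w)
t1 = X @ w                          # first PLS component

# t1 is the linear combination of the X_t maximizing cov^2 with Y.
print(np.corrcoef(t1, y)[0, 1])
```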

17
• Order-q approximation of Y by Xt
• Convergence theorem: the PLS approximation converges to the OLS one as q → ∞
• q has to be finite in order to get a formula!
• Usually q is selected by cross-validation
• (Preda & Saporta, 2005a)

18
• The first PLS component is easily interpretable: its coefficients have the same sign as r(Y, Xt)
• No integral equation to solve
• PLS fits better than PCR for the same number of components
• Same proof as in de Jong, 1993

19
4. Clusterwise regression
• 4.1 Model
• G: a latent variable with K categories (sub-populations); a separate linear regression of Y on Xt holds within each cluster

20
• 4.2 OLS and clusterwise regression
• Residual variance of the global regression = within-cluster residual variance + variance due to the difference between the local (clusterwise) and global (OLS) regressions

21
• 4.3 Estimation (Charles, 1977)
• The number of clusters K needs to be known
• Alternating least squares:
• For a given partition, estimate a linear regression in each cluster
• Reallocate each point to the closest regression line (or surface)
• Equivalent to ML for the fixed-regressors, fixed-partition model (Hennig, 2000)
• 4.4 Optimal K
• AIC, BIC, cross-validation
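The alternating least-squares scheme above can be sketched in a few lines for ordinary (non-functional) regressors: fit one regression per cluster, reallocate each point to the closest fitted line, iterate. All data below are synthetic and K = 2 is assumed known:

```python
import numpy as np

rng = np.random.default_rng(3)

# Two sub-populations with different regression lines.
n = 200
x = rng.uniform(-1, 1, n)
g = rng.integers(0, 2, n)
y = np.where(g == 0, 2 * x + 1, -2 * x - 1) + 0.05 * rng.standard_normal(n)

X = np.column_stack([np.ones(n), x])
labels = (y > np.median(y)).astype(int)     # rough initial partition
betas = [np.zeros(2), np.zeros(2)]

for _ in range(20):
    # Step 1: for the given partition, fit one OLS regression per cluster.
    for k in range(2):
        idx = labels == k
        if idx.sum() >= 2:                  # guard against a dying cluster
            betas[k] = np.linalg.lstsq(X[idx], y[idx], rcond=None)[0]
    # Step 2: reallocate each point to the closest regression line.
    labels = np.column_stack(
        [np.abs(y - X @ b) for b in betas]).argmin(axis=1)

# Within-cluster residual variance after convergence.
sse = sum(np.sum((y[labels == k] - X[labels == k] @ betas[k]) ** 2)
          for k in range(2))
print(sse / n)
```

Each of the two steps can only decrease the within-cluster sum of squares, which is why the algorithm converges (to a local optimum; a sensible initial partition helps).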

22
4.5 Clusterwise functional PLS regression
• OLS functional regression is not adequate to estimate the local models
• Our proposal: estimate local models with functional PLS regression
• Is the clusterwise algorithm still consistent?
• Proof in Preda & Saporta, 2005b

23
• Prediction
• Allocate a new observation to a cluster (nearest neighbour or another classification technique)
• Use the corresponding local model
• May be generalised when Y is itself a random vector

24
4.6 Application to stock market data
• Growth index during 1 hour (between 10:00 and 11:00) of 84 shares at the Paris Stock Exchange
• Goal: predict a new share between 10:55 and 11:00 using data between 10:00 and 10:55

25
• Exact computations need 1366 variables (the number of intervals on which the 85 curves are constant)
• Discretisation into 60 intervals
• Comparison between PCR and PLS

26
• Crash of share 85 not detected!

27
• Clusterwise PLS
• Four clusters, of sizes 17, 32, 10, 25
• Number of PLS components for each cluster: 1, 3, 2, 2 (chosen by cross-validation)

28
• Share 85 classified into cluster 1

29
5. Functional linear discrimination
• LDA: find linear combinations ∫ β(t) Xt dt
• maximizing the ratio
• between-group variance / within-group variance
• For 2 groups, Fisher's LDF is obtained via a regression between a suitably coded Y and Xt
• (Preda & Saporta, 2005a)

30
• PLS regression with q components gives an approximation of β(t) and of the score
• For more than 2 groups: PLS2 regression between the k − 1 indicators of Y and Xt
• The first PLS component is given by the first eigenvector of the product of Escoufier operators WX WY
• (Preda & Saporta, 2002; Barker & Rayens, 2003)

31
Quality measures
• For k = 2: ROC curve and AUC
• For a specific threshold s, x is classified into G1 if dT(x) > s
• Sensitivity, or true positive rate: P(dT(x) > s | Y = 1) = 1 − β
• 1 − specificity, or 1 − true negative rate: P(dT(x) > s | Y = 0) = α
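For a fixed threshold s these two rates are just conditional frequencies. A tiny sketch with hypothetical scores:

```python
import numpy as np

# Hypothetical scores d_T(x) and class labels for eight observations.
scores = np.array([0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1])
y = np.array([1, 1, 1, 1, 0, 0, 0, 0])

s = 0.5                                   # classification threshold
pred = (scores > s).astype(int)           # classify into G1 if d_T(x) > s

# Sensitivity = P(d_T > s | Y = 1); 1 - specificity = P(d_T > s | Y = 0).
sensitivity = np.mean(pred[y == 1])
one_minus_specificity = np.mean(pred[y == 0])
print(sensitivity, one_minus_specificity)   # -> 0.75 0.25
```

Sweeping s from the largest score down to the smallest traces the ROC curve point by point.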

32
ROC curve
• Perfect discrimination: the ROC curve coincides with the edges of the unit square
• For identical conditional distributions: the ROC curve coincides with the diagonal

33
• The ROC curve is invariant under any monotone increasing transformation of the score
• The area under the ROC curve (AUC) is a global measure of performance allowing (partial) model comparisons
• With X1 drawn from G1 and X2 from G2: AUC = P(X1 > X2)
• AUC estimated by the proportion of concordant pairs nc / (n1 n2)
• nc is linked to the Wilcoxon-Mann-Whitney statistic W:
• U = W − 0.5 n1(n1 + 1), AUC = U / (n1 n2)
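The equivalence between the concordant-pair estimate and the Wilcoxon-Mann-Whitney route can be checked numerically (synthetic scores, no ties):

```python
import numpy as np

rng = np.random.default_rng(4)

# Hypothetical scores: d1 for observations from G1, d0 from G2.
n1, n2 = 60, 40
d1 = rng.normal(1.0, 1.0, n1)
d0 = rng.normal(0.0, 1.0, n2)

# AUC as the proportion of concordant pairs (X1 from G1, X2 from G2).
auc_pairs = np.mean(d1[:, None] > d0[None, :])

# Same value via the rank-sum statistic W of the G1 scores:
#   U = W - 0.5 n1(n1 + 1),  AUC = U / (n1 n2)
ranks = np.argsort(np.argsort(np.concatenate([d1, d0]))) + 1
W = ranks[:n1].sum()                      # rank sum of the G1 scores
U = W - 0.5 * n1 * (n1 + 1)
auc_wmw = U / (n1 * n2)
print(auc_pairs, auc_wmw)
```

Without ties the two estimates agree exactly, since U counts precisely the concordant pairs.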

34
6. Anticipated prediction
• Find t < T such that the analysis on [0, t] gives predictions almost as good as with [0, T]
• Solution
• When increasing s from 0 to T, look for the first value such that AUC(s) does not differ significantly from AUC(T)

35
• A bootstrap procedure
• Stratified resampling of the data
• For each replication b, AUCb(s) and AUCb(T) are computed
• Student's t-test or Wilcoxon test on the B paired differences δb = AUCb(s) − AUCb(T)
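The bootstrap procedure above can be sketched as follows. The scores at the truncated time s and at the full horizon T are simulated here (a noisier version of the full score), and B and all numerical values are illustrative:

```python
import numpy as np

rng = np.random.default_rng(5)

def auc(d1, d0):
    # Concordant-pair estimate of the area under the ROC curve.
    return np.mean(d1[:, None] > d0[None, :])

# Hypothetical paired scores: d_s computed on [0, s], d_T on [0, T].
n1, n0, B = 50, 50, 200
d_T = np.concatenate([rng.normal(1, 1, n1), rng.normal(0, 1, n0)])
d_s = d_T + rng.normal(0, 0.3, n1 + n0)

diffs = np.empty(B)
for b in range(B):
    # Stratified resampling: draw with replacement within each class,
    # using the same indices for both scores (paired comparison).
    i1 = rng.integers(0, n1, n1)
    i0 = n1 + rng.integers(0, n0, n0)
    diffs[b] = auc(d_s[i1], d_s[i0]) - auc(d_T[i1], d_T[i0])

# Test on the B paired differences delta_b = AUC_b(s) - AUC_b(T):
# a Student t statistic (a Wilcoxon signed-rank test also fits).
t_stat = diffs.mean() / (diffs.std(ddof=1) / np.sqrt(B))
print(t_stat)
```

In the anticipated-prediction procedure this test is repeated for increasing s, stopping at the first s where the difference is no longer significant.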

36
Applications
• 6.1 Simulated data
• Two classes with equal priors
• W(t): a Brownian motion

37
38
• With B = 50

39
• 6.2 Kneading data, where quality is the response Y
• 115 observations: 50 good, 40 bad and 25 adjustable
• 241 equally spaced measurements
• Smoothing with cubic B-splines, 16 knots

40
• Repeat 100 times the split into learning and test samples of sizes (60, 30)
• Average error rate:
• 0.142 with principal components
• 0.112 with PLS components
• Average AUC: 0.746
• (figure: estimated coefficient function β(t))

41
• Anticipated prediction
• B = 50
• t* = 186
• The recording period of dough resistance can be reduced to less than half of the current one

42
7. Conclusions and perspectives
• PLS regression is an efficient and simple way to get linear predictions for functional data
• We have proposed a bootstrap procedure for the problem of anticipated prediction

43
• Work in progress:
• On-line forecasting: instead of using the same anticipated decision time t for all data, adapt t to each new trajectory given its incoming measurements
• Clusterwise discrimination
• Comparison with functional logistic regression (Aguilera et al., 2006)

44
References
• Aguilera A.M., Escabias M., Valderrama M.J. (2006): Using principal components for estimating logistic regression with high-dimensional multicollinear data, Computational Statistics & Data Analysis, 50, 1905-1924
• Barker M., Rayens W. (2003): Partial least squares for discrimination, Journal of Chemometrics, 17, 166-173
• Charles C. (1977): Régression typologique et reconnaissance des formes, Ph.D. thesis, Université Paris IX
• Costanzo D., Preda C., Saporta G. (2006): Anticipated prediction in discriminant analysis on functional data for binary response, in COMPSTAT 2006, pp. 821-828, Physica-Verlag
• Hennig C. (2000): Identifiability of models for clusterwise linear regression, Journal of Classification, 17, 273-296
• Lévéder C., Abraham C., Cornillon P.A., Matzner-Lober E., Molinari N. (2004): Discrimination de courbes de pétrissage, Chimiométrie 2004, pp. 37-43
• Preda C., Saporta G. (2005a): PLS regression on a stochastic process, Computational Statistics and Data Analysis, 48, 149-158
• Preda C., Saporta G. (2005b): Clusterwise PLS regression on a stochastic process, Computational Statistics and Data Analysis, 49, 99-108
• Preda C., Saporta G., Lévéder C. (2007): PLS classification of functional data, Computational Statistics
• Ramsay J.O., Silverman B.W. (1997): Functional Data Analysis, Springer