
Linear Techniques for Regression and Classification on Functional Data
Gilbert Saporta
Chaire de Statistique Appliquée, CEDRIC
Conservatoire National des Arts et Métiers
292 rue Saint Martin, F-75141 Paris Cedex 03

Joint work with D. Costanzo (U. Calabria) and C. Preda (U. Lille 2)
  • 1. Introduction 
  • 2. OLS regression on functional data
  • 3. PLS functional regression
  • 4. Clusterwise regression
  • 5. Discrimination
  • 6. Anticipated prediction
  • 7. Conclusion and perspectives

  • Very high-dimensional data: an infinite number of variables
  • Regression on functional data
  • Example 1: Y = amount of crop
  • Xt = temperature curves
  • p = ∞?

R.A. Fisher, The Influence of Rainfall on the Yield of Wheat at Rothamsted, Philosophical Transactions of the Royal Society, B, 213, 89-142
  • Example 2: Growth index of 84 shares at the Paris stock exchange during 60 minutes

How to predict Xt from t = 55 till t = 60, for a new share, knowing Xt from t = 0 till t = 55?
Discrimination on functional data
  • Example 3: Kneading curves for cookies (Danone)
  • After smoothing with cubic B-splines (Lévéder et al., 2004)

How to predict the quality of the cookies?
  • Linear combination: instead of a finite sum Σj ßj Xj,
  • integral regression (Fisher 1924): ∫0T ß(t) Xt dt
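The integral above is, in practice, approximated by a finite sum on a time grid. A minimal numerical sketch, with an assumed coefficient function and trajectory (both hypothetical, chosen so the exact integral is known):

```python
import numpy as np

# Minimal sketch (assumed toy functions): the integral predictor
# ∫0T ß(t) Xt dt is approximated by a finite Riemann sum on a time grid.
T = 1.0
grid = np.linspace(0.0, T, 101)            # discretisation of [0, T]
dt = grid[1] - grid[0]

beta = np.sin(np.pi * grid)                # an assumed coefficient function
X = 1.0 + np.sin(np.pi * grid)             # one assumed trajectory X_t

y_hat = np.sum(beta * X) * dt              # finite sum replacing the integral
# exact value of the integral: 2/pi + 1/2
```

The grid sum converges to the integral as the grid is refined, which is the sense in which the finite-sum regression approximates the integral regression.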

  • Discrimination on functional data
  • Particular case of regression when the response is binary
  • Anticipation
  • Determine an optimal time t < T giving a prediction based on [0, t] almost as good as the prediction using all the data on [0, T]

2. OLS regression on functional data
  • Y and Xt with zero mean
  • 2.1 The OLS problem
  • Minimizing E[(Y - ∫0T ß(t) Xt dt)²]
  • leads to the normal, or Wiener-Hopf, equation: ∫0T ß(s) C(t, s) ds = E(Y Xt)
  • where C(t, s) = cov(Xt, Xs) = E(Xt Xs)

  • 2.2 Karhunen-Loève decomposition (functional PCA): Xt = Σi fi(t) ξi
  • fi(t): factor loadings
  • ξi: principal components, with var(ξi) = λi
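On discretised curves the empirical Karhunen-Loève decomposition reduces to an SVD. A sketch on assumed synthetic data (a toy mixture of two smooth modes plus noise; all names illustrative):

```python
import numpy as np

# Empirical Karhunen-Loeve decomposition (functional PCA) of discretised
# curves via the SVD, on assumed toy data built from two smooth modes.
rng = np.random.default_rng(1)
n, p = 50, 200
t = np.linspace(0.0, 1.0, p)
X = (rng.normal(size=(n, 1)) * np.sin(np.pi * t)
     + rng.normal(size=(n, 1)) * np.cos(np.pi * t)
     + 0.01 * rng.normal(size=(n, p)))
Xc = X - X.mean(axis=0)                      # centre the curves

U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
loadings = Vt                                # rows discretise the f_i(t)
scores = U * S                               # principal components xi_i
explained = S**2 / np.sum(S**2)              # variance share per component
```

`scores @ loadings` recovers Xc exactly, the discrete analogue of Xt = Σi fi(t) ξi; here the first two components carry almost all the variance.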

  • Picard's theorem: ß is unique if and only if Σi [E(Y ξi)]² / λi² < ∞
  • Generally not true, especially when n is finite, since p > n: perfect fit when minimizing the empirical criterion
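The "perfect fit" pathology is easy to exhibit numerically. A sketch on assumed random data: with n curves observed on p > n time points, empirical least squares interpolates the responses exactly even when X carries no information about Y.

```python
import numpy as np

# Sketch of the perfect-fit pathology on assumed random data: p > n makes
# the empirical least-squares criterion reach zero for unrelated X and y.
rng = np.random.default_rng(0)
n, p = 20, 100                       # n units, p > n discretisation points
X = rng.normal(size=(n, p))          # discretised trajectories (toy data)
y = rng.normal(size=n)               # responses unrelated to X

beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)  # minimum-norm solution
residual = np.linalg.norm(X @ beta_hat - y)       # zero up to rounding
```

The fitted residual is numerically zero, so the minimiser is not a meaningful estimate of ß: this is why constrained or reduced-rank solutions are needed.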

  • Even if ß is unique, the Wiener-Hopf equation is not an ordinary integral equation: the solution is more frequently a distribution than a function
  • Constrained solutions are needed (cf. Green & Silverman 1994, Ramsay & Silverman 1997)

  • 2.3 Regression on principal components
  • Rank q approximation: Ŷ = Σi=1..q [E(Y ξi) / λi] ξi

  • Numerical computations
  • Solve integral equations in the general case
  • For step functions (finite number of variables and of units), operators are matrices, but of very high size
  • Approximations by discretisation of time

  • Which principal components?
  • The first q?
  • The q best correlated with Y?
  • Principal components are computed irrespective of the response
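The difference between the two choices can be made concrete. A sketch on assumed synthetic data, constructed so that the response depends only on the 10th principal component: keeping the first q components then fails, while keeping the q components best correlated with Y succeeds (all names and the toy construction are illustrative).

```python
import numpy as np

# Toy comparison of component-selection rules for principal component
# regression; the response is driven by the 10th component on purpose.
rng = np.random.default_rng(2)
n, p, q = 80, 120, 5
X = rng.normal(size=(n, p)).cumsum(axis=1)       # rough random-walk "curves"
Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
scores = U * S                                   # principal components

y = scores[:, 9]                                 # depends on component 10 only

def r2(Z, y):
    """R^2 of the no-intercept OLS regression of y on the columns of Z."""
    coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
    resid = y - Z @ coef
    return 1.0 - (resid @ resid) / (y @ y)

first_q = scores[:, :q]                          # rule 1: first q components
corr = np.nan_to_num(np.abs(
    [np.corrcoef(scores[:, j], y)[0, 1] for j in range(scores.shape[1])]))
best_q = scores[:, np.argsort(corr)[::-1][:q]]   # rule 2: q best correlated

r2_first, r2_best = r2(first_q, y), r2(best_q, y)
```

Because the components are computed irrespective of the response, a large-variance component need not be a predictive one.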

3. Functional PLS regression
  • Use PLS components instead of principal components
  • First PLS component t1 = ∫0T w(t) Xt dt, with w maximizing cov²(Y, t1) under ||w|| = 1
  • Further PLS components obtained as usual, by iterating on the residuals

  • Order q approximation of Y by Xt: Ŷq = c1 t1 + ... + cq tq
  • Convergence theorem: Ŷq converges to the OLS approximation as q → ∞
  • q has to be finite in order to get a formula!
  • Usually q is selected by cross-validation
  • (Preda & Saporta, 2005a)
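A compact NIPALS-style PLS1 sketch on discretised curves, with assumed synthetic data (random combinations of four sine modes and a hypothetical ß(t)); in practice q would be chosen by cross-validation, here it is fixed at 2 for brevity.

```python
import numpy as np

def pls1(X, y, q):
    """NIPALS-style PLS1 on discretised curves: return the q PLS components."""
    Xk, yk = X - X.mean(axis=0), y - y.mean()
    comps = []
    for _ in range(q):
        w = Xk.T @ yk                               # weight maximising cov(y, Xw)
        w /= np.linalg.norm(w)
        t = Xk @ w                                  # PLS component (score)
        Xk = Xk - np.outer(t, Xk.T @ t) / (t @ t)   # deflate X
        yk = yk - t * (t @ yk) / (t @ t)            # deflate y
        comps.append(t)
    return np.column_stack(comps)

# assumed toy curves: random combinations of four sine modes
rng = np.random.default_rng(3)
n, p = 60, 150
grid = np.linspace(0.0, 1.0, p)
basis = np.array([np.sin(k * np.pi * grid) for k in range(1, 5)])
X = rng.normal(size=(n, 4)) @ basis
beta = np.sin(2.0 * np.pi * grid)                   # assumed coefficient ß(t)
y = X @ beta / p + 0.05 * rng.normal(size=n)        # noisy integral regression

comps = pls1(X, y, q=2)
yc = y - y.mean()
coef, *_ = np.linalg.lstsq(comps, yc, rcond=None)
r2_pls = 1.0 - np.sum((yc - comps @ coef) ** 2) / np.sum(yc ** 2)
```

Because the components are built from the covariance with Y, a couple of them already capture most of the predictive signal here.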

  • First PLS component easily interpretable: coefficients with the same sign as r(y, xt)
  • No integral equation to solve
  • PLS fits better than PCR: R²(PLS, q) ≥ R²(PCR, q)
  • Same proof as in De Jong, 1993

4. Clusterwise regression
  • 4.1 Model
  • G: latent variable with K categories (sub-populations)
  • A local linear model within each cluster: E(Y | Xt, G = k) = ∫0T ßk(t) Xt dt

  • 4.2 OLS and clusterwise regression
  • Residual variance of the global regression = within-cluster residual variance + variance due to the difference between the local (clusterwise) and global (OLS) regressions

  • 4.3 Estimation (Charles, 1977)
  • The number of clusters K needs to be known
  • Alternated least squares:
  • For a given partition, estimate a linear regression in each cluster
  • Reallocate each point to the closest regression line (or surface)
  • Equivalent to ML for the fixed-regressors, fixed-partition model (Hennig, 2000)
  • 4.4 Optimal K
  • AIC, BIC, cross-validation
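The alternated least squares scheme can be sketched in a few lines. Assumed toy data: two sub-populations with opposite slopes, K known, local models through the origin (all names and the initialisation are illustrative).

```python
import numpy as np

# Sketch of alternated least squares for clusterwise regression on assumed
# toy data: two sub-populations with slopes +2 and -2, K = 2 known.
rng = np.random.default_rng(4)
n, K = 200, 2
x = rng.uniform(-1.0, 1.0, size=n)
g_true = rng.integers(0, K, size=n)
y = np.array([2.0, -2.0])[g_true] * x + 0.05 * rng.normal(size=n)

slopes = np.array([0.0, 1.0])          # arbitrary distinct starting fits
for _ in range(20):
    # 1) reallocate each point to the closest regression line
    g = np.abs(y[:, None] - np.outer(x, slopes)).argmin(axis=1)
    # 2) re-estimate the local regression within each cluster
    for k in range(K):
        m = g == k
        if m.any():
            slopes[k] = (x[m] @ y[m]) / (x[m] @ x[m])

within_mse = np.mean((y - slopes[g] * x) ** 2)
```

Each of the two steps can only decrease the within-cluster residual sum of squares, so the alternation converges (to a local optimum, which is why several restarts are used in practice).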

4.5 Clusterwise functional PLS regression
  • OLS functional regression is not adequate to give estimations in each cluster
  • Our proposal: estimate local models with functional PLS regression
  • Is the clusterwise algorithm still consistent?
  • Proof in Preda & Saporta, 2005b

  • Prediction
  • Allocate a new observation to a cluster (nearest neighbour or other classification technique)
  • Use the corresponding local model
  • May be generalised if Y is itself a random function

4.6 Application to stock market data
  • Growth index during 1 hour (between 10:00 and 11:00) of 84 shares at the Paris Stock Exchange
  • Goal: predict a new share between 10:55 and 11:00 using its data between 10:00 and 10:55

  • Exact computations need 1366 variables (the number of intervals where the 85 curves are constant)
  • Discretisation in 60 intervals
  • Comparison between PCR and PLS

  • Crash of share 85 not detected!

  • Clusterwise PLS
  • Four clusters (sizes 17, 32, 10, 25)
  • Number of PLS components for each cluster: 1, 3, 2, 2 (cross-validation)

  • Share 85 classified into cluster 1

5. Functional linear discrimination
  • LDA: linear combinations ∫0T w(t) Xt dt
  • maximizing the ratio
  • between-group variance / within-group variance
  • For 2 groups: Fisher's LDF obtained via a regression between a coded Y and Xt
  • e.g. Y coded as √(n2/n1) for group 1 and -√(n1/n2) for group 2
  • (Preda & Saporta, 2005a)
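The regression route to Fisher's LDF can be checked numerically. A sketch on assumed Gaussian toy data with p = 3 variables standing in for a discretised curve, using the two-group coding √(n2/n1) and -√(n1/n2):

```python
import numpy as np

# Sketch: OLS on a coded binary response recovers Fisher's discriminant
# direction, on assumed Gaussian toy data (p = 3 stands in for a curve).
rng = np.random.default_rng(5)
n1, n2, p = 100, 100, 3
G1 = rng.normal(loc=0.0, size=(n1, p))
G2 = rng.normal(loc=1.5, size=(n2, p))
X = np.vstack([G1, G2])
y = np.concatenate([np.full(n1, np.sqrt(n2 / n1)),     # coded response
                    np.full(n2, -np.sqrt(n1 / n2))])

Xc = X - X.mean(axis=0)
beta, *_ = np.linalg.lstsq(Xc, y, rcond=None)          # regression on coded Y

# Fisher's direction W^{-1}(mean1 - mean2), W = pooled within covariance
W = ((n1 - 1) * np.cov(G1.T) + (n2 - 1) * np.cov(G2.T)) / (n1 + n2 - 2)
fisher = np.linalg.solve(W, G1.mean(axis=0) - G2.mean(axis=0))
cosine = beta @ fisher / (np.linalg.norm(beta) * np.linalg.norm(fisher))
```

The two directions are exactly proportional (a classical identity via the Sherman-Morrison formula), so the cosine between them is 1 up to rounding.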

  • PLS regression with q components gives an approximation of ß(t) and of the score d(X) = ∫0T ß(t) Xt dt
  • For more than 2 groups: PLS2 regression between the k-1 indicators of Y and Xt
  • First PLS component given by the first eigenvector of the product of the Escoufier operators
  • Preda & Saporta, 2002 and Barker & Rayens, 2003

Quality measures
  • For k = 2: ROC curve and AUC
  • For a specific threshold s, x is classified into G1 if dT(x) > s
  • Sensitivity, or true positive rate
  • 1 - specificity, or 1 - true negative rate

ROC curve
  • Perfect discrimination: the ROC curve is confounded with the edges of the unit square
  • For identical conditional distributions, the ROC curve is confounded with the diagonal

  • ROC curve invariant under any increasing monotonous transformation of the score
  • Area under the ROC curve: a global measure of performance allowing model comparisons
  • AUC = P(X1 > X2), with X1 drawn from G1 and X2 from G2
  • AUC estimated by the proportion of concordant pairs nc
  • Wilcoxon-Mann-Whitney statistic: U = W - 0.5 n1(n1 + 1), AUC = U / (n1 n2)
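The pairwise and rank-based estimates coincide exactly. A sketch on assumed Gaussian scores for the two groups:

```python
import numpy as np

# Sketch on assumed Gaussian scores: the pairwise estimate of
# AUC = P(X1 > X2) equals the rank-based Wilcoxon-Mann-Whitney form.
rng = np.random.default_rng(6)
n1, n2 = 80, 120
s1 = rng.normal(loc=1.0, size=n1)      # scores of group G1
s2 = rng.normal(loc=0.0, size=n2)      # scores of group G2

auc_pairs = np.mean(s1[:, None] > s2[None, :])   # proportion of concordant pairs

pooled = np.concatenate([s1, s2])
ranks = pooled.argsort().argsort() + 1.0         # ranks 1..n1+n2 (no ties here)
W = ranks[:n1].sum()                             # rank sum of G1
U = W - 0.5 * n1 * (n1 + 1)
auc_rank = U / (n1 * n2)
```

With continuous scores (no ties) the two estimators agree to machine precision, which is why the AUC can be read off the Wilcoxon-Mann-Whitney statistic.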

6. Anticipated prediction
  • Find t < T such that the analysis on [0, t] gives predictions almost as good as with [0, T]
  • Solution:
  • When increasing s from 0 to T, look for the first value such that AUC(s) does not differ significantly from AUC(T)

  • A bootstrap procedure
  • Stratified resampling of the data
  • For each replication b, AUCb(s) and AUCb(T) are computed
  • Student's t test or Wilcoxon test on the B paired differences δb = AUCb(s) - AUCb(T)
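A sketch of the bootstrap comparison, with assumed synthetic scores standing in for the discriminant scores built on [0, s] and on [0, T] (all names and the noise model are illustrative):

```python
import numpy as np

def auc(score, label):
    """AUC as the proportion of concordant (G1, G0) score pairs."""
    s1, s0 = score[label == 1], score[label == 0]
    return np.mean(s1[:, None] > s0[None, :])

# Stratified bootstrap: resample within each class, compare the two AUCs
rng = np.random.default_rng(7)
n, B = 150, 200
label = (rng.random(n) < 0.5).astype(int)
score_T = label + rng.normal(size=n)             # score using all of [0, T]
score_s = score_T + 0.3 * rng.normal(size=n)     # noisier score using [0, s]

idx0, idx1 = np.where(label == 0)[0], np.where(label == 1)[0]
deltas = np.empty(B)
for b in range(B):
    boot = np.concatenate([rng.choice(idx0, size=idx0.size),
                           rng.choice(idx1, size=idx1.size)])
    deltas[b] = auc(score_s[boot], label[boot]) - auc(score_T[boot], label[boot])

t_stat = deltas.mean() / (deltas.std(ddof=1) / np.sqrt(B))  # paired t statistic
```

Resampling within each class keeps the group sizes (and hence the AUC estimator's behaviour) stable across replications; the paired differences are then fed to a t or Wilcoxon test.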

  • 6.1 Simulated data
  • Two classes with equal priors
  • W(t): Brownian motion

  • With B = 50

  • 6.2 Kneading curves
  • After T = 480 s of kneading, one gets cookies whose quality is Y
  • 115 observations: 50 good, 40 bad and 25 adjustable
  • 241 equally spaced measurements
  • Smoothing with cubic B-splines, 16 knots
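The smoothing step can be sketched with SciPy's cubic smoothing splines (assuming SciPy is available; the dough-resistance shape below is hypothetical, only the sampling design of 241 points on [0, 480] s comes from the slides):

```python
import numpy as np
from scipy.interpolate import splev, splrep

# Sketch of cubic B-spline smoothing of a noisy kneading-type curve.
rng = np.random.default_rng(8)
t = np.linspace(0.0, 480.0, 241)                          # 241 measurements
signal = 500.0 + 30.0 * np.sin(2.0 * np.pi * t / 480.0)   # assumed smooth trend
noisy = signal + 5.0 * rng.normal(size=t.size)

# cubic (k=3) smoothing spline; s trades fidelity against smoothness and
# thereby controls how many knots are actually used
tck = splrep(t, noisy, k=3, s=t.size * 25.0)
smooth = splev(t, tck)
rmse = np.sqrt(np.mean((smooth - signal) ** 2))
```

The slides fix 16 knots directly; `splrep` instead chooses the knots from the smoothing factor s, a different but related control of the same trade-off.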

  • Performance for Y ∈ {good, bad}
  • Repeat 100 times the split into learning and test samples of sizes (60, 30)
  • Average error rate:
  • 0.142 with principal components
  • 0.112 with PLS components
  • Average AUC: 0.746
  • Estimated coefficient function ß(t) (figure)

  • Anticipated prediction
  • B = 50
  • t* = 186 s
  • The recording period of the dough resistance can be reduced to less than half of the current one

7. Conclusions and perspectives
  • PLS regression is an efficient and simple way to
    get linear prediction for functional data
  • We have proposed a bootstrap procedure for the
    problem of anticipated prediction

  • Work in progress:
  • On-line forecasting: instead of using the same anticipated decision time t for all data, we could adapt t to each new trajectory given its incoming measurements
  • Clusterwise discrimination
  • Comparison with functional logistic regression (Aguilera et al., 2006)

  • Aguilera A.M., Escabias M., Valderrama M.J. (2006) Using principal components for estimating logistic regression with high-dimensional multicollinear data, Computational Statistics & Data Analysis, 50, 1905-1924
  • Barker M., Rayens W. (2003) Partial least squares for discrimination, Journal of Chemometrics, 17, 166-173
  • Charles C. (1977) Régression typologique et reconnaissance des formes, Ph.D. thesis, Université Paris IX
  • Costanzo D., Preda C., Saporta G. (2006) Anticipated prediction in discriminant analysis on functional data for binary response, in COMPSTAT 2006, 821-828, Physica-Verlag
  • Hennig C. (2000) Identifiability of models for clusterwise linear regression, Journal of Classification, 17, 273-296
  • Lévéder C., Abraham C., Cornillon P.A., Matzner-Lober E., Molinari N. (2004) Discrimination de courbes de pétrissage, Chimiométrie 2004, 37-43
  • Preda C., Saporta G. (2005a) PLS regression on a stochastic process, Computational Statistics and Data Analysis, 48, 149-158
  • Preda C., Saporta G. (2005b) Clusterwise PLS regression on a stochastic process, Computational Statistics and Data Analysis, 49
  • Preda C., Saporta G., Lévéder C. (2007) PLS classification of functional data, Computational Statistics
  • Ramsay J.O., Silverman B.W. (1997) Functional Data Analysis, Springer