Linear Techniques for Regression and Classification on Functional Data


1
Linear Techniques for Regression and
Classification on Functional Data
Gilbert Saporta, Chaire de Statistique Appliquée
CEDRIC, Conservatoire National des Arts et Métiers
292 rue Saint Martin, F-75141 Paris Cedex 03
saporta@cnam.fr, http://cedric.cnam.fr/saporta

Joint work with D. Costanzo (U. Calabria) and C. Preda (U. Lille 2)
2
Outline
  • 1. Introduction 
  • 2. OLS regression on functional data
  • 3. PLS functional regression
  • 4. Clusterwise regression
  • 5. Discrimination
  • 6. Anticipated prediction
  • 7. Conclusion and perspectives

3
1. Introduction
  • Very high dimensional data: an infinite number of variables
  • Regression on functional data
  • Example 1: Y = amount of crop, X_t = temperature curves; p = ∞?

R.A. Fisher, "The Influence of Rainfall on the Yield of Wheat at Rothamsted", Philosophical Transactions of the Royal Society, B 213, 89-142 (1924)
4
  • Example 2: Growth index of 84 shares at the Paris stock exchange during 60 minutes

How to predict X_t from t = 55 to t = 60, for a new share, knowing X_t from t = 0 to t = 55?
5
Discrimination on functional data
  • Example 3: Kneading curves for cookies (Danone Vitapole)

6
  • After smoothing with cubic B-splines (Lévéder et al., 2004)

How to predict the quality of the cookies?
7
  • Linear combination of the values of the curve
  • Integral regression (Fisher 1924): Ŷ = ∫_0^T β(t) X_t dt
  • instead of a finite sum Ŷ = Σ_{j=1..p} β_j X_j

8
  • Discrimination on functional data
  • Particular case of regression when the response
    is binary
  • Anticipation
  • Determine an optimal time t < T giving a prediction based on [0, t] almost as good as the prediction using all the data on [0, T]

9
2. OLS regression on functional data
  • Y, X_t, t ∈ [0, T] (with zero mean)
  • 2.1 The OLS problem
  • Minimizing E[(Y − ∫_0^T β(t) X_t dt)²]
  • leads to the normal, or Wiener-Hopf, equation: ∫_0^T β(s) C(t, s) ds = E(X_t Y)
  • where C(t, s) = cov(X_t, X_s) = E(X_t X_s)

10
  • 2.2 Karhunen-Loève decomposition (functional PCA): X_t = Σ_{i≥1} f_i(t) ξ_i
  • factor loadings f_i: orthonormal eigenfunctions of C(t, s), with eigenvalues λ_i
  • principal components ξ_i = ∫_0^T f_i(t) X_t dt, with var(ξ_i) = λ_i

11
  • Picard's theorem: β is unique if and only if Σ_{i≥1} [E(ξ_i Y)]² / λ_i² < ∞
  • Generally not true, especially when n is finite, since p > n: one gets a perfect (but meaningless) fit when minimizing the empirical least-squares criterion

12
  • Even if β is unique, the Wiener-Hopf equation is not an ordinary integral equation: the solution is more often a distribution than a function
  • Constrained solutions are needed (cf. Green & Silverman 1994, Ramsay & Silverman 1997)

13
  • 2.3 Regression on principal components
  • Rank q approximation: Ŷ_q = Σ_{i=1..q} [E(ξ_i Y) / λ_i] ξ_i (a numerical sketch follows below)
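
A minimal numerical sketch of this rank-q approximation on discretised curves, assuming each curve is sampled on a regular grid; the names X, y, q are illustrative, not from the slides (the grid spacing dt is dropped, which only rescales β):

    import numpy as np

    def functional_pcr(X, y, q):
        """Regression on principal components for discretised curves.
        X: (n, p) matrix, row i = curve x_i on a regular time grid;
        y: (n,) response; q: number of components kept."""
        Xc = X - X.mean(axis=0)              # centre the curves
        yc = y - y.mean()
        # Karhunen-Loeve via SVD: rows of Vt are discretised loadings f_i
        _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
        xi = Xc @ Vt[:q].T                   # first q principal components
        lam = s[:q] ** 2 / len(y)            # their variances lambda_i
        coef = xi.T @ yc / len(y) / lam      # E(xi_i Y) / lambda_i
        beta = Vt[:q].T @ coef               # discretised beta(t)
        return beta, Xc @ beta + y.mean()    # coefficients, fitted values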

14
  • Numerical computations
  • Solve integral equations in the general case
  • For step functions: a finite number of variables and of units; the operators are matrices, but of very high dimension
  • Approximations by discretisation of time

15
  • Which principal components?
  • First q?
  • q best correlated with Y?
  • Principal components are computed irrespective of
    the response

16
3. Functional PLS regression
  • Use PLS components instead of principal components
  • First PLS component: t_1 = ∫_0^T w_1(t) X_t dt, with w_1 maximizing cov²(Y, t_1) under ||w_1|| = 1
  • Further PLS components obtained as usual, by iterating on the residuals

17
  • Order q approximation of Y by X_t: Ŷ_PLS(q) = ∫_0^T β_PLS(q)(t) X_t dt
  • Convergence theorem: Ŷ_PLS(q) converges to the OLS approximation Ŷ in quadratic mean as q → ∞
  • q has to be finite in order to get a formula!
  • Usually q is selected by cross-validation
  • (Preda & Saporta, 2005a)

18
  • First PLS component easily interpretable: its coefficients have the same sign as r(Y, X_t)
  • No integral equation to solve
  • PLS fits better than PCR: R²(Y, Ŷ_PLS(q)) ≥ R²(Y, Ŷ_PCR(q))
  • Same proof as in De Jong, 1993 (a sketch on discretised curves follows below)
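
As with PCR, functional PLS can be approximated by running ordinary PLS on the discretised curves. A minimal sketch using scikit-learn's PLSRegression; the names X, y, q are illustrative:

    import numpy as np
    from sklearn.cross_decomposition import PLSRegression

    def functional_pls(X, y, q):
        """PLS regression on discretised curves.
        X: (n, p) curves on a regular grid; y: (n,) response;
        q: number of PLS components (cross-validated in practice)."""
        pls = PLSRegression(n_components=q)
        pls.fit(X, y.reshape(-1, 1))
        beta = np.ravel(pls.coef_)            # discretised beta_PLS(q)(t)
        return beta, np.ravel(pls.predict(X))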

19
4. Clusterwise regression
  • 4.1 Model
  • G: a latent variable with K categories (sub-populations); within each cluster k, a local linear model E(Y | X, G = k) = α_k + ∫_0^T β_k(t) X_t dt

20
  • 4.2 OLS and clusterwise regression
  • Residual variance of global regression within
    cluster residual variance variance due to the
    difference between local (clusterwise) and global
    regression (OLS)

21
  • 4.3 Estimation (Charles, 1977)
  • The number of clusters K needs to be known
  • Alternating least squares (a sketch follows below):
  • For a given partition, estimate a linear regression in each cluster
  • Reallocate each point to the closest regression line (or surface)
  • Equivalent to ML for the fixed-regressors, fixed-partition model (Hennig, 2000)
  • 4.4 Optimal K
  • AIC, BIC, cross-validation
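
A minimal sketch of the alternating procedure on a discretised design matrix; all names are illustrative, plain OLS is used for the local models (the slides use functional PLS in each cluster), and clusters are assumed to stay non-empty:

    import numpy as np

    def clusterwise_ols(X, y, K, n_iter=100, seed=0):
        """Alternating least squares for clusterwise regression:
        fit one OLS model per cluster, then reassign each point
        to the model with the smallest squared residual."""
        rng = np.random.default_rng(seed)
        n = X.shape[0]
        Xd = np.column_stack([np.ones(n), X])   # add an intercept column
        labels = rng.integers(K, size=n)        # random initial partition
        for _ in range(n_iter):
            # 1. estimate a linear regression in each cluster
            coefs = np.vstack([
                np.linalg.lstsq(Xd[labels == k], y[labels == k], rcond=None)[0]
                for k in range(K)])
            # 2. reallocate each point to the closest regression surface
            resid = (y[:, None] - Xd @ coefs.T) ** 2   # (n, K) sq. residuals
            new_labels = resid.argmin(axis=1)
            if np.array_equal(new_labels, labels):
                break                            # partition is stable
            labels = new_labels
        return labels, coefs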

22
4.5 Clusterwise functional PLS regression
  • OLS functional regression is not adequate to estimate the local models in each cluster
  • Our proposal: estimate the local models with functional PLS regression
  • Is the clusterwise algorithm still consistent?
  • Proof in Preda & Saporta, 2005b

23
  • Prediction
  • Allocate a new observation to a cluster (nearest
    neighbor or other classification technique)
  • Use the corresponding local model
  • May be generalised if Y is itself a random
    vector

24
4.6 Application to stock market data
  • Growth index during 1 hour (between 10:00 and 11:00) of 84 shares at the Paris Stock Exchange
  • Goal: predict a new share between 10:55 and 11:00 using its data between 10:00 and 10:55

25
  • Exact computations would need 1366 variables (the number of intervals on which the 85 curves are all constant)
  • Discretisation into 60 intervals
  • Comparison between PCR and PLS

26
  • Crash of share 85 not detected!

27
  • Clusterwise PLS
  • Four clusters, of sizes 17, 32, 10 and 25
  • Number of PLS components for each cluster: 1, 3, 2, 2 (chosen by cross-validation)

28
  • Share 85 classified into cluster 1

29
5. Functional linear discrimination
  • LDA: find linear combinations ∫_0^T β(t) X_t dt
  • maximizing the ratio between-group variance / within-group variance
  • For 2 groups, Fisher's LDF is obtained via a regression between a coded Y and X_t,
  • e.g. y = √(p_0/p_1) in group 1 and y = −√(p_1/p_0) in group 0
  • (Preda & Saporta, 2005a)

30
  • PLS regression with q components gives an approximation of β(t) and of the score (a sketch follows below)
  • For more than 2 groups: PLS2 regression between the k − 1 indicators of Y and X_t
  • First PLS component given by the first eigenvector of the product of the Escoufier operators W_X W_Y
  • Preda & Saporta, 2002 and Barker & Rayens, 2003
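
A minimal sketch of the two-group case on discretised curves, using the coding of the previous slide and scikit-learn's PLSRegression as a stand-in for functional PLS; all names are illustrative:

    import numpy as np
    from sklearn.cross_decomposition import PLSRegression

    def pls_discriminant(X, y01, q):
        """Fisher's LDF via PLS regression on a coded response.
        X: (n, p) discretised curves; y01: (n,) labels in {0, 1}."""
        p1 = y01.mean()
        p0 = 1.0 - p1
        # code Y as sqrt(p0/p1) in group 1, -sqrt(p1/p0) in group 0
        y = np.where(y01 == 1, np.sqrt(p0 / p1), -np.sqrt(p1 / p0))
        pls = PLSRegression(n_components=q).fit(X, y.reshape(-1, 1))
        scores = np.ravel(pls.predict(X))    # discriminant scores d_T(x)
        return pls, scores                   # classify by thresholding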

31
Quality measures
  • For k = 2: ROC curve and AUC
  • For a specific threshold s, x is classified into G_1 if d_T(x) > s
  • Sensitivity, or true positive rate: P(d_T(x) > s | Y = 1) = 1 − β
  • 1 − specificity, or 1 − true negative rate: P(d_T(x) > s | Y = 0) = α

32
ROC curve
  • Perfect discrimination: the ROC curve coincides with the edges of the unit square
  • For identical conditional distributions: the ROC curve coincides with the diagonal

33
  • The ROC curve is invariant under any increasing monotone transformation of the score
  • The area under the ROC curve (AUC) is a global measure of performance allowing (partial) model comparisons
  • For X_1 drawn from G_1 and X_2 from G_2: AUC = P(X_1 > X_2)
  • AUC estimated by the proportion n_c of concordant pairs (a sketch follows below)
  • Wilcoxon-Mann-Whitney statistic W: U = W − 0.5 n_1(n_1 + 1), AUC = U / (n_1 n_2)
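
A minimal sketch of this estimate computed directly from the concordant pairs; the names are illustrative:

    import numpy as np

    def auc_concordant_pairs(scores1, scores0):
        """AUC as the proportion of pairs where a G1 score exceeds
        a G0 score; ties count one half (Mann-Whitney estimate)."""
        s1 = np.asarray(scores1)[:, None]   # n1 scores from group G1
        s0 = np.asarray(scores0)[None, :]   # n0 scores from group G0
        n_c = (s1 > s0).sum() + 0.5 * (s1 == s0).sum()
        return n_c / (s1.size * s0.size)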

34
6. Anticipated prediction
  • Find t* < T such that the analysis on [0, t*] gives predictions almost as good as those based on [0, T]
  • Solution:
  • Increasing s from 0 to T, look for the first value such that AUC(s) does not differ significantly from AUC(T)

35
  • A bootstrap procedure (sketched below):
  • Stratified resampling of the data
  • For each replication b, AUC_b(s) and AUC_b(T) are computed
  • Student's t test or Wilcoxon test on the B paired differences δ_b = AUC_b(s) − AUC_b(T)
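
A minimal sketch of this procedure, assuming a user-supplied auc(X, y, s) helper that refits the classifier on the data restricted to [0, s] and returns its AUC; every name here is illustrative:

    import numpy as np
    from scipy.stats import ttest_rel

    def anticipated_time(X, y, auc, times, T, B=50, alpha=0.05, seed=0):
        """Smallest s whose AUC does not differ significantly
        from AUC(T), via a stratified bootstrap with B replications."""
        rng = np.random.default_rng(seed)
        idx0, idx1 = np.where(y == 0)[0], np.where(y == 1)[0]
        # stratified resampling: draw within each class, with replacement
        reps = [np.concatenate([rng.choice(idx0, idx0.size),
                                rng.choice(idx1, idx1.size)])
                for _ in range(B)]
        auc_T = np.array([auc(X[r], y[r], T) for r in reps])
        for s in times:                      # candidate times, increasing
            auc_s = np.array([auc(X[r], y[r], s) for r in reps])
            # paired t test on the B differences AUC_b(s) - AUC_b(T)
            if ttest_rel(auc_s, auc_T).pvalue > alpha:
                return s                     # first s "as good as" T
        return T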

36
Applications
  • Simulated data
  • Two classes with equal priors
  • W(t): Brownian motion

38
  • With B = 50

39
  • Kneading curves
  • After T = 480 s of kneading, one gets cookies whose quality is Y
  • 115 observations: 50 good, 40 bad and 25 adjustable
  • 241 equally spaced measurements
  • Smoothing with cubic B-splines, 16 knots

40
  • Performance for Y ∈ {good, bad}
  • Repeat 100 times the split into learning and test samples of sizes (60, 30)
  • Average error rate:
  • 0.142 with principal components
  • 0.112 with PLS components
  • Average AUC: 0.746
  • (Figure: estimated β(t))

41
  • Anticipated prediction
  • B = 50
  • t* = 186 s
  • The recording period of the dough resistance can be reduced to less than half of the current one

42
7. Conclusions and perspectives
  • PLS regression is an efficient and simple way to get linear predictions for functional data
  • We have proposed a bootstrap procedure for the problem of anticipated prediction

43
  • Work in progress:
  • On-line forecasting: instead of using the same anticipated decision time t* for all data, we could adapt t* to each new trajectory given its incoming measurements
  • Clusterwise discrimination
  • Comparison with functional logistic regression (Aguilera et al., 2006)

44
References
  • Aguilera A.M., Escabias M., Valderrama M.J. (2006) Using principal components for estimating logistic regression with high-dimensional multicollinear data, Computational Statistics & Data Analysis, 50, 1905-1924
  • Barker M., Rayens W. (2003) Partial least squares for discrimination, Journal of Chemometrics, 17, 166-173
  • Charles C. (1977) Régression typologique et reconnaissance des formes, Ph.D. thesis, Université Paris IX
  • Costanzo D., Preda C., Saporta G. (2006) Anticipated prediction in discriminant analysis on functional data for binary response, in COMPSTAT 2006, 821-828, Physica-Verlag
  • Hennig C. (2000) Identifiability of models for clusterwise linear regression, Journal of Classification, 17, 273-296
  • Lévéder C., Abraham C., Cornillon P.A., Matzner-Lober E., Molinari N. (2004) Discrimination de courbes de pétrissage, Chimiométrie 2004, 37-43
  • Preda C., Saporta G. (2005a) PLS regression on a stochastic process, Computational Statistics and Data Analysis, 48, 149-158
  • Preda C., Saporta G. (2005b) Clusterwise PLS regression on a stochastic process, Computational Statistics and Data Analysis, 49, 99-108
  • Preda C., Saporta G., Lévéder C. (2007) PLS classification of functional data, Computational Statistics
  • Ramsay J.O., Silverman B.W. (1997) Functional Data Analysis, Springer