1
USE OF GENERALIZED LINEAR MODEL IN FORECASTING OF
AIR PASSENGERS CONVEYANCES FROM EU COUNTRIES
  • Catherine Zhukovskaya
  • Faculty of Transport and Mechanical Engineering
  • Riga Technical University

2
Outline
  1. Introduction
  2. Informative base
  3. Used models for analyzing and forecasting of the
    air passengers conveyances
  4. Elaboration of linear models
  5. Elaboration of generalized linear models
  6. Conclusion
  7. References

3
1. Introduction
  • Most of the literature devoted to forecasting
    transport flows contains only simple forecasting
    models based on time series methods (Hünt, 2003)
    or on linear regression with a small number of
    explanatory variables (Butkevicius and
    Vyskupaitis, 2005; Šliupas, 2006).
  • Two approaches to forecasting air passenger
    conveyances from EU countries were considered in
    this investigation:
      • the classical method of linear regression;
      • the generalized linear model (GLM).
  • The aim of this investigation is to illustrate
    the advantage of the GLM over simple linear
    regression models.
  • The verification of the models and the estimation
    of the unknown parameters are included as well.
  • All calculations were carried out with Statistica
    6.0 and with software developed in MathCad 12.

4
2. Informative base
  • The forecasted variable was the number of air
    passengers carried, expressed in millions of
    passengers.
Factors
  • t1 - total population of the country (TP),
    millions of inhabitants
  • t2 - area of the country (AREA), thousands of
    km²
  • t3 - population density of the country (PD),
    number of inhabitants per km²
  • t4 - monthly labour costs (MLC), thousands of
    euros
  • t5 - gross domestic product (GDP) per capita
    in Purchasing Power Standards (PPS) (GDP_PPS)
  • t6 - gross domestic product (GDP), billions of
    euros
  • t7 - comparative price level (CPL)
  • t8 - inflation rate (IR)
  • t9 - unemployment rate (UR)
  • t10 - labour productivity per hour worked (LPHW).

5
  • The following 25 EU countries were selected:
    Belgium, Czech Republic, Denmark, Germany,
    Estonia, Greece, Spain, France, Ireland, Italy,
    Cyprus, Latvia, Lithuania, Luxembourg, Hungary,
    Malta, Netherlands, Austria, Poland, Portugal,
    Slovenia, Slovakia, Finland, Sweden and the United
    Kingdom.
  • The period considered was 1996 to 2005.
  • All data for this investigation were obtained
    from the electronic database of the Statistical
    Office of the European Communities (EUROSTAT),
    http://epp.eurostat.ec.europa.eu
  • The final number of observations was 161.
  • Data for 1996 to 2004 were used for estimation
    and forecasting: 140 observations.
  • Data for 2005 were used to check the quality of
    forecasting, the so-called cross-validation
    (CV): 21 observations.
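
The split described above (1996 to 2004 for estimation, 2005 held out for checking) can be sketched as follows. This is only an illustration: the `Observation` record, the `split_by_year` helper and the sample values are hypothetical placeholders, not the EUROSTAT data used in the study.

```python
from dataclasses import dataclass


@dataclass
class Observation:
    country: str
    year: int
    passengers: float  # millions of passengers carried


def split_by_year(observations, holdout_year=2005):
    """Return (estimation set, hold-out set) split on the hold-out year."""
    estimation = [o for o in observations if o.year < holdout_year]
    holdout = [o for o in observations if o.year == holdout_year]
    return estimation, holdout


# Made-up records, used only to show the mechanics of the split.
sample = [
    Observation("Country A", 2003, 0.7),
    Observation("Country A", 2004, 1.0),
    Observation("Country A", 2005, 1.9),
    Observation("Country B", 2004, 1.0),
    Observation("Country B", 2005, 1.4),
]
train, valid = split_by_year(sample)
print(len(train), len(valid))
```

Holding out the most recent year, rather than a random subset, mirrors how the model would be used in practice: fitted on past years and applied to a later one.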

6
3. Used models for analyzing and forecasting of
the air passengers conveyances
Main notions
  • An observation is the data for a particular
    country in a particular year.
  • The main object of consideration was air
    passenger conveyances from EU countries.
  • All the considered models were group models
    (Andronov, 1983).
  • Classification of regression models according
    to their mathematical form:
      • linear regression models;
      • generalized linear regression models (GLM).

7
  • The linear regression model (Härdle et al., 2004):
  • $E(Y^{(k)}(x)) = x^T \beta$, (1)
  • where
  • $Y^{(k)}$ is the dependent variable of the k-th
    considered model;
  • $x = (x_1, x_2, \ldots, x_d)^T$ is the d-dimensional
    vector of explanatory variables;
  • $\beta = (\beta_0, \beta_1, \beta_2, \ldots, \beta_d)^T$ is the
    coefficient vector to be estimated from
    observations of $Y^{(k)}$ and $x$.
  • The generalized linear regression model:
  • $E(Y^{(k)}(x)) = G(x^T \beta)$, (2)
  • where $G(\cdot)$ is a known function of a
    one-dimensional variable.
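
To make the difference between (1) and (2) concrete, here is a minimal Python sketch of the two mean functions. Two things are assumptions: the coefficient $\beta_0$ is treated as an intercept added to the weighted sum of the explanatory variables, and the exponential is used as an example of the known function $G$. The slides do not state either choice at this point, and the function names are illustrative.

```python
import numpy as np


def linear_predictor(x, beta):
    """Return beta_0 + beta_1*x_1 + ... + beta_d*x_d for one observation."""
    x = np.asarray(x, dtype=float)
    beta = np.asarray(beta, dtype=float)
    return float(beta[0] + x @ beta[1:])


def linear_mean(x, beta):
    """Model (1): the mean is the linear predictor itself."""
    return linear_predictor(x, beta)


def glm_mean(x, beta, G=np.exp):
    """Model (2): the mean is G applied to the linear predictor.

    G=np.exp is an assumed example, not the function used in the slides.
    """
    return float(G(linear_predictor(x, beta)))


beta = [0.5, 1.0, -2.0]  # illustrative coefficients
x = [1.0, 0.25]          # illustrative explanatory variables
print(linear_mean(x, beta))
print(glm_mean(x, beta))
```

With $G$ equal to the identity, model (2) reduces to model (1), so the linear model is a special case of the generalized one.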

8
4. Elaboration of linear models
  • The basic criteria for choosing the best model:
      • multiple coefficient of determination (R²);
      • Fisher criterion (F);
      • sum of squares of the residuals (SSRes);
      • sum of squares of the residuals for the
        cross-validation (CV SSRes).
  • All statistical hypotheses were tested at the
    significance level $\alpha = 0.05$.
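
The four criteria can be computed as in the following sketch. It uses ordinary least squares via NumPy on synthetic data; the helper names and the data are illustrative assumptions, and the original calculations were done in Statistica and MathCad, not with this code.

```python
import numpy as np


def fit_ols(X, y):
    """Least-squares estimate of (beta_0, ..., beta_d) with an intercept."""
    A = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return beta


def predict(X, beta):
    return beta[0] + X @ beta[1:]


def ss_res(y, y_hat):
    """Sum of squares of the residuals."""
    return float(np.sum((y - y_hat) ** 2))


def r_squared(y, y_hat):
    return 1.0 - ss_res(y, y_hat) / float(np.sum((y - y.mean()) ** 2))


def f_from_r2(r2, n, d):
    """Fisher statistic for n observations and d regressors plus intercept."""
    return (r2 / d) / ((1.0 - r2) / (n - d - 1))


# Synthetic data standing in for the estimation and hold-out samples.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(40, 2))
y_train = 1.0 + 2.0 * X_train[:, 0] - X_train[:, 1] + rng.normal(scale=0.1, size=40)
X_cv = rng.normal(size=(10, 2))
y_cv = 1.0 + 2.0 * X_cv[:, 0] - X_cv[:, 1] + rng.normal(scale=0.1, size=10)

beta = fit_ols(X_train, y_train)
r2 = r_squared(y_train, predict(X_train, beta))
f_stat = f_from_r2(r2, n=len(y_train), d=X_train.shape[1])
ss_train = ss_res(y_train, predict(X_train, beta))
ss_cv = ss_res(y_cv, predict(X_cv, beta))  # CV SSRes: residuals on held-out data
print(r2, f_stat, ss_train, ss_cv)
```

As a consistency check on the reported figures, `f_from_r2(0.831, n=140, d=10)` gives about 63.4, which agrees with the F value of 63.49 reported for Model 1 up to rounding of R².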

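MODEL 1
$Y^{(1)} = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_4 + \beta_5 x_5 + \beta_6 x_6 + \beta_7 x_7 + \beta_8 x_8 + \beta_9 x_9 + \beta_{10} x_{10}$,
where $Y^{(1)}$ is the total number of air passengers carried and
x1 = t1, x2 = t2, x3 = t3, x4 = t4, x5 = t5,
x6 = t6, x7 = t7, x8 = t8, x9 = t9, x10 = t10.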
9
Results for MODEL 1
$\hat{E}(Y^{(1)}(x)) = 14.00 - 0.77 x_1 + 0.16 x_2 + 185.80 x_3 - 2.44 x_4 + 0.53 x_5 + 0.07 x_6 + 0.05 x_7 + 0.32 x_8 - 1.20 x_9 - 1.03 x_{10}$
Table 1
Variable   Factor   b       t(129)  p-level
Intercept           14.00   0.84    0.405
x1         TP       -0.77   -1.56   0.121
x2         AREA     0.16    5.60    0.000
x3         PD       185.80  4.67    0.000
x4         MLC      -2.44   -0.44   0.660
x5         GDP_PPS  0.53    1.68    0.096
x6         GDP      0.07    3.81    0.000
x7         CPL      0.05    0.37    0.710
x8         IR       0.32    0.29    0.771
x9         UR       -1.20   -1.59   0.114
x10        LPHW     -1.03   -3.75   0.000
R² = 0.831, Fisher criterion F = 63.49
10
New factor
t11 (ON) = 0 if the considered country is an old member of the
EU, 1 if the considered country is a new one.
MODEL 2
$Y^{(2)} = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_4 + \beta_5 x_5$,
where $Y^{(2)} = Y^{(1)}$ and x1 = t2, x2 = t3, x3 = t6,
x4 = t10, x5 = t11.
Results for MODEL 2
$\hat{E}(Y^{(2)}(x)) = 13.56 + 0.09 x_1 + 134.01 x_2 + 0.05 x_3 - 0.68 x_4 + 29.36 x_5$
Table 2
Variable   Factor  b       t(134)  p-level
Intercept          13.56   2.45    0.016
x1         AREA    0.09    4.45    0.000
x2         PD      134.01  4.32    0.000
x3         GDP     0.05    10.34   0.000
x4         LPHW    -0.68   -5.12   0.000
x5         ON      29.36   4.21    0.000
R² = 0.829, Fisher criterion F = 129.85
11
Modifications of factors
MODEL 3
$Y^{(3)} = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_4 + \beta_5 x_5$,
where $Y^{(3)} = Y^{(1)}$ and the regressors are those listed in Table 3.
Results for MODEL 3
$\hat{E}(Y^{(3)}(x)) = -6.34 + 113.26 x_1 + 0.14 x_2 - 0.52 x_3 - 0.03 x_4 + 3.03 x_5$
Table 3
Variable   Factor      b       t(134)  p-level
Intercept              -6.34   -1.05   0.296
x1         PD          113.26  4.00    0.000
x2         GDP         0.14    10.66   0.000
x3         LPHW        -0.52   -5.80   0.000
x4         sq(TP)      -0.03   -7.56   0.000
x5         sqrt(AREA)  3.03    5.74    0.000
R² = 0.867, Fisher criterion F = 174.08
12
Analysis of observed and predicted values for
MODEL 3
Figure 1. Plot of observed and predicted values.
Figure 2. Plot of observed and predicted values for the CV.
13
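MODEL 4
$Y^{(4)} = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_4 + \beta_5 x_5 + \beta_6 x_6 + \beta_7 x_7 + \beta_8 x_8 + \beta_9 x_9$,
where $Y^{(4)} = Y^{(1)} / t_1$ is the ratio of the total number of
air passengers carried to the number of
inhabitants of the country.
Results for MODEL 4
$\hat{E}(Y^{(4)}(x)) = 0.56 + 2.33 x_1 - 1.04 x_2 - 0.02 x_3 + 0.001 x_4 + 1.76 x_5 - 0.0004 x_6 + 0.04 x_7 + 0.17 x_8$
(As given on the slide, this equation has eight regressors and its
coefficients differ from those in Table 4.)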
Table 4
Variable   Factor         b      t(131)  p-level
Intercept                 -5.67  -6.25   0.000
x1         AREA           -0.02  -6.73   0.000
x2         PD             10.37  6.19    0.000
x3         MLC            -0.73  -4.19   0.000
x4         ON             0.83   8.30    0.000
x5         sqrt(TP)       -1.02  -7.32   0.000
x6         sqrt(AREA)     1.06   7.10    0.000
x7         AREA/TP        -0.12  -6.98   0.000
x8         sqrt(AREA)/TP  0.94   5.84    0.000
x9         GDP/TP         0.15   6.28    0.000
R² = 0.760, Fisher criterion F = 45.81
14
New factor
t12 (HL) = 0 if the value y/t1 for the considered country
is small (less than 2), 1 if the value y/t1 is
larger than 2.
MODEL 5
$Y^{(5)} = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_4 + \beta_5 x_5 + \beta_6 x_6 + \beta_7 x_7 + \beta_8 x_8$,
where $Y^{(5)} = Y^{(4)}$ and the regressors are those listed in Table 5.
Results for MODEL 5
$\hat{E}(Y^{(5)}(x)) = 0.99 - 0.46 x_1 - 0.02 x_2 - 0.02 x_3 - 0.02 x_4 + 0.01 x_5 + 1.27 x_6 + 1.15 x_7 + 0.07 x_8$
Table 5
Variable   Factor   b      t(131)  p-level
Intercept           0.99   3.93    0.000
x1         MLC      -0.46  -3.41   0.001
x2         GDP_PPS  -0.02  -3.81   0.000
x3         IR       -0.02  -1.33   0.187
x4         UR       -0.02  -1.90   0.056
x5         LPHW     0.01   3.72    0.000
x6         ON       1.27   9.21    0.000
x7         HL       1.15   15.30   0.000
x8         GDP/TP   0.07   3.41    0.001
R² = 0.864, Fisher criterion F = 104.174
15
Summary results for the linear regression models
Table 6
Model  R²     R1  F       R2  SSRes   R3  CV SSRes  R4  Sum R  Total R
1      0.831  3   63.49   4   52 651  5   114 885   5   17     5
2      0.829  4   129.85  2   53 344  5   109 723   4   15     3
3      0.867  1   174.10  1   41 599  2   49 450    1   5      1
4      0.760  5   45.81   5   35 064  3   57 310    3   16     4
5      0.864  2   104.20  3   12 775  1   51 448    2   8      2
R1 to R4 are the ranks of the models by R², F, SSRes and CV SSRes;
Sum R is the sum of the ranks and Total R is the overall rank.
16
Analysis of observed and predicted values for
MODEL 5
Figure 3. Plot of recalculated observed and predicted values.
Figure 4. Plot of recalculated observed and predicted values for the CV.
17
5. Elaboration of generalized linear models
  • Model 5 was chosen as the best linear regression
    model for further investigation.
  • Two different GLMs were considered. In both of
    them the regressand $Y^{(GLM)} = Y^{(5)} / t_1$ and
    the set of regressors are the same as for
    Model 5.

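GLM1
(3) [equation not reproduced in the transcript]
where $h_i$ is the total population, $x_i$ is the
column vector of the independent variables, and i is
the observation number, i = 1, 2, ..., n.
GLM2
(4) [equation not reproduced in the transcript]
where $a$ is an additional parameter (a constant).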
18
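  • To estimate the unknown parameter vector $\beta$ we
    used the least squares criterion

$R_0 = \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2$, (5)
where $Y_i$ and $\hat{Y}_i$ are the observed and calculated
values of Y.
1. Linearization
LM1
(6) [equation not reproduced in the transcript]
LM2
(7) [equation not reproduced in the transcript]
where $\tilde{Y} = Y / h$.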
19
The models LM1 and LM2 give the following
estimates for E(Y) [equations not reproduced in the transcript].
  • The values of SSRes and CV SSRes for Model 5,
    LM1 and LM2:

Table 7
        SSRes                    CV SSRes
        Model 5  LM1     LM2     Model 5  LM1      LM2
R0/n    12 775   27 447  21 834  51 448   676 576  229 554
  • Linearization alone gives poor results: both LM1
    and LM2 have larger SSRes and CV SSRes than
    Model 5. To improve them, a two-stage estimation
    procedure was developed.
  • The first stage is the linearization described
    above. The second stage is calibration, in which
    the estimates obtained are refined with the
    well-known gradient method.
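
The two-stage idea can be sketched in Python as follows. Because equations (3) to (9) are not available in the transcript, the model form used here, $E(Y_i) = h_i \exp(x_i^T \beta)$, is purely an assumption chosen for illustration, as are the function names, the synthetic data and the step-halving rule. Stage 1 fits $\ln(Y_i / h_i)$ by ordinary least squares; stage 2 refines the estimate by gradient descent on the least squares criterion.

```python
import numpy as np


def glm_mean(A, h, beta):
    """Assumed mean function: h_i * exp(x_i^T beta). Not taken from the slides."""
    return h * np.exp(A @ beta)


def ss_res(A, h, y, beta):
    """Least squares criterion: sum of squared deviations from the assumed mean."""
    with np.errstate(over="ignore", invalid="ignore"):
        return float(np.sum((y - glm_mean(A, h, beta)) ** 2))


def gradient(A, h, y, beta):
    """Gradient of the criterion with respect to beta for the assumed mean."""
    mu = glm_mean(A, h, beta)
    return -2.0 * A.T @ ((y - mu) * mu)


def linearize(A, h, y):
    """Stage 1: ordinary least squares on log(y / h)."""
    beta, *_ = np.linalg.lstsq(A, np.log(y / h), rcond=None)
    return beta


def calibrate(A, h, y, beta, n_iter=200):
    """Stage 2: gradient descent, halving the step until the criterion drops."""
    beta = beta.copy()
    current = ss_res(A, h, y, beta)
    for _ in range(n_iter):
        g = gradient(A, h, y, beta)
        step = 1.0
        while step > 1e-12:
            candidate = beta - step * g
            value = ss_res(A, h, y, candidate)
            if value < current:
                beta, current = candidate, value
                break
            step *= 0.5
        else:
            break  # no improving step found; stop early
    return beta


# Synthetic data: intercept column plus two regressors, positive responses.
rng = np.random.default_rng(1)
n = 30
A = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
h = rng.uniform(1.0, 50.0, size=n)
y = h * np.exp(A @ np.array([0.2, 0.5, -0.3])) * np.exp(rng.normal(scale=0.2, size=n))

beta_lin = linearize(A, h, y)
beta_cal = calibrate(A, h, y, beta_lin)
ss_before = ss_res(A, h, y, beta_lin)
ss_after = ss_res(A, h, y, beta_cal)
print(ss_before, ss_after)
```

The linearized fit minimizes squared errors on the log scale, not on the original scale, which is one reason it can score poorly on SSRes. Starting the gradient search from that fit gives a reasonable initial point, and each accepted step can only lower the criterion.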

20
2. Calibration
  • Gradients of the least squares criterion:

GLM1
(8) [equation not reproduced in the transcript]
GLM2
(9) [equation not reproduced in the transcript]
21
  • For GLM2 the optimum value of $R_0$ was sought not
    only over the coefficients $\beta$ but also over
    the parameter $a$.

GLM1 and GLM2 give the following estimates
for E(Y) [equations not reproduced in the transcript].
Table 8
        CV SSRes
        Model 5  GLM1    GLM2
R0/n    51 447   47 807  34 567
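
The size of the improvement in Table 8 can be read off with a few lines of arithmetic. The snippet below only restates the three CV SSRes values from the table and computes the relative reduction with respect to Model 5; the variable names are illustrative.

```python
# CV SSRes values copied from Table 8.
cv_ss_res = {"Model 5": 51447, "GLM1": 47807, "GLM2": 34567}

baseline = cv_ss_res["Model 5"]
reduction = {
    name: (baseline - value) / baseline
    for name, value in cv_ss_res.items()
    if name != "Model 5"
}
for name, share in reduction.items():
    print(f"{name}: CV SSRes reduced by {share:.1%} relative to Model 5")
```

On these figures GLM1 lowers the cross-validation sum of squares by about 7% and GLM2 by about 33%.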
22
Analysis of observed and predicted values for the
GLM
Figure 5. Plot of observed and predicted values.
Figure 6. Plot of observed and predicted values for the CV.
23
Dependence of SSRes and CV SSRes on the
value of the parameter $a$ for GLM2
Figure 7. The values of SSRes and CV SSRes as a
function of the parameter $a$ for GLM2.
  • The best value of SSRes was obtained
    when $a = 2$.
  • The best value of CV SSRes was obtained
    when $a = 6$.

24
6. Conclusion
  • Linear and generalized linear regression
    models for forecasting air passenger
    conveyances from EU countries were considered.
    These models contain a large number of
    explanatory factors and their combinations.
  • The unknown parameters of the linear regression
    models were estimated with standard procedures.
    For the unknown parameters of the GLM, a special
    two-stage procedure was developed.
  • Cross-validation was used as the main procedure
    for checking the adequacy of all considered
    models and for choosing the best model for
    forecasting.
  • The advantage of applying the GLM has been shown.

25
7. References
  1. Andronov A.M. et al. Forecasting of air passengers
    conveyances on the transport. // Transport,
    Moscow, 1983. (In Russian).
  2. Butkevicius J., Vyskupaitis A. Development of
    passenger transportation by Lithuanian sea
    transport. // In Proceedings of International
    Conference RelStat'04, Transport and
    Telecommunication, Vol. 6, No 2, 2005.
  3. Härdle W., Müller M., Sperlich S., Werwatz A.
    Nonparametric and Semiparametric Models.
    Springer, Berlin, 2004.
  4. Hünt U. Forecasting of railway freight volume:
    approach of Estonian railway to arise efficiency.
    // In TRANSPORT 2003, Vol. XXVIII, No 6, pp.
    255-258.
  5. Šliupas T. Annual average daily traffic
    forecasting using different techniques. // In
    TRANSPORT 2006, Vol. XXI, No 1, pp. 38-43.
  6. EUROSTAT YEARBOOK 2005. The statistical guide to
    Europe. Data 1993-2004. EU, EuroSTAT, 2005. URL:
    http://epp.eurostat.ec.europa.eu

26
THANK YOU FOR YOUR ATTENTION