Title: USE OF GENERALIZED LINEAR MODEL IN FORECASTING OF AIR PASSENGERS CONVEYANCES FROM EU COUNTRIES
- Catherine Zhukovskaya
- Faculty of Transport and Mechanical Engineering
- Riga Technical University
Outline
- Introduction
- Informative base
- Used models for analyzing and forecasting of the air passengers conveyances
- Elaboration of linear models
- Elaboration of generalized linear models
- Conclusion
- References
1. Introduction
- Most of the literature devoted to forecasting of transport flows contains only simple forecasting models based on time series methods (Hünt, 2003) or on linear regression with a small number of explanatory variables (Butkevicius, Vyskupaitis, 2005; Šliupas, 2006).
- Two different approaches to forecasting air passenger conveyances from EU countries were considered in this investigation:
- the classical method of linear regression
- the generalized linear model (GLM).
- The aim of this investigation is to illustrate the advantage of the GLM over simple linear regression models.
- Verification of the models and estimation of the unknown parameters are included as well.
- All calculations were done with Statistica 6.0 and with software developed in MathCad 12.
2. Informative base
- The forecasted variable was the number of air passengers carried, expressed in millions of passengers.
Factors
- t1 - total population of the country (TP), millions of inhabitants
- t2 - area of the country (AREA), thousands of km²
- t3 - density of the country population (PD), number of inhabitants per km²
- t4 - monthly labour costs (MLC), thousands of euros
- t5 - gross domestic product (GDP) per capita in Purchasing Power Standards (PPS) (GDP_PPS)
- t6 - gross domestic product (GDP), billions of euro
- t7 - comparative price level (CPL)
- t8 - inflation rate (IR)
- t9 - unemployment rate (UR)
- t10 - labour productivity per hour worked (LPHW).
- The following 25 EU countries were selected: Belgium, Czech Republic, Denmark, Germany, Estonia, Greece, Spain, France, Ireland, Italy, Cyprus, Latvia, Lithuania, Luxembourg, Hungary, Malta, Netherlands, Austria, Poland, Portugal, Slovenia, Slovakia, Finland, Sweden and United Kingdom.
- The period considered was from 1996 to 2005.
- All data for this investigation were obtained from the electronic database of the Statistical Office of the European Communities (EUROSTAT), http://epp.eurostat.ec.europa.eu
- The final number of observations was 161:
- data for the period from 1996 to 2004 were used for estimation and forecasting - 140 observations
- data for 2005 were used to check the quality of forecasting, the so-called cross-validation (CV) - 21 observations.
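The split described above can be sketched in a few lines of Python. This is only an illustration: the record layout (`country`, `year`, `passengers`) and the sample values are hypothetical placeholders, not EUROSTAT data.

```python
def split_by_year(records, cv_year=2005):
    """Split country-year observations into an estimation set and a CV set."""
    train = [r for r in records if r["year"] < cv_year]
    cv = [r for r in records if r["year"] == cv_year]
    return train, cv

# Hypothetical records; the numbers are placeholders, not EUROSTAT values.
records = [
    {"country": "A", "year": 2003, "passengers": 1.0},
    {"country": "A", "year": 2004, "passengers": 1.2},
    {"country": "A", "year": 2005, "passengers": 1.3},
    {"country": "B", "year": 2004, "passengers": 4.0},
    {"country": "B", "year": 2005, "passengers": 4.4},
]
train, cv = split_by_year(records)
print(len(train), len(cv))  # 3 2
```

Holding out the last year, rather than a random subset, keeps the check close to the real forecasting task: the model never sees data from the year it is asked to predict.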
3. Used models for analyzing and forecasting of the air passengers conveyances
Main notions
- An observation is the data for a particular country in a particular year.
- The main object of consideration was the air passenger conveyances from EU countries.
- All the considered models were group models (Andronov, 1983).
- Classification of regression models according to their mathematical form:
- linear regression models
- generalized linear regression models (GLM).
- The linear regression model (Hardle, 2004):
$E(Y^{(k)}(x)) = x^T \beta$,  (1)
where
- $Y^{(k)}$ is the dependent variable of the k-th considered model
- $x = (x_1, x_2, \dots, x_d)^T$ is a d-dimensional vector of explanatory variables
- $\beta = (\beta_0, \beta_1, \beta_2, \dots, \beta_d)^T$ is a coefficient vector that has to be estimated from observations of $Y^{(k)}$ and $x$.
- The generalized linear regression model:
$E(Y^{(k)}(x)) = G(x^T \beta)$,  (2)
where $G(\cdot)$ is a known function of a one-dimensional variable.
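To make the difference between models (1) and (2) concrete, here is a minimal Python sketch. The choice G = exp is an assumption made purely for illustration (the slides only say that G is a known function), and the coefficients and regressor values are hypothetical.

```python
import math

def linear_mean(x, beta):
    """Model (1): E(Y) = x^T beta, with beta[0] used as the intercept."""
    return beta[0] + sum(b * xi for b, xi in zip(beta[1:], x))

def glm_mean(x, beta, G=math.exp):
    """Model (2): E(Y) = G(x^T beta). G = exp is an assumed example only."""
    return G(linear_mean(x, beta))

# Hypothetical coefficients and regressors, for illustration only.
beta = [0.5, 0.2, -0.1]
x = [1.0, 2.0]
print(linear_mean(x, beta))  # linear predictor x^T beta
print(glm_mean(x, beta))     # the same predictor passed through G
```

Both models share the linear predictor; the GLM only adds the outer function G, which is why the set of regressors can stay the same when moving from one to the other.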
4. Elaboration of linear models
- The basic criteria for choosing the best model:
- multiple coefficient of determination (R²)
- Fisher criterion (F)
- sum of squares of the residuals (SSRes)
- sum of squares of the residuals for the cross-validation (CV SSRes).
- All statistical hypotheses were tested at the significance level $\alpha = 0.05$.
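The criteria listed above can be computed as follows. This is a sketch using the textbook definitions, which are assumed here to be the ones used in the study; the function names are illustrative. CV SSRes is the same `ss_res` evaluated on the held-out 2005 observations.

```python
def ss_res(y, y_hat):
    """Sum of squared residuals between observed and predicted values."""
    return sum((a - b) ** 2 for a, b in zip(y, y_hat))

def r_squared(y, y_hat):
    """Multiple coefficient of determination: 1 - SSRes / SSTotal."""
    mean_y = sum(y) / len(y)
    ss_tot = sum((a - mean_y) ** 2 for a in y)
    return 1 - ss_res(y, y_hat) / ss_tot

def f_statistic(r2, n, d):
    """Overall F statistic for a model with intercept, d regressors, n observations."""
    return (r2 / d) / ((1 - r2) / (n - d - 1))

# With R2 = 0.831, n = 140 and d = 10 (the setting of MODEL 1):
print(round(f_statistic(0.831, 140, 10), 2))
```

With these inputs the formula gives about 63.4, against the 63.49 reported in Table 1; the gap is consistent with R² being rounded to three decimals.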
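MODEL 1
$Y^{(1)} = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_4 + \beta_5 x_5 + \beta_6 x_6 + \beta_7 x_7 + \beta_8 x_8 + \beta_9 x_9 + \beta_{10} x_{10}$,
where $Y^{(1)}$ is the total number of air passengers carried and $x_1 = t_1$, $x_2 = t_2$, $x_3 = t_3$, $x_4 = t_4$, $x_5 = t_5$, $x_6 = t_6$, $x_7 = t_7$, $x_8 = t_8$, $x_9 = t_9$, $x_{10} = t_{10}$.
Results for MODEL 1:
$\hat{E}(Y^{(1)}(x)) = 14 - 0.77 x_1 + 0.16 x_2 + 185.8 x_3 - 2.44 x_4 + 0.53 x_5 + 0.07 x_6 + 0.05 x_7 + 0.32 x_8 - 1.2 x_9 - 1.03 x_{10}$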
Table 1
Variable   Factor    b        t(129)   p-level
Intercept            14.00    0.84     0.405
x1         TP        -0.77    -1.56    0.121
x2         AREA      0.16     5.60     0.000
x3         PD        185.80   4.67     0.000
x4         MLC       -2.44    -0.44    0.660
x5         GDP_PPS   0.53     1.68     0.096
x6         GDP       0.07     3.81     0.000
x7         CPL       0.05     0.37     0.710
x8         IR        0.32     0.29     0.771
x9         UR        -1.20    -1.59    0.114
x10        LPHW      -1.03    -3.75    0.000
R² = 0.831; Fisher criterion F = 63.49
New factor
t11 (ON) = 0, if the considered country is an old member of the EU; 1, if the considered country is a new one.
MODEL 2
$Y^{(2)} = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_4 + \beta_5 x_5$,
where $Y^{(2)} = Y^{(1)}$, $x_1 = t_2$, $x_2 = t_3$, $x_3 = t_6$, $x_4 = t_{10}$, $x_5 = t_{11}$.
Results for MODEL 2:
$\hat{E}(Y^{(2)}(x)) = 13.56 + 0.09 x_1 + 134.01 x_2 + 0.05 x_3 - 0.68 x_4 + 29.36 x_5$
Table 2
Variable   Factor   b        t(134)   p-level
Intercept           13.56    2.45     0.016
x1         AREA     0.09     4.45     0.000
x2         PD       134.01   4.32     0.000
x3         GDP      0.05     10.34    0.000
x4         LPHW     -0.68    -5.12    0.000
x5         ON       29.36    4.21     0.000
R² = 0.829; Fisher criterion F = 129.85
Modifications of factors
MODEL 3
$Y^{(3)} = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_4 + \beta_5 x_5$,
where $Y^{(3)} = Y^{(1)}$.
Results for MODEL 3:
$\hat{E}(Y^{(3)}(x)) = -6.34 + 113.26 x_1 + 0.14 x_2 - 0.52 x_3 - 0.03 x_4 + 3.03 x_5$
Table 3
Variable   Factor       b        t(134)   p-level
Intercept               -6.34    -1.05    0.296
x1         PD           113.26   4.00     0.000
x2         GDP          0.14     10.66    0.000
x3         LPHW         -0.52    -5.80    0.000
x4         sq(TP)       -0.03    -7.56    0.000
x5         sqrt(AREA)   3.03     5.74     0.000
R² = 0.867; Fisher criterion F = 174.08
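As an illustration of how the modified factors enter the fitted equation, the sketch below evaluates MODEL 3 using the coefficients from Table 3. It assumes that sq(TP) means TP squared and that the inputs are in the same units as the fitted data; the function name and arguments are hypothetical.

```python
import math

def predict_model3(tp, area, pd, gdp, lphw):
    """Evaluate the fitted MODEL 3 equation (coefficients from Table 3).

    Assumption: sq(TP) is TP squared; sqrt(AREA) is the square root of AREA.
    """
    return (-6.34
            + 113.26 * pd
            + 0.14 * gdp
            - 0.52 * lphw
            - 0.03 * tp ** 2
            + 3.03 * math.sqrt(area))

# With every factor at zero the prediction is the intercept.
print(predict_model3(tp=0.0, area=0.0, pd=0.0, gdp=0.0, lphw=0.0))  # -6.34
```

The transformed regressors are computed from the raw factors inside the function, so the caller supplies TP and AREA only once.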
Analysis of observed and predicted values for MODEL 3
Figure 1. Plot of observed and predicted values.
Figure 2. Plot of observed and predicted values for the CV.
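MODEL 4
$Y^{(4)} = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_4 + \beta_5 x_5 + \beta_6 x_6 + \beta_7 x_7 + \beta_8 x_8 + \beta_9 x_9$,
where $Y^{(4)} = Y^{(1)} / t_1$ is the ratio of the total number of air passengers carried to the number of inhabitants of the country.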
Results for MODEL 4 (coefficients as in Table 4):
$\hat{E}(Y^{(4)}(x)) = -5.67 - 0.02 x_1 + 10.37 x_2 - 0.73 x_3 + 0.83 x_4 - 1.02 x_5 + 1.06 x_6 - 0.12 x_7 + 0.94 x_8 + 0.15 x_9$
Table 4
Variable   Factor          b       t(131)   p-level
Intercept                  -5.67   -6.25    0.000
x1         AREA            -0.02   -6.73    0.000
x2         PD              10.37   6.19     0.000
x3         MLC             -0.73   -4.19    0.000
x4         ON              0.83    8.30     0.000
x5         sqrt(TP)        -1.02   -7.32    0.000
x6         sqrt(AREA)      1.06    7.10     0.000
x7         AREA/TP         -0.12   -6.98    0.000
x8         sqrt(AREA)/TP   0.94    5.84     0.000
x9         GDP/TP          0.15    6.28     0.000
R² = 0.760; Fisher criterion F = 45.81
New factor
t12 (HL) = 0, if the value y/t1 for the considered country is small (less than 2); 1, if the value y/t1 is larger than 2.
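A minimal sketch of how the HL indicator could be coded. The slides do not say how a ratio of exactly 2 is treated; coding it as 0 is an assumption made here, and the function name is illustrative.

```python
def hl_indicator(passengers_millions, population_millions, threshold=2.0):
    """Return 1 if passengers per inhabitant (y/t1) exceeds the threshold, else 0.

    Assumption: a ratio exactly equal to the threshold is coded as 0.
    """
    ratio = passengers_millions / population_millions
    return 1 if ratio > threshold else 0

# Hypothetical values: 10 million passengers for 2 million inhabitants.
print(hl_indicator(10.0, 2.0))  # 1
```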
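MODEL 5
$Y^{(5)} = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_4 + \beta_5 x_5 + \beta_6 x_6 + \beta_7 x_7 + \beta_8 x_8$,
where $Y^{(5)} = Y^{(4)}$.
Results for MODEL 5:
$\hat{E}(Y^{(5)}(x)) = 0.99 - 0.46 x_1 - 0.02 x_2 - 0.02 x_3 - 0.02 x_4 + 0.01 x_5 + 1.27 x_6 + 1.15 x_7 + 0.07 x_8$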
Table 5
Variable   Factor    b       t(131)   p-level
Intercept            0.99    3.93     0.000
x1         MLC       -0.46   -3.41    0.001
x2         GDP_PPS   -0.02   -3.81    0.000
x3         IR        -0.02   -1.33    0.187
x4         UR        -0.02   -1.90    0.056
x5         LPHW      0.01    3.72     0.000
x6         ON        1.27    9.21     0.000
x7         HL        1.15    15.30    0.000
x8         GDP/TP    0.07    3.41     0.001
R² = 0.864; Fisher criterion F = 104.174
Summary of results for the linear regression models
Table 6 (R1 to R4 are the ranks of the model by R², F, SSRes and CV SSRes; Sum R is the sum of these ranks)
Model   R²      R1   F        R2   SSRes    R3   CV SSRes   R4   Sum R   Total R
1       0.831   3    63.49    4    52 651   5    114 885    5    17      5
2       0.829   4    129.85   2    53 344   5    109 723    4    15      3
3       0.867   1    174.10   1    41 599   2    49 450     1    5       1
4       0.760   5    45.81    5    35 064   3    57 310     3    16      4
5       0.864   2    104.20   3    12 775   1    51 448     2    8       2
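The last two columns of Table 6 follow from the four rank columns by simple aggregation. The sketch below reproduces them from the ranks as printed in the table; it does not handle ties in the rank sum, which do not occur here.

```python
# Ranks copied from Table 6: (R1, R2, R3, R4) for each model.
ranks = {
    1: (3, 4, 5, 5),
    2: (4, 2, 5, 4),
    3: (1, 1, 2, 1),
    4: (5, 5, 3, 3),
    5: (2, 3, 1, 2),
}

# Sum R: total of the four ranks per model.
rank_sums = {model: sum(r) for model, r in ranks.items()}

# Total R: position of the model when sorted by its rank sum (smaller is better).
ordered = sorted(rank_sums, key=rank_sums.get)
total_rank = {model: pos for pos, model in enumerate(ordered, start=1)}

print(rank_sums)   # {1: 17, 2: 15, 3: 5, 4: 16, 5: 8}
print(total_rank)  # {3: 1, 5: 2, 2: 3, 4: 4, 1: 5}
```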
Analysis of observed and predicted values for MODEL 5
Figure 3. Plot of recalculated observed and predicted values.
Figure 4. Plot of recalculated observed and predicted values for the CV.
5. Elaboration of generalized linear models
- For further investigation the best linear regression model (Model 5) was chosen.
- Two different GLMs were considered. In both of them the regressand is $Y^{(GLM)} = Y^{(5)} / t_1$ and the set of regressors is the same as for Model 5.
GLM1
(3)
where $h_i$ is the total population, $x_i$ is the column vector of the independent variables, and $i = 1, 2, \dots, n$ is the observation number.
GLM2
(4)
where a is additional parameter (constant).
- To estimate the unknown parameter vector $\beta$ we used the least squares criterion
$R_0(\beta) = \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2$,  (5)
where $Y_i$ and $\hat{Y}_i$ are the observed and calculated values of Y.
1. Linearization
LM1
(6)
LM2
(7)
where $\tilde{Y} = Y / h$.
The models LM1 and LM2 give the following estimates for E(Y)
- The values of SSRes and CV SSRes for Model 5, LM1 and LM2
Table 7
        SSRes                         CV SSRes
        Model 5   LM1      LM2        Model 5   LM1       LM2
R0/n    12 775    27 447   21 834     51 448    676 576   229 554
- Linearization gives poor results: both LM1 and LM2 have larger SSRes and CV SSRes than Model 5. To improve them, a two-stage estimation procedure was developed.
- The first stage is the linearization considered above. The second stage is calibration, in which the estimates obtained are refined with the well-known gradient method.
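The two-stage idea can be sketched as follows. The model form used here, E(Y_i) = h_i exp(b0 + b1 x_i) with a single regressor, is an assumption chosen only to make the example runnable; it is not taken from the slides, whose formulas for GLM1 and GLM2 are not reproduced in this text. The data are hypothetical, and the step-halving rule is an illustrative safeguard rather than the procedure used in the study.

```python
import math

def mean_glm(h, x, beta):
    # Assumed form (not from the slides): E(Y) = h * exp(b0 + b1 * x).
    return h * math.exp(beta[0] + beta[1] * x)

def r0(data, beta):
    """Least squares criterion: sum of squared residuals."""
    return sum((y - mean_glm(h, x, beta)) ** 2 for h, x, y in data)

def linearize(data):
    """Stage 1: ordinary least squares of ln(Y / h) on x (closed form)."""
    xs = [x for _, x, _ in data]
    zs = [math.log(y / h) for h, _, y in data]
    n = len(data)
    mx, mz = sum(xs) / n, sum(zs) / n
    b1 = (sum((x - mx) * (z - mz) for x, z in zip(xs, zs))
          / sum((x - mx) ** 2 for x in xs))
    return [mz - b1 * mx, b1]

def gradient(data, beta):
    """Gradient of r0 with respect to (b0, b1)."""
    g0 = g1 = 0.0
    for h, x, y in data:
        mu = mean_glm(h, x, beta)
        common = -2.0 * (y - mu) * mu
        g0 += common
        g1 += common * x
    return [g0, g1]

def calibrate(data, beta, step=1e-3, iters=500):
    """Stage 2: gradient descent on r0; a step is kept only if r0 decreases."""
    beta = list(beta)
    best = r0(data, beta)
    for _ in range(iters):
        g = gradient(data, beta)
        trial = [b - step * gi for b, gi in zip(beta, g)]
        value = r0(data, trial)
        if value < best:
            beta, best = trial, value
        else:
            step /= 2  # overshoot: retry with a smaller step
    return beta

# Hypothetical data: (h, x, y) triples, not values from the study.
data = [(10, 0.0, 5.2), (20, 1.0, 14.5), (5, 2.0, 6.1),
        (40, 0.5, 24.0), (8, 1.5, 7.9)]
beta_lin = linearize(data)
beta_cal = calibrate(data, beta_lin)
print(r0(data, beta_lin), r0(data, beta_cal))
```

Linearization minimizes squared errors on the log scale, not on the original scale, so its estimates are only a starting point; calibration then reduces the criterion on the original scale directly and, by construction here, can never make it worse.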
2. Calibration
- Gradients for the least squares criterion
GLM1
(8)
GLM2
(9)
- For GLM2 the optimum value of $R_0$ was sought not only over the values of $\beta$ but also over the parameter a.
GLM1 and GLM2 have the following estimates for E(Y)
Table 8
        CV SSRes
        Model 5   GLM1     GLM2
R0/n    51 447    47 807   34 567
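The size of the improvement in Table 8 is easier to read as a relative reduction. The short sketch below computes it from the table values; the function name is illustrative.

```python
def relative_reduction(base, value):
    """Percentage by which `value` is lower than `base`."""
    return 100.0 * (base - value) / base

# CV SSRes values copied from Table 8.
cv_ssres = {"Model 5": 51447, "GLM1": 47807, "GLM2": 34567}
base = cv_ssres["Model 5"]
for name in ("GLM1", "GLM2"):
    print(f"{name}: {relative_reduction(base, cv_ssres[name]):.1f}% lower CV SSRes than Model 5")
```

On the 2005 hold-out data this gives a reduction of about 7.1% for GLM1 and about 32.8% for GLM2 relative to Model 5.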
Analysis of observed and predicted values for the GLM
Figure 5. Plot of observed and predicted values.
Figure 6. Plot of observed and predicted values for the CV.
Dependence of SSRes and CV SSRes on the value of parameter a for GLM2
Figure 7. The values of SSRes and CV SSRes as a function of parameter a for GLM2
- The best value of SSRes was obtained when a = 2.
- The best value of CV SSRes was obtained when a = 6.
6. Conclusion
- Linear and generalized linear regression models for forecasting air passenger conveyances from EU countries were considered. These models contain a large number of explanatory factors and their combinations.
- The unknown parameters of the linear regression models were estimated with standard procedures. For the unknown parameters of the GLM a special two-stage procedure was developed.
- Cross-validation was taken as the main procedure for checking the adequacy of all considered models and for choosing the best model for forecasting.
- The advantage of applying the GLM has been shown.
7. References
- Andronov A.M. et al. Forecasting of air passengers conveyances on the transport. // Transport, Moscow, 1983. (In Russian).
- Butkevicius J., Vyskupaitis A. Development of passenger transportation by Lithuanian sea transport. // In Proceedings of International Conference RelStat04, Transport and Telecommunication, Vol. 6, N 2, 2005.
- Hardle W., Muller M., Sperlich S., Werwatz A. Nonparametric and Semiparametric Models. Springer, Berlin, 2004.
- Hünt U. Forecasting of railway freight volume: approach of Estonian railway to arise efficiency. // In TRANSPORT 2003, Vol. XXVIII, No 6, pp. 255-258.
- Šliupas T. Annual average daily traffic forecasting using different techniques. // In TRANSPORT 2006, Vol. XXI, No 1, pp. 38-43.
- EUROSTAT YEARBOOK 2005. The statistical guide to Europe. Data 1993-2004. EU, EuroSTAT, 2005. URL: http://epp.eurostat.ec.europa.eu
THANK YOU FOR YOUR ATTENTION