1. A Brief Review of Probability, Statistics, and Regression for Forecasting
Ka-fu Wong, University of Hong Kong
2. Random variable
- A random variable is a mapping from the set of all possible outcomes to the real numbers.
- Today's Hang Seng Index can go up, go down, or stay the same as yesterday's. Consider the movement of the Hang Seng Index over a month of 22 trading days. We can define a random variable Y as the number of days on which the Hang Seng Index goes up. In this case, Y can assume 23 values: 0, 1, ..., 22.
- Discrete random variables can assume only a countable number of values. A discrete probability distribution describes the probability of occurrence of every event: for instance, pi is the probability that event i occurs.
- Continuous random variables can assume a continuum of values. A probability density function, f(y), is a nonnegative continuous function such that the area under f(y) between any two points a and b is the probability that Y assumes a value between a and b.
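A minimal Python sketch of this example, assuming (purely for illustration; the slide does not say this) that the 22 daily movements are independent with a fixed probability p of going up, so that Y is Binomial(22, p):

```python
# Sketch: Y = number of up days among 22 trading days. If up days are
# independent with probability p (p = 0.5 is an illustrative assumption,
# not something the slide specifies), Y is a discrete random variable
# with a Binomial(22, p) distribution.
from scipy.stats import binom

T, p = 22, 0.5
Y = binom(T, p)

print(Y.pmf(11))          # P(Y = 11): probability of exactly 11 up days
print(Y.cdf(10))          # P(Y <= 10)
print(Y.mean(), Y.var())  # E(Y) = T*p, Var(Y) = T*p*(1-p)
```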
3. Moments
Mean (measures central tendency): μ = E(y)
Variance (measures dispersion around the mean): σ² = E[(y − μ)²]
Standard deviation: σ = √σ²
Skewness (measures the amount of asymmetry in a distribution): S = E[(y − μ)³] / σ³
Kurtosis (measures the thickness of the tails of a distribution): K = E[(y − μ)⁴] / σ⁴
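These population moments have sample counterparts (slides 5 and 6). As a quick illustration, a sketch that estimates all four from simulated data; the sample itself and the use of scipy.stats are my own illustrative choices:

```python
# Sketch: estimating the four moments from data. The sample below is
# simulated purely for illustration.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
y = rng.normal(loc=0.0, scale=1.0, size=1000)  # illustrative sample

print(np.mean(y))                       # mean: central tendency
print(np.var(y))                        # variance: dispersion around the mean
print(np.std(y))                        # standard deviation
print(stats.skew(y))                    # skewness: 0 for a symmetric distribution
print(stats.kurtosis(y, fisher=False))  # kurtosis: tail thickness, 3 for a normal
```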
4. Multivariate Random Variables
Joint distribution
Covariance (measures linear dependence between two variables): cov(x, y) = E[(x − μx)(y − μy)]
Correlation (covariance scaled to lie in [−1, 1]): corr(x, y) = cov(x, y) / (σx σy)
Conditional distribution
Conditional mean
Conditional variance
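A brief sketch of the sample versions of covariance and correlation, plus a crude sample analogue of a conditional mean; the linear data-generating process and the bin width are illustrative assumptions:

```python
# Sketch: sample covariance and correlation with NumPy, on simulated
# (x, y) pairs where y depends linearly on x (an illustrative choice).
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=500)
y = 2.0 * x + rng.normal(size=500)

print(np.cov(x, y)[0, 1])       # sample covariance of x and y
print(np.corrcoef(x, y)[0, 1])  # sample correlation, close to 1 here

# A crude conditional mean: average y over observations with x in a
# narrow bin around x0 -- the sample analogue of E(y | x = x0).
x0 = 0.5
mask = np.abs(x - x0) < 0.1
print(y[mask].mean())           # roughly 2 * x0 = 1.0 for this process
```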
5. Statistics
Sample mean: ȳ = (1/T) Σt yt
Sample variance: σ̂² = (1/T) Σt (yt − ȳ)², or s² = (1/(T − 1)) Σt (yt − ȳ)²
Sample standard deviation: σ̂ = √σ̂², or s = √s²
6. Statistics
Sample skewness: Ŝ = (1/T) Σt (yt − ȳ)³ / σ̂³
Sample kurtosis: K̂ = (1/T) Σt (yt − ȳ)⁴ / σ̂⁴
Jarque-Bera test statistic: JB = (T/6) [Ŝ² + (K̂ − 3)² / 4]
Under the null of independent, normally distributed observations, JB is distributed in large samples as a chi-square distribution with two degrees of freedom.
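A sketch that computes JB by hand from the sample skewness and kurtosis and checks it against scipy.stats.jarque_bera; the simulated normal sample is illustrative:

```python
# Sketch: the Jarque-Bera statistic by hand and via scipy, on simulated
# normal data (so the null is true and JB should be small).
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
y = rng.normal(size=2000)
T = len(y)

S = stats.skew(y)
K = stats.kurtosis(y, fisher=False)   # raw kurtosis, 3 under normality
JB = (T / 6.0) * (S**2 + (K - 3.0)**2 / 4.0)

print(JB)
print(stats.jarque_bera(y))           # (statistic, p-value), agrees with JB
print(stats.chi2.sf(JB, df=2))        # asymptotic p-value from chi-square(2)
```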
7. Example
What is our expectation of y given x = x0?
8. Forecast
- Suppose we want to forecast the value of a variable y, given the value of a variable x.
- Denote that forecast y^f_x.
9. Conditional expectation as a forecast
- Think of y and x as random variables jointly drawn from some underlying population.
- It seems reasonable to construct the forecast of y based on x as the expected value of y conditional on x, i.e.,
- y^f_x = E(y | x), the average population value of y given that value of x.
- E(y | x) is also called the population regression of y (on x).
10. Conditional expectation as a forecast
- The expected value of y conditional on x:
- y^f_x = E(y | x)
- It turns out that in many reasonable forecasting settings,
- this forecast has optimal properties (e.g., minimizing expected loss), and
- (approximating) this forecast guides our choice of forecast method.
11. Unbiasedness of the conditional expectation as a forecast
- The forecast error will be y − E(y | x).
- Expected forecast error: E[y − E(y | x)] = E(y) − E[E(y | x)] = E(y) − E(y) = 0, where the second equality is the law of iterated expectations.
- Thus the conditional expectation is an unbiased forecast.
- Note that another name for E(y | x) is the population regression of y (on x).
12. Some operational assumptions about E(y | x)
- In order to proceed in this direction, we need to make some additional assumptions about the underlying population and, in particular, about the form of E(y | x).
- The simplest assumption is that the conditional expectation is a linear function of x, i.e.,
- E(y | x) = β0 + β1x
- If β0 and β1 are known, then the forecast problem is completed by setting
- y^f_x = β0 + β1x
13. When parameters are unknown
- Even if the conditional expectation is linear in x, the parameters β0 and β1 will be unknown.
- The next best thing is to estimate β0 and β1 and use the estimated βs in place of their actual values to form the forecasts.
- This substitution will not provide as accurate a forecast, since we are introducing a new source of forecast error: estimation error, or sampling error. However, under certain conditions the resulting forecast will still be unbiased and retain certain optimality properties.
14. When parameters are unknown
- Suppose we have access to a sample of T pairs (x1, y1), (x2, y2), ..., (xT, yT), drawn from the same population from which the relevant value of y will be drawn.
- In this case, a natural estimator of β0 and β1 is the ordinary least squares (OLS) estimator, which is obtained by minimizing the sum of squared residuals
- S = Σt (yt − β0 − β1xt)²
- with respect to β0 and β1. The solutions are the OLS estimates β̂0 and β̂1.
- Then, for a given value of x, we can forecast y according to y^f_x = β̂0 + β̂1x.
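A minimal sketch of these OLS formulas on simulated data; the true coefficients (β0, β1) = (1, 2) are illustrative assumptions:

```python
# Sketch: closed-form OLS for the simple regression, minimizing
# S = sum (yt - b0 - b1*xt)^2, then forecasting at a new x0.
import numpy as np

rng = np.random.default_rng(3)
T = 200
x = rng.normal(size=T)
y = 1.0 + 2.0 * x + rng.normal(size=T)   # y = beta0 + beta1*x + e

b1 = np.cov(x, y, bias=True)[0, 1] / np.var(x)  # slope estimate
b0 = y.mean() - b1 * x.mean()                   # intercept estimate
print(b0, b1)                                   # close to (1, 2)

x0 = 0.5
print(b0 + b1 * x0)                             # forecast y^f_x0
```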
15. Fitting a regression line: estimating β0 and β1
16. When parameters are unknown
- This estimation procedure, also called the sample regression of y on x, will provide us with a good estimate of the conditional expectation of y given x (i.e., the population regression of y on x) and, therefore, a good forecast of y given x, provided that certain additional assumptions apply to the relationship between y and x.
- Let e denote the difference between y and E(y | x). That is,
- e = y − E(y | x)
- i.e., y = E(y | x) + e
- and
- y = β0 + β1x + e, if E(y | x) = β0 + β1x.
17. When parameters are unknown
- The assumptions that we need pertain to these e's (the other factors that determine y) and their relationship to the x's.
- For instance, so long as E(et | x1, ..., xT) = 0 for t = 1, ..., T, the OLS estimators of β0 and β1 based on the data (x1, y1), ..., (xT, yT) will be unbiased and, as a result, the forecast constructed by replacing these population parameters with the OLS estimates will be unbiased (see the sketch below).
- A standard set of assumptions that provides us with a lot of value:
- Given x1, ..., xT, the errors e1, ..., eT are i.i.d. N(0, σ²) random variables.
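A small Monte Carlo sketch of this unbiasedness claim; the sample size, number of replications, and true coefficients are illustrative choices:

```python
# Sketch: with errors satisfying E(e | x) = 0, OLS slope estimates
# average out to the true beta1 across repeated samples.
import numpy as np

rng = np.random.default_rng(4)
T, reps = 50, 5000
b1_draws = np.empty(reps)

for r in range(reps):
    x = rng.normal(size=T)
    e = rng.normal(scale=2.0, size=T)   # i.i.d. N(0, sigma^2), independent of x
    y = 1.0 + 2.0 * x + e
    b1_draws[r] = np.cov(x, y, bias=True)[0, 1] / np.var(x)

print(b1_draws.mean())   # close to the true beta1 = 2: OLS is unbiased here
```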
18. When parameters are unknown
- These ideas and procedures extend naturally to the setting where we want to forecast the value of y based on the values of k other variables, say, x1, ..., xk.
- We begin by considering the conditional expectation, or population regression, of y on x1, ..., xk to make our forecast. That is,
- y^f_{x1,...,xk} = E(y | x1, ..., xk)
- To operationalize this forecast, we first assume that the conditional expectation is linear, i.e.,
- E(y | x1, ..., xk) = β0 + β1x1 + ... + βkxk
19. When parameters are unknown
- The unknown βs are generally replaced by the estimates from a sample OLS regression (see the sketch below).
- Suppose we have the data set
- (y1, x11, ..., xk1), (y2, x12, ..., xk2), ..., (yT, x1T, ..., xkT)
- The OLS estimates of the unknown parameters are obtained by minimizing the sum of squared residuals
- S = Σt (yt − β0 − β1x1t − ... − βkxkt)², t = 1, ..., T.
- As in the case of the simple regression model, this procedure to estimate the population regression function will have good properties provided that the regression errors
- et = yt − E(yt | x1t, ..., xkt), t = 1, ..., T
- have appropriate properties.
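A minimal sketch of multiple-regression OLS with k = 2 regressors using np.linalg.lstsq; the true coefficients and the forecast point are illustrative assumptions:

```python
# Sketch: OLS with an intercept and two regressors, then a forecast at
# new regressor values.
import numpy as np

rng = np.random.default_rng(5)
T = 300
X = rng.normal(size=(T, 2))                       # columns are x1, x2
y = 1.0 + 2.0 * X[:, 0] - 3.0 * X[:, 1] + rng.normal(size=T)

# Prepend a column of ones so the intercept beta0 is estimated too.
Z = np.column_stack([np.ones(T), X])
beta_hat, *_ = np.linalg.lstsq(Z, y, rcond=None)  # minimizes sum of squared residuals
print(beta_hat)                                   # close to [1, 2, -3]

x_new = np.array([1.0, 0.5, -1.0])                # intercept, x1 = 0.5, x2 = -1.0
print(x_new @ beta_hat)                           # forecast of y
```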
20. Example: multiple linear regression
21. Residual plots
22. Density Forecasts and Interval Forecasts
- The procedures described above produce point forecasts of y. They can also be used to produce density and interval forecasts of y, provided that the x's and the regression errors, i.e., the e's, meet certain conditions (see the sketch below).
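A sketch of a 95% interval forecast under the i.i.d. normal-error assumption; it plugs in the estimated σ and ignores parameter-estimation uncertainty, and all numbers are illustrative:

```python
# Sketch: if e ~ N(0, sigma^2), then y given x0 is N(b0 + b1*x0, sigma^2),
# so an approximate 95% interval is the point forecast +/- 1.96 * sigma_hat.
import numpy as np

rng = np.random.default_rng(6)
T = 200
x = rng.normal(size=T)
y = 1.0 + 2.0 * x + rng.normal(scale=0.5, size=T)

b1 = np.cov(x, y, bias=True)[0, 1] / np.var(x)
b0 = y.mean() - b1 * x.mean()
resid = y - (b0 + b1 * x)
sigma_hat = np.sqrt(resid @ resid / (T - 2))   # residual standard error

x0 = 1.0
point = b0 + b1 * x0
print(point - 1.96 * sigma_hat, point + 1.96 * sigma_hat)  # interval forecast
# The density forecast is the whole N(point, sigma_hat^2) distribution.
```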
23. End