Title: THE INSTABILITY OF RISK MEASURES
The problem of estimation error in complex systems

- Imre Kondor
- Collegium Budapest and Eötvös University, Budapest
- European Conference on Complex Systems 2009 (ECCS 09)
- University of Warwick, Coventry, UK
- 21-25 September, 2009
Coworkers

- Szilárd Pafka (ELTE PhD student → CIB Bank → Paycom.net, California)
- Gábor Nagy (Debrecen University PhD student and CIB Bank, Budapest)
- Richárd Karádi (Technical University MSc student → Procter & Gamble)
- Nándor Gulyás (ELTE PhD student → Budapest Bank → Lombard Leasing → private entrepreneur)
- István Varga-Haszonits (ELTE PhD student → Morgan Stanley)
Contents

- The subject of the talk lies at the crossroads of finance, statistical physics and computer science
- I. The investment problem: portfolios, rational portfolio selection, risk measures, the problem of estimation error (noise), noise sensitivity of risk measures, instability of risk measures
- II. The wider context: ramifications in model building for complex systems, computational complexity, critical phenomena, estimation error as a critical phenomenon, machine learning, statistics in high dimensions
I. THE INVESTMENT PROBLEM
A portfolio

- is a combination of assets or investment instruments (shares, bonds, foreign exchange, derivatives, precious metals, commodities, artworks, property, etc.). More generally, the various business lines of a big firm, or even the economy as a whole, can also be regarded as a portfolio. The generic problem is how to allocate available resources.
Rational portfolio selection

- The value of assets fluctuates.
- It is dangerous to invest all our money into a single asset.
- Investment should be diversified, distributed among the various assets.
- More risky assets tend to yield higher returns.
- Some assets tend to fluctuate together, some others in an opposite way.
- Rational portfolio selection seeks a tradeoff between risk and reward.
Risk and reward

- Financial reward can be measured in terms of the return (relative price change): r_t = (p_t - p_{t-1}) / p_{t-1}
- or the log return: ln(p_t / p_{t-1})
- The characterization of risk is more controversial
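(Illustration, not from the talk: a minimal numpy sketch of the two reward measures just defined, on a hypothetical price series.)

```python
import numpy as np

prices = np.array([100.0, 102.0, 101.0, 105.0])   # hypothetical prices

simple_returns = prices[1:] / prices[:-1] - 1.0   # r_t = (p_t - p_{t-1}) / p_{t-1}
log_returns = np.diff(np.log(prices))             # ln(p_t / p_{t-1})

print(simple_returns)   # [ 0.02  -0.0098...  0.0396...]
print(log_returns)      # close to the simple returns for small price moves
```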
Risk measures

- A risk measure is a quantitative characterization of our intuitive concept of risk (fear of uncertainty and loss).
- Risk is related to the stochastic nature of returns. Mathematically, it is (or should be) a convex functional of the pdf of returns.
- The appropriate choice may depend on the nature of the data (e.g. on their asymptotics) and on the context (investment, risk management, benchmarking, tracking, regulation, capital allocation).
The most obvious choice for a risk measure: Variance

- Variance is the average squared deviation from the average, a time-honoured statistical tool.
- Its use assumes that the probability distribution of the returns is sufficiently concentrated around the average, that there are no large fluctuations.
- This is true in several instances, but we often encounter fat tails: huge deviations with a non-negligible probability (e.g. Black Monday).
Alternative risk measures

- There are several alternative risk measures in the academic literature, practice, and regulation:
- Value at risk (VaR): the best among the p% worst losses (not convex, punishes diversification)
- Mean absolute deviation (MAD): used by Algorithmics
- Coherent risk measures (promoted by academics)
- Expected shortfall (ES): average loss beyond a high threshold
- Maximal loss (ML): the single worst case
Portfolios

- A portfolio is a linear combination (a weighted average) of assets, with a set of weights w_i that add up to unity (the budget constraint).
- The weights are not necessarily positive: short selling.
- The fact that the weights can be negative means that the region over which we are trying to determine the optimal portfolio is not bounded.
The variance of a portfolio

- σ_P² = Σ_{i,j} w_i σ_ij w_j: a quadratic form of the weights. The coefficients of this form are the elements of the covariance matrix σ_ij that measures the co-movements between the various assets.
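(A minimal sketch of this quadratic form, with an assumed 3-asset covariance matrix; not data from the talk.)

```python
import numpy as np

C = np.array([[0.04, 0.01, 0.00],
              [0.01, 0.09, 0.02],
              [0.00, 0.02, 0.16]])   # hypothetical covariance matrix
w = np.array([0.5, 0.3, 0.2])        # budget constraint: w.sum() == 1

portfolio_variance = w @ C @ w       # sigma_P^2 = w^T C w
print(portfolio_variance)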
Markowitz portfolio selection theory

- Rational portfolio selection realizes the tradeoff between risk and reward by minimizing the risk functional (e.g. the variance) over the weights, given the expected return, the budget constraint, and possibly other constraints. Here we will consider the global minimum risk portfolio, omitting the constraint on the expected return.
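(Sketch, assuming the variance as risk measure: the global minimum variance weights have the well-known closed form w = C⁻¹1 / (1ᵀC⁻¹1), obtained from the Lagrangian of the budget constraint.)

```python
import numpy as np

def global_min_variance_weights(C):
    """Closed-form minimiser of w^T C w subject to sum(w) = 1."""
    ones = np.ones(C.shape[0])
    x = np.linalg.solve(C, ones)   # C^{-1} 1
    return x / x.sum()             # normalise to satisfy the budget constraint

C = np.array([[0.04, 0.01, 0.00],
              [0.01, 0.09, 0.02],
              [0.00, 0.02, 0.16]])   # hypothetical covariance matrix
w = global_min_variance_weights(C)
print(w, w.sum())   # weights may be negative: short positions are allowed
```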
Information parsimony

- If we do not have enough information, we cannot make a good decision.
- In the context of portfolio selection this embarrassing truism translates into the requirement that the sample size (the length of the time series) T must be much larger than the size of the portfolio (the number of assets) N, in order for us to be able to construct a good portfolio.
- For a large portfolio this condition is not easy to satisfy (the sampling frequency cannot be high, T cannot be very large).
- Therefore, in real life N and T may well be of the same order of magnitude, and it is appropriate to consider the limit where N/T is of the order of unity, while both N and T are large (go to infinity).
- In this limit we should expect large estimation errors: the optimal portfolio weights will be very unstable, there will be huge sample-to-sample fluctuations, and huge prediction errors (see the sketch below).
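(A sketch of the sample-to-sample fluctuations, under the assumption of iid standard normal returns, so the true covariance is the identity and the true optimal weights are all 1/N; the N, T values are hypothetical.)

```python
import numpy as np

rng = np.random.default_rng(0)
N, T = 50, 75            # N/T = 2/3: uncomfortably close to the critical ratio 1

def gmv_weights(returns):
    """Global minimum variance weights estimated from a finite sample."""
    C = np.cov(returns, rowvar=False)
    x = np.linalg.solve(C, np.ones(C.shape[0]))
    return x / x.sum()

w1 = gmv_weights(rng.standard_normal((T, N)))   # sample 1
w2 = gmv_weights(rng.standard_normal((T, N)))   # sample 2, same true process

print(np.abs(w1 - 1.0 / N).max())   # large deviation from the true weights 1/N
print(np.abs(w1 - w2).max())        # large sample-to-sample fluctuation
```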
- In I. K., Sz. Pafka, G. Nagy: Noise sensitivity of portfolio selection under various risk measures, Journal of Banking and Finance, 31, 1545-1573 (2007) we found:
- If there are no constraints on the portfolio weights other than the budget constraint, the fluctuations actually diverge, that is, the estimation error becomes infinite, at a critical value of the ratio N/T.
- The critical value of N/T depends on the risk measure in question.
- For the variance and the mean absolute deviation the critical ratio is (N/T)_crit = 1; for Maximal Loss (ML, the best combination of the worst losses) (N/T)_crit = 1/2.
- If the risk measure in question depends on a parameter α, then the critical N/T value will also depend on that parameter, and we obtain a critical curve (a phase diagram) in the α - N/T plane.
- For example, Expected Shortfall is the average loss above a high threshold α. (ML is the α → 1 limit of ES.) The phase boundary for ES runs below 1/2 (the critical ratio N/T < 1/2 for any α).
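(An illustration of the divergence at (N/T)_crit = 1 for the variance, again assuming iid standard normal returns: the true risk of the sample-optimized portfolio, relative to the true optimum 1/N, blows up as N/T approaches 1. The sizes and trial count are hypothetical.)

```python
import numpy as np

rng = np.random.default_rng(1)
N = 40

def out_of_sample_ratio(N, T, rng):
    # iid standard normal returns: true covariance = identity,
    # true optimal weights = 1/N, true minimal variance = 1/N.
    R = rng.standard_normal((T, N))
    C_hat = np.cov(R, rowvar=False)
    x = np.linalg.solve(C_hat, np.ones(N))
    w = x / x.sum()
    return (w @ w) / (1.0 / N)   # true risk of estimated weights / true optimum

for T in [400, 80, 50, 44]:      # N/T = 0.1, 0.5, 0.8, ~0.91
    ratios = [out_of_sample_ratio(N, T, rng) for _ in range(50)]
    print(f"N/T = {N/T:.2f}  mean ratio = {np.mean(ratios):.2f}")
```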
- In addition, for finite N and T, the portfolio optimization problem for ES and ML does not always have a solution even below the critical N/T ratio! (These risk measures may become unbounded.)
- For finite N and T, the existence of the optimum is a probabilistic issue; it depends on the sample. The probability of the existence of the solution has been determined analytically for ML, and numerically for ES.
- As N and T → ∞ with N/T fixed, this probability goes to 1 resp. 0, according to whether N/T is below or above (N/T)_crit.
Illustration: the case of Maximal Loss

- Definition of the problem (for simplicity, we are looking for the global minimum and allow unlimited short selling):

  min over w of max over t of ( - Σ_i w_i x_it ), subject to Σ_i w_i = 1,

- where the w's are the portfolio weights and the x's the returns.
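(This minimax problem has a standard linear programming reformulation: introduce an auxiliary variable u bounding every period loss from above and minimize u. The sketch below, which is an illustration rather than the authors' code, assumes scipy is available.)

```python
import numpy as np
from scipy.optimize import linprog

def maximal_loss_portfolio(X):
    """Minimise max_t(-w.x_t) subject to sum(w) = 1, with unbounded weights.

    X has shape (T, N): T return observations for N assets.
    """
    T, N = X.shape
    c = np.r_[np.zeros(N), 1.0]            # variables z = (w_1..w_N, u); minimise u
    A_ub = np.c_[-X, -np.ones(T)]          # -x_t.w - u <= 0 for every period t
    b_ub = np.zeros(T)
    A_eq = np.r_[np.ones(N), 0.0].reshape(1, -1)   # budget constraint on w
    b_eq = [1.0]
    return linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
                   bounds=[(None, None)] * (N + 1))

rng = np.random.default_rng(2)
res = maximal_loss_portfolio(rng.standard_normal((40, 10)))   # N/T = 0.25 < 1/2
print(res.x[:-1] if res.status == 0 else "no finite optimum")
```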
Probability of finding a solution for the minimax problem (for elliptic underlying distributions)

- In the limit N, T → ∞, with N/T fixed, the transition becomes sharp at N/T = 1/2.
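(A Monte Carlo sketch of this feasibility probability, assuming Gaussian data; the LP solver reports an unbounded problem when no finite optimum exists. An illustration under hypothetical sizes, not the authors' code.)

```python
import numpy as np
from scipy.optimize import linprog

def ml_feasible(X):
    """True if the minimax (Maximal Loss) LP has a finite optimum."""
    T, N = X.shape
    c = np.r_[np.zeros(N), 1.0]
    res = linprog(c,
                  A_ub=np.c_[-X, -np.ones(T)], b_ub=np.zeros(T),
                  A_eq=np.r_[np.ones(N), 0.0].reshape(1, -1), b_eq=[1.0],
                  bounds=[(None, None)] * (N + 1))
    return res.status == 0          # status 3 would mean "unbounded"

rng = np.random.default_rng(3)
N, trials = 20, 40
for T in [80, 50, 40, 33, 25]:      # N/T between 0.25 and 0.8
    p = np.mean([ml_feasible(rng.standard_normal((T, N))) for _ in range(trials)])
    print(f"N/T = {N/T:.2f}  P(solution exists) ~ {p:.2f}")
```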
- The phase boundary for ES has been obtained numerically in I. K., Sz. Pafka, G. Nagy: Noise sensitivity of portfolio selection under various risk measures, Journal of Banking and Finance, 31, 1545-1573 (2007), and calculated analytically in A. Ciliberti, I. K., and M. Mézard: On the Feasibility of Portfolio Optimization under Expected Shortfall, Quantitative Finance, 7, 389-396 (2007).
- The estimation error diverges as one approaches the phase boundary from below.
- The intuitive explanation for the instability of ES and ML is that for a given finite sample there may exist a dominant item (or a dominant combination of items) that produces a larger return at each time point than any of the others, even if no such dominance relationship exists between them on very large samples. This leads the investor to believe that if she goes extremely long in the dominant item and extremely short in the rest, she can produce an arbitrarily large return on the portfolio, at a risk that goes to minus infinity (i.e. no risk).
- The same consideration can be extended to any coherent risk measure.
- Evidently, the effect critically depends on the weights being unbounded. Constraints on short selling and other limits will be considered later.
Coherent measures on a given sample

- Such apparent arbitrage can show up for any coherent risk measure. (I. K. and I. Varga-Haszonits: Feasibility of portfolio optimization under coherent risk measures, submitted to Quantitative Finance)
- Assume that the finite sample estimator ρ of our risk measure satisfies the coherence axioms (Ph. Artzner, F. Delbaen, J. M. Eber, and D. Heath: Coherent measures of risk, Mathematical Finance, 9, 203-228 (1999)):
- monotonicity: X ≥ Y implies ρ(X) ≤ ρ(Y)
- subadditivity: ρ(X + Y) ≤ ρ(X) + ρ(Y)
- positive homogeneity: ρ(λX) = λρ(X) for λ > 0
- translation invariance: ρ(X + a) = ρ(X) - a
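(As an aside, one can check numerically that the historical ES estimator is indeed subadditive on any sample; the snippet below is an assumed illustration, not part of the talk.)

```python
import numpy as np

def historical_es(losses, alpha=0.95):
    """Average of the worst (1 - alpha) fraction of losses (historical ES)."""
    losses = np.sort(losses)[::-1]               # largest losses first
    k = max(1, int(np.ceil((1 - alpha) * len(losses))))
    return losses[:k].mean()

rng = np.random.default_rng(4)
x, y = rng.standard_normal(10_000), rng.standard_normal(10_000)

# Subadditivity, one of the coherence axioms: rho(X + Y) <= rho(X) + rho(Y).
print(historical_es(x + y) <= historical_es(x) + historical_es(y))   # True
```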
The formal statements corresponding to the above intuition

- Proposition 1. If there exist two portfolios u and v such that u dominates v on the sample (u yields at least as high a return as v at every time point, and a strictly higher one at least once), then the portfolio optimisation task has no solution under any coherent measure.
- Proposition 2. Optimisation under ML has no solution if and only if there exists a pair of portfolios such that one of them strictly dominates the other.
- Neither of these propositions assumes anything about the underlying distribution.
Further generalization

- As a matter of fact, this type of instability appears even beyond the set of coherent risk measures, and may appear in downside risk measures in general.
- By far the most widely used risk measure today is Value at Risk (VaR). It is a downside measure. It is not convex, therefore the stability problem of its historical estimator is ill-posed.
- Parametric VaR, however, is convex, and this allows us to study the stability problem. Along with VaR, we also look into the closely related parametric estimates for two other downside risk measures: ES and semivariance.
- Parametric estimates are expected to be more stable than historical ones. We will then be able to compare the phase diagrams for the historical and parametric ES.
Parametric estimation of VaR, ES, and semivariance

- For simplicity, we assume that the historical data are fitted to a Gaussian underlying process.
- For a Gaussian process all three risk measures can be written in the form ρ = A_α σ_P - μ_P, where σ_P and μ_P are the standard deviation and the mean of the portfolio return, and the constant A_α depends on the risk measure and the confidence level α.
- Here A_α is expressible through the error function (the Gaussian distribution function).
- The condition for the existence of an optimum for VaR and ES is that A_α be large enough: A_α > μ(u)/σ(u) for every zero-budget portfolio u (Σ_i u_i = 0), since otherwise the objective can be pushed to minus infinity by adding a large multiple of such a u.
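(The Gaussian formulas for VaR and ES are textbook results consistent with the ρ = A_α σ - μ form above; the snippet, an illustration rather than the authors' code, computes them with scipy.)

```python
import numpy as np
from scipy.stats import norm

def gaussian_var(mu, sigma, alpha=0.99):
    # VaR_alpha = sigma * Phi^{-1}(alpha) - mu for Gaussian returns
    return sigma * norm.ppf(alpha) - mu

def gaussian_es(mu, sigma, alpha=0.99):
    # ES_alpha = sigma * phi(Phi^{-1}(alpha)) / (1 - alpha) - mu
    z = norm.ppf(alpha)
    return sigma * norm.pdf(z) / (1.0 - alpha) - mu

print(gaussian_var(0.0, 1.0), gaussian_es(0.0, 1.0))   # ES >= VaR always
```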
- Note that there is no unconditional optimum even if we know the underlying process exactly.
- It can be shown that the meaning of the condition is similar to the previous one (think e.g. of a portfolio with one exceptionally high return item that has a variance comparable to the others).
- If we do not know the true process, but assume it is, say, a Gaussian, we may estimate its mean returns and covariances from the observed finite time series as μ̂_i = (1/T) Σ_t x_it and σ̂_ij = (1/T) Σ_t (x_it - μ̂_i)(x_jt - μ̂_j).
- Assume, for simplicity, that all the mean returns are zero. After a long and tedious application of the replica method imported from the theory of random systems, the solvability condition works out to be an upper bound on the ratio N/T, of the form N/T < (N/T)_crit(α), for all three risk measures. Note that this is stronger than the solvability condition for the exactly known process.
- For the semivariance the critical N/T ratio is 1/3, which means that for the parametrically estimated semivariance we need at least three times larger samples than the size of the portfolio.
For the parametric VaR and ES the result is shown in the figure (the phase diagram in the α - N/T plane).
- In the region above the respective phase boundaries the optimization problem does not have a solution.
- In the region below the phase boundary there is a solution, but for it to be a good approximation to the true risk we must go deep into the feasible region. If we go to the phase boundary from below, the estimation error diverges.
- The phase boundary for ES runs above that of VaR, so for a given confidence level α the critical ratio for ES is larger than for VaR (we need less data in order to have a solution). For practically important values of α (95-99%) the difference is not significant.
Parametric vs. historical estimates

- The parametric ES curve runs above the historical one: we need less data to have a solution when the risk is estimated parametrically than when we use raw historical data. It seems as if we had some additional information in the parametric approach.
- Where does this information come from?
- It is injected into the calculation "by hand" when fitting the data to an independently chosen probability distribution.
Adding linear constraints

- In practice, portfolio optimization is always subject to some constraints on the allowed range of the weights, such as a ban on short selling and/or limits on various assets, industrial sectors, regions, etc. These constraints restrict the region over which the optimum is sought to a finite volume where no infinite fluctuations can appear. One might then think that under such constraints the instability discussed above disappears completely.
- This is not so. If we work in the vicinity of the phase boundary, sample-to-sample fluctuations in the weights will still be large, but the constraints will prevent the solution from running away to infinity. Instead, it will stick to the walls of the allowed region.
- For example, under a ban on short selling (w_i ≥ 0) these walls will be the coordinate planes, and as N/T increases, more and more of the weights will become zero (see the sketch below). This phenomenon is well known in portfolio optimization. (B. Scherer, R. D. Martin: Introduction to Modern Portfolio Optimization with NUOPT and S-PLUS, Springer, New York (2005))
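(A sketch of this pinning effect, assuming iid Gaussian returns and using scipy's SLSQP as a generic constrained optimizer; sizes and tolerance are hypothetical.)

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(5)
N, T = 30, 45                      # N/T = 2/3, close to the unstable region

R = rng.standard_normal((T, N))    # true covariance = identity
C_hat = np.cov(R, rowvar=False)

res = minimize(lambda w: w @ C_hat @ w,           # minimise sample variance
               x0=np.full(N, 1.0 / N),
               bounds=[(0.0, None)] * N,          # ban on short selling
               constraints=[{"type": "eq", "fun": lambda w: w.sum() - 1.0}],
               method="SLSQP")

print((res.x < 1e-6).sum(), "of", N, "weights pinned to zero")
```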
- This spontaneous reduction of diversification is entirely due to estimation error and does not reflect any real structure of the objective function.
- In addition, for the next sample a completely different set of weights will become zero: the solution keeps jumping about on the walls of the allowed region.
- Clearly, in this situation the solution reflects the structure of the limit system (i.e. the portfolio manager's beliefs), rather than the structure of the market. Therefore, whenever we are working in or close to the unstable region (which is almost always), the constraints only mask rather than cure the instability.
Closing remarks on portfolio selection

- Given the nature of the portfolio optimization task, one will typically work in that region of parameter space where sample fluctuations are large. Since the critical point where these fluctuations diverge depends on the risk measure, the confidence level, and on the method of estimation, one must be aware of how close one's working point is to the critical boundary, otherwise one will be grossly misled by the unstable algorithm.
- Downside risk measures have been introduced because they ignore positive fluctuations, which investors are not supposed to be afraid of. Perhaps they should be: the downside risk measures display the instability described here, which is basically due to a false arbitrage alert, and may induce an investor to take very large positions on the basis of fragile information stemming from finite samples. In a way, the global disaster engulfing us is a macroscopic example of such a folly.
II. THE PROBLEM OF ESTIMATION ERROR IN MODEL BUILDING
Portfolio optimization is equivalent to Linear Regression
- Linear regression is a standard framework in which to attempt to construct a first statistical model.
- It is ubiquitous (microarrays, medical sciences, epidemiology, sociology, macroeconomics, etc.)
- It has a time-honored history and works fine, especially if the independent variables are few, there are enough data, and they are drawn from a tight distribution (such as a Gaussian). A minimal example is sketched below.
- Complications arise if we have a large number of explanatory variables (their number grows at a rate of about 5% per decade), and a limited number of data (as almost always).
- Then we face a serious estimation error problem.
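(The benign regime, as an assumed illustration: ordinary least squares with few variables and plenty of data recovers the coefficients accurately.)

```python
import numpy as np

rng = np.random.default_rng(6)
N, T = 5, 1000                      # few variables, plenty of data: N << T

beta_true = rng.standard_normal(N)
X = rng.standard_normal((T, N))
y = X @ beta_true + 0.1 * rng.standard_normal(T)   # small observation noise

beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)   # ordinary least squares
print(np.abs(beta_hat - beta_true).max())          # small: OLS works well here
```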
Assume we know the underlying process and minimize the residual error for an infinitely large sample
In practice we can only minimize the residual error for a sample of length T
The relative error q0

- This is a measure of the estimation error.
- It is a random variable; it depends on the sample.
- Its distribution strongly depends on the ratio N/T, where N is the number of dimensions and T the sample size.
- The average of q0 diverges at a critical value of N/T!
Critical behaviour for N, T large, with N/T fixed

- The average of q0 diverges at the critical point N/T = 1, just as in portfolio theory.
- The regression coefficients fluctuate wildly unless N/T ≪ 1. Geometric interpretation: one cannot fit a plane to one point.
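(A sketch of this divergence, under the assumption of Gaussian design and unit noise; the squared coefficient error of OLS blows up as N/T approaches 1. Sizes and trial count are hypothetical.)

```python
import numpy as np

rng = np.random.default_rng(7)
N = 50

def coefficient_error(N, T, rng):
    beta = rng.standard_normal(N)
    X = rng.standard_normal((T, N))
    y = X @ beta + rng.standard_normal(T)
    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    return np.sum((beta_hat - beta) ** 2)

for T in [500, 100, 60, 55]:        # N/T = 0.1, 0.5, 0.83, ~0.91
    errs = [coefficient_error(N, T, rng) for _ in range(30)]
    print(f"N/T = {N/T:.2f}  mean squared coefficient error = {np.mean(errs):.2f}")
```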
CONCLUDING REMARKS ON MODELING COMPLEX SYSTEMS
- Normally, one is supposed to work in the N ≪ T limit, i.e. with low dimensional problems and plenty of data.
- Complex systems are very high dimensional and irreducible (incompressible); they require a large number of explanatory variables for their faithful representation.
- Therefore, we have to face the unconventional situation in the regression problem that N ~ T, or even N > T, and then the error in the regression coefficients will be large.
- If the number of explanatory variables is very large and they are all of the same order of magnitude, then there is no structure in the system, it is just noise (like a completely random string). So we have to assume that some of the variables have a larger weight than others, but we do not have a natural cutoff beyond which it would be safe to forget about the higher order variables. This leads us to the assumption that the regression coefficients must have a scale free, power law like distribution for complex systems.
- How can we understand that, in the social sciences, medical sciences, etc., we are getting away with insufficient statistics, even with N > T?
- We are projecting external information into our statistical assessments. (I can draw a well-determined straight line across even a single point, if I know that it must be parallel to another line.)
- Humans do not optimize, but use quick and dirty heuristics. This has an evolutionary meaning: if something looks vaguely like a leopard, one jumps, rather than trying to seek the optimal fit of the observed fragments of the picture to a leopard.
- Prior knowledge, the larger picture, values, deliberate or unconscious bias, etc. are essential features of model building.
- When we have a chance to check this prior knowledge millions of times in carefully designed laboratory experiments, this is a well-justified procedure.
- In several applications (macroeconomics, medical sciences, epidemiology, etc.) there is no way to perform these laboratory checks, and errors may build up as one uncertain piece of knowledge serves as a prior for another uncertain statistical model. This is how we construct myths, ideologies and social theories.
- It is conceivable that theory building (in the sense of constructing a low dimensional model) for social phenomena will prove to be impossible, and the best we will be able to do is to build a life-size computer model of the system, a kind of gigantic SimCity, or Borges map.
- By playing and experimenting with these models we may develop an intuition about their complex behaviour that we couldn't gain by observing the single sample of a society or economy.