1
Portfolio Selection, Multivariate Regression,
and Complex Systems
  • Imre Kondor
  • Collegium Budapest and Eötvös University,
    Budapest
  • IUPAP STATPHYS23 Conference
  • Genova, Italy, July 9-13, 2007

2
Coworkers
  • Szilárd Pafka (Paycom.net, California)
  • Gábor Nagy (CIB Bank, Budapest)
  • Nándor Gulyás (Collegium Budapest)
  • István Varga-Haszonits (Morgan Stanley Fixed
    Income, Budapest)
  • Andrea Ciliberti (Science et Finance, Paris)
  • Marc Mézard (Orsay University)
  • Stefan Thurner (Vienna University)

3
Summary
  • The subject of the talk lies at the crossroads of
    finance, statistical physics, and statistics
  • The main message:
  • - portfolio selection is highly unstable: the
    estimation error diverges at a critical value of
    the ratio of the portfolio size N to the length
    of the time series T,
  • - this divergence is an algorithmic phase
    transition that is characterized by universal
    scaling laws,
  • - multivariate regression is equivalent to
    quadratic optimization, so concepts, methods, and
    results can be carried over to the regression
    problem,
  • - when applied to complex phenomena, the
    classical problems with regression (hidden
    variables, correlations, non-Gaussian noise) are
    compounded by the large number of explanatory
    variables and the scarcity of data,
  • - so modelling is often attempted in the vicinity
    of, or even below, the critical point.

4
Rational portfolio selection seeks a tradeoff
between risk and reward
  • In this talk I will focus on equity portfolios
  • Financial reward can be measured in terms of the
    return (relative gain) or the logarithmic return
  • The characterization of risk is more
    controversial
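In standard notation, with $p_t$ the asset price at time $t$, these two reward measures are:

```latex
r_t = \frac{p_t - p_{t-1}}{p_{t-1}},
\qquad
r_t^{\log} = \ln \frac{p_t}{p_{t-1}}
```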

5
The most obvious choice for a risk measure:
Variance
  • Its use as a risk measure assumes that the
    probability distribution of returns is
    sufficiently concentrated around the average,
    i.e. that there are no large fluctuations
  • This is true in several instances, but we often
    encounter fat tails: huge deviations with
    non-negligible probability, which necessitate
    the use of alternative risk measures

7
Portfolios
  • A portfolio is a linear combination (a weighted
    average) of assets, with a set of weights wi that
    add up to unity (the budget constraint)
  • The weights are not necessarily positive: short
    selling is allowed
  • The fact that the weights can be arbitrary means
    that the region over which we are trying to
    determine the optimal portfolio is not bounded
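Written out (standard notation; $x_i$ denotes the return of asset $i$):

```latex
r_P = \sum_{i=1}^{N} w_i\, x_i ,
\qquad
\sum_{i=1}^{N} w_i = 1 \quad \text{(budget constraint)}
```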

10
Markowitz portfolio selection theory
  • The tradeoff between risk and reward is realized
    by minimizing the variance over the weights,
    given the expected return, the budget constraint,
    and possibly other constraints.
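In standard form ($\sigma_{ij}$ is the covariance matrix of returns, $\mu_i$ the expected returns, and $\mu$ the required expected portfolio return):

```latex
\min_{\{w_i\}} \sum_{i,j=1}^{N} w_i\,\sigma_{ij}\,w_j
\quad \text{subject to} \quad
\sum_{i=1}^{N} w_i = 1 ,
\qquad
\sum_{i=1}^{N} w_i\,\mu_i = \mu
```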

11
How do we know the returns and the covariances?
  • In principle, from observations on the market
  • If the portfolio contains N assets, we need O(N²)
    data
  • The input data come from T observations for N
    assets
  • The estimation error is negligible as long as
    NT >> N², i.e. N << T
  • This condition is often violated in practice

16
Information deficit
  • Thus the Markowitz problem suffers from the
    "curse of dimensions", or from an information
    deficit
  • The estimates will contain error and the
    resulting portfolios will be suboptimal

18
Fighting the curse of dimensions
  • Economists have been struggling with this problem
    for ages. Since the root of the problem is lack
    of sufficient information, the remedy is to
    inject external info into the estimate. This
    means imposing some structure on σ. This
    introduces bias, but the beneficial effect of
    noise reduction may compensate for it.
  • Examples:
  • - single-factor models (ß's)
  • - multi-factor models
  • - grouping by sectors
  • - principal component analysis
  • - Bayesian shrinkage estimators, etc.
  • - random matrix theory
  • All these help to various degrees. Most studies
    are based on empirical data.
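As one concrete illustration of the shrinkage idea in the list above, here is a minimal sketch (not from the talk) using scikit-learn's Ledoit-Wolf estimator; the synthetic data have identity true covariance, so the estimation error can be measured exactly:

```python
# Sketch of shrinkage estimation (assumes numpy and scikit-learn are installed).
import numpy as np
from sklearn.covariance import LedoitWolf

N, T = 100, 150
rng = np.random.default_rng(2)
X = rng.standard_normal((T, N))            # T observations, true covariance = I

sample_cov = X.T @ X / T                   # raw, noisy estimate
lw_cov = LedoitWolf().fit(X).covariance_   # shrunk toward a scaled identity

eye = np.eye(N)
print("error of sample covariance:", np.linalg.norm(sample_cov - eye))
print("error of shrunk  covariance:", np.linalg.norm(lw_cov - eye))
```

The shrinkage introduces bias but typically cuts the overall estimation error substantially, which is exactly the bias/noise tradeoff described above.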

19
Our approach
  • Analytical: applying the methods of statistical
    physics (random matrix theory, phase transition
    theory, replicas, etc.)
  • Numerical: to test the noise sensitivity of
    various risk measures we use simulated data
  • The rationale is that in order to compare the
    sensitivity of various risk measures to noise, we
    had better get rid of other sources of
    uncertainty, such as non-stationarity. This can
    be achieved by using artificial data where we
    have total control over the underlying stochastic
    process.
  • For simplicity, we mostly use iid normal
    variables in the following.

23
  • For such simple underlying processes the exact
    risk measure can be calculated.
  • To construct the empirical risk measure, we
    generate long time series and cut out segments
    of length T from them, as if making observations
    on the market.
  • From these observations we construct the
    empirical risk measure and optimize our portfolio
    under it.

26
The ratio q₀ of the empirical and the exact risk
measure is a measure of the estimation error due
to noise
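As a concrete illustration of this procedure, here is a minimal simulation sketch (not the authors' code; numpy only, iid standard normal returns, variance as the risk measure, and q₀ taken in the convention where the true risk of the sample-optimized portfolio is compared with the true optimum):

```python
# Minimal sketch of the simulation experiment. Assumptions: iid standard
# normal returns (true covariance = identity), variance risk measure,
# minimum-variance portfolio under the budget constraint sum(w) = 1.
import numpy as np

def min_variance_weights(cov):
    # w = C^{-1} 1 / (1' C^{-1} 1): minimum-variance weights with sum(w) = 1
    ones = np.ones(cov.shape[0])
    x = np.linalg.solve(cov, ones)
    return x / x.sum()

def q0(N, T, rng):
    # One convention for q0: true risk of the sample-optimized portfolio
    # relative to the true optimum (w_i = 1/N, variance 1/N for C = identity).
    returns = rng.standard_normal((T, N))   # T observations of N assets
    sample_cov = returns.T @ returns / T    # noisy empirical covariance
    w = min_variance_weights(sample_cov)
    return np.sqrt((w @ w) / (1.0 / N))     # w' C w with C = identity

rng = np.random.default_rng(0)
for T in (1000, 200, 120, 105):             # N/T creeps up toward 1
    print(f"N/T = {100 / T:.2f}   q0 = {q0(100, T, rng):.2f}")
```

The printed ratio stays close to 1 for T >> N and blows up as N/T approaches 1, which is the divergence discussed on the next slides.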
27
  • The relative error of the optimal portfolio
    is a random variable, fluctuating from sample to
    sample.
  • The weights of the optimal portfolio also
    fluctuate.

28
The distribution of q₀ over the samples
29
Critical behaviour for N, T large, with N/T fixed
  • The average of q₀ as a function of N/T can be
    calculated from random matrix theory: it diverges
    at the critical point N/T = 1
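For the variance risk measure and iid data the random-matrix result takes the following form (a sketch of the known formula; conventions for $q_0$ vary slightly across papers):

```latex
\langle q_0 \rangle = \frac{1}{\sqrt{1 - N/T}}
\;\longrightarrow\; \infty
\qquad \text{as } N/T \to 1
```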

30
The standard deviation of the estimation error
diverges even more strongly than the average
  • It also diverges at the critical point
    r = N/T = 1

31
Instability of the weights: the weights of a
portfolio of N = 100 iid normal variables for a
given sample, T = 500
32
The distribution of weights in a given sample
  • The optimization hardly determines the weights
    even far from the critical point!
  • The standard deviation of the weights relative to
    their exact average value also diverges at the
    critical point

33
If short selling is banned
  • If the weights are constrained to be positive,
    the instability will manifest itself by more and
    more weights becoming zero: the portfolio
    spontaneously reduces its size!
  • Explanation: the solution would like to run away,
    but the constraints prevent it from doing so,
    so it sticks to the walls.
  • Similar effects are observed if we impose any
    other linear constraints, such as limits on
    sectors, etc.
  • It is clear that in these cases the solution is
    determined more by the constraints (and the
    experts who impose them) than by the objective
    function.
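A minimal sketch of this effect (an illustration, not the authors' code; it assumes scipy is available and uses the noisy sample covariance of iid normal data):

```python
# Short-selling ban experiment: minimize the variance built from a noisy
# sample covariance with sum(w) = 1 and w_i >= 0, then count zero weights.
import numpy as np
from scipy.optimize import minimize

N, T = 50, 75                                # N/T = 2/3, fairly noisy
rng = np.random.default_rng(1)
returns = rng.standard_normal((T, N))
cov = returns.T @ returns / T                # noisy empirical covariance

res = minimize(
    lambda w: w @ cov @ w,                   # portfolio variance
    x0=np.full(N, 1.0 / N),                  # start from equal weights
    bounds=[(0.0, None)] * N,                # no short selling
    constraints=[{"type": "eq", "fun": lambda w: w.sum() - 1.0}],
    method="SLSQP",
)
n_zero = int(np.sum(res.x < 1e-6))
print(f"{n_zero} of {N} weights driven to zero")
```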

37
If the variables are not iid
  • Experimenting with various market models
    (one-factor, market plus sectors, positive and
    negative covariances, etc.) shows that the main
    conclusion does not change: a manifestation of
    universality
  • Overwhelmingly positive correlations tend to
    enhance the instability, negative ones decrease
    it, but they do not change the power of the
    divergence, only its prefactor

39
After filtering the noise is much reduced, and we
can even penetrate into the region below the
critical point T < N. BUT the weights remain
extremely unstable even after filtering
40
Similar studies under alternative risk measures:
mean absolute deviation, expected shortfall, and
maximal loss
  • Lead to similar conclusions, except that the
    effect of estimation error is even more serious
  • In addition, no convincing filtering methods
    exist for these measures
  • In the case of coherent measures the existence of
    a solution becomes a probabilistic issue,
    depending on the sample
  • Calculation of this probability leads to some
    intriguing problems in random geometry that can
    be solved by the replica method.

41
A wider context
  • The critical phenomena we observe in portfolio
    selection are analogous to the phase transitions
    discovered recently in some hard computational
    problems; they represent a new random Gaussian
    universality class within this family, where a
    number of modes go soft in rapid succession as
    one approaches the critical point.
  • Filtering corresponds to discarding these soft
    modes.

43
  • The appearance of powerful tools borrowed from
    statistical physics (random matrices, phase
    transition concepts, scaling, universality,
    replicas) is an important development that
    enriches finance theory

44
More generally
  • The sampling error catastrophe, due to lack of
    sufficient information, appears in a much wider
    set of problems than just the problem of
    investment decisions (multivariate regression,
    stochastic linear programming, and all their
    applications).
  • Whenever a phenomenon is influenced by a large
    number of factors, but we have a limited amount
    of information about this dependence, we have to
    expect that the estimation error will diverge and
    fluctuations over the samples will be huge.

45
Optimization and statistical mechanics
  • Any convex optimization problem can be
    transformed into a problem in statistical
    mechanics by promoting the cost (objective,
    target) function to a Hamiltonian and
    introducing a fictitious temperature. At the end
    we can recover the original problem in the limit
    of zero temperature.
  • Averaging over the time series segments (samples)
    is similar to what is called quenched averaging
    in the statistical physics of random systems: one
    has to average the logarithm of the partition
    function (i.e. the cumulant generating function).
  • Averaging can then be performed by the replica
    trick.
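Schematically (a sketch in standard statistical-mechanics notation: $H(w)$ is the cost function promoted to a Hamiltonian, $\beta$ the fictitious inverse temperature, and the overbar the quenched average over samples):

```latex
Z(\beta) = \int \mathrm{d}w \; e^{-\beta H(w)},
\qquad
\min_{w} H(w) = -\lim_{\beta \to \infty} \frac{1}{\beta} \ln Z(\beta),
\qquad
\overline{\ln Z} = \lim_{n \to 0} \frac{\overline{Z^{\,n}} - 1}{n}
```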

46
Portfolio optimization and linear regression
  • Portfolios

47
Linear regression

48
Equivalence of the two
49
Translation
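A sketch of the mapping (notation chosen here for illustration, assuming zero-mean returns $x_i$): eliminating the budget constraint via $w_N = 1 - \sum_{i<N} w_i$ turns variance minimization into ordinary least squares:

```latex
\min_{\sum_i w_i = 1} \operatorname{Var}\!\Big(\sum_{i=1}^{N} w_i x_i\Big)
=
\min_{v_1,\dots,v_{N-1}} \mathbb{E}\Big[\Big(y - \sum_{i=1}^{N-1} v_i z_i\Big)^{2}\Big],
\qquad
y \equiv x_N, \quad z_i \equiv x_N - x_i
```

Under this dictionary, assets correspond to explanatory variables, portfolio weights to regression coefficients, and the length T of the time series to the sample size, so the same divergence appears at N/T = 1 in both problems.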
50
Minimizing the residual error for an infinitely
large sample
51
Minimizing the residual error for a sample of
length T
52
The relative error
53
Summary
  • If we do not have sufficient information, we
    cannot make an intelligent decision, nor can we
    build a good model; so far this is a triviality
  • The important message here is that there is a
    critical point in both the optimization problem
    and the regression problem where the error
    diverges, and its behaviour is subject to
    universal scaling laws

54
A few remarks on modeling complex systems
55
  • Normally, one is supposed to work in the N << T
    limit, i.e. with low-dimensional problems and
    plenty of data.
  • Modern portfolio management (e.g. in hedge funds)
    forces us to consider very large portfolios, but
    the amount of input information is always
    limited. So we have N ~ T, or even N > T.
  • Complex systems are very high dimensional and
    irreducible (incompressible); they require a
    large number of explanatory variables for their
    faithful representation.
  • The dimensionality of the minimal model providing
    an acceptable representation of a system can be
    regarded as a measure of the complexity of the
    system. (Cf. the Kolmogorov-Chaitin measure of
    the complexity of a string, or Jorge Luis
    Borges' map.)

56
  • Therefore, we have to face the unconventional
    situation, also in the regression problem, that
    N ~ T or N > T, and then the error in the
    regression coefficients will be large.
  • If the number of explanatory variables is very
    large and they are all of the same order of
    magnitude, then there is no structure in the
    system; it is just noise (like a completely
    random string). So we have to assume that some of
    the variables have a larger weight than others,
    but we do not have a natural cutoff beyond which
    it would be safe to forget about the higher-order
    variables. This leads us to the assumption that
    the regression coefficients must have a
    scale-free, power-law-like distribution for
    complex systems.

57
  • The regression coefficients are proportional to
    the covariances of the dependent and independent
    variables. A power-law-like distribution of the
    regression coefficients implies the same for the
    covariances.
  • In a physical system this translates into a
    power-law-like distribution of the correlations.
  • The usual behaviour of correlations in simple
    systems is not like this: correlations typically
    fall off exponentially.

58
  • Exceptions: systems at a critical point, or
    systems with a broken continuous symmetry. Both
    of these are very special cases, however.
  • Correlations in a spin glass decay like a power,
    without any continuous symmetry!
  • Power-law-like behaviour of correlations is
    typical in the spin glass phase, not only on
    average, but for each sample.
  • A related phenomenon is what is called chaos in
    spin glasses.
  • The long-range correlations and the multiplicity
    of ground states explain the extreme sensitivity
    of the ground states: the system reacts to any
    slight external disturbance, but the statistical
    properties of the new ground state are the same
    as before; this is a kind of adaptation or
    learning process.

59
  • Other complex systems? Adaptation, learning,
    evolution, and self-reflexivity cannot be
    expected to appear in systems with a
    translationally invariant and all-ferromagnetic
    coupling. Some of the characteristic features of
    spin glasses (competition and cooperation, the
    existence of many metastable equilibria,
    sensitivity, long-range correlations) seem to be
    necessary minimal properties of any complex
    system.
  • This also means that we will always face the
    information deficit catastrophe when we try to
    build a model for a complex system.

60
  • How can we understand that people (in the social
    sciences, medical sciences, etc.) are getting
    away with lousy statistics, even with N > T?
  • They are projecting external information into
    their statistical assessments. (I can draw a
    well-determined straight line across even a
    single point, if I know that it must be parallel
    to another line.)
  • Humans do not optimize, but use quick and dirty
    heuristics. This has an evolutionary meaning: if
    something looks vaguely like a leopard, one
    jumps, rather than trying to seek the optimal fit
    of the observed fragments of the picture to a
    leopard.

61
  • Prior knowledge, the larger picture, deliberate
    or unconscious bias, etc. are essential features
    of model building.
  • When we have a chance to check this prior
    knowledge millions of times in carefully designed
    laboratory experiments, this is a well-justified
    procedure.
  • In several applications (macroeconomics, medical
    sciences, epidemiology, etc.) there is no way to
    perform these laboratory checks, and errors may
    build up as one uncertain piece of knowledge
    serves as a prior for another uncertain
    statistical model. This is how we construct
    myths, ideologies, and social theories.

62
  • It is conceivable that theory building, in the
    sense of constructing a low-dimensional model,
    for social phenomena will prove to be impossible,
    and the best we will be able to do is to build a
    life-size computer model of the system, a kind of
    gigantic SimCity.
  • It remains to be seen what we will mean by
    "understanding" under those circumstances.