1
Portfolio Selection, Multivariate Regression,
and Complex Systems
  • Imre Kondor
  • Collegium Budapest and Eötvös University,
    Budapest
  • IUPAP STATPHYS23 Conference
  • Genova, Italy, July 9-13, 2007

2
Coworkers
  • Szilárd Pafka (Paycom.net, California)
  • Gábor Nagy (CIB Bank, Budapest)
  • Nándor Gulyás (Collegium Budapest)
  • István Varga-Haszonits (Morgan Stanley Fixed
    Income, Budapest)
  • Andrea Ciliberti (Science et Finance, Paris)
  • Marc Mézard (Orsay University)
  • Stefan Thurner (Vienna University)

3
Summary
  • The subject of the talk lies at the crossroads of
    finance, statistical physics, and statistics
  • The main message:
  • - portfolio selection is highly unstable: the
    estimation error diverges at a critical value of
    the ratio of the portfolio size N to the length
    of the time series T,
  • - this divergence is an algorithmic phase
    transition that is characterized by universal
    scaling laws,
  • - multivariate regression is equivalent to
    quadratic optimization, so concepts, methods, and
    results can be carried over to the regression
    problem,
  • - when applied to complex phenomena, the
    classical problems with regression (hidden
    variables, correlations, non-Gaussian noise) are
    compounded by the large number of explanatory
    variables and the scarcity of data,
  • - so modelling is often attempted in the vicinity
    of, or even below, the critical point.

4
Rational portfolio selection seeks a tradeoff
between risk and reward
  • In this talk I will focus on equity portfolios
  • Financial reward can be measured in terms of the
    return (relative gain) or the logarithmic return
  • The characterization of risk is more
    controversial
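In standard notation, with $p_t$ the asset price at time $t$, these two reward measures are:

```latex
r_t = \frac{p_t - p_{t-1}}{p_{t-1}},
\qquad
r_t^{\log} = \ln \frac{p_t}{p_{t-1}}
```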

5
The most obvious choice for a risk measure:
Variance
  • Its use as a risk measure assumes that the
    probability distribution of returns is
    sufficiently concentrated around the average,
    i.e. that there are no large fluctuations
  • This is true in several instances, but we often
    encounter fat tails: huge deviations with
    non-negligible probability, which necessitate
    the use of alternative risk measures

7
Portfolios
  • A portfolio is a linear combination (a weighted
    average) of assets, with a set of weights wi that
    add up to unity (the budget constraint)
  • The weights are not necessarily positive: short
    selling is allowed
  • The fact that the weights can be arbitrary means
    that the region over which we are trying to
    determine the optimal portfolio is not bounded
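Written out (standard notation; $x_i$ denotes the return of asset $i$):

```latex
r_P = \sum_{i=1}^{N} w_i\, x_i ,
\qquad
\sum_{i=1}^{N} w_i = 1 \quad \text{(budget constraint)}
```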

10
Markowitz portfolio selection theory
  • The tradeoff between risk and reward is realized
    by minimizing the variance over the weights,
    given the expected return, the budget constraint,
    and possibly other constraints.
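In standard form ($\sigma_{ij}$ is the covariance matrix of returns, $\mu_i$ the expected returns, and $\mu$ the required expected portfolio return):

```latex
\min_{\{w_i\}} \sum_{i,j=1}^{N} w_i\,\sigma_{ij}\,w_j
\quad \text{subject to} \quad
\sum_{i=1}^{N} w_i = 1 ,
\qquad
\sum_{i=1}^{N} w_i\,\mu_i = \mu
```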

11
How do we know the returns and the covariances?
  • In principle, from observations on the market
  • If the portfolio contains N assets, we need O(N²)
    data
  • The input data come from T observations for N
    assets
  • The estimation error is negligible as long as
    NT >> N², i.e. N << T
  • This condition is often violated in practice

16
Information deficit
  • Thus the Markowitz problem suffers from the
    "curse of dimensions", or from an information
    deficit
  • The estimates will contain error and the
    resulting portfolios will be suboptimal

18
Fighting the curse of dimensions
  • Economists have been struggling with this problem
    for ages. Since the root of the problem is lack
    of sufficient information, the remedy is to
    inject external info into the estimate. This
    means imposing some structure on σ. This
    introduces bias, but the beneficial effect of
    noise reduction may compensate for it.
  • Examples:
  • - single-factor models (ß's)
  • - multi-factor models
  • - grouping by sectors
  • - principal component analysis
  • - Bayesian shrinkage estimators, etc.
  • - random matrix theory
  • All these help to various degrees. Most studies
    are based on empirical data.
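As one concrete illustration of the shrinkage idea in the list above, here is a minimal sketch (not from the talk) using scikit-learn's Ledoit-Wolf estimator; the synthetic data have identity true covariance, so the estimation error can be measured exactly:

```python
# Sketch of shrinkage estimation (assumes numpy and scikit-learn are installed).
import numpy as np
from sklearn.covariance import LedoitWolf

N, T = 100, 150
rng = np.random.default_rng(2)
X = rng.standard_normal((T, N))            # T observations, true covariance = I

sample_cov = X.T @ X / T                   # raw, noisy estimate
lw_cov = LedoitWolf().fit(X).covariance_   # shrunk toward a scaled identity

eye = np.eye(N)
print("error of sample covariance:", np.linalg.norm(sample_cov - eye))
print("error of shrunk  covariance:", np.linalg.norm(lw_cov - eye))
```

The shrinkage introduces bias but typically cuts the overall estimation error substantially, which is exactly the bias/noise tradeoff described above.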

19
Our approach
  • Analytical: applying the methods of statistical
    physics (random matrix theory, phase transition
    theory, replicas, etc.)
  • Numerical: to test the noise sensitivity of
    various risk measures we use simulated data
  • The rationale is that in order to compare the
    sensitivity of various risk measures to noise, we
    had better get rid of other sources of
    uncertainty, such as non-stationarity. This can
    be achieved by using artificial data where we
    have total control over the underlying stochastic
    process.
  • For simplicity, we mostly use iid normal
    variables in the following.

23
  • For such simple underlying processes the exact
    risk measure can be calculated.
  • To construct the empirical risk measure, we
    generate long time series and cut out segments
    of length T from them, as if making observations
    on the market.
  • From these observations we construct the
    empirical risk measure and optimize our portfolio
    under it.

26
The ratio q₀ of the empirical and the exact risk
measure is a measure of the estimation error due
to noise
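As a concrete illustration of this procedure, here is a minimal simulation sketch (not the authors' code; numpy only, iid standard normal returns, variance as the risk measure, and q₀ taken in the convention where the true risk of the sample-optimized portfolio is compared with the true optimum):

```python
# Minimal sketch of the simulation experiment. Assumptions: iid standard
# normal returns (true covariance = identity), variance risk measure,
# minimum-variance portfolio under the budget constraint sum(w) = 1.
import numpy as np

def min_variance_weights(cov):
    # w = C^{-1} 1 / (1' C^{-1} 1): minimum-variance weights with sum(w) = 1
    ones = np.ones(cov.shape[0])
    x = np.linalg.solve(cov, ones)
    return x / x.sum()

def q0(N, T, rng):
    # One convention for q0: true risk of the sample-optimized portfolio
    # relative to the true optimum (w_i = 1/N, variance 1/N for C = identity).
    returns = rng.standard_normal((T, N))   # T observations of N assets
    sample_cov = returns.T @ returns / T    # noisy empirical covariance
    w = min_variance_weights(sample_cov)
    return np.sqrt((w @ w) / (1.0 / N))     # w' C w with C = identity

rng = np.random.default_rng(0)
for T in (1000, 200, 120, 105):             # N/T creeps up toward 1
    print(f"N/T = {100 / T:.2f}   q0 = {q0(100, T, rng):.2f}")
```

The printed ratio stays close to 1 for T >> N and blows up as N/T approaches 1, which is the divergence discussed on the next slides.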
27
  • The relative error of the optimal portfolio
    is a random variable, fluctuating from sample to
    sample.
  • The weights of the optimal portfolio also
    fluctuate.

28
The distribution of q₀ over the samples
29
Critical behaviour for N, T large, with N/T fixed
  • The average of q₀ as a function of N/T can be
    calculated from random matrix theory: it diverges
    at the critical point N/T = 1
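For the variance risk measure and iid data the random-matrix result takes the following form (a sketch of the known formula; conventions for $q_0$ vary slightly across papers):

```latex
\langle q_0 \rangle = \frac{1}{\sqrt{1 - N/T}}
\;\longrightarrow\; \infty
\qquad \text{as } N/T \to 1
```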

30
The standard deviation of the estimation error
diverges even more strongly than the average
  • It also diverges at the critical point
    r = N/T = 1

31
Instability of the weights: the weights of a
portfolio of N = 100 iid normal variables for a
given sample, T = 500
32
The distribution of weights in a given sample
  • The optimization hardly determines the weights
    even far from the critical point!
  • The standard deviation of the weights relative to
    their exact average value also diverges at the
    critical point

33
If short selling is banned
  • If the weights are constrained to be positive,
    the instability will manifest itself by more and
    more weights becoming zero: the portfolio
    spontaneously reduces its size!
  • Explanation: the solution would like to run away,
    but the constraints prevent it from doing so,
    so it sticks to the walls.
  • Similar effects are observed if we impose any
    other linear constraints, such as limits on
    sectors, etc.
  • It is clear that in these cases the solution is
    determined more by the constraints (and the
    experts who impose them) than by the objective
    function.
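A minimal sketch of this effect (an illustration, not the authors' code; it assumes scipy is available and uses the noisy sample covariance of iid normal data):

```python
# Short-selling ban experiment: minimize the variance built from a noisy
# sample covariance with sum(w) = 1 and w_i >= 0, then count zero weights.
import numpy as np
from scipy.optimize import minimize

N, T = 50, 75                                # N/T = 2/3, fairly noisy
rng = np.random.default_rng(1)
returns = rng.standard_normal((T, N))
cov = returns.T @ returns / T                # noisy empirical covariance

res = minimize(
    lambda w: w @ cov @ w,                   # portfolio variance
    x0=np.full(N, 1.0 / N),                  # start from equal weights
    bounds=[(0.0, None)] * N,                # no short selling
    constraints=[{"type": "eq", "fun": lambda w: w.sum() - 1.0}],
    method="SLSQP",
)
n_zero = int(np.sum(res.x < 1e-6))
print(f"{n_zero} of {N} weights driven to zero")
```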

37
If the variables are not iid
  • Experimenting with various market models
    (one-factor, market plus sectors, positive and
    negative covariances, etc.) shows that the main
    conclusion does not change: a manifestation of
    universality
  • Overwhelmingly positive correlations tend to
    enhance the instability, negative ones decrease
    it, but they do not change the power of the
    divergence, only its prefactor

39
After filtering the noise is much reduced, and we
can even penetrate into the region below the
critical point T < N. BUT the weights remain
extremely unstable even after filtering
40
Similar studies under alternative risk measures:
mean absolute deviation, expected shortfall, and
maximal loss
  • Lead to similar conclusions, except that the
    effect of estimation error is even more serious
  • In addition, no convincing filtering methods
    exist for these measures
  • In the case of coherent measures the existence of
    a solution becomes a probabilistic issue,
    depending on the sample
  • Calculation of this probability leads to some
    intriguing problems in random geometry that can
    be solved by the replica method.

41
A wider context
  • The critical phenomena we observe in portfolio
    selection are analogous to the phase transitions
    discovered recently in some hard computational
    problems; they represent a new random Gaussian
    universality class within this family, where a
    number of modes go soft in rapid succession as
    one approaches the critical point.
  • Filtering corresponds to discarding these soft
    modes.

43
  • The appearance of powerful tools borrowed from
    statistical physics (random matrices, phase
    transition concepts, scaling, universality,
    replicas) is an important development that
    enriches finance theory

44
More generally
  • The sampling error catastrophe, due to lack of
    sufficient information, appears in a much wider
    set of problems than just the problem of
    investment decisions (multivariate regression,
    stochastic linear programming, and all their
    applications).
  • Whenever a phenomenon is influenced by a large
    number of factors, but we have a limited amount
    of information about this dependence, we have to
    expect that the estimation error will diverge and
    fluctuations over the samples will be huge.

45
Optimization and statistical mechanics
  • Any convex optimization problem can be
    transformed into a problem in statistical
    mechanics by promoting the cost (objective,
    target) function to a Hamiltonian and
    introducing a fictitious temperature. At the end
    we can recover the original problem in the limit
    of zero temperature.
  • Averaging over the time series segments (samples)
    is similar to what is called quenched averaging
    in the statistical physics of random systems: one
    has to average the logarithm of the partition
    function (i.e. the cumulant generating function).
  • Averaging can then be performed by the replica
    trick.
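Schematically (a sketch in standard statistical-mechanics notation: $H(w)$ is the cost function promoted to a Hamiltonian, $\beta$ the fictitious inverse temperature, and the overbar the quenched average over samples):

```latex
Z(\beta) = \int \mathrm{d}w \; e^{-\beta H(w)},
\qquad
\min_{w} H(w) = -\lim_{\beta \to \infty} \frac{1}{\beta} \ln Z(\beta),
\qquad
\overline{\ln Z} = \lim_{n \to 0} \frac{\overline{Z^{\,n}} - 1}{n}
```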

46
Portfolio optimization and linear regression
  • Portfolios

47
Linear regression

48
Equivalence of the two
49
Translation
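A sketch of the mapping (notation chosen here for illustration, assuming zero-mean returns $x_i$): eliminating the budget constraint via $w_N = 1 - \sum_{i<N} w_i$ turns variance minimization into ordinary least squares:

```latex
\min_{\sum_i w_i = 1} \operatorname{Var}\!\Big(\sum_{i=1}^{N} w_i x_i\Big)
=
\min_{v_1,\dots,v_{N-1}} \mathbb{E}\Big[\Big(y - \sum_{i=1}^{N-1} v_i z_i\Big)^{2}\Big],
\qquad
y \equiv x_N, \quad z_i \equiv x_N - x_i
```

Under this dictionary, assets correspond to explanatory variables, portfolio weights to regression coefficients, and the length T of the time series to the sample size, so the same divergence appears at N/T = 1 in both problems.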
50
Minimizing the residual error for an infinitely
large sample
51
Minimizing the residual error for a sample of
length T
52
The relative error
53
Summary
  • If we do not have sufficient information, we
    cannot make an intelligent decision, nor can we
    build a good model; so far this is a triviality
  • The important message here is that there is a
    critical point in both the optimization problem
    and the regression problem where the error
    diverges, and its behaviour is subject to
    universal scaling laws

54
A few remarks on modeling complex systems
55
  • Normally, one is supposed to work in the N << T
    limit, i.e. with low-dimensional problems and
    plenty of data.
  • Modern portfolio management (e.g. in hedge funds)
    forces us to consider very large portfolios, but
    the amount of input information is always
    limited. So we have N ~ T, or even N > T.
  • Complex systems are very high dimensional and
    irreducible (incompressible); they require a
    large number of explanatory variables for their
    faithful representation.
  • The dimensionality of the minimal model providing
    an acceptable representation of a system can be
    regarded as a measure of the complexity of the
    system. (Cf. the Kolmogorov-Chaitin measure of
    the complexity of a string, or Jorge Luis
    Borges' map.)

56
  • Therefore, we have to face the unconventional
    situation, also in the regression problem, that
    N ~ T or N > T, and then the error in the
    regression coefficients will be large.
  • If the number of explanatory variables is very
    large and they are all of the same order of
    magnitude, then there is no structure in the
    system; it is just noise (like a completely
    random string). So we have to assume that some of
    the variables have a larger weight than others,
    but we do not have a natural cutoff beyond which
    it would be safe to forget about the higher-order
    variables. This leads us to the assumption that
    the regression coefficients must have a
    scale-free, power-law-like distribution for
    complex systems.

57
  • The regression coefficients are proportional to
    the covariances of the dependent and independent
    variables. A power-law-like distribution of the
    regression coefficients implies the same for the
    covariances.
  • In a physical system this translates into a
    power-law-like distribution of the correlations.
  • The usual behaviour of correlations in simple
    systems is not like this: correlations typically
    fall off exponentially.

58
  • Exceptions: systems at a critical point, or
    systems with a broken continuous symmetry. Both
    of these are very special cases, however.
  • Correlations in a spin glass decay like a power,
    without any continuous symmetry!
  • Power-law-like behaviour of correlations is
    typical in the spin glass phase, not only on
    average, but for each sample.
  • A related phenomenon is what is called chaos in
    spin glasses.
  • The long-range correlations and the multiplicity
    of ground states explain the extreme sensitivity
    of the ground states: the system reacts to any
    slight external disturbance, but the statistical
    properties of the new ground state are the same
    as before; this is a kind of adaptation or
    learning process.

59
  • Other complex systems? Adaptation, learning,
    evolution, and self-reflexivity cannot be
    expected to appear in systems with a
    translationally invariant and all-ferromagnetic
    coupling. Some of the characteristic features of
    spin glasses (competition and cooperation, the
    existence of many metastable equilibria,
    sensitivity, long-range correlations) seem to be
    necessary minimal properties of any complex
    system.
  • This also means that we will always face the
    information deficit catastrophe when we try to
    build a model for a complex system.

60
  • How can we understand that people (in the social
    sciences, medical sciences, etc.) are getting
    away with lousy statistics, even with N > T?
  • They are projecting external information into
    their statistical assessments. (I can draw a
    well-determined straight line across even a
    single point, if I know that it must be parallel
    to another line.)
  • Humans do not optimize, but use quick and dirty
    heuristics. This has an evolutionary meaning: if
    something looks vaguely like a leopard, one
    jumps, rather than trying to seek the optimal fit
    of the observed fragments of the picture to a
    leopard.

61
  • Prior knowledge, the larger picture, deliberate
    or unconscious bias, etc. are essential features
    of model building.
  • When we have a chance to check this prior
    knowledge millions of times in carefully designed
    laboratory experiments, this is a well-justified
    procedure.
  • In several applications (macroeconomics, medical
    sciences, epidemiology, etc.) there is no way to
    perform these laboratory checks, and errors may
    build up as one uncertain piece of knowledge
    serves as a prior for another uncertain
    statistical model. This is how we construct
    myths, ideologies, and social theories.

62
  • It is conceivable that theory building, in the
    sense of constructing a low-dimensional model,
    for social phenomena will prove to be impossible,
    and the best we will be able to do is to build a
    life-size computer model of the system, a kind of
    gigantic SimCity.
  • It remains to be seen what we will mean by
    "understanding" under those circumstances.