Title: Robust and Rich
1Robust and Rich
- Roy Welsch, Xinfeng Zhou
- Massachusetts Institute of Technology
- Email: rwelsch_at_mit.edu
- International Conference on Robust Statistics
- Technical University of Lisbon
- 17 July 2006
3Notation
- $N$: number of risky assets
- $R_i$: return of the $i$th asset in the portfolio
- $w_i$: weight of the $i$th asset in the portfolio
- $\mu_i$: expected return of the $i$th asset
- $\Sigma$: covariance matrix of the returns of the $N$ assets
- $T$: number of time periods used for estimating $\Sigma$
- Note: $w_i$ can be negative for short sales.
4Mean-Variance Portfolio Optimization
- Portfolio return: $R_p = w_1 R_1 + w_2 R_2 + \cdots + w_N R_N$
- Expected portfolio return: $E(R_p) = w^T \mu$
- Variance of the portfolio return: $\mathrm{Var}(R_p) = w^T \Sigma w$
- Mean-variance portfolio optimization minimizes the variance of the portfolio return, $w^T \Sigma w$, for a given level of expected return $\mu_p$
- subject to $w^T \mu = \mu_p$, $w^T e = 1$
- where $e$ is the $N \times 1$ column vector with all elements equal to 1.
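With only the two equality constraints above, the problem has a closed-form solution through its Lagrangian (KKT) linear system. The following is a minimal sketch of that calculation, assuming NumPy; the function name and the three-asset inputs are made up for illustration and are not taken from the talk.

```python
import numpy as np

def mean_variance_weights(mu, sigma, mu_p):
    """Minimize w' Sigma w subject to w' mu = mu_p and w' e = 1."""
    n = len(mu)
    e = np.ones(n)
    # KKT system: 2 Sigma w - l1 mu - l2 e = 0, plus the two constraints.
    kkt = np.zeros((n + 2, n + 2))
    kkt[:n, :n] = 2.0 * sigma
    kkt[:n, n] = -mu
    kkt[:n, n + 1] = -e
    kkt[n, :n] = mu
    kkt[n + 1, :n] = e
    rhs = np.concatenate([np.zeros(n), [mu_p, 1.0]])
    return np.linalg.solve(kkt, rhs)[:n]

# Hypothetical inputs for three assets.
mu = np.array([0.08, 0.10, 0.12])
sigma = np.array([[0.040, 0.006, 0.004],
                  [0.006, 0.090, 0.010],
                  [0.004, 0.010, 0.160]])
w = mean_variance_weights(mu, sigma, mu_p=0.10)
print(w, w @ sigma @ w)
```

Because the weights depend on the inverse of the estimated covariance matrix, small changes in `mu` or `sigma` can move `w` substantially.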
5Problems with Mean-Variance
- Static: just one period.
- Sensitive to inputs. The expected returns and the covariance matrix are usually estimated from historical return data and are therefore subject to random estimation error.
- This sensitivity often leads to extreme portfolio weights and dramatic swings in weights with only minor changes in expected returns or the covariance matrix. This can lead to frequent rebalancing and excessive transaction costs.
- For stable covariance estimation, we prefer long historical time series (the number of assets, $N$, far smaller than the number of time periods, $T$). However, old historical data may not reflect current market dynamics.
- The underlying multivariate normal assumption may not hold.
6Some Solutions
- Factor models (CAPM, etc.), Bayesian shrinkage, GARCH models.
- Regularization (penalty) methods.
- Robust estimation of the expected return and the covariance matrix. We will focus on this.
- Combinations of the above methods.
7Fast-MCD
- The minimum covariance determinant (MCD) estimator proposed by Rousseeuw (1985) looks for the covariance matrix of $h$ data points ($T/2 \le h < T$) with the smallest determinant. The breakdown value is $(T - h)/T$.
- The resulting covariance matrix is biased. It can be made unbiased with a multiplicative correction factor, but such a factor has no effect on portfolio weight allocation.
- Exact MCD is not feasible for $N > 20$ in our situation. Fast-MCD, proposed by Rousseeuw and Van Driessen (1999), makes large $N$ (51 in our data) feasible.
- MCD retains affine equivariance.
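The core of Fast-MCD is the concentration step (C-step): refit the mean and covariance on the current $h$-subset, then keep the $h$ points with the smallest Mahalanobis distances, which never increases the determinant. The sketch below shows only that idea, assuming NumPy. It starts from random $h$-subsets and omits the small initial subsets and data partitioning of the published algorithm; the function names and synthetic data are illustrative.

```python
import numpy as np

def c_step(x, subset, h):
    """One concentration step: refit on `subset`, keep the h closest points."""
    mu = x[subset].mean(axis=0)
    cov = np.cov(x[subset], rowvar=False)
    diff = x - mu
    # Squared Mahalanobis distance of every observation.
    d2 = np.einsum("ij,jk,ik->i", diff, np.linalg.inv(cov), diff)
    return np.argsort(d2)[:h]

def mcd_sketch(x, h, n_starts=20, n_steps=10, seed=0):
    """Simplified MCD search: random starts, a few C-steps, keep the best."""
    rng = np.random.default_rng(seed)
    best_det, best = np.inf, None
    for _ in range(n_starts):
        subset = rng.choice(x.shape[0], size=h, replace=False)
        for _ in range(n_steps):
            subset = c_step(x, subset, h)
        cov = np.cov(x[subset], rowvar=False)
        det = np.linalg.det(cov)
        if det < best_det:
            best_det, best = det, (x[subset].mean(axis=0), cov)
    return best[0], best[1], best_det

# Synthetic data: 100 observations of 3 variables, 10% gross outliers.
rng = np.random.default_rng(0)
x = rng.normal(size=(100, 3))
x[:10] += 10.0
mu_mcd, cov_mcd, det_mcd = mcd_sketch(x, h=75)
print(mu_mcd, x.mean(axis=0))
```

In practice a maintained implementation such as scikit-learn's `MinCovDet` would be the more natural choice than this sketch.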
8Pairwise Robust Covariance
- If the affine equivariance requirement is dropped, faster robust pairwise covariance estimators are available.
- Khan et al. (2005) compared several approaches to robust pairwise covariance estimation while investigating ways to make least-angle regression (LARS) (Efron et al., 2004) robust. They found a two-step, two-dimensional Winsorization method to be effective and fast. We use a modified form of their idea, with an adjustment to ensure a positive definite covariance matrix.
9Huber Winsorization
- For each (time) vector of returns $x_i$, $i = 1, \dots, N$, a transformation is used to shrink outliers towards the median with the Huber function $H_c(x) = \min\{\max(-c, x),\, c\}$, $c > 0$.
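A minimal sketch of univariate Huber Winsorization follows, assuming NumPy. The exact transformation on the slide is not in the transcript, so the standardization by median and scaled MAD, the mapping back to the original scale, and the default `c=2.0` are assumptions made for illustration.

```python
import numpy as np

def huber(x, c):
    """Huber function H_c(x) = min(max(-c, x), c)."""
    return np.minimum(np.maximum(-c, x), c)

def huber_winsorize(x, c=2.0):
    """Shrink outliers in a 1-D return series towards its median."""
    med = np.median(x)
    # ASSUMPTION: scale by the MAD, multiplied by 1.4826 for normal consistency.
    mad = 1.4826 * np.median(np.abs(x - med))
    return med + mad * huber((x - med) / mad, c)

# Made-up return series with one large outlier in the last position.
returns = np.array([0.01, -0.02, 0.00, 0.015, -0.01, 0.25])
clean = huber_winsorize(returns)
print(clean)
```

Only the last observation is pulled in; values within `c` scaled MADs of the median are left unchanged.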
10Bivariate Winsorization
- Huber Winsorization fails to take the orientation of the bivariate data into consideration.
- Bivariate Winsorization (after centering) instead shrinks each pair $x_t = (x_{ti}, x_{tj})^T$ according to $D(x_t)$, its Mahalanobis distance based on some initial $\mu_0$, $\Sigma_0$, and a constant $c$.
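One bivariate Winsorization step might look like the sketch below, assuming NumPy. The slide's formula is not in the transcript, so the shrinkage factor $\min(\sqrt{c / D(x_t)}, 1)$ with $D$ the squared Mahalanobis distance, and the default $c = 5.99$ (the 95% quantile of a chi-square distribution with 2 degrees of freedom), are assumptions based on a common form of this transformation.

```python
import numpy as np

def bivariate_winsorize(x, mu0, sigma0, c=5.99):
    """Shrink pairs with squared Mahalanobis distance above c onto the ellipse D = c."""
    diff = x - mu0
    d2 = np.einsum("ti,ij,tj->t", diff, np.linalg.inv(sigma0), diff)
    # Points inside the ellipse get factor 1; the floor avoids division by zero.
    scale = np.minimum(np.sqrt(c / np.maximum(d2, 1e-12)), 1.0)
    return mu0 + diff * scale[:, None]

# Illustrative pairs: the second point runs against the positive correlation.
x = np.array([[0.5, 0.5], [3.0, -3.0], [0.0, 0.0]])
mu0 = np.zeros(2)
sigma0 = np.array([[1.0, 0.8], [0.8, 1.0]])
u = bivariate_winsorize(x, mu0, sigma0)
print(u)
```

Unlike coordinate-wise clipping, the point (3, -3) is shrunk strongly because it lies far from the main axis of the ellipse, not merely because each coordinate is large.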
11Iterated Bivariate Winsorization
- For each pair of variables $x_i$, $x_j$, compute initial estimates $\mu_0$, $\Sigma_0$.
- Let $x_t = (x_{ti}, x_{tj})^T$.
- For each $\mu_k$, $\Sigma_k$, calculate the Mahalanobis distance for each return pair and the corresponding weight.
12Iterated Bivariate Winsorization (continued)
- Update $\mu_{k+1}$, $\Sigma_{k+1}$
- until convergence.
- All pairwise covariances are combined to form an initial covariance matrix. This is converted to positive semi-definite form using a method due to Maronna and Zamar (2002).
- We call this I2D-Winsor.
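The iteration for a single pair of variables can be sketched as follows, assuming NumPy. The distance, weight and update formulas on the slides are not in the transcript, so everything specific here is an assumption: the median/MAD start, the weight $\min(\sqrt{c/D}, 1)$, and the update as the plain mean and covariance of the Winsorized pairs. It is one plausible reading, not the authors' exact procedure.

```python
import numpy as np

def iterated_bivariate_winsor(x, c=5.99, tol=1e-8, max_iter=100):
    """Iterate: distances -> shrinkage weights -> re-estimated mean and covariance."""
    mu = np.median(x, axis=0)
    mad = 1.4826 * np.median(np.abs(x - mu), axis=0)
    sigma = np.diag(mad ** 2)              # start: robust scales, zero correlation
    for _ in range(max_iter):
        diff = x - mu
        d2 = np.einsum("ti,ij,tj->t", diff, np.linalg.inv(sigma), diff)
        w = np.minimum(np.sqrt(c / np.maximum(d2, 1e-12)), 1.0)
        u = mu + diff * w[:, None]         # Winsorized pairs
        mu_new = u.mean(axis=0)
        sigma_new = np.cov(u, rowvar=False)
        done = np.max(np.abs(sigma_new - sigma)) < tol
        mu, sigma = mu_new, sigma_new
        if done:
            break
    return mu, sigma

# Synthetic pairs: correlation 0.8 plus 5% outliers against the main direction.
rng = np.random.default_rng(1)
clean = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.8], [0.8, 1.0]], size=475)
outliers = np.array([5.0, -5.0]) + rng.normal(0.0, 0.1, size=(25, 2))
pairs = np.vstack([clean, outliers])
mu_hat, sigma_hat = iterated_bivariate_winsor(pairs)
rho_robust = sigma_hat[0, 1] / np.sqrt(sigma_hat[0, 0] * sigma_hat[1, 1])
rho_classic = np.corrcoef(pairs, rowvar=False)[0, 1]
print(rho_robust, rho_classic)
```

Running this for every pair and assembling the results gives a matrix that need not be positive semi-definite, which is why a separate correction step is required.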
13Fast 2-D Winsorization
- Iteration is expensive, and Khan et al. (2005) proposed taking one bivariate Winsorization step from an improved starting $\Sigma_0$.
- They start with univariate Huber Winsorization but use two tuning constants, $c_1$ and $c_2$. The constant $c_1$ (chosen to be 2) is used in the two quadrants with the most data ($n_1$ points), and a second constant $c_2$, set from the ratio $n_2 / n_1$ with $n_2 = T - n_1$, is used in the remaining two quadrants.
- This pulls the Huber Winsorization boundary in where there is less data and a higher chance of data not following the ellipsoidal pattern of the main part of the data.
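The quadrant-dependent Winsorization can be sketched as below, assuming NumPy. The transcript shows only the ratio $n_2 / n_1$ for the second constant, so the scaling `c2 = c1 * n2 / n1` is an assumption, as are the median/MAD standardization and the function name.

```python
import numpy as np

def adjusted_winsorize(x, y, c1=2.0):
    """Huber-Winsorize two standardized series with a tighter bound in the minor quadrants."""
    def standardize(v):
        med = np.median(v)
        return (v - med) / (1.4826 * np.median(np.abs(v - med)))
    zx, zy = standardize(x), standardize(y)
    same_sign = zx * zy >= 0               # first and third quadrants
    n_same = int(same_sign.sum())
    major = same_sign if n_same >= len(zx) - n_same else ~same_sign
    n1 = int(major.sum())
    n2 = len(zx) - n1
    c2 = c1 * n2 / n1                      # ASSUMED scaling; transcript shows only n2 / n1
    c = np.where(major, c1, c2)            # per-observation clipping bound
    return np.clip(zx, -c, c), np.clip(zy, -c, c), (n1, n2, c2)

# Synthetic pair: correlation 0.8 plus 10 outliers in a minor quadrant.
rng = np.random.default_rng(2)
clean = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.8], [0.8, 1.0]], size=190)
data = np.vstack([clean, np.tile([4.0, -4.0], (10, 1))])
ux, uy, (n1, n2, c2) = adjusted_winsorize(data[:, 0], data[:, 1])
r_adjusted = np.corrcoef(ux, uy)[0, 1]     # initial correlation estimate
r_classic = np.corrcoef(data[:, 0], data[:, 1])[0, 1]
print(n1, n2, c2, r_adjusted, r_classic)
```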
15Fast 2-D
- The classical correlation is computed on this
Winsorized data and all pairwise correlations
form the full initial correlation matrix which,
if necessary, is made positive definite. Then one
step of bivariate Winsorization is used and this
new matrix is again made positive definite. We
call this F2D-Winsor.
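A matrix assembled from pairwise correlations can have negative eigenvalues. The sketch below, assuming NumPy, shows one simple repair: clip the eigenvalues at a small positive floor and rescale to a unit diagonal. This is a plain eigenvalue-clipping substitute chosen for brevity, not the Maronna and Zamar (2002) method and not necessarily the adjustment used in the talk; the function name and `eps` are illustrative.

```python
import numpy as np

def clip_to_pd_correlation(r, eps=1e-6):
    """Clip eigenvalues at eps, then rescale so the diagonal is exactly 1."""
    r = (r + r.T) / 2.0                       # enforce symmetry
    vals, vecs = np.linalg.eigh(r)
    fixed = (vecs * np.maximum(vals, eps)) @ vecs.T
    d = np.sqrt(np.diag(fixed))
    return fixed / np.outer(d, d)

# Pairwise "correlations" that are mutually inconsistent (one eigenvalue is -0.8).
r_pairwise = np.array([[1.0, 0.9, -0.9],
                       [0.9, 1.0, 0.9],
                       [-0.9, 0.9, 1.0]])
r_fixed = clip_to_pd_correlation(r_pairwise)
print(np.linalg.eigvalsh(r_pairwise), np.linalg.eigvalsh(r_fixed))
```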
16Historical Data
- Daily returns on 51 MSCI US industry sector indexes from 01/03/1995 to 02/07/2005 (2600 days of data). Broader than the S&P 500. We need to find the weights to use on each of the $N = 51$ indexes in our portfolio.
- Rebalance as follows: estimate sector weights using the most recent $T = 100$ daily returns, and rebalance every 5 trading days. With 2600 days there are 500 rebalances. Trading costs (when used) are 5 cents for each 100 dollars bought or sold.
- We use the following constraints: $w^T \mu = \mu_p$, $w^T e = 1$, $-1 \le w_i \le 1$.
- The market portfolio consists of all individual stocks (about 700) in the 51 indexes weighted by market capitalization.
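The rebalancing arithmetic can be checked with a short sketch. The window length, step and number of days come from the slide; the day indexing, the cost reading (5 cents per 100 dollars traded, i.e. a rate of 0.0005) and the helper name are assumptions.

```python
T_WINDOW = 100    # estimation window in trading days (from the slide)
STEP = 5          # rebalance every 5 trading days
N_DAYS = 2600     # days of data

# Rebalance on day d using the returns of days [d - T_WINDOW, d).
rebalance_days = list(range(T_WINDOW, N_DAYS, STEP))
windows = [(d - T_WINDOW, d) for d in rebalance_days]

# ASSUMPTION: "5 cents for each 100" is read as 5 cents per 100 dollars traded.
COST_RATE = 0.05 / 100.0

def trading_cost(w_old, w_new):
    """Cost, as a fraction of portfolio value, of moving from w_old to w_new."""
    return COST_RATE * sum(abs(a - b) for a, b in zip(w_old, w_new))

print(len(rebalance_days))   # 500
```

The first 100 days serve only as the initial estimation window, which is why 2600 days yield (2600 - 100) / 5 = 500 rebalances.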
17Financial Performance Measures
- mean: the sample mean of weekly ex-post returns.
- STD: the sample standard deviation of weekly ex-post returns.
- Information ratio (annualized).
- $\alpha$-VaR ($\alpha$ = 5%, 1%): the sample $\alpha$-quantile of the weekly ex-post returns distribution.
- $\alpha$-CVaR ($\alpha$ = 5%, 1%): the sample conditional mean of the weekly ex-post returns distribution, given that the returns are below the $\alpha$th quantile.
- Max DD: the maximum drawdown, which is the maximum loss in a week.
- CRet: cumulative return.
- Turnover: weekly asset turnover, defined as the mean of the absolute weight changes for 500 updates.
- CRet_cost: cumulative return with transaction costs.
- IRcost: information ratio with transaction costs.
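Several of these measures can be computed as in the sketch below, assuming NumPy. Where the slide leaves details open, the code makes labeled assumptions: CVaR averages returns at or below the sample quantile, turnover sums absolute weight changes across assets before averaging over rebalances, and cumulative return is compounded. The information ratio is omitted because its formula is not in the transcript.

```python
import numpy as np

def var_cvar(returns, alpha):
    """Sample alpha-quantile (VaR) and mean of the returns at or below it (CVaR)."""
    q = np.quantile(returns, alpha)
    return q, returns[returns <= q].mean()

def max_weekly_loss(returns):
    """'Max DD' as defined on the slide: the worst single weekly return."""
    return returns.min()

def turnover(weights):
    """ASSUMPTION: sum |weight change| over assets, then average over rebalances."""
    return np.abs(np.diff(weights, axis=0)).sum(axis=1).mean()

def cumulative_return(returns):
    """ASSUMPTION: returns are compounded rather than summed."""
    return np.prod(1.0 + returns) - 1.0

# Made-up weekly returns and weight history (rows = rebalances, columns = assets).
weekly = np.array([0.012, -0.031, 0.004, 0.020, -0.015, 0.007, -0.002, 0.011])
weights = np.array([[0.5, 0.5], [0.6, 0.4], [0.6, 0.4]])
var5, cvar5 = var_cvar(weekly, 0.05)
print(var5, cvar5, max_weekly_loss(weekly), turnover(weights), cumulative_return(weekly))
```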
18Winsorization Results
19Contamination Models
- MCD
- Each row (time observation) comes either from $F_0$ or from $H$. This implies either a bad day on the market (all stocks) or a high correlation among stocks. In fact, this is rarely true.
- Pairwise
- Pairwise correlation permits a more flexible error model. Unusual market returns only explain a small part of observed outliers. Industrial factors and idiosyncratic risk specific to individual stocks or groups of stocks explain a majority of the outlying data.
20Too Much Turnover?
- The mean-variance portfolio optimization problem can be re-expressed as a least-squares problem subject to $w^T \mu = \mu_p$ and $w^T e = 1$.
- One way to possibly reduce turnover would be to penalize deviations from the market weights, $m_j$, and, at the same time, look for sparse solutions that do not invest any funds in some securities. The LASSO (Tibshirani, 1996) does exactly this.
- More robust loss functions such as $L_1$ and Huber may also be used instead of least squares, but they did not change the results significantly.
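To illustrate how an $L_1$ penalty on deviations from target weights produces solutions that stay at the target in most coordinates, the sketch below solves a generic problem, $\min_b \tfrac{1}{2}\|y - Xb\|^2 + \lambda \|b - m\|_1$, by coordinate descent, assuming NumPy. It is not the talk's exact formulation: the two equality constraints are omitted, and `x`, `y`, `m` and the function names are synthetic stand-ins.

```python
import numpy as np

def soft_threshold(z, lam):
    """Shrink z towards zero by lam, setting small values exactly to zero."""
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

def lasso_towards(x, y, m, lam, n_sweeps=200):
    """Minimize 0.5 * ||y - x b||^2 + lam * ||b - m||_1 by coordinate descent."""
    d = np.zeros(x.shape[1])           # d = b - m, deviations from the target m
    resid = y - x @ m                  # residual when d = 0
    col_ss = (x ** 2).sum(axis=0)
    for _ in range(n_sweeps):
        for j in range(x.shape[1]):
            rho = x[:, j] @ resid + col_ss[j] * d[j]
            new = soft_threshold(rho, lam) / col_ss[j]
            resid -= x[:, j] * (new - d[j])
            d[j] = new
    return m + d

# Synthetic example: the true coefficients differ from m in two positions only.
rng = np.random.default_rng(3)
x = rng.normal(size=(100, 8))
m = np.full(8, 1.0 / 8.0)              # hypothetical target ("market") weights
b_true = m.copy()
b_true[0] += 1.0
b_true[7] -= 1.0
y = x @ b_true + 0.1 * rng.normal(size=100)
lam_max = np.max(np.abs(x.T @ (y - x @ m)))   # at or above this, b equals m
b_mid = lasso_towards(x, y, m, lam=0.5 * lam_max)
b_big = lasso_towards(x, y, m, lam=1.01 * lam_max)
print(np.round(b_mid - m, 3))
```

Larger values of `lam` keep more coordinates exactly at the target, which is the mechanism by which such a penalty can reduce turnover. With `m` set to zero the same code gives the ordinary lasso, with sparsity in the weights themselves.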
21Penalization
- To implement this we solve the penalized problem (Lauprête, 2001) and use 5-fold cross-validation to find $\lambda$ based on prediction error for the out-of-sample data.
- The recently developed LARS (least-angle regression) algorithm (Efron et al., 2004) greatly speeds up computations for the Lasso, since solutions for all $\lambda$ can be found in about the same time as one least-squares regression. This removes the need for a (non-specific) grid search on $\lambda$.
22Penalty Results
- We end up with slightly better performance and
dramatically lower turnover.
23Run Times
- 500 rebalancings:
- V: 40 seconds
- F2D-Winsor: 35 minutes
- I2D-Winsor: 3 hours
- Fast-MCD: 10 hours
- V1: 4 hours
24Next Steps
- Combine robust covariance and penalty approaches.
- Use individual stocks (about 700) instead of 51 sector index funds. Then $T = 100 \ll N = 700$.
- Fast algorithms.
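The difficulty with $T \ll N$ is that the sample covariance matrix of $T$ observations has rank at most $T - 1$, so it is singular and cannot be inverted for the weight calculation. A small check, assuming NumPy and using synthetic Gaussian data in place of real stock returns:

```python
import numpy as np

T, N = 100, 700                               # sizes quoted on the slide
rng = np.random.default_rng(0)
returns = rng.normal(size=(T, N))             # synthetic stand-in for daily returns
sample_cov = np.cov(returns, rowvar=False)    # N x N sample covariance
rank = np.linalg.matrix_rank(sample_cov)
print(rank, "<", N)
```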
25References
- Alqallaf, F.A., et al. (2002) Scalable robust covariance and correlation estimates for data mining, Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Statistical methods, 14-23.
- Efron, B., Hastie, T., Johnstone, I. and Tibshirani, R. (2004) Least Angle Regression, Annals of Statistics, 32, 407-499.
- Khan, J., Van Aelst, S. and Zamar, R. (2005) Robust linear model selection based on Least Angle Regression, Technical Report, Department of Statistics, University of British Columbia.
- Lauprête, G.J. (2001) Portfolio risk minimization under departures from normality, MIT, Ph.D. Thesis.
- Maronna, R.A. and Zamar, R.H. (2002) Robust estimates of location and dispersion for high-dimensional datasets, Technometrics, 44(4), 307-317.
26References (continued)
- Rousseeuw, P.J. and Van Driessen, K. (1999) A fast algorithm for the minimum covariance determinant estimator, Technometrics, 41(3), 212-223.
- Tibshirani, R. (1996) Regression Shrinkage and Selection via the Lasso, Journal of the Royal Statistical Society, Series B, 58, 267-288.
- Zhou, X. (2006) Application of Robust Statistics to Asset Allocation Models, MIT, M.S. Thesis.