Portfolio Stress Lab | QuantifiedTrader

Abstract

When investors talk about portfolio risk, they often mix together three quite different questions without realising it. The first is historical: what actually happened to this allocation during past crises such as the global financial crisis, the COVID shock, or the 2022 bond selloff? The second is hypothetical: if rates spike, volatility surges, or markets fall sharply, how sensitive is the portfolio before any new data arrives? The third is forward-looking: given where we stand today, what is the range of plausible outcomes over the next several months or years? Each question is legitimate, but none of them alone is sufficient. History does not cover shocks that have not happened yet; hypothetical shocks ignore the specific path markets took; and forward models depend on assumptions that may or may not hold.

This study brings those perspectives together in a single stress-testing laboratory. Rather than building one new portfolio from scratch, it synthesises results from five existing allocation programs that already represent different ways of thinking about risk. One program studies global strategic asset allocation across roughly thirty-seven ETFs and eight portfolio frameworks, from a classic 60/40 to risk parity and max-Sharpe constructions. Another benchmarks twelve quantitative optimisation rules on US equities and global macro indices. A third applies machine-learning return views on top of mean-variance optimisation for US stocks. A fourth simulates forward paths with GARCH volatility dynamics on a US index blend. A fifth selects emerging-market stocks using fundamental scoring. Taken together, they span passive beta, optimisation-driven weights, forecasting overlays, stochastic volatility, and fundamental stock selection.

The analysis is organised in four layers that build on one another. We begin with historical crisis replay, measuring how each sleeve actually behaved in documented stress windows. We then apply parametric and compound stress operators that shock wealth levels and volatility in controlled ways. Next we estimate tail risk using historical, parametric, and Monte Carlo Value-at-Risk and Expected Shortfall, so that model choice itself becomes visible. Finally, we use Fully Flexible Probability reweighting in the spirit of Meucci (2008, 2010), tilting probability mass toward adverse historical scenarios rather than inventing artificial return paths. Forward-looking perspective comes from a large-scale strategic Monte Carlo exercise and from GARCH-based simulation with fat-tailed innovations.

Throughout the page, charts and tables appear immediately after the section that describes them. None of these exhibits should be read as a standalone forecast. Their value is comparative: they show whether risk is hiding in crisis episodes, in volatility regimes, in the way tail risk is measured, or in the portfolio construction rule itself.

Data provenance and empirical inputs

Every result shown in this stress lab comes from allocation studies that were already completed elsewhere on the site. In other words, we are not pulling a fresh price history for this page alone. Instead, we take the monthly and daily return series, optimised weights, and risk statistics that those studies produced, and we re-express them in a common framework so they can be compared side by side. That design keeps the focus on interpretation rather than data engineering.

The largest input is the Global Strategic Asset Allocation study. It uses monthly total returns on a broad panel of liquid ETFs covering US, developed, emerging, and frontier equity, government and corporate bonds, inflation-linked bonds, commodities, and listed alternatives. The sample begins in January 2005. Crisis windows are anchored to well-known episodes: the global financial crisis from October 2007 through March 2009, the COVID crash in early 2020, the bond-duration shock of 2022, and more recent equity correction periods. A short-maturity US Treasury bill yield serves as the risk-free reference. This dataset drives most of what you will see on the page — historical crisis tables, parametric shocks, the VaR comparison, FFP reweighting, correlation analytics, and the long-horizon Monte Carlo fan chart.

The Quantitative Portfolio Optimisation Benchmarks study contributes daily returns on a US large-cap subset and twenty global macro indices. Twelve portfolio rules are evaluated, including equal weight, inverse volatility, classical mean-variance, risk parity, hierarchical risk parity, and distributionally robust CVaR minimisation. These results feed the optimisation tail-ranking table and add rows to the cross-study comparison matrix.

The US Machine-Learning Enhanced Allocation study follows thirty US equities with quarterly walk-forward rebalancing. Classical mean-variance weights are blended with machine-learning return forecasts, using SPY as the benchmark. It is included because many practitioners now layer forecasting models on top of traditional optimisation, and it is important to ask whether that improves tail risk or merely lifts average returns.

The US Index Volatility Simulation study models a simple 50/50 blend of the S&P 500 and the Dow Jones Industrial Average. Conditional variance follows a GARCH(1,1) process with Student-t innovations, which allows volatility to cluster and produce fatter tails than a Gaussian model would. This supplies the forward path simulation on the right-hand side of the forward Monte Carlo section.

Finally, the Emerging Markets Fundamental Allocation study covers twenty liquid EM American depositary receipts selected through a fundamental composite score and constrained mean-variance optimisation, with EEM as the regional benchmark. It rounds out the comparison by showing how a stock-selection sleeve behaves on tail metrics relative to broad passive regional exposure.

Historical crisis replay

The most natural place to begin any stress analysis is with what actually happened. When a crisis unfolds in real time, investors want to know not only whether they lost money, but how their chosen allocation compared with simpler alternatives. Did risk parity really provide balance? Did a global balanced sleeve fare better than a domestic 60/40? Did a return-seeking max-Sharpe portfolio recover quickly or linger near its lows? Historical crisis replay answers those questions using realised monthly returns rather than modelled scenarios.

For each portfolio sleeve we build a wealth index from monthly simple returns. Let $W_{t} = \prod_{s \leq t} (1 + r_{p, s})$ denote cumulative wealth for portfolio $p$. Within a crisis window $T = [t_{0}, t_{1}]$, we report the cumulative return over that window and the deepest peak-to-trough decline experienced inside it:

$R_{p, T} = \prod_{t \in T} (1 + r_{p, t}) - 1, \qquad MDD_{p, T} = \min_{t \in T} \frac{W_{t} - \max_{s \leq t} W_{s}}{\max_{s \leq t} W_{s}}$

We also compute annualised volatility inside the window as $σ_{p, T} = \sqrt{12} \, Std (r_{p, t})_{t \in T}$. This is deliberately simple: it keeps the focus on outcomes investors can relate to, namely how much was lost, how deep the drawdown was, and how turbulent the ride felt.
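The three window statistics can be sketched in a few lines of NumPy. This is a minimal illustration rather than the study's own code: the function name `crisis_metrics`, the four sample returns, the sample standard deviation (`ddof=1`), and the choice to rebase wealth to 1 at the start of the window are all assumptions made for the example.

```python
import numpy as np

def crisis_metrics(monthly_returns, periods_per_year=12):
    """Cumulative return, maximum drawdown and annualised volatility for one window."""
    r = np.asarray(monthly_returns, dtype=float)
    wealth = np.cumprod(1.0 + r)                 # W_t, rebased to 1 at the window start (assumption)
    path = np.concatenate(([1.0], wealth))       # keep the starting level as a candidate peak
    running_peak = np.maximum.accumulate(path)   # max_{s <= t} W_s
    drawdown = path / running_peak - 1.0         # (W_t - peak) / peak
    return {
        "cum_return": wealth[-1] - 1.0,
        "max_drawdown": drawdown.min(),
        "ann_vol": r.std(ddof=1) * np.sqrt(periods_per_year),
    }

# Hypothetical four-month stress window, not data from the study.
window = [-0.10, 0.05, -0.20, 0.10]
metrics = crisis_metrics(window)
print(metrics)
```

In this toy window the cumulative loss is smaller than the maximum drawdown because the final month recovers part of the decline, which is the distinction the crisis table is meant to expose.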

The table in the exhibit lines up several sleeves for each crisis episode. A portfolio can look perfectly respectable over the full sample yet show a punishing drawdown in one specific window, which is often where tail risk hides. The bar chart below the table measures how far apart the worst and best sleeves were in terms of maximum drawdown during each crisis. When that spread is wide, the choice of allocation framework made a large difference. When it is narrow, the macro environment overwhelmed portfolio design and almost everything moved together.

It is worth reading each crisis on its own terms. The global financial crisis was dominated by credit stress, deleveraging, and liquidity withdrawal. COVID was a violent but relatively short risk-off shock followed by a rapid policy-driven rebound. The 2022 episode punished duration and challenged the assumption that bonds would diversify equity losses. Comparing a 60/40 sleeve with global balanced, risk parity, and max Sharpe makes it easier to see whether diversification genuinely protected capital or simply changed the shape of the loss.

Univariate parametric stress operators

History is informative, but it is also incomplete. Some macro shocks are gradual rather than sudden, and some combinations of stress have no close historical parallel at the scale investors now fear. Parametric stress testing addresses that gap by asking a straightforward counterfactual: if a particular channel deteriorates, how much does portfolio wealth suffer even before new market data arrives?

We consider two families of shocks. The first is a level shock, or wealth haircut. Starting from the baseline wealth path $W_{t}$, we apply an instantaneous reduction of size $δ$ so that $\tilde{W}_{t} = (1 - δ) W_{t}$, then reconstruct the implied return series from the shocked wealth path. The calibrated haircuts are 10%, 20%, and 30%, which can be read as moderate rate pressure, meaningful systemic stress, and a severe impairment event respectively.

The second family is a volatility shock. Here the mean return is left unchanged, but each month's deviation from that mean is scaled up: $\tilde{r}_{t} = μ + κ (r_{t} - μ)$ with $κ$ set to 1.5 or 2.0. This is the portfolio analogue of a world in which uncertainty rises sharply even though the average expected return has not moved. It is especially relevant when implied volatility spikes but the market's central forecast remains intact.
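Both operators are short enough to write out. The sketch below is an illustration under stated assumptions, not the lab's implementation: the function names and sample returns are hypothetical, and the level shock is interpreted as a haircut that lands in the first period, so that starting wealth of 1 is unshocked and every later wealth level is scaled by $(1 - δ)$.

```python
import numpy as np

def level_shock(returns, delta):
    """Haircut the wealth path by delta and return the implied return series."""
    r = np.asarray(returns, dtype=float)
    shocked_wealth = (1.0 - delta) * np.cumprod(1.0 + r)   # W~_t = (1 - delta) * W_t
    path = np.concatenate(([1.0], shocked_wealth))         # assumption: W_0 = 1 is not shocked
    return path[1:] / path[:-1] - 1.0

def vol_shock(returns, kappa):
    """Scale deviations from the sample mean by kappa, leaving the mean unchanged."""
    r = np.asarray(returns, dtype=float)
    mu = r.mean()
    return mu + kappa * (r - mu)

# Hypothetical monthly returns, for illustration only.
baseline = np.array([0.02, -0.01, 0.03, -0.04, 0.01])
haircut_20 = level_shock(baseline, 0.20)
vol_x2 = vol_shock(baseline, 2.0)
```

Under this reading the level shock changes only the first implied return, while the volatility shock changes every return but preserves the arithmetic mean. Because returns compound, the volatility shock can still change cumulative return even though the mean is fixed.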

For every scenario we compare stressed and baseline cumulative return, maximum drawdown, and annualised volatility across the main sleeves: 60/40, global balanced, risk parity, and max Sharpe.

In the exhibit, the dropdown lets you move between shock types. The bars show how far each sleeve's cumulative return falls relative to its unshocked history, and the table reports the change in return together with the stressed drawdown. If a sleeve reacts strongly to volatility shocks but barely moves under level shocks, its vulnerability is mostly in the second moment — the portfolio can tolerate a one-off hit but struggles when dispersion widens. The opposite pattern usually points to concentrated exposure to level risk, such as equity beta or bond duration.

Compound stress combinations

Markets rarely present stress as a single clean shock. During the global financial crisis, losses came together with soaring volatility and a withdrawal of liquidity. In 2022, rising rates coincided with a breakdown in the familiar bond-equity diversification relationship. Testing only one channel at a time can therefore leave a false sense of comfort. A portfolio that survives a 10% wealth shock in isolation may look very different once that shock is followed by a volatility spike or a second leg down.

To capture that reality, we apply stress operators sequentially to the monthly return series. Formally, a compound scenario is written as $S = S_{k} \circ \dots \circ S_{1}$, where each step is either a level shock $δ_{j}$ or a volatility multiplier $κ_{j}$. The order of application matters because these operations do not commute: crisis first and volatility second is not the same path as the reverse.

Five compound calibrations are examined. The first combines a severe crisis haircut of 30% with a volatility multiplier of 1.5. The second pairs a 10% rate-style shock with the same volatility increase. The third stacks a 30% crisis shock followed by a further 10% level shock. The fourth applies all three channels — 30%, 10%, and $κ = 1.5$ — in sequence. The fifth represents a deep drawdown plus a liquidity-style stress, with a 20% haircut and a stronger volatility multiplier of 2.0.
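A small sketch makes the ordering effect concrete. The helper names, the `(kind, size)` step encoding, and the six sample returns are assumptions for illustration; the level shock is again taken to land in the first period, and the volatility multiplier uses the sample mean of whatever series it receives.

```python
import numpy as np

def level_shock(returns, delta):
    """Haircut wealth by delta in the first period (assumed timing)."""
    path = np.concatenate(([1.0], (1.0 - delta) * np.cumprod(1.0 + returns)))
    return path[1:] / path[:-1] - 1.0

def vol_shock(returns, kappa):
    """Scale deviations from the mean of the incoming series by kappa."""
    mu = returns.mean()
    return mu + kappa * (returns - mu)

def compound(returns, steps):
    """Apply steps in order: steps[0] plays the role of S_1, steps[-1] of S_k."""
    ops = {"level": level_shock, "vol": vol_shock}
    out = np.asarray(returns, dtype=float)
    for kind, size in steps:
        out = ops[kind](out, size)
    return out

def total_return(returns):
    return float(np.prod(1.0 + returns) - 1.0)

# Hypothetical monthly returns, for illustration only.
base = np.array([0.02, -0.03, 0.01, 0.04, -0.02, 0.03])
crisis_then_vol = compound(base, [("level", 0.30), ("vol", 1.5)])
vol_then_crisis = compound(base, [("vol", 1.5), ("level", 0.30)])
stacked_levels = compound(base, [("level", 0.30), ("level", 0.10)])

print(total_return(crisis_then_vol), total_return(vol_then_crisis))
```

The two orderings differ because the haircut, once applied, becomes a large negative return that the volatility multiplier then amplifies. Two level shocks, by contrast, simply multiply: a 30% haircut followed by a 10% haircut leaves 63% of baseline wealth.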

The purpose of this section is not to predict which combination will happen next, but to reveal interaction effects. A sleeve that appears robust in the univariate section may deteriorate sharply once shocks are layered. That is often where risk-budgeting assumptions break down in practice.

When you read the exhibit, compare each compound result with the single-shock scenarios above. If the compound loss is worse than the sum of the individual pieces, the portfolio is displaying genuine non-linearity. Risk parity and minimum-variance sleeves are particularly worth watching here, because a volatility shock can hit many positions at once and temporarily erase the benefit of diversification.

Value-at-Risk and Expected Shortfall estimators

Tail risk is often reduced to a single number inside risk committees, but the number you get depends heavily on how you estimate it. Value-at-Risk asks how bad the loss could be at a chosen confidence level, while Expected Shortfall — also called CVaR — asks how bad the average loss is in the worst cases beyond that threshold. Both are useful, yet they can disagree materially when returns are skewed, fat-tailed, or clustered in time.

In this section we estimate monthly tail risk at the 95% confidence level using three standard approaches. Historical simulation takes the empirical return distribution as given:

$VaR_{α} = - Quantile_{α} (r_{p})$ and $ES_{α} = - E [r_{p} \mid r_{p} \leq - VaR_{α}]$, where $α$ is the left-tail probability, here 0.05 for the 95% confidence level.

This method makes no normality assumption and therefore respects the actual shape of past returns, though it still assumes the historical sample is relevant going forward.

The parametric Gaussian approach instead fits a mean and standard deviation and uses normal quantiles:

$VaR_{α} = - (μ + z_{α} σ)$ with $z_{0.05} \approx - 1.645$, and

$ES_{α} = - μ + σ ϕ (z_{α}) / α$, where $ϕ$ is the standard normal density and the divisor is the tail probability $α$.

It is elegant and fast, but it tends to understate tail risk when markets jump or when negative months cluster together.

The third method is Monte Carlo simulation, drawing 10,000 returns from a normal distribution with the same mean and variance and computing empirical tail statistics from those simulated samples. It behaves similarly to the parametric route, but makes sampling uncertainty visible.
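The three estimators can be placed side by side in a short sketch. It assumes that $α$ denotes the tail probability (0.05), so the Gaussian Expected Shortfall divides the density term by that tail probability. The function names, the random seed, and the simulated sample are hypothetical; `NormalDist` comes from the Python standard library.

```python
import numpy as np
from statistics import NormalDist

def historical_var_es(returns, alpha=0.05):
    """Empirical quantile and the mean loss at or beyond it."""
    r = np.asarray(returns, dtype=float)
    cutoff = np.quantile(r, alpha)
    return float(-cutoff), float(-r[r <= cutoff].mean())

def gaussian_var_es(returns, alpha=0.05):
    """Closed-form VaR and ES for a fitted normal distribution."""
    r = np.asarray(returns, dtype=float)
    mu, sigma = r.mean(), r.std(ddof=1)
    z = NormalDist().inv_cdf(alpha)                  # about -1.645 for alpha = 0.05
    var = -(mu + z * sigma)
    es = -mu + sigma * NormalDist().pdf(z) / alpha
    return float(var), float(es)

def monte_carlo_var_es(returns, alpha=0.05, n_sims=10_000, seed=0):
    """Draw from the fitted normal and read tail statistics off the simulated sample."""
    r = np.asarray(returns, dtype=float)
    rng = np.random.default_rng(seed)
    sims = rng.normal(r.mean(), r.std(ddof=1), size=n_sims)
    return historical_var_es(sims, alpha)

# Hypothetical 20 years of monthly returns, for illustration only.
sample = np.random.default_rng(42).normal(0.005, 0.04, size=240)
hist = historical_var_es(sample)
gauss = gaussian_var_es(sample)
mc = monte_carlo_var_es(sample)
print(hist, gauss, mc)
```

Because the Monte Carlo draws come from the same fitted normal, its estimates differ from the closed-form ones only by sampling noise, and changing `seed` shows how large that noise is at 10,000 draws.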

Expected Shortfall is generally the more informative of the two tail summaries because it reflects the severity of losses beyond the VaR cutoff and is a coherent risk measure, whereas VaR can violate subadditivity. Even so, the more important lesson here is methodological: if historical and parametric estimates diverge sharply, the portfolio's risk is not well described by a bell curve.

Use the dropdown in the exhibit to move between portfolio sleeves. The chart places VaR and ES side by side for each estimation method. When the historical estimates point to materially larger losses than the parametric ones, the empirical distribution has heavier left-tail mass than a Gaussian model admits. When Monte Carlo and parametric results agree with each other but not with history, the model may be internally consistent yet miss the real-world tail behaviour.

Fully Flexible Probability scenario reweighting

Many stress tests change the returns themselves — they shock covariances, apply haircuts, or invent synthetic scenarios. Meucci's Fully Flexible Probabilities take a different philosophical route. The historical return scenarios stay exactly as they were observed, but the probability attached to each scenario is allowed to change. In that sense, the exercise is closer to asking, "What if the past had unfolded with a different emphasis on bad months?" than to asking, "What if prices moved differently?"

We begin with a baseline probability scheme that down-weights distant history using exponential decay with a half-life of thirty-six months: $p_{t} \propto e^{- λ (T - t)}$ where $λ = \ln 2 / h$ and $h = 36$ months is the half-life. Recent observations therefore count more than older ones, which is a reasonable starting point when market structure evolves over time.

On top of that baseline we impose a stress view that shifts probability toward the left tail. Months whose returns fall in the worst decile receive additional weight until the stressed tail accounts for 20% of total probability mass. Once those reweighted probabilities are in place, tail risk is recomputed using the weighted distribution:

$VaR_{α}^{(w)} = - Quantile_{α}^{(p)} (r), \qquad N_{eff} = \frac{1}{\sum_{t} p_{t}^{2}}$

The effective number of scenarios, $N_{eff}$, is a useful diagnostic. When it falls sharply, the stress view is concentrating belief in only a handful of historical outcomes. A portfolio that looked acceptable under gently decayed probabilities may look much less comfortable once crisis months are treated as more likely than the baseline suggests.
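The reweighting steps can be sketched as follows. This is an assumption-laden illustration, not the lab's implementation: the function names and sample data are hypothetical, the worst decile is taken from the unweighted return sample, and the tail tilt is a proportional rescaling of probabilities inside and outside the tail. For a single view on the probability of a set of scenarios, that rescaling coincides with the minimum relative entropy solution used in entropy pooling.

```python
import numpy as np

def decay_probabilities(n_obs, half_life=36):
    """p_t proportional to exp(-lambda * (T - t)) with lambda = ln 2 / half_life."""
    lam = np.log(2.0) / half_life
    age = np.arange(n_obs - 1, -1, -1)        # T - t: the newest observation has age 0
    p = np.exp(-lam * age)
    return p / p.sum()

def tilt_to_tail(returns, p, tail_quantile=0.10, target_mass=0.20):
    """Raise the probability of worst-decile scenarios to target_mass."""
    r = np.asarray(returns, dtype=float)
    in_tail = r <= np.quantile(r, tail_quantile)
    tail_mass = p[in_tail].sum()
    q = np.array(p, dtype=float)
    if tail_mass >= target_mass:              # the tail already carries enough weight
        return q
    q[in_tail] *= target_mass / tail_mass
    q[~in_tail] *= (1.0 - target_mass) / (1.0 - tail_mass)
    return q

def weighted_var_es(returns, p, alpha=0.05):
    """Probability-weighted quantile and the weighted mean loss up to it."""
    r = np.asarray(returns, dtype=float)
    order = np.argsort(r)
    r_sorted, p_sorted = r[order], np.asarray(p, dtype=float)[order]
    k = np.searchsorted(np.cumsum(p_sorted), alpha)   # first scenario with cumulative prob >= alpha
    tail_r, tail_p = r_sorted[: k + 1], p_sorted[: k + 1]
    return float(-r_sorted[k]), float(-np.sum(tail_p * tail_r) / tail_p.sum())

def effective_scenarios(p):
    return float(1.0 / np.sum(np.asarray(p, dtype=float) ** 2))

# Hypothetical 15 years of monthly returns, for illustration only.
hist_returns = np.random.default_rng(7).normal(0.005, 0.04, size=180)
p_base = decay_probabilities(len(hist_returns))
p_stress = tilt_to_tail(hist_returns, p_base)
print(weighted_var_es(hist_returns, p_base), effective_scenarios(p_base))
print(weighted_var_es(hist_returns, p_stress), effective_scenarios(p_stress))
```

The return scenarios are never altered in this sketch; only the probability vector changes, which is the defining feature of the approach described above.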

The exhibit compares baseline weighted VaR with tail-tilted VaR for each sleeve. The table underneath shows how much effective sample size is lost in the process, how far VaR moves, and what happens to Expected Shortfall under the stressed probabilities. A large move in VaR combined with a steep drop in $N_{eff}$ usually means the portfolio's risk is concentrated in a small set of adverse historical episodes rather than spread evenly across time.

Forward stochastic simulation

So far, the analysis has looked backward or applied fixed shocks to history. Investors also need a sense of what may lie ahead. Forward simulation does not provide a forecast in the narrow sense of predicting one outcome, but it does map out a range of plausible futures given today's estimates of drift, covariance, and volatility dynamics.

The left-hand exhibit uses a strategic Monte Carlo engine for the global balanced sleeve. One hundred thousand ten-year wealth paths are simulated from the estimated mean and covariance structure. The chart shows the 5th, 50th, and 95th percentile bands through time, which gives an intuitive picture of how wide or narrow the outcome set is. We also report the probability that terminal wealth ends below its starting value, which is a simple summary of downside frequency over the horizon. This approach is well suited to long-term asset allocation, though it may understate the severity of short, sharp crashes because it does not explicitly model volatility clustering.
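A stripped-down version of such an engine is sketched below. It is illustrative only: the three-asset means, volatilities, correlations, and weights are invented, returns are assumed to be multivariate normal with monthly rebalancing to fixed weights, and the path count is cut to 5,000 to keep memory use small rather than the one hundred thousand used in the study.

```python
import numpy as np

def simulate_wealth_paths(mu, cov, weights, n_months=120, n_paths=5_000, seed=0):
    """Simulate wealth paths for a fixed-weight portfolio rebalanced every month."""
    rng = np.random.default_rng(seed)
    # Shape (n_paths, n_months, n_assets): one draw of asset returns per path and month.
    asset_returns = rng.multivariate_normal(mu, cov, size=(n_paths, n_months))
    portfolio_returns = asset_returns @ np.asarray(weights, dtype=float)
    return np.cumprod(1.0 + portfolio_returns, axis=1)

# Hypothetical monthly inputs, not estimates from the study.
mu = np.array([0.006, 0.003, 0.004])
vols = np.array([0.045, 0.015, 0.050])
corr = np.array([[1.0, 0.1, 0.3],
                 [0.1, 1.0, 0.0],
                 [0.3, 0.0, 1.0]])
cov = np.outer(vols, vols) * corr
weights = np.array([0.5, 0.4, 0.1])

paths = simulate_wealth_paths(mu, cov, weights)
bands = np.percentile(paths, [5, 50, 95], axis=0)      # fan-chart bands through time
prob_below_start = float(np.mean(paths[:, -1] < 1.0))  # share of paths ending under 1.0
print(bands[:, -1], prob_below_start)
```

Because each month is drawn independently from the same distribution, volatility is constant along every path, which is the limitation the paragraph above notes about short, sharp crashes.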

The right-hand exhibit takes a different route for a 50/50 blend of the S&P 500 and Dow Jones Industrial Average. Here conditional variance follows a GARCH(1,1) process,

$σ_{t}^{2} = ω + α ε_{t - 1}^{2} + β σ_{t - 1}^{2},$

with Student-t innovations so that large moves beget more large moves and the tails are fatter than normal. The realised path is plotted against the simulated mean path so you can judge visually how closely the model tracks recent history. Terminal VaR and the probability of finishing below starting wealth summarise short-horizon tail exposure in a world where volatility itself moves over time.
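The recursion is straightforward to simulate. The sketch below assumes hypothetical daily parameters ($ω$, $α$, $β$, and the degrees of freedom `nu` are not the study's estimates), starts each path at the unconditional variance, and rescales the Student-t draws to unit variance so that $σ_{t}^{2}$ keeps its interpretation as a conditional variance.

```python
import numpy as np

def simulate_garch_t(omega, alpha, beta, nu, n_steps, n_paths, mu=0.0, seed=0):
    """Simulate returns with GARCH(1,1) variance and standardised Student-t shocks."""
    rng = np.random.default_rng(seed)
    # A Student-t variable has variance nu / (nu - 2); rescale to unit variance (needs nu > 2).
    z = rng.standard_t(nu, size=(n_paths, n_steps)) * np.sqrt((nu - 2.0) / nu)
    sigma2 = np.full(n_paths, omega / (1.0 - alpha - beta))   # unconditional variance
    returns = np.empty((n_paths, n_steps))
    for t in range(n_steps):
        eps = np.sqrt(sigma2) * z[:, t]
        returns[:, t] = mu + eps
        sigma2 = omega + alpha * eps**2 + beta * sigma2      # next period's variance
    return returns

# Hypothetical daily parameters: unconditional volatility of 1% per day.
sim = simulate_garch_t(omega=2e-6, alpha=0.08, beta=0.90, nu=6,
                       n_steps=250, n_paths=2_000)
terminal = np.prod(1.0 + sim, axis=1)
terminal_var_95 = float(-np.quantile(terminal - 1.0, 0.05))
prob_below_start = float(np.mean(terminal < 1.0))
print(terminal_var_95, prob_below_start)
```

A large shock raises next period's variance through the $α ε_{t - 1}^{2}$ term and decays slowly through $β$, which is how the simulation produces clusters of turbulent days rather than isolated ones.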

Read the two panels together rather than in isolation. The strategic Monte Carlo answers a slow-moving allocation question: how dispersed are long-run outcomes? The GARCH simulation answers a market-timing and risk-management question: how unstable could the near term become if volatility persists? Neither panel replaces the other; they complement the historical and parametric sections by shifting the time direction forward.

Cross-study tail-risk attribution

Each section up to this point has asked how a particular portfolio behaves under a particular lens. The cross-study matrix steps back and asks a broader question: when you compare fundamentally different ways of building risk, which approaches actually look safer once you focus on the tail rather than the average?

To make that comparison possible, we line up Sharpe ratio, maximum drawdown, 95% VaR, and 95% Expected Shortfall on a common scale even though the underlying universes and rebalance frequencies differ. The rows cover global ETF sleeves, US and global optimisation benchmarks, machine-learning-enhanced US equity allocation, GARCH-based US index simulation, and emerging-market fundamental strategies.

This is not a simple return ranking. Two portfolios can post similar Sharpe ratios yet diverge sharply on drawdown or expected shortfall, which is often what matters most to allocators in stressful periods. The matrix is therefore best read as a map of tail efficiency: which construction rules cluster toward the safer end of the distribution, and which ones buy returns by accepting deeper tail losses.

The filter at the top of the exhibit lets you narrow the view to one empirical program or keep all rows visible for cross-program comparison. Within the optimisation benchmarks, it is especially informative to compare risk parity and hierarchical risk parity against max Sharpe and equal weight. In the emerging-market block, compare the fundamental sleeves with the EEM benchmark to see whether stock selection improved drawdown characteristics or mainly shifted beta exposure.

Several patterns tend to recur. Minimum-variance and risk-parity constructions often look better on maximum drawdown. Return-maximising rules frequently look worse on expected shortfall. Machine-learning overlays sometimes lift Sharpe without delivering a matching improvement in the tail, which suggests alpha in the mean but not necessarily in downside protection.

Correlation structure and macro regimes

Diversification is only as good as the correlations that underpin it, and those correlations are not stable. In calm markets, government bonds, equities, commodities, and regional ETFs often appear to move independently enough to justify multi-asset construction. In risk-off episodes, many of those relationships tighten at once, and portfolios that looked balanced on paper suddenly behave like concentrated bets. This section measures that instability directly.

We begin with the average pairwise correlation across the global ETF panel,

$\bar{ρ} = \frac{2}{N (N - 1)} \sum_{i < j} ρ_{ij}.$

When this quantity is low, there is genuine room for diversification to work. When it rises, risk-budgeting assumptions that treated positions as partly independent become harder to defend. The table of high-correlation pairs highlights specific clusters — pairs whose correlation exceeds 0.75 in absolute value are effectively moving as one position even if they are held under different labels.

The rolling correlation chart shows how $\bar{ρ}_{t}$ evolved through time. Spikes line up with the major stress episodes in the sample, which is exactly what practitioners mean when they say correlations go to one in a crisis. Reading that chart alongside the historical crisis section makes it easier to understand why some sleeves suffered more than their static risk models implied.
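The average pairwise correlation, the high-correlation screen, and the rolling version can all be computed from a return matrix with one column per asset. The sketch below uses hypothetical function names, a tiny hand-made example, and an arbitrary 24-observation window; none of these come from the study.

```python
import numpy as np

def average_pairwise_correlation(returns):
    """Mean of the N(N-1)/2 distinct off-diagonal correlations."""
    corr = np.corrcoef(returns, rowvar=False)
    upper = np.triu_indices_from(corr, k=1)      # index pairs with i < j
    return float(corr[upper].mean())

def high_correlation_pairs(returns, names, threshold=0.75):
    """List pairs whose absolute correlation exceeds the threshold."""
    corr = np.corrcoef(returns, rowvar=False)
    rows, cols = np.triu_indices_from(corr, k=1)
    return [(names[i], names[j], float(corr[i, j]))
            for i, j in zip(rows, cols) if abs(corr[i, j]) > threshold]

def rolling_average_correlation(returns, window):
    """Average pairwise correlation over each trailing window."""
    x = np.asarray(returns, dtype=float)
    return np.array([average_pairwise_correlation(x[end - window:end])
                     for end in range(window, len(x) + 1)])

# Hand-made example: B is exactly twice A, C is uncorrelated with both.
toy = np.array([[1.0, 2.0, 1.0],
                [2.0, 4.0, -1.0],
                [3.0, 6.0, -1.0],
                [4.0, 8.0, 1.0]])
avg_corr = average_pairwise_correlation(toy)
pairs = high_correlation_pairs(toy, ["A", "B", "C"])

# Hypothetical random panel to exercise the rolling version.
panel = np.random.default_rng(3).normal(size=(60, 4))
rolling = rolling_average_correlation(panel, window=24)
```

In the toy panel the three pairwise correlations are 1, 0, and 0, so the average is one third even though one pair is perfectly redundant. That is why the pair table is worth reading alongside the single summary number.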

The regime analysis goes one step further by asking where each portfolio earned its Sharpe ratio. History is partitioned into macro states such as growth, recession, inflation, and periods of monetary tightening or easing, and we compute regime-conditional Sharpe ratios $SR_{p, r}$ for each sleeve. A portfolio can look attractive over the full sample yet be heavily dependent on one favourable regime. The bar chart lets you select a sleeve and see which environments supported it and which did not. A strong overall Sharpe paired with a deeply negative recession Sharpe is a warning sign that the strategy may be carrying a hidden macro bet.
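Regime-conditional Sharpe ratios amount to grouping returns by a regime label before computing the usual ratio. In the sketch below the regime labels, the sample returns, the zero risk-free rate, and the $\sqrt{12}$ annualisation are assumptions for illustration; how the study actually classifies macro states is not reproduced here.

```python
import numpy as np

def regime_sharpe(returns, regimes, rf=0.0, periods_per_year=12):
    """Annualised Sharpe ratio of excess returns within each regime label."""
    excess = np.asarray(returns, dtype=float) - rf
    labels = np.asarray(regimes)
    result = {}
    for name in np.unique(labels):
        x = excess[labels == name]
        if x.size < 2 or x.std(ddof=1) == 0.0:
            result[str(name)] = float("nan")     # too few observations to form a ratio
        else:
            result[str(name)] = float(x.mean() / x.std(ddof=1) * np.sqrt(periods_per_year))
    return result

# Hypothetical monthly returns and regime labels, for illustration only.
sleeve_returns = [0.02, 0.04, -0.03, -0.01]
labels = ["growth", "growth", "recession", "recession"]
sr = regime_sharpe(sleeve_returns, labels)
print(sr)
```

With so few observations per regime the ratios are extremely noisy; in practice each regime would need enough months for the conditional mean and volatility to be estimated with any confidence.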

Optimisation benchmark tail ranking

Portfolio optimisers are often introduced as if there were one universally best way to build an efficient portfolio. In practice, the answer depends on what you mean by efficient. A maximum-Sharpe portfolio is efficient with respect to mean and variance. A minimum-variance portfolio is efficient with respect to volatility. A CVaR-minimising portfolio is efficient with respect to the tail. Those are not the same objective, and they do not always produce similar portfolios.

This section ranks twelve quantitative constructions on the global index panel using 95% Conditional Value-at-Risk as the primary sorting criterion. The list includes equal weight, inverse volatility, global minimum variance, maximum Sharpe, risk parity, hierarchical risk parity, nested clustered optimisation, maximum diversification, and distributionally robust CVaR minimisation. Each rule encodes a different belief about what risk actually is: variance, tail loss, cluster structure, or uncertainty about the data itself.

The table should be read as a tail-efficiency scorecard rather than a return leaderboard. VaR, maximum drawdown, and Sharpe are shown alongside CVaR so you can see the trade-off clearly. A model with modest Sharpe but comparatively low CVaR is protecting the tail even if it does not win on average returns. A model with high Sharpe and high CVaR is doing the opposite — accepting deeper tail losses in exchange for stronger central performance.
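The mechanics of a CVaR-sorted scorecard can be illustrated with three simple rules. This sketch is not the study's benchmark: it uses simulated returns, in-sample static weights with no walk-forward rebalancing, and an unconstrained closed-form minimum-variance rule that may produce negative weights. All function names are hypothetical.

```python
import numpy as np

def cvar(returns, alpha=0.05):
    """Mean loss at or beyond the empirical alpha-quantile."""
    r = np.asarray(returns, dtype=float)
    cutoff = np.quantile(r, alpha)
    return float(-r[r <= cutoff].mean())

def equal_weight(returns):
    n = returns.shape[1]
    return np.full(n, 1.0 / n)

def inverse_volatility(returns):
    inv = 1.0 / returns.std(axis=0, ddof=1)
    return inv / inv.sum()

def min_variance_unconstrained(returns):
    """Closed-form weights proportional to inverse covariance times a vector of ones."""
    cov = np.cov(returns, rowvar=False)
    raw = np.linalg.solve(cov, np.ones(cov.shape[0]))
    return raw / raw.sum()

def rank_by_cvar(returns, rules, alpha=0.05):
    """Return (rule name, CVaR) pairs sorted from lowest to highest tail loss."""
    rows = [(name, cvar(returns @ rule(returns), alpha)) for name, rule in rules.items()]
    return sorted(rows, key=lambda row: row[1])

# Hypothetical daily returns on four assets with different volatilities.
rng = np.random.default_rng(11)
asset_returns = rng.normal(0.0, [0.010, 0.020, 0.030, 0.015], size=(1000, 4))
rules = {
    "equal_weight": equal_weight,
    "inverse_volatility": inverse_volatility,
    "min_variance": min_variance_unconstrained,
}
ranking = rank_by_cvar(asset_returns, rules)
for name, value in ranking:
    print(f"{name:20s} CVaR 95%: {value:.4f}")
```

Sorting on CVaR alone is a design choice that mirrors the scorecard described above; adding Sharpe or maximum drawdown as extra columns would expose the trade-off but would not change the ordering.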

Hierarchical risk parity and risk parity often rank well on CVaR because they avoid the extreme corner solutions that classical mean-variance optimisation can produce when estimates are noisy. That does not automatically make them superior in every setting, but it does explain why they appear so often in defensive portfolio discussions.

The most useful reading is joint. Compare this table with the parametric and compound stress sections above. If an optimiser ranks well on CVaR here but still deteriorates sharply under stacked shocks, the in-sample tail advantage may be fragile. If it ranks well on both, there is a stronger case that the construction rule is structurally defensive rather than lucky in the estimation window.

Conclusion

Taken together, the exhibits on this page make a simple point that is easy to forget in day-to-day portfolio work: resilience is not a single number. It depends on the path markets actually took, on the method used to measure tail risk, and on the construction rule used to assemble the portfolio in the first place.

The historical crisis section shows that sleeves with similar long-run Sharpe ratios can diverge sharply once you condition on real stress episodes. The parametric and compound sections show that vulnerability can grow non-linearly when shocks arrive together rather than one at a time. The VaR and Expected Shortfall comparison shows that model choice alone can change the reported severity of tail risk. The FFP reweighting section shows that a portfolio can look acceptable under gently weighted history yet look much less comfortable once probability is shifted toward bad months.

Correlation and regime analytics add an important structural lesson. Diversification is not a static property; it weakens when average correlations rise and when a strategy only works in a narrow set of macro environments. The cross-study tail matrix reinforces that no construction dominates every risk measure. Risk parity and minimum-variance approaches often protect drawdowns; return-seeking optimisers frequently accept higher expected shortfall in exchange for stronger average performance.

The practical implication is not to search for one stress number that settles the question. It is to treat each section as a different lens on the same underlying problem. Historical replay anchors the discussion in what markets have already done. Parametric and compound shocks test sensitivity to events that may happen again in a different form. Forward simulation and GARCH paths ask what the near and long horizon could look like from today's starting point. FFP reweighting makes explicit the subjective belief that adverse scenarios deserve more weight than a simple historical average would assign.

Used that way, the stress lab is less a forecasting engine than a disciplined conversation about where risk really lives — in crisis paths, in volatility regimes, in tail measurement, or in the portfolio rule itself.