Nifty 50 ML Portfolio Optimization | Black-Litterman

Summary

This report studies how classical portfolio theory and machine-learning forecasts can be combined to allocate capital across Nifty 50 large-cap equities on the National Stock Exchange of India.

The investable universe comprises up to thirty liquid index constituents with a full price history from 2016 onward. The Nifty 50 index serves as the market benchmark. A risk-free rate of 6.5% (annualized) proxies the return on long-dated Indian government bonds.

Four portfolios are tracked out-of-sample from late 2021: a mean-variance (MV) optimized portfolio, an ML-enhanced MV portfolio (forecast views blended via Black-Litterman, then re-optimized), a cap-weighted reference portfolio, and the Nifty 50 index itself. Interactive exhibits in the interpretation section are research illustrations of method behaviour, not product recommendations.

Research workflow and data processing

The study follows a repeatable institutional pipeline from raw market inputs to published performance tables. Each stage is designed to avoid look-ahead: only information available at the rebalance date enters the optimizer.

Stage 1 — Universe and prices. Current Nifty 50 membership is screened for listing history. Adjusted daily closing prices are aligned on a common calendar; names with fewer than roughly two years of observations are excluded so covariance and momentum features are stable.
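A minimal sketch of the Stage 1 screen, assuming prices arrive as a pandas DataFrame of adjusted closes (dates by tickers). The 504-observation cutoff stands in for "roughly two years" of trading days and, like the helper name `screen_universe`, is an illustrative assumption.

```python
import pandas as pd

MIN_OBS = 2 * 252  # assumed stand-in for "roughly two years" of trading days

def screen_universe(prices: pd.DataFrame, min_obs: int = MIN_OBS) -> pd.DataFrame:
    """Keep tickers with at least `min_obs` observed closes, then align on a common calendar."""
    counts = prices.notna().sum()                  # observed closes per ticker
    kept = counts[counts >= min_obs].index
    # Keep only dates on which every surviving ticker has a price.
    return prices[kept].dropna(how="any")
```

Dropping any date with a missing close is one way to impose a common calendar; forward-filling short gaps is a common alternative.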

Stage 2 — Feature engineering. For every surviving stock, the price series is transformed into technical indicators: short- and medium-term moving averages, the relative strength index (RSI), MACD, multi-horizon momentum, and rolling volatility. These become inputs to the forecasting model.
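The Stage 2 indicators can be sketched with pandas rolling and exponential windows. The window lengths (20- and 50-day averages, 14-day RSI, 12/26/9 MACD, 5/21/63-day momentum, 21-day volatility) are conventional defaults assumed here, not the study's documented settings, and `technical_features` is a hypothetical helper.

```python
import numpy as np
import pandas as pd

def technical_features(close: pd.Series) -> pd.DataFrame:
    """Illustrative indicator set; window lengths are common defaults, not the study's settings."""
    ret = close.pct_change()
    feats = pd.DataFrame(index=close.index)
    feats["ma_ratio_20"] = close / close.rolling(20).mean() - 1   # price vs short moving average
    feats["ma_ratio_50"] = close / close.rolling(50).mean() - 1   # price vs medium moving average
    # 14-day RSI (simple-average variant of Wilder's formula)
    delta = close.diff()
    gain = delta.clip(lower=0).rolling(14).mean()
    loss = (-delta.clip(upper=0)).rolling(14).mean()
    feats["rsi_14"] = 100 - 100 / (1 + gain / loss)
    # MACD histogram: (12-day EMA - 26-day EMA) minus its 9-day EMA signal line
    macd = close.ewm(span=12, adjust=False).mean() - close.ewm(span=26, adjust=False).mean()
    feats["macd_hist"] = macd - macd.ewm(span=9, adjust=False).mean()
    for h in (5, 21, 63):
        feats[f"mom_{h}"] = close.pct_change(h)                   # multi-horizon momentum
    feats["vol_21"] = ret.rolling(21).std() * np.sqrt(252)        # annualized rolling volatility
    return feats
```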

Stage 3 — Training window. At each quarterly rebalance, the prior ~30 months of data form the estimation sample. Expected returns and the covariance matrix are computed from daily log returns in this window only.
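One way to generate the quarterly schedule so that each training window ends strictly before its rebalance date. The 30-month lookback and three-month step follow the text; the helper name `rebalance_windows` and any start date passed to it are illustrative assumptions.

```python
import pandas as pd

def rebalance_windows(dates: pd.DatetimeIndex, start: str, lookback_months: int = 30, step_months: int = 3):
    """Yield (rebalance_date, train_dates, hold_dates); training data end before the rebalance date."""
    rb = pd.Timestamp(start)
    end = dates[-1]
    while rb <= end:
        nxt = rb + pd.DateOffset(months=step_months)
        # Estimation sample: the prior ~30 months, excluding the rebalance date itself (no look-ahead).
        train = dates[(dates >= rb - pd.DateOffset(months=lookback_months)) & (dates < rb)]
        # Holding period: from the rebalance date up to the next one.
        hold = dates[(dates >= rb) & (dates < nxt)]
        if len(hold):
            yield rb, train, hold
        rb = nxt
```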

Stage 4 — Optimization. Two weight vectors are produced: (a) pure MV using historical mean returns, and (b) ML + MV using Black-Litterman posterior returns. Both respect long-only constraints, minimum and maximum position sizes, and a portfolio volatility ceiling.

Stage 5 — Walk-forward simulation. Optimized weights are held for the next quarter, and daily portfolio returns compound. At each rebalance, slippage of 0.15% of traded weight (turnover) is deducted. The process then advances three months and repeats until the end of the sample.
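A sketch of the Stage 5 accounting, assuming weights drift with prices during the holding quarter (buy-and-hold), turnover is measured two-sided as $\sum_i |w_i^{\mathrm{new}} - w_i^{\mathrm{drift}}|$, and the first rebalance starts from cash. The 0.15% rate is from the text; `walk_forward_wealth` is a hypothetical name.

```python
import numpy as np

SLIPPAGE = 0.0015  # 0.15% of traded weight, per the text

def walk_forward_wealth(weights, period_returns, slippage=SLIPPAGE):
    """weights: list of (N,) target vectors, one per quarter.
    period_returns: list of (T_k, N) daily simple-return arrays for the matching quarter.
    Assumed: buy-and-hold within a quarter, two-sided turnover, initial portfolio is all cash."""
    wealth = [1.0]
    w_drift = np.zeros(len(weights[0]))                       # start from cash
    for w, R in zip(weights, period_returns):
        w = np.asarray(w, dtype=float)
        turnover = np.abs(w - w_drift).sum()                  # two-sided weight turnover
        level = wealth[-1] * (1.0 - slippage * turnover)      # pay slippage at the rebalance
        growth = np.cumprod(1.0 + np.asarray(R, dtype=float), axis=0)  # per-asset growth paths
        path = level * (growth @ w)                           # weights drift with prices
        wealth.extend(path.tolist())
        w_drift = w * growth[-1] / (growth[-1] @ w)           # quarter-end weights before rebalancing
    return np.array(wealth)
```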

Stage 6 — Risk analytics and reporting. Cumulative wealth paths, drawdowns, Sharpe and Sortino ratios, beta, alpha, value-at-risk, and sector concentration are summarized in the interpretation section.

Quantitative framework

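Daily returns. For stock $i$ on day $t$, $r_{i,t} = P_{i,t}/P_{i,t-1} - 1$; for daily moves this simple return is close to the log return $\ln(P_{i,t}/P_{i,t-1})$. The sample covariance matrix $\Sigma$ is estimated from these returns over the training window; annualized means and variances scale by 252 trading days, and volatilities by $\sqrt{252}$.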
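The return and annualization definitions translate directly into NumPy. This sketch uses a hypothetical helper, `annualized_moments`, that scales the mean and covariance by 252, the convention the text describes.

```python
import numpy as np

TRADING_DAYS = 252

def annualized_moments(prices):
    """prices: (T+1, N) array of adjusted closes -> daily returns, annual mean, annual covariance."""
    P = np.asarray(prices, dtype=float)
    r = P[1:] / P[:-1] - 1.0                               # r_{i,t} = P_{i,t} / P_{i,t-1} - 1
    mu_annual = r.mean(axis=0) * TRADING_DAYS              # means scale linearly with time
    cov_annual = np.cov(r, rowvar=False) * TRADING_DAYS    # so do variances and covariances
    return r, mu_annual, cov_annual
```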

Mean-variance optimization. Portfolio weights $w$ maximize the Sharpe ratio $(w^{\top}\mu - r_f)/\sqrt{252\, w^{\top}\Sigma w}$, where $r_f$ is the 6.5% risk-free rate, subject to $\sum_{i} w_{i} = 1$, box constraints $w_{\min} \leq w_{i} \leq w_{\max}$, and the volatility ceiling $\sqrt{252\, w^{\top}\Sigma w} \leq \sigma_{\max}$. When no external view is supplied, expected returns $\mu$ are the sample mean daily returns times 252.
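A sketch of the constrained max-Sharpe solve using SciPy's SLSQP method, which accepts bounds together with equality and inequality constraints. The default box of 0% to 15%, the 25% volatility ceiling, and the name `max_sharpe_weights` are placeholder assumptions; the box must allow full investment ($n \cdot w_{\max} \geq 1$).

```python
import numpy as np
from scipy.optimize import minimize

def max_sharpe_weights(mu, cov_daily, rf=0.065, w_min=0.0, w_max=0.15, sigma_max=0.25):
    """Long-only max-Sharpe weights. mu: annualized expected returns; cov_daily: daily covariance.
    Bounds and the volatility ceiling are illustrative placeholders."""
    mu = np.asarray(mu, dtype=float)
    cov_a = 252.0 * np.asarray(cov_daily, dtype=float)        # annualized covariance
    n = len(mu)

    def vol(w):
        return np.sqrt(w @ cov_a @ w)                          # sqrt(252 w' Sigma w)

    def neg_sharpe(w):
        return -(w @ mu - rf) / vol(w)

    cons = [
        {"type": "eq", "fun": lambda w: w.sum() - 1.0},        # fully invested
        {"type": "ineq", "fun": lambda w: sigma_max - vol(w)}, # volatility ceiling
    ]
    w0 = np.full(n, 1.0 / n)                                   # start from equal weights
    res = minimize(neg_sharpe, w0, method="SLSQP",
                   bounds=[(w_min, w_max)] * n, constraints=cons)
    if not res.success:
        raise RuntimeError(res.message)
    return res.x
```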

Machine-learning views. A gradient-boosted model predicts the forward five-day return from technical features. Training uses the first 80% of the estimation window chronologically; the last 20% estimates out-of-sample skill ($R^{2}$), which maps to view confidence $c \in [0.05, 1]$. The raw forecast is annualized and shrunk toward each stock’s historical mean return: $\tilde{\mu}_{i} = c\,\mu_{i}^{\mathrm{ML}} + (1 - c)\,\mu_{i}^{\mathrm{hist}}$.
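The shrinkage step is a per-stock convex combination. Two details are not specified in the text and are assumed here: the five-day forecast is annualized by compounding over $252/5$ periods, and confidence is the out-of-sample $R^2$ clipped to $[0.05, 1]$. `shrunk_views` is a hypothetical name.

```python
import numpy as np

def shrunk_views(pred_5d, mu_hist, r2_oos, c_min=0.05, c_max=1.0):
    """Blend annualized ML forecasts with historical means.
    ASSUMED: compounding annualization of the 5-day forecast; c = clip(R^2, c_min, c_max)."""
    mu_ml = (1.0 + np.asarray(pred_5d, dtype=float)) ** (252 / 5) - 1.0   # annualized forecast
    c = np.clip(np.asarray(r2_oos, dtype=float), c_min, c_max)            # view confidence
    views = c * mu_ml + (1.0 - c) * np.asarray(mu_hist, dtype=float)
    return views, c
```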

Black-Litterman blending. Market-cap weights imply equilibrium (prior) returns $\pi$ scaled to index performance. Investor views $Q$, one absolute view per stock so that the pick matrix is the identity, carry diagonal uncertainty $\Omega$ inversely related to confidence. With prior scaling $\tau$, posterior expected returns are:

$E[R] = \left[(\tau\Sigma)^{-1} + \Omega^{-1}\right]^{-1}\left[(\tau\Sigma)^{-1}\pi + \Omega^{-1}Q\right]$

These posteriors replace $μ$ in a second MV solve to obtain ML + MV weights.
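A NumPy sketch of the posterior formula. The equilibrium vector uses the standard reverse-optimization form $\pi = \delta \Sigma w_{\mathrm{mkt}}$, and $\Omega_{ii} = \tau \Sigma_{ii} / c_i$ is one common way to make view uncertainty fall as confidence rises; both choices, along with $\delta = 2.5$ and $\tau = 0.05$, are assumptions rather than the study's documented settings.

```python
import numpy as np

def black_litterman_posterior(cov, w_mkt, Q, conf, delta=2.5, tau=0.05):
    """cov: annualized covariance; w_mkt: market-cap weights; Q: one absolute view per stock (P = I).
    ASSUMED: pi = delta * cov @ w_mkt and Omega = diag(tau * var_i / c_i)."""
    cov = np.asarray(cov, dtype=float)
    pi = delta * cov @ np.asarray(w_mkt, dtype=float)                 # equilibrium returns
    omega = np.diag(tau * np.diag(cov) / np.asarray(conf, dtype=float))  # view uncertainty
    ts_inv = np.linalg.inv(tau * cov)                                 # (tau Sigma)^{-1}
    om_inv = np.linalg.inv(omega)                                     # Omega^{-1}
    A = ts_inv + om_inv
    b = ts_inv @ pi + om_inv @ np.asarray(Q, dtype=float)
    return np.linalg.solve(A, b), pi                                  # posterior E[R], prior pi
```

When the views equal the prior ($Q = \pi$), the posterior returns $\pi$ unchanged, which makes a convenient sanity check.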

Performance metrics. Cumulative return compounds daily portfolio returns. CAGR annualizes terminal wealth over the backtest length. Sharpe is annualized excess return over the risk-free rate divided by annualized volatility. Sortino replaces volatility with downside deviation. Max drawdown is the worst peak-to-trough decline on the cumulative curve. Beta and alpha come from regressing strategy daily returns on Nifty returns. At the 95% level, VaR is the 5th-percentile daily return of the historical distribution and CVaR is the average return on days at or below that level.
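The metric definitions can be sketched in a single NumPy function. Assumed conventions: the 6.5% annual risk-free rate is converted to a daily rate by geometric compounding, downside deviation is measured relative to that daily rate, alpha is the annualized intercept of an ordinary-least-squares regression on raw (not excess) returns, and VaR/CVaR are reported as returns, so negative values are losses.

```python
import numpy as np

def performance_summary(r, r_mkt, rf=0.065, periods=252):
    """Headline metrics from daily simple returns of a strategy (r) and the Nifty 50 (r_mkt)."""
    r = np.asarray(r, dtype=float)
    r_mkt = np.asarray(r_mkt, dtype=float)
    rf_d = (1.0 + rf) ** (1.0 / periods) - 1.0              # assumed geometric daily risk-free rate
    wealth = np.concatenate([[1.0], np.cumprod(1.0 + r)])    # start at 1.0 so day-1 losses count
    cagr = wealth[-1] ** (periods / len(r)) - 1.0
    vol = r.std(ddof=1) * np.sqrt(periods)
    excess = r - rf_d
    sharpe = excess.mean() * periods / vol
    downside = np.sqrt(np.mean(np.minimum(excess, 0.0) ** 2)) * np.sqrt(periods)
    sortino = excess.mean() * periods / downside
    max_dd = (wealth / np.maximum.accumulate(wealth) - 1.0).min()  # worst peak-to-trough
    beta, intercept = np.polyfit(r_mkt, r, 1)              # OLS: r = intercept + beta * r_mkt
    alpha = intercept * periods                             # annualized regression intercept
    var95 = np.quantile(r, 0.05)                            # 5th-percentile daily return
    cvar95 = r[r <= var95].mean()                           # mean of days at or below VaR
    return {"cagr": cagr, "vol": vol, "sharpe": sharpe, "sortino": sortino,
            "max_dd": max_dd, "beta": beta, "alpha": alpha,
            "var95": var95, "cvar95": cvar95}
```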

How to interpret the results

Strategy comparison table. Compare CAGR (wealth growth), Sharpe (return per unit of total risk), and max drawdown (worst loss episode). A higher Sharpe with moderate drawdown suggests efficient risk-taking; a high CAGR with deep drawdown may reflect concentrated bets.

If ML + MV outperforms MV only, the machine-learning views may be adding information beyond historical means, typically by tilting toward names with favourable short-term technical patterns while the optimizer enforces diversification. If MV only lags cap-weight, historical mean-return and covariance estimates may be weak signals in fast-moving Indian large caps over this window.

Cumulative performance chart. Wealth indices rebased to a common starting value at the backtest start show behaviour across market regimes. Divergence between ML + MV and Nifty indicates periods of active risk; convergence suggests the strategy tracked the index.

Risk profile (ML + MV). Beta near one implies market-like sensitivity; below one suggests defensive positioning. Alpha is the average return not explained by index exposure, so positive alpha indicates outperformance after adjusting for beta. The information ratio divides annualized active return (strategy minus Nifty) by tracking error. VaR and CVaR describe typical and tail daily losses under the historical distribution.
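Tracking error and the information ratio follow from the daily active-return series. A minimal sketch, assuming simple daily returns aligned on the same dates and 252-day annualization; `active_risk` is a hypothetical name.

```python
import numpy as np

def active_risk(r, r_mkt, periods=252):
    """Annualized active return, tracking error, and information ratio versus the index."""
    active = np.asarray(r, dtype=float) - np.asarray(r_mkt, dtype=float)  # daily active returns
    active_ret = active.mean() * periods
    tracking_error = active.std(ddof=1) * np.sqrt(periods)
    return active_ret, tracking_error, active_ret / tracking_error
```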

Sector allocation. Aggregated weights reveal industry concentration, e.g. an overweight in Financials or IT if the model and optimizer favour those names. Large sector tilts increase active risk (tracking error) relative to the index.
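Sector aggregation is a group-by over ticker weights. This sketch, with the hypothetical helper `sector_tilts`, assumes portfolio weights, index weights, and sector labels are pandas Series indexed by ticker, and reports each sector's active weight versus the index.

```python
import pandas as pd

def sector_tilts(port_w: pd.Series, index_w: pd.Series, sector: pd.Series) -> pd.DataFrame:
    """Aggregate stock weights by sector and compare with the index (all Series indexed by ticker)."""
    out = pd.DataFrame({
        "portfolio": port_w.groupby(sector).sum(),   # sector weight in the strategy
        "index": index_w.groupby(sector).sum(),      # sector weight in the benchmark
    }).fillna(0.0)
    out["active"] = out["portfolio"] - out["index"]  # positive = overweight
    return out.sort_values("active", ascending=False)
```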

Trading signals. Each row combines a 50-day trend (price versus moving average) with an annualized return forecast. Strong Buy appears when trend and forecast align bullishly; Hold when they conflict. Signals are illustrative rankings at the last training date, not live orders.
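The signal rule can be sketched as follows. The text specifies only the 50-day trend, the annualized forecast, and the "Strong Buy" and "Hold" cases; the zero threshold for a bullish forecast, the "Avoid" label for the both-bearish case, and the name `trading_signal` are placeholders.

```python
import numpy as np

def trading_signal(prices, forecast_annual, window=50):
    """Illustrative rule: trend = last close vs its 50-day moving average.
    Both bullish -> 'Strong Buy'; trend and forecast disagree -> 'Hold'.
    PLACEHOLDERS: forecast > 0 counts as bullish; 'Avoid' labels the both-bearish case."""
    p = np.asarray(prices, dtype=float)
    if len(p) < window:
        raise ValueError("need at least `window` prices")
    trend_up = p[-1] > p[-window:].mean()   # price above its moving average
    forecast_up = forecast_annual > 0
    if trend_up and forecast_up:
        return "Strong Buy"
    if trend_up != forecast_up:
        return "Hold"
    return "Avoid"
```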

Extended analytics. Calendar-year return tables decompose performance by regime. Drawdown and monthly-return charts show *when* risk materialized, not only headline CAGR. Up/down capture and tracking error quantify how closely ML + MV follows or diverges from Nifty in rallies and corrections.
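Calendar-year returns and capture ratios can be sketched with pandas, assuming daily returns in a Series indexed by date. Capture is computed on daily returns here for brevity; monthly-return conventions are also widely used, so treat the frequency as an assumption.

```python
import pandas as pd

def calendar_year_returns(r: pd.Series) -> pd.Series:
    """Compound daily returns within each calendar year."""
    return (1 + r).groupby(r.index.year).prod() - 1

def capture_ratios(r: pd.Series, r_mkt: pd.Series):
    """Up/down capture: strategy mean return on index up (down) days over the index mean
    on those days. ASSUMED: daily frequency."""
    up, down = r_mkt > 0, r_mkt < 0
    return r[up].mean() / r_mkt[up].mean(), r[down].mean() / r_mkt[down].mean()
```

An up-capture above 1 with a down-capture below 1 means the strategy gained more than Nifty in rallies and lost less in corrections.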

Limitations

Published closing prices may differ from exchange official figures; corporate actions are handled via standard adjustment conventions.

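The study uses a subset of thirty Nifty names for computational stability; conclusions may not transfer identically to the full fifty-stock index. Because the universe is drawn from current index membership, the backtest also carries survivorship bias.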

Transaction costs are stylized (slippage on turnover only). Securities transaction tax, stamp duty, brokerage, and market impact are not fully modeled.

Machine-learning forecasts are noisy; out-of-sample $R^{2}$ is often low for individual stocks, so views are deliberately shrunk toward historical means.

Past backtest performance does not guarantee future results. This document is research output, not investment advice.

Conclusion

ML-enhanced mean-variance optimization on Nifty large caps illustrates how forecast views can shift weights relative to a pure MV baseline, but transaction costs and model noise limit how aggressively one should interpret headline Sharpe ratios.

Use the scorecard for headline performance, cumulative paths for timing of out- and under-performance, and sector/signal tables for understanding why weights shifted — not for trading without independent validation.