Nifty 50 Alpha101 Selection & Composite Factor Research

Indian large caps · Price-volume alphas · Cross-sectional cleaning · Information coefficient screening · Linear factor combination · Machine-learning composite · Quintile portfolio backtests · Nifty 50 constituents

Summary

This study asks whether classic formulaic alphas — compact rules built from open, high, low, close, and volume — still separate winners from losers among Nifty 50 constituents over a long daily history of NSE-listed prices.

The workflow has three layers: (1) signal engineering on a daily panel of roughly fifty NSE stocks; (2) statistical evaluation via cross-sectional information coefficients and quintile spreads; (3) factor combination with both a transparent rolling linear model and an optional gradient-boosted machine-learning overlay.

Results are descriptive research outputs. They illustrate how multi-factor selection behaves on Indian large caps over the sample window; they are not live trading recommendations.

Research objectives

Measure predictive content of each alpha using daily Spearman IC and ICIR (mean IC divided by IC volatility).

Identify alphas stable enough to feed a composite score that ranks stocks each day.

Simulate long–short and quintile portfolios to see whether ranked signals translate into monotonic return patterns.

Compare a linear composite (rolling OLS with orthogonalized inputs) against a walk-forward LightGBM model trained on the same cleaned features.

Universe and market data

The investable set is the Nifty 50 index: liquid names traded on the National Stock Exchange of India. The sample focuses on current index members with sufficient listing history.

Daily adjusted open, high, low, close, and volume data cover 2015-01-01 to 2026-06-08. Names with fewer than about five hundred trading days are dropped so that the rolling windows (20-day volatility, 60-day training) are well defined.

Market capitalisation at each rebalance supports size neutralisation. VWAP in volume-sensitive alphas is approximated as the typical price (average of high, low, and close) when exchange turnover detail is unavailable.

How the study is conducted

Step 1 — Panel construction. Prices are organised by trade date and stock. Forward one-day returns are the percentage change from today’s close to the next session’s close, aligned to today for prediction (no look-ahead in labels).
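The label alignment in Step 1 can be sketched as follows. This is a minimal illustration, assuming a wide pandas frame of adjusted closes (dates × stocks); the tickers, prices, and the function name `forward_returns` are hypothetical and not taken from the study.

```python
import pandas as pd

def forward_returns(close: pd.DataFrame) -> pd.DataFrame:
    """Return from today's close to the next session's close, stored on today's row."""
    # shift(-1) brings the next session's close up to today's row.
    return close.shift(-1) / close - 1.0

# Toy panel with hypothetical tickers and prices.
close = pd.DataFrame(
    {"AAA": [100.0, 102.0, 101.0], "BBB": [50.0, 49.0, 51.45]},
    index=pd.to_datetime(["2024-01-01", "2024-01-02", "2024-01-03"]),
)
fwd = forward_returns(close)
```

The last row of `fwd` is missing by construction, because no next-session close exists for it yet. Features dated t are therefore only ever paired with a return realised after t.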

Step 2 — Alpha computation. Fifteen formulaic signals inspired by the WorldQuant Alpha101 library are evaluated in wide matrix form (dates × stocks), then stacked. Examples include short-horizon reversal, rank correlations of open and volume, VWAP distance, and close–open range ratios.
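As an illustration of the wide-matrix form, the sketch below computes one close-open range ratio using the published Alpha#101 expression, (close − open) / ((high − low) + 0.001), and then stacks it. Whether the study uses exactly this expression is an assumption; the tickers and prices are hypothetical.

```python
import pandas as pd

def alpha101(open_: pd.DataFrame, high: pd.DataFrame,
             low: pd.DataFrame, close: pd.DataFrame) -> pd.DataFrame:
    """Intraday move scaled by the day's range (published Alpha#101 form)."""
    # The 0.001 term keeps the ratio finite when high equals low.
    return (close - open_) / ((high - low) + 0.001)

dates = pd.to_datetime(["2024-01-01", "2024-01-02"])
cols = ["AAA", "BBB"]  # hypothetical tickers
open_ = pd.DataFrame([[100.0, 50.0], [101.0, 51.0]], index=dates, columns=cols)
high = pd.DataFrame([[103.0, 52.0], [102.0, 51.5]], index=dates, columns=cols)
low = pd.DataFrame([[99.0, 49.0], [100.0, 50.0]], index=dates, columns=cols)
close = pd.DataFrame([[102.0, 49.5], [100.5, 51.2]], index=dates, columns=cols)

wide = alpha101(open_, high, low, close)      # dates x stocks
stacked = wide.stack().rename("alpha101")     # one row per (date, stock)
stacked.index.names = ["date", "stock"]
```

Working in wide form keeps every operation element-wise or row-wise, so the same function applies unchanged to the full panel.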

Step 3 — Cross-sectional cleaning. For each date independently: replace infinities; MAD winsorise (median ± 3× scaled MAD); z-score; regress each factor on log market cap and take residuals; z-score again; fill remaining gaps with zero.
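The cleaning sequence for a single date might look like the sketch below. It assumes the usual 1.4826 consistency constant for the "scaled MAD" and a plain OLS on an intercept plus log market cap; both are assumptions, as are the function names and the random toy data.

```python
import numpy as np
import pandas as pd

def _zscore(s: pd.Series) -> pd.Series:
    sd = s.std(ddof=0)
    # A flat or empty cross-section carries no information: return zeros/NaN.
    return (s - s.mean()) / sd if sd > 0 else s * 0.0

def clean_cross_section(factor: pd.Series, log_mcap: pd.Series,
                        n_mad: float = 3.0) -> pd.Series:
    """Clean one date's factor values (index = tickers)."""
    x = factor.replace([np.inf, -np.inf], np.nan)

    # MAD winsorise: clip to median +/- n_mad * scaled MAD.
    med = x.median()
    mad = (x - med).abs().median() * 1.4826  # assumed scaling constant
    x = _zscore(x.clip(med - n_mad * mad, med + n_mad * mad))

    # Size neutralisation: keep the residual of an OLS on [1, log mcap].
    ok = x.notna() & log_mcap.notna()
    X = np.column_stack([np.ones(int(ok.sum())), log_mcap[ok].to_numpy()])
    beta, *_ = np.linalg.lstsq(X, x[ok].to_numpy(), rcond=None)
    resid = pd.Series(np.nan, index=x.index)
    resid.loc[ok] = x[ok].to_numpy() - X @ beta

    # Standardise again, then fill remaining gaps with zero.
    return _zscore(resid).fillna(0.0)

rng = np.random.default_rng(0)
tickers = [f"S{i:02d}" for i in range(50)]   # hypothetical tickers
raw = pd.Series(rng.normal(size=50), index=tickers)
raw.iloc[0] = np.inf                          # becomes a gap, then 0
raw.iloc[1] = 25.0                            # outlier, gets winsorised
log_mcap = pd.Series(rng.normal(10.0, 1.0, size=50), index=tickers)
cleaned = clean_cross_section(raw, log_mcap)
```

Applying the function date by date (for example with a `groupby` on the date level of the stacked panel) keeps each cross-section independent, as Step 3 requires.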

Step 4 — Single-factor diagnostics. Every cleaned alpha is correlated with next-day returns across stocks (Spearman IC). Quintile portfolios sort stocks into five equal buckets; cumulative NAV paths and G5−G1 spreads are recorded.

Step 5 — Linear composite. Alphas passing IC screens (or, if none pass, the top three by |ICIR|) enter a 60-day rolling OLS. Inputs are Löwdin-orthogonalised each day; the regression target is the cross-sectional percentile rank of forward return so the model learns relative ordering, not market beta.

Step 6 — Machine-learning overlay (optional). The same cleaned features feed a walk-forward LightGBM regressor with embargo between validation and test windows. Out-of-sample scores become an ML alpha; feature importance and mean |SHAP| values summarise drivers.

How signals are evaluated

Information Coefficient (IC). Each trading day, factor values and realised next-day returns are ranked across stocks; IC is the Spearman correlation of those ranks. A positive IC means higher factor scores tended to precede higher returns.

ICIR. Mean daily IC divided by its standard deviation. Higher |ICIR| suggests a more stable signal; very low ICIR often means noise dominates.
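A compact way to compute both statistics is sketched below. It ranks each date's cross-section and takes the row-wise Pearson correlation of the ranks, which is the Spearman correlation by definition. The function names and toy numbers are illustrative assumptions, and the sample standard deviation (pandas default) is assumed for ICIR.

```python
import pandas as pd

def daily_spearman_ic(factor: pd.DataFrame, fwd_ret: pd.DataFrame) -> pd.Series:
    """Cross-sectional Spearman IC per date (rows = dates, columns = stocks)."""
    # Only rank stocks that have both a factor value and a realised return.
    valid = factor.notna() & fwd_ret.notna()
    f_rank = factor.where(valid).rank(axis=1)
    r_rank = fwd_ret.where(valid).rank(axis=1)
    # Pearson correlation of ranks, row by row.
    return f_rank.corrwith(r_rank, axis=1)

def icir(ic: pd.Series) -> float:
    """Mean daily IC divided by its standard deviation."""
    ic = ic.dropna()
    return ic.mean() / ic.std()

dates = pd.to_datetime(["2024-01-01", "2024-01-02", "2024-01-03"])
cols = list("ABCDE")  # hypothetical tickers
factor = pd.DataFrame([[1, 2, 3, 4, 5]] * 3, index=dates, columns=cols, dtype=float)
fwd_ret = pd.DataFrame(
    [[0.01, 0.02, 0.03, 0.04, 0.05],    # same ordering as the factor
     [0.05, 0.04, 0.03, 0.02, 0.01],    # reversed ordering
     [0.02, 0.01, 0.03, 0.05, 0.04]],   # two adjacent pairs swapped
    index=dates, columns=cols,
)
ic = daily_spearman_ic(factor, fwd_ret)   # 1.0, -1.0, 0.8
signal_icir = icir(ic)
```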

Selection rule. Strict inclusion requires |mean IC| > 2% and |ICIR| > 0.30. On Nifty 50 these thresholds are demanding; when no factor qualifies, the composite uses the three alphas with the largest |ICIR| and |mean IC| ≥ 1%.
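The rule can be expressed in a few lines. The sketch below applies it to six rows copied from the factor screening table; the function and column names are hypothetical.

```python
import pandas as pd

def select_alphas(stats: pd.DataFrame, ic_min: float = 0.02, icir_min: float = 0.30,
                  fallback_ic_min: float = 0.01, fallback_n: int = 3):
    """Return (selected alpha names, which rule fired)."""
    abs_ic = stats["ic_mean"].abs()
    abs_icir = stats["icir"].abs()

    # Strict rule: both thresholds must be cleared.
    strict = stats.index[(abs_ic > ic_min) & (abs_icir > icir_min)]
    if len(strict) > 0:
        return list(strict), "strict"

    # Fallback: largest |ICIR| among alphas with |mean IC| >= fallback_ic_min.
    eligible = abs_icir[abs_ic >= fallback_ic_min]
    return list(eligible.nlargest(fallback_n).index), "fallback"

# Six rows taken from the screening table.
stats = pd.DataFrame(
    {"ic_mean": [0.0330, -0.0280, -0.0260, 0.0220, -0.0200, -0.0010],
     "icir": [0.177, -0.155, -0.131, 0.129, -0.117, -0.006]},
    index=["alpha038", "alpha101", "alpha_5_day_reversal",
           "alpha041", "alpha001", "alpha098"],
)
chosen, rule = select_alphas(stats)
```

No alpha in the table reaches |ICIR| > 0.30, so the fallback fires and returns alpha038, alpha101, and alpha_5_day_reversal, matching the composite inputs listed in the study snapshot.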

Layered backtest. Stocks are sorted into quintiles G1 (lowest factor) through G5 (highest). Each bucket is equal-weighted; the long–short leg is G5 minus G1. Reported metrics are cumulative return, annualised return and volatility, Sharpe ratio, and maximum drawdown, all measured before brokerage, securities transaction tax (STT), or borrow costs.
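A minimal sketch of the bucket construction and the summary metrics follows. Several details are assumptions rather than the study's stated conventions: ties are broken by column order, annualisation uses 252 sessions with geometric compounding, and the Sharpe ratio takes a `risk_free` parameter because the source does not state the rate it uses. All names and toy numbers are hypothetical.

```python
import numpy as np
import pandas as pd

def quintile_returns(factor: pd.DataFrame, fwd_ret: pd.DataFrame,
                     n_buckets: int = 5) -> pd.DataFrame:
    """Equal-weighted daily return of each factor bucket plus the top-minus-bottom spread."""
    rank = factor.rank(axis=1, method="first")        # 1..n within each date
    n = factor.notna().sum(axis=1)                    # stocks available per date
    bucket = (rank - 1).mul(n_buckets).floordiv(n, axis=0) + 1
    out = pd.DataFrame({f"G{b}": fwd_ret.where(bucket == b).mean(axis=1)
                        for b in range(1, n_buckets + 1)})
    out["L-S"] = out[f"G{n_buckets}"] - out["G1"]
    return out

def performance(daily: pd.Series, periods: int = 252, risk_free: float = 0.0) -> dict:
    """Summary metrics from a series of daily returns, before any costs."""
    daily = daily.dropna()
    nav = (1.0 + daily).cumprod()
    years = len(daily) / periods
    ann_ret = nav.iloc[-1] ** (1.0 / years) - 1.0
    ann_vol = daily.std() * np.sqrt(periods)
    drawdown = nav / nav.cummax() - 1.0               # relative to the running NAV peak
    return {"cum_return": nav.iloc[-1] - 1.0, "ann_return": ann_ret,
            "ann_vol": ann_vol, "sharpe": (ann_ret - risk_free) / ann_vol,
            "max_dd": drawdown.min()}

# Toy panel: ten hypothetical stocks whose returns increase with the factor.
dates = pd.to_datetime(["2024-01-01", "2024-01-02", "2024-01-03"])
cols = [f"S{i}" for i in range(10)]
factor = pd.DataFrame([np.arange(1.0, 11.0)] * 3, index=dates, columns=cols)
fwd_ret = factor * 0.001
buckets = quintile_returns(factor, fwd_ret)

# Toy return path to exercise the metrics: +10%, -50%, +20%.
stats = performance(pd.Series([0.10, -0.50, 0.20]))
```

With fifty names, each quintile holds about ten stocks, so a single stock's move can shift a bucket's daily return noticeably.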

Machine-learning extension

The gradient-boosted model uses L1 regression (robust to return outliers), shallow trees, column subsampling, and early stopping on a validation slice inside each walk-forward fold.

Training windows span roughly twelve months; validation three months; test three months; one-day embargo between validation end and test start to limit label leakage from overlapping forward returns.
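The fold layout can be sketched with positional index ranges. The window lengths below (252, 63, and 63 trading days) are assumed stand-ins for "roughly twelve months" and "three months"; the rolling (rather than expanding) training window and the step of one test window per fold are also assumptions.

```python
from typing import Iterator, Tuple

def walk_forward_folds(n_days: int, train: int = 252, valid: int = 63,
                       test: int = 63, embargo: int = 1
                       ) -> Iterator[Tuple[range, range, range]]:
    """Yield (train, valid, test) positional ranges, rolling forward by `test` days."""
    start = 0
    while True:
        tr = range(start, start + train)
        va = range(tr.stop, tr.stop + valid)
        # Skip `embargo` sessions between the end of validation and the start of test.
        te = range(va.stop + embargo, va.stop + embargo + test)
        if te.stop > n_days:
            break
        yield tr, va, te
        start += test

folds = list(walk_forward_folds(n_days=500))
for tr, va, te in folds:
    print(f"train {tr.start}-{tr.stop - 1}, valid {va.start}-{va.stop - 1}, "
          f"test {te.start}-{te.stop - 1}")
```

Because each label is a one-day forward return, a one-session gap is enough to keep the last validation label from overlapping the first test row.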

The ML alpha is evaluated with the same IC and quintile machinery as single-factor alphas. Feature-importance bars show average gain across folds; SHAP summaries compress the beeswarm view into mean absolute contribution per feature on the last test sample.

Limitations and interpretation

Fifty stocks is a narrow cross-section compared with CSI 300 or full NSE universes; IC estimates are noisier and quintile buckets hold fewer names.

Published closing prices may differ slightly from exchange official figures; corporate actions are handled via standard price adjustment.

Industry neutralisation is not applied — only log market cap — because sector classifications are not part of this study.

Backtests are paper portfolios without Indian transaction taxes, stamp duty, or short borrow; live implementation would reduce net spreads.

In-sample factor mining across fifteen correlated alphas increases multiple-testing risk; treat significance as exploratory.

Interactive results (below)

The screening table lists IC mean, IC standard deviation, ICIR, and composite membership for every alpha.

Use the factor tabs to inspect daily IC (bars) and cumulative IC (line), quintile NAV curves, and the full performance table for any signal.

The rolling linear composite section shows the rolling OLS composite; the machine-learning alpha section appears only when the gradient-boosted model ran successfully in the latest build.

Empirical results

Interactive diagnostics from the latest study. Time series are downsampled for display; performance tables use the full sample.

Study snapshot

Universe: Nifty 50 (50 stocks with sufficient history)
Sample period: 2015-01-01 to 2026-06-08
Alphas computed: 15
OLS composite inputs: alpha038, alpha101, alpha_5_day_reversal (fallback: top three by |ICIR|)
Market: India (NSE-listed equities)
ML walk-forward: 39 out-of-sample folds

Factor screening

Cross-sectional Spearman IC diagnostics for each cleaned alpha. Composite membership follows strict IC/ICIR thresholds or a ranked fallback when no factor clears the bar.

Factor                 IC mean   IC std    ICIR    In composite
alpha038                0.0330   0.1880    0.177   yes
alpha101               -0.0280   0.1790   -0.155   yes
alpha_5_day_reversal   -0.0260   0.2000   -0.131   yes
alpha041                0.0220   0.1690    0.129
alpha001               -0.0200   0.1670   -0.117
alpha006                0.0160   0.1580    0.100
alpha042                0.0180   0.1940    0.092
alpha040                0.0130   0.1660    0.081
alpha094                0.0130   0.1790    0.076
alpha002                0.0100   0.1530    0.067
alpha072                0.0090   0.1510    0.060
alpha012                0.0090   0.1540    0.057
alpha088                0.0060   0.1480    0.043
alpha003                0.0020   0.1480    0.014
alpha098               -0.0010   0.1470   -0.006

Single-factor diagnostics

Select an alpha to view its information coefficient path and quintile backtest. G1 is the lowest factor bucket; G5 the highest; L-S is the top-minus-bottom spread.

[Chart: daily and cumulative information coefficient for the selected alpha]

[Chart: quintile cumulative NAV for the selected alpha]

Rolling linear composite

Multi-alpha score from 60-day rolling OLS on orthogonalised inputs. Training labels are cross-sectional ranks of next-day return.
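The orthogonalisation step can be illustrated with the standard symmetric (Löwdin) construction, F_orth = F (FᵀF)^(-1/2), which makes the factor columns mutually orthogonal while staying as close as possible to the originals. The sketch below is an assumption about how one day's transform might look; the `eps` guard, the function name, and the random toy matrix are hypothetical.

```python
import numpy as np

def lowdin_orthogonalise(F: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Symmetric (Löwdin) orthogonalisation of the columns of F (stocks x factors)."""
    M = F.T @ F                                   # overlap matrix, k x k
    eigval, eigvec = np.linalg.eigh(M)            # M is symmetric
    eigval = np.clip(eigval, eps, None)           # guard against a near-singular overlap
    S = eigvec @ np.diag(eigval ** -0.5) @ eigvec.T   # M^(-1/2)
    return F @ S

# Toy cross-section: 50 stocks, 3 deliberately correlated factors.
rng = np.random.default_rng(1)
common = rng.normal(size=(50, 1))
F = np.hstack([common + 0.5 * rng.normal(size=(50, 1)) for _ in range(3)])
F = (F - F.mean(axis=0)) / F.std(axis=0)          # standardise each column

F_orth = lowdin_orthogonalise(F)
gram = F_orth.T @ F_orth                          # close to the identity matrix
```

Unlike sequential Gram-Schmidt, the symmetric form treats every factor the same way, so the result does not depend on the order in which the alphas are listed.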

[Chart: information coefficient, composite]

[Chart: quintile cumulative NAV, composite]

Bucket   Cum Return   Ann Return   Ann Vol   Sharpe    Max DD
G1          135.40%        8.10%    19.50%    0.083   -55.10%
G2          204.80%       10.70%    18.10%    0.232   -41.60%
G3          732.00%       21.30%    18.10%    0.822   -42.90%
G4         1253.50%       26.80%    18.30%    1.113   -36.70%
G5          962.70%       24.10%    18.60%    0.945   -36.80%
L-S         291.80%       13.30%    14.90%    0.456   -39.40%

Machine-learning alpha

LightGBM walk-forward model trained on the same cleaned alphas. Out-of-sample predictions are ranked and backtested like single-factor signals.

IC mean 0.0220 · ICIR 0.133 · 39 folds

[Chart: information coefficient, ML alpha]

[Chart: quintile cumulative NAV, ML alpha]

Bucket   Cum Return   Ann Return   Ann Vol   Sharpe    Max DD
G1          155.60%       10.10%    18.30%    0.197   -46.10%
G2          568.10%       21.50%    17.60%    0.854   -35.60%
G3          610.40%       22.30%    17.90%    0.879   -41.10%
G4          643.00%       22.80%    18.40%    0.887   -41.40%
G5          690.00%       23.60%    19.60%    0.875   -44.60%
L-S         187.80%       11.50%    14.10%    0.350   -36.60%

[Chart: feature importance (mean gain across folds)]

[Chart: SHAP contribution (mean absolute value)]

QuantifiedTrader

Independent quantitative research on trading methods, backtesting, and market analytics.

© 2026 QuantifiedTrader. All rights reserved.