Cross-section shrinkage lab (ETF panel)

Overview

This project studies how many dimensions are needed to explain cross-sectional ETF variation without overfitting. It combines principal-component structure, ridge shrinkage, and pseudo out-of-sample checks to evaluate signal stability.

Methodology

Build an ETF cross-section panel and standardize features to a consistent scale.

Estimate principal components and inspect explained-variance concentration to identify effective rank.

Measure cross-sectional fit quality as factor count K increases (R² vs K curves).

Run pseudo out-of-sample folds and ridge regularization sweeps to test robustness under shrinkage.
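The first three steps above can be sketched in a few lines of NumPy. This is a minimal illustration on synthetic data, not the repository's pipeline: the function names (standardize, pca_spectrum, r2_vs_k) are hypothetical, and it assumes the PCs come from standardized returns and that the cross-sectional regression includes an intercept.

```python
import numpy as np

def standardize(returns):
    """Z-score each asset column (mean 0, unit sample variance)."""
    return (returns - returns.mean(axis=0)) / returns.std(axis=0, ddof=1)

def pca_spectrum(z):
    """Eigenvalues (descending) and matching eigenvectors of the sample covariance."""
    cov = np.cov(z, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)      # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]
    return eigvals[order], eigvecs[:, order]

def r2_vs_k(mean_excess, eigvecs):
    """In-sample cross-sectional R² of mean excess returns on the first K PC loadings."""
    y = mean_excess - mean_excess.mean()
    tss = float(y @ y)
    curve = []
    for k in range(1, eigvecs.shape[1] + 1):
        # Intercept plus the first k eigenvector columns (assumed regression design).
        X = np.column_stack([np.ones(len(y)), eigvecs[:, :k]])
        beta, *_ = np.linalg.lstsq(X, mean_excess, rcond=None)
        resid = mean_excess - X @ beta
        curve.append(1.0 - float(resid @ resid) / tss)
    return np.array(curve)

# Synthetic two-factor panel standing in for the ETF data.
rng = np.random.default_rng(0)
T, N = 500, 6
factors = rng.normal(size=(T, 2))
loadings = rng.normal(size=(2, N))
returns = 0.0005 + 0.01 * factors @ loadings + rng.normal(scale=0.005, size=(T, N))

eigvals, eigvecs = pca_spectrum(standardize(returns))
explained = eigvals / eigvals.sum()             # explained-variance shares
r2 = r2_vs_k(returns.mean(axis=0), eigvecs)
print(np.round(explained, 3))
print(np.round(r2, 3))
```

Because the models are nested, the in-sample R² curve can only rise with K, which is why it has to be read alongside the pseudo-OOS diagnostics rather than on its own.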

Interpretation Guide

If R² saturates quickly as K rises, the signal is low-dimensional and easier to regularize.

If ridge penalties stabilize fold-level performance, estimates are likely less sensitive to sampling noise.

Use the interactive charts on this page to compare in-sample fit against pseudo-OOS behavior before selecting production settings.
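The ridge point in the guide above can be illustrated with a small simulation. The sketch below uses closed-form ridge on a random design that stands in for PC loadings, redraws the noise many times, and records how much the estimated coefficients move at each penalty. The helper name ridge_gammas, the penalty grid, and the noise level are assumptions for illustration only.

```python
import numpy as np

def ridge_gammas(X, y, lam):
    """Closed-form ridge, (X'X + lam*I)^{-1} X'y, on centered data so no intercept is penalized."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    k = X.shape[1]
    return np.linalg.solve(Xc.T @ Xc + lam * np.eye(k), Xc.T @ yc)

rng = np.random.default_rng(1)
N, K = 20, 3
X = rng.normal(size=(N, K))                     # stand-in for PC loadings
true_gamma = np.array([0.5, -0.2, 0.1])         # hypothetical coefficients
lams = [0.0, 0.1, 1.0, 10.0, 100.0]

# Same design, 200 different noise draws.
draws = [X @ true_gamma + rng.normal(scale=0.3, size=N) for _ in range(200)]

dispersion = []
for lam in lams:
    est = np.array([ridge_gammas(X, y, lam) for y in draws])
    total_var = float(est.var(axis=0).sum())    # total variance of the estimates across draws
    dispersion.append(total_var)
    print(lam, round(total_var, 5))
```

The total variance across draws falls as the penalty grows, at the cost of pulling the estimates toward zero. That trade-off is what the ridge shrinkage profiles are meant to expose.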

Interactive concept charts

The charts below show the PCA spectrum, cross-sectional R² vs factor count (K), ridge shrinkage profiles, pseudo-OOS fold diagnostics, and per-ETF summary statistics from the Yahoo Finance panel.

Refresh the underlying JSON with npm run data:shrinking-cross-section (requires Python and yfinance).

Live Yahoo Finance panel. Daily adjusted closes from Yahoo Finance; excess returns use ^IRX as a simple short-rate proxy. Metrics are computed in this repository for a liquid ETF sleeve and are intended as a transparent lab snapshot.

Updated: 2026-06-29T19:39:47Z
Window: 2013-07-19 → 2026-06-29
Days × assets: 3254 × 20
CV fold chart uses K = 10 PCs
Panel: SPY, QQQ, IWM, VTV, VUG, MTUM, QUAL, USMV, EFA, EEM, VEA, TLT, LQD, HYG, VNQ, GLD, XLE, XLF, XLK, XLV
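As a rough illustration of the excess-return step, the sketch below subtracts a per-day short rate from simple daily returns. It assumes ^IRX is quoted as an annualized percentage and converts it with a plain division by 100 and by 252 trading days. That convention, the use of the previous day's rate, and the function name daily_excess_returns are all assumptions, and the repository's pipeline may differ.

```python
def daily_excess_returns(adj_close, irx_percent, trading_days=252):
    """Simple daily returns minus a per-day short rate.

    adj_close: adjusted closes, oldest first.
    irx_percent: ^IRX quotes on the same dates, in annualized percent (assumed).
    """
    excess = []
    for t in range(1, len(adj_close)):
        simple_ret = adj_close[t] / adj_close[t - 1] - 1.0
        # Assumed convention: use the rate observed at the start of the holding day.
        rf_daily = irx_percent[t - 1] / 100.0 / trading_days
        excess.append(simple_ret - rf_daily)
    return excess

# Toy numbers, not market data.
prices = [100.0, 101.0, 100.495]
irx = [5.04, 5.04, 5.04]
out = daily_excess_returns(prices, irx)
print(out)
```

With a 5.04% annualized quote, the assumed daily rate is 0.0002, so a 1% daily gain becomes roughly 0.98% in excess terms.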

What this shows: Model-complexity trade-off on the ETF panel: in-sample cross-sectional R² vs K principal directions, mean pseudo-OOS R² across one-year-ahead folds, and a finite-sample-shrunk OOS curve.

How to read it: K is the number of PC columns, taken from the full-sample covariance, that are used for the in-sample curve. The OOS curves refit the eigenvectors in each fold on training data only, then map the training-sample gammas (the fitted cross-sectional coefficients) to the next calendar year's realized mean excess returns.
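The fold procedure described above might look like the following sketch. It assumes an expanding training window, an intercept in the cross-sectional regression, and an OOS R² measured against the cross-sectional mean of the realized test-period means. None of these details are confirmed by the page, and the function names are hypothetical.

```python
import numpy as np

def top_k_loadings(train, k):
    """First k eigenvectors of the training-sample covariance, largest eigenvalues first."""
    eigvals, eigvecs = np.linalg.eigh(np.cov(train, rowvar=False))
    return eigvecs[:, np.argsort(eigvals)[::-1][:k]]

def fold_oos_r2(train, test, k):
    """Fit gammas on training means, then score them on the test period's realized means."""
    X = np.column_stack([np.ones(train.shape[1]), top_k_loadings(train, k)])
    gammas, *_ = np.linalg.lstsq(X, train.mean(axis=0), rcond=None)
    realized = test.mean(axis=0)
    resid = realized - X @ gammas
    centered = realized - realized.mean()
    return 1.0 - float(resid @ resid) / float(centered @ centered)

def expanding_folds(excess, years, k, min_train_years=3):
    """One fold per test year: train on all earlier years, test on that year."""
    labels = np.unique(years)
    scores = {}
    for test_year in labels[min_train_years:]:
        train = excess[years < test_year]       # eigenvectors never see test data
        test = excess[years == test_year]
        scores[int(test_year)] = fold_oos_r2(train, test, k)
    return scores

# Synthetic daily excess returns: 6 "years" of 252 days for 8 assets.
rng = np.random.default_rng(2)
T, N = 252 * 6, 8
excess = rng.normal(loc=0.0003, scale=0.01, size=(T, N))
years = 2015 + np.arange(T) // 252

scores = expanding_folds(excess, years, k=3)
print(scores)
```

Unlike the in-sample curve, a fold-level OOS R² computed this way is bounded above by 1 but can be negative, which is the signal that added factors are fitting noise.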

Data and Reproducibility

The page renders its concept charts from generated JSON. Re-run the data pipeline to refresh the diagnostics and keep conclusions in sync with current market history.