Diversified Stock Portfolio Using Clustering Analysis

Keywords: correlation, beta, returns, volatility, Sharpe ratio, backtesting

Overview

This project constructs diversified stock portfolios from S&P 500 constituents using unsupervised learning. Historical price data is used to compute risk and return features for each stock, and K-means clustering groups stocks with similar behavior. Portfolio construction then selects the top stocks by Sharpe ratio from each cluster, so that holdings are spread across clusters, and performance is validated by backtesting against the S&P 500 index.

Data & Features

Data sources: S&P 500 constituent list and historical stock prices (US equity data fetched via the pipeline). The first 70% of the history is used for model building; the remaining 30% is reserved for validation.
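The 70/30 split is chronological: the validation window must come strictly after the training window, so the rows are never shuffled. A minimal sketch, assuming the price history is a list ordered oldest first (the function name and sample prices are hypothetical):

```python
def chronological_split(rows, train_percent=70):
    """Split time-ordered rows into (train, validation) without shuffling."""
    if not 0 < train_percent < 100:
        raise ValueError("train_percent must be between 0 and 100")
    # Integer arithmetic avoids floating-point surprises at the cut point.
    cut = len(rows) * train_percent // 100
    return rows[:cut], rows[cut:]


# Hypothetical example: ten trading days of closing prices, oldest first.
prices = [100.0, 101.5, 99.8, 102.3, 103.1, 104.0, 102.7, 105.2, 106.0, 107.4]
train, validation = chronological_split(prices)
print(len(train), len(validation))
```

Features and cluster assignments would then be computed on the first part only, which keeps the later backtest out of sample.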

Features used for clustering (all derived from historical data):

Correlation with S&P 500 index (price correlation).

Beta: sensitivity of stock returns to index returns.

Annualized return (from daily returns, 252-day year).

Annualized volatility (standard deviation of daily returns, annualized).

Sharpe ratio (annualized return / annualized volatility, with no risk-free rate subtracted).

Daily change in price (open-to-close) and daily variation (high-to-low), annualized.
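The return-based features above can be sketched with the standard library alone. This is an illustration under stated assumptions, not the project's pipeline: the annualized return is taken as the arithmetic mean daily return times 252 (the source does not say whether returns are compounded), volatility is scaled by the square root of 252, and the dictionary keys and sample prices are hypothetical.

```python
import math
from statistics import mean, stdev

TRADING_DAYS = 252  # 252-day year, as stated in the text


def daily_returns(prices):
    """Simple close-to-close returns from a list of prices."""
    return [p1 / p0 - 1.0 for p0, p1 in zip(prices, prices[1:])]


def covariance(x, y):
    """Sample covariance of two equal-length series."""
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)


def risk_features(stock_prices, index_prices):
    """Return a dict of clustering features for one stock."""
    r = daily_returns(stock_prices)
    m = daily_returns(index_prices)
    ann_return = mean(r) * TRADING_DAYS             # assumption: arithmetic annualization
    ann_vol = stdev(r) * math.sqrt(TRADING_DAYS)
    return {
        "ann_return": ann_return,
        "ann_vol": ann_vol,
        "sharpe": ann_return / ann_vol,             # no risk-free rate subtracted
        "beta": covariance(r, m) / covariance(m, m),  # return sensitivity to the index
        "cor": covariance(stock_prices, index_prices)
        / (stdev(stock_prices) * stdev(index_prices)),  # price correlation
    }


# Hypothetical prices: the "stock" is a scaled copy of the index,
# so beta and correlation should both come out at 1.
index_prices = [100.0, 101.0, 100.5, 102.0, 103.0, 102.2]
stock_prices = [2.0 * p for p in index_prices]
features = risk_features(stock_prices, index_prices)
print(features)
```

The open-to-close and high-to-low features are omitted here because they need open, high, and low prices rather than closes alone.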

Features are scaled (z-score) before clustering. The optimal number of clusters is chosen using the within-cluster sum of squares (elbow method); we use K = 4.
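Scaling matters because K-means relies on Euclidean distance, so a feature with a wide numeric range would otherwise dominate the grouping. A minimal z-score sketch follows; the population standard deviation is an assumption (the source does not specify which estimator is used), and the sample rows are hypothetical.

```python
from statistics import mean, pstdev


def zscore_columns(rows):
    """Standardize each column of a row-major matrix to mean 0 and std 1."""
    columns = list(zip(*rows))
    mus = [mean(c) for c in columns]
    sigmas = [pstdev(c) for c in columns]  # assumption: population std
    return [
        # A constant column carries no information; map it to 0.
        [(v - mu) / sd if sd > 0 else 0.0 for v, mu, sd in zip(row, mus, sigmas)]
        for row in rows
    ]


# Hypothetical rows: [annualized return, annualized volatility, beta]
raw = [
    [0.08, 0.26, 0.8],
    [-0.06, 0.31, 0.8],
    [0.32, 0.32, 1.0],
    [-0.02, 0.54, 1.7],
]
scaled = zscore_columns(raw)
for row in scaled:
    print([round(v, 2) for v in row])
```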

Clustering

K-means is run on the normalized feature matrix with multiple random starts, keeping the solution with the lowest within-cluster sum of squares. Selecting stocks from every cluster ensures the portfolio spans different behavior groups rather than concentrating in one segment of the risk/return space.
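The role of the random starts can be shown with a small, dependency-free version of Lloyd's algorithm. This is a sketch for illustration, not the implementation used in the project: the function names, the seed, and the toy points are hypothetical. Each start picks its initial centers at random, and the run with the smallest within-cluster sum of squares (WSS) is kept.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_once(points, k, rng, max_iter=100):
    """One run of Lloyd's algorithm from random initial centers."""
    centers = rng.sample(points, k)
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centers[j]))
                      for p in points]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster empties
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    wss = sum(sq_dist(p, centers[lab]) for p, lab in zip(points, labels))
    return labels, centers, wss


def kmeans(points, k, n_starts=10, seed=0):
    """Run several random starts and keep the one with the lowest WSS."""
    rng = random.Random(seed)
    runs = [kmeans_once(points, k, rng) for _ in range(n_starts)]
    return min(runs, key=lambda run: run[2])


# Hypothetical scaled features forming two obvious groups.
points = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
          [5.0, 5.0], [5.1, 4.9], [4.9, 5.2]]
labels, centers, wss = kmeans(points, k=2)
wss_k1 = kmeans(points, k=1)[2]
print(labels, round(wss, 3), round(wss_k1, 3))
```

Repeating the call for a range of K values and recording the WSS of each gives the series that an elbow plot displays.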

Portfolio Construction

Two portfolio variants are built:

(1) Diversified by cluster: Within each cluster, stocks are ranked by Sharpe ratio; the top 5 from each of the 4 clusters form a 20-stock portfolio (equal weight per stock).

(2) Top-20 by Sharpe: The top 20 stocks by Sharpe ratio across the full universe, without cluster constraints.

Both are equal-weighted. The diversified-by-cluster portfolio reduces concentration in a single risk/return profile.
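The difference between the two selection rules is easiest to see on a toy universe. The sketch below uses hypothetical symbols, Sharpe values, and cluster labels, with one pick per cluster instead of five and a top-2 list instead of a top-20.

```python
from collections import defaultdict


def top_by_sharpe(stocks, n):
    """Symbols of the n stocks with the highest Sharpe ratio."""
    ranked = sorted(stocks, key=lambda s: s["sharpe"], reverse=True)
    return [s["symbol"] for s in ranked[:n]]


def top_per_cluster(stocks, n_per_cluster):
    """Top n stocks by Sharpe ratio within each cluster, in cluster order."""
    by_cluster = defaultdict(list)
    for s in stocks:
        by_cluster[s["cluster"]].append(s)
    picks = []
    for cluster in sorted(by_cluster):
        picks.extend(top_by_sharpe(by_cluster[cluster], n_per_cluster))
    return picks


# Hypothetical universe: cluster 2 holds both of the best Sharpe ratios.
universe = [
    {"symbol": "AAA", "cluster": 1, "sharpe": 0.4},
    {"symbol": "BBB", "cluster": 1, "sharpe": 0.2},
    {"symbol": "CCC", "cluster": 2, "sharpe": 1.1},
    {"symbol": "DDD", "cluster": 2, "sharpe": 0.9},
]
print(top_per_cluster(universe, 1))  # ['AAA', 'CCC']
print(top_by_sharpe(universe, 2))    # ['CCC', 'DDD']
```

The unconstrained ranking takes both names from cluster 2, while the per-cluster rule is forced to hold one name from each group.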

Validation & Backtesting

Validation uses the holdout period (last 30% of the data). Daily returns are computed for the portfolio (equal-weighted average of constituent returns) and for the S&P 500 index.

Cumulative returns are plotted for the cluster-based portfolio, the top-20 Sharpe portfolio, and the S&P 500 to compare risk-adjusted performance.
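The two validation steps, averaging constituent returns and compounding them into a cumulative path, can be sketched as follows. Averaging daily returns with equal weights implicitly assumes the portfolio is rebalanced to equal weight every day; the symbols and return values are hypothetical.

```python
def equal_weight_returns(returns_by_symbol):
    """Daily portfolio return as the plain average of constituent returns."""
    series = list(returns_by_symbol.values())
    return [sum(day) / len(day) for day in zip(*series)]


def wealth_path(daily_returns, start=1.0):
    """Compound daily returns into a wealth path indexed to `start`."""
    path = [start]
    for r in daily_returns:
        path.append(path[-1] * (1.0 + r))
    return path


# Hypothetical two-stock portfolio over two validation days.
validation_returns = {
    "AAA": [0.01, -0.02],
    "BBB": [0.03, 0.00],
}
portfolio = equal_weight_returns(validation_returns)
path = wealth_path(portfolio)
print([round(r, 4) for r in portfolio])  # [0.02, -0.01]
print([round(w, 4) for w in path])       # [1.0, 1.02, 1.0098]
```

Applying `wealth_path` to the index returns over the same dates gives the benchmark line for the comparison.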

Results

Summary

Stocks used: 196
Clusters: 4
Train period: 2021-06-07 → 2024-12-02
Validation period: 2024-12-03 → 2026-06-05
Computed: 6/8/2026, 12:58:20 PM

Cluster statistics

Cluster | Count | Mean return | Mean vol | Mean Sharpe | Mean beta
1 | 96 | 8.30% | 25.70% | 0.34 | 0.79
2 | 44 | -5.80% | 30.60% | -0.18 | 0.79
3 | 37 | 32.30% | 31.70% | 1.03 | 1.03
4 | 19 | -2.00% | 53.80% | -0.03 | 1.72

Elbow plot: within-cluster sum of squares

What this shows: Within-cluster sum of squares (WSS) by candidate cluster count K.

How to read it: Look for the elbow where WSS improvement starts flattening; that point suggests a practical K before diminishing returns.

Return vs volatility by cluster

Training-sample K-means assignment (K = 4). Color is cluster; hover for ticker.

Cluster-wise metrics

Mean annualized return, volatility, and Sharpe ratio by cluster.

Chart panels: mean return, mean volatility, mean Sharpe ratio.

Feature correlation matrix

Correlation between clustering features (used for K-means). Helps check redundancy.

 | ann_return | ann_vol | ann_sharpe… | ann_daily_… | ann_daily_… | beta | cor
ann_return | 1.00 | -0.06 | 0.94 | -0.69 | -0.04 | 0.01 | 0.67
ann_vol | -0.06 | 1.00 | -0.20 | 0.20 | 0.98 | 0.82 | -0.17
ann_sharpe… | 0.94 | -0.20 | 1.00 | -0.68 | -0.18 | -0.09 | 0.70
ann_daily_… | -0.69 | 0.20 | -0.68 | 1.00 | 0.23 | 0.04 | -0.62
ann_daily_… | -0.04 | 0.98 | -0.18 | 0.23 | 1.00 | 0.79 | -0.18
beta | 0.01 | 0.82 | -0.09 | 0.04 | 0.79 | 1.00 | 0.13
cor | 0.67 | -0.17 | 0.70 | -0.62 | -0.18 | 0.13 | 1.00
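One way to use this matrix for the redundancy check is to list the feature pairs whose absolute correlation exceeds a cutoff. The sketch below copies the values from the matrix in this section; the 0.9 threshold is an arbitrary assumption, and because two column names are truncated identically in the source, they are told apart here only by their column position.

```python
# Labels as they appear in the matrix; the position suffixes are added
# here only to distinguish the two identically truncated names.
labels = ["ann_return", "ann_vol", "ann_sharpe…",
          "ann_daily_… (col 4)", "ann_daily_… (col 5)", "beta", "cor"]

corr = [
    [1.00, -0.06, 0.94, -0.69, -0.04, 0.01, 0.67],
    [-0.06, 1.00, -0.20, 0.20, 0.98, 0.82, -0.17],
    [0.94, -0.20, 1.00, -0.68, -0.18, -0.09, 0.70],
    [-0.69, 0.20, -0.68, 1.00, 0.23, 0.04, -0.62],
    [-0.04, 0.98, -0.18, 0.23, 1.00, 0.79, -0.18],
    [0.01, 0.82, -0.09, 0.04, 0.79, 1.00, 0.13],
    [0.67, -0.17, 0.70, -0.62, -0.18, 0.13, 1.00],
]


def redundant_pairs(labels, corr, threshold=0.9):
    """Feature pairs (upper triangle only) with |correlation| >= threshold."""
    pairs = []
    for i in range(len(labels)):
        for j in range(i + 1, len(labels)):
            if abs(corr[i][j]) >= threshold:
                pairs.append((labels[i], labels[j], corr[i][j]))
    return pairs


for a, b, r in redundant_pairs(labels, corr):
    print(f"{a} ~ {b}: {r:.2f}")
# ann_return ~ ann_sharpe…: 0.94
# ann_vol ~ ann_daily_… (col 5): 0.98
```

At this cutoff two pairs stand out, which suggests that each pair contributes close to one feature's worth of information to the distance calculation.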

Portfolios

By cluster: top 5 by Sharpe in each of 4 clusters (equal weight).
Top 20: top 20 stocks by Sharpe ratio overall (equal weight).

Portfolio by cluster (symbols)

CBOE, BRK-B, CB, ATO, ADP, CTRA, CF, DVN, CHRW, SCHW, CEG, FICO, CTAS, COST, COR, EQT, AMD, CRWD, DDOG, AMAT

Top 20 by Sharpe (symbols)

CEG, FICO, CTAS, COST, COR, ANET, AVGO, ACGL, AJG, ETN, BLDR, AXON, DECK, APO, AZO, AFL, CAH, BSX, BRO, APH

Validation: cumulative returns

Out-of-sample wealth paths (indexed to 1 at validation start) for the cluster portfolio, top-20 Sharpe basket, and S&P 500. Period: 2024-12-03 → 2026-06-05.

QuantifiedTrader

Independent quantitative research on trading methods, backtesting, and market analytics.

© 2026 QuantifiedTrader. All rights reserved.