US equity sentiment tracker using VADER lexical analysis on financial news headlines. Tracks prices, sentiment scores, and correlations for S&P 500 stocks.
The Stock Sentiment Tracker collects financial news headlines and summaries for major US equities daily from multiple sources, scores each article using VADER (Valence Aware Dictionary and sEntiment Reasoner), and aggregates the scores per ticker. The result is a daily sentiment signal for each stock, ranging from -1 (most negative) to +1 (most positive), alongside end-of-day price data. The tracker covers S&P 500 stocks and is updated automatically by a daily pipeline.
News sources (all optional; sources that need authentication are enabled by supplying an API key): MarketAux (global financial news with entity sentiment), Finnhub (market and company news), Stock News API (ticker-specific news), NewsAPI (top business headlines), Yahoo Finance (US market news, no API key required).
Price data: End-of-day closing prices, 1-day, 1-month, 3-month, and 6-month price changes for each tracked ticker.
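The price changes listed above can be derived from a series of daily closes. The following is a minimal sketch, not the tracker's actual code: the function name and the trading-day lookbacks (1, 21, 63, 126) are assumptions, since the exact window lengths are not documented here.

```python
from typing import Dict, List, Optional

# Assumed lookbacks in trading days; the tracker's real window lengths may differ.
LOOKBACKS = {"1d": 1, "1m": 21, "3m": 63, "6m": 126}


def price_changes(closes: List[float]) -> Dict[str, Optional[float]]:
    """Percentage change from each lookback to the latest close.

    `closes` is ordered oldest to newest. A window without enough
    history (or with a zero reference price) yields None.
    """
    changes: Dict[str, Optional[float]] = {label: None for label in LOOKBACKS}
    if not closes:
        return changes
    latest = closes[-1]
    for label, days in LOOKBACKS.items():
        if len(closes) > days:
            past = closes[-1 - days]
            if past != 0:
                changes[label] = (latest / past - 1.0) * 100.0
    return changes
```

Returning None for windows with too little history keeps recently listed tickers from showing misleading zero changes.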
All news is collected at the headline and summary level only; no full article text is stored. What is persisted is the aggregated sentiment report, which includes recent headlines (news.json) alongside the scores.
VADER is a lexicon- and rule-based sentiment analysis tool originally designed for social media text, which makes it a reasonable fit for other short texts such as news headlines. It produces a compound score from -1 (most negative) to +1 (most positive) by summing the valence of each word found in its curated lexicon, adjusting for rules such as negation and intensifiers, and normalizing the total.
Why VADER for finance: It handles negations, intensifiers, capitalization, and punctuation patterns common in news headlines, although its general-purpose lexicon contains no finance-specific vocabulary. It requires no training data and is fast enough to score each day's articles within the daily pipeline run.
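As an illustration of the scoring step, the sketch below wraps the `vaderSentiment` package's `SentimentIntensityAnalyzer` and reads the `compound` value from `polarity_scores`. The helper names and the choice to score the headline and summary as one concatenated string are assumptions; the tracker may combine the two fields differently.

```python
from typing import Callable


def make_vader_scorer() -> Callable[[str], float]:
    """Build a scorer backed by the vaderSentiment package."""
    # Imported lazily so score_article() can be used with any scorer,
    # even when vaderSentiment is not installed.
    from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

    analyzer = SentimentIntensityAnalyzer()
    # polarity_scores() returns neg/neu/pos/compound; compound is in [-1, 1].
    return lambda text: analyzer.polarity_scores(text)["compound"]


def score_article(headline: str, summary: str,
                  scorer: Callable[[str], float]) -> float:
    """Score one article from its headline and optional summary.

    Assumption: both fields are joined into a single text before scoring.
    """
    text = f"{headline}. {summary}" if summary else headline
    return scorer(text)
```

With the package installed, `score_article("Shares surge after strong earnings", "", make_vader_scorer())` returns a compound score for that headline.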
Aggregation: For each ticker, the compound VADER score is averaged across all articles collected within each time window (1-month, 3-month, and 6-month). Only articles mentioning the ticker's company name or symbol are included.
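The relevance filter and per-window averaging might look roughly like the following. This is a sketch under stated assumptions: the window lengths in calendar days (30, 90, 180), the tuple layout of an article, and the naive matching rule are all hypothetical rather than taken from the tracker.

```python
from datetime import date, timedelta
from statistics import mean
from typing import Dict, List, Optional, Tuple

# Assumed calendar-day lengths for the 1M/3M/6M windows.
WINDOWS = {"1m": 30, "3m": 90, "6m": 180}

# (published date, headline plus summary text, compound score)
Article = Tuple[date, str, float]


def mentions(text: str, symbol: str, company: str) -> bool:
    """Naive relevance check: company name as a substring, or the
    symbol as a whole word (surrounding punctuation stripped)."""
    words = {w.strip(".,:;!?()'\"$") for w in text.split()}
    return company.lower() in text.lower() or symbol in words


def aggregate(articles: List[Article], symbol: str, company: str,
              today: date) -> Dict[str, Optional[float]]:
    """Average compound scores of relevant articles per window."""
    relevant = [(d, s) for d, text, s in articles
                if mentions(text, symbol, company)]
    result: Dict[str, Optional[float]] = {}
    for label, days in WINDOWS.items():
        cutoff = today - timedelta(days=days)
        scores = [s for d, s in relevant if cutoff <= d <= today]
        # None marks a window with no relevant articles.
        result[label] = mean(scores) if scores else None
    return result
```

Substring matching on the company name is deliberately simple and can produce false positives; a production filter would likely need stricter entity matching.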
The sentiment pipeline runs daily via GitHub Actions. Steps: (1) Fetch news from all configured sources, (2) Filter articles by ticker relevance, (3) Score each article with VADER, (4) Aggregate scores per ticker per time window, (5) Fetch end-of-day prices, (6) Write report to data/sentiment-tracker/.
Output files: sentiment.json (per-ticker sentiment scores), prices.json (end-of-day prices and momentum), news.json (recent headlines), sentiment-history.json (day-by-day sentiment per ticker), report-meta.json (pipeline metadata).
No raw article text is stored; only aggregated scores, metadata, and recent headlines are written. This keeps the data footprint small and reduces exposure to content-licensing issues.
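The final write step could be sketched as below. The file names come from the list above, but the JSON layout, the `write_report` function, and the metadata fields (`generated_at`, `ticker_count`) are assumptions made for illustration; only three of the five files are shown.

```python
import json
from datetime import datetime, timezone
from pathlib import Path
from typing import Any, Dict


def write_report(out_dir: Path, sentiment: Dict[str, Any],
                 prices: Dict[str, Any]) -> None:
    """Write aggregated results as JSON files under out_dir."""
    out_dir.mkdir(parents=True, exist_ok=True)
    payloads = {
        "sentiment.json": sentiment,
        "prices.json": prices,
        # Hypothetical metadata fields; the real report-meta.json may differ.
        "report-meta.json": {
            "generated_at": datetime.now(timezone.utc).isoformat(),
            "ticker_count": len(sentiment),
        },
    }
    for name, payload in payloads.items():
        text = json.dumps(payload, indent=2, sort_keys=True)
        (out_dir / name).write_text(text, encoding="utf-8")
```

A call such as `write_report(Path("data/sentiment-tracker"), sentiment, prices)` would target the directory named in the pipeline steps.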
Per-ticker metrics: Current sentiment score (lexical VADER compound), 1M/3M/6M sentiment averages, current price, 1-day/1M/3M/6M price change.
Cross-sectional analysis: Sentiment vs return scatter (6M sentiment vs 6M price change), cross-sectional correlation between sentiment and returns across 1M/3M/6M windows, return distribution histograms by period.
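A cross-sectional correlation for one window can be computed as a Pearson coefficient over all tickers that have both a sentiment value and a return. The sketch below uses only the standard library; the function names and the dictionary inputs keyed by ticker are assumptions.

```python
import math
from typing import Dict, List, Optional


def pearson(xs: List[float], ys: List[float]) -> Optional[float]:
    """Pearson correlation, or None if it is undefined."""
    n = len(xs)
    if n < 2 or n != len(ys):
        return None
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return None  # a constant series has no defined correlation
    return cov / math.sqrt(vx * vy)


def cross_sectional_corr(sentiment: Dict[str, Optional[float]],
                         returns: Dict[str, Optional[float]]) -> Optional[float]:
    """Correlate sentiment with returns across tickers for one window,
    skipping tickers that lack either value."""
    tickers = [t for t, s in sentiment.items()
               if s is not None and returns.get(t) is not None]
    return pearson([sentiment[t] for t in tickers],
                   [returns[t] for t in tickers])
```

Running this once per window (1M, 3M, 6M) gives the three coefficients described above.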
Time-series analysis: Sentiment history chart (day-by-day lexical sentiment per ticker), autocorrelation of sentiment series (lag-1 Pearson correlation), price return autocorrelation by period.
Active stocks table: Tickers with non-zero sentiment, sorted by score, with price and momentum columns. Paginated for readability.
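The table logic amounts to a filter, a sort, and a slice. In the sketch below the row layout, the descending sort order, and the default page size of 25 are assumptions; the source only states that rows are sorted by score and paginated.

```python
from typing import Any, Dict, List


def active_stocks_page(rows: List[Dict[str, Any]], page: int = 1,
                       page_size: int = 25) -> List[Dict[str, Any]]:
    """Return one page of tickers with non-zero sentiment.

    Rows whose "sentiment" is missing, None, or exactly 0 are dropped.
    Assumption: highest scores come first.
    """
    active = [r for r in rows if r.get("sentiment")]
    active.sort(key=lambda r: r["sentiment"], reverse=True)
    start = (page - 1) * page_size
    return active[start:start + page_size]
```

Requesting a page past the end simply returns an empty list, which a front end can use to stop paging.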
VADER is a lexical model — it does not understand context, sarcasm, or domain-specific financial language beyond its dictionary. Scores should be treated as a noisy signal, not a precise measure.
News coverage is uneven: large-cap stocks have more articles and more stable sentiment estimates; small-caps may have sparse coverage.
Sentiment is a lagging indicator in many cases — news often reflects events that have already moved prices. The correlation between sentiment and future returns is weak and varies by time window.
API keys are required for every news source except Yahoo Finance. Without keys, the tracker falls back to Yahoo Finance alone, which limits coverage.
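Source selection based on available keys could work along these lines. The environment variable names below are hypothetical and chosen only for illustration; the project's actual configuration names are not given here.

```python
import os
from typing import List, Mapping

# Hypothetical environment variable names for the keyed sources.
KEYED_SOURCES = {
    "MarketAux": "MARKETAUX_API_KEY",
    "Finnhub": "FINNHUB_API_KEY",
    "Stock News API": "STOCKNEWS_API_KEY",
    "NewsAPI": "NEWSAPI_KEY",
}


def enabled_sources(env: Mapping[str, str] = os.environ) -> List[str]:
    """List the sources usable with the keys present in `env`.

    Yahoo Finance needs no key, so it is always included.
    """
    sources = ["Yahoo Finance"]
    sources += [name for name, var in KEYED_SOURCES.items() if env.get(var)]
    return sources
```

Passing the environment as a parameter keeps the function easy to test: `enabled_sources({})` returns only Yahoo Finance.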