
BK5102/FinBrief


FinBrief

A one-glance morning brief for your stock portfolio.

FinBrief is a personal-use dashboard that ingests a user-defined stock portfolio, pulls daily financial news per ticker, scores each headline with FinBERT, and surfaces a single urgency signal that tells you where to look first. It replaces 30+ minutes of manual scanning across financial news sites with a one-glance morning brief.

Example signal: "3 of your 8 holdings have negative sentiment spikes today — here's why."


Target User

A single retail investor (the builder) holding 5 to 15 equities, checking the dashboard once per day.


Goals

Product Goals

  • Portfolio input — add, edit, and remove tickers via a simple UI; holdings persist locally across sessions.
  • Daily news collection — for each held ticker, pull the day's relevant English-language financial headlines from at least two free news sources for redundancy.
  • Per-ticker sentiment scoring — every collected headline scored by FinBERT into {positive, neutral, negative} with a confidence weight; ticker-level daily score aggregates these.
  • Urgency signal — front-page banner naming how many holdings show a negative sentiment spike today and linking to the headlines responsible.
  • Drill-down view — click any ticker to see today's headlines, their FinBERT labels, source, and timestamp.

Success Criteria

  • Dashboard correctly displays sentiment for at least 10 user-supplied tickers across 7 consecutive days without manual intervention.
  • Urgency banner has actionable precision: a manual spot-check of 20 flagged "negative spike" days shows that at least 70% correspond to genuinely negative news (not false positives from neutral-but-volatile language).

Non-Goals

  • Financial advice — no buy/sell/hold recommendations. Sentiment is presented as raw signal, not guidance.
  • Price prediction — no forecasting models. Sentiment is not used to predict future returns.
  • Real-time streaming — headlines update daily, not by minute or second. Day traders are not the target.

Build Plan

| Phase | Theme | Output | Exit Criteria |
|---|---|---|---|
| 1 | Data & Model Spike | CLI script that, given a ticker list, returns a JSON blob of today's headlines and FinBERT scores. | Pipeline runs end-to-end for 5 tickers in under 2 minutes; sentiment labels manually sanity-checked on 30 sample headlines. |
| 2 | Persistence & Aggregation | SQLite database storing tickers, headlines, scores, and daily aggregates. Scheduled job runs the pipeline once daily. | 7 consecutive days of historical data successfully captured and queryable; urgency-spike logic defined and validated. |
| 3 | Dashboard UI | Web dashboard (FastAPI + lightweight frontend) with portfolio input, urgency banner, ticker grid, and headline drill-down. | All four views render correctly with live data; portfolio edits persist; page loads in under 1 second from cached data. |
| 4 | Hardening & Polish | Dockerized deployment, error handling for API/network failures, basic observability (logs + run history), README. | App runs unattended for 7 days with no manual recovery; documented setup gets a fresh machine to first run in under 15 minutes. |

Phase 1 — Data & Model Spike (Week 1)

Goal: prove the riskiest pieces work before building anything else.

Scope

  • Select news source(s): evaluate Yahoo Finance RSS, Finnhub free tier, and NewsAPI free tier. Choose 2 for redundancy.
  • Build a fetcher that takes a ticker and returns the day's headlines (title, summary, url, source, published_at).
  • Load FinBERT (ProsusAI/finbert) via Hugging Face transformers; build a scorer that takes a list of headlines and returns labels + confidence scores.
  • Compose into a CLI: python pipeline.py --tickers AAPL,MSFT,NVDA → prints JSON.
  • Sanity-check: hand-label 30 random headlines and compare to FinBERT output; document failure modes.
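The hand-labeling check in the last bullet can be scored with a few lines of plain Python. This is a minimal sketch, not the project's actual tooling: the function name `compare_labels` and the input format (two parallel lists of label strings) are assumptions. It reports the overall agreement rate and counts each (hand label, model label) pair, so recurring mismatches, such as neutral headlines scored as negative, stand out as failure modes to document.

```python
from collections import Counter


def compare_labels(hand: list[str], model: list[str]) -> tuple[float, Counter]:
    """Return (agreement rate, counts of each (hand_label, model_label) pair)."""
    if len(hand) != len(model):
        raise ValueError("hand and model label lists must have the same length")
    if not hand:
        raise ValueError("need at least one labeled headline")
    confusion = Counter(zip(hand, model))
    agreed = sum(n for (h, m), n in confusion.items() if h == m)
    return agreed / len(hand), confusion


if __name__ == "__main__":
    hand = ["negative", "neutral", "positive", "neutral"]
    model = ["negative", "negative", "positive", "neutral"]
    rate, confusion = compare_labels(hand, model)
    print(f"agreement: {rate:.0%}")  # agreement: 75%
    # List only the disagreements: these are the failure modes to write up.
    for (h, m), n in sorted(confusion.items()):
        if h != m:
            print(f"hand={h} model={m}: {n}")
```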

Risks Addressed

  • News API rate limits or coverage gaps for less-popular tickers.
  • FinBERT inference latency on CPU; decide here whether GPU/quantization is needed.
  • FinBERT label drift on non-headline text (e.g., press release boilerplate).

Phase 2 — Persistence & Aggregation (Week 2)

Goal: turn the one-shot script into a system that accumulates history.

Scope

  • Define schema: tickers, headlines, scores, daily_aggregates (one row per ticker per day).
  • Migrate the Phase 1 pipeline to write into SQLite instead of stdout.
  • Aggregation logic: ticker-day score = weighted average of headline scores (weight = FinBERT confidence).
  • Urgency-spike definition: a ticker has a "negative spike" today if its daily score drops more than 1.5 standard deviations below its 14-day rolling mean AND at least 2 headlines are labeled negative with confidence ≥ 0.7. Tune thresholds against captured history.
  • Scheduler: cron (or APScheduler) running the pipeline at 07:00 local.
  • Backfill: ingest 7 days of history to seed the rolling baseline.
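A minimal sketch of the Phase 2 schema using the standard-library sqlite3 module. Only the four table names come from the plan above; every column name, the (symbol, url) uniqueness rule, and the helpers `connect` and `upsert_aggregate` are assumptions for illustration. The composite primary key on daily_aggregates enforces one row per ticker per day, and the upsert lets a same-day re-run of the pipeline replace that day's row instead of failing.

```python
import sqlite3

# Column names are illustrative; only the four table names come from the plan.
SCHEMA = """
CREATE TABLE IF NOT EXISTS tickers (
    symbol TEXT PRIMARY KEY,
    active INTEGER NOT NULL DEFAULT 1
);
CREATE TABLE IF NOT EXISTS headlines (
    id           INTEGER PRIMARY KEY,
    symbol       TEXT NOT NULL REFERENCES tickers(symbol),
    title        TEXT NOT NULL,
    summary      TEXT,
    url          TEXT NOT NULL,
    source       TEXT NOT NULL,
    published_at TEXT NOT NULL,
    UNIQUE (symbol, url)
);
CREATE TABLE IF NOT EXISTS scores (
    headline_id INTEGER PRIMARY KEY REFERENCES headlines(id),
    label       TEXT NOT NULL CHECK (label IN ('positive', 'neutral', 'negative')),
    confidence  REAL NOT NULL
);
CREATE TABLE IF NOT EXISTS daily_aggregates (
    symbol         TEXT NOT NULL REFERENCES tickers(symbol),
    day            TEXT NOT NULL,      -- ISO date, e.g. '2024-05-01'
    score          REAL NOT NULL,
    headline_count INTEGER NOT NULL,
    PRIMARY KEY (symbol, day)          -- one row per ticker per day
);
"""


def connect(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default
    conn.executescript(SCHEMA)
    return conn


def upsert_aggregate(conn: sqlite3.Connection, symbol: str, day: str,
                     score: float, count: int) -> None:
    """Insert a ticker-day aggregate, or overwrite it if the pipeline re-runs that day."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            """INSERT INTO daily_aggregates (symbol, day, score, headline_count)
               VALUES (?, ?, ?, ?)
               ON CONFLICT (symbol, day) DO UPDATE
               SET score = excluded.score, headline_count = excluded.headline_count""",
            (symbol, day, score, count),
        )


if __name__ == "__main__":
    conn = connect()
    with conn:
        conn.execute("INSERT INTO tickers (symbol) VALUES ('AAPL')")
    upsert_aggregate(conn, "AAPL", "2024-05-01", -0.2, 3)
    upsert_aggregate(conn, "AAPL", "2024-05-01", -0.4, 5)  # same day: replaces the row
    print(conn.execute("SELECT score, headline_count FROM daily_aggregates").fetchall())
    # [(-0.4, 5)]
```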

Phase 3 — Dashboard UI (Week 3)

Goal: make the data usable within 10 seconds each morning.

Scope

  • Backend: FastAPI service exposing /portfolio, /summary, /ticker/{symbol}.
  • Frontend: Jinja templates + HTMX (or a minimal React app — decide by Tuesday). Tailwind for styling.
  • Views:
    1. Portfolio editor
    2. Home with urgency banner + ticker grid
    3. Ticker detail with today's headlines and 14-day score chart
  • Color/label coding: green / neutral / red badges per ticker; banner styling escalates with spike count.
  • "Why?" link on each negative-spike ticker jumps to the headlines responsible for the drop.

Phase 4 — Hardening & Polish (Week 4)

Goal: take it from "works on my machine in dev mode" to "runs unattended for a week."

Scope

  • Dockerize: one Dockerfile and one docker-compose file covering the app, the scheduler, and a volume-mounted SQLite database.
  • Error handling: retries with backoff on news API failures; graceful degradation if one source is down; alerting (log file flag) on full pipeline failure.
  • Observability: structured logs per pipeline run (tickers attempted, articles fetched, articles scored, duration).
  • Configuration: .env file for API keys, refresh time, urgency thresholds.
  • README: setup, running, troubleshooting, and a short "how the urgency signal works" explainer.
  • Manual 7-day run: deploy, walk away, return after a week, confirm nothing crashed.
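The retry and graceful-degradation bullets could look roughly like the following sketch. The helper names `with_backoff` and `fetch_all`, the attempt count, the delays, and the choice of ConnectionError/TimeoutError as the retryable errors are all assumptions, not the project's actual settings. Each source is retried with exponentially growing, jittered delays; a source that still fails is recorded and skipped so the remaining sources can still fill the brief.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")

# Errors treated as transient (assumption: adjust to what your HTTP client raises).
RETRYABLE: tuple[type[BaseException], ...] = (ConnectionError, TimeoutError)


def with_backoff(fn: Callable[[], T], attempts: int = 4, base_delay: float = 1.0,
                 max_delay: float = 30.0,
                 sleep: Callable[[float], None] = time.sleep) -> T:
    """Call fn, retrying transient failures with exponential backoff plus jitter."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return fn()
        except RETRYABLE:
            if attempt == attempts - 1:
                raise  # out of retries; let the caller degrade gracefully
            delay = min(max_delay, base_delay * 2 ** attempt)  # 1s, 2s, 4s, ...
            sleep(delay * random.uniform(0.5, 1.0))  # jitter spreads out retries
    raise AssertionError("unreachable")


def fetch_all(fetchers: dict[str, Callable[[], list]], **retry_kwargs):
    """Run every source; return (combined articles, names of sources that failed)."""
    articles: list = []
    failed: list[str] = []
    for name, fetch in fetchers.items():
        try:
            articles.extend(with_backoff(fetch, **retry_kwargs))
        except RETRYABLE:
            failed.append(name)  # one source down should not sink the whole run
    return articles, failed
```

If `failed` ends up listing every source, that is the full-pipeline-failure case the plan wants flagged in the log.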

Tech Stack

| Layer | Choice |
|---|---|
| Language | Python 3.12+ |
| ML Model | FinBERT (ProsusAI/finbert) via Hugging Face transformers |
| News Sources | yfinance primary, Yahoo Finance RSS backup, Finnhub optional when FINNHUB_API_KEY is set |
| Storage | SQLite |
| Scheduling | cron / APScheduler |
| Backend | FastAPI |
| Frontend | Jinja + HTMX + Tailwind (or minimal React) |
| Deployment | Docker + docker-compose |

Current Implementation

Phase 1 has a working CLI pipeline:

  • Fetches today's headlines per ticker from yfinance, Yahoo Finance RSS, and Finnhub when configured.
  • Deduplicates articles by URL.
  • Scores headline text with ProsusAI/finbert.
  • Emits JSON grouped by ticker with source, timestamp, sentiment label, confidence, and full class probabilities.
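The dedupe-and-group steps can be sketched as follows. The dict keys (ticker, title, url, source, published_at, label, confidence, probabilities) are assumed field names modeled on the Phase 1 fetcher fields, the helper names are hypothetical, and deduplicating on the (ticker, url) pair rather than on the URL alone is an assumption so that one story can still appear under each ticker it concerns.

```python
import json
from collections import defaultdict

OUTPUT_FIELDS = ("title", "source", "published_at", "label", "confidence", "probabilities")


def dedupe_by_url(articles: list[dict]) -> list[dict]:
    """Keep the first article seen for each (ticker, url) pair, preserving order."""
    seen: set[tuple[str, str]] = set()
    unique = []
    for art in articles:
        key = (art["ticker"], art["url"].strip())
        if key not in seen:
            seen.add(key)
            unique.append(art)
    return unique


def group_by_ticker(scored: list[dict]) -> dict[str, list[dict]]:
    """Shape scored articles into {ticker: [article, ...]} for JSON output."""
    grouped: dict[str, list[dict]] = defaultdict(list)
    for art in scored:
        grouped[art["ticker"]].append({k: art[k] for k in OUTPUT_FIELDS})
    return dict(grouped)


if __name__ == "__main__":
    first = {
        "ticker": "NVDA", "title": "Example headline", "url": "https://example.com/a",
        "source": "yfinance", "published_at": "2024-05-01T06:10:00Z",
        "label": "negative", "confidence": 0.92,
        "probabilities": {"positive": 0.03, "neutral": 0.05, "negative": 0.92},
    }
    # The same story returned by a second source is dropped by dedupe_by_url.
    raw = [first, {**first, "source": "Yahoo Finance RSS"}]
    print(json.dumps(group_by_ticker(dedupe_by_url(raw)), indent=2))
```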

Local Quickstart

python -m venv .venv
.venv\Scripts\python.exe -m pip install --upgrade pip
.venv\Scripts\python.exe -m pip install -r requirements.txt
Copy-Item .env.example .env

Edit .env and set FINNHUB_API_KEY if you want Finnhub and historical backfill.

Seed a local portfolio:

$env:PYTHONPATH = "src"
.venv\Scripts\python.exe scripts\portfolio.py --db data\finbrief.db set AAPL,MSFT,NVDA,JPM,TSLA

Backfill enough history for spike detection:

$env:PYTHONPATH = "src"
.venv\Scripts\python.exe scripts\backfill_finnhub.py --db data\finbrief.db --days 7

Run the dashboard:

.venv\Scripts\python.exe -m uvicorn finbrief.app:app --app-dir src --host 127.0.0.1 --port 8780

Open http://127.0.0.1:8780/.

Main entrypoint:

$env:PYTHONPATH = "src"
.venv\Scripts\python.exe -m finbrief.pipeline --tickers AAPL,MSFT,NVDA --pretty --out today.json

Persist a run to SQLite:

$env:PYTHONPATH = "src"
.venv\Scripts\python.exe -m finbrief.pipeline --tickers AAPL,MSFT,NVDA --db data\finbrief.db --pretty --out today.json

Manage the local portfolio:

$env:PYTHONPATH = "src"
.venv\Scripts\python.exe scripts\portfolio.py --db data\finbrief.db set AAPL,MSFT,NVDA,JPM,TSLA
.venv\Scripts\python.exe scripts\portfolio.py --db data\finbrief.db list

After a portfolio is stored, the pipeline can read active tickers from SQLite:

$env:PYTHONPATH = "src"
.venv\Scripts\python.exe -m finbrief.pipeline --db data\finbrief.db --pretty --out data\latest_run.json

Inspect persisted data:

$env:PYTHONPATH = "src"
.venv\Scripts\python.exe scripts\inspect_db.py --db data\finbrief.db

Clean duplicate or weakly related existing headlines:

.venv\Scripts\python.exe scripts\clean_duplicate_headlines.py --db data\finbrief.db --drop-irrelevant

Run lightweight smoke tests against the running dashboard (point --base-url at the port uvicorn is serving):

.venv\Scripts\python.exe scripts\smoke_test.py --db data\finbrief.db --base-url http://127.0.0.1:8780

Run the FastAPI dashboard/API:

.venv\Scripts\python.exe -m uvicorn finbrief.app:app --app-dir src --host 127.0.0.1 --port 8780

Local endpoints:

  • http://127.0.0.1:8780/ — dashboard page with urgency banner, portfolio editor, and ticker grid
  • http://127.0.0.1:8780/ticker/NVDA/view — ticker drill-down page with 14-day chart and headlines
  • POST http://127.0.0.1:8780/refresh — start a manual background refresh

The dashboard's Run Refresh button starts a background refresh without leaving the page, polls its status, and reloads the dashboard when the run completes.

Developer JSON endpoints (/summary, /ticker/{symbol}, and /refresh/status) remain available for smoke tests and future integrations, but they are intentionally not exposed as dashboard links.

Run the daily pipeline once using the active SQLite portfolio:

.venv\Scripts\python.exe scripts\daily_run.py --db data\finbrief.db

Keep a local scheduler process alive for a 07:00 daily run:

.venv\Scripts\python.exe scripts\schedule_daily.py --db data\finbrief.db --time 07:00
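A long-running scheduler like this mainly has to work out how long to sleep until the next run. The sketch below does that with the standard library only; it is not necessarily how schedule_daily.py works (the plan mentions cron or APScheduler), and the names `parse_hhmm`, `next_run`, and `run_daily` are hypothetical. Times are naive local wall-clock times, matching the 07:00 local default.

```python
import time
from datetime import datetime, timedelta
from datetime import time as dtime
from typing import Callable


def parse_hhmm(value: str) -> dtime:
    """Parse a REFRESH_TIME-style string such as '07:00'."""
    hours, minutes = value.split(":")
    return dtime(int(hours), int(minutes))


def next_run(now: datetime, at: dtime) -> datetime:
    """Return the first datetime strictly after `now` whose clock time is `at`."""
    candidate = datetime.combine(now.date(), at, tzinfo=now.tzinfo)
    if candidate <= now:
        candidate += timedelta(days=1)  # today's slot has passed; use tomorrow's
    return candidate


def run_daily(job: Callable[[], None], at: str = "07:00") -> None:
    """Block forever, calling job once per day at local time `at`."""
    target = parse_hhmm(at)
    while True:
        now = datetime.now()
        time.sleep((next_run(now, target) - now).total_seconds())
        job()
```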


Docker Quickstart

Requires Docker Desktop (or Docker Engine + Compose plugin).

Copy-Item .env.example .env
# Edit .env and set FINNHUB_API_KEY if you have one
docker compose build         # first build takes ~10 min (torch download)
docker compose up -d

Open http://localhost:8780/.

On first startup the scheduler container will download the FinBERT model (~440 MB) into the shared hf_cache volume. Subsequent restarts use the cache and start in seconds.

Seed a portfolio from the host once the app is healthy:

docker compose exec app python scripts/portfolio.py --db /app/data/finbrief.db set AAPL,MSFT,NVDA,JPM,TSLA

Backfill 7 days of history from Finnhub (requires FINNHUB_API_KEY in .env):

docker compose exec app python scripts/backfill_finnhub.py --db /app/data/finbrief.db --days 7

The scheduler container runs the pipeline automatically every day at REFRESH_TIME (default 07:00 local). Both containers share the same finbrief_data volume so all writes are durable.

To stop and preserve data:

docker compose down   # containers removed; named volumes kept

How the Urgency Signal Works

Each morning the pipeline fetches today's headlines for every ticker in your portfolio and scores each one with FinBERT, which outputs a positive / neutral / negative label plus a confidence score.

Ticker-day score — a single number between −1.0 and +1.0:

score = sum(sentiment_value × confidence) / sum(confidence)

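where positive = +1.0, neutral = 0.0, negative = −1.0. Each headline is weighted by its confidence, so high-confidence headlines pull the score toward their own label's value (a confident neutral headline pulls it toward zero).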

Negative spike detection — a ticker is flagged when both conditions hold today:

  1. Today's score falls more than SPIKE_SIGMA (default 1.5) standard deviations below its 14-day rolling mean.
  2. At least MIN_NEG_HEADLINES (default 2) headlines are labeled negative with confidence ≥ 0.7.

The first condition requires an unusual drop relative to that ticker's own recent history; the second guards against a single noisy headline tripping the signal. Both thresholds are tunable in .env.
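A sketch of the two-condition check, reading both thresholds from the environment with the documented defaults. Assumptions: the 14-day window is taken as the last 14 stored aggregate rows (so gap days are skipped rather than counted), the sample standard deviation is used, the 0.7 confidence cutoff is hard-coded, and the function name and signature are hypothetical.

```python
import os
import statistics

# Thresholds from .env, falling back to the documented defaults.
SPIKE_SIGMA = float(os.environ.get("SPIKE_SIGMA", "1.5"))
MIN_NEG_HEADLINES = int(os.environ.get("MIN_NEG_HEADLINES", "2"))
NEG_CONFIDENCE = 0.7  # hard-coded here (assumption)
WINDOW = 14           # last 14 stored aggregates (assumption: gap days are skipped)


def is_negative_spike(history: list[float], today_score: float,
                      today_headlines: list[tuple[str, float]],
                      sigma: float = SPIKE_SIGMA,
                      min_neg: int = MIN_NEG_HEADLINES) -> bool:
    """history: prior daily scores, oldest first, excluding today."""
    window = history[-WINDOW:]
    if len(window) < 2:
        return False  # a standard deviation needs at least two prior days
    mean = statistics.fmean(window)
    std = statistics.stdev(window)  # sample standard deviation (assumption)
    unusual_drop = today_score < mean - sigma * std
    strong_negatives = sum(1 for label, conf in today_headlines
                           if label == "negative" and conf >= NEG_CONFIDENCE)
    return unusual_drop and strong_negatives >= min_neg


if __name__ == "__main__":
    history = [0.1, 0.2, 0.0, 0.15, 0.05]  # mean 0.10, stdev about 0.079
    heads = [("negative", 0.92), ("negative", 0.81), ("neutral", 0.6)]
    print(is_negative_spike(history, -0.5, heads))      # True with the default thresholds
    print(is_negative_spike(history, -0.5, heads[1:]))  # False: only one strong negative
```

A perfectly flat history gives a standard deviation of zero, so in this sketch any drop below the mean satisfies the first condition; the second condition still has to hold.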

The dashboard banner shows how many holdings are spiking and links directly to the high-confidence negative headlines responsible.


Troubleshooting

"No data" or empty ticker cards after a refresh
The pipeline only stores aggregates for dates where at least one headline is fetched and scored. If all sources return zero headlines for a ticker on a given day (common on weekends or market holidays), that day simply has no aggregate row. The 14-day chart on the drill-down page will have gaps.

Urgency banner never fires / always fires
Tune SPIKE_SIGMA and MIN_NEG_HEADLINES in .env. Raising SPIKE_SIGMA makes the signal harder to trigger; lowering MIN_NEG_HEADLINES to 1 makes it easier. You need at least 2 prior aggregate days before the rolling-mean logic can produce any spike at all, so run a 7-day Finnhub backfill first.

FinBERT model download is very slow or fails
The model (~440 MB) is fetched from Hugging Face on first run. Set HF_HUB_OFFLINE=1 after the first successful download to prevent any further network calls to the Hugging Face Hub. If the download fails midway, delete ~/.cache/huggingface/hub/models--ProsusAI--finbert (under Docker, the model lives in the hf_cache volume instead) and re-run.

Yahoo RSS returns 0 headlines
This is expected intermittently: per-ticker Yahoo RSS feeds are unreliable (see Phase 1 notes). yfinance is the primary source. Add a Finnhub key for best coverage.

Port 8780 already in use
For Docker, change the host port mapping in docker-compose.yml: "8781:8780" maps container port 8780 to host port 8781. For a local run, pass a different --port to uvicorn.

Container exits immediately with a torch / CUDA error
FinBERT runs on CPU by default; CUDA is not required. If you see a CUDA-related import error, ensure torch>=2.2.0 was installed without the CUDA extras (the requirements.txt installs the CPU-only wheel).


Status

  • Phase 1 complete — CLI pipeline, multi-source fetchers (yfinance primary, Yahoo RSS, Finnhub optional), FinBERT scoring.
  • Phase 2 complete — SQLite persistence, daily aggregates, negative-spike logic, Finnhub backfill, portfolio and inspection scripts, daily-run and scheduler scripts.
  • Phase 3 complete — FastAPI dashboard with urgency banner, portfolio add/remove controls, ticker drill-down with 14-day chart, background refresh with browser-side polling.
  • Phase 4 in progress — retry/backoff on fetchers, env-driven configuration (SPIKE_SIGMA, MIN_NEG_HEADLINES, REFRESH_TIME), Docker deployment.
