fcistud/Northstar

✦ Northstar

Autonomous Data Investigation Platform

Ask a business question. Get an executive memo, grounded claims, charts, and a reproducible notebook, and know exactly what you can trust.

Quick Start · Why Northstar · How It Works · Architecture · LLM Integration · Demo


The Problem

Modern data tools force a painful tradeoff:

| Approach | Autonomy | Trust | Artifacts |
|---|---|---|---|
| Chat-with-data (ChatGPT, etc.) | ✓ Writes SQL/code | ✗ No verification | ✗ Ephemeral |
| Notebook copilots (Copilot, etc.) | ✗ You do the work | ✗ No claims audit | ✓ Reproducible |
| AutoML tools (H2O, DataRobot) | ✓ Trains models | ✗ Black-box | ✗ Tables only |
| BI dashboards (Tableau, Looker) | ✗ Manual setup | ✗ No investigation | ✓ Charts |

None of them investigate. They answer the question you knew to ask — they don't explore, verify, caveat, and deliver like a senior data analyst would.

Why Northstar

Northstar is an autonomous investigation agent that behaves like a senior analyst:

  1. Profiles your data — schemas, quality signals, null rates, date columns
  2. Infers context — entities, measures, dimensions, candidate joins across sources
  3. Builds a plan — bounded analysis steps with risk flags (optionally LLM-reasoned)
  4. Executes autonomously — trend analysis, segmentation, chart generation, optional ML modeling
  5. Verifies its own work — checks artifact coverage, evidence links, claim confidence, surfaces caveats
  6. Delivers polished outputs — executive memo, technical appendix, reproducible notebook, charts, grounded Q&A
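The profiling step (item 1) can be pictured with a small standard-library sketch. This is an illustration only, not Northstar's `profiler_service.py`: the function names `profile_csv` and `_is_iso_date` are hypothetical, and date detection is assumed to mean "every non-empty value parses as `YYYY-MM-DD`".

```python
import csv
import io
from datetime import datetime


def _is_iso_date(value: str) -> bool:
    """Return True if the value parses as YYYY-MM-DD (assumed date format)."""
    try:
        datetime.strptime(value, "%Y-%m-%d")
        return True
    except ValueError:
        return False


def profile_csv(text: str) -> dict:
    """Compute a per-column null rate and a date-column guess for CSV text."""
    rows = list(csv.DictReader(io.StringIO(text)))
    profile = {}
    for column in (rows[0].keys() if rows else []):
        values = [row[column] for row in rows]
        # Empty strings and missing trailing fields both count as nulls.
        non_null = [v for v in values if v not in ("", None)]
        profile[column] = {
            "null_rate": 1 - len(non_null) / len(values),
            "looks_like_date": bool(non_null)
            and all(_is_iso_date(v) for v in non_null),
        }
    return profile
```

A real profiler would also need to handle mixed date formats, type inference, and large files; the sketch only shows the shape of the quality signals listed above.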

What makes it different

| Capability | Others | Northstar |
|---|---|---|
| Multi-step investigation | ✗ Single-turn | ✓ 7-stage pipeline |
| Evidence-grounded claims | ✗ Hallucination risk | ✓ Every claim linked to artifacts |
| Confidence levels | ✗ No uncertainty | ✓ Supported / Directional / Observational |
| Built-in verification | ✗ Trust the output | ✓ Caveats, risk flags, null warnings |
| Mixed-source reasoning | ✗ One table at a time | ✓ CSV + text memos + joins |
| Executive-ready reports | ✗ Raw tables or chat | ✓ Polished memo + appendix |
| Reproducible artifacts | ✗ Ephemeral | ✓ Charts, CSVs, Jupyter notebook |
| Follow-up Q&A | ✗ Loses context | ✓ Grounded in investigation artifacts |

Quick Start

# Clone
git clone https://github.com/fcistud/Northstar.git
cd Northstar

# Install
pip install -r requirements.txt

# Run
MPLBACKEND=Agg python -m uvicorn app.main:app --port 8000

# Open http://localhost:8000

With AI-powered reasoning (optional)

Set any one of these API keys for LLM-enhanced analysis:

# NVIDIA Nemotron (recommended — free tier available)
export NEMOTRON_API_KEY=nvapi-...

# Or OpenAI
export OPENAI_API_KEY=sk-...

# Or Anthropic
export ANTHROPIC_API_KEY=sk-ant-...

Without an API key, Northstar runs its full pipeline using deterministic analysis and still produces all artifacts, claims, and reports.


How It Works

┌─────────────────────────────────────────────────────────────┐
│                     User Question                           │
│  "Why did revenue per customer dip in February?"            │
└─────────────┬───────────────────────────────────────────────┘
              ▼
┌─────────────────────────┐
│   1. INTAKE             │  Register data sources (CSV, Excel, text)
│   2. PROFILE            │  Schema, quality, null rates, date detection
│   3. SEMANTIC CONTEXT   │  Entities, measures, dimensions, joins
│   4. PLAN               │  Bounded steps + risk flags (+ LLM reasoning)
│   5. EXECUTE            │  Trend analysis, segmentation, charts, ML
│   6. VERIFY             │  Evidence audit, claim quality, caveats
│   7. DELIVER            │  Memo, appendix, notebook, summary
└─────────────┬───────────┘
              ▼
┌─────────────────────────────────────────────────────────────┐
│  Outputs:                                                    │
│  📄 Executive Memo       📊 Charts (PNG)                     │
│  📑 Technical Appendix   📋 Data Tables (CSV)                │
│  📓 Jupyter Notebook     💬 Grounded Follow-up Q&A           │
│  🔍 Evidence-linked claims with confidence levels            │
│  ⚠️  Caveats & limitations                                   │
└─────────────────────────────────────────────────────────────┘
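The seven stages in the diagram can be read as a fixed, ordered loop. The sketch below is a hypothetical illustration of that idea rather than the code in `agents/orchestrator.py`: the `STAGES` names mirror the diagram, while `run_pipeline`, the `handlers` mapping, and the shape of the state dictionary are assumptions.

```python
from typing import Callable

# Stage names taken from the diagram; the list is fixed, so execution is bounded.
STAGES = [
    "intake",
    "profile",
    "semantic_context",
    "plan",
    "execute",
    "verify",
    "deliver",
]


def run_pipeline(question: str, handlers: dict[str, Callable[[dict], dict]]) -> dict:
    """Run every stage once, in order, threading a shared state dict through."""
    state = {"question": question, "completed": []}
    for stage in STAGES:
        # A stage without a handler is a no-op in this sketch.
        handler = handlers.get(stage, lambda current_state: {})
        state.update(handler(state))
        state["completed"].append(stage)
    return state
```

Because the loop iterates over a constant list, each stage runs exactly once and no stage can schedule further work.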

Architecture

Northstar follows a modular monolith pattern — easy to deploy, easy to understand:

app/
├── main.py                    # FastAPI application + SPA serving
├── api/routes.py              # REST API endpoints
├── agents/orchestrator.py     # 7-stage pipeline orchestrator
├── services/
│   ├── profiler_service.py    # Data profiling & quality checks
│   ├── semantic_context_service.py  # Entity/measure/join inference
│   ├── planner_service.py     # Analysis plan generation
│   ├── execution_service.py   # Autonomous analysis execution
│   ├── verification_service.py # Claim & evidence verification
│   ├── narrative_service.py   # Report generation (memo + appendix)
│   ├── qa_service.py          # Grounded follow-up Q&A
│   ├── llm_service.py         # LLM abstraction (Nemotron/OpenAI/Anthropic)
│   └── model_service.py       # Optional ML modeling
├── schemas.py                 # Pydantic data models
└── config.py                  # Configuration
web/
├── index.html                 # SPA shell
├── app.js                     # Full SPA (landing + dashboard + results)
└── styles.css                 # Premium dark-mode design system

Design Principles

  • Read-only — Northstar never modifies your data
  • Bounded execution — Analysis plan has fixed steps, no unbounded loops
  • Evidence-grounded claims — Every finding links to specific artifacts
  • Conservative language — Findings rated Supported/Directional/Observational
  • Caveat-first — Limitations surfaced, not hidden
  • Reproducible — Every artifact saved, notebook re-runnable
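To make the evidence-grounding principle concrete, here is a minimal sketch of a claim record and a check that flags claims without valid evidence. The three confidence levels come from this README; the `Claim` dataclass, the `ungrounded_claims` function, and the use of plain strings as artifact identifiers are assumptions, not the models in `schemas.py`.

```python
from dataclasses import dataclass, field
from enum import Enum


class Confidence(Enum):
    SUPPORTED = "supported"
    DIRECTIONAL = "directional"
    OBSERVATIONAL = "observational"


@dataclass
class Claim:
    text: str
    confidence: Confidence
    # Identifiers of the artifacts (charts, tables) that back this claim.
    evidence: list[str] = field(default_factory=list)


def ungrounded_claims(claims: list[Claim], artifacts: set[str]) -> list[Claim]:
    """Return claims that cite no evidence or cite an artifact that does not exist."""
    return [
        claim
        for claim in claims
        if not claim.evidence or not set(claim.evidence) <= artifacts
    ]
```

A verification stage built this way could refuse to publish, or downgrade, any claim the check returns.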

LLM Integration

Northstar has a pluggable LLM layer that enhances (but is never required for) its pipeline:

| Feature | Without LLM | With LLM |
|---|---|---|
| Plan reasoning | Deterministic steps | AI-reasoned strategy based on data + question |
| Executive memo | Structured template | Polished, context-aware narrative |
| Follow-up Q&A | Category-based responses | Natural language, evidence-grounded answers |
| Insight synthesis | Quantitative claims | Connected, interpreted findings |

The LLM is called through a unified interface (llm_service.py) that:

  • Auto-detects available provider (Nemotron → OpenAI → Anthropic)
  • Implements timeout and error handling
  • Falls back gracefully to deterministic analysis on failure
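The behaviour described in these bullets can be sketched as follows. This is not the contents of `llm_service.py`: the environment variable names and the provider order come from this README, but `detect_provider`, `with_fallback`, and the callables they accept are hypothetical, and the LLM call is assumed to enforce its own timeout by raising an exception.

```python
import os
from typing import Callable, Mapping, Optional

# Provider order as stated above: Nemotron, then OpenAI, then Anthropic.
PROVIDER_KEYS = [
    ("nemotron", "NEMOTRON_API_KEY"),
    ("openai", "OPENAI_API_KEY"),
    ("anthropic", "ANTHROPIC_API_KEY"),
]


def detect_provider(env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Return the first provider whose API key is set, or None if none is."""
    env = os.environ if env is None else env
    for name, key in PROVIDER_KEYS:
        if env.get(key):
            return name
    return None


def with_fallback(llm_call: Callable[[], str], deterministic: Callable[[], str]) -> str:
    """Try the LLM path; on any error (including a timeout) use the deterministic path."""
    try:
        return llm_call()
    except Exception:
        return deterministic()
```

Passing the environment as a parameter keeps the detection logic testable without touching real API keys.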

Demo

Sample Data Included

Northstar ships with sample data for an e-commerce revenue investigation:

  • orders.csv — Transaction data with dates, amounts, channels
  • customers.csv — Customer segments and join dates
  • support_tickets.csv — Support volume and resolution times
  • pricing_memo.txt — Text memo about a February pricing change
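One simple way to propose candidate joins across files like these is to look for column names shared between tables. The sketch below illustrates that heuristic only; `candidate_joins` is a hypothetical function, and Northstar's actual join inference may work differently.

```python
def candidate_joins(schemas: dict[str, list[str]]) -> list[tuple[str, str, str]]:
    """Return (table_a, table_b, column) for every column name shared by two tables."""
    tables = sorted(schemas)
    joins = []
    for index, table_a in enumerate(tables):
        for table_b in tables[index + 1:]:
            shared = set(schemas[table_a]) & set(schemas[table_b])
            for column in sorted(shared):
                joins.append((table_a, table_b, column))
    return joins
```

For example, given invented schemas `{"orders": ["order_id", "customer_id"], "customers": ["customer_id", "segment"]}`, the function proposes a join on `customer_id`. These column names are made up for illustration and may not match the bundled sample files.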

Running the Demo

  1. Start the server: MPLBACKEND=Agg python -m uvicorn app.main:app --port 8000
  2. Open http://localhost:8000
  3. Click Launch App → Run Sample Demo
  4. Watch the 7-stage pipeline execute
  5. Explore findings, artifacts, and ask follow-up questions

Project Structure

Northstar/
├── app/                  # Backend (Python/FastAPI)
├── web/                  # Frontend (SPA)
├── docs/                 # Documentation (GitHub Pages)
├── sample_data/          # Demo datasets
├── tests/                # Test suite
├── scripts/              # Utility scripts
├── requirements.txt      # Python dependencies
└── Dockerfile            # Container support

Testing

pip install -r requirements-dev.txt
python -m pytest tests/ -v

License

MIT


✦ Northstar — Because your data deserves an autonomous analyst.
