✦ Northstar

Autonomous Data Investigation Platform

Ask a business question. Get an executive memo, grounded claims, charts, and a reproducible notebook, and know exactly what you can trust.

Quick Start • Why Northstar • How It Works • Architecture • LLM Integration • Demo

The Problem

Modern data tools force a painful tradeoff:

Approach	Autonomy	Trust	Artifacts
Chat-with-data (ChatGPT, etc.)	✓ Writes SQL/code	✗ No verification	✗ Ephemeral
Notebook copilots (Copilot, etc.)	✗ You do the work	✗ No claims audit	✓ Reproducible
AutoML tools (H2O, DataRobot)	✓ Trains models	✗ Black-box	✗ Tables only
BI dashboards (Tableau, Looker)	✗ Manual setup	✗ No investigation	✓ Charts

None of them investigate. They answer the question you knew to ask, but they don't explore, verify, caveat, and deliver the way a senior data analyst would.

Why Northstar

Northstar is an autonomous investigation agent that behaves like a senior analyst:

Profiles your data — schemas, quality signals, null rates, date columns
Infers context — entities, measures, dimensions, candidate joins across sources
Builds a plan — bounded analysis steps with risk flags (optionally LLM-reasoned)
Executes autonomously — trend analysis, segmentation, chart generation, optional ML modeling
Verifies its own work — checks artifact coverage, evidence links, and claim confidence, and surfaces caveats
Delivers polished outputs — executive memo, technical appendix, reproducible notebook, charts, grounded Q&A
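As an illustration of the profiling step, here is a minimal sketch using only the Python standard library. It computes per-column null rates and flags columns whose non-empty values all parse as ISO dates. The function name `profile_rows`, the empty-string-as-null rule, the ISO-date heuristic, and the sample CSV text are all assumptions for illustration, not the logic in Northstar's `profiler_service.py`.

```python
import csv
import io
from datetime import date


def _is_iso_date(value: str) -> bool:
    """Return True if value parses as an ISO 8601 calendar date such as 2024-01-15."""
    try:
        date.fromisoformat(value)
        return True
    except ValueError:
        return False


def profile_rows(csv_text: str) -> dict:
    """Hypothetical profiler: null rate and date detection for every column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    profile = {}
    for column in reader.fieldnames or []:
        # Short rows yield None for missing fields; treat None and "" as nulls (assumption).
        values = [(row.get(column) or "").strip() for row in rows]
        non_null = [v for v in values if v]
        profile[column] = {
            "null_rate": 1 - len(non_null) / len(values) if values else 0.0,
            # Flag a date column only if every non-null value parses as a date.
            "is_date": bool(non_null) and all(_is_iso_date(v) for v in non_null),
        }
    return profile


sample = """order_date,customer_id,amount
2024-01-15,c1,120.0
2024-02-03,c2,
2024-02-20,c1,80.5
"""
print(profile_rows(sample))
```

A real profiler would also handle other date formats and typed nulls; the sketch keeps to the two signals named above.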

What makes it different

Capability	Others	Northstar
Multi-step investigation	✗ Single-turn	✓ 7-stage pipeline
Evidence-grounded claims	✗ Hallucination risk	✓ Every claim linked to artifacts
Confidence levels	✗ No uncertainty	✓ Supported / Directional / Observational
Built-in verification	✗ Trust the output	✓ Caveats, risk flags, null warnings
Mixed-source reasoning	✗ One table at a time	✓ CSV + text memos + joins
Executive-ready reports	✗ Raw tables or chat	✓ Polished memo + appendix
Reproducible artifacts	✗ Ephemeral	✓ Charts, CSVs, Jupyter notebook
Follow-up Q&A	✗ Loses context	✓ Grounded in investigation artifacts
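To make "every claim linked to artifacts" concrete, here is a minimal sketch of a claim record carrying one of the three confidence levels, plus a verification pass that turns ungrounded claims into caveats. The `Claim` dataclass, the `Confidence` enum, `verify_claims`, and the artifact file names are hypothetical; Northstar's real models live in `schemas.py` and may look different.

```python
from dataclasses import dataclass, field
from enum import Enum


class Confidence(Enum):
    SUPPORTED = "Supported"
    DIRECTIONAL = "Directional"
    OBSERVATIONAL = "Observational"


@dataclass
class Claim:
    text: str
    confidence: Confidence
    evidence: list[str] = field(default_factory=list)  # artifact file names (hypothetical)


def verify_claims(claims: list[Claim], artifacts: set[str]) -> list[str]:
    """Return caveats for claims with no evidence or evidence pointing at missing artifacts."""
    caveats = []
    for claim in claims:
        if not claim.evidence:
            caveats.append(f"Ungrounded claim: {claim.text!r}")
            continue
        missing = [a for a in claim.evidence if a not in artifacts]
        if missing:
            caveats.append(f"Claim {claim.text!r} cites missing artifacts: {missing}")
    return caveats


# Illustrative inputs only; these are not outputs of the real pipeline.
artifacts = {"revenue_by_month.png", "revenue_by_month.csv"}
claims = [
    Claim("Revenue per customer fell in February", Confidence.DIRECTIONAL,
          ["revenue_by_month.csv"]),
    Claim("The pricing change caused the dip", Confidence.OBSERVATIONAL),
    Claim("Support volume rose", Confidence.SUPPORTED, ["tickets_by_week.csv"]),
]
for caveat in verify_claims(claims, artifacts):
    print(caveat)
```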

Quick Start

# Clone
git clone https://github.com/YOUR_USERNAME/Northstar.git
cd Northstar

# Install
pip install -r requirements.txt

# Run
MPLBACKEND=Agg python -m uvicorn app.main:app --port 8000

# Open http://localhost:8000

With AI-powered reasoning (optional)

Set any one of these API keys for LLM-enhanced analysis:

# NVIDIA Nemotron (recommended — free tier available)
export NEMOTRON_API_KEY=nvapi-...

# Or OpenAI
export OPENAI_API_KEY=sk-...

# Or Anthropic
export ANTHROPIC_API_KEY=sk-ant-...

Without an API key, Northstar runs its full pipeline using deterministic analysis and still produces all artifacts, claims, and reports.
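The provider priority described under LLM Integration (Nemotron, then OpenAI, then Anthropic) can be sketched as a lookup over the environment variables above. This is a hypothetical helper, not the code in `llm_service.py`; it returns `None` when no key is set, which corresponds to the deterministic mode.

```python
import os
from typing import Mapping, Optional

# Priority order as described in this README; variable names as in Quick Start.
PROVIDER_KEYS = [
    ("nemotron", "NEMOTRON_API_KEY"),
    ("openai", "OPENAI_API_KEY"),
    ("anthropic", "ANTHROPIC_API_KEY"),
]


def detect_provider(env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return the first provider whose API key is set and non-blank, else None."""
    for provider, var in PROVIDER_KEYS:
        if env.get(var, "").strip():
            return provider
    return None  # no key found: fall back to the deterministic pipeline


print(detect_provider({"OPENAI_API_KEY": "sk-test", "ANTHROPIC_API_KEY": "sk-ant-test"}))
```

Passing the environment in as a mapping keeps the helper easy to test without touching real keys.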

How It Works

┌─────────────────────────────────────────────────────────────┐
│                     User Question                           │
│  "Why did revenue per customer dip in February?"            │
└─────────────┬───────────────────────────────────────────────┘
              ▼
┌─────────────────────────┐
│   1. INTAKE             │  Register data sources (CSV, Excel, text)
│   2. PROFILE            │  Schema, quality, null rates, date detection
│   3. SEMANTIC CONTEXT   │  Entities, measures, dimensions, joins
│   4. PLAN               │  Bounded steps + risk flags (+ LLM reasoning)
│   5. EXECUTE            │  Trend analysis, segmentation, charts, ML
│   6. VERIFY             │  Evidence audit, claim quality, caveats
│   7. DELIVER            │  Memo, appendix, notebook, summary
└─────────────┬───────────┘
              ▼
┌─────────────────────────────────────────────────────────────┐
│  Outputs:                                                    │
│  📄 Executive Memo       📊 Charts (PNG)                     │
│  📑 Technical Appendix   📋 Data Tables (CSV)                │
│  📓 Jupyter Notebook     💬 Grounded Follow-up Q&A           │
│  🔍 Evidence-linked claims with confidence levels            │
│  ⚠️  Caveats & limitations                                   │
└─────────────────────────────────────────────────────────────┘
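The bounded, fixed-order flow in the diagram can be sketched as an orchestrator that runs each stage exactly once and passes a shared context forward. The stage functions below are hypothetical placeholders standing in for the services under `app/services/`; the point is the shape, a fixed list of seven stages and no open-ended loop, not the behavior of `orchestrator.py`.

```python
from typing import Callable

Context = dict
Stage = Callable[[Context], Context]


def _stage(name: str) -> Stage:
    """Build a placeholder stage that only records that it ran."""
    def run(ctx: Context) -> Context:
        ctx.setdefault("log", []).append(name)
        return ctx
    return run


# Fixed, ordered stages mirroring the diagram; nothing at runtime can add stages.
STAGES: list[tuple[str, Stage]] = [
    (name, _stage(name))
    for name in ["intake", "profile", "semantic_context", "plan",
                 "execute", "verify", "deliver"]
]


def run_investigation(question: str) -> Context:
    ctx: Context = {"question": question}
    for _name, stage in STAGES:  # bounded: exactly len(STAGES) iterations
        ctx = stage(ctx)
    return ctx


result = run_investigation("Why did revenue per customer dip in February?")
print(result["log"])
```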

Architecture

Northstar follows a modular monolith pattern: one deployable application, with each pipeline stage in its own service module, which keeps it easy to deploy and easy to understand:

app/
├── main.py                    # FastAPI application + SPA serving
├── api/routes.py              # REST API endpoints
├── agents/orchestrator.py     # 7-stage pipeline orchestrator
├── services/
│   ├── profiler_service.py    # Data profiling & quality checks
│   ├── semantic_context_service.py  # Entity/measure/join inference
│   ├── planner_service.py     # Analysis plan generation
│   ├── execution_service.py   # Autonomous analysis execution
│   ├── verification_service.py # Claim & evidence verification
│   ├── narrative_service.py   # Report generation (memo + appendix)
│   ├── qa_service.py          # Grounded follow-up Q&A
│   ├── llm_service.py         # LLM abstraction (Nemotron/OpenAI/Anthropic)
│   └── model_service.py       # Optional ML modeling
├── schemas.py                 # Pydantic data models
└── config.py                  # Configuration
web/
├── index.html                 # SPA shell
├── app.js                     # Full SPA (landing + dashboard + results)
└── styles.css                 # Premium dark-mode design system
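One simple way to propose candidate joins across sources, shown here as an illustrative heuristic rather than the actual logic of `semantic_context_service.py`, is to look for columns that share a name across tables and measure how many of one side's distinct values also appear on the other. The function name, the `min_overlap` threshold, and the in-memory tables are assumptions.

```python
from itertools import combinations


def candidate_joins(tables: dict[str, dict[str, list]], min_overlap: float = 0.5):
    """Return (left, right, column, overlap) for shared columns with enough value overlap."""
    results = []
    for (left, lcols), (right, rcols) in combinations(tables.items(), 2):
        for column in sorted(set(lcols) & set(rcols)):
            left_vals, right_vals = set(lcols[column]), set(rcols[column])
            if not left_vals:
                continue
            # Fraction of the left table's distinct keys that also appear on the right.
            overlap = len(left_vals & right_vals) / len(left_vals)
            if overlap >= min_overlap:
                results.append((left, right, column, round(overlap, 2)))
    return results


# Made-up tables loosely shaped like the sample data.
tables = {
    "orders": {"customer_id": ["c1", "c2", "c3", "c1"], "amount": [120, 45, 60, 80]},
    "customers": {"customer_id": ["c1", "c2", "c4"], "segment": ["smb", "ent", "smb"]},
    "support_tickets": {"ticket_id": ["t1", "t2"], "customer_id": ["c9", "c8"]},
}
print(candidate_joins(tables))
```

A name match alone is weak evidence, which is why the sketch also requires value overlap before proposing a join.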

Design Principles

Read-only — Northstar never modifies your data
Bounded execution — Analysis plan has fixed steps, no unbounded loops
Evidence-grounded claims — Every finding links to specific artifacts
Conservative language — Findings rated Supported/Directional/Observational
Caveat-first — Limitations surfaced, not hidden
Reproducible — Every artifact saved, notebook re-runnable
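Reproducibility depends on saving the analysis as a notebook that can be re-run. Below is a minimal sketch that writes one with only the standard library, following the Jupyter nbformat v4 JSON layout. The function names, cell contents, and output path are illustrative assumptions, not Northstar's own notebook writer.

```python
import json
import tempfile
import uuid
from pathlib import Path


def _cell(cell_type: str, source: str) -> dict:
    """Build one nbformat v4 cell; code cells also need outputs and execution_count."""
    cell = {
        "cell_type": cell_type,
        "id": uuid.uuid4().hex[:8],  # nbformat 4.5 requires a cell id; 8 random hex chars
        "metadata": {},
        "source": source,
    }
    if cell_type == "code":
        cell["execution_count"] = None
        cell["outputs"] = []
    return cell


def write_notebook(path: Path, question: str, code_steps: list[str]) -> Path:
    """Hypothetical writer: a markdown header cell plus one code cell per analysis step."""
    notebook = {
        "cells": [_cell("markdown", f"# Investigation\n\n{question}")]
        + [_cell("code", step) for step in code_steps],
        "metadata": {"kernelspec": {"name": "python3", "display_name": "Python 3"}},
        "nbformat": 4,
        "nbformat_minor": 5,
    }
    path.write_text(json.dumps(notebook, indent=1), encoding="utf-8")
    return path


with tempfile.TemporaryDirectory() as tmp:
    nb_path = write_notebook(
        Path(tmp) / "investigation.ipynb",
        "Why did revenue per customer dip in February?",
        ["import csv", "print('re-run the analysis here')"],
    )
    demo = json.loads(nb_path.read_text(encoding="utf-8"))

print(demo["nbformat"], len(demo["cells"]))
```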

LLM Integration

Northstar has a pluggable LLM layer that enhances (but is never required for) its pipeline:

Feature	Without LLM	With LLM
Plan reasoning	Deterministic steps	AI-reasoned strategy based on data + question
Executive memo	Structured template	Polished, context-aware narrative
Follow-up Q&A	Category-based responses	Natural language, evidence-grounded answers
Insight synthesis	Quantitative claims	Connected, interpreted findings
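In the deterministic mode, the executive memo comes from a structured template. A minimal sketch of such a template using `string.Template` from the standard library follows; the section headings, field names, and example inputs are assumptions, not the layout that `narrative_service.py` actually produces.

```python
from string import Template

MEMO = Template(
    "EXECUTIVE MEMO\n"
    "Question: $question\n\n"
    "Key findings:\n$findings\n\n"
    "Caveats:\n$caveats\n"
)


def render_memo(question: str, findings: list[tuple[str, str]], caveats: list[str]) -> str:
    """findings: (confidence label, claim text) pairs; caveats: plain strings."""
    finding_lines = "\n".join(f"- [{label}] {text}" for label, text in findings) or "- None"
    caveat_lines = "\n".join(f"- {c}" for c in caveats) or "- None"
    return MEMO.substitute(question=question, findings=finding_lines, caveats=caveat_lines)


print(render_memo(
    "Why did revenue per customer dip in February?",
    [("Directional", "Revenue per customer fell in February.")],
    ["Illustrative caveat: the sample covers a short time window."],
))
```

An LLM-backed mode would replace the fixed template with generated prose while still drawing on the same claims and caveats.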

The LLM is called through a unified interface (llm_service.py) that:

Auto-detects the available provider, checking Nemotron, then OpenAI, then Anthropic
Implements timeout and error handling
Falls back gracefully to deterministic analysis on failure
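The timeout-and-fallback behavior can be sketched with `concurrent.futures`: submit the provider call, wait for it with a deadline, and return the deterministic result if the call raises or times out. `slow_llm` and `deterministic_plan` are hypothetical stand-ins, and the 10-second default is an assumed value, not one taken from `config.py`.

```python
import concurrent.futures
import time
from typing import Callable

# Module-level pool: a per-call `with ThreadPoolExecutor()` would wait for a hung call on exit.
_POOL = concurrent.futures.ThreadPoolExecutor(max_workers=4)


def with_fallback(llm_call: Callable[[], str], fallback: Callable[[], str],
                  timeout_s: float = 10.0) -> tuple[str, str]:
    """Return (source, text): the LLM answer if it arrives in time, else the fallback."""
    future = _POOL.submit(llm_call)
    try:
        return "llm", future.result(timeout=timeout_s)
    except Exception:  # covers the timeout error and any exception raised by llm_call
        future.cancel()  # only succeeds if the call has not started yet
        return "deterministic", fallback()


def slow_llm() -> str:
    time.sleep(0.5)  # simulate a provider that answers too slowly
    return "AI-reasoned plan"


def deterministic_plan() -> str:
    return "Deterministic plan: trend, segmentation, charts"


print(with_fallback(slow_llm, deterministic_plan, timeout_s=0.1))
```

A timed-out call keeps running in its worker thread; the caller simply stops waiting for it.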

Demo

Sample Data Included

Northstar ships with sample data for an e-commerce revenue investigation:

orders.csv — Transaction data with dates, amounts, channels
customers.csv — Customer segments and join dates
support_tickets.csv — Support volume and resolution times
pricing_memo.txt — Text memo about a February pricing change
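In its simplest form, the demo's headline question (why revenue per customer dipped in February) reduces to a monthly aggregate. Below is a hypothetical sketch of that computation over `orders.csv`-style rows using the standard library. The column names `order_date`, `customer_id`, and `amount`, and the rows themselves, are made up and need not match the shipped sample file.

```python
import csv
import io
from collections import defaultdict


def revenue_per_customer_by_month(csv_text: str) -> dict[str, float]:
    """Monthly revenue divided by the number of distinct ordering customers that month."""
    revenue = defaultdict(float)
    customers = defaultdict(set)
    for row in csv.DictReader(io.StringIO(csv_text)):
        month = row["order_date"][:7]  # "YYYY-MM" prefix of an ISO date
        revenue[month] += float(row["amount"])
        customers[month].add(row["customer_id"])
    return {m: round(revenue[m] / len(customers[m]), 2) for m in sorted(revenue)}


orders = """order_date,customer_id,amount
2024-01-05,c1,100
2024-01-19,c2,140
2024-02-02,c1,60
2024-02-11,c3,50
2024-02-25,c4,70
"""
print(revenue_per_customer_by_month(orders))
```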

Running the Demo

Start the server: MPLBACKEND=Agg python -m uvicorn app.main:app --port 8000
Open http://localhost:8000
Click Launch App → Run Sample Demo
Watch the 7-stage pipeline execute
Explore findings, artifacts, and ask follow-up questions

Project Structure

Northstar/
├── app/                  # Backend (Python/FastAPI)
├── web/                  # Frontend (SPA)
├── docs/                 # Documentation (GitHub Pages)
├── sample_data/          # Demo datasets
├── tests/                # Test suite
├── scripts/              # Utility scripts
├── requirements.txt      # Python dependencies
└── Dockerfile            # Container support

Testing

pip install -r requirements-dev.txt
python -m pytest tests/ -v

License

MIT

✦ Northstar — Because your data deserves an autonomous analyst.
