BlueprintParser

Live demo: https://app.blueprintparser.com/demo
BP Docs: https://app.blueprintparser.com/docs

BlueprintParser is a Next.js web app and a blueprint parsing engine.

Features

Manual and AI-assisted QTO
Extract schedules/tables from PDF to Excel/CSV
LLM chat with entire projects without blowing up context
- BP includes a context engine that uses multiple methods for creating dense, token-efficient blueprint embeddings
Agent tool use (work in progress)
- BP tools can be operated by agents
- Agents can query blueprints and provide more accurate answers with citations
Blueprint parsing engine
- Backend tooling that makes PDF CDs machine readable
- YOLO object detection models detect shapes and spatial zones.
- Rasterize PDF and OCR all text
- CSI tagging: tag all CSI codes in the drawings
- OpenCV tools for extracting shapes and visual information
- Auto name Sheets / Drawing numbers
- Network graph engine: build nodes and edges between sheets in a project and between projects in a portfolio.

See all features and more in the BlueprintParser docs, link up top.

Current YOLO Classes

There are two general categories: "discrete and spatial" classes.

Discrete Classes

Examples:

door_single
square
schedule_table

Spatial Classes

Examples:

title_block
drawings
text_block

Spatial classes define regions of the drawings, and help with localization and corroboration. Since we use OCR, we also have the pixel coordinates of all text, which we can map against the YOLO outputs. For instance, when extracting the drawing number, we can check if the drawing number is located inside a title_block region.
Or, when looking for the door schedule, BP knows to look for the sheet where the words "door schedule" occur inside a "schedule_table" region, and can double check the title_block for coraberating text like "door and window schedule".
BP uses spatial data to build CSI-heatmaps, which show the distribution of CSI codes on a sheet. YOLO spatial data enables us to differentiate between a cluster of CSI codes that are occuring in a text_block vs the drawings vs inside a schedule vs the title_block. YOLO spatial classes are baby steps towards building a visual-reasoning engine for blueprints.

YOLO acts as the "eyes" for LLMs and agents, enabling them to "see" into the drawings.

NOTE on YOLO Models

The YOLO models used in the demo were hand-labeled and trained by myself.

500+ hours of manual labeling
More than $3,000 in AWS compute

The trained weights and training datasets are not included in this repository. BlueprintParser is designed so users can upload and use their own object-detection models.

Project Goal

It started with trying to build a free quantity takeoff tool, expanded with extracting schedules/tables to excel, and eventually evolved into a LLM harness for blueprints.

Its not enough to rely on LLMs, it's necesary to give them tools that empower them to do more with blueprints.

Future Goals as of May, 2026

Rebuild a headless version of the BP parsing engine
- Frontend agnostic
- Usable by tools like Claude Cowork/Code, Codex and agent harnesses.
Build a lightweight frontend optimized for LLM/"chat with the drawings".
Expand agent tool use
- Enable an agent to do a full QTO using BP's tools. With better tool use, agents can more effectively navigate the plans, answer more complex questions with citations derived from the drawings, and actually interact with blueprints.
Eventually rebuild the frontend entirely, it's full of bugs and the UI is janky.

Training Data requirements for YOLO

I believe there are at least ~200 blueprint object classes that, if trained properly, could enable an AI-QTO engine to handle 80–90% of "each" takeoffs across trades. A truly comprehensive blueprint object-detection system would likely exceed 400 classes. To label this many classes, it would take 50-100+ annotators working full time for 6-12 months, require hundreds of thousands of bid grade CDs, a robust annotator workforce management system, and lots of data labeling and ML ops. Factor in hosting cost for that many annotators and the compute for training the models, the capital to build this dataset could be a 6 or 7 figure investment.

Then, we have to do this again for image segmentation (surface area), classification, for fine tuning LLMs, and much more.

There is a tremendous data labeling challenge ahead of the AEC industry.

Why This Matters

We have to solve the "drawing problem" if the AEC industry wants to one day, build its own VLM or foundation models. Spatial reasoning is a must. To get there will require a massive and sustained data labeling campaign, that likely will require coordination between unlikely partners to succeed.

BlueprintParser is a small step in that direction, and hopefully contributes toward the development of an open source ai community in AEC and specifically for Estimators.

Quick Start (Local Development)

git clone https://github.com/goodmorningcoffee/BlueprintParser.git
cd BlueprintParser/blueprintparser_2
cp .env.example .env.local       # Edit DATABASE_URL, NEXTAUTH_SECRET at minimum
docker compose up -d              # PostgreSQL on port 5433
npm install
npx drizzle-kit migrate           # Create database tables
bash scripts/setup.sh             # Create root admin account (interactive)
npm run dev                       # http://localhost:3000

Works without AWS credentials — PDF viewing, annotations, table parsing, QTO, and search are all functional locally. For the full pipeline, add: GROQ_API_KEY (free-tier LLM chat), AWS credentials (Textract OCR, S3 storage, SageMaker YOLO inference).

Setup Scripts

Script	Purpose	When to Use
`scripts/setup.sh`	First-time setup — runs migrations and creates root admin account	After cloning, once PostgreSQL is running
`install_setup.sh`	Full interactive wizard — configures `.env.local`, AWS credentials, LLM providers, S3, SageMaker, Label Studio	First deploy to AWS, or when adding new services
`deploy.sh`	Build Docker image, push to ECR, update ECS service	AWS production deployments
`deploy-yolo.sh`	Build and push YOLO inference container to ECR	When updating YOLO models or inference code
`hardening.sh`	AWS WAF rules, security headers, CloudWatch alarms	After initial AWS infrastructure setup
`root_admin.sh`	Create/reset root admin account on a running instance	Emergency admin access recovery
`scripts/cost-control.sh`	Check and stop idle SageMaker endpoints	Cost management for GPU resources
`scripts/sagemaker-killswitch.sh`	Emergency stop all SageMaker resources	If costs spike unexpectedly

Full Setup (with AWS)

For AWS deployment with Textract, S3, SageMaker, and ECS:

bash install_setup.sh             # Interactive wizard — walks through all config
bash deploy.sh                    # Build + push to ECR + update ECS

The setup wizard prompts for: database URL, auth secrets, LLM API keys (Groq/Anthropic/OpenAI), AWS credentials, S3 bucket, CloudFront, Step Functions, SageMaker, and Label Studio. Re-run anytime to update configuration.

Configuration

File	Purpose
`.env.example`	All environment variables with descriptions. Copy to `.env.local` and fill in your values.
`.deploy.env.example`	AWS deployment variables (ECS, ECR, S3 bucket names). Copy to `.deploy.env` for production deploy.
`infrastructure/terraform/`	Full AWS infrastructure as code. Update `main.tf` backend config and `terraform.tfvars` before running.

Deployment Tiers

Tier	What You Get	Estimated Cost
Local only (Docker Compose)	PDF viewer, annotations, table parsing, QTO, search, LLM chat (Groq free tier)	$0
Minimal AWS (S3 only)	+ Cloud storage for PDFs and thumbnails	~$5/month
Full AWS (ECS + RDS + SageMaker)	+ Textract OCR, YOLO inference, Step Functions orchestration, multi-tenant	~$150-300/month
SageMaker GPU (on-demand)	YOLO model inference jobs (ml.g4dn.xlarge)	~$0.75/hour per run

System Architecture

                                 +------------------+
                                 |   CloudFront     |
                                 |  (assets CDN)    |
                                 +--------+---------+
                                          |
+-------------+    HTTPS    +-------------+-----------+    S3     +------------------+
|   Browser   +------------>+   ALB (TLS termination) +---------->+  beaver-data-*   |
|  (pdf.js +  |             +-------------+-----------+           |  (PDFs, thumbs,  |
|   Zustand)  |                           |                       |   YOLO results,  |
+-------------+                           v                       |   model weights) |
                            +-------------+-----------+           +------------------+
                            |   ECS Fargate (2vCPU)   |
                            |   Next.js 16 + API      |
                            |   52+ endpoints          |
                            +--+--------+----------+--+
                               |        |          |
                  +------------+   +----+----+   +-+-----------+
                  |                |         |   |             |
                  v                v         v   v             v
           +------+------+  +-----+---+ +---+---+--+  +-------+--------+
           |  RDS PG 16  |  | Textract| | Secrets  |  | Step Functions  |
           |  (14 tables)|  | (OCR)   | | Manager  |  | (orchestrator)  |
           +-------------+  +---------+ +----------+  +-------+--------+
                                                               |
                                                               v
                                                    +----------+---------+
                                                    | ECS Task (8vCPU)   |
                                                    | CPU Processing     |
                                                    | (GS + Textract +   |
                                                    |  OpenCV + Python)  |
                                                    +--------------------+

                                                    +--------------------+
                                                    | SageMaker GPU      |
                                                    | ml.g4dn.xlarge     |
                                                    | (YOLOv8 inference) |
                                                    +--------------------+

Processing Pipeline (6-System Cascade)

The processing pipeline transforms PDFs into 'structured data' through six interconnected systems. Upstream system's outputs feed as inputs into downstream systems. The final processed outputs are machine readable, and context optimised for LLMs.

System Overview

Raw pixels → OCR words → spatial clusters → semantic regions →
classified tables → YOLO-text bindings → tag patterns →
page intelligence → project graph → LLM context

Upload Processing (per page, concurrency configurable via admin)

Upload PDF → S3
  │
  v
Step Functions → ECS Task
  │
  ├── 1. Ghostscript: PDF → PNG per page (300 DPI, -dMaxBitmap=500M)
  ├── 2. Textract OCR with fallback chain:
  │     ├── Textract full-res (with retry+backoff for throttling)
  │     ├── Textract half-res (50% downscale if UnsupportedDocument)
  │     └── Tesseract local (if Textract unavailable)
  │     Returns: words[], lines[], tables[] (never throws — empty on total failure)
  │
  ├── 3. Drawing number extraction (title block regex + position scoring)
  ├── 4. CSI code detection (3-tier matching against 8,951-row MasterFormat DB)
  ├── 5. Text annotation detection (30+ regex detectors: phone, email, equipment, dims...)
  ├── 6. Page intelligence:
  │     ├── System 1: Text region classification (OCR word clustering → table-like, notes, key-value)
  │     ├── System 2: Heuristic engine (rule-based: text keywords + YOLO spatial + CSI affinity)
  │     └── System 3: Table meta-classifier (combines Systems 1+2 → door-schedule, finish-schedule, etc.)
  ├── 7. CSI spatial heatmap (NxN configurable grid + title-block + right-margin zones)
  ├── 8. tsvector generation (full-text search index)
  └── 9. Project-level analysis (discipline breakdown, reference graph, CSI topology, summaries)

Page concurrency: 8 (configurable via admin dashboard, Textract limit: 10 TPS)
Each step wrapped in independent try-catch — one step failing doesn't block the pipeline.

Post-YOLO Processing (user-triggered)

POST /api/yolo/run → SageMaker Processing Job (ml.g4dn.xlarge GPU)
  │
  v
POST /api/yolo/load → Load results from S3
  ├── Create annotations (source='yolo') with class-level CSI codes from model config
  ├── Re-run heuristic engine with YOLO spatial signals (System 2, phase 2)
  ├── Reclassify tables with YOLO-enriched heuristic data
  └── Merge YOLO CSI codes into page-level csiCodes

CSI Code Detection (3-Tier Matching)

CSI MasterFormat codes are the universal embedding layer that connects every system — OCR text, YOLO detections, parsed tables, spatial zones, and LLM context all tagged with CSI codes.

Detection Algorithm (`csi-detect.ts`)

Matches raw OCR text against the 8,951-row MasterFormat 2004 database (csi-masterformat.tsv). Three independent tiers, each with configurable thresholds:

Tier	Method	Confidence	Description
1	Exact subphrase	0.95	Consecutive words from CSI description appear together in text
2	Bag-of-words	up to 0.75	Score = `(matched/total)²` — squared penalty rewards near-complete overlap
3	Keyword anchors	up to 0.50	Rare words weighted by inverse document frequency (IDF-like)

Multi-tier boost: +0.05 when both Tier 2 and Tier 3 agree (independent signals confirming each other)
Stop-word filtering, minimum word count thresholds per tier
All thresholds admin-configurable via pipeline control panel

CSI Embedding Strategy

Every element in the system gets CSI-tagged:

Textract words → CSI codes (via csi-detect)
YOLO detections → CSI codes (via model config classCsiCodes)
Parsed table rows → CSI codes (via content matching)
Text annotations → CSI tags (via csi-detect on annotation text)
Spatial zones → CSI divisions (via csi-spatial heatmap)
User markups → CSI codes (via manual tagging in annotation editor)

All sources merge into pages.csiCodes (page-level). The CSI co-occurrence graph (csi-graph.ts) tracks which divisions appear together across pages, building a project-wide network with clusters (MEP, Architectural, Structural, Site) and cross-reference edges.

CSI Spatial Heatmap (`csi-spatial.ts`)

Each page is divided into a configurable NxN grid (3x3 default, up to 12x12). Special zones: title-block (y > 0.85), right-margin (x > 0.75). All CSI-tagged elements (text annotations, YOLO detections, classified tables, parsed regions, YOLO tags) are binned into zones by bbox center.

Output: per-zone division breakdown + natural language summary (e.g., "Door-related content (Div 08) clusters in center drawing area; Plumbing (Div 22) clusters in bottom-right").

Fed to LLM at priority 7.0 in the context builder. Recomputed client-side after user parsing or YOLO load via refreshPageCsiSpatialMap().

Table & Schedule Parsing

Three parsing methods run in parallel on user-drawn bounding boxes, results merged by a grid merger that selects the highest-confidence alignment:

Method 1: OCR Word Positions (`ocr-grid-detect.ts`)

Cluster OCR words by Y-center (row tolerance: configurable, default 0.006)
Detect columns by X left-edge gap analysis (min gap: configurable, default 0.015)
Column stability filter: column must appear in ≥30% of rows (configurable)
Layout hint support: force N columns (e.g., 2 for keynotes)
Output: row/column boundaries + confidence score

Method 2: Textract TABLES Feature

AWS Textract returns structured TABLE blocks with CELL children
Extract row/column indices, cell text, bounding boxes
Higher confidence on structured tables with clear borders

Method 3: OpenCV Line Detection (`detect_table_lines.py`)

Rasterize page region → Hough transform → detect horizontal/vertical lines
Build grid from line intersections
Best for tables with visible rule lines

Guided Parse (Tunable)

New "Guided" tab in the Schedules/Tables panel. User draws BB, system auto-proposes grid via /api/table-parse/propose, then user can tune with live sliders:

Slider	Controls	Default	Effect
Row Sensitivity	`rowTolerance` in `clusterRows()`	0.006	Lower = tighter row clustering
Column Sensitivity	`minColGap` in `detectColumns()`	0.015	Lower = more columns detected
Column Confidence	`minHitsRatio` in `detectColumns()`	0.3	Lower = keeps weaker columns
Expected Columns	`layoutHint.columns`	Auto	Force exact column count

Slider changes trigger debounced re-proposal (300ms). Grid lines rendered as draggable overlays on canvas (GuidedParseOverlay). Repeat Down / Repeat Right buttons tile uniform rows/columns. Available in both table and keynote parsing flows.

Post-Parse Pipeline

Cell extraction via extractCellsFromGrid() — reads OCR words within grid boundaries
Generic column headers ("Column 1", "Column 2") — user renames via edit UI
CSI auto-detection on parsed content
Persistence to pageIntelligence.parsedRegions + DB
CSI spatial map recomputation
Optional tag mapping (see YOLO-Tag Engine below)

YOLO-Tag Engine (`yolo-tag-engine.ts`)

Maps YOLO shape detections to nearby OCR text, creating tag instances that link visual symbols to their alphanumeric identifiers.

Core Algorithm: `mapYoloToOcrText()`

For each YOLO annotation matching the target class:
  1. Find all OCR words whose center falls inside the YOLO bbox
  2. Sort left-to-right, concatenate → candidateText
  3. Match against target tag:
     - Exact match → confidence 1.0
     - Edit distance ≤ 1 (for text ≥ 3 chars) → confidence 0.9
     - Short tags (1-2 chars) require exact match (prevents "3" matching "8")

Free-Floating Mode

For tags without YOLO shapes (OCR-only detection):

Single-word tags: scan all words on page
Multi-word tags: sliding window over adjacent words with bbox merging

Tag Pattern Detection (`tag-patterns.ts`)

Discovers repeating YOLO+text patterns:

Input: 5 circles containing "T-01", "T-02", "T-03", "T-04", "T-05"
  1. For each YOLO detection, extract overlapping OCR text
  2. Extract prefix pattern: "T-01" → prefix "T-"
  3. Group by (yoloClass, prefix): "circle__T-" → 5 instances
  4. Build TagGroup: { pattern: "T-\\d+", instances: 5, confidence: "confirmed" }

Used by: keynote parsing (auto-create tags after parse), schedule parsing (Map Tags section), Auto-QTO workflow.

Auto-QTO Workflow

Guided, material-specific Quantity Takeoff workflow. Walks estimators through: find schedule → parse it → define tag column → scan all pages for tags → review counts → CSV export.

Workflow State Machine

Step 1: Pick Material → Step 2: Find Schedule Page → Step 3: Parse Schedule
                                                           ↓
Step 6: CSV Editor ← Step 5: Review Counts ← Step 4: Map Tags to Drawings

Key Concepts

Tag Column: User marks which parsed column contains the tag identifiers (e.g., "MARK", "NO."). Auto-suggested via regex + header keyword matching.
Multi-Signal Tag Engine: OCR text matching (always) + YOLO shape proximity (optional boost). YOLO not required — works on day one without models.
Confidence Scoring: Each tag instance includes signal breakdown (OCR match, OCR confidence, YOLO proximity, YOLO class).
Flags: Auto-detected discrepancies (tag in schedule but not found on drawings, tag on drawings but not in schedule, low confidence, qty mismatch).

Data Model

qto_workflows table: persists workflow state across sessions
Phases A-B implemented: foundation + schedule finding/parsing
Phases C-E planned: enhanced tag engine (findTagInstances() with multi-signal return), review table, CSV editor

LLM Context System

The core compression engine. Transforms 250K tokens of raw OCR into ~6K tokens of structured, priority-ordered context — a 40:1 compression ratio that makes small models (Haiku, Llama) viable for blueprint Q&A.

Context Assembly (`context-builder.ts`)

12+ sections, each with stable ID, priority, and configurable budget allocation:

Priority	Section	Source
0.5	Project Intelligence Report	projectIntelligence (auto-generated)
1.0	YOLO Detection Counts	annotations (source='yolo')
1.0	CSI Network Graph	projectIntelligence.csiGraph
1.5	Page Classification	pageIntelligence.classification
2.0	User Annotations	annotations (source='user')
3.0	Takeoff Notes	takeoffItems
3.5	Cross-References	pageIntelligence.crossRefs
4.0	CSI Codes	pages.csiCodes
5.0	Text Annotations	pages.textAnnotations (37 types)
5.5	Note Blocks	pageIntelligence.noteBlocks
5.8	Parsed Tables/Keynotes	pageIntelligence.parsedRegions
6.0	Detected Regions	pageIntelligence.classifiedTables
7.0	CSI Spatial Distribution	pageIntelligence.csiSpatialMap
8.0	Spatial OCR→YOLO Context	OCR words mapped to YOLO regions
10.0	Raw OCR Text	pages.rawText (fallback, often truncated)

Percentage-Based Budgeting with Overflow

Each section gets a % of the model-aware budget (Opus: 200K chars, Sonnet: 80K, GPT-4o: 60K, Groq: 24K). Unused allocation flows to a shared overflow pool that redistributes to sections needing more space. Three presets:

Balanced: equal distribution
Structured: parsed tables 25%, CSI spatial 8%, spatial context 12%, raw OCR 5%
Verbose: raw OCR 40%, spatial 15%, parsed tables 10%

All configurable via admin LLM/Context tab (4 panels: section control, system prompt, budget, context preview tool).

Multi-Provider Support

Resolution chain: User API key → Company config → Environment variable. Streaming SSE for all providers.

Provider	Models	Tool-Use
Anthropic	Claude Opus/Sonnet/Haiku	Planned
OpenAI	GPT-4o, o1/o3	Planned
Groq	Llama 3.3 70B	Partial
Custom/Ollama	Any	Context-only

Frontend Architecture

Viewer Component Hierarchy

PDFViewer (root — scroll, zoom, keyboard shortcuts)
├── ViewerToolbar (mode, zoom, search, panel toggles)
├── PageSidebar (thumbnails, page filtering, lazy-loaded)
├── SymbolSearchPanel (floating popup, 4 states)
├── PDFPage (canvas + overlays)
│   ├── SearchHighlightOverlay (memo) — search + CSI word highlights
│   ├── TextAnnotationOverlay — auto-detected text patterns
│   ├── KeynoteOverlay — keynote markers
│   ├── GuidedParseOverlay — draggable grid lines for guided parse
│   ├── ParseRegionLayer (memo) — table/keynote parse BBs
│   └── AnnotationOverlay (orchestrator — ALL mouse events)
│       └── DrawingPreviewLayer (memo) — in-progress BB/polygon/calibration
├── Right panels (toggled):
│   ├── TextPanel, ChatPanel, TakeoffPanel, DetectionPanel
│   ├── CsiPanel, PageIntelligencePanel
│   ├── TableParsePanel (5 tabs: All Tables, Auto Parse, Guided, Manual, Compare/Edit)
│   ├── KeynotePanel (3 tabs: All Keynotes, Auto Parse, Guided)
│   └── TableCompareModal (fullscreen side-by-side + overlay mode)
└── MarkupDialog (modal)

Performance Architecture

Drawing state decoupling: _drawing, _drawStart, _drawEnd, _mousePos live in Zustand store. AnnotationOverlay writes but does NOT subscribe (uses getState() for reads in event handlers). Only DrawingPreviewLayer subscribes → only lightweight canvas redraws during BB drawing. The main 1500-line overlay stays frozen during mouse movement.

Event handler precedence (AnnotationOverlay): Strict sequential dispatch with early returns. Order matters:

Calibration mode → place points
Takeoff placement → polygon/marker
YOLO/tag picking → select shape
Symbol search / parse drawing → draw BB
Pointer mode → select/click/double-click OCR
Markup mode → draw annotation

Zoom: CSS transform for instant feedback, debounced pdf.js re-render (300ms). Cursor-centric (keeps point under cursor stable during zoom).

Chunking: 15-page sliding window with catalog/detail split. Catalog data (~100KB — trade lists, schedule names, CSI directory) loads once. Detail data (~80KB/page — textract words, page intelligence) loads on demand with eviction. State snapshot before async fetch prevents race conditions during rapid navigation.

Zustand Store

Single store (~250 fields, 1,100 lines) with 15 slice selectors (useNavigation(), useProject(), useTableParse(), useYoloTags(), etc.) using useShallow() for memoized subscriptions. Components should use slices rather than direct field access.

Admin Dashboard (9 Tabs)

Tab	Purpose
Overview	Project list, reprocess controls, system status
Pipeline	Processing step toggles, page concurrency, CSI spatial grid resolution, table proposals config
AI Models	YOLO model registry, S3 upload, run/load/status per project
Heuristics	Rule editor with YOLO class picker (dropdown from registered models), text keywords, spatial conditions
LLM / Context	4 panels: section control (toggle/priority/%), system prompt editor, budget config, context preview tool
CSI Codes	CSI database browser, class-level CSI tagging for YOLO models
Text Annotations	Detector toggle (30+ types), preview per page
Page Intelligence	Classification results, heuristic inferences, reprocess by scope
Users	User management, role assignment, API key management

All customization lives in admin — code provides defaults only. Different companies have different YOLO models with different class names; the admin dashboard is the authority.

Database Schema (14 tables)

Table	Purpose	Key Fields
companies	Multi-tenant orgs	pipelineConfig (JSONB), features (JSONB), dataKey (S3 prefix)
users	Auth + permissions	role (admin/member), canRunModels, companyId
userApiKeys	BYOK LLM keys	provider, encryptedKey (AES-256-GCM)
projects	PDF documents	dataUrl (S3), status, projectIntelligence (JSONB), isDemo
pages	Per-page OCR data	textractData (JSONB), csiCodes, pageIntelligence, search_vector (GIN)
annotations	YOLO + user markups	bbox [minX,minY,maxX,maxY], source (yolo/user/takeoff), data (JSONB)
takeoffItems	QTO categories	name, shape, color, size, notes
qtoWorkflows	Auto-QTO state	step, parsedSchedule, lineItems, userEdits (JSONB)
chatMessages	Chat history	role, content, model, pageNumber (nullable = project-wide)
processingJobs	SageMaker tracking	status, stepFunctionArn
models	YOLO model registry	s3Path, config (JSONB: classCsiCodes, classKeywords)
llmConfigs	LLM provider config	provider, model, encryptedApiKey, baseUrl
auditLog	Security audit	action, userId, details (JSONB), ip
inviteRequests	Signup queue	email, name, company

Security Model

Layer	Implementation
Auth	NextAuth 5 credentials, bcrypt (cost 12), JWT (24hr)
Brute force	5 failures = 15min lockout, 10 = 1hr, per email
Rate limiting	In-memory middleware, per-endpoint (auth: 3-5/15min, chat: 30/hr, YOLO: 5/hr, general: 120/min)
Multi-tenancy	All queries scoped by `companyId`. Registration requires company access key
API key encryption	AES-256-GCM with random IV + auth tag
Quotas	Per-company daily: uploads (3 member/10 admin), YOLO (5), chat (200)
Secrets	AWS Secrets Manager via ECS task definition
Webhooks	HMAC-SHA256 + timestamp validation (reject > 5min)
Audit	Login, registration, create/delete, YOLO runs, password changes
Headers	nosniff, DENY frame, XSS protection, strict referrer
WAF	AWS WAF on ALB: rate limit (1000/IP), SQLi rules, known bad inputs, IP reputation (via `hardening.sh`)
Monitoring	CloudWatch alarms (5xx, unhealthy hosts, CPU), ALB access logs, GuardDuty, CloudTrail

Infrastructure (Terraform)

Resource	Config
VPC	10.0.0.0/16, 2 public + 2 private subnets, NAT gateway
ECS Fargate	2 vCPU / 4 GB app, 8 vCPU / 16 GB processing tasks, auto-scaling 1-4
ALB	HTTPS with ACM cert, TLS 1.3, 300s idle, circuit breaker rollback
RDS	PostgreSQL 16, db.t4g.medium, 50GB gp3, encrypted, 7-day backups, Multi-AZ
S3 + CloudFront	Versioned bucket, OAC, CORS policy, CachingOptimized, TLS 1.2
ECR	3 repos (app, cpu-pipeline, yolo-pipeline), scan-on-push
Step Functions	Blueprint processing orchestrator with retry/catch
SageMaker	Processing jobs for YOLO (ml.g4dn.xlarge GPU, on-demand)
Secrets Manager	8 secrets with 7-day recovery window

Estimated: ~$155/month baseline. SageMaker GPU ~$0.75/hr on-demand.

Python Scripts

Script	Purpose	Called By
`yolo_inference.py`	YOLOv8 inference on pages	SageMaker job
`template_match.py`	Two-tier template matching (cv2.matchTemplate + SIFT/FLANN)	/api/symbol-search
`extract_keynotes.py`	Keynote shape extraction + OCR	/api/keynotes/extract
`detect_table_lines.py`	Table grid line detection (Hough transform)	/api/table-parse

All communicate via JSON stdin/stdout. Called with execFile (not exec) — no shell injection surface.

Local Development

Prerequisites: Node.js 20+, Docker

docker compose up -d          # PostgreSQL on port 5433
npm install
cp .env.example .env.local    # Edit: DATABASE_URL, NEXTAUTH_SECRET
npx drizzle-kit migrate
npm run dev                   # http://localhost:3000

Works without AWS credentials — PDF viewing, annotations, QTO, search all functional. For full pipeline: add GROQ_API_KEY (chat), AWS credentials (Textract, S3, SageMaker).

Tech Stack


Runtime	Next.js 16 (App Router), React 19, TypeScript 5
State	Zustand 5 (single store, 15 slice selectors, `useShallow()` memoization)
Rendering	pdfjs-dist 4 (PDF), HTML5 Canvas (7 overlay layers), CSS transform zoom
Styling	Tailwind 4, 3 dark themes (Midnight/Slate/Graphite)
Database	PostgreSQL 16, Drizzle ORM, 15 migrations
Auth	NextAuth 5 (credentials), bcrypt, JWT
AI/LLM	Multi-provider (Groq/Anthropic/OpenAI/Custom), streaming SSE, priority-ordered context
CV	OpenCV (table lines, keynotes), Tesseract (fallback OCR), YOLOv8/ultralytics
Search	PostgreSQL tsvector + GIN index, ts_rank + ts_headline, global cross-project
Infra	Terraform, ECS Fargate, S3/CloudFront, RDS, SageMaker, Step Functions
Monitoring	CloudWatch alarms, ALB access logs, GuardDuty, WAF, CloudTrail

Key Design Decisions

CSI codes as universal embedding layer. Every pixel, annotation, tag, table cell, keynote, and spatial zone gets CSI-tagged. The CSI graph is a condensed project representation giving the LLM efficient navigation without reading raw OCR. This is the core architectural insight — structured tags compress information 40:1.

Confidence scores, never binary decisions. Every system outputs 0.0–1.0 confidence. A text region at 0.6 flows downstream; the table classifier might boost it to 0.85 if keywords match, or decay it to 0.3. Multiple independent signals that agree compound confidence.

Normalized coordinates everywhere. All annotations stored as 0-1 ratios. Rendering multiplies by current canvas dimensions. Zoom-independent, DPI-independent, resolution-independent.

No migrations for new annotation types. The data jsonb column on annotations absorbs new types without schema changes. Count markers, area polygons, scale calibrations, and YOLO detections all share one table.

Admin dashboard is the authority. All customization (heuristic rules, YOLO class names, CSI mappings, LLM config, pipeline toggles) lives in admin UI. Code provides defaults only. Different companies have different models with different class names — the admin adapts without code changes.

Two-phase heuristic execution. Rules work in text-only mode during OCR processing, then get re-scored with YOLO spatial signals after model load. Same rules, richer evidence.

Graceful degradation everywhere. Each processing step has independent try-catch. Textract failure falls through to Tesseract then to empty. Missing YOLO data → text-only heuristics. Missing CSI database → empty codes. The pipeline always completes.

License

BlueprintParser is source-available software.

Free for:

Personal use
Research
Education
Internal evaluation

Commercial use requires a separate written license.

Name		Name	Last commit message	Last commit date
Latest commit History 159 Commits
drizzle		drizzle
featureRoadMap		featureRoadMap
infrastructure/terraform		infrastructure/terraform
public		public
scripts		scripts
src		src
.deploy.env.example		.deploy.env.example
.dockerignore		.dockerignore
.env.example		.env.example
.gitignore		.gitignore
CONTRIBUTING.md		CONTRIBUTING.md
Dockerfile		Dockerfile
Dockerfile.lambda		Dockerfile.lambda
Dockerfile.yolo		Dockerfile.yolo
LICENSE		LICENSE
README.md		README.md
SECURITY.md		SECURITY.md
deploy-label-studio.sh		deploy-label-studio.sh
deploy-lambda.sh		deploy-lambda.sh
deploy-yolo.sh		deploy-yolo.sh
deploy.sh		deploy.sh
docker-compose.yml		docker-compose.yml
drizzle.config.ts		drizzle.config.ts
ecs-health.sh		ecs-health.sh
ecs-tune.sh		ecs-tune.sh
entrypoint.sh		entrypoint.sh
eslint.config.mjs		eslint.config.mjs
hardening.sh		hardening.sh
install_setup.sh		install_setup.sh
next.config.ts		next.config.ts
package-lock.json		package-lock.json
package.json		package.json
postcss.config.mjs		postcss.config.mjs
pre-reddit-check.sh		pre-reddit-check.sh
root_admin.sh		root_admin.sh
setup-lambda-iam.sh		setup-lambda-iam.sh
tsconfig.json		tsconfig.json
vitest.config.ts		vitest.config.ts

Folders and files

Latest commit

History

Repository files navigation

BlueprintParser

Features

Current YOLO Classes

Discrete Classes

Spatial Classes

NOTE on YOLO Models

Project Goal

Future Goals as of May, 2026

Training Data requirements for YOLO

Why This Matters

Quick Start (Local Development)

Setup Scripts

Full Setup (with AWS)

Configuration

Deployment Tiers

System Architecture

Processing Pipeline (6-System Cascade)

System Overview

Upload Processing (per page, concurrency configurable via admin)

Post-YOLO Processing (user-triggered)

CSI Code Detection (3-Tier Matching)

Detection Algorithm (csi-detect.ts)

CSI Embedding Strategy

CSI Spatial Heatmap (csi-spatial.ts)

Table & Schedule Parsing

Method 1: OCR Word Positions (ocr-grid-detect.ts)

Method 2: Textract TABLES Feature

Method 3: OpenCV Line Detection (detect_table_lines.py)

Guided Parse (Tunable)

Post-Parse Pipeline

YOLO-Tag Engine (yolo-tag-engine.ts)

Core Algorithm: mapYoloToOcrText()

Free-Floating Mode

Tag Pattern Detection (tag-patterns.ts)

Auto-QTO Workflow

Workflow State Machine

Key Concepts

Data Model

LLM Context System

Context Assembly (context-builder.ts)

Percentage-Based Budgeting with Overflow

Multi-Provider Support

Frontend Architecture

Viewer Component Hierarchy

Performance Architecture

Zustand Store

Admin Dashboard (9 Tabs)

Database Schema (14 tables)

Security Model

Infrastructure (Terraform)

Python Scripts

Local Development

Tech Stack

Key Design Decisions

License

About

Topics

Resources

License

Contributing

Security policy

Uh oh!

Stars

Watchers

Forks

Contributors

Uh oh!

Languages

Detection Algorithm (`csi-detect.ts`)

CSI Spatial Heatmap (`csi-spatial.ts`)

Method 1: OCR Word Positions (`ocr-grid-detect.ts`)

Method 3: OpenCV Line Detection (`detect_table_lines.py`)

YOLO-Tag Engine (`yolo-tag-engine.ts`)

Core Algorithm: `mapYoloToOcrText()`

Tag Pattern Detection (`tag-patterns.ts`)

Context Assembly (`context-builder.ts`)