Live demo: https://app.blueprintparser.com/demo
BP Docs: https://app.blueprintparser.com/docs
BlueprintParser is a Next.js web app and a blueprint parsing engine.
- Manual and AI-assisted QTO
- Extract schedules/tables from PDF to Excel/CSV
- LLM chat with entire projects without blowing up context
- BP includes a context engine that uses multiple methods for creating dense, token-efficient blueprint embeddings
- Agent tool use (work in progress)
- BP tools can be operated by agents
- Agents can query blueprints and provide more accurate answers with citations
- Blueprint parsing engine
- Backend tooling that makes PDF CDs machine readable
- YOLO object detection models detect shapes and spatial zones.
- Rasterize PDF and OCR all text
- CSI tagging: tag all CSI codes in the drawings
- OpenCV tools for extracting shapes and visual information
- Auto name Sheets / Drawing numbers
- Network graph engine: build nodes and edges between sheets in a project and between projects in a portfolio.
See all features and more in the BlueprintParser docs, link up top.
There are two general categories: "discrete and spatial" classes.
Examples:
door_singlesquareschedule_table
Examples:
title_blockdrawingstext_block
Spatial classes define regions of the drawings, and help with localization and corroboration. Since we use OCR, we also have the pixel coordinates of all text, which we can map against the YOLO outputs.
For instance, when extracting the drawing number, we can check if the drawing number is located inside a title_block region.
Or, when looking for the door schedule, BP knows to look for the sheet where the words "door schedule" occur inside a "schedule_table" region, and can double check the title_block for coraberating text like "door and window schedule".
BP uses spatial data to build CSI-heatmaps, which show the distribution of CSI codes on a sheet. YOLO spatial data enables us to differentiate between a cluster of CSI codes that are occuring in a text_block vs the drawings vs inside a schedule vs the title_block. YOLO spatial classes are baby steps towards building a visual-reasoning engine for blueprints.
YOLO acts as the "eyes" for LLMs and agents, enabling them to "see" into the drawings.
The YOLO models used in the demo were hand-labeled and trained by myself.
- 500+ hours of manual labeling
- More than $3,000 in AWS compute
The trained weights and training datasets are not included in this repository. BlueprintParser is designed so users can upload and use their own object-detection models.
It started with trying to build a free quantity takeoff tool, expanded with extracting schedules/tables to excel, and eventually evolved into a LLM harness for blueprints.
Its not enough to rely on LLMs, it's necesary to give them tools that empower them to do more with blueprints.
-
Rebuild a headless version of the BP parsing engine
- Frontend agnostic
- Usable by tools like Claude Cowork/Code, Codex and agent harnesses.
-
Build a lightweight frontend optimized for LLM/"chat with the drawings".
-
Expand agent tool use
- Enable an agent to do a full QTO using BP's tools. With better tool use, agents can more effectively navigate the plans, answer more complex questions with citations derived from the drawings, and actually interact with blueprints.
-
Eventually rebuild the frontend entirely, it's full of bugs and the UI is janky.
I believe there are at least ~200 blueprint object classes that, if trained properly, could enable an AI-QTO engine to handle 80–90% of "each" takeoffs across trades. A truly comprehensive blueprint object-detection system would likely exceed 400 classes. To label this many classes, it would take 50-100+ annotators working full time for 6-12 months, require hundreds of thousands of bid grade CDs, a robust annotator workforce management system, and lots of data labeling and ML ops. Factor in hosting cost for that many annotators and the compute for training the models, the capital to build this dataset could be a 6 or 7 figure investment.
Then, we have to do this again for image segmentation (surface area), classification, for fine tuning LLMs, and much more.
There is a tremendous data labeling challenge ahead of the AEC industry.
We have to solve the "drawing problem" if the AEC industry wants to one day, build its own VLM or foundation models. Spatial reasoning is a must. To get there will require a massive and sustained data labeling campaign, that likely will require coordination between unlikely partners to succeed.
BlueprintParser is a small step in that direction, and hopefully contributes toward the development of an open source ai community in AEC and specifically for Estimators.
git clone https://github.com/goodmorningcoffee/BlueprintParser.git
cd BlueprintParser/blueprintparser_2
cp .env.example .env.local # Edit DATABASE_URL, NEXTAUTH_SECRET at minimum
docker compose up -d # PostgreSQL on port 5433
npm install
npx drizzle-kit migrate # Create database tables
bash scripts/setup.sh # Create root admin account (interactive)
npm run dev # http://localhost:3000Works without AWS credentials — PDF viewing, annotations, table parsing, QTO, and search are all functional locally. For the full pipeline, add: GROQ_API_KEY (free-tier LLM chat), AWS credentials (Textract OCR, S3 storage, SageMaker YOLO inference).
| Script | Purpose | When to Use |
|---|---|---|
scripts/setup.sh |
First-time setup — runs migrations and creates root admin account | After cloning, once PostgreSQL is running |
install_setup.sh |
Full interactive wizard — configures .env.local, AWS credentials, LLM providers, S3, SageMaker, Label Studio |
First deploy to AWS, or when adding new services |
deploy.sh |
Build Docker image, push to ECR, update ECS service | AWS production deployments |
deploy-yolo.sh |
Build and push YOLO inference container to ECR | When updating YOLO models or inference code |
hardening.sh |
AWS WAF rules, security headers, CloudWatch alarms | After initial AWS infrastructure setup |
root_admin.sh |
Create/reset root admin account on a running instance | Emergency admin access recovery |
scripts/cost-control.sh |
Check and stop idle SageMaker endpoints | Cost management for GPU resources |
scripts/sagemaker-killswitch.sh |
Emergency stop all SageMaker resources | If costs spike unexpectedly |
For AWS deployment with Textract, S3, SageMaker, and ECS:
bash install_setup.sh # Interactive wizard — walks through all config
bash deploy.sh # Build + push to ECR + update ECSThe setup wizard prompts for: database URL, auth secrets, LLM API keys (Groq/Anthropic/OpenAI), AWS credentials, S3 bucket, CloudFront, Step Functions, SageMaker, and Label Studio. Re-run anytime to update configuration.
| File | Purpose |
|---|---|
.env.example |
All environment variables with descriptions. Copy to .env.local and fill in your values. |
.deploy.env.example |
AWS deployment variables (ECS, ECR, S3 bucket names). Copy to .deploy.env for production deploy. |
infrastructure/terraform/ |
Full AWS infrastructure as code. Update main.tf backend config and terraform.tfvars before running. |
| Tier | What You Get | Estimated Cost |
|---|---|---|
| Local only (Docker Compose) | PDF viewer, annotations, table parsing, QTO, search, LLM chat (Groq free tier) | $0 |
| Minimal AWS (S3 only) | + Cloud storage for PDFs and thumbnails | ~$5/month |
| Full AWS (ECS + RDS + SageMaker) | + Textract OCR, YOLO inference, Step Functions orchestration, multi-tenant | ~$150-300/month |
| SageMaker GPU (on-demand) | YOLO model inference jobs (ml.g4dn.xlarge) | ~$0.75/hour per run |
+------------------+
| CloudFront |
| (assets CDN) |
+--------+---------+
|
+-------------+ HTTPS +-------------+-----------+ S3 +------------------+
| Browser +------------>+ ALB (TLS termination) +---------->+ beaver-data-* |
| (pdf.js + | +-------------+-----------+ | (PDFs, thumbs, |
| Zustand) | | | YOLO results, |
+-------------+ v | model weights) |
+-------------+-----------+ +------------------+
| ECS Fargate (2vCPU) |
| Next.js 16 + API |
| 52+ endpoints |
+--+--------+----------+--+
| | |
+------------+ +----+----+ +-+-----------+
| | | | |
v v v v v
+------+------+ +-----+---+ +---+---+--+ +-------+--------+
| RDS PG 16 | | Textract| | Secrets | | Step Functions |
| (14 tables)| | (OCR) | | Manager | | (orchestrator) |
+-------------+ +---------+ +----------+ +-------+--------+
|
v
+----------+---------+
| ECS Task (8vCPU) |
| CPU Processing |
| (GS + Textract + |
| OpenCV + Python) |
+--------------------+
+--------------------+
| SageMaker GPU |
| ml.g4dn.xlarge |
| (YOLOv8 inference) |
+--------------------+
The processing pipeline transforms PDFs into 'structured data' through six interconnected systems. Upstream system's outputs feed as inputs into downstream systems. The final processed outputs are machine readable, and context optimised for LLMs.
Raw pixels → OCR words → spatial clusters → semantic regions →
classified tables → YOLO-text bindings → tag patterns →
page intelligence → project graph → LLM context
Upload PDF → S3
│
v
Step Functions → ECS Task
│
├── 1. Ghostscript: PDF → PNG per page (300 DPI, -dMaxBitmap=500M)
├── 2. Textract OCR with fallback chain:
│ ├── Textract full-res (with retry+backoff for throttling)
│ ├── Textract half-res (50% downscale if UnsupportedDocument)
│ └── Tesseract local (if Textract unavailable)
│ Returns: words[], lines[], tables[] (never throws — empty on total failure)
│
├── 3. Drawing number extraction (title block regex + position scoring)
├── 4. CSI code detection (3-tier matching against 8,951-row MasterFormat DB)
├── 5. Text annotation detection (30+ regex detectors: phone, email, equipment, dims...)
├── 6. Page intelligence:
│ ├── System 1: Text region classification (OCR word clustering → table-like, notes, key-value)
│ ├── System 2: Heuristic engine (rule-based: text keywords + YOLO spatial + CSI affinity)
│ └── System 3: Table meta-classifier (combines Systems 1+2 → door-schedule, finish-schedule, etc.)
├── 7. CSI spatial heatmap (NxN configurable grid + title-block + right-margin zones)
├── 8. tsvector generation (full-text search index)
└── 9. Project-level analysis (discipline breakdown, reference graph, CSI topology, summaries)
Page concurrency: 8 (configurable via admin dashboard, Textract limit: 10 TPS)
Each step wrapped in independent try-catch — one step failing doesn't block the pipeline.
POST /api/yolo/run → SageMaker Processing Job (ml.g4dn.xlarge GPU)
│
v
POST /api/yolo/load → Load results from S3
├── Create annotations (source='yolo') with class-level CSI codes from model config
├── Re-run heuristic engine with YOLO spatial signals (System 2, phase 2)
├── Reclassify tables with YOLO-enriched heuristic data
└── Merge YOLO CSI codes into page-level csiCodes
CSI MasterFormat codes are the universal embedding layer that connects every system — OCR text, YOLO detections, parsed tables, spatial zones, and LLM context all tagged with CSI codes.
Matches raw OCR text against the 8,951-row MasterFormat 2004 database (csi-masterformat.tsv). Three independent tiers, each with configurable thresholds:
| Tier | Method | Confidence | Description |
|---|---|---|---|
| 1 | Exact subphrase | 0.95 | Consecutive words from CSI description appear together in text |
| 2 | Bag-of-words | up to 0.75 | Score = (matched/total)² — squared penalty rewards near-complete overlap |
| 3 | Keyword anchors | up to 0.50 | Rare words weighted by inverse document frequency (IDF-like) |
- Multi-tier boost: +0.05 when both Tier 2 and Tier 3 agree (independent signals confirming each other)
- Stop-word filtering, minimum word count thresholds per tier
- All thresholds admin-configurable via pipeline control panel
Every element in the system gets CSI-tagged:
Textract words → CSI codes (via csi-detect)
YOLO detections → CSI codes (via model config classCsiCodes)
Parsed table rows → CSI codes (via content matching)
Text annotations → CSI tags (via csi-detect on annotation text)
Spatial zones → CSI divisions (via csi-spatial heatmap)
User markups → CSI codes (via manual tagging in annotation editor)
All sources merge into pages.csiCodes (page-level). The CSI co-occurrence graph (csi-graph.ts) tracks which divisions appear together across pages, building a project-wide network with clusters (MEP, Architectural, Structural, Site) and cross-reference edges.
Each page is divided into a configurable NxN grid (3x3 default, up to 12x12). Special zones: title-block (y > 0.85), right-margin (x > 0.75). All CSI-tagged elements (text annotations, YOLO detections, classified tables, parsed regions, YOLO tags) are binned into zones by bbox center.
Output: per-zone division breakdown + natural language summary (e.g., "Door-related content (Div 08) clusters in center drawing area; Plumbing (Div 22) clusters in bottom-right").
Fed to LLM at priority 7.0 in the context builder. Recomputed client-side after user parsing or YOLO load via refreshPageCsiSpatialMap().
Three parsing methods run in parallel on user-drawn bounding boxes, results merged by a grid merger that selects the highest-confidence alignment:
- Cluster OCR words by Y-center (row tolerance: configurable, default 0.006)
- Detect columns by X left-edge gap analysis (min gap: configurable, default 0.015)
- Column stability filter: column must appear in ≥30% of rows (configurable)
- Layout hint support: force N columns (e.g., 2 for keynotes)
- Output: row/column boundaries + confidence score
- AWS Textract returns structured TABLE blocks with CELL children
- Extract row/column indices, cell text, bounding boxes
- Higher confidence on structured tables with clear borders
- Rasterize page region → Hough transform → detect horizontal/vertical lines
- Build grid from line intersections
- Best for tables with visible rule lines
New "Guided" tab in the Schedules/Tables panel. User draws BB, system auto-proposes grid via /api/table-parse/propose, then user can tune with live sliders:
| Slider | Controls | Default | Effect |
|---|---|---|---|
| Row Sensitivity | rowTolerance in clusterRows() |
0.006 | Lower = tighter row clustering |
| Column Sensitivity | minColGap in detectColumns() |
0.015 | Lower = more columns detected |
| Column Confidence | minHitsRatio in detectColumns() |
0.3 | Lower = keeps weaker columns |
| Expected Columns | layoutHint.columns |
Auto | Force exact column count |
Slider changes trigger debounced re-proposal (300ms). Grid lines rendered as draggable overlays on canvas (GuidedParseOverlay). Repeat Down / Repeat Right buttons tile uniform rows/columns. Available in both table and keynote parsing flows.
- Cell extraction via
extractCellsFromGrid()— reads OCR words within grid boundaries - Generic column headers ("Column 1", "Column 2") — user renames via edit UI
- CSI auto-detection on parsed content
- Persistence to
pageIntelligence.parsedRegions+ DB - CSI spatial map recomputation
- Optional tag mapping (see YOLO-Tag Engine below)
Maps YOLO shape detections to nearby OCR text, creating tag instances that link visual symbols to their alphanumeric identifiers.
For each YOLO annotation matching the target class:
1. Find all OCR words whose center falls inside the YOLO bbox
2. Sort left-to-right, concatenate → candidateText
3. Match against target tag:
- Exact match → confidence 1.0
- Edit distance ≤ 1 (for text ≥ 3 chars) → confidence 0.9
- Short tags (1-2 chars) require exact match (prevents "3" matching "8")
For tags without YOLO shapes (OCR-only detection):
- Single-word tags: scan all words on page
- Multi-word tags: sliding window over adjacent words with bbox merging
Discovers repeating YOLO+text patterns:
Input: 5 circles containing "T-01", "T-02", "T-03", "T-04", "T-05"
1. For each YOLO detection, extract overlapping OCR text
2. Extract prefix pattern: "T-01" → prefix "T-"
3. Group by (yoloClass, prefix): "circle__T-" → 5 instances
4. Build TagGroup: { pattern: "T-\\d+", instances: 5, confidence: "confirmed" }
Used by: keynote parsing (auto-create tags after parse), schedule parsing (Map Tags section), Auto-QTO workflow.
Guided, material-specific Quantity Takeoff workflow. Walks estimators through: find schedule → parse it → define tag column → scan all pages for tags → review counts → CSV export.
Step 1: Pick Material → Step 2: Find Schedule Page → Step 3: Parse Schedule
↓
Step 6: CSV Editor ← Step 5: Review Counts ← Step 4: Map Tags to Drawings
- Tag Column: User marks which parsed column contains the tag identifiers (e.g., "MARK", "NO."). Auto-suggested via regex + header keyword matching.
- Multi-Signal Tag Engine: OCR text matching (always) + YOLO shape proximity (optional boost). YOLO not required — works on day one without models.
- Confidence Scoring: Each tag instance includes signal breakdown (OCR match, OCR confidence, YOLO proximity, YOLO class).
- Flags: Auto-detected discrepancies (tag in schedule but not found on drawings, tag on drawings but not in schedule, low confidence, qty mismatch).
qto_workflowstable: persists workflow state across sessions- Phases A-B implemented: foundation + schedule finding/parsing
- Phases C-E planned: enhanced tag engine (
findTagInstances()with multi-signal return), review table, CSV editor
The core compression engine. Transforms 250K tokens of raw OCR into ~6K tokens of structured, priority-ordered context — a 40:1 compression ratio that makes small models (Haiku, Llama) viable for blueprint Q&A.
12+ sections, each with stable ID, priority, and configurable budget allocation:
| Priority | Section | Source |
|---|---|---|
| 0.5 | Project Intelligence Report | projectIntelligence (auto-generated) |
| 1.0 | YOLO Detection Counts | annotations (source='yolo') |
| 1.0 | CSI Network Graph | projectIntelligence.csiGraph |
| 1.5 | Page Classification | pageIntelligence.classification |
| 2.0 | User Annotations | annotations (source='user') |
| 3.0 | Takeoff Notes | takeoffItems |
| 3.5 | Cross-References | pageIntelligence.crossRefs |
| 4.0 | CSI Codes | pages.csiCodes |
| 5.0 | Text Annotations | pages.textAnnotations (37 types) |
| 5.5 | Note Blocks | pageIntelligence.noteBlocks |
| 5.8 | Parsed Tables/Keynotes | pageIntelligence.parsedRegions |
| 6.0 | Detected Regions | pageIntelligence.classifiedTables |
| 7.0 | CSI Spatial Distribution | pageIntelligence.csiSpatialMap |
| 8.0 | Spatial OCR→YOLO Context | OCR words mapped to YOLO regions |
| 10.0 | Raw OCR Text | pages.rawText (fallback, often truncated) |
Each section gets a % of the model-aware budget (Opus: 200K chars, Sonnet: 80K, GPT-4o: 60K, Groq: 24K). Unused allocation flows to a shared overflow pool that redistributes to sections needing more space. Three presets:
- Balanced: equal distribution
- Structured: parsed tables 25%, CSI spatial 8%, spatial context 12%, raw OCR 5%
- Verbose: raw OCR 40%, spatial 15%, parsed tables 10%
All configurable via admin LLM/Context tab (4 panels: section control, system prompt, budget, context preview tool).
Resolution chain: User API key → Company config → Environment variable. Streaming SSE for all providers.
| Provider | Models | Tool-Use |
|---|---|---|
| Anthropic | Claude Opus/Sonnet/Haiku | Planned |
| OpenAI | GPT-4o, o1/o3 | Planned |
| Groq | Llama 3.3 70B | Partial |
| Custom/Ollama | Any | Context-only |
PDFViewer (root — scroll, zoom, keyboard shortcuts)
├── ViewerToolbar (mode, zoom, search, panel toggles)
├── PageSidebar (thumbnails, page filtering, lazy-loaded)
├── SymbolSearchPanel (floating popup, 4 states)
├── PDFPage (canvas + overlays)
│ ├── SearchHighlightOverlay (memo) — search + CSI word highlights
│ ├── TextAnnotationOverlay — auto-detected text patterns
│ ├── KeynoteOverlay — keynote markers
│ ├── GuidedParseOverlay — draggable grid lines for guided parse
│ ├── ParseRegionLayer (memo) — table/keynote parse BBs
│ └── AnnotationOverlay (orchestrator — ALL mouse events)
│ └── DrawingPreviewLayer (memo) — in-progress BB/polygon/calibration
├── Right panels (toggled):
│ ├── TextPanel, ChatPanel, TakeoffPanel, DetectionPanel
│ ├── CsiPanel, PageIntelligencePanel
│ ├── TableParsePanel (5 tabs: All Tables, Auto Parse, Guided, Manual, Compare/Edit)
│ ├── KeynotePanel (3 tabs: All Keynotes, Auto Parse, Guided)
│ └── TableCompareModal (fullscreen side-by-side + overlay mode)
└── MarkupDialog (modal)
Drawing state decoupling: _drawing, _drawStart, _drawEnd, _mousePos live in Zustand store. AnnotationOverlay writes but does NOT subscribe (uses getState() for reads in event handlers). Only DrawingPreviewLayer subscribes → only lightweight canvas redraws during BB drawing. The main 1500-line overlay stays frozen during mouse movement.
Event handler precedence (AnnotationOverlay): Strict sequential dispatch with early returns. Order matters:
- Calibration mode → place points
- Takeoff placement → polygon/marker
- YOLO/tag picking → select shape
- Symbol search / parse drawing → draw BB
- Pointer mode → select/click/double-click OCR
- Markup mode → draw annotation
Zoom: CSS transform for instant feedback, debounced pdf.js re-render (300ms). Cursor-centric (keeps point under cursor stable during zoom).
Chunking: 15-page sliding window with catalog/detail split. Catalog data (~100KB — trade lists, schedule names, CSI directory) loads once. Detail data (~80KB/page — textract words, page intelligence) loads on demand with eviction. State snapshot before async fetch prevents race conditions during rapid navigation.
Single store (~250 fields, 1,100 lines) with 15 slice selectors (useNavigation(), useProject(), useTableParse(), useYoloTags(), etc.) using useShallow() for memoized subscriptions. Components should use slices rather than direct field access.
| Tab | Purpose |
|---|---|
| Overview | Project list, reprocess controls, system status |
| Pipeline | Processing step toggles, page concurrency, CSI spatial grid resolution, table proposals config |
| AI Models | YOLO model registry, S3 upload, run/load/status per project |
| Heuristics | Rule editor with YOLO class picker (dropdown from registered models), text keywords, spatial conditions |
| LLM / Context | 4 panels: section control (toggle/priority/%), system prompt editor, budget config, context preview tool |
| CSI Codes | CSI database browser, class-level CSI tagging for YOLO models |
| Text Annotations | Detector toggle (30+ types), preview per page |
| Page Intelligence | Classification results, heuristic inferences, reprocess by scope |
| Users | User management, role assignment, API key management |
All customization lives in admin — code provides defaults only. Different companies have different YOLO models with different class names; the admin dashboard is the authority.
| Table | Purpose | Key Fields |
|---|---|---|
| companies | Multi-tenant orgs | pipelineConfig (JSONB), features (JSONB), dataKey (S3 prefix) |
| users | Auth + permissions | role (admin/member), canRunModels, companyId |
| userApiKeys | BYOK LLM keys | provider, encryptedKey (AES-256-GCM) |
| projects | PDF documents | dataUrl (S3), status, projectIntelligence (JSONB), isDemo |
| pages | Per-page OCR data | textractData (JSONB), csiCodes, pageIntelligence, search_vector (GIN) |
| annotations | YOLO + user markups | bbox [minX,minY,maxX,maxY], source (yolo/user/takeoff), data (JSONB) |
| takeoffItems | QTO categories | name, shape, color, size, notes |
| qtoWorkflows | Auto-QTO state | step, parsedSchedule, lineItems, userEdits (JSONB) |
| chatMessages | Chat history | role, content, model, pageNumber (nullable = project-wide) |
| processingJobs | SageMaker tracking | status, stepFunctionArn |
| models | YOLO model registry | s3Path, config (JSONB: classCsiCodes, classKeywords) |
| llmConfigs | LLM provider config | provider, model, encryptedApiKey, baseUrl |
| auditLog | Security audit | action, userId, details (JSONB), ip |
| inviteRequests | Signup queue | email, name, company |
| Layer | Implementation |
|---|---|
| Auth | NextAuth 5 credentials, bcrypt (cost 12), JWT (24hr) |
| Brute force | 5 failures = 15min lockout, 10 = 1hr, per email |
| Rate limiting | In-memory middleware, per-endpoint (auth: 3-5/15min, chat: 30/hr, YOLO: 5/hr, general: 120/min) |
| Multi-tenancy | All queries scoped by companyId. Registration requires company access key |
| API key encryption | AES-256-GCM with random IV + auth tag |
| Quotas | Per-company daily: uploads (3 member/10 admin), YOLO (5), chat (200) |
| Secrets | AWS Secrets Manager via ECS task definition |
| Webhooks | HMAC-SHA256 + timestamp validation (reject > 5min) |
| Audit | Login, registration, create/delete, YOLO runs, password changes |
| Headers | nosniff, DENY frame, XSS protection, strict referrer |
| WAF | AWS WAF on ALB: rate limit (1000/IP), SQLi rules, known bad inputs, IP reputation (via hardening.sh) |
| Monitoring | CloudWatch alarms (5xx, unhealthy hosts, CPU), ALB access logs, GuardDuty, CloudTrail |
| Resource | Config |
|---|---|
| VPC | 10.0.0.0/16, 2 public + 2 private subnets, NAT gateway |
| ECS Fargate | 2 vCPU / 4 GB app, 8 vCPU / 16 GB processing tasks, auto-scaling 1-4 |
| ALB | HTTPS with ACM cert, TLS 1.3, 300s idle, circuit breaker rollback |
| RDS | PostgreSQL 16, db.t4g.medium, 50GB gp3, encrypted, 7-day backups, Multi-AZ |
| S3 + CloudFront | Versioned bucket, OAC, CORS policy, CachingOptimized, TLS 1.2 |
| ECR | 3 repos (app, cpu-pipeline, yolo-pipeline), scan-on-push |
| Step Functions | Blueprint processing orchestrator with retry/catch |
| SageMaker | Processing jobs for YOLO (ml.g4dn.xlarge GPU, on-demand) |
| Secrets Manager | 8 secrets with 7-day recovery window |
Estimated: ~$155/month baseline. SageMaker GPU ~$0.75/hr on-demand.
| Script | Purpose | Called By |
|---|---|---|
yolo_inference.py |
YOLOv8 inference on pages | SageMaker job |
template_match.py |
Two-tier template matching (cv2.matchTemplate + SIFT/FLANN) | /api/symbol-search |
extract_keynotes.py |
Keynote shape extraction + OCR | /api/keynotes/extract |
detect_table_lines.py |
Table grid line detection (Hough transform) | /api/table-parse |
All communicate via JSON stdin/stdout. Called with execFile (not exec) — no shell injection surface.
Prerequisites: Node.js 20+, Docker
docker compose up -d # PostgreSQL on port 5433
npm install
cp .env.example .env.local # Edit: DATABASE_URL, NEXTAUTH_SECRET
npx drizzle-kit migrate
npm run dev # http://localhost:3000Works without AWS credentials — PDF viewing, annotations, QTO, search all functional. For full pipeline: add GROQ_API_KEY (chat), AWS credentials (Textract, S3, SageMaker).
| Runtime | Next.js 16 (App Router), React 19, TypeScript 5 |
| State | Zustand 5 (single store, 15 slice selectors, useShallow() memoization) |
| Rendering | pdfjs-dist 4 (PDF), HTML5 Canvas (7 overlay layers), CSS transform zoom |
| Styling | Tailwind 4, 3 dark themes (Midnight/Slate/Graphite) |
| Database | PostgreSQL 16, Drizzle ORM, 15 migrations |
| Auth | NextAuth 5 (credentials), bcrypt, JWT |
| AI/LLM | Multi-provider (Groq/Anthropic/OpenAI/Custom), streaming SSE, priority-ordered context |
| CV | OpenCV (table lines, keynotes), Tesseract (fallback OCR), YOLOv8/ultralytics |
| Search | PostgreSQL tsvector + GIN index, ts_rank + ts_headline, global cross-project |
| Infra | Terraform, ECS Fargate, S3/CloudFront, RDS, SageMaker, Step Functions |
| Monitoring | CloudWatch alarms, ALB access logs, GuardDuty, WAF, CloudTrail |
CSI codes as universal embedding layer. Every pixel, annotation, tag, table cell, keynote, and spatial zone gets CSI-tagged. The CSI graph is a condensed project representation giving the LLM efficient navigation without reading raw OCR. This is the core architectural insight — structured tags compress information 40:1.
Confidence scores, never binary decisions. Every system outputs 0.0–1.0 confidence. A text region at 0.6 flows downstream; the table classifier might boost it to 0.85 if keywords match, or decay it to 0.3. Multiple independent signals that agree compound confidence.
Normalized coordinates everywhere. All annotations stored as 0-1 ratios. Rendering multiplies by current canvas dimensions. Zoom-independent, DPI-independent, resolution-independent.
No migrations for new annotation types. The data jsonb column on annotations absorbs new types without schema changes. Count markers, area polygons, scale calibrations, and YOLO detections all share one table.
Admin dashboard is the authority. All customization (heuristic rules, YOLO class names, CSI mappings, LLM config, pipeline toggles) lives in admin UI. Code provides defaults only. Different companies have different models with different class names — the admin adapts without code changes.
Two-phase heuristic execution. Rules work in text-only mode during OCR processing, then get re-scored with YOLO spatial signals after model load. Same rules, richer evidence.
Graceful degradation everywhere. Each processing step has independent try-catch. Textract failure falls through to Tesseract then to empty. Missing YOLO data → text-only heuristics. Missing CSI database → empty codes. The pipeline always completes.
BlueprintParser is source-available software.
Free for:
- Personal use
- Research
- Education
- Internal evaluation
Commercial use requires a separate written license.