Feat: Delta Storage System with SQL View Architecture #42

Open
arnavg24-arch wants to merge 21 commits into main from feat-delta-storage

arnavg24-arch (Collaborator) commented May 27, 2026

Overview

This PR implements a Delta Storage system for the Buckaroo Visual Wrangler. It replaces full data cloning at each wrangling step with a space-efficient architecture that records wrangling provenance and generates reproducible Pandas scripts.

Technical Implementation (Architecture: SQL Views)

Instead of physical state cloning, we have implemented a Functional Nesting Architecture using PostgreSQL Views:

Zero-Storage History
Each wrangling step is now a virtual SQL view rather than a physical table. Because a view stores only its query definition and no rows, the stored data volume stays effectively constant regardless of the number of undo/redo steps.

Virtualized Operation Mapping
Each operation is mapped to a dynamically generated SQL expression: delete-column becomes a SELECT with an explicit column list, impute becomes a CASE WHEN expression, and row-deletion becomes a WHERE ... NOT IN filter.
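As a rough illustration of this mapping, the sketch below builds the SELECT body of a view for each of the three operations. It is not the code in this PR: `view_select`, `quote_ident`, and the parameter keys (`column`, `value`, `id_column`, `row_ids`) are hypothetical names chosen for the example, and it assumes numeric fill values and integer row ids.

```python
def quote_ident(name: str) -> str:
    """Quote an SQL identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def view_select(parent: str, columns: list, op: str, params: dict) -> str:
    """Return the SELECT body of a view applying one operation to `parent`.

    All names and parameter keys here are hypothetical, for illustration only.
    """
    src = quote_ident(parent)

    if op == "delete-column":
        # Project every column except the deleted one.
        kept = [quote_ident(c) for c in columns if c != params["column"]]
        return f"SELECT {', '.join(kept)} FROM {src}"

    if op == "impute":
        # Replace NULLs in the target column with a numeric fill value.
        fill = float(params["value"])
        exprs = []
        for c in columns:
            q = quote_ident(c)
            if c == params["column"]:
                exprs.append(f"CASE WHEN {q} IS NULL THEN {fill} ELSE {q} END AS {q}")
            else:
                exprs.append(q)
        return f"SELECT {', '.join(exprs)} FROM {src}"

    if op == "row-deletion":
        # Filter out the rows whose id was selected for deletion.
        ids = [str(int(i)) for i in params["row_ids"]]
        if not ids:
            raise ValueError("row-deletion needs at least one row id")
        id_col = quote_ident(params["id_column"])
        return f"SELECT * FROM {src} WHERE {id_col} NOT IN ({', '.join(ids)})"

    raise ValueError(f"unsupported operation: {op}")
```

Coercing the fill value with `float()` and the ids with `int()` keeps user input out of the generated SQL text; a real implementation would more likely use the database driver's own SQL composition helpers.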

Nesting Logic
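Each new node view selects from its parent view, creating a logical chain. The database engine expands the chain into a single query at read time, so creating a step copies no data and is near-instant; the cost of the transformations is paid when a node is queried.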
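The nesting can be demonstrated end to end with a small, self-contained sketch. To keep it runnable without a server it uses Python's built-in `sqlite3` as a stand-in for PostgreSQL; the table name `base`, the view names `node_1` to `node_3`, and the sample rows are all made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# The only physical table: the originally loaded dataset.
conn.execute("CREATE TABLE base (id INTEGER PRIMARY KEY, age REAL, city TEXT)")
conn.executemany(
    "INSERT INTO base VALUES (?, ?, ?)",
    [(1, 30.0, "Oslo"), (2, None, "Lima"), (3, 41.0, "Pune")],
)

# Step 1 (impute): fill missing ages with 0.0.
conn.execute(
    "CREATE VIEW node_1 AS "
    "SELECT id, CASE WHEN age IS NULL THEN 0.0 ELSE age END AS age, city FROM base"
)
# Step 2 (row-deletion): drop the row with id 3, reading from the parent view.
conn.execute("CREATE VIEW node_2 AS SELECT * FROM node_1 WHERE id NOT IN (3)")
# Step 3 (delete-column): drop the city column, again reading from the parent.
conn.execute("CREATE VIEW node_3 AS SELECT id, age FROM node_2")

# Querying the latest node applies the whole chain.
rows = conn.execute("SELECT * FROM node_3 ORDER BY id").fetchall()

# "Undo" is just querying an earlier node; nothing has to be restored.
previous = conn.execute("SELECT * FROM node_2 ORDER BY id").fetchall()

# The base table itself is untouched by any step.
base_count = conn.execute("SELECT COUNT(*) FROM base").fetchone()[0]
```

Here `rows` holds the two surviving rows without the `city` column, `previous` still includes `city`, and `base_count` remains 3.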

New Components

Delta Class (delta.py)
A standard data structure for capturing the intent of an operation, its parameters, and its translated Python code.
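For readers who have not opened `delta.py`, a structure of this kind might look roughly like the sketch below. The field names `operation`, `params`, and `pandas_code` and the `to_dict` helper are assumptions made for illustration, not the actual class definition.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any


@dataclass(frozen=True)
class Delta:
    """One wrangling step: what was done, with which parameters, and the
    equivalent Python code. Field names are hypothetical."""

    operation: str
    params: dict[str, Any] = field(default_factory=dict)
    pandas_code: str = ""

    def to_dict(self) -> dict[str, Any]:
        """Return a plain-dict form suitable for JSON serialization."""
        return {
            "operation": self.operation,
            "params": dict(self.params),
            "pandas_code": self.pandas_code,
        }


delta = Delta(
    operation="delete-column",
    params={"column": "city"},
    pandas_code="df = df.drop(columns=['city'])",
)
```

Freezing the dataclass is a design choice for the sketch: a recorded step should not change after the fact, since later nodes depend on it.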

Pandas Mapper (pandas_mapper.py)
A translation layer that converts SQL-based visual selections into executable, production-ready Pandas code snippets.
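A minimal sketch of such a translation layer is shown below, assuming the same three operations and hypothetical parameter keys (`column`, `value`, `id_column`, `row_ids`). The function name `to_pandas` is invented for the example; the emitted snippets use only standard Pandas calls (`DataFrame.drop`, `Series.fillna`, `Series.isin`) and assume the DataFrame is named `df`.

```python
def to_pandas(op: str, params: dict) -> str:
    """Translate one wrangling operation into a line of Pandas code.

    Hypothetical helper: operation names and parameter keys are assumptions.
    """
    if op == "delete-column":
        column = params["column"]
        return f"df = df.drop(columns=[{column!r}])"

    if op == "impute":
        column = params["column"]
        value = params["value"]
        return f"df[{column!r}] = df[{column!r}].fillna({value!r})"

    if op == "row-deletion":
        id_column = params["id_column"]
        row_ids = list(params["row_ids"])
        return f"df = df[~df[{id_column!r}].isin({row_ids!r})]"

    raise ValueError(f"unsupported operation: {op}")


snippet = to_pandas("row-deletion", {"id_column": "id", "row_ids": [3]})
```

Using `repr()` (the `!r` conversion) to embed values means strings come out quoted and numbers come out as literals, so the generated line can be pasted into a notebook as-is.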

Provenance Reassembler
Updated PGraph logic to traverse the history chain and stitch together a full, reproducible Python script for export.
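The traversal can be pictured as walking parent pointers from the selected node back to the root, then emitting the recorded snippets in chronological order. The sketch below is an assumption about the shape of that logic, not the PGraph code itself: `Node`, `build_script`, and the `data.csv` source path are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """A history node holding the Pandas snippet for its step (hypothetical)."""

    node_id: str
    parent_id: Optional[str]
    pandas_code: str


def build_script(nodes: dict[str, Node], leaf_id: str, source: str = "data.csv") -> str:
    """Walk from `leaf_id` up to the root and return a runnable script."""
    steps = []
    seen = set()
    current: Optional[str] = leaf_id
    while current is not None:
        if current in seen:
            raise ValueError(f"cycle detected at node {current!r}")
        seen.add(current)
        node = nodes[current]
        if node.pandas_code:  # the root node carries no operation
            steps.append(node.pandas_code)
        current = node.parent_id
    steps.reverse()  # collected leaf-to-root, emitted root-to-leaf
    header = ["import pandas as pd", f"df = pd.read_csv({source!r})"]
    return "\n".join(header + steps) + "\n"


history = {
    "n0": Node("n0", None, ""),
    "n1": Node("n1", "n0", "df['age'] = df['age'].fillna(0.0)"),
    "n2": Node("n2", "n1", "df = df.drop(columns=['city'])"),
}
script = build_script(history, "n2")
```

Exporting from an earlier node (for example `"n1"`) yields a shorter script, which is how a branch of the history could be exported on its own.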

Refactored Service Helpers
Integrated view generation into the preview and finalization pipelines.

Key Benefits

Efficiency
Avoids copying the dataset at every wrangling step, which prevents database bloat on large datasets.

Reproducibility
Adds a new endpoint that returns the exact Pandas code needed to replicate a user's visual work in a Jupyter Notebook.

Scalability
Layering views over a single base table follows the same pattern as the view-based transformation layers common in data warehouses.

arnavg24-arch added the enhancement label May 27, 2026
arnavg24-arch self-assigned this May 27, 2026