The Unified Lakebase Extension Suite for PostgreSQL.
pg-lakebase makes PostgreSQL a first-class citizen in the modern Lakebase ecosystem. By implementing high-performance Table Access Methods (TAM) and Foreign Data Wrappers (FDW) in Rust — backed by a dedicated local caching storage service — it allows PostgreSQL to query and manage open table formats with native-like performance and semantics.
The current runnable extension is pg-iceberg-am, a PostgreSQL Table Access
Method (TAM) for Apache Iceberg tables. It uses pg-lakebase-core for the TAM
framework, iceberg-lite for Iceberg metadata and file format logic, and pgrx
for PostgreSQL integration.
- pg-iceberg-am is the primary SQL-facing extension. Its local Iceberg table storage path is the default and most exercised path, using PostgreSQL's local file APIs and a custom WAL resource manager for crash recovery.
- Predicate pushdown is supported through a CustomScan provider: SQL WHERE predicates are pushed into the Iceberg scan for file/row-group pruning and row-level filtering, instead of scanning everything and filtering in the executor.
- Object storage is available through distributed tablespaces backed by pg-lakebase-storage, a Unix-socket cache service. The storage layer supports AWS S3, S3-compatible endpoints, Google Cloud Storage, and Azure Blob Storage.
- pg-lakebase-core currently exposes a TAM framework plus a generic CustomScan filter-pushdown framework, and pg-arrow-conv provides the format-neutral Arrow⇆PostgreSQL value conversion both the scan and DML paths rely on. FDW support is still a project direction, not a completed public API.
- Iceberg transaction-local visibility is handled by the metadata tracker using an in-memory SnapshotDelta overlay. Each statement reads the latest committed Iceberg metadata pointer and layers the current transaction's staged file operations on top; top-level commit materializes that delta once and publishes it with catalog CAS. This removes the need for statement-time intermediate metadata files or a PostgreSQL heap-table file catalog.
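The predicate pushdown bullet above can be pictured with a small sketch of statistics-based file pruning followed by row-level filtering. This is only an illustration under stated assumptions: `FileStats`, `Predicate`, and `prune_files` are hypothetical names, the predicate is limited to a single integer column, and the real CustomScan provider and iceberg-lite types may look quite different.

```rust
/// Hypothetical per-file column statistics (illustrative names only,
/// not the types used by pg-iceberg-am or iceberg-lite).
#[derive(Debug, Clone, PartialEq)]
pub struct FileStats {
    pub path: String,
    pub min_id: i64,
    pub max_id: i64,
}

/// A simplified WHERE predicate on one integer column.
#[derive(Debug, Clone, Copy)]
pub enum Predicate {
    Eq(i64),
    Lt(i64),
    Gt(i64),
}

impl Predicate {
    /// Returns true when a file with bounds [min, max] may contain a match.
    pub fn may_match(&self, min: i64, max: i64) -> bool {
        match *self {
            Predicate::Eq(v) => min <= v && v <= max,
            Predicate::Lt(v) => min < v,
            Predicate::Gt(v) => max > v,
        }
    }

    /// Row-level check applied to values read from surviving files.
    pub fn matches(&self, value: i64) -> bool {
        match *self {
            Predicate::Eq(v) => value == v,
            Predicate::Lt(v) => value < v,
            Predicate::Gt(v) => value > v,
        }
    }
}

/// Keeps only the files whose statistics cannot rule out the predicate.
pub fn prune_files<'a>(files: &'a [FileStats], pred: Predicate) -> Vec<&'a FileStats> {
    files
        .iter()
        .filter(|f| pred.may_match(f.min_id, f.max_id))
        .collect()
}
```

In this sketch a query such as `WHERE id > 150` would skip any file whose `max_id` is at most 150 without opening it, and `matches` would then filter the rows of the remaining files.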
PostgreSQL backend
        |
        |  pgrx hooks (TAM / FDW)
        v
+------------------+      +---------------------+
| pg-iceberg-am    | ---> | pg-lakebase-core    |
| (Iceberg TAM)    |      | (framework traits)  |
+------------------+      +---------------------+
     /          \
local storage    object storage
(VFD + WAL)      (Unix domain socket)
   /                  \
  v                    v
local filesystem   +-------------------------------+
                   |      pg-lakebase-storage      |
                   | transport | protocol | conn   |
                   | service   | backend  | cache  |
                   +-------------------------------+
                        |                 |
                        v                 v
                 local disk cache   S3 / S3-compatible / GCS / Azure
                 (redb + files)     (object_store)
pg-iceberg-am supports two storage paths depending on the tablespace:
- Local storage: reads and writes go directly through PostgreSQL's Virtual File Descriptor (VFD) system with optional WAL logging for crash consistency.
- Object storage: the database process communicates with pg-lakebase-storage over Unix domain sockets. Reads of cached files use a local pread fast path that bypasses the socket entirely; control operations (open, head, miss fetch, upload) go over the socket. Cache misses are transparently fetched from AWS S3, S3-compatible endpoints, Google Cloud Storage, or Azure Blob Storage. Writes go through an explicit stage → commit flow tied to database transaction boundaries.
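The object-storage read and write behavior described above can be sketched with an in-memory model. Everything here is an assumption made for illustration: `SketchStore` and `ReadSource` are hypothetical names, hash maps stand in for the local cache, the remote bucket, and the staging area, and no socket or pread call is involved.

```rust
use std::collections::HashMap;

/// Where a read was served from in this sketch.
#[derive(Debug, PartialEq)]
pub enum ReadSource {
    LocalCache,
    RemoteFetch,
}

/// In-memory stand-in for the cache service plus the remote object store.
#[derive(Default)]
pub struct SketchStore {
    remote: HashMap<String, Vec<u8>>, // committed objects in "object storage"
    cache: HashMap<String, Vec<u8>>,  // locally cached copies
    staged: HashMap<String, Vec<u8>>, // writes staged by the open transaction
}

impl SketchStore {
    /// Serves cached objects locally; fetches and caches on a miss.
    pub fn read(&mut self, key: &str) -> Option<(Vec<u8>, ReadSource)> {
        if let Some(bytes) = self.cache.get(key) {
            return Some((bytes.clone(), ReadSource::LocalCache));
        }
        let bytes = self.remote.get(key)?.clone();
        self.cache.insert(key.to_string(), bytes.clone());
        Some((bytes, ReadSource::RemoteFetch))
    }

    /// Stages a write; nothing is visible to readers yet.
    pub fn stage(&mut self, key: &str, bytes: Vec<u8>) {
        self.staged.insert(key.to_string(), bytes);
    }

    /// Transaction commit: publishes every staged object.
    pub fn commit(&mut self) {
        for (key, bytes) in self.staged.drain() {
            self.remote.insert(key, bytes);
        }
    }

    /// Transaction abort: discards staged objects.
    pub fn abort(&mut self) {
        self.staged.clear();
    }
}
```

The point of the split is that a staged object never becomes readable unless the surrounding transaction commits, while repeated reads of the same committed object are served from the local copy after the first miss.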
Object-storage tablespaces intentionally use the PostgreSQL tablespace name as
the storage-service store_id, so cache and staging paths remain readable on
disk. Because that name is part of the storage identity, renaming a distributed
tablespace is unsupported.
Tablespace options currently expose protocol=s3, protocol=gcs, and
protocol=azure; use protocol=s3 with a custom endpoint for S3-compatible
services.
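As a rough sketch of how these option values might map onto a typed backend selection, assuming a hypothetical `Backend` enum and `parse_backend` helper (neither is taken from the project's code), and ignoring `endpoint` for non-S3 protocols:

```rust
/// Hypothetical typed view of the tablespace `protocol` option.
#[derive(Debug, Clone, PartialEq)]
pub enum Backend {
    /// AWS S3, or an S3-compatible service when `endpoint` is set.
    S3 { endpoint: Option<String> },
    Gcs,
    Azure,
}

/// Parses the `protocol` value and an optional `endpoint` value.
/// In this sketch `endpoint` is only meaningful for `s3`.
pub fn parse_backend(protocol: &str, endpoint: Option<&str>) -> Result<Backend, String> {
    match protocol {
        "s3" => Ok(Backend::S3 {
            endpoint: endpoint.map(str::to_string),
        }),
        "gcs" => Ok(Backend::Gcs),
        "azure" => Ok(Backend::Azure),
        other => Err(format!("unsupported protocol: {other}")),
    }
}
```

Modeling S3-compatible services as `S3` with an endpoint override, instead of a fourth protocol value, mirrors the option layout described above.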
Distributed tablespace credentials are currently stored in
pg_tablespace.spcoptions. They are redacted from Rust Debug output, but the
catalog value itself is not encrypted; production deployments should prefer
credential references, IAM-style ambient credentials, or another secret manager
once that integration exists.
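Redaction from Rust Debug output is commonly done with a wrapper type that implements `Debug` by hand. The following is a minimal sketch of that pattern, not the project's actual types: `Secret`, `S3Options`, and the `secret_access_key` field are hypothetical names chosen for illustration.

```rust
use std::fmt;

/// Wrapper whose Debug output never includes the wrapped value.
pub struct Secret(String);

impl Secret {
    pub fn new(value: impl Into<String>) -> Self {
        Secret(value.into())
    }

    /// Explicit accessor for the code path that really needs the value.
    pub fn expose(&self) -> &str {
        &self.0
    }
}

impl fmt::Debug for Secret {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Print a fixed placeholder instead of the secret itself.
        f.write_str("Secret(<redacted>)")
    }
}

/// Hypothetical options struct; the derived Debug delegates to Secret's impl.
#[derive(Debug)]
pub struct S3Options {
    pub bucket: String,
    pub secret_access_key: Secret,
}
```

This keeps secrets out of logs and panic messages that format the struct with `{:?}`, but it does nothing for the stored catalog value, which is why the note above about unencrypted pg_tablespace.spcoptions still applies.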
| Crate | Purpose |
|---|---|
| pg-iceberg-am | PostgreSQL extension implementing the Iceberg table access method. |
| pg-lakebase-core | Framework crate for PostgreSQL TAM implementations and CustomScan predicate pushdown. |
| pg-arrow-conv | Format-neutral Arrow⇆PostgreSQL value conversion layer shared by Arrow-backed access methods. |
| pg-backend-tests | Single test-only extension hosting the backend (#[pg_test]) tests for the framework library crates. |
| pg-lakebase-macros | Procedural macro support, including #[pg_table_am]. |
| iceberg-lite | Synchronous, PostgreSQL-friendly Iceberg library used by the TAM. |
| pg-lakebase-storage | Local object-storage caching service library. |
| xtask | Workspace maintenance commands: test-all, isolation. |
- Rust 1.96.0 or later
- PostgreSQL 17, including server development files, or a pgrx-managed PostgreSQL 17 downloaded during setup
- cargo-pgrx 0.18.1
Register PostgreSQL 17 with pgrx. Use either an existing pg_config or let
pgrx download PostgreSQL:
cargo pgrx init --pg17=/path/to/pg_config
# or
cargo pgrx init --pg17=download

Build the Iceberg extension crate:

cargo build --package pg-iceberg-am

Install the extension into the PostgreSQL instance you want to use. Pass the
target PostgreSQL 17 pg_config, whether it comes from pgrx-managed PostgreSQL
or an existing PostgreSQL installation:

cargo pgrx install --package pg-iceberg-am --pg-config /path/to/pg_config

Then start or restart PostgreSQL with shared_preload_libraries='pg_iceberg_am'.
For a pgrx-managed PostgreSQL 17:
cargo pgrx start pg17 \
--package pg-iceberg-am \
--postgresql-conf "shared_preload_libraries='pg_iceberg_am'"
cargo pgrx connect pg17 --package pg-iceberg-am

If the pgrx-managed PostgreSQL instance is already running, stop it before
starting it again so shared_preload_libraries is applied.
For an existing PostgreSQL 17, update postgresql.conf:
shared_preload_libraries = 'pg_iceberg_am'

Then restart PostgreSQL and connect to the target database.
After modifying code, run the standard test suite:
cargo xtask test-all pg17

This runs unit tests, pgrx tests, SQL regression, and isolation tests.
Regression SQL lives in pg-iceberg-am/tests/pg_regress/sql,
isolation specs in pg-iceberg-am/tests/isolation/specs,
and isolation results are written to target/isolation/pg17/output_iso/.
Build a distributable directory of extension artifacts:
cargo pgrx package --package pg-iceberg-am --pg-config "$(cargo pgrx info pg-config pg17)"

Use package when you want to copy the extension artifacts into an image, VM,
or distro package instead of installing directly into a local PostgreSQL
installation.
Create the extension once in each database that uses Iceberg tables:
CREATE EXTENSION IF NOT EXISTS pg_iceberg_am;

Create a local Iceberg table in PostgreSQL's default tablespace:

CREATE TABLE events (
    id int,
    payload text,
    created_at timestamp
) USING iceberg;

INSERT INTO events VALUES
    (1, 'hello', now()),
    (2, 'lakebase', now());

SELECT * FROM events ORDER BY id;

To use a regular PostgreSQL local tablespace, create the tablespace first and then place the Iceberg table in it:
CREATE TABLESPACE lake_local LOCATION '/path/to/local/tablespace';

CREATE TABLE local_events (
    id int,
    payload text
) USING iceberg TABLESPACE lake_local;

To use object storage, create a distributed tablespace and then place the
Iceberg table in it. PostgreSQL still requires a local LOCATION directory for
the tablespace metadata.
CREATE TABLESPACE lake_s3 LOCATION '/path/to/local/tablespace' WITH (
    protocol = 's3',
    bucket = 'my-lake-bucket',
    region = 'us-east-1'
);

CREATE TABLE object_events (
    id int,
    payload text
) USING iceberg TABLESPACE lake_s3;

INSERT INTO object_events VALUES
    (1, 'hello'),
    (2, 'lakebase');

SELECT * FROM object_events ORDER BY id;

For S3-compatible services, keep protocol = 's3' and set endpoint.
The items below are project directions, not committed releases. Each links to the design notes that describe the problem and the intended approach in more detail.
| Area | Goal | Status | Design notes |
|---|---|---|---|
| DataFusion query offload | Push lake-table query fragments such as joins, aggregates, sort, and limit into embedded DataFusion execution over Arrow batches, with PostgreSQL receiving only the final result rows. Layers on top of the existing single-relation filter pushdown and falls back when a fragment cannot be offloaded. | Design roadmap, no implementation | datafusion-offload-roadmap.md |
| Iceberg metadata tracker | Maintain PostgreSQL-style Read Committed visibility by overlaying each transaction's staged Iceberg file delta on the latest committed metadata, then materializing that delta once at top-level commit with catalog CAS. A PostgreSQL heap-table file catalog is not part of the current roadmap. | Core overlay implemented; hardening and broader DML coverage remain | catalog/README.md |
| Expanded DML | Support UPDATE and DELETE in addition to INSERT, using Iceberg position/equality delete files. | Planned | — |
| Expanded DDL | Support schema evolution through ALTER TABLE (add / drop / rename column, type changes where Iceberg allows). | Planned | — |
| Partitioned tables | Support creating and querying Iceberg partitioned tables, including partition-aware pruning on the scan path. | Planned | — |
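The metadata tracker row above describes overlaying a transaction's staged file delta on the latest committed metadata and publishing it once with catalog CAS. A minimal sketch of that idea follows, under loud assumptions: the `SnapshotDelta` here is a simplified stand-in that only tracks file paths, `Catalog` is a hypothetical in-memory pointer with a version counter, and what happens after a failed CAS (retry, rebase, or abort) is left to the caller.

```rust
use std::collections::BTreeSet;

/// File operations staged by the current transaction (illustrative only).
#[derive(Debug, Default, Clone)]
pub struct SnapshotDelta {
    added: BTreeSet<String>,
    removed: BTreeSet<String>,
}

impl SnapshotDelta {
    pub fn add_file(&mut self, path: &str) {
        self.removed.remove(path);
        self.added.insert(path.to_string());
    }

    pub fn remove_file(&mut self, path: &str) {
        // Removing a file staged in this transaction cancels the add.
        if !self.added.remove(path) {
            self.removed.insert(path.to_string());
        }
    }

    /// Files visible to a statement: committed files with this delta applied.
    pub fn visible(&self, committed: &BTreeSet<String>) -> BTreeSet<String> {
        committed
            .difference(&self.removed)
            .chain(self.added.iter())
            .cloned()
            .collect()
    }
}

/// Hypothetical catalog pointer: a version counter and the committed file set.
#[derive(Debug, Default)]
pub struct Catalog {
    version: u64,
    files: BTreeSet<String>,
}

impl Catalog {
    pub fn version(&self) -> u64 {
        self.version
    }

    pub fn files(&self) -> &BTreeSet<String> {
        &self.files
    }

    /// Compare-and-swap publish: succeeds only if no other commit landed
    /// since `expected`; on failure returns the current version.
    pub fn commit(&mut self, expected: u64, delta: &SnapshotDelta) -> Result<u64, u64> {
        if self.version != expected {
            return Err(self.version);
        }
        self.files = delta.visible(&self.files);
        self.version += 1;
        Ok(self.version)
    }
}
```

Because `visible` is computed against whatever committed set is passed in, each statement can re-read the latest committed pointer and still see its own transaction's staged files, which is the Read Committed behavior the row describes.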
- Core framework
- Arrow⇆PostgreSQL conversion
- Backend integration tests
- Iceberg access method
- Storage service
- Storage design
This project is licensed under the Apache License 2.0. See LICENSE for details.