feat: Implement icebug-disk format by aheev · Pull Request #429 · LadybugDB/ladybug

aheev · 2026-04-28T08:52:06Z

Added IceDisk storage tables: Added IceDiskNodeTable and IceDiskRelTable
IceDisk test suite: Added demo_db_ice_disk.test covering node/rel table scans against IceDisk storage.
IceDisk utilities: Added ice_disk_utils.h with shared helpers for path resolution and Parquet-backed row group access.
added icebug-disk spec implementation

Closes #469

adsharma · 2026-05-08T14:52:49Z

How's this different from:

src/include/storage/table/parquet_rel_table.h
src/include/storage/table/parquet_node_table.h

aheev · 2026-05-08T14:55:53Z

How's this different from:

src/include/storage/table/parquet_rel_table.h
src/include/storage/table/parquet_node_table.h

updating description. Give me a minute

aheev · 2026-05-08T15:15:48Z

@adsharma

Improvements:

Batch scanning in node table
revamped rel_table scanning. Previously, we used to scan every row group for each boundNodeOffset. Changed it to boundNodeOffset driven scan. Based on fwd or bwd scan, we'll fetch the rowGroups
Other improvements like removing unnecessary data from scanStates, removing repeated state data population, caching
blocking user from creating mixed tables/rels

However, the main motivation behind adding new classes is having separate entities for icebug-format because it may diverge from existing columnar implementations. As discussed earlier, we are planning to use/replace exisiting parquet or arrow classes for normal tables or delegate to duckDB anyway

aheev · 2026-05-08T15:23:06Z

@adsharma

With this implementation, we can have a working db as an output from non-icedisk-to-icedisk-converter or user can pass the schema.cypher or run the cmds themselves to create icebug-disk tables. If you want to attach from existing db, we already support attaching external ladybug dbs

The main issue is path though. If the user moves the db or path, he/she might need to trigger create table queries again. DuckDB provides override option. We may have to provide similar option in the future

Couple of clarifications:

Where to add version? I am thinking metadata in each file
Should we keep the current column(0) for nbr_offset in in indptr? or allow users to specify

adsharma · 2026-05-08T15:29:53Z

The improvements are nice. But we can't have two implementations with 2x the bugs. Need to remove one and justify the other with benchmarks/data comparing the improvement.

DuckDB and Lance could be additional ColumnarBaseTable implementations. They don't remove the need for parquet.

adsharma · 2026-05-08T15:37:59Z

uvx icebug-format --help

It already creates a metadata table thusly:

        # Create global metadata
        con.execute(f"""
        CREATE TABLE {csr_table_name}_metadata AS
        SELECT {total_nodes} AS n_nodes, {total_edges} AS n_edges, {directed} AS directed
        """)

Yes, this is a good place to add version.

Should we keep the current column(0) for nbr_offset in in indptr? or allow users to specify

nbr_offset is kuzu/ladybug specific terminology. I'd avoid it in the code/docs. CSR format is broadly defined. Include a pointer to existing definition. My recollection is that indptr always has only one column specifying the offset into indices. However, indices can have many columns if there are properties on the edge.

We can specify that target is always col0 and additional columns appear in an order as specified in the schema.

The distinction between schema.cypher and catalog entry are mechanical. I don't think we should spend a lot of time specifying that or we'll be duplicating a significant chunk of docs.ladybugdb.com.

However, getting other Graph databases and analytical packages to adopt icebug is an explicit goal. So I'd try to strike a balance between over-specifying (as opposed to just refer to reference implementation) and adding ladybug specific assumptions that would rub the non-ladybug people the wrong way.

aheev · 2026-05-08T16:28:09Z

dataset addition PR: LadybugDB/dataset#1

aheev · 2026-05-08T16:33:14Z

The improvements are nice. But we can't have two implementations with 2x the bugs. Need to remove one and justify the other with benchmarks/data comparing the improvement.

DuckDB and Lance could be additional ColumnarBaseTable implementations. They don't remove the need for parquet.

I will run a benchmark tmrw and attach the results

But, what about support for normal parquet tables? We can just remove them from docs until we change the existing classes to normal tables

aheev · 2026-05-08T16:36:15Z

uvx icebug-format --help

It already creates a metadata table thusly:

        # Create global metadata
        con.execute(f"""
        CREATE TABLE {csr_table_name}_metadata AS
        SELECT {total_nodes} AS n_nodes, {total_edges} AS n_edges, {directed} AS directed
        """)

I will update the tool and add it in this repo tmrw

adsharma · 2026-05-08T17:22:42Z

I will update the tool and add it in this repo tmrw

Ladybug-Memory/icebug-format#2 (comment)
Ladybug-Memory/icebug-format#2 (comment)

If the issue is icebug-format is in a for profit company namespace, we can move it some place else. But ladybugdb core repo is the wrong place for the spec and the script.

aheev force-pushed the icedisk-impl branch 3 times, most recently from 7f26fa3 to 2c9f811 Compare May 7, 2026 05:22

aheev added 13 commits May 8, 2026 14:32

feat: add icebug-disk tables

ac3d00b

fix tests in demo_db ice_disk test

72bf56f

fix path validations

022f363

fix IceDiskNodeTable::getNumTotalRows

b107b98

fix IceDiskNodeTable scan init

806438c

fix ice disk node table scan

1e8fb4f

move rowGroupStartOffsets into ice disk node table shared state

5662afb

remove this in ice_disk_node_table.cpp

ec109fb

fix ice-disk rel scan

67188d7

move const data out of ice-disk shared states

ee16b9c

fix demo_db ice_disk storage paths

1a682d7

fix ice-disk node table scan initScanState

0e5175d

fix reset indicesRowGroupStartOffsets

4ea758c

aheev force-pushed the icedisk-impl branch from 1f9bae9 to 69f32c5 Compare May 8, 2026 09:07

fix/optimize ice-disk rel table scan

4c5f9bf

aheev force-pushed the icedisk-impl branch from 69f32c5 to 4c5f9bf Compare May 8, 2026 09:41

aheev changed the title ~~[WIP] feat: add icebug-disk tables~~ feat: add icebug-disk tables May 8, 2026

revert table_path, indptr, indices options

09ef657

aheev force-pushed the icedisk-impl branch from 8cdeae7 to 09ef657 Compare May 8, 2026 14:47

aheev changed the title ~~feat: add icebug-disk tables~~ feat: Implement icebug-disk format May 8, 2026

aheev mentioned this pull request May 8, 2026

add icebug-disk dataset LadybugDB/dataset#1

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: Implement icebug-disk format#429

feat: Implement icebug-disk format#429
aheev wants to merge 15 commits intoLadybugDB:mainfrom
aheev:icedisk-impl

aheev commented Apr 28, 2026 •

edited

Loading

Uh oh!

adsharma commented May 8, 2026

Uh oh!

aheev commented May 8, 2026 •

edited

Loading

Uh oh!

aheev commented May 8, 2026

Uh oh!

aheev commented May 8, 2026 •

edited

Loading

Uh oh!

adsharma commented May 8, 2026

Uh oh!

adsharma commented May 8, 2026

Uh oh!

aheev commented May 8, 2026

Uh oh!

aheev commented May 8, 2026

Uh oh!

aheev commented May 8, 2026

Uh oh!

adsharma commented May 8, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

aheev commented Apr 28, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

adsharma commented May 8, 2026

Uh oh!

aheev commented May 8, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

aheev commented May 8, 2026

Uh oh!

aheev commented May 8, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

adsharma commented May 8, 2026

Uh oh!

adsharma commented May 8, 2026

Uh oh!

aheev commented May 8, 2026

Uh oh!

aheev commented May 8, 2026

Uh oh!

aheev commented May 8, 2026

Uh oh!

adsharma commented May 8, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

aheev commented Apr 28, 2026 •

edited

Loading

aheev commented May 8, 2026 •

edited

Loading

aheev commented May 8, 2026 •

edited

Loading