Skip to content

add caching in hashTreeRoot#62

Open
anshalshukla wants to merge 4 commits into
masterfrom
anshalshukla/merkle-cache
Open

add caching in hashTreeRoot#62
anshalshukla wants to merge 4 commits into
masterfrom
anshalshukla/merkle-cache

Conversation

@anshalshukla
Copy link
Copy Markdown
Contributor

Adds caching in hashing path for List and BitList

  • Supports caching only when hashing with Sha256
  • init API doesn't take in Hasher as an input so CacheType cannot be Hasher agnostic without API changes
  • Can support multiple Hasher in CacheType either by updating init API or by storing a list of CacheType which can be appended based on the Hasher call in hashTreeRoot

@anshalshukla anshalshukla requested a review from gballet as a code owner April 28, 2026 06:14
Copy link
Copy Markdown
Collaborator

@gballet gballet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After reviewing, I think that there is a fundamental problem with how the cached hashing has been designed:

  • There is no reason to make it Sha256-specific, it should be easy to make it agnostic
  • It should be possible not to use caching, which isn't the case here
  • imo, the cached hashing should be a wrapper around the regular root-hashing function, e.g. by introducing a TreeHasher interface, with a hashTreeRoot function - which is Hasher-agnostic (and passed as a parameter). If caching is active, then the wrapper is used, otherwise the regular object is used.

Comment thread src/merkle_cache.zig
// Tests
const Sha256 = std.crypto.hash.sha2.Sha256;
const lib = @import("./lib.zig");

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

put this at the start of the file, or just import it inside the tests that need it

Comment thread src/utils.zig Outdated
Comment on lines +36 to +38
/// Hasher-agnostic cache type. We use SHA256 as the default since that's the
/// standard SSZ hasher, but the cache is only used when hashTreeRoot is called
/// with a matching hasher.
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment doesn't look correct. Either it's Sha256-specific, or it's hasher-agnostic. Looks like slop, I suggest removing it.

Comment thread src/merkle_cache.zig Outdated
const std = @import("std");
const zeros = @import("./zeros.zig");

const BYTES_PER_CHUNK = 32;
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is already defined in src/utils.zig

Comment thread src/merkle_cache.zig Outdated

const BYTES_PER_CHUNK = 32;
const chunk = [BYTES_PER_CHUNK]u8;
const zero_chunk: chunk = [_]u8{0} ** BYTES_PER_CHUNK;
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same

Comment thread src/merkle_cache.zig Outdated
Comment on lines +9 to +10
/// Node 1 is the root. Node i has children 2i and 2i+1.
/// Leaves occupy indices [capacity .. 2*capacity).
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

stop the slop, it is obvious from reading the code.

Suggested change
/// Node 1 is the root. Node i has children 2i and 2i+1.
/// Leaves occupy indices [capacity .. 2*capacity).

Comment thread src/utils.zig Outdated
.int => blk: {
const bytes_per_item = @sizeOf(Item);
const items_per_chunk = BYTES_PER_CHUNK / bytes_per_item;
break :blk element_index / items_per_chunk;
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what of u384 ?

Comment thread src/utils.zig Outdated
}

/// Free the Merkle cache without freeing the list itself.
/// Called by lib.hashTreeRoot to clean up caches on value copies.
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/// Called by lib.hashTreeRoot to clean up caches on value copies.

Comment thread src/utils.zig Outdated
}
cache.nodes[cache.capacity + chunk_idx] = leaf;
}
// Zero out chunks beyond data
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// Zero out chunks beyond data

Comment thread src/utils.zig Outdated
const end_item = @min(start_item + items_per_chunk, items.len);
for (start_item..end_item) |item_i| {
const pos = (item_i % items_per_chunk) * bytes_per_item;
// SSZ requires little-endian encoding
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// SSZ requires little-endian encoding

Comment thread src/utils.zig Outdated
Comment on lines +289 to +293
// Composite types: each item is its own chunk (hash tree root of item)
// Composite items may contain pointers to mutable data that
// can change without going through set(), so we must always
// recompute all item hashes. We compare against cached leaves
// to only mark actually-changed nodes dirty for tree rehashing.
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the cache bringing any performance, then? Most of the types we want are composite types, so if they are recomputed no matter what, that's a lot of performance left on the table.

Comment thread src/lib.zig
Comment on lines +423 to +426
const min_len: ?usize = comptime if (enforce_min) (minInLength(T) catch null) else null;
if (min_len) |m| if (serialized.len < m) return error.PayloadTooSmall;

const max_len: ?usize = if (enforce_max) blk: {
const m = maxInLength(T) catch break :blk null;
break :blk m;
} else null;
const max_len: ?usize = comptime if (enforce_max) (maxInLength(T) catch null) else null;
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't need to cache min and max length because they have been made comptime and deserialize functions called for types whose size can be calculated will be predetermined and added to binary itself

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants