add caching in hashTreeRoot#62
Conversation
gballet
left a comment
There was a problem hiding this comment.
After reviewing, I think that there is a fundamental problem with how the cached hashing has been designed:
- There is no reason to make it
Sha256-specific, it should be easy to make it agnostic - It should be possible not to use caching, which isn't the case here
- imo, the cached hashing should be a wrapper around the regular root-hashing function, e.g. by introducing a
TreeHasherinterface, with ahashTreeRootfunction - which isHasher-agnostic (and passed as a parameter). If caching is active, then the wrapper is used, otherwise the regular object is used.
| // Tests | ||
| const Sha256 = std.crypto.hash.sha2.Sha256; | ||
| const lib = @import("./lib.zig"); | ||
|
|
There was a problem hiding this comment.
put this at the start of the file, or just import it inside the tests that need it
| /// Hasher-agnostic cache type. We use SHA256 as the default since that's the | ||
| /// standard SSZ hasher, but the cache is only used when hashTreeRoot is called | ||
| /// with a matching hasher. |
There was a problem hiding this comment.
This comment doesn't look correct. Either it's Sha256-specific, or it's hasher-agnostic. Looks like slop, I suggest removing it.
| const std = @import("std"); | ||
| const zeros = @import("./zeros.zig"); | ||
|
|
||
| const BYTES_PER_CHUNK = 32; |
There was a problem hiding this comment.
this is already defined in src/utils.zig
|
|
||
| const BYTES_PER_CHUNK = 32; | ||
| const chunk = [BYTES_PER_CHUNK]u8; | ||
| const zero_chunk: chunk = [_]u8{0} ** BYTES_PER_CHUNK; |
| /// Node 1 is the root. Node i has children 2i and 2i+1. | ||
| /// Leaves occupy indices [capacity .. 2*capacity). |
There was a problem hiding this comment.
stop the slop, it is obvious from reading the code.
| /// Node 1 is the root. Node i has children 2i and 2i+1. | |
| /// Leaves occupy indices [capacity .. 2*capacity). |
| .int => blk: { | ||
| const bytes_per_item = @sizeOf(Item); | ||
| const items_per_chunk = BYTES_PER_CHUNK / bytes_per_item; | ||
| break :blk element_index / items_per_chunk; |
| } | ||
|
|
||
| /// Free the Merkle cache without freeing the list itself. | ||
| /// Called by lib.hashTreeRoot to clean up caches on value copies. |
There was a problem hiding this comment.
| /// Called by lib.hashTreeRoot to clean up caches on value copies. |
| } | ||
| cache.nodes[cache.capacity + chunk_idx] = leaf; | ||
| } | ||
| // Zero out chunks beyond data |
There was a problem hiding this comment.
| // Zero out chunks beyond data |
| const end_item = @min(start_item + items_per_chunk, items.len); | ||
| for (start_item..end_item) |item_i| { | ||
| const pos = (item_i % items_per_chunk) * bytes_per_item; | ||
| // SSZ requires little-endian encoding |
There was a problem hiding this comment.
| // SSZ requires little-endian encoding |
| // Composite types: each item is its own chunk (hash tree root of item) | ||
| // Composite items may contain pointers to mutable data that | ||
| // can change without going through set(), so we must always | ||
| // recompute all item hashes. We compare against cached leaves | ||
| // to only mark actually-changed nodes dirty for tree rehashing. |
There was a problem hiding this comment.
Is the cache bringing any performance, then? Most of the types we want are composite types, so if they are recomputed no matter what, that's a lot of performance left on the table.
| const min_len: ?usize = comptime if (enforce_min) (minInLength(T) catch null) else null; | ||
| if (min_len) |m| if (serialized.len < m) return error.PayloadTooSmall; | ||
|
|
||
| const max_len: ?usize = if (enforce_max) blk: { | ||
| const m = maxInLength(T) catch break :blk null; | ||
| break :blk m; | ||
| } else null; | ||
| const max_len: ?usize = comptime if (enforce_max) (maxInLength(T) catch null) else null; |
There was a problem hiding this comment.
We don't need to cache min and max length because they have been made comptime and deserialize functions called for types whose size can be calculated will be predetermined and added to binary itself
Adds caching in hashing path for List and BitList