Skip to content

fix: Option.hash distinguishes None from Some x when (f x = 0)#14543

Merged
rgrinberg merged 1 commit into
ocaml:mainfrom
robinbb:robinbb-option-hash-collision
May 15, 2026
Merged

fix: Option.hash distinguishes None from Some x when (f x = 0)#14543
rgrinberg merged 1 commit into
ocaml:mainfrom
robinbb:robinbb-option-hash-collision

Conversation

@robinbb
Copy link
Copy Markdown
Collaborator

@robinbb robinbb commented May 15, 2026

Summary

Option.hash f None and Option.hash f (Some x) previously both went through Stdlib.Hashtbl.hash, which gave equal results whenever f x = 0 because None shares OCaml's tag-0 runtime representation with the integer 0. So under stdune's existing hashers:

  • Option.hash Bool.hash (Some false) collided with Option.hash Bool.hash None (Bool.hash false = 0).
  • Option.hash Unit.hash (Some ()) collided with Option.hash Unit.hash None (Unit.hash () = 0).

This PR replaces the structural hash with a tagged recurrence — None -> 0, Some s -> (f s * 31) + 1 — so the two cases land in disjoint buckets. Added an expect test in otherlibs/stdune/test/option_tests.ml covering the Bool and Unit cases that previously collided.

Discovered while surveying allocation-heavy hash sites for a follow-up to #14542.

Previously `Option.hash f None` and `Option.hash f (Some x)` both
went through `Stdlib.Hashtbl.hash`, which gives equal results
whenever `f x = 0` because `None` shares OCaml's tag-0 runtime
representation with the integer 0. This collapses, for example,
`Some false` and `None` under `Bool.hash`, and `Some ()` and `None`
under `Unit.hash`.

Switch to a non-structural recurrence that tags the `Some` branch
with `+ 1`, so the two cases occupy disjoint buckets.

Signed-off-by: Robin Bate Boerop <me@robinbb.com>
@robinbb robinbb requested a review from Copilot May 15, 2026 03:11
@robinbb robinbb self-assigned this May 15, 2026
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR updates Stdune.Option.hash so None no longer collides with Some x when the supplied element hasher returns 0, and adds regression coverage for the previously affected Bool and Unit cases.

Changes:

  • Replaces structural hashing for options with a tagged integer recurrence.
  • Adds an expect test covering Option.hash Bool.hash (Some false) and Option.hash Unit.hash (Some ()).

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

File Description
otherlibs/stdune/src/option.ml Updates Option.hash implementation to distinguish None from Some values whose inner hash is 0.
otherlibs/stdune/test/option_tests.ml Adds expect-test coverage for the fixed Bool and Unit hash collision cases.

@robinbb robinbb requested a review from rgrinberg May 15, 2026 03:14
@robinbb robinbb marked this pull request as ready for review May 15, 2026 16:15
@rgrinberg rgrinberg merged commit 3aa5afc into ocaml:main May 15, 2026
35 checks passed
@robinbb robinbb deleted the robinbb-option-hash-collision branch May 15, 2026 17:37
robinbb added a commit to robinbb/dune that referenced this pull request May 27, 2026
Extract the multiplicative `(acc * 31) + x` accumulator into a new
`Stdune.Hash` module with the API requested in review:

    type t
    val create : unit -> t
    val feed   : t -> int -> t
    val hash   : t -> int

Route the structural hash combinators through it:

- `Stdune.Tuple.T2.hash`, `Stdune.Tuple.T3.hash`, `Stdune.List.hash`
  drop the allocations (a fresh inner tuple / a fresh `int list`)
  that they previously paid before handing off to `Hashtbl.hash`.

- `Stdune.Option.hash` stops allocating via `Hashtbl.hash` and gains
  an explicit discriminator byte. The `None` vs `Some 0` collision
  fixed in ocaml#14543 is preserved by construction.

- `Stdune.Result.hash` likewise stops allocating and gains explicit
  discriminator bytes. This is a drive-by bugfix: the previous
  implementation collided `Ok x` with `Error y` whenever the inner
  hashes were equal (e.g. `Ok 0` and `Error 0`), the same shape as
  the bug `Stdune.Option.hash` had pre-ocaml#14543.

- `Resolve.error_hash` (`src/dune_rules/resolve.ml`) drops its 2-tuple
  allocation and its per-call `List.map ~f:Lazy.force` (which builds
  a fresh `int list` of length |stack_frames|), folding the forced
  frames into the accumulator instead.

Same `'a -> 'a -> int` callsite API; callers don't change. Hash
values shift, but these functions feed `Hashtbl` / `Memo` bucket
selectors only — no persistent serialization, no cross-process
invariants, so the shift is invisible to consumers.

Signed-off-by: Robin Bate Boerop <me@robinbb.com>
robinbb added a commit to robinbb/dune that referenced this pull request May 28, 2026
Extract the multiplicative `(acc * 31) + x` accumulator into a new
`Stdune.Hash` module with the API requested in review:

    type t
    val create : unit -> t
    val feed   : t -> int -> t
    val hash   : t -> int

Route the structural hash combinators through it:

- `Stdune.Tuple.T2.hash`, `Stdune.Tuple.T3.hash`, `Stdune.List.hash`
  drop the allocations (a fresh inner tuple / a fresh `int list`)
  that they previously paid before handing off to `Hashtbl.hash`.

- `Stdune.Result.hash` likewise stops allocating and gains explicit
  discriminator bytes. This is a drive-by bugfix: the previous
  implementation collided `Ok x` with `Error y` whenever the inner
  hashes were equal (e.g. `Ok 0` and `Error 0`), the same shape as
  the bug `Stdune.Option.hash` had pre-ocaml#14543.

- `Resolve.error_hash` (`src/dune_rules/resolve.ml`) drops its 2-tuple
  allocation and its per-call `List.map ~f:Lazy.force` (which builds
  a fresh `int list` of length |stack_frames|), folding the forced
  frames into the accumulator instead.

`Stdune.Option.hash` is already optimal on main as of ocaml#14543 and is
left unchanged.

Same `'a -> 'a -> int` callsite API; callers don't change. Hash
values shift, but these functions feed `Hashtbl` / `Memo` bucket
selectors only — no persistent serialization, no cross-process
invariants, so the shift is invisible to consumers.

Signed-off-by: Robin Bate Boerop <me@robinbb.com>
robinbb added a commit to robinbb/dune that referenced this pull request May 29, 2026
Extract the multiplicative `(acc * 31) + x` accumulator into a new
`Stdune.Hash` module with the API requested in review:

    type t
    val create : unit -> t
    val feed   : t -> int -> t
    val hash   : t -> int

Route the structural hash combinators through it:

- `Stdune.Tuple.T2.hash`, `Stdune.Tuple.T3.hash`, `Stdune.List.hash`
  drop the allocations (a fresh inner tuple / a fresh `int list`)
  that they previously paid before handing off to `Hashtbl.hash`.

- `Stdune.Result.hash` likewise stops allocating and gains explicit
  discriminator bytes. This is a drive-by bugfix: the previous
  implementation collided `Ok x` with `Error y` whenever the inner
  hashes were equal (e.g. `Ok 0` and `Error 0`), the same shape as
  the bug `Stdune.Option.hash` had pre-ocaml#14543.

- `Resolve.error_hash` (`src/dune_rules/resolve.ml`) drops its 2-tuple
  allocation and its per-call `List.map ~f:Lazy.force` (which builds
  a fresh `int list` of length |stack_frames|), folding the forced
  frames into the accumulator instead.

`Stdune.Option.hash` is already optimal on main as of ocaml#14543 and is
left unchanged.

Direct consumption of ocaml#14605: `src/source/opam_switch.ml` already
included `Repr.Poly`, but a redundant manual `let hash` shadowed the
inherited one. That definition is now dropped. This is a single
demonstration site for the elimination-by-Repr.Poly direction
referenced in the review; a broader sweep is out of scope here
because `Repr.Poly`'s `validate_compare` rejects `Repr.view`, which
pervades dune's wrapped-string newtypes (paths, lib-names, opam
URLs, checksums, package names, ...).

Same `'a -> 'a -> int` callsite API; callers don't change. Hash
values shift, but these functions feed `Hashtbl` / `Memo` bucket
selectors only — no persistent serialization, no cross-process
invariants, so the shift is invisible to consumers.

Signed-off-by: Robin Bate Boerop <me@robinbb.com>
robinbb added a commit to robinbb/dune that referenced this pull request May 29, 2026
Extract the multiplicative `(acc * 31) + x` accumulator into a new
`Stdune.Hash` module with the API requested in review:

    type t
    val create : unit -> t
    val feed   : t -> int -> t
    val hash   : t -> int

Route the structural hash combinators through it:

- `Stdune.Tuple.T2.hash`, `Stdune.Tuple.T3.hash`, `Stdune.List.hash`
  drop the allocations (a fresh inner tuple / a fresh `int list`)
  that they previously paid before handing off to `Hashtbl.hash`.

- `Stdune.Result.hash` likewise stops allocating and gains explicit
  discriminator bytes. This is a drive-by bugfix: the previous
  implementation collided `Ok x` with `Error y` whenever the inner
  hashes were equal (e.g. `Ok 0` and `Error 0`), the same shape as
  the bug `Stdune.Option.hash` had pre-ocaml#14543.

- `Resolve.error_hash` (`src/dune_rules/resolve.ml`) drops its 2-tuple
  allocation and its per-call `List.map ~f:Lazy.force` (which builds
  a fresh `int list` of length |stack_frames|), folding the forced
  frames into the accumulator instead.

`Stdune.Option.hash` is already optimal on main as of ocaml#14543 and is
left unchanged.

Direct consumption of ocaml#14605: `src/source/opam_switch.ml` already
included `Repr.Poly`, but a redundant manual `let hash` shadowed the
inherited one. That definition is now dropped. This is a single
demonstration site for the elimination-by-Repr.Poly direction
referenced in the review; a broader sweep is out of scope here
because `Repr.Poly`'s `validate_compare` rejects `Repr.view`, which
pervades dune's wrapped-string newtypes (paths, lib-names, opam
URLs, checksums, package names, ...).

Same `'a -> 'a -> int` callsite API; callers don't change. Hash
values shift, but these functions feed `Hashtbl` / `Memo` bucket
selectors only — no persistent serialization, no cross-process
invariants, so the shift is invisible to consumers.

Signed-off-by: Robin Bate Boerop <me@robinbb.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants