🐞 Bug Report
Description
When `charon deposit fetch` is run after only a threshold-sized subset of operators (not all of them) has run `charon deposit sign`, the aggregated signature returned by `MarshalDepositData` fails BLS verification with `invalid deposit data signature: signature not verified`, even though every partial signature is individually valid and signed over the same message.
Root cause: the client-side aggregation in `app/obolapi/deposit.go` derives each partial's share index from its position in the API response slice (`rawSignatures[sigIdx+1] = sig`), rather than looking it up from the cluster lock. The Obol API response does not pad missing operators with empty entries; it returns only the partials that were actually submitted, packed contiguously. For example, if the operators with share indices {2, 3, 4} have signed and operator 1 has not, the client treats them as {1, 2, 3} and `tbls.ThresholdAggregate` runs with the wrong Lagrange coefficients. The output is a 96-byte blob that decodes successfully but fails BLS verification against the deposit message.
The `continue` for empty signatures at `app/obolapi/deposit.go:131-134` suggests the original intent was for the API to return fixed positional slots, but in practice it does not.
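
To make the coefficient mismatch concrete, here is a worked example. It assumes the standard Shamir-style scheme, in which the group signature is recovered by Lagrange interpolation at zero over the share indices. The symbols $\sigma_i$ (partial signature of share $i$), $\lambda_i^{S}$ (Lagrange coefficient), $S$ (set of signing share indices) and $r$ (group order) are notation introduced here, not names from the code.

$$
\sigma = \sum_{i \in S} \lambda_i^{S}\,\sigma_i,
\qquad
\lambda_i^{S} = \prod_{j \in S,\ j \neq i} \frac{j}{j - i} \pmod r
$$

For the real signer set $S = \{2, 3, 4\}$:

$$
\lambda_2 = \frac{3 \cdot 4}{(3-2)(4-2)} = 6,\qquad
\lambda_3 = \frac{2 \cdot 4}{(2-3)(4-3)} = -8,\qquad
\lambda_4 = \frac{2 \cdot 3}{(2-4)(3-4)} = 3
$$

For the assumed set $S' = \{1, 2, 3\}$:

$$
\lambda_1 = \frac{2 \cdot 3}{(2-1)(3-1)} = 3,\qquad
\lambda_2 = \frac{1 \cdot 3}{(1-2)(3-2)} = -3,\qquad
\lambda_3 = \frac{1 \cdot 2}{(1-3)(2-3)} = 1
$$

Under positional indexing the client would therefore compute $3\sigma_2 - 3\sigma_3 + \sigma_4$ instead of $6\sigma_2 - 8\sigma_3 + 3\sigma_4$. The result is still a valid curve point, which is consistent with the blob decoding cleanly, but it is not the group signature.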
Has this worked before in a previous version?
Unknown. Likely present since `cmd: add API partial deposit flow` (#4032). The symptom only manifests with a strict-threshold submission (fewer than `total` operators sign).
🔬 Minimal Reproduction
In a cluster with 4 operators / threshold 3:
- Operators 2, 3, and 4 each run:
  charon deposit sign \
    --validator-public-keys=0x<pubkey> \
    --withdrawal-addresses=0x010000000000000000000000<addr>
- Operator 1 does not submit.
- Any operator runs:
  charon deposit fetch --validator-public-keys=0x<pubkey>
Result: `Application failed to start: invalid deposit data signature: signature not verified`.
Confirmed against the live Obol API by inspecting `GET /v1/deposit_data/<lockHash>/<valPubkey>`: the response contains exactly 3 `partials` entries (no empty positional slot for the missing operator), and matching each `partial_public_key` against `distributed_validators[].public_shares` in the cluster lock shows the actual share indices are non-contiguous (e.g. {2, 3, 4}), not {1, 2, 3} as the aggregation code assumes.
Workaround: make sure all operators (not just a threshold) submit a partial deposit. Then slice positions and share indices coincide and aggregation works.
🔥 Error
INFO cmd Fetching full deposit message
INFO cmd Fetched full deposit message
ERRO cmd Application failed to start: invalid deposit data signature: signature not verified
tbls/herumi.go:20 .init
Suggested fix
Either:
1. Client-side (preferred): In `app/obolapi/deposit.go`, derive each partial's share index by looking up its `partial_public_key` against the cluster lock's `distributed_validators[].public_shares` (the caller already has `*cluster.Lock`). This is robust regardless of API response shape and removes the implicit ordering contract.
2. Server-side: Have the API return a fixed `total`-length slice with empty `Partial{}` entries for operators that have not submitted.

Option 1 is safer because it makes the client self-sufficient and keeps aggregation correct under any future API ordering changes.
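
The client-side lookup could be sketched roughly as follows. This is a minimal, standalone illustration rather than a patch against Charon: the `partial` type, the `normalize` helper and `signaturesByShareIndex` are hypothetical names, keys and signatures are kept as hex strings for simplicity, and the sketch assumes that `public_shares` in the cluster lock is ordered by share index (first entry is share 1).

```go
package main

import (
	"fmt"
	"strings"
)

// partial is a hypothetical stand-in for one entry of the API response.
type partial struct {
	PartialPublicKey string // hex-encoded public share of the signer
	Signature        string // hex-encoded partial signature
}

// normalize lowercases a hex key and strips an optional 0x prefix so that
// keys taken from the API and from the lock compare equal.
func normalize(hexKey string) string {
	return strings.TrimPrefix(strings.ToLower(hexKey), "0x")
}

// signaturesByShareIndex maps each partial signature to its 1-based share
// index by looking up the partial's public key in publicShares.
// Assumption: publicShares is ordered by share index, as in the cluster lock.
func signaturesByShareIndex(publicShares []string, partials []partial) (map[int]string, error) {
	indexOf := make(map[string]int, len(publicShares))
	for i, share := range publicShares {
		indexOf[normalize(share)] = i + 1
	}

	out := make(map[int]string, len(partials))
	for _, p := range partials {
		if p.Signature == "" {
			continue // tolerate padded, empty slots if the API ever returns them
		}
		idx, ok := indexOf[normalize(p.PartialPublicKey)]
		if !ok {
			return nil, fmt.Errorf("partial public key %q not found in cluster lock", p.PartialPublicKey)
		}
		if _, dup := out[idx]; dup {
			return nil, fmt.Errorf("duplicate partial for share index %d", idx)
		}
		out[idx] = p.Signature
	}

	return out, nil
}
```

With four public shares and partials from the last three of them, the returned map has keys 2, 3 and 4 regardless of the order or padding of the response, which is the index set the threshold aggregation needs.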
🌍 Your Environment
Operating System: Linux
What version of Charon are you running? `obolnetwork/charon:v1.10.0` (docker image)
Anything else relevant? Cluster: 4 operators, threshold 3, mainnet, non-compounding. Triggered while running `charon deposit sign` / `charon deposit fetch` to update withdrawal credentials before activation.