Skip to content

Introduce --step-timeout option for kprove to ensure timely termination#4930

Open
Stevengre wants to merge 5 commits into
developfrom
feat/per-depth-timeout-advance-proof
Open

Introduce --step-timeout option for kprove to ensure timely termination#4930
Stevengre wants to merge 5 commits into
developfrom
feat/per-depth-timeout-advance-proof

Conversation

@Stevengre

@Stevengre Stevengre commented Jun 8, 2026

Copy link
Copy Markdown
Contributor

advance_proof now runs each step under the prover's step_timeout (a per-step wall-clock budget). A step that exceeds its budget is interrupted, the prover is asked to shrink_step (do less work per step), and the step is retried (the timed-out node was never committed, so it reappears from get_steps). If the prover cannot shrink further, advancement stops. Provers without a step_timeout run synchronously as before, so advance_proof stays generic and gains no new parameter.

Close #4924.
PoC PR: runtimeverification/kontrol#1141

@Stevengre Stevengre requested review from ehildenb and tothtamas28 June 8, 2026 11:44
@Stevengre Stevengre self-assigned this Jun 8, 2026
@deosa-arch deosa-arch changed the base branch from master to develop June 8, 2026 11:45
Stevengre added 2 commits June 8, 2026 19:48
Implements #4924: factor kontrol's
`--per-depth-timeout` mechanism out of tool-specific code and into a
generic policy in `Prover.advance_proof`.

`advance_proof` now runs each step under the prover's `step_timeout`
(a per-step wall-clock budget). A step that exceeds its budget is
interrupted, the prover is asked to `shrink_step` (do less work per
step), and the step is retried (the timed-out node was never committed,
so it reappears from `get_steps`). If the prover cannot shrink further,
advancement stops. Provers without a `step_timeout` run synchronously as
before, so `advance_proof` stays generic and gains no new parameter.

- Prover: `step_timeout` attribute (whole seconds, None disables) plus
  no-op `shrink_step()` / `interrupt()` hooks
- APRProver: `step_timeout` ctor arg (floored at 1s); `shrink_step`
  halves `execute_depth`; `interrupt` aborts the in-flight Kore request
- KCFGExplore/CTermSymbolic/KoreClient/JsonRpc*/Transport: `interrupt()`
  that force-unblocks an in-flight single-socket request and reconnects
- Unit tests: shrink-until-progress, stop-at-floor, fast-path, disabled
The step-timeout/shrink policy added to Prover.advance_proof was reachable
only programmatically. Wire it through the pyk prove command: add a
step_timeout field to ProveOptions, a --step-timeout argument (whole seconds,
floored at 1 by APRProver), and thread the value from ProveRpc into APRProver.
Omitting the flag leaves step_timeout=None, preserving the prior synchronous
behavior.
@Stevengre Stevengre force-pushed the feat/per-depth-timeout-advance-proof branch from fa4c4fa to bf77114 Compare June 8, 2026 11:50
@ehildenb

ehildenb commented Jun 8, 2026

Copy link
Copy Markdown
Member

This looks pretty good to me. I guess I'm not an expert on whether this is the best way to handle this issue, but one question I have: when we interrupt a step, it will disconnect from that instance of the haskell backend, but does that also stop the haskell backend from working on that step at the same time? Or does the haskell backend keep working on it, we are just now trying some smaller step size ourselves?

Stevengre added 3 commits June 9, 2026 15:41
The prior interrupt() shut down the client socket and reconnected. Empirically
(legacy/haskell kore-rpc 0.1.145), that does NOT stop the backend: the request
runs in a detached worker thread that the server only aborts on an explicit
`cancel` JSON-RPC method, never on a dropped connection. So an interrupted step
kept a core pinned at ~100% until it finished on its own, while we reconnected
and retried a smaller step.

Instead, inject a `cancel` request on the live connection. The server aborts the
in-flight request, the awaiting thread receives a "Request cancelled" error, and
the connection stays open and reusable (no reconnect). Measured: server CPU drops
from ~100% to ~2% immediately after interrupt().

- Transport: replace no-arg interrupt() with send_interrupt(data) (raw out-of-band
  send on the live connection; default no-op; HTTP stays no-op, one conn per request)
- SingleSocketTransport: sendall the cancel payload, no shutdown/reconnect
- JsonRpcClient: build the `cancel` request and hand it to the transport
Asserts that KoreClient.interrupt() aborts an in-flight (non-terminating) execute
promptly with a "Request cancelled" error rather than letting it run to completion,
and that the connection survives the cancel and serves a subsequent request. Covers
the interrupt mechanism advance_proof's step-timeout policy depends on.
Comment on lines +1 to +22
from __future__ import annotations

from threading import Event
from typing import TYPE_CHECKING

from pyk.proof.proof import Proof, ProofStatus, Prover

if TYPE_CHECKING:
from collections.abc import Mapping
from pathlib import Path
from typing import Any


class _StepInterrupted(Exception):
"""Raised inside `step_proof` when the prover is interrupted, mimicking a backend abort."""


class CountingProof(Proof[int, int]):
"""Minimal proof that needs `target` committed steps to pass."""

target: int
committed: int

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this file is likely not needed, it's not really testing anything useful, and may make testing too brittle for this feature in the future. I'd recommend removing it. The integration test added should be enough.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Support progressive depth halving as a generic policy in Prover.advance_proof

2 participants