Limit per-node auto-debug retries to reduce runaway cost and runtime #52

Open
BraelNamekong wants to merge 2 commits into ISG-Siegen:develop from BraelNamekong:develop_meilenstein_4
BraelNamekong commented May 28, 2026

Summary

This PR adds a safeguard that limits how many times a single tree node can be automatically debugged and retried.

Motivation

During long searches, the system could get stuck repeatedly trying to fix the same failing node. This leads to unnecessary API calls, longer runtimes, and occasionally endless retry cycles without meaningful progress.

What changed

  • Added max_debug_retries_per_node to config.py
  • Added debug_attempts tracking to node.py
  • Updated search.py so that:
      • buggy nodes are only selected for debugging if they have remaining retries
      • debug_attempts is incremented before each debug() call
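
A minimal sketch of how the three pieces listed above might fit together. The `Config` and `Node` classes and the helpers `can_debug` and `record_debug_attempt` are simplified, hypothetical stand-ins rather than the project's actual code; only the names `max_debug_retries_per_node` and `debug_attempts` and the default of 2 come from this PR.

```python
from dataclasses import dataclass


@dataclass
class Config:
    # Name and default taken from the PR description; the class is a stand-in.
    max_debug_retries_per_node: int = 2


@dataclass
class Node:
    id: str
    is_buggy: bool = False
    debug_attempts: int = 0  # debug calls already spent on this node


def can_debug(node: Node, cfg: Config) -> bool:
    """Hypothetical helper: debuggable only while the node has budget left."""
    return node.is_buggy and node.debug_attempts < cfg.max_debug_retries_per_node


def record_debug_attempt(node: Node) -> None:
    """Count the attempt before the debug call, as the description states."""
    node.debug_attempts += 1


cfg = Config()
node = Node(id="abc12345", is_buggy=True)
attempts = 0
while can_debug(node, cfg):
    record_debug_attempt(node)  # counted even if the debug call later raises
    attempts += 1  # stands in for the real debug call, which always "fails" here
```

With the default budget the loop runs twice and then stops. Note that the diff excerpt quoted in the review thread increments the counter after the `_debug` call returns, so a call that raises would not be counted there.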

Result
Nodes that repeatedly fail debugging now stop generating retries once their per-node budget is exhausted, while the search continues exploring other candidates.

Notes

  • The default retry limit is set to 2
  • This is intended as a lightweight safety mechanism, not a replacement for the global iteration limit
  • The change helps make runtime and API usage more predictable

Collaborator

@eisenbahnhero eisenbahnhero left a comment

I think that's a good idea. Some time ago, I already implemented logic (a910390) such that the more child nodes a node has, the lower its probability of being selected becomes. So, in theory, it shouldn’t be possible for the same node to be debugged an infinite number of times.

However, I think it’s a very good idea to implement a hard limit here to ensure this is definitively prevented in larger-scale experiments. Nice!
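
The commit referenced above (a910390) is not shown in this thread, so the following is only an illustration of the idea. It assumes a weight of 1 / (1 + number of children); the actual formula may differ, and `selection_weights` and `pick_node` are hypothetical names.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Node:
    id: str
    children: list = field(default_factory=list)


def selection_weights(nodes):
    # Assumed weighting: every additional child lowers the node's weight.
    return [1.0 / (1 + len(n.children)) for n in nodes]


def pick_node(nodes, rng):
    # Weighted random draw; rng is passed in so runs can be made reproducible.
    return rng.choices(nodes, weights=selection_weights(nodes), k=1)[0]


fresh = Node("fresh")
retried = Node("retried", children=[Node("c1"), Node("c2"), Node("c3")])
weights = selection_weights([fresh, retried])
picked = pick_node([fresh, retried], random.Random(0))
```

Under this assumed weighting a node's weight shrinks with every child but never reaches zero, so repeated selection becomes unlikely rather than impossible. A hard cap closes that gap.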

Member

@moritz-baumgart moritz-baumgart left a comment

Please remove the test_Lopez.txt file (or the commit that introduced it) from this PR.

Comment thread treesearch/search.py
Comment on lines +118 to 131
if can_debug:
    logger.info(
        f"Debugging node {parent_node.id[:8]}... "
        f"(attempt {parent_node.debug_attempts + 1}/{max_debug_retries})"
    )
    child_node = await self._minimal_agent._debug(parent_node)
    parent_node.debug_attempts += 1
else:
    if parent_node.is_buggy and parent_node.debug_attempts >= max_debug_retries:
        logger.info(
            f"Node {parent_node.id[:8]} has reached max debug retries "
            f"({max_debug_retries}). Attempting improvement instead."
        )
    child_node = await self._minimal_agent._improve(parent_node)
Member

I agree with @eisenbahnhero that enforcing a hard limit in addition to the current probability-based solution makes sense. However, the implementation as provided falls back to improving the node, which is not sensible IMO.
A buggy node that we have already tried, and failed, to debug max_debug_retries times should not then be handed to the improve step.
Instead, I think we should select a completely different node.

To do that, this logic probably needs to be integrated into the select_next_node() method.
Nodes that have reached the max_debug_retries could e.g. be excluded before the random selection occurs.
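
One way the exclusion could look, as a rough sketch only. The real select_next_node() is not shown in this thread, so `eligible_nodes` and `select_random` are hypothetical helpers and the `Node` class is a simplified stand-in.

```python
import random
from dataclasses import dataclass


@dataclass
class Node:
    id: str
    is_buggy: bool = False
    debug_attempts: int = 0


def eligible_nodes(nodes, max_debug_retries):
    """Drop buggy nodes whose debug budget is used up; keep everything else."""
    return [
        n for n in nodes
        if not (n.is_buggy and n.debug_attempts >= max_debug_retries)
    ]


def select_random(nodes, max_debug_retries, rng):
    candidates = eligible_nodes(nodes, max_debug_retries)
    if not candidates:
        return None  # the caller has to decide what happens when nothing is left
    return rng.choice(candidates)


nodes = [
    Node("good"),
    Node("buggy-fresh", is_buggy=True, debug_attempts=1),
    Node("buggy-exhausted", is_buggy=True, debug_attempts=2),
]
candidate_ids = [n.id for n in eligible_nodes(nodes, max_debug_retries=2)]
chosen = select_random(nodes, 2, random.Random(0))
```

Filtering before the draw means an exhausted node can never be picked, so no fallback to improvement is needed. The empty-pool case is left to the caller on purpose, since the thread does not say what should happen then.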

We also have to think about the exploitation case, where a buggy node with the highest score is selected, and whether or not to include buggy nodes that have already exhausted the debug budget there.
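
To make the open question concrete, here is a hedged sketch of the exploitation path with the decision exposed as a flag. It assumes every node carries a numeric `score` where higher is better; `select_best` and `include_exhausted` are hypothetical names, not part of the codebase.

```python
from dataclasses import dataclass


@dataclass
class Node:
    id: str
    score: float
    is_buggy: bool = False
    debug_attempts: int = 0


def select_best(nodes, max_debug_retries, include_exhausted=False):
    """Pick the highest-scoring node, optionally skipping exhausted buggy nodes."""
    def exhausted(n):
        return n.is_buggy and n.debug_attempts >= max_debug_retries

    pool = nodes if include_exhausted else [n for n in nodes if not exhausted(n)]
    # default=None covers the case where filtering leaves no candidates.
    return max(pool, key=lambda n: n.score, default=None)


nodes = [
    Node("a", score=0.7),
    Node("b", score=0.9, is_buggy=True, debug_attempts=2),
]
strict = select_best(nodes, max_debug_retries=2)
lenient = select_best(nodes, max_debug_retries=2, include_exhausted=True)
```

Here the strict variant returns node "a" and the lenient one returns node "b". Which behaviour is preferable is exactly the point left open above.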
