Limit per-node auto-debug retries to reduce runaway cost and runtime#52
BraelNamekong wants to merge 2 commits into
eisenbahnhero
left a comment
I think that's a good idea. Some time ago, I already implemented logic (a910390) such that the more child nodes a node has, the lower its probability of being selected becomes. So, in theory, it shouldn’t be possible for the same node to be debugged an infinite number of times.
However, I think it’s a very good idea to implement a hard limit here to ensure this is definitively prevented in larger-scale experiments. Nice!
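To illustrate why a purely probabilistic rule lowers the chance of repeated debugging but does not rule it out, here is a minimal sketch. The weighting 1 / (1 + number of children), the Node class, and the function names are assumptions made for illustration; they are not taken from commit a910390.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Node:
    # Hypothetical stand-in for the project's tree node.
    id: str
    children: list["Node"] = field(default_factory=list)


def selection_weights(nodes):
    # Assumed weighting: the more children a node has, the smaller its weight.
    return [1.0 / (1 + len(node.children)) for node in nodes]


def pick_node(nodes, rng):
    # Weighted random draw; a node with many children is unlikely to be
    # picked, but its weight never reaches zero.
    return rng.choices(nodes, weights=selection_weights(nodes), k=1)[0]
```

Because every weight stays positive, a heavily debugged node can still be drawn again, which is the gap a hard limit closes.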
moritz-baumgart
left a comment
Please remove the test_Lopez.txt file (or the commit that introduced it) from this PR.
    if can_debug:
        logger.info(
            f"Debugging node {parent_node.id[:8]}... "
            f"(attempt {parent_node.debug_attempts + 1}/{max_debug_retries})"
        )
        child_node = await self._minimal_agent._debug(parent_node)
        parent_node.debug_attempts += 1
    else:
        if parent_node.is_buggy and parent_node.debug_attempts >= max_debug_retries:
            logger.info(
                f"Node {parent_node.id[:8]} has reached max debug retries "
                f"({max_debug_retries}). Attempting improvement instead."
            )
        child_node = await self._minimal_agent._improve(parent_node)
I agree with @eisenbahnhero that enforcing a hard limit in addition to the current probability-based solution makes sense. However, the implementation as it stands falls through to improving the node once the limit is reached, which is not sensible IMO.
We should not try to improve a buggy node that we have already failed to debug max_debug_retries times.
Instead, I think we should select a completely different node.
To do that, this logic probably needs to be integrated into the select_next_node() method.
Nodes that have reached max_debug_retries could, for example, be excluded before the random selection occurs.
We also have to think about the exploitation case, where a buggy node with the highest score is selected, and decide whether buggy nodes that have already exhausted their debug budget should be included there.
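The suggestion above could look roughly like the following sketch. Everything here is hypothetical: the Node fields, the exploit_prob parameter, and the signature of select_next_node() are assumptions, not the project's actual API. The sketch also takes one side of the open question by excluding exhausted nodes in the exploitation case as well.

```python
import random
from dataclasses import dataclass


@dataclass
class Node:
    # Hypothetical stand-in for the project's tree node.
    id: str
    is_buggy: bool
    score: float = 0.0
    debug_attempts: int = 0


def debug_budget_exhausted(node, max_debug_retries):
    # Only buggy nodes can run out of debug budget.
    return node.is_buggy and node.debug_attempts >= max_debug_retries


def select_next_node(nodes, max_debug_retries, rng, exploit_prob=0.5):
    # Drop exhausted nodes before any selection happens.
    candidates = [
        n for n in nodes if not debug_budget_exhausted(n, max_debug_retries)
    ]
    if not candidates:
        return None  # nothing left to work on
    if rng.random() < exploit_prob:
        # Exploitation: best score among the remaining candidates
        # (assumes a higher score is better).
        return max(candidates, key=lambda n: n.score)
    # Exploration: uniform random choice among the remaining candidates.
    return rng.choice(candidates)
```

With this shape, the caller never needs a fallback branch for exhausted nodes, because they are never returned in the first place.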
Summary
This PR adds a safeguard that limits how many times a single tree node can be automatically debugged and retried.
Motivation
During long searches, the system could get stuck repeatedly trying to fix the same failing node. This leads to unnecessary API calls, longer runtimes, and occasionally endless retry cycles without meaningful progress.
What changed
Buggy nodes are only selected for debugging if they have remaining retries.
debug_attempts is incremented before each debug() call.
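A minimal sketch of the per-node budget described in the two points above, assuming a simplified Node and a placeholder fake_debug() in place of the agent's real debug call (both hypothetical). The counter is incremented before the call, as this description states; the snippet quoted in the review increments it after the call, and the two orderings differ only when the debug call raises.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class Node:
    # Hypothetical stand-in for the project's tree node.
    id: str
    is_buggy: bool = True
    debug_attempts: int = 0


def can_debug(node, max_debug_retries):
    # A node is eligible only while it is buggy and has budget left.
    return node.is_buggy and node.debug_attempts < max_debug_retries


async def fake_debug(node):
    # Placeholder for the real debug call; here the fix always fails.
    return Node(id=node.id + "-child", is_buggy=True)


async def debug_until_exhausted(node, max_debug_retries):
    calls = 0
    while can_debug(node, max_debug_retries):
        # Count the attempt first so a raising call still consumes budget.
        node.debug_attempts += 1
        await fake_debug(node)
        calls += 1
    return calls
```

Running asyncio.run(debug_until_exhausted(Node("n"), 3)) makes exactly three debug calls and then stops, which is the bounded behavior the PR aims for.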
Result
Nodes that repeatedly fail debug now stop generating more retries once their per-node budget is exhausted, while the search can continue exploring other candidates.
Notes