We need a new follow-up issue for the still-live loop path that remains after #227 was merged and loaded into the running devclaw-local-dev environment.
Why this is a separate issue
Issue #227 identified and fixed one remaining stale-snapshot / heartbeat-health race. That fix is now live, but PhysioLink issue #74 is still looping.
The latest evidence shows a different or additional path is still active.
Current live evidence
With the #227 fix live, PhysioLink #74 still shows this pattern:
- worker blocks correctly: Doing -> Refining
- then the system emits an explicit workflow.requeue event: Refining -> To Do
- then heartbeat dispatches it again: To Do -> Doing
This is important because the loop is no longer explained purely by a stale active-state dashboard or stale-snapshot overwrite. The event timeline shows a real normalized workflow.requeue transition being recorded after the hold state.
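To make the pattern checkable against an exported event timeline, here is a minimal sketch that counts complete hold -> requeue -> redispatch cycles. The `Transition` record, its field names, and the non-requeue event names in the usage note are assumptions for illustration, not the actual devclaw event schema; only the `workflow.requeue` name and the three state transitions come from the evidence above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transition:
    kind: str        # event name; only "workflow.requeue" is taken from the issue
    from_state: str
    to_state: str


# The three-step cycle described above, in order.
LOOP_PATTERN = [
    ("Doing", "Refining"),   # worker blocks
    ("Refining", "To Do"),   # explicit requeue
    ("To Do", "Doing"),      # heartbeat dispatches again
]


def count_loop_cycles(timeline: list[Transition]) -> int:
    """Count non-overlapping loop cycles in a time-ordered timeline for one issue."""
    cycles = 0
    i = 0
    while i + len(LOOP_PATTERN) <= len(timeline):
        window = timeline[i:i + len(LOOP_PATTERN)]
        edges = [(t.from_state, t.to_state) for t in window]
        # The middle step must be a recorded requeue event, which is what
        # distinguishes this loop from a stale-snapshot overwrite.
        if edges == LOOP_PATTERN and window[1].kind == "workflow.requeue":
            cycles += 1
            i += len(LOOP_PATTERN)
        else:
            i += 1
    return cycles
```

For example, feeding it the timeline for PhysioLink #74 (with hypothetical names such as `work_finish` and `heartbeat.dispatch` for the first and third events) should return a count above zero while the loop is live, and zero once the remaining requeue path is closed.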
What this means
The currently live fix from #227 did not stop the full loop family.
There is still at least one active requeue path in the system that is deliberately moving held issues from Refining back into runnable queue state.
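One way to narrow down which path is responsible is to tally the Refining -> To Do requeue events by whatever component recorded them. The sketch below assumes each event is a dict with `type`, `from`, `to`, and `source` keys; those key names, and the idea that the emitter is recorded at all, are assumptions rather than the known event format.

```python
from collections import Counter

HOLD_STATE = "Refining"
RUNNABLE_STATE = "To Do"


def requeue_sources_after_hold(events: list[dict]) -> Counter:
    """Tally the emitter of every requeue that moves an issue out of the hold state."""
    tally: Counter = Counter()
    for event in events:
        is_hold_requeue = (
            event.get("type") == "workflow.requeue"
            and event.get("from") == HOLD_STATE
            and event.get("to") == RUNNABLE_STATE
        )
        if is_hold_requeue:
            # "source" is a hypothetical field; fall back to "unknown" if absent.
            tally[event.get("source", "unknown")] += 1
    return tally
```

If the event log does not record an emitter, every entry will land under `unknown`, which would itself suggest adding emitter attribution to `workflow.requeue` events as part of the investigation.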
Scope to investigate
Focus specifically on code paths that can emit or cause a workflow.requeue after a blocked/hold transition, including but not limited to:
Reproduction anchor
Primary reproduction case:
- work_finish(done) mis-resolves the PR branch
- Doing -> Refining
- workflow.requeue: Refining -> To Do
Acceptance criteria
- ... workflow.requeue from Refining to To Do after a blocked worker result.
- ... workflow.requeue after hold case, not just stale snapshot behavior.
Relationship to existing issues