Skip to content

TICKET-597: Fix worker orphan processes and add max test timeout cap#710

Merged
david-yz-liu merged 6 commits intomasterfrom
TICKET-597_fix_worker_orphan_and_timeout
Apr 25, 2026
Merged

TICKET-597: Fix worker orphan processes and add max test timeout cap#710
david-yz-liu merged 6 commits intomasterfrom
TICKET-597_fix_worker_orphan_and_timeout

Conversation

@Naragod
Copy link
Copy Markdown
Contributor

@Naragod Naragod commented Mar 31, 2026

Description

Worker processes running student tests could become orphaned when the RQ worker crashed or restarted before cleanup ran, because start_new_session=True fully detached test processes into their own session. This made them invisible to process supervision and allowed them to run indefinitely, blocking the dedicated test user slot.

Implementation

  • Remove start_new_session=True from the subprocess call in order to reap test processes when the worker dies.
  • Introduces max_test_timeout server-side configuration (default 600s) that caps per-test timeouts, preventing instructor-configured or missing timeouts from letting tests hang forever during normal operation.
  • Removing start_new_session means os.killpg can no longer be used in the same-user (dev) path without killing the worker itself, so it is replaced with proc.kill() + proc.wait() to target the child process directly.

@Naragod Naragod marked this pull request as draft March 31, 2026 01:07
@Naragod Naragod force-pushed the TICKET-597_fix_worker_orphan_and_timeout branch from 98b4235 to e17503a Compare March 31, 2026 01:09
@Naragod Naragod force-pushed the TICKET-597_fix_worker_orphan_and_timeout branch from e2fbad0 to e17503a Compare April 10, 2026 15:24
@Naragod Naragod marked this pull request as ready for review April 10, 2026 15:25
@Naragod Naragod requested a review from donny-wong April 10, 2026 15:26
@Naragod Naragod force-pushed the TICKET-597_fix_worker_orphan_and_timeout branch from e17503a to bd5f44c Compare April 10, 2026 15:28
Copy link
Copy Markdown
Contributor

@donny-wong donny-wong left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @Naragod. There are occasions where err could be empty because the OS kills the process immediately (like SIGKILL or OOM kills), and the script won't have time to print the traceback to stderr, and so the err will be an empty string. Please take into account of proc.returncode which ensures we detect these silent crashes reliably. Thank you.

@Naragod Naragod requested a review from donny-wong April 17, 2026 17:27
@Naragod
Copy link
Copy Markdown
Contributor Author

Naragod commented Apr 17, 2026

Thanks @Naragod. There are occasions where err could be empty because the OS kills the process immediately (like SIGKILL or OOM kills), and the script won't have time to print the traceback to stderr, and so the err will be an empty string. Please take into account of proc.returncode which ensures we detect these silent crashes reliably. Thank you.

Good catch. I have added an extra check. Please take another look.

Comment thread server/autotest_server/settings.yml Outdated
@Naragod Naragod requested a review from donny-wong April 23, 2026 19:43
@Naragod Naragod requested a review from david-yz-liu April 23, 2026 19:44
Copy link
Copy Markdown
Contributor

@david-yz-liu david-yz-liu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Naragod nice work. In addition to the inline comments I left, please also make sure to update the Changelog.

Comment thread server/autotest_server/settings.yml
Comment thread server/autotest_server/__init__.py Outdated
@Naragod Naragod force-pushed the TICKET-597_fix_worker_orphan_and_timeout branch from 85f20f3 to 3a8adff Compare April 24, 2026 20:09
@Naragod Naragod force-pushed the TICKET-597_fix_worker_orphan_and_timeout branch from 15513f1 to 1596f78 Compare April 24, 2026 20:10
@Naragod Naragod requested a review from david-yz-liu April 24, 2026 20:11
Copy link
Copy Markdown
Contributor

@david-yz-liu david-yz-liu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, @Naragod!

@david-yz-liu david-yz-liu merged commit 4b176ef into master Apr 25, 2026
7 checks passed
@david-yz-liu david-yz-liu deleted the TICKET-597_fix_worker_orphan_and_timeout branch April 25, 2026 13:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants