🐛 fix(venv): insidious creation/cleanup race condition#940
🐛 fix(venv): insidious creation/cleanup race condition#94013steinj wants to merge 1 commit intoaspect-build:mainfrom
Conversation
|
Suffixes the venv created with the running shell's PID, ensuring that each call generates a venv at a unique location. Also, explicitly cleans the venv up at the end to avoid too many piling up. Fixes aspect-build#858, aspect-build#899.
cca2ab3 to
b86722f
Compare
|
Isn't this moot with static venv? Is the plan to deprecate/delete all this code? :) |
|
Yes I think this will all go away, but this might still be worth merging in 1.x for now? |
I don't have context on this at all, but would generally appreciate a merge to avoid local patches getting stale :) |
|
Assuming you're referring to #967 now that I am seeing this at a reasonable hour. |
|
@13steinj I am quite hesitant to be creating random directories like this and then depending on |
|
@jbedard As for accumulation: these directories (normally) live inside the output base, so bazel clean sweeps them regardless. They won't grow unboundedly like a Happy to add some additional belt-and-suspenders of your choosing, but I think this is sufficient as-is. We've been running with this patch, with remote execution and not, for the past 3-4 weeks on a fairly large monorepo with a lot of Python and nobody has reported a bug, and none of the internal monitoring has sounded alarms on disk space related to python venvs. |
|
But now that there's a conflict, I assume you'd like me to verify that the latest release (or latest main?) still has the issue? |
|
main is now moving toward an alpha release of v2 which will not have this bug anymore. If you'd still like to debate adding this to a v1 patch I can create a v2 branch that this PR will target? |
|
I am fine using v2 assuming it "Just Works" ™️, once out I can test and report back. When do you expect v2 to be released? |


Suffixes the venv created with the running shell's PID, ensuring that each call generates a venv at a unique location.
Also, explicitly cleans the venv up at the end to avoid too many piling up.
Fixes #858, #899.
I am near certain the issue I've been wrestling with is the same as #858 and that the root cause that this PR fixes is the same as the root cause for #899. I don't know if there is any code path in which the venv can be created outside the bazel output base, which is why I added the
trapcleanup. If there isn't I am happy to remove that line.Test plan
py_binaryand specificvenvs (say, via the unstableuvbased rules anddependency_groupsand auv.locklockfile, and more specifically, one that has build actions that use apy_binaryas a code generator), attempt to build on a beefy 96 core machine with just the right actions cached such that the venvs collide.Sadly as this is a race condition your machine needs to be just beefy enough that the actions run in the right order to create the perfect storm of pain.
Then, do it again with this branch, and see that the venvs no longer collide.
If one is using remote execution, you will not (or at least it will be very difficult) to be able to reproduce the original bug, because it appears that under remote execution the venvs are (a) potentially sharded across different nodes, and (b) I imagine the remote execution implementation defines its own prefix path per action, such that I am sure that there exist implementations that have a more unique path based on the action information.