scst: park async LUN-replace cleanup until async_lun_replace clears #365
Merged
Follow-up to commit a4a55aa ("scst: add async_lun_replace to defer tgt_dev cleanup after LUN replace"), which moved the slow drain of old tgt_devs off the LUN-replace management write path.

That defers the drain. It does not defer the free - the asynchronous worker still acquires scst_mutex to call scst_free_tgt_dev, and that function's first action, scst_clear_reservation -> scst_dlm_res_lock, does a DLM round-trip. When the peer node has just died and has not yet been evicted from the lockspace, that round-trip stalls in scst_dlm_lock_wait. With scst_mutex held by the stalled worker, every subsequent LUN-replace management write queues behind it.

When async_lun_replace=1, scst_acg_repl_lun() now parks the deferred cleanup of old tgt_devs on a list instead of scheduling it on the workqueue immediately. Writing 0 to the async_lun_replace sysfs knob releases the parked work in a batch.

This lets the orchestrating layer hold cleanup until any cluster coordination it depends on (e.g. DLM peer eviction during HA failover) has completed.

Module unload calls scst_async_lun_replace_set(false) as a safety net.
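
For readers who want the shape of the park-and-release behaviour described above without reading the diff, here is a minimal userspace sketch in C. It is not the code from this PR: `park_state`, `cleanup_item`, `defer_cleanup`, `release_parked`, `set_async` and `mark_done` are hypothetical names, locking is omitted, and running the cleanup inline when the flag is clear is an assumption standing in for whatever the driver does in that mode.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for one deferred tgt_dev cleanup. */
struct cleanup_item {
    struct cleanup_item *next;
    void (*run)(struct cleanup_item *item);
    bool done;
};

/* Hypothetical state. A real driver would guard this with a lock. */
struct park_state {
    bool async_enabled;           /* mirrors the async_lun_replace knob */
    struct cleanup_item *head;    /* first parked item */
    struct cleanup_item **tail;   /* where the next parked item is linked */
    size_t parked;                /* number of items currently parked */
};

static void park_state_init(struct park_state *st)
{
    st->async_enabled = false;
    st->head = NULL;
    st->tail = &st->head;
    st->parked = 0;
}

/* LUN-replace path: park the cleanup while the knob is set. */
static void defer_cleanup(struct park_state *st, struct cleanup_item *item)
{
    assert(item != NULL && item->run != NULL);
    item->next = NULL;
    if (st->async_enabled) {
        *st->tail = item;         /* append in FIFO order */
        st->tail = &item->next;
        st->parked++;
    } else {
        item->run(item);          /* assumption: no parking when the knob is 0 */
    }
}

/* Detach the whole list first, then run every item; returns the count. */
static size_t release_parked(struct park_state *st)
{
    struct cleanup_item *batch = st->head;
    size_t released = 0;

    st->head = NULL;
    st->tail = &st->head;
    st->parked = 0;

    while (batch != NULL) {
        struct cleanup_item *next = batch->next;
        batch->run(batch);
        released++;
        batch = next;
    }
    return released;
}

/* Knob write: clearing the flag releases everything that was parked. */
static size_t set_async(struct park_state *st, bool enable)
{
    st->async_enabled = enable;
    return enable ? 0 : release_parked(st);
}

/* Trivial cleanup body used for illustration. */
static void mark_done(struct cleanup_item *item)
{
    item->done = true;
}
```

In this sketch the list is detached before any item runs, so a cleanup that blocks (as the DLM round-trip can) would not hold up new items being parked. Calling `set_async(&st, false)` from a teardown path plays the same role as the module-unload safety net mentioned above.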
Hi Brian,

Thank you for the patch!

Gleb