
fix: disable use_cache during unsloth training to recover v0.8.x VRAM #65

Open
Manuscrit wants to merge 1 commit into main from fix/unsloth-train-use-cache-vram-regression

Conversation

@Manuscrit
Collaborator

The v0.9 rewrite of the response-only SFT path swapped trl.SFTTrainer for plain transformers.Trainer. SFTTrainer silently sets model.config.use_cache = False in its __init__; plain Trainer does not. With the flag left enabled, the KV cache is materialised on every training forward pass, inflating VRAM significantly on large-vocab / long-context models (Qwen 3.x, etc.) and breaking jobs that fit comfortably on v0.8.2.

This adds apply_training_runtime_fixes(model) right after get_peft_model in the unsloth training entrypoint. It logs use_cache, _attn_implementation, and is_gradient_checkpointing so future runtime regressions are visible in worker logs, and flips use_cache to False when needed.
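For reviewers, a minimal sketch of what apply_training_runtime_fixes could look like. This is illustrative only, assuming a transformers-style model that exposes a `config` attribute; the attribute names (`use_cache`, `_attn_implementation`, `is_gradient_checkpointing`) are the ones the PR says it logs:

```python
# Hypothetical sketch of apply_training_runtime_fixes; not the exact code in
# this PR. Assumes a transformers-style model exposing `model.config`.
import logging

logger = logging.getLogger(__name__)


def apply_training_runtime_fixes(model):
    """Log cache/attention/checkpointing state and disable the KV cache,
    mirroring what trl.SFTTrainer does in its __init__."""
    config = getattr(model, "config", None)
    if config is None:
        return model

    # Surface the runtime state in worker logs so future regressions are visible.
    logger.info(
        "use_cache=%s _attn_implementation=%s is_gradient_checkpointing=%s",
        getattr(config, "use_cache", None),
        getattr(config, "_attn_implementation", None),
        getattr(model, "is_gradient_checkpointing", None),
    )

    # The KV cache only helps incremental decoding; during training it just
    # holds past key/value tensors for every forward, wasting VRAM.
    if getattr(config, "use_cache", False):
        config.use_cache = False
    return model
```

Calling this right after get_peft_model makes the fix apply to the PEFT-wrapped model before the Trainer is constructed.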

The weighted_sft job already disables use_cache explicitly, so no change is required there.


Co-authored-by: Cursor <cursoragent@cursor.com>
