Add Gemma 4 31B to supervised_finetuning recipes #326
Open
dgallitelli wants to merge 1 commit into
Adds finetune--google--gemma-4-31B-it.ipynb under
0_model_customization_recipes/supervised_finetuning/ for QLoRA fine-tuning
of Google Gemma 4 31B on a single L40S GPU (ml.g6e.2xlarge).
The notebook applies a small, scoped, idempotent set of patches to
sagemaker_code/{sft.py, utils/merge_adapter_weights.py, requirements.txt}
at job-submit time so the shared training scaffolding handles Gemma 4 31B.
Every patch is gated on "gemma-4" in model_id (sibling recipes unaffected),
saves a .bak of the original, and is reverted by the final cell.
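The gating, backup, and revert behavior described above can be sketched as a pair of helpers. This is a minimal sketch under stated assumptions, not the notebook's code. The function names `apply_patch` / `revert_patch` are hypothetical, and idempotence is approximated by skipping a patch whose replacement text is already present. It also mirrors the `assert old in src` anchor check the PR describes for the `sft.py` patches.

```python
import shutil
from pathlib import Path


def apply_patch(path, old, new, model_id):
    """Replace the first `old` with `new` in `path`, for Gemma 4 models only.

    Returns True if the file was modified, False if it was skipped.
    """
    if "gemma-4" not in model_id:
        return False  # sibling recipes never touch the shared file
    path = Path(path)
    src = path.read_text()
    if new in src:
        return False  # already applied on an earlier run (idempotent)
    # Fail loudly if the anchor has drifted instead of corrupting the file.
    assert old in src, f"patch anchor not found in {path}"
    backup = path.with_name(path.name + ".bak")
    if not backup.exists():
        shutil.copy2(path, backup)  # keep the pristine original exactly once
    path.write_text(src.replace(old, new, 1))
    return True


def revert_patch(path):
    """Restore `path` from its .bak copy; returns False if there is none."""
    path = Path(path)
    backup = path.with_name(path.name + ".bak")
    if not backup.exists():
        return False
    backup.replace(path)  # rename the backup over the patched file
    return True
```

Because the `.bak` is written only if it does not exist yet, several patches to the same file still revert to the untouched original. A patch whose replacement text already occurs in the unpatched file would need an explicit marker string instead of the `new in src` check.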
Validated end-to-end with two SageMaker training jobs in us-east-1:
- Status: Completed
- Loss: 12.32 -> 2.74 over 50 steps (78% drop)
- Wall-clock time: ~13-17 min on ml.g6e.2xlarge
- Adapter: standard PEFT 0.19 LoRA, 49 MB safetensors, vLLM/LMI-ready
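Because the adapter is shipped unmerged, serving it means pointing the engine at the base model plus the adapter directory. The hedged sketch below only builds the launch configuration, using the vLLM flags (`--enable-lora`, `--lora-modules`) and the LMI property (`option.enable_lora=true`) named in the patch table. The helper names, the adapter name `gemma4-sft`, the local path `./peft_adapter`, and the base model ID `google/gemma-4-31B-it` are assumptions, and no other LMI settings are shown.

```python
import shlex


def vllm_serve_argv(base_model, adapter_name, adapter_dir):
    """Command line that serves the base model with the LoRA adapter loaded."""
    return [
        "vllm", "serve", base_model,
        "--enable-lora",
        # name=path; clients then select the adapter with "model": adapter_name
        "--lora-modules", f"{adapter_name}={adapter_dir}",
    ]


def lmi_serving_properties(base_model):
    """Minimal DJL Serving / LMI serving.properties text with LoRA enabled."""
    return "\n".join([
        f"option.model_id={base_model}",
        "option.enable_lora=true",
    ]) + "\n"


print(shlex.join(vllm_serve_argv("google/gemma-4-31B-it", "gemma4-sft", "./peft_adapter")))
print(lmi_serving_properties("google/gemma-4-31B-it"), end="")
```

vLLM refuses adapters whose rank exceeds `--max-lora-rank` (default 16), so that flag may also be needed depending on the `r` recorded in `adapter_config.json`.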
Summary
Adds `finetune--google--gemma-4-31B-it.ipynb` to `0_model_customization_recipes/supervised_finetuning/`. The notebook trains Google's Gemma 4 31B with QLoRA on a single L40S GPU (`ml.g6e.2xlarge`, 48 GB) using the existing shared `sagemaker_code/sft.py` driver; no shared file is modified in this PR.

Gemma 4 introduces architectural details that the shared scaffolding doesn't yet handle out of the box, so the notebook applies a small, scoped, idempotent set of patches at job-submit time. Every patch is gated on `"gemma-4" in model_id` so sibling recipes are unaffected, and the final cell of the notebook restores the originals from `.bak`.

Validation
End-to-end SageMaker training job (`google--gemma-4-31B-it-smoke-20260522211145`, us-east-1):
- Status: Completed
- Instance: `ml.g6e.2xlarge` (1× L40S 48 GB)
- Image: `pytorch-training:2.7.1-gpu-py312` (DLC, unchanged)
- Output: `s3://...output/model.tar.gz` (40.6 MB) → `peft_adapter/` containing `adapter_config.json`, `adapter_model.safetensors` (49 MB), tokenizer, and chat template

13 prior iterations (~$3.50 total) surfaced and resolved 7 distinct failure modes, all documented inline in the notebook's troubleshooting cell.
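Before pointing a server at the output, the downloaded archive can be checked locally against the layout listed above. This is a sketch that assumes the `peft_adapter/` directory sits at the archive root; the helper name `missing_adapter_files` is hypothetical and the S3 download step is omitted.

```python
import tarfile

# Files a standard PEFT LoRA adapter directory needs, per the output listed above.
REQUIRED = {"adapter_config.json", "adapter_model.safetensors"}


def missing_adapter_files(tar_path, adapter_dir="peft_adapter"):
    """Return the required PEFT files absent from `adapter_dir/` in the archive."""
    with tarfile.open(tar_path, "r:gz") as tar:
        # Normalize "./peft_adapter/..." and "peft_adapter/..." member names.
        names = [n[2:] if n.startswith("./") else n for n in tar.getnames()]
    prefix = adapter_dir.rstrip("/") + "/"
    present = {n[len(prefix):] for n in names if n.startswith(prefix)}
    return sorted(REQUIRED - present)
```

An empty list means both required files are present; tokenizer and chat-template files are not checked.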
What the notebook patches
All patches are applied via `%%writefile` / Python edit cells with a `.bak` backup, gated on Gemma 4, and reverted in the final cell:

| File | Problem | Fix |
|---|---|---|
| `sagemaker_code/requirements.txt` | Gemma 4 needs `transformers >= 5.5.0` (shared file pins 4.57.0) | Bump `transformers` / `peft` / `accelerate` / `trl` / `datasets` / `liger-kernel` in lockstep |
| `sagemaker_code/sft.py` (1) | `ModelConfig.torch_dtype` → `dtype` (renamed in TRL 1.4) | `getattr(.., "dtype") or getattr(.., "torch_dtype")` |
| `sagemaker_code/sft.py` (2) | `Gemma4ClippableLinear` wraps every linear in the model; PEFT 0.19's LoRA dispatcher rejects all targets | (a) `peft_config.exclude_modules = ["vision_tower", "audio_tower"]` keeps PEFT off the multimodal towers; (b) on the language stack (`use_clipped_linears=False`) the wrapper is a no-op, so each one is replaced with its inner `nn.Linear` and the existing bnb-4bit dispatcher matches |
| `sagemaker_code/sft.py` (3) | Gemma 4 expects `mm_token_type_ids` even for text-only training; SFTTrainer's padding-free path rejects a custom `data_collator` | Subclass `SFTTrainer` and override `_prepare_inputs` to inject `mm_token_type_ids = zeros_like(input_ids)` per batch; `Trainer._prepare_inputs` is the canonical extension point and runs after the default collator |
| `sagemaker_code/utils/merge_adapter_weights.py` | … | … vLLM (`--enable-lora --lora-modules`), DJL Serving / LMI (`option.enable_lora=true`), and Bedrock all consume it directly |

Each patch in `sft.py` is under 25 lines and anchored on an exact pre-existing code block with `assert old in src`, so future drift in `sft.py` surfaces immediately instead of silently corrupting the file.

Should these patches be upstreamed?
Yes, eventually, but not in this PR. Reasoning:
- The `transformers >= 5.5.0` bump cascades across `peft`, `trl`, `accelerate`, `datasets`, and `liger-kernel`, and TRL 1.4 renames a `ModelConfig` field that other recipes pass through their YAML. Bumping `requirements.txt` for everyone is a maintainer call and would require re-validating every sibling notebook.
- The `mm_token_type_ids` injection and the `Gemma4ClippableLinear` unwrap are Gemma-4-specific. They could live in an `EXCEPTION_MODEL_LIST`-style branch in `sft.py`, similar to the existing `Qwen2-Audio` / `Qwen3-VL` handling, but that's a structural change worth a separate PR.

This PR keeps the notebook self-contained so reviewers can merge it without touching shared code or other recipes. The patches are also packaged as standalone unified diffs (`sft.py.patch`, `merge_adapter_weights.py.patch`) for reference if a future PR wants to upstream them.

Files changed
- `0_model_customization_recipes/supervised_finetuning/finetune--google--gemma-4-31B-it.ipynb` (new)

Files NOT changed in this PR (but patched at job-submit time by the notebook)
- `sagemaker_code/sft.py`
- `sagemaker_code/utils/merge_adapter_weights.py`
- `sagemaker_code/requirements.txt`
- `sagemaker_code/hf_recipes/google/gemma-4-31B-it--vanilla-peft-qlora.yaml` (created at runtime)
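The three `sft.py` fixes from the patch table can be sketched as standalone helpers rather than text patches. This is illustrative only, not the notebook's code. `resolve_dtype`, `unwrap_clippable_linears`, and `MmTokenTypeIdsMixin` are hypothetical names, and the inner-layer attribute `linear` on `Gemma4ClippableLinear` is an assumption to verify against the transformers source. The unwrap relies only on `named_children()` and `setattr`, which `torch.nn.Module` provides. Per the PR, the swap only preserves behavior on the language stack, where `use_clipped_linears=False`.

```python
def resolve_dtype(model_args):
    """Read the dtype from a TRL ModelConfig across the torch_dtype -> dtype rename."""
    return getattr(model_args, "dtype", None) or getattr(model_args, "torch_dtype", None)


def unwrap_clippable_linears(module, wrapper_name="Gemma4ClippableLinear",
                             inner_attr="linear",
                             skip=("vision_tower", "audio_tower")):
    """Replace each wrapper with its inner linear layer in place; return the count.

    Matches the wrapper by class name so no transformers import is needed.
    `inner_attr` is an ASSUMED attribute name; check the Gemma 4 source.
    """
    swapped = 0
    for name, child in list(module.named_children()):
        if name in skip:
            continue  # multimodal towers are left alone (PEFT excludes them)
        if type(child).__name__ == wrapper_name:
            setattr(module, name, getattr(child, inner_attr))
            swapped += 1
        else:
            swapped += unwrap_clippable_linears(child, wrapper_name, inner_attr, skip)
    return swapped


class MmTokenTypeIdsMixin:
    """Put before SFTTrainer in the bases to add mm_token_type_ids per batch."""

    def _prepare_inputs(self, inputs):
        # Let the parent Trainer do its usual work (e.g. device placement) first.
        inputs = super()._prepare_inputs(inputs)
        input_ids = inputs.get("input_ids")
        if input_ids is not None and "mm_token_type_ids" not in inputs:
            # Same shape, dtype and device as input_ids, all zeros
            # (torch.zeros_like equivalent via Tensor.new_zeros).
            inputs["mm_token_type_ids"] = input_ids.new_zeros(input_ids.shape)
        return inputs
```

Wired into `sft.py`, this would look like `class Gemma4SFTTrainer(MmTokenTypeIdsMixin, SFTTrainer): pass`, with the unwrap run on the loaded model before the PEFT config is applied so the LoRA dispatcher only sees plain linear layers.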