From d48c8cb414f58fd0ff6314a065efcbc451e96719 Mon Sep 17 00:00:00 2001
From: "mintlify[bot]" <109931778+mintlify[bot]@users.noreply.github.com>
Date: Fri, 6 Mar 2026 03:39:09 +0000
Subject: [PATCH 1/4] Add native LoRA training documentation

Generated-By: mintlify-agent
---
 docs.json                                  |  12 ++
 tutorials/training/lora-training.mdx       | 192 +++++++++++++++++++++
 zh-CN/tutorials/training/lora-training.mdx | 192 +++++++++++++++++++++
 3 files changed, 396 insertions(+)
 create mode 100644 tutorials/training/lora-training.mdx
 create mode 100644 zh-CN/tutorials/training/lora-training.mdx
diff --git a/docs.json b/docs.json
index 68746c58a..73c0a1b48 100644
--- a/docs.json
+++ b/docs.json
@@ -126,6 +126,12 @@
                       "tutorials/basic/multiple-loras"
                     ]
                   },
+                  {
+                    "group": "Training",
+                    "pages": [
+                      "tutorials/training/lora-training"
+                    ]
+                  },
                   {
                     "group": "ControlNet",
                     "pages": [
@@ -868,6 +874,12 @@
                       "zh-CN/tutorials/basic/multiple-loras"
                     ]
                   },
+                  {
+                    "group": "训练",
+                    "pages": [
+                      "zh-CN/tutorials/training/lora-training"
+                    ]
+                  },
                   {
                     "group": "ControlNet",
                     "pages": [
diff --git a/tutorials/training/lora-training.mdx b/tutorials/training/lora-training.mdx
new file mode 100644
index 000000000..c3aab33bf
--- /dev/null
+++ b/tutorials/training/lora-training.mdx
@@ -0,0 +1,192 @@
+---
+title: "Native LoRA training"
+sidebarTitle: "LoRA Training"
+description: "Train LoRA models directly in ComfyUI using built-in training nodes"
+---
+
+ComfyUI includes native support for training LoRA (Low-Rank Adaptation) models without requiring external tools or custom nodes. This guide covers how to use the built-in training nodes to create your own LoRAs.
+
+<Warning>
+The training nodes are marked as **experimental**. Features and behavior may change in future releases.
+</Warning>
+
+## Overview
+
+The native LoRA training system consists of four nodes:
+
+| Node | Category | Purpose |
+|------|----------|---------|
+| **Train LoRA** | training | Trains a LoRA model from latents and conditioning |
+| **Load LoRA Model** | loaders | Applies trained LoRA weights to a model |
+| **Save LoRA Weights** | loaders | Exports LoRA weights to a safetensors file |
+| **Plot Loss Graph** | training | Visualizes training loss over time |
+
+## Requirements
+
+- A GPU with sufficient VRAM (training typically requires more memory than inference)
+- Latent images (encoded from your training dataset)
+- Text conditioning (captions for your training images)
+
+## Basic training workflow
+
+<Steps>
+<Step title="Prepare your dataset">
+Encode your training images to latents using a VAE Encode node. Create text conditioning for each image using CLIP Text Encode.
+
+<Tip>
+For best results, use high-quality images that represent the style or subject you want to train.
+</Tip>
+</Step>
+
+<Step title="Configure the Train LoRA node">
+Connect your model, latents, and conditioning to the Train LoRA node. Set the training parameters:
+
+- **batch_size**: Number of samples per training step (default: 1)
+- **steps**: Total training iterations (default: 16)
+- **learning_rate**: How quickly the model adapts (default: 0.0005)
+- **rank**: LoRA rank - higher values capture more detail but use more memory (default: 8)
+</Step>
+
+<Step title="Run training">
+Execute the workflow. The node will output:
+- **lora**: The trained LoRA weights
+- **loss_map**: Training loss history
+- **steps**: Total steps completed
+</Step>
+
+<Step title="Save and use your LoRA">
+Connect the output to **Save LoRA Weights** to export your trained LoRA. Use **Load LoRA Model** to apply it during inference.
+</Step>
+</Steps>
+
+## Train LoRA node
+
+The main training node that creates LoRA weights from your dataset.
+
+### Inputs
+
+| Parameter | Type | Default | Description |
+|-----------|------|---------|-------------|
+| `model` | MODEL | - | Base model to train the LoRA on |
+| `latents` | LATENT | - | Encoded training images |
+| `positive` | CONDITIONING | - | Text conditioning for training |
+| `batch_size` | INT | 1 | Samples per step (1-10000) |
+| `grad_accumulation_steps` | INT | 1 | Gradient accumulation steps (1-1024) |
+| `steps` | INT | 16 | Training iterations (1-100000) |
+| `learning_rate` | FLOAT | 0.0005 | Learning rate (0.0000001-1.0) |
+| `rank` | INT | 8 | LoRA rank (1-128) |
+| `optimizer` | COMBO | AdamW | Optimizer: AdamW, Adam, SGD, RMSprop |
+| `loss_function` | COMBO | MSE | Loss function: MSE, L1, Huber, SmoothL1 |
+| `seed` | INT | 0 | Random seed for reproducibility |
+| `training_dtype` | COMBO | bf16 | Training precision: bf16, fp32 |
+| `lora_dtype` | COMBO | bf16 | LoRA weight precision: bf16, fp32 |
+| `algorithm` | COMBO | lora | Training algorithm (lora, lokr, oft, etc.) |
+| `gradient_checkpointing` | BOOLEAN | true | Reduces VRAM usage during training |
+| `checkpoint_depth` | INT | 1 | Depth level for gradient checkpointing (1-5) |
+| `offloading` | BOOLEAN | false | Offload model to RAM (requires bypass mode) |
+| `existing_lora` | COMBO | [None] | Continue training from existing LoRA |
+| `bucket_mode` | BOOLEAN | false | Enable resolution bucketing for multi-resolution datasets |
+| `bypass_mode` | BOOLEAN | false | Apply adapters via hooks instead of weight modification |
+
+### Outputs
+
+| Output | Type | Description |
+|--------|------|-------------|
+| `lora` | LORA_MODEL | Trained LoRA weights |
+| `loss_map` | LOSS_MAP | Training loss history |
+| `steps` | INT | Total training steps completed |
+
+## Load LoRA Model node
+
+Applies trained LoRA weights to a diffusion model. Use this instead of the standard Load LoRA node when working with LoRA weights directly from the Train LoRA node.
+
+### Inputs
+
+| Parameter | Type | Default | Description |
+|-----------|------|---------|-------------|
+| `model` | MODEL | - | Base diffusion model |
+| `lora` | LORA_MODEL | - | Trained LoRA weights |
+| `strength_model` | FLOAT | 1.0 | LoRA strength (-100 to 100) |
+| `bypass` | BOOLEAN | false | Apply LoRA without modifying base weights |
+
+### Output
+
+| Output | Type | Description |
+|--------|------|-------------|
+| `model` | MODEL | Model with LoRA applied |
+
+## Save LoRA Weights node
+
+Exports trained LoRA weights to a safetensors file in your output folder.
+
+### Inputs
+
+| Parameter | Type | Default | Description |
+|-----------|------|---------|-------------|
+| `lora` | LORA_MODEL | - | Trained LoRA weights to save |
+| `prefix` | STRING | loras/ComfyUI_trained_lora | Output filename prefix |
+| `steps` | INT | (optional) | Training steps for filename |
+
+The saved file will be named `{prefix}_{steps}_steps_{counter}.safetensors` and placed in your `ComfyUI/output/loras/` folder.
+
+## Plot Loss Graph node
+
+Visualizes training progress by plotting loss values over training steps.
+
+### Inputs
+
+| Parameter | Type | Default | Description |
+|-----------|------|---------|-------------|
+| `loss` | LOSS_MAP | - | Loss history from Train LoRA |
+| `filename_prefix` | STRING | loss_graph | Output filename prefix |
+
+## Training tips
+
+### VRAM optimization
+
+- Enable **gradient_checkpointing** to significantly reduce VRAM usage (enabled by default)
+- Use **bypass_mode** when working with quantized models (FP8)
+- Enable **offloading** to move the model to RAM during training (requires bypass_mode)
+- Lower the **batch_size** if you encounter out-of-memory errors
+
+### Dataset preparation
+
+- Use consistent image dimensions when possible, or enable **bucket_mode** for multi-resolution training
+- Match the number of conditioning inputs to the number of latent images
+- Quality matters more than quantity—start with 10-20 high-quality images
+
+### Training parameters
+
+- **rank**: Start with 8-16 for most use cases. Higher ranks (32-64) capture more detail but may overfit
+- **steps**: Start with 100-500 steps and monitor the loss graph
+- **learning_rate**: The default 0.0005 works well for most cases. Lower values (0.0001) for more stable training
+
+### Continuing training
+
+Select an existing LoRA from the **existing_lora** dropdown to continue training from a previously saved checkpoint. The total step count will accumulate.
+
+## Supported algorithms
+
+The **algorithm** parameter supports multiple weight adapter types:
+
+- **lora**: Standard Low-Rank Adaptation (recommended)
+- **lokr**: LoCon with Kronecker product decomposition
+- **oft**: Orthogonal Fine-Tuning
+
+## Example: Single-subject LoRA
+
+A minimal workflow for training a LoRA on a specific subject:
+
+1. Load your training images with **Load Image**
+2. Encode images to latents with **VAE Encode**
+3. Create captions with **CLIP Text Encode** (e.g., "a photo of [subject]")
+4. Connect to **Train LoRA** with:
+   - steps: 200
+   - rank: 16
+   - learning_rate: 0.0001
+5. Save with **Save LoRA Weights**
+6. Test with **Load LoRA Model** connected to your inference workflow
+
+<Note>
+For training on multiple images with different captions, connect multiple conditioning inputs to match your latent batch size.
+</Note>
diff --git a/zh-CN/tutorials/training/lora-training.mdx b/zh-CN/tutorials/training/lora-training.mdx
new file mode 100644
index 000000000..66c04fd76
--- /dev/null
+++ b/zh-CN/tutorials/training/lora-training.mdx
@@ -0,0 +1,192 @@
+---
+title: "原生 LoRA 训练"
+sidebarTitle: "LoRA 训练"
+description: "使用内置训练节点直接在 ComfyUI 中训练 LoRA 模型"
+---
+
+ComfyUI 原生支持训练 LoRA（Low-Rank Adaptation）模型，无需外部工具或自定义节点。本指南介绍如何使用内置训练节点创建自己的 LoRA。
+
+<Warning>
+训练节点目前标记为**实验性功能**。功能和行为可能会在未来版本中发生变化。
+</Warning>
+
+## 概述
+
+原生 LoRA 训练系统包含四个节点：
+
+| 节点 | 类别 | 用途 |
+|------|------|------|
+| **Train LoRA** | training | 从潜空间图像和条件训练 LoRA 模型 |
+| **Load LoRA Model** | loaders | 将训练好的 LoRA 权重应用到模型 |
+| **Save LoRA Weights** | loaders | 将 LoRA 权重导出为 safetensors 文件 |
+| **Plot Loss Graph** | training | 可视化训练过程中的损失变化 |
+
+## 系统要求
+
+- 具有足够显存的 GPU（训练通常比推理需要更多内存）
+- 潜空间图像（从训练数据集编码而来）
+- 文本条件（训练图像的描述文字）
+
+## 基础训练流程
+
+<Steps>
+<Step title="准备数据集">
+使用 VAE Encode 节点将训练图像编码为潜空间表示。使用 CLIP Text Encode 为每张图像创建文本条件。
+
+<Tip>
+为获得最佳效果，请使用能代表您想要训练的风格或主题的高质量图像。
+</Tip>
+</Step>
+
+<Step title="配置 Train LoRA 节点">
+将模型、潜空间图像和条件连接到 Train LoRA 节点。设置训练参数：
+
+- **batch_size**：每个训练步骤的样本数（默认：1）
+- **steps**：总训练迭代次数（默认：16）
+- **learning_rate**：模型适应速度（默认：0.0005）
+- **rank**：LoRA 秩 - 更高的值可以捕获更多细节但使用更多内存（默认：8）
+</Step>
+
+<Step title="运行训练">
+执行工作流。节点将输出：
+- **lora**：训练好的 LoRA 权重
+- **loss_map**：训练损失历史
+- **steps**：完成的总步数
+</Step>
+
+<Step title="保存和使用 LoRA">
+将输出连接到 **Save LoRA Weights** 以导出训练好的 LoRA。使用 **Load LoRA Model** 在推理时应用它。
+</Step>
+</Steps>
+
+## Train LoRA 节点
+
+从数据集创建 LoRA 权重的主要训练节点。
+
+### 输入参数
+
+| 参数 | 类型 | 默认值 | 描述 |
+|------|------|--------|------|
+| `model` | MODEL | - | 用于训练 LoRA 的基础模型 |
+| `latents` | LATENT | - | 编码后的训练图像 |
+| `positive` | CONDITIONING | - | 训练用的文本条件 |
+| `batch_size` | INT | 1 | 每步样本数（1-10000） |
+| `grad_accumulation_steps` | INT | 1 | 梯度累积步数（1-1024） |
+| `steps` | INT | 16 | 训练迭代次数（1-100000） |
+| `learning_rate` | FLOAT | 0.0005 | 学习率（0.0000001-1.0） |
+| `rank` | INT | 8 | LoRA 秩（1-128） |
+| `optimizer` | COMBO | AdamW | 优化器：AdamW、Adam、SGD、RMSprop |
+| `loss_function` | COMBO | MSE | 损失函数：MSE、L1、Huber、SmoothL1 |
+| `seed` | INT | 0 | 随机种子，用于可复现性 |
+| `training_dtype` | COMBO | bf16 | 训练精度：bf16、fp32 |
+| `lora_dtype` | COMBO | bf16 | LoRA 权重精度：bf16、fp32 |
+| `algorithm` | COMBO | lora | 训练算法（lora、lokr、oft 等） |
+| `gradient_checkpointing` | BOOLEAN | true | 训练时减少显存使用 |
+| `checkpoint_depth` | INT | 1 | 梯度检查点深度级别（1-5） |
+| `offloading` | BOOLEAN | false | 将模型卸载到内存（需要 bypass 模式） |
+| `existing_lora` | COMBO | [None] | 从现有 LoRA 继续训练 |
+| `bucket_mode` | BOOLEAN | false | 启用分辨率分桶以支持多分辨率数据集 |
+| `bypass_mode` | BOOLEAN | false | 通过钩子应用适配器而非修改权重 |
+
+### 输出
+
+| 输出 | 类型 | 描述 |
+|------|------|------|
+| `lora` | LORA_MODEL | 训练好的 LoRA 权重 |
+| `loss_map` | LOSS_MAP | 训练损失历史 |
+| `steps` | INT | 完成的总训练步数 |
+
+## Load LoRA Model 节点
+
+将训练好的 LoRA 权重应用到扩散模型。当使用来自 Train LoRA 节点的 LoRA 权重时，请使用此节点而非标准的 Load LoRA 节点。
+
+### 输入参数
+
+| 参数 | 类型 | 默认值 | 描述 |
+|------|------|--------|------|
+| `model` | MODEL | - | 基础扩散模型 |
+| `lora` | LORA_MODEL | - | 训练好的 LoRA 权重 |
+| `strength_model` | FLOAT | 1.0 | LoRA 强度（-100 到 100） |
+| `bypass` | BOOLEAN | false | 不修改基础权重直接应用 LoRA |
+
+### 输出
+
+| 输出 | 类型 | 描述 |
+|------|------|------|
+| `model` | MODEL | 应用了 LoRA 的模型 |
+
+## Save LoRA Weights 节点
+
+将训练好的 LoRA 权重导出为 safetensors 文件到输出文件夹。
+
+### 输入参数
+
+| 参数 | 类型 | 默认值 | 描述 |
+|------|------|--------|------|
+| `lora` | LORA_MODEL | - | 要保存的训练好的 LoRA 权重 |
+| `prefix` | STRING | loras/ComfyUI_trained_lora | 输出文件名前缀 |
+| `steps` | INT | （可选） | 用于文件名的训练步数 |
+
+保存的文件将命名为 `{prefix}_{steps}_steps_{counter}.safetensors` 并放置在 `ComfyUI/output/loras/` 文件夹中。
+
+## Plot Loss Graph 节点
+
+通过绘制训练步骤中的损失值来可视化训练进度。
+
+### 输入参数
+
+| 参数 | 类型 | 默认值 | 描述 |
+|------|------|--------|------|
+| `loss` | LOSS_MAP | - | 来自 Train LoRA 的损失历史 |
+| `filename_prefix` | STRING | loss_graph | 输出文件名前缀 |
+
+## 训练技巧
+
+### 显存优化
+
+- 启用 **gradient_checkpointing** 可显著减少显存使用（默认已启用）
+- 使用量化模型（FP8）时使用 **bypass_mode**
+- 启用 **offloading** 在训练期间将模型移至内存（需要 bypass_mode）
+- 如果遇到内存不足错误，请降低 **batch_size**
+
+### 数据集准备
+
+- 尽可能使用一致的图像尺寸，或启用 **bucket_mode** 进行多分辨率训练
+- 确保条件输入数量与潜空间图像数量匹配
+- 质量比数量更重要——从 10-20 张高质量图像开始
+
+### 训练参数
+
+- **rank**：大多数情况从 8-16 开始。更高的秩（32-64）可捕获更多细节但可能过拟合
+- **steps**：从 100-500 步开始，监控损失图
+- **learning_rate**：默认值 0.0005 适用于大多数情况。更低的值（0.0001）可获得更稳定的训练
+
+### 继续训练
+
+从 **existing_lora** 下拉菜单中选择现有 LoRA 以从之前保存的检查点继续训练。总步数将累积。
+
+## 支持的算法
+
+**algorithm** 参数支持多种权重适配器类型：
+
+- **lora**：标准低秩适应（推荐）
+- **lokr**：带 Kronecker 积分解的 LoCon
+- **oft**：正交微调
+
+## 示例：单主题 LoRA
+
+训练特定主题 LoRA 的最小工作流：
+
+1. 使用 **Load Image** 加载训练图像
+2. 使用 **VAE Encode** 将图像编码为潜空间表示
+3. 使用 **CLIP Text Encode** 创建描述文字（例如 "a photo of [subject]"）
+4. 连接到 **Train LoRA** 并设置：
+   - steps: 200
+   - rank: 16
+   - learning_rate: 0.0001
+5. 使用 **Save LoRA Weights** 保存
+6. 使用 **Load LoRA Model** 连接到推理工作流进行测试
+
+<Note>
+当使用不同描述训练多张图像时，请连接多个条件输入以匹配潜空间批次大小。
+</Note>

From e02d5e029c412e981b22a16f00eef7192c381898 Mon Sep 17 00:00:00 2001
From: ComfyUI Wiki <contact@comfyui-wiki.com>
Date: Thu, 9 Apr 2026 23:26:28 +0800
Subject: [PATCH 2/4] Rename the Chinese LoRA training documentation
 (lora-training.mdx) from zh-CN/ to zh/ as part of content restructuring.

---
 {zh-CN => zh}/tutorials/training/lora-training.mdx | 0
 1 file changed, 0 insertions(+), 0 deletions(-)
 rename {zh-CN => zh}/tutorials/training/lora-training.mdx (100%)

diff --git a/zh-CN/tutorials/training/lora-training.mdx b/zh/tutorials/training/lora-training.mdx
similarity index 100%
rename from zh-CN/tutorials/training/lora-training.mdx
rename to zh/tutorials/training/lora-training.mdx

From 19da300df4f6b2dfb22141b09bfcc8fc8580385a Mon Sep 17 00:00:00 2001
From: ComfyUI Wiki <contact@comfyui-wiki.com>
Date: Thu, 9 Apr 2026 23:52:01 +0800
Subject: [PATCH 3/4] Update docs.json to reference the renamed zh/ training page

---
 docs.json | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
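
Reviewer note (below the `---` separator, so not part of the commit message): after the rename in patch 2/4 and before this patch, docs.json still lists `zh-CN/tutorials/training/lora-training` while the file lives under `zh/`. A check along the following lines might catch such dangling entries. It is only a sketch: it assumes page entries are strings inside `"pages"` arrays (possibly nested in group objects, as in the docs.json hunks of this series) and that each entry maps to `<entry>.mdx` under the repository root. The function names `iter_pages` and `missing_pages` are hypothetical, not part of any existing tooling.

```python
import json
from pathlib import Path


def iter_pages(node):
    """Yield every page path found in any "pages" list, at any nesting depth."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "pages" and isinstance(value, list):
                for item in value:
                    if isinstance(item, str):
                        yield item  # a leaf page entry
                    else:
                        yield from iter_pages(item)  # a nested group object
            else:
                yield from iter_pages(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_pages(item)


def missing_pages(docs_json: Path, root: Path) -> list[str]:
    """Return the page entries that have no matching .mdx file under root."""
    config = json.loads(docs_json.read_text(encoding="utf-8"))
    # Assumption: every page entry corresponds to "<entry>.mdx" on disk.
    return sorted(
        page for page in set(iter_pages(config))
        if not (root / f"{page}.mdx").is_file()
    )
```

Calling `missing_pages(Path("docs.json"), Path("."))` from the repository root would then list any entries without a backing file.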

diff --git a/docs.json b/docs.json
index d4f1c90d2..74b1189ac 100644
--- a/docs.json
+++ b/docs.json
@@ -2227,7 +2227,7 @@
                   {
                     "group": "训练",
                     "pages": [
-                      "zh-CN/tutorials/training/lora-training"
+                      "zh/tutorials/training/lora-training"
                     ]
                   },
                   {

From 71fe661feb6d346039edccc6f712499c87f2b745 Mon Sep 17 00:00:00 2001
From: ComfyUI Wiki <contact@comfyui-wiki.com>
Date: Fri, 10 Apr 2026 00:21:22 +0800
Subject: [PATCH 4/4] Update documentation for LoRA training nodes and
 restructure training resources

- Changed the docs.json page references from "lora-training" to "overview".
- Added a training overview page (overview.mdx) in English, Chinese, and Japanese, plus a Japanese "トレーニング" navigation group.
- Updated the input and output descriptions in the Japanese and Chinese LoraModelLoader and TrainLoraNode pages to clarify usage and parameters.
- Removed the superseded lora-training.mdx pages in English and Chinese.
---
 docs.json                               |  10 +-
 ja/built-in-nodes/LoraModelLoader.mdx   |  15 +-
 ja/built-in-nodes/TrainLoraNode.mdx     |  44 +++---
 ja/tutorials/training/overview.mdx      | 137 +++++++++++++++++
 tutorials/training/lora-training.mdx    | 192 ------------------------
 tutorials/training/overview.mdx         | 137 +++++++++++++++++
 zh/built-in-nodes/LoraModelLoader.mdx   |  13 +-
 zh/built-in-nodes/TrainLoraNode.mdx     |  42 +++---
 zh/tutorials/training/lora-training.mdx | 192 ------------------------
 zh/tutorials/training/overview.mdx      | 137 +++++++++++++++++
 10 files changed, 480 insertions(+), 439 deletions(-)
 create mode 100644 ja/tutorials/training/overview.mdx
 delete mode 100644 tutorials/training/lora-training.mdx
 create mode 100644 tutorials/training/overview.mdx
 delete mode 100644 zh/tutorials/training/lora-training.mdx
 create mode 100644 zh/tutorials/training/overview.mdx
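
Reviewer note (below the `---` separator, so not part of the commit message): the updated `grad_accumulation_steps` description says that accumulating gradients over several steps behaves like a larger batch without extra VRAM. The toy sketch below illustrates that claim for a scalar linear model with an MSE loss. It is not ComfyUI's training loop; the helper names are hypothetical, and it assumes equally sized micro-batches whose gradients are averaged.

```python
def mse_grad(w, batch):
    """Gradient of mean((w * x - y) ** 2) with respect to the scalar weight w."""
    return sum(2.0 * (w * x - y) * x for x, y in batch) / len(batch)


def accumulated_grad(w, data, micro_batch_size):
    """Average the gradients of equally sized micro-batches.

    This mirrors what gradient accumulation does when each micro-batch
    gradient is scaled by 1 / number_of_micro_batches before the update.
    """
    chunks = [
        data[i:i + micro_batch_size]
        for i in range(0, len(data), micro_batch_size)
    ]
    return sum(mse_grad(w, chunk) for chunk in chunks) / len(chunks)


# Four (x, y) samples: one batch of 4 versus four accumulated batches of 1.
data = [(1.0, 2.0), (2.0, 3.0), (3.0, 7.0), (4.0, 9.0)]
full_batch = mse_grad(0.5, data)
accumulated = accumulated_grad(0.5, data, micro_batch_size=1)
print(full_batch, accumulated)
```

Both values agree, which is why `batch_size: 1` with `grad_accumulation_steps: 4` can stand in for `batch_size: 4` when VRAM is tight, at the cost of more forward and backward passes per weight update.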

diff --git a/docs.json b/docs.json
index 74b1189ac..1135874ca 100644
--- a/docs.json
+++ b/docs.json
@@ -139,7 +139,7 @@
                   {
                     "group": "Training",
                     "pages": [
-                      "tutorials/training/lora-training"
+                      "tutorials/training/overview"
                     ]
                   },
                   {
@@ -2227,7 +2227,7 @@
                   {
                     "group": "训练",
                     "pages": [
-                      "zh/tutorials/training/lora-training"
+                      "zh/tutorials/training/overview"
                     ]
                   },
                   {
@@ -4317,6 +4317,12 @@
                       "ja/tutorials/basic/multiple-loras"
                     ]
                   },
+                  {
+                    "group": "トレーニング",
+                    "pages": [
+                      "ja/tutorials/training/overview"
+                    ]
+                  },
                   {
                     "group": "ControlNet",
                     "pages": [
diff --git a/ja/built-in-nodes/LoraModelLoader.mdx b/ja/built-in-nodes/LoraModelLoader.mdx
index 3146892c9..384c1081e 100644
--- a/ja/built-in-nodes/LoraModelLoader.mdx
+++ b/ja/built-in-nodes/LoraModelLoader.mdx
@@ -4,25 +4,26 @@ description: "ComfyUI における LoraModelLoader ノードの完全なドキ
 sidebarTitle: "LoraModelLoader"
 icon: "circle"
 mode: wide
-translationSourceHash: 2d17ee26
-translationFrom: built-in-nodes/LoraModelLoader.mdx, zh/built-in-nodes/LoraModelLoader.mdx
 ---
 > このドキュメントは AI によって生成されました。誤りを発見された場合、または改善に関するご提案がありましたら、ぜひご貢献ください！ [GitHub で編集する](https://github.com/Comfy-Org/embedded-docs/blob/main/comfyui_embedded_docs/docs/LoraModelLoader/en.md)
 
-LoraModelLoader ノードは、学習済みの LoRA（Low-Rank Adaptation：低ランク適応）重みを拡散モデルに適用します。このノードは、学習済み LoRA モデルから重みを読み込み、その影響強度を調整することでベースモデルを変更します。これにより、拡散モデルをゼロから再学習することなく、その動作をカスタマイズできます。
+LoraModelLoader ノードは、学習済みの LoRA（Low-Rank Adaptation：低ランク適応）重みを拡散モデルに適用します。このノードは、学習済み LoRA モデルから重みを読み込み、その影響強度を調整することでベースモデルを変更します。**TrainLoraNode** から直接 LoRA 重みを使用する場合は、標準の LoRA ローダーではなく、このノードを使用してください。
 
 ## 入力
 
 | パラメーター | データ型 | 必須 | 範囲 | 説明 |
 |-----------|-----------|----------|-------|-------------|
 | `model` | MODEL | はい | - | LoRA を適用する対象の拡散モデルです。 |
-| `lora` | LORA_MODEL | はい | - | 拡散モデルに適用する LoRA モデルです。 |
-| `strength_model` | FLOAT | はい | -100.0 ～ 100.0 | 拡散モデルを変更する強さを指定します。この値は負の数も可（デフォルト値：1.0）。 |
+| `lora` | LORA_MODEL | はい | - | 適用する LoRA モデル（TrainLoraNode の出力または読み込み済み LoRA ファイル）です。 |
+| `strength_model` | FLOAT | はい | -100.0 ～ 100.0 | 拡散モデルへの LoRA の影響強度を指定します。負の値も使用可能（デフォルト値：1.0）。 |
+| `bypass` | BOOLEAN | はい | - | 有効にすると、ベースの重みを変更せずフォワードフックで LoRA を適用します。量化モデルに適しています（デフォルト：False）。 |
 
-**注意：** `strength_model` を 0 に設定した場合、ノードは LoRA の適用を行わず、元のモデルをそのまま返します。
+**注意：**
+- `strength_model` を 0 に設定した場合、ノードは LoRA の適用を行わず、元のモデルをそのまま返します。
+- トレーニング時に `bypass_mode` を有効にした場合は、推論時もここで `bypass` を有効にしてください。
 
 ## 出力
 
 | 出力名 | データ型 | 説明 |
 |-------------|-----------|-------------|
-| `model` | MODEL | LoRA 重みが適用された修正済み拡散モデルです。 |
\ No newline at end of file
+| `model` | MODEL | LoRA 重みが適用された拡散モデルです。 |
diff --git a/ja/built-in-nodes/TrainLoraNode.mdx b/ja/built-in-nodes/TrainLoraNode.mdx
index b8ff89530..2a17af67e 100644
--- a/ja/built-in-nodes/TrainLoraNode.mdx
+++ b/ja/built-in-nodes/TrainLoraNode.mdx
@@ -4,35 +4,36 @@ description: "ComfyUI の TrainLoraNode ノードに関する完全なドキュ
 sidebarTitle: "TrainLoraNode"
 icon: "circle"
 mode: wide
-translationSourceHash: 125473cf
-translationFrom: built-in-nodes/TrainLoraNode.mdx, zh/built-in-nodes/TrainLoraNode.mdx
-translationMismatches:
-  - "description"
 ---
 > このドキュメントは AI によって生成されました。誤りを発見された場合、または改善のご提案がある場合は、ぜひご貢献ください！ [GitHub で編集](https://github.com/Comfy-Org/embedded-docs/blob/main/comfyui_embedded_docs/docs/TrainLoraNode/en.md)
 
-TrainLoraNode は、提供された潜在表現（latents）および条件付けデータを用いて、拡散モデル上に LoRA（低ランク適応：Low-Rank Adaptation）モデルを作成・学習します。このノードにより、カスタムの学習パラメーター、オプティマイザー、損失関数を用いたモデルのファインチューニングが可能です。ノードの出力には、LoRA を適用済みの学習済みモデル、LoRA の重み、学習時の損失指標、および完了した総学習ステップ数が含まれます。
+TrainLoraNode は、提供された潜在表現（latents）および条件付けデータを用いて、拡散モデル上に LoRA（低ランク適応：Low-Rank Adaptation）モデルを作成・学習します。このノードにより、カスタムの学習パラメーター、オプティマイザー、損失関数を用いたモデルのファインチューニングが可能です。ノードの出力には、学習済み LoRA の重み、学習時の損失履歴、および完了した総学習ステップ数が含まれます。
 
 ## 入力
 
 | パラメーター | データ型 | 必須 | 範囲 | 説明 |
 |-----------|-----------|----------|-------|-------------|
-| `model` | MODEL | はい | - | LoRA を学習させる対象となるモデルです。 |
-| `latents` | LATENT | はい | - | 学習に使用する潜在表現（latents）。モデルのデータセット／入力として機能します。 |
-| `positive` | CONDITIONING | はい | - | 学習に使用する正の条件付けデータです。 |
+| `model` | MODEL | はい | - | LoRA を学習させる対象となるベースモデルです。 |
+| `latents` | LATENT | はい | - | 学習に使用する潜在表現。モデルのデータセット／入力として機能します。リスト入力に対応。 |
+| `positive` | CONDITIONING | はい | - | 学習に使用する正の条件付けデータです。リスト入力に対応。 |
 | `batch_size` | INT | はい | 1–10000 | 学習時に使用するバッチサイズ（デフォルト：1）。 |
-| `grad_accumulation_steps` | INT | はい | 1–1024 | 学習時に使用する勾配蓄積ステップ数（デフォルト：1）。 |
-| `steps` | INT | はい | 1–100000 | LoRA の学習を行うステップ数（デフォルト：16）。 |
-| `learning_rate` | FLOAT | はい | 0.0000001–1.0 | 学習時に使用する学習率（デフォルト：0.0005）。 |
-| `rank` | INT | はい | 1–128 | LoRA 層のランク（デフォルト：8）。 |
+| `grad_accumulation_steps` | INT | はい | 1–1024 | 勾配蓄積ステップ数。複数ステップの勾配を蓄積してから重みを更新することで、VRAM を増やさずに大きなバッチサイズと同等の効果が得られます（デフォルト：1）。 |
+| `steps` | INT | はい | 1–100000 | LoRA の学習を行う総ステップ数（デフォルト：16）。 |
+| `learning_rate` | FLOAT | はい | 0.0000001–1.0 | 学習率（デフォルト：0.0005）。 |
+| `rank` | INT | はい | 1–128 | LoRA 層のランク。値が高いほど多くの詳細をとらえますが、VRAM 使用量が増加します（デフォルト：8）。 |
 | `optimizer` | COMBO | はい | "AdamW"<br />"Adam"<br />"SGD"<br />"RMSprop" | 学習時に使用するオプティマイザー（デフォルト："AdamW"）。 |
 | `loss_function` | COMBO | はい | "MSE"<br />"L1"<br />"Huber"<br />"SmoothL1" | 学習時に使用する損失関数（デフォルト："MSE"）。 |
-| `seed` | INT | はい | 0–18446744073709551615 | 学習時に使用するシード値（LoRA 重みの初期化およびノイズサンプリングにおけるジェネレーターで使用）（デフォルト：0）。 |
-| `training_dtype` | COMBO | はい | "bf16"<br />"fp32" | 学習時に使用するデータ型（デフォルト："bf16"）。 |
-| `lora_dtype` | COMBO | はい | "bf16"<br />"fp32" | LoRA に使用するデータ型（デフォルト："bf16"）。 |
-| `algorithm` | COMBO | はい | 複数の選択肢あり | 学習時に使用するアルゴリズムです。 |
-| `gradient_checkpointing` | BOOLEAN | はい | - | 学習時に勾配チェックポイントを使用するかどうか（デフォルト：True）。 |
-| `existing_lora` | COMBO | はい | 複数の選択肢あり | 追加対象となる既存の LoRA です。「None」を指定すると新規 LoRA が作成されます（デフォルト："[None]"）。 |
+| `seed` | INT | はい | 0–18446744073709551615 | LoRA 重みの初期化およびノイズサンプリングに使用するランダムシード（デフォルト：0）。 |
+| `training_dtype` | COMBO | はい | "bf16"<br />"fp32"<br />"none" | 学習時に使用するデータ型。`none` はモデルのネイティブ精度を保持します。fp16 モデルの場合は GradScaler が自動的に有効になります（デフォルト："bf16"）。 |
+| `lora_dtype` | COMBO | はい | "bf16"<br />"fp32" | LoRA 重みの保存に使用するデータ型（デフォルト："bf16"）。 |
+| `quantized_backward` | BOOLEAN | はい | - | `training_dtype` が `none` で量化モデルを使用する場合、逆伝播で量化行列積を使用します（デフォルト：False）。 |
+| `algorithm` | COMBO | はい | "LoRA"<br />"LoHa"<br />"LoKr"<br />"OFT" | 学習時に使用する重みアダプターアルゴリズム（デフォルト："LoRA"）。 |
+| `gradient_checkpointing` | BOOLEAN | はい | - | 逆伝播時にアクティベーションを再計算して VRAM を削減するグラジエントチェックポイントを有効化します（デフォルト：True）。 |
+| `checkpoint_depth` | INT | はい | 1–5 | グラジエントチェックポイントのモジュールネスト深度。深いほど VRAM 削減量が増加します（デフォルト：1）。 |
+| `offloading` | BOOLEAN | はい | - | 学習中にモデルの重みを CPU にオフロードして VRAM を節約します。`gradient_checkpointing` の有効化が必要です（デフォルト：False）。 |
+| `existing_lora` | COMBO | はい | 複数の選択肢あり | 既存の LoRA ファイルを選択してトレーニングを継続します。総ステップ数は自動的に累積されます。`[None]` は新規 LoRA の作成を意味します（デフォルト："[None]"）。 |
+| `bucket_mode` | BOOLEAN | はい | - | 解像度バケットモードを有効化します。ResolutionBucket ノードからの入力が必要です（デフォルト：False）。 |
+| `bypass_mode` | BOOLEAN | はい | - | 重みを直接変更せずフォワードフックでアダプターを適用します。量化モデルに対応（デフォルト：False）。 |
 
 **注意：** 正の条件付けデータの数は、潜在表現画像の数と一致していなければなりません。複数の画像に対して正の条件付けデータが 1 つだけ与えられた場合、その条件付けデータは自動的にすべての画像に対して繰り返し使用されます。
 
@@ -40,7 +41,6 @@ TrainLoraNode は、提供された潜在表現（latents）および条件付
 
 | 出力名 | データ型 | 説明 |
 |-------------|-----------|-------------|
-| `model_with_lora` | MODEL | 学習済み LoRA を適用済みの元のモデルです。 |
-| `lora` | LORA_MODEL | 保存可能、あるいは他のモデルへ適用可能な学習済み LoRA の重みです。 |
-| `loss` | LOSS_MAP | 時間経過に伴う学習損失値を格納した辞書です。 |
-| `steps` | INT | 完了した総学習ステップ数（既存 LoRA からの先行ステップを含む）です。 |
\ No newline at end of file
+| `lora` | LORA_MODEL | 学習済み LoRA の重み。LoraModelLoader ノードで保存または他のモデルに適用できます。 |
+| `loss_map` | LOSS_MAP | 学習過程の損失履歴。LossGraphNode に接続して可視化できます。 |
+| `steps` | INT | 完了した総学習ステップ数（既存 LoRA からの先行ステップを含む）。 |
diff --git a/ja/tutorials/training/overview.mdx b/ja/tutorials/training/overview.mdx
new file mode 100644
index 000000000..997be42a2
--- /dev/null
+++ b/ja/tutorials/training/overview.mdx
@@ -0,0 +1,137 @@
+---
+title: "ネイティブ LoRA トレーニング"
+sidebarTitle: "LoRA トレーニング"
+description: "ComfyUI の組み込みトレーニングノードを使って LoRA モデルを直接トレーニングする"
+---
+
+ComfyUI は、外部ツールやカスタムノードを必要とせずに LoRA（Low-Rank Adaptation）モデルのトレーニングをネイティブにサポートしています。このページではトレーニングワークフローの全体像を説明します。各ノードの詳細なパラメーター説明については、対応するノードドキュメントをご参照ください。
+
+<Warning>
+トレーニングノードは現在**実験的機能**としてマークされています。機能と動作は将来のリリースで変更される可能性があります。
+</Warning>
+
+## ノード概要
+
+ネイティブ LoRA トレーニングシステムは、**データセットノード**と**トレーニングノード**の2つに分かれています。
+
+### データセットノード
+
+トレーニングデータの準備と管理に使用します：
+
+| ノード | 用途 |
+|------|------|
+| [Load Image Dataset from Folder](/ja/built-in-nodes/LoadImageDataSetFromFolder) | 入力サブフォルダーから画像をバッチ読み込み |
+| [Load Image and Text Dataset from Folder](/ja/built-in-nodes/LoadImageTextDataSetFromFolder) | 画像とキャプションのペアを読み込み（kohya-ss フォルダー構造に対応） |
+| [Make Training Dataset](/ja/built-in-nodes/MakeTrainingDataset) | VAE で画像を、CLIP でテキストをエンコードしてトレーニングデータを生成 |
+| [Resolution Bucket](/ja/built-in-nodes/ResolutionBucket) | 効率的なバッチ学習のために解像度ごとにグループ化 |
+| [Save Training Dataset](/ja/built-in-nodes/SaveTrainingDataset) | エンコード済みデータセットをディスクに保存して再エンコードを省略 |
+| [Load Training Dataset](/ja/built-in-nodes/LoadTrainingDataset) | 保存済みのエンコード済みデータセットをディスクから読み込み |
+
+### トレーニングノード
+
+トレーニングの実行、結果の保存、LoRA の適用に使用します：
+
+| ノード | 用途 |
+|------|------|
+| [Train LoRA](/ja/built-in-nodes/TrainLoraNode) | 潜在表現と条件付けデータから LoRA をトレーニング |
+| [Save LoRA Weights](/ja/built-in-nodes/SaveLoRA) | トレーニング済み LoRA の重みを safetensors ファイルとして出力 |
+| [Load LoRA Model](/ja/built-in-nodes/LoraModelLoader) | トレーニング済み LoRA の重みをモデルに適用（推論用） |
+| [Plot Loss Graph](/ja/built-in-nodes/LossGraphNode) | トレーニング損失の推移を可視化 |
+
+## 必要条件
+
+- 十分な VRAM を持つ GPU（トレーニングは推論よりも多くのメモリを必要とします）
+- `ComfyUI/input/` のサブフォルダーに配置したトレーニング画像
+- ベースモデル（チェックポイント）
+
+## 典型的なトレーニングワークフロー
+
+<Steps>
+<Step title="トレーニング画像の読み込み">
+
+`ComfyUI/input/` 以下のサブフォルダーにトレーニング画像を配置します。
+
+- 画像のみの場合は **Load Image Dataset from Folder** を使用
+- 画像とキャプションのペアには **Load Image and Text Dataset from Folder** を使用（各画像に同名の `.txt` ファイルが必要）
+
+<Tip>
+10〜20 枚の高品質な画像から始めましょう。量より質が重要です。
+</Tip>
+</Step>
+
+<Step title="データセットのエンコード">
+
+画像とテキストを VAE・CLIP モデルとともに **Make Training Dataset** ノードに接続します。`latents` と `conditioning` が出力されます。
+
+同じデータセットを複数回のトレーニングに使う場合は、**Save Training Dataset** で保存しておき、次回以降は **Load Training Dataset** から読み込むことで再エンコードを省略できます。
+</Step>
+
+<Step title="（任意）解像度バケット">
+
+画像のサイズが異なる場合は、エンコード済みデータを **Resolution Bucket** ノードで解像度ごとにグループ化し、Train LoRA ノードで **bucket_mode** を有効にすることで効率的なバッチトレーニングが可能になります。
+</Step>
+
+<Step title="Train LoRA の設定と実行">
+
+モデル、潜在表現、条件付けデータを **Train LoRA** ノードに接続し、必要に応じてパラメーターを調整します。
+
+推奨の初期設定：
+
+| パラメーター | 推奨初期値 |
+|-----------|-----------|
+| `steps` | 100〜500 |
+| `rank` | 8〜16 |
+| `learning_rate` | 0.0001〜0.0005 |
+| `optimizer` | AdamW |
+| `loss_function` | MSE |
+
+ノードは学習済みの `lora` 重み、`loss_map`、完了した `steps` 数を出力します。
+</Step>
+
+<Step title="トレーニング進捗の監視">
+
+`loss_map` を **Plot Loss Graph** ノードに接続して損失曲線を確認します。損失が収束したらトレーニングを停止できます。
+</Step>
+
+<Step title="LoRA の保存とテスト">
+
+`lora` を **Save LoRA Weights** に接続すると `ComfyUI/output/loras/` に `.safetensors` ファイルが保存されます。
+
+推論ワークフローでは **Load LoRA Model** ノードを使ってトレーニング済み LoRA をベースモデルに適用し、結果を確認します。
+</Step>
+</Steps>
+
+## VRAM の最適化
+
+| 手法 | 説明 |
+|------|------|
+| **gradient_checkpointing**（デフォルトで有効） | 逆伝播時にアクティベーションを再計算して VRAM を削減 |
+| **batch_size** を下げる | 最も直接的な VRAM 削減方法 |
+| **grad_accumulation_steps** を上げる | 追加 VRAM なしで大きなバッチサイズと同等の効果 |
+| **offloading** | モデルの重みを CPU に移動。gradient_checkpointing の有効化が必要 |
+| **bypass_mode** | 重みを直接変更せずフォワードフックでアダプターを適用。量化モデル（FP8/FP4）に必要 |
+
+## 量化モデルのトレーニング
+
+FP8/FP4 などの量化モデルで LoRA をトレーニングする場合、**Train LoRA** に以下の設定を使用します：
+
+- `training_dtype`: `none`
+- `quantized_backward`: 有効
+- `bypass_mode`: 有効
+
+推論時は **Load LoRA Model** でも `bypass` を有効にしてください。
+
+## トレーニングの続行
+
+**Train LoRA** の `existing_lora` に保存済みの LoRA ファイルを指定すると、チェックポイントから学習を再開できます。総ステップ数は自動的に累積されます。
+
+## サポートされているアルゴリズム
+
+**Train LoRA** の `algorithm` パラメーターで重みアダプターの種類を選択します：
+
+| アルゴリズム | 説明 |
+|-----------|------|
+| **LoRA** | 標準の低ランク適応。ほとんどのケースで推奨 |
+| **LoHa** | アダマール積を使った低ランク適応 |
+| **LoKr** | クロネッカー積を使った低ランク適応。よりパラメーター効率が高い |
+| **OFT** | 直交ファインチューニング（実験的） |
diff --git a/tutorials/training/lora-training.mdx b/tutorials/training/lora-training.mdx
deleted file mode 100644
index c3aab33bf..000000000
--- a/tutorials/training/lora-training.mdx
+++ /dev/null
@@ -1,192 +0,0 @@
----
-title: "Native LoRA training"
-sidebarTitle: "LoRA Training"
-description: "Train LoRA models directly in ComfyUI using built-in training nodes"
----
-
-ComfyUI includes native support for training LoRA (Low-Rank Adaptation) models without requiring external tools or custom nodes. This guide covers how to use the built-in training nodes to create your own LoRAs.
-
-<Warning>
-The training nodes are marked as **experimental**. Features and behavior may change in future releases.
-</Warning>
-
-## Overview
-
-The native LoRA training system consists of four nodes:
-
-| Node | Category | Purpose |
-|------|----------|---------|
-| **Train LoRA** | training | Trains a LoRA model from latents and conditioning |
-| **Load LoRA Model** | loaders | Applies trained LoRA weights to a model |
-| **Save LoRA Weights** | loaders | Exports LoRA weights to a safetensors file |
-| **Plot Loss Graph** | training | Visualizes training loss over time |
-
-## Requirements
-
-- A GPU with sufficient VRAM (training typically requires more memory than inference)
-- Latent images (encoded from your training dataset)
-- Text conditioning (captions for your training images)
-
-## Basic training workflow
-
-<Steps>
-<Step title="Prepare your dataset">
-Encode your training images to latents using a VAE Encode node. Create text conditioning for each image using CLIP Text Encode.
-
-<Tip>
-For best results, use high-quality images that represent the style or subject you want to train.
-</Tip>
-</Step>
-
-<Step title="Configure the Train LoRA node">
-Connect your model, latents, and conditioning to the Train LoRA node. Set the training parameters:
-
-- **batch_size**: Number of samples per training step (default: 1)
-- **steps**: Total training iterations (default: 16)
-- **learning_rate**: How quickly the model adapts (default: 0.0005)
-- **rank**: LoRA rank - higher values capture more detail but use more memory (default: 8)
-</Step>
-
-<Step title="Run training">
-Execute the workflow. The node will output:
-- **lora**: The trained LoRA weights
-- **loss_map**: Training loss history
-- **steps**: Total steps completed
-</Step>
-
-<Step title="Save and use your LoRA">
-Connect the output to **Save LoRA Weights** to export your trained LoRA. Use **Load LoRA Model** to apply it during inference.
-</Step>
-</Steps>
-
-## Train LoRA node
-
-The main training node that creates LoRA weights from your dataset.
-
-### Inputs
-
-| Parameter | Type | Default | Description |
-|-----------|------|---------|-------------|
-| `model` | MODEL | - | Base model to train the LoRA on |
-| `latents` | LATENT | - | Encoded training images |
-| `positive` | CONDITIONING | - | Text conditioning for training |
-| `batch_size` | INT | 1 | Samples per step (1-10000) |
-| `grad_accumulation_steps` | INT | 1 | Gradient accumulation steps (1-1024) |
-| `steps` | INT | 16 | Training iterations (1-100000) |
-| `learning_rate` | FLOAT | 0.0005 | Learning rate (0.0000001-1.0) |
-| `rank` | INT | 8 | LoRA rank (1-128) |
-| `optimizer` | COMBO | AdamW | Optimizer: AdamW, Adam, SGD, RMSprop |
-| `loss_function` | COMBO | MSE | Loss function: MSE, L1, Huber, SmoothL1 |
-| `seed` | INT | 0 | Random seed for reproducibility |
-| `training_dtype` | COMBO | bf16 | Training precision: bf16, fp32 |
-| `lora_dtype` | COMBO | bf16 | LoRA weight precision: bf16, fp32 |
-| `algorithm` | COMBO | lora | Training algorithm (lora, lokr, oft, etc.) |
-| `gradient_checkpointing` | BOOLEAN | true | Reduces VRAM usage during training |
-| `checkpoint_depth` | INT | 1 | Depth level for gradient checkpointing (1-5) |
-| `offloading` | BOOLEAN | false | Offload model to RAM (requires bypass mode) |
-| `existing_lora` | COMBO | [None] | Continue training from existing LoRA |
-| `bucket_mode` | BOOLEAN | false | Enable resolution bucketing for multi-resolution datasets |
-| `bypass_mode` | BOOLEAN | false | Apply adapters via hooks instead of weight modification |
-
-### Outputs
-
-| Output | Type | Description |
-|--------|------|-------------|
-| `lora` | LORA_MODEL | Trained LoRA weights |
-| `loss_map` | LOSS_MAP | Training loss history |
-| `steps` | INT | Total training steps completed |
-
-## Load LoRA Model node
-
-Applies trained LoRA weights to a diffusion model. Use this instead of the standard Load LoRA node when working with LoRA weights directly from the Train LoRA node.
-
-### Inputs
-
-| Parameter | Type | Default | Description |
-|-----------|------|---------|-------------|
-| `model` | MODEL | - | Base diffusion model |
-| `lora` | LORA_MODEL | - | Trained LoRA weights |
-| `strength_model` | FLOAT | 1.0 | LoRA strength (-100 to 100) |
-| `bypass` | BOOLEAN | false | Apply LoRA without modifying base weights |
-
-### Output
-
-| Output | Type | Description |
-|--------|------|-------------|
-| `model` | MODEL | Model with LoRA applied |
-
-## Save LoRA Weights node
-
-Exports trained LoRA weights to a safetensors file in your output folder.
-
-### Inputs
-
-| Parameter | Type | Default | Description |
-|-----------|------|---------|-------------|
-| `lora` | LORA_MODEL | - | Trained LoRA weights to save |
-| `prefix` | STRING | loras/ComfyUI_trained_lora | Output filename prefix |
-| `steps` | INT | (optional) | Training steps for filename |
-
-The saved file will be named `{prefix}_{steps}_steps_{counter}.safetensors` and placed in your `ComfyUI/output/loras/` folder.
-
-## Plot Loss Graph node
-
-Visualizes training progress by plotting loss values over training steps.
-
-### Inputs
-
-| Parameter | Type | Default | Description |
-|-----------|------|---------|-------------|
-| `loss` | LOSS_MAP | - | Loss history from Train LoRA |
-| `filename_prefix` | STRING | loss_graph | Output filename prefix |
-
-## Training tips
-
-### VRAM optimization
-
-- Enable **gradient_checkpointing** to significantly reduce VRAM usage (enabled by default)
-- Use **bypass_mode** when working with quantized models (FP8)
-- Enable **offloading** to move the model to RAM during training (requires bypass_mode)
-- Lower the **batch_size** if you encounter out-of-memory errors
-
-### Dataset preparation
-
-- Use consistent image dimensions when possible, or enable **bucket_mode** for multi-resolution training
-- Match the number of conditioning inputs to the number of latent images
-- Quality matters more than quantity—start with 10-20 high-quality images
-
-### Training parameters
-
-- **rank**: Start with 8-16 for most use cases. Higher ranks (32-64) capture more detail but may overfit
-- **steps**: Start with 100-500 steps and monitor the loss graph
-- **learning_rate**: The default 0.0005 works well for most cases. Lower values (0.0001) for more stable training
-
-### Continuing training
-
-Select an existing LoRA from the **existing_lora** dropdown to continue training from a previously saved checkpoint. The total step count will accumulate.
-
-## Supported algorithms
-
-The **algorithm** parameter supports multiple weight adapter types:
-
-- **lora**: Standard Low-Rank Adaptation (recommended)
-- **lokr**: LoCon with Kronecker product decomposition
-- **oft**: Orthogonal Fine-Tuning
-
-## Example: Single-subject LoRA
-
-A minimal workflow for training a LoRA on a specific subject:
-
-1. Load your training images with **Load Image**
-2. Encode images to latents with **VAE Encode**
-3. Create captions with **CLIP Text Encode** (e.g., "a photo of [subject]")
-4. Connect to **Train LoRA** with:
-   - steps: 200
-   - rank: 16
-   - learning_rate: 0.0001
-5. Save with **Save LoRA Weights**
-6. Test with **Load LoRA Model** connected to your inference workflow
-
-<Note>
-For training on multiple images with different captions, connect multiple conditioning inputs to match your latent batch size.
-</Note>
diff --git a/tutorials/training/overview.mdx b/tutorials/training/overview.mdx
new file mode 100644
index 000000000..0aa0c01e7
--- /dev/null
+++ b/tutorials/training/overview.mdx
@@ -0,0 +1,137 @@
+---
+title: "Native LoRA Training"
+sidebarTitle: "LoRA Training"
+description: "Train LoRA models directly in ComfyUI using built-in training nodes"
+---
+
+ComfyUI includes native support for training LoRA (Low-Rank Adaptation) models without requiring external tools or custom nodes. This page provides an overview of the training workflow. For detailed parameter documentation, refer to the individual node pages linked below.
+
+<Warning>
+The training nodes are marked as **experimental**. Features and behavior may change in future releases.
+</Warning>
+
+## Node Overview
+
+The native LoRA training system is organized into **dataset nodes** and **training nodes**.
+
+### Dataset Nodes
+
+Used to prepare and manage training data:
+
+| Node | Purpose |
+|------|---------|
+| [Load Image Dataset from Folder](/built-in-nodes/LoadImageDataSetFromFolder) | Batch-load images from an input subfolder |
+| [Load Image and Text Dataset from Folder](/built-in-nodes/LoadImageTextDataSetFromFolder) | Load images with paired captions (supports kohya-ss folder structure) |
+| [Make Training Dataset](/built-in-nodes/MakeTrainingDataset) | Encode images with VAE and text with CLIP to produce training data |
+| [Resolution Bucket](/built-in-nodes/ResolutionBucket) | Group latents by resolution for efficient batched training |
+| [Save Training Dataset](/built-in-nodes/SaveTrainingDataset) | Save encoded dataset to disk to avoid re-encoding on future runs |
+| [Load Training Dataset](/built-in-nodes/LoadTrainingDataset) | Load a previously saved encoded dataset from disk |
+
+### Training Nodes
+
+Used to run training, save results, and apply the LoRA:
+
+| Node | Purpose |
+|------|---------|
+| [Train LoRA](/built-in-nodes/TrainLoraNode) | Train a LoRA from latents and conditioning data |
+| [Save LoRA Weights](/built-in-nodes/SaveLoRA) | Export trained LoRA weights to a safetensors file |
+| [Load LoRA Model](/built-in-nodes/LoraModelLoader) | Apply trained LoRA weights to a model for inference |
+| [Plot Loss Graph](/built-in-nodes/LossGraphNode) | Visualize training loss over time |
+
+## Requirements
+
+- A GPU with sufficient VRAM (training typically requires more memory than inference)
+- Training images placed in a subfolder under `ComfyUI/input/`
+- A base model (checkpoint)
+
+## Typical Training Workflow
+
+<Steps>
+<Step title="Load training images">
+
+Place your training images in a subfolder under `ComfyUI/input/`.
+
+- Use **Load Image Dataset from Folder** for images only
+- Use **Load Image and Text Dataset from Folder** for image–caption pairs (each image needs a matching `.txt` file with the same base name)
+
+<Tip>
+Start with 10–20 high-quality images. Quality matters more than quantity.
+</Tip>
+</Step>
+
+<Step title="Encode the dataset">
+
+Connect images and text to **Make Training Dataset** along with a VAE and CLIP model. This produces `latents` and `conditioning` outputs.
+
+To reuse the same dataset across multiple training runs, save it with **Save Training Dataset** and load it later with **Load Training Dataset** — no re-encoding needed.
+</Step>
+
+<Step title="(Optional) Resolution bucketing">
+
+If your images have varying dimensions, pass the encoded data through **Resolution Bucket** to group them by resolution, then enable **bucket_mode** in the Train LoRA node for efficient batched training.
+</Step>
+
+<Step title="Configure and run Train LoRA">
+
+Connect the model, latents, and conditioning to **Train LoRA** and adjust parameters as needed.
+
+Recommended starting values:
+
+| Parameter | Starting value |
+|-----------|---------------|
+| `steps` | 100–500 |
+| `rank` | 8–16 |
+| `learning_rate` | 0.0001–0.0005 |
+| `optimizer` | AdamW |
+| `loss_function` | MSE |
+
+The node outputs trained `lora` weights, a `loss_map`, and the completed `steps` count.
+</Step>
+
+<Step title="Monitor training progress">
+
+Connect `loss_map` to **Plot Loss Graph** to visualize the loss curve. Once the loss plateaus, additional steps are unlikely to improve the result.
+</Step>
+
+<Step title="Save and test your LoRA">
+
+Connect `lora` to **Save LoRA Weights** to export a `.safetensors` file to `ComfyUI/output/loras/`.
+
+In your inference workflow, use **Load LoRA Model** to apply the trained LoRA to the base model and test the results.
+</Step>
+</Steps>
+
+## VRAM Optimization
+
+| Technique | Notes |
+|-----------|-------|
+| **gradient_checkpointing** (on by default) | Reduces VRAM by recomputing activations during the backward pass |
+| Lower **batch_size** | Most direct way to reduce VRAM |
+| Higher **grad_accumulation_steps** | Accumulates gradients over several steps before each weight update; equivalent to a larger batch size with no extra VRAM cost |
+| **offloading** | Moves model weights to CPU; requires gradient_checkpointing to be enabled |
+| **bypass_mode** | Applies adapters via forward hooks instead of weight modification; required for quantized models (FP8/FP4) |
+
+## Quantized Model Training
+
+To train a LoRA on a quantized model (FP8/FP4), use these settings in **Train LoRA**:
+
+- `training_dtype`: `none`
+- `quantized_backward`: enabled
+- `bypass_mode`: enabled
+
+Also enable `bypass` in **Load LoRA Model** when using the resulting LoRA for inference.
+
+## Continuing Training
+
+Set `existing_lora` in **Train LoRA** to a previously saved LoRA file to resume training from it. The total step count accumulates automatically.
+
+## Supported Algorithms
+
+The `algorithm` parameter in **Train LoRA** selects the weight adapter type:
+
+| Algorithm | Notes |
+|-----------|-------|
+| **LoRA** | Standard low-rank adaptation — recommended for most use cases |
+| **LoHa** | Hadamard-product low-rank adaptation |
+| **LoKr** | Kronecker-product low-rank adaptation; typically needs fewer parameters than standard LoRA |
+| **OFT** | Orthogonal Fine-Tuning (experimental) |
diff --git a/zh/built-in-nodes/LoraModelLoader.mdx b/zh/built-in-nodes/LoraModelLoader.mdx
index 18999d455..b76092ef5 100644
--- a/zh/built-in-nodes/LoraModelLoader.mdx
+++ b/zh/built-in-nodes/LoraModelLoader.mdx
@@ -7,20 +7,23 @@ mode: wide
 ---
 > 本文档由 AI 生成。如果您发现任何错误或有改进建议，欢迎贡献！ [在 GitHub 上编辑](https://github.com/Comfy-Org/embedded-docs/blob/main/comfyui_embedded_docs/docs/LoraModelLoader/zh.md)
 
-LoraModelLoader 节点将训练好的 LoRA（低秩自适应）权重应用于扩散模型。它通过从训练好的 LoRA 模型加载权重并调整其影响强度来修改基础模型。这使您能够定制扩散模型的行为，而无需从头开始重新训练。
+LoraModelLoader 节点将训练好的 LoRA（低秩适应）权重应用于扩散模型。它通过加载 LoRA 权重并调整其影响强度来修改基础模型。当使用来自 **TrainLoraNode** 的训练结果时，请使用此节点而非标准的 LoRA 加载器。
 
 ## 输入参数
 
 | 参数 | 数据类型 | 必需 | 取值范围 | 描述 |
 |-----------|-----------|----------|-------|-------------|
 | `model` | MODEL | 是 | - | 要应用 LoRA 的扩散模型。 |
-| `lora` | LORA_MODEL | 是 | - | 要应用于扩散模型的 LoRA 模型。 |
-| `strength_model` | FLOAT | 是 | -100.0 到 100.0 | 修改扩散模型的强度。该值可以为负数（默认值：1.0）。 |
+| `lora` | LORA_MODEL | 是 | - | 要应用的 LoRA 模型（来自 TrainLoraNode 输出或已加载的 LoRA 文件）。 |
+| `strength_model` | FLOAT | 是 | -100.0 到 100.0 | LoRA 对扩散模型的影响强度，可以为负数（默认值：1.0）。 |
+| `bypass` | BOOLEAN | 是 | - | 启用后，通过前向钩子应用 LoRA 而不修改基础权重，适用于量化模型（默认值：False）。 |
 
-**注意：** 当 `strength_model` 设置为 0 时，节点将返回原始模型，不应用任何 LoRA 修改。
+**注意：**
+- 当 `strength_model` 设置为 0 时，节点返回原始模型，不应用任何 LoRA 修改。
+- 若在训练时启用了 `bypass_mode`，推理时也应将此处的 `bypass` 设置为 `True`。
 
 ## 输出结果
 
 | 输出名称 | 数据类型 | 描述 |
 |-------------|-----------|-------------|
-| `model` | MODEL | 应用了 LoRA 权重后的修改版扩散模型。 |
+| `model` | MODEL | 应用了 LoRA 权重后的扩散模型。 |
diff --git a/zh/built-in-nodes/TrainLoraNode.mdx b/zh/built-in-nodes/TrainLoraNode.mdx
index b7054e424..bfc798fd5 100644
--- a/zh/built-in-nodes/TrainLoraNode.mdx
+++ b/zh/built-in-nodes/TrainLoraNode.mdx
@@ -7,36 +7,40 @@ mode: wide
 ---
 > 本文档由 AI 生成。如果您发现任何错误或有改进建议，欢迎贡献！ [在 GitHub 上编辑](https://github.com/Comfy-Org/embedded-docs/blob/main/comfyui_embedded_docs/docs/TrainLoraNode/zh.md)
 
-TrainLoraNode 使用提供的潜空间数据和条件数据，在扩散模型上创建并训练 LoRA（低秩适应）模型。该节点允许您使用自定义训练参数、优化器和损失函数来微调模型。节点输出应用了 LoRA 的训练后模型、LoRA 权重、训练损失指标以及完成的总训练步数。
+TrainLoraNode 使用提供的潜空间数据和条件数据，在扩散模型上创建并训练 LoRA（低秩适应）模型。该节点允许您使用自定义训练参数、优化器和损失函数来微调模型。节点输出训练好的 LoRA 权重、训练损失指标以及完成的总训练步数。
 
 ## 输入参数
 
 | 参数名 | 数据类型 | 必填 | 取值范围 | 描述 |
 |-----------|-----------|----------|-------|-------------|
 | `model` | MODEL | 是 | - | 要训练 LoRA 的基础模型。 |
-| `latents` | LATENT | 是 | - | 用于训练的潜空间数据，作为模型的数据集/输入。 |
-| `positive` | CONDITIONING | 是 | - | 用于训练的正向条件数据。 |
-| `batch_size` | INT | 是 | 1-10000 | 训练时使用的批大小（默认值：1）。 |
-| `grad_accumulation_steps` | INT | 是 | 1-1024 | 训练时使用的梯度累积步数（默认值：1）。 |
-| `steps` | INT | 是 | 1-100000 | 训练 LoRA 的步数（默认值：16）。 |
-| `learning_rate` | FLOAT | 是 | 0.0000001-1.0 | 训练时使用的学习率（默认值：0.0005）。 |
-| `rank` | INT | 是 | 1-128 | LoRA 层的秩（默认值：8）。 |
+| `latents` | LATENT | 是 | - | 用于训练的潜空间数据，作为模型的数据集/输入。支持列表输入。 |
+| `positive` | CONDITIONING | 是 | - | 用于训练的正向条件数据。支持列表输入。 |
+| `batch_size` | INT | 是 | 1–10000 | 每步训练使用的样本数（默认值：1）。 |
+| `grad_accumulation_steps` | INT | 是 | 1–1024 | 梯度累积步数。累积多步梯度后再更新权重，效果等同于增大 batch size，但不增加显存占用（默认值：1）。 |
+| `steps` | INT | 是 | 1–100000 | 训练 LoRA 的总步数（默认值：16）。 |
+| `learning_rate` | FLOAT | 是 | 0.0000001–1.0 | 训练时使用的学习率（默认值：0.0005）。 |
+| `rank` | INT | 是 | 1–128 | LoRA 层的秩。值越高可捕获更多细节，但显存占用越大（默认值：8）。 |
 | `optimizer` | COMBO | 是 | "AdamW"<br />"Adam"<br />"SGD"<br />"RMSprop" | 训练时使用的优化器（默认值："AdamW"）。 |
 | `loss_function` | COMBO | 是 | "MSE"<br />"L1"<br />"Huber"<br />"SmoothL1" | 训练时使用的损失函数（默认值："MSE"）。 |
-| `seed` | INT | 是 | 0-18446744073709551615 | 训练时使用的随机种子（用于 LoRA 权重初始化和噪声采样的生成器）（默认值：0）。 |
-| `training_dtype` | COMBO | 是 | "bf16"<br />"fp32" | 训练时使用的数据类型（默认值："bf16"）。 |
-| `lora_dtype` | COMBO | 是 | "bf16"<br />"fp32" | LoRA 使用的数据类型（默认值："bf16"）。 |
-| `algorithm` | COMBO | 是 | 多种可选算法 | 训练时使用的算法。 |
-| `gradient_checkpointing` | BOOLEAN | 是 | - | 训练时是否使用梯度检查点（默认值：True）。 |
-| `existing_lora` | COMBO | 是 | 多种可选选项 | 要附加到的现有 LoRA。设置为 None 表示创建新的 LoRA（默认值："[None]"）。 |
+| `seed` | INT | 是 | 0–18446744073709551615 | 随机种子，用于 LoRA 权重初始化和噪声采样（默认值：0）。 |
+| `training_dtype` | COMBO | 是 | "bf16"<br />"fp32"<br />"none" | 训练使用的数据类型。`none` 表示保留模型原生精度；对 fp16 模型会自动启用 GradScaler（默认值："bf16"）。 |
+| `lora_dtype` | COMBO | 是 | "bf16"<br />"fp32" | LoRA 权重存储使用的数据类型（默认值："bf16"）。 |
+| `quantized_backward` | BOOLEAN | 是 | - | 当 `training_dtype` 为 `none` 且使用量化模型时，启用量化矩阵乘法进行反向传播（默认值：False）。 |
+| `algorithm` | COMBO | 是 | "LoRA"<br />"LoHa"<br />"LoKr"<br />"OFT" | 训练时使用的权重适配器算法（默认值："LoRA"）。 |
+| `gradient_checkpointing` | BOOLEAN | 是 | - | 启用梯度检查点，通过重计算激活值来减少显存使用（默认值：True）。 |
+| `checkpoint_depth` | INT | 是 | 1–5 | 梯度检查点的模块嵌套深度，深度越大节省的显存越多（默认值：1）。 |
+| `offloading` | BOOLEAN | 是 | - | 训练期间将模型权重卸载到 CPU 以节省显存，需同时启用 `gradient_checkpointing`（默认值：False）。 |
+| `existing_lora` | COMBO | 是 | 多种可选选项 | 选择现有 LoRA 文件以继续训练，总步数会自动累积。设置为 `[None]` 表示创建新的 LoRA（默认值："[None]"）。 |
+| `bucket_mode` | BOOLEAN | 是 | - | 启用分辨率分桶模式，需要来自 ResolutionBucket 节点的输入（默认值：False）。 |
+| `bypass_mode` | BOOLEAN | 是 | - | 通过前向钩子而非直接修改权重来应用适配器，适用于量化模型（默认值：False）。 |
 
-**注意：** 正向条件数据的数量必须与潜空间图像的数量匹配。如果只提供了一个正向条件数据但有多个图像，该条件数据将自动为所有图像重复使用。
+**注意：** 正向条件数据的数量必须与潜空间图像的数量匹配。如果只提供了一个正向条件但有多张图像，该条件将自动为所有图像重复使用。
 
 ## 输出结果
 
 | 输出名称 | 数据类型 | 描述 |
 |-------------|-----------|-------------|
-| `model_with_lora` | MODEL | 应用了训练后 LoRA 的原始模型。 |
-| `lora` | LORA_MODEL | 训练后的 LoRA 权重，可以保存或应用于其他模型。 |
-| `loss` | LOSS_MAP | 包含随时间变化的训练损失值的字典。 |
-| `steps` | INT | 完成的总训练步数（包括现有 LoRA 的任何先前步数）。 |
+| `lora` | LORA_MODEL | 训练好的 LoRA 权重，可保存或通过 LoraModelLoader 节点应用于其他模型。 |
+| `loss_map` | LOSS_MAP | 训练过程中的损失历史，可连接到 LossGraphNode 进行可视化。 |
+| `steps` | INT | 完成的总训练步数（包括从现有 LoRA 继承的先前步数）。 |
diff --git a/zh/tutorials/training/lora-training.mdx b/zh/tutorials/training/lora-training.mdx
deleted file mode 100644
index 66c04fd76..000000000
--- a/zh/tutorials/training/lora-training.mdx
+++ /dev/null
@@ -1,192 +0,0 @@
----
-title: "原生 LoRA 训练"
-sidebarTitle: "LoRA 训练"
-description: "使用内置训练节点直接在 ComfyUI 中训练 LoRA 模型"
----
-
-ComfyUI 原生支持训练 LoRA（Low-Rank Adaptation）模型，无需外部工具或自定义节点。本指南介绍如何使用内置训练节点创建自己的 LoRA。
-
-<Warning>
-训练节点目前标记为**实验性功能**。功能和行为可能会在未来版本中发生变化。
-</Warning>
-
-## 概述
-
-原生 LoRA 训练系统包含四个节点：
-
-| 节点 | 类别 | 用途 |
-|------|------|------|
-| **Train LoRA** | training | 从潜空间图像和条件训练 LoRA 模型 |
-| **Load LoRA Model** | loaders | 将训练好的 LoRA 权重应用到模型 |
-| **Save LoRA Weights** | loaders | 将 LoRA 权重导出为 safetensors 文件 |
-| **Plot Loss Graph** | training | 可视化训练过程中的损失变化 |
-
-## 系统要求
-
-- 具有足够显存的 GPU（训练通常比推理需要更多内存）
-- 潜空间图像（从训练数据集编码而来）
-- 文本条件（训练图像的描述文字）
-
-## 基础训练流程
-
-<Steps>
-<Step title="准备数据集">
-使用 VAE Encode 节点将训练图像编码为潜空间表示。使用 CLIP Text Encode 为每张图像创建文本条件。
-
-<Tip>
-为获得最佳效果，请使用能代表您想要训练的风格或主题的高质量图像。
-</Tip>
-</Step>
-
-<Step title="配置 Train LoRA 节点">
-将模型、潜空间图像和条件连接到 Train LoRA 节点。设置训练参数：
-
-- **batch_size**：每个训练步骤的样本数（默认：1）
-- **steps**：总训练迭代次数（默认：16）
-- **learning_rate**：模型适应速度（默认：0.0005）
-- **rank**：LoRA 秩 - 更高的值可以捕获更多细节但使用更多内存（默认：8）
-</Step>
-
-<Step title="运行训练">
-执行工作流。节点将输出：
-- **lora**：训练好的 LoRA 权重
-- **loss_map**：训练损失历史
-- **steps**：完成的总步数
-</Step>
-
-<Step title="保存和使用 LoRA">
-将输出连接到 **Save LoRA Weights** 以导出训练好的 LoRA。使用 **Load LoRA Model** 在推理时应用它。
-</Step>
-</Steps>
-
-## Train LoRA 节点
-
-从数据集创建 LoRA 权重的主要训练节点。
-
-### 输入参数
-
-| 参数 | 类型 | 默认值 | 描述 |
-|------|------|--------|------|
-| `model` | MODEL | - | 用于训练 LoRA 的基础模型 |
-| `latents` | LATENT | - | 编码后的训练图像 |
-| `positive` | CONDITIONING | - | 训练用的文本条件 |
-| `batch_size` | INT | 1 | 每步样本数（1-10000） |
-| `grad_accumulation_steps` | INT | 1 | 梯度累积步数（1-1024） |
-| `steps` | INT | 16 | 训练迭代次数（1-100000） |
-| `learning_rate` | FLOAT | 0.0005 | 学习率（0.0000001-1.0） |
-| `rank` | INT | 8 | LoRA 秩（1-128） |
-| `optimizer` | COMBO | AdamW | 优化器：AdamW、Adam、SGD、RMSprop |
-| `loss_function` | COMBO | MSE | 损失函数：MSE、L1、Huber、SmoothL1 |
-| `seed` | INT | 0 | 随机种子，用于可复现性 |
-| `training_dtype` | COMBO | bf16 | 训练精度：bf16、fp32 |
-| `lora_dtype` | COMBO | bf16 | LoRA 权重精度：bf16、fp32 |
-| `algorithm` | COMBO | lora | 训练算法（lora、lokr、oft 等） |
-| `gradient_checkpointing` | BOOLEAN | true | 训练时减少显存使用 |
-| `checkpoint_depth` | INT | 1 | 梯度检查点深度级别（1-5） |
-| `offloading` | BOOLEAN | false | 将模型卸载到内存（需要 bypass 模式） |
-| `existing_lora` | COMBO | [None] | 从现有 LoRA 继续训练 |
-| `bucket_mode` | BOOLEAN | false | 启用分辨率分桶以支持多分辨率数据集 |
-| `bypass_mode` | BOOLEAN | false | 通过钩子应用适配器而非修改权重 |
-
-### 输出
-
-| 输出 | 类型 | 描述 |
-|------|------|------|
-| `lora` | LORA_MODEL | 训练好的 LoRA 权重 |
-| `loss_map` | LOSS_MAP | 训练损失历史 |
-| `steps` | INT | 完成的总训练步数 |
-
-## Load LoRA Model 节点
-
-将训练好的 LoRA 权重应用到扩散模型。当使用来自 Train LoRA 节点的 LoRA 权重时，请使用此节点而非标准的 Load LoRA 节点。
-
-### 输入参数
-
-| 参数 | 类型 | 默认值 | 描述 |
-|------|------|--------|------|
-| `model` | MODEL | - | 基础扩散模型 |
-| `lora` | LORA_MODEL | - | 训练好的 LoRA 权重 |
-| `strength_model` | FLOAT | 1.0 | LoRA 强度（-100 到 100） |
-| `bypass` | BOOLEAN | false | 不修改基础权重直接应用 LoRA |
-
-### 输出
-
-| 输出 | 类型 | 描述 |
-|------|------|------|
-| `model` | MODEL | 应用了 LoRA 的模型 |
-
-## Save LoRA Weights 节点
-
-将训练好的 LoRA 权重导出为 safetensors 文件到输出文件夹。
-
-### 输入参数
-
-| 参数 | 类型 | 默认值 | 描述 |
-|------|------|--------|------|
-| `lora` | LORA_MODEL | - | 要保存的训练好的 LoRA 权重 |
-| `prefix` | STRING | loras/ComfyUI_trained_lora | 输出文件名前缀 |
-| `steps` | INT | （可选） | 用于文件名的训练步数 |
-
-保存的文件将命名为 `{prefix}_{steps}_steps_{counter}.safetensors` 并放置在 `ComfyUI/output/loras/` 文件夹中。
-
-## Plot Loss Graph 节点
-
-通过绘制训练步骤中的损失值来可视化训练进度。
-
-### 输入参数
-
-| 参数 | 类型 | 默认值 | 描述 |
-|------|------|--------|------|
-| `loss` | LOSS_MAP | - | 来自 Train LoRA 的损失历史 |
-| `filename_prefix` | STRING | loss_graph | 输出文件名前缀 |
-
-## 训练技巧
-
-### 显存优化
-
-- 启用 **gradient_checkpointing** 可显著减少显存使用（默认已启用）
-- 使用量化模型（FP8）时使用 **bypass_mode**
-- 启用 **offloading** 在训练期间将模型移至内存（需要 bypass_mode）
-- 如果遇到内存不足错误，请降低 **batch_size**
-
-### 数据集准备
-
-- 尽可能使用一致的图像尺寸，或启用 **bucket_mode** 进行多分辨率训练
-- 确保条件输入数量与潜空间图像数量匹配
-- 质量比数量更重要——从 10-20 张高质量图像开始
-
-### 训练参数
-
-- **rank**：大多数情况从 8-16 开始。更高的秩（32-64）可捕获更多细节但可能过拟合
-- **steps**：从 100-500 步开始，监控损失图
-- **learning_rate**：默认值 0.0005 适用于大多数情况。更低的值（0.0001）可获得更稳定的训练
-
-### 继续训练
-
-从 **existing_lora** 下拉菜单中选择现有 LoRA 以从之前保存的检查点继续训练。总步数将累积。
-
-## 支持的算法
-
-**algorithm** 参数支持多种权重适配器类型：
-
-- **lora**：标准低秩适应（推荐）
-- **lokr**：带 Kronecker 积分解的 LoCon
-- **oft**：正交微调
-
-## 示例：单主题 LoRA
-
-训练特定主题 LoRA 的最小工作流：
-
-1. 使用 **Load Image** 加载训练图像
-2. 使用 **VAE Encode** 将图像编码为潜空间表示
-3. 使用 **CLIP Text Encode** 创建描述文字（例如 "a photo of [subject]"）
-4. 连接到 **Train LoRA** 并设置：
-   - steps: 200
-   - rank: 16
-   - learning_rate: 0.0001
-5. 使用 **Save LoRA Weights** 保存
-6. 使用 **Load LoRA Model** 连接到推理工作流进行测试
-
-<Note>
-当使用不同描述训练多张图像时，请连接多个条件输入以匹配潜空间批次大小。
-</Note>
diff --git a/zh/tutorials/training/overview.mdx b/zh/tutorials/training/overview.mdx
new file mode 100644
index 000000000..27a1f2395
--- /dev/null
+++ b/zh/tutorials/training/overview.mdx
@@ -0,0 +1,137 @@
+---
+title: "原生 LoRA 训练"
+sidebarTitle: "LoRA 训练"
+description: "使用内置训练节点直接在 ComfyUI 中训练 LoRA 模型"
+---
+
+ComfyUI 原生支持训练 LoRA（Low-Rank Adaptation）模型，无需外部工具或自定义节点。本文档提供训练流程的整体概览，各节点的详细参数说明请参阅对应的节点文档。
+
+<Warning>
+训练节点目前标记为**实验性功能**。功能和行为可能会在未来版本中发生变化。
+</Warning>
+
+## 节点概览
+
+原生 LoRA 训练系统由**数据集节点**和**训练节点**两个部分组成。
+
+### 数据集节点
+
+用于准备和管理训练数据：
+
+| 节点 | 用途 |
+|------|------|
+| [Load Image Dataset from Folder](/zh/built-in-nodes/LoadImageDataSetFromFolder) | 从输入文件夹批量加载图像 |
+| [Load Image and Text Dataset from Folder](/zh/built-in-nodes/LoadImageTextDataSetFromFolder) | 加载图像及对应的文字说明（支持 kohya-ss 目录结构）|
+| [Make Training Dataset](/zh/built-in-nodes/MakeTrainingDataset) | 用 VAE 和 CLIP 编码图像与文本，生成训练数据 |
+| [Resolution Bucket](/zh/built-in-nodes/ResolutionBucket) | 按分辨率分桶，以便高效批量训练 |
+| [Save Training Dataset](/zh/built-in-nodes/SaveTrainingDataset) | 将编码后的数据集保存到磁盘，避免重复编码 |
+| [Load Training Dataset](/zh/built-in-nodes/LoadTrainingDataset) | 从磁盘加载已保存的编码数据集 |
+
+### 训练节点
+
+用于执行训练、保存和推理：
+
+| 节点 | 用途 |
+|------|------|
+| [Train LoRA](/zh/built-in-nodes/TrainLoraNode) | 从潜空间图像和条件数据训练 LoRA 模型 |
+| [Save LoRA Weights](/zh/built-in-nodes/SaveLoRA) | 将 LoRA 权重导出为 safetensors 文件 |
+| [Load LoRA Model](/zh/built-in-nodes/LoraModelLoader) | 将训练好的 LoRA 权重应用到模型（用于推理）|
+| [Plot Loss Graph](/zh/built-in-nodes/LossGraphNode) | 可视化训练过程中的损失变化 |
+
+## 系统要求
+
+- 具有足够显存的 GPU（训练通常比推理需要更多内存）
+- 训练图像（存放在 `ComfyUI/input/` 的子文件夹中）
+- 基础模型（checkpoint）
+
+## 典型训练流程
+
+<Steps>
+<Step title="加载训练图像">
+
+将训练图像放入 `ComfyUI/input/` 下的子文件夹。
+
+- 若只需图像，使用 **Load Image Dataset from Folder**
+- 若需要图文配对，使用 **Load Image and Text Dataset from Folder**（每张图像需对应同名 `.txt` 说明文件）
+
+<Tip>
+从 10–20 张高质量图像开始，质量比数量更重要。
+</Tip>
+</Step>
+
+<Step title="编码数据集">
+
+将图像和文本连接到 **Make Training Dataset** 节点，提供 VAE 和 CLIP 模型进行编码，得到 `latents` 和 `conditioning` 输出。
+
+若需要多次训练同一数据集，可用 **Save Training Dataset** 将编码结果保存到磁盘，后续直接用 **Load Training Dataset** 加载，避免重复编码。
+</Step>
+
+<Step title="（可选）分辨率分桶">
+
+若训练图像尺寸不一致，将编码后的数据通过 **Resolution Bucket** 节点按分辨率分组，并在 Train LoRA 节点中启用 **bucket_mode**，以实现高效的批量训练。
+</Step>
+
+<Step title="配置并运行 Train LoRA">
+
+将模型、潜空间数据和条件数据连接到 **Train LoRA** 节点，根据需要调整训练参数后执行工作流。
+
+常用起始配置：
+
+| 参数 | 推荐起始值 |
+|------|-----------|
+| `steps` | 100–500 |
+| `rank` | 8–16 |
+| `learning_rate` | 0.0001–0.0005 |
+| `optimizer` | AdamW |
+| `loss_function` | MSE |
+
+节点将输出训练好的 `lora` 权重、`loss_map` 损失历史和完成的 `steps` 数。
+</Step>
+
+<Step title="监控训练进度">
+
+将 `loss_map` 连接到 **Plot Loss Graph** 节点，查看训练损失曲线。损失趋于平稳后即可停止训练。
+</Step>
+
+<Step title="保存并测试 LoRA">
+
+将 `lora` 输出连接到 **Save LoRA Weights** 导出为 `.safetensors` 文件，保存在 `ComfyUI/output/loras/` 目录中。
+
+在推理工作流中，使用 **Load LoRA Model** 节点将训练好的 LoRA 应用到基础模型进行测试。
+</Step>
+</Steps>
+
+## 显存优化
+
+| 方案 | 说明 |
+|------|------|
+| **gradient_checkpointing**（默认启用） | 通过重计算激活值减少显存占用 |
+| 降低 **batch_size** | 最直接的减少显存方式 |
+| 增大 **grad_accumulation_steps** | 在不增加显存的前提下等效增大 batch size |
+| **offloading** | 将模型权重卸载到 CPU，需同时启用 gradient_checkpointing |
+| **bypass_mode** | 通过前向钩子应用适配器，适用于量化模型（FP8/FP4）|
+
+## 量化模型训练
+
+对 FP8/FP4 等量化模型训练 LoRA，推荐在 **Train LoRA** 中使用以下配置：
+
+- `training_dtype`: `none`
+- `quantized_backward`: 启用
+- `bypass_mode`: 启用
+
+推理时在 **Load LoRA Model** 中也需启用 `bypass`。
+
+## 继续训练
+
+在 **Train LoRA** 的 `existing_lora` 参数中选择已保存的 LoRA 文件，即可从之前的检查点继续训练，总步数会自动累积。
+
+## 支持的训练算法
+
+通过 **Train LoRA** 的 `algorithm` 参数可选择不同的权重适配器：
+
+| 算法 | 说明 |
+|------|------|
+| **LoRA** | 标准低秩适应，推荐用于大多数场景 |
+| **LoHa** | 基于 Hadamard 积的低秩适应 |
+| **LoKr** | 基于 Kronecker 积的低秩适应，参数效率更高 |
+| **OFT** | 正交微调（实验性） |