Cap DiffSinger vocoder num_mel_bins at 255 by KakaruHayate · Pull Request #2198 · openutau/OpenUtau

KakaruHayate · 2026-06-15T13:20:22Z

Problem

The DiffSinger vocoder currently loads any onnx model that exposes a mel input plus optional f0, with no bounds checking on its declared configuration. As a result, a malformed voicebank can ship a vocoder slot that points at a non-vocoder onnx model (for example, a DiffSinger acoustic model masquerading as a vocoder, or some other graph that requires an extra-large mel tensor to drive its internal logic).

Real DiffSinger vocoders in the wild use 80 or 128 mel bins.

Fix

Add a hard-coded upper bound DsVocoderConfig.MaxMelBins = 255 and reject both the vocoder and the acoustic-model dsconfig when num_mel_bins falls outside [1, 255].

The check fires in two places:

DsVocoder constructor: rejects the voicebank at load time with the same MessageCustomizableException other vocoder load failures use, so the entry never reaches the onnx session.
DiffSingerRenderer.InvokeDiffsinger: redundant runtime check on both the vocoder and the acoustic model cfg, alongside the existing mel_base / mel_scale validations, as defense in depth.

The threshold is intentionally hard-coded rather than a yaml field: lowering it is a deliberate code change, not something a voicebank author can override.

The DiffSinger vocoder currently loads any onnx model that exposes a mel input plus optional f0, with no bounds checking on its declared configuration. As a result, a malformed voicebank can ship a vocoder slot that points at a non-vocoder onnx model (e.g. a DiffSinger acoustic model masquerading as a vocoder, or some other graph that requires an extra-large mel tensor to drive its internal logic). Real DiffSinger vocoders in the wild use 80 or 128 mel bins. Add a hard-coded upper bound DsVocoderConfig.MaxMelBins = 255 and reject both the vocoder and the acoustic-model dsconfig when num_mel_bins falls outside [1, 255]. The check fires in two places: - DsVocoder constructor: rejects the voicebank at load time with the same MessageCustomizableException other vocoder load failures use, so the entry never reaches the onnx session. - DiffSingerRenderer.InvokeDiffsinger: redundant runtime check on both the vocoder and the acoustic model cfg, alongside the existing mel_base / mel_scale validations, as defense in depth. The threshold is intentionally hard-coded rather than a yaml field: lowering it is a deliberate code change, not something a voicebank author can override.

HuanLinOTO

Review: LGTM with nits

整体评价：功能目标清晰，安全加固方向正确，双层检查设计合理，阈值选择恰当（255）。同意合并，但提几个小建议：

建议修复

1. 错误消息措辞不统一

DsVocoder 构造器：got {config.num_mel_bins}
InvokeDiffsinger：but got {vocoder.num_mel_bins}
建议统一为 but got，方便 grep。

2. 边界条件可改为 <= 0
当前用 < 1，逻辑等价但语义可以更清晰。num_mel_bins 是 int 且默认 128，正常情况下不会为 0，<= 0 更直观：

if (config.num_mel_bins <= 0 || config.num_mel_bins > MaxMelBins)

建议补充

3. 建议补充单元测试
此类安全边界检查非常适合单元测试。建议覆盖：

num_mel_bins = 0 或负数 → 应拒绝
num_mel_bins = 256 → 应拒绝
num_mel_bins = 1 和 255 → 应接受（边界合法值）

Commit: 0ed2251
Files: DiffSingerVocoder.cs / DiffSingerRenderer.cs (+27 lines)

内容由 AI 生成，请仔细甄别

yqzhishen · 2026-06-15T15:20:08Z

This check also prevents vulnerabilities like extremely high mel bins (the only configurable dynamic dimension) that will explode the user's disk.

HuanLinOTO reviewed Jun 15, 2026

View reviewed changes

Cadlaxa added diffsinger feature-improvement labels Jun 15, 2026

yqzhishen approved these changes Jun 15, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Cap DiffSinger vocoder num_mel_bins at 255#2198

Cap DiffSinger vocoder num_mel_bins at 255#2198
KakaruHayate wants to merge 1 commit into
openutau:masterfrom
KakaruHayate:fix/ds-vocoder-block-non-vocoder-models

KakaruHayate commented Jun 15, 2026

Uh oh!

HuanLinOTO left a comment •

edited

Loading

Uh oh!

yqzhishen commented Jun 15, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Conversation

KakaruHayate commented Jun 15, 2026

Problem

Fix

Uh oh!

HuanLinOTO left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Review: LGTM with nits

建议修复

建议补充

Uh oh!

yqzhishen commented Jun 15, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

HuanLinOTO left a comment •

edited

Loading