Name and Version
llama-server version b9524
Operating systems
Windows
GGML backends
CUDA
Hardware
RTX 2080 Ti
Models
gemma-4-12b-it-UD-Q4_K_XL.gguf
Problem description & steps to reproduce
llama-server ignores --chat-template-file when --mmproj is provided.
Expected:
--chat-template-file should be validated and should override the template in the GGUF metadata, regardless of --mmproj.
Actual:
When --mmproj is present, llama-server uses the tokenizer.chat_template embedded in the GGUF and silently ignores --chat-template-file. Passing a nonexistent template file path produces no error. Without --mmproj, the same bad path produces an error, and a valid custom template is loaded correctly.
Repro:
- Run llama-server with -m model.gguf --jinja --chat-template-file nonexistent.jinja
Result: error, as expected.
- Run llama-server with -m model.gguf --mmproj mmproj.gguf --jinja --chat-template-file nonexistent.jinja
Result: no error, server starts and uses GGUF template.
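The two repro steps above could be scripted roughly as follows. This is only a sketch: the helper names (build_command, exits_early, run_repro), the 30-second wait, and the assumption that llama-server is on PATH are mine, not part of llama.cpp. It uses only the flags from the steps above and treats "process exits on its own" as a rough proxy for "the bad template path was rejected"; an early exit can also come from an unrelated failure such as a missing model file, so the logs still need checking.

```python
import subprocess


def build_command(model, template_file, mmproj=None, server="llama-server"):
    """Assemble the llama-server argument list used in the repro steps."""
    cmd = [server, "-m", model]
    if mmproj is not None:
        cmd += ["--mmproj", mmproj]
    cmd += ["--jinja", "--chat-template-file", template_file]
    return cmd


def exits_early(cmd, wait_seconds=30):
    """Return True if the process exits on its own within wait_seconds.

    A server that rejects the bad template path is expected to exit;
    one that ignores it keeps running until it is killed here.
    The wait time is an arbitrary assumption and may need raising
    on machines where the model loads slowly.
    """
    proc = subprocess.Popen(
        cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
    )
    try:
        proc.wait(timeout=wait_seconds)
        return True
    except subprocess.TimeoutExpired:
        proc.kill()
        proc.wait()
        return False


def run_repro(model="model.gguf", mmproj="mmproj.gguf"):
    """Run both repro steps with a template path that does not exist."""
    bad_template = "nonexistent.jinja"
    for projector in (None, mmproj):
        cmd = build_command(model, bad_template, mmproj=projector)
        outcome = "exited" if exits_early(cmd) else "still running"
        print(" ".join(cmd), "->", outcome)
```

Calling run_repro() with real model and projector paths runs both steps in order. If the behavior matches this report, the first command should show as exited and the second as still running.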
First Bad Commit
No response
Relevant log output
n/a