🚀 Feature Request: Add grammar support (BNF/xgrammar) to vLLM backend
📌 Overview
I would like to request the addition of formal grammar support (via BNF or xgrammar) to the vLLM backend in LocalAI. This feature would allow users to enforce structured outputs from LLMs using context-free grammars, which is particularly useful for generating JSON, code, XML, or other machine-readable formats with strict syntactic rules.
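To make the request concrete, here is a sketch of the kind of grammar involved, written in GBNF (the BNF dialect used by llama.cpp). Rule names and the exact shape are illustrative only; it constrains output to a flat JSON object of string fields:

```gbnf
root   ::= "{" ws pair (ws "," ws pair)* ws "}"
pair   ::= string ws ":" ws string
string ::= "\"" [a-zA-Z0-9 ]* "\""
ws     ::= [ \t\n]*
```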
📚 Background
- vLLM Documentation: The official vLLM documentation highlights features such as speculative decoding and PagedAttention, but grammar-based structured output is not among them.
- Current Limitation in LocalAI: While LocalAI already supports constrained grammars through the `llama.cpp` backend (via `--grammar` or the `grammar` parameter in the API), this functionality is not available for the vLLM backend.
- Use Case Example: Users want to generate valid JSON responses for API integrations, or generate Python code that can be directly executed, but are limited by vLLM's lack of grammar enforcement.
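For reference, this is roughly how a grammar is supplied today through the llama.cpp backend: as a field in the completion request body. The model name and grammar string below are illustrative, and the exact field shape may vary across LocalAI versions:

```json
{
  "model": "llama-3-8b-instruct",
  "messages": [
    { "role": "user", "content": "Is the sky blue? Answer yes or no." }
  ],
  "grammar": "root ::= \"yes\" | \"no\""
}
```

The request here is to have the same field honored when the configured backend is vLLM.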
✅ Acceptance Criteria
- A grammar can be passed through the API (e.g. `grammar: { type: "xgrammar", value: "..." }`).
- An example configuration is included (e.g. `examples/grammar-vllm.json`).
🔗 Relevant Links
🧑‍💻 Suggested Implementation
- Integrate `xgrammar` or the grammar-parsing logic from `llama.cpp` into the vLLM backend.
- Use vLLM's native speculative decoding and grammar-based sampling via `vllm.SamplingParams`.
- Ensure the grammar is passed through the HTTP API layer as a JSON string.
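The enforcement mechanism itself is backend-agnostic: at each decoding step, tokens that cannot legally extend the grammar are masked out of the logits before sampling. The following is a toy, self-contained sketch of that idea in plain Python (a hypothetical two-token grammar and a hand-rolled state machine, not vLLM or xgrammar code):

```python
# Sketch of grammar-constrained sampling: at each step, tokens that cannot
# extend any valid sentence of the grammar are masked out before sampling.
# Toy grammar (hypothetical, for illustration):
#   value := "{" "}" | "[" "]"
# tracked with a tiny state machine instead of a full CFG parser.

ALLOWED = {
    "start": {"{", "["},   # a value must open with a brace or bracket
    "{":     {"}"},        # after "{" only "}" is valid in this toy grammar
    "[":     {"]"},        # after "[" only "]"
    "end":   set(),        # nothing may follow a completed value
}

def mask_logits(state, logits):
    """Set scores of grammar-invalid tokens to -inf so they are never picked."""
    allowed = ALLOWED[state]
    return {tok: (score if tok in allowed else float("-inf"))
            for tok, score in logits.items()}

def advance(state, token):
    """Advance the grammar state machine by one emitted token."""
    if state == "start":
        return token          # "{" or "[" opens the corresponding pair
    return "end"

# Greedy decode under the grammar constraint.
state, out = "start", []
logits = {"{": 0.1, "}": 0.9, "[": 0.2, "]": 0.8}  # raw model scores
while state != "end":
    masked = mask_logits(state, logits)
    token = max(masked, key=masked.get)
    out.append(token)
    state = advance(state, token)

print("".join(out))  # "[]" — "}" had the top raw score but was invalid at start
```

In a real integration the state machine would be driven by the compiled grammar (e.g. an xgrammar matcher) and the mask applied inside vLLM's sampling step, but the shape of the hook is the same.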
🏷️ Labels
`enhancement`, `vLLM`, `grammar`, `structured-output`
👥 Tagging
@mudler (project maintainer)
@U08FLGN0QJE (core contributor, vLLM-related work)
✅ Note: This request is inspired by similar issues in other frameworks (e.g., HuggingFace Transformers, llama.cpp) and aligns with the growing need for reliable structured output generation in LLM applications.