🚀 Feature Request: Add grammar support (BNF/xgrammar) to vLLM backend
📌 Overview
I would like to request the addition of formal grammar support (via BNF or xgrammar) to the vLLM backend in LocalAI. This feature would allow users to enforce structured outputs from LLMs using context-free grammars, which is particularly useful for generating JSON, code, XML, or other machine-readable formats with strict syntactic rules.
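To make the request concrete, here is a sketch of the kind of grammar involved, written in GBNF (the BNF dialect used by llama.cpp). Rule names and the exact shape are illustrative only; it constrains output to a flat JSON object of string fields:

```gbnf
root   ::= "{" ws pair (ws "," ws pair)* ws "}"
pair   ::= string ws ":" ws string
string ::= "\"" [a-zA-Z0-9 ]* "\""
ws     ::= [ \t\n]*
```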
📚 Background
- vLLM Documentation: The official vLLM documentation highlights features such as speculative decoding and PagedAttention, but grammar-based structured output is not among them.
- Current Limitation in LocalAI: While LocalAI already supports constrained grammars through the `llama.cpp` backend (via `--grammar` or the `grammar` parameter in the API), this functionality is not available for the vLLM backend.
- Use Case Example: Users want to generate valid JSON responses for API integrations, or generate Python code that can be directly executed, but are limited by vLLM's lack of grammar enforcement.
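For reference, this is roughly how a grammar is supplied today through the llama.cpp backend: as a field in the completion request body. The model name and grammar string below are illustrative, and the exact field shape may vary across LocalAI versions:

```json
{
  "model": "llama-3-8b-instruct",
  "messages": [
    { "role": "user", "content": "Is the sky blue? Answer yes or no." }
  ],
  "grammar": "root ::= \"yes\" | \"no\""
}
```

The request here is to have the same field honored when the configured backend is vLLM.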
✅ Acceptance Criteria
- A grammar can be passed through the API (e.g. `grammar: { type: "xgrammar", value: "..." }`).
- An example configuration is included (e.g. `examples/grammar-vllm.json`).
🔗 Relevant Links
🧑‍💻 Suggested Implementation
- Integrate `xgrammar` or the grammar-parsing logic from `llama.cpp` into the vLLM backend.
- Use vLLM's native speculative decoding and grammar-based sampling via `vllm.SamplingParams`.
- Ensure the grammar is passed through the HTTP API layer as a JSON string.
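The enforcement mechanism itself is backend-agnostic: at each decoding step, tokens that cannot legally extend the grammar are masked out of the logits before sampling. The following is a toy, self-contained sketch of that idea in plain Python (a hypothetical two-token grammar and a hand-rolled state machine, not vLLM or xgrammar code):

```python
# Sketch of grammar-constrained sampling: at each step, tokens that cannot
# extend any valid sentence of the grammar are masked out before sampling.
# Toy grammar (hypothetical, for illustration):
#   value := "{" "}" | "[" "]"
# tracked with a tiny state machine instead of a full CFG parser.

ALLOWED = {
    "start": {"{", "["},   # a value must open with a brace or bracket
    "{":     {"}"},        # after "{" only "}" is valid in this toy grammar
    "[":     {"]"},        # after "[" only "]"
    "end":   set(),        # nothing may follow a completed value
}

def mask_logits(state, logits):
    """Set scores of grammar-invalid tokens to -inf so they are never picked."""
    allowed = ALLOWED[state]
    return {tok: (score if tok in allowed else float("-inf"))
            for tok, score in logits.items()}

def advance(state, token):
    """Advance the grammar state machine by one emitted token."""
    if state == "start":
        return token          # "{" or "[" opens the corresponding pair
    return "end"

# Greedy decode under the grammar constraint.
state, out = "start", []
logits = {"{": 0.1, "}": 0.9, "[": 0.2, "]": 0.8}  # raw model scores
while state != "end":
    masked = mask_logits(state, logits)
    token = max(masked, key=masked.get)
    out.append(token)
    state = advance(state, token)

print("".join(out))  # "[]" — "}" had the top raw score but was invalid at start
```

In a real integration the state machine would be driven by the compiled grammar (e.g. an xgrammar matcher) and the mask applied inside vLLM's sampling step, but the shape of the hook is the same.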
🏷️ Labels
`enhancement`, `vLLM`, `grammar`, `structured-output`
👥 Tagging
@mudler (project maintainer)
@U08FLGN0QJE (core contributor, vLLM-related work)
✅ Note: This request is inspired by similar issues in other frameworks (e.g., HuggingFace Transformers, llama.cpp) and aligns with the growing need for reliable structured output generation in LLM applications.