Production-pattern Red Hat OpenShift AI 3.4.0 platform with bare-metal ESXi, GPU passthrough, KServe RawDeployment, DeepSeek R1 inference at 12–17 tok/s
Updated Jun 26, 2026 (HTML)
A privacy-first Slack bot that integrates local LLMs (Ollama/vLLM/LM Studio) with advanced tools like ComfyUI image generation, SearXNG local search, and on-demand RAG memory. It can analyze files, execute Python code, and generate music, all while keeping your data inside your own network.