Comprehensive AI inference and tooling stack for EU-regulated on-premises and hybrid platform operations, deployed as a single Helm chart.
Includes Open WebUI, Ollama, Qdrant, Apache Tika, SearXNG, Valkey, Open Terminal, MCPO, LangGraph, PostgreSQL (standalone, CloudNativePG, or external), Authelia for OIDC/SSO/MFA, an async ingestion worker, and an OpenTelemetry Collector with PII redaction.
Designed for governance-as-code environments with PSA restricted baseline, NetworkPolicy default-deny, and OpenTelemetry instrumentation hooks.
The chart composes seven logical planes behind a single ingress, modelled as a
best-practice reference for Open WebUI and agentic workloads. For chart-managed
application components, solid borders mark default-enabled components and
dashed borders mark opt-in components; edge concerns such as ingress
exposure vary by deployment and are shown solid for layout reasons. Edges drawn
dashed indicate conditional or optional dependencies (e.g., Authelia uses
Valkey only when enabled and Postgres only when authelia.storage=postgres).
Tiering (T0/T1/T2) follows the Component Tiers table.
For the long-form rationale, integration patterns, and production hardening checklist, see docs/architecture/REFERENCE.md.
Two flows drive the design:
- Conversational + RAG — Open WebUI consumes any OpenAI-compatible model (Ollama for local inference, External APIs for hosted), retrieves with Qdrant (with web fallback via SearXNG), and exposes tools through MCPO so any MCP server is reachable as an OpenAPI tool.
- Agentic — LangGraph runs stateful, long-horizon agents with PostgreSQL as the checkpointer/store, Qdrant for semantic memory, and the same MCPO tool surface so tool definitions are shared across UI and agent runtimes. Async document ingestion is decoupled via the Valkey-backed Ingestion Worker so the request path stays non-blocking.
```mermaid
graph LR
  Client([User / Client])

  subgraph Edge["Edge & Identity (T0)"]
    Ingress[Ingress / Envoy Gateway]
    Authelia["Authelia<br/>OIDC · MFA"]
  end

  subgraph Experience["Experience (T1)"]
    OpenWebUI["Open WebUI<br/>chat · pipelines · RAG"]
    Workbench["Workbench<br/>GPU notebooks"]
  end

  subgraph Inference["Inference (T1) — OpenAI-compatible"]
    Ollama["Ollama<br/>local LLM + embeddings"]
    ExternalAPIs["External APIs<br/>OpenAI · Anthropic · Gemini · …"]
  end

  subgraph Knowledge["Knowledge & Retrieval"]
    Qdrant["Qdrant<br/>vector DB · hybrid search (T1)"]
    Tika["Tika<br/>extract · OCR (T2)"]
    SearXNG["SearXNG<br/>web search (T2)"]
  end

  subgraph Agentic["Agentic Runtime & Tools"]
    LangGraph["LangGraph<br/>stateful agents · HITL (T1)"]
    MCPO["MCPO<br/>MCP → OpenAPI gateway (T2)"]
    OpenTerminal["Open Terminal<br/>sandboxed shell (T2)"]
  end

  subgraph Async["Async, Memory & State"]
    IngestionWorker["Ingestion Worker<br/>extract → embed → upsert (T2)"]
    Valkey["Valkey<br/>cache · session · streams (T2)"]
    Postgres["PostgreSQL<br/>checkpointer · store (T2)"]
  end

  subgraph Telemetry["Observability (T0)"]
    OTel["OTel Collector<br/>GenAI semconv · PII redaction"]
  end

  Client --> Ingress
  Ingress --> Authelia
  Ingress --> OpenWebUI
  Ingress --> Workbench
  Authelia -.->|OIDC| OpenWebUI
  Authelia -.->|sessions, if enabled| Valkey
  Authelia -.->|storage=postgres| Postgres

  %% Open WebUI: conversational + RAG path
  OpenWebUI -->|OpenAI API| Ollama
  OpenWebUI -->|OpenAI API| ExternalAPIs
  OpenWebUI -->|OpenAI API| LangGraph
  OpenWebUI -->|retrieve| Qdrant
  OpenWebUI -->|extract on upload| Tika
  OpenWebUI -->|web fallback| SearXNG
  OpenWebUI -->|tool calls| MCPO
  OpenWebUI -->|sessions| Valkey
  OpenWebUI -.->|sandbox exec| OpenTerminal
  Workbench --> Ollama
  Workbench --> Qdrant
  Workbench --> Tika
  Workbench --> SearXNG

  %% Agentic path: shared tool surface, persistent memory
  LangGraph -->|inference| Ollama
  LangGraph -->|inference| ExternalAPIs
  LangGraph -->|semantic memory| Qdrant
  LangGraph -->|web| SearXNG
  LangGraph -->|tools| MCPO
  LangGraph -->|checkpointer · store| Postgres
  LangGraph -.->|sandbox exec| OpenTerminal

  %% Async ingestion (non-blocking)
  OpenWebUI -.->|enqueue| Valkey
  IngestionWorker -->|XREAD| Valkey
  IngestionWorker -->|extract| Tika
  IngestionWorker -->|embed| Ollama
  IngestionWorker -->|upsert| Qdrant

  %% MCPO fans out to MCP servers (any tool)
  MCPO -.->|stdio · http| OpenTerminal

  %% Telemetry
  OpenWebUI -.->|OTLP| OTel
  Workbench -.->|OTLP| OTel
  LangGraph -.->|OTLP| OTel
  MCPO -.->|OTLP| OTel
  Ollama -.->|OTLP| OTel
  Qdrant -.->|OTLP| OTel
  IngestionWorker -.->|OTLP| OTel
  Authelia -.->|OTLP| OTel

  classDef optIn stroke-dasharray: 5 5
  class Authelia,Workbench,LangGraph,MCPO,OpenTerminal,IngestionWorker,Postgres,ExternalAPIs,OTel optIn
```
- One model abstraction. Local (Ollama), hosted (External APIs), and agentic (LangGraph) all expose an OpenAI-compatible interface so Open WebUI can route to any of them from the model picker without bespoke adapters.
- MCP as the universal tool layer. MCPO is the single tool gateway shared by Open WebUI and LangGraph, so a tool authored once (filesystem, search, sandbox shell, internal APIs) is usable by both chat and agents.
- Persistent agent state. LangGraph uses PostgreSQL as the checkpointer and long-term store; semantic memory lives in Qdrant. CloudNativePG is the recommended production mode for HA and PITR.
- Async ingestion off the request path. Document upload returns immediately; the Ingestion Worker performs extract → embed → upsert through Valkey Streams, with retries and status tracking.
- Defence in depth. Authelia gates the edge with OIDC/MFA, NetworkPolicies default-deny per-component, and the OTel collector applies PII redaction before telemetry leaves the cluster.
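The "one model abstraction" principle can be sketched in a few lines of Python. This is a minimal illustration, not chart code: the in-cluster Service names below (`ai-stack-ollama`, `ai-stack-langgraph`) are assumptions that depend on your release name, and only the request construction is shown — all three backends accept the same OpenAI-style `/chat/completions` payload:

```python
import json

# Illustrative base URLs -- actual Service names depend on the Helm release
# name and are NOT defined here; external APIs need their own credentials.
BACKENDS = {
    "local":  "http://ai-stack-ollama:11434/v1",   # Ollama (local inference)
    "agent":  "http://ai-stack-langgraph:8000/v1", # LangGraph (agentic, OpenAI-compatible)
    "hosted": "https://api.openai.com/v1",         # External API
}

def chat_request(backend: str, model: str, prompt: str) -> tuple[str, bytes]:
    """Build the same OpenAI-style request body regardless of backend."""
    url = f"{BACKENDS[backend]}/chat/completions"
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }).encode()
    return url, body

url, body = chat_request("local", "llama3.2", "Hello")
print(url)  # http://ai-stack-ollama:11434/v1/chat/completions
```

Because the interface is uniform, switching a conversation from local to hosted inference is a one-key change — the payload never varies.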
Components are classified by operational criticality:
| Tier | Meaning | Components |
|---|---|---|
| T0 | Safety / Integrity — non-negotiable for security and compliance | OTel Collector, Authelia |
| T1 | Operational — core inference and decision-making services | Open WebUI, Ollama, Qdrant, Workbench, LangGraph |
| T2 | Productivity — supporting services and optional tooling | Tika, SearXNG, Valkey, Open Terminal, MCPO, PostgreSQL, Ingestion Worker |
Image versions are defined in values.yaml per component. For a full software bill of materials including licenses and dependency graph, see sbom.cdx.json.
- Kubernetes 1.27+
- Helm 3.12+
- A StorageClass for PersistentVolumeClaims (or use `emptyDir` for lab)
- (Optional) NVIDIA GPU Operator for Ollama / Workbench GPU acceleration
- (Optional) Prometheus Operator CRDs for ServiceMonitor resources
- (Optional) cert-manager for automated TLS certificate provisioning
- (Optional) CloudNativePG operator v1.25+ for HA PostgreSQL (`postgres.mode: cnpg`)
```shell
# Install with lab defaults
helm install ai-stack . -n ai-stack --create-namespace

# Lab with GPU enabled for Ollama
helm install ai-stack . -n ai-stack --create-namespace \
  --set ollama.gpu.enabled=true

# Production overlay
helm install ai-stack . -n ai-stack --create-namespace \
  -f values.yaml -f values-prod.yaml
```

Pull your first models:

```shell
kubectl exec -n ai-stack deploy/ai-stack-ollama -- ollama pull llama3.2
kubectl exec -n ai-stack deploy/ai-stack-ollama -- ollama pull nomic-embed-text
```

Access Open WebUI:

```shell
kubectl port-forward -n ai-stack svc/ai-stack-openwebui 8080:8080
# Open http://localhost:8080
```

The chart ships two value files:
| File | Purpose |
|---|---|
| `values.yaml` | Full reference with all defaults (lab profile) |
| `values-prod.yaml` | Production overlay — HA, TLS ingress, GPU, stricter resources, OTel |
| Parameter | Description | Default |
|---|---|---|
| `global.profile` | Deployment profile (`lab` or `prod`) | `lab` |
| `global.namespace` | Target namespace | `ai-stack` |
| `global.imagePullPolicy` | Image pull policy | `IfNotPresent` |
| `global.storageClass` | Storage class for all PVCs | `""` (cluster default) |
| `global.podSecurityStandard` | PSA enforcement level | `restricted` |
| `global.networkPolicy.enabled` | Deploy default-deny NetworkPolicies | `true` |
| `global.otel.enabled` | Deploy OTel Collector and inject env vars | `false` |
| `global.otel.endpoint` | OTLP endpoint | `http://otel-collector....:4317` |
| `global.serviceMonitor.enabled` | Create Prometheus ServiceMonitor resources | `false` |
Every component can be individually enabled or disabled:
```yaml
openwebui:
  enabled: true    # Primary UI (default: true)
ollama:
  enabled: true    # LLM inference (default: true)
qdrant:
  enabled: true    # Vector DB for RAG (default: true)
tika:
  enabled: true    # Document extraction (default: true)
searxng:
  enabled: true    # Web search (default: true)
valkey:
  enabled: true    # Session cache (default: true)
workbench:
  enabled: false   # GPU ML workbench (opt-in)
openTerminal:
  enabled: false   # Sandboxed terminal for AI agents (opt-in)
mcpo:
  enabled: false   # MCP-to-OpenAPI proxy (opt-in)
langgraph:
  enabled: false   # LangGraph agentic runtime (opt-in)
postgres:
  enabled: false   # PostgreSQL for LangGraph checkpoints (opt-in)
ingestionWorker:
  enabled: false   # Async document ingestion worker (opt-in)
authelia:
  enabled: false   # OIDC identity provider for SSO/MFA (opt-in)
```

The chart auto-generates secrets on first install for:
- Qdrant API key (`qdrant-secret`)
- SearXNG secret key (`searxng-secret`)
- Workbench token (`workbench-secret`)
- Open Terminal API key (`open-terminal-secret`)
- MCPO API key (`mcpo-secret`)
- LangGraph API key (`langgraph-secret`)
- PostgreSQL password (`postgres-secret`)
- Authelia secrets (`authelia-secret`) — JWT secret, session secret, storage encryption key, OIDC client secret
Secrets are annotated with `helm.sh/resource-policy: keep` so they survive `helm upgrade`. To use an external secret manager (e.g., ESO or Vault), set the corresponding value:
```yaml
qdrant:
  apiKey: "your-external-key"
searxng:
  secretKey: "your-external-key"
openTerminal:
  apiKey: "your-external-key"
mcpo:
  apiKey: "your-external-key"
langgraph:
  apiKey: "your-external-key"
postgres:
  password: "your-external-password"
```

Enable GPU acceleration for Ollama and the Workbench:

```yaml
ollama:
  gpu:
    enabled: true
    count: 1
    resourceName: nvidia.com/gpu
workbench:
  enabled: true
  gpu:
    enabled: true
    count: 1
    resourceName: nvidia.com/gpu
```

Expose Open WebUI through an ingress:

```yaml
openwebui:
  ingress:
    enabled: true
    className: "envoy"
    hosts:
      - host: ai.example.com
        paths:
          - path: /
            pathType: Prefix
    tls:
      - secretName: ai-tls
        hosts:
          - ai.example.com
```

Add cloud-hosted LLM providers (OpenAI, Azure OpenAI, Anthropic, Gemini, Mistral, etc.) alongside local Ollama inference:
```yaml
externalAPIs:
  enabled: true
  providers:
    - name: openai
      baseUrl: "https://api.openai.com/v1"
      apiKey: "sk-..."
    - name: gemini
      baseUrl: "https://generativelanguage.googleapis.com/v1beta/openai"
      apiKey: "AIza..."
```

API keys are stored in Kubernetes Secrets. For production, use an external secret manager:
```yaml
externalAPIs:
  enabled: true
  providers:
    - name: openai
      baseUrl: "https://api.openai.com/v1"
      existingSecret:
        name: "my-openai-secret"
        key: "api-key"
```

When enabled, Open WebUI users can select external models from the model picker alongside locally served Ollama models. HTTPS egress (port 443) is automatically added to the Open WebUI NetworkPolicy.
Enable stateful agentic workflows with LangGraph Platform. Requires PostgreSQL for checkpoint persistence:
```yaml
langgraph:
  enabled: true
postgres:
  enabled: true
```

LangGraph connects to Ollama for LLM inference, Qdrant for vector retrieval, Tika for document extraction, and SearXNG for web search. Deploy custom graphs by either:

- Custom image (recommended): build with `langgraph build -t my-graphs` and override `langgraph.image.repository`/`tag`
- Volume mount: place graph code in the `/deps/graphs` persistent volume
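For the custom-image route, a minimal `langgraph.json` might look like the following sketch — the module path and dependency list are illustrative and must match your project layout:

```json
{
  "dependencies": ["."],
  "graphs": {
    "agent": "./my_graphs/agent.py:graph"
  },
  "env": ".env"
}
```

`langgraph build -t my-graphs` reads this file to bake the graph into the image; the graph key (`agent` here) becomes the assistant ID exposed by the runtime.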
The chart supports three PostgreSQL provisioning modes:
| Mode | Use case | HA | Managed by |
|---|---|---|---|
| `standalone` | Lab / dev — single-instance Deployment | No | Helm chart |
| `cnpg` | Production — CloudNativePG operator cluster | Yes (3 instances, streaming replication, automated failover) | CNPG operator |
| `external` | Bring-your-own managed PostgreSQL (RDS, Cloud SQL, etc.) | Depends on provider | External |
```yaml
# Production HA with CloudNativePG
postgres:
  enabled: true
  mode: cnpg
  tls:
    mode: require
  cnpg:
    instances: 3
    pooler:
      enabled: true   # PgBouncer connection pooling
```

```yaml
# External managed database
postgres:
  enabled: true
  mode: external
  database: "langgraph"
  user: "langgraph"
  external:
    host: "my-rds-instance.abc123.us-east-1.rds.amazonaws.com"
    port: 5432
    existingSecret:
      name: "rds-password"
      key: "password"
```

The ingestion worker consumes tasks from a Valkey Stream and orchestrates the pipeline: Tika extract → chunk → Ollama embed → Qdrant upsert. This keeps document uploads non-blocking, with automatic retry and status tracking.
```yaml
ingestionWorker:
  enabled: true
valkey:
  persistence:
    enabled: true   # Recommended: persist Valkey Streams across restarts
```

Producers enqueue tasks via `XADD`:

```
XADD ingestion:documents * task_id <id> file_url <url> filename <name>
```

Track status via `HGETALL ingestion:status:<task_id>`.
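The enqueue convention above can be sketched in plain Python. This is a minimal illustration that only assembles the command arguments a producer would send and the status key a consumer would poll — no Valkey client or connection is involved, and the field names follow the `XADD` line shown above:

```python
import uuid

STREAM = "ingestion:documents"

def build_enqueue_command(file_url: str, filename: str) -> tuple[str, list[str]]:
    """Assemble the XADD arguments for one ingestion task."""
    task_id = str(uuid.uuid4())
    args = [
        "XADD", STREAM, "*",          # "*" lets the server assign the entry ID
        "task_id", task_id,
        "file_url", file_url,
        "filename", filename,
    ]
    return task_id, args

def status_key(task_id: str) -> str:
    """Key of the hash the worker updates (read with HGETALL)."""
    return f"ingestion:status:{task_id}"

task_id, cmd = build_enqueue_command("s3://bucket/report.pdf", "report.pdf")
print(cmd[0], cmd[1])        # XADD ingestion:documents
print(status_key(task_id))   # ingestion:status:<uuid>
```

In practice the same argument list would be passed to any Redis/Valkey-compatible client's raw command interface.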
Enable Authelia as an OpenID Connect identity provider for Open WebUI. When enabled, Open WebUI is automatically configured as an OIDC client (OAUTH_* environment variables are injected). Authelia uses Valkey for session storage (when available) and supports SQLite (lab) or PostgreSQL (prod) as its storage backend.
```yaml
authelia:
  enabled: true
  domain: "example.local"
  defaultPolicy: "one_factor"   # or "two_factor" for MFA
  oidc:
    clientId: "openwebui"
    issuerUrl: "https://auth.example.local"
  ingress:
    enabled: true
    className: "envoy"
    hosts:
      - host: auth.example.local
        paths:
          - path: /
            pathType: Prefix
    tls:
      - secretName: auth-tls
        hosts:
          - auth.example.local
```

For production with PostgreSQL storage:

```yaml
authelia:
  enabled: true
  storage: "postgres"   # Uses the shared postgres component
postgres:
  enabled: true
```

Users are managed via a file-based backend (`users_database.yml`). Override by mounting a custom ConfigMap or configure LDAP. Generate password hashes with `authelia crypto hash generate argon2`.
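A minimal `users_database.yml` sketch, assuming Authelia's standard file-backend schema — the password hash is a truncated placeholder; generate a real one with the command above:

```yaml
users:
  alice:
    displayname: "Alice Example"
    password: "$argon2id$v=19$m=65536,t=3,p=4$..."   # placeholder hash
    email: alice@example.local
    groups:
      - admins
```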
When `global.otel.enabled=true`, the chart:
- Deploys an OTel Collector with OTLP receivers, GenAI semantic conventions, and PII redaction
- Injects `OTEL_*` environment variables into all component pods
- Optionally creates ServiceMonitor resources for Prometheus scraping
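The injected variables follow the standard OpenTelemetry environment-variable specification; an illustrative fragment (the endpoint and attribute values are examples, not the chart's exact defaults):

```yaml
env:
  - name: OTEL_EXPORTER_OTLP_ENDPOINT
    value: "http://otel-collector:4317"    # derived from global.otel.endpoint
  - name: OTEL_SERVICE_NAME
    value: "openwebui"
  - name: OTEL_RESOURCE_ATTRIBUTES
    value: "deployment.environment=lab"
```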
For production DR, use Velero with CSI volume snapshots for PVC-backed data (Qdrant, Ollama models, Open WebUI). PostgreSQL in CNPG mode supports automated backups via Barman to S3-compatible storage — see HOWTO.md §10 PostgreSQL Modes for configuration.
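A Velero schedule for the PVC-backed stores might look like this sketch — the namespace, cadence, and retention are illustrative, and it assumes a configured Velero backup location with CSI snapshot support:

```yaml
apiVersion: velero.io/v1
kind: Schedule
metadata:
  name: ai-stack-nightly
  namespace: velero
spec:
  schedule: "0 2 * * *"      # nightly at 02:00
  template:
    includedNamespaces:
      - ai-stack
    snapshotVolumes: true    # CSI snapshots for Qdrant / Ollama / Open WebUI PVCs
    ttl: 720h                # retain for 30 days
```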
This chart is designed for regulated environments:
- Network isolation: Default-deny ingress and egress with per-component allowlists
- Pod Security: PSA restricted baseline — `runAsNonRoot`, `seccompProfile: RuntimeDefault`, `allowPrivilegeEscalation: false`, capabilities `drop: [ALL]`
- Read-only root filesystem: Enforced for Qdrant, Valkey, Tika, SearXNG, OTel Collector
- Identity isolation: Per-component ServiceAccounts with `automountServiceAccountToken: false`
- Secret management: Auto-generated 64-byte credentials with support for external secret stores
- PII redaction: OTel Collector strips email addresses, SSNs, and credit card numbers from telemetry (GDPR Art 5(1)(c))
- Telemetry opt-out: `DO_NOT_TRACK`, `SCARF_NO_ANALYTICS`, `ANONYMIZED_TELEMETRY=false` set by default
- Rate limiting: Envoy Gateway rate-limit annotations in production profile
- Ollama root exception: Upstream GPU access requirement; documented with the `assurance.platform/security-exception` annotation
Control and policy identifiers used in this chart are defined in docs/governance/CONTROLS.md.
| Control | Description | Implementation |
|---|---|---|
| CTL-001 | Observability | OTel Collector, ServiceMonitors |
| CTL-002 | AI gateway policy | NetworkPolicy, tier labels, boundary annotations |
| POL-001 | Least-privilege | Per-component ServiceAccounts, no automount |
| GDPR Art 5(1)(c) | Data minimisation | PII redaction in OTel pipeline |
| NIS2 | Network security | Default-deny NetworkPolicies |
| AI Act | Risk classification | Tier and boundary labeling |
All pods carry assurance.platform/* annotations for evidence pipeline integration and audit traceability.
The chart includes a machine-readable Software Bill of Materials and license compliance documentation:
| File | Format | Purpose |
|---|---|---|
| sbom.cdx.json | CycloneDX 1.6 JSON | Machine-readable SBOM with all container images, licenses, purls, and dependency graph |
| LICENSE_COMPLIANCE.md | Markdown | Human-readable license matrix, copyleft analysis, and enterprise compliance checklist |
All default-enabled components use permissive licenses (MIT, Apache-2.0, BSD-3-Clause). Notable exceptions:
- SearXNG (AGPL-3.0): Low risk when using the upstream container unmodified. See compliance doc for details.
- LangGraph API (Elastic License 2.0): Opt-in only. Permits self-hosted use but prohibits offering as a managed service.
The SBOM is validated in CI against the CycloneDX 1.6 schema and cross-checked against values.yaml to ensure completeness. Deep per-image SBOMs are generated via Syft and uploaded as CI artifacts.
The GitHub Actions workflow (lint.yaml) runs on every PR and push to main:
| Job | What it does |
|---|---|
| helm-lint | helm lint and helm template for both lab and prod profiles |
| chart-testing | ct lint with chart-testing for standards compliance |
| sbom-validate | Validates sbom.cdx.json against CycloneDX 1.6 schema; cross-checks component count against values.yaml |
| syft-sbom | Generates deep per-image SBOMs via Syft, validates them, and uploads as artifacts |
| cve-scan | Scans all container images for CVEs using Grype; emits warnings on critical vulnerabilities |
| kubeconform | Validates rendered manifests against Kubernetes JSON schemas (lab + prod profiles) |
Pre-built ArgoCD Application manifests are provided in argocd/:
| File | Profile | Notes |
|---|---|---|
| `argocd/application-lab.yaml` | Lab | Auto-sync disabled — suitable for development |
| `argocd/application-prod.yaml` | Production | Manual sync — change-control compliance |
GitHub Actions versions are managed by Dependabot. Container image versions in values.yaml are managed manually. Configuration is in .github/dependabot.yml.
After installation, verify the deployment:
```shell
# Check all pods are running
kubectl get pods -n ai-stack

# Verify NetworkPolicies are applied
kubectl get networkpolicies -n ai-stack

# Check secrets were generated
kubectl get secrets -n ai-stack

# Verify ServiceAccounts
kubectl get serviceaccounts -n ai-stack

# Check PodDisruptionBudgets
kubectl get pdb -n ai-stack

# Run Helm tests
helm test ai-stack -n ai-stack
```

```shell
# Lint the chart
helm lint .

# Lint with production values
helm lint . -f values.yaml -f values-prod.yaml

# Template rendering check
helm template ai-stack . --debug

# Dry-run install
helm install ai-stack . --dry-run --debug -n ai-stack

# Chart-testing
ct lint --config ct.yaml --charts .
```

| Document | Purpose |
|---|---|
| HOWTO.md | Task-oriented guide — installation, day-1 setup, RAG, GPU, scaling, upgrades, troubleshooting |
| docs/architecture/REFERENCE.md | Reference architecture — design principles, conversational + RAG flow, agentic flow, hardening checklist |
| docs/components/ | Per-component reference pages (tier, image, key values, integrations) |
| CHANGELOG.md | Detailed release notes in Keep a Changelog format |
| CONTRIBUTING.md | Pull request process, SemVer rules, security-context and governance-label requirements |
| SECURITY.md | Coordinated vulnerability disclosure policy and supported versions |
| CODE_OF_CONDUCT.md | Contributor Covenant 2.1 |
| docs/enterprise/ENTERPRISE_EVALUATION.md | Enterprise readiness evaluation checklist |
| sbom.cdx.json | CycloneDX 1.6 Software Bill of Materials |
The chart ships with templates and guidance for EU-regulated deployments:
| Document | Purpose |
|---|---|
| docs/governance/CONTROLS.md | Authoritative registry of all CTL and POL identifiers with descriptions and regulatory basis |
| EU_COMPLIANCE_CHECK.md | Gap analysis against GDPR, AI Act, NIS2, CRA, ePrivacy |
| SECURITY.md | Coordinated vulnerability disclosure (CVD) policy |
| docs/compliance/DPIA_TEMPLATE.md | Data Protection Impact Assessment template (GDPR Art. 35 + AI Act Art. 27) |
| docs/compliance/ROPA_TEMPLATE.md | Records of Processing Activities template (GDPR Art. 30) |
| docs/compliance/INCIDENT_RESPONSE.md | Incident response playbook (GDPR Art. 33/34, NIS2 Art. 23, AI Act Art. 73) |
| docs/compliance/DSAR_PROCEDURES.md | Data subject rights procedures (GDPR Art. 15–22) |
| docs/compliance/EU_OPERATIONS_GUIDE.md | Data retention, DPA guidance, encryption, content marking, training |
AI Act Art. 50(1) transparency is implemented via a configurable WEBUI_BANNER_TEXT environment variable that informs users they are interacting with an AI system.
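If the chart exposes extra environment variables for Open WebUI, the banner could be set as in the sketch below — the `extraEnv` key is an assumption, so check values.yaml for the actual mechanism; only the `WEBUI_BANNER_TEXT` variable name comes from this document:

```yaml
openwebui:
  extraEnv:                      # assumed values key, verify against values.yaml
    - name: WEBUI_BANNER_TEXT
      value: "You are interacting with an AI system. Responses may be inaccurate."
```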
This project is licensed under the Apache License 2.0. See LICENSE for details.
| Name | Contact |
|---|---|
| Roman Mednitzer | r.mednitzer@outlook.com |