Worker: E2 (ensemble: model-frameworks track)
Surface: model frameworks
Filed in: AdaWorldAPI/ndarray because ndarray is the obligatory spine; AdaWorldAPI/candle does not exist yet — repo creation is part of execution under this plan, not a precondition.
Why
After E1 confirms the spine pattern works in burn-fork (GGUF end-to-end parity passes against the upstream reference), the same playbook applies to HuggingFace candle. Today HuggingFace candle links accelerate-src on macOS and cblas on Linux. A candle-fork that swaps both for AdaWorldAPI/ndarray unlocks four production surfaces we currently cannot serve coherently from one BLAS substrate:
- ONNX runtime — clinical NLP is ONNX-first (German clinical BERT variants, med-de-identification, ICD-10 coding suggestion). Without an ONNX path on the spine, every clinical NLP integration is a one-off.
- ViT (Vision Transformer) — medical imaging: dermatology screening, retinal/slit-lamp scans, dental panoramics, derm flagging.
- Whisper — DACH medical dictation. Hausarzt practice is dictation-heavy; this is a real unmet need with no good local-first option today.
- embedanything DTO — already Jina5 / Qwen 3.5 / OpenBERT compatible (transcoded; ~month+ of work already landed). Drives entity-embedding for ontology matching against the TripletGraph spine.
The strategic point: one ndarray, two deployment modes, hardware-accelerated on both. The same model binary runs on Railway SPR-AMX (cohort/batch path) and on a Pi 5 NEON box at the practice (edge inference). For a German Hausarzt that cannot ship patient images to cloud (GDPR + Krankenhaus-IT-Sicherheitsgesetz), local ViT screening on a Pi 5 is the difference between deployable and non-deployable. That asymmetry — same binary, two accelerator backends, both first-class — is the moat.
What
Create AdaWorldAPI/candle as a fork of huggingface/candle, swap the BLAS dependency to AdaWorldAPI/ndarray, and validate four smoke surfaces plus an edge deployment matrix.
Concrete items
- Fork creation — AdaWorldAPI/candle from huggingface/candle. Tag the upstream commit at fork time; the default branch tracks our integration branch, not upstream main.
- Manifest swap — replace accelerate-src (macOS) and cblas (Linux) with AdaWorldAPI/ndarray as the BLAS substrate across the workspace Cargo.toml and per-crate manifests. Remove the platform cfg gates that pick between accelerate-src and cblas; the ndarray dependency is the single source.
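A minimal sketch of the shape the manifest swap could take, assuming a git dependency on the spine — the dependency table and the exact layout of candle's real workspace manifests are assumptions, not the actual files:

```toml
# Workspace Cargo.toml — sketch only; candle's real manifests differ in detail.

[workspace.dependencies]
# Removed, along with the cfg(target_os) gates that selected them:
#   accelerate-src   (macOS)
#   cblas            (Linux)

# Single BLAS substrate for every crate in the workspace:
ndarray = { git = "https://github.com/AdaWorldAPI/ndarray" }
```

Per-crate manifests would then inherit this one entry via `workspace = true` rather than carrying their own platform-specific BLAS lines.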
- Manual upstream merge cadence — candle is fast-moving. Document the merge process: a weekly git fetch upstream plus integration-branch rebase, a conflict-resolution log, and a regression gate (smoke tests must pass before merging upstream into our default branch). This is tracked here, but the recurring work happens in the candle-fork repo, not on this issue.
- candle-onnx crate — wire ONNX gemm through ndarray (the same gemm path the rest of candle now uses). No bypass back to cblas.
- Smoke tests (4):
- ONNX — load + run a small German clinical encoder (e.g. medbert-de or equivalent); parity vs onnxruntime on 10 sample sentences. Top-K embeddings agree within the tolerance defined by D1's parity harness.
- ViT — load + run ViT-Base; parity vs reference on 10 images.
- Whisper — transcribe a 10-second German clinical clip; BLEU/CER vs the whisper.cpp reference.
- embedanything — Jina5 retrieval test on a small corpus, top-K agreement vs upstream.
- Edge deployment matrix doc — for each target, validate that build + ViT inference works:
- Pi 5 (NEON, 8GB RAM) — primary edge target.
- Pi Zero 2 W (NEON, 512MB) — minimum-viable edge, small models only.
- Orange Pi (NEON, varies) — third-party validation.
- x86_64 SPR (AMX) — Railway production.
- Build flag combinations validated, e.g. --no-default-features --features simd-neon for Pi targets; AMX flags for SPR.
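For the matrix, the per-target invocations might look like the following; the target triples are standard Rust triples, while the feature names (simd-neon, amx) are taken from this plan and would need to match the fork's final Cargo feature set:

```shell
# Illustrative only — feature names must match the fork's Cargo features.

# Pi 5 / Pi Zero 2 W / Orange Pi (aarch64 + NEON):
cargo build --release --target aarch64-unknown-linux-gnu \
  --no-default-features --features simd-neon

# Railway SPR (x86_64 + AMX):
cargo build --release --target x86_64-unknown-linux-gnu \
  --no-default-features --features amx
```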
- License audit gate — clinical deployment requires per-model auditing (some HuggingFace models are research-only / non-commercial). Document a whitelist policy: model weights, license SPDX, commercial-use status, in-scope-for-clinical decision, last-audited date.
Architecture
AdaWorldAPI/ndarray (spine)
|
| one BLAS substrate
|
┌──────────────────┼──────────────────┐
| | |
burn-fork (E1) candle-fork (this) ...future forks
|
┌────────────────────┼────────────────────┐
| | |
ONNX ViT Whisper
(clinical NLP) (medical imaging) (DACH dictation)
| | |
└────────────────────┼────────────────────┘
|
embedanything DTO
(Jina5 / Qwen 3.5 / OpenBERT)
|
── two deployment modes ──
|
┌────────────────────┴────────────────────┐
| |
Railway SPR-AMX Pi 5 NEON edge
(cohort/batch path) (Hausarzt local inference,
GDPR / KHZG compliant)
Spine pattern (repeated from E1). The fork's job is to retarget BLAS at the spine; everything above the BLAS line — model loading, kernels, control flow — is upstream code we keep merging in. The fork is small, structural, and audit-friendly; it does not own the model logic.
Edge moat. The same compiled artefact for ViT (or Whisper, or ONNX) targets --features amx on SPR and --features simd-neon on Pi. ndarray decides which kernel to dispatch at runtime via the same gemm entry point. No model-side #[cfg], no separate Pi build of model code. That property is what makes "ship same binary to practice + cohort engine" feasible.
Acceptance criteria
- AdaWorldAPI/candle repo exists, forked from huggingface/candle, fork commit tagged.
- accelerate-src and cblas removed across the workspace; AdaWorldAPI/ndarray is the only BLAS dependency.
- Smoke tests pass on x86_64-linux and aarch64-linux.
- Edge build validated (--no-default-features --features simd-neon for Pi).
- Merge cadence documented (MERGE_CADENCE.md or equivalent).
Out of scope
- Ongoing upstream merge work after the initial fork — tracked here but executed in the candle-fork repo on its own cadence, not on this issue.
- MedCare-rs handler wiring (ONNX/ViT/Whisper plumbed into clinical request paths) — a separate item, downstream of this one.
- Replacing onnxruntime for training; this fork is inference-only.
- Quantisation work (GGUF on candle, INT8 ONNX) — separate items, after the parity surface is green.
- Model fine-tuning / training loops — candle-fork is for inference; training stays on the upstream BLAS path until separately scoped.
Dependencies
- Blocks on D1 — parity harness must define the tolerance under which ONNX top-K, ViT logits, Whisper CER, and Jina5 retrieval agreements are evaluated. Without D1's harness this issue cannot define passing thresholds.
- Blocks on E1 — burn-fork GGUF end-to-end parity must pass against the upstream reference first. E1 is the proof that the spine pattern (ndarray substituted for the cblas/accelerate-src BLAS path) works for a real model framework. Without that proof we should not fork a second framework.
- Neither blocker is in this repo's scope; both are tracked in their own issues.