Roadmap: MLX backend strategy + HiDream-O1-Image-Dev #7

@localkin

Context

Tracking the next architectural bet for ollamadiffuser. v2.0.14 added two new registry entries (FLUX.1-Kontext-dev, Chroma1-HD) that ride on existing strategies. The next batch needs a new strategy class — specifically for Apple Silicon (MLX) native inference.

Why MLX

ollamadiffuser currently routes everything through PyTorch (via the diffusers library). On Apple Silicon, PyTorch + MPS is slower than Apple's MLX framework for many ops, often by 2-3× on the same hardware.

Concretely, the mlx-community org publishes MLX-quantized ports of major diffusion models, and mflux (0.17.5, April 2026) is the de facto MLX inference library. It supports FLUX.1, FLUX.2 (4B + 9B), Z-Image, FIBO, SeedVR2, Qwen-Image, and Depth Pro.

Status of competition: ComfyUI, A1111, InvokeAI, diffusers-studio — none have a native MLX backend. This is a real differentiation opportunity for the "Ollama-for-X" niche.

Concrete plan

Phase 1: MLXStrategy base class

New file: ollamadiffuser/core/inference/strategies/mlx_strategy.py

Mirror the shape of FluxStrategy / GenericPipelineStrategy but route through mlx-community model packages. Registry entries opt in via model_type: "mlx" + parameters.mlx_backend: "mflux" | "mlx-community".
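A rough sketch of how the opt-in and backend dispatch could look. Only model_type: "mlx" and parameters.mlx_backend come from the plan above; RegistryEntry, MLXBackend, register_backend, and EchoBackend are hypothetical names for illustration, and the real FluxStrategy / GenericPipelineStrategy interface may differ. EchoBackend is a stand-in that loads no weights, so the sketch runs without MLX installed.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, Optional


@dataclass
class RegistryEntry:
    """Hypothetical shape of a registry entry (not the project's real class)."""
    name: str
    repo_id: str
    model_type: str
    parameters: Dict[str, Any] = field(default_factory=dict)


class MLXBackend(ABC):
    """Assumed minimal interface each MLX backend adapter would implement."""

    @abstractmethod
    def load(self, repo_id: str) -> None: ...

    @abstractmethod
    def generate(self, prompt: str, **kwargs: Any) -> Any: ...


# Maps parameters.mlx_backend values to backend factories.
_BACKENDS: Dict[str, Callable[[], MLXBackend]] = {}


def register_backend(key: str):
    """Decorator that registers a backend factory under the given key."""
    def decorator(factory: Callable[[], MLXBackend]):
        _BACKENDS[key] = factory
        return factory
    return decorator


@register_backend("mlx-community")
class EchoBackend(MLXBackend):
    """Stand-in backend: records the repo and echoes the request back."""

    def __init__(self) -> None:
        self.repo_id: Optional[str] = None

    def load(self, repo_id: str) -> None:
        self.repo_id = repo_id

    def generate(self, prompt: str, **kwargs: Any) -> Any:
        return {"repo_id": self.repo_id, "prompt": prompt, **kwargs}


class MLXStrategy:
    """Selects a backend from the registry entry and delegates to it."""

    def __init__(self, entry: RegistryEntry) -> None:
        if entry.model_type != "mlx":
            raise ValueError(f"{entry.name}: model_type must be 'mlx'")
        key = entry.parameters.get("mlx_backend")
        if key not in _BACKENDS:
            raise ValueError(f"{entry.name}: unknown mlx_backend {key!r}")
        self.entry = entry
        self.backend = _BACKENDS[key]()

    def load(self) -> None:
        self.backend.load(self.entry.repo_id)

    def generate(self, prompt: str, **kwargs: Any) -> Any:
        return self.backend.generate(prompt, **kwargs)


if __name__ == "__main__":
    entry = RegistryEntry(
        name="hidream-o1-mlx",  # hypothetical registry key
        repo_id="mlx-community/HiDream-O1-Image-Dev-mlx-bf16",
        model_type="mlx",
        parameters={"mlx_backend": "mlx-community"},
    )
    strategy = MLXStrategy(entry)
    strategy.load()
    print(strategy.generate("a red fox", width=1024, height=1024))
```

Keeping the backend lookup in one table means Phase 3 could add an "mflux" adapter by registering one more class, without touching MLXStrategy itself.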

Phase 2: HiDream-O1-Image-Dev as first consumer

HiDream-ai/HiDream-O1-Image-Dev (MIT, May 8 2026):

  • 8B unified transformer on Qwen3-VL backbone
  • No VAE — predicts raw 32×32 RGB patches directly
  • Already has an MLX port: mlx-community/HiDream-O1-Image-Dev-mlx-bf16
  • 16 GB peak at 1024×1024 — fits the M4 16GB
  • ~67s/image at 1024² on Apple Silicon

This is the natural first consumer of MLXStrategy because (a) it has zero diffusers integration, so the alternative is a custom HF transformers wrapper; (b) the MLX path is already published.

Phase 3: mflux integration (FLUX family)

After HiDream-O1 proves the pattern, extend MLXStrategy to support mflux:

  • FLUX.1-schnell / dev (already in registry as PyTorch — would get an mlx variant)
  • FLUX.2-klein-4B / 9B (gives M4 16GB users a usable FLUX.2 path)
  • Z-Image-Turbo (already in registry — would get MLX acceleration)

Hardware constraints

Maintainer can only test on:

  • Mac Pro M1 32GB (~24 GB UMA)
  • Mac Mini M4 16GB (~12 GB UMA)

So this work is doubly good for the project: solves a real differentiation gap AND aligns with what the maintainer can actually develop on.

Effort estimate

  • Phase 1 (base class): ~2-3 days
  • Phase 2 (HiDream-O1): ~1-2 days on top of Phase 1
  • Phase 3 (mflux): ~2-3 days

Tracking

Reply to this issue with implementation progress, or open child issues per phase if discussion warrants.

cc / community: interested in helping? PRs welcome — the test fixture pattern from v2.0.13 (tests/unit/test_*_endpoint.py uses TestClient + monkeypatched loaded_models) is the template for adding a new strategy without needing actual GPU weights.
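For anyone unfamiliar with that pattern, here is a stripped-down sketch of the core idea: swap a stub strategy into loaded_models so the code path runs without real weights. It leaves out the HTTP layer (the actual tests go through TestClient), and generate_image, StubStrategy, and the "hidream-o1-mlx" key are hypothetical stand-ins, not the project's real names. It uses unittest.mock.patch.dict from the standard library in place of pytest's monkeypatch.

```python
from typing import Any, Dict, List, Tuple
from unittest.mock import patch

# Stand-in for the module-level cache of loaded strategies (assumed shape).
loaded_models: Dict[str, Any] = {}


class StubStrategy:
    """Fake strategy: records calls and returns placeholder bytes."""

    def __init__(self) -> None:
        self.calls: List[Tuple[str, Dict[str, Any]]] = []

    def generate(self, prompt: str, **kwargs: Any) -> bytes:
        self.calls.append((prompt, kwargs))
        return b"fake-image-bytes"


def generate_image(model_name: str, prompt: str, **kwargs: Any) -> bytes:
    """Hypothetical handler body: look up the loaded strategy and delegate."""
    strategy = loaded_models.get(model_name)
    if strategy is None:
        raise KeyError(f"model {model_name!r} is not loaded")
    return strategy.generate(prompt, **kwargs)


def test_generate_uses_loaded_strategy() -> None:
    stub = StubStrategy()
    # patch.dict swaps in the stub and restores the dict on exit.
    with patch.dict(loaded_models, {"hidream-o1-mlx": stub}, clear=True):
        result = generate_image("hidream-o1-mlx", "a red fox", steps=4)
    assert result == b"fake-image-bytes"
    assert stub.calls == [("a red fox", {"steps": 4})]
    assert loaded_models == {}  # the cache is back to its original state


if __name__ == "__main__":
    test_generate_uses_loaded_strategy()
    print("ok")
```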
