Skip to content

fix: add SSE streaming support to MLX engine#3

Open
Turcy76 wants to merge 1 commit into
walter-grace:mainfrom
Turcy76:fix/mlx-streaming-support
Open

fix: add SSE streaming support to MLX engine#3
Turcy76 wants to merge 1 commit into
walter-grace:mainfrom
Turcy76:fix/mlx-streaming-support

Conversation

@Turcy76
Copy link
Copy Markdown

@Turcy76 Turcy76 commented Mar 27, 2026

Problem

When using mlx_engine.py as the backend (Option B), agent.py shows no response for chat messages.

The root cause is that agent.py's stream_llm() sends "stream": true and expects SSE (Server-Sent Events) format, but mlx_engine.py only returns a single JSON response, ignoring the stream parameter.

Additionally, the model detection shows auto ? because /props returns lowercase "9b" while agent.py checks for uppercase "9B".

Changes

  • generate_stream() — New streaming generator using mlx_lm.stream_generate(), with filtering for <think> tags and special tokens
  • _handle_chat_stream() — SSE response handler that sends data: {...}\n\n chunks followed by data: [DONE]
  • _handle_chat() — Now checks stream parameter and routes to streaming or normal handler
  • clean_response() — Extracted shared cleanup logic (strip special tokens + thinking tags)
  • /props endpoint — Fixed model_alias to use uppercase (9B/35B) to match agent.py's detection

Testing

Tested locally with MLX + Qwen3.5-9B-MLX-4bit on Apple Silicon M4:

  • agent.py now displays streamed responses in both auto and raw modes
  • ✅ Intent classification (non-streaming) still works
  • ✅ Model detection shows auto 9b instead of auto ?
  • /bench command works correctly

- Add generate_stream() using mlx_lm.stream_generate()
- Add _handle_chat_stream() for SSE response format
- Fix model_alias case mismatch (9b -> 9B) for proper detection
- Extract clean_response() as shared utility
Fixes agent.py showing no response when using MLX backend (Option B),
because stream_llm() sends stream=True but MLX engine only returned
a single JSON blob instead of SSE events.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant