Private, on-device AI chat for iPhone, powered by Apple MLX.
Fork it, extend it, and ship your own private AI app.

Important: Requires a physical iPhone 15 or later. The iOS Simulator has no Metal GPU and cannot run inference.

Warning: iOS 27 beta is not yet supported. The app may crash at launch on the iOS 27 beta (under investigation). Supported: iOS 17.0 to iOS 26.x.

| Feature | Details |
|---|---|
| Fully private | All inference runs on-device. No API keys, no accounts, no server calls after the model download. |
| Real-time streaming | Tokens appear as fast as the model generates them. |
| Ships with Llama 3.2 1B Instruct | 4-bit quantised, ≈ 860 MB. Downloads publicly (no HuggingFace token needed) and activates automatically. |
| Simple model management | Download, activate, and uninstall models from a clean Models tab. Add more models with one line of code. |
| Settings tab | System prompt editor and developer toggles. |
| Memory-safe | RAM gate blocks incompatible loads; the model auto-unloads on background and memory warnings. |
| Swift 6 strict concurrency | Actors throughout; zero data races by construction. |


    git clone https://github.com/your-org/Onyx.git
    cd Onyx
    open Onyx/Onyx.xcodeproj

Open Onyx.xcodeproj → Targets → Onyx → Signing & Capabilities, set your development team, and change the bundle ID from kiraa.Onyx to something you own.

Important: On-device inference requires a physical iPhone 15 or later. The iOS Simulator has no Metal GPU.
Select your device from the scheme picker, then Product → Run (⌘R).
- Tap the Models tab.
- Tap Download next to Llama 3.2 1B Instruct (4-bit) (≈ 860 MB; use Wi-Fi).
- The model activates automatically when the download completes.
- Switch to the Chat tab and start chatting.

New to iOS development? The step-by-step QUICKSTART takes you from a fresh Mac to a working app in under 30 minutes.

| Capability | File |
|---|---|
| On-device MLX inference | MLXModelManager.swift |
| Real-time streaming token output | ChatProvider.swift + generateFromModel() |
| Multi-turn conversation history with auto-trimming | MLXConversationHistory.swift |
| Resumable HuggingFace model downloader (5-phase) | ChatModelDownloader.swift |
| Hardware RAM gate + background/low-mem unload | HardwareProfile.swift, ChatMemoryGate.swift, OnyxApp.swift |
| Model catalog (ships with Llama 3.2 1B) | ChatModelCatalog.swift |
| User settings (Settings tab, SettingsView) | OnyxSettings.swift, PreferencesView.swift |
| Installed/active model registry | ChatModelRegistry.swift |
| Sandbox-safe file path helpers | OnyxPaths.swift |
- No persistence: conversations reset on restart. Trivial to add (see below).
- No accounts or API keys: model downloads are public and unauthenticated; nothing leaves the device.
- No theming: plain system colors throughout; swap in your own design tokens.
- No analytics or crash reporting: add the SDK of your choice.
This deliberate minimalism keeps the diff small when you diverge from the skeleton.
Add a model (one line):
Open ChatModelCatalog.swift and append to ChatModelCatalog.all (the catalog ships with a single model, Llama 3.2 1B Instruct):

    ChatModelDescriptor(
        id: "mlx-community/my-model-4bit",
        displayName: "My Model (4-bit)",
        family: .other,
        approxSizeBytes: Int64(4.0 * 1_073_741_824),  // ≈ 4 GB
        filePatterns: ChatModelCatalog.defaultFilePatterns,
        summary: "One-line description shown in the Models tab."
    )

The downloader, registry, and Models tab UI pick it up automatically; no other changes are needed. Browse available models at huggingface.co/mlx-community.

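
For orientation, here is a minimal sketch of what a descriptor type with these fields could look like. The field names come from the snippet above; the type name `ModelDescriptorSketch`, the `Family` cases, the example file patterns, and the `sizeLabel` helper are assumptions for illustration, and the real `ChatModelDescriptor` may be shaped differently.

```swift
import Foundation

// Hypothetical stand-in for ChatModelDescriptor; the real type may differ.
struct ModelDescriptorSketch: Identifiable, Hashable {
    enum Family: String { case llama, other }   // assumed cases

    let id: String               // HuggingFace repo id
    let displayName: String
    let family: Family
    let approxSizeBytes: Int64
    let filePatterns: [String]
    let summary: String

    /// Human-readable size for a list row (assumed helper, not part of Onyx).
    var sizeLabel: String {
        let gib = Double(approxSizeBytes) / 1_073_741_824
        return String(format: "%.1f GB", gib)
    }
}

let example = ModelDescriptorSketch(
    id: "mlx-community/my-model-4bit",
    displayName: "My Model (4-bit)",
    family: .other,
    approxSizeBytes: Int64(4.0 * 1_073_741_824),
    filePatterns: ["*.safetensors", "*.json"],   // assumed patterns, for illustration only
    summary: "One-line description shown in the Models tab."
)
print(example.sizeLabel)
```

Keeping the descriptor a plain value type is what would let a catalog be a simple array that the UI can iterate over.
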
Add conversation persistence:

    // Encode turns and write to the app's data directory.
    // Assumes MLXConversationHistory.Turn conforms to Codable.
    let turns = await ChatProvider.shared.history.turns
    let data = try JSONEncoder().encode(turns)
    try data.write(to: OnyxPaths.baseDirectory().appending(path: "history.json"))

    // Restore on launch:
    let saved = try Data(contentsOf: OnyxPaths.baseDirectory().appending(path: "history.json"))
    let restoredTurns = try JSONDecoder().decode([MLXConversationHistory.Turn].self, from: saved)

Project layout:

    Onyx/
    ├── Core Runtime
    │   ├── OnyxPaths.swift                 # Sandbox-safe paths (AppSupport/Onyx/)
    │   ├── MLXErrors.swift                 # Typed errors (metalUnavailable, modelNotInstalled, …)
    │   ├── MLXModelManager.swift           # actor: ModelContainer lifecycle + generateFromModel()
    │   └── MLXConversationHistory.swift    # actor: turn history, 16 K char / 10-pair auto-trim
    │
    ├── Model Catalog & Registry
    │   ├── ChatModelCatalog.swift          # Curated list of downloadable models
    │   ├── ChatModelRegistry.swift         # actor: installed / active model tracking
    │   ├── ChatModelDownloader.swift       # actor: 5-phase HuggingFace download + retry
    │   ├── HardwareProfile.swift           # sysctl RAM/GPU detection; canLoadModel()
    │   └── ChatMemoryGate.swift            # Pre-flight RAM check before load
    │
    ├── Chat Layer
    │   └── ChatProvider.swift              # @MainActor @Observable: UI-to-MLX bridge
    │
    └── Views
        ├── OnyxApp.swift                   # @main entry point (no SwiftData)
        ├── ContentView.swift               # TabView: Chat + Models + Settings
        ├── ChatView.swift                  # Scrollable chat UI with input bar
        ├── MessageBubble.swift             # Markdown-rendering message row
        ├── ThinkingDotsView.swift          # Animated 3-dot waiting indicator
        ├── ModelsView.swift                # Download / activate / uninstall list
        ├── DownloadRow.swift               # Live-progress model card
        └── PreferencesView.swift           # Settings tab (SettingsView)

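
The "16 K char / 10-pair auto-trim" noted for MLXConversationHistory.swift can be pictured with a small pure function. This is only a sketch under stated assumptions: `TurnSketch` and `trimmed(_:maxPairs:maxChars:)` are hypothetical names, "16 K" is read as 16,000 characters, and the policy of dropping whole user/assistant pairs from the oldest end is a guess at the behaviour rather than the shipped implementation.

```swift
// Hypothetical turn type; the real MLXConversationHistory.Turn may differ.
struct TurnSketch: Equatable {
    enum Role { case user, assistant }
    let role: Role
    let text: String
}

/// Drops the oldest user/assistant pairs until both budgets are met.
/// The newest pair is always kept, even if it alone exceeds `maxChars`.
func trimmed(_ turns: [TurnSketch], maxPairs: Int = 10, maxChars: Int = 16_000) -> [TurnSketch] {
    var kept = turns
    func charCount(_ list: [TurnSketch]) -> Int {
        list.reduce(0) { $0 + $1.text.count }
    }
    // Remove two turns (one pair) at a time from the front, oldest first.
    while kept.count > 2 && (kept.count > maxPairs * 2 || charCount(kept) > maxChars) {
        kept.removeFirst(2)
    }
    return kept
}
```

Trimming by pairs keeps the transcript alternating between user and assistant, which chat templates generally expect.
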
Request flow:

    User types → ChatView
      → ChatProvider.respond(to:)
      → MLXConversationHistory.buildMessages(systemPrompt:)
      → MLXModelManager.ensureLoaded(modelId:)      (loads the model lazily if needed)
      → generateFromModel(container:messages:)      (nonisolated, off the main thread)
      → AsyncStream<String>
      → ChatView appends tokens to the streaming bubble in real time

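
The last two steps of the flow, a producer yielding tokens into an `AsyncStream<String>` and a consumer growing the visible text, can be sketched without MLX at all. The function names below are hypothetical and the token source is a fixed array standing in for the model; only `AsyncStream` itself is the real API.

```swift
// Stand-in for the generator: yields canned tokens instead of running a model.
func fakeTokenStream(_ tokens: [String]) -> AsyncStream<String> {
    AsyncStream { continuation in
        for token in tokens {
            continuation.yield(token)
        }
        continuation.finish()   // ends the consumer's for-await loop
    }
}

/// Consumes the stream and records the text after each token,
/// mirroring how a streaming bubble grows as tokens arrive.
func collectSnapshots(_ stream: AsyncStream<String>) async -> [String] {
    var text = ""
    var snapshots: [String] = []
    for await token in stream {
        text += token
        snapshots.append(text)
    }
    return snapshots
}
```

In the app the consumer would presumably assign to observable state on the main actor instead of appending to an array.
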

| Component | Isolation | Reason |
|---|---|---|
| MLXModelManager | actor | Single owner of ModelContainer |
| MLXConversationHistory | actor | Turn array written from UI and inference tasks |
| ChatModelDownloader | actor | Background download; pub/sub via AsyncStream |
| ChatModelRegistry | actor | File I/O to active.txt and model directories |
| ChatProvider | @MainActor | View-model; drives SwiftUI @Observable state |
| generateFromModel() | nonisolated | GPU-intensive; must not block the main thread |
| sysctl helpers | nonisolated | Called at app launch before any actor exists |

The build setting SWIFT_DEFAULT_ACTOR_ISOLATION = MainActor makes all unannotated functions @MainActor. Functions that must run off the main thread are explicitly marked nonisolated.
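
As a rough illustration of that split, the sketch below marks a whole type `@MainActor` explicitly (which is what the build setting would do implicitly) and opts one method out with `nonisolated`. `ProviderSketch` and its members are hypothetical names, not Onyx code.

```swift
@MainActor
final class ProviderSketch {
    // Main-actor state: safe to bind to SwiftUI.
    var transcript = ""

    func append(_ text: String) {
        transcript += text
    }

    // nonisolated: callable from any context without hopping to the main actor.
    // It must not touch main-actor state such as `transcript`.
    nonisolated func wordCount(of text: String) -> Int {
        text.split(separator: " ").count
    }
}
```

A `nonisolated` method only lifts the main-actor requirement; where it actually executes still depends on how it is called, so heavy work is usually also moved into an actor or a detached task.
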

    # Resolve Swift packages and build for Simulator (UI compiles; no inference)
    xcodebuild -project Onyx/Onyx.xcodeproj -scheme Onyx \
      -destination 'platform=iOS Simulator,name=iPhone 16' \
      -resolvePackageDependencies
    xcodebuild build -project Onyx/Onyx.xcodeproj -scheme Onyx \
      -destination 'platform=iOS Simulator,name=iPhone 16'
    # On-device inference: connect an iPhone 15+ and select it in Xcode's scheme picker, then ⌘R

| Device | RAM | Status |
|---|---|---|
| iPhone 15 (base) | 6 GB | Supported |
| iPhone 15 Pro / Max | 8 GB | Supported |
| iPhone 16 (all models) | 8 GB | Supported |
| iPad Pro M2+ | 8-16 GB | Supported |
| iOS Simulator | n/a | No Metal GPU: UI works, inference does not |

OS support: iOS 17.0 to 26.x. iOS 27 beta is not yet supported; see the warning at the top.
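
The RAM figures in the table are what a pre-flight check would compare a model's size against. Below is a minimal sketch of such a gate, assuming a simple rule of thumb: the `headroom` and `usableFraction` values are invented for illustration, and `ProcessInfo.physicalMemory` is used here in place of the sysctl helpers the project mentions. The real `canLoadModel()` may use entirely different thresholds.

```swift
import Foundation

/// Pure check: does a model of `modelBytes` plausibly fit on a device with
/// `physicalBytes` of RAM? Both tuning parameters are assumptions, not Onyx values.
func canLoadModel(
    modelBytes: UInt64,
    physicalBytes: UInt64,
    headroom: Double = 1.5,        // assumed extra for activations and KV cache
    usableFraction: Double = 0.5   // assumed share of RAM an app can realistically use
) -> Bool {
    let needed = Double(modelBytes) * headroom
    let budget = Double(physicalBytes) * usableFraction
    return needed <= budget
}

let gib: UInt64 = 1_073_741_824
let deviceRAM = ProcessInfo.processInfo.physicalMemory   // total RAM in bytes
print(canLoadModel(modelBytes: 860 * 1_048_576, physicalBytes: deviceRAM))
```

Keeping the decision in a pure function makes it easy to unit-test against each row of the device table without needing the hardware.
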
The com.apple.developer.kernel.increased-memory-limit entitlement allows the app to keep a 2 GB model resident on 6 GB devices. Call MLXModelManager.shared.unloadModel() when entering the background to free memory:

    // In OnyxApp.swift (or a scene delegate):
    .onChange(of: scenePhase) { _, phase in
        if phase == .background {
            Task { await MLXModelManager.shared.unloadModel() }
        }
    }

| Key | Storage | Default | Description |
|---|---|---|---|
| onyx.systemPrompt | UserDefaults | "You are a helpful AI assistant…" | System prompt injected before every conversation |
| onyx.logPrompts | UserDefaults | true | Log outgoing prompts to stdout (tagged [Onyx]) |

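
One way two such keys could be backed by `UserDefaults` is sketched below. Only the key names and the `true` default come from the table; `SettingsSketch` is a hypothetical type, and the placeholder prompt simply repeats the truncated default shown above rather than the full shipped text.

```swift
import Foundation

// Hypothetical wrapper; OnyxSettings itself may be implemented differently.
struct SettingsSketch {
    let defaults: UserDefaults

    // Placeholder only: the table shows the shipped default in truncated form.
    static let placeholderPrompt = "You are a helpful AI assistant…"

    var systemPrompt: String {
        get { defaults.string(forKey: "onyx.systemPrompt") ?? Self.placeholderPrompt }
        nonmutating set { defaults.set(newValue, forKey: "onyx.systemPrompt") }
    }

    var logPrompts: Bool {
        get {
            // bool(forKey:) returns false for a missing key, so check for
            // presence first to honour the documented default of true.
            defaults.object(forKey: "onyx.logPrompts") == nil
                ? true
                : defaults.bool(forKey: "onyx.logPrompts")
        }
        nonmutating set { defaults.set(newValue, forKey: "onyx.logPrompts") }
    }
}
```

Injecting the `UserDefaults` instance, instead of always using `.standard`, keeps the wrapper testable with a throwaway suite.
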
Change settings at runtime:

    // System prompt
    ChatProvider.shared.systemPrompt = "You are a pirate. Respond only in pirate speak."
    // Silence debug logging
    OnyxSettings.shared.logPrompts = false

Downloads come straight from public mlx-community repos on HuggingFace: no account, token, or API key is needed. Gated or private repos are not supported by this build.
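
Public repos on HuggingFace serve files at `https://huggingface.co/<repo>/resolve/<revision>/<file>`, which is why no token is required. The helper below builds such a URL; `resolveURL` is a hypothetical name, and the app's downloader may well go through a library helper instead of constructing URLs by hand.

```swift
import Foundation

/// Builds the public download URL for one file in a HuggingFace repo.
/// Hypothetical helper for illustration; not part of Onyx.
func resolveURL(repoID: String, file: String, revision: String = "main") -> URL? {
    var components = URLComponents()
    components.scheme = "https"
    components.host = "huggingface.co"
    components.path = "/\(repoID)/resolve/\(revision)/\(file)"
    return components.url
}

if let url = resolveURL(repoID: "mlx-community/my-model-4bit", file: "config.json") {
    print(url.absoluteString)
}
```

A gated or private repo would answer the same URL with an authorization error, which matches the note that this build does not support them.
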

- Fork the repo and create a feature branch:

      git checkout -b feature/my-improvement

- Make your changes. The easiest first contribution is adding a model: one line in ChatModelCatalog.swift.
- Run the simulator build to confirm it compiles cleanly (see Build commands).
- Open a pull request: describe what it adds and why it belongs in a skeleton.

All contributions are welcome: new model descriptors, UI improvements, documentation, and tests.
Apache 2.0. See LICENSE.
Built on:
- mlx-swift-lm: Apple's MLX Swift bindings
- swift-transformers: HuggingFace tokenizers
- Models from mlx-community on HuggingFace

Onyx 0.1 beta · Made for people who want their AI conversations to stay on their phone.
