```ts
import { LLM } from 'react-native-nitro-mlx'

await LLM.load('mlx-community/Qwen3-0.6B-4bit', {
  onProgress: (progress) => {
    console.log(`Loading: ${(progress * 100).toFixed(0)}%`)
  },
  manageHistory: true,
  generationConfig: {
    maxTokens: 1024,
    temperature: 0.7,
    topP: 0.9,
    prefillStepSize: 512,
  },
  tokenBatchSize: 8,
  contextConfig: {
    maxContextTokens: 4096,
    keepLastMessages: 6,
  },
})

const response = await LLM.generate('What is the capital of France?')
console.log(response)
```
Load with Additional Context
You can provide conversation history or few-shot examples when loading the model:
```ts
await LLM.load('mlx-community/Qwen3-0.6B-4bit', {
  onProgress: (progress) => {
    console.log(`Loading: ${(progress * 100).toFixed(0)}%`)
  },
  additionalContext: [
    { role: 'user', content: 'What is machine learning?' },
    { role: 'assistant', content: 'Machine learning is...' },
    { role: 'user', content: 'Can you explain neural networks?' },
  ],
})
```
Streaming
```ts
let response = ''

await LLM.stream('Tell me a story', (token) => {
  response += token
  console.log(response)
})
```
Stop Generation
```ts
LLM.stop()
```
Chat Session (high-level API)
For a session-oriented experience that manages structured history, streaming
state, and tool-call metadata for you, use `createChatSession`:
```ts
import { createChatSession, MLXModel } from 'react-native-nitro-mlx'

const chat = createChatSession({
  modelId: MLXModel.Qwen3_1_7B_4bit,
  systemPrompt: 'You are a helpful assistant.',
  tools: [weatherTool],
  onUpdate: (state) => {
    // state.status, state.partialAssistantContent, state.activeToolCalls, ...
  },
})

await chat.load({
  onProgress: (p) => console.log(`${(p * 100).toFixed(0)}%`),
})

const assistant = await chat.sendMessage('Plan a 3-day trip to Tokyo', {
  onToken: (token) => {
    // append token to UI
  },
  onToolCall: (call) => {
    // render tool-call card with call.status + call.arguments
  },
})

console.log(assistant.content)
console.log(chat.messages)        // full typed history
console.log(chat.state.status)    // 'done'
console.log(chat.state.lastStats) // GenerationStats from the last turn

chat.reset()  // clear history, keep system prompt
chat.unload() // release the model
```
`ChatSession` delegates to the same low-level `LLM` module, so the existing
`LLM.stream` / `LLM.streamWithEvents` APIs remain available for advanced use
cases.
ChatSessionOptions
| Option | Description |
| --- | --- |
| `modelId` | Hugging Face model id to load |
| `systemPrompt` | System prompt applied on `load()` |
| `initialMessages` | Seed messages appended to JS history and forwarded as `additionalContext` (system-role entries stay in JS history only) |
| `tools` | Tool definitions available to the model |
| `generationConfig` | Default `LLMGenerationConfig` (temperature, top-p, max tokens, ...) |
| `contextConfig` | `LLMContextConfig` for managed-history trimming |
| `tokenBatchSize` | Tokens batched per JS bridge hop |
| `onUpdate` | Called on every state transition with the latest snapshot |
| `onMessage` | Called when a user/assistant/tool message is appended to history |
| `onToken` | Called for each streamed assistant token |
| `onToolCall` | Called on every tool-call lifecycle update |
| `onError` | Called when `load()` or `sendMessage()` fails |
ChatSession methods
| Method | Description |
| --- | --- |
| `load(options?): Promise<void>` | Load the model, apply system prompt, tools, and initial messages |

ChatSession state

| Property | Description |
| --- | --- |
| | Accumulated thinking content during the current thinking block |
| `activeToolCalls` | Tool calls currently in-flight for the active turn |
| `lastError` | Last error thrown by `load()` or `sendMessage()` |
| `lastStats` | Stats from the last completed generation |
Text-to-Speech
```ts
import { TTS, MLXModel } from 'react-native-nitro-mlx'

await TTS.load(MLXModel.PocketTTS, {
  onProgress: (progress) => {
    console.log(`Loading: ${(progress * 100).toFixed(0)}%`)
  },
})

const audioBuffer = await TTS.generate('Hello world!', {
  voice: 'alba',
  speed: 1.0,
})

// Or stream audio chunks as they're generated
await TTS.stream(
  'Hello world!',
  (chunk) => {
    // Process each audio chunk
  },
  { voice: 'alba' },
)
```
Available voices: `alba`, `azelma`, `cosette`, `eponine`, `fantine`, `javert`, `jean`, `marius`
Speech-to-Text
```ts
import { STT, MLXModel } from 'react-native-nitro-mlx'

await STT.load(MLXModel.GLM_ASR_Nano_4bit, {
  onProgress: (progress) => {
    console.log(`Loading: ${(progress * 100).toFixed(0)}%`)
  },
})

// Transcribe an audio buffer
const text = await STT.transcribe(audioBuffer)

// Or use live microphone transcription
await STT.startListening()
const partial = await STT.transcribeBuffer() // Get current transcript
const final = await STT.stopListening()      // Stop and get final transcript
```
Any MLX-compatible model from Hugging Face should work. For convenience, the package exports an `MLXModel` enum with pre-defined models that are more likely to run well on-device: