CarryTalk is a Tauri desktop app for real-time transcription and translation. It combines a Svelte-based desktop UI with a Rust-powered native layer to capture audio, stream speech data, and persist session output locally.
CarryTalk is designed around a live transcription workflow:
- capture audio from available sources
- start, pause, resume, and stop transcription sessions
- display transcript updates in real time
- optionally show translated output alongside the original text
- save session data locally for later access and recovery
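The start/pause/resume/stop lifecycle above can be sketched as a small state machine. This is an illustrative sketch, not CarryTalk's actual implementation; the state and event names are assumptions:

```typescript
// Illustrative session state machine; state and event names are
// assumptions, not CarryTalk's real API.
type SessionState = "idle" | "recording" | "paused" | "stopped";
type SessionEvent = "start" | "pause" | "resume" | "stop";

const transitions: Record<SessionState, Partial<Record<SessionEvent, SessionState>>> = {
  idle: { start: "recording" },
  recording: { pause: "paused", stop: "stopped" },
  paused: { resume: "recording", stop: "stopped" },
  stopped: {},
};

function nextState(state: SessionState, event: SessionEvent): SessionState {
  const next = transitions[state][event];
  if (next === undefined) {
    throw new Error(`invalid transition: ${event} while ${state}`);
  }
  return next;
}
```

Modeling the lifecycle as an explicit transition table keeps invalid control sequences (e.g. resuming a stopped session) from reaching the native layer.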
The current codebase includes a Soniox-based provider flow, desktop settings management, localized UI resources, and a native session pipeline built with Tauri and Rust. Key features include:
- Real-time session lifecycle with start, pause, resume, and stop controls
- Live transcript rendering with timestamps and support for original and translated text
- Audio source configuration for microphone, system audio, or mixed capture modes depending on runtime capabilities
- Device selection support for available audio inputs and outputs
- Provider and API key management through the settings flow
- Local session persistence using session folders, manifests, and JSONL transcript parts
- Interrupted session recovery on app startup
- Desktop-friendly UI with theme and language preferences
- Built-in localization resources for English and Vietnamese
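Persisting transcripts as JSONL parts makes interrupted-session recovery straightforward: each line is an independent JSON record, so a file truncated mid-write by a crash loses at most its final line. A minimal sketch, assuming a hypothetical segment shape (the real field names in CarryTalk's storage format may differ):

```typescript
// Hypothetical shape of one JSONL transcript line; the actual fields
// used by CarryTalk may differ.
interface TranscriptSegment {
  startMs: number;
  endMs: number;
  text: string;
  translation?: string;
}

// Parse a JSONL transcript part, tolerating a truncated final line
// (e.g. after a crash mid-write) so recovery still succeeds.
function parseTranscriptPart(jsonl: string): TranscriptSegment[] {
  const segments: TranscriptSegment[] = [];
  for (const line of jsonl.split("\n")) {
    if (line.trim() === "") continue;
    try {
      segments.push(JSON.parse(line) as TranscriptSegment);
    } catch {
      break; // partial trailing record from an interrupted session
    }
  }
  return segments;
}
```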
Frontend:

- Svelte 5
- TypeScript
- Vite
- Tailwind CSS 4
- Tauri JavaScript APIs

Native layer:

- Tauri 2
- Rust
- Tokio
- WebSocket-based streaming with `tokio-tungstenite`
- Audio capture with `cpal`
- Audio resampling with `rubato`
- Local secret handling with `aes-gcm` and `argon2`
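For illustration only, the idea behind resampling (converting a capture rate such as 48 kHz down to a provider's expected rate) can be shown with a naive linear-interpolation version. The app itself relies on `rubato`, which performs proper band-limited resampling rather than this toy approach:

```typescript
// Toy linear-interpolation resampler, for illustration only.
// rubato applies band-limited (sinc) resampling instead.
function resampleLinear(input: number[], fromRate: number, toRate: number): number[] {
  const outLen = Math.floor((input.length * toRate) / fromRate);
  const output: number[] = [];
  for (let i = 0; i < outLen; i++) {
    const pos = (i * fromRate) / toRate; // fractional position in input
    const i0 = Math.floor(pos);
    const i1 = Math.min(i0 + 1, input.length - 1);
    const frac = pos - i0;
    output.push(input[i0] * (1 - frac) + input[i1] * frac);
  }
  return output;
}
```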
Before running the app, make sure your environment satisfies the system requirements for building Tauri applications on your platform.
```sh
git clone https://github.com/tuannt39/carry-talk.git
cd carry-talk

# Install frontend dependencies
npm install

# Start the frontend dev server
npm run dev

# Run the desktop app in development mode
npm run tauri -- dev

# Run checks
npm run check

# Build the frontend
npm run build

# Build the desktop bundle
npm run tauri -- build
```

- Launch the application in development or from a built desktop bundle.
- Open the settings panel and configure the transcription provider.
- Add or update the required API key.
- Choose the desired audio capture mode and device configuration.
- Start a session to begin receiving live transcript updates.
- View original and translated transcript text in the main transcript area.
- Stop the session when finished. Session data is stored locally for recovery and listing.
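The steps above are typically driven through a thin service wrapper around Tauri's `invoke`. A hedged sketch — the command names and payload shapes here are hypothetical, and `invoke` is injected so the wrapper can be exercised outside the Tauri runtime:

```typescript
// Hypothetical wrapper around Tauri's invoke(); CarryTalk's real
// command names and payloads may differ. Injecting invoke keeps the
// wrapper testable outside a Tauri window.
type Invoke = (cmd: string, args?: Record<string, unknown>) => Promise<unknown>;

function makeSessionService(invoke: Invoke) {
  return {
    start: (sourceId: string) => invoke("start_session", { sourceId }),
    pause: () => invoke("pause_session"),
    resume: () => invoke("resume_session"),
    stop: () => invoke("stop_session"),
  };
}
```

In the app itself, `invoke` would come from the Tauri JavaScript APIs (`@tauri-apps/api`).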
```
carry-talk/
├── src/
│   ├── App.svelte                # Application shell and startup flow
│   ├── main.ts                   # Frontend entrypoint
│   └── lib/
│       ├── components/           # UI components such as controls, settings, transcript view
│       ├── services/             # Tauri command and event wrappers
│       ├── stores/               # Frontend state stores
│       ├── i18n/                 # Localization resources
│       └── types/                # Shared frontend types
├── src-tauri/
│   ├── src/
│   │   ├── main.rs               # Native entrypoint
│   │   ├── lib.rs                # App bootstrap and shared state wiring
│   │   ├── commands.rs           # Tauri command surface
│   │   ├── session_manager.rs    # Session orchestration and streaming pipeline
│   │   ├── storage.rs            # Local session persistence and recovery
│   │   ├── settings.rs           # App settings persistence
│   │   └── secrets.rs            # Encrypted secret storage
│   └── tauri.conf.json           # Tauri app and build configuration
├── package.json                  # Frontend scripts and dependencies
└── README.md
```
At a high level, CarryTalk follows this flow:
- The frontend loads app settings, current session state, and audio runtime capabilities.
- When a session starts, the backend manages audio capture, streaming, transcript buffering, and local storage.
- Transcript and session events are emitted back to the frontend through Tauri events.
- The UI renders incoming transcript segments and keeps the visible session state in sync.
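A common pattern for the last two steps is to key incoming segments by id, so that a later final version of a segment replaces an earlier partial one. A minimal sketch with hypothetical event fields (not CarryTalk's actual event schema):

```typescript
// Hypothetical transcript event payload; real event names and fields
// in CarryTalk may differ. Segments arrive incrementally and are
// keyed by id, so a final version overwrites a partial one in place.
interface TranscriptEvent {
  id: string;
  text: string;
  isFinal: boolean;
}

function applyEvent(
  visible: Map<string, TranscriptEvent>,
  event: TranscriptEvent,
): Map<string, TranscriptEvent> {
  const next = new Map(visible); // copy so UI state updates stay immutable
  next.set(event.id, event);
  return next;
}
```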
This project was adapted from, and inspired by, the following projects:
- My Translator
- Node Trans
- LiveCaptions Translator
- Real-Time Translator
- Realtime Subtitle
- DeLive
- Taurscribe
Contributions are welcome.
If you want to contribute:
- Fork the repository.
- Create a feature branch.
- Make focused, reviewable changes.
- Run the relevant local checks.
- Open a pull request describing the change clearly.
This project is licensed under the MIT License.
For questions, feedback, or support, please open an issue on GitHub:

