A user-friendly desktop tool for live translation with a transparent overlay window, optimized for Chinese text.
Transparent overlay window for capturing text for translation
Result window with translated text output
Translating an electron microscope manual from English to German - capture window (top) and result window (bottom) shown simultaneously
VisoLingua is available in three implementations with different trade-offs:
| Version | Status | Binary Size | Startup | Best For | Link |
|---|---|---|---|---|---|
| 🐍 Python | ✅ Working | ~50 MB | ~2-3s | Development, quick iteration | (this directory) |
| 🦀 Rust | ✅ Working | ~8 MB | ~0.5s | Production use ⭐ | visolingua-rust/ • Releases |
| 🐹 Go | ❌ Broken | ~12 MB | ~1s | Not recommended | visolingua-go/ |
⬇️ Download Latest Release - Get the Rust version (recommended)
- 🎯 For end users: Use the Rust version - smallest, fastest, most reliable
- 👨‍💻 For developers: Use the Python version - easiest to modify and test
- ⚠️ Avoid: The Go version has critical screen capture bugs and is not functional
Python (Original)
- ✅ Fully working, well-tested
- ✅ Easy to modify and extend
- ✅ All features implemented
- ❌ Larger binary size
- ❌ Requires Python runtime
Rust + Tauri (Recommended)
- ✅ Production-ready
- ✅ Smallest binary (~8 MB)
- ✅ Fastest startup
- ✅ No runtime dependencies
- ✅ No antivirus false positives
- ❌ Longer build times (~5 min)
Go + Wails (Experimental)
⚠️ NOT WORKING - screen capture is broken
- ❌ Only captures the app's own window, not the user's screen
- ❌ Unusable for translation purposes
- 📚 Kept as a reference implementation
See each version's README for detailed setup instructions.
- Transparent Capture Window: Movable and resizable over other applications
- LLM Integration: Supports Gemini 2.5 Flash and GPT-4.1 Mini/Nano
- One-Click Translation: Simply click in the overlay window
- Dual-Mode System: Seamless switching between capture and result mode
- 🤖 Ask AI: Ask questions about translation results for context, explanations, and details
- Chinese Focus: Optimized for simplified and traditional Chinese characters
- Automatic Language Detection: Automatically detects source language
- Multilingual: Supports many source languages → German
- Intelligent Caching: Identical screenshots are not translated again
- History: Storage and retrieval of recent translations
- Cross-Platform: Windows, Linux, macOS
- DPI-Aware: Perfect rendering on High-DPI displays
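The intelligent caching feature can be sketched as a small MD5-keyed lookup over the raw screenshot bytes; `TranslationCache` and its methods are illustrative names for this sketch, not the app's actual API:

```python
import hashlib

class TranslationCache:
    """Cache translations keyed by an MD5 hash of the screenshot bytes,
    so identical captures are never sent to the LLM twice."""

    def __init__(self):
        self._cache = {}

    def _key(self, image_bytes: bytes) -> str:
        # MD5 is fine here: it is used for deduplication, not security.
        return hashlib.md5(image_bytes).hexdigest()

    def get(self, image_bytes: bytes):
        return self._cache.get(self._key(image_bytes))

    def put(self, image_bytes: bytes, translation: str) -> None:
        self._cache[self._key(image_bytes)] = translation

cache = TranslationCache()
cache.put(b"fake-png-bytes", "你好 -> Hello")
print(cache.get(b"fake-png-bytes"))  # cache hit for identical bytes
print(cache.get(b"other-bytes"))     # None -> a fresh API call is needed
```

Only a cache miss triggers a real LLM request, which keeps repeated captures of the same region free.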
```bash
# 1. Clone repository
git clone https://github.com/username/VisoLingua.git
cd VisoLingua

# 2. Install Python 3.8+ (if not already installed)

# 3. Install dependencies
pip install -r requirements.txt

# 4. Start app
python main.py
```

Build EXE (optional)

```bash
# After installing dependencies:
python build_exe.py
# Creates dist/VisoLingua.exe (may trigger antivirus warnings)
```

Configuration

```bash
# 1. Create configuration file
cp config/config_sample.ini config/config.ini

# 2. Enter API keys (edit config/config.ini)
#    - Gemini API Key from https://aistudio.google.com/
#    - Or OpenAI API Key from https://platform.openai.com/

# 3. Start app
python main.py

# On first start:
# 1. Check/enter API key in Settings
# 2. Select default LLM (recommended: Gemini 2.5 Flash)
# 3. Done!
```

- Gemini API: Google AI Studio → Create API Key
- OpenAI API: OpenAI Platform → Create Secret Key
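For illustration, the API key could be read from `config.ini` with the standard-library `configparser`; the `[api]` section and key names below are hypothetical and do not reflect the actual layout of `config_sample.ini`:

```python
import configparser

# Hypothetical config layout for this sketch only.
config = configparser.ConfigParser()
config.read_string("""
[api]
provider = gemini
gemini_api_key = YOUR_KEY_HERE
""")
# In the real app this would be: config.read("config/config.ini")

provider = config.get("api", "provider", fallback="gemini")
api_key = config.get("api", f"{provider}_api_key", fallback="")
print(provider, bool(api_key))  # whether a key is configured for the provider
```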
- Start app: `python main.py`
- Position scan window: Drag it over the text to be translated
- Take screenshot: Click in the red overlay window
- Get translation: Automatic switch to result window
- 🤖 Ask AI (optional): Enter a question about the translation in the text field and click "Ask AI"
- Back to scan: "Back to Capture" button or close window
After a translation, you can ask the AI questions about the result:
- Examples: "What does this context mean?", "Are there alternative translations?", "Explain the grammar"
- Response: Displayed directly in the result window below the original translation
- Usage: Same LLM configuration as for translations (Gemini/OpenAI/Ollama)
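Conceptually, an Ask AI follow-up reuses the finished translation as conversation context; `build_followup_messages` is a hypothetical helper sketched here, not the app's real function:

```python
def build_followup_messages(original: str, translation: str, question: str) -> list:
    """Assemble a chat-style message list that gives the LLM the prior
    translation as context for the user's follow-up question."""
    return [
        {"role": "system", "content": "You are a translation assistant."},
        {"role": "user", "content": f"Source text:\n{original}"},
        {"role": "assistant", "content": translation},
        {"role": "user", "content": question},
    ]

msgs = build_followup_messages("你好", "Hello", "Are there alternative translations?")
print(len(msgs))  # four messages: system, source, translation, question
```

The same message list format works for any of the supported backends (Gemini, OpenAI, Ollama) with only backend-specific payload wrapping.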
- Double-click on overlay title bar → To result window
- "Back to Capture" button → Back to scan window
- Close window → Back to scan window
- X-button on overlay → Exit app
- Ctrl+Tab: Switch between modes
- Ctrl+C: Copy translation (in result mode)
- Esc: Close result window (back to capture)
- Double-click title bar: Switch mode
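A sketch of how such shortcuts could be wired up in tkinter; the handler names and the binding table are illustrative, not the app's actual identifiers:

```python
# Hypothetical shortcut table: key sequences -> handler names.
SHORTCUTS = {
    "<Control-Tab>": "toggle_mode",      # switch capture <-> result
    "<Control-c>": "copy_translation",   # result mode only
    "<Escape>": "close_result",          # back to capture window
}

def bind_shortcuts(widget, handlers: dict) -> None:
    """Attach each key sequence to its handler via tkinter's bind()."""
    for sequence, name in SHORTCUTS.items():
        widget.bind(sequence, handlers[name])
```

`widget` would typically be the result window's `Toplevel`; the title-bar double-click would be bound separately via the `<Double-Button-1>` event.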
```
VisoLingua/
├── main.py                 # Main program
├── config/
│   ├── settings.py         # Configuration management
│   ├── config.ini          # User settings (created locally)
│   └── config_sample.ini   # Configuration template
├── ui/
│   ├── overlay.py          # Transparent overlay
│   └── result_window.py    # Result window with Ask AI function
├── core/
│   ├── screenshot.py       # Screenshot capture
│   └── translator.py       # LLM integration
├── utils/
│   ├── helpers.py          # Helper functions
│   └── constants.py        # Constants
└── requirements.txt        # Dependencies
```
| LLM | Speed | Cost | Quality | Recommendation |
|---|---|---|---|---|
| Gemini 2.5 Flash | ⚡⚡⚡ | 💰 | ⭐⭐⭐⭐ | ✅ Recommended |
| GPT-4.1 Mini | ⚡⚡ | 💰💰 | ⭐⭐⭐⭐⭐ | For best quality |
| GPT-4.1 Nano | ⚡⚡⚡ | 💰 | ⭐⭐⭐ | Experimental |
| Model | Parameters | Downloads | Performance | Chinese | Description |
|---|---|---|---|---|---|
| gemma3 | 1b-27b | 9.4M | ⚡⚡⚡ | ⭐⭐⭐ | Latest model for single GPU |
| qwen2.5-vl | 3b-72b | 400K | ⚡⚡⚡ | ⭐⭐⭐⭐⭐ | Qwen's flagship vision model |
| llava | 7b-34b | 7.9M | ⚡⚡ | ⭐⭐⭐⭐ | Proven vision-language model |
| minicpm-v | 8b | 2.4M | ⚡⚡ | ⭐⭐⭐⭐ | Compact multimodal model |
| llama3.2-vision | 11b-90b | 2.2M | ⚡⚡ | ⭐⭐⭐ | Meta's vision model |
| llava-llama3 | 8b | 1.3M | ⚡⚡ | ⭐⭐⭐⭐ | LLaVA with Llama 3 base |
| llama4 | 16x17b-128x17b | 467K | ⚡ | ⭐⭐⭐ | Meta's latest multimodal model |
| moondream | 1.8b | 223K | ⚡⚡⚡ | ⭐⭐ | Optimized for edge devices |
```bash
# 1. Install Ollama (https://ollama.ai)
# 2. Pull model (example):
ollama pull llava:7b
# 3. In VisoLingua: Settings → Local Ollama → Enable
```

Minimum:
- Python: 3.8+
- Operating System: Windows 10+, Linux (GUI), macOS 10.14+
- RAM: 2GB available
- Internet: For LLM API calls
Recommended:
- Python: 3.9+
- RAM: 4GB+
- Display: 1920x1080+ (High-DPI supported)
- Internet: Stable broadband connection
For local LLMs (Ollama):
- GPU: NVIDIA with 6GB+ VRAM (recommended) or CPU-only
- RAM: 16GB+ (depends on model, see table above)
- Storage: 5-40GB for models
- Ollama: Installed and running locally
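A minimal sketch of how a screenshot could be submitted to a locally running Ollama. The payload shape follows Ollama's `/api/generate` endpoint (base64-encoded `images`); the prompt text and helper name are made up for this example:

```python
import base64
import json

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default endpoint

def build_ollama_request(image_bytes: bytes, model: str = "llava:7b") -> dict:
    """Build the JSON payload for Ollama's /api/generate endpoint:
    the screenshot goes in as a base64-encoded image."""
    return {
        "model": model,
        "prompt": "Translate all text in this image to English.",
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        "stream": False,  # return one complete response instead of chunks
    }

payload = build_ollama_request(b"\x89PNG-fake-screenshot-bytes")
print(json.dumps(payload)[:50])
```

The payload would then be POSTed to `OLLAMA_URL` (e.g. with aiohttp, which the app already uses for API calls) and the translation read from the `response` field of the JSON reply.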
- Frontend: tkinter (Cross-Platform GUI)
- Screenshot: mss + PIL ImageGrab (Fallback)
- LLM APIs: aiohttp (Async requests)
- Threading: Async/await for non-blocking UI
- Thread-safe screenshot capture with MSS fallbacks
- DPI-Awareness for Windows High-DPI displays
- Intelligent caching with MD5 hash comparison
- Robust error handling with multiple fallback methods
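The non-blocking UI approach can be sketched as a background asyncio loop that the GUI thread hands coroutines to; this is an assumed pattern for illustration, not the app's actual threading code:

```python
import asyncio
import threading

# Run an asyncio event loop in a daemon thread so API calls
# never block the tkinter mainloop.
loop = asyncio.new_event_loop()
threading.Thread(target=loop.run_forever, daemon=True).start()

async def translate(text: str) -> str:
    """Stand-in for an async aiohttp call to the LLM API."""
    await asyncio.sleep(0.01)
    return f"translated:{text}"

# The GUI thread schedules work onto the background loop and gets a
# concurrent.futures.Future back; a real UI would poll it via after().
future = asyncio.run_coroutine_threadsafe(translate("你好"), loop)
print(future.result(timeout=2))
```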
- OverText - Transparent overlay functionality
- Developed for defensive security purposes and language learning support
USE AT YOUR OWN RISK!
We cannot guarantee that the application is error-free and always sends only the selected scan area to the LLM. For maximum privacy and security, use a local, self-hosted LLM for translation when in doubt.
PyInstaller EXE files are often falsely flagged as viruses. Recommendation: run Python directly (`python main.py`) instead of using an EXE file.
VisoLingua now supports Ollama for completely private translations without external API calls. Enable local LLMs in the settings.
- Window not visible: Adjust transparency in config.ini
- Screenshot error: Start the app with administrator rights
- API error: Check API key and internet connection
- DPI problems: Automatically fixed with DPI-Awareness
For detailed solutions see: SETUP.md
This project is exclusively for defensive security purposes and language learning support.