Skip to content

marcellopps283/flow

Repository files navigation

Logo

Flow

A Premium AI Dictation "Dynamic Island" for Windows

Speak naturally, stutter, pause, or mumble. Flow catches your voice, polishes it into perfect text using state-of-the-art LLMs, and types it seamlessly into any application.


✨ Features

  • 🏝️ True Dynamic Island UI: A beautiful, top-anchored notch built with PySide6 that expands elastically when you speak, mimicking the premium Apple aesthetic on Windows.
  • 🎵 Reactive Neon Waveform: Features a math-driven, glassmorphism audio visualizer. The stems idle like musical notes and pulse with vibrant neon gradients (Cyan, Magenta, Purple) based on your real-time voice volume.
  • ⚡ Blazing Fast Transcription: Uses heavily optimized faster-whisper (CTranslate2) running locally to transcribe audio with near-zero latency.
  • 🧠 AI Text Polishing: Streams the raw transcription through Groq's llama-3.3-70b-versatile engine to fix grammar, remove conversational filler, and structure the text professionally.
  • ⌨️ Universal Auto-Paste: Press the global hotkey (F9) anywhere, dictate your thought, and Flow will automatically type the polished text directly into your active window.

🚀 How It Works

  1. Idle State: A tiny, discreet notch at the top of your screen.
  2. Listening: Press F9. The island expands smoothly. As you speak, the neon stems react to your voice.
  3. Transcribing: You release F9. The audio is instantly transcribed locally.
  4. Polishing: The text is sent to Groq for split-second intelligent cleaning.
  5. Done: The island flashes green, and the polished text is typed wherever your cursor is!

🛠️ Installation

1. Clone the repository

git clone https://github.com/YOUR_USERNAME/flow.git
cd flow

2. Set up the Virtual Environment

Ensure you have Python 3.10+ installed.

python -m venv .venv
.venv\Scripts\activate

3. Install Dependencies

pip install -r requirements.txt

4. Configure API Keys

Create a .env file in the root directory and add your Groq API key:

GROQ_API_KEY=gsk_your_api_key_here

Note: The .env file is safely included in .gitignore to prevent leaking your keys.

5. Run Flow

python main.py

⚙️ Technologies Used

  • UI Architecture: Python, PySide6 (Qt for Python)
  • Audio Processing: PyAudio, NumPy
  • Speech-to-Text: faster-whisper (CTranslate2)
  • LLM Engine: Groq API (llama-3.3-70b-versatile)
  • Automation: PyAutoGUI, Pyperclip, pynput

🧠 Overcoming Engineering Challenges

Building Flow required solving advanced OS-level rendering issues:

  • Preventing GUI Thread Blocking: The entire audio capturing pipeline runs on separate background threads to ensure the UI continues animating at a buttery smooth 33fps without stuttering.
  • Bypassing Qt Compositor Bugs: Animating heavy QGraphicsDropShadowEffect with translucent backgrounds caused severe QPainter crashes on Windows NVIDIA drivers. I engineered a strict Z-order architecture, separating the static drop-shadow into an unmoving background layer (ShadowWidget) and calculating all foreground alpha opacities via manual math, achieving 100% stability.

📄 License

This project is open-source and available under the MIT License.

About

Premium AI Dictation 'Dynamic Island' for Windows. Local speech-to-text with faster-whisper and intelligent text polishing with Groq Llama 3.3.

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages