This script records audio, transcribes it using Whisper, and detects hesitation markers from the transcription. It saves the audio and transcription files in an organized output folder.
Before running the script, ensure you have Python 3.7 or later installed on your machine. Additionally, you need to have ffmpeg installed as a prerequisite for audio processing.
You will also need the following dependencies:
- pyaudio - for audio recording
- whisper - for audio transcription using Whisper model
- re - for regular expressions to detect hesitation markers
- os, datetime, wave - for file management and handling
You can install the necessary dependencies using the provided requirements.txt file.
- Install Python 3.7+ if you haven't already. You can download it from here.
- Install ffmpeg on your system:
- For Windows: Download and install ffmpeg from FFmpeg Downloads.
- For macOS: Use Homebrew:
brew install ffmpeg. - For Linux: Use your package manager, for example,
sudo apt install ffmpegon Ubuntu.
Clone the repository or download the project files to your local machine.
- Open a terminal and navigate to the project directory.
- Create a virtual environment:
python -m venv venv
- Activate the virtual environment:
.\venv\Scripts\activate
- Open a terminal and navigate to the project directory.
- Create a virtual environment:
python3 -m venv venv
- Activate the virtual environment:
source venv/bin/activate
With the virtual environment activated, install the required dependencies by running:
pip install -r requirements.txtAfter the setup is complete, you can run the script as follows:
python main.pyThe script will:
- Record audio until you press Ctrl+C to stop.
- Transcribe the audio using Whisper.
- Detect hesitation markers from the transcription.
- Save the transcription and detected markers to a
.txtfile.
You can also customize the recording behavior by providing additional arguments:
--prep-time <seconds>: Specifies the preparation time before the recording starts.--seconds <seconds>: Defines the recording duration in seconds.
To start recording with a 5-second preparation time and a 30-second recording duration, you would run:
python main.py --prep-time 5 --seconds 30The script generates two output files:
- Audio File:
The recorded audio will be saved as a.wavfile in theoutput_transcriptionfolder with a timestamp in its filename. Example:
output_transcription/recorded_audio_20250116_103045.wav- Transcription File:
The transcription and detected hesitation markers will be saved in a.txtfile with the same base name as the audio file. Example:
output_transcription/recorded_audio_20250116_103045_transcript.txtTranscription:
You know what they call a Quarter Pounder with Cheese in Paris?
They don’t, um, call it a Quarter Pounder with Cheese,
they got the, uh, metric system... they call it a Royale with Cheese!Detected Hesitations:
um, uh
-
PyAudio installation on Windows: If you encounter issues installing PyAudio, try installing the precompiled binary using:
pip install pipwin pipwin install pyaudio
-
Whisper installation: Whisper should install automatically via the requirements.txt. Ensure you have the necessary hardware to run the models (for example, a GPU for larger models).
The Whisper model can be set to different variants for better performance:
- "base" (default)
- "small"
- "medium"
- "large"
You can change the model in the script by modifying the MODEL variable.
The script uses a predefined list of hesitation markers, but you can expand or modify this list by editing the HESITATION_MARKERS array in the script.
Here’s the requirements.txt file with necessary dependencies:
pyaudio==0.2.11 whisper==1.0.0
If you're having trouble installing pyaudio, especially on macOS or Linux, you might need to install additional dependencies, such as portaudio or libportaudio-dev.
-
PyAudio installation on Windows: If you encounter issues installing pyaudio, try installing the precompiled binary using:
pip install pipwin pipwin install pyaudio
-
Whisper installation: Whisper should install automatically via the requirements.txt. Ensure you have the necessary hardware to run the models (for example, a GPU for larger models).