Enable Speech to Text Dictation #5

@rajin-khan

Description

Use the existing Groq API key to enable speech-to-text with OpenAI Whisper. You hit record, then hit stop. The audio is compressed into an .ogg file optimized for voice, and the transcript comes back word for word. The compression step means I'll need a server/backend running for puffnotes to handle the conversion, or a serverless function on Vercel, which introduces some complexity.

Each recording can be limited to 5 minutes. While a chunk is being transcribed, the note can dynamically show a "processing audio" block, and a new recording can start automatically if needed. Alternatively, a single recording can keep going for a maximum of 30 minutes and then be compressed to .ogg; if that file is still above 25 MB (Groq's per-request limit for OpenAI Whisper), it gets split and the pieces are processed separately.

I will probably have to create a separate engine, and likely an API, to handle this, or look for an easier way to handle simple dictation, possibly with existing libraries.
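Both recording modes and the 25 MB check come down to some simple arithmetic. The TypeScript below is a minimal sketch, not existing puffnotes code: the helper names (planSegments, estimateBytes, piecesNeeded) are hypothetical, the 25 MB limit is treated conservatively as 25,000,000 bytes, and it assumes audio is split by time rather than by raw bytes, since cutting a compressed .ogg at arbitrary byte offsets would generally leave the later pieces without the stream headers a decoder needs.

```typescript
// Hypothetical planning helpers for the dictation feature (not puffnotes code).
// Assumptions: constant-bitrate voice encoding, and splitting by time so that
// every piece can be encoded as its own valid audio file.

const MAX_SESSION_SECONDS = 30 * 60; // 30-minute cap from the issue
const SEGMENT_SECONDS = 5 * 60; // 5 minutes per recording chunk
const MAX_UPLOAD_BYTES = 25_000_000; // 25 MB limit, read conservatively as decimal MB

interface Segment {
  startSec: number;
  endSec: number;
}

/** Cut a recording into back-to-back chunks of at most `segmentSec`, honoring the session cap. */
function planSegments(durationSec: number, segmentSec = SEGMENT_SECONDS): Segment[] {
  if (segmentSec <= 0) {
    throw new RangeError("segmentSec must be positive");
  }
  const total = Math.min(durationSec, MAX_SESSION_SECONDS);
  const segments: Segment[] = [];
  for (let start = 0; start < total; start += segmentSec) {
    segments.push({ startSec: start, endSec: Math.min(start + segmentSec, total) });
  }
  return segments;
}

/** Rough size of a constant-bitrate encode: (bits per second / 8) * seconds, ignoring container overhead. */
function estimateBytes(durationSec: number, bitrateKbps: number): number {
  return Math.ceil(((bitrateKbps * 1000) / 8) * durationSec);
}

/** Number of equal-length, time-based pieces needed to keep each upload under the limit. */
function piecesNeeded(sizeBytes: number, limitBytes = MAX_UPLOAD_BYTES): number {
  return Math.max(1, Math.ceil(sizeBytes / limitBytes));
}

// Example: a 1,000-second recording becomes four chunks (the last one is 100 s long).
const chunks = planSegments(1000);
console.log(chunks.length, chunks[chunks.length - 1]);
console.log(estimateBytes(MAX_SESSION_SECONDS, 32), piecesNeeded(estimateBytes(MAX_SESSION_SECONDS, 32)));
```

For example, assuming a voice-oriented bitrate of 32 kbps (an illustrative figure, not a measurement), a full 30-minute session comes to 32,000 / 8 × 1,800 = 7,200,000 bytes, or about 7.2 MB. Under that assumption the split path would only trigger for bitrates above roughly 111 kbps (25,000,000 × 8 / 1,800) or for uncompressed audio.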

Metadata

Assignees

Labels

enhancement (New feature or request)

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests