Use the existing Groq key to enable speech-to-text with OpenAI Whisper. You hit record, then hit stop; the audio is compressed to .ogg optimized for voice, and the transcript comes back word for word. The compression step means I'll need either a server/backend running for puffnotes or a serverless function on Vercel, which adds complexity. Recording can be limited to 5 minutes per click: the note shows a "processing audio" block while that segment is transcribed, and a new recording starts automatically if needed. Alternatively, we can keep recording for a maximum of 30 minutes, then compress to .ogg, and if the file is above 25 MB (Groq's per-request limit for OpenAI Whisper), split it and process the pieces. I will have to create a separate engine and probably an API to handle this, or look for an easier way to handle simple dictation, or try an existing library.
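To get a feel for the 5-minute, 30-minute, and 25 MB numbers above, here is a minimal sketch of the planning logic only, with no recording, encoding, or network calls. Every name in it (`planSegments`, `estimateBytes`, `fitsInOneRequest`) is hypothetical, the 32 kbps voice bitrate is an assumption rather than a measured value, and treating 25 MB as 25 × 1024 × 1024 bytes is also an assumption.

```typescript
// Hypothetical helpers for planning a dictation session.
// None of these names come from puffnotes or from Groq's SDK.

const MAX_UPLOAD_BYTES = 25 * 1024 * 1024; // 25 MB limit from the note (binary MB assumed)
const SEGMENT_SECONDS = 5 * 60;            // 5 minutes per click
const MAX_SESSION_SECONDS = 30 * 60;       // 30-minute session cap

interface Segment {
  index: number;
  startSec: number;
  endSec: number;
}

// Split a recording into consecutive segments of at most `segmentSec` seconds,
// ignoring any audio past `maxSec`.
function planSegments(
  totalSec: number,
  segmentSec: number = SEGMENT_SECONDS,
  maxSec: number = MAX_SESSION_SECONDS,
): Segment[] {
  const capped = Math.min(Math.max(totalSec, 0), maxSec);
  const segments: Segment[] = [];
  for (let start = 0; start < capped; start += segmentSec) {
    segments.push({
      index: segments.length,
      startSec: start,
      endSec: Math.min(start + segmentSec, capped),
    });
  }
  return segments;
}

// Rough encoded size at a constant bitrate; container overhead is ignored.
function estimateBytes(durationSec: number, bitrateKbps: number): number {
  return Math.ceil((durationSec * bitrateKbps * 1000) / 8);
}

// True if a clip of this length and bitrate should fit in a single upload.
function fitsInOneRequest(
  durationSec: number,
  bitrateKbps: number,
  maxBytes: number = MAX_UPLOAD_BYTES,
): boolean {
  return estimateBytes(durationSec, bitrateKbps) <= maxBytes;
}
```

For example, `planSegments(720)` yields three segments (0 to 300, 300 to 600, 600 to 720 seconds). Under the assumed 32 kbps, `estimateBytes(1800, 32)` is about 7.2 MB, so a full 30-minute session would likely fit in one request and the split step may rarely trigger. At 128 kbps the same session would not fit.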