LLM Scribe is your professional toolkit for creating high-quality conversational datasets for Large Language Model fine-tuning. Whether you're a creative writer crafting character personalities or a developer preparing training data, LLM Scribe eliminates the technical barriers and formatting headaches.
No more struggling with JSON syntax or format specifications - LLM Scribe handles all the technical details while you focus on creating valuable content.
- Intuitive Interface - Focus on writing, not formatting
- Auto-save Functionality - Never lose your work with automatic saving on every interaction
- Progress Tracking - Set goals and monitor your dataset completion
- Tab Navigation - Rapidly cycle between fields for efficient data entry
- Light mode and Dark mode themes - Swap in settings
- Multiple Export Formats - Supports all major LLM training formats including ChatML, Alpaca, ShareGPT/Vicuna
- Format-Specific Customizations - Tailor your datasets with format-specific options
- Real-time Token Tracking - Monitor token usage with popular tokenizers (OpenAI, HuggingFace, Mistral)
- Customizable Fields - Enable/disable optional fields based on your specific needs
- System Message Support - Add system prompts for ChatGPT/ChatML formats
- Custom IDs - Assign unique identifiers for ShareGPT/Vicuna formats
- Easy Dataset Reloading - Seamlessly continue work on existing projects
- Multi-turn Conversation Support - Create contextually aware training data
- In-app Guidance - Helpful tooltips and explanations throughout the interface
chatgpt_chatml.jsonlchatml.jsonalpaca.jsonlalpaca.jsonsharegpt_vicuna.jsonlsharegpt_vicuna.jsongeneric.jsonl
chatgpt_chatml.jsonlchatml.jsonsharegpt_vicuna.jsonlsharegpt_vicuna.json- Plus all pair formats (automatically generated)
- Start with default settings to get all formats you need
- Choose between simple pair data or more advanced multi-turn conversations
- No technical knowledge required - just write and export
- Fine-tune your datasets with format-specific customizations
- Track token usage for cost and performance optimization
- Leverage advanced features for professional dataset creation
Please click the open book icon to get started once you open the app! It will give you all the info you need.
- Windows Only Application - Not compatible with macOS or Linux
This software includes a commercial license that grants you full commercial rights to all datasets and outputs you create.
Created with ❤️ by Gabriella Baris - Check out my portfolio for more projects and tools!
Check out my current AI safety project elif else
If you have any issues, find bugs, or need assistance, please open a GitHub issue:
- Technical support
- Bug reports
- General questions
- Additional format requests
- Tokenizer library additions
Version 1.1
Note on Tokenizer Libraries: LLM Scribe utilizes open-source libraries (tiktoken, Hugging Face transformers, Mistral AI Tokenizers) for token counting functionalities, each governed by their respective licenses.