AI-powered image categorization utility that uses the Google Gemini API to sort large volumes of images (e.g., WhatsApp media) into organized folders.
Supported Python Version: 3.12+
- **Dual-mode processing**: Standard (instant synchronous processing) and Batch (async queue, 50% offline discount). Designed explicitly for the Google Gemini API.
- **AI categorization**: Uses Gemini's natively multimodal vision to classify images into configurable categories.
- **Cost tracking**: Pre-processing estimates (self-calibrating in SQLite) and post-processing actual costs in local currency, natively accounting for the Batch 50% discount.
- **Progress bars**: Clean, live single-line `tqdm` progress tracking with dynamic API ETA spinners.
- **Performance architecture**: Blitz through massive backlogs! A native `ThreadPoolExecutor` leverages Google's File API to upload with up to 100 concurrent threads.
- **Retry with back-off**: Automatic exponential back-off on API rate limits (429) and server errors.
- **Resume-safe**: SQLite state database ensures seamless restarts with auto-pruning.
- **Smart date extraction**: Regex filename parsing with an OS timestamp fallback.
- **EXIF restoration**: Optionally inject dates back into image metadata.
- **Extensive logging**: Per-run audit logs plus an error-specific log file.
- **Test mode**: Process a single batch for validation.
- **Dry run**: Scan and estimate without making any changes.
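The retry behaviour can be sketched as a small wrapper with exponential back-off and jitter. This is an illustrative sketch, not the project's actual code; the error shape (a `status_code` attribute) is a simplification of the real SDK's exception types:

```python
import random
import time

RETRYABLE = {429, 500, 503}  # rate limits and transient server errors

def call_with_backoff(request, max_retries=5, base_delay=1.0):
    """Retry `request` with exponential back-off and jitter.

    `request` is any callable that raises an exception carrying a
    `status_code` attribute on failure (a simplification of the real
    SDK's error types). Non-retryable errors are re-raised immediately.
    """
    for attempt in range(max_retries):
        try:
            return request()
        except Exception as err:
            status = getattr(err, "status_code", None)
            if status not in RETRYABLE or attempt == max_retries - 1:
                raise
            # Waits 1s, 2s, 4s, 8s, ... plus up to 1s of random jitter.
            time.sleep(base_delay * (2 ** attempt) + random.random())
```

The jitter spreads out retries so that many concurrent threads hitting a 429 at once do not all retry in lock-step.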
```bash
git clone git@github.com:jadia/whatsapp_images_sort.git
cd whatsapp_images_sort
pip install -r requirements.txt
```

```bash
cp .env.example .env
# Edit .env and add your Gemini API key:
# GEMINI_API_KEY=your-actual-key-here
```

Get your API key from Google AI Studio.
```bash
cp config.json.example config.json
```

Edit `config.json`:

```json
{
  "api_mode": "standard",
  "active_model": "gemini-3-flash-lite",
  "source_dir": "/path/to/your/WhatsApp/Media/Images",
  "output_dir": "/path/to/your/Sorted/Output",
  "fallback_category": "Uncategorized_Review",
  "whatsapp_categories": [
    {
      "name": "Documents & IDs",
      "description": "Document-like or proof-like images"
    },
    {
      "name": "People & Social",
      "description": "Real personal photos of people"
    }
  ]
}
```

```bash
# Dry run first (see what would happen)
python main.py --dry-run

# Process one batch to validate
python main.py --test-mode

# Full processing
python main.py
```

| Key | Type | Description |
|---|---|---|
| `api_mode` | `"standard"` \| `"batch"` | Processing mode |
| `active_model` | string | Gemini model name (must be present in `pricing`) |
| `batch_chunk_size` | int | Images per batch job (batch mode) |
| `standard_club_size` | int | Images per API call (standard mode) |
| `upload_threads` | int | Parallel upload and cleanup threads (1–150, default: 100) |
| `source_dir` | string | Directory to scan for images |
| `output_dir` | string | Root directory for sorted output |
| `features.restore_exif_date` | bool | Inject date into EXIF metadata |
| `pricing.<model>` | object | `input_per_1m` and `output_per_1m` in USD |
| `currency.symbol` | string | Local currency symbol |
| `currency.usd_exchange_rate` | float | USD to local currency rate |
| `fallback_category` | string | Used when the AI is unsure (e.g., `"Uncategorized_Review"`) |
| `whatsapp_categories` | list | List of objects with `name` and `description` |
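A minimal loader for this file might validate the most error-prone keys up front. This is an illustrative sketch, not the project's actual loader; the checks simply mirror the constraints listed above:

```python
import json

REQUIRED_KEYS = {"api_mode", "active_model", "source_dir", "output_dir",
                 "fallback_category", "whatsapp_categories"}

def load_config(path="config.json"):
    """Load config.json and fail fast on the most common mistakes."""
    with open(path, encoding="utf-8") as fh:
        cfg = json.load(fh)

    missing = REQUIRED_KEYS - cfg.keys()
    if missing:
        raise ValueError(f"config is missing keys: {sorted(missing)}")
    if cfg["api_mode"] not in ("standard", "batch"):
        raise ValueError("api_mode must be 'standard' or 'batch'")
    if not 1 <= cfg.get("upload_threads", 100) <= 150:
        raise ValueError("upload_threads must be between 1 and 150")
    return cfg
```

Failing fast here is cheaper than discovering a typo'd `api_mode` halfway through a multi-thousand-image run.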
| Flag | Description |
|---|---|
| `--test-mode` | Process exactly one batch and exit |
| `--dry-run` | Scan images, show stats/cost estimate, exit without changes |
| `--prune-queue` | Wipe the entire tracking queue inside the SQLite DB |
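These flags map naturally onto `argparse`. A sketch of the parser (illustrative; the real `main.py` may wire it differently):

```python
import argparse

def build_parser():
    """Build a parser for the three CLI flags listed above."""
    p = argparse.ArgumentParser(
        prog="main.py",
        description="Sort WhatsApp images into categorized folders via Gemini")
    p.add_argument("--test-mode", action="store_true",
                   help="process exactly one batch and exit")
    p.add_argument("--dry-run", action="store_true",
                   help="scan and estimate costs without making changes")
    p.add_argument("--prune-queue", action="store_true",
                   help="wipe the tracking queue inside the SQLite DB")
    return p
```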
```
Sorted/
├── Documents & IDs/
│   ├── 2024/
│   │   ├── IMG-20240115-WA0001.jpg
│   │   └── ...
│   └── Unknown_Date/
├── Financial & Receipts/
├── People & Social/
├── Memes & Junk/
├── Scenery & Objects/
└── Uncategorized_Review/
```
This project was built primarily to sort WhatsApp media. WhatsApp accumulates immense amounts of "junk" daily: forwarded morning quotes, receipts, ID cards, memes, and random screenshots.
Because WhatsApp aggressively compresses images, the files are exceptionally small and API-friendly to upload. Furthermore, WhatsApp filenames follow a predictable date structure (`IMG-20240115-WA0001.jpg`), making them perfect for auto-sorting into nested `Category/Year` folders once the AI has evaluated them.
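The date-extraction step can be sketched with a small regex over that filename structure; the helper name and exact pattern are illustrative:

```python
import re
from datetime import datetime

# Matches the WhatsApp naming scheme, e.g. IMG-20240115-WA0001.jpg
WA_PATTERN = re.compile(r"IMG-(\d{4})(\d{2})(\d{2})-WA\d+", re.IGNORECASE)

def extract_date(filename):
    """Return a datetime parsed from a WhatsApp-style filename, or None.

    Returning None lets the caller fall back to the OS file timestamp,
    and ultimately to an Unknown_Date folder.
    """
    m = WA_PATTERN.search(filename)
    if not m:
        return None
    try:
        return datetime(int(m.group(1)), int(m.group(2)), int(m.group(3)))
    except ValueError:  # e.g. month 13 in a look-alike name
        return None
```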
(Note: This application only supports Google Gemini. It utilizes the official Google Generative AI SDK, maximizing cost-effectiveness by leveraging Google's extremely generous async Batch API discounts.)
Choosing the right mode in config.json depends entirely on your queue size and patience:
| Feature | Standard Mode | Batch Mode |
|---|---|---|
| Best For | Small runs (< 50 images) | Massive backlogs (1,000+ images) |
| Speed | Instant / Synchronous | Asynchronous (Delay of 15+ mins) |
| Cost | Full API Price | 50% Discount |
| Data Sent | Inline Base64 Data | Concurrent File API Uploads |
**Standard Mode**
- Processes short batches synchronously.
- Returns results instantly.
- You pay the full API token price.

**Batch Mode**
- Cost-efficient (50% cheaper).
- Phase 1 (Submit): Parses thousands of images and leverages up to 100 concurrent threads to upload them straight into the Google Cloud File API.
- Phase 2 (Poll): Instead of keeping a synchronous connection open, the application steps back and automatically polls Google until their servers have processed your entire backlog.
- Automatic cleanup of File API uploads when jobs complete or are cancelled.
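Phase 1's upload fan-out can be sketched with the standard library's `ThreadPoolExecutor`. Here `upload_one` stands in for the real File API upload call, and the function names are illustrative, not the project's actual API:

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def upload_all(paths, upload_one, max_workers=100):
    """Upload every file concurrently and collect per-file results.

    `upload_one(path)` is a stand-in for the real File API upload;
    individual failures are captured instead of aborting the whole batch,
    so they can be retried or marked in the state database later.
    """
    results, errors = {}, {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(upload_one, p): p for p in paths}
        for fut in as_completed(futures):
            path = futures[fut]
            try:
                results[path] = fut.result()
            except Exception as err:
                errors[path] = err
    return results, errors
```

Because the work is I/O-bound (network uploads), threads scale well here despite the GIL.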
Using Gemini's Batch API makes classifying thousands of images impressively cheap. However, your choice of model dramatically affects the price.
Below is a direct comparison from a real-world batch run of 4,745 images resized to 384x384, processed with the exact same configuration and prompts:
| Model | Avg Input Tokens | Avg Output Tokens | Total Tokens | Cost (Batch 50% Off) |
|---|---|---|---|---|
| gemini-2.5-flash-lite | ~697 / image | ~22.3 / image | ~3.4 Million | ~$0.19 USD (₹15.58 INR) |
| gemini-3.1-flash-lite-preview | ~1,480 / image | ~21.5 / image | ~7.3 Million | ~$0.98 USD (₹81.66 INR) |
The cost difference comes down to how the models calculate vision tokens:
- Gemini 2.5 dynamically scales with image size. It charges just 258 tokens for a heavily compressed 384x384 image. Added to a ~439 token text prompt, it totals ~697 tokens per image.
- Gemini 3.1 introduces a new vision architecture that enforces a fixed 1,120-token floor per image (treating all inputs as `media_resolution_high`), regardless of the image having been downscaled to 384x384. Adding the same text prompt bumps the total to nearly ~1,500 tokens per image. (See the Gemini 3 Media Resolution Docs for more details.)
Recommendation: For simple visual classification tasks like WhatsApp sorting, gemini-2.5-flash-lite is significantly more cost-effective while delivering comparable accuracy.
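The arithmetic behind the cost column is simple: tokens divided by one million, times the per-million price, times the batch discount. A sketch, assuming the prices come from your own `pricing.<model>` config (the values used below are placeholders, not official Gemini prices):

```python
def batch_cost_usd(total_tokens, price_per_1m_usd, batch_discount=0.5):
    """Token cost in USD with the Batch API discount applied.

    `price_per_1m_usd` is whatever you configured under pricing.<model>;
    no official Gemini prices are hard-coded here.
    """
    return (total_tokens / 1_000_000) * price_per_1m_usd * batch_discount

def to_local(usd, exchange_rate):
    """Convert a USD cost into local currency via currency.usd_exchange_rate."""
    return usd * exchange_rate
```

Input and output tokens are priced separately in the real config (`input_per_1m` vs `output_per_1m`), so the full estimate is the sum of two such terms.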
Note: The SQLite database self-calibrates to your personal usage. If you run --dry-run, the cost printed uses your actual historical data.
⚠️ **Duplicate Images**: Ensure no duplicate images exist in your source directory to avoid unnecessary API costs. Consider using `rmlint` for deduplication before running this tool:

```bash
rmlint /path/to/WhatsApp/Media/Images
```

💡 **State Recovery**: The SQLite database (`state.db`) tracks all progress. If processing is interrupted, simply re-run the script; it will resume from where it left off.
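The resume behaviour boils down to querying for rows still marked `Pending`. A sketch with an illustrative schema (the real `state.db` layout may differ):

```python
import sqlite3

def pending_images(db_path="state.db"):
    """Return (id, path) rows still marked Pending, so a re-run
    skips anything already uploaded or sorted.

    Table and column names here are illustrative, not the actual schema.
    """
    con = sqlite3.connect(db_path)
    try:
        con.execute("""CREATE TABLE IF NOT EXISTS images
                       (id INTEGER PRIMARY KEY, path TEXT UNIQUE,
                        status TEXT DEFAULT 'Pending')""")
        return con.execute(
            "SELECT id, path FROM images WHERE status = 'Pending'"
        ).fetchall()
    finally:
        con.close()
```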
If your batch processing script is killed forcefully or crashes during Phase 1, it may leave behind temporary images in Google's cloud storage. These dangling files silently consume your Gemini File API storage quota (which is typically 20 GB).
To safely inspect and clear out any dangling storage files, run the included manual cleanup utility:
```bash
python scripts/cleanup_gemini_storage.py
```

This will authenticate using your `.env` API key, list exactly how many orphaned files exist and their total size in megabytes, and prompt you for confirmation before deleting them all to free up your Google Cloud quota.
```bash
pip install pytest pytest-cov
pytest tests/ -v --tb=short
```

- **Architecture**: System design with Mermaid diagrams
- **Troubleshooting**: Common errors and recovery steps
- **Specification**: Original project specification
WhatsApp Image Sorter is designed to be bulletproof against interruptions. You can Ctrl+C the script, lose your internet connection, or hit an API rate limit, and the tool will seamlessly resume exactly where it left off.
- **The SQLite State Machine**: When you start the script, it scans your target directory and logs every single image into a local database (`state.db`) as `Pending`.
- **Strict ID Mapping**: Every image is assigned a permanent database ID. When we ask the AI to categorize an image, we tag the image with this ID (e.g., `img_14022`). When the AI responds, we use that ID to map the answer back to your local file, completely eliminating the risk of files getting sorted into the wrong folders.
- **Local Pre-Processing**: Before anything touches the internet, `Pillow` resizes your images to a maximum of 384x384 pixels in memory. This drastically reduces your API token usage and speeds up upload times by 90%.
- **Asynchronous Batching**: In Batch Mode, the app uploads your files to Google's temporary storage using 100 parallel threads. It hands Google a "Job", marks your local files as `Processing`, and goes to sleep. When you run the script later, it downloads the results, sorts your files, and cleans up Google's servers to save your quota.
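Pillow's `Image.thumbnail((384, 384))` shrinks an image in place, never enlarging it, while preserving aspect ratio. The helper below computes the target size such a bound implies, as a pure-Python sketch (Pillow's own rounding may differ by a pixel):

```python
def fit_within(width, height, max_side=384):
    """Target size for downscaling (width, height) to fit inside a
    max_side x max_side box, preserving aspect ratio.

    Images already inside the box are left untouched (scale capped at 1.0),
    mirroring thumbnail-style behaviour.
    """
    scale = min(max_side / width, max_side / height, 1.0)
    return (round(width * scale), round(height * scale))
```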
For a deeper dive into the system's execution paths and state management, check out docs/architecture.md and docs/project_flow.md.