jadia/whatsapp_images_sort


WhatsApp Image Sorter

AI-powered image categorization utility that uses the Google Gemini API to sort large volumes of images (e.g., WhatsApp media) into organized folders.

Supported Python Version: 3.12+

Features

  • πŸ“ Dual-mode processing β€” Standard (instant synchronous processing) and Batch (async queue, 50% offline discount). Designed explicitly for the Google Gemini API.
  • 🧠 AI categorization β€” Uses Gemini's natively multimodal vision to classify images into configurable categories.
  • πŸ“Š Cost tracking β€” Pre-processing estimates (self-calibrating in SQLite) and post-processing actual costs in local currency, natively accounting for the Batch 50% discount.
  • πŸ“ˆ Progress bars β€” Clean, live single-line tqdm progress tracking with dynamic API ETA spinners.
  • πŸš€ Performance Architecture β€” Blitz through massive backlogs! Native ThreadPoolExecutor leverages Google's File API to concurrently upload up to 100 threads at once.
  • πŸ”„ Retry with back-off β€” Automatic exponential back-off on API rate limits (429) and server errors.
  • πŸ’Ύ Resume-safe β€” SQLite state database ensures seamless restarts with auto-pruning.
  • πŸ“… Smart date extraction β€” Regex filename parsing + OS timestamp fallback
  • 🏷️ EXIF restoration β€” Optionally inject dates back into image metadata
  • πŸ“ Extensive logging β€” Per-run audit logs + error-specific log file
  • πŸ§ͺ Test mode β€” Process a single batch for validation
  • πŸ” Dry run β€” Scan and estimate without making any changes
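The retry behaviour listed above can be illustrated with a minimal sketch. This is not the project's actual retry code: `RetryableError`, `call_with_backoff`, and all default values are hypothetical names and numbers chosen for illustration, standing in for whatever the tool does when it sees a 429 or a server error.

```python
import random
import time


class RetryableError(Exception):
    """Hypothetical stand-in for a rate-limit (429) or server (5xx) error."""


def call_with_backoff(func, max_attempts=5, base_delay=1.0, max_delay=60.0,
                      sleep=time.sleep):
    """Call func(), retrying with exponentially growing delays on RetryableError."""
    for attempt in range(max_attempts):
        try:
            return func()
        except RetryableError:
            if attempt == max_attempts - 1:
                raise  # out of attempts, surface the error to the caller
            # Delay doubles each attempt (1s, 2s, 4s, ...) and is capped at max_delay.
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Up to 10% random jitter so parallel workers do not retry in lockstep.
            sleep(delay + random.uniform(0, delay * 0.1))
```

The `sleep` parameter is injectable only so the sketch can be exercised without actually waiting.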

Quick Start

1. Clone and install

git clone git@github.com:jadia/whatsapp_images_sort.git
cd whatsapp_images_sort
pip install -r requirements.txt

2. Set up your API key

cp .env.example .env
# Edit .env and add your Gemini API key:
# GEMINI_API_KEY=your-actual-key-here

Get your API key from Google AI Studio.

3. Configure

cp config.json.example config.json

Edit config.json:

{
  "api_mode": "standard",
  "active_model": "gemini-3-flash-lite",
  "source_dir": "/path/to/your/WhatsApp/Media/Images",
  "output_dir": "/path/to/your/Sorted/Output",
  "fallback_category": "Uncategorized_Review",
  "whatsapp_categories": [
    {
      "name": "Documents & IDs",
      "description": "Document-like or proof-like images"
    },
    {
      "name": "People & Social",
      "description": "Real personal photos of people"
    }
  ]
}

4. Run

# Dry run first (see what would happen)
python main.py --dry-run

# Process one batch to validate
python main.py --test-mode

# Full processing
python main.py

Configuration Reference

| Key | Type | Description |
| --- | --- | --- |
| api_mode | "standard" or "batch" | Processing mode |
| active_model | string | Gemini model name (must be in pricing) |
| batch_chunk_size | int | Images per batch job (batch mode) |
| standard_club_size | int | Images per API call (standard mode) |
| upload_threads | int | Parallel upload and cleanup threads (1–150, default: 100) |
| source_dir | string | Directory to scan for images |
| output_dir | string | Root directory for sorted output |
| features.restore_exif_date | bool | Inject date into EXIF metadata |
| pricing.<model> | object | input_per_1m and output_per_1m in USD |
| currency.symbol | string | Local currency symbol |
| currency.usd_exchange_rate | float | USD to local currency rate |
| fallback_category | string | Used when the AI is unsure (e.g., "Uncategorized_Review") |
| whatsapp_categories | list | List of config objects with name and description |
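As an illustration of the constraints in the table above, here is a small sketch of how a config file could be sanity-checked before a run. The functions `load_config` and `validate_config` are hypothetical, and treating the six keys from the Quick Start example as required is an assumption, not documented behaviour of the tool.

```python
import json

# Assumption: the keys shown in the Quick Start example are treated as required.
REQUIRED_KEYS = ("api_mode", "active_model", "source_dir", "output_dir",
                 "fallback_category", "whatsapp_categories")


def load_config(path="config.json"):
    """Read the JSON config file from disk."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def validate_config(config):
    """Return a list of human-readable problems (empty if none were found)."""
    problems = [f"missing key: {key}" for key in REQUIRED_KEYS if key not in config]
    if config.get("api_mode") not in ("standard", "batch"):
        problems.append("api_mode must be 'standard' or 'batch'")
    threads = config.get("upload_threads", 100)  # documented default
    if isinstance(threads, bool) or not isinstance(threads, int) or not 1 <= threads <= 150:
        problems.append("upload_threads must be an integer between 1 and 150")
    model = config.get("active_model")
    if model is not None and model not in config.get("pricing", {}):
        problems.append(f"active_model {model!r} has no entry under pricing")
    for category in config.get("whatsapp_categories", []):
        if not isinstance(category, dict) or not {"name", "description"} <= category.keys():
            problems.append("each category needs a name and a description")
    return problems
```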

CLI Flags

| Flag | Description |
| --- | --- |
| --test-mode | Process exactly one batch and exit |
| --dry-run | Scan images, show stats/cost estimate, exit without changes |
| --prune-queue | Wipe the entire tracking queue inside the SQLite DB |
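For readers who want to see how flags like these are typically wired up, the following is a sketch using the standard library's argparse. It is an assumption that main.py parses its flags this way; `build_parser` is a hypothetical helper and only the three flag names come from the table above.

```python
import argparse


def build_parser():
    """Build a parser exposing the three documented flags as booleans."""
    parser = argparse.ArgumentParser(
        prog="main.py",
        description="Sort images into category folders using the Gemini API.",
    )
    parser.add_argument("--test-mode", action="store_true",
                        help="process exactly one batch and exit")
    parser.add_argument("--dry-run", action="store_true",
                        help="scan images and show a cost estimate without changes")
    parser.add_argument("--prune-queue", action="store_true",
                        help="wipe the tracking queue in the SQLite database")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    # argparse converts dashes to underscores: args.test_mode, args.dry_run, args.prune_queue
    print(args)
```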

Directory Structure (Output)

Sorted/
├── Documents & IDs/
│   ├── 2024/
│   │   ├── IMG-20240115-WA0001.jpg
│   │   └── ...
│   └── Unknown_Date/
├── Financial & Receipts/
├── People & Social/
├── Memes & Junk/
├── Scenery & Objects/
└── Uncategorized_Review/

Why WhatsApp?

This project was built primarily to sort WhatsApp media. WhatsApp users often receive large amounts of "junk" every day: forwarded morning quotes, receipts, ID cards, memes, and random screenshots.

Because WhatsApp aggressively compresses images, the file sizes are exceptionally small and API-friendly to upload. Furthermore, WhatsApp filenames follow a predictable date structure (IMG-20240115-WA0001.jpg), making them perfect for auto-sorting into nested Category/Year folders once AI evaluates them.
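The filename-based sorting described above can be sketched as follows. This is an illustrative approximation rather than the tool's real code: the regex, `extract_year`, and `destination` are hypothetical, and falling back to the file's modification time before using `Unknown_Date` is an assumption based on the feature list and the output tree.

```python
import re
from datetime import datetime
from pathlib import Path

# Matches names such as IMG-20240115-WA0001.jpg and captures year, month, day.
WHATSAPP_NAME = re.compile(r"IMG-(\d{4})(\d{2})(\d{2})-WA\d+")


def extract_year(path):
    """Return the year folder for an image: filename first, then OS timestamp."""
    match = WHATSAPP_NAME.search(path.name)
    if match:
        year, month, day = map(int, match.groups())
        try:
            datetime(year, month, day)  # reject impossible dates such as month 13
            return str(year)
        except ValueError:
            pass
    try:
        return str(datetime.fromtimestamp(path.stat().st_mtime).year)
    except OSError:
        return "Unknown_Date"  # file is missing or unreadable


def destination(output_dir, category, path):
    """Build the Category/Year/filename target path under output_dir."""
    return Path(output_dir) / category / extract_year(path) / path.name
```

For example, `destination("Sorted", "Documents & IDs", Path("IMG-20240115-WA0001.jpg"))` yields `Sorted/Documents & IDs/2024/IMG-20240115-WA0001.jpg`.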

(Note: This application only supports Google Gemini. It utilizes the official Google Generative AI SDK, maximizing cost-effectiveness by leveraging Google's extremely generous async Batch API discounts.)

Standard vs. Batch Mode

Choosing the right mode in config.json depends entirely on your queue size and patience:

| Feature | Standard Mode | Batch Mode |
| --- | --- | --- |
| Best For | Small runs (< 50 images) | Massive backlogs (1,000+ images) |
| Speed | Instant / Synchronous | Asynchronous (Delay of 15+ mins) |
| Cost | Full API Price | 50% Discount |
| Data Sent | Inline Base64 Data | Concurrent File API Uploads |

Standard Mode

  • Processes short batches synchronously.
  • Instantly returns results.
  • You pay the full API token price.

Batch Mode (Recommended for Backlogs)

  • Cost-efficient (50% cheaper).
  • Phase 1 (Submit): Parses thousands of images and uses up to 100 concurrent threads to upload them directly to the Gemini File API.
  • Phase 2 (Poll): Instead of keeping a synchronous connection open, the application steps back and automatically polls Google until their servers have processed your entire backlog.
  • Automatic cleanup of File API uploads when completed or cancelled.
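The two phases above follow a common pattern: fan out uploads across a thread pool, then poll for completion. The sketch below shows that pattern in isolation. It does not call the Gemini SDK; `upload_one` and `get_state` are caller-supplied stand-ins, and the state names in `TERMINAL_STATES` are hypothetical labels, not the API's real job states.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed

# Hypothetical job states, used only for this illustration.
TERMINAL_STATES = ("SUCCEEDED", "FAILED", "CANCELLED")


def upload_all(paths, upload_one, max_workers=100):
    """Phase 1: run upload_one(path) concurrently; collect results and failures."""
    uploaded, failed = {}, {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(upload_one, path): path for path in paths}
        for future in as_completed(futures):
            path = futures[future]
            try:
                uploaded[path] = future.result()
            except Exception as exc:  # keep going so one bad file cannot stop the run
                failed[path] = exc
    return uploaded, failed


def poll_until_done(get_state, interval=30.0, sleep=time.sleep,
                    terminal_states=TERMINAL_STATES):
    """Phase 2: call get_state() every `interval` seconds until the job finishes."""
    while True:
        state = get_state()
        if state in terminal_states:
            return state
        sleep(interval)
```

Threads suit this workload because uploads are I/O-bound, so the workers spend most of their time waiting on the network rather than competing for the CPU.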

Realistic Cost Analysis & Model Comparison

Using Gemini's Batch API makes classifying thousands of images impressively cheap. However, your choice of model dramatically affects the price.

Below is a direct comparison from a real-world batch run of 4,745 images resized to 384x384, processed with the exact same configuration and prompts:

| Model | Avg Input Tokens | Avg Output Tokens | Total Tokens | Cost (Batch 50% Off) |
| --- | --- | --- | --- | --- |
| gemini-2.5-flash-lite | ~697 / image | ~22.3 / image | ~3.4 Million | ~$0.19 USD (₹15.58 INR) |
| gemini-3.1-flash-lite-preview | ~1,480 / image | ~21.5 / image | ~7.3 Million | ~$0.98 USD (₹81.66 INR) |
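The arithmetic behind such a figure can be sketched as below. `estimate_cost_usd` is a hypothetical helper, and the per-million-token prices in the example ($0.10 input, $0.40 output) are illustrative assumptions, not values taken from this README; in practice they would come from the `pricing.<model>` entries in config.json.

```python
def estimate_cost_usd(images, avg_input_tokens, avg_output_tokens,
                      input_per_1m, output_per_1m, batch=False):
    """Estimate the USD cost of a run from average per-image token counts."""
    input_cost = images * avg_input_tokens / 1_000_000 * input_per_1m
    output_cost = images * avg_output_tokens / 1_000_000 * output_per_1m
    total = input_cost + output_cost
    # Batch mode is billed at half the standard price.
    return total * 0.5 if batch else total


# 4,745 images at ~697 input and ~22.3 output tokens each, assumed prices.
cost = estimate_cost_usd(4745, 697, 22.3, input_per_1m=0.10,
                         output_per_1m=0.40, batch=True)
print(f"${cost:.2f}")  # prints $0.19 under the assumed prices
```

Multiplying the result by `currency.usd_exchange_rate` would give the local-currency figure.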

Why is Gemini 3.1 so much more expensive?

The cost difference comes down to how the models calculate vision tokens:

  • Gemini 2.5 dynamically scales with image size. It charges just 258 tokens for a heavily compressed 384x384 image. Added to a ~439 token text prompt, it totals ~697 tokens per image.
  • Gemini 3.1 introduces a new vision architecture that enforces a fixed 1,120 token floor per image (treating all inputs as media_resolution_high), regardless of the fact that the image was downscaled to 384x384. Adding the same text prompt brings the total to roughly 1,500 tokens per image. (See the Gemini 3 Media Resolution Docs for more details.)

Recommendation: For simple visual classification tasks like WhatsApp sorting, gemini-2.5-flash-lite is significantly more cost-effective while delivering comparable accuracy.

Note: The SQLite database self-calibrates to your personal usage. If you run --dry-run, the cost printed uses your actual historical data.

Important Notes

⚠️ Duplicate Images: Ensure no duplicate images exist in your source directory to avoid unnecessary API costs. Consider using rmlint for deduplication before running this tool:

rmlint /path/to/WhatsApp/Media/Images

💡 State Recovery: The SQLite database (state.db) tracks all progress. If processing is interrupted, simply re-run the script; it will resume from where it left off.

Good To Know: Gemini Storage Cleanup

If your batch processing script is killed forcefully or crashes during Phase 1, it may leave behind temporary images in Google's cloud storage. These dangling files silently consume your Gemini File API storage quota (which is typically 20 GB).

To safely inspect and clear out any dangling storage files, run the included manual cleanup utility:

python scripts/cleanup_gemini_storage.py

This authenticates using the API key in your .env file, lists how many orphaned files exist and their total size in megabytes, and prompts you before deleting them to free up your File API quota.

Testing

pip install pytest pytest-cov
pytest tests/ -v --tb=short

Documentation

🧠 How it Works under the Hood

WhatsApp Image Sorter is designed to be bulletproof against interruptions. You can Ctrl+C the script, lose your internet connection, or hit an API rate limit, and the tool will seamlessly resume exactly where it left off.

  1. The SQLite State Machine: When you start the script, it scans your target directory and logs every single image into a local database (state.db) as Pending.
  2. Strict ID Mapping: Every image is assigned a permanent Database ID. When we ask the AI to categorize an image, we tag the image with this ID (e.g., img_14022). When the AI responds, we use that ID to map the answer back to your local file, completely eliminating the risk of files getting sorted into the wrong folders.
  3. Local Pre-Processing: Before anything touches the internet, Pillow resizes your images to a maximum of 384x384 pixels in memory. This drastically reduces your API token usage and speeds up upload times by 90%.
  4. Asynchronous Batching: In Batch Mode, the app uploads your files to Google's temporary storage using 100 parallel threads. It hands Google a "Job", marks your local files as Processing, and goes to sleep. When you run the script later, it downloads the results, sorts your files, and cleans up Google's servers to save your quota.
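The state machine and ID mapping in steps 1 and 2 can be sketched with the standard library's sqlite3 module. The schema, status names, and function names below are assumptions made for illustration; the real layout of state.db is described in docs/architecture.md and may differ.

```python
import sqlite3


def open_state(db_path=":memory:"):
    """Open (or create) the state database with a single images table."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS images (
               id INTEGER PRIMARY KEY AUTOINCREMENT,
               path TEXT UNIQUE NOT NULL,
               status TEXT NOT NULL DEFAULT 'pending',
               category TEXT)"""
    )
    return conn


def register(conn, paths):
    """Record newly scanned files as pending; already-known paths are skipped."""
    conn.executemany("INSERT OR IGNORE INTO images (path) VALUES (?)",
                     [(p,) for p in paths])
    conn.commit()


def next_batch(conn, size):
    """Claim up to `size` pending images and return {tag: path}, e.g. img_14022."""
    rows = conn.execute(
        "SELECT id, path FROM images WHERE status = 'pending' ORDER BY id LIMIT ?",
        (size,),
    ).fetchall()
    conn.executemany("UPDATE images SET status = 'processing' WHERE id = ?",
                     [(row_id,) for row_id, _ in rows])
    conn.commit()
    return {f"img_{row_id}": path for row_id, path in rows}


def apply_results(conn, results):
    """Map each tagged answer back to its row and mark it done."""
    for tag, category in results.items():
        row_id = int(tag.removeprefix("img_"))
        conn.execute(
            "UPDATE images SET status = 'done', category = ? "
            "WHERE id = ? AND status = 'processing'",
            (category, row_id),
        )
    conn.commit()
```

Because every answer is keyed by the database ID rather than by position in the response, a reordered or partial reply cannot assign a category to the wrong file, and rows left in a non-final status are simply picked up again on the next run.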

For a deeper dive into the system's execution paths and state management, check out docs/architecture.md and docs/project_flow.md.
