Consistency-aware batch image generation with automatic review and retry.
Generate a batch of images from text prompts that are visually consistent with each other — same style, same characters, same aesthetic. An AI agent automatically reviews each image and retries failed ones with different seeds.
When you generate multiple images for a storyboard, product catalog, or visual narrative, a standard text-to-image pipeline renders each one independently. This leads to style drift (shot 1 looks like watercolor, shot 15 looks like oil painting) and character inconsistency (the protagonist changes appearance between scenes).
consistgen wraps image generation in a Generate → Review → Retry agent loop:
```text
Prompts → [Generate Agent] → [Review Agent] → Pass? → Output
               FLUX.2         CLIP + SigLIP     │
                 ↑                              │ Fail (change seed)
                 └──────────────────────────────┘
                     (max 3 retries)
```
- Generate Agent: FLUX.2 schnell (or any diffusers-compatible model) + optional IP-Adapter for character reference
- Review Agent: SigLIP for cross-image style consistency + CLIP for character identity vs. reference
- Router: Automatically retries failed images with different seeds, up to `max_retries` times
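One way to picture the Review Agent's scores is as embedding similarities. The sketch below is a hypothetical illustration, not consistgen's actual scoring code. It assumes each image has already been encoded into a vector (by SigLIP for style, CLIP for identity). It scores style as the cosine similarity between an image and the mean embedding of the *other* images in the batch, and it scores character identity as cosine similarity to a reference embedding. An image passes only if both scores clear their thresholds (0.82 and 0.80, the `style_threshold` and `character_threshold` values used in the configuration example). The function names and the leave-one-out centroid are assumptions.

```python
from __future__ import annotations

import math
from typing import Optional, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0


def style_scores(embeddings: list[list[float]]) -> list[float]:
    """Hypothetical style score: similarity of each image to the mean of the other images."""
    n = len(embeddings)
    if n < 2:
        return [1.0] * n  # a single image has nothing to drift from
    dim = len(embeddings[0])
    totals = [sum(e[d] for e in embeddings) for d in range(dim)]
    scores = []
    for e in embeddings:
        # Leave-one-out centroid, so an image is never compared with itself.
        others = [(totals[d] - e[d]) / (n - 1) for d in range(dim)]
        scores.append(cosine(e, others))
    return scores


def character_scores(embeddings: list[list[float]], reference: Sequence[float]) -> list[float]:
    """Hypothetical identity score: similarity of each image to the character reference."""
    return [cosine(e, reference) for e in embeddings]


def verdicts(
    style: list[float],
    character: Optional[list[float]] = None,
    style_threshold: float = 0.82,
    character_threshold: float = 0.80,
) -> list[str]:
    """An image passes only if every available score clears its threshold."""
    results = []
    for i, s in enumerate(style):
        ok = s >= style_threshold
        if character is not None:
            ok = ok and character[i] >= character_threshold
        results.append("pass" if ok else "fail")
    return results


# Toy 2-D "embeddings": three similar images and one outlier.
embs = [[1.0, 0.0], [1.0, 0.05], [1.0, -0.05], [0.0, 1.0]]
print(verdicts(style_scores(embs), character_scores(embs, [1.0, 0.0])))
# ['pass', 'pass', 'pass', 'fail']
```

With real models, the vectors would come from the image encoders of the SigLIP and CLIP checkpoints named in the configuration. A leave-one-out centroid is only one simple choice. Mean pairwise similarity across the batch would be an equally plausible design.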
Install:

```bash
pip install consistgen
```

Python API:

```python
import consistgen

result = consistgen.run(
    requests=[
        consistgen.ImageRequest(shot_id=0, positive="A scholar by a misty bridge, ink wash style"),
        consistgen.ImageRequest(shot_id=1, positive="The scholar painting on a silk fan, ink wash style"),
        consistgen.ImageRequest(shot_id=2, positive="The scholar alone at the bridge, autumn, ink wash style"),
    ],
    config=consistgen.ConsistGenConfig(
        device="cuda",
        style_threshold=0.82,
    ),
)

# Pair each generated image with its review result
for img, rev in zip(result.images, result.reviews):
    print(f"Shot {img.shot_id}: {rev.verdict.value} (style={rev.style_score:.3f})")
```

Command line:

```bash
# From a JSON file of prompts
consistgen prompts.json --device cuda --output-dir output/

# From a MelodyCanvas shot_scripts.csv
consistgen output/song/scripts/shot_scripts.csv --character-refs ref.png

# Skip review (just generate)
consistgen prompts.json --no-review
```

Prompt file format (`prompts.json`):

```json
[
  {"shot_id": 0, "positive": "A scholar by a misty bridge, ink wash style"},
  {"shot_id": 1, "positive": "The scholar painting on a silk fan, ink wash style"}
]
```

Configuration:

```python
import consistgen

config = consistgen.ConsistGenConfig(
    # Generation
    model_id="black-forest-labs/FLUX.1-schnell",  # Any diffusers model
    num_inference_steps=4,
    guidance_scale=0.0,
    device="cuda",
    dtype="bfloat16",
    # IP-Adapter (character consistency)
    ip_adapter_model=None,  # Set to enable character reference
    character_refs=["ref.png"],
    character_ref_weight=0.8,
    # Consistency review
    review_enabled=True,
    siglip_model="google/siglip-so400m-patch14-384",
    clip_model="openai/clip-vit-large-patch14",
    style_threshold=0.82,  # SigLIP score threshold
    character_threshold=0.80,  # CLIP score threshold
    # Agent
    max_retries=3,  # Retries per failed image, each with a new seed
    output_dir="output/images",
    seed=42,
)
```

Package layout:

```text
consistgen/
├── models.py    # Pydantic data models (ImageRequest, ReviewResult, ...)
├── generate.py  # FLUX.2 + IP-Adapter image generation engine
├── review.py    # CLIP + SigLIP cross-image consistency scorer
├── agent.py     # LangGraph state graph: Generate → Review → Route
└── cli.py       # Command-line interface
```
The core pipeline is a LangGraph state machine:
- State: pending requests, generated images, review results, retry counts
- generate node: runs FLUX.2 on all pending requests
- review node: scores all generated images for style + character consistency
- route node: splits into accepted vs. needs-retry; changes seed for retries
- Conditional edge: if pending requests remain → back to generate; else → done
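The state machine described above can be mimicked without LangGraph in a plain loop. This is a hypothetical sketch, not consistgen's `agent.py`. Here `generate` and `review` are caller-supplied stand-ins for FLUX.2 and the CLIP/SigLIP scorer. Two rules are assumptions: the seed change (`seed + seed_stride` on each retry) and the handling of shots that exhaust `max_retries`, which are kept as failed with their last attempt retained.

```python
from __future__ import annotations

from dataclasses import dataclass, field, replace
from typing import Any, Callable


@dataclass
class Request:
    shot_id: int
    positive: str
    seed: int
    retries: int = 0


@dataclass
class LoopState:
    pending: list[Request]
    accepted: dict[int, Any] = field(default_factory=dict)
    failed: dict[int, Any] = field(default_factory=dict)  # gave up after max_retries


def run_loop(
    requests: list[Request],
    generate: Callable[[str, int], Any],
    review: Callable[[dict[int, Any]], dict[int, bool]],
    max_retries: int = 3,
    seed_stride: int = 1000,
) -> LoopState:
    # Copy requests so seed changes do not mutate the caller's objects.
    state = LoopState(pending=[replace(r) for r in requests])
    while state.pending:  # conditional edge: loop while any request is pending
        # generate node: render every pending request with its current seed
        new_images = {r.shot_id: generate(r.positive, r.seed) for r in state.pending}
        # review node: judge new images together with already-accepted ones
        passed = review({**state.accepted, **new_images})
        # route node: accept, retry with a new seed, or give up
        still_pending = []
        for r in state.pending:
            image = new_images[r.shot_id]
            if passed[r.shot_id]:
                state.accepted[r.shot_id] = image
            elif r.retries < max_retries:
                r.retries += 1
                r.seed += seed_stride
                still_pending.append(r)
            else:
                state.failed[r.shot_id] = image
        state.pending = still_pending
    return state


# Usage with fake generate/review functions.
calls = []

def fake_generate(prompt, seed):
    calls.append((prompt, seed))
    return (prompt, seed)

def fake_review(images):
    # Shot 1 fails only with its first seed; shot 2 always fails.
    return {sid: not (sid == 2 or (sid == 1 and img[1] == 42)) for sid, img in images.items()}

state = run_loop([Request(0, "a", 42), Request(1, "b", 42), Request(2, "c", 42)],
                 fake_generate, fake_review)
print(state.accepted)  # {0: ('a', 42), 1: ('b', 1042)}
print(state.failed)    # {2: ('c', 3042)}
```

Re-reviewing accepted images alongside new ones keeps the cross-image style check anchored to the whole batch rather than only to the retried shots.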
consistgen is designed to run on NVIDIA DGX Spark. TensorRT acceleration is available as an optional extra:

```bash
# TensorRT acceleration (optional)
pip install "consistgen[tensorrt]"
```

Approximate requirements:

- FLUX.2 schnell: ~13GB VRAM, 1-4 inference steps
- SigLIP + CLIP review: ~3GB VRAM, batch inference
- Total: ~16GB, well within DGX Spark's 128GB unified memory
consistgen is extracted from the MelodyCanvas music video generation pipeline. It implements Phase 4 (anchor image generation) as a standalone package.
```python
# In MelodyCanvas pipeline.py
from consistgen import ConsistGenConfig, ImageRequest, run as consistgen_run

result = consistgen_run(requests, config=config, character_refs=refs)
```

License: Apache 2.0