consistgen

Consistency-aware batch image generation with automatic review and retry.

Generate a batch of images from text prompts that are visually consistent with each other — same style, same characters, same aesthetic. An AI agent automatically reviews each image and retries failed ones with different seeds.

The Problem

When generating multiple images for a storyboard, product catalog, or visual narrative, each image is generated independently. This leads to style drift (shot 1 looks like watercolor, shot 15 looks like oil painting) and character inconsistency (the protagonist changes appearance between scenes).

The Solution

consistgen wraps image generation in a Generate → Review → Retry agent loop:

Prompts → [Generate Agent] → [Review Agent] → Pass? → Output
              FLUX.1            CLIP + SigLIP     │
                ↑                                  │ Fail (change seed)
                └──────────────────────────────────┘
                         (max 3 retries)

Generate Agent: FLUX.1 schnell (or any diffusers-compatible model) + optional IP-Adapter for character reference
Review Agent: SigLIP for cross-image style consistency + CLIP for character identity vs. reference
Router: Automatically retries failed images with different seeds, up to max_retries times (default 3)
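
To make the Review Agent's role concrete, here is a minimal sketch of one way a cross-image style score could be computed: each image's embedding is compared by cosine similarity against the mean embedding of the rest of the batch. The toy vectors stand in for SigLIP image embeddings, and `cosine` and `style_scores` are hypothetical helpers, not part of the consistgen API. The package's actual scoring may differ.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def style_scores(embeddings):
    """Score each embedding against the mean of all *other* embeddings.

    Hypothetical helper: assumes one embedding per image (e.g. from SigLIP).
    """
    scores = []
    for i, emb in enumerate(embeddings):
        others = [e for j, e in enumerate(embeddings) if j != i]
        if not others:
            scores.append(1.0)  # a single image is trivially consistent
            continue
        centroid = [sum(col) / len(others) for col in zip(*others)]
        scores.append(cosine(emb, centroid))
    return scores


# Toy 3-d "embeddings": the third image points in a different direction.
toy = [[1.0, 0.1, 0.0], [0.9, 0.2, 0.0], [0.0, 0.1, 1.0]]
scores = style_scores(toy)
outlier = min(range(len(scores)), key=scores.__getitem__)
```

In this toy batch the third vector gets the lowest score, so it is the image a reviewer would flag for retry.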

Quick Start

pip install consistgen

import consistgen

result = consistgen.run(
    requests=[
        consistgen.ImageRequest(shot_id=0, positive="A scholar by a misty bridge, ink wash style"),
        consistgen.ImageRequest(shot_id=1, positive="The scholar painting on a silk fan, ink wash style"),
        consistgen.ImageRequest(shot_id=2, positive="The scholar alone at the bridge, autumn, ink wash style"),
    ],
    config=consistgen.ConsistGenConfig(
        device="cuda",
        style_threshold=0.82,
    ),
)

for img, rev in zip(result.images, result.reviews):
    print(f"Shot {img.shot_id}: {rev.verdict.value} (style={rev.style_score:.3f})")

CLI

# From a JSON file of prompts
consistgen prompts.json --device cuda --output-dir output/

# From a MelodyCanvas shot_scripts.csv
consistgen output/song/scripts/shot_scripts.csv --character-refs ref.png

# Skip review (just generate)
consistgen prompts.json --no-review

Prompt File Format

[
  {"shot_id": 0, "positive": "A scholar by a misty bridge, ink wash style"},
  {"shot_id": 1, "positive": "The scholar painting on a silk fan, ink wash style"}
]
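
As an illustration of the format above, the following sketch parses a prompt file and checks that each entry carries the two fields shown. `PromptEntry` is a hypothetical stand-in for `consistgen.ImageRequest`, and the duplicate `shot_id` check is an assumption about sensible validation rather than documented behavior.

```python
import json
from dataclasses import dataclass


@dataclass
class PromptEntry:
    """Stand-in for consistgen.ImageRequest (hypothetical, for illustration)."""
    shot_id: int
    positive: str


def parse_prompts(text):
    """Parse a JSON array of prompts and validate each entry."""
    data = json.loads(text)
    if not isinstance(data, list):
        raise ValueError("prompt file must contain a JSON array")
    entries = []
    seen = set()
    for item in data:
        shot_id, positive = item["shot_id"], item["positive"]
        if shot_id in seen:
            raise ValueError(f"duplicate shot_id: {shot_id}")
        seen.add(shot_id)
        entries.append(PromptEntry(shot_id=int(shot_id), positive=str(positive)))
    return entries


sample = '[{"shot_id": 0, "positive": "A scholar by a misty bridge, ink wash style"}]'
entries = parse_prompts(sample)
```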

Configuration

consistgen.ConsistGenConfig(
    # Generation
    model_id="black-forest-labs/FLUX.1-schnell",  # Any diffusers model
    num_inference_steps=4,
    guidance_scale=0.0,
    device="cuda",
    dtype="bfloat16",

    # IP-Adapter (character consistency)
    ip_adapter_model=None,            # Set to enable character reference
    character_refs=["ref.png"],
    character_ref_weight=0.8,

    # Consistency review
    review_enabled=True,
    siglip_model="google/siglip-so400m-patch14-384",
    clip_model="openai/clip-vit-large-patch14",
    style_threshold=0.82,             # SigLIP score threshold
    character_threshold=0.80,         # CLIP score threshold

    # Agent
    max_retries=3,
    output_dir="output/images",
    seed=42,
)
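
The two thresholds gate each image during review. A minimal sketch of how they might combine into a verdict is shown below. The `Verdict` enum and `decide` function are hypothetical (the Quick Start only shows that a review exposes `verdict.value`), and treating the character check as optional when no reference is configured is an assumption.

```python
from enum import Enum


class Verdict(Enum):
    """Hypothetical verdict values; the real enum may differ."""
    PASS = "pass"
    FAIL = "fail"


def decide(style_score, character_score=None,
           style_threshold=0.82, character_threshold=0.80):
    """Pass only if every available score meets its threshold.

    character_score is None when no character reference is configured.
    """
    if style_score < style_threshold:
        return Verdict.FAIL
    if character_score is not None and character_score < character_threshold:
        return Verdict.FAIL
    return Verdict.PASS
```

Under this sketch, raising `style_threshold` makes the review stricter and therefore increases the number of retries a batch is likely to need.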

Architecture

consistgen/
├── models.py      # Pydantic data models (ImageRequest, ReviewResult, ...)
├── generate.py    # FLUX.1 + IP-Adapter image generation engine
├── review.py      # CLIP + SigLIP cross-image consistency scorer
├── agent.py       # LangGraph state graph: Generate → Review → Route
└── cli.py         # Command-line interface

LangGraph Agent

The core pipeline is a LangGraph state machine:

State: pending requests, generated images, review results, retry counts
generate node: runs FLUX.1 on all pending requests
review node: scores all generated images for style + character consistency
route node: splits into accepted vs. needs-retry; changes seed for retries
Conditional edge: if pending requests remain → back to generate; else → done
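
The node structure above can be sketched in plain Python without depending on LangGraph. This is an illustrative stand-in, not the package's `agent.py`: the state keys, the node function names, and the "increment the seed by one" retry policy are all assumptions, and image generation is stubbed out by recording the seed that was used.

```python
def generate_node(state):
    """Produce a fake 'image' (just the seed used) for every pending shot."""
    for shot_id in state["pending"]:
        state["images"][shot_id] = state["seeds"][shot_id]
    return state


def review_node(state, review_fn):
    """Score every image that is still pending with the supplied reviewer."""
    for shot_id in state["pending"]:
        state["passed"][shot_id] = review_fn(shot_id, state["images"][shot_id])
    return state


def route_node(state, max_retries):
    """Keep failed shots pending with a new seed until retries run out."""
    still_pending = []
    for shot_id in state["pending"]:
        if state["passed"][shot_id]:
            continue  # accepted, nothing more to do
        if state["retries"][shot_id] < max_retries:
            state["retries"][shot_id] += 1
            state["seeds"][shot_id] += 1  # assumed policy: bump the seed
            still_pending.append(shot_id)
    state["pending"] = still_pending
    return state


def run_loop(shot_ids, review_fn, seed=42, max_retries=3):
    """Drive generate -> review -> route until no requests remain pending."""
    state = {
        "pending": list(shot_ids),
        "images": {},
        "passed": {},
        "retries": {s: 0 for s in shot_ids},
        "seeds": {s: seed for s in shot_ids},
    }
    while state["pending"]:  # conditional edge: loop back or finish
        state = generate_node(state)
        state = review_node(state, review_fn)
        state = route_node(state, max_retries)
    return state


# Stub reviewer: shot 1 only passes once its seed reaches 44.
final = run_loop([0, 1, 2], lambda shot, img: shot != 1 or img >= 44)
```

Only the failed shot is regenerated on each pass, so accepted images are never touched again. A shot that keeps failing is dropped from the pending list after `max_retries` retries and keeps its last failing result.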

GPU Optimization (DGX Spark)

consistgen is designed to run efficiently on NVIDIA DGX Spark:

# TensorRT acceleration (optional)
pip install "consistgen[tensorrt]"

FLUX.1 schnell: ~13GB VRAM, 1-4 inference steps
SigLIP + CLIP review: ~3GB VRAM, batch inference
Total: ~16GB — well within DGX Spark's 128GB unified memory

Integration with MelodyCanvas

consistgen is extracted from the MelodyCanvas music video generation pipeline. It implements Phase 4 (anchor image generation) as a standalone package.

# In MelodyCanvas pipeline.py
from consistgen import ConsistGenConfig, ImageRequest, run as consistgen_run

result = consistgen_run(requests, config=config, character_refs=refs)

License

Apache 2.0
