This repository was archived by the owner on Apr 10, 2026. It is now read-only.

melodic-lab/consistgen

consistgen

Consistency-aware batch image generation with automatic review and retry.

Generate a batch of images from text prompts that are visually consistent with each other — same style, same characters, same aesthetic. An AI agent automatically reviews each image and retries failed ones with different seeds.

The Problem

When generating multiple images for a storyboard, product catalog, or visual narrative, each image is generated independently. This leads to style drift (shot 1 looks like watercolor, shot 15 looks like oil painting) and character inconsistency (the protagonist changes appearance between scenes).

The Solution

consistgen wraps image generation in a Generate → Review → Retry agent loop:

Prompts → [Generate Agent] → [Review Agent] → Pass? → Output
              FLUX.1            CLIP + SigLIP     │
                ↑                                  │ Fail (change seed)
                └──────────────────────────────────┘
                         (max 3 retries)

  • Generate Agent: FLUX.1 schnell (or any diffusers-compatible model) + optional IP-Adapter for character reference
  • Review Agent: SigLIP for cross-image style consistency + CLIP for character identity vs. reference
  • Router: Automatically retries failed images with different seeds, up to max_retries times (default 3)
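
To make the review step more concrete, here is a minimal sketch of how a cross-image style score could be computed from image embeddings. It is an illustration under assumptions, not consistgen's actual scorer: the README does not say how the SigLIP score is defined, so the leave-one-out cosine similarity below, and the names `cosine`, `style_scores`, and `failing_shots`, are hypothetical. The embeddings are plain lists of floats standing in for real SigLIP features.

```python
import math
from typing import Dict, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def style_scores(embeddings: Dict[int, Sequence[float]]) -> Dict[int, float]:
    """Score each shot against the mean direction of all *other* shots.

    Assumption: this leave-one-out scheme is illustrative only; the
    package may define its style score differently.
    """
    if len(embeddings) < 2:
        raise ValueError("need at least two images to compare")

    # Unit-normalize so that no single embedding dominates the mean.
    unit: Dict[int, List[float]] = {}
    for shot_id, vec in embeddings.items():
        norm = math.sqrt(sum(x * x for x in vec))
        if norm == 0.0:
            raise ValueError(f"shot {shot_id} has a zero embedding")
        unit[shot_id] = [x / norm for x in vec]

    dim = len(next(iter(unit.values())))
    total = [sum(vec[i] for vec in unit.values()) for i in range(dim)]

    scores: Dict[int, float] = {}
    for shot_id, vec in unit.items():
        # Sum of the other shots' unit vectors (direction of their mean).
        rest = [total[i] - vec[i] for i in range(dim)]
        scores[shot_id] = cosine(vec, rest)
    return scores


def failing_shots(scores: Dict[int, float], threshold: float = 0.82) -> List[int]:
    """Shot ids whose score falls below the style threshold."""
    return sorted(sid for sid, score in scores.items() if score < threshold)


# Toy embeddings: shots 0-2 point in a similar direction, shot 3 is an outlier.
demo_scores = style_scores({
    0: [1.0, 0.0, 0.1],
    1: [0.9, 0.1, 0.1],
    2: [1.0, 0.1, 0.0],
    3: [0.0, 1.0, 0.0],
})
demo_failures = failing_shots(demo_scores, threshold=0.82)
```

One design caveat with this scheme: a strong outlier also pulls down the scores of the consistent shots, since it contributes to their comparison mean. A real scorer might compare against a fixed anchor image or a median instead.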

Quick Start

pip install consistgen

import consistgen

result = consistgen.run(
    requests=[
        consistgen.ImageRequest(shot_id=0, positive="A scholar by a misty bridge, ink wash style"),
        consistgen.ImageRequest(shot_id=1, positive="The scholar painting on a silk fan, ink wash style"),
        consistgen.ImageRequest(shot_id=2, positive="The scholar alone at the bridge, autumn, ink wash style"),
    ],
    config=consistgen.ConsistGenConfig(
        device="cuda",
        style_threshold=0.82,
    ),
)

for img, rev in zip(result.images, result.reviews):
    print(f"Shot {img.shot_id}: {rev.verdict.value} (style={rev.style_score:.3f})")

CLI

# From a JSON file of prompts
consistgen prompts.json --device cuda --output-dir output/

# From a MelodyCanvas shot_scripts.csv
consistgen output/song/scripts/shot_scripts.csv --character-refs ref.png

# Skip review (just generate)
consistgen prompts.json --no-review

Prompt File Format

[
  {"shot_id": 0, "positive": "A scholar by a misty bridge, ink wash style"},
  {"shot_id": 1, "positive": "The scholar painting on a silk fan, ink wash style"}
]
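
Before handing a prompt file to the CLI, it can help to check that it has the expected shape. The sketch below uses only the standard library and validates the two fields shown above. `PromptEntry` and `parse_prompts` are hypothetical names, not part of consistgen, and the rule that each `shot_id` must be unique is an assumption made for this example.

```python
import json
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class PromptEntry:
    """Stand-in for one entry of the prompt file (not consistgen's own type)."""
    shot_id: int
    positive: str


def parse_prompts(text: str) -> List[PromptEntry]:
    """Parse and validate the JSON prompt format shown above."""
    data = json.loads(text)
    if not isinstance(data, list):
        raise ValueError("prompt file must contain a JSON array")

    entries: List[PromptEntry] = []
    seen = set()
    for index, item in enumerate(data):
        if not isinstance(item, dict):
            raise ValueError(f"entry {index} is not a JSON object")
        shot_id = item.get("shot_id")
        positive = item.get("positive")
        # bool is a subclass of int in Python, so reject it explicitly.
        if not isinstance(shot_id, int) or isinstance(shot_id, bool):
            raise ValueError(f"entry {index}: shot_id must be an integer")
        if not isinstance(positive, str) or not positive.strip():
            raise ValueError(f"entry {index}: positive must be a non-empty string")
        # Assumption: shot ids are unique within one file.
        if shot_id in seen:
            raise ValueError(f"entry {index}: duplicate shot_id {shot_id}")
        seen.add(shot_id)
        entries.append(PromptEntry(shot_id=shot_id, positive=positive))
    return entries


sample = """
[
  {"shot_id": 0, "positive": "A scholar by a misty bridge, ink wash style"},
  {"shot_id": 1, "positive": "The scholar painting on a silk fan, ink wash style"}
]
"""
prompts = parse_prompts(sample)
```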

Configuration

consistgen.ConsistGenConfig(
    # Generation
    model_id="black-forest-labs/FLUX.1-schnell",  # Any diffusers model
    num_inference_steps=4,
    guidance_scale=0.0,
    device="cuda",
    dtype="bfloat16",

    # IP-Adapter (character consistency)
    ip_adapter_model=None,            # Set to enable character reference
    character_refs=["ref.png"],
    character_ref_weight=0.8,

    # Consistency review
    review_enabled=True,
    siglip_model="google/siglip-so400m-patch14-384",
    clip_model="openai/clip-vit-large-patch14",
    style_threshold=0.82,             # SigLIP score threshold
    character_threshold=0.80,         # CLIP score threshold

    # Agent
    max_retries=3,
    output_dir="output/images",
    seed=42,
)

Architecture

consistgen/
├── models.py      # Pydantic data models (ImageRequest, ReviewResult, ...)
├── generate.py    # FLUX.1 + IP-Adapter image generation engine
├── review.py      # CLIP + SigLIP cross-image consistency scorer
├── agent.py       # LangGraph state graph: Generate → Review → Route
└── cli.py         # Command-line interface

LangGraph Agent

The core pipeline is a LangGraph state machine:

  • State: pending requests, generated images, review results, retry counts
  • generate node: runs FLUX.1 on all pending requests
  • review node: scores all generated images for style + character consistency
  • route node: splits into accepted vs. needs-retry; changes seed for retries
  • Conditional edge: if pending requests remain → back to generate; else → done
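
The control flow described above can be sketched without any dependencies. This is a plain-Python approximation of the state machine, not the package's LangGraph code: `LoopState`, the three node functions, and `run_loop` are hypothetical names, the generator and scorer are stubs, and bumping the seed by 1 on retry is an assumption (the README only says the seed changes).

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class LoopState:
    pending: Dict[int, int]                                   # shot_id -> seed for the next attempt
    images: Dict[int, Any] = field(default_factory=dict)      # latest image per shot
    scores: Dict[int, float] = field(default_factory=dict)    # latest review score per shot
    accepted: Dict[int, Any] = field(default_factory=dict)    # shots that passed review
    retries: Dict[int, int] = field(default_factory=dict)     # retries used per shot
    exhausted: List[int] = field(default_factory=list)        # shots that ran out of retries


def generate_node(state: LoopState, generate_fn: Callable[[int, int], Any]) -> None:
    """Generate an image for every pending shot."""
    for shot_id, seed in state.pending.items():
        state.images[shot_id] = generate_fn(shot_id, seed)


def review_node(state: LoopState, score_fn: Callable[[Any], float]) -> None:
    """Score the images produced in this round."""
    for shot_id in state.pending:
        state.scores[shot_id] = score_fn(state.images[shot_id])


def route_node(state: LoopState, threshold: float, max_retries: int) -> None:
    """Accept passing shots; requeue failures with a new seed or give up."""
    for shot_id, seed in list(state.pending.items()):
        if state.scores[shot_id] >= threshold:
            state.accepted[shot_id] = state.images[shot_id]
            del state.pending[shot_id]
        elif state.retries[shot_id] >= max_retries:
            state.exhausted.append(shot_id)
            del state.pending[shot_id]
        else:
            state.retries[shot_id] += 1
            state.pending[shot_id] = seed + 1  # assumption: next seed = seed + 1


def run_loop(initial_seeds, generate_fn, score_fn, threshold=0.82, max_retries=3) -> LoopState:
    state = LoopState(pending=dict(initial_seeds),
                      retries={shot_id: 0 for shot_id in initial_seeds})
    # Conditional edge: loop back to generate while requests remain pending.
    while state.pending:
        generate_node(state, generate_fn)
        review_node(state, score_fn)
        route_node(state, threshold, max_retries)
    return state


# Stubs: an "image" is just (shot_id, seed); it passes when the seed is divisible by 3.
def fake_generate(shot_id: int, seed: int):
    return (shot_id, seed)


def fake_score(image) -> float:
    return 0.9 if image[1] % 3 == 0 else 0.5


demo = run_loop({0: 42, 1: 43, 2: 44}, fake_generate, fake_score)
```

In this sketch a shot gets one initial attempt plus at most `max_retries` retries; shots that still fail end up in `exhausted` rather than blocking the loop.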

GPU Optimization (DGX Spark)

consistgen is designed to run efficiently on NVIDIA DGX Spark:

# TensorRT acceleration (optional)
pip install "consistgen[tensorrt]"

  • FLUX.1 schnell: ~13GB VRAM, 1-4 inference steps
  • SigLIP + CLIP review: ~3GB VRAM, batch inference
  • Total: ~16GB, well within DGX Spark's 128GB unified memory

Integration with MelodyCanvas

consistgen is extracted from the MelodyCanvas music video generation pipeline. It implements Phase 4 (anchor image generation) as a standalone package.

# In MelodyCanvas pipeline.py
from consistgen import ConsistGenConfig, ImageRequest, run as consistgen_run

result = consistgen_run(requests, config=config, character_refs=refs)

License

Apache 2.0
