
Sequence to Video

DDD-based video generation pipeline from sequence planning data.

Features

  • Sequencing Domain: Generate sequence JSON from scripts using Gemini
  • Planning Domain: Parse sequence JSON into structured scenarios
  • Library Domain: Index and search existing video footage using Gemini
  • Studio Domain: Generate TTS (Chirp v3), images (Gemini 3 Pro), videos (Veo 3.1), text animations
  • Editing Domain: Compose final video with FFmpeg, apply effects and transitions

Installation

uv sync
uv run playwright install chromium

Configuration

cp .env.example .env

Required environment variables:

GOOGLE_PROJECT_ID=your-project-id
GCS_BUCKET=your-gcs-bucket

Usage

Full Pipeline

# 1. Generate sequence from script
uv run python main.py sequence examples/script.txt -o sequence.json -v

# 2. Render video from sequence
uv run python main.py render sequence.json -o output.mp4 -v

# Or directly from existing sequence JSON
uv run python main.py render examples/sample_sequence.json -o output.mp4

Index Raw Footage

# Index all videos in assets/raw_footage (generic analysis)
uv run python main.py index -v

# Index as product UGC (with marketing-focused analysis)
uv run python main.py index --type product_ugc -v

# Index with explicit context file
uv run python main.py index --type product_ugc --context path/to/context.txt -v

# Index single file with context
uv run python main.py index path/to/video.mp4 --type product_ugc --context context.txt -v

# Force re-index all
uv run python main.py index --force -v

Product Context for UGC Analysis

When using --type product_ugc, provide product context via:

  1. --context flag (explicit path):
uv run python main.py index --type product_ugc --context products/serum_info.txt -v
  2. Auto-detection (place in footage folder):
assets/raw_footage/
├── my_product/
│   ├── context.txt      ← Auto-detected
│   ├── ugc_video1.mp4
│   └── ugc_video2.mp4

Auto-detected filenames (checked in order): context.txt, product.txt, info.txt

context.txt example:

Product: VitaC Brightening Serum
Category: Skincare / Serum
Key Ingredients: 15% Vitamin C, Niacinamide, Hyaluronic Acid
Benefits: Brightening, dark spot care, hydration
Target: 20-40s with dull skin tone concerns
USP: Fast absorption, non-sticky texture

The analyzer uses this context to:

  • Generate product-relevant tags and descriptions
  • Identify appeal points specific to the product's benefits
  • Better match clips to scene requirements during rendering

Scene-Level Editing

Individual scenes are automatically saved to assets/review_output/{project_id}/scenes/ during rendering.

# Re-render a specific scene (e.g., after modifying sequence.json)
uv run python main.py rerender-scene sequence.json s03 -v

# Reassemble final video from existing scene files
uv run python main.py reassemble sequence.json -o output_v2.mp4 -v

Workflow example:

# 1. Initial render (scenes saved individually)
uv run python main.py render sequence.json -o output.mp4 -v

# 2. Review output, identify issues with scene s03

# 3. Modify sequence.json for s03, then re-render only that scene
uv run python main.py rerender-scene sequence.json s03 -v

# 4. Reassemble all scenes into final video
uv run python main.py reassemble sequence.json -o output_v2.mp4 -v

Generate Embeddings

Embeddings enable semantic search for footage selection. They are automatically generated during indexing, but you can regenerate them separately:

# Generate embeddings for all existing indexes (without re-analyzing videos)
uv run python main.py embed -v

Use this when:

  • You updated the embedding model
  • Embeddings were missing from older indexes
  • You want to refresh embeddings without full re-indexing

Convert MOV to MP4

uv run python scripts/convert_mov_to_mp4.py -v
uv run python scripts/convert_mov_to_mp4.py -v --delete  # Delete original after conversion

Individual Domain Testing

Each domain can be tested independently for development and debugging.

TTS Generation

# Generate TTS audio
uv run python scripts/test_tts.py "안녕하세요, 테스트입니다."

# With custom preset and speed
uv run python scripts/test_tts.py "텍스트" -p chirp_v3_korean_female_confident -s 1.2

# List available voice presets
uv run python scripts/test_tts.py --list-presets

Image Generation

# Generate image with Gemini 3 Pro
uv run python scripts/test_image.py "A modern Korean skincare product on white background"

# Custom size and context
uv run python scripts/test_image.py "Product shot" -W 720 -H 1280 --context "skincare advertisement"

Video Generation (Veo 3.1)

# Generate video (takes several minutes)
uv run python scripts/test_video.py "A woman applying skincare product" -d 5

# Custom duration (max 8 seconds)
uv run python scripts/test_video.py "Motion graphic animation" -d 8

Text Overlay

# Generate text overlay video
uv run python scripts/test_text_overlay.py "강력한 효과!" -d 3

# With custom style and animation
uv run python scripts/test_text_overlay.py "텍스트" -s bold_impact_red -a bounce

# List available styles and animations
uv run python scripts/test_text_overlay.py --list-styles
uv run python scripts/test_text_overlay.py --list-animations

Sequence Generation

# Generate sequence JSON from script file
uv run python scripts/test_sequence.py examples/script.txt -v

# Generate from inline text
uv run python scripts/test_sequence.py "첫 번째 장면: 제품 클로즈업. 두 번째 장면: 사용 후기" -v

Project Structure

domains/
├── sequencing/  # Script → Sequence JSON (Gemini 3 Flash)
├── planning/    # Sequence JSON parsing
├── library/     # Video asset indexing & search (Gemini 3 Flash)
├── studio/      # Content generation
│   ├── tts_generator.py       # Google Cloud TTS (Chirp v3)
│   ├── image_generator.py     # Gemini 3 Pro Image
│   ├── video_generator.py     # Veo 3.1 (us-central1 only)
│   ├── text_renderer.py       # Playwright + FFmpeg
│   └── fallback_generator.py  # Black screen fallback
└── editing/     # FFmpeg composition
    ├── composer.py   # Scene orchestration
    ├── renderer.py   # FFmpeg rendering
    └── effects.py    # Camera movements, transitions

infrastructure/
├── config.py    # Configuration management
├── cache.py     # Asset caching
└── metadata.py  # Generation metadata tracking

scripts/
├── test_tts.py           # TTS testing
├── test_image.py         # Image generation testing
├── test_video.py         # Video generation testing
├── test_text_overlay.py  # Text overlay testing
├── test_sequence.py      # Sequence generation testing
└── convert_mov_to_mp4.py # Format conversion

Model Configuration

| Purpose             | Model                       | Location               |
|---------------------|-----------------------------|------------------------|
| Sequence Generation | gemini-3-flash-preview      | global                 |
| Video Analysis      | gemini-3-flash-preview      | global                 |
| Footage Selection   | gemini-2.0-flash            | global                 |
| Embedding           | text-embedding-005          | global                 |
| Image Generation    | gemini-3-pro-image-preview  | global                 |
| Video Generation    | veo-3.1-generate-preview    | us-central1 (required) |
| TTS                 | Chirp3-HD (ko-KR)           | -                      |

Footage Selection

When rendering with existing footage (video_type: ugc_centered or mixed), the system uses a two-stage selection process:

Stage 1: Embedding-Based Search

Clips are pre-filtered using semantic similarity:

  • Each clip has an embedding generated from its description, appeal points, and tags
  • Scene requirements (narration + visual prompt + query tags) are embedded as a query
  • Top candidates are retrieved using cosine similarity
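Stage 1 amounts to a cosine-similarity top-k search. A self-contained sketch in pure Python (the real pipeline operates on text-embedding-005 vectors):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def top_k(query: list[float], clips: dict[str, list[float]], k: int = 3) -> list[str]:
    """Return the k clip IDs whose embeddings are most similar to the query."""
    ranked = sorted(clips, key=lambda cid: cosine_similarity(query, clips[cid]), reverse=True)
    return ranked[:k]
```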

Stage 2: LLM-Based Selection

The LLM evaluates the top candidates and selects the best match:

  1. Indexing: Footage is analyzed and tagged with descriptions, appeal points, content types
  2. Selection: For each scene, LLM evaluates candidate clips against scene requirements (narration, visual prompt, tags)
  3. Fallback: If no suitable clip is found, the pipeline falls back to AI image generation

This two-stage approach balances speed (embedding search) with accuracy (LLM reasoning).
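The selection-with-fallback control flow can be outlined as below. `select_footage` and `llm_pick` are hypothetical names for illustration; the actual logic lives in the library domain:

```python
from typing import Callable, Optional

def select_footage(
    candidates: list[str],
    llm_pick: Callable[[list[str]], Optional[str]],
) -> tuple[str, str]:
    """Pick a clip via the LLM; fall back to image generation if none fits.

    Returns (source_kind, identifier), where source_kind is
    "footage" or "generated_image".
    """
    if candidates:
        # Stage 2: LLM evaluates candidates against scene requirements.
        choice = llm_pick(candidates)
        if choice is not None:
            return ("footage", choice)
    # Fallback: no suitable clip, so generate an AI image instead.
    return ("generated_image", "image_prompt_from_scene")
```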

Caching

Generated assets are cached by default in assets/generated/.cache/.

# Disable cache for render
uv run python main.py render input.json -o output.mp4 --no-cache
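A cache like this is typically keyed by a hash of the generation request; a hypothetical sketch (the actual key scheme in infrastructure/cache.py may differ):

```python
import hashlib
import json
from pathlib import Path

CACHE_DIR = Path("assets/generated/.cache")

def cache_key(kind: str, prompt: str, params: dict) -> str:
    """Derive a stable key from the request; sort_keys makes it order-independent."""
    payload = json.dumps({"kind": kind, "prompt": prompt, "params": params}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

def cached_path(kind: str, prompt: str, params: dict, ext: str) -> Path:
    """Where a cached asset for this request would live."""
    return CACHE_DIR / f"{cache_key(kind, prompt, params)}{ext}"
```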

Metadata

Generation metadata (prompts, parameters) is saved in assets/generated/.metadata/ for each generated asset.

Composition metadata is saved alongside output videos as {output}.meta.json, containing:

  • Project settings (resolution, fps, total duration)
  • Per-scene details (sources, effects, rendered file paths)
  • Creation timestamp
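The composition metadata is plain JSON, so it can be inspected with the standard library. A sketch assuming illustrative key names (check an actual .meta.json for the real schema):

```python
import json
from pathlib import Path

def summarize_meta(meta_path: Path) -> str:
    """One-line summary of a {output}.meta.json composition file."""
    meta = json.loads(meta_path.read_text(encoding="utf-8"))
    # Key names here are assumptions, not the project's documented schema.
    project = meta.get("project", {})
    scenes = meta.get("scenes", [])
    return (
        f"{project.get('resolution', '?')} @ {project.get('fps', '?')}fps, "
        f"{project.get('total_duration', '?')}s, {len(scenes)} scenes"
    )
```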
