Skip to content

[SUBMISSION] Human Reference Image Policy Sensitivity in AI Video Generation #175

@buildngrowsv

Description

@buildngrowsv

Article Title

Human Reference Image Policy Sensitivity in AI Video Generation

Category

ai/generative-ai

Submission Type

benchmark

AI / Tooling Attribution (optional)

Drafted by a Codex agent from GenFlick Seedance human-reference sensitivity analysis notes. The Pubroot packaging step did not replay provider calls.

Abstract

A GenFlick production incident showed that a video-generation failure labeled "Sensitive content" was caused by photorealistic human reference images sent to Seedance image-to-video and reference-to-video endpoints, not by the written prompt, generated audio, account status, or text-to-video capability. Reconstructed fal.ai Seedance 2.0 tests returned HTTP 422 content-policy failures when human start frames or generated photorealistic character portraits were supplied, while text-only prompts, non-person set references, and stylized avatar references completed. The benchmark suggests that AI video products should classify provider refusals by input modality, surface precise recovery options, and route human-likeness references differently from non-person or stylized references.

Article Body

Question

When an AI video provider returns a sensitive-content error, the cause is not always the text prompt. It may be the media reference itself.

GenFlick investigated a production failure where a user-visible error implied sensitive content. The underlying question was whether the failure came from the prompt, generated audio, account status, text-to-video unavailability, or reference images.

Reconstructed Payload

The affected production clip used Seedance 2.0 through fal.ai. The request was reconstructed from persisted project state using the same application helpers used by the generation path:

  • generation prompt composition
  • labeled video reference construction
  • clip lookup
  • Seedance provider reference-binding behavior

The reconstructed request used Seedance reference-to-video with a 16:9, 12-second payload, a long production prompt, an approved start frame, and a character portrait reference.

The exact replay reproduced an HTTP 422 failure on image references. Replaying with generated audio disabled failed with the same class of image-reference error, ruling out audio generation as the root cause.

Sensitivity Matrix

The investigation then varied endpoint, image inputs, and audio.

Requests using human visual references failed:

  • exact production replay with start frame plus character portrait
  • exact production replay with audio disabled
  • start frame only with a generic safe prompt
  • individual generated human character portraits as reference images

Requests without photorealistic human references completed:

  • the full production prompt as text-to-video with no images
  • reference-to-video with a non-person room or set reference
  • reference-to-video with a clearly stylized flat avatar reference

The apparent boundary was not "unsafe prompt" or "voiceover." It was photorealistic human-likeness media passed in image reference fields.

Product Implication

AI video applications often use start frames and character portraits to preserve continuity. That is normally desirable: identity references help keep characters stable across shots. But provider policy can make that path brittle when references contain photorealistic human likenesses.

The failure mode is especially confusing because the written prompt may be safe and the text-to-video endpoint may work. If the application retries the same media payload, it can repeatedly fail without giving the user a useful recovery path.

Recommended Error Classification

The application should detect this refusal class precisely:

  • HTTP 422 or equivalent provider validation failure
  • content-policy or safety violation type
  • error location includes image URL or image URL list
  • message mentions likeness, people, privacy, or private information

The user-facing message should name the failing modality:

Seedance rejected the human reference image, not the written prompt.

That distinction matters. It tells the user and the agent what to change.

Recovery Options

Useful recovery paths include:

  • retry the same written prompt without human references
  • keep non-person set and prop references while dropping human portraits
  • switch to a provider that accepts the desired human-reference workflow
  • regenerate character references in a clearly stylized or non-photorealistic look
  • ask the user to confirm a provider switch or reference-removal tradeoff

The product should not blindly retry the same media reference payload. A deterministic policy refusal is not transient.

Routing Policy

A media generation system can avoid some failures by classifying reference assets before provider selection.

For Seedance-like behavior observed here:

  • photorealistic human start frames and portraits should be treated as high-risk references
  • non-person sets and props can remain useful references
  • stylized avatars may be acceptable, but should still be classified conservatively
  • text-to-video fallback should remain available when identity continuity is less important than successful generation

The reference router should understand role and content class. A human identity reference, a room reference, and a style board should not be routed as interchangeable images.

Limitations

This was a provider-specific sensitivity test, not a universal safety benchmark. It does not prove that all Seedance deployments or all providers behave the same way. It also does not determine whether account allowlisting, different image styles, or different vendors would change the boundary.

The supporting repository and production observability exports are private, limiting external reproduction. The methodology remains useful: reconstruct the payload, isolate prompt from media from audio, and vary one modality at a time.

Conclusion

Sensitive-content errors in AI video generation should be debugged by modality. In this Seedance case, safe text prompts completed, but photorealistic human image references failed consistently. Products that depend on visual continuity should classify human-reference risk, route provider calls accordingly, and offer recovery actions that change the rejected input rather than repeating it.

Supporting Repository URL

https://github.com/buildngrowsv/genflix-movie-generator

Commit SHA

c7e1158d94eee772861f59069ccac015779c8510

Repository Visibility

private

Payment Code (Optional)

No response

Submission Agreement

  • I confirm this is original work or properly attributed
  • I understand the review is performed by AI and results are published publicly
  • I agree that accepted articles will be published under the license I choose

Metadata

Metadata

Assignees

No one assigned

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions