Skip to content

[codex] split subtask labeling blocks#226

Merged
Hynek Kydlíček (hynky1999) merged 23 commits into
mainfrom
codex/subtask-labeling
Jul 1, 2026
Merged

[codex] split subtask labeling blocks#226
Hynek Kydlíček (hynky1999) merged 23 commits into
mainfrom
codex/subtask-labeling

Conversation

@hynky1999

@hynky1999 Hynek Kydlíček (hynky1999) commented Jun 28, 2026

Copy link
Copy Markdown
Collaborator

Summary

  • split subtask annotation into segmentation, labeling, and shared utility modules while keeping top-level exports stable
  • add mdr.robotics.subtask_labeling for fixed-segment labeling and seed-aware relabeling
  • update subtasks annotation docs/examples and regression tests for the new split flow

Validation

  • uv run pre-commit run --all-files
  • uv run pytest tests/robotics/test_subtask_annotation.py -q
  • uv run pytest tests/test_imports.py -q
  • git diff --check

Notes

  • removed the old subtask_relabeling naming in favor of subtask_labeling
  • removed min_segment_duration_sec and include_contact_sheet_manifest from subtask_annotation

Greptile Summary

This PR splits the monolithic subtask_annotation.py into three focused modules (segmentation.py, labeling.py, utils.py) and adds a new subtask_labeling pipeline block for fixed-segment relabeling using previous/current/next visual context. Top-level public exports (subtask_annotation, subtask_labeling) remain stable; internal utilities (TimestampedContactSheet, timestamped_contact_sheets) are intentionally removed from the mdr.robotics namespace.

  • New subtask_labeling block reads fixed segments from an upstream column, renders per-segment contact sheets in a single video pass, and calls the VLM with a seed-aware or plain labeling prompt depending on whether the input segment carries a subtask value.
  • utils.py adds _segment_contact_sheets, a single-pass video scanner that builds contact sheets for all segments simultaneously, and _normalize_input_segments, which validates and normalizes input segment dicts before the labeling loop.
  • Docs and examples updated to show the two-step segmentation → labeling pipeline with the new parameter tables.

Confidence Score: 5/5

Safe to merge. The module split is mechanical, top-level exports are unchanged, and the new labeling block is well-tested. Two edge cases (empty instruction field and empty-subtask fallback on a blocked prompt with no seed) are minor and do not affect the happy path.

The refactor is a clean extraction with no logic regressions in the segmentation path. The new labeling block handles its main flows correctly: seed vs. plain prompt selection, blocked-prompt fallback, single-pass video scanning, and 1:1 segment-to-sheet mapping. The two gaps noted are narrow edge cases that only affect label quality, not correctness of the pipeline structure.

labeling.py — the blocked-without-seed and empty-instruction edge cases are both concentrated here.

Important Files Changed

Filename Overview
src/refiner/robotics/subtask_annotation/labeling.py New subtask_labeling block with seed-aware relabeling. Two minor edge cases: empty instruction field when no tasks are present, and empty subtask written when a blocked prompt has no seed to fall back on.
src/refiner/robotics/subtask_annotation/segmentation.py Extracted subtask_annotation from the old monolith; logic identical to the deleted subtask_annotation.py, no regressions observed.
src/refiner/robotics/subtask_annotation/utils.py Shared utilities extracted cleanly: TimestampedContactSheet, contact-sheet generators, _normalize_input_segments, _blocked_prompt_reason, and new _segment_contact_sheets for per-segment sheet rendering. Single-pass video scan is correct and efficient.
src/refiner/robotics/subtask_annotation/init.py Clean package init re-exporting only the two public block constructors; top-level exports in refiner.robotics updated accordingly.
tests/robotics/test_subtask_annotation.py Good coverage of the new labeling block: seed/no-seed prompt selection, blocked-prompt fallback, field normalization, and contact-sheet dimensions. The blocked-without-seed case (empty seed + blocked provider) is not exercised.
src/refiner/robotics/init.py Removes previously public TimestampedContactSheet, timestamped_contact_sheets, and contact_sheet_prompt_manifest; adds subtask_labeling. Breaking change aligned with AGENTS.md no-compat-by-default policy.
docs/episode-operations/subtask-annotation.md Docs updated to cover the two-step segmentation+labeling workflow, new parameter tables, and output-shape section for labeling output.

Sequence Diagram

%%{init: {'theme': 'neutral'}}%%
sequenceDiagram
    participant P as Pipeline
    participant SA as subtask_annotation
    participant SL as subtask_labeling
    participant U as utils
    participant V as VideoSource
    participant VLM as VLM Provider

    P->>SA: map_async(row)
    SA->>V: iter_frames() (full video)
    SA->>VLM: generate_text(contact_sheets, prompt)
    VLM-->>SA: "{segments:[{start_sec, end_sec, subtask}]}"
    SA-->>P: row[predicted_subtasks]

    P->>SL: map_async(row)
    SL->>U: _normalize_input_segments(row[segments_column])
    U-->>SL: sorted, validated segments[]
    SL->>U: _segment_contact_sheets(video, segments)
    U->>V: iter_frames() (single pass, all segments)
    U-->>SL: segment_sheets[] (1:1 with segments)
    SL->>U: _blank_contact_sheet()
    loop for each segment (serial)
        SL->>VLM: generate_text(prev/cur/next sheets, seed/plain prompt)
        VLM-->>SL: "{subtask: label}"
        SL->>SL: _normalize_label(subtask) or seed_label
    end
    SL-->>P: row[labeled_subtasks]
Loading
%%{init: {'theme': 'base', 'themeVariables': {"darkMode": true, "background": "#0d1117", "primaryColor": "#21262d", "primaryTextColor": "#e6edf3", "primaryBorderColor": "#8b949e", "lineColor": "#8b949e", "textColor": "#e6edf3", "edgeLabelBackground": "#161b22", "actorBkg": "#21262d", "actorBorder": "#8b949e", "actorTextColor": "#e6edf3", "actorLineColor": "#8b949e", "signalColor": "#8b949e", "signalTextColor": "#e6edf3", "noteBkgColor": "#373320", "noteBorderColor": "#d4a72c", "noteTextColor": "#f0e6c0", "labelBoxBkgColor": "#21262d", "labelBoxBorderColor": "#8b949e", "labelTextColor": "#e6edf3", "loopTextColor": "#e6edf3", "activationBkgColor": "#30363d", "activationBorderColor": "#8b949e"}}}%%
sequenceDiagram
    participant P as Pipeline
    participant SA as subtask_annotation
    participant SL as subtask_labeling
    participant U as utils
    participant V as VideoSource
    participant VLM as VLM Provider

    P->>SA: map_async(row)
    SA->>V: iter_frames() (full video)
    SA->>VLM: generate_text(contact_sheets, prompt)
    VLM-->>SA: "{segments:[{start_sec, end_sec, subtask}]}"
    SA-->>P: row[predicted_subtasks]

    P->>SL: map_async(row)
    SL->>U: _normalize_input_segments(row[segments_column])
    U-->>SL: sorted, validated segments[]
    SL->>U: _segment_contact_sheets(video, segments)
    U->>V: iter_frames() (single pass, all segments)
    U-->>SL: segment_sheets[] (1:1 with segments)
    SL->>U: _blank_contact_sheet()
    loop for each segment (serial)
        SL->>VLM: generate_text(prev/cur/next sheets, seed/plain prompt)
        VLM-->>SL: "{subtask: label}"
        SL->>SL: _normalize_label(subtask) or seed_label
    end
    SL-->>P: row[labeled_subtasks]
Loading

Reviews (2): Last reviewed commit: "fix timestamp badge visibility test" | Re-trigger Greptile

Context used:

  • Context used - AGENTS.md (source)

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a new subtask_labeling step to the robotics subtask annotation pipeline, allowing fixed temporal segments to be labeled using a dedicated vision-language model pass with previous, current, and next segment visual context. The changes include implementation of the labeling logic, utility functions for rendering segment contact sheets, documentation, and unit tests. The reviewer feedback focuses on critical performance optimizations, recommending that contact sheets be pre-rendered and VLM labeling requests be executed concurrently using asyncio.gather to avoid redundant video decoding and sequential API calls. Additionally, the reviewer suggests refactoring helper functions to support these pre-rendered sheets and adding robust error handling for malformed model outputs.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

Comment on lines +183 to +243
async def _label_subtasks(
row: Row,
generate_text: "GenerateTextFn",
) -> MapResult:
if not isinstance(row, RoboticsRow):
raise TypeError("subtask_labeling expects RoboticsRow inputs")

video = _resolve_video(row, video_key)
segments = _normalize_input_segments(row[segments_column], segment_label_key)
seed_labels = _seed_labels(
row=row,
segments=segments,
labels_column=labels_column,
segment_label_key=segment_label_key,
)
instruction = "; ".join(task for task in row.tasks if task.strip())
labeled_segments: list[dict[str, Any]] = []
for index, segment in enumerate(segments):
seed_label = seed_labels[index]
content = await _subtask_labeling_content(
video=video,
segments=segments,
segment_index=index,
instruction=instruction,
seed_label=seed_label,
frame_width=frame_width,
max_frames_per_segment=max_frames_per_segment,
columns=columns,
quality=quality,
)
messages = cast(list[Message], [{"role": "user", "content": content}])
try:
response = await generate_text(
messages=messages,
schema=_SubtaskLabelingResult,
temperature=temperature,
)
except RuntimeError as exc:
block_reason = _blocked_prompt_reason(exc)
if block_reason is None or on_blocked_prompt == "raise":
raise
logger.warning(
"subtask labeling provider blocked episode {} segment {}: {}",
row.episode_id,
index,
block_reason,
)
label = seed_label
else:
parsed = (
response.object
if isinstance(response.object, _SubtaskLabelingResult)
else _parse_subtask_labeling_result(response.text)
)
label = _normalize_label(parsed.label) or seed_label

labeled = dict(segment)
labeled[segment_label_key] = label
labeled_segments.append(labeled)

return row.update({output_column: labeled_segments})

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The current implementation runs labeling sequentially for each segment. This is highly inefficient because it performs sequential VLM calls and redundantly decodes/renders contact sheets for overlapping neighbor segments (each segment's contact sheet is rendered up to 3 times as previous, current, and next neighbors).

We can optimize this significantly by:

  1. Pre-rendering all segment contact sheets concurrently using asyncio.gather exactly once per segment.
  2. Running the VLM labeling requests concurrently using asyncio.gather.
  3. Adding a robust try-except block around the response parsing to log a warning and fallback to the seed label instead of crashing the entire pipeline on malformed model outputs.
    async def _label_subtasks(
        row: Row,
        generate_text: "GenerateTextFn",
    ) -> MapResult:
        if not isinstance(row, RoboticsRow):
            raise TypeError("subtask_labeling expects RoboticsRow inputs")

        video = _resolve_video(row, video_key)
        segments = _normalize_input_segments(row[segments_column], segment_label_key)
        seed_labels = _seed_labels(
            row=row,
            segments=segments,
            labels_column=labels_column,
            segment_label_key=segment_label_key,
        )
        instruction = "; ".join(task for task in row.tasks if task.strip())

        import asyncio

        # Pre-render contact sheets for all segments concurrently to avoid redundant rendering
        rendered_sheets = await asyncio.gather(*(
            _segment_contact_sheet(
                video=video,
                start_sec=float(segment["start_sec"]),
                end_sec=float(segment["end_sec"]),
                frame_width=frame_width,
                max_frames=max_frames_per_segment,
                columns=columns,
                quality=quality,
            )
            for segment in segments
        ))

        blank_sheet = _blank_contact_sheet(
            frame_width=frame_width,
            columns=columns,
            quality=quality,
        )

        async def _label_segment(index: int, segment: dict[str, Any]) -> dict[str, Any]:
            seed_label = seed_labels[index]
            content = await _subtask_labeling_content(
                segments=segments,
                segment_index=index,
                instruction=instruction,
                seed_label=seed_label,
                rendered_sheets=rendered_sheets,
                blank_sheet=blank_sheet,
            )
            messages = cast(list[Message], [{"role": "user", "content": content}])
            try:
                response = await generate_text(
                    messages=messages,
                    schema=_SubtaskLabelingResult,
                    temperature=temperature,
                )
            except RuntimeError as exc:
                block_reason = _blocked_prompt_reason(exc)
                if block_reason is None or on_blocked_prompt == "raise":
                    raise
                logger.warning(
                    "subtask labeling provider blocked episode {} segment {}: {}",
                    row.episode_id,
                    index,
                    block_reason,
                )
                label = seed_label
            else:
                try:
                    parsed = (
                        response.object
                        if isinstance(response.object, _SubtaskLabelingResult)
                        else _parse_subtask_labeling_result(response.text)
                    )
                    label = _normalize_label(parsed.label) or seed_label
                except Exception as parse_exc:
                    logger.warning(
                        "failed to parse labeling response for episode {} segment {}: {}. Falling back to seed label.",
                        row.episode_id,
                        index,
                        parse_exc,
                    )
                    label = seed_label

            labeled = dict(segment)
            labeled[segment_label_key] = label
            return labeled

        labeled_segments = await asyncio.gather(*(
            _label_segment(index, segment)
            for index, segment in enumerate(segments)
        ))

        return row.update({output_column: labeled_segments})

Comment on lines +256 to +306
async def _subtask_labeling_content(
*,
video: VideoSource,
segments: Sequence[Mapping[str, Any]],
segment_index: int,
instruction: str,
seed_label: str,
frame_width: int,
max_frames_per_segment: int,
columns: int,
quality: int,
) -> list[dict[str, Any]]:
segment = segments[segment_index]
start_sec = float(segment["start_sec"])
end_sec = float(segment["end_sec"])
prompt_template = (
_SUBTASK_SEED_LABELING_PROMPT_TEMPLATE
if seed_label
else _SUBTASK_LABELING_PROMPT_TEMPLATE
)
text = prompt_template.format(
instruction=instruction,
segment_index=segment_index,
segment_count=len(segments),
start_sec=start_sec,
end_sec=end_sec,
seed_label=seed_label,
)
file_parts: list[dict[str, Any]] = []
for neighbor_index in (segment_index - 1, segment_index, segment_index + 1):
if 0 <= neighbor_index < len(segments):
neighbor = segments[neighbor_index]
sheet = await _segment_contact_sheet(
video=video,
start_sec=float(neighbor["start_sec"]),
end_sec=float(neighbor["end_sec"]),
frame_width=frame_width,
max_frames=max_frames_per_segment,
columns=columns,
quality=quality,
)
else:
sheet = _blank_contact_sheet(
frame_width=frame_width,
columns=columns,
quality=quality,
)
file_parts.append(
{"type": "file", "mediaType": sheet.media_type, "data": sheet.data}
)
return [{"type": "text", "text": text}, *file_parts]

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

Refactor _subtask_labeling_content to accept pre-rendered contact sheets and the blank sheet. This avoids redundant video decoding and rendering for overlapping neighbor segments.

async def _subtask_labeling_content(
    *,
    segments: Sequence[Mapping[str, Any]],
    segment_index: int,
    instruction: str,
    seed_label: str,
    rendered_sheets: list[TimestampedContactSheet],
    blank_sheet: TimestampedContactSheet,
) -> list[dict[str, Any]]:
    segment = segments[segment_index]
    start_sec = float(segment["start_sec"])
    end_sec = float(segment["end_sec"])
    prompt_template = (
        _SUBTASK_SEED_LABELING_PROMPT_TEMPLATE
        if seed_label
        else _SUBTASK_LABELING_PROMPT_TEMPLATE
    )
    text = prompt_template.format(
        instruction=instruction,
        segment_index=segment_index,
        segment_count=len(segments),
        start_sec=start_sec,
        end_sec=end_sec,
        seed_label=seed_label,
    )
    file_parts: list[dict[str, Any]] = []
    for neighbor_index in (segment_index - 1, segment_index, segment_index + 1):
        if 0 <= neighbor_index < len(segments):
            sheet = rendered_sheets[neighbor_index]
        else:
            sheet = blank_sheet
        file_parts.append(
            {"type": "file", "mediaType": sheet.media_type, "data": sheet.data}
        )
    return [{"type": "text", "text": text}, *file_parts]

Comment on lines +13 to +22
from refiner.robotics.subtask_annotation.utils import (
_blank_contact_sheet,
_blocked_prompt_reason,
_normalize_input_segments,
_normalize_label,
_parse_json_object,
_resolve_video,
_seed_labels,
_segment_contact_sheet,
)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Import TimestampedContactSheet from utils to enable proper type annotations for the pre-rendered contact sheets in _subtask_labeling_content.

Suggested change
from refiner.robotics.subtask_annotation.utils import (
_blank_contact_sheet,
_blocked_prompt_reason,
_normalize_input_segments,
_normalize_label,
_parse_json_object,
_resolve_video,
_seed_labels,
_segment_contact_sheet,
)
from refiner.robotics.subtask_annotation.utils import (
TimestampedContactSheet,
_blank_contact_sheet,
_blocked_prompt_reason,
_normalize_input_segments,
_normalize_label,
_parse_json_object,
_resolve_video,
_seed_labels,
_segment_contact_sheet,
)

@hynky1999 Hynek Kydlíček (hynky1999) marked this pull request as ready for review July 1, 2026 15:04
Comment on lines +187 to +228
for index, segment in enumerate(segments):
seed_label = str(segment["subtask"])
content = await _subtask_labeling_content(
segments=segments,
segment_sheets=segment_sheets,
blank_sheet=blank_sheet,
segment_index=index,
instruction=instruction,
seed_label=seed_label,
)
messages = cast(list[Message], [{"role": "user", "content": content}])
try:
response = await generate_text(
messages=messages,
schema=_SubtaskLabelingResult,
provider_options={
"google": {"safetySettings": GEMINI_BLOCK_NONE_SAFETY_SETTINGS}
},
temperature=temperature,
)
except RuntimeError as exc:
block_reason = _blocked_prompt_reason(exc)
if block_reason is None or on_blocked_prompt == "raise":
raise
logger.warning(
"subtask labeling provider blocked episode {} segment {}: {}",
row.episode_id,
index,
block_reason,
)
label = seed_label
else:
parsed = response.object
if not isinstance(parsed, _SubtaskLabelingResult):
raise TypeError(
"subtask_labeling expected a structured response object"
)
label = _normalize_label(parsed.subtask) or seed_label

labeled = dict(segment)
labeled["subtask"] = label
labeled_segments.append(labeled)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Sequential per-segment VLM calls block per-episode throughput

Each segment's await generate_text(...) runs serially inside a single episode's coroutine. For an episode with 20 segments at ~2 s/call that is 40 s of wall-clock latency per episode, whereas the max_concurrent_requests semaphore and max_in_flight budget are sized for much higher parallelism. All N calls could be submitted concurrently with asyncio.gather(*[_label_one(seg, idx) for idx, seg in enumerate(segments)]) — the semaphore in generate_text would still cap the total outstanding requests across the worker.

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

Fix in Codex

Comment on lines +256 to +263
text = prompt_template.format(
instruction=instruction,
segment_index=segment_index,
segment_count=len(segments),
start_sec=start_sec,
end_sec=end_sec,
seed_label=seed_label,
)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 0-based segment_index in prompt yields unnatural "Target segment: 0 of N"

The model prompt shows segment_index as 0-based (e.g., "Target segment: 0 of 3"), which looks like an off-by-one to any reader (human or model) expecting 1-based labeling. Adjusting to 1-based makes the count self-consistent without changing any logic.

Suggested change
text = prompt_template.format(
instruction=instruction,
segment_index=segment_index,
segment_count=len(segments),
start_sec=start_sec,
end_sec=end_sec,
seed_label=seed_label,
)
text = prompt_template.format(
instruction=instruction,
segment_index=segment_index + 1,
segment_count=len(segments),
start_sec=start_sec,
end_sec=end_sec,
seed_label=seed_label,
)

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

Fix in Codex

Comment on lines +34 to +105
_SUBTASK_LABELING_PROMPT_TEMPLATE = """Annotate one fixed segment from a longer video.

Return only JSON:
{{"subtask":"short descriptive subtask label"}}

Inputs:
- The first image is the previous fixed segment, if it exists; otherwise it is blank/context only.
- The second image is the current target segment.
- The third image is the next fixed segment, if it exists; otherwise it is blank/context only.
- Each image is timestamped with absolute video time.

Episode instruction:
{instruction}

Target segment:
{segment_index} of {segment_count}

Target time:
{start_sec:.2f}s to {end_sec:.2f}s

Rules:
- Label only the current target segment.
- Use previous/next images only to disambiguate what changed during the current segment.
- Do not describe the previous or next segment.
- The segment boundaries are fixed; do not split or merge.
- Compare the beginning and end of the current segment and describe the visible manipulation or state change.
- Use one concise imperative phrase.
- Include the exact action and manipulated object.
- Include source, destination, side, direction, final location, opened/closed/filled/cleaned state, or affected part when visible and central.
- Do not mention timestamps, frame numbers, uncertainty, candidates, or invisible intent.
"""

_SUBTASK_SEED_LABELING_PROMPT_TEMPLATE = """Annotate one fixed segment from a longer video.

Return only JSON:
{{"subtask":"short descriptive subtask label"}}

Inputs:
- The first image is the previous fixed segment, if it exists; otherwise it is blank/context only.
- The second image is the current target segment.
- The third image is the next fixed segment, if it exists; otherwise it is blank/context only.
- Each image is timestamped with absolute video time.

Episode instruction:
{instruction}

Target segment:
{segment_index} of {segment_count}

Target time:
{start_sec:.2f}s to {end_sec:.2f}s

Original predicted label for this exact segment:
{seed_label}

Rules:
- Label only the current target segment.
- Use previous/next images only to disambiguate what changed during the current segment.
- Treat the original predicted label as a strong prior, not as ground truth.
- Verify and minimally correct the original label using the current target segment.
- If the original label describes the same action and main object, keep it, only improving grammar or adding clearly visible essential details.
- If it is too vague but directionally correct, make it more specific.
- If it describes the previous/next segment, the wrong action, wrong object, wrong destination, or wrong state change, replace it.
- Do not describe the previous or next segment.
- Do not split or merge the fixed segment.
- Do not introduce a new action unless it is clearly visible in the current target segment.
- Do not make the label broader than the fixed segment.
- Use one concise imperative phrase.
- Include the exact action and manipulated object.
- Include source, destination, side, direction, final location, opened/closed/filled/cleaned state, or affected part when visible and central.
- Do not mention timestamps, frame numbers, uncertainty, candidates, or invisible intent.
"""

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Two prompt templates share ~40 identical lines

_SUBTASK_LABELING_PROMPT_TEMPLATE and _SUBTASK_SEED_LABELING_PROMPT_TEMPLATE differ only in the "Original predicted label" paragraph and its associated rules, yet both duplicate the full preamble, Inputs, Episode instruction, Target segment, and Target time blocks verbatim. Per the project's own guideline ("Avoid copy-paste duplication; extract shared logic"), extracting the shared header into a single constant and composing the seed section conditionally would eliminate drift risk when one template is updated and the other is not.

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

Fix in Codex

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: eb498553f4

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "Codex (@codex) review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "Codex (@codex) address that feedback".

sample_sec: float,
frame_width: int,
) -> AsyncIterator[tuple[float, Image.Image]]:
target_timestamp = 0.0

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Start sampling at the first decoded timestamp

Initializing target_timestamp to 0.0 breaks clipped videos whose decoded timestamps start later, which LeRobot rows can produce via VideoFile.from_timestamp_s. For a clip beginning at 120s, the first decoded frame satisfies the catch-up loop and emits every 0.5s timestamp from 0 to 120 using that same image, bloating the VLM request and burning incorrect timestamps into the contact sheets, so returned subtask boundaries are wrong. Seed the sampler from the clip/first-frame timestamp or yield actual frame timestamps instead of catching up from zero.

Useful? React with 👍 / 👎.

Comment on lines +251 to +252
if timestamp + 1e-6 < targets[target_index][0]:
continue

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Preserve the final decoded frame for backfill

When the next target time is after the last decoded frame—common for a segment ending at the clip's to_timestamp, because the final frame is one frame earlier—this branch skips all intervening frames without updating last_image. The EOF backfill then repeats the image from the previous target (or leaves the sheet blank), so the current-segment sheet omits the actual final state that the labeling prompt asks the model to compare. Update last_image before skipping frames that are still before the next target.

Useful? React with 👍 / 👎.

@hynky1999 Hynek Kydlíček (hynky1999) merged commit 6f9783b into main Jul 1, 2026
6 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant