[codex] split subtask labeling blocks by hynky1999 · Pull Request #226 · macrodata-labs/refiner

Hynek Kydlíček (hynky1999) · 2026-06-28T23:12:56Z

Summary

split subtask annotation into segmentation, labeling, and shared utility modules while keeping top-level exports stable
add mdr.robotics.subtask_labeling for fixed-segment labeling and seed-aware relabeling
update subtasks annotation docs/examples and regression tests for the new split flow

Validation

uv run pre-commit run --all-files
uv run pytest tests/robotics/test_subtask_annotation.py -q
uv run pytest tests/test_imports.py -q
git diff --check

Notes

removed the old subtask_relabeling naming in favor of subtask_labeling
removed min_segment_duration_sec and include_contact_sheet_manifest from subtask_annotation

Greptile Summary

This PR splits the monolithic subtask_annotation.py into three focused modules (segmentation.py, labeling.py, utils.py) and adds a new subtask_labeling pipeline block for fixed-segment relabeling using previous/current/next visual context. Top-level public exports (subtask_annotation, subtask_labeling) remain stable; internal utilities (TimestampedContactSheet, timestamped_contact_sheets) are intentionally removed from the mdr.robotics namespace.

New subtask_labeling block reads fixed segments from an upstream column, renders per-segment contact sheets in a single video pass, and calls the VLM with a seed-aware or plain labeling prompt depending on whether the input segment carries a subtask value.
utils.py adds _segment_contact_sheets, a single-pass video scanner that builds contact sheets for all segments simultaneously, and _normalize_input_segments, which validates and normalizes input segment dicts before the labeling loop.
Docs and examples updated to show the two-step segmentation → labeling pipeline with the new parameter tables.

Confidence Score: 5/5

Safe to merge. The module split is mechanical, top-level exports are unchanged, and the new labeling block is well-tested. Two edge cases (empty instruction field and empty-subtask fallback on a blocked prompt with no seed) are minor and do not affect the happy path.

The refactor is a clean extraction with no logic regressions in the segmentation path. The new labeling block handles its main flows correctly: seed vs. plain prompt selection, blocked-prompt fallback, single-pass video scanning, and 1:1 segment-to-sheet mapping. The two gaps noted are narrow edge cases that only affect label quality, not correctness of the pipeline structure.

labeling.py — the blocked-without-seed and empty-instruction edge cases are both concentrated here.

Important Files Changed

Filename	Overview
src/refiner/robotics/subtask_annotation/labeling.py	New subtask_labeling block with seed-aware relabeling. Two minor edge cases: empty instruction field when no tasks are present, and empty subtask written when a blocked prompt has no seed to fall back on.
src/refiner/robotics/subtask_annotation/segmentation.py	Extracted subtask_annotation from the old monolith; logic identical to the deleted subtask_annotation.py, no regressions observed.
src/refiner/robotics/subtask_annotation/utils.py	Shared utilities extracted cleanly: TimestampedContactSheet, contact-sheet generators, _normalize_input_segments, _blocked_prompt_reason, and new _segment_contact_sheets for per-segment sheet rendering. Single-pass video scan is correct and efficient.
src/refiner/robotics/subtask_annotation/init.py	Clean package init re-exporting only the two public block constructors; top-level exports in refiner.robotics updated accordingly.
tests/robotics/test_subtask_annotation.py	Good coverage of the new labeling block: seed/no-seed prompt selection, blocked-prompt fallback, field normalization, and contact-sheet dimensions. The blocked-without-seed case (empty seed + blocked provider) is not exercised.
src/refiner/robotics/init.py	Removes previously public TimestampedContactSheet, timestamped_contact_sheets, and contact_sheet_prompt_manifest; adds subtask_labeling. Breaking change aligned with AGENTS.md no-compat-by-default policy.
docs/episode-operations/subtask-annotation.md	Docs updated to cover the two-step segmentation+labeling workflow, new parameter tables, and output-shape section for labeling output.

Sequence Diagram

%%{init: {'theme': 'neutral'}}%%
sequenceDiagram
    participant P as Pipeline
    participant SA as subtask_annotation
    participant SL as subtask_labeling
    participant U as utils
    participant V as VideoSource
    participant VLM as VLM Provider

    P->>SA: map_async(row)
    SA->>V: iter_frames() (full video)
    SA->>VLM: generate_text(contact_sheets, prompt)
    VLM-->>SA: "{segments:[{start_sec, end_sec, subtask}]}"
    SA-->>P: row[predicted_subtasks]

    P->>SL: map_async(row)
    SL->>U: _normalize_input_segments(row[segments_column])
    U-->>SL: sorted, validated segments[]
    SL->>U: _segment_contact_sheets(video, segments)
    U->>V: iter_frames() (single pass, all segments)
    U-->>SL: segment_sheets[] (1:1 with segments)
    SL->>U: _blank_contact_sheet()
    loop for each segment (serial)
        SL->>VLM: generate_text(prev/cur/next sheets, seed/plain prompt)
        VLM-->>SL: "{subtask: label}"
        SL->>SL: _normalize_label(subtask) or seed_label
    end
    SL-->>P: row[labeled_subtasks]

%%{init: {'theme': 'base', 'themeVariables': {"darkMode": true, "background": "#0d1117", "primaryColor": "#21262d", "primaryTextColor": "#e6edf3", "primaryBorderColor": "#8b949e", "lineColor": "#8b949e", "textColor": "#e6edf3", "edgeLabelBackground": "#161b22", "actorBkg": "#21262d", "actorBorder": "#8b949e", "actorTextColor": "#e6edf3", "actorLineColor": "#8b949e", "signalColor": "#8b949e", "signalTextColor": "#e6edf3", "noteBkgColor": "#373320", "noteBorderColor": "#d4a72c", "noteTextColor": "#f0e6c0", "labelBoxBkgColor": "#21262d", "labelBoxBorderColor": "#8b949e", "labelTextColor": "#e6edf3", "loopTextColor": "#e6edf3", "activationBkgColor": "#30363d", "activationBorderColor": "#8b949e"}}}%%
sequenceDiagram
    participant P as Pipeline
    participant SA as subtask_annotation
    participant SL as subtask_labeling
    participant U as utils
    participant V as VideoSource
    participant VLM as VLM Provider

    P->>SA: map_async(row)
    SA->>V: iter_frames() (full video)
    SA->>VLM: generate_text(contact_sheets, prompt)
    VLM-->>SA: "{segments:[{start_sec, end_sec, subtask}]}"
    SA-->>P: row[predicted_subtasks]

    P->>SL: map_async(row)
    SL->>U: _normalize_input_segments(row[segments_column])
    U-->>SL: sorted, validated segments[]
    SL->>U: _segment_contact_sheets(video, segments)
    U->>V: iter_frames() (single pass, all segments)
    U-->>SL: segment_sheets[] (1:1 with segments)
    SL->>U: _blank_contact_sheet()
    loop for each segment (serial)
        SL->>VLM: generate_text(prev/cur/next sheets, seed/plain prompt)
        VLM-->>SL: "{subtask: label}"
        SL->>SL: _normalize_label(subtask) or seed_label
    end
    SL-->>P: row[labeled_subtasks]

_{Reviews (2): Last reviewed commit: "fix timestamp badge visibility test" | Re-trigger Greptile}

Context used:

Context used - AGENTS.md (source)

gemini-code-assist

Code Review

This pull request introduces a new subtask_labeling step to the robotics subtask annotation pipeline, allowing fixed temporal segments to be labeled using a dedicated vision-language model pass with previous, current, and next segment visual context. The changes include implementation of the labeling logic, utility functions for rendering segment contact sheets, documentation, and unit tests. The reviewer feedback focuses on critical performance optimizations, recommending that contact sheets be pre-rendered and VLM labeling requests be executed concurrently using asyncio.gather to avoid redundant video decoding and sequential API calls. Additionally, the reviewer suggests refactoring helper functions to support these pre-rendered sheets and adding robust error handling for malformed model outputs.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

gemini-code-assist · 2026-06-28T23:14:57Z

+    async def _label_subtasks(
+        row: Row,
+        generate_text: "GenerateTextFn",
+    ) -> MapResult:
+        if not isinstance(row, RoboticsRow):
+            raise TypeError("subtask_labeling expects RoboticsRow inputs")
+
+        video = _resolve_video(row, video_key)
+        segments = _normalize_input_segments(row[segments_column], segment_label_key)
+        seed_labels = _seed_labels(
+            row=row,
+            segments=segments,
+            labels_column=labels_column,
+            segment_label_key=segment_label_key,
+        )
+        instruction = "; ".join(task for task in row.tasks if task.strip())
+        labeled_segments: list[dict[str, Any]] = []
+        for index, segment in enumerate(segments):
+            seed_label = seed_labels[index]
+            content = await _subtask_labeling_content(
+                video=video,
+                segments=segments,
+                segment_index=index,
+                instruction=instruction,
+                seed_label=seed_label,
+                frame_width=frame_width,
+                max_frames_per_segment=max_frames_per_segment,
+                columns=columns,
+                quality=quality,
+            )
+            messages = cast(list[Message], [{"role": "user", "content": content}])
+            try:
+                response = await generate_text(
+                    messages=messages,
+                    schema=_SubtaskLabelingResult,
+                    temperature=temperature,
+                )
+            except RuntimeError as exc:
+                block_reason = _blocked_prompt_reason(exc)
+                if block_reason is None or on_blocked_prompt == "raise":
+                    raise
+                logger.warning(
+                    "subtask labeling provider blocked episode {} segment {}: {}",
+                    row.episode_id,
+                    index,
+                    block_reason,
+                )
+                label = seed_label
+            else:
+                parsed = (
+                    response.object
+                    if isinstance(response.object, _SubtaskLabelingResult)
+                    else _parse_subtask_labeling_result(response.text)
+                )
+                label = _normalize_label(parsed.label) or seed_label
+
+            labeled = dict(segment)
+            labeled[segment_label_key] = label
+            labeled_segments.append(labeled)
+
+        return row.update({output_column: labeled_segments})


The current implementation runs labeling sequentially for each segment. This is highly inefficient because it performs sequential VLM calls and redundantly decodes/renders contact sheets for overlapping neighbor segments (each segment's contact sheet is rendered up to 3 times as previous, current, and next neighbors).

We can optimize this significantly by:

Pre-rendering all segment contact sheets concurrently using asyncio.gather exactly once per segment.

Running the VLM labeling requests concurrently using asyncio.gather.

Adding a robust try-except block around the response parsing to log a warning and fallback to the seed label instead of crashing the entire pipeline on malformed model outputs.

async def _label_subtasks( row: Row, generate_text: "GenerateTextFn", ) -> MapResult: if not isinstance(row, RoboticsRow): raise TypeError("subtask_labeling expects RoboticsRow inputs") video = _resolve_video(row, video_key) segments = _normalize_input_segments(row[segments_column], segment_label_key) seed_labels = _seed_labels( row=row, segments=segments, labels_column=labels_column, segment_label_key=segment_label_key, ) instruction = "; ".join(task for task in row.tasks if task.strip()) import asyncio # Pre-render contact sheets for all segments concurrently to avoid redundant rendering rendered_sheets = await asyncio.gather(*( _segment_contact_sheet( video=video, start_sec=float(segment["start_sec"]), end_sec=float(segment["end_sec"]), frame_width=frame_width, max_frames=max_frames_per_segment, columns=columns, quality=quality, ) for segment in segments )) blank_sheet = _blank_contact_sheet( frame_width=frame_width, columns=columns, quality=quality, ) async def _label_segment(index: int, segment: dict[str, Any]) -> dict[str, Any]: seed_label = seed_labels[index] content = await _subtask_labeling_content( segments=segments, segment_index=index, instruction=instruction, seed_label=seed_label, rendered_sheets=rendered_sheets, blank_sheet=blank_sheet, ) messages = cast(list[Message], [{"role": "user", "content": content}]) try: response = await generate_text( messages=messages, schema=_SubtaskLabelingResult, temperature=temperature, ) except RuntimeError as exc: block_reason = _blocked_prompt_reason(exc) if block_reason is None or on_blocked_prompt == "raise": raise logger.warning( "subtask labeling provider blocked episode {} segment {}: {}", row.episode_id, index, block_reason, ) label = seed_label else: try: parsed = ( response.object if isinstance(response.object, _SubtaskLabelingResult) else _parse_subtask_labeling_result(response.text) ) label = _normalize_label(parsed.label) or seed_label except Exception as parse_exc: logger.warning( "failed to parse labeling response for episode {} segment {}: {}. Falling back to seed label.", row.episode_id, index, parse_exc, ) label = seed_label labeled = dict(segment) labeled[segment_label_key] = label return labeled labeled_segments = await asyncio.gather(*( _label_segment(index, segment) for index, segment in enumerate(segments) )) return row.update({output_column: labeled_segments})

gemini-code-assist · 2026-06-28T23:14:58Z

+async def _subtask_labeling_content(
+    *,
+    video: VideoSource,
+    segments: Sequence[Mapping[str, Any]],
+    segment_index: int,
+    instruction: str,
+    seed_label: str,
+    frame_width: int,
+    max_frames_per_segment: int,
+    columns: int,
+    quality: int,
+) -> list[dict[str, Any]]:
+    segment = segments[segment_index]
+    start_sec = float(segment["start_sec"])
+    end_sec = float(segment["end_sec"])
+    prompt_template = (
+        _SUBTASK_SEED_LABELING_PROMPT_TEMPLATE
+        if seed_label
+        else _SUBTASK_LABELING_PROMPT_TEMPLATE
+    )
+    text = prompt_template.format(
+        instruction=instruction,
+        segment_index=segment_index,
+        segment_count=len(segments),
+        start_sec=start_sec,
+        end_sec=end_sec,
+        seed_label=seed_label,
+    )
+    file_parts: list[dict[str, Any]] = []
+    for neighbor_index in (segment_index - 1, segment_index, segment_index + 1):
+        if 0 <= neighbor_index < len(segments):
+            neighbor = segments[neighbor_index]
+            sheet = await _segment_contact_sheet(
+                video=video,
+                start_sec=float(neighbor["start_sec"]),
+                end_sec=float(neighbor["end_sec"]),
+                frame_width=frame_width,
+                max_frames=max_frames_per_segment,
+                columns=columns,
+                quality=quality,
+            )
+        else:
+            sheet = _blank_contact_sheet(
+                frame_width=frame_width,
+                columns=columns,
+                quality=quality,
+            )
+        file_parts.append(
+            {"type": "file", "mediaType": sheet.media_type, "data": sheet.data}
+        )
+    return [{"type": "text", "text": text}, *file_parts]


Refactor _subtask_labeling_content to accept pre-rendered contact sheets and the blank sheet. This avoids redundant video decoding and rendering for overlapping neighbor segments.

async def _subtask_labeling_content( *, segments: Sequence[Mapping[str, Any]], segment_index: int, instruction: str, seed_label: str, rendered_sheets: list[TimestampedContactSheet], blank_sheet: TimestampedContactSheet, ) -> list[dict[str, Any]]: segment = segments[segment_index] start_sec = float(segment["start_sec"]) end_sec = float(segment["end_sec"]) prompt_template = ( _SUBTASK_SEED_LABELING_PROMPT_TEMPLATE if seed_label else _SUBTASK_LABELING_PROMPT_TEMPLATE ) text = prompt_template.format( instruction=instruction, segment_index=segment_index, segment_count=len(segments), start_sec=start_sec, end_sec=end_sec, seed_label=seed_label, ) file_parts: list[dict[str, Any]] = [] for neighbor_index in (segment_index - 1, segment_index, segment_index + 1): if 0 <= neighbor_index < len(segments): sheet = rendered_sheets[neighbor_index] else: sheet = blank_sheet file_parts.append( {"type": "file", "mediaType": sheet.media_type, "data": sheet.data} ) return [{"type": "text", "text": text}, *file_parts]

gemini-code-assist · 2026-06-28T23:14:58Z

+from refiner.robotics.subtask_annotation.utils import (
+    _blank_contact_sheet,
+    _blocked_prompt_reason,
+    _normalize_input_segments,
+    _normalize_label,
+    _parse_json_object,
+    _resolve_video,
+    _seed_labels,
+    _segment_contact_sheet,
+)


Import TimestampedContactSheet from utils to enable proper type annotations for the pre-rendered contact sheets in _subtask_labeling_content.

Suggested change

from refiner.robotics.subtask_annotation.utils import (

_blank_contact_sheet,

_blocked_prompt_reason,

_normalize_input_segments,

_normalize_label,

_parse_json_object,

_resolve_video,

_seed_labels,

_segment_contact_sheet,

)

from refiner.robotics.subtask_annotation.utils import (

TimestampedContactSheet,

_blank_contact_sheet,

_blocked_prompt_reason,

_normalize_input_segments,

_normalize_label,

_parse_json_object,

_resolve_video,

_seed_labels,

_segment_contact_sheet,

)

greptile-apps · 2026-07-01T15:09:25Z

+        for index, segment in enumerate(segments):
+            seed_label = str(segment["subtask"])
+            content = await _subtask_labeling_content(
+                segments=segments,
+                segment_sheets=segment_sheets,
+                blank_sheet=blank_sheet,
+                segment_index=index,
+                instruction=instruction,
+                seed_label=seed_label,
+            )
+            messages = cast(list[Message], [{"role": "user", "content": content}])
+            try:
+                response = await generate_text(
+                    messages=messages,
+                    schema=_SubtaskLabelingResult,
+                    provider_options={
+                        "google": {"safetySettings": GEMINI_BLOCK_NONE_SAFETY_SETTINGS}
+                    },
+                    temperature=temperature,
+                )
+            except RuntimeError as exc:
+                block_reason = _blocked_prompt_reason(exc)
+                if block_reason is None or on_blocked_prompt == "raise":
+                    raise
+                logger.warning(
+                    "subtask labeling provider blocked episode {} segment {}: {}",
+                    row.episode_id,
+                    index,
+                    block_reason,
+                )
+                label = seed_label
+            else:
+                parsed = response.object
+                if not isinstance(parsed, _SubtaskLabelingResult):
+                    raise TypeError(
+                        "subtask_labeling expected a structured response object"
+                    )
+                label = _normalize_label(parsed.subtask) or seed_label
+
+            labeled = dict(segment)
+            labeled["subtask"] = label
+            labeled_segments.append(labeled)


Sequential per-segment VLM calls block per-episode throughput

Each segment's await generate_text(...) runs serially inside a single episode's coroutine. For an episode with 20 segments at ~2 s/call that is 40 s of wall-clock latency per episode, whereas the max_concurrent_requests semaphore and max_in_flight budget are sized for much higher parallelism. All N calls could be submitted concurrently with asyncio.gather(*[_label_one(seg, idx) for idx, seg in enumerate(segments)]) — the semaphore in generate_text would still cap the total outstanding requests across the worker.

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

greptile-apps · 2026-07-01T15:09:26Z

+    text = prompt_template.format(
+        instruction=instruction,
+        segment_index=segment_index,
+        segment_count=len(segments),
+        start_sec=start_sec,
+        end_sec=end_sec,
+        seed_label=seed_label,
+    )


0-based segment_index in prompt yields unnatural "Target segment: 0 of N"

The model prompt shows segment_index as 0-based (e.g., "Target segment: 0 of 3"), which looks like an off-by-one to any reader (human or model) expecting 1-based labeling. Adjusting to 1-based makes the count self-consistent without changing any logic.

Suggested change

text = prompt_template.format(

instruction=instruction,

segment_index=segment_index,

segment_count=len(segments),

start_sec=start_sec,

end_sec=end_sec,

seed_label=seed_label,

)

text = prompt_template.format(

instruction=instruction,

segment_index=segment_index + 1,

segment_count=len(segments),

start_sec=start_sec,

end_sec=end_sec,

seed_label=seed_label,

)

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

greptile-apps · 2026-07-01T15:09:27Z

+_SUBTASK_LABELING_PROMPT_TEMPLATE = """Annotate one fixed segment from a longer video.
+
+Return only JSON:
+{{"subtask":"short descriptive subtask label"}}
+
+Inputs:
+- The first image is the previous fixed segment, if it exists; otherwise it is blank/context only.
+- The second image is the current target segment.
+- The third image is the next fixed segment, if it exists; otherwise it is blank/context only.
+- Each image is timestamped with absolute video time.
+
+Episode instruction:
+{instruction}
+
+Target segment:
+{segment_index} of {segment_count}
+
+Target time:
+{start_sec:.2f}s to {end_sec:.2f}s
+
+Rules:
+- Label only the current target segment.
+- Use previous/next images only to disambiguate what changed during the current segment.
+- Do not describe the previous or next segment.
+- The segment boundaries are fixed; do not split or merge.
+- Compare the beginning and end of the current segment and describe the visible manipulation or state change.
+- Use one concise imperative phrase.
+- Include the exact action and manipulated object.
+- Include source, destination, side, direction, final location, opened/closed/filled/cleaned state, or affected part when visible and central.
+- Do not mention timestamps, frame numbers, uncertainty, candidates, or invisible intent.
+"""
+
+_SUBTASK_SEED_LABELING_PROMPT_TEMPLATE = """Annotate one fixed segment from a longer video.
+
+Return only JSON:
+{{"subtask":"short descriptive subtask label"}}
+
+Inputs:
+- The first image is the previous fixed segment, if it exists; otherwise it is blank/context only.
+- The second image is the current target segment.
+- The third image is the next fixed segment, if it exists; otherwise it is blank/context only.
+- Each image is timestamped with absolute video time.
+
+Episode instruction:
+{instruction}
+
+Target segment:
+{segment_index} of {segment_count}
+
+Target time:
+{start_sec:.2f}s to {end_sec:.2f}s
+
+Original predicted label for this exact segment:
+{seed_label}
+
+Rules:
+- Label only the current target segment.
+- Use previous/next images only to disambiguate what changed during the current segment.
+- Treat the original predicted label as a strong prior, not as ground truth.
+- Verify and minimally correct the original label using the current target segment.
+- If the original label describes the same action and main object, keep it, only improving grammar or adding clearly visible essential details.
+- If it is too vague but directionally correct, make it more specific.
+- If it describes the previous/next segment, the wrong action, wrong object, wrong destination, or wrong state change, replace it.
+- Do not describe the previous or next segment.
+- Do not split or merge the fixed segment.
+- Do not introduce a new action unless it is clearly visible in the current target segment.
+- Do not make the label broader than the fixed segment.
+- Use one concise imperative phrase.
+- Include the exact action and manipulated object.
+- Include source, destination, side, direction, final location, opened/closed/filled/cleaned state, or affected part when visible and central.
+- Do not mention timestamps, frame numbers, uncertainty, candidates, or invisible intent.
+"""


Two prompt templates share ~40 identical lines

_SUBTASK_LABELING_PROMPT_TEMPLATE and _SUBTASK_SEED_LABELING_PROMPT_TEMPLATE differ only in the "Original predicted label" paragraph and its associated rules, yet both duplicate the full preamble, Inputs, Episode instruction, Target segment, and Target time blocks verbatim. Per the project's own guideline ("Avoid copy-paste duplication; extract shared logic"), extracting the shared header into a single constant and composing the seed section conditionally would eliminate drift risk when one template is updated and the other is not.

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: eb498553f4

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "Codex (@codex) review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "Codex (@codex) address that feedback".

chatgpt-codex-connector · 2026-07-01T15:10:38Z

+    sample_sec: float,
+    frame_width: int,
+) -> AsyncIterator[tuple[float, Image.Image]]:
+    target_timestamp = 0.0


Start sampling at the first decoded timestamp

Initializing target_timestamp to 0.0 breaks clipped videos whose decoded timestamps start later, which LeRobot rows can produce via VideoFile.from_timestamp_s. For a clip beginning at 120s, the first decoded frame satisfies the catch-up loop and emits every 0.5s timestamp from 0 to 120 using that same image, bloating the VLM request and burning incorrect timestamps into the contact sheets, so returned subtask boundaries are wrong. Seed the sampler from the clip/first-frame timestamp or yield actual frame timestamps instead of catching up from zero.

Useful? React with 👍 / 👎.

chatgpt-codex-connector · 2026-07-01T15:10:38Z

+        if timestamp + 1e-6 < targets[target_index][0]:
+            continue


Preserve the final decoded frame for backfill

When the next target time is after the last decoded frame—common for a segment ending at the clip's to_timestamp, because the final frame is one frame earlier—this branch skips all intervening frames without updating last_image. The EOF backfill then repeats the image from the previous target (or leaves the sheet blank), so the current-segment sheet omits the actual final state that the labeling prompt asks the model to compare. Update last_image before skipping frames that are still before the next target.

Useful? React with 👍 / 👎.

Hynek Kydlíček (hynky1999) added 2 commits June 29, 2026 01:12

split subtask labeling blocks

8f1a837

limit subtask annotation exports

4874af8

gemini-code-assist Bot reviewed Jun 28, 2026

View reviewed changes

Hynek Kydlíček (hynky1999) added 20 commits June 29, 2026 09:53

remove subtask json fallback parsing

7f33e91

merge contact sheet iterator

d184e83

remove unused subtask helpers

1642f5d

inline single contact sheet batch

469b169

align subtask contact sheets

0987a68

align subtask annotation renderer

928ebbd

restore subtask segmentation field

fc1f369

align subtask labeling renderer

7e5d6aa

optimize subtask labeling rendering

e9941e5

use subtask seeds for relabeling

5625f6b

update subtask labeling render defaults

ab7f196

write relabeled subtasks under subtask

87768ab

request subtask field from labeling

188fd00

remove segmentation quality option

a686034

remove subtask render knobs

1ac3e14

hardcode subtask labeling quality

f1d3184

hardcode subtask segmentation sampling

bf1a79d

link subtask annotation blog

ee3c294

update subtask annotation example

05ebf8b

polish subtask annotation docs

eb49855

Hynek Kydlíček (hynky1999) marked this pull request as ready for review July 1, 2026 15:04

greptile-apps Bot reviewed Jul 1, 2026

View reviewed changes

chatgpt-codex-connector Bot reviewed Jul 1, 2026

View reviewed changes

fix timestamp badge visibility test

59e36b7

Hynek Kydlíček (hynky1999) merged commit 6f9783b into main Jul 1, 2026
6 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[codex] split subtask labeling blocks#226

[codex] split subtask labeling blocks#226
Hynek Kydlíček (hynky1999) merged 23 commits into
mainfrom
codex/subtask-labeling

Hynek Kydlíček (hynky1999) commented Jun 28, 2026 •

edited by greptile-apps Bot

Loading

Uh oh!

gemini-code-assist Bot left a comment

Uh oh!

gemini-code-assist Bot Jun 28, 2026

Uh oh!

gemini-code-assist Bot Jun 28, 2026

Uh oh!

gemini-code-assist Bot Jun 28, 2026

Uh oh!

greptile-apps Bot Jul 1, 2026

Uh oh!

greptile-apps Bot Jul 1, 2026

Uh oh!

greptile-apps Bot Jul 1, 2026

Uh oh!

chatgpt-codex-connector Bot left a comment

Uh oh!

chatgpt-codex-connector Bot Jul 1, 2026

Uh oh!

chatgpt-codex-connector Bot Jul 1, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Uh oh!

Conversation

Hynek Kydlíček (hynky1999) commented Jun 28, 2026 • edited by greptile-apps Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Validation

Notes

Greptile Summary

Confidence Score: 5/5

Important Files Changed

Sequence Diagram

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist Bot Jun 28, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot Jun 28, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot Jun 28, 2026

Choose a reason for hiding this comment

Uh oh!

greptile-apps Bot Jul 1, 2026

Choose a reason for hiding this comment

Uh oh!

greptile-apps Bot Jul 1, 2026

Choose a reason for hiding this comment

Uh oh!

greptile-apps Bot Jul 1, 2026

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot Jul 1, 2026

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector Bot Jul 1, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Hynek Kydlíček (hynky1999) commented Jun 28, 2026 •

edited by greptile-apps Bot

Loading