Skip to content

Adds async retarget to IsaacTeleopDevice (on by default)#5235

Open
hougantc-nvda wants to merge 6 commits intoisaac-sim:developfrom
hougantc-nvda:hougantc/async-retarget
Open

Adds async retarget to IsaacTeleopDevice (on by default)#5235
hougantc-nvda wants to merge 6 commits intoisaac-sim:developfrom
hougantc-nvda:hougantc/async-retarget

Conversation

@hougantc-nvda
Copy link
Copy Markdown
Contributor

@hougantc-nvda hougantc-nvda commented Apr 10, 2026

Description

Retargeting now runs asynchronously by default so work overlaps with env.step() instead of blocking the render path. This PR also hardens async scheduling by combining a consumption gate (strict 1:1 retarget-to-advance() cadence) with paced timing, and then refactors pacing into a pluggable estimator interface (EMA remains the default).

To disable async and use synchronous retargeting:

device.set_async(False)

Key changes:
Added AsyncRetargetLoop and wired IsaacTeleopDevice to use async retargeting by default.
Added consumption-gated pacing to prevent double-retargets and keep inputs fresh.
Refactored pacing into pluggable TimingEstimator / TimingEstimatorCfg with EmaTimingEstimator as default.

Please include a summary of the change and which issue is fixed. Please also include relevant motivation and context.
List any dependencies that are required for this change.

Fixes # (issue)

Type of change

  • New feature (non-breaking change which adds functionality)

Screenshots

Please attach before and after screenshots of the change if applicable.

Checklist

  • I have read and understood the contribution guidelines
  • I have run the pre-commit checks with ./isaaclab.sh --format
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • I have updated the changelog and the corresponding version in the extension's config/extension.toml file
  • I have added my name to the CONTRIBUTORS.md or my name already exists there

@hougantc-nvda hougantc-nvda requested a review from rwiltz April 10, 2026 19:47
@github-actions github-actions Bot added the isaac-lab Related to Isaac Lab team label Apr 10, 2026
@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented Apr 10, 2026

Greptile Summary

This PR adds an AsyncRetargetLoop class that runs retargeting in a background thread with EMA-paced scheduling, letting retarget work overlap with env.step() instead of blocking the render path. IsaacTeleopDevice enables async mode by default and gains a set_async() toggle; TeleopSessionLifecycle gets add_on_stop_callback() so the device can cleanly shut down the background thread before the session is torn down.

  • P1 — consume() split-lock violates stated TOCTOU contract: the docstring says Condition bundles the freshness check and the _latest read under one lock acquisition, but the implementation releases the lock between the blocking wait and the state read. In the gap, the background thread can overwrite _latest with None, causing a spurious session teardown. Merging the EMA update into the same with self._cond: block as the wait and state read would fix this.
  • P2 — stale result state on start() restart: _gen, _consumed_gen, and _latest are not reset in start(), so a restart after a non-drained stop can return stale cached data as fresh=True on the first consume().

Confidence Score: 4/5

Safe to merge after addressing the split-lock TOCTOU in consume() — the race window is narrow but contradicts the stated thread-safety contract and can cause spurious session teardowns.

One P1 finding (split-lock in consume()) directly contradicts the documented Condition-based TOCTOU-avoidance design and introduces a real (if low-probability) spurious teardown race. The P2 stale-state-on-restart issue is minor but worth a one-liner fix. All other code is well-structured, tests are comprehensive, changelog and version bump are correct.

source/isaaclab_teleop/isaaclab_teleop/async_retarget_loop.py — consume() lock split; source/isaaclab_teleop/isaaclab_teleop/isaac_teleop_device.py — button polling gap in non-blocking mode.

Important Files Changed

Filename Overview
source/isaaclab_teleop/isaaclab_teleop/async_retarget_loop.py New EMA-paced background retarget loop; consume() splits _cond across two lock acquisitions, contradicting the TOCTOU-avoidance contract stated in the docstring, and allows a race where the background thread can overwrite _latest with None between the wait and the state read.
source/isaaclab_teleop/isaaclab_teleop/isaac_teleop_device.py Adds async advance path with EMA pacing; _poll_buttons() is skipped in non-blocking mode when no fresh result is available, potentially missing button-press edges even when a cached action is present.
source/isaaclab_teleop/isaaclab_teleop/session_lifecycle.py Adds add_on_stop_callback() to allow the device to stop background threads before session teardown; straightforward and correct.
source/isaaclab_teleop/test/test_async_retarget_loop.py Comprehensive tests covering blocking/non-blocking consume, EMA convergence, error propagation (transient and persistent), input isolation, and lifecycle; good coverage with make_loop fixture ensuring cleanup.
source/isaaclab_teleop/docs/CHANGELOG.rst New 0.3.5 heading added with correct Added/Changed entries; version matches extension.toml and follows RST formatting guidelines.
source/isaaclab_teleop/config/extension.toml Version bumped to 0.3.5, matching the new changelog entry.

Sequence Diagram

sequenceDiagram
    participant Main as Main Thread
    participant ARL as AsyncRetargetLoop
    participant BG as Background Thread
    participant Session as TeleopSessionLifecycle

    Main->>ARL: update_inputs(anchor_matrix, target_T_world)
    Main->>ARL: consume(block=True)
    activate ARL
    ARL-->>BG: wait on _cond until fresh result

    activate BG
    BG->>BG: _pace() EMA sleep until predicted deadline
    BG->>Session: step_fn(anchor_matrix, target_T_world)
    Session-->>BG: action tensor
    BG->>BG: action.clone() plus CUDA sync
    BG->>ARL: set _latest and notify_all()
    deactivate BG

    ARL-->>Main: returns (action, fresh=True)
    deactivate ARL

    Main->>Main: _poll_buttons() reads last_right_controller
    Main->>Main: update _cached_action
    Main-->>Main: return action
Loading

Reviews (1): Last reviewed commit: "Update version and changelog" | Re-trigger Greptile

Comment thread source/isaaclab_teleop/isaaclab_teleop/async_retarget_loop.py Outdated
Comment thread source/isaaclab_teleop/isaaclab_teleop/async_retarget_loop.py
Comment thread source/isaaclab_teleop/isaaclab_teleop/isaac_teleop_device.py Outdated
kellyguo11

This comment was marked as outdated.

isaaclab-review-bot[bot]

This comment was marked as outdated.

@hougantc-nvda hougantc-nvda force-pushed the hougantc/async-retarget branch from 8fcd6e8 to 06e4a47 Compare April 13, 2026 15:58
Comment thread source/isaaclab_teleop/isaaclab_teleop/async_retarget_loop.py Outdated
Comment thread source/isaaclab_teleop/isaaclab_teleop/isaac_teleop_device.py
@hougantc-nvda hougantc-nvda force-pushed the hougantc/async-retarget branch 2 times, most recently from 60bb8e0 to be0e9ca Compare April 15, 2026 16:25
@hougantc-nvda hougantc-nvda requested a review from nvddr April 15, 2026 21:22
@hougantc-nvda hougantc-nvda force-pushed the hougantc/async-retarget branch from eeb9e77 to d84a74e Compare April 17, 2026 17:59
Retargeting now runs in a background thread by default so it overlaps
with env.step() instead of blocking the render path.  An EMA-based
pacer predicts when the next advance() call will happen and starts
each retarget just early enough to have a fresh result ready in time.

To disable and revert to synchronous retargeting:

    device.set_async(False)

Key changes:
- New AsyncRetargetLoop class with EMA-paced scheduling, configurable
  blocking/non-blocking consume, and automatic retry with a cap on
  consecutive failures.
- IsaacTeleopDevice enables async mode in __init__.  set_async() can
  toggle between sync and async at runtime; the first async advance()
  runs synchronously to seed the cache, then starts the background
  loop.
- TeleopSessionLifecycle gains add_on_stop_callback() so the device can
  cleanly shut down the background thread before the session is torn
  down.
…o alleviate stuttering. The gate enforces a strict 1:1 retarget-to-advance ratio (preventing double retargets), while the EMA pacer delays input reading until just before the predicted deadline to maximize input freshness.
…fault EMA implementation in AsyncRetargetLoop.

Improve async readability in IsaacTeleopDevice and make async restart detection compare timing config values rather than object identity.
@hougantc-nvda hougantc-nvda force-pushed the hougantc/async-retarget branch from d84a74e to d611f3b Compare April 28, 2026 20:02
When _seed_async_loop called _advance_sync, both _poll_buttons and
_dispatch_control_callbacks fired once inside _advance_sync and again
at the end of _advance_async on the seed frame.  This double-dispatched
START/STOP/RESET callbacks and corrupted the right-A rising-edge state.

Call _session_lifecycle.step directly from _seed_async_loop so the
poll/dispatch happens exactly once per advance, matching the consume
path.

Made-with: Cursor
Copy link
Copy Markdown

@isaaclab-review-bot isaaclab-review-bot Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤖 Isaac Lab Review Bot

Summary

All previous concerns have been addressed in this revision. The torch.Tensor cloning gap in update_inputs() is now fixed (line 322-325), and EmaTimingEstimatorCfg.__post_init__ now validates ema_alpha is in (0, 1] (lines 197-198).

Implementation Verdict

Ship it

The code is clean, the threading model is correct, and test coverage is solid. Nice work!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

isaac-lab Related to Isaac Lab team

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants