
fix(security): verify checksum before executing NodeSource installer as root (#576) #1814

Merged
ericksoa merged 3 commits into NVIDIA:main from ColinM-sys:fix/576-nodesource-integrity-check
Apr 16, 2026

Conversation

@ColinM-sys (Contributor) commented Apr 13, 2026

Summary

Closes #576.

The NodeSource installer in scripts/brev-launchable-ci-cpu.sh is downloaded via curl | sudo bash with no integrity verification (CWE-494). This executes arbitrary remote code as root.

Fix: follows the exact SHA-256 verification pattern already used for nvm in scripts/install.sh:634-657:

  • Download NodeSource script to temp file instead of piping to bash
  • Compute SHA-256 and verify against pinned hash
  • Only execute with sudo -E bash if hash matches
  • Refuse to execute and exit non-zero on mismatch, printing expected vs actual hash
  • Clean up temp file in all code paths
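The steps above can be sketched as a minimal shell fragment. This is a self-contained illustration, not the code from scripts/brev-launchable-ci-cpu.sh: the variable names are illustrative, and the "installer" is a local stand-in payload so the sketch runs without network access or root. In the real script, the pinned hash is that of the NodeSource setup_22.x script, the payload is fetched with curl, and the verified file is executed with `sudo -E bash`.

```shell
#!/bin/sh
set -eu

# Pinned hash of the expected payload. Here the payload is just the string
# "hello\n" so the sketch is self-contained; the real script pins the hash
# of the downloaded NodeSource setup script instead.
PINNED_SHA256="5891b5b522d5df086d0ff0b110fbd9d21bb4fc7163af34d08286a2e846f6be03"

tmp="$(mktemp)"
trap 'rm -f "$tmp"' EXIT            # temp file is removed on every exit path

# Stand-in for: curl -fsSL https://deb.nodesource.com/setup_22.x -o "$tmp"
printf 'hello\n' > "$tmp"

actual="$(sha256sum "$tmp" | awk '{print $1}')"
if [ "$actual" != "$PINNED_SHA256" ]; then
  # Mismatch path: print both hashes and exit non-zero; trap still cleans up.
  echo "Checksum mismatch: expected $PINNED_SHA256, got $actual" >&2
  exit 1
fi

result="verified"                   # stand-in for: sudo -E bash "$tmp"
echo "$result"
```

The `trap ... EXIT` is what gives cleanup "in all code paths": it fires on success, on the mismatch `exit 1`, and on any `set -e` failure in between.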

Note: The issue references scripts/brev-setup.sh and scripts/install.sh, but in the current codebase the only vulnerable occurrence is in scripts/brev-launchable-ci-cpu.sh:154. scripts/install.sh uses nvm (not NodeSource) and already has integrity verification.

Test plan

  • Verified pinned SHA-256 matches current NodeSource setup_22.x script
  • Verified hash mismatch path prints both hashes and exits non-zero
  • Verified temp file cleanup in both success and failure paths
  • Pattern matches existing nvm verification in scripts/install.sh

🤖 Generated with Claude Code

Summary by CodeRabbit

  • Chores
    • Improved Node.js 22 installation security by downloading the installer, verifying its SHA-256 integrity before execution, and aborting on mismatch.
  • Style
    • Minor script formatting and control-flow whitespace adjustments to improve readability without changing behavior.

Signed-off-by: ColinM-sys cmcdonough@50words.com

coderabbitai (Bot) commented Apr 13, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: 183f4d56-677d-4fee-b0e1-bb311a3599b4

📥 Commits

Reviewing files that changed from the base of the PR and between 3b09542 and 76db304.

📒 Files selected for processing (1)
  • scripts/brev-launchable-ci-cpu.sh
🚧 Files skipped from review as they are similar to previous changes (1)
  • scripts/brev-launchable-ci-cpu.sh

📝 Walkthrough

Walkthrough

Replaces piping NodeSource's remote installer into bash with downloading the installer to a temp file, verifying its SHA-256 against a pinned NODESOURCE_SHA256, hard-failing on mismatch, executing the verified installer, and cleaning up. Also reformats some loop/case/conditional continuations without changing control flow.

Changes

Cohort / File(s) | Summary

NodeSource installer & verification — scripts/brev-launchable-ci-cpu.sh
  Replaces `curl ...

Shell formatting & control-flow tweaks — scripts/brev-launchable-ci-cpu.sh
  Adjusted wait_for_apt_lock loop condition formatting, reformatted several case blocks and `sg docker ...

Sequence Diagram(s)

sequenceDiagram
    participant Script as "Launch script\n(bash)"
    participant Remote as "deb.nodesource.com\n(HTTP(S))"
    participant Temp as "Temp file\n(mktemp)"
    participant SHA as "sha256sum / shasum"
    participant Root as "sudo bash\n(execute)"

    Script->>Remote: curl -fsSL https://deb.nodesource.com/setup_22.x -o temp
    Remote-->>Script: installer script bytes
    Script->>SHA: compute hash of temp
    SHA-->>Script: actual_hash
    alt hashes match
        Script->>Root: sudo -E bash temp
        Root-->>Script: installer executed
        Script->>Temp: rm -f temp
    else mismatch
        Script->>Temp: rm -f temp
        Script-->>Script: exit/fail with error
    end

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐰 I fetched the script to a temp bed,
I hashed its bits before it fled.
No blind pipe now, we check the sign,
I nibble bugs and keep things fine. ✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage — ⚠️ Warning: docstring coverage is 42.50%, below the required 80.00% threshold. Resolution: write docstrings for the functions missing them.
✅ Passed checks (4 passed)
  • Description Check — ✅ Passed: check skipped; CodeRabbit’s high-level summary is enabled.
  • Title check — ✅ Passed: the title accurately describes the main change: adding SHA-256 checksum verification before executing the NodeSource installer as root, which directly addresses issue #576.
  • Linked Issues check — ✅ Passed: the PR implements all coding requirements from issue #576: downloads the installer to a temp file, computes SHA-256 using available tools, compares it to the pinned hash, fails with diagnostic output on mismatch, and cleans up temp files in all paths.
  • Out of Scope Changes check — ✅ Passed: the PR contains only the NodeSource checksum verification fix and formatting adjustments; no unrelated changes beyond the scope of issue #576.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Comment @coderabbitai help to get the list of available commands and usage tips.

@wscurran added labels on Apr 13, 2026: bug (Something isn't working), security (Something isn't secure), priority: high (Important issue that should be resolved in the next release), fix
@cv added the v0.0.16 (Release target) label Apr 14, 2026
@ColinM-sys force-pushed the fix/576-nodesource-integrity-check branch from d36de93 to ca4466f on April 14, 2026 at 23:26
coderabbitai (Bot) left a comment


Actionable comments posted: 16

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (3)
scripts/docs-to-skills.py (1)

984-989: ⚠️ Potential issue | 🟡 Minor

Avoid orphan blank lines when deduplicating prerequisites.

On Line 984–985, a blank line is appended before deduplication. If the current bullet is a duplicate, the blank line remains without a corresponding item.

Suggested fix
-                    if stripped.startswith("- "):
-                        if prereq_items and not prereq_items[-1].startswith("- "):
-                            prereq_items.append("")
-                        norm = stripped.lower().strip("- .")
-                        if norm not in seen_prereqs:
-                            seen_prereqs.add(norm)
-                            prereq_items.append(stripped)
+                    if stripped.startswith("- "):
+                        norm = stripped.lower().strip("- .")
+                        if norm not in seen_prereqs:
+                            if prereq_items and not prereq_items[-1].startswith("- "):
+                                prereq_items.append("")
+                            seen_prereqs.add(norm)
+                            prereq_items.append(stripped)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/docs-to-skills.py` around lines 984 - 989, The code is appending a
blank separator to prereq_items before checking for duplicates, leaving orphan
blanks when the current bullet is a duplicate; in the block handling
prerequisites (variables prereq_items, seen_prereqs, norm, stripped) change the
order so you compute norm and check if norm is not in seen_prereqs (and add it
to seen_prereqs) before you append any blank separator or the stripped line, or
conditionally append the blank only when you will also append the new item —
this ensures no blank lines are added for duplicate bullets.
src/lib/onboard.ts (2)

3061-3080: ⚠️ Potential issue | 🟡 Minor

Env-backed provider keys are still ignored in the interactive branch.

This block copies NEMOCLAW_PROVIDER_KEY into process.env[credentialEnv], but then immediately calls ensureNamedCredential(...), which only checks saved credentials before prompting. Interactive runs therefore re-prompt even when the alias or the provider-specific env var is already present.

Suggested fix
           } else {
-            await ensureNamedCredential(
-              credentialEnv,
-              remoteConfig.label + " API key",
-              remoteConfig.helpUrl,
-            );
+            const existingCredential = normalizeCredentialValue(process.env[credentialEnv]);
+            if (existingCredential) {
+              process.env[credentialEnv] = existingCredential;
+            } else {
+              await ensureNamedCredential(
+                credentialEnv,
+                remoteConfig.label + " API key",
+                remoteConfig.helpUrl,
+              );
+            }
           }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/lib/onboard.ts` around lines 3061 - 3080, The interactive branch is
re-prompting despite copying NEMOCLAW_PROVIDER_KEY into
process.env[credentialEnv]; modify the logic so that after setting
process.env[credentialEnv] (via _providerKeyHint) you check if
process.env[credentialEnv] is already populated and skip calling
ensureNamedCredential when it is. In short: in the block guarded by
isNonInteractive() === false, add a conditional that bypasses
ensureNamedCredential if process.env[credentialEnv] exists (using the same
credentialEnv/_providerKeyHint variables and remoteConfig for context) so
existing env-backed keys are honored in interactive runs.

4521-4543: ⚠️ Potential issue | 🟠 Major

Preserve preset access outside the interactive picker.

This change adds per-preset access modes, but the non-interactive and resume paths still collapse tier defaults to plain preset names and then call applyPreset(...) without { access }. A resumed onboard or CI run can therefore widen a restrictive tier back to the callee's default access, which is a security regression for the new tier model.

Also applies to: 4581-4585, 4621-4629

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/lib/onboard.ts` around lines 4521 - 4543, The resume and non-interactive
flows collapse presets to plain names and call policies.applyPreset(...) without
the per-preset access mode, which loses restrictive access settings; update the
resume loop (where `chosen` is processed and `policies.applyPreset(sandboxName,
name)` is called) to carry the preset object or access flag and call
policies.applyPreset(sandboxName, presetName, { access }) (or pass the preset
object) instead of only the name; likewise change the tier seeding logic that
builds `suggestions` from tiers.resolveTierPresets(tierName).map(p => p.name) so
it preserves each preset's access (e.g., keep the objects or map to { name,
access }) and ensure any later call sites that iterate `suggestions` use the
access when invoking policies.applyPreset; also make the same fix in the other
instances noted (around lines referenced) so resumed/CI and non-interactive
installs retain the original preset access.
🧹 Nitpick comments (7)
src/lib/debug.ts (1)

383-383: Keep the artifact name aligned with the probed port.

When DASHBOARD_PORT is overridden, this still writes lsof-18789.txt even though the command inspects a different port. That makes the debug bundle misleading precisely when the override is what you're debugging.

Suggested fix
-  collect(collectDir, "lsof-18789", "lsof", ["-i", `:${DASHBOARD_PORT}`]);
+  collect(collectDir, `lsof-${DASHBOARD_PORT}`, "lsof", ["-i", `:${DASHBOARD_PORT}`]);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/lib/debug.ts` at line 383, The artifact name "lsof-18789" is hard-coded
and doesn't reflect overrides of DASHBOARD_PORT; update the call to collect (the
invocation using collectDir, "lsof-18789", "lsof", ["-i", `:${DASHBOARD_PORT}`])
to generate the artifact name dynamically using the DASHBOARD_PORT variable
(e.g., "lsof-${DASHBOARD_PORT}") so the collected file matches the probed port;
adjust any surrounding code that references the static name if present.
scripts/debug.sh (1)

328-331: Consider warning when falling back to default dashboard port.

Right now invalid NEMOCLAW_DASHBOARD_PORT values are silently coerced to 18789, which can make diagnostics misleading.

Suggested patch
 _dp="$(printf '%s' "${NEMOCLAW_DASHBOARD_PORT:-18789}" | sed 's/^[[:space:]]*//;s/[[:space:]]*$//')"
 case "$_dp" in *[!0-9]* | '') _dp=18789 ;; esac
-[ "$_dp" -ge 1024 ] && [ "$_dp" -le 65535 ] 2>/dev/null || _dp=18789
+if ! ([ "$_dp" -ge 1024 ] && [ "$_dp" -le 65535 ]); then
+  warn "Invalid NEMOCLAW_DASHBOARD_PORT='${NEMOCLAW_DASHBOARD_PORT:-}' — defaulting to 18789"
+  _dp=18789
+fi
 collect "lsof-dashboard" lsof -i ":${_dp}"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/debug.sh` around lines 328 - 331, The script silently coerces invalid
NEMOCLAW_DASHBOARD_PORT into the default 18789 (variable _dp) which hides
misconfiguration; update the logic around _dp validation in scripts/debug.sh so
that when the env value is empty or non-numeric or out of range you emit a clear
warning (e.g., to stderr) stating the provided NEMOCLAW_DASHBOARD_PORT and that
it is being reset to 18789, then proceed with setting _dp and calling collect
"lsof-dashboard" lsof -i ":${_dp}". Ensure the warning is emitted before the
fallback assignment so it occurs for all invalid cases handled by the existing
checks.
scripts/nemoclaw-start.sh (1)

158-176: Optional: deduplicate the default port and error string in this block.

This would reduce drift risk if the allowed range/default ever changes again.

Possible cleanup
+_DASHBOARD_PORT_DEFAULT=18789
+_PORT_ERR="must be an integer between 1024 and 65535"
 _DASHBOARD_PORT_RAW="${NEMOCLAW_DASHBOARD_PORT:-}"
 if [ -z "$_DASHBOARD_PORT_RAW" ]; then
-  _DASHBOARD_PORT=18789
+  _DASHBOARD_PORT=$_DASHBOARD_PORT_DEFAULT
 else
   _DASHBOARD_PORT="$(printf '%s' "$_DASHBOARD_PORT_RAW" | sed 's/^[[:space:]]*//;s/[[:space:]]*$//')"
   case "$_DASHBOARD_PORT" in
     *[!0-9]* | '')
-      echo "[SECURITY] Invalid NEMOCLAW_DASHBOARD_PORT='${NEMOCLAW_DASHBOARD_PORT}' — must be an integer between 1024 and 65535" >&2
+      echo "[SECURITY] Invalid NEMOCLAW_DASHBOARD_PORT='${NEMOCLAW_DASHBOARD_PORT}' — ${_PORT_ERR}" >&2
       exit 1
       ;;
   esac
   if [ "$_DASHBOARD_PORT" -lt 1024 ] || [ "$_DASHBOARD_PORT" -gt 65535 ]; then
-    echo "[SECURITY] Invalid NEMOCLAW_DASHBOARD_PORT='${NEMOCLAW_DASHBOARD_PORT}' — must be an integer between 1024 and 65535" >&2
+    echo "[SECURITY] Invalid NEMOCLAW_DASHBOARD_PORT='${NEMOCLAW_DASHBOARD_PORT}' — ${_PORT_ERR}" >&2
     exit 1
   fi
 fi
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/nemoclaw-start.sh` around lines 158 - 176, Introduce a single
canonical default and error message and reuse them instead of repeating magic
values: add a constant like _DASHBOARD_PORT_DEFAULT=18789 and an ERROR_MSG
variable containing the "[SECURITY] Invalid NEMOCLAW_DASHBOARD_PORT='...'" text,
then replace the literal 18789 assignment to _DASHBOARD_PORT and both echo lines
that reference the hardcoded message to use those variables; update CHAT_UI_URL
and PUBLIC_PORT to derive from _DASHBOARD_PORT as before and ensure you still
trim/validate NEMOCLAW_DASHBOARD_PORT via _DASHBOARD_PORT_RAW/_DASHBOARD_PORT
and the numeric range checks.
scripts/dev-tier-selector.js (1)

14-15: Update the referenced test filename in the header comment.

The comment points to test/policy-tiers-onboard.test.js; current test naming in this repo appears to use .test.ts. Aligning the path avoids confusion.

Suggested patch
-// of this flow see test/policy-tiers-onboard.test.js.
+// of this flow see test/policy-tiers-onboard.test.ts.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/dev-tier-selector.js` around lines 14 - 15, The header comment in
scripts/dev-tier-selector.js references the outdated test file name
"test/policy-tiers-onboard.test.js"; update that comment to reference the
correct current test filename "test/policy-tiers-onboard.test.ts" so the
documentation matches the repository naming convention.
test/onboard-selection.test.ts (1)

2197-2264: Validate secret: true in the pasted-key retry test.

The test name says the flow re-prompts securely, but it only checks prompt text. Capture prompt options and assert secret input is enforced.

Suggested patch
-const answers = ["1", "", "nvapi-fake-key-value", "nvapi-good", ""];
-const messages = [];
+const answers = ["1", "", "nvapi-fake-key-value", "nvapi-good", ""];
+const messages = [];
+const prompts = [];

-credentials.prompt = async (message) => {
+credentials.prompt = async (message, opts = {}) => {
   messages.push(message);
+  prompts.push({ message, secret: opts.secret === true });
   return answers.shift() || "";
 };
@@
-    originalLog(JSON.stringify({ result, messages, lines, key: process.env.NVIDIA_API_KEY }));
+    originalLog(JSON.stringify({ result, messages, prompts, lines, key: process.env.NVIDIA_API_KEY }));
@@
     assert.ok(payload.messages.some((message) => CREDENTIAL_RETRY_PROMPT_RE.test(message)));
     assert.ok(payload.messages.some((message) => /NVIDIA Endpoints API key: /.test(message)));
+    const retryPrompt = payload.prompts.find((entry) => CREDENTIAL_RETRY_PROMPT_RE.test(entry.message));
+    assert.equal(retryPrompt?.secret, true);
+    const apiKeyPrompt = payload.prompts.find((entry) => /NVIDIA Endpoints API key: /.test(entry.message));
+    assert.equal(apiKeyPrompt?.secret, true);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/onboard-selection.test.ts` around lines 2197 - 2264, The test currently
only records prompt text via credentials.prompt and must also capture the prompt
options to assert secret input is enforced; change the mocked credentials.prompt
in the test around setupNim to accept both (message, options) and push an object
like {message, options} into messages (instead of just the string), update uses
of messages in assertions to inspect message.message and message.options, and
add an assertion that the entry matching CREDENTIAL_RETRY_PROMPT_RE has
options.secret === true (retain existing checks for the plaintext prompt text
and other assertions such as setupNim, CREDENTIAL_RETRY_PROMPT_RE, and NVIDIA
prompt matching).
scripts/lib/runtime.sh (1)

255-264: Consider extracting port validation to avoid duplication.

Port validation is performed identically in both get_local_provider_base_url and check_local_provider_health. If these functions are frequently called together, the redundant validation adds overhead.

♻️ Optional: Extract validated ports to a shared helper

If both functions are always called in the same context, consider a helper that validates once:

+_get_validated_ports() {
+  local vllm_port="${NEMOCLAW_VLLM_PORT:-8000}"
+  local ollama_port="${NEMOCLAW_OLLAMA_PORT:-11434}"
+  _validate_port NEMOCLAW_VLLM_PORT "$vllm_port" || return 1
+  _validate_port NEMOCLAW_OLLAMA_PORT "$ollama_port" || return 1
+  printf '%s %s\n' "$vllm_port" "$ollama_port"
+}

However, if these functions are called independently in different scripts, the current approach is safer.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/lib/runtime.sh` around lines 255 - 264, The duplicated port
validation in get_local_provider_base_url and check_local_provider_health should
be extracted into a single helper: create a function (e.g.
_validate_local_provider_ports) that reads
NEMOCLAW_VLLM_PORT/NEMOCLAW_OLLAMA_PORT, calls _validate_port for each and
returns the validated port values (or non-zero on failure); then replace the
inline validation in get_local_provider_base_url and check_local_provider_health
with calls to this helper and use the returned ports (vllm_port/ollama_port) to
build the base URL or perform health checks, keeping original behavior and error
returns.
test/e2e/test-messaging-providers.sh (1)

102-121: Function name sandbox_exec_stdin is misleading.

The function name implies the command is passed via stdin, but the command is actually passed as a positional argument to SSH (line 116). Only the callers at lines 337 and 374 actually pipe data into the command via stdin (using printf '%s' "$TOKEN" | sandbox_exec_stdin "grep ...").

Consider renaming to sandbox_exec_quiet or similar to reflect that it suppresses stderr and avoids exposing args in the outer process list, rather than implying stdin-based command passing.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/e2e/test-messaging-providers.sh` around lines 102 - 121, Rename the
misleading function sandbox_exec_stdin to a name that reflects its behavior
(e.g., sandbox_exec_quiet); update the function definition (previously
sandbox_exec_stdin) and all call sites to the new name, including the callers
that actually pipe stdin (the uses like "printf '%s' \"$TOKEN\" |
sandbox_exec_stdin 'grep ...'") so those pipelines still work, and update the
function comment to explain it suppresses stderr, hides args from the outer
process list, and still accepts a command as a positional argument. Ensure no
other behavior or arguments change.
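The distinction the comment draws can be sketched as follows. This is an illustrative stand-in, not the test harness's code: the function wraps a local shell where the real helper wraps ssh, and the function name follows the suggested rename. The point is that the command arrives as a positional argument while the secret travels over the pipe, so it never appears in the outer process's argument vector.

```shell
#!/bin/sh
set -eu

# Runs a command quietly; the command is a positional argument, stderr is
# suppressed. Stand-in for: ssh "$SANDBOX_HOST" "$1" 2>/dev/null
sandbox_exec_quiet() {
  sh -c "$1" 2>/dev/null
}

TOKEN="s3cret"
# The secret reaches grep via stdin, never via argv of the outer process:
matches="$(printf '%s' "$TOKEN" | sandbox_exec_quiet 'grep -c s3cret')"
echo "$matches"
```

Only the pipeline callers actually exercise the stdin path; other call sites just pass a command string, which is why the original name read as misleading.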
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In @.agents/skills/nemoclaw-user-workspace/SKILL.md:
- Line 3: Update the description front-matter entry that currently reads "Hows
to back up and restore..." to "How to back up and restore..." in the source
documentation that generates SKILL.md (the description key in the generated
file); edit the source docs entry under the docs source so the regenerated
.agents/skills/nemoclaw-user-workspace/SKILL.md contains the corrected wording.
- Line 68: The doc references the non-existent subcommand "openshell sandbox
shell"; update the generated skill doc and source documentation to use the
correct subcommand "openshell sandbox connect" instead—specifically edit
SKILL.md (the skill doc line shown) and the workspace documentation entry (the
workspace-files.md document) to replace "openshell sandbox shell" with
"openshell sandbox connect", and scan for any remaining occurrences of
"openshell sandbox shell" to ensure consistency with the codebase (e.g.,
matching the usage found in src/lib/deploy.ts and test scripts).

In `@agents/hermes/plugin/__init__.py`:
- Around line 217-220: The call to _reload_skills() during session startup
currently ignores failures; update the startup flow that calls _reload_skills()
to handle errors explicitly by checking its return/exception: if
_reload_skills() returns a failure flag or raises, log a clear error via the
session/process logger (including error details), and abort or mark startup as
failed (e.g., raise/return a non-success status) so missing skills aren't
silent; modify the caller around the _reload_skills() invocation to perform this
check and take the appropriate failure action.
- Around line 125-133: The catch-all except in the skill reload path (inside the
try that imports agent.skill_commands and calls sc.scan_skill_commands) swallows
errors; change the broad except Exception to capture the exception as a variable
and log full details (e.g., use logger.exception or process_logger.error with
the exception) before returning None so root causes (API/runtime/metadata
issues) are visible; then update the startup hook that calls _reload_skills() to
check its return value (from _reload_skills or scan_skill_commands) and if None
log a clear error and fail fast or raise so session initialization does not
silently continue with missing skills—refer to symbols sc._skill_commands,
sc.scan_skill_commands, _reload_skills and the startup hook call.

In `@docs/network-policy/customize-network-policy.md`:
- Around line 35-39: Replace the GitHub-style blockquote that starts with
"[!IMPORTANT]" and the following three lines with a MyST admonition using the
appropriate level (e.g., use ::: warning ... :::) so the callout is rendered
correctly; locate the block containing the exact text "Make static policy edits
on the host, not inside the sandbox." and wrap that content inside a MyST
admonition (::: warning) with the same message and closing ::: to comply with
docs/** admonition conventions.
- Around line 98-100: Update the documented examples for the openshell policy
set command to match the canonical command builder and tests by inserting the
--wait flag between the policy file and the sandbox name (i.e., change
occurrences of "openshell policy set --policy <policy-file> <sandbox-name>" to
include "--wait" before the sandbox name); ensure both documented examples are
updated so they exactly mirror the behavior expected by the command builder and
test suite.

In `@nemoclaw/src/blueprint/runner.test.ts`:
- Around line 433-483: The test "does not leak parent secrets into subprocess
env" depends on ambient PATH/HOME; make it deterministic by saving
prevPath/prevHome, set process.env.PATH and process.env.HOME to explicit values
(e.g. "/usr/bin" and "/home/test") before calling actionApply, assert
subEnv.PATH and subEnv.HOME equal those exact values, and restore
prevPath/prevHome in the finally block; update references around actionApply,
mockExeca providerCall and subEnv to use these explicit values.

In `@scripts/docs-to-skills.py`:
- Around line 450-451: The code strips URL fragments into path_no_frag from
raw_path but never reattaches them when constructing fallback/rewrite links,
dropping anchors; change the logic around raw_path/path_no_frag to capture the
fragment (e.g., split once into base_path and frag = raw_path.partition("#")[2]
if any), use base_path/path_no_frag for extension checks and resolution
(functions referenced as path_no_frag and raw_path in this diff), and when
building the rewritten or fallback link append the fragment back (e.g., add
"#"+frag only if frag is non-empty) so anchors like foo.md#bar are preserved;
apply the same fragment-preservation change to the other occurrences noted
around the path handling lines (the similar use at lines 458 and 471).

In `@src/lib/model-prompts.ts`:
- Around line 141-149: The prompt input (variable choice from deps.promptFn)
needs to be trimmed before being inspected or used for defaults so
whitespace-only answers don't bypass fallbacks; call trim() on the returned
string, pass the trimmed value into deps.getNavigationChoiceFn, check navigation
against "back"/"exit" using the trimmed value, and when computing index use
parseInt(trimmedChoice || String(defaultListChoice), 10) - 1 (same change for
the similar occurrence around line 169) so empty or whitespace responses
correctly fall back to defaultListChoice and navigation handling.

In `@src/lib/onboard.ts`:
- Around line 582-591: The API_KEY_PREFIXES array is including model-id prefixes
('gpt-' and 'gemini-') which causes model names to be misclassified as secrets;
update the API_KEY_PREFIXES constant to remove "gpt-" and "gemini-" (keep only
actual secret prefixes), and ensure the token-detection logic that computes
looksLikeToken (the prefix check and the regex/no-space length checks) is
applied only for real secret patterns—i.e., rely on the existing validator
(credentialEnv/validateNvidiaApiKeyValue) when available or tighten the prefix
set so model IDs won't match; adjust the API_KEY_PREFIXES and the uses in
looksLikeToken accordingly.

In `@src/lib/skill-install.ts`:
- Around line 286-309: The mirror step can fail (mirrorFailed) but postInstall
still reports success; update the post-install logic so that if any mirror
operation fails (detect via the mirrorFailed flag set in the paths.mirrorDir
loop that uses collectFiles and sshExec) the final result returned by
postInstall is not { success: true } — instead set success: false (or include an
explicit error) and add a corresponding error message to messages so callers
know the skill may be unloadable; ensure the change references the existing
mirrorFailed variable and the postInstall return path so the failure is
propagated to callers.
- Around line 59-71: The current parseFrontmatter implementation validates and
normalizes nameValue but returns only { name: nameValue }, discarding other
frontmatter fields; update parseFrontmatter to preserve and return the full
parsed frontmatter object (the fm variable) with the validated/trimmed name
assigned back to fm.name (or typed equivalent) so all other metadata is retained
while enforcing the name validation in functions like parseFrontmatter and where
fm is used.

In `@src/lib/tiers.ts`:
- Around line 1-18: Remove the top-line //@ts-nocheck and convert CommonJS
requires to ES6 typed imports: replace const fs = require("fs"), path =
require("path"), YAML = require("yaml"), and { ROOT } = require("./runner") with
respective import statements (e.g., import * as fs from "fs"; import * as path
from "path"; import YAML from "yaml"; import { ROOT } from "./runner";) and add
appropriate TypeScript types for any variables or function signatures in this
module (e.g., typed return shapes for functions that parse YAML and typed
parameters for tier resolver functions) so the file compiles under
tsconfig.cli.json and leverages type-checking.

In `@test/e2e/test-sandbox-operations.sh`:
- Around line 569-605: The probe code hardcodes port 18789 (in the node -e
strings sent by sandbox_exec_for for probe_a and probe_b) so it ignores
NEMOCLAW_DASHBOARD_PORT overrides; update both probes to interpolate the shell
variable instead of 18789 (e.g. replace ':18789' with
':${NEMOCLAW_DASHBOARD_PORT:-18789}') so the correct dashboard port is probed
while still defaulting to 18789 when the env var is unset; adjust the strings
inside the sandbox_exec_for calls (the node -e "...${SANDBOX_B}..." and
"...${SANDBOX_A}..." commands) to include this variable.

In `@test/onboard.test.ts`:
- Around line 1000-1010: The test calling spawnSync(nodeExec, [scriptPath], ...)
can hang if startGateway() fails to exit; add a timeout to the spawnSync options
(e.g., timeout: 5000) and assert that the child either exited with a non-zero
status or was killed by the timeout so the test fails fast; update the options
object passed to spawnSync and add an assertion checking result.status !== 0 ||
result.signal === 'SIGTERM' (or result.error.code === 'ETIMEDOUT') to ensure the
gateway-failure regression fails quickly if process.exit is no longer called.

In `@uninstall.sh`:
- Around line 260-268: The current port variable _dp (set from
NEMOCLAW_DASHBOARD_PORT) can be a low-value like 1 or 80 and will be
interpolated into the pgrep pattern; update the guard around _dp to enforce the
same valid port range as src/lib/ports.ts (numeric and between 1024 and 65535)
and otherwise reset _dp to the default 18789 before constructing the pgrep -f
"openshell.*forward.*${_dp}" query; specifically, replace the permissive case
check with an explicit numeric-and-range check for _dp (or add one immediately
after local _dp=...) so only valid ports are used when calling pgrep and
potential unrelated processes are not matched.

---

Outside diff comments:
In `@scripts/docs-to-skills.py`:
- Around line 984-989: The code is appending a blank separator to prereq_items
before checking for duplicates, leaving orphan blanks when the current bullet is
a duplicate; in the block handling prerequisites (variables prereq_items,
seen_prereqs, norm, stripped) change the order so you compute norm and check if
norm is not in seen_prereqs (and add it to seen_prereqs) before you append any
blank separator or the stripped line, or conditionally append the blank only
when you will also append the new item — this ensures no blank lines are added
for duplicate bullets.

In `@src/lib/onboard.ts`:
- Around line 3061-3080: The interactive branch is re-prompting despite copying
NEMOCLAW_PROVIDER_KEY into process.env[credentialEnv]; modify the logic so that
after setting process.env[credentialEnv] (via _providerKeyHint) you check if
process.env[credentialEnv] is already populated and skip calling
ensureNamedCredential when it is. In short: in the block guarded by
isNonInteractive() === false, add a conditional that bypasses
ensureNamedCredential if process.env[credentialEnv] exists (using the same
credentialEnv/_providerKeyHint variables and remoteConfig for context) so
existing env-backed keys are honored in interactive runs.
- Around line 4521-4543: The resume and non-interactive flows collapse presets
to plain names and call policies.applyPreset(...) without the per-preset access
mode, which loses restrictive access settings; update the resume loop (where
`chosen` is processed and `policies.applyPreset(sandboxName, name)` is called)
to carry the preset object or access flag and call
policies.applyPreset(sandboxName, presetName, { access }) (or pass the preset
object) instead of only the name; likewise change the tier seeding logic that
builds `suggestions` from tiers.resolveTierPresets(tierName).map(p => p.name) so
it preserves each preset's access (e.g., keep the objects or map to { name,
access }) and ensure any later call sites that iterate `suggestions` use the
access when invoking policies.applyPreset; also make the same fix in the other
instances noted (around lines referenced) so resumed/CI and non-interactive
installs retain the original preset access.

---

Nitpick comments:
In `@scripts/debug.sh`:
- Around line 328-331: The script silently coerces invalid
NEMOCLAW_DASHBOARD_PORT into the default 18789 (variable _dp) which hides
misconfiguration; update the logic around _dp validation in scripts/debug.sh so
that when the env value is empty or non-numeric or out of range you emit a clear
warning (e.g., to stderr) stating the provided NEMOCLAW_DASHBOARD_PORT and that
it is being reset to 18789, then proceed with setting _dp and calling collect
"lsof-dashboard" lsof -i ":${_dp}". Ensure the warning is emitted before the
fallback assignment so it occurs for all invalid cases handled by the existing
checks.
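A minimal sketch of the warn-then-fallback behavior, assuming the same `_dp` variable and 18789 default as the comment; the function name and message wording are illustrative, not the real debug.sh:

```shell
# Illustrative only: validates a candidate dashboard port, warning on stderr
# before falling back to the default so misconfiguration stays visible.
resolve_debug_port() {
  _dp="$1"
  if [ -z "$_dp" ]; then
    _dp=18789                            # unset is not an error, just default
  elif ! printf '%s' "$_dp" | grep -qE '^[0-9]+$' ||
       [ "$_dp" -lt 1024 ] || [ "$_dp" -gt 65535 ]; then
    echo "warning: NEMOCLAW_DASHBOARD_PORT='$_dp' is invalid; using 18789" >&2
    _dp=18789
  fi
  echo "$_dp"
}
```

The point of the ordering is that the warning fires for every invalid case before the fallback assignment, so `collect "lsof-dashboard" lsof -i ":${_dp}"` still runs, but the operator sees why the default port was probed.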

In `@scripts/dev-tier-selector.js`:
- Around line 14-15: The header comment in scripts/dev-tier-selector.js
references the outdated test file name "test/policy-tiers-onboard.test.js";
update that comment to reference the correct current test filename
"test/policy-tiers-onboard.test.ts" so the documentation matches the repository
naming convention.

In `@scripts/lib/runtime.sh`:
- Around line 255-264: The duplicated port validation in
get_local_provider_base_url and check_local_provider_health should be extracted
into a single helper: create a function (e.g. _validate_local_provider_ports)
that reads NEMOCLAW_VLLM_PORT/NEMOCLAW_OLLAMA_PORT, calls _validate_port for
each and returns the validated port values (or non-zero on failure); then
replace the inline validation in get_local_provider_base_url and
check_local_provider_health with calls to this helper and use the returned ports
(vllm_port/ollama_port) to build the base URL or perform health checks, keeping
original behavior and error returns.
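One way the extraction could look, with hypothetical defaults (8000 for vLLM, 11434 for Ollama) and a simplified `_validate_port`, since the real runtime.sh helpers are not shown here:

```shell
# Sketch only: defaults and the base-URL shape are assumptions, not the
# actual runtime.sh implementation.
_validate_port() {
  case "$1" in ''|*[!0-9]*) return 1 ;; esac
  [ "$1" -ge 1 ] && [ "$1" -le 65535 ]
}

# Single source of truth for both callers: sets vllm_port/ollama_port or
# returns non-zero if either env override is invalid.
_validate_local_provider_ports() {
  vllm_port="${NEMOCLAW_VLLM_PORT:-8000}"
  ollama_port="${NEMOCLAW_OLLAMA_PORT:-11434}"
  _validate_port "$vllm_port" && _validate_port "$ollama_port"
}

get_local_provider_base_url() {
  _validate_local_provider_ports || return 1
  echo "http://127.0.0.1:${vllm_port}/v1"
}

check_local_provider_health() {
  _validate_local_provider_ports || return 1
  # ...probe "http://127.0.0.1:${ollama_port}" here...
}
```

Both callers then share one validation path and keep the original non-zero error returns, which is the behavior-preserving refactor the comment asks for.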

In `@scripts/nemoclaw-start.sh`:
- Around line 158-176: Introduce a single canonical default and error message
and reuse them instead of repeating magic values: add a constant like
_DASHBOARD_PORT_DEFAULT=18789 and an ERROR_MSG variable containing the
"[SECURITY] Invalid NEMOCLAW_DASHBOARD_PORT='...'" text, then replace the
literal 18789 assignment to _DASHBOARD_PORT and both echo lines that reference
the hardcoded message to use those variables; update CHAT_UI_URL and PUBLIC_PORT
to derive from _DASHBOARD_PORT as before and ensure you still trim/validate
NEMOCLAW_DASHBOARD_PORT via _DASHBOARD_PORT_RAW/_DASHBOARD_PORT and the numeric
range checks.

In `@src/lib/debug.ts`:
- Line 383: The artifact name "lsof-18789" is hard-coded and doesn't reflect
overrides of DASHBOARD_PORT; update the call to collect (the invocation using
collectDir, "lsof-18789", "lsof", ["-i", `:${DASHBOARD_PORT}`]) to generate the
artifact name dynamically using the DASHBOARD_PORT variable (e.g.,
"lsof-${DASHBOARD_PORT}") so the collected file matches the probed port; adjust
any surrounding code that references the static name if present.

In `@test/e2e/test-messaging-providers.sh`:
- Around line 102-121: Rename the misleading function sandbox_exec_stdin to a
name that reflects its behavior (e.g., sandbox_exec_quiet); update the function
definition (previously sandbox_exec_stdin) and all call sites to the new name,
including the callers that actually pipe stdin (the uses like "printf '%s'
\"$TOKEN\" | sandbox_exec_stdin 'grep ...'") so those pipelines still work, and
update the function comment to explain it suppresses stderr, hides args from the
outer process list, and still accepts a command as a positional argument. Ensure
no other behavior or arguments change.

In `@test/onboard-selection.test.ts`:
- Around line 2197-2264: The test currently only records prompt text via
credentials.prompt and must also capture the prompt options to assert secret
input is enforced; change the mocked credentials.prompt in the test around
setupNim to accept both (message, options) and push an object like {message,
options} into messages (instead of just the string), update uses of messages in
assertions to inspect message.message and message.options, and add an assertion
that the entry matching CREDENTIAL_RETRY_PROMPT_RE has options.secret === true
(retain existing checks for the plaintext prompt text and other assertions such
as setupNim, CREDENTIAL_RETRY_PROMPT_RE, and NVIDIA prompt matching).

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: 243135c3-c296-41ed-816b-a854e13c91eb

📥 Commits

Reviewing files that changed from the base of the PR and between 2e3f84c and ca4466f.

⛔ Files ignored due to path filters (1)
  • package-lock.json is excluded by !**/package-lock.json
📒 Files selected for processing (80)
  • .agents/skills/nemoclaw-contributor-update-docs/SKILL.md
  • .agents/skills/nemoclaw-maintainer-cut-release-tag/SKILL.md
  • .agents/skills/nemoclaw-user-configure-inference/SKILL.md
  • .agents/skills/nemoclaw-user-get-started/SKILL.md
  • .agents/skills/nemoclaw-user-get-started/references/windows-setup.md
  • .agents/skills/nemoclaw-user-manage-policy/SKILL.md
  • .agents/skills/nemoclaw-user-reference/references/commands.md
  • .agents/skills/nemoclaw-user-reference/references/troubleshooting.md
  • .agents/skills/nemoclaw-user-workspace/SKILL.md
  • .agents/skills/nemoclaw-user-workspace/references/workspace-files.md
  • .github/workflows/nightly-e2e.yaml
  • .github/workflows/sandbox-images-and-e2e.yaml
  • Dockerfile
  • Dockerfile.base
  • agents/hermes/plugin/__init__.py
  • agents/hermes/plugin/plugin.yaml
  • bin/lib/ports.js
  • bin/lib/tiers.js
  • docs/get-started/windows-setup.md
  • docs/index.md
  • docs/inference/switch-inference-providers.md
  • docs/inference/use-local-inference.md
  • docs/network-policy/customize-network-policy.md
  • docs/reference/commands.md
  • docs/reference/network-policies.md
  • docs/reference/troubleshooting.md
  • docs/workspace/backup-restore.md
  • docs/workspace/workspace-files.md
  • nemoclaw-blueprint/blueprint.yaml
  • nemoclaw-blueprint/policies/openclaw-sandbox.yaml
  • nemoclaw-blueprint/policies/tiers.yaml
  • nemoclaw/src/blueprint/runner.test.ts
  • nemoclaw/src/blueprint/runner.ts
  • nemoclaw/src/lib/ports.test.ts
  • nemoclaw/src/lib/ports.ts
  • nemoclaw/src/lib/subprocess-env.ts
  • scripts/backup-workspace.sh
  • scripts/debug.sh
  • scripts/dev-tier-selector.js
  • scripts/docs-to-skills.py
  • scripts/lib/runtime.sh
  • scripts/nemoclaw-start.sh
  • src/lib/agent-defs.ts
  • src/lib/agent-onboard.ts
  • src/lib/agent-runtime.ts
  • src/lib/dashboard.test.ts
  • src/lib/dashboard.ts
  • src/lib/debug.ts
  • src/lib/http-probe.test.ts
  • src/lib/http-probe.ts
  • src/lib/local-inference.ts
  • src/lib/model-prompts.ts
  • src/lib/nim.ts
  • src/lib/onboard.ts
  • src/lib/policies.ts
  • src/lib/ports.test.ts
  • src/lib/ports.ts
  • src/lib/preflight.ts
  • src/lib/registry.ts
  • src/lib/services.ts
  • src/lib/skill-install.test.ts
  • src/lib/skill-install.ts
  • src/lib/subprocess-env.ts
  • src/lib/tiers.ts
  • src/lib/uninstall-command.test.ts
  • src/lib/uninstall-command.ts
  • src/lib/validation.test.ts
  • src/lib/validation.ts
  • src/nemoclaw.ts
  • test/credential-exposure.test.ts
  • test/e2e-port-overrides.sh
  • test/e2e/test-messaging-providers.sh
  • test/e2e/test-sandbox-operations.sh
  • test/e2e/test-sandbox-survival.sh
  • test/onboard-selection.test.ts
  • test/onboard.test.ts
  • test/policy-tiers-onboard.test.ts
  • test/policy-tiers.test.ts
  • test/runner.test.ts
  • uninstall.sh
✅ Files skipped from review due to trivial changes (21)
  • docs/reference/network-policies.md
  • bin/lib/ports.js
  • .agents/skills/nemoclaw-user-get-started/SKILL.md
  • nemoclaw-blueprint/blueprint.yaml
  • Dockerfile
  • src/lib/validation.test.ts
  • docs/inference/switch-inference-providers.md
  • docs/workspace/backup-restore.md
  • test/runner.test.ts
  • .agents/skills/nemoclaw-user-get-started/references/windows-setup.md
  • src/lib/policies.ts
  • nemoclaw-blueprint/policies/openclaw-sandbox.yaml
  • agents/hermes/plugin/plugin.yaml
  • src/lib/dashboard.test.ts
  • docs/reference/troubleshooting.md
  • docs/get-started/windows-setup.md
  • .agents/skills/nemoclaw-user-configure-inference/SKILL.md
  • .agents/skills/nemoclaw-contributor-update-docs/SKILL.md
  • nemoclaw-blueprint/policies/tiers.yaml
  • .agents/skills/nemoclaw-user-reference/references/troubleshooting.md
  • docs/inference/use-local-inference.md

Comment thread .agents/skills/nemoclaw-user-workspace/SKILL.md
- The OpenShell CLI on your `PATH`.
- The sandbox name (shown by `nemoclaw list`).
1. **Let the agent do it** — Ask your agent to update its persona, memory, or user context.
2. **Edit manually** — Use `openshell sandbox shell` to open a terminal inside the sandbox and edit files directly, or use `openshell sandbox upload` to push edited files from your host.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check if 'shell' is a recognized openshell sandbox subcommand
# by searching for its definition or usage in the codebase

echo "=== Searching for 'sandbox shell' in codebase ==="
rg -n "sandbox shell" --type-add 'code:*.{ts,js,sh,md}' -t code

echo ""
echo "=== Searching for openshell subcommand definitions ==="
rg -n "subcommand.*shell|shell.*subcommand" --type-add 'code:*.{ts,js}' -t code

echo ""
echo "=== Checking openshell sandbox command references ==="
rg -n "openshell sandbox (connect|shell)" --type-add 'code:*.{ts,js,sh,md}' -t code | head -30

Repository: NVIDIA/NemoClaw

Length of output: 2520


Update the source documentation to use `connect` instead of `shell`.

The documentation references `openshell sandbox shell`, but the codebase consistently uses `openshell sandbox connect` (see src/lib/deploy.ts, test files, and scripts). There is no evidence that `shell` is a valid subcommand. Since this is a generated skill file, update the source in docs/workspace/workspace-files.md line 78 to use `openshell sandbox connect` instead.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.agents/skills/nemoclaw-user-workspace/SKILL.md at line 68, The doc
references the non-existent subcommand "openshell sandbox shell"; update the
generated skill doc and source documentation to use the correct subcommand
"openshell sandbox connect" instead—specifically edit SKILL.md (the skill doc
line shown) and the workspace documentation entry (the workspace-files.md
document) to replace "openshell sandbox shell" with "openshell sandbox connect",
and scan for any remaining occurrences of "openshell sandbox shell" to ensure
consistency with the codebase (e.g., matching the usage found in
src/lib/deploy.ts and test scripts).

Comment on lines +125 to +133
    try:
        import agent.skill_commands as sc

        sc._skill_commands.clear()
        return sc.scan_skill_commands()
    except ImportError:
        return None
    except Exception:
        return None

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Verify broad exception handling and whether failures are observable.
rg -n -C3 '_reload_skills|except Exception|logger\.(exception|error|warning)' --type=py

Repository: NVIDIA/NemoClaw

Length of output: 28978


🏁 Script executed:

cat -n agents/hermes/plugin/__init__.py | head -250 | tail -150

Repository: NVIDIA/NemoClaw

Length of output: 6880


🏁 Script executed:

head -30 agents/hermes/plugin/__init__.py

Repository: NVIDIA/NemoClaw

Length of output: 1139


🏁 Script executed:

rg -n "except Exception" agents/hermes/plugin/__init__.py -A 1

Repository: NVIDIA/NemoClaw

Length of output: 330


Avoid blind exception swallowing in skill reload path; startup failures are silent.

Line 132 catches all exceptions and returns None without logging, hiding root causes (API drift, runtime errors, bad skill metadata) and making reload failures opaque. Additionally, line 219 calls _reload_skills() in the startup hook without checking the result, so failures during session initialization are completely silent while the banner implies skills are ready.

Proposed fix
+import logging
+
+logger = logging.getLogger(__name__)
+
 def _reload_skills():
     """Clear the Hermes skill slash-command cache and re-scan skill directories.
     ...
     """
     try:
         import agent.skill_commands as sc

         sc._skill_commands.clear()
         return sc.scan_skill_commands()
     except ImportError:
         return None
-    except Exception:
+    except Exception as exc:
+        logger.exception("Skill reload failed: %s", exc)
         return None

And check the result at startup:

     def _on_session_start(**kwargs):
         # Refresh skill cache so skills installed since last session are
         # immediately available as slash commands.
-        _reload_skills()
+        if _reload_skills() is None:
+            logger.warning("Failed to reload skills on session start")
🧰 Tools
🪛 Ruff (0.15.10)

[warning] 132-132: Do not catch blind exception: Exception

(BLE001)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@agents/hermes/plugin/__init__.py` around lines 125 - 133, The catch-all
except in the skill reload path (inside the try that imports
agent.skill_commands and calls sc.scan_skill_commands) swallows errors; change
the broad except Exception to capture the exception as a variable and log full
details (e.g., use logger.exception or process_logger.error with the exception)
before returning None so root causes (API/runtime/metadata issues) are visible;
then update the startup hook that calls _reload_skills() to check its return
value (from _reload_skills or scan_skill_commands) and if None log a clear error
and fail fast or raise so session initialization does not silently continue with
missing skills—refer to symbols sc._skill_commands, sc.scan_skill_commands,
_reload_skills and the startup hook call.

Comment on lines +217 to +220
        # Refresh skill cache so skills installed since last session are
        # immediately available as slash commands.
        _reload_skills()


⚠️ Potential issue | 🟡 Minor

Handle reload failure explicitly during session start.

Line 219 ignores _reload_skills() result. If reload fails, startup still looks healthy and the missing skills are silent.

Proposed fix
-        _reload_skills()
+        commands = _reload_skills()
+        if commands is None:
+            ctx.inject_message(
+                "Warning: skill cache refresh failed; new skills may be unavailable.",
+                role="system",
+            )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@agents/hermes/plugin/__init__.py` around lines 217 - 220, The call to
_reload_skills() during session startup currently ignores failures; update the
startup flow that calls _reload_skills() to handle errors explicitly by checking
its return/exception: if _reload_skills() returns a failure flag or raises, log
a clear error via the session/process logger (including error details), and
abort or mark startup as failed (e.g., raise/return a non-success status) so
missing skills aren't silent; modify the caller around the _reload_skills()
invocation to perform this check and take the appropriate failure action.

Comment on lines +35 to +39
> [!IMPORTANT]
> Make static policy edits on the host, not inside the sandbox.
> The sandbox image is intentionally minimal and may not include editors or package-management tools.
> Changes made only inside the sandbox are also ephemeral and are lost when the sandbox is recreated.


⚠️ Potential issue | 🟡 Minor

Use MyST admonition syntax for this callout.

Line 35 currently uses GitHub-style blockquote admonition. In docs pages, this should be a MyST admonition block.

Suggested doc fix
-> [!IMPORTANT]
-> Make static policy edits on the host, not inside the sandbox.
-> The sandbox image is intentionally minimal and may not include editors or package-management tools.
-> Changes made only inside the sandbox are also ephemeral and are lost when the sandbox is recreated.
+:::{warning}
+Make static policy edits on the host, not inside the sandbox.
+The sandbox image is intentionally minimal and may not include editors or package-management tools.
+Changes made only inside the sandbox are ephemeral and are lost when the sandbox is recreated.
+:::

As per coding guidelines: docs/** ... Use MyST admonitions (:::{tip}, :::{note}, :::{warning}) for callouts, not bold text or blockquotes.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/network-policy/customize-network-policy.md` around lines 35 - 39,
Replace the GitHub-style blockquote that starts with "[!IMPORTANT]" and the
following three lines with a MyST admonition of the appropriate level (e.g.,
:::{warning} ... :::) so the callout is rendered correctly; locate the block
containing the exact text "Make static policy edits on the host, not inside the
sandbox." and wrap that content in a MyST admonition opened with :::{warning}
and closed with ::: to comply with docs/** admonition conventions.

Comment thread src/lib/skill-install.ts
Comment on lines +286 to +309
    if (paths.mirrorDir) {
      const { files } = collectFiles(localSkillDir);
      let mirrorFailed = false;
      for (const rel of files) {
        const content = fs.readFileSync(path.join(localSkillDir, rel));
        const mirrorSubdir = rel.includes("/")
          ? `${paths.mirrorDir}/${path.dirname(rel)}`
          : paths.mirrorDir;
        const mirrorFile = `${paths.mirrorDir}/${rel}`;
        // mirrorDir contains $HOME which must expand, so we use double
        // quotes (not shellQuote). Safe because validateRelativePath
        // restricts filenames to [A-Za-z0-9._-/] before we reach here.
        const result = sshExec(
          ctx,
          `mkdir -p "${mirrorSubdir}" && cat > "${mirrorFile}"`,
          { input: content },
        );
        if (!result || result.status !== 0) {
          mirrorFailed = true;
        }
      }
      if (mirrorFailed) {
        messages.push("Warning: failed to mirror some files to $HOME/.openclaw/skills/");
      }

⚠️ Potential issue | 🟠 Major

Do not report success when the OpenClaw mirror step fails.

The comment here says OpenClaw resolves skills from $HOME/.openclaw/skills/. If that copy fails, postInstall() still returns { success: true }, so callers can report a successful install even though the skill may not be loadable.

Suggested change
       if (mirrorFailed) {
-        messages.push("Warning: failed to mirror some files to $HOME/.openclaw/skills/");
+        messages.push("Failed to mirror files to $HOME/.openclaw/skills/");
+        return { success: false, messages };
       }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/lib/skill-install.ts` around lines 286 - 309, The mirror step can fail
(mirrorFailed) but postInstall still reports success; update the post-install
logic so that if any mirror operation fails (detect via the mirrorFailed flag
set in the paths.mirrorDir loop that uses collectFiles and sshExec) the final
result returned by postInstall is not { success: true } — instead set success:
false (or include an explicit error) and add a corresponding error message to
messages so callers know the skill may be unloadable; ensure the change
references the existing mirrorFailed variable and the postInstall return path so
the failure is propagated to callers.

Comment thread src/lib/tiers.ts
Comment on lines +1 to +18
// @ts-nocheck
// SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
// SPDX-License-Identifier: Apache-2.0
//
// Tier management — load tier definitions and resolve preset selections.
//
// Tiers are defined in nemoclaw-blueprint/policies/tiers.yaml.
// Each tier is a named posture (restricted, balanced, open) that maps to
// a set of policy presets and their default access levels.
//
// The base sandbox policy is always applied regardless of tier.
// Tiers layer additional presets on top of that baseline.

const fs = require("fs");
const path = require("path");
const YAML = require("yaml");
const { ROOT } = require("./runner");


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# Check if src/lib/tiers.ts exists and view its content
ls -la src/lib/tiers.ts 2>/dev/null && wc -l src/lib/tiers.ts

Repository: NVIDIA/NemoClaw

Length of output: 144


🏁 Script executed:

# Check if tsconfig.cli.json exists
ls -la tsconfig.cli.json 2>/dev/null && cat tsconfig.cli.json

Repository: NVIDIA/NemoClaw

Length of output: 655


🏁 Script executed:

# Search for any references to tsconfig.cli.json or CLI type-checking setup
rg "tsconfig.cli" --type ts --type json

Repository: NVIDIA/NemoClaw

Length of output: 120


🏁 Script executed:

# Check the full content of src/lib/tiers.ts
cat -n src/lib/tiers.ts 2>/dev/null || echo "File not found"

Repository: NVIDIA/NemoClaw

Length of output: 3283


Remove @ts-nocheck and convert to typed ES6 imports.

This src/lib module opts out of CLI type-checking and uses CommonJS require() for runtime imports, preventing TypeScript from validating the module dependencies and types. Convert to ES6 imports (import) and remove the @ts-nocheck directive to enable type-checking via tsconfig.cli.json. This aligns with the coding guideline: "Use TypeScript for CLI code in src/lib/ with type-checking via tsconfig.cli.json."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/lib/tiers.ts` around lines 1 - 18, Remove the top-line //@ts-nocheck and
convert CommonJS requires to ES6 typed imports: replace const fs =
require("fs"), path = require("path"), YAML = require("yaml"), and { ROOT } =
require("./runner") with respective import statements (e.g., import * as fs from
"fs"; import * as path from "path"; import YAML from "yaml"; import { ROOT }
from "./runner";) and add appropriate TypeScript types for any variables or
function signatures in this module (e.g., typed return shapes for functions that
parse YAML and typed parameters for tier resolver functions) so the file
compiles under tsconfig.cli.json and leverages type-checking.

Comment on lines +569 to +605
  # Use node (always available) instead of curl (removed by hardening).
  # Isolation is enforced by the OpenShell proxy — blocked requests return
  # HTTP 403. Connection errors (ENOTFOUND, ECONNREFUSED, TIMEOUT) also
  # count as isolation. Only HTTP 200 would indicate a breach.
  log "  Testing: sandbox A cannot reach sandbox B by hostname..."
  local probe_a
  probe_a=$(sandbox_exec_for "$SANDBOX_A" "node -e \"
const http = require('http');
const req = http.get('http://${SANDBOX_B}:18789/', (res) => {
  console.log('STATUS_' + res.statusCode);
  res.resume();
});
req.on('error', (e) => console.log('ERROR: ' + e.message));
req.setTimeout(5000, () => { req.destroy(); console.log('TIMEOUT'); });
\"" 2>&1) || true

  if [[ -z "$probe_a" ]]; then
    fail "TC-SBX-11: Isolation (A→B)" "Empty response — SSH or infrastructure failure"
  elif echo "$probe_a" | grep -qiE "STATUS_403|ERROR|TIMEOUT"; then
    pass "TC-SBX-11: Sandbox A cannot reach sandbox B ($(echo "$probe_a" | grep -oE 'STATUS_[0-9]+|ERROR|TIMEOUT' | head -1))"
  elif echo "$probe_a" | grep -qE "STATUS_[0-9]+"; then
    fail "TC-SBX-11: Isolation (A→B)" "Sandbox A reached sandbox B ($(echo "$probe_a" | grep -oE 'STATUS_[0-9]+' | head -1))"
  else
    fail "TC-SBX-11: Isolation (A→B)" "Unexpected probe output: $(echo "$probe_a" | head -3)"
  fi

  log "  Testing reverse: sandbox B cannot reach sandbox A..."
  local probe_b
  probe_b=$(sandbox_exec_for "$SANDBOX_B" "node -e \"
const http = require('http');
const req = http.get('http://${SANDBOX_A}:18789/', (res) => {
  console.log('STATUS_' + res.statusCode);
  res.resume();
});
req.on('error', (e) => console.log('ERROR: ' + e.message));
req.setTimeout(5000, () => { req.destroy(); console.log('TIMEOUT'); });
\"" 2>&1) || true

⚠️ Potential issue | 🟡 Minor

Respect NEMOCLAW_DASHBOARD_PORT in the isolation probes.

These requests are still pinned to 18789, so TC-SBX-11 can pass on ECONNREFUSED/TIMEOUT while never exercising the real dashboard port when a port override is configured. That turns the isolation check into a false positive for the override path.

Suggested fix
+  local dashboard_port="${NEMOCLAW_DASHBOARD_PORT:-18789}"
   log "  Testing: sandbox A cannot reach sandbox B by hostname..."
   local probe_a
   probe_a=$(sandbox_exec_for "$SANDBOX_A" "node -e \"
 const http = require('http');
-const req = http.get('http://${SANDBOX_B}:18789/', (res) => {
+const req = http.get('http://${SANDBOX_B}:${dashboard_port}/', (res) => {
   console.log('STATUS_' + res.statusCode);
   res.resume();
 });
 req.on('error', (e) => console.log('ERROR: ' + e.message));
 req.setTimeout(5000, () => { req.destroy(); console.log('TIMEOUT'); });
 \"" 2>&1) || true
@@
   log "  Testing reverse: sandbox B cannot reach sandbox A..."
   local probe_b
   probe_b=$(sandbox_exec_for "$SANDBOX_B" "node -e \"
 const http = require('http');
-const req = http.get('http://${SANDBOX_A}:18789/', (res) => {
+const req = http.get('http://${SANDBOX_A}:${dashboard_port}/', (res) => {
   console.log('STATUS_' + res.statusCode);
   res.resume();
 });
 req.on('error', (e) => console.log('ERROR: ' + e.message));
 req.setTimeout(5000, () => { req.destroy(); console.log('TIMEOUT'); });
 \"" 2>&1) || true
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change:

```bash
  # Use node (always available) instead of curl (removed by hardening).
  # Isolation is enforced by the OpenShell proxy — blocked requests return
  # HTTP 403. Connection errors (ENOTFOUND, ECONNREFUSED, TIMEOUT) also
  # count as isolation. Only HTTP 200 would indicate a breach.
  local dashboard_port="${NEMOCLAW_DASHBOARD_PORT:-18789}"
  log "  Testing: sandbox A cannot reach sandbox B by hostname..."
  local probe_a
  probe_a=$(sandbox_exec_for "$SANDBOX_A" "node -e \"
const http = require('http');
const req = http.get('http://${SANDBOX_B}:${dashboard_port}/', (res) => {
  console.log('STATUS_' + res.statusCode);
  res.resume();
});
req.on('error', (e) => console.log('ERROR: ' + e.message));
req.setTimeout(5000, () => { req.destroy(); console.log('TIMEOUT'); });
\"" 2>&1) || true
  if [[ -z "$probe_a" ]]; then
    fail "TC-SBX-11: Isolation (A→B)" "Empty response — SSH or infrastructure failure"
  elif echo "$probe_a" | grep -qiE "STATUS_403|ERROR|TIMEOUT"; then
    pass "TC-SBX-11: Sandbox A cannot reach sandbox B ($(echo "$probe_a" | grep -oE 'STATUS_[0-9]+|ERROR|TIMEOUT' | head -1))"
  elif echo "$probe_a" | grep -qE "STATUS_[0-9]+"; then
    fail "TC-SBX-11: Isolation (A→B)" "Sandbox A reached sandbox B ($(echo "$probe_a" | grep -oE 'STATUS_[0-9]+' | head -1))"
  else
    fail "TC-SBX-11: Isolation (A→B)" "Unexpected probe output: $(echo "$probe_a" | head -3)"
  fi
  log "  Testing reverse: sandbox B cannot reach sandbox A..."
  local probe_b
  probe_b=$(sandbox_exec_for "$SANDBOX_B" "node -e \"
const http = require('http');
const req = http.get('http://${SANDBOX_A}:${dashboard_port}/', (res) => {
  console.log('STATUS_' + res.statusCode);
  res.resume();
});
req.on('error', (e) => console.log('ERROR: ' + e.message));
req.setTimeout(5000, () => { req.destroy(); console.log('TIMEOUT'); });
\"" 2>&1) || true
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/e2e/test-sandbox-operations.sh` around lines 569 - 605, The probe code
hardcodes port 18789 (in the node -e strings sent by sandbox_exec_for for
probe_a and probe_b) so it ignores NEMOCLAW_DASHBOARD_PORT overrides; update
both probes to interpolate the shell variable instead of 18789 (e.g. replace
':18789' with ':${NEMOCLAW_DASHBOARD_PORT:-18789}') so the correct dashboard
port is probed while still defaulting to 18789 when the env var is unset; adjust
the strings inside the sandbox_exec_for calls (the node -e "...${SANDBOX_B}..."
and "...${SANDBOX_A}..." commands) to include this variable.
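The default-with-override idiom the prompt asks for can be sketched in isolation. Everything below except the `NEMOCLAW_DASHBOARD_PORT` variable name and the 18789 fallback is a made-up stand-in (`resolve_port` and the `sandbox-b` hostname are hypothetical, not code from the repo):

```shell
#!/bin/sh
# Resolve the dashboard port with the same ${VAR:-default} idiom the
# suggested fix uses: fall back to 18789 when the variable is unset or empty.
resolve_port() {
  echo "${NEMOCLAW_DASHBOARD_PORT:-18789}"
}

unset NEMOCLAW_DASHBOARD_PORT
echo "default: $(resolve_port)"       # default: 18789

NEMOCLAW_DASHBOARD_PORT=28789
export NEMOCLAW_DASHBOARD_PORT
echo "override: $(resolve_port)"      # override: 28789

# The probe URL then follows the override instead of a hardcoded literal.
echo "probe: http://sandbox-b:$(resolve_port)/"
```

Interpolating the resolved value into the probe string keeps the default behavior identical while letting an override actually be exercised by the isolation test.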

Comment thread test/onboard.test.ts
Comment on lines +1000 to +1010

```typescript
    const result = spawnSync(nodeExec, [scriptPath], {
      cwd: repoRoot,
      encoding: "utf-8",
      env: {
        ...process.env,
        HOME: tmpDir,
        PATH: `${fakeBin}:${process.env.PATH || ""}`,
        NEMOCLAW_HEALTH_POLL_COUNT: "0",
        NEMOCLAW_NON_INTERACTIVE: "1",
      },
    });
```
Contributor
⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Scripts executed against NVIDIA/NemoClaw:

```bash
# First, check if the file exists and its size
wc -l test/onboard.test.ts

# Check the code at lines 1000-1010
sed -n '1000,1010p' test/onboard.test.ts

# Search for any existing timeout handling in this test file
rg -n "timeout|ETIMEDOUT" test/onboard.test.ts

# Check if there's any spawnSync usage elsewhere in the test files to see patterns
rg -n "spawnSync" test/ -B 2 -A 5

# Inspect the surrounding test context
sed -n '990,1015p' test/onboard.test.ts
sed -n '960,1000p' test/onboard.test.ts
sed -n '920,950p' test/onboard.test.ts
```
Add timeout to prevent hanging in gateway failure regression test.

This test verifies startGateway() exits cleanly on failure. If the underlying code stops calling process.exit(1), the spawnSync() call will hang indefinitely. Add a timeout and assertion to fail fast instead.

Suggested change:

```diff
     const result = spawnSync(nodeExec, [scriptPath], {
       cwd: repoRoot,
       encoding: "utf-8",
+      timeout: 15000,
       env: {
         ...process.env,
         HOME: tmpDir,
         PATH: `${fakeBin}:${process.env.PATH || ""}`,
         NEMOCLAW_HEALTH_POLL_COUNT: "0",
         NEMOCLAW_NON_INTERACTIVE: "1",
       },
     });
+
+    assert.notEqual(result.error?.code, "ETIMEDOUT", "gateway failure regression test timed out");
```
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change:

```typescript
    const result = spawnSync(nodeExec, [scriptPath], {
      cwd: repoRoot,
      encoding: "utf-8",
      timeout: 15000,
      env: {
        ...process.env,
        HOME: tmpDir,
        PATH: `${fakeBin}:${process.env.PATH || ""}`,
        NEMOCLAW_HEALTH_POLL_COUNT: "0",
        NEMOCLAW_NON_INTERACTIVE: "1",
      },
    });

    assert.notEqual(result.error?.code, "ETIMEDOUT", "gateway failure regression test timed out");
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/onboard.test.ts` around lines 1000 - 1010, The test calling
spawnSync(nodeExec, [scriptPath], ...) can hang if startGateway() fails to exit;
add a timeout to the spawnSync options (e.g., timeout: 5000) and assert that the
child either exited with a non-zero status or was killed by the timeout so the
test fails fast; update the options object passed to spawnSync and add an
assertion checking result.status !== 0 || result.signal === 'SIGTERM' (or
result.error.code === 'ETIMEDOUT') to ensure the gateway-failure regression
fails quickly if process.exit is no longer called.

Comment thread uninstall.sh
Comment on lines +260 to +268
local _dp="${NEMOCLAW_DASHBOARD_PORT:-18789}"
case "$_dp" in *[!0-9]* | '') _dp=18789 ;; esac

local -a pids=()
local pid
while IFS= read -r pid; do
[ -n "$pid" ] || continue
pids+=("$pid")
done < <(pgrep -f 'openshell.*forward.*18789' 2>/dev/null || true)
done < <(pgrep -f "openshell.*forward.*${_dp}" 2>/dev/null || true)
Contributor
⚠️ Potential issue | 🟠 Major

Validate the dashboard port range before matching PIDs.

NEMOCLAW_DASHBOARD_PORT=1 or 80 currently passes this guard and becomes part of the pgrep -f pattern, which can match and kill unrelated openshell forward processes. src/lib/ports.ts already treats only 1024..65535 as valid, so this cleanup path should reject the same bad values instead of using them.

Suggested fix:

```diff
   local _dp="${NEMOCLAW_DASHBOARD_PORT:-18789}"
   case "$_dp" in *[!0-9]* | '') _dp=18789 ;; esac
+  if [ "$_dp" -lt 1024 ] || [ "$_dp" -gt 65535 ]; then
+    warn "Invalid NEMOCLAW_DASHBOARD_PORT='${NEMOCLAW_DASHBOARD_PORT:-}'; falling back to 18789."
+    _dp=18789
+  fi
```
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change:

```bash
  local _dp="${NEMOCLAW_DASHBOARD_PORT:-18789}"
  case "$_dp" in *[!0-9]* | '') _dp=18789 ;; esac
  if [ "$_dp" -lt 1024 ] || [ "$_dp" -gt 65535 ]; then
    warn "Invalid NEMOCLAW_DASHBOARD_PORT='${NEMOCLAW_DASHBOARD_PORT:-}'; falling back to 18789."
    _dp=18789
  fi

  local -a pids=()
  local pid
  while IFS= read -r pid; do
    [ -n "$pid" ] || continue
    pids+=("$pid")
  done < <(pgrep -f "openshell.*forward.*${_dp}" 2>/dev/null || true)
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@uninstall.sh` around lines 260 - 268: the port variable _dp (set from
NEMOCLAW_DASHBOARD_PORT) can be a low value like 1 or 80 and is interpolated
into the pgrep pattern. Update the guard around _dp to enforce the same valid
port range as src/lib/ports.ts (numeric and between 1024 and 65535), and
otherwise reset _dp to the default 18789 before constructing the pgrep -f
"openshell.*forward.*${_dp}" query. Specifically, replace the permissive case
check with an explicit numeric-and-range check for _dp (or add one immediately
after local _dp=...) so only valid ports are used when calling pgrep and
unrelated processes are not matched.
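The range check this thread settles on can be sketched as a standalone helper. The 1024..65535 bounds and the 18789 default come from the comment above; `validate_port` itself is a hypothetical name, not a function in the repo:

```shell
#!/bin/sh
# Accept only numeric ports in 1024..65535 (mirroring src/lib/ports.ts);
# anything else falls back to the default 18789 so the pgrep pattern can
# never be built from a dangerously short value like "80".
validate_port() {
  _dp="$1"
  case "$_dp" in *[!0-9]* | '') _dp=18789 ;; esac
  if [ "$_dp" -lt 1024 ] || [ "$_dp" -gt 65535 ]; then
    _dp=18789
  fi
  echo "$_dp"
}

validate_port 80      # -> 18789 (privileged port rejected)
validate_port abc     # -> 18789 (non-numeric rejected)
validate_port 28789   # -> 28789 (valid override kept)
```

With a short value such as 80, the pattern `openshell.*forward.*80` would match any command line containing "80" after "forward" (8080, 18080, ...), which is exactly the over-matching the validation prevents.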

@ericksoa
Contributor

Hey @ColinM-sys — the checksum fix itself looks good, but this branch has accumulated 31 commits from main (merges over time), which means the diff shows 5500+ lines across 82 files instead of just your actual change. That makes it impossible to review meaningfully.

Could you rebase onto current main so the diff only shows your NodeSource checksum fix? Something like:

```bash
git fetch upstream main
git rebase upstream/main
git push --force-with-lease
```

That should drop it back down to a clean, reviewable diff. We're ready to merge once the PR shows just the fix. Thanks!

…as root (NVIDIA#576)

The NodeSource setup script was piped directly from curl into `sudo bash`
without any integrity verification (CWE-494). This adds SHA-256 hash
verification following the same pattern already used for nvm in
scripts/install.sh: download to temp file, compute hash, compare against
pinned expected value, and only execute if they match.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@ColinM-sys ColinM-sys force-pushed the fix/576-nodesource-integrity-check branch from ca4466f to 3b09542 Compare April 15, 2026 05:57
@ColinM-sys
Contributor Author

Thanks Aaron — rebased onto current main. Should be a clean diff now showing just the NodeSource checksum fix.

cv and others added 2 commits April 15, 2026 08:21
Signed-off-by: Test User <test@example.com>
Signed-off-by: Aaron Erickson <aerickson@nvidia.com>
Contributor

@ericksoa ericksoa left a comment


Hash verified against live NodeSource URL, shfmt formatting fixed. Low regression risk — only affects Brev CI setup, not user-facing code.

@ericksoa ericksoa merged commit 0ad68aa into NVIDIA:main Apr 16, 2026
1 check passed

Labels

- bug: Something isn't working
- fix
- priority: high: Important issue that should be resolved in the next release
- security: Something isn't secure
- v0.0.16: Release target

Projects

None yet

Development

Successfully merging this pull request may close these issues.

CRITICAL: curl | sudo bash Without Integrity Check (NodeSource)

4 participants