fix(security): verify checksum before executing NodeSource installer as root (#576) #1814
📝 Walkthrough

Replaces piping NodeSource's remote installer into bash with downloading the installer to a temp file, verifying its SHA-256 against a pinned checksum, and executing it with sudo only when the hashes match.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Script as "Launch script\n(bash)"
    participant Remote as "deb.nodesource.com\n(HTTP(S))"
    participant Temp as "Temp file\n(mktemp)"
    participant SHA as "sha256sum / shasum"
    participant Root as "sudo bash\n(execute)"

    Script->>Remote: curl -fsSL https://deb.nodesource.com/setup_22.x -o temp
    Remote-->>Script: installer script bytes
    Script->>SHA: compute hash of temp
    SHA-->>Script: actual_hash
    alt hashes match
        Script->>Root: sudo -E bash temp
        Root-->>Script: installer executed
        Script->>Temp: rm -f temp
    else mismatch
        Script->>Temp: rm -f temp
        Script-->>Script: exit/fail with error
    end
```
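The diagrammed flow can be sketched in a few lines of shell. This is illustrative only: `sha256_of` and `verify_sha256` are hypothetical helper names, and a local stand-in file replaces the real download and pinned hash.

```shell
#!/bin/sh
# Minimal sketch of the verify-before-execute flow diagrammed above.
# sha256_of and verify_sha256 are hypothetical helper names; the real
# script pins a known installer hash and only then runs `sudo -E bash`.
set -u

sha256_of() {
  if command -v sha256sum >/dev/null 2>&1; then
    sha256sum "$1" | awk '{print $1}'
  else
    shasum -a 256 "$1" | awk '{print $1}'   # macOS fallback
  fi
}

verify_sha256() {
  # $1 = file, $2 = expected hex digest; succeeds only on an exact match
  [ "$(sha256_of "$1")" = "$2" ]
}

tmp="$(mktemp)"
trap 'rm -f "$tmp"' EXIT             # temp file removed on every exit path
printf 'echo hello\n' > "$tmp"       # stands in for: curl -fsSL "$URL" -o "$tmp"
pinned="$(sha256_of "$tmp")"         # stands in for the pinned constant

if verify_sha256 "$tmp" "$pinned"; then
  echo "match: would run sudo -E bash $tmp"
fi

printf '# tampered\n' >> "$tmp"      # simulate a modified download
if ! verify_sha256 "$tmp" "$pinned"; then
  echo "mismatch: refusing to execute"
fi
```

The `trap ... EXIT` covers both branches of the diagram, so the temp file is cleaned up whether the hash matches or not.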
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
🚥 Pre-merge checks: ✅ 4 passed | ❌ 1 failed (1 warning)
d36de93 to ca4466f
Actionable comments posted: 16
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (3)
scripts/docs-to-skills.py (1)
984-989: ⚠️ Potential issue | 🟡 Minor: Avoid orphan blank lines when deduplicating prerequisites.
On lines 984–985, a blank line is appended before deduplication. If the current bullet is a duplicate, the blank line remains without a corresponding item.
Suggested fix
```diff
-        if stripped.startswith("- "):
-            if prereq_items and not prereq_items[-1].startswith("- "):
-                prereq_items.append("")
-            norm = stripped.lower().strip("- .")
-            if norm not in seen_prereqs:
-                seen_prereqs.add(norm)
-                prereq_items.append(stripped)
+        if stripped.startswith("- "):
+            norm = stripped.lower().strip("- .")
+            if norm not in seen_prereqs:
+                if prereq_items and not prereq_items[-1].startswith("- "):
+                    prereq_items.append("")
+                seen_prereqs.add(norm)
+                prereq_items.append(stripped)
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@scripts/docs-to-skills.py` around lines 984 - 989, The code is appending a blank separator to prereq_items before checking for duplicates, leaving orphan blanks when the current bullet is a duplicate; in the block handling prerequisites (variables prereq_items, seen_prereqs, norm, stripped) change the order so you compute norm and check if norm is not in seen_prereqs (and add it to seen_prereqs) before you append any blank separator or the stripped line, or conditionally append the blank only when you will also append the new item — this ensures no blank lines are added for duplicate bullets.

src/lib/onboard.ts (2)
3061-3080: ⚠️ Potential issue | 🟡 Minor: Env-backed provider keys are still ignored in the interactive branch.
This block copies `NEMOCLAW_PROVIDER_KEY` into `process.env[credentialEnv]`, but then immediately calls `ensureNamedCredential(...)`, which only checks saved credentials before prompting. Interactive runs therefore re-prompt even when the alias or the provider-specific env var is already present.

Suggested fix
```diff
 } else {
-  await ensureNamedCredential(
-    credentialEnv,
-    remoteConfig.label + " API key",
-    remoteConfig.helpUrl,
-  );
+  const existingCredential = normalizeCredentialValue(process.env[credentialEnv]);
+  if (existingCredential) {
+    process.env[credentialEnv] = existingCredential;
+  } else {
+    await ensureNamedCredential(
+      credentialEnv,
+      remoteConfig.label + " API key",
+      remoteConfig.helpUrl,
+    );
+  }
 }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/lib/onboard.ts` around lines 3061 - 3080, The interactive branch is re-prompting despite copying NEMOCLAW_PROVIDER_KEY into process.env[credentialEnv]; modify the logic so that after setting process.env[credentialEnv] (via _providerKeyHint) you check if process.env[credentialEnv] is already populated and skip calling ensureNamedCredential when it is. In short: in the block guarded by isNonInteractive() === false, add a conditional that bypasses ensureNamedCredential if process.env[credentialEnv] exists (using the same credentialEnv/_providerKeyHint variables and remoteConfig for context) so existing env-backed keys are honored in interactive runs.
4521-4543: ⚠️ Potential issue | 🟠 Major: Preserve preset `access` outside the interactive picker.

This change adds per-preset access modes, but the non-interactive and resume paths still collapse tier defaults to plain preset names and then call `applyPreset(...)` without `{ access }`. A resumed onboard or CI run can therefore widen a restrictive tier back to the callee's default access, which is a security regression for the new tier model.

Also applies to: 4581-4585, 4621-4629
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/lib/onboard.ts` around lines 4521 - 4543, The resume and non-interactive flows collapse presets to plain names and call policies.applyPreset(...) without the per-preset access mode, which loses restrictive access settings; update the resume loop (where `chosen` is processed and `policies.applyPreset(sandboxName, name)` is called) to carry the preset object or access flag and call policies.applyPreset(sandboxName, presetName, { access }) (or pass the preset object) instead of only the name; likewise change the tier seeding logic that builds `suggestions` from tiers.resolveTierPresets(tierName).map(p => p.name) so it preserves each preset's access (e.g., keep the objects or map to { name, access }) and ensure any later call sites that iterate `suggestions` use the access when invoking policies.applyPreset; also make the same fix in the other instances noted (around lines referenced) so resumed/CI and non-interactive installs retain the original preset access.
🧹 Nitpick comments (7)
src/lib/debug.ts (1)
383: Keep the artifact name aligned with the probed port.

When `DASHBOARD_PORT` is overridden, this still writes `lsof-18789.txt` even though the command inspects a different port. That makes the debug bundle misleading precisely when the override is what you're debugging.

Suggested fix
```diff
-  collect(collectDir, "lsof-18789", "lsof", ["-i", `:${DASHBOARD_PORT}`]);
+  collect(collectDir, `lsof-${DASHBOARD_PORT}`, "lsof", ["-i", `:${DASHBOARD_PORT}`]);
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/lib/debug.ts` at line 383, The artifact name "lsof-18789" is hard-coded and doesn't reflect overrides of DASHBOARD_PORT; update the call to collect (the invocation using collectDir, "lsof-18789", "lsof", ["-i", `:${DASHBOARD_PORT}`]) to generate the artifact name dynamically using the DASHBOARD_PORT variable (e.g., "lsof-${DASHBOARD_PORT}") so the collected file matches the probed port; adjust any surrounding code that references the static name if present.

scripts/debug.sh (1)
328-331: Consider warning when falling back to default dashboard port.

Right now invalid `NEMOCLAW_DASHBOARD_PORT` values are silently coerced to `18789`, which can make diagnostics misleading.

Suggested patch
```diff
 _dp="$(printf '%s' "${NEMOCLAW_DASHBOARD_PORT:-18789}" | sed 's/^[[:space:]]*//;s/[[:space:]]*$//')"
 case "$_dp" in *[!0-9]* | '') _dp=18789 ;; esac
-[ "$_dp" -ge 1024 ] && [ "$_dp" -le 65535 ] 2>/dev/null || _dp=18789
+if ! ([ "$_dp" -ge 1024 ] && [ "$_dp" -le 65535 ]); then
+  warn "Invalid NEMOCLAW_DASHBOARD_PORT='${NEMOCLAW_DASHBOARD_PORT:-}' — defaulting to 18789"
+  _dp=18789
+fi
 collect "lsof-dashboard" lsof -i ":${_dp}"
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@scripts/debug.sh` around lines 328 - 331, The script silently coerces invalid NEMOCLAW_DASHBOARD_PORT into the default 18789 (variable _dp) which hides misconfiguration; update the logic around _dp validation in scripts/debug.sh so that when the env value is empty or non-numeric or out of range you emit a clear warning (e.g., to stderr) stating the provided NEMOCLAW_DASHBOARD_PORT and that it is being reset to 18789, then proceed with setting _dp and calling collect "lsof-dashboard" lsof -i ":${_dp}". Ensure the warning is emitted before the fallback assignment so it occurs for all invalid cases handled by the existing checks.

scripts/nemoclaw-start.sh (1)
158-176: Optional: deduplicate the default port and error string in this block.

This would reduce drift risk if the allowed range/default ever changes again.
Possible cleanup
```diff
+_DASHBOARD_PORT_DEFAULT=18789
+_PORT_ERR="must be an integer between 1024 and 65535"
 _DASHBOARD_PORT_RAW="${NEMOCLAW_DASHBOARD_PORT:-}"
 if [ -z "$_DASHBOARD_PORT_RAW" ]; then
-  _DASHBOARD_PORT=18789
+  _DASHBOARD_PORT=$_DASHBOARD_PORT_DEFAULT
 else
   _DASHBOARD_PORT="$(printf '%s' "$_DASHBOARD_PORT_RAW" | sed 's/^[[:space:]]*//;s/[[:space:]]*$//')"
   case "$_DASHBOARD_PORT" in
   *[!0-9]* | '')
-    echo "[SECURITY] Invalid NEMOCLAW_DASHBOARD_PORT='${NEMOCLAW_DASHBOARD_PORT}' — must be an integer between 1024 and 65535" >&2
+    echo "[SECURITY] Invalid NEMOCLAW_DASHBOARD_PORT='${NEMOCLAW_DASHBOARD_PORT}' — ${_PORT_ERR}" >&2
     exit 1
     ;;
   esac
   if [ "$_DASHBOARD_PORT" -lt 1024 ] || [ "$_DASHBOARD_PORT" -gt 65535 ]; then
-    echo "[SECURITY] Invalid NEMOCLAW_DASHBOARD_PORT='${NEMOCLAW_DASHBOARD_PORT}' — must be an integer between 1024 and 65535" >&2
+    echo "[SECURITY] Invalid NEMOCLAW_DASHBOARD_PORT='${NEMOCLAW_DASHBOARD_PORT}' — ${_PORT_ERR}" >&2
     exit 1
   fi
 fi
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@scripts/nemoclaw-start.sh` around lines 158 - 176, Introduce a single canonical default and error message and reuse them instead of repeating magic values: add a constant like _DASHBOARD_PORT_DEFAULT=18789 and an ERROR_MSG variable containing the "[SECURITY] Invalid NEMOCLAW_DASHBOARD_PORT='...'" text, then replace the literal 18789 assignment to _DASHBOARD_PORT and both echo lines that reference the hardcoded message to use those variables; update CHAT_UI_URL and PUBLIC_PORT to derive from _DASHBOARD_PORT as before and ensure you still trim/validate NEMOCLAW_DASHBOARD_PORT via _DASHBOARD_PORT_RAW/_DASHBOARD_PORT and the numeric range checks.

scripts/dev-tier-selector.js (1)
14-15: Update the referenced test filename in the header comment.

The comment points to `test/policy-tiers-onboard.test.js`; current test naming in this repo appears to use `.test.ts`. Aligning the path avoids confusion.

Suggested patch
```diff
-// of this flow see test/policy-tiers-onboard.test.js.
+// of this flow see test/policy-tiers-onboard.test.ts.
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@scripts/dev-tier-selector.js` around lines 14 - 15, The header comment in scripts/dev-tier-selector.js references the outdated test file name "test/policy-tiers-onboard.test.js"; update that comment to reference the correct current test filename "test/policy-tiers-onboard.test.ts" so the documentation matches the repository naming convention.

test/onboard-selection.test.ts (1)
2197-2264: Validate `secret: true` in the pasted-key retry test.

The test name says the flow re-prompts securely, but it only checks prompt text. Capture prompt options and assert secret input is enforced.
Suggested patch
```diff
-const answers = ["1", "", "nvapi-fake-key-value", "nvapi-good", ""];
-const messages = [];
+const answers = ["1", "", "nvapi-fake-key-value", "nvapi-good", ""];
+const messages = [];
+const prompts = [];

-credentials.prompt = async (message) => {
+credentials.prompt = async (message, opts = {}) => {
   messages.push(message);
+  prompts.push({ message, secret: opts.secret === true });
   return answers.shift() || "";
 };
@@
-  originalLog(JSON.stringify({ result, messages, lines, key: process.env.NVIDIA_API_KEY }));
+  originalLog(JSON.stringify({ result, messages, prompts, lines, key: process.env.NVIDIA_API_KEY }));
@@
   assert.ok(payload.messages.some((message) => CREDENTIAL_RETRY_PROMPT_RE.test(message)));
   assert.ok(payload.messages.some((message) => /NVIDIA Endpoints API key: /.test(message)));
+  const retryPrompt = payload.prompts.find((entry) => CREDENTIAL_RETRY_PROMPT_RE.test(entry.message));
+  assert.equal(retryPrompt?.secret, true);
+  const apiKeyPrompt = payload.prompts.find((entry) => /NVIDIA Endpoints API key: /.test(entry.message));
+  assert.equal(apiKeyPrompt?.secret, true);
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/onboard-selection.test.ts` around lines 2197 - 2264, The test currently only records prompt text via credentials.prompt and must also capture the prompt options to assert secret input is enforced; change the mocked credentials.prompt in the test around setupNim to accept both (message, options) and push an object like {message, options} into messages (instead of just the string), update uses of messages in assertions to inspect message.message and message.options, and add an assertion that the entry matching CREDENTIAL_RETRY_PROMPT_RE has options.secret === true (retain existing checks for the plaintext prompt text and other assertions such as setupNim, CREDENTIAL_RETRY_PROMPT_RE, and NVIDIA prompt matching).

scripts/lib/runtime.sh (1)
255-264: Consider extracting port validation to avoid duplication.

Port validation is performed identically in both `get_local_provider_base_url` and `check_local_provider_health`. If these functions are frequently called together, the redundant validation adds overhead.

♻️ Optional: Extract validated ports to a shared helper
If both functions are always called in the same context, consider a helper that validates once:
```diff
+_get_validated_ports() {
+  local vllm_port="${NEMOCLAW_VLLM_PORT:-8000}"
+  local ollama_port="${NEMOCLAW_OLLAMA_PORT:-11434}"
+  _validate_port NEMOCLAW_VLLM_PORT "$vllm_port" || return 1
+  _validate_port NEMOCLAW_OLLAMA_PORT "$ollama_port" || return 1
+  printf '%s %s\n' "$vllm_port" "$ollama_port"
+}
```

However, if these functions are called independently in different scripts, the current approach is safer.
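For illustration, a caller could consume the helper's single output line like this. All names here come from the suggestion above and are hypothetical, with a simplified stand-in for `_validate_port`:

```shell
#!/bin/sh
# Sketch: consuming the suggested _get_validated_ports helper, which
# prints "vllm_port ollama_port" on one line or fails non-zero.
# _validate_port here is a simplified stand-in, not the repo's version.
_validate_port() {
  # $1 = variable name (for error text), $2 = value
  case "$2" in ''|*[!0-9]*) echo "invalid $1=$2" >&2; return 1 ;; esac
  [ "$2" -ge 1 ] && [ "$2" -le 65535 ]
}

_get_validated_ports() {
  vllm_port="${NEMOCLAW_VLLM_PORT:-8000}"
  ollama_port="${NEMOCLAW_OLLAMA_PORT:-11434}"
  _validate_port NEMOCLAW_VLLM_PORT "$vllm_port" || return 1
  _validate_port NEMOCLAW_OLLAMA_PORT "$ollama_port" || return 1
  printf '%s %s\n' "$vllm_port" "$ollama_port"
}

if ports="$(_get_validated_ports)"; then
  set -- $ports   # deliberate word-split of the "vllm ollama" pair
  echo "vllm base: http://127.0.0.1:$1  ollama base: http://127.0.0.1:$2"
fi
```

Validation then happens once, and both base-URL construction and health checks read the same pair.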
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@scripts/lib/runtime.sh` around lines 255 - 264, The duplicated port validation in get_local_provider_base_url and check_local_provider_health should be extracted into a single helper: create a function (e.g. _validate_local_provider_ports) that reads NEMOCLAW_VLLM_PORT/NEMOCLAW_OLLAMA_PORT, calls _validate_port for each and returns the validated port values (or non-zero on failure); then replace the inline validation in get_local_provider_base_url and check_local_provider_health with calls to this helper and use the returned ports (vllm_port/ollama_port) to build the base URL or perform health checks, keeping original behavior and error returns.

test/e2e/test-messaging-providers.sh (1)
102-121: Function name `sandbox_exec_stdin` is misleading.

The function name implies the command is passed via stdin, but the command is actually passed as a positional argument to SSH (line 116). Only the callers at lines 337 and 374 actually pipe data into the command via stdin (using `printf '%s' "$TOKEN" | sandbox_exec_stdin "grep ..."`).

Consider renaming to `sandbox_exec_quiet` or similar to reflect that it suppresses stderr and avoids exposing args in the outer process list, rather than implying stdin-based command passing.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/e2e/test-messaging-providers.sh` around lines 102 - 121, Rename the misleading function sandbox_exec_stdin to a name that reflects its behavior (e.g., sandbox_exec_quiet); update the function definition (previously sandbox_exec_stdin) and all call sites to the new name, including the callers that actually pipe stdin (the uses like "printf '%s' \"$TOKEN\" | sandbox_exec_stdin 'grep ...'") so those pipelines still work, and update the function comment to explain it suppresses stderr, hides args from the outer process list, and still accepts a command as a positional argument. Ensure no other behavior or arguments change.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In @.agents/skills/nemoclaw-user-workspace/SKILL.md:
- Line 3: Update the description front-matter entry that currently reads "Hows
to back up and restore..." to "How to back up and restore..." in the source
documentation that generates SKILL.md (the description key in the generated
file); edit the source docs entry under the docs source so the regenerated
.agents/skills/nemoclaw-user-workspace/SKILL.md contains the corrected wording.
- Line 68: The doc references the non-existent subcommand "openshell sandbox
shell"; update the generated skill doc and source documentation to use the
correct subcommand "openshell sandbox connect" instead—specifically edit
SKILL.md (the skill doc line shown) and the workspace documentation entry (the
workspace-files.md document) to replace "openshell sandbox shell" with
"openshell sandbox connect", and scan for any remaining occurrences of
"openshell sandbox shell" to ensure consistency with the codebase (e.g.,
matching the usage found in src/lib/deploy.ts and test scripts).
In `@agents/hermes/plugin/__init__.py`:
- Around line 217-220: The call to _reload_skills() during session startup
currently ignores failures; update the startup flow that calls _reload_skills()
to handle errors explicitly by checking its return/exception: if
_reload_skills() returns a failure flag or raises, log a clear error via the
session/process logger (including error details), and abort or mark startup as
failed (e.g., raise/return a non-success status) so missing skills aren't
silent; modify the caller around the _reload_skills() invocation to perform this
check and take the appropriate failure action.
- Around line 125-133: The catch-all except in the skill reload path (inside the
try that imports agent.skill_commands and calls sc.scan_skill_commands) swallows
errors; change the broad except Exception to capture the exception as a variable
and log full details (e.g., use logger.exception or process_logger.error with
the exception) before returning None so root causes (API/runtime/metadata
issues) are visible; then update the startup hook that calls _reload_skills() to
check its return value (from _reload_skills or scan_skill_commands) and if None
log a clear error and fail fast or raise so session initialization does not
silently continue with missing skills—refer to symbols sc._skill_commands,
sc.scan_skill_commands, _reload_skills and the startup hook call.
In `@docs/network-policy/customize-network-policy.md`:
- Around line 35-39: Replace the GitHub-style blockquote that starts with
"[!IMPORTANT]" and the following three lines with a MyST admonition using the
appropriate level (e.g., use ::: warning ... :::) so the callout is rendered
correctly; locate the block containing the exact text "Make static policy edits
on the host, not inside the sandbox." and wrap that content inside a MyST
admonition (::: warning) with the same message and closing ::: to comply with
docs/** admonition conventions.
- Around line 98-100: Update the documented examples for the openshell policy
set command to match the canonical command builder and tests by inserting the
--wait flag between the policy file and the sandbox name (i.e., change
occurrences of "openshell policy set --policy <policy-file> <sandbox-name>" to
include "--wait" before the sandbox name); ensure both documented examples are
updated so they exactly mirror the behavior expected by the command builder and
test suite.
In `@nemoclaw/src/blueprint/runner.test.ts`:
- Around line 433-483: The test "does not leak parent secrets into subprocess
env" depends on ambient PATH/HOME; make it deterministic by saving
prevPath/prevHome, set process.env.PATH and process.env.HOME to explicit values
(e.g. "/usr/bin" and "/home/test") before calling actionApply, assert
subEnv.PATH and subEnv.HOME equal those exact values, and restore
prevPath/prevHome in the finally block; update references around actionApply,
mockExeca providerCall and subEnv to use these explicit values.
In `@scripts/docs-to-skills.py`:
- Around line 450-451: The code strips URL fragments into path_no_frag from
raw_path but never reattaches them when constructing fallback/rewrite links,
dropping anchors; change the logic around raw_path/path_no_frag to capture the
fragment (e.g., split once into base_path and frag = raw_path.partition("#")[2]
if any), use base_path/path_no_frag for extension checks and resolution
(functions referenced as path_no_frag and raw_path in this diff), and when
building the rewritten or fallback link append the fragment back (e.g., add
"#"+frag only if frag is non-empty) so anchors like foo.md#bar are preserved;
apply the same fragment-preservation change to the other occurrences noted
around the path handling lines (the similar use at lines 458 and 471).
In `@src/lib/model-prompts.ts`:
- Around line 141-149: The prompt input (variable choice from deps.promptFn)
needs to be trimmed before being inspected or used for defaults so
whitespace-only answers don't bypass fallbacks; call trim() on the returned
string, pass the trimmed value into deps.getNavigationChoiceFn, check navigation
against "back"/"exit" using the trimmed value, and when computing index use
parseInt(trimmedChoice || String(defaultListChoice), 10) - 1 (same change for
the similar occurrence around line 169) so empty or whitespace responses
correctly fall back to defaultListChoice and navigation handling.
In `@src/lib/onboard.ts`:
- Around line 582-591: The API_KEY_PREFIXES array is including model-id prefixes
('gpt-' and 'gemini-') which causes model names to be misclassified as secrets;
update the API_KEY_PREFIXES constant to remove "gpt-" and "gemini-" (keep only
actual secret prefixes), and ensure the token-detection logic that computes
looksLikeToken (the prefix check and the regex/no-space length checks) is
applied only for real secret patterns—i.e., rely on the existing validator
(credentialEnv/validateNvidiaApiKeyValue) when available or tighten the prefix
set so model IDs won't match; adjust the API_KEY_PREFIXES and the uses in
looksLikeToken accordingly.
In `@src/lib/skill-install.ts`:
- Around line 286-309: The mirror step can fail (mirrorFailed) but postInstall
still reports success; update the post-install logic so that if any mirror
operation fails (detect via the mirrorFailed flag set in the paths.mirrorDir
loop that uses collectFiles and sshExec) the final result returned by
postInstall is not { success: true } — instead set success: false (or include an
explicit error) and add a corresponding error message to messages so callers
know the skill may be unloadable; ensure the change references the existing
mirrorFailed variable and the postInstall return path so the failure is
propagated to callers.
- Around line 59-71: The current parseFrontmatter implementation validates and
normalizes nameValue but returns only { name: nameValue }, discarding other
frontmatter fields; update parseFrontmatter to preserve and return the full
parsed frontmatter object (the fm variable) with the validated/trimmed name
assigned back to fm.name (or typed equivalent) so all other metadata is retained
while enforcing the name validation in functions like parseFrontmatter and where
fm is used.
In `@src/lib/tiers.ts`:
- Around line 1-18: Remove the top-line //@ts-nocheck and convert CommonJS
requires to ES6 typed imports: replace const fs = require("fs"), path =
require("path"), YAML = require("yaml"), and { ROOT } = require("./runner") with
respective import statements (e.g., import * as fs from "fs"; import * as path
from "path"; import YAML from "yaml"; import { ROOT } from "./runner";) and add
appropriate TypeScript types for any variables or function signatures in this
module (e.g., typed return shapes for functions that parse YAML and typed
parameters for tier resolver functions) so the file compiles under
tsconfig.cli.json and leverages type-checking.
In `@test/e2e/test-sandbox-operations.sh`:
- Around line 569-605: The probe code hardcodes port 18789 (in the node -e
strings sent by sandbox_exec_for for probe_a and probe_b) so it ignores
NEMOCLAW_DASHBOARD_PORT overrides; update both probes to interpolate the shell
variable instead of 18789 (e.g. replace ':18789' with
':${NEMOCLAW_DASHBOARD_PORT:-18789}') so the correct dashboard port is probed
while still defaulting to 18789 when the env var is unset; adjust the strings
inside the sandbox_exec_for calls (the node -e "...${SANDBOX_B}..." and
"...${SANDBOX_A}..." commands) to include this variable.
In `@test/onboard.test.ts`:
- Around line 1000-1010: The test calling spawnSync(nodeExec, [scriptPath], ...)
can hang if startGateway() fails to exit; add a timeout to the spawnSync options
(e.g., timeout: 5000) and assert that the child either exited with a non-zero
status or was killed by the timeout so the test fails fast; update the options
object passed to spawnSync and add an assertion checking result.status !== 0 ||
result.signal === 'SIGTERM' (or result.error.code === 'ETIMEDOUT') to ensure the
gateway-failure regression fails quickly if process.exit is no longer called.
In `@uninstall.sh`:
- Around line 260-268: The current port variable _dp (set from
NEMOCLAW_DASHBOARD_PORT) can be a low-value like 1 or 80 and will be
interpolated into the pgrep pattern; update the guard around _dp to enforce the
same valid port range as src/lib/ports.ts (numeric and between 1024 and 65535)
and otherwise reset _dp to the default 18789 before constructing the pgrep -f
"openshell.*forward.*${_dp}" query; specifically, replace the permissive case
check with an explicit numeric-and-range check for _dp (or add one immediately
after local _dp=...) so only valid ports are used when calling pgrep and
potential unrelated processes are not matched.
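As a sketch, the guard described above boils down to a numeric-plus-range check before any interpolation; `valid_port` is a hypothetical helper name, while `_dp` and 18789 come from the comment itself:

```shell
#!/bin/sh
# Sketch of the uninstall.sh port guard described above: accept only
# integers in 1024-65535, otherwise fall back to the default before
# building the pgrep pattern. valid_port is a hypothetical helper.
valid_port() {
  case "$1" in
    ''|*[!0-9]*) return 1 ;;               # empty or non-numeric
  esac
  [ "$1" -ge 1024 ] && [ "$1" -le 65535 ]  # reject privileged/overflow ports
}

_dp="${NEMOCLAW_DASHBOARD_PORT:-18789}"
if ! valid_port "$_dp"; then
  _dp=18789                                # e.g. "80" or "abc" resets here
fi

echo "would run: pgrep -f \"openshell.*forward.*${_dp}\""
```

Rejecting values like `1` or `80` before interpolation keeps the `pgrep -f` pattern from matching unrelated processes.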
---
Outside diff comments:
In `@scripts/docs-to-skills.py`:
- Around line 984-989: The code is appending a blank separator to prereq_items
before checking for duplicates, leaving orphan blanks when the current bullet is
a duplicate; in the block handling prerequisites (variables prereq_items,
seen_prereqs, norm, stripped) change the order so you compute norm and check if
norm is not in seen_prereqs (and add it to seen_prereqs) before you append any
blank separator or the stripped line, or conditionally append the blank only
when you will also append the new item — this ensures no blank lines are added
for duplicate bullets.
In `@src/lib/onboard.ts`:
- Around line 3061-3080: The interactive branch is re-prompting despite copying
NEMOCLAW_PROVIDER_KEY into process.env[credentialEnv]; modify the logic so that
after setting process.env[credentialEnv] (via _providerKeyHint) you check if
process.env[credentialEnv] is already populated and skip calling
ensureNamedCredential when it is. In short: in the block guarded by
isNonInteractive() === false, add a conditional that bypasses
ensureNamedCredential if process.env[credentialEnv] exists (using the same
credentialEnv/_providerKeyHint variables and remoteConfig for context) so
existing env-backed keys are honored in interactive runs.
- Around line 4521-4543: The resume and non-interactive flows collapse presets
to plain names and call policies.applyPreset(...) without the per-preset access
mode, which loses restrictive access settings; update the resume loop (where
`chosen` is processed and `policies.applyPreset(sandboxName, name)` is called)
to carry the preset object or access flag and call
policies.applyPreset(sandboxName, presetName, { access }) (or pass the preset
object) instead of only the name; likewise change the tier seeding logic that
builds `suggestions` from tiers.resolveTierPresets(tierName).map(p => p.name) so
it preserves each preset's access (e.g., keep the objects or map to { name,
access }) and ensure any later call sites that iterate `suggestions` use the
access when invoking policies.applyPreset; also make the same fix in the other
instances noted (around lines referenced) so resumed/CI and non-interactive
installs retain the original preset access.
---
Nitpick comments:
In `@scripts/debug.sh`:
- Around line 328-331: The script silently coerces invalid
NEMOCLAW_DASHBOARD_PORT into the default 18789 (variable _dp) which hides
misconfiguration; update the logic around _dp validation in scripts/debug.sh so
that when the env value is empty or non-numeric or out of range you emit a clear
warning (e.g., to stderr) stating the provided NEMOCLAW_DASHBOARD_PORT and that
it is being reset to 18789, then proceed with setting _dp and calling collect
"lsof-dashboard" lsof -i ":${_dp}". Ensure the warning is emitted before the
fallback assignment so it occurs for all invalid cases handled by the existing
checks.
In `@scripts/dev-tier-selector.js`:
- Around line 14-15: The header comment in scripts/dev-tier-selector.js
references the outdated test file name "test/policy-tiers-onboard.test.js";
update that comment to reference the correct current test filename
"test/policy-tiers-onboard.test.ts" so the documentation matches the repository
naming convention.
In `@scripts/lib/runtime.sh`:
- Around line 255-264: The duplicated port validation in
get_local_provider_base_url and check_local_provider_health should be extracted
into a single helper: create a function (e.g. _validate_local_provider_ports)
that reads NEMOCLAW_VLLM_PORT/NEMOCLAW_OLLAMA_PORT, calls _validate_port for
each and returns the validated port values (or non-zero on failure); then
replace the inline validation in get_local_provider_base_url and
check_local_provider_health with calls to this helper and use the returned ports
(vllm_port/ollama_port) to build the base URL or perform health checks, keeping
original behavior and error returns.
In `@scripts/nemoclaw-start.sh`:
- Around line 158-176: Introduce a single canonical default and error message
and reuse them instead of repeating magic values: add a constant like
_DASHBOARD_PORT_DEFAULT=18789 and an ERROR_MSG variable containing the
"[SECURITY] Invalid NEMOCLAW_DASHBOARD_PORT='...'" text, then replace the
literal 18789 assignment to _DASHBOARD_PORT and both echo lines that reference
the hardcoded message to use those variables; update CHAT_UI_URL and PUBLIC_PORT
to derive from _DASHBOARD_PORT as before and ensure you still trim/validate
NEMOCLAW_DASHBOARD_PORT via _DASHBOARD_PORT_RAW/_DASHBOARD_PORT and the numeric
range checks.
In `@src/lib/debug.ts`:
- Line 383: The artifact name "lsof-18789" is hard-coded and doesn't reflect
overrides of DASHBOARD_PORT; update the call to collect (the invocation using
collectDir, "lsof-18789", "lsof", ["-i", `:${DASHBOARD_PORT}`]) to generate the
artifact name dynamically using the DASHBOARD_PORT variable (e.g.,
"lsof-${DASHBOARD_PORT}") so the collected file matches the probed port; adjust
any surrounding code that references the static name if present.
In `@test/e2e/test-messaging-providers.sh`:
- Around line 102-121: Rename the misleading function sandbox_exec_stdin to a
name that reflects its behavior (e.g., sandbox_exec_quiet); update the function
definition (previously sandbox_exec_stdin) and all call sites to the new name,
including the callers that actually pipe stdin (the uses like "printf '%s'
\"$TOKEN\" | sandbox_exec_stdin 'grep ...'") so those pipelines still work, and
update the function comment to explain it suppresses stderr, hides args from the
outer process list, and still accepts a command as a positional argument. Ensure
no other behavior or arguments change.
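The rename can be sketched as follows. The real helper wraps a sandbox/SSH exec, so a plain `sh -c` stands in here purely to make the shape runnable; only the name and comment change, not the behavior.

```shell
# Renamed from sandbox_exec_stdin: runs a command (positional argument)
# with stderr suppressed, keeping the command out of the outer process
# list. Stdin still flows through for callers that pipe input.
sandbox_exec_quiet() {
  sh -c "$1" 2>/dev/null
}

# Callers that pipe stdin keep working unchanged, e.g.:
#   printf '%s' "$TOKEN" | sandbox_exec_quiet 'grep -c .'
```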
In `@test/onboard-selection.test.ts`:
- Around line 2197-2264: The test currently only records prompt text via
credentials.prompt and must also capture the prompt options to assert secret
input is enforced; change the mocked credentials.prompt in the test around
setupNim to accept both (message, options) and push an object like {message,
options} into messages (instead of just the string), update uses of messages in
assertions to inspect message.message and message.options, and add an assertion
that the entry matching CREDENTIAL_RETRY_PROMPT_RE has options.secret === true
(retain existing checks for the plaintext prompt text and other assertions such
as setupNim, CREDENTIAL_RETRY_PROMPT_RE, and NVIDIA prompt matching).
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro Plus
Run ID: 243135c3-c296-41ed-816b-a854e13c91eb
⛔ Files ignored due to path filters (1)
- `package-lock.json` is excluded by `!**/package-lock.json`
📒 Files selected for processing (80)
- .agents/skills/nemoclaw-contributor-update-docs/SKILL.md
- .agents/skills/nemoclaw-maintainer-cut-release-tag/SKILL.md
- .agents/skills/nemoclaw-user-configure-inference/SKILL.md
- .agents/skills/nemoclaw-user-get-started/SKILL.md
- .agents/skills/nemoclaw-user-get-started/references/windows-setup.md
- .agents/skills/nemoclaw-user-manage-policy/SKILL.md
- .agents/skills/nemoclaw-user-reference/references/commands.md
- .agents/skills/nemoclaw-user-reference/references/troubleshooting.md
- .agents/skills/nemoclaw-user-workspace/SKILL.md
- .agents/skills/nemoclaw-user-workspace/references/workspace-files.md
- .github/workflows/nightly-e2e.yaml
- .github/workflows/sandbox-images-and-e2e.yaml
- Dockerfile
- Dockerfile.base
- agents/hermes/plugin/__init__.py
- agents/hermes/plugin/plugin.yaml
- bin/lib/ports.js
- bin/lib/tiers.js
- docs/get-started/windows-setup.md
- docs/index.md
- docs/inference/switch-inference-providers.md
- docs/inference/use-local-inference.md
- docs/network-policy/customize-network-policy.md
- docs/reference/commands.md
- docs/reference/network-policies.md
- docs/reference/troubleshooting.md
- docs/workspace/backup-restore.md
- docs/workspace/workspace-files.md
- nemoclaw-blueprint/blueprint.yaml
- nemoclaw-blueprint/policies/openclaw-sandbox.yaml
- nemoclaw-blueprint/policies/tiers.yaml
- nemoclaw/src/blueprint/runner.test.ts
- nemoclaw/src/blueprint/runner.ts
- nemoclaw/src/lib/ports.test.ts
- nemoclaw/src/lib/ports.ts
- nemoclaw/src/lib/subprocess-env.ts
- scripts/backup-workspace.sh
- scripts/debug.sh
- scripts/dev-tier-selector.js
- scripts/docs-to-skills.py
- scripts/lib/runtime.sh
- scripts/nemoclaw-start.sh
- src/lib/agent-defs.ts
- src/lib/agent-onboard.ts
- src/lib/agent-runtime.ts
- src/lib/dashboard.test.ts
- src/lib/dashboard.ts
- src/lib/debug.ts
- src/lib/http-probe.test.ts
- src/lib/http-probe.ts
- src/lib/local-inference.ts
- src/lib/model-prompts.ts
- src/lib/nim.ts
- src/lib/onboard.ts
- src/lib/policies.ts
- src/lib/ports.test.ts
- src/lib/ports.ts
- src/lib/preflight.ts
- src/lib/registry.ts
- src/lib/services.ts
- src/lib/skill-install.test.ts
- src/lib/skill-install.ts
- src/lib/subprocess-env.ts
- src/lib/tiers.ts
- src/lib/uninstall-command.test.ts
- src/lib/uninstall-command.ts
- src/lib/validation.test.ts
- src/lib/validation.ts
- src/nemoclaw.ts
- test/credential-exposure.test.ts
- test/e2e-port-overrides.sh
- test/e2e/test-messaging-providers.sh
- test/e2e/test-sandbox-operations.sh
- test/e2e/test-sandbox-survival.sh
- test/onboard-selection.test.ts
- test/onboard.test.ts
- test/policy-tiers-onboard.test.ts
- test/policy-tiers.test.ts
- test/runner.test.ts
- uninstall.sh
✅ Files skipped from review due to trivial changes (21)
- docs/reference/network-policies.md
- bin/lib/ports.js
- .agents/skills/nemoclaw-user-get-started/SKILL.md
- nemoclaw-blueprint/blueprint.yaml
- Dockerfile
- src/lib/validation.test.ts
- docs/inference/switch-inference-providers.md
- docs/workspace/backup-restore.md
- test/runner.test.ts
- .agents/skills/nemoclaw-user-get-started/references/windows-setup.md
- src/lib/policies.ts
- nemoclaw-blueprint/policies/openclaw-sandbox.yaml
- agents/hermes/plugin/plugin.yaml
- src/lib/dashboard.test.ts
- docs/reference/troubleshooting.md
- docs/get-started/windows-setup.md
- .agents/skills/nemoclaw-user-configure-inference/SKILL.md
- .agents/skills/nemoclaw-contributor-update-docs/SKILL.md
- nemoclaw-blueprint/policies/tiers.yaml
- .agents/skills/nemoclaw-user-reference/references/troubleshooting.md
- docs/inference/use-local-inference.md
| - The OpenShell CLI on your `PATH`. | ||
| - The sandbox name (shown by `nemoclaw list`). | ||
| 1. **Let the agent do it** — Ask your agent to update its persona, memory, or user context. | ||
| 2. **Edit manually** — Use `openshell sandbox shell` to open a terminal inside the sandbox and edit files directly, or use `openshell sandbox upload` to push edited files from your host. |
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Check if 'shell' is a recognized openshell sandbox subcommand
# by searching for its definition or usage in the codebase
echo "=== Searching for 'sandbox shell' in codebase ==="
rg -n "sandbox shell" --type-add 'code:*.{ts,js,sh,md}' -t code
echo ""
echo "=== Searching for openshell subcommand definitions ==="
rg -n "subcommand.*shell|shell.*subcommand" --type-add 'code:*.{ts,js}' -t code
echo ""
echo "=== Checking openshell sandbox command references ==="
rg -n "openshell sandbox (connect|shell)" --type-add 'code:*.{ts,js,sh,md}' -t code | head -30Repository: NVIDIA/NemoClaw
Length of output: 2520
Update the source documentation to use `connect` instead of `shell`.
The documentation references `openshell sandbox shell`, but the codebase consistently uses `openshell sandbox connect` (see `src/lib/deploy.ts`, test files, and scripts). There is no evidence that `shell` is a valid subcommand. Since this is a generated skill file, update the source in `docs/workspace/workspace-files.md` line 78 to use `openshell sandbox connect` instead.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In @.agents/skills/nemoclaw-user-workspace/SKILL.md at line 68, The doc
references the non-existent subcommand "openshell sandbox shell"; update the
generated skill doc and source documentation to use the correct subcommand
"openshell sandbox connect" instead—specifically edit SKILL.md (the skill doc
line shown) and the workspace documentation entry (the workspace-files.md
document) to replace "openshell sandbox shell" with "openshell sandbox connect",
and scan for any remaining occurrences of "openshell sandbox shell" to ensure
consistency with the codebase (e.g., matching the usage found in
src/lib/deploy.ts and test scripts).
| try: | ||
| import agent.skill_commands as sc | ||
| sc._skill_commands.clear() | ||
| return sc.scan_skill_commands() | ||
| except ImportError: | ||
| return None | ||
| except Exception: | ||
| return None |
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Verify broad exception handling and whether failures are observable.
rg -n -C3 '_reload_skills|except Exception|logger\.(exception|error|warning)' --type=pyRepository: NVIDIA/NemoClaw
Length of output: 28978
🏁 Script executed:
cat -n agents/hermes/plugin/__init__.py | head -250 | tail -150Repository: NVIDIA/NemoClaw
Length of output: 6880
🏁 Script executed:
head -30 agents/hermes/plugin/__init__.pyRepository: NVIDIA/NemoClaw
Length of output: 1139
🏁 Script executed:
rg -n "except Exception" agents/hermes/plugin/__init__.py -A 1Repository: NVIDIA/NemoClaw
Length of output: 330
Avoid blind exception swallowing in skill reload path; startup failures are silent.
Line 132 catches all exceptions and returns `None` without logging, hiding root causes (API drift, runtime errors, bad skill metadata) and making reload failures opaque. Additionally, line 219 calls `_reload_skills()` in the startup hook without checking the result, so failures during session initialization are completely silent while the banner implies skills are ready.
Proposed fix
+import logging
+
+logger = logging.getLogger(__name__)
+
def _reload_skills():
"""Clear the Hermes skill slash-command cache and re-scan skill directories.
...
"""
try:
import agent.skill_commands as sc
sc._skill_commands.clear()
return sc.scan_skill_commands()
except ImportError:
return None
- except Exception:
+ except Exception as exc:
+ logger.exception("Skill reload failed: %s", exc)
return NoneAnd check the result at startup:
def _on_session_start(**kwargs):
# Refresh skill cache so skills installed since last session are
# immediately available as slash commands.
- _reload_skills()
+ if _reload_skills() is None:
+ logger.warning("Failed to reload skills on session start")📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| try: | |
| import agent.skill_commands as sc | |
| sc._skill_commands.clear() | |
| return sc.scan_skill_commands() | |
| except ImportError: | |
| return None | |
| except Exception: | |
| return None | |
| try: | |
| import agent.skill_commands as sc | |
| sc._skill_commands.clear() | |
| return sc.scan_skill_commands() | |
| except ImportError: | |
| return None | |
| except Exception as exc: | |
| logger.exception("Skill reload failed: %s", exc) | |
| return None |
🧰 Tools
🪛 Ruff (0.15.10)
[warning] 132-132: Do not catch blind exception: Exception
(BLE001)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@agents/hermes/plugin/__init__.py` around lines 125 - 133, The catch-all
except in the skill reload path (inside the try that imports
agent.skill_commands and calls sc.scan_skill_commands) swallows errors; change
the broad except Exception to capture the exception as a variable and log full
details (e.g., use logger.exception or process_logger.error with the exception)
before returning None so root causes (API/runtime/metadata issues) are visible;
then update the startup hook that calls _reload_skills() to check its return
value (from _reload_skills or scan_skill_commands) and if None log a clear error
and fail fast or raise so session initialization does not silently continue with
missing skills—refer to symbols sc._skill_commands, sc.scan_skill_commands,
_reload_skills and the startup hook call.
| # Refresh skill cache so skills installed since last session are | ||
| # immediately available as slash commands. | ||
| _reload_skills() | ||
Handle reload failure explicitly during session start.
Line 219 ignores the `_reload_skills()` result. If reload fails, startup still looks healthy and the missing skills stay silent.
Proposed fix
- _reload_skills()
+ commands = _reload_skills()
+ if commands is None:
+ ctx.inject_message(
+ "Warning: skill cache refresh failed; new skills may be unavailable.",
+ role="system",
+ )📝 Committable suggestion
| # Refresh skill cache so skills installed since last session are | |
| # immediately available as slash commands. | |
| _reload_skills() | |
| # Refresh skill cache so skills installed since last session are | |
| # immediately available as slash commands. | |
| commands = _reload_skills() | |
| if commands is None: | |
| ctx.inject_message( | |
| "Warning: skill cache refresh failed; new skills may be unavailable.", | |
| role="system", | |
| ) |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@agents/hermes/plugin/__init__.py` around lines 217 - 220, The call to
_reload_skills() during session startup currently ignores failures; update the
startup flow that calls _reload_skills() to handle errors explicitly by checking
its return/exception: if _reload_skills() returns a failure flag or raises, log
a clear error via the session/process logger (including error details), and
abort or mark startup as failed (e.g., raise/return a non-success status) so
missing skills aren't silent; modify the caller around the _reload_skills()
invocation to perform this check and take the appropriate failure action.
| > [!IMPORTANT] | ||
| > Make static policy edits on the host, not inside the sandbox. | ||
| > The sandbox image is intentionally minimal and may not include editors or package-management tools. | ||
| > Changes made only inside the sandbox are also ephemeral and are lost when the sandbox is recreated. | ||
Use MyST admonition syntax for this callout.
Line 35 currently uses GitHub-style blockquote admonition. In docs pages, this should be a MyST admonition block.
Suggested doc fix
-> [!IMPORTANT]
-> Make static policy edits on the host, not inside the sandbox.
-> The sandbox image is intentionally minimal and may not include editors or package-management tools.
-> Changes made only inside the sandbox are also ephemeral and are lost when the sandbox is recreated.
+:::{warning}
+Make static policy edits on the host, not inside the sandbox.
+The sandbox image is intentionally minimal and may not include editors or package-management tools.
+Changes made only inside the sandbox are ephemeral and are lost when the sandbox is recreated.
+:::As per coding guidelines: docs/** ... Use MyST admonitions (:::{tip}, :::{note}, :::{warning}) for callouts, not bold text or blockquotes.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@docs/network-policy/customize-network-policy.md` around lines 35 - 39,
Replace the GitHub-style blockquote that starts with "[!IMPORTANT]" and the
following three lines with a MyST admonition using the appropriate level (e.g.,
use ::: warning ... :::) so the callout is rendered correctly; locate the block
containing the exact text "Make static policy edits on the host, not inside the
sandbox." and wrap that content inside a MyST admonition (::: warning) with the
same message and closing ::: to comply with docs/** admonition conventions.
| if (paths.mirrorDir) { | ||
| const { files } = collectFiles(localSkillDir); | ||
| let mirrorFailed = false; | ||
| for (const rel of files) { | ||
| const content = fs.readFileSync(path.join(localSkillDir, rel)); | ||
| const mirrorSubdir = rel.includes("/") | ||
| ? `${paths.mirrorDir}/${path.dirname(rel)}` | ||
| : paths.mirrorDir; | ||
| const mirrorFile = `${paths.mirrorDir}/${rel}`; | ||
| // mirrorDir contains $HOME which must expand, so we use double | ||
| // quotes (not shellQuote). Safe because validateRelativePath | ||
| // restricts filenames to [A-Za-z0-9._-/] before we reach here. | ||
| const result = sshExec( | ||
| ctx, | ||
| `mkdir -p "${mirrorSubdir}" && cat > "${mirrorFile}"`, | ||
| { input: content }, | ||
| ); | ||
| if (!result || result.status !== 0) { | ||
| mirrorFailed = true; | ||
| } | ||
| } | ||
| if (mirrorFailed) { | ||
| messages.push("Warning: failed to mirror some files to $HOME/.openclaw/skills/"); | ||
| } |
Do not report success when the OpenClaw mirror step fails.
The comment here says OpenClaw resolves skills from `$HOME/.openclaw/skills/`. If that copy fails, `postInstall()` still returns `{ success: true }`, so callers can report a successful install even though the skill may not be loadable.
Suggested change
if (mirrorFailed) {
- messages.push("Warning: failed to mirror some files to $HOME/.openclaw/skills/");
+ messages.push("Failed to mirror files to $HOME/.openclaw/skills/");
+ return { success: false, messages };
}📝 Committable suggestion
| if (paths.mirrorDir) { | |
| const { files } = collectFiles(localSkillDir); | |
| let mirrorFailed = false; | |
| for (const rel of files) { | |
| const content = fs.readFileSync(path.join(localSkillDir, rel)); | |
| const mirrorSubdir = rel.includes("/") | |
| ? `${paths.mirrorDir}/${path.dirname(rel)}` | |
| : paths.mirrorDir; | |
| const mirrorFile = `${paths.mirrorDir}/${rel}`; | |
| // mirrorDir contains $HOME which must expand, so we use double | |
| // quotes (not shellQuote). Safe because validateRelativePath | |
| // restricts filenames to [A-Za-z0-9._-/] before we reach here. | |
| const result = sshExec( | |
| ctx, | |
| `mkdir -p "${mirrorSubdir}" && cat > "${mirrorFile}"`, | |
| { input: content }, | |
| ); | |
| if (!result || result.status !== 0) { | |
| mirrorFailed = true; | |
| } | |
| } | |
| if (mirrorFailed) { | |
| messages.push("Warning: failed to mirror some files to $HOME/.openclaw/skills/"); | |
| } | |
| if (paths.mirrorDir) { | |
| const { files } = collectFiles(localSkillDir); | |
| let mirrorFailed = false; | |
| for (const rel of files) { | |
| const content = fs.readFileSync(path.join(localSkillDir, rel)); | |
| const mirrorSubdir = rel.includes("/") | |
| ? `${paths.mirrorDir}/${path.dirname(rel)}` | |
| : paths.mirrorDir; | |
| const mirrorFile = `${paths.mirrorDir}/${rel}`; | |
| // mirrorDir contains $HOME which must expand, so we use double | |
| // quotes (not shellQuote). Safe because validateRelativePath | |
| // restricts filenames to [A-Za-z0-9._-/] before we reach here. | |
| const result = sshExec( | |
| ctx, | |
| `mkdir -p "${mirrorSubdir}" && cat > "${mirrorFile}"`, | |
| { input: content }, | |
| ); | |
| if (!result || result.status !== 0) { | |
| mirrorFailed = true; | |
| } | |
| } | |
| if (mirrorFailed) { | |
| messages.push("Failed to mirror files to $HOME/.openclaw/skills/"); | |
| return { success: false, messages }; | |
| } |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/lib/skill-install.ts` around lines 286 - 309, The mirror step can fail
(mirrorFailed) but postInstall still reports success; update the post-install
logic so that if any mirror operation fails (detect via the mirrorFailed flag
set in the paths.mirrorDir loop that uses collectFiles and sshExec) the final
result returned by postInstall is not { success: true } — instead set success:
false (or include an explicit error) and add a corresponding error message to
messages so callers know the skill may be unloadable; ensure the change
references the existing mirrorFailed variable and the postInstall return path so
the failure is propagated to callers.
| // @ts-nocheck | ||
| // SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved. | ||
| // SPDX-License-Identifier: Apache-2.0 | ||
| // | ||
| // Tier management — load tier definitions and resolve preset selections. | ||
| // | ||
| // Tiers are defined in nemoclaw-blueprint/policies/tiers.yaml. | ||
| // Each tier is a named posture (restricted, balanced, open) that maps to | ||
| // a set of policy presets and their default access levels. | ||
| // | ||
| // The base sandbox policy is always applied regardless of tier. | ||
| // Tiers layer additional presets on top of that baseline. | ||
|
| const fs = require("fs"); | ||
| const path = require("path"); | ||
| const YAML = require("yaml"); | ||
| const { ROOT } = require("./runner"); | ||
🧩 Analysis chain
🏁 Script executed:
# Check if src/lib/tiers.ts exists and view its content
ls -la src/lib/tiers.ts 2>/dev/null && wc -l src/lib/tiers.tsRepository: NVIDIA/NemoClaw
Length of output: 144
🏁 Script executed:
# Check if tsconfig.cli.json exists
ls -la tsconfig.cli.json 2>/dev/null && cat tsconfig.cli.jsonRepository: NVIDIA/NemoClaw
Length of output: 655
🏁 Script executed:
# Search for any references to tsconfig.cli.json or CLI type-checking setup
rg "tsconfig.cli" --type ts --type jsonRepository: NVIDIA/NemoClaw
Length of output: 120
🏁 Script executed:
# Check the full content of src/lib/tiers.ts
cat -n src/lib/tiers.ts 2>/dev/null || echo "File not found"Repository: NVIDIA/NemoClaw
Length of output: 3283
Remove @ts-nocheck and convert to typed ES6 imports.
This `src/lib` module opts out of CLI type-checking and uses CommonJS `require()` for runtime imports, preventing TypeScript from validating the module dependencies and types. Convert to ES6 `import` statements and remove the `@ts-nocheck` directive to enable type-checking via `tsconfig.cli.json`. This aligns with the coding guideline: "Use TypeScript for CLI code in `src/lib/` with type-checking via `tsconfig.cli.json`."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/lib/tiers.ts` around lines 1 - 18, Remove the top-line //@ts-nocheck and
convert CommonJS requires to ES6 typed imports: replace const fs =
require("fs"), path = require("path"), YAML = require("yaml"), and { ROOT } =
require("./runner") with respective import statements (e.g., import * as fs from
"fs"; import * as path from "path"; import YAML from "yaml"; import { ROOT }
from "./runner";) and add appropriate TypeScript types for any variables or
function signatures in this module (e.g., typed return shapes for functions that
parse YAML and typed parameters for tier resolver functions) so the file
compiles under tsconfig.cli.json and leverages type-checking.
| # Use node (always available) instead of curl (removed by hardening). | ||
| # Isolation is enforced by the OpenShell proxy — blocked requests return | ||
| # HTTP 403. Connection errors (ENOTFOUND, ECONNREFUSED, TIMEOUT) also | ||
| # count as isolation. Only HTTP 200 would indicate a breach. | ||
| log " Testing: sandbox A cannot reach sandbox B by hostname..." | ||
| local probe_a | ||
| probe_a=$(sandbox_exec_for "$SANDBOX_A" "node -e \" | ||
| const http = require('http'); | ||
| const req = http.get('http://${SANDBOX_B}:18789/', (res) => { | ||
| console.log('STATUS_' + res.statusCode); | ||
| res.resume(); | ||
| }); | ||
| req.on('error', (e) => console.log('ERROR: ' + e.message)); | ||
| req.setTimeout(5000, () => { req.destroy(); console.log('TIMEOUT'); }); | ||
| \"" 2>&1) || true | ||
| if [[ -z "$probe_a" ]]; then | ||
| fail "TC-SBX-11: Isolation (A→B)" "Empty response — SSH or infrastructure failure" | ||
| elif echo "$probe_a" | grep -qiE "STATUS_403|ERROR|TIMEOUT"; then | ||
| pass "TC-SBX-11: Sandbox A cannot reach sandbox B ($(echo "$probe_a" | grep -oE 'STATUS_[0-9]+|ERROR|TIMEOUT' | head -1))" | ||
| elif echo "$probe_a" | grep -qE "STATUS_[0-9]+"; then | ||
| fail "TC-SBX-11: Isolation (A→B)" "Sandbox A reached sandbox B ($(echo "$probe_a" | grep -oE 'STATUS_[0-9]+' | head -1))" | ||
| else | ||
| fail "TC-SBX-11: Isolation (A→B)" "Unexpected probe output: $(echo "$probe_a" | head -3)" | ||
| fi | ||
| log " Testing reverse: sandbox B cannot reach sandbox A..." | ||
| local probe_b | ||
| probe_b=$(sandbox_exec_for "$SANDBOX_B" "node -e \" | ||
| const http = require('http'); | ||
| const req = http.get('http://${SANDBOX_A}:18789/', (res) => { | ||
| console.log('STATUS_' + res.statusCode); | ||
| res.resume(); | ||
| }); | ||
| req.on('error', (e) => console.log('ERROR: ' + e.message)); | ||
| req.setTimeout(5000, () => { req.destroy(); console.log('TIMEOUT'); }); | ||
| \"" 2>&1) || true |
Respect NEMOCLAW_DASHBOARD_PORT in the isolation probes.
These requests are still pinned to `18789`, so TC-SBX-11 can pass on `ECONNREFUSED`/`TIMEOUT` while never exercising the real dashboard port when a port override is configured. That turns the isolation check into a false positive for the override path.
Suggested fix
+ local dashboard_port="${NEMOCLAW_DASHBOARD_PORT:-18789}"
log " Testing: sandbox A cannot reach sandbox B by hostname..."
local probe_a
probe_a=$(sandbox_exec_for "$SANDBOX_A" "node -e \"
const http = require('http');
-const req = http.get('http://${SANDBOX_B}:18789/', (res) => {
+const req = http.get('http://${SANDBOX_B}:${dashboard_port}/', (res) => {
console.log('STATUS_' + res.statusCode);
res.resume();
});
req.on('error', (e) => console.log('ERROR: ' + e.message));
req.setTimeout(5000, () => { req.destroy(); console.log('TIMEOUT'); });
\"" 2>&1) || true
@@
log " Testing reverse: sandbox B cannot reach sandbox A..."
local probe_b
probe_b=$(sandbox_exec_for "$SANDBOX_B" "node -e \"
const http = require('http');
-const req = http.get('http://${SANDBOX_A}:18789/', (res) => {
+const req = http.get('http://${SANDBOX_A}:${dashboard_port}/', (res) => {
console.log('STATUS_' + res.statusCode);
res.resume();
});
req.on('error', (e) => console.log('ERROR: ' + e.message));
req.setTimeout(5000, () => { req.destroy(); console.log('TIMEOUT'); });
\"" 2>&1) || true📝 Committable suggestion
| # Use node (always available) instead of curl (removed by hardening). | |
| # Isolation is enforced by the OpenShell proxy — blocked requests return | |
| # HTTP 403. Connection errors (ENOTFOUND, ECONNREFUSED, TIMEOUT) also | |
| # count as isolation. Only HTTP 200 would indicate a breach. | |
| log " Testing: sandbox A cannot reach sandbox B by hostname..." | |
| local probe_a | |
| probe_a=$(sandbox_exec_for "$SANDBOX_A" "node -e \" | |
| const http = require('http'); | |
| const req = http.get('http://${SANDBOX_B}:18789/', (res) => { | |
| console.log('STATUS_' + res.statusCode); | |
| res.resume(); | |
| }); | |
| req.on('error', (e) => console.log('ERROR: ' + e.message)); | |
| req.setTimeout(5000, () => { req.destroy(); console.log('TIMEOUT'); }); | |
| \"" 2>&1) || true | |
| if [[ -z "$probe_a" ]]; then | |
| fail "TC-SBX-11: Isolation (A→B)" "Empty response — SSH or infrastructure failure" | |
| elif echo "$probe_a" | grep -qiE "STATUS_403|ERROR|TIMEOUT"; then | |
| pass "TC-SBX-11: Sandbox A cannot reach sandbox B ($(echo "$probe_a" | grep -oE 'STATUS_[0-9]+|ERROR|TIMEOUT' | head -1))" | |
| elif echo "$probe_a" | grep -qE "STATUS_[0-9]+"; then | |
| fail "TC-SBX-11: Isolation (A→B)" "Sandbox A reached sandbox B ($(echo "$probe_a" | grep -oE 'STATUS_[0-9]+' | head -1))" | |
| else | |
| fail "TC-SBX-11: Isolation (A→B)" "Unexpected probe output: $(echo "$probe_a" | head -3)" | |
| fi | |
| log " Testing reverse: sandbox B cannot reach sandbox A..." | |
| local probe_b | |
| probe_b=$(sandbox_exec_for "$SANDBOX_B" "node -e \" | |
| const http = require('http'); | |
| const req = http.get('http://${SANDBOX_A}:18789/', (res) => { | |
| console.log('STATUS_' + res.statusCode); | |
| res.resume(); | |
| }); | |
| req.on('error', (e) => console.log('ERROR: ' + e.message)); | |
| req.setTimeout(5000, () => { req.destroy(); console.log('TIMEOUT'); }); | |
| \"" 2>&1) || true | |
| # Use node (always available) instead of curl (removed by hardening). | |
| # Isolation is enforced by the OpenShell proxy — blocked requests return | |
| # HTTP 403. Connection errors (ENOTFOUND, ECONNREFUSED, TIMEOUT) also | |
| # count as isolation. Only HTTP 200 would indicate a breach. | |
| local dashboard_port="${NEMOCLAW_DASHBOARD_PORT:-18789}" | |
| log " Testing: sandbox A cannot reach sandbox B by hostname..." | |
| local probe_a | |
| probe_a=$(sandbox_exec_for "$SANDBOX_A" "node -e \" | |
| const http = require('http'); | |
| const req = http.get('http://${SANDBOX_B}:${dashboard_port}/', (res) => { | |
| console.log('STATUS_' + res.statusCode); | |
| res.resume(); | |
| }); | |
| req.on('error', (e) => console.log('ERROR: ' + e.message)); | |
| req.setTimeout(5000, () => { req.destroy(); console.log('TIMEOUT'); }); | |
| \"" 2>&1) || true | |
| if [[ -z "$probe_a" ]]; then | |
| fail "TC-SBX-11: Isolation (A→B)" "Empty response — SSH or infrastructure failure" | |
| elif echo "$probe_a" | grep -qiE "STATUS_403|ERROR|TIMEOUT"; then | |
| pass "TC-SBX-11: Sandbox A cannot reach sandbox B ($(echo "$probe_a" | grep -oE 'STATUS_[0-9]+|ERROR|TIMEOUT' | head -1))" | |
| elif echo "$probe_a" | grep -qE "STATUS_[0-9]+"; then | |
| fail "TC-SBX-11: Isolation (A→B)" "Sandbox A reached sandbox B ($(echo "$probe_a" | grep -oE 'STATUS_[0-9]+' | head -1))" | |
| else | |
| fail "TC-SBX-11: Isolation (A→B)" "Unexpected probe output: $(echo "$probe_a" | head -3)" | |
| fi | |
| log " Testing reverse: sandbox B cannot reach sandbox A..." | |
| local probe_b | |
| probe_b=$(sandbox_exec_for "$SANDBOX_B" "node -e \" | |
| const http = require('http'); | |
| const req = http.get('http://${SANDBOX_A}:${dashboard_port}/', (res) => { | |
| console.log('STATUS_' + res.statusCode); | |
| res.resume(); | |
| }); | |
| req.on('error', (e) => console.log('ERROR: ' + e.message)); | |
| req.setTimeout(5000, () => { req.destroy(); console.log('TIMEOUT'); }); | |
| \"" 2>&1) || true |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@test/e2e/test-sandbox-operations.sh` around lines 569 - 605, The probe code
hardcodes port 18789 (in the node -e strings sent by sandbox_exec_for for
probe_a and probe_b) so it ignores NEMOCLAW_DASHBOARD_PORT overrides; update
both probes to interpolate the shell variable instead of 18789 (e.g. replace
':18789' with ':${NEMOCLAW_DASHBOARD_PORT:-18789}') so the correct dashboard
port is probed while still defaulting to 18789 when the env var is unset; adjust
the strings inside the sandbox_exec_for calls (the node -e "...${SANDBOX_B}..."
and "...${SANDBOX_A}..." commands) to include this variable.
| const result = spawnSync(nodeExec, [scriptPath], { | ||
| cwd: repoRoot, | ||
| encoding: "utf-8", | ||
| env: { | ||
| ...process.env, | ||
| HOME: tmpDir, | ||
| PATH: `${fakeBin}:${process.env.PATH || ""}`, | ||
| NEMOCLAW_HEALTH_POLL_COUNT: "0", | ||
| NEMOCLAW_NON_INTERACTIVE: "1", | ||
| }, | ||
| }); |
🧩 Analysis chain
🏁 Script executed:
# First, check if the file exists and its size
wc -l test/onboard.test.tsRepository: NVIDIA/NemoClaw
Length of output: 84
🏁 Script executed:
# Check the code at lines 1000-1010
sed -n '1000,1010p' test/onboard.test.tsRepository: NVIDIA/NemoClaw
Length of output: 369
🏁 Script executed:
# Search for any existing timeout handling in this test file
rg -n "timeout|ETIMEDOUT" test/onboard.test.tsRepository: NVIDIA/NemoClaw
Length of output: 887
🏁 Script executed:
# Check if there's any spawnSync usage elsewhere in the test files to see patterns
rg -n "spawnSync" test/ -B 2 -A 5Repository: NVIDIA/NemoClaw
Length of output: 50371
🏁 Script executed:
sed -n '990,1015p' test/onboard.test.tsRepository: NVIDIA/NemoClaw
Length of output: 830
🏁 Script executed:
sed -n '960,1000p' test/onboard.test.tsRepository: NVIDIA/NemoClaw
Length of output: 1395
🏁 Script executed:
sed -n '920,950p' test/onboard.test.tsRepository: NVIDIA/NemoClaw
Length of output: 1395
Add timeout to prevent hanging in gateway failure regression test.
This test verifies that `startGateway()` exits cleanly on failure. If the underlying code stops calling `process.exit(1)`, the `spawnSync()` call will hang indefinitely. Add a timeout and an assertion so the test fails fast instead.
Suggested change

```diff
 const result = spawnSync(nodeExec, [scriptPath], {
   cwd: repoRoot,
   encoding: "utf-8",
+  timeout: 15000,
   env: {
     ...process.env,
     HOME: tmpDir,
     PATH: `${fakeBin}:${process.env.PATH || ""}`,
     NEMOCLAW_HEALTH_POLL_COUNT: "0",
     NEMOCLAW_NON_INTERACTIVE: "1",
   },
 });
+
+assert.notEqual(result.error?.code, "ETIMEDOUT", "gateway failure regression test timed out");
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```typescript
const result = spawnSync(nodeExec, [scriptPath], {
  cwd: repoRoot,
  encoding: "utf-8",
  timeout: 15000,
  env: {
    ...process.env,
    HOME: tmpDir,
    PATH: `${fakeBin}:${process.env.PATH || ""}`,
    NEMOCLAW_HEALTH_POLL_COUNT: "0",
    NEMOCLAW_NON_INTERACTIVE: "1",
  },
});

assert.notEqual(result.error?.code, "ETIMEDOUT", "gateway failure regression test timed out");
```
🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.
In `@test/onboard.test.ts` around lines 1000-1010: the test calling `spawnSync(nodeExec, [scriptPath], ...)` can hang if `startGateway()` fails to exit. Add a timeout to the spawnSync options (e.g., `timeout: 5000`) and assert that the child either exited with a non-zero status or was killed by the timeout so the test fails fast. Update the options object passed to spawnSync and add an assertion checking `result.status !== 0 || result.signal === 'SIGTERM'` (or `result.error.code === 'ETIMEDOUT'`) to ensure the gateway-failure regression fails quickly if `process.exit` is no longer called.
```diff
 local _dp="${NEMOCLAW_DASHBOARD_PORT:-18789}"
 case "$_dp" in *[!0-9]* | '') _dp=18789 ;; esac

 local -a pids=()
 local pid
 while IFS= read -r pid; do
   [ -n "$pid" ] || continue
   pids+=("$pid")
-done < <(pgrep -f 'openshell.*forward.*18789' 2>/dev/null || true)
+done < <(pgrep -f "openshell.*forward.*${_dp}" 2>/dev/null || true)
```
Validate the dashboard port range before matching PIDs.
NEMOCLAW_DASHBOARD_PORT=1 or 80 currently passes this guard and becomes part of the pgrep -f pattern, which can match and kill unrelated openshell forward processes. src/lib/ports.ts already treats only 1024..65535 as valid, so this cleanup path should reject the same bad values instead of using them.
Suggested fix

```diff
 local _dp="${NEMOCLAW_DASHBOARD_PORT:-18789}"
 case "$_dp" in *[!0-9]* | '') _dp=18789 ;; esac
+if [ "$_dp" -lt 1024 ] || [ "$_dp" -gt 65535 ]; then
+  warn "Invalid NEMOCLAW_DASHBOARD_PORT='${NEMOCLAW_DASHBOARD_PORT:-}'; falling back to 18789."
+  _dp=18789
+fi
```

📝 Committable suggestion
```shell
local _dp="${NEMOCLAW_DASHBOARD_PORT:-18789}"
case "$_dp" in *[!0-9]* | '') _dp=18789 ;; esac
if [ "$_dp" -lt 1024 ] || [ "$_dp" -gt 65535 ]; then
  warn "Invalid NEMOCLAW_DASHBOARD_PORT='${NEMOCLAW_DASHBOARD_PORT:-}'; falling back to 18789."
  _dp=18789
fi

local -a pids=()
local pid
while IFS= read -r pid; do
  [ -n "$pid" ] || continue
  pids+=("$pid")
done < <(pgrep -f "openshell.*forward.*${_dp}" 2>/dev/null || true)
```
🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.
In `@uninstall.sh` around lines 260-268: the port variable `_dp` (set from `NEMOCLAW_DASHBOARD_PORT`) can be a low value like 1 or 80 and will be interpolated into the pgrep pattern. Update the guard around `_dp` to enforce the same valid port range as `src/lib/ports.ts` (numeric and between 1024 and 65535) and otherwise reset `_dp` to the default 18789 before constructing the `pgrep -f "openshell.*forward.*${_dp}"` query. Specifically, replace the permissive case check with an explicit numeric-and-range check for `_dp` (or add one immediately after `local _dp=...`) so that only valid ports are used when calling pgrep and unrelated processes are not matched.
Hey @ColinM-sys — the checksum fix itself looks good, but this branch has accumulated 31 commits from main (merges over time), which means the diff shows 5500+ lines across 82 files instead of just your actual change. That makes it impossible to review meaningfully. Could you rebase onto current main so the diff only shows your NodeSource checksum fix? Something like:

```shell
git fetch upstream main
git rebase upstream/main
git push --force-with-lease
```

That should drop it back down to a clean, reviewable diff. We're ready to merge once the PR shows just the fix. Thanks!
…as root (NVIDIA#576)

The NodeSource setup script was piped directly from curl into `sudo bash` without any integrity verification (CWE-494). This adds SHA-256 hash verification following the same pattern already used for nvm in scripts/install.sh: download to a temp file, compute the hash, compare against the pinned expected value, and only execute if they match.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Force-pushed ca4466f to 3b09542 (Compare)
Thanks Aaron — rebased onto current main. Should be a clean diff now showing just the NodeSource checksum fix.
Signed-off-by: Test User <test@example.com>
Signed-off-by: Aaron Erickson <aerickson@nvidia.com>
ericksoa left a comment
Hash verified against live NodeSource URL, shfmt formatting fixed. Low regression risk — only affects Brev CI setup, not user-facing code.
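The "hash verified against live NodeSource URL" check above is easy to reproduce. A minimal sketch follows; `sha256_of_stream` is an illustrative helper name rather than anything from the repo, and the `curl` invocation needs network access, so it is shown as a comment.

```shell
# Print only the 64-hex-char SHA-256 digest of whatever arrives on stdin.
sha256_of_stream() {
  sha256sum | awk '{print $1}'
}

# With network access, hash the live installer and compare the result against
# the value pinned in scripts/brev-launchable-ci-cpu.sh (the digest changes
# whenever NodeSource republishes the script):
#   curl -fsSL https://deb.nodesource.com/setup_22.x | sha256_of_stream
```

A hash mismatch here is exactly the signal a reviewer wants before approving a bump to the pinned value.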
Summary

Closes #576.

The NodeSource installer in `scripts/brev-launchable-ci-cpu.sh` is downloaded via `curl | sudo bash` with no integrity verification (CWE-494). This executes arbitrary remote code as root.

Fix: follows the exact SHA-256 verification pattern already used for nvm in `scripts/install.sh:634-657` (download the installer to a temp file, compute its hash, compare against the pinned expected value, and run it with `sudo -E bash` only if the hash matches).

Note: The issue references `scripts/brev-setup.sh` and `scripts/install.sh`, but in the current codebase the only vulnerable occurrence is in `scripts/brev-launchable-ci-cpu.sh:154`. `scripts/install.sh` uses nvm (not NodeSource) and already has integrity verification.

Test plan

- `scripts/install.sh`

🤖 Generated with Claude Code
Signed-off-by: ColinM-sys <cmcdonough@50words.com>