Implement sourceos-agent local runtime CLI and doctor commands #20

@mdheller

Context

The Mac node-commander incident exposed a platform gap: SourceOS needs a reusable local-agent control surface for Nix-governed services, launchd/systemd persistence, Podman runtime checks, auth isolation, observability, and quarantine.

Canonical spec: specs/local-agent-runtime.md in SourceOS-Linux/sourceos-spec.

Problem

A service was installed as a root-owned, system-wide LaunchAgent. It invoked a Nix store wrapper that called Podman, depended on noninteractive registry auth, wrote logs to /tmp, and had unbounded launch behavior. The failure path produced thousands of runs and looked like hostile persistence.

Deliverables

Implement a sourceos-agent CLI with:

  • sourceos-agent preflight <name>
  • sourceos-agent doctor <name>
  • sourceos-agent status <name>
  • sourceos-agent logs <name>
  • sourceos-agent install <name>
  • sourceos-agent stage <name>
  • sourceos-agent start <name>
  • sourceos-agent stop <name>
  • sourceos-agent restart <name>
  • sourceos-agent quarantine <name>
  • sourceos-agent uninstall <name>

Also add sourceos doctor local-runtime.
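As a starting point for discussion, the subcommand surface listed above could be sketched with argparse. This is a minimal sketch, not a committed design: the subcommand names come from the list above, while `build_parser`, `main`, and the placeholder dispatch are hypothetical names introduced for illustration.

```python
import argparse

# Subcommand names are taken from the Deliverables list.
# Everything else in this sketch is illustrative.
SUBCOMMANDS = (
    "preflight", "doctor", "status", "logs", "install", "stage",
    "start", "stop", "restart", "quarantine", "uninstall",
)


def build_parser() -> argparse.ArgumentParser:
    """Build a parser where every subcommand takes one positional <name>."""
    parser = argparse.ArgumentParser(prog="sourceos-agent")
    sub = parser.add_subparsers(dest="command", required=True)
    for command in SUBCOMMANDS:
        p = sub.add_parser(command)
        p.add_argument("name", help="agent name, e.g. node-commander")
    return parser


def main(argv=None) -> int:
    args = build_parser().parse_args(argv)
    # Placeholder dispatch (assumption): a real implementation would
    # route to a per-command handler here.
    print(f"{args.command}: {args.name} (not implemented)")
    return 0
```

Calling `main(["quarantine", "node-commander"])` parses the command and agent name and returns 0; an unknown subcommand makes argparse exit with a usage error.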

Required checks

The CLI must report:

  • launchd/systemd backend state
  • disabled override or stale service labels
  • plist/unit lint and permissions
  • Nix generation/source when available
  • Podman binary/version
  • Podman machine existence/running/socket state
  • local image presence
  • container state including Stopping
  • auth mode and host credential-helper risk
  • log paths and last exit reason
  • suspicious run/restart counts
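The "auth mode and host credential-helper risk" check could look roughly like the sketch below. It assumes the host config is a Docker-style JSON file using the `credsStore`, `credHelpers`, and `auths` keys, and that an empty JSON object stands in for the empty auth file. The function names `credential_helper_risk` and `redact_auth_config` are hypothetical.

```python
import json


def credential_helper_risk(config_text: str) -> list[str]:
    """List credential helpers declared in a Docker-style auth config."""
    config = json.loads(config_text)
    findings = []
    if config.get("credsStore"):
        findings.append(f"credsStore={config['credsStore']}")
    for registry, helper in sorted(config.get("credHelpers", {}).items()):
        findings.append(f"credHelpers[{registry}]={helper}")
    return findings


def redact_auth_config(config_text: str) -> str:
    """Return the config with every per-registry auth entry replaced.

    Registry names are kept so a quarantine bundle still shows which
    registries were configured, but no credential material survives.
    """
    config = json.loads(config_text)
    if "auths" in config:
        config["auths"] = {registry: "<redacted>" for registry in config["auths"]}
    return json.dumps(config, indent=2, sort_keys=True)
```

A non-empty result from `credential_helper_risk` on the host config would be reported as a risk, while the same call on the declared empty auth file should return no findings.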

Acceptance criteria

  • A stopped/refusing Podman machine produces one clear preflight failure and does not install active persistence.
  • A host Docker config with Google credential helpers does not poison the local runtime when empty-authfile is declared.
  • A local image can run with --authfile sourceos-empty-auth.json.
  • KeepAlive=true, /tmp logs, direct registry runtime images, and raw podman run persistence are flagged.
  • sourceos-agent quarantine node-commander captures service definition, logs, podman state, image metadata, and redacted auth config.
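Part of the flagging criterion above (KeepAlive=true, /tmp logs, raw podman run persistence) can be sketched as a launchd plist lint using Python's plistlib. This is a sketch under assumptions: `lint_launchd_plist` is a hypothetical name, the raw-podman check only sees a direct `podman run` in ProgramArguments and would miss a wrapper script, and the direct-registry-image check is left out.

```python
import os
import plistlib


def lint_launchd_plist(data: bytes) -> list[str]:
    """Return findings for risky settings in a launchd plist."""
    plist = plistlib.loads(data)
    findings = []

    # KeepAlive can also be a dict of conditions; only the plain
    # boolean true (unconditional relaunch) is flagged here.
    if plist.get("KeepAlive") is True:
        findings.append("KeepAlive=true")

    for key in ("StandardOutPath", "StandardErrorPath"):
        path = str(plist.get(key, ""))
        if path.startswith(("/tmp/", "/private/tmp/")):
            findings.append(f"{key} under /tmp: {path}")

    args = plist.get("ProgramArguments", [])
    if (len(args) >= 2
            and os.path.basename(str(args[0])) == "podman"
            and args[1] == "run"):
        findings.append("raw podman run persistence")

    return findings
```

An empty list means none of these three patterns were found; it does not mean the service definition is otherwise safe.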
