Skip to content

fix(platform): tolerate older VPS bundles without Symphony#230

Open
HamedMP wants to merge 3 commits into
mainfrom
fix/vps-recovery-optional-symphony
Open

fix(platform): tolerate older VPS bundles without Symphony#230
HamedMP wants to merge 3 commits into
mainfrom
fix/vps-recovery-optional-symphony

Conversation

@HamedMP
Copy link
Copy Markdown
Owner

@HamedMP HamedMP commented May 28, 2026

Summary

  • Treat the Symphony host binary as optional during customer VPS cloud-init bootstrap.
  • Enable/start matrix-symphony.service only when both the binary and unit exist in the extracted host bundle.
  • Keep core restore/gateway/shell/sync services fail-closed and required.

Incident Context

Nima's recovery VM booted an older bundle (v2026.05.24-082-staging-ff593047) that did not include /opt/matrix/bin/matrix-symphony. Current cloud-init treated that binary as required, so setup aborted before gateway startup and platform registration.

Invariants

  • Source of truth: platform user_machines remains canonical for VPS state; this only changes bootstrap service selection.
  • Lock/transaction scope: no database writes or external platform state changes in this PR.
  • Acceptable orphan states: older bundles without Symphony still boot core services; Symphony remains disabled until a bundle ships the binary and unit.
  • Auth source of truth: unchanged; gateway/platform bearer token checks are preserved.
  • Deferred scope: does not change stale recovery reconciliation or registration-token retry behavior.

Verification

  • bun run test tests/deploy/customer-vps/symphony-systemd.test.ts

@vercel
Copy link
Copy Markdown

vercel Bot commented May 28, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
matrix-os-www Ready Ready Preview, Comment May 28, 2026 6:44pm

Request Review

@chatgpt-codex-connector
Copy link
Copy Markdown

You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard.

@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented May 28, 2026

Greptile Summary

This PR makes matrix-symphony optional during customer VPS cloud-init bootstrap by moving it from the required_bin chmod loop to optional_bin, and gating systemctl enable/start behind a runtime check for both the binary and its systemd unit file. Core services (matrix-restore, matrix-gateway, matrix-shell, matrix-code, matrix-sync-agent) remain fail-closed and always enabled.

  • cloud-init.yaml: matrix-symphony removed from required_bin (which would chmod and fail on absence) and added to optional_bin; a new [ -x binary ] && [ -f unit ] guard populates $optional_services before the systemctl enable/start calls.
  • Tests: symphony-systemd.test.ts and customer-vps-cloud-init.test.ts updated to assert the new variable-based service selection pattern; the if-block guard strings are verified by toContain checks on optional_services="$optional_services matrix-symphony.service".

Confidence Score: 5/5

Safe to merge — the change is narrowly scoped to making one optional service truly optional, with no impact on core service startup.

The shell logic is correct: unquoted empty $optional_services is silently discarded by word splitting, so systemctl enable and systemctl start receive only the intended arguments whether or not Symphony is present. The binary+unit existence guard fires after the install and daemon-reload steps, so the check accurately reflects what's on disk. Core services remain unconditionally fail-closed. Tests are updated to cover the new variable-based pattern.

No files require special attention.

Important Files Changed

Filename Overview
distro/customer-vps/cloud-init.yaml Moves matrix-symphony from required_bin to optional_bin chmod loop and gates systemctl enable/start behind a binary+unit file existence check; base core services are unconditionally enabled.
tests/deploy/customer-vps/symphony-systemd.test.ts Updated assertions to match the new variable-based service enablement pattern; replaces old exact-string systemctl assertions with checks on the new conditional logic strings.
tests/platform/customer-vps-cloud-init.test.ts Two test cases updated to use the new base_services/optional_services variable pattern instead of the previously hardcoded systemctl enable string.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[cloud-init runcmd starts] --> B[Install systemd units from bundle\n/opt/matrix/systemd/*.service]
    B --> C[chmod required_bin loop\nmatrixctl, matrix-gateway, matrix-shell,\nmatrix-code, matrix-sync-agent,\nmatrix-install-hermes]
    C --> D[chmod optional_bin loop\nmatrix-symphony, linux-tools,\nmessaging binaries]
    D --> E{binary exists?}
    E -->|exists| F[chmod 0750]
    E -->|missing| G[skip silently]
    F --> H[systemctl daemon-reload]
    G --> H
    H --> I[base_services=\nmatrix-restore, gateway,\nshell, code, sync-agent]
    I --> J{binary -x AND unit -f?}
    J -->|YES| K[optional_services= matrix-symphony.service]
    J -->|NO| L[optional_services=empty, log skip]
    K --> M[systemctl enable base+optional+linux-tools+db-backup+nginx]
    L --> M
    M --> N[systemctl start base+optional]
    N --> O[Core platform running\nSymphony only if bundle includes it]
Loading

Reviews (2): Last reviewed commit: "test(platform): update optional Symphony..." | Re-trigger Greptile

expect(cloudInit).toContain("matrix-symphony");
expect(cloudInit).toContain("systemctl enable matrix-restore.service matrix-gateway.service matrix-shell.service matrix-code.service matrix-sync-agent.service matrix-symphony.service");
expect(cloudInit).toContain("systemctl start matrix-restore.service matrix-gateway.service matrix-shell.service matrix-code.service matrix-sync-agent.service matrix-symphony.service");
expect(cloudInit).not.toContain("required_bin in matrixctl matrix-db-backup.sh matrix-restore.sh matrix-gateway matrix-shell matrix-code matrix-sync-agent matrix-symphony");
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Negative assertion is a partial-match guard

The not.toContain check uses a substring that stops at matrix-sync-agent matrix-symphony (omitting matrix-install-hermes). It correctly prevents matrix-symphony from re-appearing in the required list because any required-list string ending in ...matrix-sync-agent matrix-symphony would match. However, if matrix-symphony were accidentally re-inserted after matrix-install-hermes (e.g., ...matrix-install-hermes matrix-symphony), this assertion would still pass silently since the checked fragment isn't present. A toContain asserting the exact current required_bin string would catch both directions of regression.

Prompt To Fix With AI
This is a comment left during a code review.
Path: tests/deploy/customer-vps/symphony-systemd.test.ts
Line: 43

Comment:
**Negative assertion is a partial-match guard**

The `not.toContain` check uses a substring that stops at `matrix-sync-agent matrix-symphony` (omitting `matrix-install-hermes`). It correctly prevents `matrix-symphony` from re-appearing in the required list because any required-list string ending in `...matrix-sync-agent matrix-symphony` would match. However, if `matrix-symphony` were accidentally re-inserted *after* `matrix-install-hermes` (e.g., `...matrix-install-hermes matrix-symphony`), this assertion would still pass silently since the checked fragment isn't present. A `toContain` asserting the exact current required_bin string would catch both directions of regression.

How can I resolve this? If you propose a fix, please make it concise.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant