Fix per-node Origin CA certificate provisioning #1413

Draft
simple-agent-manager[bot] wants to merge 3 commits into main from sam/execute-task-using-skill-01kvzg

@simple-agent-manager

Contributor

Summary

Fixes VM-001 (HIGH, CWE-312/522) by removing the platform-shared Cloudflare Origin CA private key from static cloud-init user-data for new VM nodes.

New nodes now:

  • generate /etc/sam/tls/origin-ca-key.pem locally during cloud-init
  • generate a CSR locally
  • call POST /api/nodes/:id/origin-ca-certificate with the node callback JWT
  • receive only the signed Origin CA certificate from the API Worker
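Because the endpoint is keyed by `:id` and authenticated with the node callback JWT, the Worker presumably has to confirm that the token belongs to the node named in the path before it signs anything. The sketch below shows that check in isolation. The claim names (`nodeId`, `scope`) and the scope value are assumptions made for illustration; the PR description does not show the real token shape, and signature verification is assumed to have happened already.

```typescript
// Hypothetical claim shape for an already-verified node callback JWT.
// The real claim names are not shown in this PR description.
interface NodeCallbackClaims {
  nodeId: string; // node the token was issued for (assumed claim name)
  scope: string;  // assumed scope marker
}

// Assumed scope value, for illustration only.
const NODE_CALLBACK_SCOPE = "node-callback";

// Returns true only when the token is a callback token for the same node
// that appears in the /api/nodes/:id/origin-ca-certificate path.
function mayRequestCertificate(
  claims: NodeCallbackClaims,
  pathNodeId: string,
): boolean {
  return claims.scope === NODE_CALLBACK_SCOPE && claims.nodeId === pathNodeId;
}
```

With a check of this kind, a token issued to one node cannot be used to obtain a certificate on behalf of another node.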

The API Worker signs the CSR through Cloudflare Origin CA using CF_API_TOKEN; it never returns or embeds a platform-wide private key.
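As a rough illustration of the signing step, the sketch below builds the outbound request the Worker could send to Cloudflare's Origin CA "create certificate" endpoint. It is not the code in this PR. The function and type names are hypothetical, and the choice of `origin-rsa`, the 5475-day validity, and sending `CF_API_TOKEN` as a Bearer header are assumptions. The CSR check is a shape check only and does not parse or verify the request.

```typescript
// Fields of the Cloudflare Origin CA certificate creation request body.
interface OriginCaSignBody {
  csr: string;
  hostnames: string[];
  request_type: "origin-rsa" | "origin-ecc";
  requested_validity: number; // days
}

// Plain description of the outbound HTTP call (hypothetical helper type).
interface OutboundRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

const CSR_HEADER = "-----BEGIN CERTIFICATE REQUEST-----";
const CSR_FOOTER = "-----END CERTIFICATE REQUEST-----";

// Cheap shape check: rejects private keys or certificates sent by mistake.
// It does not decode the ASN.1 or verify the CSR signature.
function looksLikePemCsr(pem: string): boolean {
  const trimmed = pem.trim();
  return trimmed.startsWith(CSR_HEADER) && trimmed.endsWith(CSR_FOOTER);
}

// Builds the signing request. Only the CSR (public material) leaves the node;
// only the signed certificate comes back in the Cloudflare response.
function buildSignRequest(
  csr: string,
  hostnames: string[],
  apiToken: string,
): OutboundRequest {
  if (!looksLikePemCsr(csr)) {
    throw new Error("request body is not a PEM-encoded CSR");
  }
  const body: OriginCaSignBody = {
    csr: csr.trim(),
    hostnames,
    request_type: "origin-rsa", // assumption: key type is not stated in the PR
    requested_validity: 5475,   // assumption: validity is not stated in the PR
  };
  return {
    url: "https://api.cloudflare.com/client/v4/certificates",
    method: "POST",
    headers: {
      Authorization: `Bearer ${apiToken}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify(body),
  };
}
```

Keeping the request construction pure makes it straightforward to unit test without calling Cloudflare.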

Tradeoffs

This chooses per-node key material with a node-scoped signing endpoint rather than ACME or a broader secret retrieval service. It is the smallest compatible change for the current VM routing model.

The issued certificates remain wildcard-scoped for *.BASE_DOMAIN, *.vm.BASE_DOMAIN, and BASE_DOMAIN so existing ws-* and node VM hostnames continue to work. The private key is no longer shared fleet-wide, but full hostname minimization would require a larger routing and certificate model change.
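To make the hostname scope concrete, the sketch below derives the three certificate names from a base domain and checks whether a given hostname is covered, using the usual TLS rule that `*` matches exactly one leftmost label. The function names and the example hostnames are hypothetical, and the assumption that node VM hostnames sit under `vm.BASE_DOMAIN` is inferred from the wildcard list rather than stated in the PR.

```typescript
// The three names each per-node certificate is issued for.
function certificateHostnames(baseDomain: string): string[] {
  return [`*.${baseDomain}`, `*.vm.${baseDomain}`, baseDomain];
}

// TLS-style wildcard match: "*" covers exactly one leftmost label.
function coveredBy(pattern: string, hostname: string): boolean {
  if (!pattern.startsWith("*.")) {
    return pattern === hostname;
  }
  const suffix = pattern.slice(1); // e.g. ".example.com"
  if (!hostname.endsWith(suffix)) {
    return false;
  }
  const label = hostname.slice(0, hostname.length - suffix.length);
  return label.length > 0 && !label.includes(".");
}

// True when any of the issued names covers the hostname.
function isCovered(baseDomain: string, hostname: string): boolean {
  return certificateHostnames(baseDomain).some((p) => coveredBy(p, hostname));
}

// Example hostnames below are made up for illustration.
const base = "example.com";
const checks = [
  "ws-abc123.example.com", // workspace-style name, one label under the base
  "node-1.vm.example.com", // node VM name, one label under vm.<base>
  "example.com",           // apex
  "a.b.example.com",       // two labels under the base: not covered
].map((h) => ({ hostname: h, covered: isCovered(base, h) }));
```

Every node's certificate is still valid for any hostname these patterns cover, which is the residual scope the paragraph above describes.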

apps/api/src/infra/resources/origin-ca.ts is intentionally left in place to avoid destructive protected-resource/Pulumi state churn in this PR. Runtime provisioning and deployment secret configuration no longer copy or require ORIGIN_CA_CERT / ORIGIN_CA_KEY for new nodes.

Validation

  • pnpm --filter @simple-agent-manager/cloud-init test
  • pnpm --filter @simple-agent-manager/api test -- tests/unit/services/origin-ca-certificates.test.ts tests/unit/node-callback-scope-enforcement.test.ts
  • pnpm --filter @simple-agent-manager/cloud-init typecheck
  • pnpm --filter @simple-agent-manager/api lint (existing warnings only)
  • pnpm --filter @simple-agent-manager/api typecheck
  • pnpm lint && pnpm typecheck && pnpm test && pnpm build (passed; existing warnings only)
  • post-review rerun: pnpm --filter @simple-agent-manager/api test -- tests/unit/node-callback-scope-enforcement.test.ts tests/unit/services/origin-ca-certificates.test.ts
  • post-review rerun: pnpm --filter @simple-agent-manager/api typecheck

Human Gate

Draft PR only. Do not merge yet.

Per task constraints, I did not deploy to staging and did not provision real VMs. Staging deploy, real VM provisioning, and TLS handshake verification are explicitly left for human review.

@simple-agent-manager simple-agent-manager Bot added the needs-human-review Agent could not complete all review gates — human must approve before merge label Jun 25, 2026

@codspeed-hq

codspeed-hq Bot commented Jun 25, 2026

Contributor

Merging this PR will not alter performance

✅ 6 untouched benchmarks


Comparing sam/execute-task-using-skill-01kvzg (589ca57) with main (bc159a6)

