diff --git a/ARCHITECTURE.md b/ARCHITECTURE.md
index 352973c..3e8f8c5 100644
--- a/ARCHITECTURE.md
+++ b/ARCHITECTURE.md
@@ -3,70 +3,4 @@
 
 The canonical version of this document now lives in [`docs/docs/architecture.md`](docs/docs/architecture.md).
 
-Use the Docusaurus site source under `docs/` for ongoing updates.
-- VM creation should rely primarily on reusable templates
-- the default storage path for downstream cluster VMs is node-local NVMe
-- cluster scaling should be handled through CAPI rather than ad hoc Proxmox operations
-- CAPI references templates that have already been published into Proxmox
-
-The desired outcome is that the clustered Proxmox layer exposes a stable substrate, while CAPI owns the actual Kubernetes cluster lifecycle.
-
-After creation, downstream clusters are treated as strongly isolated environments rather than extensions of the platform cluster.
-
-Downstream clusters may make use of BGP-advertised floating IPs through the `VP6630` when they need stable network entrypoints, but this is an available capability rather than a default requirement.
-
-## Role of GitOps
-
-`Argo CD` runs on the platform cluster and manages the desired state of the control plane.
-
-At minimum, that includes:
-
-- platform cluster applications
-- platform cluster infrastructure controllers
-- provisioning stack configuration
-
-The current design keeps `Argo CD` scoped to the platform cluster itself.
-
-This means:
-
-- the platform cluster's `Argo CD` manages only platform services and platform-owned infrastructure components
-- downstream clusters are not assumed to be centrally registered into platform `Argo CD`
-- downstream clusters are expected to manage their own services independently
-- whether a downstream cluster uses `Argo CD` or some other delivery model is left to that cluster's own design
-
-## Why This Layout
-
-This layout separates concerns in a way that matches the intended operating model:
-
-- Talos on the `UM760` keeps the control plane narrow and appliance-like
-- Proxmox on the `MS-02 Ultra` nodes provides a clustered VM substrate for downstream clusters
-- Tinkerbell handles bare-metal installation
-- Early Proxmox clustering gives CAPI a single control surface without waiting for the full shared-storage design
-- AWX fills the gap between bare-metal install and a fully configured Proxmox node
-- Packer, the NAS, and AWX together provide a clear image pipeline without pushing image management into CAPI
-- TerraKube remains available as a future addition for complementary infrastructure automation when that need becomes concrete
-- CAPI becomes the main abstraction for downstream cluster creation and scaling
-- Argo CD keeps the platform cluster declarative
-- BGP floating IPs remain a downstream-cluster capability rather than part of the platform cluster's default exposure model
-
-The design also avoids forcing too much day-one complexity into the Proxmox layer. The nodes can start as individually useful machines before later being combined into a more integrated Proxmox topology.
-
-For the same reason, the platform cluster remains single-node on the `UM760`. This accepts that `UM760` failure is a platform outage, but avoids introducing a false form of HA where multiple control-plane VMs still depend on the same underlying machine.
-
-The same rebuild-first logic applies to recovery boundaries:
-
-- the platform cluster is primarily rebuilt from Talos configuration, GitOps state, and automation
-- Proxmox hosts are primarily rebuilt through Tinkerbell and AWX
-- downstream clusters are primarily recreated through CAPI
-- NAS-backed artifacts, backups, and selected workload data form the primary durable state boundary
-
-## Likely Next Sections
-
-As the design firms up, the next useful additions to this document are likely:
-
-- bootstrap path for the `UM760`
-- Proxmox node lifecycle in more detail
-- storage model
-- network model
-- downstream cluster lifecycle
-- backup and disaster recovery boundaries
+Use the Docusaurus source under `docs/` for ongoing updates. The canonical page now includes the current DNS and naming model for `lab.gilman.io` alongside the broader platform architecture.
diff --git a/docs/docs/architecture.md b/docs/docs/architecture.md
index f9dd499..4f2d40c 100644
--- a/docs/docs/architecture.md
+++ b/docs/docs/architecture.md
@@ -25,7 +25,7 @@ At a high level:
 - AWX and/or TerraKube handle node-level post-provisioning work.
 - CAPI creates downstream Talos-based clusters as Proxmox VMs through the clustered Proxmox API surface.
 - The `DS923+` provides shared storage to the Proxmox layer.
-- The `VP6630` remains the lab router and network boundary to the home network.
+- The `VP6630` remains the lab router, DNS entrypoint, and network boundary to the home network.
 
 ## Design Intent
 
@@ -100,7 +100,7 @@ The `DS923+` is also the primary durable backup boundary in the system. Platform
 The physical network roles remain:
 
 - `CCR2004`: home router
-- `VP6630`: lab router and DMZ boundary to the home network
+- `VP6630`: lab router, DNS entrypoint, and DMZ boundary to the home network
 - `CRS309-1G-8S+IN`: lab switch
 - `TL-SG105`: dedicated Intel AMT switch for the `MS-02 Ultra` management links
 
@@ -121,6 +121,58 @@ The baseline network model is intentionally smaller than the previous lab design
 
 The provisioning network exists because Tinkerbell's DHCP and PXE flow requires Layer 2 access or DHCP relay. The current design assumes a dedicated provisioning segment rather than folding PXE traffic into the general workload path.
 
+### DNS and Naming
+
+The lab uses `lab.gilman.io` as its internal naming root rather than a private-only top-level domain.
+
+This keeps internal naming under a real domain the lab controls while still allowing private DNS views, selective public exposure later, and a clean future path for public certificates on intentionally exposed endpoints.
+
+The current intended DNS design is:
+
+- `VyOS` remains the client-facing resolver for the lab networks.
+- A `PowerDNS Authoritative` service runs as a container on the `VP6630`.
+- `VyOS` forwards the lab's internal zones to that local authoritative service.
+- Internal names are private by default; public DNS is reserved for explicitly exposed entrypoints.
+- Internal certificates are expected to use a private CA by default; public CA issuance is reserved for endpoints that benefit from public trust.
+
+Running the authoritative DNS service on the router boundary instead of inside the platform cluster avoids a bootstrap dependency where the platform control plane would need to be healthy before the lab can resolve the names used to reach it.
+
+The namespace is intentionally split by ownership boundary instead of using one flat dynamic zone:
+
+| Zone | Writers | Purpose |
+| --- | --- | --- |
+| `lab.gilman.io` | manual or GitOps-managed only | Parent zone, delegations, and a small set of static anchor records |
+| `mgmt.lab.gilman.io` | manual or GitOps-managed only | Stable management and platform service names |
+| `dhcp.lab.gilman.io` | `VyOS` DHCP via `RFC2136` | Dynamic lease-driven hostnames |
+| `<cluster>.k8s.lab.gilman.io` | `ExternalDNS` via `RFC2136` | Cluster-scoped workload and ingress names |
+
+This design keeps the management namespace stable while still allowing dynamic DNS for both DHCP clients and Kubernetes workloads. It also keeps update rights narrow: `VyOS` DHCP cannot mutate management records, and each Kubernetes cluster can be constrained to only its own delegated subzone.
+
+### Internal PKI and Trust
+
+The lab's internal PKI is designed around the same bootstrap constraint as internal DNS: the trust anchor for internal services cannot depend on the platform cluster being healthy before it can issue or rotate certificates.
+
+The current intended PKI design is:
+
+- The internal root CA key lives in `AWS KMS`.
+- The root CA is treated as operationally offline: no always-on lab service has standing permission to use it for routine issuance.
+- A `Smallstep step-ca` service runs as a container on the `VP6630` as the online intermediate CA.
+- Internal ACME is provided by that `step-ca` instance for automated certificate issuance and renewal.
+- `Vault` remains the expected long-term home for most secret management inside the platform cluster, but it is not the bootstrap owner of the internal CA hierarchy.
+
+This keeps naming and trust in the same edge-adjacent failure domain without forcing the platform cluster to come up first. If the platform cluster is down, the lab can still resolve internal names and issue or renew the certificates needed to restore that control plane.
+
+The intended trust boundary is deliberately split:
+
+| Component | Role | Notes |
+| --- | --- | --- |
+| Root CA | trust anchor | Stored in `AWS KMS`; used only for intermediate issuance and rotation |
+| `step-ca` on `VP6630` | online issuing intermediate | Handles day-to-day certificate issuance for internal services |
+| ACME clients | automated consumers | Used by `cert-manager` and other internal services that can rotate through ACME |
+| `Vault` | secret management consumer | May issue or store service-specific material later, but does not own bootstrap PKI |
+
+This design accepts that routing, internal DNS, and the online intermediate CA share the `VP6630` failure domain. That is an intentional trade for the homelab: a single edge host keeps the bootstrap path simple, while the root CA remains outside that host's routine operating privileges.
+
 ## Control Flow
 
 ### 1. Platform Bootstrap
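
Illustration for the zone-ownership table added under `### DNS and Naming`: the split by writer can be read as a small access rule, sketched below in Python. This is only a model of the intent, not the lab's configuration. It assumes the Kubernetes row denotes one delegated subzone per cluster directly under `k8s.lab.gilman.io`. The writer identifiers (`gitops`, `vyos-dhcp`, `externaldns:<cluster>`) and the hostnames in the usage lines are hypothetical. Real enforcement would presumably live in the authoritative server's dynamic-update policy rather than in application code.

```python
from typing import Optional

# Zones with a single fixed writer, taken from the ownership table.
# The writer labels are hypothetical identifiers, not real key names.
STATIC_ZONES = {
    "lab.gilman.io": "gitops",
    "mgmt.lab.gilman.io": "gitops",
    "dhcp.lab.gilman.io": "vyos-dhcp",
}

# Assumed parent of the per-cluster delegated subzones.
K8S_PARENT = "k8s.lab.gilman.io"


def owning_zone(fqdn: str) -> Optional[str]:
    """Return the most specific zone that contains fqdn, or None."""
    name = fqdn.rstrip(".").lower()
    suffix = "." + K8S_PARENT
    if name.endswith(suffix):
        # The label directly left of k8s.lab.gilman.io names the cluster.
        cluster = name[: -len(suffix)].split(".")[-1]
        return f"{cluster}.{K8S_PARENT}" if cluster else None
    # Longest-suffix match among the fixed zones.
    best = None
    for zone in STATIC_ZONES:
        if name == zone or name.endswith("." + zone):
            if best is None or len(zone) > len(best):
                best = zone
    return best


def may_update(writer: str, fqdn: str) -> bool:
    """True if writer is the one party allowed to change records for fqdn."""
    zone = owning_zone(fqdn)
    if zone is None:
        return False
    if zone.endswith("." + K8S_PARENT):
        # Each cluster's ExternalDNS may write only inside its own subzone.
        return writer == f"externaldns:{zone.split('.')[0]}"
    return STATIC_ZONES[zone] == writer


# Usage with hypothetical hostnames:
print(may_update("vyos-dhcp", "laptop.dhcp.lab.gilman.io"))
print(may_update("vyos-dhcp", "nas.mgmt.lab.gilman.io"))
print(may_update("externaldns:prod", "app.dev.k8s.lab.gilman.io"))
```

The longest-suffix match matters because `mgmt.lab.gilman.io` and `dhcp.lab.gilman.io` are both suffixed by the parent zone. Without it, a name under `dhcp` would also appear to belong to `lab.gilman.io`, and the narrow update rights described in the diff would blur.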