feat: migrate to Server-Side Apply (SSA) and enforce controller owner…#269
feat: migrate to Server-Side Apply (SSA) and enforce controller owner…#269SpaceFace02 wants to merge 3 commits into
Conversation
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: SpaceFace02 The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
Reviewer's GuideMigrates key operator interactions to Kubernetes Server-Side Apply (SSA), standardizes controller ownership via controller=true ownerReferences and a shared FIELD_MANAGER, tightens AttestationKey/Machine/Secret ownership semantics, and adds focused tests around SSA bodies, ownership, and idempotency, plus a deploy helper script. Sequence diagram for approve_ak SSA and ownership transfersequenceDiagram
participant Operator as AttestationKeyController
participant K8s as KubernetesAPI
participant AK as AttestationKey
participant Machine
participant Secret
Operator->>K8s: update_status!(AttestationKeyStatus) PATCH /status (SSA)
K8s-->>Operator: updated AttestationKey
Operator->>K8s: aks.patch(Patch::Merge ownerReferences)
K8s-->>Operator: AttestationKey owned by Machine (controller=true)
Operator->>K8s: Api<Secret>.get(secret_name)
alt Secret does not exist
K8s-->>Operator: 404 NotFound
Operator->>K8s: apply_resource!(Secret, secret) PATCH (SSA)
K8s-->>Operator: Secret created (owner AttestationKey controller=true)
else Secret exists
K8s-->>Operator: existing Secret
end
Entity relationship diagram for controller ownership chainerDiagram
TrustedExecutionCluster ||--o{ Deployment : owns_controller
TrustedExecutionCluster ||--o{ Service : owns_controller
TrustedExecutionCluster ||--o{ ConfigMap : owns_controller
TrustedExecutionCluster ||--o{ Job : owns_controller
TrustedExecutionCluster ||--o{ ApprovedImage : owns_controller
Machine ||--|| AttestationKey : owns_controller
AttestationKey ||--|| Secret : owns_controller
File-Level Changes
Assessment against linked issues
Possibly linked issues
Tips and commandsInteracting with Sourcery
Customizing Your ExperienceAccess your dashboard to:
Getting Help
|
There was a problem hiding this comment.
Hey - I've found 1 issue
Prompt for AI Agents
Please address the comments from this code review:
## Individual Comments
### Comment 1
<location path="operator/src/trustee.rs" line_range="176-182" />
<code_context>
let container = pod_spec.containers.get_mut(0).context(err)?;
let vol_mounts = container.volume_mounts.get_or_insert_default();
+ // Additional check to prevent duplicate mounts.
if add {
- let (volume, volume_mount) = generate_secret_volume(id);
- pod_spec.volumes.get_or_insert_default().push(volume);
- vol_mounts.push(volume_mount);
+ let already_mounted = pod_spec
+ .volumes
+ .as_ref()
+ .is_some_and(|vs| vs.iter().any(|v| v.name == id));
+ if !already_mounted {
+ let (volume, volume_mount) = generate_secret_volume(id);
+ pod_spec.volumes.get_or_insert_default().push(volume);
</code_context>
<issue_to_address>
**issue (bug_risk):** Duplicate-volume check can prevent adding a missing volumeMount for partially configured pods
The `already_mounted` check only inspects `pod_spec.volumes`. In a partially configured pod where the volume exists but the `volumeMount` is missing, `add == true` will now skip adding both the volume and the mount, leaving the pod broken with no way for this code to self-heal. Instead, either:
- only skip when both the volume and corresponding `volumeMount` are present, or
- when the volume exists but the mount does not, add just the missing `volumeMount`.
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
3388d72 to
6683898
Compare
|
/review @Jakob-Naucke |
|
/cc @Jakob-Naucke |
|
/cc @alicefr |
| $api.patch_status($name, ¶ms, &patch).await | ||
| .map_err(Into::<anyhow::Error>::into) | ||
| }}; | ||
| ($api:ident, $name:expr, $status:expr, $type:ty, $field_manager:expr, force) => {{ |
There was a problem hiding this comment.
So this sets the component (operator, compute-pcrs) as field manager for the status, but also just uses force the only time that it could actually conflict. I don't really see the downside of having just one field manager by that point but I'm open to hearing @alicefr's opinion before you refactor.
There was a problem hiding this comment.
Previously, we had multiple writers to the Status Field of the ApprovedImage. the ComputePCR also updated the status with committed_condition (while status as PCR's are computing), and the operator requeued every 10 seconds to check whether the computation had completed and then updated the same field.
To fix this, ideally we should have only the operator update the status of the field. We can discuss this design decision in detail with Alice.
There was a problem hiding this comment.
I don't really see the downside of having just one field manager by that point
Field manager is inherently a SSA concept, initially we had 2 writers to the same field, and updated it client side. There was no concept of field managers, because the client patch would overwrite any changes, even if written previously by another writer.
There was a problem hiding this comment.
To fix this, ideally we should have only the operator update the status of the field. We can discuss this design decision in detail with Alice.
Yeah, we need to revisit the design of the approved images ideally, there should be a single writer. Let's discuss it today
| /// Generate an OwnerReference for any Kubernetes resource | ||
| pub fn generate_owner_reference<T: Resource<DynamicType = ()>>( | ||
| /// OwnerReference with `controller: true` for lifecycle and garbage collection. | ||
| pub fn generate_owner_controller_reference<T: Resource<DynamicType = ()>>( |
There was a problem hiding this comment.
not sure why this is necessary, it's not like there's a new function that this function's name must disambiguate from
There was a problem hiding this comment.
Mainly to disambiguate between a controller owner, and a linking owner.
| let api: Api<AttestationKey> = Api::default_namespaced(client.clone()); | ||
|
|
||
| // Get the TrustedExecutionCluster to use as owner reference | ||
| // TEC is owner-controller of the AttestationKey until the operator approves the key and transfers control to Machine, which then becomes the controller and manages lifecycle of the AttestationKey. |
There was a problem hiding this comment.
What's an owner-controller?
There was a problem hiding this comment.
Previously, we had the function generate_owner_reference which set controller to true.
This is ambiguous, because not all owners are controllers. A k8s object can have only 1 controller in its set of owner references (here). It can however have multiple non-controller linking owner references.
There was a problem hiding this comment.
if we keep these renames and comment changes, I think they should be a separate commit. not sure if we do though, cc @alicefr
There was a problem hiding this comment.
yes, please try to generate clean commits it gets easier to review
| } | ||
|
|
||
| #[tokio::test] | ||
| async fn test_approve_ak_full_flow() { |
There was a problem hiding this comment.
It's generally good to expand the tests here. Maybe put new tests in a separate commit because this commit's almost unreadably big
There was a problem hiding this comment.
Sure, had kept unit tests in the same commit as they test the same functionality introduced and made sense to be together. But I will split them for readability and ease of review.
|
@SpaceFace02 please sign the commit. We should add a check that verifies that all the commits are signed. Filed #270 |
|
@SpaceFace02 is there a way to split the single commit into multiple. This is kind of hard to review it |
…ship Migrate resource creation and updates from client-side POST/PUT to Server-Side Apply (SSA) where appropriate, and strengthen ownership semantics with controller references throughout. - TEC owns: Deployments, Services, ConfigMaps, Jobs, ApprovedImages - Machine owns: AttestationKey (transferred from TEC via merge patch once its approved automatically). - AttestationKey owns: Secret (with controller: true + finalizer) - Duplicate volume entries: `do_mount_secret` now checks for existing volumes before adding, preventing 422 errors on reconciler retry. Signed-off-by: Chirag Rao <crao@redhat.com> Assisted-by: Cursor
Signed-off-by: Chirag Rao <crao@redhat.com> Assisted-by: Cursor
Signed-off-by: Chirag Rao <crao@redhat.com> Assisted-by: Cursor
6683898 to
3704ff0
Compare
Have split the commits into 3: |
Jakob-Naucke
left a comment
There was a problem hiding this comment.
Thank you for the separate commits! However, when a test update is necessary to how the code changes, it should be one commit so that bisects stay intact.
| let api: Api<AttestationKey> = Api::default_namespaced(client.clone()); | ||
|
|
||
| // Get the TrustedExecutionCluster to use as owner reference | ||
| // TEC is owner-controller of the AttestationKey until the operator approves the key and transfers control to Machine, which then becomes the controller and manages lifecycle of the AttestationKey. |
There was a problem hiding this comment.
if we keep these renames and comment changes, I think they should be a separate commit. not sure if we do though, cc @alicefr
| // 1. PATCH /status (SSA status update). AttestationKey is approved. | ||
| // 2. PATCH owner transfer (Merge). Ownership transfered to Machine. | ||
| // 3. GET secret (check existence) | ||
| // 4. PATCH secret (apply_resource! SSA create) |
There was a problem hiding this comment.
the numbers and HTTP endpoints are in the code, so I think this is just drift risk. if you like you can add the comment as to what happens at those respective points.
| // SSA related unit test helpers. | ||
|
|
There was a problem hiding this comment.
can drop, it's just one and could be reused
|
PR needs rebase. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |


feat: migrate to Server-Side Apply (SSA) and enforce controller ownership
Fixes: #258
Migrate resource creation and updates from client-side POST/PUT to Server-Side Apply (SSA) where appropriate, and strengthen ownership semantics with controller references throughout.
This commit establishes ownership of various fields and CRD's, to prevent accidental overwrites or race conditions by other controllers It doesn't migrate all patches to SSA, if we plan to migrate completely to SSA, there would be significant code rework, and honestly, its not necessary for now.
All resources created by the operator now use
controller: truein ownerReferences, enabling Kubernetes garbage collection:generate_owner_referencerenamed togenerate_owner_controller_referenceto reflect that all owner refs are now controller refs withblockOwnerDeletion: true.All controllers are owners, but not all owners have to be controllers. Kubernetes mandates only one controller. Multiple owners are fine for Garbage Collection and delete cascades.
New
apply_resource!macro replacescreate_or_info_if_exists!for idempotent resource creation via SSA (PATCH with Patch::Apply).update_reference_values: uses full-data SSA patch on trustee-data ConfigMap (includes all keys: kbs-config.toml, policy.rego, reference-values.json) to prevent the field manager from implicitly relinquishing keys not in a partial patch.If a field manager (controller owning a field of a CRD) makes a SSA patch, all fields of that patch are considered owned, and any previously owned fields not part of the patch are released from its control.
update_attestation_keys: uses full-object SSA patch on the Deployment (sends entire deployment.spec) to avoid stripping required fields like spec.selector. Metadata fields (managedFields, resourceVersion, uid) are excluded from the SSA body to prevent 400 errors.do_mount_secret: kept as GET+mutate+replace (PUT) since it follows a read-modify-write pattern on the same field manager. Migrating this is significant code rework.Owner transfer in
approve_akuses Merge patch (not SSA) because SSA would upsert into the ownerReferences array, leaving stale references of both TEC. In the end TEC and Machine would co-exist as controllers, which we don't want. Merge patch replaces the entire ownerReferences field atomically.Single field manager
trusted-cluster-operatorused consistently viaFIELD_MANAGERconstant. register-server uses its ownregister-serverfield manager for Machine creation (POST with fieldManager query param, not SSA).update_status!macro now requires explicit type and field manager parameters, enabling force-apply on the status subresource where multiple writers exist (e.g. ApprovedImage status).do_mount_secretnow checks for existing volumes before adding, preventing 422 errors on reconciler retry.update_attestation_keysandsecret_reconcilenow requirecontroller: trueon AttestationKey owner refs, not justkind == "AttestationKey".Added unit tests validating SSA body content, ownership, idempotency, field manager usage, and filtering logic across trustee, reference values, attestation key register, and register server modules.
Commit message co-authored-by: Cursor cursoragent@cursor.com
Summary by Sourcery
Migrate core Kubernetes interactions to server-side apply while standardizing controller ownership and field manager usage across the operator, register-server, and auxiliary binaries.
New Features:
Bug Fixes:
Enhancements:
Tests: