operator: KBS API for LUKS key registration#248
Conversation
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: iroykaufman The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
Jakob-Naucke
left a comment
There was a problem hiding this comment.
Thank you for creating this, and thank you for already creating a test. Make sure that linting & build/unit tests pass.
| pub async fn launch_trustee_sync_controller(client: Client) { | ||
| let deployments: Api<Deployment> = Api::default_namespaced(client.clone()); | ||
| let watcher_config = watcher::Config { | ||
| label_selector: Some("app=kbs".to_string()), |
There was a problem hiding this comment.
nit: maybe a constant also used in generate_kbs_{service,deployment} is better
| use serde::{Serialize, Serializer}; | ||
| use serde_json::{Value::String as JsonString, json}; | ||
| use std::collections::BTreeMap; | ||
| use kbs_client; |
There was a problem hiding this comment.
Is this going to be https://github.com/confidential-containers/trustee/tree/main/tools/kbs-client?
dc726d8 to
c5ec81d
Compare
|
The CI is failing because we need to have some Perl dependency. I solved this by adding this |
|
@iroykaufman these tests run in (Ubuntu) GHA containers that aren't affected by the Containerfile (this would get picked up in integration tests once I manage to fix them). You'll need to add them to the steps in steps:
- name: "Install OpenSSL dependencies"
run: apt-get install -y … |
|
@iroykaufman whoops, you can set your own build container, and we do, and it's a Fedora container, so it should be |
|
@iroykaufman thanks for the updates. integration tests are failing because these packages are missing there too. in my intuition, those shouldn't need to be built on the host at all so let me look into that first. e: before the next push you can also still look into the lint failures |
Thanks, let me know and then I'll push with the lint fix |
|
@Jakob-Naucke maybe update the container file in buildroot with the openssl dependency is better then adding them in the github workflow files. WDYT? |
Yes. I got confused by first thinking we were on GHA's Ubuntu containers, but the buildroot is the better place. |
|
and the CI host does need these to compile the integration tests, which is slightly not how I thought cargo dependencies worked, but alas, these packages are now installed |
|
I deleted the changes for the github workflow and opened PR#9. When this is merged, the CI should work |
|
@sourcery-ai review |
Reviewer's GuideImplements KBS API–based LUKS key management so trustee no longer needs its deployment patched on each key update, introduces an auth keypair/secret for KBS, adds a controller to resync machine LUKS keys when trustee pods restart, switches KBS storage to kvstorage, and adds an end‑to‑end test covering LUKS key sync and deletion. Sequence diagram for LUKS key registration via KBS APIsequenceDiagram
actor User as ClusterOperator
participant K8s as KubernetesAPI
participant RegSrv as RegisterServerController
participant Trustee as TrusteeModule
participant KBS as KBSService
User->>K8s: Create/Update Machine
K8s-->>RegSrv: Machine event
RegSrv->>RegSrv: keygen_reconcile
RegSrv->>Trustee: generate_secret(client, id, owner_reference)
Trustee->>K8s: Create Secret id (root key)
K8s-->>Trustee: Secret id created
RegSrv->>Trustee: send_secret(client, id)
Trustee->>K8s: Get Secret id
K8s-->>Trustee: Secret id (root data)
Trustee->>K8s: Get Secret trustee-auth
K8s-->>Trustee: Secret trustee-auth (private.key)
Trustee->>KBS: set_resource(url, auth_key, resource_bytes, path, [])
KBS-->>Trustee: 201 Created
Trustee-->>RegSrv: Ok
RegSrv-->>K8s: Reconcile result Action::await_change()
Sequence diagram for LUKS key resync on trustee restartsequenceDiagram
participant K8s as KubernetesAPI
participant SyncCtl as TrusteeSyncController
participant TrusteeDep as TrusteeDeployment
participant Trustee as TrusteeModule
participant KBS as KBSService
Note over TrusteeDep,K8s: TrusteeDeployment restarted
K8s-->>SyncCtl: Deployment event (label app=kbs)
SyncCtl->>SyncCtl: trustee_deployment_reconcile
SyncCtl->>TrusteeDep: Read status (ready_replicas, replicas)
TrusteeDep-->>SyncCtl: Status ready
SyncCtl->>Trustee: sync_all_machine_luks_key(client)
Trustee->>K8s: List Secrets (default namespace)
K8s-->>Trustee: Secrets owned by Machines
loop For each machine Secret id
Trustee->>Trustee: send_secret(client, id)
Trustee->>K8s: Get Secret id
K8s-->>Trustee: Secret id (root data)
Trustee->>K8s: Get Secret trustee-auth
K8s-->>Trustee: Secret trustee-auth (private.key)
Trustee->>KBS: set_resource(url, auth_key, resource_bytes, path, [])
KBS-->>Trustee: Response
end
Trustee-->>SyncCtl: Ok
SyncCtl-->>K8s: Action::await_change()
Sequence diagram for LUKS key deletion via KBS APIsequenceDiagram
participant K8s as KubernetesAPI
participant RegSrv as RegisterServerController
participant Trustee as TrusteeModule
participant KBS as KBSService
K8s-->>RegSrv: Machine deletion event
RegSrv->>RegSrv: keygen_reconcile finalizer
RegSrv->>Trustee: delete_secret(client, id)
Trustee->>K8s: Get Secret trustee-auth
K8s-->>Trustee: Secret trustee-auth (private.key)
Trustee->>KBS: delete_resource(url, auth_key, path, [])
KBS-->>Trustee: 200 OK
Trustee-->>RegSrv: Ok
RegSrv-->>K8s: Finalizer completed (Action::await_change())
Class diagram for new Ed25519 key pair and trustee auth secret generationclassDiagram
class Ed25519KeyPair {
+Vec~u8~ private_key_pem
+Vec~u8~ public_key_pem
}
class TrusteeModule {
+generate_ed25519_key_pair() Result~Ed25519KeyPair~
+generate_trustee_auth_keys_secret(client, owner_reference) Result~()~
+get_auth_key(client) Result~String~
+send_secret(client, id) Result~()~
+delete_secret(client, id) Result~()~
+sync_all_machine_luks_key(client) Result~()~
+launch_trustee_sync_controller(client) void
}
TrusteeModule --> Ed25519KeyPair : creates
File-Level Changes
Tips and commandsInteracting with Sourcery
Customizing Your ExperienceAccess your dashboard to:
Getting Help
|
There was a problem hiding this comment.
Hey - I've found 2 issues, and left some high level feedback:
- In
launch_trustee_sync_controller, the watcherlabel_selectoris set as"app={KBS_LABEL_SELECTOR}", which will not expand the constant and thus never match your KBS pods; this should be built withformat!("app={KBS_LABEL_SELECTOR}")or similar. - The new
test_luks_key_syncrelies on grepping operator logs for specific hard-coded message substrings (including the exact count"Syncing 2 machine luks key to KBS"), which makes the test brittle to benign log or wording changes; consider asserting behavior via the KBS state or Kubernetes resources instead of log contents where possible.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- In `launch_trustee_sync_controller`, the watcher `label_selector` is set as `"app={KBS_LABEL_SELECTOR}"`, which will not expand the constant and thus never match your KBS pods; this should be built with `format!("app={KBS_LABEL_SELECTOR}")` or similar.
- The new `test_luks_key_sync` relies on grepping operator logs for specific hard-coded message substrings (including the exact count `"Syncing 2 machine luks key to KBS"`), which makes the test brittle to benign log or wording changes; consider asserting behavior via the KBS state or Kubernetes resources instead of log contents where possible.
## Individual Comments
### Comment 1
<location path="operator/src/trustee.rs" line_range="244" />
<code_context>
+pub async fn launch_trustee_sync_controller(client: Client) {
+ let deployments: Api<Deployment> = Api::default_namespaced(client.clone());
+ let watcher_config = watcher::Config {
+ label_selector: Some("app={KBS_LABEL_SELECTOR}".to_string()),
+ ..Default::default()
+ };
</code_context>
<issue_to_address>
**issue (bug_risk):** The label selector string is not interpolating `KBS_LABEL_SELECTOR` and will not match any pods.
Because the selector is the literal string `"app={KBS_LABEL_SELECTOR}"`, the controller will never match any `kbs`-labelled resources. Build the selector string dynamically (e.g. `format!("app={}", KBS_LABEL_SELECTOR)` or a const with the expanded value) so it actually aligns with the deployment and service labels.
</issue_to_address>
### Comment 2
<location path="tests/trusted_execution_cluster.rs" line_range="604-613" />
<code_context>
+ // Delete machine1 and verify its secret is removed from both K8s and KBS
</code_context>
<issue_to_address>
**issue (testing):** Also assert that the machine Secret is deleted from Kubernetes, not only from KBS logs
Right now the test only checks the operator logs for `"Secret {id1} deleted successfully"`, which covers the KBS side. To match the comment and fully validate behavior, please also assert that the corresponding Kubernetes Secret no longer exists (for example, `secrets_api.get(&machine1_uuid).await` returns `NOT_FOUND` or fails via the Poller).
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
| // Delete machine1 and verify its secret is removed from both K8s and KBS | ||
| machines | ||
| .delete(&machine1_name, &Default::default()) | ||
| .await?; | ||
| test_ctx.info(format!("Deleted Machine 1: {machine1_name}")); | ||
|
|
||
| let poller = Poller::new() | ||
| .with_timeout(Duration::from_secs(60)) | ||
| .with_interval(Duration::from_secs(2)) | ||
| .with_error_message("Machine1 secret not deleted from KBS".to_string()); |
There was a problem hiding this comment.
issue (testing): Also assert that the machine Secret is deleted from Kubernetes, not only from KBS logs
Right now the test only checks the operator logs for "Secret {id1} deleted successfully", which covers the KBS side. To match the comment and fully validate behavior, please also assert that the corresponding Kubernetes Secret no longer exists (for example, secrets_api.get(&machine1_uuid).await returns NOT_FOUND or fails via the Poller).
There was a problem hiding this comment.
Yes, please don't check the trustee log, it is a bad practice. But rather, try to evaluate if the objects exist/have been deleted
There was a problem hiding this comment.
Good point, agreed on avoiding trustee log inspection.
For this test we can keep the KBS side validation via operator logs (since the operator is the actor performing the deletions), and add an explicit check on the Kubernetes Secret itself. Something along these lines at the end of the test:
// Verify the machine1 Secret is gone from Kubernetes
let poller = Poller::new()
.with_timeout(Duration::from_secs(60))
.with_interval(Duration::from_secs(2))
.with_error_message("Machine1 secret not deleted from Kubernetes".to_string());
poller
.poll_async(|| {
let secrets_api = secrets_api.clone();
let id1 = machine1_uuid.clone();
async move {
match secrets_api.get(&id1).await {
Ok(_) => Err(anyhow::anyhow!("Machine1 secret still exists")),
Err(kube::Error::Api(ae)) if ae.code == 404 => Ok(()),
Err(e) => Err(e.into()),
}
}
})
.await?;That way we:
- Assert the Secret resource is actually deleted from the cluster.
- Avoid depending on trustee logs entirely.
If you’d prefer, we can also drop the KBS log-based assertion and instead exercise the KBS client via a direct call (if we expose such a helper in the test harness), but that’s a larger change than this PR seems to aim for.
a751ea3 to
eeb4005
Compare
Jakob-Naucke
left a comment
There was a problem hiding this comment.
- If changes to the base branch made a rebase necessary, the usual practice is to fix and keep the existing commits, so I think you can squash in this instance.
- Pushing new buildroots got auto-disabled due to inactivity (the more you know). I re-enabled it so FindBin etc. are hopefully there on next attempt.
|
|
||
| [admin] | ||
| type = "DenyAll" | ||
| type = "Simple" |
There was a problem hiding this comment.
Sorry about this one, didn't see it coming but you fixed it :)
This is still WIP so I'll squash it into one commit after everything is fixed. |
9c68c4b to
7dd60c7
Compare
f62a749 to
5caad14
Compare
fbfdcd0 to
5e0e814
Compare
Jakob-Naucke
left a comment
There was a problem hiding this comment.
To not break bisects, the Trustee bump should be in the same commit as the API interaction right? It's good that the tests remain separate like you put it though.
The API should also work with v0.17.0, but we are using a newer version of kbs-client this can be a problem, so I'll put them in the same commit. I need to put the three last commits(without the tests) together hupfuly is will be clear to read. If you have any other ideas to make it better, let me know |
| const TRUSTED_AK_KEYS_VOLUME: &str = "trusted-ak-keys"; | ||
| const TRUSTED_AK_KEYS_DIR: &str = "/etc/tpm/trusted_ak_keys"; | ||
| const TRUSTEE_AUTH_SECRET: &str = "trustee-auth"; | ||
| const TRUSTEE_AUTH_KEY_DIR: &str = "/key"; |
There was a problem hiding this comment.
nit: perhaps better /opt/trustee/keys or /etc/trustee/keys
| @@ -1,37 +1,36 @@ | |||
| [http_server] | |||
| sockets = ["0.0.0.0:8080"] | |||
| sockets = ["0.0.0.0:8080"] | |||
There was a problem hiding this comment.
There are no spaces added in a toml file.
| match trustee::generate_attestation_policy(client.clone(), owner_reference.clone()).await { | ||
| Ok(_) => info!("Generate configmap for the attestation policy",), | ||
| Err(e) => error!("Failed to create the attestation policy configmap: {e}"), | ||
| match reference_values::create_pcrs_config_map(client.clone()).await { |
There was a problem hiding this comment.
Is there a need to keep the call to reference_values::create_pcrs_config_map in main() (line -291 or +297)?
There was a problem hiding this comment.
There is no need for this call in main. I removed it
96f4f92 to
db6751e
Compare
Signed-off-by: Roy Kaufman <rkaufman@redhat.com>
kube-rs finalizer() does not call the Apply handler on the first reconcile — it only adds the finalizer and returns. This means adopt_approved_image inside image_add_reconcile never runs on the first reconcile. If the TEC is deleted before the second reconcile, the owner reference is never set and GC (garbage collector) cannot cascade-delete the ApprovedImage. Move adoption before the finalizer() call in image_reconcile so it runs on the very first reconcile. For more information see https://docs.rs/kube/latest/kube/runtime/finalizer/fn.finalizer.html in sections Guarantees, Assumptions (Third point) and Expected Flow(from 1 to 6). Signed-off-by: Roy Kaufman <rkaufman@redhat.com>
All trustee resources, including the resource policy, attestation policy, RV, LUKS key, and AK, are now synchronized using the KBS API.
The main motivation for this is replacing the patch mechanism that causes a restart of the trustee deployment.
Also, this abstraction over the trustee backend simplifies the process of upgrading to future versions of trustee.
Config:
- Update kbs-config.toml to v0.20.0 format
- Store reference values in dedicated ConfigMap (trustee-rv-data)
instead of trustee-data
API-driven resource management:
- Add KBS API client functions: send_secret, delete_secret,
register_ak, sync_resource_policy, sync_attestation_policy,
sync_reference_values
- Replace volume-mount approach for secrets with KBS API calls
- Add trustee deployment reconciler that syncs policies, reference
values, and secrets when deployment becomes available
Dependencies:
- Add kbs-client and jsonwebtoken crates
- Bump trustee image tag to v0.20.0
Signed-off-by: Roy Kaufman <rkaufman@redhat.com>
checks that the luks key/AK is first sent to tustee and validates that after trustee restarts, the keys are sent again. Also, at the end, I delete one of the machines and check that the secret has been deleted. Signed-off-by: Roy Kaufman <rkaufman@redhat.com>
|
PR needs rebase. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
Currently, every time the LUKS key is updated, the operator patches the trustee deployment, which causes a restart of the pod. This PR introduces a way to avoid this by setting the LUKS key using the KBS API.
Core implementation points:
Tests:
Summary by Sourcery
Integrate KBS API–based LUKS key management for trustee, replacing deployment patching with authenticated API calls and adding automatic resync on trustee restarts.
New Features:
Enhancements:
Build:
Tests: