Add application credential finalizer management#369
Conversation
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
|
Build failed (check pipeline). Post https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/1c533ab5c6e44e719416fe0284f555ae ✔️ openstack-meta-content-provider-master SUCCESS in 3h 57m 23s |
amoralej
left a comment
There was a problem hiding this comment.
See my note about the workflow, please.
May you also validate the change with finalizer in the existing appcred kuttl tests? https://github.com/openstack-k8s-operators/watcher-operator/tree/main/test/kuttl/test-suites/default/appcred-tests
| return ctrl.Result{}, err | ||
| } | ||
|
|
||
| if instance.Spec.Auth.ApplicationCredentialSecret != "" || instance.Status.ApplicationCredentialSecret != "" { |
There was a problem hiding this comment.
Note that the finalizer of the old secret is created after the configuration for DBjobs is recreated, but the watcher services have not been reconfigured yet to use the new config. In case something fails between this finalizer removal and the reconciling of the new config in watcherapi, watcherapplier and watcherdecisionengine, I guess it may break the running services. Unless, the logic to remove revoke the ACs without finalizer checks the Ready status of watcher (I'm not sure if that's the case). Will keystone do any check on the status of watcher?
An alternative workflow may be:
- Add the finalizer to the new AC here (but not removing the finalizer from the old one and not updating the instance.Status.ApplicationCredentialSecret.
- Remove the finalizer from the old finalizer and adding the instance.Status.ApplicationCredentialSecret right before mariadbv1.DeleteUnusedMariaDBAccountFinalizers (https://github.com/Deydra71/watcher-operator/blob/685c276a76228fbd999ff0b3dbbd821477c009bc/internal/controller/watcher_controller.go#L552)
Actually, this case is quite similar to the one with the mariadbaccounts.
There was a problem hiding this comment.
A valid concern and keystone-operator is protecting both the current and previous AC secrets via status.previousSecretName that's saved in AC CR, independent of consumer finalizers. The consumer finalizer is a second layer of defense, but the previousSecretName check in cleanupUnusedRotatedSecrets ensures the old secret is never revoked during the rotation.
So if IIUC, the alternative splitting pattern is not necessary?
There was a problem hiding this comment.
I will later add a comment on that code part in keystone-operator, so it's clear what is not revoked.
There was a problem hiding this comment.
Thanks for the hint! So my understanding of the AC rotation workflow is:
- The previous secret is stored in status.previousSecretName and the new one in status.SecretName of the AC CR.
- The watcher controller removes the finalizer in the old secret and adds it to the new one.
- The keystone operator revokes all ACs for secrets which are not in status.previousSecretName or status.SecretName and has no finalizer.
This cover my concern for one rotation which is probably enough in regular terms, specially for the automatic rotations proposed in the doc.
Doing the splitting patern may cover other potential cases when doing multiple rotations. For example, a user manually rotating the ACs. Think in the case of an openstack operator wants to rotate the AC for any reason:
- A cloud admin do the manual rotation step (i.e. Patching status.expiresAt to a timestamp in the past).
- The AC is rotated successfully and goes to Ready state.
- For any reason Watcher fails to deploy the new deployment and keeps running with the old AC which is still protected by both finalizer and being in the previousSecretName.
- The cloud admin thinks... "mmm maybe something was wrong with the rotation, let's rotate it again".
- Watcher fails to deploy again (as the issue is unrelated to AC).
- The cloud admin does the rotation process again, the AC gets rotated and the orginal AC is removed from previousSecretName and the finalizer removed.
- The original AC is cleaned by the keystone operator.
- Watcher stops working.
I may be missing something or this case being too elaborate but IMHO, the other pattern, adding finalizer to the new one first and removing it from the old one at the end, is more robust in terms of service availability. it may be argued that it's less secure as a problem removing the finalizer after the deployment is done would prevent from revoking and unused AC but I'd say this case is much less likely and usually, keep the service running is a top priority.
This may not be a blocker for me, but i think it's better to discuss this now than later.
There was a problem hiding this comment.
Yes, this situation can definitely happen even if manual rotation is discouraged and it's not part of normal operation, but emergency/out of order situations.
But still - I agree that keeping the service alive is the top priority. The split is worth doing in all service operators, I will implement it in watcher and other single deployment services, but for nova and telemetry I need to put more thought in to it, so it could be done as a follow up for some operators, also based on other reviewers.
Thanks for this input!
| }) | ||
| }) | ||
|
|
||
| When("ApplicationCredential consumer finalizer is managed", func() { |
There was a problem hiding this comment.
Thanks! functional tests looks good wrt the impelemented workflow,
685c276 to
d4c4d29
Compare
|
This change depends on a change that failed to merge. Change openstack-k8s-operators/keystone-operator#685 is needed. |
d4c4d29 to
2dc7240
Compare
|
Merge Failed. This change or one of its cross-repo dependencies was unable to be automatically merged with the current state of its repository. Please rebase the change and upload a new patchset. |
Signed-off-by: Veronika Fisarova <vfisarov@redhat.com>
2dc7240 to
1822be7
Compare
|
@Deydra71: The following test failed, say
Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
|
I had to reorder the env test to include all subservices to be ready before checking AC finalizer, but I think there's a race condition in env test timing now happening about once in 5 test runs :( I think it's because after the rotation changes I might be overcomplicating it, but not sure what could help to make it pass 100% of time, @amoralej any suggestion? |
|
Build failed (check pipeline). Post ✔️ openstack-meta-content-provider-master SUCCESS in 3h 07m 55s |
Jira: OSPRH-29269
Application Credential dev-doc: https://github.com/openstack-k8s-operators/dev-docs/blob/main/application_credentials.md
Status.ApplicationCredentialSecretopenstack.org/watcher-ac-consumerfinalizer to the AC secret after service config is renderedThis ensures that the keystone-operator cannot revoke a rotated AC secret while Watcher is still consuming it.
Depends-On: openstack-k8s-operators/keystone-operator#685
Assisted-by: Claude Opus 4.6 noreply@anthropic.com