Skip to content

Add application credential finalizer management#369

Open
Deydra71 wants to merge 1 commit into
openstack-k8s-operators:mainfrom
Deydra71:appcred-finalizer
Open

Add application credential finalizer management#369
Deydra71 wants to merge 1 commit into
openstack-k8s-operators:mainfrom
Deydra71:appcred-finalizer

Conversation

@Deydra71
Copy link
Copy Markdown
Contributor

@Deydra71 Deydra71 commented Apr 27, 2026

Jira: OSPRH-29269

Application Credential dev-doc: https://github.com/openstack-k8s-operators/dev-docs/blob/main/application_credentials.md

  • Tracks the active AC secret name in Status.ApplicationCredentialSecret
  • Add openstack.org/watcher-ac-consumer finalizer to the AC secret after service config is rendered
  • On AC rotation, move the finalizer from the old secret to the new one
  • On CR deletion, remove the consumer finalizer from the AC secret before cleaning up the CR

This ensures that the keystone-operator cannot revoke a rotated AC secret while Watcher is still consuming it.

2026-04-28T12:04:45Z	INFO	Controllers.Watcher	Added consumer finalizer	{"controller": "watcher", "controllerGroup": "watcher.openstack.org", "controllerKind": "Watcher", "Watcher": {"name":"watcher","namespace":"openstack"}, "namespace": "openstack", "name": "watcher", "reconcileID": "ac040495-e159-4291-bdd4-bb370a316368", "object": "ac-watcher-16b84-secret", "finalizer": "openstack.org/watcher-ac-consumer"}
2026-04-28T12:04:45Z	INFO	Controllers.Watcher	Removed consumer finalizer	{"controller": "watcher", "controllerGroup": "watcher.openstack.org", "controllerKind": "Watcher", "Watcher": {"name":"watcher","namespace":"openstack"}, "namespace": "openstack", "name": "watcher", "reconcileID": "ac040495-e159-4291-bdd4-bb370a316368", "object": "ac-watcher-b7b42-secret", "finalizer": "openstack.org/watcher-ac-consumer"}

Depends-On: openstack-k8s-operators/keystone-operator#685

Assisted-by: Claude Opus 4.6 noreply@anthropic.com

@openshift-ci
Copy link
Copy Markdown

openshift-ci Bot commented Apr 27, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign amoralej for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@softwarefactory-project-zuul
Copy link
Copy Markdown

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/1c533ab5c6e44e719416fe0284f555ae

✔️ openstack-meta-content-provider-master SUCCESS in 3h 57m 23s
✔️ watcher-operator-validation-master SUCCESS in 2h 14m 00s
✔️ openstack-meta-content-provider-epoxy SUCCESS in 3h 41m 26s
✔️ watcher-operator-validation-epoxy SUCCESS in 1h 57m 54s
✔️ watcher-operator-validation-epoxy-ocp4-16 SUCCESS in 1h 54m 38s
✔️ noop SUCCESS in 0s
watcher-operator-kuttl FAILURE in 1h 10m 00s

Copy link
Copy Markdown
Contributor

@amoralej amoralej left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See my note about the workflow, please.

May you also validate the change with finalizer in the existing appcred kuttl tests? https://github.com/openstack-k8s-operators/watcher-operator/tree/main/test/kuttl/test-suites/default/appcred-tests

return ctrl.Result{}, err
}

if instance.Spec.Auth.ApplicationCredentialSecret != "" || instance.Status.ApplicationCredentialSecret != "" {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that the finalizer of the old secret is created after the configuration for DBjobs is recreated, but the watcher services have not been reconfigured yet to use the new config. In case something fails between this finalizer removal and the reconciling of the new config in watcherapi, watcherapplier and watcherdecisionengine, I guess it may break the running services. Unless, the logic to remove revoke the ACs without finalizer checks the Ready status of watcher (I'm not sure if that's the case). Will keystone do any check on the status of watcher?

An alternative workflow may be:

  1. Add the finalizer to the new AC here (but not removing the finalizer from the old one and not updating the instance.Status.ApplicationCredentialSecret.
  2. Remove the finalizer from the old finalizer and adding the instance.Status.ApplicationCredentialSecret right before mariadbv1.DeleteUnusedMariaDBAccountFinalizers (https://github.com/Deydra71/watcher-operator/blob/685c276a76228fbd999ff0b3dbbd821477c009bc/internal/controller/watcher_controller.go#L552)

Actually, this case is quite similar to the one with the mariadbaccounts.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A valid concern and keystone-operator is protecting both the current and previous AC secrets via status.previousSecretName that's saved in AC CR, independent of consumer finalizers. The consumer finalizer is a second layer of defense, but the previousSecretName check in cleanupUnusedRotatedSecrets ensures the old secret is never revoked during the rotation.

So if IIUC, the alternative splitting pattern is not necessary?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will later add a comment on that code part in keystone-operator, so it's clear what is not revoked.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the hint! So my understanding of the AC rotation workflow is:

  • The previous secret is stored in status.previousSecretName and the new one in status.SecretName of the AC CR.
  • The watcher controller removes the finalizer in the old secret and adds it to the new one.
  • The keystone operator revokes all ACs for secrets which are not in status.previousSecretName or status.SecretName and has no finalizer.

This cover my concern for one rotation which is probably enough in regular terms, specially for the automatic rotations proposed in the doc.

Doing the splitting patern may cover other potential cases when doing multiple rotations. For example, a user manually rotating the ACs. Think in the case of an openstack operator wants to rotate the AC for any reason:

  • A cloud admin do the manual rotation step (i.e. Patching status.expiresAt to a timestamp in the past).
  • The AC is rotated successfully and goes to Ready state.
  • For any reason Watcher fails to deploy the new deployment and keeps running with the old AC which is still protected by both finalizer and being in the previousSecretName.
  • The cloud admin thinks... "mmm maybe something was wrong with the rotation, let's rotate it again".
  • Watcher fails to deploy again (as the issue is unrelated to AC).
  • The cloud admin does the rotation process again, the AC gets rotated and the orginal AC is removed from previousSecretName and the finalizer removed.
  • The original AC is cleaned by the keystone operator.
  • Watcher stops working.

I may be missing something or this case being too elaborate but IMHO, the other pattern, adding finalizer to the new one first and removing it from the old one at the end, is more robust in terms of service availability. it may be argued that it's less secure as a problem removing the finalizer after the deployment is done would prevent from revoking and unused AC but I'd say this case is much less likely and usually, keep the service running is a top priority.

This may not be a blocker for me, but i think it's better to discuss this now than later.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this situation can definitely happen even if manual rotation is discouraged and it's not part of normal operation, but emergency/out of order situations.

But still - I agree that keeping the service alive is the top priority. The split is worth doing in all service operators, I will implement it in watcher and other single deployment services, but for nova and telemetry I need to put more thought in to it, so it could be done as a follow up for some operators, also based on other reviewers.

Thanks for this input!

})
})

When("ApplicationCredential consumer finalizer is managed", func() {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! functional tests looks good wrt the impelemented workflow,

@Deydra71 Deydra71 force-pushed the appcred-finalizer branch from 685c276 to d4c4d29 Compare May 13, 2026 10:40
@centosinfra-prod-github-app
Copy link
Copy Markdown

This change depends on a change that failed to merge.

Change openstack-k8s-operators/keystone-operator#685 is needed.

@centosinfra-prod-github-app
Copy link
Copy Markdown

Merge Failed.

This change or one of its cross-repo dependencies was unable to be automatically merged with the current state of its repository. Please rebase the change and upload a new patchset.
Warning:
Error merging github.com/openstack-k8s-operators/watcher-operator for 369,2dc724033ff9beaf4dcb932dc39987462b22c368

Signed-off-by: Veronika Fisarova <vfisarov@redhat.com>
@openshift-ci
Copy link
Copy Markdown

openshift-ci Bot commented May 14, 2026

@Deydra71: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/functional 1822be7 link true /test functional

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@Deydra71
Copy link
Copy Markdown
Contributor Author

I had to reorder the env test to include all subservices to be ready before checking AC finalizer, but I think there's a race condition in env test timing now happening about once in 5 test runs :(

I think it's because after the rotation changes Spec.Auth.ApplicationCredentialSecret, the Watcher controller calls ensureAPI which updates the WatcherAPI sub CR's spec (incrementing the generation) and the WatcherAPI controller needs to complete a full reconcile cycle to propagate WatcherAPIReadyCondition = True while the test simulations race with the sub CR controller's spec updates.

I might be overcomplicating it, but not sure what could help to make it pass 100% of time, @amoralej any suggestion?

@centosinfra-prod-github-app
Copy link
Copy Markdown

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://gateway-cloud-softwarefactory.apps.ocp.cloud.ci.centos.org/zuul/t/rdoproject.org/buildset/f75725d6839e491e9f43a931fe13c76c

✔️ openstack-meta-content-provider-master SUCCESS in 3h 07m 55s
✔️ watcher-operator-validation-master SUCCESS in 2h 13m 06s
✔️ openstack-meta-content-provider-epoxy SUCCESS in 2h 39m 26s
✔️ watcher-operator-validation-epoxy SUCCESS in 2h 00m 50s
✔️ watcher-operator-validation-epoxy-ocp4-18 SUCCESS in 2h 02m 04s
✔️ noop SUCCESS in 0s
watcher-operator-kuttl FAILURE in 1h 19m 05s

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants