[Feature]: Restart containers after certificates rotate #57

@spresse1

Describe the feature

By default, OpenCHAMI rotates certificates daily. Some services are restarted automatically after a rotation, but not all. For example, after a certificate rotation, I see the following in the coresmd-coredns logs:

May 22 08:29:03 admin coresmd-coredns[869438]: time="2026-05-22T07:29:03Z" level=info msg="initiating cache refresh" prefix="plugins/coresmd"
May 22 08:29:03 admin coresmd-coredns[869438]: time="2026-05-22T07:29:03Z" level=error msg="failed to refresh cache: failed to fetch EthernetInterfaces from SMD: failed to execute HTTP request: Get "https://admin.cluster.hpcnexuslab.ie:8443/hsm/v2/Inventory/EthernetInterfaces": tls: failed to verify certificate: x509: certificate has expired or is not yet valid: current time 2026-05-22T07:28:33Z is after 2026-05-22T07:28:21Z" prefix="plugins/coresmd"

I am running podman containers.

Ideally, when certificates rotate, all services that serve certificates would be restarted so that they pick up the new certificates.

(ref: https://openchami.slack.com/archives/C066RMDS708/p1780222681743229?thread_ts=1779814910.245869&cid=C066RMDS708)

Why do you want this feature?

I am currently having to manually restart OpenCHAMI daily.

Alternatives you've considered

My current workaround is a cron job to perform the restart. However, this is sub-optimal in that:

  • Certificates rotate daily. A daily cron job leaves a potential window in which the certificates are invalid but the restart job has not yet fired. If the two schedules are misaligned, this window could be quite long.
  • A cron job could perform unnecessary restarts if a user lengthens the certificate validity period.
  • Not all services need a restart, and restarting those that do not could impair ongoing system operations.
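To put a rough number on the first point, here is a minimal Python sketch of the gap between a certificate expiring and a fixed daily cron restart. The function name `stale_window` and the cron times are hypothetical; the expiry timestamp is the one from the log excerpt above, and the sketch assumes the cron schedule is expressed in UTC.

```python
from datetime import datetime, timedelta, timezone


def stale_window(expiry: datetime, cron_hour: int, cron_minute: int = 0) -> timedelta:
    """Return how long an expired certificate stays in use before the
    next daily cron restart fires (hypothetical helper, UTC assumed)."""
    # Candidate cron run on the same calendar day as the expiry.
    fire = expiry.replace(hour=cron_hour, minute=cron_minute, second=0, microsecond=0)
    # If that run already happened before the expiry, the next one is tomorrow.
    if fire < expiry:
        fire += timedelta(days=1)
    return fire - expiry


# Expiry time taken from the x509 error in the log excerpt.
expiry = datetime(2026, 5, 22, 7, 28, 21, tzinfo=timezone.utc)

print(stale_window(expiry, cron_hour=8))  # 0:31:39
print(stale_window(expiry, cron_hour=7))  # 23:31:39
```

With a cron job that happens to run shortly before the expiry, services would keep serving an expired certificate for almost a full day.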

Therefore, I think a better solution is to adjust the systemd unit files so that the containers that require a restart are restarted after the certificate job fires.
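As an illustration of the behaviour I am after (not the systemd unit change itself), here is a rough Python sketch of a hook that restarts only a chosen list of units, and only when the certificate file has actually changed. All names in it are assumptions: the certificate path, the state file and the unit names are placeholders, not the real OpenCHAMI ones.

```python
import hashlib
import subprocess
from pathlib import Path
from typing import Callable, Iterable


def fingerprint(cert_path: Path) -> str:
    """SHA-256 of the certificate file contents."""
    return hashlib.sha256(cert_path.read_bytes()).hexdigest()


def restart_if_rotated(
    cert_path: Path,
    state_path: Path,
    units: Iterable[str],
    run: Callable[..., object] = subprocess.run,
) -> bool:
    """Restart `units` only if the certificate differs from the one
    recorded in `state_path`. Returns True when restarts were issued."""
    current = fingerprint(cert_path)
    previous = state_path.read_text().strip() if state_path.exists() else None
    if current == previous:
        return False  # certificate unchanged, leave services alone
    for unit in units:
        # `systemctl restart <unit>`; check=True raises on failure.
        run(["systemctl", "restart", unit], check=True)
    # Record the new fingerprint only after every restart succeeded.
    state_path.write_text(current)
    return True


# Hypothetical usage (paths and unit names are placeholders):
# restart_if_rotated(
#     Path("/path/to/server.crt"),
#     Path("/path/to/last-cert.sha256"),
#     ["example-smd.service", "example-coredns.service"],
# )
```

On the first run no fingerprint is recorded yet, so the sketch restarts the listed units once. Keying on the certificate contents rather than on a schedule avoids both the stale window and the unnecessary restarts described above.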

Additional context

No response

Code of Conduct

  • I agree to follow this project's Code of Conduct
