Summary
Kubernetes launchers should expose a first-class, policy-controlled way for a task to request selected pod-level metadata, especially autoscaler eviction protection for long-running jobs.
Problem
Some long-running Kubernetes workloads need pod metadata such as:
cluster-autoscaler.kubernetes.io/safe-to-evict: "false"
to avoid disruption during cluster scale-down. Today, task annotations are launcher/control-plane inputs, not Kubernetes pod annotations. That boundary is good: blindly copying every task annotation onto pod metadata would leak internal launcher controls and could accidentally trigger Kubernetes/webhook behavior.
The gap is that downstream launcher wrappers may need to add one-off allowlists to forward specific pod annotations. That works as a tactical fix, but it is not an ideal upstream API.
Proposal
Add an explicit Kubernetes-launcher API for eviction protection, for example:
tangleml.com/launchers/kubernetes/safe_to_evict: "false"
The Kubernetes launcher would map this to pod metadata:
cluster-autoscaler.kubernetes.io/safe-to-evict: "false"
This should apply to both direct pod launches and Job/multi-node launches by setting the pod template metadata annotations.
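A minimal sketch of that mapping, assuming pod and Job manifests are handled as plain dictionaries. The function names and the way the launcher receives task annotations are hypothetical and only illustrate the idea; they are not the existing launcher code.

```python
import copy
from typing import Any

# Task annotation key proposed in this issue (an assumption, not an existing API).
TASK_SAFE_TO_EVICT_KEY = "tangleml.com/launchers/kubernetes/safe_to_evict"
# Pod annotation read by the Kubernetes cluster autoscaler.
POD_SAFE_TO_EVICT_KEY = "cluster-autoscaler.kubernetes.io/safe-to-evict"


def pod_annotations_from_task(task_annotations: dict[str, str]) -> dict[str, str]:
    """Return only the pod annotations derived from supported task annotations."""
    value = task_annotations.get(TASK_SAFE_TO_EVICT_KEY)
    if value is None:
        return {}
    return {POD_SAFE_TO_EVICT_KEY: value}


def apply_to_pod(pod: dict[str, Any], task_annotations: dict[str, str]) -> dict[str, Any]:
    """Direct pod launch: set metadata.annotations on a copy of the pod manifest."""
    result = copy.deepcopy(pod)
    extra = pod_annotations_from_task(task_annotations)
    if extra:
        result.setdefault("metadata", {}).setdefault("annotations", {}).update(extra)
    return result


def apply_to_job(job: dict[str, Any], task_annotations: dict[str, str]) -> dict[str, Any]:
    """Job / multi-node launch: set spec.template.metadata.annotations on a copy."""
    result = copy.deepcopy(job)
    extra = pod_annotations_from_task(task_annotations)
    if extra:
        template = result.setdefault("spec", {}).setdefault("template", {})
        template.setdefault("metadata", {}).setdefault("annotations", {}).update(extra)
    return result
```

In this sketch the Job's own top-level metadata is left untouched; only the pod template carries the annotation, since the autoscaler inspects pods rather than Jobs.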
A more general follow-up option would be a validated/allowlisted pod-annotation passthrough API, but the autoscaler case can be solved with a narrower first-class knob.
Non-goals
- Do not pass through all task annotations to Kubernetes pod metadata.
- Do not make arbitrary Kubernetes metadata mutation the default behavior.
- Do not require task authors to know every internal launcher annotation.
Acceptance criteria
- Kubernetes pod launcher supports an explicit safe-to-evict task annotation.
- Kubernetes Job / multi-node launcher sets the annotation on spec.template.metadata.annotations.
- Boolean/string values are validated or normalized.
- Tests cover both forwarding the supported annotation and not forwarding arbitrary task annotations.
- Documentation explains when to use this for long-running jobs.
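The validation and test criteria above could look roughly like the following sketch. Which spellings to accept (here only booleans and the case-insensitive strings "true"/"false") is an assumption, and all function names are hypothetical.

```python
# Task annotation key proposed in this issue (an assumption, not an existing API).
TASK_SAFE_TO_EVICT_KEY = "tangleml.com/launchers/kubernetes/safe_to_evict"
# Pod annotation read by the Kubernetes cluster autoscaler.
POD_SAFE_TO_EVICT_KEY = "cluster-autoscaler.kubernetes.io/safe-to-evict"


def normalize_safe_to_evict(value: object) -> str:
    """Normalize a bool or string to the lowercase string form used in pod annotations."""
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, str):
        lowered = value.strip().lower()
        if lowered in ("true", "false"):
            return lowered
    raise ValueError(f"Unsupported value for {TASK_SAFE_TO_EVICT_KEY}: {value!r}")


def pod_annotations_from_task(task_annotations: dict) -> dict:
    """Forward only the supported annotation; drop every other task annotation."""
    if TASK_SAFE_TO_EVICT_KEY not in task_annotations:
        return {}
    value = normalize_safe_to_evict(task_annotations[TASK_SAFE_TO_EVICT_KEY])
    return {POD_SAFE_TO_EVICT_KEY: value}


def test_forwards_supported_annotation():
    result = pod_annotations_from_task({TASK_SAFE_TO_EVICT_KEY: "False"})
    assert result == {POD_SAFE_TO_EVICT_KEY: "false"}


def test_does_not_forward_arbitrary_annotations():
    # "example.com/internal-control" is a made-up key standing in for any other annotation.
    result = pod_annotations_from_task({"example.com/internal-control": "x"})
    assert result == {}


def test_rejects_invalid_value():
    try:
        pod_annotations_from_task({TASK_SAFE_TO_EVICT_KEY: "maybe"})
    except ValueError:
        return
    raise AssertionError("expected ValueError for an invalid value")
```

Rejecting unknown values with an error, rather than silently dropping them, keeps a typo from quietly disabling eviction protection; a launcher could equally choose to log and ignore.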
Migration note
Downstream deployments may carry a narrow allowlisted passthrough as an immediate operational fix. Once the first-class safe-to-evict annotation exists upstream and is released, those wrappers can adopt the Tangle version and remove the local one-off mapping.