Add first-class Kubernetes safe-to-evict support

## Summary

Kubernetes launchers should expose a first-class, policy-controlled way for a task to request selected pod-level metadata, especially autoscaler eviction protection for long-running jobs.

## Problem

Some long-running Kubernetes workloads need pod metadata such as:

```yaml
cluster-autoscaler.kubernetes.io/safe-to-evict: "false"
```

to avoid disruption during cluster scale-down: Cluster Autoscaler will not remove a node that runs a pod with this annotation set to `"false"`. Today, task annotations are launcher/control-plane inputs, not Kubernetes pod annotations. That boundary is worth keeping. Blindly copying every task annotation onto pod metadata would leak internal launcher controls and could unintentionally trigger Kubernetes controllers or admission webhooks that react to annotations.

The gap is that a task has no supported way to request such an annotation, so downstream launcher wrappers may need to add one-off allowlists to forward specific pod annotations. That works as a tactical fix, but it is not a good long-term upstream API.

## Proposal

Add an explicit Kubernetes-launcher API for eviction protection, for example:

```yaml
tangleml.com/launchers/kubernetes/safe_to_evict: "false"
```

The Kubernetes launcher would map this to pod metadata:

```yaml
cluster-autoscaler.kubernetes.io/safe-to-evict: "false"
```
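
A minimal Python sketch of this mapping, assuming the example task-annotation key above and a launcher that builds pod annotations as a plain dict. The constant and function names are hypothetical and not part of Tangle's current code. Only the one supported key is translated, and the value is normalized to the `"true"`/`"false"` string the autoscaler reads:

```python
from typing import Mapping, Union

# Hypothetical names; the task key is the example key proposed in this issue.
SAFE_TO_EVICT_TASK_ANNOTATION = "tangleml.com/launchers/kubernetes/safe_to_evict"
SAFE_TO_EVICT_POD_ANNOTATION = "cluster-autoscaler.kubernetes.io/safe-to-evict"


def normalize_safe_to_evict(value: Union[str, bool]) -> str:
    """Normalize a bool or a case-insensitive "true"/"false" string; reject anything else."""
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, str):
        normalized = value.strip().lower()
        if normalized in ("true", "false"):
            return normalized
    raise ValueError(
        f"{SAFE_TO_EVICT_TASK_ANNOTATION} must be 'true' or 'false', got {value!r}"
    )


def pod_annotations_from_task_annotations(
    task_annotations: Mapping[str, Union[str, bool]],
) -> dict[str, str]:
    """Translate only the supported task annotation; every other task annotation is ignored."""
    pod_annotations: dict[str, str] = {}
    if SAFE_TO_EVICT_TASK_ANNOTATION in task_annotations:
        pod_annotations[SAFE_TO_EVICT_POD_ANNOTATION] = normalize_safe_to_evict(
            task_annotations[SAFE_TO_EVICT_TASK_ANNOTATION]
        )
    return pod_annotations


if __name__ == "__main__":
    print(pod_annotations_from_task_annotations({
        SAFE_TO_EVICT_TASK_ANNOTATION: "False",
        # Hypothetical internal launcher input: must not reach pod metadata.
        "tangleml.com/launchers/kubernetes/some_internal_control": "x",
    }))
    # → {'cluster-autoscaler.kubernetes.io/safe-to-evict': 'false'}
```

Raising on unrecognized values (instead of forwarding them) keeps a typo such as `"flase"` from silently leaving the pod evictable.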

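This should apply to both direct pod launches (on the Pod's `metadata.annotations`) and Job/multi-node launches (on the pod template's `spec.template.metadata.annotations`), so that every pod a Job creates carries the annotation.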
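
The placement rule can be sketched in Python, assuming the launcher holds the manifest as a plain dict before submitting it. `with_pod_annotations` is a hypothetical helper, not an existing Tangle or Kubernetes client function. A Pod is annotated directly. A Job is annotated on its pod template, because annotations on the Job object itself are not copied to the pods it creates:

```python
import copy
from typing import Any

Manifest = dict[str, Any]


def _merge_annotations(metadata: dict[str, Any], extra: dict[str, str]) -> None:
    # setdefault creates metadata.annotations when the manifest has none yet.
    metadata.setdefault("annotations", {}).update(extra)


def with_pod_annotations(manifest: Manifest, extra: dict[str, str]) -> Manifest:
    """Hypothetical helper: return a copy of a Pod or Job manifest with pod-level annotations added."""
    result = copy.deepcopy(manifest)
    kind = result.get("kind")
    if kind == "Pod":
        # Direct pod launch: annotate the Pod object itself.
        _merge_annotations(result.setdefault("metadata", {}), extra)
    elif kind == "Job":
        # Job / multi-node launch: annotate the pod template, not the Job's own metadata.
        template = result.setdefault("spec", {}).setdefault("template", {})
        _merge_annotations(template.setdefault("metadata", {}), extra)
    else:
        raise ValueError(f"unsupported manifest kind: {kind!r}")
    return result


if __name__ == "__main__":
    job = {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": "train"},
        "spec": {"template": {"spec": {"containers": []}}},
    }
    patched = with_pod_annotations(
        job, {"cluster-autoscaler.kubernetes.io/safe-to-evict": "false"}
    )
    print(patched["spec"]["template"]["metadata"]["annotations"])
    # → {'cluster-autoscaler.kubernetes.io/safe-to-evict': 'false'}
```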

A more general follow-up option would be a validated/allowlisted pod-annotation passthrough API, but the autoscaler case can be solved with a narrower first-class knob.
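
For comparison, the more general passthrough could look roughly like the Python sketch below. It assumes a hypothetical task-annotation prefix and an operator-configured allowlist of pod-annotation keys; neither exists today. Task annotations outside the prefix stay launcher inputs and are never copied. Requests for keys that are not on the allowlist fail loudly instead of being dropped:

```python
from collections.abc import Mapping, Set

# Hypothetical prefix: a task annotation "<prefix><pod-annotation-key>" requests that pod annotation.
POD_ANNOTATION_PREFIX = "tangleml.com/launchers/kubernetes/pod_annotations/"


def passthrough_pod_annotations(
    task_annotations: Mapping[str, str],
    allowed_keys: Set[str],
) -> dict[str, str]:
    """Forward only prefixed task annotations whose pod-annotation key is allowlisted."""
    forwarded: dict[str, str] = {}
    for key, value in task_annotations.items():
        if not key.startswith(POD_ANNOTATION_PREFIX):
            continue  # ordinary launcher/control-plane input; never copied to the pod
        pod_key = key[len(POD_ANNOTATION_PREFIX):]
        if pod_key not in allowed_keys:
            raise ValueError(f"pod annotation {pod_key!r} is not allowlisted")
        forwarded[pod_key] = value
    return forwarded
```

The allowlist keeps the policy decision with cluster operators, which is why the narrower safe-to-evict knob can ship first without committing to this shape.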

## Non-goals

- Do not pass through all task annotations to Kubernetes pod metadata.
- Do not make arbitrary Kubernetes metadata mutation the default behavior.
- Do not require task authors to know every internal launcher annotation.

## Acceptance criteria

- Kubernetes pod launcher supports an explicit safe-to-evict task annotation and sets it on the Pod's `metadata.annotations`.
- Kubernetes Job / multi-node launcher sets the annotation on `spec.template.metadata.annotations`.
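- Boolean and string values (for example `false` or `"False"`) are validated and normalized to the string `"true"` or `"false"`; any other value is rejected rather than forwarded.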
- Tests cover both forwarding the supported annotation and not forwarding arbitrary task annotations.
- Documentation explains when to use this for long-running jobs.

## Migration note

Downstream deployments may carry a narrow allowlisted passthrough as an immediate operational fix. Once this exists upstream and is released, those wrappers can adopt the Tangle version and remove the local one-off mapping.

Add first-class Kubernetes safe-to-evict support #250