Add live pipeline reconfiguration and shutdown control to the admin API #2617

@lquerel

Pre-filing checklist

  • I searched existing issues and didn't find a duplicate

Component(s)

Rust OTAP dataflow (rust/otap-dataflow/)

Objective

Enable operators to manage logical pipelines in a running OTAP Dataflow Engine instance through the admin API, without restarting the engine. This includes creating a new pipeline in an existing group, updating an existing pipeline, resizing pipeline core allocation, detecting no-op updates, and shutting down an individual pipeline.

Rationale

Today, pipeline topology and resource changes require engine restarts or full config redeploys, which increases operational cost and risk. The engine needs a resident control plane that can safely apply pipeline changes in place, track progress, expose status to operators, and preserve service continuity during updates.

Scope

  • Add admin API endpoints under /api/v1/groups/... for pipeline create/update and shutdown.
  • Refactor the controller into a long-lived runtime manager with live pipeline registry/state.
  • Support pipeline update modes:
    • create for new logical pipelines
    • rolling replace for topology/config changes
    • resize for resource-only core allocation changes
    • noop for identical effective configs
  • Add rollout and shutdown operation tracking with status lookup endpoints.
  • Extend runtime/state identity with deployment generations so overlapping instances remain distinguishable.
  • Preserve existing /status compatibility while surfacing rollout metadata.
  • Add operator documentation for live reconfiguration behavior and API usage.
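The four update modes listed above could be selected by comparing the running configuration against the requested one. The following is a minimal sketch, not the engine's actual controller code: `PipelineSpec`, `UpdateMode`, and `classify_update` are hypothetical names, and the topology is reduced to a single string standing in for whatever canonical form the real comparison would use.

```rust
/// Hypothetical, simplified view of a pipeline's effective configuration.
#[derive(Debug, Clone, PartialEq, Eq)]
struct PipelineSpec {
    /// Stand-in for topology and node configuration (assumed canonical form).
    topology: String,
    /// Number of cores allocated to the pipeline.
    cores: usize,
}

/// The update modes named in the scope list.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum UpdateMode {
    Create,
    RollingReplace,
    Resize,
    Noop,
}

/// Decide how a requested spec should be applied, given the spec that is
/// currently running (if any).
fn classify_update(current: Option<&PipelineSpec>, desired: &PipelineSpec) -> UpdateMode {
    match current {
        // No pipeline with this id exists in the group yet.
        None => UpdateMode::Create,
        // Effective configs are identical: nothing to restart.
        Some(cur) if cur == desired => UpdateMode::Noop,
        // Same topology, so only the core allocation differs.
        Some(cur) if cur.topology == desired.topology => UpdateMode::Resize,
        // Topology or node config changed: replace instances.
        Some(_) => UpdateMode::RollingReplace,
    }
}
```

Ordering the arms from most to least specific keeps the noop check ahead of the resize check, so an identical request never triggers a restart.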

Acceptance Criteria

  • PUT /api/v1/groups/{group}/pipelines/{id} can create a new pipeline in an existing group.
  • The same endpoint can update an existing pipeline with a health-gated rolling cutover.
  • Changes that affect only core allocation are applied as a resize, without restarting cores whose assignment is unchanged.
  • Updates whose effective pipeline configuration is identical to the running one return a successful noop result without restarting instances.
  • POST /api/v1/groups/{group}/pipelines/{id}/shutdown shuts down a logical pipeline and exposes shutdown progress.
  • Rollout and shutdown status can be queried through dedicated status endpoints.
  • Overlapping old/new instances are distinguishable in observed state and status through deployment generation.
  • Existing status payloads remain backward compatible, with rollout metadata added rather than replacing current fields.
  • Controller and admin tests cover create, replace, resize, noop, shutdown, rollback/failure handling, and endpoint behavior.
  • Operator documentation explains the workflow and API examples.
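To illustrate the deployment generation criterion, the sketch below keys observed instance state by generation so that old and new instances of the same logical pipeline can coexist during a rolling cutover. All names here (`InstanceKey`, `Phase`, `Registry`) are assumptions made for illustration and are not taken from the engine's code.

```rust
use std::collections::BTreeMap;

/// Hypothetical identity of one running pipeline instance. Including the
/// generation keeps overlapping old/new instances distinct.
#[derive(Debug, Clone, PartialEq, Eq, PartialOrd, Ord)]
struct InstanceKey {
    group: String,
    pipeline: String,
    generation: u64,
    core: usize,
}

/// Hypothetical lifecycle phases of an instance.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Phase {
    Starting,
    Ready,
    Draining,
}

/// Minimal in-memory registry of observed instance state.
#[derive(Default)]
struct Registry {
    instances: BTreeMap<InstanceKey, Phase>,
}

impl Registry {
    /// Record (or overwrite) the phase of one instance.
    fn set(&mut self, group: &str, pipeline: &str, generation: u64, core: usize, phase: Phase) {
        let key = InstanceKey {
            group: group.to_string(),
            pipeline: pipeline.to_string(),
            generation,
            core,
        };
        self.instances.insert(key, phase);
    }

    /// Distinct generations currently observed for a logical pipeline,
    /// in ascending order.
    fn generations(&self, group: &str, pipeline: &str) -> Vec<u64> {
        let mut gens: Vec<u64> = self
            .instances
            .keys()
            .filter(|k| k.group == group && k.pipeline == pipeline)
            .map(|k| k.generation)
            .collect();
        gens.sort_unstable();
        gens.dedup();
        gens
    }
}
```

During a cutover, a registry like this would report two generations for the same pipeline until the older instances finish draining.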

Dependencies or Blockers

#831 will be supported in a future PR.

Additional Context

  • This work is pipeline-scoped only. Group-level multi-pipeline rollouts are out of scope (for now).
  • Runtime config persistence is out of scope; changes are applied in memory only.
