Context
`obol stack up` today has three different lifecycles bolted together. Infrastructure and the Hermes agent come up declaratively via helmfile, but seller offers (`obol sell http`, `obol sell inference`) are imperative — created by direct `kubectl apply`/CR writes from the CLI, and re-hydrated on stack-up by a bespoke `resumeSellOffers` function (see PR #487).

This issue captures the longer-term direction: bring seller offers under the same declarative helmfile pass that already manages infrastructure and Hermes, so `stack up` is one mechanism applied uniformly.
Today (overcomplicated)

```
                  obol stack up
                        │
    ┌───────────────────┼───────────────────┐
    │                   │                   │
    ▼                   ▼                   ▼
Infrastructure    Agent (Hermes)      Seller offers
(Traefik, eRPC,  via hermes.Sync()   (sell inference,
 LiteLLM,                             sell http)
 x402-verifier,                           │
 serviceoffer-                            ▼
 controller,                        NOT MANAGED BY
 cloudflared, ...)                      STACK UP
    │                   │                 │
    │                   │        ┌────────┴────────┐
    ▼                   ▼        ▼                 ▼
 helmfile           helmfile  sell inference   sell http
(declarative)    (declarative)    ↓                ↓
                            descriptor on    YAML manifest
                            disk (JSON)      on disk (PR #487)
                                  │                │
                                  ▼                ▼
                            host gateway     in-cluster only
                            foreground process
                            (#487 spawns detached;
                             TODO: helm chart)

     ──── resume gap filled with bespoke code ────
          (cmd/obol/sell.go::resumeSellOffers)
```
Three problems with this shape
**Foreground host process asymmetry.** Only `sell inference` runs a host-side gateway. That's why resume has to fork-and-detach for inference but not for http, and why `startDetachedInferenceGateway` + PID files exist at all.

**Resume function exists only because seller offers aren't first-class infra.** Every recovery scenario (`stack down`/`stack up`, `stack purge`, host reboot, agent reset) has to re-implement the recovery path. The controller and Hermes don't need any of that — they come back the same way infrastructure comes back.
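To make the asymmetry concrete, here is a minimal Go sketch of the shape any bespoke resume path is forced into; the `Offer` struct, the offer names, and the action strings are illustrative, not the actual `resumeSellOffers` code.

```go
package main

import "fmt"

// Offer is a hypothetical on-disk seller-offer descriptor.
type Offer struct {
	Kind string // "inference" or "http"
	Name string
}

// resumeAction mirrors the asymmetry described above: inference offers
// need a host-side gateway process re-spawned in addition to their CR,
// while http offers only need a manifest re-applied in-cluster.
func resumeAction(o Offer) string {
	switch o.Kind {
	case "inference":
		return "spawn detached gateway, then apply CR"
	case "http":
		return "apply manifest"
	default:
		return "unknown"
	}
}

func main() {
	for _, o := range []Offer{{"inference", "llama3"}, {"http", "my-api"}} {
		fmt.Printf("%s/%s -> %s\n", o.Kind, o.Name, resumeAction(o))
	}
}
```

Every recovery scenario has to branch like this; nothing in the infrastructure path ever does.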
Proposed end state

```
              obol stack up
                    │
    ┌───────────────┼───────────────┐
    │               │               │
    ▼               ▼               ▼
Infrastructure    Agent       Seller offers
    │               │               │
    └───────────────┴───────────────┘
                    │
                    ▼
            SAME MECHANISM:
          a helmfile pass over
         declarative sources of
              truth on disk
                    │
     ┌──────────────┴───────────────┐
     ▼                              ▼
infra/*.yaml             applications/
(already exists)         ├── hermes/<id>/            (exists)
                         ├── sell-http/<name>/       (new)
                         └── sell-inference/<name>/  (new)
                                        │
                                        ▼
                                helmfile.yaml +
                                values-*.yaml per offer
```
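The bottom of the diagram can be made concrete with a sketch of what the helmfile releases might look like. Everything here — chart paths, release naming, namespace — is an assumption about the eventual layout, not the current code:

```yaml
# helmfile.yaml (sketch; names, charts and namespace are illustrative)
releases:
  - name: hermes-agent-1
    namespace: obol
    chart: ./charts/hermes
    values:
      - applications/hermes/agent-1/values.yaml
  - name: sell-http-my-api
    namespace: obol
    chart: ./charts/sell-http
    values:
      - applications/sell-http/my-api/values.yaml
  - name: sell-inference-llama3
    namespace: obol
    chart: ./charts/sell-inference
    values:
      - applications/sell-inference/llama3/values.yaml
```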
In this shape:

- `obol sell http` / `obol sell inference` become "edit the descriptor on disk + helmfile sync the slice". No imperative `kubectl apply`, no foreground process spawn.
- `obol stack down`/`up` is "helmfile destroy/sync the whole tree" — agents and offers come back the same way the controller comes back.
- The inference gateway becomes an in-cluster Deployment, so the host-side foreground process disappears entirely.
- `resumeSellOffers`, `startDetachedInferenceGateway`, PID files, gateway logs — none of those need to exist. They are scaffolding around the asymmetry.
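Under this model, "the descriptor on disk" for an offer could be nothing more than a values file in its slice. The field names below are hypothetical, sketched for a sell-http offer:

```yaml
# applications/sell-http/my-api/values.yaml (hypothetical fields)
offer:
  name: my-api
  upstream: http://my-service.default.svc.cluster.local:8080
  pricePerRequest: "0.0001"
```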
Migration path

1. Build the inference gateway as a Pod image.
   - Replaces `startDetachedInferenceGateway` and the PID-file plumbing.
   - The host-side subprocess is the only blocker to symmetric lifecycle.
2. Move sell-inference / sell-http to helmfile-managed slices under `applications/sell-{http,inference}/<name>/`.
   - Replaces `inference.Store` and the sell-http YAML manifest store with a single declarative format.
   - One walker, one parser, one test surface.
3. `obol stack up` becomes a single helmfile pass over everything.
   - Replaces `resumeSellOffers` entirely.
   - Recovery becomes "whatever's on disk is what's running".

Where PR #487 fits

PR #487 is the deliberate near-term step:

- Necessary today because seller offers are still imperative — without it, `stack up` doesn't bring back paid services and the spark2 dev cluster needs manual replay after every restart.
- Ships now, unblocks spark2 and any other persistent dev cluster.
- Superseded once the declarative model lands. The `startDetachedInferenceGateway` comment already points at the helm-chart future, and `resumeSellOffers` is explicitly scaffolding.
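The "one walker, one parser" in step 2 could look like the following Go sketch, which maps descriptor directories under the proposed `applications/` tree to helmfile release names. The naming scheme is an assumption for illustration, not the shipped layout:

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// releaseFor maps a descriptor directory under applications/ to a
// helmfile release name, e.g. applications/sell-http/my-api ->
// sell-http-my-api. The convention is hypothetical.
func releaseFor(dir string) (string, bool) {
	rest, ok := strings.CutPrefix(path.Clean(dir), "applications/")
	if !ok {
		return "", false // infra/*.yaml etc. are handled elsewhere
	}
	parts := strings.Split(rest, "/") // e.g. ["sell-http", "my-api"]
	if len(parts) != 2 {
		return "", false
	}
	return parts[0] + "-" + parts[1], true
}

func main() {
	for _, d := range []string{
		"applications/hermes/agent-1",
		"applications/sell-http/my-api",
		"applications/sell-inference/llama3",
	} {
		if name, ok := releaseFor(d); ok {
			fmt.Println(d, "->", name)
		}
	}
}
```

One function like this, driven by a directory walk, is the whole resume story: the same pass covers Hermes agents and both offer kinds.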
Acceptance criteria for closing this issue

- Inference gateway runs as an in-cluster Deployment built from a Pod image, not a host subprocess.
- Sell-http and sell-inference offers are represented on disk as helmfile-managed slices under `applications/`.
- `obol stack up` requires no resume-specific code path for seller offers — the same helmfile pass that brings up infra and Hermes also brings up offers.
- `resumeSellOffers`, `startDetachedInferenceGateway`, the PID file plumbing, and the two persistence stores are deleted.
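For the first criterion, the in-cluster replacement for the host-side gateway is an ordinary Deployment rendered by the offer's chart. Name, labels, image, and port below are placeholders:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: sell-inference-gateway   # placeholder name
  namespace: obol
spec:
  replicas: 1
  selector:
    matchLabels:
      app: sell-inference-gateway
  template:
    metadata:
      labels:
        app: sell-inference-gateway
    spec:
      containers:
        - name: gateway
          image: obol/inference-gateway:dev   # placeholder image
          ports:
            - containerPort: 8080
```

Once the gateway lives here, `stack up` recovers it the same way it recovers every other workload, and the PID-file plumbing has nothing left to track.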
Out of scope

- Buyer-side resume. Buyer state is already cluster-resident (`PurchaseRequest` CRs + sidecar config).
- Migration of existing on-disk descriptors written by older CLIs — a one-shot importer can live alongside the new layout if needed.