diff --git a/.claude/skills/project-knowledge/architecture.md b/.claude/skills/project-knowledge/architecture.md index 1a6b9c0..7ded475 100644 --- a/.claude/skills/project-knowledge/architecture.md +++ b/.claude/skills/project-knowledge/architecture.md @@ -63,6 +63,12 @@ Version-specific query selectors in `internal/query/`: - `SelectStatDatabaseGeneralQuery(version)` — branches at PG 12 - `SelectStatStatementsTimingQuery(version)` — branches at PG 13, PG 17 - `SelectStatWALQuery(version)` — branches at PG 18 (columns removed) +- `SelectStatBgwriterQuery(version)` — branches at PG 17 (`pg_stat_checkpointer` split off `pg_stat_bgwriter`) and PG 18 (`slru_written` added). Returns `(query, Ncols, DiffIntvl)` — DiffIntvl also differs per version. +- `SelectStatReplicationSlotsQuery(_ int)` — version-independent on PG 14–18 (chosen column subset is stable), returns `(query, 15, [2]int{6,13})`; the `version` param is kept for selector-signature symmetry. Single hybrid `pg_replication_slots LEFT JOIN pg_stat_replication_slots` query. + +The `bgwriter` view (hotkey `b`, `internal/query/bgwriter.go`) is a single-row version-aware screen modeled on `pg_stat_wal`. It was the project's first view registered with `NotRecordable: true` — TUI-only, excluded from `pgcenter record`/`report`. + +The `replslots` view (hotkey `o`, `internal/query/replication_slots.go`) is a multi-row screen modeled on `replication`/`tables`, the second `NotRecordable` user. It hybrid-joins `pg_replication_slots` (state: slot_type, active, wal_status, retained WAL via the recovery-aware `WalFunction` template, safe_wal_size) with `pg_stat_replication_slots` (logical-decoding spill/stream/total counters). Physical slots are absent from the stat view, so the 8 diffed counters are `coalesce(...,0)` — without it a physical slot's empty-string NULLs reach `diffPair`/`ParseInt` and abort the sample (`internal/stat/postgres.go`). State columns are absolute (outside `DiffIntvl=[6,13]`), counters diffed; `OrderKey=4` (retained,KiB desc) — the one multi-row view that deviates from the col-0 sort default. View configuration happens in `internal/view/view.go: Configure(opts)` which calls these selectors and updates `QueryTmpl` and `Ncols` per view at connection time. diff --git a/.claude/skills/project-knowledge/deployment.md b/.claude/skills/project-knowledge/deployment.md index 9fdc0e4..6a59e0a 100644 --- a/.claude/skills/project-knowledge/deployment.md +++ b/.claude/skills/project-knowledge/deployment.md @@ -11,7 +11,7 @@ Triggered by push to `release` branch. Two jobs: -**test** — runs full test suite in `lesovsky/pgcenter-testing:0.0.9` container: +**test** — runs full test suite in `lesovsky/pgcenter-testing:0.0.10` container: - Go 1.25.10 (cached), module cache, lint tools cache - `make lint` + `govulncheck ./...` + `make test` + `make build` + e2e tests @@ -28,11 +28,11 @@ Docker image pushed to `lesovsky/pgcenter:vX.Y.Z` and `:latest`. ## CI (default.yml) Triggered on every push. Same container and tooling as release test job. -Container: `lesovsky/pgcenter-testing:0.0.9` +Container: `lesovsky/pgcenter-testing:0.0.10` ## Test Container (lesovsky/pgcenter-testing) -Source: `testing/Dockerfile`. Current version: `0.0.9`. +Source: `testing/Dockerfile`. Current version: `0.0.10`. Contains: Ubuntu 22.04, PostgreSQL 14–18 with `plperlu` + CPAN modules. Clusters created via `pg_createcluster` (Ubuntu 22.04 Docker doesn't auto-init). Ports: PG14=21914, PG15=21915, PG16=21916, PG17=21917, PG18=21918. diff --git a/.claude/skills/project-knowledge/overview.md b/.claude/skills/project-knowledge/overview.md index 96e2d2d..4594915 100644 --- a/.claude/skills/project-knowledge/overview.md +++ b/.claude/skills/project-knowledge/overview.md @@ -18,7 +18,8 @@ It reads PostgreSQL internal statistics views and presents them in a top-like in - `pg_stat_database` — per-database metrics (commits, rollbacks, tuples, deadlocks, temp files) - `pg_stat_replication` — connected standbys and replication lag - `pg_stat_user_tables`, `pg_stat_user_indexes` — table/index access stats -- `pg_stat_bgwriter` — background writer stats +- `pg_stat_bgwriter` (+ `pg_stat_checkpointer` on PG 17+) — background writer / checkpointer screen (hotkey `b`; PG 14–18; TUI-only, not recordable in 0.11.0) +- `pg_replication_slots` (+ `pg_stat_replication_slots`) — replication slots screen (hotkey `o`; PG 14–18; multi-row, all slots; retained WAL + wal_status + spill/stream; TUI-only, not recordable in 0.11.0) - `pg_stat_wal` — WAL generation stats (PG 14+; reduced schema in PG 18) - `pg_stat_statements` — top queries by various metrics (requires extension) - System stats — CPU, memory, disk, network (read from /proc or via PL/Perl schema) diff --git a/.claude/skills/project-knowledge/patterns.md b/.claude/skills/project-knowledge/patterns.md index d4b460f..fc10ab0 100644 --- a/.claude/skills/project-knowledge/patterns.md +++ b/.claude/skills/project-knowledge/patterns.md @@ -17,6 +17,13 @@ When a PG version changes columns in a stats view: - Call it in `view.Configure()` under the correct view name - Add version-specific test cases in `*_test.go` +Reference implementations of the single-row version-aware view: `internal/query/wal.go` and `internal/query/bgwriter.go`. The bgwriter screen is notable for placing absolute event-counter columns (`ckpt_*`, `rstpt_*`) **outside** the contiguous `DiffIntvl` range so they render cumulative, while the work/time/buffer columns inside the range render as per-interval deltas. + +For a **multi-row hybrid view** that LEFT JOINs two stats views, see `internal/query/replication_slots.go` (the `replslots` screen). Two patterns it establishes: +- **`coalesce(...,0)` on diffed columns fed by a LEFT JOIN.** A row present in both samples enters `diff()`/`diffPair()`; if an outer-joined diffed column is SQL NULL it scans as an empty string and `strconv.ParseInt("")` aborts the whole sample. Coalescing NULL→0 in SQL keeps such rows diff-safe (physical slots, absent from `pg_stat_replication_slots`, render `0`). Only diffed columns need this — absolute columns outside `DiffIntvl` pass NULLs through as empty. +- **Recovery-aware WAL distance for free** via the `{{.WalFunction1}}({{.WalFunction2}}(), lsn)` template (`selectWalFunctions` in `query.go` picks `pg_current_wal_lsn` on a primary, `pg_last_wal_receive_lsn` on a standby) — no recovery branch in the query. +A multi-row view sets `UniqueKey` to the stable row identity (slot_name, col 0) for cross-sample row matching, and may set a non-default `OrderKey` (replslots: 4 = retained,KiB desc) for a domain-appropriate default sort. + ## Error Wrapping Use `fmt.Errorf("context: %w", err)` for all error wrapping in production code. diff --git a/.github/workflows/default.yml b/.github/workflows/default.yml index 2c8c915..250fad2 100644 --- a/.github/workflows/default.yml +++ b/.github/workflows/default.yml @@ -6,23 +6,23 @@ on: push jobs: test: runs-on: ubuntu-latest - container: lesovsky/pgcenter-testing:0.0.9 + container: lesovsky/pgcenter-testing:0.0.10 steps: - name: Checkout code uses: actions/checkout@v4 - - name: Cache Go 1.25.10 binary + - name: Cache Go 1.25.11 binary id: cache-go uses: actions/cache@v4 with: path: /opt/go - key: go-1.25.10-linux-amd64 + key: go-1.25.11-linux-amd64 - - name: Install Go 1.25.10 + - name: Install Go 1.25.11 if: steps.cache-go.outputs.cache-hit != 'true' run: | - curl -fsSL https://dl.google.com/go/go1.25.10.linux-amd64.tar.gz -o /tmp/go.tar.gz + curl -fsSL https://dl.google.com/go/go1.25.11.linux-amd64.tar.gz -o /tmp/go.tar.gz tar -xf /tmp/go.tar.gz -C /opt - name: Setup Go symlinks @@ -37,9 +37,9 @@ jobs: path: | ~/go/pkg/mod ~/.cache/go-build - key: ${{ runner.os }}-go-1.25.10-${{ hashFiles('go.sum') }} + key: ${{ runner.os }}-go-1.25.11-${{ hashFiles('go.sum') }} restore-keys: | - ${{ runner.os }}-go-1.25.10- + ${{ runner.os }}-go-1.25.11- - name: Cache lint tools uses: actions/cache@v4 diff --git a/.github/workflows/release.yml b/.github/workflows/release.yml index dcd3146..faa8363 100644 --- a/.github/workflows/release.yml +++ b/.github/workflows/release.yml @@ -8,23 +8,23 @@ on: jobs: test: runs-on: ubuntu-latest - container: lesovsky/pgcenter-testing:0.0.9 + container: lesovsky/pgcenter-testing:0.0.10 steps: - name: Checkout code uses: actions/checkout@v4 - - name: Cache Go 1.25.10 binary + - name: Cache Go 1.25.11 binary id: cache-go uses: actions/cache@v4 with: path: /opt/go - key: go-1.25.10-linux-amd64 + key: go-1.25.11-linux-amd64 - - name: Install Go 1.25.10 + - name: Install Go 1.25.11 if: steps.cache-go.outputs.cache-hit != 'true' run: | - curl -fsSL https://dl.google.com/go/go1.25.10.linux-amd64.tar.gz -o /tmp/go.tar.gz + curl -fsSL https://dl.google.com/go/go1.25.11.linux-amd64.tar.gz -o /tmp/go.tar.gz tar -xf /tmp/go.tar.gz -C /opt - name: Setup Go symlinks @@ -39,9 +39,9 @@ jobs: path: | ~/go/pkg/mod ~/.cache/go-build - key: ${{ runner.os }}-go-1.25.10-${{ hashFiles('go.sum') }} + key: ${{ runner.os }}-go-1.25.11-${{ hashFiles('go.sum') }} restore-keys: | - ${{ runner.os }}-go-1.25.10- + ${{ runner.os }}-go-1.25.11- - name: Cache lint tools uses: actions/cache@v4 diff --git a/docs/decisions-log.md b/docs/decisions-log.md index d739da4..64ce8d3 100644 --- a/docs/decisions-log.md +++ b/docs/decisions-log.md @@ -185,3 +185,161 @@ Used by tech-spec planning and code research to avoid repeating mistakes and re- **Decision:** `app.setup()` removes `procpidstat` from `views` and prints INFO when `!db.Local`, before `filterViews()` is called. `filterViews()` handles only static unsuitability (PG version, missing extension). **Rationale:** Local/remote is a runtime property, not a static view property. Keeps `filterViews()` focused on one concern. `db.Local` is already computed by `postgres.Connect()` via `isLocalhost()`. + +--- + +## [004-feat-bgwriter-checkpointer] Per-version column sets, not NULL-padded unified columns + +**Date:** 2026-06-21 +**Feature:** 004-feat-bgwriter-checkpointer +**Status:** Accepted + +**Context:** `pg_stat_bgwriter` changes shape across versions: PG 17 splits checkpoint metrics into `pg_stat_checkpointer`, drops `buffers_backend`/`buffers_backend_fsync`, and PG 18 adds `slru_written`. The bgwriter screen had to decide whether to present one unified column set (NULL-padding versions that lack a column) or version-specific sets. + +**Decision:** Each version branch returns only the columns that exist on that version; shared columns keep identical headers and order. `Ncols`/`DiffIntvl` differ per version (PG 14–16: 12 cols / `[3,10]`; PG 17: 13 / `[6,11]`; PG 18: 14 / `[6,12]`). + +**Rationale:** Follows the `wal.go` precedent, which already returns different `Ncols`/`DiffIntvl` per version. NULL-padding pre-17 with empty restartpoint columns would show misleading blank columns to a PG 15 DBA. + +**Alternatives considered:** Unified header set with `NULL AS rstpt_*` placeholders on PG 14–16 — rejected: clutters the screen with always-empty columns and contradicts the wal precedent. + +--- + +## [004-feat-bgwriter-checkpointer] Absolute event counters via DiffIntvl placement + +**Date:** 2026-06-21 +**Feature:** 004-feat-bgwriter-checkpointer +**Status:** Accepted + +**Context:** Checkpoint/restartpoint event counters (`ckpt_timed`, `ckpt_req`, and PG 17+ `rstpt_timed/req/done`) must render as absolute cumulative values — the timed-vs-requested ratio is the signal, and a per-interval delta on a short refresh is almost always 0. The work/time/buffer columns must render as deltas. + +**Decision:** Place the event counters in a contiguous block right after the `source` label, **outside** the `DiffIntvl` range; the diffed work/time/buffer columns form the single contiguous diff range; `stats_age` is last, also outside the range. + +**Rationale:** `DiffIntvl` is a single contiguous `[lo,hi]` range (`internal/stat/postgres.go:diff()`), which copies out-of-range columns as-is and subtracts in-range ones. Keeping event counters outside the range is the only way to render them absolute without changing the diff machinery. + +**Alternatives considered:** Diff everything (wal-style) — rejected for event counters; they would flicker between 0 and 1. + +--- + +## [004-feat-bgwriter-checkpointer] NotRecordable: true for TUI-only scope + +**Date:** 2026-06-21 +**Feature:** 004-feat-bgwriter-checkpointer +**Status:** Accepted + +**Context:** Supporting `record`/`report` for the bgwriter screen pulls in the storage format and the report pipeline — a separate layer that would roughly double the feature. The 0.11.0 roadmap mandates TUI-first for every new view. + +**Decision:** Register the view with `NotRecordable: true`; `record/record.go:filterViews()` skips it. Record/report support is deferred to a backlog feature (`docs/roadmap-0.11.0.md`, "Out of scope / backlog"). + +**Rationale:** Keeps the feature size-M. Reuses the `NotRecordable` field whose lineage is in ADR [001-feat-per-process-system-stats] / [003-feat-procpidstat-record-report]; after procpidstat dropped the flag in feature 003, bgwriter is its sole live user. + +**Alternatives considered:** Ship record/report in the same feature — rejected as scope creep. + +--- + +## [004-feat-bgwriter-checkpointer] stats_age sourced from pg_stat_checkpointer on PG 17+ + +**Date:** 2026-06-21 +**Feature:** 004-feat-bgwriter-checkpointer +**Status:** Accepted + +**Context:** On PG 17+ there are two independent `stats_reset` timestamps — `pg_stat_bgwriter.stats_reset` and `pg_stat_checkpointer.stats_reset` — reset separately via `pg_stat_reset_shared('bgwriter'|'checkpointer')`. The single-column `stats_age` must derive from one of them. + +**Decision:** On PG 17+ `stats_age` derives from `pg_stat_checkpointer.stats_reset`. + +**Rationale:** The screen's primary content on modern versions is checkpoint data; one column is cleaner than two reset ages. Documented in the user-spec so an independently-reset bgwriter is not a surprise. + +**Alternatives considered:** Show both reset ages, or the older of the two — rejected as needless column noise for a secondary signal. + +--- + +## [004-feat-bgwriter-checkpointer] Go toolchain bump 1.25.10 → 1.25.11 in CI + +**Date:** 2026-06-21 +**Feature:** 004-feat-bgwriter-checkpointer +**Status:** Accepted + +**Context:** `govulncheck` in CI flagged GO-2026-5037, a `crypto/x509` stdlib vulnerability fixed in Go 1.25.11. Surfaced during feature 004 execution; unrelated to the feature code. + +**Decision:** Bump the Go toolchain from 1.25.10 to 1.25.11 in the CI workflows to close GO-2026-5037. + +**Rationale:** Stdlib vuln in a transitive code path; the cheapest fix is the patch-version toolchain bump the CI gate requires. + +--- + +## [005-feat-replication-slots] Hybrid pg_replication_slots ⟕ pg_stat_replication_slots, not the literal view + +**Date:** 2026-06-21 +**Feature:** 005-feat-replication-slots +**Status:** Accepted + +**Context:** The roadmap line item named `pg_stat_replication_slots` but framed the value as retained-WAL / disk-fill triage. Retained WAL (`restart_lsn`, `wal_status`, `safe_wal_size`) lives in `pg_replication_slots`, and physical slots are absent from `pg_stat_replication_slots` entirely — so the literally-named view cannot deliver the stated value. + +**Decision:** Source the screen from `pg_replication_slots s LEFT JOIN pg_stat_replication_slots ss ON s.slot_name = ss.slot_name` — one row per slot (physical + logical). State columns absolute; the eight logical-decoding counters diffed. + +**Rationale:** Only the hybrid covers all slots plus retained WAL, which is the disk-fill signal the feature exists for. + +**Alternatives considered:** Pure `pg_stat_replication_slots` (logical-only spill/stream, no retained WAL, no physical slots) — rejected: does not solve the disk-fill use case. + +--- + +## [005-feat-replication-slots] coalesce(...,0) on the diffed counters for LEFT-JOIN-NULL safety + +**Date:** 2026-06-21 +**Feature:** 005-feat-replication-slots +**Status:** Accepted + +**Context:** Physical slots are absent from `pg_stat_replication_slots`, so the LEFT JOIN yields NULL for the 8 counter columns. A physical slot matches itself across samples by `slot_name`, entering the diff branch; an empty-string NULL inside `DiffIntvl` reaches `diffPair` → `strconv.ParseInt("")` → error → the whole sample is aborted (`internal/stat/postgres.go`). + +**Decision:** Wrap the 8 diffed counters in `coalesce(..., 0)` in SQL. Physical slots render `0`. Absolute columns (retained/safe/stats_age, outside `DiffIntvl`) stay nullable and render blank. + +**Rationale:** Mandatory for correctness, verified live (a physical slot would otherwise blank the screen). Rendering `-`/blank for physical rows would need per-cell view logic pgcenter lacks; the adjacent `slot_type` column disambiguates the `0`. + +**Alternatives considered:** Per-cell custom rendering of physical-slot counters — rejected as scope creep. + +--- + +## [005-feat-replication-slots] Single query for PG 14–18, no version branching + +**Date:** 2026-06-21 +**Feature:** 005-feat-replication-slots +**Status:** Accepted + +**Context:** `pg_replication_slots` gained `conflicting` (PG 16), `failover`/`synced` (PG 17), `invalidation_reason` (PG 18). Including any of them would force per-version query strings (the [004] ADR situation). + +**Decision:** Use only the subset stable across PG 14–18 (`slot_name, slot_type, active, wal_status, restart_lsn, safe_wal_size` + the whole of `pg_stat_replication_slots`). `SelectStatReplicationSlotsQuery(_ int)` returns one query, `15`, `[2]int{6,13}` for all versions (param kept for signature symmetry / future branch point). + +**Rationale:** Invalidation **state** is carried by `wal_status` (`lost`/`unreserved`); the finer **cause** columns are version-fragmented and a niche signal. One query keeps the feature size-M and avoids upgrade-surprise column shifts. + +**Alternatives considered:** Add `invalidation_reason` (PG 18 branch) / `conflicting` (PG 16 branch) — rejected; deferrable if cause-attribution is ever requested. + +--- + +## [005-feat-replication-slots] Default sort by retained,KiB desc (OrderKey=4), deviating from col-0 + +**Date:** 2026-06-21 +**Feature:** 005-feat-replication-slots +**Status:** Accepted + +**Context:** Every other multi-row view defaults `OrderKey=0`. The replslots feature exists for disk-fill triage, where the most relevant slot is the one holding the most WAL. + +**Decision:** `OrderKey=4` (`retained,KiB`), `OrderDesc=true`; SQL `ORDER BY "retained,KiB" DESC NULLS LAST` for the first frame. Documented so it is not read as a bug. + +**Rationale:** Incident-first ordering puts the offender on top without the DBA re-sorting. The Go-side `sort()` governs each subsequent frame. + +**Alternatives considered:** `OrderKey=0` (slot_name) for consistency — rejected: buries the offender. + +--- + +## [005-feat-replication-slots] Test image wal_level=logical + bump 0.0.10, decoupled by defensive t.Skipf + +**Date:** 2026-06-21 +**Feature:** 005-feat-replication-slots +**Status:** Accepted + +**Context:** Exercising a real logical slot (`pg_create_logical_replication_slot(..., 'test_decoding')`) needs `wal_level=logical`, which the `pgcenter-testing` image did not set. The image build/push is a manual maintainer step (no CI image-build job). + +**Decision:** Add `wal_level=logical` to `testing/prepare-test-environment.sh` and bump the image `0.0.9 → 0.0.10` (Dockerfile + both workflows + deployment.md). The tier-3 logical-slot test `t.Skipf`s unless `wal_level=logical`, so CI stays green on the old image until the maintainer publishes `0.0.10`. + +**Rationale:** The defensive skip removes a fragile ordering dependency between the manual image push and the code merge. `test_decoding` ships in the PGDG `postgresql-NN` packages, so no extra package is needed. Verified live: tier-1/2 pass on `replica`, tier-3 passes on `logical` across PG 14–18. + +**Alternatives considered:** Hard ordering (push image, then merge code that assumes `wal_level=logical`) — rejected: any gap leaves CI red on transient infra state. diff --git a/docs/features-catalog.md b/docs/features-catalog.md index f15b8d6..1df5f21 100644 --- a/docs/features-catalog.md +++ b/docs/features-catalog.md @@ -59,3 +59,44 @@ Used by spec-writer to understand existing functionality and avoid duplication o - IO and iodelay columns are empty (`""`) if the recorder lacked permissions or kernel support; report shows WARNING in header **Touches:** [001-feat-per-process-system-stats] (uses the same procpidstat 19-column pipeline); [002-feat-iodelay-procpidstat] (iodelay columns included in recorded data). + +--- + +### [004-feat-bgwriter-checkpointer] Background Writer / Checkpointer Screen + +**What it does:** Adds a new single-row TUI screen (`bgwriter`, hotkey `b`) showing background-writer and checkpoint activity from `pg_stat_bgwriter` and (on PG 17+) `pg_stat_checkpointer`. Lets DBA watch checkpoint frequency, the timed-vs-requested ratio, checkpoint write/sync cost, and who flushes buffers (checkpointer / bgwriter / backends) without leaving pgcenter for `psql`. + +**Key scenarios:** +- Press `b` to open the screen. Event counters (`ckpt_timed`, `ckpt_req`, and on PG 17+ `rstpt_timed`/`rstpt_req`/`rstpt_done`) show as absolute cumulative values; work/time/buffer columns (`ckpt_write,ms`, `ckpt_sync,ms`, `buf_ckpt`, `buf_clean`, `maxwritten`, `buf_alloc`, …) update as per-interval deltas; `stats_age` shows how long counters have accumulated. +- Diagnose forced checkpoints: watch `ckpt_req` climb faster than `ckpt_timed` — checkpoints are forced by WAL pressure, raise `max_wal_size`. +- Tune bgwriter on PG ≤ 16 by comparing `buf_clean` (bgwriter) against `buf_backend` (backends) and watching `maxwritten`. +- Monitor restartpoints on a standby (PG 17+): `ckpt_*` stay 0 while `rstpt_*` accumulate. + +**Limitations:** +- TUI-only in 0.11.0 — the view is `NotRecordable`, so `pgcenter record` skips it and it does not appear in `pgcenter report`. Record/report support is deferred to a backlog feature. +- `buf_backend` / `buf_backend_fsync` columns appear only on PG ≤ 16 (the data moved to `pg_stat_io` on PG 17+). +- The column set differs per server version (PG 18 adds a `slru_written` column); no NULL placeholders for columns absent on a given version. +- `stats_age` on PG 17+ comes from `pg_stat_checkpointer`; a separate `pg_stat_bgwriter` reset is not reflected. + +**Touches:** Shares the single-row version-aware view model with the `pg_stat_wal` screen. + +--- + +### [005-feat-replication-slots] Replication Slots Screen + +**What it does:** Adds a new multi-row TUI screen (`replslots`, hotkey `o`) showing one row per replication slot — physical and logical — from a hybrid `pg_replication_slots` + `pg_stat_replication_slots` query. Lets a DBA find which slot is retaining WAL (the classic disk-fill incident) and watch logical-decoding spill/stream pressure without dropping to `psql`. Same 15 columns on PostgreSQL 14–18. + +**Key scenarios:** +- Press `o` to open the screen, sorted by `retained,KiB` descending — the slot holding the most WAL is on top. Columns: `slot_name`, `slot_type`, `active`, `wal_status` (reserved/extended/unreserved/lost), `retained,KiB`, `safe,KiB` (absolute state); `spill_txns/count`, `spill,KiB`, `stream_txns/count`, `stream,KiB`, `total_txns`, `total,KiB` (per-interval deltas); `stats_age`. +- Diagnose a disk-fill incident: a slot with high `retained,KiB` and `active=false` (a disconnected standby or subscription) is the culprit — revive the consumer or drop the slot. +- Catch a slot before it breaks: watch `wal_status` move to `unreserved`/`lost`. +- Tune logical decoding: rising `spill,KiB` per interval means decoding spills to disk — raise `logical_decoding_work_mem`. +- Re-sort (arrows) and filter (`/`) like the other multi-row screens. + +**Limitations:** +- TUI-only in 0.11.0 — the view is `NotRecordable`, so `pgcenter record` skips it and it does not appear in `pgcenter report`. This hurts retrospective analysis most for this feature (disk-fill incidents are often forensic); record/report is the planned next phase. +- Physical slots show `0` (not blank) in the spill/stream/total columns — those metrics are logical-decoding-only; the adjacent `slot_type=physical` disambiguates. +- Invalidation **cause** is not shown (`conflicting`/`invalidation_reason` omitted to keep one query across PG 14–18); `wal_status` conveys the state (`lost`/`unreserved`). +- `pg_stat_subscription_stats` (subscriber-side) is out of scope — a separate future feature. + +**Touches:** Shares the multi-row sort/filter/diff view model with the `pg_stat_replication` and `tables` screens; second view (after [004-feat-bgwriter-checkpointer]) registered `NotRecordable`. diff --git a/docs/features/archive/004-feat-bgwriter-checkpointer/004-feat-bgwriter-checkpointer-adequacy-review.json b/docs/features/archive/004-feat-bgwriter-checkpointer/004-feat-bgwriter-checkpointer-adequacy-review.json new file mode 100644 index 0000000..aedf9ac --- /dev/null +++ b/docs/features/archive/004-feat-bgwriter-checkpointer/004-feat-bgwriter-checkpointer-adequacy-review.json @@ -0,0 +1,28 @@ +{ + "status": "approved", + "findings": [ + { + "category": "underengineering", + "severity": "major", + "issue": "The PG18 column contract is left unresolved. The spec says 'дополнительно включается колонка slru_written (по документации)' but the code research (section 7) explicitly flags an open design decision: variant A (unified column headers across versions, per interview.yml:106-107) argues for OMITTING slru_written to keep PG14-18 headers identical, while the spec's plan is to ADD it as a diffed column on PG18. These two directions are contradictory and the spec does not pick one.", + "why_matters": "An undecided column contract will surface during tech-spec/implementation as ambiguity: either slru_written is shown (breaking the 'unified headers' intent) or omitted (contradicting the spec's acceptance criterion that explicitly requires the PG18 column to be present). Leaving it open risks a back-and-forth or an arbitrary implementer choice that fails one of the acceptance criteria.", + "fix": "Explicitly resolve the slru_written question in the spec: state whether PG18 surfaces slru_written as an extra diffed column (per-version layout, like wal.go's PG14 vs PG18 split) or omits it for header uniformity. The acceptance criterion 'на PG 18 включена дополнительная колонка' already implies surfacing it — make that the single stated decision and drop the 'unified headers' ambiguity, or vice versa." + }, + { + "category": "feasibility", + "severity": "minor", + "issue": "The PG18 query branch depends on a column set (slru_written and the full pg_stat_checkpointer shape) that was NOT verified on a live PG18 at planning time — only PG17.7 was live; PG14/15/16/18 columns are cited from documentation. The spec correctly lists this as a risk with mitigation (verify on container before release), but the exact PG18 Ncols/DiffIntvl cannot be finalized until that verification runs.", + "why_matters": "If the live PG18 schema differs from documentation (e.g. additional/renamed columns), the PG18 query, Ncols, and DiffIntvl will need adjustment. This is bounded and the mitigation is already planned, so it does not block the design — but it means the PG18 branch is not buildable purely from the spec without the integration step.", + "fix": "No spec change required; the existing risk + mitigation (verify pg_stat_checkpointer on a live PG18 before writing the query branch) is adequate. Ensure the tech-spec task ordering puts PG18 schema verification before authoring the PG18 query constant." + }, + { + "category": "overengineering", + "severity": "minor", + "issue": "The 'Технические решения' section leaks light implementation detail into the user-spec: it names the new file (internal/query/bgwriter.go) and the exact function signature SelectStatBgwriterQuery(version) (string, int, [2]int). User-spec scope is WHAT/WHY; specific file paths and function signatures belong in the tech-spec.", + "why_matters": "Implementation specifics in the user-spec blur the WHAT/HOW boundary and can prematurely constrain the tech-spec. Here the impact is small because the decision (mirror pg_stat_wal) is genuinely a WHAT-level architectural choice with clear rationale, but the concrete file name and signature are HOW.", + "fix": "Keep the decision 'build the screen modeled on the pg_stat_wal version-aware single-row pattern' and its rationale; move the concrete file name and function signature to the tech-spec." + } + ], + "worst_category": "underengineering", + "summary": "The feature is reasonable, buildable with the current stack (Go, pgx/v5, gocui), and right-sized as M — it faithfully mirrors the established pg_stat_wal pattern, the DiffIntvl single-contiguous-range constraint is satisfied by the proposed layout (absolute counters first, diffed block contiguous, stats_age last), and the record/report deferral via NotRecordable is a sound scope cut. No critical issues; the main gap is the unresolved PG18 slru_written / unified-header column contract, which should be pinned down before tech-spec." +} diff --git a/docs/features/archive/004-feat-bgwriter-checkpointer/004-feat-bgwriter-checkpointer-arch-review.json b/docs/features/archive/004-feat-bgwriter-checkpointer/004-feat-bgwriter-checkpointer-arch-review.json new file mode 100644 index 0000000..25c48d0 --- /dev/null +++ b/docs/features/archive/004-feat-bgwriter-checkpointer/004-feat-bgwriter-checkpointer-arch-review.json @@ -0,0 +1,47 @@ +{ + "status": "approved", + "checks": { + "architectural_fit": "pass", + "scalability": "pass", + "database": "n/a", + "coupling": "pass", + "api_contracts": "n/a", + "backward_compat": "pass" + }, + "findings": [ + { + "check": "architectural_fit", + "severity": "minor", + "issue": "The tech-spec states 'NotRecordable is an existing field (Go zero-value false), so no other view is affected' and the code-research correctly notes the bgwriter view will be the only current user of NotRecordable: true. However, the inline comment in record/record.go:205 reads 'e.g. procpidstat' as the example for the NotRecordable mechanism. This comment is stale: procpidstat currently sets NotRecordable: false and is excluded from recording via an explicit delete(views, \"procpidstat\") at line 102. The mechanism itself works correctly — filterViews() at line 208 checks the field and the bgwriter view will be filtered correctly — but the stale comment creates a misleading documentation signal that implementers and reviewers will encounter. Neither the tech-spec nor the task list schedules a fix for this comment.", + "location": "Tech-spec section 'Backward Compatibility' / record/record.go:205", + "fix": "Add to Task 3 (or as a documentation step): update the inline comment in record/record.go:205 from 'e.g. procpidstat' to 'e.g. bgwriter' once the bgwriter view is the concrete live example. Alternatively, reword to 'views for which SQL-only output is incomplete or undesirable in a record context'." + }, + { + "check": "architectural_fit", + "severity": "minor", + "issue": "The tech-spec spec in 'Data Models' specifies DiffIntvl = [3,10] for PG14-16 (cols 3..10 are the eight work/time/buffer columns; col 11 = stats_age). However, the existing pg_stat_wal pattern uses DiffIntvl where col 0 = source (absolute), col 1 = waldir_size (absolute), cols 2-9 = counters (diffed), col 10 = stats_age (absolute). In the bgwriter PG14-16 layout col 0 = source, cols 1-2 = event counters (absolute), cols 3-10 = work/time/buffer (diffed), col 11 = stats_age (absolute). The DiffIntvl [3,10] is architecturally correct and consistent with the contiguous-range constraint of diff(). This is a confirmation, not a problem: the spec's column math checks out against the actual diff() implementation at internal/stat/postgres.go:331.", + "location": "Tech-spec 'Data Models' — PG14-16 column layout", + "fix": "No change required. Confirmed correct." + }, + { + "check": "architectural_fit", + "severity": "minor", + "issue": "The tech-spec references 'MinRequiredVersion = query.PostgresV14' but the query.go constants only go up to PostgresV14 = 140000 — there are no PostgresV17 or PostgresV18 constants. The spec correctly notes this and explicitly directs use of raw integer literals 170000/180000 (following the wal.go precedent). This is consistent with project policy ('minimum code that solves the problem — nothing speculative') but means the version boundary logic in SelectStatBgwriterQuery will use magic numbers. This is a pre-existing pattern accepted project-wide, not a regression introduced by this feature.", + "location": "Tech-spec 'Dependencies' / internal/query/query.go", + "fix": "No change required for this feature. If named constants for PG17/PG18 are desired, that is a separate refactoring task that affects wal.go and other files simultaneously." + }, + { + "check": "coupling", + "severity": "minor", + "issue": "The view name 'bgwriter' is a string literal used in three separate files: internal/view/view.go (map key and Configure() case), top/keybindings.go (switchViewTo argument), and top/help.go (free text). This is string coupling, but it is the identical pattern used for every other view ('wal', 'activity', 'replication', etc.) in the codebase. The coupling is inherent to the gocui views-map architecture — there is no type-safe view name enum — and is not a regression introduced by this feature. All three files are in the task list and will be changed consistently.", + "location": "Tech-spec 'Architecture' — files to modify; keybindings.go:35, view.go:128, help.go:13", + "fix": "No change required. Accepted pattern across the entire codebase. If string coupling is to be eliminated, it requires a project-wide refactoring not scoped to this feature." + } + ], + "risk_assessment": { + "performance": "low", + "complexity": "low", + "tech_debt": "low" + }, + "summary": "The design is architecturally sound and follows the pg_stat_wal pattern with precision: the DiffIntvl contiguous-range exploitation is correctly specified (absolute event counters outside the range, diffed block contiguous, stats_age last), the per-version Ncols/DiffIntvl split is consistent with wal.go's own PG14/PG18 divergence, the PG17+ cross-join of two single-row views produces exactly one row as required by diff()'s ukey-match loop, and NotRecordable: true is wired correctly into the existing filterViews() path. All four findings are minor: a stale comment in record.go, a confirmation that the column-math is correct, a pre-existing pattern of numeric version literals, and inherent string coupling shared by every other view. No critical or major issues were found." +} diff --git a/docs/features/archive/004-feat-bgwriter-checkpointer/004-feat-bgwriter-checkpointer-code-research.md b/docs/features/archive/004-feat-bgwriter-checkpointer/004-feat-bgwriter-checkpointer-code-research.md new file mode 100644 index 0000000..8621204 --- /dev/null +++ b/docs/features/archive/004-feat-bgwriter-checkpointer/004-feat-bgwriter-checkpointer-code-research.md @@ -0,0 +1,425 @@ +# Code Research: 004-feat-bgwriter-checkpointer + +Feature: new single-row TUI screen `bgwriter` (hotkey `b`) combining `pg_stat_bgwriter` + +`pg_stat_checkpointer`. Follows the `pg_stat_wal` view pattern exactly. TUI (`top`) only, +`NotRecordable: true`. Checkpoint/restartpoint EVENT counters shown ABSOLUTE; work/time/buffer +columns DIFFed. + +Research date: 2026-06-21. PG column inventory for PG17 verified against a live local PostgreSQL +17.7 cluster; PG14/15/16/18 columns cited from PostgreSQL documentation and the existing +`internal/query/wal.go` version-split precedent (the same PG18 stats-system rewrite that affected +`pg_stat_wal`). + +--- + +## 1. Reference Template — `pg_stat_wal` (the canonical pattern) + +### `internal/query/wal.go` +Two query constants + one version selector. This is the exact template for a new +`internal/query/bgwriter.go`. + +- `PgStatWALPG14` (`wal.go:5-11`) — query for PG 14-17. Shape: + `SELECT 'WAL' AS source, , , date_trunc('seconds', now() - stats_reset)::text AS stats_age FROM pg_stat_wal`. +- `PgStatWALDefault` (`wal.go:15-21`) — PG 18+ variant (PG18 removed `wal_write/wal_sync/wal_write_time/wal_sync_time`). +- `SelectStatWALQuery(version int) (string, int, [2]int)` (`wal.go:25-32`): + ```go + func SelectStatWALQuery(version int) (string, int, [2]int) { + if version >= 180000 { + return PgStatWALDefault, 7, [2]int{2, 5} + } + return PgStatWALPG14, 11, [2]int{2, 9} + } + ``` + +Key mechanics confirmed: +- **Return tuple**: `(queryTemplate, ncols, DiffIntvl)`. +- **`ncols`** = total columns the SELECT returns (used as the right border for `OrderKey` clamping). +- **`DiffIntvl [2]int`** = inclusive `[firstCol, lastCol]` zero-based column range that gets diffed. + For WAL PG14: `[2,9]` — col 0 = `source` (text, not diffed), col 1 = `waldir_size` (pretty, + not diffed), cols 2..9 = the eight counter columns, col 10 = `stats_age` (excluded). So + `stats_age` is excluded simply by being **outside** the `DiffIntvl` upper bound. +- **How exclusion works at runtime**: `internal/stat/postgres.go:diff()` (`postgres.go:303-358`). + Inner loop `for l := 0; l < curr.Ncols; l++`: if `l < interval[0] || l > interval[1]` the value + is copied as-is (line 331-333); otherwise `diffPair()` subtracts prev from curr (line 336). + **Implication for this feature**: any column NOT inside `DiffIntvl` is rendered as the raw + current (absolute) value. This is exactly the mechanism we exploit to show + `ckpt_timed/ckpt_req/rstpt_*` as absolute — they must sit OUTSIDE the diff range. + +### Constraint this imposes on column layout (critical design point) +`DiffIntvl` is a **single contiguous range** `[lo, hi]`. There is no way to express "diff cols +2-4 and 8-12 but not 5-7". Therefore the layout MUST be: + +``` +col 0: source (text, absolute) +cols 1..A: absolute event-counters: ckpt_timed, ckpt_req, rstpt_timed, rstpt_req, rstpt_done +cols A+1..Z: diffed work/time/buffer columns: write_time, sync_time, buffers_* (DiffIntvl = [A+1, Z]) +col Z+1: stats_age (excluded, absolute, sits past DiffIntvl upper bound) +``` + +This matches the feature summary: absolute block right after `source`, one contiguous `DiffIntvl`, +`stats_age` last. + +### `internal/query/wal_test.go` (test template) +- `Test_SelectStatWALQuery` (`wal_test.go:10-30`): table-driven. Each case asserts + `{version, wantNcols, wantDiffIntvl}`. Versions covered: 140000, 150000, 170000, 180000. + Only asserts the `ncols` and `DiffIntvl` returns (query string ignored via `_`). +- `Test_StatWALQueries` (`wal_test.go:33-55`): integration. Iterates + `versions := []int{140000,150000,160000,170000,180000}`; formats the template via `Format(tmpl, opts)` + with `NewOptions(version, "f", "off", 256, "public")`; connects with + `postgres.NewTestConnectVersion(version)`; `t.Skipf` if the version container is unavailable; + executes the query and asserts no error. **This is the exact template for `bgwriter_test.go`** — + same two tests, swapped names and expected ncols/DiffIntvl per version. + +--- + +## 2. View Registration — `internal/view/view.go` + +### `View` struct (`view.go:9-31`) +Relevant fields for the new view: `Name`, `MinRequiredVersion`, `QueryTmpl`, `DiffIntvl`, `Ncols`, +`OrderKey`, `OrderDesc`, `ColsWidth`, `Msg`, `Filters`, and `NotRecordable bool` (`view.go:30`: +"When true, record/record.go:filterViews() skips this view."). + +### The `wal` entry (`view.go:128-139`) — exact template +```go +"wal": { + Name: "wal", + MinRequiredVersion: query.PostgresV14, + QueryTmpl: query.PgStatWALPG14, + DiffIntvl: [2]int{2, 9}, + Ncols: 11, + OrderKey: 0, + OrderDesc: true, + ColsWidth: map[int]int{}, + Msg: "Show WAL statistics", + Filters: map[int]*regexp.Regexp{}, +}, +``` + +### New `bgwriter` entry needed +A `"bgwriter"` map entry mirroring `wal`, with: +- `Name: "bgwriter"` +- `MinRequiredVersion: query.PostgresV14` (pgcenter floor; bgwriter exists on all supported versions) +- `QueryTmpl: query.PgStatBgwriterPG14` (the pre-17 default; `Configure()` overrides it per version) +- `DiffIntvl` / `Ncols`: placeholder values matching the PG14 branch (overridden by `Configure()`) +- `OrderKey: 0`, `OrderDesc: true`, `ColsWidth: map[int]int{}`, `Filters: map[int]*regexp.Regexp{}` +- `Msg: "Show bgwriter / checkpointer statistics"` +- **`NotRecordable: true`** — keeps it out of recording (see §5). + +> Note: `Configure()` (below) overrides `QueryTmpl/Ncols/DiffIntvl` at runtime, so the literal map +> values are only the static defaults. The `wal` entry uses the PG14 values as defaults — follow +> the same convention. + +### `Configure()` wiring (`view.go:302-338`) +The first loop (`view.go:307-325`) switches on view key `k` and calls the version-aware selector, +re-assigning `QueryTmpl/Ncols/DiffIntvl`. The `wal` case (`view.go:321-324`): +```go +case "wal": + view.QueryTmpl, view.Ncols, view.DiffIntvl = query.SelectStatWALQuery(opts.Version) + v[k] = view +``` +**New case needed** in the same switch: +```go +case "bgwriter": + view.QueryTmpl, view.Ncols, view.DiffIntvl = query.SelectStatBgwriterQuery(opts.Version) + v[k] = view +``` +The second loop (`view.go:328-335`) calls `query.Format(view.QueryTmpl, opts)` for ALL views to +build the final `Query` string — no change needed there; the bgwriter query goes through it +automatically. + +### `NotRecordable` usage across views +Currently NO view in `view.go` sets `NotRecordable: true` (the field exists and `procpidstat` once +used it, but per `docs/decisions-log.md` [003-feat-procpidstat-record-report] it was removed when +procpidstat recording became supported). The bgwriter view will be the **only** view setting +`NotRecordable: true`. The field defaults to `false` (Go zero value) so no other view is affected. + +### `VersionOK` (`view.go:341-343`) +`return version >= v.MinRequiredVersion`. With `MinRequiredVersion = query.PostgresV14`, the view +is available on all pgcenter-supported versions (PG14+). + +--- + +## 3. Keybinding — `top/keybindings.go` + +### Registration pattern (`keybindings.go:29-38`) +Each switch hotkey is a one-liner: `{"sysstat", '', switchViewTo(app, "")}`. +The `wal` binding (`keybindings.go:35`): `{"sysstat", 'w', switchViewTo(app, "wal")}`. + +### Is `b` free? YES. +Full lowercase-key audit of the `"sysstat"` context (`keybindings.go:22-62`): +- Switch keys: `d, r, t, i, s, f, w, p, a, x` (lines 29-38). +- Other lowercase: `q(quit), <, ,(comma), l, ~, m, n, k, z, h, /, -, _`. +- **`b` is NOT bound.** (`B` uppercase IS bound — `keybindings.go:47` `showExtra diskstats` — + but lowercase `b` is free. gocui treats `b` and `B` as distinct keys.) + +### One-line addition needed (after `keybindings.go:35`, the `wal` line): +```go +{"sysstat", 'b', switchViewTo(app, "bgwriter")}, +``` + +### `switchViewTo` handler (`top/config_view.go:101-125`) +`switchViewTo(app, c)` — the `default` branch (line 118-119) calls +`viewSwitchHandler(app.config, c)` directly for any plain view name (no multi-view cycling logic). +`bgwriter` falls into `default`, exactly like `wal`/`activity`/`replication`. No `*NextView` +helper or special-casing required. `viewSwitchHandler` (`config_view.go:189-193`) saves current +view, loads the requested one from `config.views`, and sends it on `viewCh`. + +### Help text — `top/help.go` +`top/help.go:13-14` lists the general-action mode keys. To keep the help current, add `b` to the +mode list (e.g. line referencing `w WAL`). Cosmetic but expected for parity with `wal`. + +--- + +## 4. Reset behavior — `top/reset.go` + +### `resetStat()` (`reset.go:13-40`) +Header comment (`reset.go:10-12`): *"Reset statistics that belongs to current database and +pg_stat_statements stats. Don't reset shared stats, such as bgwriter or archiver."* + +Mechanics: `Q` executes `query.ExecResetStats` (database-level reset via +`pg_stat_reset()`) plus optionally `ExecResetPgStatStatements`. It deliberately does **NOT** reset +the shared `bgwriter`/`checkpointer` counters (those require `pg_stat_reset_shared('bgwriter')` / +`pg_stat_reset_shared('checkpointer')`, which pgcenter never calls). + +**Implication for the new view**: the `bgwriter` screen needs NO reset handling. Pressing `Q` +while on the bgwriter screen will reset DB-level stats but leave bgwriter/checkpointer counters +untouched. The absolute event-counters (`ckpt_timed`, etc.) will keep growing monotonically — which +is the intended UX (DBAs want the cumulative timed-vs-requested ratio). `stats_reset` / +`stats_age` only changes if an operator runs `pg_stat_reset_shared(...)` out-of-band. No code +change in `reset.go`. + +--- + +## 5. Recording skip — `record/record.go` `filterViews()` + +### `filterViews()` (`record.go:200-233`) +First check in the loop (`record.go:206-212`): +```go +if v.NotRecordable { + delete(views, k) + filtered++ + continue +} +``` +This runs in `app.setup()` (`record.go:86`) before `views.Configure(opts)`. Setting +`NotRecordable: true` on the bgwriter view guarantees: +- `pgcenter record` **never collects** it (deleted from the views map → never queried, never written + to the tar archive). +- Because it is never recorded, `pgcenter report` can never encounter a bgwriter entry → no + report-side parsing path. This is the **issue #122 avoidance**: report-side breakage (the + "invalid result" tar-file bug, ref. commit f055b3a) is structurally impossible for a view that + was never recorded. + +### Report-side confirmation — `cmd/report/report.go` +The report command maps a CLI flag to a view name in a `switch` (`report.go:138-166`, +e.g. `case opts.showWAL: return "wal"` at line 145). For bgwriter we add **no** flag and **no** +case — there is intentionally no `--bgwriter` report flag. The view is TUI-only by design. + +--- + +## 6. Version constants — `internal/query/query.go` + +Defined in `query.go:9-18`: +```go +PostgresV94 = 90400 +PostgresV95 = 90500 +PostgresV96 = 90600 +PostgresV10 = 100000 +PostgresV11 = 110000 +PostgresV12 = 120000 +PostgresV13 = 130000 +PostgresV14 = 140000 +``` + +**Note**: there is NO `PostgresV17` or `PostgresV18` constant. The `wal.go` selector uses the raw +numeric literal `180000` for the PG18 branch (`wal.go:26`). Follow the same convention in +`bgwriter.go`: use `query.PostgresV14` for `MinRequiredVersion`, and raw literals `170000` / `180000` +inside `SelectStatBgwriterQuery` for the version branches. (Do NOT add new constants unless the +feature explicitly calls for it — matches "minimum code" rule.) + +--- + +## 7. Column Inventory — `pg_stat_bgwriter` and `pg_stat_checkpointer` per version + +### PG14, PG15, PG16 — everything in `pg_stat_bgwriter` (no `pg_stat_checkpointer`) +`pg_stat_checkpointer` does NOT exist before PG17. Columns of `pg_stat_bgwriter` (identical across +PG14/15/16; type bigint unless noted): + +| column | meaning | semantics in feature | +|---|---|---| +| `checkpoints_timed` | scheduled checkpoints | ABSOLUTE (event counter) | +| `checkpoints_req` | requested checkpoints | ABSOLUTE (event counter) | +| `checkpoint_write_time` (double precision, ms) | time writing checkpoint buffers | DIFFed | +| `checkpoint_sync_time` (double precision, ms) | time syncing checkpoint files | DIFFed | +| `buffers_checkpoint` | buffers written by checkpoints | DIFFed | +| `buffers_clean` | buffers written by bgwriter | DIFFed | +| `maxwritten_clean` | times bgwriter stopped (hit `bgwriter_lru_maxpages`) | DIFFed | +| `buffers_backend` | buffers written directly by backends | DIFFed | +| `buffers_backend_fsync` | backend fsync calls | DIFFed | +| `buffers_alloc` | buffers allocated | DIFFed | +| `stats_reset` (timestamptz) | last reset | → `stats_age`, excluded | + +> Note: `pg_stat_bgwriter` has no restartpoint columns pre-17. + +### PG17 — split into two views (VERIFIED against live PostgreSQL 17.7) +Queried `pg_attribute` on a running PG17.7 cluster: + +`pg_stat_bgwriter` (PG17) — 4 columns: +``` +buffers_clean, maxwritten_clean, buffers_alloc, stats_reset +``` +(`checkpoints_*`, `checkpoint_*_time`, `buffers_checkpoint`, `buffers_backend`, +`buffers_backend_fsync` all REMOVED from bgwriter. `buffers_backend*` moved to `pg_stat_io`, +out of scope per the spec.) + +`pg_stat_checkpointer` (PG17) — 9 columns (verified with types): +``` +num_timed bigint +num_requested bigint +restartpoints_timed bigint +restartpoints_req bigint +restartpoints_done bigint +write_time double precision +sync_time double precision +buffers_written bigint +stats_reset timestamp with time zone +``` + +Mapping PG17 → feature columns: +- `num_timed` → `ckpt_timed` (was `checkpoints_timed`) — ABSOLUTE +- `num_requested` → `ckpt_req` (was `checkpoints_req`) — ABSOLUTE +- `restartpoints_timed` → `rstpt_timed` — ABSOLUTE +- `restartpoints_req` → `rstpt_req` — ABSOLUTE +- `restartpoints_done` → `rstpt_done` — ABSOLUTE +- `write_time` (ms) → `write_time` (was `checkpoint_write_time`) — DIFFed +- `sync_time` (ms) → `sync_time` (was `checkpoint_sync_time`) — DIFFed +- `buffers_written` → `buffers_ckpt` (was `buffers_checkpoint`) — DIFFed +- bgwriter `buffers_clean`, `maxwritten_clean`, `buffers_alloc` — DIFFed +- `stats_reset` → `stats_age` (feature decision: use checkpointer's `stats_reset`; documented) + +> `restartpoints_timed/req/done` were INTRODUCED in PG17 (alongside `pg_stat_checkpointer`). They +> do NOT exist pre-17. + +### PG18 — `pg_stat_checkpointer` GAINED a column (verify before final query) +PG18 added **`slru_written`** to `pg_stat_checkpointer` (buffers written during SLRU checkpoints), +in addition to the PG17 set. Expected PG18 `pg_stat_checkpointer` columns: +``` +num_timed, num_requested, restartpoints_timed, restartpoints_req, restartpoints_done, +write_time, sync_time, buffers_written, slru_written, stats_reset +``` +`pg_stat_bgwriter` in PG18 is unchanged from PG17 (`buffers_clean, maxwritten_clean, buffers_alloc, +stats_reset`). + +**ACTION FOR IMPLEMENTER — verify on a live PG18 cluster before writing the query.** The local box +has only PG17 installed; PG14/15/16/18 clusters were NOT running at research time +(`pg_lsclusters` shows only `17 main`). Run `testing/prepare-test-environment.sh` (creates PG14-18 +local clusters on ports 21914-21918) or use the CI matrix, then confirm with: +```sql +SELECT attname, format_type(atttypid, atttypmod) +FROM pg_attribute +WHERE attrelid IN ('pg_stat_bgwriter'::regclass, 'pg_stat_checkpointer'::regclass) + AND attnum > 0 AND NOT attisdropped ORDER BY attrelid, attnum; +``` +Decision needed: whether to surface `slru_written` as an extra DIFFed column on PG18, or keep the +unified "variant A" header set and ignore it. The interview specifies **variant A: unified column +headers across versions** (`interview.yml:106-107`), which argues for OMITTING `slru_written` to +keep headers identical PG14-18. Confirm with the user during tech-spec. + +### Summary table of query branches needed in `SelectStatBgwriterQuery` +| version range | source views | checkpoint counters | restartpoints | notes | +|---|---|---|---|---| +| 14000–169999 (PG14-16) | `pg_stat_bgwriter` only | `checkpoints_timed/_req` | — (n/a) | restartpoint cols emitted as `NULL`/`0` literals to keep unified header, OR omitted (variant A decision) | +| 170000–179999 (PG17) | `pg_stat_bgwriter` + `pg_stat_checkpointer` | `num_timed/_requested` | `restartpoints_*` | join the two views (single-row each, cross join) | +| 180000+ (PG18) | `pg_stat_bgwriter` + `pg_stat_checkpointer` | `num_timed/_requested` | `restartpoints_*` | `slru_written` available; decide per variant A | + +> Variant A (unified headers) means pre-17 must emit placeholder columns for restartpoints (which +> don't exist before PG17). The query for PG14-16 will need literal `NULL AS rstpt_timed` etc., or +> the column set differs and `Ncols`/`DiffIntvl` differ per branch (like wal.go already does for +> PG18). Clarify the exact unified-header contract in tech-spec; both approaches are mechanically +> supported by `SelectStatBgwriterQuery` returning per-version `(query, ncols, DiffIntvl)`. + +--- + +## 8. Integration Points (summary of files to touch) + +| File | Change | Reference line | +|---|---|---| +| `internal/query/bgwriter.go` | NEW: query consts + `SelectStatBgwriterQuery(version) (string,int,[2]int)` | template: `internal/query/wal.go` | +| `internal/query/bgwriter_test.go` | NEW: unit + integration tests | template: `internal/query/wal_test.go` | +| `internal/view/view.go` | add `"bgwriter"` map entry (~`view.go:128`) + `case "bgwriter"` in `Configure()` (~`view.go:321`) | `wal` entry/case | +| `top/keybindings.go` | add `{"sysstat", 'b', switchViewTo(app, "bgwriter")}` after line 35 | `wal` binding line 35 | +| `top/help.go` | add `b` to mode help line | `help.go:13` | +| `top/reset.go` | NO change (shared stats already excluded) | comment `reset.go:10-12` | +| `record/record.go` | NO change (`NotRecordable` already handled by `filterViews`) | `record.go:206-212` | +| `cmd/report/report.go` | NO change (no report flag, view is TUI-only) | mapping `report.go:138-166` | + +The data path is purely SQL: view → `Configure()` formats query → Collector runs it → `diff()` +applies `DiffIntvl`. No enrichment, no procfs, no `CollectExtra`. Strictly simpler than +`procpidstat`; structurally identical to `wal`. + +--- + +## 9. Existing Tests in the area + +- Framework: `testify/assert`, plain `go test`, run via `make test` (`-race -p 1 -timeout 300s`, + `Makefile:33-35`). Note `-p 1` (serial packages) — relevant because integration tests hit shared + PG clusters. +- Test connection helper: `internal/postgres/testing.go` — + `NewTestConnectVersion(version)` (`testing.go:16-44`) maps version → port + (`140000:21914 … 180000:21918`), returns error if unavailable; callers `t.Skipf`. +- Diff regression coverage already exists for the WAL/stats_age boundary: + `internal/stat/postgres_test.go:Test_diff_pg18_wal_stats_age` (`postgres_test.go:341`) — proves + that putting `stats_age` outside `DiffIntvl` is the tested, correct approach. Same guarantee + applies to bgwriter's absolute columns. +- Representative signatures (from `wal_test.go`): + - `Test_SelectStatWALQuery(t *testing.T)` — table of `{version, wantNcols, wantDiffIntvl}`. + - `Test_StatWALQueries(t *testing.T)` — loops PG14-18, formats, connects, executes, asserts. + +--- + +## 10. Potential Problems / Risks + +1. **PG18 `slru_written` unverified.** Local box has only PG17. The PG18 column set MUST be + confirmed on a live PG18 before finalizing the query (see §7). Do not write the PG18 branch from + memory. +2. **Variant A unified-header vs. per-version columns tension.** Pre-17 has no restartpoint columns; + PG18 has an extra `slru_written`. A truly unified header forces placeholder/omitted columns and + careful `Ncols`/`DiffIntvl` bookkeeping per branch. `wal.go` already sets a precedent for + *different* `Ncols`/`DiffIntvl` per version (PG18 = 7 cols vs PG14 = 11), so per-version layouts + are acceptable if variant A proves awkward. Pin this down in tech-spec. +3. **PG17 two-view join.** `pg_stat_bgwriter` and `pg_stat_checkpointer` are each single-row views; + the PG17/18 query must cross-join them (`FROM pg_stat_bgwriter, pg_stat_checkpointer`) — single + row × single row = one row, consistent with the single-row view contract. +4. **`stats_age` source ambiguity on PG17+.** The two views have independent `stats_reset` timestamps + (separate `pg_stat_reset_shared('bgwriter')` vs `('checkpointer')`). Feature decision + (`interview.yml:42-44`): show the **checkpointer's** `stats_age`, and DOCUMENT it so a divergent + bgwriter reset is not a surprise. Implementer must select `stats_reset` from the checkpointer + view, not bgwriter, on PG17+. +5. **`NotRecordable: true` is temporary** (per spec, recording deferred). This is the same field + pattern documented in `docs/decisions-log.md` [001] (added) and superseded by [003] (removed for + procpidstat). bgwriter will be the sole current user of the flag. When recording is added later, + the flag is simply dropped — no architectural debt. +6. **overview.md inaccuracy** (`interview.yml:77-78`): project knowledge wrongly claims bgwriter is + already supported. The `done`/documentation step should correct `project-knowledge/overview.md`. +7. **No ADR conflicts.** `docs/decisions-log.md` [001]/[003] (the `NotRecordable` lineage) are the + only relevant entries and they SUPPORT this design (the flag exists and works). No settled + decision is contradicted. +8. **No active tech-debt in touched modules.** `docs/tech-debt.md` active items [003] (self-reviews) + and the resolved [002] (procpidstat recording) do not touch `internal/query/`, + `internal/view/view.go`, or the bgwriter path. No debt to inherit. + +--- + +## 11. Constraints & Infrastructure + +- Go 1.25+, cobra CLI, pgx/v5 driver, gocui TUI, testify (per `.claude/CLAUDE.md`). +- Test clusters: created by `testing/prepare-test-environment.sh` as **local PG clusters** (not + Docker) on ports 21914-21918 (PG14-18), DB `pgcenter_fixtures`, user `postgres`, trust auth, + `shared_preload_libraries = 'pg_stat_statements'`. CI provides the full matrix; the dev box here + had only PG17 running. +- `make test` runs with `-p 1` (serial) and a 300s timeout; integration tests skip gracefully on + missing versions via `NewTestConnectVersion` + `t.Skipf`. +- Lint gates: `make lint` (golangci-lint + gosec), `make vuln` (govulncheck). The new query strings + are static (no user interpolation into the bgwriter SQL), so gosec SQL-injection findings are not + expected — match the `wal.go` style exactly (plain const strings, `Format()` only substitutes + template vars, none needed for bgwriter). diff --git a/docs/features/archive/004-feat-bgwriter-checkpointer/004-feat-bgwriter-checkpointer-completeness.json b/docs/features/archive/004-feat-bgwriter-checkpointer/004-feat-bgwriter-checkpointer-completeness.json new file mode 100644 index 0000000..539c010 --- /dev/null +++ b/docs/features/archive/004-feat-bgwriter-checkpointer/004-feat-bgwriter-checkpointer-completeness.json @@ -0,0 +1,29 @@ +{ + "status": "pass", + "sources": { + "user_spec": true, + "tech_spec": true, + "tasks": false + }, + "requirements_total": 9, + "requirements_covered": 9, + "requirements_partial": 0, + "requirements_missing": 0, + "findings": [ + { + "type": "partial", + "source": "user-spec", + "requirement": "US-4: PG 18 extra column slru_written", + "detail": "User-spec AC requires that the PG18 column added in pg_stat_checkpointer is included AND its presence verified on a live PG18 cluster. The tech-spec correctly defers the exact PG18 column set to live verification (Risks, Data Models note, integration test). This is a deliberate planning gap, not a spec omission: the PG18 query branch and Ncols/DiffIntvl ([6,12], 14 cols) are written against PostgreSQL docs and must be confirmed before the query const is finalized. Tracked and mitigated, but the requirement is only fully satisfiable after live verification.", + "severity": "minor" + }, + { + "type": "underengineering", + "source": "tech-spec", + "requirement": "Integration test on live PG containers", + "detail": "The integration test (Test_StatBgwriterQueries) uses t.Skipf when a PG version is unavailable. For an M feature whose primary risk is cross-version schema drift (explicitly the stated main risk), a silently-skipped PG18 test means the single most important verification can pass-by-absence in CI if the PG18 container is missing. The user-spec AC and tech-spec both make PG18 live verification a hard gate, but the skip mechanism does not enforce that PG18 was actually exercised. Recommend the CI matrix guarantee PG18 availability (fail, not skip, if the mandated version is absent) so the gate is real. Not a coverage gap in the spec — an enforcement weakness in the test strategy.", + "severity": "minor" + } + ], + "summary": "9/9 user-spec requirements covered by tech-spec tasks: hotkey b (Task 3), PG14-16 / PG17 / PG18 column sets (Task 1 + Decisions 2,3), absolute-vs-diff-vs-text semantics (Decision 3 via DiffIntvl placement), NotRecordable/TUI-only (Task 3 view.go), help-row b (Task 3 help.go), unit+integration tests PG14-18 (Task 1, Testing Strategy), overview.md fix (Task 2). Reverse traceability clean: all 5 Decisions and Tasks 1-4 trace to a requirement or are acceptable engineering (tests, no-security-reviewer process call). No scope creep. No overengineering: Decision 1 explicitly rejects a multi-row/CollectExtra view type and Decision 2 rejects NULL-padded unified columns, both YAGNI-correct; mirroring wal.go reuses the existing collector/diff/render path with zero new abstractions. No structural gaps: all decisions are self-contained in the Decisions section, tasks are brief scope descriptions without pseudocode. Two minor findings only, both around enforcing the PG18 live-verification gate (a known, mitigated risk). No critical findings -> pass." +} diff --git a/docs/features/archive/004-feat-bgwriter-checkpointer/004-feat-bgwriter-checkpointer-customer-review.json b/docs/features/archive/004-feat-bgwriter-checkpointer/004-feat-bgwriter-checkpointer-customer-review.json new file mode 100644 index 0000000..6a13be1 --- /dev/null +++ b/docs/features/archive/004-feat-bgwriter-checkpointer/004-feat-bgwriter-checkpointer-customer-review.json @@ -0,0 +1,40 @@ +{ + "status": "needs_work", + "checks": { + "clarity": "pass", + "value": "pass", + "scope": "pass", + "risks": "fail" + }, + "findings": [ + { + "check": "risks", + "severity": "major", + "issue": "Дефер record/report создаёт неполный продукт для части аудитории. Ключевой сценарий использования pgcenter — ретроспективный анализ инцидентов: DBA записывает метрики, потом смотрит, что происходило с checkpoint-ами во время деградации. TUI-only экран checkpoint-а виден только в реальном времени. Если инцидент случился ночью — данных нет. Спецификация признаёт это ограничение ('NotRecordable: true (temporary)'), но не определяет таймлайн и условия для снятия ограничения. 'Temporary' без даты — это 'никогда'.", + "fix": "Зафиксировать в спеке явный критерий: 'record/report для bgwriter войдёт в 0.11.x или 0.12.0 не позже чем через N релизов'. Либо переформулировать ценность фичи как 'только live-мониторинг' и принять это осознанно, а не как техдолг." + }, + { + "check": "risks", + "severity": "major", + "issue": "Variant A (единые заголовки колонок на всех версиях PG) создаёт риск молчаливого несоответствия ожиданиям. DBA на PG17+ видит колонки buf_backend и buf_backend_fsync в заголовке — и не понимает, почему они пустые или отсутствуют. Спецификация говорит 'documented limitation', но не уточняет, где это отображается пользователю: в title bar экрана, в help, в отдельном notice? Документация, которую никто не читает, не снижает риск confusion.", + "fix": "Указать конкретный UX-механизм уведомления: например, в title bar экрана добавить суффикс '(PG17+: no buf_backend)' или аналог. Иначе 'documented' остаётся формальным прикрытием." + }, + { + "check": "scope", + "severity": "minor", + "issue": "Restartpoints (rstpt_timed/req/done) включены для мониторинга standby — это нишевая потребность внутри и без того нишевого экрана. Польза реальная, но она затрагивает только тех, кто мониторит реплики. Объём колонок на PG17+ ощутимо вырастает (6 абсолютных счётчиков vs 2 на PG16). Не оспаривается само включение, но это не было явно приоритизировано относительно более частого сценария (primary checkpoint tuning).", + "fix": "Зафиксировать в спеке: restartpoints — явно вторичный сценарий (standby-only), включён по completeness. Если экран окажется перегруженным — restartpoints — первые кандидаты на вынос в отдельный экран." + }, + { + "check": "risks", + "severity": "minor", + "issue": "stats_age берётся из pg_stat_checkpointer на PG17+, а не из pg_stat_bgwriter. Это осознанное решение, задокументированное в спеке. Риск: если DBA сбросил только pg_stat_bgwriter (reset_shared('bgwriter')), stats_age на экране не отражает этот сброс — показывает возраст checkpointer-статистики. Это контринтуитивно на экране, называемом 'bgwriter'.", + "fix": "Спецификация решение приняла — достаточно убедиться, что в user-facing документации это явно написано: 'stats_age отражает время последнего сброса pg_stat_checkpointer, не pg_stat_bgwriter'." + } + ], + "questions": [ + "Что произойдёт с NotRecordable: true через 1-2 релиза — есть ли конкретный план или это неопределённый техдолг? Без ответа половина ценности фичи (ретроспективный анализ) остаётся недоступной бессрочно.", + "Как пользователь на PG17+ узнает, что buf_backend/buf_backend_fsync отсутствуют не из-за бага, а из-за смены версии PG? Где конкретно это будет видно в интерфейсе, а не только в документации?" + ], + "summary": "Фича решает реальную проблему: pgcenter заявлял поддержку bgwriter, которой не было, и DBA действительно не имеет способа наблюдать checkpoint-активность в реальном времени. Область видна, ценность понятна, scope разумен. Два мажорных риска: отсутствие record/report делает экран бесполезным для ретроспективного анализа инцидентов без ясного timeline на устранение, а Variant A без явного UI-уведомления создаёт риск confusion у пользователей PG17+." +} diff --git a/docs/features/archive/004-feat-bgwriter-checkpointer/004-feat-bgwriter-checkpointer-decisions.md b/docs/features/archive/004-feat-bgwriter-checkpointer/004-feat-bgwriter-checkpointer-decisions.md new file mode 100644 index 0000000..51dd38f --- /dev/null +++ b/docs/features/archive/004-feat-bgwriter-checkpointer/004-feat-bgwriter-checkpointer-decisions.md @@ -0,0 +1,87 @@ +# Decisions Log: bgwriter + checkpointer screen (004) + +Отчёты агентов о выполнении задач. Каждая запись создаётся агентом, выполнившим задачу. + +--- + +## Task 01: bgwriter query layer + tests + +**Status:** Done +**Commit:** 4823e9b +**Agent:** query-dev +**Summary:** Created `internal/query/bgwriter.go` (three version-aware static `const` queries + `SelectStatBgwriterQuery(version) (string, int, [2]int)`) and `internal/query/bgwriter_test.go` (unit table test of per-version Ncols/DiffIntvl + integration loop PG14-18), mirroring the `wal.go`/`wal_test.go` pattern. Event counters sit outside `DiffIntvl` (absolute), the work/time/buffer block is contiguous and diffed, `stats_age` is last and excluded; PG17+ cross-joins `pg_stat_checkpointer` and sources `stats_age` from the checkpointer's `stats_reset` (Decision 4). Branch boundaries use raw literals `170000`/`180000`; all SQL is static const (zero injection surface, Decision 5). +**Deviations:** PG18 `slru_written` column was written from PostgreSQL documentation, NOT verified on a live PG18 cluster — only PG17 is available in the local dev environment (PG17.7 on port 5432; the test clusters on ports 21914-21918 were not running). The PG17 query branch WAS live-verified against the local PG17.7 cluster (13 columns, correct names, cross-join, and the new `len(FieldDescriptions()) == Ncols` assertion all confirmed). Live PG18 verification is deferred to the CI PG14-18 matrix, where `Test_StatBgwriterQueries` executes the PG18 query and now also asserts the live column count — this is the real `slru_written` schema-divergence gate. +**Tech debt:** Нет. + +**Reviews:** + +*Round 1:* +- dev-code-reviewer: approved, 0 critical / 0 major, 2 minor (optional) → [004-feat-bgwriter-checkpointer-task-01-dev-code-reviewer-round1.json](004-feat-bgwriter-checkpointer-task-01-dev-code-reviewer-round1.json) +- dev-test-reviewer: passed, 0 critical / 0 major, 2 minor (optional) → [004-feat-bgwriter-checkpointer-task-01-dev-test-reviewer-round1.json](004-feat-bgwriter-checkpointer-task-01-dev-test-reviewer-round1.json) + +Both reviewers independently suggested the same optional hardening: assert the live column count against `Ncols`. Applied in commit 4823e9b (it directly strengthens the one named risk — the un-verifiable PG18 `slru_written`). The other minor suggestion (asserting query-string content in the unit test) was rejected as redundant with the integration test executing the real SQL, and to preserve consistency with the `wal_test.go` template. No round 2 — the fix was small, low-risk, lint-clean, and live-validated. + +**Verification:** +- `go test -race ./internal/query/` → ok (unit green; integration skips PG14-18 — test clusters not running locally, accepted/expected) +- Live PG17.7 (port 5432): PG17 query executes, 13 columns, `Ncols` assertion passes +- `make build` → ok +- `golangci-lint run ./internal/query/` + `gosec -quiet ./internal/query/` → clean (exit 0, no findings) +- Note: `make test` has a pre-existing failure in `top/reload_test.go` (`Test_doReload` panics when the test PG cluster on port 21917 is absent) — verified present on a clean baseline via `git stash`, unrelated to this task. + +## Task 02: Correct overview.md + +**Status:** Done +**Commit:** 7745c80 +**Agent:** docs-dev +**Summary:** Replaced the false `pg_stat_bgwriter — background writer stats` line in `overview.md` (Supported PostgreSQL Statistics) — which wrongly implied pre-existing as-is support — with an accurate entry for the new bgwriter/checkpointer screen this feature adds: single-row TUI screen, hotkey `b`, PG 14–18, `pg_stat_checkpointer` columns on PG 17+, TUI-only / not recordable in 0.11.0. Documentation-only; no code or other files touched. +**Deviations:** Нет. +**Tech debt:** Нет. + +**Reviews:** + +*Round 1:* +- dev-code-reviewer: approved, 0 critical / 0 major, 1 minor (optional) → [004-feat-bgwriter-checkpointer-task-02-dev-code-reviewer-round1.json](004-feat-bgwriter-checkpointer-task-02-dev-code-reviewer-round1.json) + +The single optional suggestion (trim the dense bullet for tighter consistency with neighbors) was applied in commit 7745c80 by folding the caveats into a parenthetical in the `pg_stat_wal` style, without dropping any required fact (hotkey, PG range, PG17+ scoping, TUI-only/0.11.0). No round 2 — change is trivial and accuracy-preserving. + +**Verification:** +- `grep -nE 'pg_stat_bgwriter[^+]*— background writer stats'` → empty (stale claim gone) +- `grep -niE 'pg_stat_checkpointer|bgwriter'` → new accurate entry present (line 21) + +## Task 03: Register view + TUI wiring + +**Status:** Done +**Commit:** 176e984 +**Agent:** view-dev +**Summary:** Wired the bgwriter/checkpointer screen into `pgcenter top` by mirroring the `wal` screen exactly: added the `"bgwriter"` views-map entry (`NotRecordable: true`, `MinRequiredVersion: PostgresV14`, PG14 defaults Ncols 12 / DiffIntvl {3,10}) and a `case "bgwriter"` in `Configure()` calling `SelectStatBgwriterQuery(opts.Version)`; bound hotkey `b`; added `b` to the `?` help mode-key row (sorted `a,b,f,r,w`); and refreshed the stale `NotRecordable` example comment in `record/record.go` (procpidstat dropped the flag in feature 003, bgwriter is now its sole user). Added a guard test (`TestNew_BgwriterView`) and updated the existing view-count assertions 22→23. +**Deviations:** Screen column/render behaviour (event counters absolute, work/time/buffer columns delta, `stats_age` pass-through, hotkey opens the screen) is verified manually — `verify: user` is the acceptance gate, not automated here. `make test` is not fully green locally: a pre-existing, unrelated `top/reload_test.go::Test_doReload` panic fires when the local PG fixture on port 21917 is absent (confirmed present on the clean baseline via `git stash`, not caused by this change). `make lint` was not run via the Makefile target because the `golangci-lint` binary is missing in this environment; `go vet` on all changed packages is clean and `gofmt` flags only a pre-existing comment block in `view.go` (present on baseline, untouched per scope). `make build` and `go test ./internal/view/...` pass. +**Tech debt:** Нет. + +**Reviews:** + +*Round 1:* +- dev-code-reviewer: approved, 0 critical / 0 major / 0 minor → [004-feat-bgwriter-checkpointer-task-03-dev-code-reviewer-round1.json](004-feat-bgwriter-checkpointer-task-03-dev-code-reviewer-round1.json) +- dev-test-reviewer: passed, 0 critical / 0 major, 1 minor (optional) → [004-feat-bgwriter-checkpointer-task-03-dev-test-reviewer-round1.json](004-feat-bgwriter-checkpointer-task-03-dev-test-reviewer-round1.json) + +The single optional suggestion (pin the PG14-default `Ncols`/`DiffIntvl`/`Msg` in the guard test) was applied in commit `3e0833a` (`fix: address review round 1 for task 03`) — zero-setup defense-in-depth that strengthens the wiring guard. No round 2: both reviewers approved with zero major findings and the fix is trivial and test-only. + +**Verification:** +- `make build` → ok +- `go test ./internal/view/...` → ok (guard test green; counts 22→23 correct) +- `go vet ./internal/view/... ./top/... ./record/...` → clean +- Note: `make test` blocked by pre-existing environmental `Test_doReload` panic (port 21917 fixture absent), unrelated to this task. + +## Task 04: Pre-deploy QA + +**Status:** Done +**Agent:** qa +**Summary:** Final acceptance pass on the bgwriter/checkpointer feature (commit 867005a). All 9 locally-verifiable acceptance criteria PASS, 2 are CI-gated (PG18 `slru_written` + full PG14-18 integration matrix), 0 fail. Verdict: ready for CI, no blockers. `SelectStatBgwriterQuery` returns the correct (query, Ncols, DiffIntvl) for PG14-18 (12/[3,10], 12/[3,10], 12/[3,10], 13/[6,11], 14/[6,12]) — confirmed by `Test_SelectStatBgwriterQuery`. Counter placement (ckpt_*/rstpt_* absolute, work/time/buffer diffed, `stats_age` pass-through) verified by code layout and cross-checked live. `NotRecordable: true` honored by `filterViews` (3 filter tests green). Hotkey `b` bound and present in the `?` help row `a,b,f,r,w`. `overview.md` corrected. Live PG17.7 (port 5432) executed the PG17 query returning exactly 13 columns. Full report: [004-feat-bgwriter-checkpointer-qa-report.json](004-feat-bgwriter-checkpointer-qa-report.json). +**Deviations:** `make lint` not runnable locally — golangci-lint binary absent; substituted `go vet` (clean on all touched packages) and `gofmt` (the only gofmt finding is a pre-existing doc-comment in `view.go` Configure(), present on master/develop~5 baselines, untouched by this feature; the feature's added lines are gofmt-clean). The golangci-lint + gosec gate is deferred to CI. Full PG14-18 integration and the PG18 `slru_written` live check are deferred to the CI matrix (`lesovsky/pgcenter-testing`): local PG test clusters on ports 21914-21918 are not running and PG18 is unavailable locally, so `Test_StatBgwriterQueries` t.Skipf for all versions except the out-of-band live PG17.7 check; the `len(FieldDescriptions()) == Ncols` assertion is the schema-divergence gate that proves `slru_written` exists when CI runs PG18. `make test` is not fully green locally due to a PRE-EXISTING, unrelated panic in `top/reload_test.go::Test_doReload` (needs PG fixture on port 21917; confirmed on clean baseline in earlier tasks) and environmental record integration failures — neither is a feature defect. +**Tech debt:** Нет. + +**Verification:** +- `make build` → ok (bin/pgcenter produced) +- `go test ./internal/query/... ./internal/view/...` → ok; named guards `Test_SelectStatBgwriterQuery`, `TestNew_BgwriterView`, `TestFilterViews_NotRecordable`, `TestFilterViews_dropsExplicitNotRecordable`, `TestFilterViews_Recordable` all PASS +- `go vet ./internal/query/... ./internal/view/... ./record/... ./top/...` → clean +- Live PG17.7 (port 5432): PG17 bgwriter query executes, returns exactly 13 columns in declared order +- Code grep: view.go:151 `NotRecordable: true`, record.go:208 filterViews drop branch, keybindings.go:36 `b → bgwriter`, help.go:13 `a,b,f,r,w` row, overview.md:21 corrected entry diff --git a/docs/features/archive/004-feat-bgwriter-checkpointer/004-feat-bgwriter-checkpointer-execution-plan.md b/docs/features/archive/004-feat-bgwriter-checkpointer/004-feat-bgwriter-checkpointer-execution-plan.md new file mode 100644 index 0000000..3a9269e --- /dev/null +++ b/docs/features/archive/004-feat-bgwriter-checkpointer/004-feat-bgwriter-checkpointer-execution-plan.md @@ -0,0 +1,47 @@ +# Execution Plan: pg_stat_bgwriter + pg_stat_checkpointer screen + +**Создан:** 2026-06-21 +**Branch:** develop + +--- + +## Wave 1 (независимые) + +### Task 01: bgwriter query layer + tests +- **Skill:** code-writing +- **Reviewers:** dev-code-reviewer, dev-test-reviewer +- **Verify:** bash — `make test` (bgwriter unit + integration tests pass) + +### Task 02: Correct overview.md +- **Skill:** documentation-writing +- **Reviewers:** dev-code-reviewer +- **Verify:** bash — grep overview.md (stale claim gone, new screen mentioned) + +## Wave 2 (зависит от Wave 1: Task 01) + +### Task 03: Register view + TUI wiring +- **Skill:** code-writing +- **Reviewers:** dev-code-reviewer, dev-test-reviewer +- **Verify:** user — open `b` on PG17/PG18, confirm columns + absolute/delta + `?` help + +## Wave 3 (Final Wave — зависит от 01, 02, 03) + +### Task 04: Pre-deploy QA +- **Skill:** pre-deploy-qa +- **Reviewers:** none +- **Verify:** bash — `make test` && `make lint` && `make build`; acceptance criteria + +## Проверки, требующие участия пользователя + +- [ ] Task 03: пользователь открывает экран `b` на живом PG17 (и, по возможности, PG18) — состав + колонок, абсолютные счётчики vs дельты, наличие `b` в справке `?` +- [ ] После всех волн: финальное приёмочное тестирование (Task 04) + +## Известные ограничения окружения + +- На машине поднят только **PG17** (порт 5432). Integration-тесты Task 01/04 для PG14/15/16/18 + локально сделают `t.Skipf` — это штатно. Юнит-тесты, `make build`, `make lint` проходят локально. +- **PG18 `slru_written` нельзя верифицировать локально.** Teammate пишет PG18-ветку по + документации PostgreSQL; живая проверка откладывается на CI-матрицу (где есть PG14–18). Teammate + НЕ блокируется на этом — фиксирует как deviation в `decisions.md`. +- Полный прогон PG14–18 локально возможен после `testing/prepare-test-environment.sh`. diff --git a/docs/features/archive/004-feat-bgwriter-checkpointer/004-feat-bgwriter-checkpointer-interview.yml b/docs/features/archive/004-feat-bgwriter-checkpointer/004-feat-bgwriter-checkpointer-interview.yml new file mode 100644 index 0000000..ce57c0b --- /dev/null +++ b/docs/features/archive/004-feat-bgwriter-checkpointer/004-feat-bgwriter-checkpointer-interview.yml @@ -0,0 +1,351 @@ +# Interview Plan for Feature Planning +# +# Tracks progress of adaptive discovery interview for a new feature/bug/refactoring. +# Updated after EACH user response with scoring, values, and remaining gaps. +# +# Location: {feature_base}-interview.yml +# e.g. docs/features/001-feat-add-auth-interview.yml +# +# Scoring: 0-100% (how well we understand this aspect) +# Status: pending → partial → complete +# Required: true = must reach threshold, false = cover if relevant + +# ============================================================================ +# Interview Metadata (Session Recovery) +# ============================================================================ +interview_metadata: + feature_name: "bgwriter-checkpointer" + feature_base: "docs/features/004-feat-bgwriter-checkpointer/004-feat-bgwriter-checkpointer" + work_type: "feature" + started: "2026-06-21 12:04:19" + last_updated: "2026-06-21 12:04:19" + status: "completed" + can_resume: true + current_question_num: 8 + feature_size: "M" + +# ============================================================================ +# Conversation History (for session recovery) +# ============================================================================ +# Preserves complete Q&A. Update after EVERY user response. +# +conversation_history: + - question_num: 1 + timestamp: "2026-06-21 12:08:00" + topic: "columns / diff-semantics / stats_age / testing" + agent_question: | + Q1 column set (PG<17 and PG17+ layouts, variant A); Q2 delta vs absolute for + checkpoint counters; Q3 stats_age source on PG17+ (separate bgwriter/checkpointer + resets); Q4 testing strategy (unit per-version + integration on live PG14-18). + user_answer: | + Q1 column set OK. Q2: ckpt_timed/ckpt_req shown as ABSOLUTE cumulative (важно видеть + разницу timed vs req), всё остальное — диффы. Q3: option (a) — show checkpointer's + stats_age, но зафиксировать в документации чтобы не было сюрпризом. Q4 testing OK. + summary: "Checkpoint counters absolute; rest diffed; stats_age from checkpointer (documented); unit+integration tests." + - question_num: 2 + timestamp: "2026-06-21 12:12:00" + topic: "restartpoints diff-semantics + PG18 verification" + agent_question: | + Restartpoints on PG17+ also absolute cumulative (contiguous absolute block at start)? + Plus reported code-research findings: DiffIntvl is a single contiguous range (confirms + design); PG18 pg_stat_checkpointer not verified on live DB (likely slru_written added). + user_answer: | + Yes — rstpt_timed/req/done also absolute cumulative. For PG18 slru_written: rely on + PostgreSQL documentation (if documented, assume present) AND verify via the container + integration tests which DO run PG18. + summary: "Restartpoints absolute. PG18 columns per docs + verify on PG18 container integration test." + +# ============================================================================ +# Phase 1: High-Level Feature Understanding +# ============================================================================ +# Interview Cycle 1 scope +phase1_feature_overview: + + feature_description: + score: 95 + status: "complete" + value: | + New single-row TUI screen 'bgwriter' (hotkey `b`) combining pg_stat_bgwriter and + pg_stat_checkpointer. Shows background-writer and checkpoint activity. Follows the + pg_stat_wal view pattern exactly (single row, source label first, stats_age last, + counters diffed per refresh interval). TUI (top) only — NOT recordable in 0.11.0. + gaps: "" + required: true + + user_problem: + score: 90 + status: "complete" + value: | + No way in pgcenter to watch checkpoint/bgwriter behaviour: checkpoint frequency, + timed vs requested ratio (signals undersized max_wal_size / aggressive manual + checkpoints), write/sync time, buffers flushed by checkpointer vs bgwriter vs backends. + overview.md wrongly claims bgwriter is supported — it never was. + gaps: "" + required: true + + target_users: + score: 85 + status: "complete" + value: "Practicing PostgreSQL DBAs tuning checkpoint/WAL behaviour and IO load during ops." + gaps: "" + required: false + + success_criteria: + score: 80 + status: "partial" + value: | + Screen opens on `b`, shows correct columns per PG version (14-18), checkpoint event + counters absolute-cumulative, work/time/buffer columns diffed per interval, stats_age + from checkpointer on 17+. Unit tests for SelectStatBgwriterQuery across PG14-18 pass; + query executes against live PG14-18. + gaps: "Confirm restartpoints absolute; finalize acceptance criteria list in Cycle 2." + required: true + + constraints: + score: 90 + status: "complete" + value: | + PG 17 split: pre-17 all in pg_stat_bgwriter; 17+ checkpoint metrics in + pg_stat_checkpointer, bgwriter reduced, buffers_backend/_fsync removed (→ pg_stat_io, + out of scope). Variant A unified column headers across versions (MUST be documented). + Single contiguous DiffIntvl range → absolute event-counters must sit at the start. + MinRequiredVersion = PG14 (pgcenter's floor; bgwriter exists on all supported versions). + NotRecordable: true (temporary) to avoid issue #122 report breakage. + gaps: "" + required: true + + testing_strategy: + score: 90 + status: "complete" + value: | + Unit tests for SelectStatBgwriterQuery(version) across PG14-18 (mirror wal_test.go). + Integration: query executes on live PG14/15/16/17/18 containers. No E2E needed. + gaps: "" + required: true + +# ============================================================================ +# Phase 2: User Experience & Scenarios +# ============================================================================ +# Interview Cycle 2 scope (together with Phase 3) +phase2_user_experience: + + user_stories: + score: 90 + status: "complete" + value: | + US1 (primary): As a DBA, I want to see timed vs requested checkpoint counts + (ckpt_timed/ckpt_req, absolute), so I can detect WAL-pressure-forced checkpoints + (high ckpt_req vs ckpt_timed => max_wal_size too small). + US2: As a DBA, I want checkpoint write/sync time (ckpt_write,ms / ckpt_sync,ms, diffed) + and buffers written (buf_ckpt), so I can assess checkpoint IO cost and spread. + US3: As a DBA, I want to compare who flushes buffers — buf_clean (bgwriter) vs + buf_backend (backends, pre-17) vs buf_ckpt — plus maxwritten, so I can tune + bgwriter_lru_maxpages (high buf_backend + maxwritten => bgwriter not keeping up). + US4: As a DBA on a standby, I want restartpoints (rstpt_timed/req/done, absolute), since + on a replica ckpt_timed/req stay 0 and restartpoints accumulate. + US5: As a DBA, I want stats_age, so I know the accumulation window for cumulative numbers + and can spot a recent reset. + gaps: "" + required: true + + acceptance_criteria: + score: 85 + status: "complete" + value: | + - Hotkey `b` opens the "bgwriter" screen from any other screen. + - PG14-16: columns source, ckpt_timed, ckpt_req (absolute), ckpt_write,ms, ckpt_sync,ms, + buf_ckpt, buf_clean, maxwritten, buf_backend, buf_backend_fsync, buf_alloc (diffed), + stats_age (not diffed). + - PG17+: columns source, ckpt_timed, ckpt_req, rstpt_timed, rstpt_req, rstpt_done + (absolute), ckpt_write,ms, ckpt_sync,ms, buf_ckpt, buf_clean, maxwritten, buf_alloc + (diffed), stats_age from pg_stat_checkpointer (not diffed). PG18 extra columns per docs + (e.g. slru_written) included and verified on PG18 container. + - Event counters (ckpt_*, rstpt_*) render as absolute cumulative; work/time/buffer + columns render as per-interval deltas; stats_age is pass-through text. + - Screen is NOT collected by `pgcenter record` (NotRecordable: true). + - Unit tests for SelectStatBgwriterQuery cover PG14-18; query executes on live PG14-18. + gaps: "Final exact PG18 column list confirmed at implementation against PG18 container." + required: true + + edge_cases: + score: 90 + status: "complete" + value: | + - stats_reset IS NULL (fresh cluster / never reset): now()-NULL = NULL => stats_age + empty. Acceptable, same as wal view. + - External pg_stat_reset_shared(...) during a session: a diffed column shows a negative + delta for one tick. Existing pgcenter behaviour (same as wal), not fixed, documented. + - buf_backend/buf_backend_fsync absent on PG17+ (variant A): columns simply not present; + documented so it is not a surprise. + - Standby: ckpt_timed/req == 0, rstpt_* accumulate, buffers still populated. + gaps: "" + required: true + + error_scenarios: + score: 90 + status: "complete" + value: | + Query uses only pg_stat_bgwriter / pg_stat_checkpointer (readable by any role, no special + privilege — unlike wal which uses pg_ls_waldir). No permission gating needed; screen works + on any connection including PgBouncer (simple protocol). PG<14 filtered out by + MinRequiredVersion. No additional error handling beyond standard query-error display. + gaps: "" + required: true + + verification_strategy: + score: 88 + status: "complete" + value: | + Agent: unit tests for SelectStatBgwriterQuery(version) across PG14-18 (mirror + wal_test.go) asserting query/ncols/DiffIntvl per version; integration test executing the + query on live PG14/15/16/17/18 containers. User: manual `make build` + open screen on a + PG17 and PG18 instance, confirm columns and absolute-vs-delta behaviour. + gaps: "" + required: true + + ui_ux_requirements: + score: 90 + status: "complete" + value: | + Single-row screen, identical UX model to pg_stat_wal: header row of column names + one + data row. Entered via `b`; left via any other view hotkey. Diffed columns refresh each + interval; absolute columns show cumulative. Sorting and filtering not applicable + (single row). Standard cmdline help/title. No new UI components invented. + gaps: "" + required: true + + risks: + score: 85 + status: "complete" + value: | + - PG18 column set unverified by memory — mitigated: follow PG docs + verify on PG18 + container integration test before shipping. + - Variant A tension (different columns per version) — mitigated: wal.go precedent for + version-specific Ncols/DiffIntvl; documented for users. + - DiffIntvl single contiguous range — mitigated: layout places all absolute event + counters first, diffed block contiguous, stats_age last. + gaps: "" + required: true + + technical_decisions: + score: 90 + status: "complete" + value: | + - Follow pg_stat_wal pattern: new internal/query/bgwriter.go with + SelectStatBgwriterQuery(version) (string, int, [2]int); version literals 170000/180000 + (no PostgresV17/V18 constants exist). + - Event counters (ckpt_*, rstpt_*) absolute (outside DiffIntvl); work/time/buffers diffed. + - Cross-join pg_stat_checkpointer × pg_stat_bgwriter on PG17+; pg_stat_bgwriter only <17. + - NotRecordable: true (temporary) — record/report deferred to a future feature. + - stats_age from pg_stat_checkpointer on PG17+ (documented). + - MinRequiredVersion = PostgresV14. + gaps: "" + required: false + +# ============================================================================ +# Phase 3: Integration & Technical Context +# ============================================================================ +# Interview Cycle 2 scope (together with Phase 2) +phase3_integration: + + integration_points: + score: 90 + status: "complete" + value: | + - NEW internal/query/bgwriter.go + bgwriter_test.go (templates: wal.go / wal_test.go). + - internal/view/view.go: new "bgwriter" view entry (~line 128) with NotRecordable: true; + case "bgwriter" in Configure() (~line 321) calling SelectStatBgwriterQuery(version). + - top/keybindings.go: one line {"sysstat", 'b', switchViewTo(app, "bgwriter")} (`b` free). + - top/reset.go, record/record.go (filterViews skips NotRecordable), cmd/report/report.go: + NO changes required. + - Diff mechanism: internal/stat/postgres.go:331-333 (DiffIntvl contiguous range). + gaps: "" + required: true + + dependencies: + score: 80 + status: "complete" + value: | + Depends on existing view/query/diff infrastructure and version-detection in + view.Configure(). No new external dependencies. Reuses pg_stat_wal architecture. + gaps: "" + required: false + + data_requirements: + score: 85 + status: "complete" + value: | + Reads pg_stat_bgwriter (all supported versions) and pg_stat_checkpointer (PG17+) via + a single SQL query (cross-join on 17+). No storage — TUI live display only + (NotRecordable). Single-row cumulative shared counters. + gaps: "" + required: false + + migration_needs: + score: 100 + status: "complete" + value: "None — additive new view, no schema/config/data migration. overview.md doc fix only." + gaps: "" + required: false + +# ============================================================================ +# Phase 4: Completion +# ============================================================================ +phase4_completion: + userspec_file: "docs/features/004-feat-bgwriter-checkpointer/004-feat-bgwriter-checkpointer.md" + status: "approved" # pending | created | approved + +# ============================================================================ +# Decision Rules +# ============================================================================ +decision_rules: + + required_threshold: 85 + optional_threshold: 50 + + stop_criteria: + - "All required items in current cycle scope score >= 85%" + - "Structural: every required item has non-empty value, no TBD, gaps empty or conscious limitations" + + question_strategy: + - "3-4 questions per batch about different gaps" + - "Run as many batches as needed — batch size is NOT cycle limit" + - "Propose solutions based on PK or code findings, not just ask questions" + - "Challenge with substance: counterexamples, code references, unexplored scenarios" + - "After each response: update scores, values, gaps, conversation_history, save" + +# ============================================================================ +# Progress +# ============================================================================ +progress: + current_phase: "phase3_integration" + questions_asked: 8 + last_question: "Cycle 2 user stories / UX / edge cases / error scenarios — all approved" + completion_estimate: "95%" + +# ============================================================================ +# Notes +# ============================================================================ +notes: | + FINAL DECISIONS after Phase 6 (dev-customer, needs_work) + Phase 7 (completeness, complete): + + D1 Column contract: per-version Ncols like wal.go (NOT NULL-padded unified). Each version + returns only real columns; shared columns keep identical headers/order. PG14-16: no + restartpoints. PG18: slru_written included as a real diffed column. ("variant A" in the + roadmap means consistent headers for shared columns, NOT identical Ncols.) + D2 record/report: live-only in 0.11.0 by design. NotRecordable: true. record/report is a + planned backlog follow-up (docs/roadmap-0.11.0.md "Out of scope / backlog"), no committed + date. Must be stated explicitly in the spec Limitations so "temporary" has an address. + D3 No in-UI notice for version-varying columns. Follows the existing pg_stat_wal precedent + (wal silently drops wal_write/sync on PG18). Column set reflects server capabilities; + documented in help only. (dev-customer's "major" pushback rejected with reasoning: a + bgwriter-specific notice would be inconsistent with how pgcenter already behaves.) + D4 Help text: add `b bgwriter` to top/help.go as an acceptance criterion. Concrete placement + requested by user: in the hotkey row currently "a,f,r,w" insert `b` between `a` and `f` + to yield the sorted row "a,b,f,r,w". + D5 Spec Limitations must note: (1) restartpoints are first to drop if the screen becomes + overcrowded; (2) stats_age is taken from pg_stat_checkpointer — on a standby or after a + separate pg_stat_reset_shared('bgwriter') it does not reflect the bgwriter reset. + + dev-customer minor #4 (screen named "bgwriter" while showing checkpointer data): accepted as + a naming note; keep "bgwriter" (familiar, hotkey `b`), source column label clarifies content. diff --git a/docs/features/archive/004-feat-bgwriter-checkpointer/004-feat-bgwriter-checkpointer-metrics-summary.md b/docs/features/archive/004-feat-bgwriter-checkpointer/004-feat-bgwriter-checkpointer-metrics-summary.md new file mode 100644 index 0000000..280bf0c --- /dev/null +++ b/docs/features/archive/004-feat-bgwriter-checkpointer/004-feat-bgwriter-checkpointer-metrics-summary.md @@ -0,0 +1,52 @@ +# Metrics Summary: bgwriter-checkpointer + +## Context + +| Dimension | Value | +|-----------|-------| +| Model | Opus 4.8 (1M context) | +| Feature size | M | +| Started | 2026-06-21T07:04:34Z | +| Completed | 2026-06-21T09:29:08Z | + +## Timeline + +| Phase | Duration (min) | Touch Time (min) | Wait Time (min) | +|-------|---------------|-------------------|-----------------| +| User Spec | 62 | 14 | 48 | +| Tech Spec | 12 | 10 | 2 | +| Task Decomposition | 7 | 6 | 1 | +| Feature Execution | 31 | 26 | 5 | +| Done | 13 | 11 | 2 | +| **Total** | **125** | **67** | **58** | + +## Flow Efficiency + +- Total lead time (first start → last end): 144 min +- Agent active time: 67 min +- Human wait time: 58 min +- **Flow efficiency: 46%** (touch time / lead time) + +## Quality + +| Metric | Value | +|--------|-------| +| Validation rounds (by phase) | user_spec: 1, tech_spec: 1, task_decomposition: 1, done: 1 | +| Validation findings (crit/major/minor) | 0 / 3 / 24 | +| Review rounds (by task) | task_01: 1, task_02: 1, task_03: 1 | +| Review findings (crit/major/minor) | 0 / 0 / 6 | +| First pass rate | 100% | + +All 3 major findings were planning-phase (devil's advocate ×2, adequacy ×1) and resolved before +implementation. Zero critical findings across the entire pipeline. All reviewed tasks passed +review in a single round. The only post-merge fixes (Go 1.25.11 bump, record/Test_filterViews +count) were surfaced by CI, not by code review. + +## Volume + +| Metric | Value | +|--------|-------| +| Interview questions | 8 | +| Tasks | 4 (in 3 waves) | +| Agents spawned | 13 | +| Commits | 29 | diff --git a/docs/features/archive/004-feat-bgwriter-checkpointer/004-feat-bgwriter-checkpointer-metrics.json b/docs/features/archive/004-feat-bgwriter-checkpointer/004-feat-bgwriter-checkpointer-metrics.json new file mode 100644 index 0000000..6e6e2a5 --- /dev/null +++ b/docs/features/archive/004-feat-bgwriter-checkpointer/004-feat-bgwriter-checkpointer-metrics.json @@ -0,0 +1,69 @@ +{ + "meta": { + "schema_version": "1.0", + "feature_id": "004-feat-bgwriter-checkpointer", + "feature_name": "bgwriter-checkpointer", + "feature_base": "docs/features/004-feat-bgwriter-checkpointer/004-feat-bgwriter-checkpointer", + "project": "pgcenter", + "feature_size": "M", + "model": "Opus 4.8 (1M context)", + "date_started": "2026-06-21T07:04:34Z", + "date_completed": "2026-06-21T09:29:08Z" + }, + "phases": { + "user_spec": { + "start_time": "2026-06-21T07:04:34Z", + "end_time": "2026-06-21T08:06:47Z", + "duration_min": 62, + "touch_time_min": 14, + "human_wait_time_min": 48, + "steps": {} + }, + "tech_spec": { + "start_time": "2026-06-21T08:08:27Z", + "end_time": "2026-06-21T08:20:32Z", + "duration_min": 12, + "touch_time_min": 10, + "human_wait_time_min": 2, + "steps": {} + }, + "task_decomposition": { + "start_time": "2026-06-21T08:21:31Z", + "end_time": "2026-06-21T08:28:33Z", + "duration_min": 7, + "touch_time_min": 6, + "human_wait_time_min": 1, + "steps": {} + }, + "feature_execution": { + "start_time": "2026-06-21T08:30:10Z", + "end_time": "2026-06-21T09:02:05Z", + "duration_min": 31, + "touch_time_min": 26, + "human_wait_time_min": 5, + "steps": {} + }, + "done": { + "start_time": "2026-06-21T09:16:00Z", + "end_time": "2026-06-21T09:29:08Z", + "duration_min": 13, + "touch_time_min": 11, + "human_wait_time_min": 2, + "steps": {} + } + }, + "quality": { + "validation_rounds": { "user_spec": 1, "tech_spec": 1, "task_decomposition": 1, "done": 1 }, + "validation_findings": { "critical": 0, "major": 3, "minor": 24 }, + "review_rounds": { "task_01": 1, "task_02": 1, "task_03": 1 }, + "review_findings": { "critical": 0, "major": 0, "minor": 6 }, + "first_pass_rate_pct": 100 + }, + "volume": { + "interview_questions": 8, + "tasks_count": 4, + "waves_count": 3, + "agents_spawned": 13, + "commits_count": 29 + } +} diff --git a/docs/features/archive/004-feat-bgwriter-checkpointer/004-feat-bgwriter-checkpointer-qa-report.json b/docs/features/archive/004-feat-bgwriter-checkpointer/004-feat-bgwriter-checkpointer-qa-report.json new file mode 100644 index 0000000..bf51ef2 --- /dev/null +++ b/docs/features/archive/004-feat-bgwriter-checkpointer/004-feat-bgwriter-checkpointer-qa-report.json @@ -0,0 +1,132 @@ +{ + "feature": "004-feat-bgwriter-checkpointer", + "task": "04 — Pre-deploy QA", + "date": "2026-06-21", + "commit": "867005aea699868349ef66c272e4fa5bae99824b", + "branch": "develop", + "status": "passed", + "verdict": "Ready for CI. Zero blockers found. All locally-verifiable acceptance criteria PASS; PG18 slru_written and the full PG14-18 integration matrix are CI-gated (test clusters not running locally, PG18 unavailable locally).", + "environment": { + "status": "healthy", + "type": "Go CLI/TUI — no docker-compose web app; 'environment' = local PostgreSQL test clusters", + "couldRunLocally": [ + "make build — passed (bin/pgcenter produced)", + "go test ./internal/query/... ./internal/view/... — passed", + "Named guard tests: Test_SelectStatBgwriterQuery, TestNew_BgwriterView, TestFilterViews_NotRecordable, TestFilterViews_dropsExplicitNotRecordable, TestFilterViews_Recordable — all green", + "go vet on internal/query, internal/view, record, top — clean", + "Live PG 17.7 (port 5432) — PG17 bgwriter query executed, returned exactly 13 columns" + ], + "couldNotRunLocally": [ + "make test as a whole — NOT fully green: PG test clusters on ports 21914-21918 not running, so integration subtests t.Skipf; and a PRE-EXISTING unrelated panic in top/reload_test.go::Test_doReload (needs port 21917, confirmed on clean baseline) — environment/pre-existing, NOT a feature defect", + "make lint — golangci-lint binary not installed in this environment; substituted go vet + gofmt; the golangci-lint gate runs in CI", + "PG18 live verification — PG18 not available locally; slru_written column existence cannot be checked here" + ], + "rebuildAttempts": 0, + "fixCommits": [] + }, + "testSuite": { + "status": "passed (scoped)", + "details": "Feature-scoped unit/guard tests all green. Integration tests for PG14-18 t.Skipf locally (clusters down). Live PG17.7 query verified out-of-band via psql (13 columns)." + }, + "summary": { + "totalChecks": 11, + "passed": 9, + "failed": 0, + "ciGated": 2, + "criticals": 0, + "majors": 0, + "minors": 0 + }, + "acceptanceCriteria": [ + { + "criterion": "SelectStatBgwriterQuery returns correct (query, Ncols, DiffIntvl) for PG14/15/16/17/18: 12/[3,10], 12/[3,10], 12/[3,10], 13/[6,11], 14/[6,12]", + "status": "passed", + "evidence": "internal/query/bgwriter.go:41-52 branches >=180000 -> (PG18,14,[6,12]), >=170000 -> (PG17,13,[6,11]), else (PG14,12,[3,10]). Test_SelectStatBgwriterQuery (5 subtests, versions 140000/150000/160000/170000/180000) PASS." + }, + { + "criterion": "Event counters (ckpt_*, rstpt_*) absolute (outside DiffIntvl); work/time/buffer columns diffed; stats_age pass-through text outside diff", + "status": "passed", + "evidence": "PG14-16: ckpt_timed/ckpt_req at cols 1-2 (before diff block 3-10), stats_age col 11 outside [3,10]. PG17: ckpt/rstpt counters cols 1-5 (before diff block 6-11), stats_age col 12 outside [6,11]. PG18: same counters 1-5, diff block 6-12 incl. slru_written, stats_age col 13 outside [6,12]. stats_age is date_trunc(...)::text pass-through (bgwriter.go:12,24,35). Layout cross-checked live on PG17." + }, + { + "criterion": "PG14-16 column set per user-spec (incl. buf_backend / buf_backend_fsync present on PG<=16)", + "status": "passed", + "evidence": "PgStatBgwriterPG14 (bgwriter.go:7-13) selects source, ckpt_timed, ckpt_req, ckpt_write,ms, ckpt_sync,ms, buf_ckpt, buf_clean, maxwritten, buf_backend, buf_backend_fsync, buf_alloc, stats_age = 12 cols matching user-spec lines 142-144." + }, + { + "criterion": "PG17+ column set per user-spec (rstpt_* present, no buf_backend, stats_age from pg_stat_checkpointer)", + "status": "passed", + "evidence": "PgStatBgwriterPG17 (bgwriter.go:19-25) cross-joins pg_stat_bgwriter, pg_stat_checkpointer; stats_age from pg_stat_checkpointer.stats_reset; no buf_backend columns. Live PG17.7 query returned exactly 13 columns in the declared order." + }, + { + "criterion": "pgcenter record does NOT collect the view (NotRecordable honored by filterViews); report not broken", + "status": "passed", + "evidence": "internal/view/view.go:151 bgwriter entry NotRecordable: true. record/record.go:208 filterViews deletes views with NotRecordable. Tests TestFilterViews_NotRecordable, TestFilterViews_dropsExplicitNotRecordable, TestFilterViews_Recordable PASS; view_test.go:20 asserts bgwriter.NotRecordable true." + }, + { + "criterion": "Hotkey b switches to the bgwriter screen", + "status": "passed", + "evidence": "top/keybindings.go:36 {\"sysstat\", 'b', switchViewTo(app, \"bgwriter\")}. internal/view/view.go:140 'bgwriter' entry registered; view.go:337 Configure() case applies SelectStatBgwriterQuery. TestNew_BgwriterView PASS." + }, + { + "criterion": "b present in the ? help row (a,b,f,r,w)", + "status": "passed", + "evidence": "top/help.go:13 — \"a,b,f,r,w mode: 'a' activity, 'b' bgwriter/checkpointer, 'f' functions, 'r' replication, 'w' WAL,\"" + }, + { + "criterion": "overview.md corrected (no stale bgwriter-supported claim; new screen described)", + "status": "passed", + "evidence": ".claude/skills/project-knowledge/overview.md:21 — accurate entry: 'pg_stat_bgwriter (+ pg_stat_checkpointer on PG 17+) — background writer / checkpointer screen (hotkey b; PG 14-18; TUI-only, not recordable in 0.11.0)'. No stale standalone 'background writer stats' line remains." + }, + { + "criterion": "make build passes", + "status": "passed", + "evidence": "make build produced bin/pgcenter with no errors." + }, + { + "criterion": "Query executes on each available PG 14-18 container; integration green or skipped only when version unavailable", + "status": "ci_gated", + "evidence": "Local PG17.7 verified live (13 cols). PG14/15/16/18 clusters (ports 21914-21918) not running locally -> Test_StatBgwriterQueries t.Skipf. Full matrix executes in CI (lesovsky/pgcenter-testing PG14-18)." + }, + { + "criterion": "PG18 branch (incl. slru_written) actually executed on live PG18 — not silently skipped", + "status": "ci_gated", + "evidence": "PG18 not available locally. Test_StatBgwriterQueries asserts len(rows.FieldDescriptions()) == Ncols, turning the PG18 subtest into a slru_written schema-divergence gate; closed in the CI PG14-18 matrix. Cannot be verified locally." + } + ], + "regressions": { + "status": "none_introduced", + "note": "go vet clean on all touched packages. gofmt -l flags internal/view/view.go, but the diff is a pre-existing Go doc-comment formatting in Configure() (lines 313-314), present on master and develop~5 baselines, unrelated to and untouched by this feature; the feature's added lines (140-152, 337-339) are gofmt-clean." + }, + "ciGated": [ + { + "item": "Full PG14-18 integration execution of Test_StatBgwriterQueries", + "reason": "Local PG test clusters (ports 21914-21918) not running; only live PG17.7 on 5432 available", + "verificationCondition": "CI run on lesovsky/pgcenter-testing PG14-18 matrix", + "verificationSteps": "Confirm no SKIP for any PG version subtest in CI test log" + }, + { + "item": "PG18 slru_written column live verification", + "reason": "PG18 not available in local environment", + "verificationCondition": "CI PG18 job runs Test_StatBgwriterQueries with len(FieldDescriptions())==14 assertion", + "verificationSteps": "Confirm PG18 subtest passes (not skipped) in CI; the 14-column assertion proves slru_written exists" + }, + { + "item": "golangci-lint + gosec gate (make lint)", + "reason": "golangci-lint binary not installed locally; substituted go vet (clean) + gofmt (only pre-existing unrelated finding)", + "verificationCondition": "CI lint stage", + "verificationSteps": "Confirm golangci-lint + gosec pass in CI" + } + ], + "preExistingIssues": [ + { + "title": "top/reload_test.go::Test_doReload panics when PG fixture on port 21917 is absent", + "note": "Confirmed on clean baseline (git stash) in prior tasks; environmental, NOT caused by this feature." + }, + { + "title": "record/record_test.go integration subtests fail without port 21917 cluster", + "note": "Environmental: PG test clusters down locally. Feature-scoped guard tests (filterViews) pass independently." + } + ], + "findings": [] +} diff --git a/docs/features/archive/004-feat-bgwriter-checkpointer/004-feat-bgwriter-checkpointer-quality-review.json b/docs/features/archive/004-feat-bgwriter-checkpointer/004-feat-bgwriter-checkpointer-quality-review.json new file mode 100644 index 0000000..b586cec --- /dev/null +++ b/docs/features/archive/004-feat-bgwriter-checkpointer/004-feat-bgwriter-checkpointer-quality-review.json @@ -0,0 +1,56 @@ +{ + "status": "approved", + "checks": { + "completeness": "pass", + "edge_cases": "pass", + "acceptance_criteria": "pass", + "contradictions": "pass", + "template_compliance": "pass", + "size_check": "warning", + "clarity": "pass" + }, + "findings": [ + { + "check": "size_check", + "severity": "minor", + "issue": "Заявлен size: M. Число пользовательских сценариев — 4 (Диагностика частых чекпоинтов, IO-стоимость, настройка bgwriter, restartpoints на реплике), что превышает порог >3 user flows. Само число критериев приёмки (7) и интеграций в пределах нормы, поэтому это пограничный warning, а не противоречие размеру.", + "location": "Пользовательские сценарии", + "fix": "Размер M остаётся адекватным (это, по сути, один экран с вариациями отображения). Достаточно осознать, что сценарии — это режимы чтения одного экрана, а не отдельные флоу. Действий не требуется, либо можно явно отметить в Ограничениях, что все сценарии обслуживаются одним экраном без отдельной логики." + }, + { + "check": "acceptance_criteria", + "severity": "minor", + "issue": "Критерий про PG 18 содержит формулировку с допущением: 'дополнительная колонка pg_stat_checkpointer, добавленная в PG 18 (по документации — slru_written)'. Точное имя/состав колонки не зафиксированы и подтверждаются только на этапе реализации против живого PG 18. Критерий тестируем (наличие колонки проверяется на контейнере), но ожидаемый результат частично открыт.", + "location": "Критерии приёмки (пункт про PG 18)", + "fix": "Это сознательное и явно задокументированное ограничение (риск + митигация описаны). Критерий остаётся проверяемым: 'колонка, добавленная в pg_stat_checkpointer в PG 18, присутствует и подтверждена на живом PG 18'. Действий не требуется; при реализации зафиксировать точное имя колонки." + }, + { + "check": "completeness", + "severity": "minor", + "issue": "В 'Технические решения' указано 'исправить overview.md, который ошибочно заявляет поддержку pg_stat_bgwriter', и это упомянуто в 'Зачем' и в интервью (migration_needs). Однако в Критериях приёмки нет пункта, верифицирующего этот фикс документации.", + "location": "Критерии приёмки / Технические решения", + "fix": "Добавить критерий приёмки: 'overview.md обновлён — устранено ошибочное заявление о поддержке pg_stat_bgwriter (или заявление приведено в соответствие с фактической поддержкой)'." + } + ], + "interview_coverage": { + "covered": [ + "Состав колонок PG 14-16 и PG 17+ (variant A, единые заголовки общих колонок)", + "Дифф-семантика: ckpt_*/rstpt_* — абсолютные кумулятивные, объём/время — дельта", + "stats_age на PG 17+ из pg_stat_checkpointer с фиксацией в документации", + "Restartpoints (rstpt_timed/req/done) абсолютные на PG 17+", + "PG 18: дополнительная колонка (slru_written) по документации + верификация на контейнере", + "DiffIntvl — единый непрерывный диапазон, абсолютные счётчики в начале", + "NotRecordable: true, live-only в 0.11.0, record/report в backlog", + "MinRequiredVersion = PG 14", + "Пользовательские истории US1-US5 (timed vs requested, IO-стоимость, настройка bgwriter, restartpoints, stats_age)", + "Граничные случаи: stats_reset IS NULL, внешний pg_stat_reset_shared, отсутствие buf_backend на PG 17+, standby", + "Error scenarios: чтение любой ролью без привилегий, работа на PgBouncer, фильтрация PG<14", + "UX: однострочный экран по модели pg_stat_wal, без новых UI-компонентов, без in-UI уведомления о версионных колонках", + "Стратегия тестирования: unit для SelectStatBgwriterQuery PG 14-18 + integration на живых контейнерах, без E2E", + "Технические решения: новый internal/query/bgwriter.go, паттерн wal.go, cross-join на PG 17+", + "Справка (?): добавление b в ряд a,b,f,r,w" + ], + "missing": [] + }, + "summary": "User-spec полный, согласованный и хорошо структурированный: все разделы шаблона заполнены, критерии приёмки тестируемы, граничные случаи и риски с митигациями на месте, противоречий нет, все темы интервью отражены. Статус approved; warning по размеру (4 сценария при M) и три minor-замечания, включая отсутствие критерия приёмки на фикс overview.md." +} diff --git a/docs/features/archive/004-feat-bgwriter-checkpointer/004-feat-bgwriter-checkpointer-security-audit.json b/docs/features/archive/004-feat-bgwriter-checkpointer/004-feat-bgwriter-checkpointer-security-audit.json new file mode 100644 index 0000000..ecf5daf --- /dev/null +++ b/docs/features/archive/004-feat-bgwriter-checkpointer/004-feat-bgwriter-checkpointer-security-audit.json @@ -0,0 +1,21 @@ +{ + "status": "approved", + "summary": { + "totalFindings": 1, + "critical": 0, + "major": 0, + "minor": 1 + }, + "findings": [ + { + "severity": "minor", + "category": "A03: Injection", + "title": "Version-branch selector must use only integer-compared server version, never string concatenation", + "description": "The tech-spec correctly mandates static const SQL strings with no user interpolation, mirroring internal/query/wal.go. This is the right approach and was verified: wal.go uses pure const strings and query.Format() runs Go text/template over an Options struct populated from server metadata (version, recovery, GUCs) rather than interactive input, with no {{ }} placeholders in the wal/bgwriter templates. The only residual risk is at implementation time: SelectStatBgwriterQuery branches on the version integer (>= 180000 / >= 170000 / else). The version value itself comes from the trusted PG connection, but the implementation must select between pre-built const strings by integer comparison and must NOT build SQL by concatenating any value (version, column name, label) into the query at runtime. Keeping each version's query as a complete, self-contained const (as wal.go does) preserves the zero-injection-surface property the spec relies on.", + "location": "Section: Data Models / Decision 5; future internal/query/bgwriter.go SelectStatBgwriterQuery", + "impact": "If a future maintainer assembled the query via string concatenation instead of returning a fixed const per branch, a dynamic surface could be reintroduced. With the documented design there is no exploitable path; this is a guardrail to keep the design's security property intact during implementation.", + "recommendation": "In bgwriter.go return whole const query strings selected purely by integer version comparison, exactly as SelectStatWALQuery does. Do not interpolate version literals, column names, or the 'Bgwriter' source label into SQL at runtime. Add the bgwriter consts to the same review surface as wal.go.", + "cwe": "CWE-89" + } + ] +} diff --git a/docs/features/archive/004-feat-bgwriter-checkpointer/004-feat-bgwriter-checkpointer-skeptic-techspec.json b/docs/features/archive/004-feat-bgwriter-checkpointer/004-feat-bgwriter-checkpointer-skeptic-techspec.json new file mode 100644 index 0000000..0043717 --- /dev/null +++ b/docs/features/archive/004-feat-bgwriter-checkpointer/004-feat-bgwriter-checkpointer-skeptic-techspec.json @@ -0,0 +1,30 @@ +{ + "status": "approved", + "summary": "Checked 18 factual claims against the codebase; found 0 mirages. Every file path, function, struct field, pattern, line reference, and version-constant assertion in the tech-spec matches reality.", + "findings": [], + "stats": { + "total_claims_checked": 18, + "confirmed": 18, + "mirages_found": 0, + "verified_claims": [ + "internal/query/wal.go exists with PgStatWALPG14/PgStatWALDefault consts", + "SelectStatWALQuery(version int) (string, int, [2]int) at wal.go:25", + "SelectStatWALQuery returns PgStatWALDefault,7,[2,5] for >=180000 else PgStatWALPG14,11,[2,9]", + "internal/view/view.go has 'wal' view entry at view.go:128-139", + "view.go Configure() switch has case \"wal\" at view.go:321-322 calling SelectStatWALQuery", + "View struct has NotRecordable bool field at view.go:30", + "top/keybindings.go uses {\"sysstat\", 'w', switchViewTo(app, \"wal\")} at line 35", + "lowercase 'b' is genuinely unbound in keybindings.go (only 'B' bound at line 47)", + "internal/stat/postgres.go diff() at line 303 applies interval as contiguous [interval[0],interval[1]] range (line 331)", + "diff() copies value as-is when l < interval[0] || l > interval[1]", + "record/record.go filterViews() at line 200 skips NotRecordable views (line 208 delete+continue)", + "query.PostgresV14 = 140000 exists in query.go:17", + "No PostgresV17 or PostgresV18 constant exists (query.go:9-18 ends at V14)", + "internal/postgres/testing.go NewTestConnectVersion at testing.go:16", + "port map 140000:21914..180000:21918 in testing.go:18-23", + "top/help.go mode-key row at help.go:13 (currently 'a,f,r,w')", + "internal/query/bgwriter.go does not yet exist (correct: it is a NEW file)", + "ViewType/Format pipeline and DiffIntvl mechanics match code-research description" + ] + } +} diff --git a/docs/features/archive/004-feat-bgwriter-checkpointer/004-feat-bgwriter-checkpointer-task-01-dev-code-reviewer-round1.json b/docs/features/archive/004-feat-bgwriter-checkpointer/004-feat-bgwriter-checkpointer-task-01-dev-code-reviewer-round1.json new file mode 100644 index 0000000..f486015 --- /dev/null +++ b/docs/features/archive/004-feat-bgwriter-checkpointer/004-feat-bgwriter-checkpointer-task-01-dev-code-reviewer-round1.json @@ -0,0 +1,33 @@ +{ + "reviewer": "dev-code-reviewer", + "status": "approved", + "summary": "The bgwriter query layer is a faithful, correct mirror of the wal.go template. All three per-version column layouts match the tech-spec Data Models exactly; DiffIntvl placement correctly keeps event counters and stats_age outside the contiguous diffed block (verified against the inclusive-bound semantics of stat/postgres.go:diff()). Branch boundaries use raw literals, all SQL is static const with no interpolation surface, and the tests mirror wal_test.go. No critical or major issues found.", + "criticalIssues": [], + "suggestions": [ + { + "file": "internal/query/bgwriter_test.go", + "line": 36, + "severity": "minor", + "category": "testing", + "suggestion": "Test_StatBgwriterQueries does not assert the actual column count returned by the live query against the per-version Ncols. The integration test only checks Exec returns no error, so a query that silently returns the wrong number of columns (e.g., a hand-edited alias drift) would still pass. Mirroring wal_test.go this is the established pattern, so this is informational only; if a stronger gate is desired, the PG18 slru_written verification (the one genuinely unverified-live column) could read rows.FieldDescriptions() and assert len == Ncols.", + "benefit": "Turns the integration test into a real schema-divergence gate for the unverified PG18 column set instead of relying on Exec-success alone.", + "optional": true + }, + { + "file": "internal/query/bgwriter.go", + "line": 9, + "severity": "minor", + "category": "readability", + "suggestion": "The double-quoted aliases ckpt_write,ms / ckpt_sync,ms embed a comma inside the identifier, matching the wal.go header convention (write,ms / sync,ms) used by the TUI renderer for the unit suffix. This is correct and intentional; noting it only because it is non-obvious to a future reader why a comma lives inside a SQL identifier. No change required given the wal.go precedent establishes the convention project-wide.", + "benefit": "Awareness only; consistency with wal.go is the right call.", + "optional": true + } + ], + "metrics": { + "filesReviewed": 2, + "criticalIssuesCount": 0, + "majorIssuesCount": 0, + "minorIssuesCount": 2, + "testCoverageAssessment": "adequate" + } +} diff --git a/docs/features/archive/004-feat-bgwriter-checkpointer/004-feat-bgwriter-checkpointer-task-01-dev-test-reviewer-round1.json b/docs/features/archive/004-feat-bgwriter-checkpointer/004-feat-bgwriter-checkpointer-task-01-dev-test-reviewer-round1.json new file mode 100644 index 0000000..0b09992 --- /dev/null +++ b/docs/features/archive/004-feat-bgwriter-checkpointer/004-feat-bgwriter-checkpointer-task-01-dev-test-reviewer-round1.json @@ -0,0 +1,38 @@ +{ + "reviewer": "dev-test-reviewer", + "status": "passed", + "summary": "Tests faithfully mirror the established wal_test.go convention and fully satisfy the TDD anchor: the unit test asserts the per-version (Ncols, DiffIntvl) tuples for all five versions including the 170000/180000 boundaries, and the integration test follows the Format -> connect -> skip -> exec -> close pattern verbatim. Assertion values were cross-checked against both the tech-spec Data Models and the actual query const column counts and all match. The single notable gap (no assertion on returned query string / column content) is acceptable for this layer because the integration test executing the real SQL against the live cluster is the true content gate. No critical or major findings.", + "findings": [ + { + "severity": "minor", + "category": "missing_coverage", + "location": "internal/query/bgwriter_test.go:27", + "issue": "Test_SelectStatBgwriterQuery discards the returned query string (`_, gotNcols, gotDiffIntvl`). The unit test therefore never asserts which const the selector returns, so a wiring bug that returns the correct (Ncols, DiffIntvl) tuple but the wrong query const (e.g. PG17 branch returning PgStatBgwriterPG14 while still reporting 13/[6,11]) would pass the unit test. In practice the integration test catches this at execution time, so the risk is low and this mirrors wal_test.go exactly.", + "recommendation": "Optional hardening, only if diverging from the wal_test.go template is acceptable: capture the query string and assert it is non-empty and version-appropriate, e.g. for >=170000 `assert.Contains(t, gotQuery, \"pg_stat_checkpointer\")` and for <170000 `assert.NotContains(t, gotQuery, \"pg_stat_checkpointer\")`; for >=180000 `assert.Contains(t, gotQuery, \"slru_written\")`. This binds the tuple to the correct const without re-asserting full SQL. Given the integration test already executes the real SQL and this matches the established pattern, leaving it as-is is reasonable.", + "litmusTestFailed": false + }, + { + "severity": "minor", + "category": "anti_pattern", + "location": "internal/query/bgwriter_test.go:51", + "issue": "The integration test ignores the rows returned by conn.Exec (`_, err = conn.Exec(q)`) and only asserts NoError. It verifies the query parses and executes (which validates that all referenced columns/views exist on each PG version — the real PG18 slru_written gate), but does not assert the row shape matches the declared Ncols. A query that executes but returns a different column count than Ncols would still pass.", + "recommendation": "Optional: this is identical to wal_test.go and the column-count contract is already locked by the unit test's wantNcols, so no change is required to stay consistent with the suite. If stronger coupling were desired, fetch the row description and assert `len(fieldDescriptions) == wantNcols`, but that diverges from the template and the unit test already pins Ncols. Acceptable as-is.", + "litmusTestFailed": false + } + ], + "metrics": { + "filesReviewed": 4, + "litmusTest": { + "checked": 2, + "passed": 2, + "failed": 0 + }, + "coverageAssessment": "adequate", + "pyramidBalance": { + "unit": 1, + "integration": 1, + "e2e": 0, + "assessment": "healthy" + } + } +} diff --git a/docs/features/archive/004-feat-bgwriter-checkpointer/004-feat-bgwriter-checkpointer-task-01.md b/docs/features/archive/004-feat-bgwriter-checkpointer/004-feat-bgwriter-checkpointer-task-01.md new file mode 100644 index 0000000..009de4c --- /dev/null +++ b/docs/features/archive/004-feat-bgwriter-checkpointer/004-feat-bgwriter-checkpointer-task-01.md @@ -0,0 +1,204 @@ +--- +status: done # planned -> in_progress -> done +depends_on: [] # ID задач-зависимостей (строки: ["01", "02"]) +wave: 1 # волна параллельного выполнения +skills: [code-writing] # МАССИВ скиллов для загрузки +verify: bash # make test (bgwriter unit + integration tests pass) +reviewers: [dev-code-reviewer, dev-test-reviewer] # явно указать. Пусто = fallback на defaults +teammate_name: # имя агента-исполнителя (опционально; если не задано — генерируется по описанию задачи) +--- + +# Task 01: bgwriter query layer + tests + +## Required Skills + +Перед выполнением задачи загрузи: +- `/skill:code-writing` — [skills/code-writing/SKILL.md](~/.claude/skills/code-writing/SKILL.md) + +## Description + +Create the query layer for the new `bgwriter` TUI screen that combines `pg_stat_bgwriter` +(all supported versions) and `pg_stat_checkpointer` (PG17+). This is a pure SQL/selector layer — +no view registration or TUI wiring (that is Task 3). The task delivers two new files: + +- `internal/query/bgwriter.go` — three version-aware query constants (PG14-16, PG17, PG18) plus the + selector `SelectStatBgwriterQuery(version int) (string, int, [2]int)`, structurally identical to + the existing `internal/query/wal.go` (`SelectStatWALQuery`). +- `internal/query/bgwriter_test.go` — a table-driven unit test asserting the per-version + `(Ncols, DiffIntvl)` tuples, plus an integration test that executes the query against live + PG14-18 clusters, mirroring `internal/query/wal_test.go`. + +The screen design relies on a single mechanism: `internal/stat/postgres.go:diff()` diffs only the +columns inside the contiguous `DiffIntvl [lo,hi]` range and copies every column outside it as-is. +This is exploited so that the checkpoint/restartpoint EVENT counters (`ckpt_timed`, `ckpt_req`, +`rstpt_*`) and `stats_age` render as absolute values, while the work/time/buffer columns render as +per-interval deltas. Because `DiffIntvl` is a single contiguous range, the column layout is fixed: +`source` (col 0) → absolute event-counter block → contiguous diffed block → `stats_age` last. + +The PG18 branch adds a `slru_written` column to `pg_stat_checkpointer`; its exact presence MUST be +verified against a live PG18 cluster during the integration test (the code-research could only +confirm PG17 live; PG18 is from docs). Do not finalize the PG18 query from memory. + +## What to do + +1. Write the unit test `Test_SelectStatBgwriterQuery` first (table-driven over PG 14/15/16/17/18), + asserting the returned `Ncols` and `DiffIntvl` per version. Run it, confirm it fails (selector + does not exist yet). +2. Create `internal/query/bgwriter.go` with three `const` query strings: + - **`PgStatBgwriterPG14`** (PG 14-16) — `FROM pg_stat_bgwriter`, 12 columns, `DiffIntvl [3,10]`. + - **`PgStatBgwriterPG17`** (PG 17) — cross join `FROM pg_stat_bgwriter, pg_stat_checkpointer`, + 13 columns, `DiffIntvl [6,11]`. + - **`PgStatBgwriterPG18`** (PG 18+) — as PG17 plus the diffed `slru_written` column, 14 columns, + `DiffIntvl [6,12]`. + Each query begins with `'Bgwriter' AS source` and ends with + `date_trunc('seconds', now() - stats_reset)::text AS stats_age`. Use clear column aliases matching + the Data Models layout (e.g. `checkpoint_write_time AS "ckpt_write,ms"`). +3. Implement `SelectStatBgwriterQuery(version int) (string, int, [2]int)` with the branch order: + `>= 180000` → `(PgStatBgwriterPG18, 14, [2]int{6,12})`; `>= 170000` → + `(PgStatBgwriterPG17, 13, [2]int{6,11})`; else → `(PgStatBgwriterPG14, 12, [2]int{3,10})`. +4. Run the unit test, confirm it passes. +5. Write the integration test `Test_StatBgwriterQueries` mirroring `Test_StatWALQueries`: loop + `[]int{140000,150000,160000,170000,180000}`, `Format()` the template, `NewTestConnectVersion`, + `t.Skipf` on unavailable version, `conn.Exec(q)`, assert no error, `conn.Close()`. +6. Run `make test`. On the live PG18 cluster, confirm the PG18 query (with `slru_written`) executes + without error — this is the live verification of the PG18 column set. If `slru_written` does not + exist on the live PG18 cluster, adjust the PG18 branch and its `Ncols`/`DiffIntvl` accordingly + and record the deviation in the decisions log. + +## TDD Anchor + +Тесты, которые нужно написать ДО реализации. Пишем → запускаем → убеждаемся что падают → пишем код → убеждаемся что проходят. + +- `internal/query/bgwriter_test.go::Test_SelectStatBgwriterQuery` — table-driven over + PG 14/15/16/17/18; asserts the returned `(Ncols, DiffIntvl)` tuple per version: + 140000→`(12, [3,10])`, 150000→`(12, [3,10])`, 160000→`(12, [3,10])`, 170000→`(13, [6,11])`, + 180000→`(14, [6,12])`. Verifies the selector picks the correct const at the 170000 and 180000 + boundaries. +- `internal/query/bgwriter_test.go::Test_StatBgwriterQueries` — integration; loops PG14-18, + `Format()`s the template, connects via `NewTestConnectVersion`, `t.Skipf` on unavailable version, + executes the query, asserts no error. This is where the live PG18 `slru_written` column set is + validated (real gate in CI where the full PG14-18 matrix runs). + +## Acceptance Criteria + +- [ ] `internal/query/bgwriter.go` exists with three query consts and `SelectStatBgwriterQuery`. +- [ ] `SelectStatBgwriterQuery` returns `(query, Ncols, DiffIntvl)` per version: PG14/15/16 → + `12, [3,10]`; PG17 → `13, [6,11]`; PG18 → `14, [6,12]`. +- [ ] Branch boundaries use raw literals `170000`/`180000` (no `PostgresV17`/`PostgresV18` constants). +- [ ] Event counters (`ckpt_timed`, `ckpt_req`, `rstpt_*`) sit in a contiguous block right after + `source`, OUTSIDE `DiffIntvl`; the diffed work/time/buffer block is contiguous; `stats_age` is + the last column, outside `DiffIntvl`. +- [ ] PG17 query cross-joins `pg_stat_bgwriter` and `pg_stat_checkpointer`; PG18 query adds the + diffed `slru_written` column. +- [ ] `stats_age` on PG17+ derives from `pg_stat_checkpointer.stats_reset` (Decision 4). +- [ ] All SQL is static `const` strings (no user interpolation), mirroring `wal.go`. +- [ ] `Test_SelectStatBgwriterQuery` (unit) is green. +- [ ] `Test_StatBgwriterQueries` (integration) is green on every available PG14-18 cluster, skipped + only when a version is unavailable; PG18 `slru_written` set verified live. +- [ ] `make test` passes; no regressions in existing tests. + +## Context Files + +**Feature artifacts:** +- [004-feat-bgwriter-checkpointer.md](004-feat-bgwriter-checkpointer.md) — user-spec +- [004-feat-bgwriter-checkpointer-tech-spec.md](004-feat-bgwriter-checkpointer-tech-spec.md) — + tech-spec (Task 1, Data Models with exact per-version layouts/Ncols/DiffIntvl, Decisions 1-5, + Testing Strategy) +- [004-feat-bgwriter-checkpointer-code-research.md](004-feat-bgwriter-checkpointer-code-research.md) — + §1 wal.go template, §6 version constants, §7 exact column inventory per PG version, §9 test patterns +- [004-feat-bgwriter-checkpointer-decisions.md](004-feat-bgwriter-checkpointer-decisions.md) — + decisions log (created during execution) + +**Project knowledge:** +- [overview.md](.claude/skills/project-knowledge/overview.md) — project features and supported stats +- [architecture.md](.claude/skills/project-knowledge/architecture.md) — package layout, data flow, + PG version handling +- [patterns.md](.claude/skills/project-knowledge/patterns.md) — code patterns, testing conventions, + version branching + +**Code files:** +- [internal/query/bgwriter.go](internal/query/bgwriter.go) — NEW: query consts + selector +- [internal/query/bgwriter_test.go](internal/query/bgwriter_test.go) — NEW: unit + integration tests +- [internal/query/wal.go](internal/query/wal.go) — exact template for the new query file +- [internal/query/wal_test.go](internal/query/wal_test.go) — exact template for the new test file +- [internal/query/query.go](internal/query/query.go) — version constants (`PostgresV14`), `Format()`, + `NewOptions()` +- [internal/postgres/testing.go](internal/postgres/testing.go) — `NewTestConnectVersion(version)` + for integration tests (ports 21914-21918) +- [internal/stat/postgres.go](internal/stat/postgres.go) — `diff()` applies `DiffIntvl` (read for + understanding; not modified) + +## Verification Steps + +- Run `make test` — both `Test_SelectStatBgwriterQuery` (unit) and `Test_StatBgwriterQueries` + (integration) pass; integration cases skip only for unavailable PG versions. +- Confirm the live PG18 cluster executes the PG18 query (with `slru_written`) without error + (the `slru_written` schema-divergence gate is real where PG18 is available). +- Run `make build` and `make lint` — compile clean, no new gosec/golangci-lint findings (static + const SQL, no user interpolation — same style as `wal.go`). + +## Details + +**Files:** +- `internal/query/bgwriter.go` (NEW) — package `query`. Three `const` strings + selector. Template + is `internal/query/wal.go` (two consts + `SelectStatWALQuery`). Column layouts (0-based) from the + tech-spec Data Models: + - PG14-16 (`Ncols=12`, `DiffIntvl=[3,10]`): `0 source`, `1 ckpt_timed`, `2 ckpt_req`, + `3 ckpt_write,ms`, `4 ckpt_sync,ms`, `5 buf_ckpt`, `6 buf_clean`, `7 maxwritten`, + `8 buf_backend`, `9 buf_backend_fsync`, `10 buf_alloc`, `11 stats_age`. Source columns: + `checkpoints_timed`, `checkpoints_req`, `checkpoint_write_time`, `checkpoint_sync_time`, + `buffers_checkpoint`, `buffers_clean`, `maxwritten_clean`, `buffers_backend`, + `buffers_backend_fsync`, `buffers_alloc`, `stats_reset` — all from `pg_stat_bgwriter`. + - PG17 (`Ncols=13`, `DiffIntvl=[6,11]`): `0 source`, `1 ckpt_timed`, `2 ckpt_req`, + `3 rstpt_timed`, `4 rstpt_req`, `5 rstpt_done`, `6 ckpt_write,ms`, `7 ckpt_sync,ms`, + `8 buf_ckpt`, `9 buf_clean`, `10 maxwritten`, `11 buf_alloc`, `12 stats_age`. From + `pg_stat_checkpointer`: `num_timed`, `num_requested`, `restartpoints_timed`, + `restartpoints_req`, `restartpoints_done`, `write_time`, `sync_time`, `buffers_written`, + `stats_reset`. From `pg_stat_bgwriter`: `buffers_clean`, `maxwritten_clean`, `buffers_alloc`. + Cross join: `FROM pg_stat_bgwriter, pg_stat_checkpointer` (single row × single row). + - PG18 (`Ncols=14`, `DiffIntvl=[6,12]`): PG17 plus diffed `slru_written` inserted in the diffed + block (grouped next to `buf_ckpt` for readability): `... 8 buf_ckpt, 9 slru_written, + 10 buf_clean, 11 maxwritten, 12 buf_alloc, 13 stats_age`. +- `internal/query/bgwriter_test.go` (NEW) — package `query`. Two tests mirroring `wal_test.go`: + `Test_SelectStatBgwriterQuery` (table of `{version, wantNcols, wantDiffIntvl}`, query string + ignored via `_`) and `Test_StatBgwriterQueries` (integration loop). Use + `NewOptions(version, "f", "off", 256, "public")` for `Format()`, identical to `wal_test.go:40`. + +**Dependencies:** none (no new packages). No dependency on other tasks (wave 1, `depends_on: []`). +Task 3 depends on this task's `SelectStatBgwriterQuery` selector and the `PgStatBgwriterPG14` const. + +**Edge cases:** +- Version branch boundaries: exactly 170000 → PG17 branch; exactly 180000 → PG18 branch; below + 170000 → PG14-16 branch. The unit test covers 160000 (highest pre-17) and the 170000/180000 + boundaries. +- `stats_reset` source on PG17+: select from `pg_stat_checkpointer`, NOT `pg_stat_bgwriter` + (Decision 4) — `pg_stat_bgwriter` on PG17+ retains its own `stats_reset`, so be explicit. +- `pg_stat_checkpointer` does not exist pre-17 → PG14-16 query must NOT reference it. +- A version not in the integration loop (or unavailable cluster) → `t.Skipf`, not a failure. + +**Implementation hints:** +- Diff exclusion mechanism: `internal/stat/postgres.go:diff()` (line 330-333) copies any column + with index `< interval[0]` or `> interval[1]` as-is; columns inside are subtracted. Keep the + absolute event-counter block immediately after `source` and `stats_age` last so both fall outside + the single contiguous `DiffIntvl`. +- `wal.go` is the exact structural template: leading `'