Skip to content

refactor(proxy): migrate from SQLite to Postgres via Kysely#235

Open
maciej-konczal wants to merge 2 commits into
HamedMP:mainfrom
maciej-konczal:refactor/proxy-postgres-migration
Open

refactor(proxy): migrate from SQLite to Postgres via Kysely#235
maciej-konczal wants to merge 2 commits into
HamedMP:mainfrom
maciej-konczal:refactor/proxy-postgres-migration

Conversation

@maciej-konczal
Copy link
Copy Markdown

Replaces better-sqlite3 with Kysely + pg, matching the project-wide "PostgreSQL via Kysely" rule in CLAUDE.md. Adds PROXY_DATABASE_URL with fallback to PLATFORM_DATABASE_URL, removes the SQLite file path default and the proxy-data Docker volume. Tests use kysely-pglite (in-memory PG) following tests/platform/platform-db-test-helper.ts.

Behavior change: host-mode bun run dev now requires a Postgres available (Homebrew or Docker). README documents both paths. Docker stack unchanged for users -- PROXY_DATABASE_URL defaults to the existing postgres service.

Also wires DB pool cleanup into the SIGTERM/SIGINT shutdown handler via resetProxyDb() so the pool doesn't outlive graceful shutdown.

Replaces better-sqlite3 with Kysely + pg, matching the project-wide
"PostgreSQL via Kysely" rule in CLAUDE.md. Adds PROXY_DATABASE_URL
with fallback to PLATFORM_DATABASE_URL, removes the SQLite file path
default and the proxy-data Docker volume. Tests use kysely-pglite
(in-memory PG) following tests/platform/platform-db-test-helper.ts.

Behavior change: host-mode bun run dev now requires a Postgres
available (Homebrew or Docker). README documents both paths. Docker
stack unchanged for users -- PROXY_DATABASE_URL defaults to the
existing postgres service.

Also wires DB pool cleanup into the SIGTERM/SIGINT shutdown handler
via resetProxyDb() so the pool doesn't outlive graceful shutdown.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@chatgpt-codex-connector
Copy link
Copy Markdown

Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits.
Repo admins can enable using credits for code reviews in their settings.

@vercel
Copy link
Copy Markdown

vercel Bot commented May 28, 2026

@maciej-konczal is attempting to deploy a commit to the Finna Team on Vercel.

A member of the Team first needs to authorize it.

@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented May 28, 2026

Greptile Summary

This PR migrates the proxy service's persistence layer from better-sqlite3 to Kysely with a PostgreSQL backend, aligning with the project-wide "PostgreSQL via Kysely" rule in CLAUDE.md. All synchronous DB calls are converted to async, the N+1 getUsageSummary query is collapsed into a single GROUP BY with FILTER aggregates, the shutdown handler now flushes the pg pool via resetProxyDb(), and a new tests/proxy/db.test.ts suite exercises all paths using kysely-pglite.

  • Schema migration: migrate() runs CREATE TABLE IF NOT EXISTS + indexes for api_usage and user_quotas on startup; the proxy-data Docker volume is removed and the proxy container now depends on the postgres health check.
  • Environment resolution: PROXY_DATABASE_URL is introduced with fallback to PLATFORM_DATABASE_URL (and a third-tier POSTGRES_URL fallback); the Docker Compose file wires the default to the existing postgres service, leaving the Docker stack unchanged for users.

Confidence Score: 5/5

Safe to merge; the SQLite-to-Postgres migration is complete and correct for the Docker deployment path, with good test coverage via kysely-pglite.

The refactor is well-scoped, all async conversions are correct, and the new test suite validates every DB path. Two findings were surfaced — the migrate() function runs DDL without a wrapping transaction, and the POSTGRES_URL fallback can produce a malformed URL in PaaS environments — but neither affects the primary Docker deployment path where PLATFORM_DATABASE_URL is always injected by Compose.

packages/proxy/src/db.ts — the migrate() transaction wrapping and POSTGRES_URL fallback construction.

Important Files Changed

Filename Overview
packages/proxy/src/db.ts Core refactor from better-sqlite3 to Kysely/pg; well-structured, but migrate() runs 4 DDL statements without a wrapping transaction, violating CLAUDE.md atomicity rules.
packages/proxy/src/main.ts Correctly updated all sync DB calls to async; shutdown handler now properly uses Promise.allSettled to flush both PostHog and the DB pool.
tests/proxy/db.test.ts New test suite using kysely-pglite (in-memory Postgres); covers insert/rollup, quota enforcement, upsert, summary, and metrics seed paths.
docker-compose.dev.yml Removes proxy-data volume, adds postgres health-check dependency for proxy service, and sets PROXY_DATABASE_URL with correct nested fallback.
packages/proxy/package.json Swaps better-sqlite3 for kysely + pg; adds kysely-pglite as a dev dependency for tests; dev script now auto-loads .env.

Sequence Diagram

sequenceDiagram
    participant Main as main.ts (startup)
    participant DB as db.ts (createProxyDb)
    participant PG as pg.Pool
    participant Client as Hono Route Handler

    Main->>DB: createProxyDb(url)
    DB->>PG: "new pg.Pool({ connectionString })"
    DB->>PG: migrate() [4 DDL statements]
    PG-->>DB: ready promise resolves
    DB-->>Main: ProxyDB instance

    Main->>PG: getMetricsSeed() [seed Prometheus]
    PG-->>Main: rows

    Client->>DB: checkQuota(userId)
    DB->>PG: SELECT user_quotas
    DB->>PG: SUM(cost_usd) day
    DB->>PG: SUM(cost_usd) month
    PG-->>Client: QuotaCheck

    Client->>DB: insertUsage(record)
    DB->>PG: INSERT api_usage
    PG-->>Client: void

    Note over Main,PG: SIGTERM / SIGINT
    Main->>DB: resetProxyDb()
    DB->>PG: "kysely.destroy() -> pool.end()"
    PG-->>DB: pool closed
Loading
Prompt To Fix All With AI
Fix the following 2 code review issues. Work through them one at a time, proposing concise fixes.

---

### Issue 1 of 2
packages/proxy/src/db.ts:232-259
The `migrate` function issues 4 separate DDL statements without wrapping them in a transaction. CLAUDE.md's atomicity rule is explicit: "2+ related DB writes MUST use a transaction. No exceptions." Without a transaction, a crash or pool error between statements leaves a partially-initialized schema. While `IF NOT EXISTS` guards make a retry recoverable, any concurrent second startup could observe the intermediate state and issue redundant DDL against a half-built schema.

### Issue 2 of 2
packages/proxy/src/db.ts:61-64
**`POSTGRES_URL` fallback constructs an invalid URL when the value already includes a database name.** Most PaaS environments (Heroku, Railway, Neon) set `POSTGRES_URL` as a full connection string that already includes the database component. Appending `/matrixos_platform` to such a value produces a path like `…/existingdb/matrixos_platform`, which the `pg` driver will reject with a connection error. Using `process.env.POSTGRES_URL` directly as the final fallback (without appending anything) avoids this.

Reviews (2): Last reviewed commit: "fix(proxy): single-query usage summary a..." | Re-trigger Greptile

Comment thread packages/proxy/src/db.ts
Comment on lines +132 to +147
await kysely
.insertInto('user_quotas')
.values({
user_id: userId,
daily_limit_usd: dailyLimitUsd,
monthly_limit_usd: monthlyLimitUsd,
enabled: true,
})
.onConflict((oc) =>
oc.column('user_id').doUpdateSet({
daily_limit_usd: dailyLimitUsd,
monthly_limit_usd: monthlyLimitUsd,
}),
)
.execute();
},
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 N+1 queries in getUsageSummary

getUsageSummary first fetches all distinct user_id values, then fires getUserUsage for each one — which itself issues 3 parallel queries (day, month, all). For N users this is 1 + 3N round-trips to the database pool (which is capped at 10 connections). Under even modest user counts the pool will be saturated on every admin summary call. A single aggregation query grouping by user_id and window (date_trunc) would collapse this to 1–2 queries.

Prompt To Fix With AI
This is a comment left during a code review.
Path: packages/proxy/src/db.ts
Line: 132-147

Comment:
**N+1 queries in `getUsageSummary`**

`getUsageSummary` first fetches all distinct `user_id` values, then fires `getUserUsage` for each one — which itself issues 3 parallel queries (`day`, `month`, `all`). For N users this is 1 + 3N round-trips to the database pool (which is capped at 10 connections). Under even modest user counts the pool will be saturated on every admin summary call. A single aggregation query grouping by `user_id` and window (`date_trunc`) would collapse this to 1–2 queries.

How can I resolve this? If you propose a fix, please make it concise.

Comment thread packages/proxy/src/main.ts Outdated
Comment on lines 332 to 339
try {
await posthogErrorTracker.shutdown();
await Promise.allSettled([
posthogErrorTracker.shutdown(),
resetProxyDb(),
]);
} catch (err: unknown) {
console.warn('[proxy] Failed during shutdown:', err instanceof Error ? err.message : String(err));
} finally {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Unreachable catch branch in shutdown

Promise.allSettled is explicitly designed to never reject — it resolves with an array of settled results regardless of individual promise outcomes. The catch block here will never execute, so any real error from resetProxyDb is silently swallowed into the settled result rather than being warned about. Inspecting the settled results would restore the intended behavior.

Suggested change
try {
await posthogErrorTracker.shutdown();
await Promise.allSettled([
posthogErrorTracker.shutdown(),
resetProxyDb(),
]);
} catch (err: unknown) {
console.warn('[proxy] Failed during shutdown:', err instanceof Error ? err.message : String(err));
} finally {
const results = await Promise.allSettled([
posthogErrorTracker.shutdown(),
resetProxyDb(),
]);
for (const result of results) {
if (result.status === 'rejected') {
console.warn('[proxy] Failed during shutdown:', result.reason instanceof Error ? result.reason.message : String(result.reason));
}
}
try {
// intentionally empty — errors already logged above
} catch {
// unreachable
} finally {
Prompt To Fix With AI
This is a comment left during a code review.
Path: packages/proxy/src/main.ts
Line: 332-339

Comment:
**Unreachable `catch` branch in `shutdown`**

`Promise.allSettled` is explicitly designed to never reject — it resolves with an array of settled results regardless of individual promise outcomes. The `catch` block here will never execute, so any real error from `resetProxyDb` is silently swallowed into the settled result rather than being warned about. Inspecting the settled results would restore the intended behavior.

```suggestion
  const results = await Promise.allSettled([
    posthogErrorTracker.shutdown(),
    resetProxyDb(),
  ]);
  for (const result of results) {
    if (result.status === 'rejected') {
      console.warn('[proxy] Failed during shutdown:', result.reason instanceof Error ? result.reason.message : String(result.reason));
    }
  }
  try {
    // intentionally empty — errors already logged above
  } catch {
    // unreachable
  } finally {
```

How can I resolve this? If you propose a fix, please make it concise.

Addresses two review findings on the Postgres migration:

- getUsageSummary previously fired 1 + 3N queries (one DISTINCT plus
  three per user). Collapsed to a single grouped query using FILTER
  aggregates -- O(1) round-trips regardless of user count, so the
  10-connection pool no longer saturates on admin summary calls.

- shutdown() wrapped Promise.allSettled in try/catch, but allSettled
  never rejects so the catch was dead code that silently swallowed
  resetProxyDb errors. Now iterates settled results and logs each
  rejected reason individually.

New test in tests/proxy/db.test.ts locks in the per-user roll-up
behavior so future query rewrites preserve it.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@HamedMP
Copy link
Copy Markdown
Owner

HamedMP commented May 29, 2026

I would hold this PR until the following are addressed.

Required before merge:

  1. Fix the red CI shard. Unit Tests (3/4) is still failing on job 78523030952 because tests/observability/posthog-error-tracking.test.ts still source-inspects for the exact string await posthogErrorTracker.shutdown(), while packages/proxy/src/main.ts now calls it inside Promise.allSettled([...]). This needs a test update that still verifies shutdown ordering and drain behavior.

  2. Update the distro compose proxy services, not only docker-compose.dev.yml. On the current head, these still pass only the removed SQLite env/volume path:

    • distro/docker-compose.local.yml: PROXY_DB_PATH=/data/proxy.db
    • distro/docker-compose.multi.yml: PROXY_DB_PATH=/data/proxy.db
    • distro/docker-compose.platform.yml: PROXY_DB_PATH=/data/proxy.db

    Since packages/proxy/src/db.ts now throws without PROXY_DATABASE_URL or PLATFORM_DATABASE_URL, those proxy services will fail startup/health. Mirror the dev compose pattern: add a Postgres dependency, set PROXY_DATABASE_URL to the platform Postgres database, and remove the obsolete proxy-data volume if it is no longer used.

  3. Platform compose needs explicit consideration. distro/docker-compose.platform.yml already has postgres and gives PLATFORM_DATABASE_URL to the platform service, but the proxy service does not receive that URL. Because platform-created user containers default to PROXY_URL=http://proxy:8080 and ANTHROPIC_BASE_URL=http://proxy:8080, a dead proxy breaks platform-created runtimes, not just proxy metrics.

  4. Add the mandatory backend PR body Invariants section:

    • Source of truth
    • Lock/transaction scope
    • Acceptable orphan states
    • Auth source of truth
    • Deferred scope
  5. Resolve or explicitly defer the two Greptile findings in the PR body with a linked follow-up if not fixed here: transactional schema migration and the POSTGRES_URL fallback appending /matrixos_platform to a URL that may already include a database name.

Related platform wiring check, possibly pre-existing but worth handling while touching platform compose: the platform orchestrator generates sk-proxy-* API keys from PLATFORM_SECRET, while proxy validation only works if the proxy receives PROXY_SHARED_SECRET or PLATFORM_SECRET. distro/docker-compose.platform.yml currently passes PLATFORM_SECRET to platform but not to proxy. If this PR updates platform proxy env, please verify the proxy shared-secret/admin-token path end-to-end, including platform calls to /usage/summary and user container calls through ANTHROPIC_BASE_URL=http://proxy:8080.

Current merge status I see: head 39a89e0e, Greptile reports 5/5 but with the two noted findings, GitHub CI has one failing unit shard, and Vercel is blocked on deployment authorization.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants