From 9be8350bdcabf403b486af8694ab6d1c0ff52e8f Mon Sep 17 00:00:00 2001
From: Justin Chung <20733699+justin13888@users.noreply.github.com>
Date: Tue, 2 Jun 2026 22:47:20 -0400
Subject: [PATCH] docs(design): adaptive client cache eviction policy (#23)

Specify the automatic, bounded client cache the issue asks for: cache
recently-viewed assets, evict least-recently-accessed first, and within a
sweep evict in tier-size order (original -> preview -> thumbnail), leaving
the metadata/LQIP tier effectively never reclaimed.

Extends the existing owner docs rather than adding a new one (SSoT):

- filesystem/client.md Space Recovery: split into automatic cache
  management (budget, recency promotion, LRU, tier order, pin exemption)
  and user-driven reclamation; add unit validation cases.
- import/download-sync.md: cross-link recently-viewed retention to the
  owner.
- DEFERRED.md: record that the policy is specified but implementation is
  deferred, with its capsule-core::db / capsule-sdk seam.

Implementation deferred per the issue's updated scope (documentation only).
---
 DEFERRED.md | 6 +++++
 .../content/docs/design/filesystem/client.md | 23 ++++++++++++++++++-
 .../docs/design/import/download-sync.md | 2 +-
 3 files changed, 29 insertions(+), 2 deletions(-)

diff --git a/DEFERRED.md b/DEFERRED.md
index b62ce76..9342db3 100644
--- a/DEFERRED.md
+++ b/DEFERRED.md
@@ -57,6 +57,12 @@ complements the design docs in `capsule-docs/src/content/docs/design/`.
   `/sync` feed, federation, peering, and the `capsule-sdk` network client. The **pure**
   refuse-by-default validation invariants those paths need are implemented in
   `capsule_core::validation` and ready to wire into `capsule-api`.
+- The **adaptive cache-eviction policy** (bounded budget, LRU-by-last-access retention of
+  recently-viewed blobs, tier-ordered eviction original → preview → thumbnail) is specified in
+  `capsule-docs` [Filesystem — Client → Space Recovery] but **not yet implemented** (issue #23,
+  scoped to documentation). Seam: last-access tracking lives in `capsule-core::db`
+  (`library.sqlite`) and the sweep in `capsule-core::library`, driven by `capsule-sdk`
+  connection-class detection — no core data-plane rework needed to land it.
 
 ### ML / AI
 - Embeddings, `sqlite-vec` vector search, the model registry, semantic/face features, and
diff --git a/capsule-docs/src/content/docs/design/filesystem/client.md b/capsule-docs/src/content/docs/design/filesystem/client.md
index eecd24c..3efbd24 100644
--- a/capsule-docs/src/content/docs/design/filesystem/client.md
+++ b/capsule-docs/src/content/docs/design/filesystem/client.md
@@ -52,7 +52,24 @@ SQLite may lag the filesystem after external edits or interrupted operations. Th
 
 ## Space Recovery
 
-Rebuildable data is deliberately **not** stored in OS-managed cache locations: the OS evicts indiscriminately, and a thumbnail that is expensive to regenerate is not genuinely disposable. Capsule manages reclamation itself — it surfaces the biggest storage consumers and lets the user selectively delete, and an original that is server-only after eviction is transparently re-fetched on demand.
+Rebuildable data is deliberately **not** stored in OS-managed cache locations: the OS evicts indiscriminately, and a thumbnail that is expensive to regenerate is not genuinely disposable. Capsule manages reclamation itself, on two paths — an automatic, bounded cache it keeps within budget on its own, and an explicit user-driven path for deeper reclamation. Either way, an original that is server-only after eviction is transparently re-fetched on demand (see [Import — Tiered, On-Demand Fetch](/design/import/download-sync/#tiered-on-demand-fetch)).
+
+What is eligible for reclamation is exactly the rebuildable-or-refetchable set: the `cache/` tree (thumbnails, previews, parsed-metadata caches, transcodes) and fetched-but-unpinned originals. The canonical files under `media/` — originals the device itself holds as source of truth, their `.cbor` sidecars, and their `.provenance.cbor` chains — are **never** eviction targets; neither is the rebuildable `index/library.sqlite`, which is dropped and rebuilt only on a schema change.
+
+### Automatic cache management
+
+The reclaimable set is held within a **user-configurable cache budget**. When it grows past budget — typically while browsing a large library on a device that cannot hold everything — Capsule reclaims space itself rather than waiting for the user or letting the OS decide:
+
+- **Recency promotion.** Viewing or opening an asset stamps a last-access time on its fetched representations in `library.sqlite`. Recently-viewed content is therefore the *last* to go, so scrolling back through an album already browsed on a high-latency or metered connection hits local cache instead of the network.
+- **LRU eviction.** When over budget, representations are evicted **least-recently-accessed first**, by that last-access stamp — the same bounded-cache discipline the federation layer applies to its rejected-hash table (see [Federation — Soft-Fail Semantics](/design/federation/#soft-fail-semantics)).
+- **Tier order within a sweep.** Where recency does not decide it, eviction proceeds in descending size and ascending value: **original → preview → thumbnail**. The metadata tier — the sidecar and its embedded LQIP (see [Thumbnails](/design/thumbnails/)) — is tiny and canonical and is effectively never reclaimed, so an asset always remains listable and previewable at LQIP fidelity even after every heavier representation is gone.
+- **Pin exemption.** Representations the user has explicitly pinned for offline use, and originals the device itself uploaded and still owns as source of truth, are exempt from automatic eviction regardless of recency or budget pressure.
+
+An evicted representation is not lost: the next access transparently re-fetches it through the [tiered fetch](/design/import/download-sync/#tiered-on-demand-fetch) path, under the prevailing connection rules. This keeps the cache faithful to the recovery-first and ephemeral-derived-data [principles](/design/principles/#principles) — nothing here is a source of truth, so reclaiming it is always safe.
+
+### User-driven reclamation
+
+Beyond the automatic budget, Capsule surfaces the biggest storage consumers and lets the user selectively delete — for reclaiming below the configured budget, or for dropping pinned content the user no longer wants offline. This path can release pinned representations the automatic sweep would not, but it still never touches the canonical `media/` files.
 
 ## Validation
 
@@ -61,5 +78,9 @@ Rebuildable data is deliberately **not** stored in OS-managed cache locations: t
 - **Process lock contention (smoke).** Open the library in process A; attempt to open in process B; assert clean refusal with a structured error.
 - **Mobile sandbox placement (smoke per platform).** Per-platform test asserts the library is placed in the OS-blessed location for app private storage and survives an app cold-start.
 - **Local index rebuild from sidecars (smoke).** Populate a library; drop `library.sqlite`; re-open; assert the index is rebuilt and queries return the same results as before.
+- **LRU eviction order (unit).** Fill the reclaimable set past the cache budget; assert the least-recently-accessed representations are evicted first and the budget is restored; assert no canonical `media/` file is touched.
+- **Tier-order eviction (unit).** With representations of equal recency over budget, assert eviction proceeds original → preview → thumbnail, and that the metadata tier (sidecar + LQIP) is never reclaimed.
+- **Recency promotion (unit).** View an asset to stamp its last-access, then trigger an over-budget sweep; assert its representations survive while older ones are evicted.
+- **Pin exemption (unit).** Pin a representation for offline use; push the cache over budget; assert the pinned representation survives the automatic sweep and is reclaimable only via the user-driven path.
 
 Cross-module case (full library lifecycle: import → upload → restore on a fresh client) is bounded E2E surface in [Module Map](/design/module-map/#e2e-test-surface).
diff --git a/capsule-docs/src/content/docs/design/import/download-sync.md b/capsule-docs/src/content/docs/design/import/download-sync.md
index 02ff2a2..8e1cba4 100644
--- a/capsule-docs/src/content/docs/design/import/download-sync.md
+++ b/capsule-docs/src/content/docs/design/import/download-sync.md
@@ -55,7 +55,7 @@ Because every blob is content-addressed, a fetch is skipped entirely when the bl
 
 - Prefetch is bounded and predictive — thumbnails for assets just beyond the viewport, the preview for the likely-next asset in a sequence — and is cancelled as soon as the user's focus moves.
 - Prefetch and any above-tier fetch obey the same connection rules as [Auto Syncing](#auto-syncing): on a metered connection the client fetches only what the user explicitly opens, and defers the rest.
-- Fetched-but-unpinned blobs are ordinary cache citizens, subject to [Space Recovery](/design/filesystem/client/#space-recovery); the client transparently re-fetches them on demand if they are evicted.
+- Fetched-but-unpinned blobs are ordinary cache citizens, subject to [Space Recovery](/design/filesystem/client/#space-recovery); the client transparently re-fetches them on demand if they are evicted. Recently-viewed content is retained preferentially — so scrolling back through an already-browsed album is served from cache rather than re-fetched — while the bounded, last-access-ordered eviction policy that decides what stays is owned by [Filesystem — Client](/design/filesystem/client/#automatic-cache-management).
 
 ## Auto Syncing
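
Note on the policy specified in the `filesystem/client.md` hunk (Automatic cache management): the patch defers implementation, so the following is only a minimal Rust sketch of how the sweep ordering could be planned, not code from the repository. `Tier`, `Entry`, `plan_sweep`, and `sample_entries` are hypothetical names. Two points the design text leaves open are filled with assumptions: "where recency does not decide it" is read as "equal last-access stamps", and pinned bytes are counted toward the budget even though the sweep never evicts them.

```rust
/// Reclaimable representation tiers, declared in the order the policy evicts
/// them when recency ties. The metadata tier (sidecar + LQIP) is deliberately
/// absent: the design says it is effectively never reclaimed.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum Tier {
    Original,
    Preview,
    Thumbnail,
}

/// One cached representation (hypothetical shape, not a real Capsule type).
#[derive(Debug, Clone)]
struct Entry {
    id: u32,
    tier: Tier,
    size: u64,        // bytes on disk
    last_access: u64, // monotonic stamp; smaller means older
    pinned: bool,     // pinned for offline use, exempt from the sweep
}

/// Plans an automatic sweep: returns the ids to evict, in eviction order,
/// until the cached total fits `budget` or no unpinned entry remains.
fn plan_sweep(entries: &[Entry], budget: u64) -> Vec<u32> {
    // Assumption: pinned bytes count toward the total even though they are
    // never evicted here.
    let mut total: u64 = entries.iter().map(|e| e.size).sum();

    let mut candidates: Vec<&Entry> = entries.iter().filter(|e| !e.pinned).collect();
    // Least-recently-accessed first; equal stamps fall back to tier order
    // (Original < Preview < Thumbnail via the derived `Ord`).
    candidates.sort_by_key(|e| (e.last_access, e.tier));

    let mut evicted = Vec::new();
    for e in candidates {
        if total <= budget {
            break;
        }
        total -= e.size;
        evicted.push(e.id);
    }
    evicted
}

/// Illustrative data only: 185 bytes cached in total.
fn sample_entries() -> Vec<Entry> {
    vec![
        Entry { id: 1, tier: Tier::Original, size: 80, last_access: 10, pinned: false },
        Entry { id: 2, tier: Tier::Thumbnail, size: 5, last_access: 10, pinned: false },
        Entry { id: 3, tier: Tier::Preview, size: 20, last_access: 10, pinned: false },
        Entry { id: 4, tier: Tier::Original, size: 60, last_access: 5, pinned: true },
        Entry { id: 5, tier: Tier::Preview, size: 20, last_access: 50, pinned: false },
    ]
}
```

With `sample_entries()` and a budget of 100, `plan_sweep` returns `[1, 3]`: the pinned original (id 4) is skipped despite being the oldest, the three entries stamped 10 are ordered original, preview, thumbnail, and the sweep stops once the total drops to 85. The recently viewed preview (id 5) and the thumbnail (id 2) survive, which is the behavior the LRU, tier-order, recency-promotion, and pin-exemption validation cases describe.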