Add item field registry and fragment-based split storage #442
Draft
bitner wants to merge 41 commits into
Co-authored-by: Pete Gadomski <pete.gadomski@gmail.com>
…ions
- Update pgstac-migrate pyproject.toml to require pgpkg>=0.1.1 (includes routine body-change detection)
- Regenerate migrations with pgpkg 0.1.1, which correctly includes search/search_query replacements
- Suppress unsafe DROP FUNCTION statements for routines that exist in the target schema
- Fix PGTap test 116 to check column names in alphabetical order (the migration adds columns at the end)
- Update test plan count from 229 to 248 (tests added for GC, context_count, statslastupdated)
- Validate the migration chain end-to-end with all tests passing
- All precommit hooks passing (migrations, pgtap, pypgstac)
- expand pgstac-migrate README with full CLI/API/env var docs and troubleshooting
- make psycopg[binary] mandatory in pgstac-migrate and pypgstac
- make psycopg-pool mandatory in pypgstac
- remove redundant psycopg optional/group wiring and update test script flags
- remove pgstac-migrate upper bound in pypgstac dependency
- update release workflow paths and uv setup/build step
- refresh docs/changelog references for pgpkg>=0.1.1
- regenerate uv lockfiles
…ash-and-dead-code-rerun
# Conflicts:
#	src/pgstac-migrate/pyproject.toml
#	src/pgstac-migrate/uv.lock
…d-code-rerun
# Conflicts:
#	.github/instructions/scripts.instructions.md
#	.gitignore
#	AGENTS.md
#	CLAUDE.md
#	src/pgstac/migrations/pgstac--0.9.11--unreleased.sql
…ge columns and registry sampling
Phase 1+2+3+4 combined:
- Add item_fragments table (collection, hash, content) for fragment dedup
- Add item_field_registry table (collection, path, is_leaf, value_kinds) for field discovery
- Extend items table with split columns: bbox, links, assets, properties, extra, fragment_id FK
- Add 6 promoted float8 columns: eo_cloud_cover, eo_snow_cover, gsd, view_off_nadir, view_sun_azimuth, view_sun_elevation
- Update content_dehydrate() to populate split columns with dual-write to legacy content
- Update content_hydrate() to prefer split columns over legacy content when populated
- Add jsonb_field_rows() recursive walker (IMMUTABLE PARALLEL SAFE)
- Add update_field_registry_from_sample() for batch path registration
- Add update_field_registry_from_items() with BERNOULLI(5%)/LIMIT 1000 sampling
- Add refresh_field_registry() maintenance function for path aging
- Add extract_fragment(), pgstac_hash_fragment(), get_or_create_fragment() with dedup
- Add gc_fragments() maintenance function for unused fragment cleanup
- Extend staging trigger to assign fragment_id and queue registry updates via run_or_queue
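The fragment dedup flow listed above (extract, hash, get-or-create, garbage-collect) can be modeled in a few lines of Python. This is only an illustrative sketch, not the SQL in this PR: the `FRAGMENT_KEYS` tuple, the SHA-256 hash over canonical JSON, and the in-memory `FragmentStore` class are all assumptions made for the example. The commit message only states that fragments are keyed by (collection, hash, content).

```python
import hashlib
import json

# Hypothetical: which top-level keys are treated as the shared fragment.
# The PR does not state the actual key set.
FRAGMENT_KEYS = ("type", "stac_version", "stac_extensions")


def extract_fragment(item, keys=FRAGMENT_KEYS):
    """Pull out the part of an item that is likely identical across items."""
    return {k: item[k] for k in keys if k in item}


def hash_fragment(fragment):
    """Hash a canonical JSON encoding so key order does not affect the result.

    SHA-256 is an assumption; the PR's pgstac_hash_fragment() may differ.
    """
    canonical = json.dumps(fragment, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class FragmentStore:
    """In-memory stand-in for a table keyed by (collection, hash)."""

    def __init__(self):
        self._by_key = {}   # (collection, hash) -> fragment id
        self._content = {}  # fragment id -> fragment content
        self._next_id = 1

    def get_or_create(self, collection, fragment):
        """Return the id of an existing identical fragment, or store a new one."""
        key = (collection, hash_fragment(fragment))
        if key not in self._by_key:
            self._by_key[key] = self._next_id
            self._content[self._next_id] = fragment
            self._next_id += 1
        return self._by_key[key]

    def gc(self, referenced_ids):
        """Drop every fragment no item references; return how many were removed."""
        live = set(referenced_ids)
        dead = [fid for fid in self._content if fid not in live]
        for fid in dead:
            del self._content[fid]
        self._by_key = {k: v for k, v in self._by_key.items() if v in live}
        return len(dead)
```

Two items whose fragment keys hold the same values map to the same fragment id regardless of key order, which is the property that makes the storage split worthwhile when many items in a collection share boilerplate.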
…egistry functions
- Add item_fragments table for deduplicated fragment storage (collection, hash, content)
- Add item_field_registry table for tracking JSONB field paths per collection
- Add split columns to items table (bbox, links, assets, properties, extra, eo_cloud_cover, eo_snow_cover, gsd, view_off_nadir, view_sun_azimuth, view_sun_elevation, fragment_id)
- content_dehydrate: populates split columns + dual-write legacy content field
- content_hydrate: prefers split columns when fragment_id IS NOT NULL
- Batch fragment ops in staging trigger: O(1) bulk INSERT + UPDATE JOIN
- extract_fragment: pure SQL function (IMMUTABLE PARALLEL SAFE)
- get_or_create_fragment: INSERT-first 2-query pattern
- gc_fragments: single set-based DELETE (no FOR LOOP)
- jsonb_field_rows: recursive JSONB path walker with max_depth guard
- update_field_registry_from_sample: batch UPSERT from caller-supplied array
- update_field_registry_from_items: BERNOULLI(5) sampling for large tables
- refresh_field_registry: expire stale paths older than retention_interval
- items_before_update_trigger: WHEN guard prevents re-hashing on non-content updates
- Fix staging trigger DELETE: use TG_TABLE_NAME instead of hard-coded items_staging
- Migration: fix items_fragment_id_fkey to not use NOT VALID (unsupported on partitioned tables)
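The recursive path walker with a depth guard, and the batch registration of the paths it finds, might look roughly like the following Python model. It is a sketch under assumptions, not a port of jsonb_field_rows: the dot-separated path format, the default `max_depth` of 8, the choice to report an object at the depth limit as a leaf, and the helper names `json_kind`, `field_rows`, and `build_registry` are all hypothetical.

```python
from typing import Any, Dict, Iterator, Tuple


def json_kind(value: Any) -> str:
    """Name a value roughly the way PostgreSQL's jsonb_typeof does."""
    if value is None:
        return "null"
    if isinstance(value, bool):  # bool must be checked before int
        return "boolean"
    if isinstance(value, (int, float)):
        return "number"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        return "array"
    return "object"


def field_rows(doc: Any, prefix: Tuple[str, ...] = (),
               max_depth: int = 8) -> Iterator[Tuple[str, bool, str]]:
    """Yield (path, is_leaf, kind) for every key reachable through nested objects.

    Objects are only descended into while the path is shorter than max_depth;
    an object at the limit is reported as a leaf (an assumption of this sketch).
    """
    if not isinstance(doc, dict):
        return
    for key, value in doc.items():
        path = prefix + (key,)
        descend = isinstance(value, dict) and len(path) < max_depth
        yield ".".join(path), not descend, json_kind(value)
        if descend:
            yield from field_rows(value, path, max_depth)


def build_registry(items, max_depth: int = 8) -> Dict[str, dict]:
    """Merge the rows from a sample of items into one entry per path."""
    registry: Dict[str, dict] = {}
    for item in items:
        for path, is_leaf, kind in field_rows(item, max_depth=max_depth):
            entry = registry.setdefault(
                path, {"is_leaf": is_leaf, "value_kinds": set()}
            )
            # A path is a leaf only if it was a leaf in every sampled item.
            entry["is_leaf"] = entry["is_leaf"] and is_leaf
            entry["value_kinds"].add(kind)
    return registry
```

Collecting the set of observed kinds per path is what lets a registry flag fields whose type varies between items, for example a `gsd` stored as a number in one item and a string in another.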
…istry-and-fragments
# Conflicts:
#	src/pgstac/migrations/pgstac--0.9.11--unreleased.sql
#	src/pgstac/migrations/pgstac--unreleased.sql
#	src/pgstac/pgstac.sql
#	src/pgstac/sql/003a_items.sql
#	src/pgstac/tests/pgtap/003_items.sql
Summary
Validation
scripts/runinpypgstac test --pgtap --basicsql
scripts/runinpypgstac test --pypgstac
scripts/runinpypgstac --no-cache --build test --migrations
Notes
origin/main into this branch after PR2 landed