
Add reproducible Planetary Computer benchmark fixtures and CI/manual benchmark reporting for load-path comparison #445

Merged
bitner merged 4 commits into v010-pr1-load-path-core from copilot/add-benchmark-fixtures-ci-reporting
May 21, 2026

Contributor

Copilot AI commented May 20, 2026

This PR adds reproducible benchmark fixtures for naip, sentinel-2-l2a, and landsat-c2-l2 (1000 items each), and wires benchmark execution/reporting so split-column+fragment behavior can be compared over time and against main. It introduces deterministic fixture materialization, machine-readable benchmark outputs, and dedicated CI/manual workflows for artifacted reporting.

  • Fixture source of truth (deterministic, reproducible)

    • Added benchmarks/fixtures/planetary-computer/manifest.json defining:
      • API endpoint
      • target collections
      • exact per-collection item count (1000)
      • deterministic search settings
    • Added fixture docs and layout under benchmarks/fixtures/planetary-computer/ with generated data intentionally excluded from git.
  • Fixture materialization + validation

    • Added scripts/benchmark_fetch_pc_fixtures.py to:
      • fetch collection docs and items from Planetary Computer
      • enforce exact cardinality and collection membership
      • detect duplicate IDs
      • emit fixture SHA summaries for traceability.
  • Benchmark execution + machine-readable outputs

    • Added scripts/benchmark_run.py to run per-collection:
      • ingest/dehydrate timing
      • hydrate timing
      • storage sizing (table/index/total + global schema/fragment bytes)
    • Outputs JSON/CSV/Markdown artifacts suitable for CI retention and trend comparison.
  • Base-vs-head comparison reporting

    • Added scripts/benchmark_compare_results.py to diff two benchmark JSON reports and emit:
      • comparison JSON
      • tabular CSV
      • markdown summary with deltas and percentages.
  • CI/manual workflows for artifacted benchmarks

    • Added .github/workflows/benchmark-fixtures.yml:
      • lightweight fixture-based benchmark run on PR/schedule/manual trigger
      • uploads raw + summarized artifacts
    • Added .github/workflows/benchmark-compare.yml:
      • manual base-ref vs head comparison using same fixtures
      • publishes markdown summary and artifact bundle
    • Added explicit minimal workflow permissions.
  • Docs/instructions updates

    • Updated CLAUDE.md, AGENTS.md, and .github/instructions/scripts.instructions.md with the new benchmark entrypoints and workflow intent.
    • Updated CHANGELOG.md to record benchmark fixture/reporting additions.
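
The validation rules listed under "Fixture materialization + validation" (exact cardinality, collection membership, duplicate IDs, SHA summaries) can be sketched roughly as follows. This is a minimal illustration, not the contents of scripts/benchmark_fetch_pc_fixtures.py: the function names `validate_fixture` and `fixture_sha256` are hypothetical, and hashing a key-sorted JSON encoding is an assumption about how a stable digest might be produced.

```python
import hashlib
import json


def validate_fixture(items, collection_id, expected_count):
    """Raise ValueError if the item list breaks any of the fixture rules.

    Hypothetical helper: checks exact count, collection membership,
    and duplicate item IDs.
    """
    if len(items) != expected_count:
        raise ValueError(f"expected {expected_count} items, got {len(items)}")
    seen = set()
    for item in items:
        item_id = item.get("id")
        if item.get("collection") != collection_id:
            raise ValueError(f"item {item_id!r} is not in {collection_id!r}")
        if item_id in seen:
            raise ValueError(f"duplicate item id {item_id!r}")
        seen.add(item_id)


def fixture_sha256(items):
    """Hash a canonical JSON encoding so the digest ignores key order."""
    canonical = json.dumps(items, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Calling `validate_fixture(items, "naip", 1000)` before writing the fixture file, and recording `fixture_sha256(items)` next to it, would give a later run something concrete to compare against.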
uv run --no-project --with 'psycopg[binary]' python scripts/benchmark_fetch_pc_fixtures.py \
  --manifest benchmarks/fixtures/planetary-computer/manifest.json \
  --output-dir /tmp/pgstac-benchmark-fixtures

uv run --no-project --with 'psycopg[binary]' python scripts/benchmark_run.py \
  --fixtures-dir /tmp/pgstac-benchmark-fixtures \
  --repo-root "$PWD" \
  --label head \
  --output-dir /tmp/pgstac-benchmark-results
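
The base-vs-head comparison described above (deltas and percentages between two benchmark reports) could look something like the sketch below. It assumes each report has been reduced to a flat mapping of metric name to number; the real JSON schema written by scripts/benchmark_run.py is not shown in this PR description, so the metric names and the helpers `compare_metrics` and `to_markdown` are hypothetical.

```python
def compare_metrics(base, head):
    """Return one row per metric present in both reports.

    `base` and `head` are assumed to be flat {metric_name: number} dicts.
    """
    rows = []
    for name in sorted(base.keys() & head.keys()):
        b, h = base[name], head[name]
        delta = h - b
        # A percentage is undefined when the base value is zero.
        pct = (delta / b * 100.0) if b else None
        rows.append({"metric": name, "base": b, "head": h, "delta": delta, "pct": pct})
    return rows


def to_markdown(rows):
    """Render comparison rows as a Markdown table."""
    lines = ["| metric | base | head | delta | % |", "|---|---|---|---|---|"]
    for r in rows:
        pct = "n/a" if r["pct"] is None else f"{r['pct']:+.1f}%"
        lines.append(
            f"| {r['metric']} | {r['base']} | {r['head']} | {r['delta']:+} | {pct} |"
        )
    return "\n".join(lines)
```

Restricting the comparison to metrics present in both reports keeps the sketch tolerant of a base ref that predates a newly added metric; a stricter tool might prefer to report such metrics as missing instead.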

Warning

Firewall rules blocked me from connecting to one or more addresses

I tried to connect to the following addresses, but was blocked by firewall rules:

  • apt.postgresql.org
    • Triggering command: /usr/lib/apt/methods/http /usr/lib/apt/methods/http (dns block)
  • astral.sh
    • Triggering command: /usr/bin/curl curl -LsSf REDACTED (dns block)
  • planetarycomputer.microsoft.com
    • Triggering command: /usr/bin/python python - (dns block)
    • Triggering command: /home/REDACTED/work/_temp/ghcca-node/node/bin/node /home/REDACTED/work/_temp/ghcca-node/node/bin/node --enable-source-maps /home/REDACTED/work/_temp/copilot-developer-action-main/dist/index.js (dns block)

If you need me to access, download, or install something from one of these locations, you can either:

Original prompt

Add reproducible benchmark fixtures and CI/manual benchmark reporting for the split-column + fragment work in PR #444.

Repository: stac-utils/pgstac
Branch to target: v010-pr1-load-path-core

Goals:

  1. Add benchmark fixtures containing approximately 1000 items each, plus the collection documents, for these Microsoft Planetary Computer STAC collections:
    • naip
    • sentinel-2-l2a
    • landsat-c2-l2
  2. Use those fixtures to benchmark and report comparisons relevant to the old collection base-item approach on main versus the new instantiated-column + fragment approach on this branch.
  3. Add CI/manual workflows and scripts so benchmark results are captured as artifacts and can be compared over time.

Important constraints and expectations:

  • The user explicitly wants fixtures of 1000 each for landsat, sentinel, and naip collections and items from the Planetary Computer STAC API.
  • If committing 3000 raw items directly is too large or awkward, the agent should choose the best practical approach while still satisfying the request as closely as possible. Preferred order:
    1. Commit fixed benchmark fixtures into the repo in a dedicated benchmark fixtures location.
    2. If that is too large, commit a deterministic fixture manifest and a fetch script that can materialize exactly 1000 items per collection from Planetary Computer, and wire CI/manual workflows to generate them reproducibly.
  • Benchmarking should cover at least:
    • dehydrate/ingest timing
    • hydrate timing
    • disk/storage size reporting
  • Results should be machine-readable and suitable for CI artifacts, not just NOTICE output.
  • Add a lightweight CI benchmark workflow for stable fixture-based benchmarking and artifact upload.
  • Add a heavier or manual/scheduled benchmark workflow if needed for branch-vs-main comparison.
  • If feasible, add a branch comparison workflow or script that benchmarks current branch versus main using the same fixtures and emits a markdown summary and raw JSON/CSV results.
  • Reuse existing repo infrastructure where possible (.github/workflows/continuous-integration.yml, scripts/test, existing benchmark_dehydrate*.sql, docker-based CI).
  • Keep changes focused on benchmark fixtures/reporting; do not broaden scope unnecessarily.

Suggested deliverables:

  • benchmark fixture location and files for collection docs and item datasets
  • scripts to fetch and/or validate Planetary Computer fixtures
  • SQL benchmark scripts for hydrate + storage plus normalization of dehydrate benchmark outputs
  • shell/python orchestration for running benchmarks and writing reports
  • GitHub Actions workflow(s) to run benchmarks and upload artifacts
  • brief docs/README for how to run the benchmarks locally and in CI

Please create a PR implementing this in stac-utils/pgstac based on branch v010-pr1-load-path-core.

The following is the prior conversation context from the user's chat exploration (may be truncated):

User: Do an intense review of the fragment approach we are using, with a particular eye towards making sure we are squeezing out every bit of performance we can. Find some sample Landsat, Sentinel, and NAIP collections and items from the Microsoft Planetary Computer STAC API so that we can benchmark the hydration and dehydration speeds for the old collection base item approach that is in main vs. the instantiated column and fragment approach we are looking at now. Give me a report on both the speed differences and the size-on-disk differences for both approaches.

Assistant: I reviewed PR #444 and the current main-branch design it is replacing.

Short version

You’re asking for a real benchmark report, not just a code review, and

User: try again

Assistant: I dug into PR #444 again from the URL you sent.

What changed in PR #444

The old main approach stores a collection-level base_item generated from collection metadata, then dehydrates each item by stripping anything already covered by that base item. Historically that base_item contains at least type, stac_version, assets from item_assets, and collection. You can see that in the existing migration history and schema lineage. The collections table on the old path includes a stored base_item generated column. On hydrate, item JSON is rebuilt by merging the per-item stored JSON with that collection base JSON. This is the “collection base item” approach you referred to.
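
As a rough illustration of the base-item idea described in the paragraph above, the following Python sketch strips values an item shares with a base document and merges them back on read. It is a simplified model only: pgstac implements this in SQL, and the function names `dehydrate` and `hydrate` here are just labels for the sketch. One known limitation is that a key present in the base but deliberately absent from the item would be reintroduced on hydrate; the sketch does not handle that case.

```python
def dehydrate(item, base):
    """Drop values already covered by the base item, recursing into dicts."""
    out = {}
    for key, value in item.items():
        if key in base and isinstance(value, dict) and isinstance(base[key], dict):
            nested = dehydrate(value, base[key])
            if nested:
                out[key] = nested
        elif key not in base or base[key] != value:
            out[key] = value
    return out


def hydrate(stored, base):
    """Rebuild the full item by layering the stored values over the base."""
    out = dict(base)
    for key, value in stored.items():
        if isinstance(value, dict) and isinstance(out.get(key), dict):
            out[key] = hydrate(value, out[key])
        else:
            out[key] = value
    return out
```

For an item whose `type`, `stac_version`, `collection`, and asset metadata all match the base, only the `id` and per-asset `href` values would remain in the dehydrated form, which is where the storage saving comes from.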

PR #444 removes that model and replaces it with:

  • split top-level item storage into dedicated columns like bbox, links, assets, properties, and extra
  • many promoted scalar/queryable columns such as platform, mission, eo_cloud_cover, proj_epsg, etc.
  • a new item_fragments table for deduplicated shared JSON chunks
  • collection-level fragment_config instead of base_item
  • hydrate logic that reassembles an item from split columns plus an optional fragment row, using jsonb_merge_level1 for assets/properties reassembly
  • staging-trigger fragment extraction and dedup dur...
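
The idea of deduplicated shared JSON chunks in the list above can be illustrated with a small content-addressed store. This is a hedged sketch, not the item_fragments schema: `FragmentStore` and `merge_one_level` are hypothetical names, keying fragments by a SHA-256 of canonical JSON is an assumption, and `merge_one_level` is only a guess at what a one-level merge might mean, not a reimplementation of jsonb_merge_level1.

```python
import hashlib
import json


class FragmentStore:
    """In-memory stand-in for a table of deduplicated JSON fragments."""

    def __init__(self):
        self.fragments = {}

    def put(self, fragment):
        """Store a fragment once and return its content-derived key."""
        canonical = json.dumps(fragment, sort_keys=True, separators=(",", ":"))
        key = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
        self.fragments.setdefault(key, fragment)
        return key

    def get(self, key):
        return self.fragments[key]


def merge_one_level(shared, own):
    """Combine dicts one level deep; per-item values override shared ones."""
    out = {k: dict(v) if isinstance(v, dict) else v for k, v in shared.items()}
    for key, value in own.items():
        if isinstance(value, dict) and isinstance(out.get(key), dict):
            out[key] = {**out[key], **value}
        else:
            out[key] = value
    return out
```

With this shape, many items that share identical asset metadata would all point at a single stored fragment, and reading an item back amounts to one lookup plus one merge.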

This pull request was created from Copilot chat.

Copilot AI changed the title from "[WIP] Add reproducible benchmark fixtures and CI benchmark reporting" to "Add reproducible Planetary Computer benchmark fixtures and CI/manual benchmark reporting for load-path comparison" May 20, 2026
Copilot finished work on behalf of bitner May 20, 2026 21:39
Copilot AI requested a review from bitner May 20, 2026 21:39
@bitner bitner marked this pull request as ready for review May 21, 2026 14:27
@bitner bitner merged commit 9251e60 into v010-pr1-load-path-core May 21, 2026
bitner added a commit that referenced this pull request May 27, 2026
…benchmark reporting for load-path comparison (#445)

* Initial plan

* Add benchmark fixture manifests, runners, and CI workflows

Agent-Logs-Url: https://github.com/stac-utils/pgstac/sessions/08cf3ee2-6a60-42ab-b23b-fac084012a4c

Co-authored-by: bitner <164828+bitner@users.noreply.github.com>

* Address benchmark review feedback and tighten workflow permissions

Agent-Logs-Url: https://github.com/stac-utils/pgstac/sessions/08cf3ee2-6a60-42ab-b23b-fac084012a4c

Co-authored-by: bitner <164828+bitner@users.noreply.github.com>

* Polish benchmark scripts after validation feedback

Agent-Logs-Url: https://github.com/stac-utils/pgstac/sessions/08cf3ee2-6a60-42ab-b23b-fac084012a4c

Co-authored-by: bitner <164828+bitner@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: bitner <164828+bitner@users.noreply.github.com>
2 participants