Graph exports can retain stale query URL nodes after rerunning with --no-query

## Summary

When rerunning a site crawl with a different URL normalization policy, stale URLs from previous snapshots can still appear in graph exports for the latest snapshot.

A concrete example is rerunning a crawl with `--no-query`: query-string variants such as `/book-a-call?intent=...` can continue to appear as nodes in the exported graph if they were discovered in an earlier snapshot.

## Reproduction

1. Crawl a site where internal links include query-string CTA URLs:

   ```bash
   crawlith crawl https://example.com/ --limit 60 --depth 2
   ```

2. Rerun the same site with query stripping enabled:

   ```bash
   crawlith crawl https://example.com/ --limit 60 --depth 2 --no-query
   ```

3. Export the latest graph:

   ```bash
   crawlith export https://example.com/ --export json,csv --output /tmp/crawlith-export
   ```

4. Inspect `graph.json` or `nodes.csv` in the output directory and look for nodes whose URLs still contain a query string.
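The inspection in step 4 can be scripted. The following is a minimal sketch, assuming the node URLs have already been read out of the export into a list of strings. The layout of `graph.json` and `nodes.csv` is not described here, so the loading step is left out. The function name `urls_with_query` and the sample URLs are hypothetical.

```python
from urllib.parse import urlsplit


def urls_with_query(urls):
    """Return the URLs that carry a non-empty query string."""
    return [u for u in urls if urlsplit(u).query]


# Hypothetical sample of node URLs; the "intent" value is a placeholder.
sample_nodes = [
    "https://example.com/",
    "https://example.com/book-a-call?intent=placeholder",
    "https://example.com/pricing",
]

stale_candidates = urls_with_query(sample_nodes)
for url in stale_candidates:
    print(url)
```

Any URL this prints after a `--no-query` rerun is a candidate stale node.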

## Expected behavior

The latest snapshot export should reflect the latest crawl's normalization policy. If the latest crawl used `--no-query`, graph nodes for query-string URL variants should not remain from older snapshots.
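For reference, the effect expected from `--no-query` can be illustrated with a small sketch. This is not Crawlith's implementation, only an assumption about what query stripping means for a single URL. The helper name `strip_query` is hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit


def strip_query(url):
    """Drop the query string from a URL, leaving the other components as they are."""
    parts = urlsplit(url)
    return urlunsplit(parts._replace(query=""))


# Placeholder query value; the real CTA links carry other values.
normalized = strip_query("https://example.com/book-a-call?intent=placeholder")
print(normalized)
```

Under this policy every query-string variant of `/book-a-call` collapses to a single node, which is what the latest export is expected to contain.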

## Actual behavior

Older query-string URL variants can remain in the exported graph because snapshot page loading includes pages first seen in earlier snapshots, even when they were not seen in the selected/latest snapshot.

## Why this matters

For SEO/internal-link graphing, stale query nodes make visualizations and link metrics noisy. They can inflate orphan/low-inlink counts and make `--no-query` appear ineffective even though the crawler is normalizing newly discovered links correctly.

## Proposed fix

Scope snapshot page queries to pages whose `last_seen_snapshot_id` matches the selected snapshot, while preserving `single` snapshot behavior. This keeps graph exports aligned to the current crawl rather than all historical pages for the site.
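The difference between the two scoping rules can be sketched against an in-memory SQLite database. Everything except the `last_seen_snapshot_id` column name is an assumption made for illustration: the `pages` table, the `site_id` and `first_seen_snapshot_id` columns, the two helper functions, and the idea that the current loader filters on first-seen snapshot. Crawlith's actual schema and queries may differ.

```python
import sqlite3

# Hypothetical schema; only last_seen_snapshot_id is named in this issue.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE pages (
           site_id INTEGER,
           url TEXT,
           first_seen_snapshot_id INTEGER,
           last_seen_snapshot_id INTEGER
       )"""
)
conn.executemany(
    "INSERT INTO pages VALUES (?, ?, ?, ?)",
    [
        # Seen in both crawls.
        (1, "https://example.com/", 1, 2),
        # Seen only in the first crawl (snapshot 1), before --no-query.
        (1, "https://example.com/book-a-call?intent=placeholder", 1, 1),
        # First seen in the --no-query rerun (snapshot 2).
        (1, "https://example.com/book-a-call", 2, 2),
    ],
)


def pages_first_seen_up_to(conn, site_id, snapshot_id):
    """Assumed current behavior: every page first seen at or before the snapshot."""
    rows = conn.execute(
        "SELECT url FROM pages "
        "WHERE site_id = ? AND first_seen_snapshot_id <= ? ORDER BY url",
        (site_id, snapshot_id),
    )
    return [row[0] for row in rows]


def pages_last_seen_in(conn, site_id, snapshot_id):
    """Proposed behavior: only pages whose last_seen_snapshot_id is the snapshot."""
    rows = conn.execute(
        "SELECT url FROM pages "
        "WHERE site_id = ? AND last_seen_snapshot_id = ? ORDER BY url",
        (site_id, snapshot_id),
    )
    return [row[0] for row in rows]


unscoped = pages_first_seen_up_to(conn, 1, 2)
scoped = pages_last_seen_in(conn, 1, 2)
print(len(unscoped), len(scoped))
```

In this toy data the unscoped query returns the stale query-string URL alongside the two current pages, while the scoped query returns only the pages seen in snapshot 2.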

I'm happy to open a small PR with a repository-level test covering this behavior.


Graph exports can retain stale query URL nodes after rerunning with --no-query #103
