find_central_symbols runs PageRank on an apparent 100000-edge projection without warning on larger repos #14

@Regis-RCR

Description

TL;DR: find_central_symbols looks like it caps its PageRank projection at 100000 edges and does not flag the truncation. On a repo with 147589 edges it ran on 100000 of them, so the centrality scores are computed on a partial graph and the result looks complete.

Environment

  • memtrace 0.6.0, darwin-arm64
  • engine in remote mode on 127.0.0.1:50051
  • repo repo-e, indexed, 147589 edges

Repro

find_central_symbols(repo_id=repo-e, limit=8)
  -> {"algorithm": "pagerank", "edge_count": 100000, "node_count": 33493, ...}

get_repository_stats(repo_id=repo-e) -> total_edges: 147589
list_indexed_repositories                      -> repo-e edges: 147589

edge_count: 100000 in the result, 147589 real edges in the graph. 47589 edges (32 percent) appear excluded from the projection.
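
The gap can be recomputed from the two numbers reported above. The helper below is a minimal sketch, not part of memtrace: the name projection_coverage and its return keys are made up for illustration.

```python
def projection_coverage(edges_considered: int, edges_total: int) -> dict:
    """Compare the edge count a centrality run reports with the repo's real edge count."""
    excluded = edges_total - edges_considered
    return {
        "excluded": excluded,
        # Share of the real graph that never reached the projection, in percent.
        "excluded_percent": round(100 * excluded / edges_total, 1),
    }


# Numbers taken from the repro: edge_count from find_central_symbols,
# total_edges from get_repository_stats.
coverage = projection_coverage(edges_considered=100000, edges_total=147589)
print(coverage)
```

With the values from this repo, the helper reports 47589 excluded edges, about 32 percent of the graph.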

Expected

Either PageRank runs on the full edge set, or the result clearly signals that it ran on a truncated projection (a truncated: true flag, an edges_considered versus edges_total pair, or a warning).
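
One possible shape for the second option, as a sketch only. The field names truncated, edges_considered and edges_total are the ones proposed in this issue, not existing memtrace fields, and annotate_truncation is a hypothetical helper that assumes the result carries edge_count as in the repro.

```python
def annotate_truncation(result: dict, edges_total: int) -> dict:
    """Return a copy of a centrality result that states whether the projection was partial.

    `result` is assumed to carry `edge_count` as in the repro output;
    `edges_total` is the real edge count of the repository.
    """
    edges_considered = result["edge_count"]
    annotated = dict(result)  # leave the caller's dict untouched
    annotated["edges_considered"] = edges_considered
    annotated["edges_total"] = edges_total
    annotated["truncated"] = edges_considered < edges_total
    return annotated


# Values from the repro; the elided fields of the real result are omitted here.
reported = {"algorithm": "pagerank", "edge_count": 100000, "node_count": 33493}
annotated = annotate_truncation(reported, edges_total=147589)
print(annotated["truncated"], annotated["edges_considered"], annotated["edges_total"])
```

A caller can then branch on truncated instead of cross-checking get_repository_stats by hand.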

Actual

The cap is silent. edge_count: 100000 reads like the repo has 100000 edges, not like a ceiling was hit. The round number strongly suggests a 100000-edge ceiling rather than a coincidence, though only the code would confirm it. Centrality is a load-bearing metric (it tells an agent which symbols carry the highest refactor blast radius), and on this repo, as on any repo above the cap if the cap is a fixed constant, it is computed on a partial graph with no indication.
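
To illustrate why a partial graph matters, here is a toy example, unrelated to memtrace's actual implementation. It uses a plain power-iteration PageRank on a six-edge graph and assumes the cap keeps the first N edges in storage order, which is a guess: how the projection selects edges is not known. All names and the graph are invented for the illustration.

```python
def pagerank(edges, damping=0.85, iterations=100):
    """Plain power-iteration PageRank over a list of (source, target) edges."""
    nodes = sorted({node for edge in edges for node in edge})
    out_links = {node: [] for node in nodes}
    for src, dst in edges:
        out_links[src].append(dst)

    rank = {node: 1.0 / len(nodes) for node in nodes}
    for _ in range(iterations):
        # Rank held by nodes without outgoing edges is spread evenly over all nodes.
        dangling = sum(rank[node] for node in nodes if not out_links[node])
        base = (1.0 - damping + damping * dangling) / len(nodes)
        new_rank = {node: base for node in nodes}
        for src in nodes:
            if out_links[src]:
                share = damping * rank[src] / len(out_links[src])
                for dst in out_links[src]:
                    new_rank[dst] += share
        rank = new_rank
    return rank


def top_symbol(edges):
    """Return the node with the highest PageRank score."""
    rank = pagerank(edges)
    return max(rank, key=rank.get)


# Toy graph: "z" has the most callers, but its edges come last in the list.
EDGES = [("b", "a"), ("c", "a"), ("a", "z"), ("d", "z"), ("e", "z"), ("f", "z")]
CAP = 2  # stand-in for the apparent 100000-edge ceiling

top_full = top_symbol(EDGES)
top_capped = top_symbol(EDGES[:CAP])
print(top_full, top_capped)
```

On the full edge list the top symbol is z; on the capped list it is a, and nothing in the capped output hints that z exists at all.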

Ask

Surface the truncation (truncated / edges_considered / edges_total), or lift the cap for centrality. The explicit truncation note matters more than the cap itself: a documented partial result is usable, a silent one is not. Happy to send a PR for the flag.
