
Challenge score metric is slow and unstable due to stochastic tie-breaking #5

@NabJa

The current implementation of the challenge score metric uses Monte Carlo tie-breaking with 10**4 permutations to approximate the expected confusion matrix. This has several drawbacks:

  • Computationally expensive: runtime grows as O(num_permutations × n log n).
  • Non-reproducible: because of the randomness, results vary between runs, which complicates benchmarking and CI.
  • When bootstrapping is used to estimate performance distributions (as we did in our paper), the Monte Carlo sampling is repeated for every bootstrap resample, which makes the metric prohibitively slow and effectively unusable.
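For context, here is a minimal sketch of what Monte Carlo tie-breaking of this kind can look like. It assumes (this issue does not confirm it) that the confusion matrix comes from flagging the `capacity` highest-scoring cases as positive; `challenge_confusion_mc`, `capacity`, and `seed` are hypothetical names. Every permutation pays for a full sort, which is where the O(num_permutations × n log n) cost comes from.

```python
import numpy as np


def challenge_confusion_mc(labels, scores, capacity, num_permutations=10**4, seed=None):
    """Hypothetical sketch: Monte Carlo estimate of the expected (tp, fp, fn, tn)
    when the `capacity` highest-scoring cases are flagged (assumed rule) and
    ties are broken by a random shuffle."""
    labels = np.asarray(labels, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    n = len(labels)
    rng = np.random.default_rng(seed)  # seed=None -> different result every call
    counts = np.zeros(4)  # running sums of tp, fp, fn, tn
    for _ in range(num_permutations):
        perm = rng.permutation(n)  # random order decides how ties are broken
        # Stable descending sort: tied cases keep their random permuted order.
        order = perm[np.argsort(-scores[perm], kind="stable")]
        flagged = np.zeros(n, dtype=bool)
        flagged[order[:capacity]] = True
        counts += (
            np.sum(flagged & labels),    # tp
            np.sum(flagged & ~labels),   # fp
            np.sum(~flagged & labels),   # fn
            np.sum(~flagged & ~labels),  # tn
        )
    return counts / num_permutations  # approximate expectation
```

With `seed=None`, repeated calls on tied scores return slightly different values, and the loop body runs `num_permutations` times for every bootstrap resample.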

Proposal:
Replace the Monte Carlo sampling with an exact computation of the expected confusion matrix using the hypergeometric expectation for tied outputs. This removes stochasticity, guarantees reproducibility, and reduces runtime to O(n log n).
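A minimal sketch of the proposed exact computation, under the same assumption that the `capacity` highest-scoring cases are flagged (`challenge_confusion_exact` is a hypothetical name, not the code in PR #4). Cases scoring strictly above the cutoff score are always flagged. If r of the m cases tied at the cutoff must be flagged and K of those m are positive, the number of flagged positives among them is hypergeometric with mean r·K/m, so the expected counts follow in closed form after a single O(n log n) sort.

```python
import numpy as np


def challenge_confusion_exact(labels, scores, capacity):
    """Hypothetical sketch: exact expected (tp, fp, fn, tn) under uniformly
    random tie-breaking, when the `capacity` highest-scoring cases are flagged."""
    labels = np.asarray(labels, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    n = len(labels)
    num_pos = int(labels.sum())
    num_neg = n - num_pos
    if capacity <= 0:
        return np.array([0.0, 0.0, float(num_pos), float(num_neg)])
    if capacity >= n:
        return np.array([float(num_pos), float(num_neg), 0.0, 0.0])

    # Score of the last flagged case (the capacity-th largest score): O(n log n).
    cutoff = np.sort(scores)[n - capacity]
    above = scores > cutoff   # always flagged
    tied = scores == cutoff   # only some of these are flagged

    tp = float(np.sum(labels & above))
    fp = float(np.sum(above)) - tp

    m = int(np.sum(tied))                # tied cases at the cutoff (m >= 1)
    r = capacity - int(np.sum(above))    # how many of them get flagged (1 <= r <= m)
    k = int(np.sum(labels & tied))       # positives among the tied cases

    # Hypergeometric mean: r draws without replacement from m items, k positive.
    tp += r * k / m
    fp += r * (m - k) / m
    fn = num_pos - tp
    tn = num_neg - fp
    return np.array([tp, fp, fn, tn])
```

For example, with `labels = [1, 0, 1, 0, 1, 0]`, `scores = [0.9, 0.5, 0.5, 0.5, 0.1, 0.1]` and `capacity = 2`, one tied case out of three is flagged and one of the three is positive, so this returns exactly (tp, fp, fn, tn) = (4/3, 2/3, 5/3, 7/3) on every call.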

Related PR: #4