Skip to content

Stream provisioning metric changes#1014

Open
dhananjay-sawner wants to merge 7 commits into
masterfrom
dsawner/provisioning-time-metric
Open

Stream provisioning metric changes#1014
dhananjay-sawner wants to merge 7 commits into
masterfrom
dsawner/provisioning-time-metric

Conversation

@dhananjay-sawner
Copy link
Copy Markdown
Collaborator

@dhananjay-sawner dhananjay-sawner commented May 19, 2026

Summary

  • timeToReadyMs histogram (Coordinator): records the duration between datastream creation (system.creation.ms) and the INITIALIZING -> READY transition. Uses a 15-minute sliding window
    reservoir; exposed via the standard BrooklinHistogramInfo attributes (Count, Max, Mean, p50/p75/p95/p99, etc.). Enables alerting on long provisioning via Max.
  • slowProvisioningCount counter (Coordinator): incremented whenever a datastream's time-to-ready exceeds a configurable threshold.
  • New config brooklin.server.coordinator.slowProvisioningThresholdMs: controls the threshold above which a provisioning is counted as slow. Default 10 minutes. Threshold lives in config, so brooklin-server can tune per fabric without a library bump.

Testing Done

  • testTimeToReadyHistogramOnCreate — histogram count increases and sample is non-negative after INITIALIZING -> READY
  • testTimeToReadyHistogramNotEmittedOnResume — histogram count does not change on STOPPED -> READY resume
  • testSlowProvisioningCounterIncrementsWhenAboveThreshold — with threshold = 0 ms, counter increments after stream goes READY
  • testSlowProvisioningCounterDoesNotIncrementWhenBelowThreshold — with threshold = 1 h, counter stays unchanged after stream goes READY
  • mint build

Dhananjay Sawner added 7 commits May 9, 2026 14:39
Captures the duration between datastream creation (system.creation.ms)
and the INITIALIZING -> READY transition in handleDatastreamAddOrDelete.
Aggregate-only sliding-window histogram (15 min) on the Coordinator
module, registered via BrooklinHistogramInfo in CoordinatorMetrics so
the metrics bridge surfaces it (per #945 precedent).
Resume from STOPPED bypasses this code path (DatastreamResources.resume
sets READY directly), so the metric only fires on initial creation.
Adds an INFO log line in recordTimeToReadyMs alongside the histogram
update so per-stream provisioning times are recoverable from logs.
The histogram aggregates across the sliding window and cannot expose
individual sample values; the log line preserves per-datastream
attribution for retrospective analysis.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant