fix(ci): patch run-tests.php getTimer() to return float instead of comma-formatted string#3856
fix(ci): patch run-tests.php getTimer() to return float instead of comma-formatted string#3856Leiyks wants to merge 16 commits into
Conversation
|
a52010c to
fe39d11
Compare
Benchmarks [ tracer ]Benchmark execution time: 2026-06-03 12:16:25 Comparing candidate commit 88460ed in PR branch Found 1 performance improvements and 1 performance regressions! Performance is the same for 192 metrics, 0 unstable metrics.
|
a6282d7 to
3f2de5f
Compare
…= 7.2) Lists tests slower than 10s with their durations at the end of the run. The --show-slow flag only exists in PHP 7.2+, so it's gated by a runtime PHP_VERSION_ID check to avoid breaking PHP 7.0/7.1 jobs. Used to verify the hypothesis that recent CI failures are caused by tests exceeding 1000s under valgrind (triggering the number_format E_WARNING in run-tests.php's record() method).
…lgrind tests PHP's run-tests.php getTimer() returns number_format($elapsed, 4) which produces a comma-formatted string (e.g. "1,500.0000") when a test takes >=1000 seconds. PHP 8.0+ raises E_WARNING when that string is used in += arithmetic inside record(), causing the test runner to abort with exit code 2. Patch getTimer() at build time so it returns a float (via round()) instead.
The CI .base_test before_script runs wait-for-service-ready.sh, but WAIT_FOR only listed test-agent — request-replayer was never gated on, even though it is in the agent_httpbin_service() service block used by test_extension_ci, ASAN test_c, and ASAN test_c with multiple observers. Add a request-replayer case to detect_service_type (probing /replay, a read-only endpoint that returns valid JSON when php -S is up), and add request-replayer:80 to WAIT_FOR for the three affected job blocks. PHP Language Tests is unchanged: it only declares test-agent in services.
curl -sf fails on 4xx, which would falsely report not-ready if /replay returns 4xx when no dump exists yet (the natural startup state). Drop -f so any HTTP response proves php -S is up and executing index.php; connection failure remains the only signal of an unhealthy service.
Filters the SLOW TEST SUMMARY to tests that meaningfully approach the per-test 300s timeout, trimming noise from the long tail.
On PHP 8.4 the test sandbox integration registers early enough that the worker bundles it into the app-started payload's `integrations` field (see libdd-telemetry build_app_started), so no separate app-integrations-change event is emitted and the existing assertion loop times out. Accept both delivery paths: the dedicated app-integrations-change event, and the integrations array embedded in app-started. The expected output shape is identical because Rust Integration struct field order matches.
…ix flaky PHP 8.0 valgrind The test guarded a sleep(20) with `if (ini_get(...) != 0.5)`, but the wrapped sleep() blocks SIGVTALRM during the call. Under valgrind on PHP 8.0, the heavier signal traffic can cut the sleep short before the sidecar fetches the config from the request-replayer and propagates it to INI globals, leaving every dynamic value at its default. Replace the single sleep() with a 100ms polling loop bounded at 20s total. Each usleep() iteration goes through the same SIGVTALRM-blocking wrapper, which calls ddtrace_check_for_new_config_now() on unblock under valgrind, giving the test repeated, well-defined opportunities to apply the config while still finishing early on the happy path.
The previous warmup was a no-op:
$start = microtime(true);
...
usleep(floor(microtime(true) - $start) * 100000);
floor() of a sub-second float yields 0, so usleep(0) did nothing. Also,
the sleep happened *after* the span had already been sampled with an
empty env, so even a real delay wouldn't have helped.
Replace it with dd_trace_internal_fn('await_agent_info') called before
opening the span — same pattern that client_side_stats*.phpt already
uses to gate on the sidecar having read /info from the agent.
5bbd9e6 to
d26b162
Compare
…c on timeout
When a waitFor* helper exhausts maxIteration, dump to stdout:
- The session token in use
- The HTTP status code returned by the replayer for that session
- The HTTP status + body length when the same endpoint is queried
without the session token
This closes the visibility gap on flaky CI failures where the test
output currently shows only a generic 'HTTP request failed!' warning
followed by 'process timed out'. With the diagnostic, the next failing
run will tell us whether the replayer returned 4xx/5xx, whether the
expected data was present under a different session bucket, or whether
the connection failed at the transport layer.
Also suppresses the per-retry file_get_contents warning (the @-prefix)
so flaky polling no longer pollutes test stdout with N warning lines
during retries — only the final structured diagnostic appears on
genuine timeouts.
No-op on passing tests (the diagnostic path runs only just before throw).
…before run-tests external kill Previous instrumentation only emitted the diagnostic when the iteration count reached 500. Under valgrind each iteration is 10-20x slower than nominal, so the iteration-cap path may take 250s+ wall-clock — past run-tests.php's per-test external kill timer. The diagnostic never fires because the process is killed first. Add an explicit wall-clock budget (60s default) that aborts the wait and dumps the diagnostic well before the external kill. Pass elapsed seconds through to the diagnostic so we can see how much wall-time was actually spent waiting.
The first diagnostic that fired on this PR revealed two open questions
it couldn't answer:
1. Of the bytes the replayer returns for the test session, which URIs
are present? (If the response is all telemetry, the test filter
strips it and replayRequest() returns empty even though /replay
returned 200 with a non-empty body.)
2. Are individual /replay calls slow, or is the slowness elsewhere in
the loop?
Extend the diagnostic to:
- JSON-decode each /replay response and emit a URI=count summary,
so we can tell whether the test's expected payload kind (e.g.
/v0.4/traces) actually arrived.
- Time each of the diagnostic's own /replay calls (fetch_ms) so we
can compare against the wall-time the wait loop spent — if the
diagnostic's own calls are fast (say <100ms) but the loop took 30s
per attempt, the bottleneck is not the HTTP call.
…rs fail
Two probes used to diagnose the remaining valgrind flake classes:
1. await_probe_installation (live_debugger.inc): add a 30s wall-clock
cap alongside the 15000-iteration cap so this fires before run-tests
external timeout under valgrind-stretched iterations. On incomplete
install (loop exited without seeing enough hook IDs), dump
get_loaded_remote_configs to show which RC configs ARE loaded on
the extension side, plus elapsed wall time and remaining probe count.
No-op for tests where the probe installs in time.
2. client_side_stats_trace_filters.phpt: before the 'in traces:' /
'in stats:' dumps, snapshot dd_trace_internal_fn('get_agent_info')
to show whether the filter_tags / filter_tags_regex / ignore_resources
rules from the PUT /set-agent-info in SKIPIF actually propagated to
the extension. If they didn't, the trace filter regression has its
root cause upstream of the filter-application code. Switches EXPECT
-> EXPECTF so the agent_info JSON is matched with a %s wildcard
regardless of contents (passing runs still pass).
…uting EXPECT output The PROBE-DIAG/RR-DIAG/FILTER-DIAG dumps were added to investigate flaky valgrind failures. They served their purpose (identified replayer session-token isolation leaks and probe-install regressions) but a false-positive in the await_probe iteration heuristic flipped 2 otherwise-passing tests to failing (debugger_metric_probe on 7.4, debugger_log_probe_process_tags on 8.1). PHPT merges stderr into stdout, so routing diagnostics to stderr would not help. Keep the wall-clock cap added in 2f7181f — that prevents run-tests.php's external per-test kill from firing before our throw on valgrind-stretched runs, which was the real fix.
Picks up libdatadog leiyks/instrument-info-fetcher (170c989b7): "test(sidecar): log every /info poll when an endpoint test_token is set" When fetch_endpoint.test_token.is_some() (i.e. we are in a PHPT/test context), the AgentInfoFetcher now emits one warn! line per /info poll into sidecar.log capturing endpoint URL, token, elapsed_ms, and the fetch result (SameState | NewState hash_prefix=... | Err(...)). Production behaviour (test_token == None) is unchanged. No polling interval or HTTP timeout was touched. This is diagnostic-only and exists so the next CI run gives per-test, per-poll evidence to pin down which failure mode is hitting agent_info-dependent flakes (dd_trace_agent_env, dynamic_config_*, client_side_stats_*, ffe/*, debugger_*) instead of fixing on hypothesis.
Picks up libdatadog leiyks/instrument-info-fetcher tip db205a39c: fix(sidecar): poll /info every 1s instead of 60s when test_token is set Per the per-poll instrumentation added in 170c989b7, 6 of the 13 failing tests on the previous pipeline (sha 88460ed) hit the same first-poll race: the AgentInfoFetcher's first poll lands on `{}` (the replayer's pre-PUT response, sha256 prefix 44136fa355b3678a), caches that state, and the next poll is 60s away — well past any test's await_agent_info budget. The fast-retry path drops that gap to 1s while `test_token` is set; production is unaffected. Tests this directly fixes (empty `{}` on first poll, observed in sidecar.log of pipeline 88460ed): - ffe/remote_config_lifecycle (PHP 7.4, 8.0) - live-debugger/log_probe (PHP 7.4) - live-debugger/span_decoration_probe (PHP 7.4) - remote-config/dynamic_config_update (PHP 8.0) - live-debugger/enable_dynamic_config (PHP 8.3) The remaining 7 failures in that pipeline are distinct issues (replayer /replay data missing, /v0.6/stats wait timeout, sidecar trace flush timeout, dd_trace_agent_env env-still-empty-despite-non-empty-info) and are NOT addressed by this commit.
What
sedpatch to$(BUILD_DIR)/run-tests.phpat build time, replacingnumber_format(withround(insidegetTimer()Why
PHP's
run-tests.phpgetTimer()returnsnumber_format($elapsed, 4), which produces a comma-formatted string (e.g."1,500.0000") when a test takes ≥1000 seconds. PHP 8.0+ raisesE_WARNING: A non-numeric value encounteredwhen that string is used in+=arithmetic insiderecord(). The main test runner process treats this as a fatal worker error and exits with code 2.This started surfacing ~1 week ago because commit fdbad29 moved
DD_EXPERIMENTAL_PROPAGATE_PROCESS_TAGS_ENABLED=0fromALL_TEST_ENV_OVERRIDE(covers all test targets, including valgrind) intoENV_OVERRIDE(covers only the non-valgrind run). The valgrind run now executes with process tags enabled, making live-debugger and sidecar-related tests 5–20× slower under valgrind's instrumentation, pushing some past the 1000s threshold.How
We already patch
run-tests.phpat build time for an unrelateddl()issue. This adds a second patch in the same$(BUILD_DIR)/run-tests.phpMakefile target:round($elapsed, 4)returns afloat(e.g.1999.5678) instead of a comma-formatted string, which is safe for+=arithmetic. The JUnit XMLtime='...'attribute also benefits —1999.5678is more spec-compliant than1,999.5678.Verified the sed pattern matches both PHP 8.4.19 (line 3441) and PHP 8.5.0 (line 3433).
Testing
test_extension_ci: [8.4]andtest_extension_ci: [8.5]valgrind runs no longer abort withE_WARNING: A non-numeric value encountered in run-tests.php