Enable PHP FFE evaluation metric system tests#7033
Draft
leoromanovsky wants to merge 1 commit into
This was referenced May 28, 2026
a412f4f to fbc9822
gh-worker-dd-mergequeue-cf854d Bot pushed a commit to DataDog/libdatadog that referenced this pull request Jun 1, 2026
## Motivation

PHP FFE evaluation metrics need a native path for aggregation, OTLP encoding, and delivery without building PHP OTLP writer/transport machinery.

The shared design doc is the cross-PR reference: https://docs.google.com/document/d/1NvMfTpZWLBlFmEFNjdnlMyeVpy5l7KD8qujGFco6w2w/edit?tab=t.0

This PR is metric-only. Exposures remain in #2026 so reviewers can evaluate OTLP metric delivery independently from exposure cache semantics.

## Changes

This adds caller-driven FFE evaluation metric sidecar actions and OTLP export for `feature_flag.evaluations`.

The reusable FFE-domain pieces now live in `datadog-ffe` behind the `evaluation-metrics` feature: evaluation metric input types, metric attribute normalization, aggregation by matching attribute sets, and OTLP/protobuf payload encoding.

`datadog-sidecar` keeps only sidecar-specific work: parsing the configured endpoint URL, building the HTTP request, applying the timeout, logging delivery failures, and integrating with sidecar lifecycle/actions.

The PHP companion PR uses this from native/C code for raw `DDTrace\ffe_evaluate` calls and from a thin PHP OpenFeature adapter for final OpenFeature-aware results. PHP does not aggregate, encode, or transport OTLP payloads.

Current PHP MVP path:

```mermaid
flowchart LR
    Eval["PHP evaluation<br/>raw API or OpenFeature adapter"]
    Record["PHP tracer native call<br/>record typed evaluation metric"]
    Action["sidecar action<br/>record FFE evaluation metrics"]
    Domain["datadog-ffe<br/>feature: evaluation-metrics<br/>attributes + aggregation + OTLP encoder"]
    Sidecar["shared sidecar<br/>metric flush lifecycle"]
    Collector["OTLP endpoint<br/>Agent or local collector"]
    Intake["feature_flag.evaluations"]
    Eval --> Record
    Record --> Action
    Action --> Domain
    Domain --> Sidecar
    Sidecar --> Collector
    Collector --> Intake
```

Future Python/Ruby connection:

```mermaid
flowchart LR
    PyToday["dd-trace-py today<br/>OpenFeature hook + host metric writer"]
    RbToday["dd-trace-rb today<br/>OpenFeature hook + host metric writer"]
    PyFuture["dd-trace-py future<br/>explicit native opt-in"]
    RbFuture["dd-trace-rb future<br/>explicit native opt-in"]
    Native["libdatadog caller-driven<br/>FFE metric action"]
    Shared["shared sidecar<br/>aggregation + OTLP delivery"]
    Otlp["OTLP endpoint"]
    PyToday -. "current host metric path" .-> Otlp
    RbToday -. "current host metric path" .-> Otlp
    PyFuture -. "after ownership switch" .-> Native
    RbFuture -. "after ownership switch" .-> Native
    Native --> Shared
    Shared --> Otlp
```

The future Python/Ruby arrows are intentionally not active behavior in this PR. They show the reusable target for a later migration while preserving today's host-language metric writers.

Why Python/Ruby do not double count today:

- Python and Ruby use libdatadog for evaluation only; the evaluator returns assignment metadata and does not record `feature_flag.evaluations` as a side effect.
- This PR adds a separate caller-driven sidecar action. Metric emission happens only when an SDK explicitly records a typed evaluation metric into that action. PHP wires this in its companion PR; Python and Ruby do not.
- Python and Ruby therefore keep exactly their current host-language OpenFeature metric writers. They are not also sending evaluation metrics through this native sidecar path.
- Evaluation metrics intentionally count every evaluation and do not have exposure-cache deduplication semantics. Future Python/Ruby migration must switch ownership to native logging and disable/bypass the host metric writer for the same evaluations.

Reference implementation check: dd-trace-java's canonical metric path is OpenFeature hook based. Java's `Provider` creates `FlagEvalMetrics` and returns a `FlagEvalHook`; the hook runs in `finallyAfter`, reads the final OpenFeature `FlagEvaluationDetails` including flag key, variant, reason, error code, and allocation metadata, and records one `feature_flag.evaluations` counter. Application code only calls OpenFeature; it does not call a metric API.

PHP mirrors that canonical OpenFeature shape. The PHP OpenFeature provider disables raw native metric recording while it asks the native evaluator for an assignment, then records exactly one final OpenFeature-aware metric through the Datadog-owned recorder. The raw Datadog PHP client has no direct Java equivalent, but it keeps the same SDK-owned ergonomics: normal evaluation APIs record one native metric per evaluation internally.

For future Python/Ruby migration, the same rule applies: either keep the existing host-language OpenFeature metric hook, or switch ownership to the native recorder and disable/bypass the host metric writer for those evaluations.

## Decisions

No telemetry is emitted automatically from shared libdatadog evaluator calls. SDKs must explicitly enqueue FFE telemetry actions. This avoids double counting for Python/Ruby, which currently log feature-flag telemetry in host-language code.

Evaluation metrics intentionally count evaluations and do not use exposure-cache deduplication semantics.

Future Python/Ruby migration must be an ownership switch, not an additional writer. If those SDKs opt into this native metric path, their host-language OpenFeature metric writers must stop recording the same evaluations.

## Validation

Current head (`96d9a7bae`) local validation:

```sh
cd /Users/leo.romanovsky/go/src/github.com/DataDog/libdatadog-ffe-sidecar-metrics
cargo fmt --check
cargo test -p datadog-ffe --features evaluation-metrics telemetry::evaluation_metrics
cargo test -p datadog-sidecar ffe_metric
cargo check -p datadog-ffe
cargo check -p datadog-sidecar-ffi
```

Results: datadog-ffe metric tests passed (2 passed), sidecar metric tests passed (6 passed), default datadog-ffe check passed, sidecar FFI check passed, fmt check passed with only the repo stable-rustfmt warnings.

Prior downstream PHP behavior validation before the reusable-crate refactor, from DataDog/dd-trace-php#3911 using this PR at `1f1fca439`:

```text
ffe-dogfooding subject=php-3911-split-1779981881 php7_metrics=3 php8_metrics=3 php7_exposures=0 php8_exposures=0
```

System-tests downstream validation:

```sh
TEST_LIBRARY=php ./run.sh FEATURE_FLAGGING_AND_EXPERIMENTATION tests/ffe/test_flag_eval_metrics.py -vv
```

Result: 17 passed in 81.26 seconds.

Related PRs: DataDog/dd-trace-php#3906, DataDog/dd-trace-php#3911, #2026, DataDog/system-tests#7033.

Co-authored-by: leo.romanovsky <leo.romanovsky@datadoghq.com>
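The aggregation rule described in the commit message above (group by matching attribute sets, count every evaluation, no deduplication) can be sketched roughly as follows. This is an illustration only, not the `datadog-ffe` implementation: the function names and the attribute keys (`flag_key`, `variant`, `reason`) are hypothetical, and the real crate is written in Rust and also handles attribute normalization and OTLP encoding.

```python
from collections import Counter
from typing import Dict, Iterable, List, Mapping, Tuple

METRIC_NAME = "feature_flag.evaluations"


def attribute_key(attributes: Mapping[str, str]) -> Tuple[Tuple[str, str], ...]:
    """Build an order-independent key so equal attribute sets match."""
    return tuple(sorted(attributes.items()))


def aggregate(evaluations: Iterable[Mapping[str, str]]) -> List[Dict]:
    """Count every recorded evaluation, grouped by its attribute set.

    There is deliberately no deduplication: two identical evaluations
    contribute a count of 2, unlike exposure-cache semantics.
    """
    counts: Counter = Counter()
    for attributes in evaluations:
        counts[attribute_key(attributes)] += 1
    return [
        {"name": METRIC_NAME, "attributes": dict(key), "value": value}
        for key, value in counts.items()
    ]


if __name__ == "__main__":
    # Hypothetical attribute names, for illustration only.
    recorded = [
        {"flag_key": "checkout", "variant": "on", "reason": "targeting_match"},
        {"reason": "targeting_match", "variant": "on", "flag_key": "checkout"},
        {"flag_key": "checkout", "variant": "off", "reason": "default"},
    ]
    for point in aggregate(recorded):
        print(point)
```

Under this model, a second writer recording the same evaluations would simply double the counts, which is why the commit message insists that any future Python/Ruby migration be an ownership switch and not an additional writer.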
fbc9822 to 581f541
Motivation
PHP FFE evaluation metrics need system-test coverage before DataDog/dd-trace-php#3911 can be treated as review-ready. The PHP runtime now records `feature_flag.evaluations` through native/libdatadog-sidecar code, so we need the shared FFE metric tests enabled against a locally built PHP tracer artifact rather than relying only on PHPT coverage.

Design doc: https://docs.google.com/document/d/1NvMfTpZWLBlFmEFNjdnlMyeVpy5l7KD8qujGFco6w2w/edit?tab=t.0
Changes
This PR enables `tests/ffe/test_flag_eval_metrics.py` for PHP at `v1.21.0-dev` in the manifest only. It intentionally does not enable exposure tests; exposure validation is tracked separately in #7031 so each PHP milestone can be reviewed independently.

The PR is stacked on #7003, which adds the PHP FFE scaffold and evaluation-test enablement. The PHP behavior being validated here is implemented in DataDog/dd-trace-php#3911 and the sidecar metric delivery support is implemented in DataDog/libdatadog#2052.
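As a rough mental model of what a manifest activation floor means, the sketch below treats a test file as enabled once the library version reaches the floor. This is an assumption-laden illustration, not the system-tests manifest parser: the `MANIFEST` dictionary, `version_core`, and `is_enabled` are hypothetical names, and the sketch compares only the numeric `major.minor.patch` core, ignoring how the real tooling orders suffixes such as `-dev`.

```python
import re
from typing import Tuple

# Hypothetical in-memory stand-in for a manifest entry; the real manifest
# is a file in the system-tests repository with its own format.
MANIFEST = {"tests/ffe/test_flag_eval_metrics.py": "v1.21.0-dev"}


def version_core(version: str) -> Tuple[int, int, int]:
    """Extract (major, minor, patch), ignoring a leading 'v' and any suffix."""
    match = re.match(r"v?(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version: {version!r}")
    major, minor, patch = (int(part) for part in match.groups())
    return (major, minor, patch)


def is_enabled(test_path: str, library_version: str) -> bool:
    """Return True when the test has a floor and the library meets it."""
    floor = MANIFEST.get(test_path)
    if floor is None:
        return False  # no activation entry: treated as not enabled here
    return version_core(library_version) >= version_core(floor)


if __name__ == "__main__":
    path = "tests/ffe/test_flag_eval_metrics.py"
    for candidate in ("1.20.0", "1.21.0-dev", "v1.22.3"):
        print(candidate, is_enabled(path, candidate))
```

Keeping the change to a single floor entry is what makes this PR reviewable on its own: the test code is shared, and only the version at which PHP is expected to pass it moves.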
Decisions
The scope is deliberately one manifest activation. The shared metric tests cover the expected `feature_flag.evaluations` behavior, including successful evaluations, missing flags, malformed/empty RC handling, disabled flags, type mismatches, targeting keys, and allocation metadata.

The local proof uses a matched PHP 8.2 NTS artifact and the `php-fpm-8.2` weblog, which matches the PHP 8.2 target Bob recommended for FFE system-test iteration.

Related PRs
Validation
Local behavior validation was run before this conflict-resolution rebase, using the previous system-tests PR head `a412f4f6`, the dd-trace-php metric branch now pushed as `dc19ce479`, and the `libdatadog` submodule at `96d9a7bae`:

```sh
cd /Users/leo.romanovsky/go/src/github.com/DataDog/dd-trace-php-ffe-metrics-restack
./tooling/bin/build-debug-artifact gnu-aarch64-8.2-nts \
  /Users/leo.romanovsky/go/src/github.com/DataDog/system-tests-pr7033/binaries
```

```sh
cd /Users/leo.romanovsky/go/src/github.com/DataDog/system-tests-pr7033
DOCKER_CONFIG=/tmp/system-tests-docker-config-nocredsstore \
SYSTEM_TEST_BUILD_TIMEOUT=1800 \
./build.sh --library php --weblog-variant php-fpm-8.2
```

Result: `17 passed in 81.28s`.

Post-rebase verification: branch head is now `fbc98222a`, stacked on the updated parent, enabling `tests/ffe/test_flag_eval_metrics.py` at `v1.21.0-dev` to match #7003. `PATH=/opt/homebrew/opt/coreutils/libexec/gnubin:$PATH ./format.sh` passed. The rebase only changed the manifest activation floor; the test behavior and local PHP artifact path above did not change.

macOS arm64 local-run note: the published PHP 8.2 base image inspected locally was amd64-only, so the local validation reused the locally built arm64 `datadog/system-tests:php-fpm-8.2.base-v1`. This PR remains only the manifest activation.