feat(sidecar): forward FFE evaluation metrics to OTLP intake #2052

Merged
gh-worker-dd-mergequeue-cf854d[bot] merged 3 commits into main from leo.romanovsky/ffe-sidecar-evaluation-metrics on Jun 1, 2026
@leoromanovsky (Contributor) commented May 28, 2026

## Motivation

PHP FFE evaluation metrics need a native path for aggregation, OTLP encoding, and delivery, so that PHP does not have to build its own OTLP writer and transport machinery. The shared design doc is the cross-PR reference: https://docs.google.com/document/d/1NvMfTpZWLBlFmEFNjdnlMyeVpy5l7KD8qujGFco6w2w/edit?tab=t.0

This PR is metric-only. Exposures remain in #2026 so reviewers can evaluate OTLP metric delivery independently from exposure cache semantics.

## Changes

This adds caller-driven FFE evaluation metric sidecar actions and OTLP export for `feature_flag.evaluations`.

The reusable FFE-domain pieces now live in `datadog-ffe` behind the `evaluation-metrics` feature: evaluation metric input types, metric attribute normalization, aggregation by matching attribute sets, and OTLP/protobuf payload encoding. `datadog-sidecar` keeps only sidecar-specific work: parsing the configured endpoint URL, building the HTTP request, applying the timeout, logging delivery failures, and integrating with sidecar lifecycle/actions.
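To illustrate "aggregation by matching attribute sets", here is a minimal sketch. The names `EvalAttributes`, `EvaluationAggregator`, and `attrs` are hypothetical and are not the `datadog-ffe` API. The sketch assumes attribute sets are compared as ordered maps, so two evaluations with the same key/value pairs share one counter regardless of insertion order. It also assumes that a flush drains the current counts.

```rust
use std::collections::BTreeMap;

/// Hypothetical attribute set for one evaluation. BTreeMap keeps keys sorted,
/// so two sets with the same pairs compare equal regardless of insertion order.
type EvalAttributes = BTreeMap<String, String>;

/// Hypothetical aggregator: one counter per distinct attribute set.
#[derive(Default)]
struct EvaluationAggregator {
    counts: BTreeMap<EvalAttributes, u64>,
}

impl EvaluationAggregator {
    /// Record one evaluation; matching attribute sets share a counter.
    fn record(&mut self, attrs: EvalAttributes) {
        *self.counts.entry(attrs).or_insert(0) += 1;
    }

    /// Drain the accumulated counters, leaving the aggregator empty
    /// for the next flush interval.
    fn flush(&mut self) -> Vec<(EvalAttributes, u64)> {
        std::mem::take(&mut self.counts).into_iter().collect()
    }
}

/// Helper to build an attribute set from string pairs.
fn attrs(pairs: &[(&str, &str)]) -> EvalAttributes {
    pairs
        .iter()
        .map(|(k, v)| (k.to_string(), v.to_string()))
        .collect()
}
```

Suppose you record the same flag/variant pair twice, once in each key order, and then record a different flag once. A single flush returns two points, with counts 2 and 1.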

The PHP companion PR uses this from native/C code for raw `DDTrace\ffe_evaluate` calls and from a thin PHP OpenFeature adapter for final OpenFeature-aware results. PHP does not aggregate, encode, or transport OTLP payloads.

Current PHP MVP path:

```mermaid
flowchart LR
    Eval["PHP evaluation<br/>raw API or OpenFeature adapter"]
    Record["PHP tracer native call<br/>record typed evaluation metric"]
    Action["sidecar action<br/>record FFE evaluation metrics"]
    Domain["datadog-ffe<br/>feature: evaluation-metrics<br/>attributes + aggregation + OTLP encoder"]
    Sidecar["shared sidecar<br/>metric flush lifecycle"]
    Collector["OTLP endpoint<br/>Agent or local collector"]
    Intake["feature_flag.evaluations"]

    Eval --> Record
    Record --> Action
    Action --> Domain
    Domain --> Sidecar
    Sidecar --> Collector
    Collector --> Intake
```
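In the flowchart above, the `datadog-ffe` node includes an OTLP encoder that writes protobuf. The sketch below illustrates that wire format using only the standard library. It is not the crate's encoder. It hand-encodes one OTLP attribute according to the `opentelemetry-proto` definitions: `KeyValue.key` is field 1, `KeyValue.value` is field 2, and the nested `AnyValue.string_value` is field 1. All three are length-delimited (wire type 2). The function names are hypothetical.

```rust
/// Append `value` as a protobuf base-128 varint (low 7 bits first,
/// high bit set on every byte except the last).
fn put_varint(buf: &mut Vec<u8>, mut value: u64) {
    while value >= 0x80 {
        buf.push(((value & 0x7f) as u8) | 0x80);
        value >>= 7;
    }
    buf.push(value as u8);
}

/// Append a length-delimited field (wire type 2): tag, byte length, bytes.
fn put_len_delimited(buf: &mut Vec<u8>, field_number: u32, bytes: &[u8]) {
    let tag = ((field_number as u64) << 3) | 2;
    put_varint(buf, tag);
    put_varint(buf, bytes.len() as u64);
    buf.extend_from_slice(bytes);
}

/// Encode an OTLP `KeyValue` whose value is `AnyValue { string_value }`.
fn encode_string_attribute(key: &str, value: &str) -> Vec<u8> {
    // AnyValue.string_value = field 1.
    let mut any_value = Vec::new();
    put_len_delimited(&mut any_value, 1, value.as_bytes());

    // KeyValue.key = field 1; KeyValue.value = field 2 (nested message).
    let mut kv = Vec::new();
    put_len_delimited(&mut kv, 1, key.as_bytes());
    put_len_delimited(&mut kv, 2, &any_value);
    kv
}
```

For example, `encode_string_attribute("k", "v")` produces the eight bytes `0a 01 6b 12 03 0a 01 76`. These are the key field, followed by the nested value message.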

Future Python/Ruby connection:

```mermaid
flowchart LR
    PyToday["dd-trace-py today<br/>OpenFeature hook + host metric writer"]
    RbToday["dd-trace-rb today<br/>OpenFeature hook + host metric writer"]
    PyFuture["dd-trace-py future<br/>explicit native opt-in"]
    RbFuture["dd-trace-rb future<br/>explicit native opt-in"]
    Native["libdatadog caller-driven<br/>FFE metric action"]
    Shared["shared sidecar<br/>aggregation + OTLP delivery"]
    Otlp["OTLP endpoint"]

    PyToday -. "current host metric path" .-> Otlp
    RbToday -. "current host metric path" .-> Otlp
    PyFuture -. "after ownership switch" .-> Native
    RbFuture -. "after ownership switch" .-> Native
    Native --> Shared
    Shared --> Otlp
```

The future Python/Ruby arrows are intentionally not active behavior in this PR. They show the reusable target for a later migration while preserving today's host-language metric writers.

Why Python/Ruby do not double count today:

- Python and Ruby use libdatadog for evaluation only; the evaluator returns assignment metadata and does not record `feature_flag.evaluations` as a side effect.
- This PR adds a separate caller-driven sidecar action. Metric emission happens only when an SDK explicitly records a typed evaluation metric into that action. PHP wires this in its companion PR; Python and Ruby do not.
- Python and Ruby therefore keep exactly their current host-language OpenFeature metric writers. They are not also sending evaluation metrics through this native sidecar path.
- Evaluation metrics intentionally count every evaluation and do not have exposure-cache deduplication semantics. Future Python/Ruby migration must switch ownership to native logging and disable/bypass the host metric writer for the same evaluations.
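The no-double-counting argument depends on two things: evaluation must be side-effect free, and metric emission must be a separate, explicit call. The following is a minimal sketch of that contract. The names `Assignment`, `evaluate`, and `MetricAction` are hypothetical stand-ins, not libdatadog's API.

```rust
/// Hypothetical assignment metadata returned by the shared evaluator.
#[derive(Clone, Debug, PartialEq)]
struct Assignment {
    flag_key: String,
    variant: String,
}

/// Side-effect-free evaluation: returns metadata and records nothing.
/// (Stub: always assigns variant "on".)
fn evaluate(flag_key: &str) -> Assignment {
    Assignment {
        flag_key: flag_key.to_string(),
        variant: "on".to_string(),
    }
}

/// Hypothetical caller-driven action queue. Only SDKs that opt in call `record`.
#[derive(Default)]
struct MetricAction {
    recorded: Vec<Assignment>,
}

impl MetricAction {
    /// Explicitly enqueue one evaluation metric.
    fn record(&mut self, assignment: &Assignment) {
        self.recorded.push(assignment.clone());
    }
}
```

An SDK that only calls `evaluate` leaves the queue empty. This matches how Python and Ruby behave today. Only an SDK that also calls `record`, such as PHP, produces native metrics.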

Reference implementation check: dd-trace-java's canonical metric path is OpenFeature hook based. Java's `Provider` creates `FlagEvalMetrics` and returns a `FlagEvalHook`; the hook runs in `finallyAfter`, reads the final OpenFeature `FlagEvaluationDetails` (flag key, variant, reason, error code, and allocation metadata), and records one `feature_flag.evaluations` counter. Application code only calls OpenFeature; it does not call a metric API.
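The Java hook pattern can be approximated in Rust. Build one attribute set from the final evaluation details, including the error code when evaluation failed, and increment exactly one counter. This is a hypothetical translation, not a port of Java's `FlagEvalHook`. The fields of `FinalDetails` mirror the list above. The attribute key strings are illustrative only.

```rust
use std::collections::BTreeMap;

/// Hypothetical final evaluation details, analogous to what the Java hook
/// reads from OpenFeature's FlagEvaluationDetails after evaluation completes.
struct FinalDetails {
    flag_key: String,
    variant: Option<String>,
    reason: String,
    error_code: Option<String>,
    allocation_key: Option<String>,
}

/// One counter per distinct attribute set.
type Counter = BTreeMap<BTreeMap<&'static str, String>, u64>;

/// Build one attribute set from the final details. Optional fields are added
/// only when present, so error results still produce a countable point.
fn metric_attributes(d: &FinalDetails) -> BTreeMap<&'static str, String> {
    let mut attrs = BTreeMap::new();
    attrs.insert("flag_key", d.flag_key.clone());
    attrs.insert("reason", d.reason.clone());
    if let Some(v) = &d.variant {
        attrs.insert("variant", v.clone());
    }
    if let Some(e) = &d.error_code {
        attrs.insert("error_code", e.clone());
    }
    if let Some(a) = &d.allocation_key {
        attrs.insert("allocation", a.clone());
    }
    attrs
}

/// "finallyAfter"-style hook: runs once per evaluation, success or error,
/// and increments exactly one counter.
fn after_evaluation(d: &FinalDetails, counter: &mut Counter) {
    *counter.entry(metric_attributes(d)).or_insert(0) += 1;
}
```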

PHP mirrors that canonical OpenFeature shape. The PHP OpenFeature provider disables raw native metric recording while it asks the native evaluator for an assignment, then records exactly one final OpenFeature-aware metric through the Datadog-owned recorder. The raw Datadog PHP client has no direct Java equivalent, but it keeps the same SDK-owned ergonomics: normal evaluation APIs record one native metric per evaluation internally. For future Python/Ruby migration, the same rule applies: either keep the existing host-language OpenFeature metric hook, or switch ownership to the native recorder and disable/bypass the host metric writer for those evaluations.
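The PHP provider disables raw recording and then records one final metric. This step is essentially a scoped suppression. The sketch below models that idea with an RAII guard. The names `Recorder`, `SuppressGuard`, `raw_evaluate`, and `provider_evaluate` are invented for this example. PHP's actual mechanism lives in the companion PR and may differ.

```rust
use std::cell::Cell;

/// Hypothetical per-request recorder with a suppression flag.
#[derive(Default)]
struct Recorder {
    suppressed: Cell<bool>,
    recorded: Cell<u64>,
}

impl Recorder {
    /// Raw path: record one metric unless raw recording is suppressed.
    fn record_raw(&self) {
        if !self.suppressed.get() {
            self.recorded.set(self.recorded.get() + 1);
        }
    }

    /// Final OpenFeature-aware metric: never suppressed.
    fn record_final(&self) {
        self.recorded.set(self.recorded.get() + 1);
    }

    /// Suppress raw recording until the returned guard is dropped.
    fn suppress_raw(&self) -> SuppressGuard<'_> {
        let previous = self.suppressed.replace(true);
        SuppressGuard { recorder: self, previous }
    }
}

/// Restores the previous suppression state on drop, even on early return.
struct SuppressGuard<'a> {
    recorder: &'a Recorder,
    previous: bool,
}

impl Drop for SuppressGuard<'_> {
    fn drop(&mut self) {
        self.recorder.suppressed.set(self.previous);
    }
}

/// Raw evaluation API: records one native metric per call.
fn raw_evaluate(recorder: &Recorder) -> &'static str {
    recorder.record_raw();
    "on"
}

/// OpenFeature provider path: raw recording is off while the native
/// evaluator runs, then exactly one final metric is recorded.
fn provider_evaluate(recorder: &Recorder) -> &'static str {
    let variant = {
        let _guard = recorder.suppress_raw();
        raw_evaluate(recorder)
    };
    recorder.record_final();
    variant
}
```

Each call through either path adds exactly one to `recorded`. Because the guard restores the previous state when it is dropped, nested or early-returning code cannot leave raw recording disabled.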

## Decisions

No telemetry is emitted automatically from shared libdatadog evaluator calls. SDKs must explicitly enqueue FFE telemetry actions. This avoids double counting for Python/Ruby, which currently log feature-flag telemetry in host-language code.

Evaluation metrics intentionally count evaluations and do not use exposure-cache deduplication semantics.
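The contrast with exposures is easiest to see side by side. A counter increments on every evaluation, while an exposure cache suppresses repeats of the same key. Both functions below are simplified, hypothetical stand-ins. The exposure side uses an unbounded set rather than an LRU.

```rust
use std::collections::HashSet;

/// Evaluation metrics: every (flag, subject) evaluation counts.
fn count_metrics(evaluations: &[(&str, &str)]) -> usize {
    evaluations.len()
}

/// Exposures (simplified): repeats of the same (flag, subject) are dropped.
fn count_exposures(evaluations: &[(&str, &str)]) -> usize {
    let mut seen = HashSet::new();
    evaluations.iter().filter(|e| seen.insert(**e)).count()
}
```

Suppose the same subject evaluates the same flag three times. The result is three metric counts but only one exposure.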

Future Python/Ruby migration must be an ownership switch, not an additional writer. If those SDKs opt into this native metric path, their host-language OpenFeature metric writers must stop recording the same evaluations.
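The rule is "ownership switch, not an additional writer". One way to encode it is to give each SDK a single owner value, so only one writer can ever record a given evaluation. This is a hypothetical design sketch, not an existing API in any of the tracers.

```rust
/// Which writer owns evaluation metrics for an SDK. Because this is a single
/// value, the host writer and the native path can never both be active.
#[derive(Clone, Copy, Debug, PartialEq)]
enum MetricOwner {
    /// Today's Python/Ruby: host-language OpenFeature metric hook.
    HostWriter,
    /// PHP today, and Python/Ruby after migration: the native sidecar action.
    NativeSidecar,
}

/// Per-writer counts, to show where each evaluation went.
#[derive(Default, Debug, PartialEq)]
struct Counts {
    host: u64,
    native: u64,
}

/// Route one evaluation to exactly one writer.
fn record_evaluation(owner: MetricOwner, counts: &mut Counts) {
    match owner {
        MetricOwner::HostWriter => counts.host += 1,
        MetricOwner::NativeSidecar => counts.native += 1,
    }
}
```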

## Validation

Current head (`96d9a7bae`) local validation:

```sh
cd /Users/leo.romanovsky/go/src/github.com/DataDog/libdatadog-ffe-sidecar-metrics
cargo fmt --check
cargo test -p datadog-ffe --features evaluation-metrics telemetry::evaluation_metrics
cargo test -p datadog-sidecar ffe_metric
cargo check -p datadog-ffe
cargo check -p datadog-sidecar-ffi
```

Results: datadog-ffe metric tests passed (2 passed), sidecar metric tests passed (6 passed), default datadog-ffe check passed, sidecar FFI check passed, fmt check passed with only the repo stable-rustfmt warnings.

Prior downstream PHP behavior validation before the reusable-crate refactor, from DataDog/dd-trace-php#3911 using this PR at `1f1fca439`:

```text
ffe-dogfooding subject=php-3911-split-1779981881
php7_metrics=3 php8_metrics=3
php7_exposures=0 php8_exposures=0
```

System-tests downstream validation:

```sh
TEST_LIBRARY=php ./run.sh FEATURE_FLAGGING_AND_EXPERIMENTATION tests/ffe/test_flag_eval_metrics.py -vv
```

Result: 17 passed in 81.26 seconds.

Related PRs: DataDog/dd-trace-php#3906, DataDog/dd-trace-php#3911, #2026, DataDog/system-tests#7033.

@github-actions (Contributor)

Clippy Allow Annotation Report

Comparing clippy allow annotations between branches:

  • Base Branch: origin/main
  • PR Branch: origin/leo.romanovsky/ffe-sidecar-evaluation-metrics

Summary by Rule

| Rule | Base Branch | PR Branch | Change |
| --- | ---: | ---: | --- |
| expect_used | 2 | 2 | No change (0%) |
| unwrap_used | 7 | 7 | No change (0%) |
| Total | 9 | 9 | No change (0%) |

Annotation Counts by File

| File | Base Branch | PR Branch | Change |
| --- | ---: | ---: | --- |
| datadog-sidecar/src/service/sidecar_server.rs | 6 | 6 | No change (0%) |
| datadog-sidecar/src/service/telemetry.rs | 3 | 3 | No change (0%) |

Annotation Stats by Crate

| Crate | Base Branch | PR Branch | Change |
| --- | ---: | ---: | --- |
| clippy-annotation-reporter | 5 | 5 | No change (0%) |
| datadog-ffe-ffi | 1 | 1 | No change (0%) |
| datadog-ipc | 21 | 21 | No change (0%) |
| datadog-live-debugger | 6 | 6 | No change (0%) |
| datadog-live-debugger-ffi | 10 | 10 | No change (0%) |
| datadog-profiling-replayer | 4 | 4 | No change (0%) |
| datadog-remote-config | 3 | 3 | No change (0%) |
| datadog-sidecar | 57 | 57 | No change (0%) |
| libdd-common | 13 | 13 | No change (0%) |
| libdd-common-ffi | 12 | 12 | No change (0%) |
| libdd-data-pipeline | 5 | 5 | No change (0%) |
| libdd-ddsketch | 2 | 2 | No change (0%) |
| libdd-dogstatsd-client | 1 | 1 | No change (0%) |
| libdd-profiling | 13 | 13 | No change (0%) |
| libdd-telemetry | 20 | 20 | No change (0%) |
| libdd-tinybytes | 4 | 4 | No change (0%) |
| libdd-trace-normalization | 2 | 2 | No change (0%) |
| libdd-trace-obfuscation | 3 | 3 | No change (0%) |
| libdd-trace-stats | 1 | 1 | No change (0%) |
| libdd-trace-utils | 13 | 13 | No change (0%) |
| Total | 196 | 196 | No change (0%) |

About This Report

This report tracks Clippy allow annotations for specific rules, showing how they've changed in this PR. Decreasing the number of these annotations generally improves code quality.

@codecov-commenter commented May 28, 2026

Codecov Report

❌ Patch coverage is 84.28291% with 80 lines in your changes missing coverage. Please review.
✅ Project coverage is 73.07%. Comparing base (f7d471d) to head (96d9a7b).
⚠️ Report is 1 commit behind head on main.

Additional details and impacted files
```diff
@@            Coverage Diff             @@
##             main    #2052      +/-   ##
==========================================
+ Coverage   72.90%   73.07%   +0.16%     
==========================================
  Files         460      462       +2     
  Lines       76396    76939     +543     
==========================================
+ Hits        55696    56220     +524     
- Misses      20700    20719      +19     
```

| Components | Coverage Δ |
| --- | --- |
| libdd-crashtracker | 65.39% <ø> (+0.16%) ⬆️ |
| libdd-crashtracker-ffi | 36.82% <ø> (ø) |
| libdd-alloc | 98.77% <ø> (ø) |
| libdd-data-pipeline | 85.60% <ø> (ø) |
| libdd-data-pipeline-ffi | 75.70% <ø> (ø) |
| libdd-common | 79.89% <ø> (ø) |
| libdd-common-ffi | 74.41% <ø> (ø) |
| libdd-telemetry | 73.34% <ø> (ø) |
| libdd-telemetry-ffi | 31.36% <ø> (ø) |
| libdd-dogstatsd-client | 82.64% <ø> (ø) |
| datadog-ipc | 74.75% <ø> (-1.47%) ⬇️ |
| libdd-profiling | 81.69% <ø> (ø) |
| libdd-profiling-ffi | 64.79% <ø> (ø) |
| libdd-sampling | 97.46% <ø> (ø) |
| datadog-sidecar | 32.77% <77.67%> (+3.58%) ⬆️ |
| datdog-sidecar-ffi | 9.13% <0.00%> (-1.05%) ⬇️ |
| spawn-worker | 48.86% <ø> (ø) |
| libdd-tinybytes | 93.80% <ø> (ø) |
| libdd-trace-normalization | 81.71% <ø> (ø) |
| libdd-trace-obfuscation | 87.30% <ø> (ø) |
| libdd-trace-protobuf | 68.25% <ø> (ø) |
| libdd-trace-utils | 88.94% <ø> (ø) |
| libdd-tracer-flare | 86.88% <ø> (ø) |
| libdd-log | 74.83% <ø> (ø) |

@datadog-prod-us1-6 (bot) commented May 28, 2026

Tests

🎉 All green!

🧪 All tests passed
❄️ No new flaky tests detected

🎯 Code Coverage (details)
Patch Coverage: 84.28%
Overall Coverage: 73.07% (+0.17%)

🔗 Commit SHA: 96d9a7b

@dd-octo-sts (Contributor) commented May 28, 2026

Artifact Size Benchmark Report

aarch64-alpine-linux-musl

| Artifact | Baseline | Commit | Change |
| --- | ---: | ---: | --- |
| /aarch64-alpine-linux-musl/lib/libdatadog_profiling.so | 7.63 MB | 7.63 MB | 0% (0 B) 👌 |
| /aarch64-alpine-linux-musl/lib/libdatadog_profiling.a | 82.94 MB | 82.94 MB | 0% (0 B) 👌 |

aarch64-unknown-linux-gnu

| Artifact | Baseline | Commit | Change |
| --- | ---: | ---: | --- |
| /aarch64-unknown-linux-gnu/lib/libdatadog_profiling.so | 10.25 MB | 10.25 MB | 0% (0 B) 👌 |
| /aarch64-unknown-linux-gnu/lib/libdatadog_profiling.a | 94.02 MB | 94.02 MB | 0% (0 B) 👌 |

libdatadog-x64-windows

| Artifact | Baseline | Commit | Change |
| --- | ---: | ---: | --- |
| /libdatadog-x64-windows/debug/dynamic/datadog_profiling_ffi.dll | 24.54 MB | 24.54 MB | 0% (0 B) 👌 |
| /libdatadog-x64-windows/debug/dynamic/datadog_profiling_ffi.lib | 83.96 KB | 83.96 KB | 0% (0 B) 👌 |
| /libdatadog-x64-windows/debug/dynamic/datadog_profiling_ffi.pdb | 178.23 MB | 178.24 MB | +0% (+8.00 KB) 👌 |
| /libdatadog-x64-windows/debug/static/datadog_profiling_ffi.lib | 915.25 MB | 915.25 MB | +0% (+3.30 KB) 👌 |
| /libdatadog-x64-windows/release/dynamic/datadog_profiling_ffi.dll | 8.03 MB | 8.03 MB | 0% (0 B) 👌 |
| /libdatadog-x64-windows/release/dynamic/datadog_profiling_ffi.lib | 83.96 KB | 83.96 KB | 0% (0 B) 👌 |
| /libdatadog-x64-windows/release/dynamic/datadog_profiling_ffi.pdb | 23.77 MB | 23.77 MB | 0% (0 B) 👌 |
| /libdatadog-x64-windows/release/static/datadog_profiling_ffi.lib | 47.43 MB | 47.43 MB | 0% (0 B) 👌 |

libdatadog-x86-windows

| Artifact | Baseline | Commit | Change |
| --- | ---: | ---: | --- |
| /libdatadog-x86-windows/debug/dynamic/datadog_profiling_ffi.dll | 21.27 MB | 21.27 MB | 0% (0 B) 👌 |
| /libdatadog-x86-windows/debug/dynamic/datadog_profiling_ffi.lib | 85.29 KB | 85.29 KB | 0% (0 B) 👌 |
| /libdatadog-x86-windows/debug/dynamic/datadog_profiling_ffi.pdb | 182.23 MB | 182.23 MB | 0% (0 B) 👌 |
| /libdatadog-x86-windows/debug/static/datadog_profiling_ffi.lib | 908.37 MB | 908.38 MB | +0% (+3.30 KB) 👌 |
| /libdatadog-x86-windows/release/dynamic/datadog_profiling_ffi.dll | 6.20 MB | 6.20 MB | 0% (0 B) 👌 |
| /libdatadog-x86-windows/release/dynamic/datadog_profiling_ffi.lib | 85.29 KB | 85.29 KB | 0% (0 B) 👌 |
| /libdatadog-x86-windows/release/dynamic/datadog_profiling_ffi.pdb | 25.48 MB | 25.48 MB | 0% (0 B) 👌 |
| /libdatadog-x86-windows/release/static/datadog_profiling_ffi.lib | 45.07 MB | 45.07 MB | 0% (0 B) 👌 |

x86_64-alpine-linux-musl

| Artifact | Baseline | Commit | Change |
| --- | ---: | ---: | --- |
| /x86_64-alpine-linux-musl/lib/libdatadog_profiling.a | 73.95 MB | 73.95 MB | 0% (0 B) 👌 |
| /x86_64-alpine-linux-musl/lib/libdatadog_profiling.so | 8.52 MB | 8.52 MB | 0% (0 B) 👌 |

x86_64-unknown-linux-gnu

| Artifact | Baseline | Commit | Change |
| --- | ---: | ---: | --- |
| /x86_64-unknown-linux-gnu/lib/libdatadog_profiling.a | 89.36 MB | 89.36 MB | 0% (0 B) 👌 |
| /x86_64-unknown-linux-gnu/lib/libdatadog_profiling.so | 10.36 MB | 10.36 MB | 0% (0 B) 👌 |

@leoromanovsky leoromanovsky marked this pull request as ready for review May 29, 2026 00:57
@leoromanovsky leoromanovsky requested review from a team as code owners May 29, 2026 00:57
@leoromanovsky leoromanovsky requested review from dd-oleksii and typotter and removed request for a team May 29, 2026 00:57
@leoromanovsky leoromanovsky requested a review from sameerank May 30, 2026 01:50
@sameerank (Contributor) left a comment

👍


```rust
let req = match builder
    .method(Method::POST)
    .header("Content-Type", "application/x-protobuf")
```
Contributor

I understand this is for PHP, but FYI, this content type isn't the same across all the SDKs, in case we decide to unify them later.

Contributor Author

Cool, thank you! I'll investigate.

Contributor Author

Thanks for bringing this up. I looked into this, and libdatadog's OTLP trace exporter currently supports only HTTP with a JSON body; HTTP/protobuf and gRPC are not supported yet:

```rust
/// OTLP trace export protocol. HTTP/JSON is currently supported.
#[derive(Clone, Copy, Debug, Default, PartialEq, Eq)]
pub(crate) enum OtlpProtocol {
    /// HTTP with JSON body (Content-Type: application/json). Default for HTTP.
    #[default]
    HttpJson,
    /// HTTP with protobuf body. (Not supported yet)
    #[allow(dead_code)]
    HttpProtobuf,
    /// gRPC. (Not supported yet)
    #[allow(dead_code)]
    Grpc,
}
```

The reason is that the HTTP library currently in use supports only HTTP/1, while gRPC requires HTTP/2. I'm following up with the team about support plans for this.
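The reviewer points out that content types differ across SDKs. One way to keep them explicit is to derive the `Content-Type` header from each protocol variant. The sketch below does this for the variants in the `OtlpProtocol` excerpt quoted above. The `content_type` helper is hypothetical, not existing libdatadog code. OTLP/HTTP uses `application/json` for JSON bodies and `application/x-protobuf` for binary protobuf. gRPC uses `application/grpc` over HTTP/2.

```rust
/// Mirrors the variants of the quoted `OtlpProtocol` enum.
#[derive(Clone, Copy, Debug, Default, PartialEq, Eq)]
enum OtlpProtocol {
    #[default]
    HttpJson,
    HttpProtobuf,
    Grpc,
}

/// Hypothetical helper: the Content-Type header value for each protocol.
fn content_type(protocol: OtlpProtocol) -> &'static str {
    match protocol {
        OtlpProtocol::HttpJson => "application/json",
        OtlpProtocol::HttpProtobuf => "application/x-protobuf",
        // gRPC runs over HTTP/2, which an HTTP/1-only client cannot speak.
        OtlpProtocol::Grpc => "application/grpc",
    }
}
```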

gh-worker-dd-mergequeue-cf854d[bot] merged commit 1fe5944 into main on Jun 1, 2026
128 of 133 checks passed
gh-worker-dd-mergequeue-cf854d[bot] deleted the leo.romanovsky/ffe-sidecar-evaluation-metrics branch on June 1, 2026 19:13
gh-worker-dd-mergequeue-cf854d[bot] pushed a commit that referenced this pull request on Jun 1, 2026
## Motivation

PHP FFE exposure delivery needs a native path with a cache that persists beyond a single PHP request/thread. The shared design doc is the cross-PR reference: https://docs.google.com/document/d/1NvMfTpZWLBlFmEFNjdnlMyeVpy5l7KD8qujGFco6w2w/edit?tab=t.0

This PR is exposure-only. Metrics were split into #2052 so reviewers can evaluate exposure cache and delivery separately from OTLP evaluation metrics.

## Changes

This adds caller-driven FFE exposure sidecar actions, exposure payload forwarding through the Agent EVP proxy, and a shared exposure cache that deduplicates repeated `(service, env, version, flag, subject)` assignments across PHP requests and sidecar connections.
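Below is a minimal, standard-library-only sketch of an LRU deduplication cache keyed by the `(service, env, version, flag, subject)` tuple. The names `ExposureKey` and `ExposureCache` and the `capacity` parameter are hypothetical. The actual cache in `datadog-ffe` may key, size, and evict entries differently. In this sketch, recency is tracked with a monotonically increasing tick, and the entry with the smallest tick is the one evicted.

```rust
use std::collections::{BTreeMap, HashMap};

/// Hypothetical dedup key: (service, env, version, flag, subject).
type ExposureKey = (String, String, String, String, String);

/// Minimal LRU set: reports whether an exposure should be sent.
struct ExposureCache {
    capacity: usize,
    tick: u64,
    last_used: HashMap<ExposureKey, u64>,
    by_age: BTreeMap<u64, ExposureKey>,
}

impl ExposureCache {
    fn new(capacity: usize) -> Self {
        assert!(capacity > 0, "capacity must be positive");
        Self {
            capacity,
            tick: 0,
            last_used: HashMap::new(),
            by_age: BTreeMap::new(),
        }
    }

    /// Returns true the first time a key is seen (or after it was evicted),
    /// false for repeats while it is still cached.
    fn should_send(&mut self, key: ExposureKey) -> bool {
        self.tick += 1;
        if let Some(old_tick) = self.last_used.insert(key.clone(), self.tick) {
            // Repeat: refresh recency and suppress the duplicate.
            self.by_age.remove(&old_tick);
            self.by_age.insert(self.tick, key);
            return false;
        }
        self.by_age.insert(self.tick, key);
        if self.last_used.len() > self.capacity {
            // Evict the least recently used entry (smallest tick).
            if let Some((_, oldest)) = self.by_age.pop_first() {
                self.last_used.remove(&oldest);
            }
        }
        true
    }
}
```

A repeat assignment is suppressed only while its key is still cached. Once a key has been evicted, the next assignment for it is sent again. Because of this, the cache bounds memory without guaranteeing perfect deduplication.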

The reusable FFE-domain pieces now live in `datadog-ffe` behind the `exposure-events` feature: exposure input types, the LRU deduplication cache, and JSON payload encoding. `datadog-sidecar` keeps only sidecar-specific work: deriving the agent EVP endpoint, building the HTTP request, applying the timeout, logging delivery failures, and integrating with sidecar lifecycle/actions.

Current PHP MVP path:

```mermaid
flowchart LR
    Eval["PHP native evaluation<br/>ddog_ffe_evaluate"]
    Batch["PHP tracer native memory<br/>request/thread-local exposure batch"]
    Shutdown["PHP RSHUTDOWN<br/>flush exposure batch"]
    Action["sidecar action<br/>record FFE exposures"]
    Domain["datadog-ffe<br/>feature: exposure-events<br/>types + cache + JSON encoder"]
    Sidecar["shared sidecar<br/>cross-request and cross-thread exposure cache"]
    Agent["Datadog Agent<br/>EVP proxy"]
    Intake["FFE exposure intake"]

    Eval -->|"doLog=true assignment"| Batch
    Batch --> Shutdown
    Shutdown --> Action
    Action --> Domain
    Domain --> Sidecar
    Sidecar --> Agent
    Agent --> Intake
```
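On the PHP side of this diagram, only `doLog=true` assignments are batched, in request/thread-local memory. At request shutdown, the whole batch is handed to the sidecar action. The sketch below is a hypothetical Rust model of that lifecycle. The names `Candidate`, `RequestBatch`, `on_evaluation`, and `on_shutdown` are invented here. The real batching lives in dd-trace-php's native code.

```rust
/// Hypothetical exposure candidate produced by an evaluation.
#[derive(Clone, Debug, PartialEq)]
struct Candidate {
    flag: String,
    subject: String,
    do_log: bool,
}

/// Request-local batch: filled during the request, drained once at shutdown.
#[derive(Default)]
struct RequestBatch {
    pending: Vec<Candidate>,
}

impl RequestBatch {
    /// Keep only assignments whose allocation has doLog=true.
    fn on_evaluation(&mut self, candidate: Candidate) {
        if candidate.do_log {
            self.pending.push(candidate);
        }
    }

    /// RSHUTDOWN analogue: hand the whole batch to the sidecar in one call
    /// and leave the batch empty for the next request.
    fn on_shutdown<F: FnMut(Vec<Candidate>)>(&mut self, mut send_to_sidecar: F) {
        if !self.pending.is_empty() {
            send_to_sidecar(std::mem::take(&mut self.pending));
        }
    }
}
```

Deduplication is deliberately absent here. In the flow above, that work belongs to the shared sidecar cache, which sees batches from every request and thread.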

Future Python/Ruby connection:

```mermaid
flowchart LR
    PyToday["dd-trace-py today<br/>host-language exposure writer"]
    RbToday["dd-trace-rb today<br/>host-language exposure writer"]
    PyFuture["dd-trace-py future<br/>explicit native opt-in"]
    RbFuture["dd-trace-rb future<br/>explicit native opt-in"]
    Native["libdatadog caller-driven<br/>FFE exposure action"]
    Shared["shared sidecar<br/>dedupe + EVP delivery"]
    Agent["Datadog Agent<br/>EVP proxy"]

    PyToday -. "current direct EVP path" .-> Agent
    RbToday -. "current direct EVP path" .-> Agent
    PyFuture -. "after ownership switch" .-> Native
    RbFuture -. "after ownership switch" .-> Native
    Native --> Shared
    Shared --> Agent
```

The future Python/Ruby arrows are intentionally not active behavior in this PR. They show why the reusable code lives in `datadog-ffe` rather than directly in sidecar internals, while preserving today's host-language ownership.

Why Python/Ruby do not double count today:

- Python and Ruby use libdatadog for evaluation only; the evaluator returns assignment metadata and does not enqueue exposure telemetry as a side effect.
- This PR adds a separate caller-driven sidecar action. Exposure emission happens only when an SDK explicitly records exposure candidates into that action. PHP wires this in its companion PR; Python and Ruby do not.
- Python and Ruby therefore keep exactly their current host-language EVP exposure writers. They are not also sending exposure candidates through this native sidecar path.
- The sidecar cache only deduplicates exposure candidates that enter the native sidecar path. It cannot protect direct host-language EVP writers, so future Python/Ruby migration must switch ownership to native logging and disable/bypass the host exposure writer for the same evaluations.

Reference implementation check: dd-trace-java follows the same exposure semantics and user ergonomics. Java's `DDEvaluator` is SDK-owned evaluation code; after resolving an assignment, it checks allocation `doLog`, builds an exposure event with flag, variant, allocation, targeting key, and context, and dispatches it through `FeatureFlaggingGateway`. `ExposureWriterImpl` subscribes to those exposure events, queues them, deduplicates with an LRU exposure cache, serializes service/env/version context, and posts to the Agent EVP proxy. Application code only calls the OpenFeature provider; it does not call an exposure API.
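The Java flow described above has two halves. The evaluator dispatches exposure events through a gateway. A writer subscribes, queues the events, and deduplicates them before posting. This maps naturally onto a channel between a producer and a consumer thread. The sketch below is a hedged, standard-library-only Rust analogue, not a port of `FeatureFlaggingGateway` or `ExposureWriterImpl`. The names `ExposureEvent`, `dispatch`, and `spawn_writer` are hypothetical. Deduplication uses an unbounded `HashSet` rather than an LRU, and "posting" just collects events into a `Vec`.

```rust
use std::collections::HashSet;
use std::sync::mpsc;
use std::thread;

/// Hypothetical exposure event emitted by the evaluator.
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
struct ExposureEvent {
    flag: String,
    variant: String,
    subject: String,
}

/// Evaluator side: dispatch an event only for doLog=true allocations.
fn dispatch(tx: &mpsc::Sender<ExposureEvent>, event: ExposureEvent, do_log: bool) {
    if do_log {
        // Ignore send errors: a stopped writer simply means no delivery.
        let _ = tx.send(event);
    }
}

/// Writer side: drain the channel, skip repeats, and return what would have
/// been posted once every sender has been dropped.
fn spawn_writer(rx: mpsc::Receiver<ExposureEvent>) -> thread::JoinHandle<Vec<ExposureEvent>> {
    thread::spawn(move || {
        let mut seen = HashSet::new();
        let mut posted = Vec::new();
        for event in rx {
            if seen.insert(event.clone()) {
                posted.push(event);
            }
        }
        posted
    })
}
```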

PHP mirrors that canonical shape, with PHP-specific lifecycle mechanics: the dd-trace-php evaluation bridge records `doLog=true` exposure candidates internally, request shutdown flushes the batch, and this PR's sidecar path owns cross-request deduplication and EVP delivery. For future Python/Ruby migration, the same rule applies: wire native exposure recording inside the SDK-owned evaluation path, and turn off the existing host-language exposure writer for those evaluations.


## Decisions

No telemetry is emitted automatically from shared libdatadog evaluator calls. SDKs must explicitly enqueue FFE telemetry actions. This remains required for Python/Ruby coexistence because those SDKs currently log exposures and metrics in host-language code.

The sidecar cache deduplicates only exposure candidates sent through this native sidecar path; it cannot deduplicate direct host-language EVP writers.

Future Python/Ruby migration must be an ownership switch, not an additional writer. When those SDKs opt into this native exposure path, their host-language exposure writers must be disabled or bypassed for the same evaluations to avoid double counting.

## Validation

Current head (`8be471fbc`) local validation:

```sh
cd /Users/leo.romanovsky/go/src/github.com/DataDog/libdatadog-ffe-sidecar-exposures
cargo fmt --check
cargo test -p datadog-ffe --features exposure-events telemetry::exposures
cargo test -p datadog-sidecar ffe_exposure
cargo check -p datadog-ffe
cargo check -p datadog-sidecar-ffi
```

Results: datadog-ffe exposure tests passed (4 passed), sidecar exposure tests passed (6 passed), default datadog-ffe check passed, sidecar FFI check passed, fmt check passed with only the repo stable-rustfmt warnings.

Prior downstream PHP behavior validation before the reusable-crate refactor, from DataDog/dd-trace-php#3910 using this PR at `6d23848a`:

```text
ffe-dogfooding subject=php-3910-split-1779981442
php7_exposures=1 php8_exposures=1
php7_metrics=0 php8_metrics=0
```

System-tests downstream validation:

```sh
TEST_LIBRARY=php ./run.sh FEATURE_FLAGGING_AND_EXPERIMENTATION tests/ffe/test_exposures.py -vv
```

Result: 11 passed in 77.53 seconds.

Related PRs: DataDog/dd-trace-php#3906, DataDog/dd-trace-php#3910, #2052, DataDog/system-tests#7031.



Co-authored-by: leo.romanovsky <leo.romanovsky@datadoghq.com>
6 participants