
fix(vector config): Compression field incorrectly nested under TLS in OpenTelemetryProtocol TOML serialization, causing collector crashloops with LokiStack + Otel + tuning.#3320

Open
vparfonov wants to merge 2 commits into openshift:master from vparfonov:log9566

Conversation

@vparfonov

@vparfonov vparfonov commented Jun 29, 2026

Contributor

Description

Root Cause

In go-toml, struct fields serialize in declaration order. The TLS field (struct pointer creating a TOML table) was declared before Compression (scalar), causing compression to nest inside the TLS section.

Solution

Reorder OpenTelemetryProtocol fields: move Compression before TLS.
Changes:

  • Reorder struct fields in opentelemetry_sink.go
  • Add test case for LokiStack + Otel + tuning
  • Add test fixture lokistack_otel_tuning.toml
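
The root cause above can be illustrated with a minimal, stdlib-only sketch. It does not use go-toml; `emit` and `owner` are hypothetical helpers that mimic an encoder writing fields strictly in declaration order, and the `ca_file` key and the shortened `[protocol]` table name are illustrative. The sketch relies only on the TOML rule that a key/value pair belongs to the most recent table header above it.

```go
package main

import (
	"fmt"
	"strings"
)

// field is one struct field as an order-preserving encoder would see it.
// A non-nil table marks a nested table (the analogue of the TLS pointer).
type field struct {
	name   string
	scalar string   // rendered TOML value for a scalar field
	table  []string // rendered "key = value" lines for a table field
}

// emit writes the fields in exactly the order given (hypothetical helper).
func emit(parent string, fields []field) string {
	var b strings.Builder
	fmt.Fprintf(&b, "[%s]\n", parent)
	for _, f := range fields {
		if f.table != nil {
			fmt.Fprintf(&b, "[%s.%s]\n", parent, f.name)
			for _, line := range f.table {
				b.WriteString(line + "\n")
			}
			continue
		}
		fmt.Fprintf(&b, "%s = %s\n", f.name, f.scalar)
	}
	return b.String()
}

// owner reports which table a key lands in: in TOML, a key/value pair
// attaches to the most recent [table] header that precedes it.
func owner(doc, key string) string {
	current := ""
	for _, line := range strings.Split(doc, "\n") {
		line = strings.TrimSpace(line)
		if strings.HasPrefix(line, "[") && strings.HasSuffix(line, "]") {
			current = strings.Trim(line, "[]")
			continue
		}
		if strings.HasPrefix(line, key+" =") {
			return current
		}
	}
	return ""
}

func main() {
	tls := field{name: "tls", table: []string{`ca_file = "/path/to/ca.crt"`}}
	compression := field{name: "compression", scalar: `"gzip"`}

	before := emit("protocol", []field{tls, compression}) // TLS declared first
	after := emit("protocol", []field{compression, tls})  // Compression declared first

	fmt.Println(owner(before, "compression")) // protocol.tls
	fmt.Println(owner(after, "compression"))  // protocol
}
```

With TLS first, `compression` is written after the `[protocol.tls]` header and is therefore read as a TLS option; with Compression first it stays in `[protocol]`.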

/cc
/assign @jcantrill

Links

Summary by CodeRabbit

  • New Features

    • Added OpenTelemetry-based LokiStack output support with routing for application, audit, and infrastructure logs.
    • Improved trace context handling so log events can carry trace and span details when available.
    • Added configurable delivery and compression settings for the new LokiStack output path.
  • Tests

    • Expanded coverage for LokiStack output generation to include the new OpenTelemetry tuning scenario.

When LokiStack uses the Otel data model with tuning and compression, the
compression field was incorrectly nested under [protocol.tls] instead of
[protocol], causing Vector to reject the config.

Root cause: In go-toml, struct fields serialize in declaration order. The TLS
field (a struct pointer creating a TOML table) was declared before Compression
(a scalar field), causing compression to nest inside TLS.

Changes:
- Move Compression before TLS in OpenTelemetryProtocol struct
- Add test case for LokiStack + Otel + tuning
- Add lokistack_otel_tuning.toml test fixture
@coderabbitai

coderabbitai Bot commented Jun 29, 2026

📝 Walkthrough

Adds a complete Vector TOML configuration (lokistack_otel_tuning.toml) for LokiStack output using the OpenTelemetry data model with tuning parameters. The configuration covers trace context extraction, per-log-type routing and OTLP normalization, grouping reduces, and HTTP sinks for application, audit, and infrastructure logs. A test case is added to verify this configuration, and a minor field order fix is applied to OpenTelemetryProtocol.

Changes

LokiStack OpenTelemetry data model and tuning

| Layer | File(s) | Summary |
|---|---|---|
| OpenTelemetryProtocol field reorder | `internal/generator/vector/api/sinks/opentelemetry_sink.go` | Compression and TLS field positions swapped so Compression precedes TLS in the struct. |
| Trace context extraction and application log routing | `internal/generator/vector/output/lokistack/lokistack_otel_tuning.toml` | VRL remap extracts and validates trace_id/span_id/trace_flags from structured fields, embedded JSON, or regex; routes application logs by log_type with an unmatched-fallback counter metric. |
| Application log normalization, grouping, and sink | `internal/generator/vector/output/lokistack/lokistack_otel_tuning.toml` | Application container remap builds OTLP resource attributes and logRecord; reduce groups by cluster/namespace/pod/container; application OTLP HTTP sink configured with gzip, JSON codec, bearer auth, retry, batching, and disk buffering. |
| Audit log routing, normalization, grouping, and sink | `internal/generator/vector/output/lokistack/lokistack_otel_tuning.toml` | Routes audit logs by log_source; remap transforms normalize auditd, kubeapi, openshiftAPI, and OVN streams to OTLP; grouping reduces and audit OTLP HTTP sink configured. |
| Infrastructure log routing, normalization, grouping, and sink | `internal/generator/vector/output/lokistack/lokistack_otel_tuning.toml` | Routes infrastructure logs by log_source; container and node remap transforms produce OTLP resource attributes and logRecord; grouping reduces and infrastructure OTLP HTTP sink configured. |
| Test case for OTLP data model and tuning | `internal/generator/vector/output/lokistack/lokistack_test.go` | Adds time and resource imports and a DescribeTable entry configuring OpenTelemetry DataModel with AtLeastOnce delivery, retry durations, MaxWrite of 10M, and gzip compression, asserting against lokistack_otel_tuning.toml. |
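
As a rough illustration of the "extracts and validates" step in the trace context row, here is a minimal Go sketch of ID validation. It is not the fixture's VRL, which is not shown in this conversation. It assumes the W3C Trace Context shape (a 32-hex-digit trace ID and a 16-hex-digit span ID, neither all zeros); `validID` is a hypothetical helper, and accepting upper-case hex is an assumption of this sketch.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

var (
	// Lengths follow W3C Trace Context: 16-byte trace ID, 8-byte span ID.
	traceIDPattern = regexp.MustCompile(`^[0-9a-f]{32}$`)
	spanIDPattern  = regexp.MustCompile(`^[0-9a-f]{16}$`)
)

// validID lower-cases the candidate, checks the hex length, and rejects
// the all-zero value, which the specification treats as invalid.
func validID(id string, pattern *regexp.Regexp) bool {
	id = strings.ToLower(id)
	return pattern.MatchString(id) && strings.Trim(id, "0") != ""
}

func main() {
	fmt.Println(validID("4bf92f3577b34da6a3ce929d0e0e4736", traceIDPattern)) // true
	fmt.Println(validID("00000000000000000000000000000000", traceIDPattern)) // false
	fmt.Println(validID("00f067aa0ba902b7", spanIDPattern))                  // true
	fmt.Println(validID("not-a-span-id", spanIDPattern))                     // false
}
```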

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Poem

🐇 Hoppity-hop through the OTLP trail,
Trace IDs extracted without fail!
Application, audit, infra too—
Each log grouped and gzip'd anew.
The rabbit stamps the toml with glee,
LokiStack sings in harmony! 🎉

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title clearly identifies the TOML serialization bug and the affected OpenTelemetryProtocol/LokiStack path. |
| Description check | ✅ Passed | The description includes root cause, solution, changes, assignment, and links, which matches the required template well enough. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

@openshift-ci openshift-ci Bot requested review from Clee2691 and cahartma June 29, 2026 08:51

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@internal/generator/vector/output/lokistack/lokistack_otel_tuning.toml`:
- Around line 540-542: The OVN token guards in the lokistack OTEL tuning
template are off by one and can still index past the available elements. Update
the length checks around the `ovnTokens` reads so the `log.sequence` assignment
in the `ovnTokens[1]` branch only runs when there is at least a second token,
and the `k8s.ovn.component` assignment in the `ovnTokens[2]` branch only runs
when there is at least a third token. Keep the fix localized to the `ovnTokens`
parsing block in `lokistack_otel_tuning.toml`.
- Around line 36-54: The regex fallback in lokistack_otel_tuning.toml does not
match quoted JSON keys, so common trace fields like "traceId" or "trace.flags"
are missed when the separator is not immediately adjacent. Update the
parse_regex patterns in the trace_context.trace_id, trace_context.span_id, and
trace_context.trace_flags blocks to allow optional quotes around the key names
while preserving the existing aliases and value capture groups. Keep the fix
localized to the regex fallback logic so the existing JSON branch and
assignments to trace_context remain unchanged.

In `@internal/generator/vector/output/lokistack/lokistack_test.go`:
- Around line 164-165: The retry duration values in the Lokistack test are using
raw time.Duration literals, which makes them nanoseconds instead of seconds.
Update the MaxRetryDuration and MinRetryDuration setup in lokistack_test.go to
use second-based durations so the test exercises the intended retry bounds, and
keep the change localized to the retry config in the test case.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: d795473c-f9ff-4cde-a8ab-910cd91cd743

📥 Commits

Reviewing files that changed from the base of the PR and between f32eb77 and ee2a1a4.

📒 Files selected for processing (3)
  • internal/generator/vector/api/sinks/opentelemetry_sink.go
  • internal/generator/vector/output/lokistack/lokistack_otel_tuning.toml
  • internal/generator/vector/output/lokistack/lokistack_test.go

Comment thread internal/generator/vector/output/lokistack/lokistack_test.go
@vparfonov

Contributor Author

/test e2e-using-bundle

1 similar comment
@vparfonov

Contributor Author

/test e2e-using-bundle

Comment thread internal/generator/vector/output/lokistack/lokistack_test.go
@vparfonov

Contributor Author

/retest

@vparfonov

Contributor Author

/test ci-index-cluster-logging-operator-bundle

@jcantrill jcantrill left a comment

Contributor


/approve
/lgtm

@openshift-ci openshift-ci Bot added the lgtm Indicates that a PR is ready to be merged. label Jul 2, 2026
@openshift-ci

openshift-ci Bot commented Jul 2, 2026

Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jcantrill, vparfonov

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci Bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jul 2, 2026
@openshift-ci

openshift-ci Bot commented Jul 2, 2026

Contributor

@vparfonov: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
|---|---|---|---|---|
| ci/prow/e2e-using-bundle | d5484af | link | false | /test e2e-using-bundle |
| ci/prow/e2e-target | d5484af | link | unknown | /test e2e-target |

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@openshift-merge-bot

Contributor

/retest-required

Remaining retests: 0 against base HEAD c02ef14 and 2 for PR HEAD d5484af in total


Labels

  • approved: Indicates a PR has been approved by an approver from all required OWNERS files.
  • lgtm: Indicates that a PR is ready to be merged.
  • release/6.6


3 participants