Make Splunk output sourcetype configurable in CLF (#3251)
marpears wants to merge 4 commits into openshift:master
Conversation
/hold
jcantrill left a comment
There is no functional test to support this addition. Also note we do not backport features to earlier releases
// SourceType can be used to specify a pretrained or custom sourcetype in Splunk.
// If not specified, `sourceType` will be "_json" or be determined automatically when using `payloadKey` based on the type of the final event payload.
... be determined automatically ...
I generally do not like this change, but this statement is inaccurate. There is no 'automatic' determination: the value configured in the sink is the same as the value provided in the API.
This field is a hint to the Splunk receiver on how to parse the incoming message field; why it can't figure this out on the server side is beyond me. By allowing users to configure this value, we are allowing them to misrepresent the actual log format of the workload. Given that the logs arriving at a single sink come from various workloads across the entire cluster, the 'pretrained' model may not apply to every log. How do we guarantee that with this implementation?
The CLF is intentionally opinionated, meaning the current implementation is correct: you either get a JSON blob of ViaQ format or generic text from some other part of the ViaQ payload. IMO the proper way to add this feature is to evaluate the message field and dynamically set the sourcetype based upon a pattern found in the message. The caveat is whether it can be set for individual log messages.
That being said, what does it mean for the user to set this value to, say, "apache http" when the sink additionally handles some well-known java log4j-formatted logs?
- Does the receiver politely process everything?
- Are we going to be on the hook to "fix" the case where the logs don't match the sourceType?
Hi @jcantrill thanks for reviewing this PR. In reply to your comments:
The problem this feature is intended to resolve is a scenario where a user has configured custom source types in Splunk to interpret particular app log message content and does not want to use the default source type of `_json`.
As an example, custom source types can be configured to trim the event payload, extract key/value pairs from `.message`, and transform them into fields for use with dashboards.
The inability to define a source type with the CLO is particularly an issue for a user who is moving away from a self-managed log collector that allows the source type to be defined, and who has a mature suite of Splunk custom source types and dashboards. When the source type is `_json` instead of the required custom source type, dashboards and search queries can break if they rely on fields that have not been extracted, and Splunk costs also increase if the log event is not trimmed. Before the CLO supported native Splunk forwarding, users would often implement a self-managed fluentd instance, but the fluentd Splunk HEC plugin is now archived. This, together with the native support for Splunk forwarding, is a reason why a user would seek to move to the CLO.
If a source type needs to be referenced for a particular application's log events, but is not appropriate for other apps, the ClusterLogForwarder would need multiple splunk outputs so that it does not misrepresent the format of log events for other applications.
I have altered the text that referred to "automatic" source type determination to try to make this part clearer, and added examples of CLF configurations that use the `sourceType` feature in docs/features/logforwarding/outputs/splunk-forwarding.adoc.
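For illustration, a hypothetical sketch of that multi-output arrangement (the output names, input selector, URL, and the `acme:app` sourcetype are all made-up values, and the exact field layout should be checked against the observability API):

```yaml
apiVersion: observability.openshift.io/v1
kind: ClusterLogForwarder
metadata:
  name: instance
spec:
  inputs:
    # Hypothetical input selecting only the app with a custom sourcetype.
    - name: acme-app
      type: application
      application:
        selector:
          matchLabels:
            app: acme
  outputs:
    # Logs from the acme app use its custom sourcetype.
    - name: splunk-acme
      type: splunk
      splunk:
        url: https://splunk.example.com:8088
        payloadKey: ".message"
        sourceType: "acme:app"
    # Everything else keeps the default _json sourcetype.
    - name: splunk-default
      type: splunk
      splunk:
        url: https://splunk.example.com:8088
  pipelines:
    - name: acme-logs
      inputRefs: [acme-app]
      outputRefs: [splunk-acme]
    - name: other-logs
      inputRefs: [application]
      outputRefs: [splunk-default]
```

Routing the single app with a matching Splunk sourcetype through its own output keeps the format hint accurate for that workload while leaving every other workload on the opinionated default.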
If a source type needs to be referenced for a particular application's log events, but is not appropriate for other apps, the ClusterLogForwarder would need multiple splunk outputs so that it does not misrepresent the format of log events for other applications.
This is exactly the concern I have, since fundamentally it requires users to know the log format of the workloads on the cluster and how to wire them from source container to output. This is precisely the behavior we explicitly wanted to avoid since before the introduction of CRIO; this was the problem to be solved with the initial fluentd and regex of the source. Frankly, IMO, it completely defeats the purpose of an operator being the thing that manages your deployments.
That being said, we discussed this change within our architecture meeting and landed on fundamentally what you describe based on a customer's usage and experience with Splunk. Essentially, we would expect the code to be further changed to:
- defining no messagePayloadKey, the functionality remains as-is
- defining no sourceType with messagePayloadKey, the functionality remains as-is
- defining sourceType with messagePayloadKey, the functionality populates the spec'd sourceType
- add a BIG BOLD WARNING to the API and repo docs that basically says you get what you asked for and that care needs to be taken when crafting pipelines
I will re-review the comment and try to comment these items accordingly. Additionally, I will review the vector config api to confirm that sourceType can use templates. The best use of this feature, if vector supports it, would be for users to label their workloads with the sourceType and then the collector could use that value when forwarding logs.
… of sourceType use to docs
[APPROVALNOTIFIER] This PR is NOT APPROVED
This pull-request has been approved by: marpears
The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files. Approvers can indicate their approval by writing `/approve` in a comment.
jcantrill left a comment
Please add e2e validation tests and at least one functional test
//
// +kubebuilder:validation:Optional
// +operator-sdk:csv:customresourcedefinitions:type=spec,displayName="SourceType",xDescriptors={"urn:alm:descriptor:com.tectonic.ui:text"}
SourceType string `json:"sourceType,omitempty"`
- Add a validation rule similar to https://github.com/marpears/cluster-logging-operator/blob/b41e2067585f143ab6b4ffa656ceae5e9d779a16/api/observability/v1/output_types.go#L80 that allows setting this value only when the payloadKey is modified
- Add comment that identifies this field can be templated along with a validation rule like https://github.com/marpears/cluster-logging-operator/blob/b41e2067585f143ab6b4ffa656ceae5e9d779a16/api/observability/v1/output_types.go#L1177
- Add a note that it is on the administrator to configure the pipeline so the sourceType matches the log entry. The collector makes no effort and performs no validation to ensure they match.
As an aside to the last point, I do not know how Splunk behaves with a mismatch which is a little concerning to me.
Also there are api validation tests here https://github.com/marpears/cluster-logging-operator/tree/b41e2067585f143ab6b4ffa656ceae5e9d779a16/test/e2e/collection/apivalidations
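A minimal sketch of how the first suggestion might look as a cross-field CEL rule on the output struct (the field names match the PR, but the rule text, message, and struct name are assumptions, not the merged API):

```go
// SplunkCustom is a hypothetical excerpt of the Splunk output spec.
//
// The XValidation marker below rejects a spec that sets sourceType
// without also setting payloadKey, mirroring the agreed behavior.
//
// +kubebuilder:validation:XValidation:rule="!has(self.sourceType) || has(self.payloadKey)",message="sourceType is only supported when payloadKey is also defined"
type SplunkCustom struct {
	// PayloadKey specifies the record field to use as the payload.
	// +kubebuilder:validation:Optional
	PayloadKey string `json:"payloadKey,omitempty"`

	// SourceType specifies a pretrained or custom sourcetype in Splunk.
	// +kubebuilder:validation:Optional
	SourceType string `json:"sourceType,omitempty"`
}
```

With a rule of this shape the invalid combination is rejected at admission time by the API server rather than surfacing later as a misconfigured collector.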
5. `indexedFields`: Optional. Fields to be added to Splunk index.
6. `payloadKey`: Optional. Specifies record field to use as payload.
7. `compression`: Optional. Compression configuration, available to are: `none`, `gzip`. Default is `none`.
5. `sourceType`: Optional. Can be used to specify a pretrained or custom sourcetype in Splunk. If not specified it will default to `_json` and be detected by .payloadkey.
There is nothing 'detected' by the collector or operator. I recommend:
- linking to the list of pretrained sourcetypes
`sourceType`: Optional. Can be used to specify a pretrained or custom sourcetype in Splunk when the payloadKey is additionally defined. If not specified, it will default to `_json`.
To ensure consistency and meaningful categorization, the `source` value can be dynamically derived from the `log_type` and `log_source` fields, following Cluster Log Forwarder's conventions.

=== `sourceType`
This would be a good place to add an example where the sourceType is defined as a pod label and the CLF specs the sourceType as a templated field to use from the pod meta. This highlights the value of labels, which is a core concept.
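For instance, a sketch of that label-driven approach, assuming the CLF field-template syntax (`{.path||"fallback"}`) is also honored for `sourceType` (the `logFormat` label name and its values are hypothetical):

```yaml
# Workload pods carry a label describing their log format, e.g.:
#   metadata:
#     labels:
#       logFormat: apache_access
spec:
  outputs:
    - name: splunk-labeled
      type: splunk
      splunk:
        url: https://splunk.example.com:8088
        payloadKey: ".message"
        # Take the sourcetype from the pod label; fall back to _json.
        sourceType: '{.kubernetes.labels.logFormat||"_json"}'
```

This keeps the mapping between a workload and its log format next to the workload itself, rather than hard-wired in the forwarder spec.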
** `_json` — used when `payloadKey` points to an object.
** `generic_single_line` — used when the payload is a primitive value (e.g., string, number, boolean).

NOTE: Use `payloadKey` carefully. Selecting a single field as the payload may cause other important information in the log to be dropped, potentially leading to inconsistent or incomplete log events.
My expectation is the greatest value comes from when this points to the message field
By default, `payloadKey` is not set, which means the complete log record is forwarded as the payload.

If `sourceType` is not defined, payloadKey can override the default `sourcetype` value of `_json`:
Please explain. I don't understand what this means.
Description

This PR allows for configuration of the Splunk output source type in the CLF using a new `sourceType` field. This is so that we can support users who have defined custom source types in Splunk.

If `sourceType` is not defined, then the current behavior is preserved where `_json` is the default and can be overridden if `payloadKey` is set, based on the type of the final event payload.

CLF configuration example:
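The example block itself did not survive extraction; the following is a hypothetical sketch of such a spec (the API version follows the observability API, while the URL, secret names, and sourcetype value are illustrative assumptions):

```yaml
apiVersion: observability.openshift.io/v1
kind: ClusterLogForwarder
metadata:
  name: instance
spec:
  serviceAccount:
    name: logcollector
  outputs:
    - name: splunk-custom
      type: splunk
      splunk:
        url: https://splunk.example.com:8088
        authentication:
          token:
            from: secret
            secret:
              name: splunk-secret
              key: hecToken
        payloadKey: ".message"
        # Hypothetical custom sourcetype defined on the Splunk side.
        sourceType: "my_custom:sourcetype"
  pipelines:
    - name: app-to-splunk
      inputRefs: [application]
      outputRefs: [splunk-custom]
```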
/cc @Clee2691 @cahartma
/assign @jcantrill
/cherrypick release-6.4
/cherrypick release-6.5
Links