SyslogAppender currently decides whether to split outgoing syslog packets using LogString::size(), while the actual UDP payload is produced only after transcoding the message into encoded bytes.
This creates a transport boundary mismatch for multibyte encodings such as UTF-8:
- split decision → based on character count
- emitted datagram size → based on encoded byte length
As a result, messages containing multibyte characters can remain below the configured MaxMessageLength threshold while still producing UDP datagrams larger than the configured maximum.
Runtime Reproduction
Configuration:
- MaxMessageLength = 100
- PatternLayout("%m")
- Syslog output to localhost UDP receiver
Test message:
- 40 Euro symbols (€)
- UTF-8 encoding (€ = 3 bytes)
Observed behavior with current implementation:
- msg.size() = 40
- encoded payload length = 120 bytes
- no splitting occurred
- emitted UDP datagram size = 124 bytes including syslog prefix
This exceeds the configured maximum despite the existing split logic.
Root Cause
Current split logic uses:
if (msg.size() > _priv->maxMessageLength)
However:
- LogString::size() reflects internal character/code-unit count
- UDP transport size depends on encoded byte length after transcoding
For multibyte UTF-8 content, these values diverge.
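The divergence can be illustrated with a minimal sketch. It assumes a build in which each stored character is a full code point, so std::u32string stands in for LogString here. The names utf8Length, escapesSplitCheck, and checkReproduction are hypothetical helpers for illustration, not log4cxx functions.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: number of bytes the code points occupy once encoded as UTF-8.
std::size_t utf8Length(const std::u32string& s)
{
    std::size_t bytes = 0;
    for (char32_t cp : s)
    {
        if (cp < 0x80)         bytes += 1;
        else if (cp < 0x800)   bytes += 2;
        else if (cp < 0x10000) bytes += 3;
        else                   bytes += 4;
    }
    return bytes;
}

// True when a character-count check would skip splitting even though
// the encoded payload alone is already larger than the limit.
bool escapesSplitCheck(const std::u32string& msg, std::size_t maxMessageLength)
{
    return msg.size() <= maxMessageLength && utf8Length(msg) > maxMessageLength;
}

// Replays the numbers from the reproduction: 40 Euro signs, limit 100.
void checkReproduction()
{
    const std::u32string msg(40, U'\u20AC');
    assert(msg.size() == 40);        // what the current check compares
    assert(utf8Length(msg) == 120);  // what actually goes on the wire
    assert(escapesSplitCheck(msg, 100));
}
```

For pure ASCII content the two measures agree, which is why the existing check behaves correctly until multibyte characters appear.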
Why This Matters
This is a transport-boundary reliability problem, not a question of validating trusted configuration.
Oversized syslog datagrams may:
- exceed expected relay or collector limits
- increase truncation/drop risk
- produce inconsistent packet chunking behavior across encodings
The issue is especially visible with:
- UTF-8 multibyte characters
- emoji
- CJK characters
- mixed-width log content
Additional Investigation Notes
I prototyped a byte-aware splitting implementation to validate the issue.
That investigation confirmed:
- byte-aware splitting resolves the demonstrated overflow case
- however, a naive implementation introduces additional considerations:
  - prefix/suffix accounting must be dynamic rather than heuristic
  - repeated transcoding inside the split loop can introduce avoidable hot-path overhead
For example, enabling facilityPrinting while using a fixed suffix reserve still allowed a packet to exceed MaxMessageLength by 1 byte in testing.
Because of this, I am opening this issue first rather than immediately proposing the prototype patch.
Suggested Direction
A more robust solution may involve:
- enforcing limits using encoded byte length
- dynamically accounting for syslog prefix/suffix overhead
- preserving valid UTF-8/codepoint boundaries during splitting
- avoiding repeated transcoding work in the logging hot path
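The points above could be combined roughly as follows. This is a sketch under stated assumptions, not the prototype and not log4cxx code: std::u32string stands in for LogString, utf8Width and splitByEncodedBytes are hypothetical names, and overheadBytes is assumed to be measured by the caller from the actual prefix and suffix rather than taken from a fixed reserve. Chunk sizes are computed from code point widths in a single pass, so no transcoding happens inside the loop and no code point is cut in half.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Bytes needed to encode one code point in UTF-8.
inline std::size_t utf8Width(char32_t cp)
{
    return cp < 0x80 ? 1 : cp < 0x800 ? 2 : cp < 0x10000 ? 3 : 4;
}

// Hypothetical splitter: every chunk's UTF-8 length plus overheadBytes
// stays within maxBytes, and chunks only break between code points.
// Returns an empty vector when the message is empty or when the budget
// cannot hold even one 4-byte code point; the caller must handle that case.
std::vector<std::u32string> splitByEncodedBytes(const std::u32string& msg,
                                                std::size_t maxBytes,
                                                std::size_t overheadBytes)
{
    std::vector<std::u32string> chunks;
    if (maxBytes < overheadBytes + 4)
    {
        return chunks;
    }
    const std::size_t budget = maxBytes - overheadBytes;

    std::u32string current;
    std::size_t used = 0; // encoded bytes accumulated in `current`
    for (char32_t cp : msg)
    {
        const std::size_t width = utf8Width(cp);
        if (used + width > budget)
        {
            chunks.push_back(current);
            current.clear();
            used = 0;
        }
        current.push_back(cp);
        used += width;
    }
    if (!current.empty())
    {
        chunks.push_back(current);
    }
    return chunks;
}
```

With the reproduction input (40 Euro signs, a limit of 100) and an assumed overhead of 10 bytes, this yields two chunks of 30 and 10 characters, that is 90 and 30 encoded bytes. A build that stores UTF-16 code units would additionally need to keep surrogate pairs together, which this sketch does not cover.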
I can provide the reproduction test and prototype implementation details if helpful.