feat: OTel JVM semconv compliance (dual-emit jvm.* alongside argus_*) #253
Merged
Add a single source-of-truth mapping layer (SemconvMetrics) that maps Argus GC/heap/thread/class/CPU metrics to standard OpenTelemetry JVM semantic-convention names (jvm.gc.duration, jvm.memory.used, jvm.memory.committed, jvm.memory.used_after_last_gc, jvm.thread.count, jvm.class.count, jvm.cpu.time, jvm.cpu.recent_utilization) with UCUM units and the standard attributes (jvm.gc.name, jvm.gc.action, jvm.memory.pool.name). In Prometheus exposition, dots become underscores (jvm_gc_duration_seconds, ...).
PrometheusMetricsCollector now emits the standard jvm_* series alongside the existing argus_* series. The legacy argus_* duplicates are gated behind a new flag, argus.metrics.legacyNames (default true), so existing dashboards keep working. Argus-unique metrics with no semconv equivalent (leak confidence, carrier-thread skew, reserved metaspace, profiling samples, ...) stay in the argus.* namespace and are emitted regardless of the flag.
OtlpMetricsExporter now stamps the full OTel resource attributes (service.name, service.namespace, service.instance.id, telemetry.sdk.name, telemetry.sdk.language=java, telemetry.sdk.version) and emits the semconv metric names alongside the legacy names.
Signed-off-by: rlaope <piyrw9754@gmail.com>
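To make the "dots become underscores" rule concrete, here is a minimal sketch of how a Prometheus series name could be derived from an OTel name plus its UCUM unit. It follows the general OTel-to-Prometheus compatibility pattern (dots to underscores, a unit suffix, _total for monotonic counters) and is an assumption, not the code in this PR; the class and method names are hypothetical. Only jvm_gc_duration_seconds is confirmed by the description above; the other outputs are illustrative.

```java
import java.util.Map;

/**
 * Hypothetical helper: derives a Prometheus series name from an OTel metric name,
 * its UCUM unit, and whether the metric is a monotonic counter.
 * This mirrors the general OTel Prometheus-compatibility pattern; Argus may differ.
 */
public final class PromNames {
    // UCUM unit -> Prometheus unit suffix (simplified). Curly-brace annotations
    // such as {thread} or {class} are dimensionless and get no suffix.
    private static final Map<String, String> UNIT_SUFFIX = Map.of(
            "s", "seconds",
            "By", "bytes",
            "1", "ratio");

    static String toPrometheus(String otelName, String ucumUnit, boolean monotonicCounter) {
        StringBuilder name = new StringBuilder(otelName.replace('.', '_'));
        String suffix = UNIT_SUFFIX.get(ucumUnit); // null for {thread}, {class}, ...
        if (suffix != null && !name.toString().endsWith("_" + suffix)) {
            name.append('_').append(suffix);
        }
        if (monotonicCounter) {
            name.append("_total"); // Prometheus counters carry a _total suffix
        }
        return name.toString();
    }

    public static void main(String[] args) {
        System.out.println(toPrometheus("jvm.gc.duration", "s", false));         // jvm_gc_duration_seconds
        System.out.println(toPrometheus("jvm.memory.used", "By", false));        // jvm_memory_used_bytes
        System.out.println(toPrometheus("jvm.thread.count", "{thread}", false)); // jvm_thread_count
        System.out.println(toPrometheus("jvm.cpu.time", "s", true));             // jvm_cpu_time_seconds_total
    }
}
```

Keeping the unit in the suffix means a Prometheus user can read the unit from the series name alone, which is why the table stores the UCUM unit next to each OTel name.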
…cs contract
The OTLP push path had diverged from the Prometheus path on the OTel JVM
semantic-convention series:
- jvm.memory.used / jvm.memory.committed / jvm.memory.used_after_last_gc
carried no attributes. Emit jvm.memory.pool.name="heap" on each data point
so OTLP memory series match the Prometheus {jvm_memory_pool_name="heap"}
identity.
- The Metaspace memory pool was missing on OTLP. Emit jvm.memory.used /
jvm.memory.committed with pool="Metaspace", gated by the same
metaspace-enabled + non-null-analyzer guard the Prometheus path uses.
- jvm.gc.duration was absent on OTLP. Emit it as an OTLP Histogram data point
from the same aggregate pause histogram the Prometheus path uses
(explicitBounds + per-bucket counts + sum + count); skipped when there is
no pause data.
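The jvm.gc.duration item above hinges on converting one histogram representation into another. A minimal sketch, assuming (hypothetically) that the aggregate pause histogram is kept Prometheus-style as finite upper bounds with cumulative counts, a total count, and a sum: OTLP's bucketCounts are per-bucket rather than cumulative and have one more entry than explicitBounds (the overflow bucket). Class, record, and method names are illustrative, not Argus's.

```java
import java.util.Arrays;

/** Sketch: Prometheus-style cumulative histogram -> OTLP HistogramDataPoint fields. */
public final class OtlpGcDuration {

    /** Subset of an OTLP HistogramDataPoint; bucketCounts has explicitBounds.length + 1 entries. */
    record HistogramPoint(double[] explicitBounds, long[] bucketCounts, double sum, long count) {}

    /**
     * @param upperBounds      finite "le" bounds in seconds, strictly increasing
     * @param cumulativeCounts cumulative count at each bound (same length as upperBounds)
     * @param totalCount       count of all observations (the +Inf bucket)
     * @param sumSeconds       sum of all observed pauses in seconds
     * @return the data point, or null when there is no pause data (emission is skipped)
     */
    static HistogramPoint fromCumulative(double[] upperBounds, long[] cumulativeCounts,
                                         long totalCount, double sumSeconds) {
        if (upperBounds.length != cumulativeCounts.length) {
            throw new IllegalArgumentException("bounds and counts must have the same length");
        }
        if (totalCount == 0) {
            return null; // no pauses recorded: skip jvm.gc.duration entirely
        }
        long[] perBucket = new long[upperBounds.length + 1];
        long previous = 0;
        for (int i = 0; i < upperBounds.length; i++) {
            perBucket[i] = cumulativeCounts[i] - previous; // de-accumulate each bucket
            previous = cumulativeCounts[i];
        }
        perBucket[upperBounds.length] = totalCount - previous; // overflow bucket (> last bound)
        return new HistogramPoint(upperBounds.clone(), perBucket, sumSeconds, totalCount);
    }

    public static void main(String[] args) {
        HistogramPoint p = fromCumulative(new double[] {0.01, 0.1, 1.0}, new long[] {5, 8, 9}, 10, 2.5);
        System.out.println(Arrays.toString(p.bucketCounts())); // [5, 3, 1, 1]
    }
}
```

Deriving the OTLP point from the same aggregate the Prometheus path already holds keeps the two exports numerically identical, which is the point of the fix.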
SemconvMetrics.GC_DURATION previously declared jvm.gc.name/jvm.gc.action as
required, but Argus keeps only one aggregate pause histogram and cannot split
it per collector/cause without fabricating bucket data. Removed those
attributes from the table so the declared contract matches what is emitted;
the per-collector/per-cause breakdown remains the Argus-unique
argus_gc_pause_breakdown_seconds_total / argus_gc_events_breakdown_total series.
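The contract described above ("declare only what is emitted") can be checked mechanically. This is a minimal sketch of such a check; the Entry record and the emitted-attribute map are hypothetical stand-ins for SemconvMetrics and for whatever a test captures from the exporter. Metrics that are never emitted are treated as emitting no attributes.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/** Sketch of a "declares no attribute that is never emitted" check (names are illustrative). */
public final class AttributeContractCheck {

    record Entry(String otelName, List<String> attributes) {}

    /** Per metric, the declared attributes that never appeared on an emitted data point. */
    static Map<String, Set<String>> undeliveredAttributes(List<Entry> table,
                                                          Map<String, Set<String>> emitted) {
        Map<String, Set<String>> missing = new TreeMap<>(); // sorted for stable output
        for (Entry e : table) {
            Set<String> seen = emitted.getOrDefault(e.otelName(), Set.of());
            Set<String> gap = new TreeSet<>(e.attributes());
            gap.removeAll(seen);
            if (!gap.isEmpty()) {
                missing.put(e.otelName(), gap);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> emitted = Map.of(
                "jvm.gc.duration", Set.of(),
                "jvm.memory.used", Set.of("jvm.memory.pool.name"));
        // Before: jvm.gc.duration declared collector/cause attributes that were never emitted.
        List<Entry> before = List.of(new Entry("jvm.gc.duration", List.of("jvm.gc.name", "jvm.gc.action")));
        // After: the declaration matches what is emitted.
        List<Entry> after = List.of(
                new Entry("jvm.gc.duration", List.of()),
                new Entry("jvm.memory.used", List.of("jvm.memory.pool.name")));
        System.out.println(undeliveredAttributes(before, emitted)); // {jvm.gc.duration=[jvm.gc.action, jvm.gc.name]}
        System.out.println(undeliveredAttributes(after, emitted));  // {}
    }
}
```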
Tests assert OTLP memory data points carry the pool attribute, OTLP emits the
Metaspace pool and the jvm.gc.duration histogram, and the SemconvMetrics table
declares no attribute that is never emitted.
Signed-off-by: rlaope <piyrw9754@gmail.com>
Summary
Workstream W1: OTel JVM semantic-convention compliance.
Argus now speaks the OpenTelemetry JVM runtime semantic conventions. A new single source-of-truth mapping layer (SemconvMetrics) maps Argus GC/heap/thread/class/CPU metrics to the standard OTel names, and both the Prometheus collector and the OTLP exporter draw their names, units, and types from that one table.
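A minimal sketch of what such a single-table layer can look like, using the column set listed under the acceptance criteria (OTel name, Prometheus name, unit, type, description, attributes). The record, enum, and entries are illustrative assumptions, not the actual SemconvMetrics source; descriptions and instrument types follow the OTel JVM semantic conventions, and Prometheus names other than jvm_gc_duration_seconds are guesses. The jvm.gc.duration entry declares no attributes, matching the follow-up commit in this PR.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

/** Hypothetical shape of a single source-of-truth semconv table. */
public final class SemconvTableSketch {

    enum Type { COUNTER, UP_DOWN_COUNTER, GAUGE, HISTOGRAM }

    record SemconvMetric(String otelName, String prometheusName, String unit,
                         Type type, String description, List<String> attributes) {}

    // Illustrative subset of entries.
    static final List<SemconvMetric> TABLE = List.of(
            new SemconvMetric("jvm.gc.duration", "jvm_gc_duration_seconds", "s",
                    Type.HISTOGRAM, "Duration of JVM garbage collection actions.", List.of()),
            new SemconvMetric("jvm.memory.used", "jvm_memory_used_bytes", "By",
                    Type.UP_DOWN_COUNTER, "Measure of memory used.", List.of("jvm.memory.pool.name")),
            new SemconvMetric("jvm.thread.count", "jvm_thread_count", "{thread}",
                    Type.UP_DOWN_COUNTER, "Number of executing platform threads.", List.of()));

    // Both exporters would look entries up here instead of hard-coding name strings.
    static final Map<String, SemconvMetric> BY_OTEL_NAME = TABLE.stream()
            .collect(Collectors.toUnmodifiableMap(SemconvMetric::otelName, Function.identity()));

    public static void main(String[] args) {
        SemconvMetric m = BY_OTEL_NAME.get("jvm.memory.used");
        System.out.println(m.prometheusName() + " unit=" + m.unit()); // jvm_memory_used_bytes unit=By
    }
}
```

Storing the Prometheus name as its own column, rather than deriving it at each call site, lets a test compare emitted names against the table directly.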
Standard metrics covered:
- jvm.gc.duration
- jvm.memory.used
- jvm.memory.committed
- jvm.memory.used_after_last_gc
- jvm.thread.count
- jvm.class.count
- jvm.cpu.time
- jvm.cpu.recent_utilization
These use UCUM units (s, By, 1, {thread}, {class}) and the standard attributes (jvm.gc.name, jvm.gc.action, jvm.memory.pool.name). In Prometheus exposition the dots become underscores (jvm_gc_duration_seconds, etc.).
Compatibility
The legacy argus_* series are still emitted by default, gated behind a new flag, argus.metrics.legacyNames (default true). Existing dashboards keep working; set argus.metrics.legacyNames=false to drop the argus_* duplicates and keep only the standard jvm_* names (plus the Argus-unique metrics below).
Argus-unique metrics with no semconv equivalent stay in the argus.* namespace and are emitted regardless of the flag; they are differentiators, not "legacy" duplicates. Examples: argus_gc_leak_confidence, argus_gc_leak_suspected, argus_gc_overhead_ratio, GC allocation/promotion rates, carrier-thread skew, argus_metaspace_reserved_bytes, virtual-thread start/end/pinning counters, and profiling samples.
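The gating rule above can be sketched as follows. The PR does not show how argus.metrics.legacyNames is read, so reading it as a JVM system property is an assumption here, and argus_heap_used_bytes is a hypothetical legacy name used only to illustrate a duplicate; argus_gc_leak_confidence comes from the list above.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch of dual emission gated by argus.metrics.legacyNames (flag source is assumed). */
public final class LegacyNameGate {

    static boolean legacyNamesEnabled() {
        // Assumption: read as a system property; defaults to true so argus_* dashboards keep working.
        return Boolean.parseBoolean(System.getProperty("argus.metrics.legacyNames", "true"));
    }

    /** Series names to expose for one heap sample plus one Argus-unique metric (illustrative). */
    static List<String> seriesFor(boolean legacyNames) {
        List<String> out = new ArrayList<>();
        out.add("jvm_memory_used_bytes");      // standard semconv name: always emitted
        if (legacyNames) {
            out.add("argus_heap_used_bytes");  // hypothetical legacy duplicate, gated by the flag
        }
        out.add("argus_gc_leak_confidence");   // Argus-unique: emitted regardless of the flag
        return out;
    }

    public static void main(String[] args) {
        System.out.println(seriesFor(legacyNamesEnabled()));
        System.out.println(seriesFor(false)); // [jvm_memory_used_bytes, argus_gc_leak_confidence]
    }
}
```

Only the duplicates sit behind the flag; gating the Argus-unique series as well would silently remove data that has no jvm.* replacement.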
OTLP resource attributes
OtlpMetricsExporter (via OtlpJsonBuilder) now stamps the full OTel resource set: service.name, service.namespace (K8s namespace via the Downward API), service.instance.id (pod name / hostname), telemetry.sdk.name, telemetry.sdk.language=java, and telemetry.sdk.version. The OTLP payload also emits the semconv metric names alongside the legacy names.
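A minimal sketch of assembling that resource block in OTLP/JSON form (attributes as key plus stringValue pairs). The environment-variable names POD_NAMESPACE and POD_NAME (as they might be injected by the Downward API), the "argus" values for service.name and telemetry.sdk.name, and all class/method names are assumptions for illustration; OtlpJsonBuilder.appendResource may work differently. JSON escaping is omitted for brevity.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

/** Sketch of OTLP resource attributes; env var names and default values are hypothetical. */
public final class ResourceSketch {

    static Map<String, String> resourceAttributes(Map<String, String> env, String sdkVersion) {
        Map<String, String> attrs = new LinkedHashMap<>(); // keeps insertion order in the output
        attrs.put("service.name", "argus");                // hypothetical default
        String ns = env.get("POD_NAMESPACE");              // assumed Downward API env var
        if (ns != null && !ns.isBlank()) {
            attrs.put("service.namespace", ns);
        }
        String pod = env.get("POD_NAME");                  // assumed Downward API env var
        attrs.put("service.instance.id", pod != null && !pod.isBlank() ? pod : hostname());
        attrs.put("telemetry.sdk.name", "argus");          // hypothetical value
        attrs.put("telemetry.sdk.language", "java");
        attrs.put("telemetry.sdk.version", sdkVersion);
        return attrs;
    }

    private static String hostname() {
        try {
            return InetAddress.getLocalHost().getHostName(); // fallback outside Kubernetes
        } catch (UnknownHostException e) {
            return "unknown";
        }
    }

    /** Renders {"attributes":[{"key":...,"value":{"stringValue":...}}]} without escaping. */
    static String toOtlpJson(Map<String, String> attrs) {
        return attrs.entrySet().stream()
                .map(e -> "{\"key\":\"" + e.getKey() + "\",\"value\":{\"stringValue\":\"" + e.getValue() + "\"}}")
                .collect(Collectors.joining(",", "{\"attributes\":[", "]}"));
    }

    public static void main(String[] args) {
        // In production this would be System.getenv(); a fixed map keeps the sketch deterministic.
        Map<String, String> env = Map.of("POD_NAMESPACE", "payments", "POD_NAME", "api-7d9f");
        System.out.println(toOtlpJson(resourceAttributes(env, "0.0.0-example")));
    }
}
```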
Acceptance criteria and verification
- SemconvMetrics holds the table (OTel name, Prometheus name, unit, type, description, attributes). Verified by SemconvMetricsComplianceTest.mapping_table_units_are_correct and mapping_table_declares_required_attributes.
- PrometheusMetricsCollector emits jvm_* alongside argus_* by default; argus.metrics.legacyNames=false drops the argus_* duplicates. Verified by emits_standard_jvm_series_alongside_legacy_by_default and disabling_legacy_names_drops_argus_duplicates_but_keeps_jvm_and_unique.
- OTLP resource attributes and semconv names: see OtlpJsonBuilder.appendResource and the semconv emissions in each metric section.
- Every series has HELP and TYPE, with no duplicate series. Verified by every_series_has_help_and_type_and_no_duplicate_series.
- Emitted jvm_* names and units match the mapping table. Verified by emitted_jvm_names_match_the_mapping_table and mapping_table_units_are_correct.
- Argus-unique metrics are emitted regardless of the legacy flag. Verified by argus_unique_metrics_emitted_regardless_of_legacy_flag.
Deviations
jvm.cpu.time (a monotonic process-CPU-seconds counter) has no JFR-derived source in Argus today; only the recent-utilization ratio is available. It is kept in the mapping table as the documented contract but is not exported yet. The mapping table is the superset/contract, so the "every exported metric is in the table" assertion still holds.
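The superset argument can be illustrated with two set differences: exported names missing from the table must be empty, while table names not yet exported (such as jvm.cpu.time) are allowed. This is a sketch with illustrative names, not the actual test code.

```java
import java.util.Set;
import java.util.TreeSet;

/** Sketch: the mapping table is a superset of what is exported (names are illustrative). */
public final class TableSupersetCheck {

    /** Exported names missing from the table; the assertion requires this to be empty. */
    static Set<String> exportedButUndeclared(Set<String> table, Set<String> exported) {
        Set<String> missing = new TreeSet<>(exported);
        missing.removeAll(table);
        return missing;
    }

    /** Declared names not exported yet; allowed, since the table is the contract. */
    static Set<String> declaredButNotExported(Set<String> table, Set<String> exported) {
        Set<String> pending = new TreeSet<>(table);
        pending.removeAll(exported);
        return pending;
    }

    public static void main(String[] args) {
        Set<String> table = Set.of("jvm.cpu.time", "jvm.cpu.recent_utilization", "jvm.class.count");
        Set<String> exported = Set.of("jvm.cpu.recent_utilization", "jvm.class.count");
        System.out.println(exportedButUndeclared(table, exported));  // []
        System.out.println(declaredButNotExported(table, exported)); // [jvm.cpu.time]
    }
}
```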
The jvm.gc.duration histogram is exported as the existing aggregate; it is not yet split per jvm.gc.name/jvm.gc.action (Argus tracks only an aggregate pause histogram). The per-collector/per-cause breakdown remains available as the Argus-unique argus_gc_pause_breakdown_seconds_total series.
Test / build notes
./gradlew :argus-server:test --tests "io.argus.server.metrics.*" and :argus-core:test are green. Two unrelated, pre-existing failures on master are out of scope for W1 and reproduce with this branch's changes stashed:
- CorrelationAnalyzerTracePauseTest (uses hardcoded 2026-05-28 timestamps that the 5-minute retention prunes)
- a Gradle implicit-dependency validation error on :argus-server:jar vs :argus-diagnostics:jar
🤖 Generated with Claude Code