
[FEATURE] Improve observability and error semantics for application-level metadata #3356

Description

@Alanxtl

For context, see #3188 (comment).

Background

The application-level metadata path after #2534 includes local MetadataInfo, revision calculation, metadata report publishing/loading, service-app mapping, and RPC MetadataService. When service discovery fails, it is currently difficult to identify which of these stages caused the failure. This issue focuses only on observability for the application-level metadata path.

Related code:

  • metadata/report_instance.go
  • metadata/mapping/metadata/service_name_mapping.go
  • metadata/client.go
  • registry/servicediscovery/service_instances_changed_listener_impl.go

Current Problems

Current observability is not enough to diagnose application-level metadata failures.

Specific problems:

  • Mapping register, get, listen, and remove operations do not have unified metrics or structured logs.
  • Revision calculation and metadata cache hit/miss are hard to observe.
  • Errors from metadata report loading, RPC metadata loading, URL construction, and mapping are not clearly categorized.
  • When consumers cannot discover services, it is hard to tell whether the root cause is mapping, metadata report, RPC metadata service, revision, or cache.

Suggestions

  • Add metrics and structured logs for mapping register, get, listen, and remove operations.
  • Expose metadata source, storage type, revision, and cache hit/miss where appropriate.
  • Use clear error categories for metadata report failure, RPC metadata failure, URL construction failure, revision mismatch, and mapping failure.
  • Include useful context such as app, revision, service key, registry id, and storage type.
  • Add failure-path tests that verify error messages or categories.
