Support providing default configuration via environment variables #15

Open
ghangz wants to merge 4 commits into MetaX-MACA:main from ghangz:mengz/env-default-config-overrides

ghangz commented Jun 25, 2026

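In container and cluster environments, injecting default configuration through environment variables is more reliable than editing static files. This change lets the exporter read its default values from environment variables while keeping explicitly supplied configuration at higher precedence, so the same image can be reused across different MetaX deployment environments.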
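The helpers get_env_default and get_env_domains are called in the diff, but their bodies do not appear in this thread. As a rough illustration of the fallback described above, here is a minimal sketch. It assumes get_env_default(name, default, parse) returns default when the variable is unset or blank and otherwise passes the raw string through an optional validator, and it assumes MX_EXPORTER_K8S_DOMAINS holds comma- or whitespace-separated keywords. Both bodies are hypothetical, not the PR's actual code.

```python
import os
from typing import Any, Callable, List, Optional


def get_env_default(name: str, default: Any, parse: Optional[Callable[[str], Any]] = None) -> Any:
    """Hypothetical helper: value of env var `name`, else `default`.

    Unset or blank variables fall back to `default`. When `parse` is given
    (e.g. int or a validator such as check_port), the raw string is passed
    through it, so an invalid value raises at this point.
    """
    raw = os.environ.get(name)
    if raw is None or not raw.strip():
        return default
    raw = raw.strip()
    return parse(raw) if parse is not None else raw


def get_env_domains(name: str, default: List[str]) -> List[str]:
    """Hypothetical helper: split a comma- or whitespace-separated env var (assumed format)."""
    raw = os.environ.get(name, "")
    domains = raw.replace(",", " ").split()
    return domains if domains else list(default)


if __name__ == "__main__":
    os.environ["MX_EXPORTER_PORT"] = "9100"
    os.environ.pop("MX_EXPORTER_INTERVAL_MS", None)
    os.environ["MX_EXPORTER_K8S_DOMAINS"] = "metax-tech, example.com"
    print(get_env_default("MX_EXPORTER_PORT", 8000, int))              # 9100
    print(get_env_default("MX_EXPORTER_INTERVAL_MS", 10000, int))      # 10000 (unset)
    print(get_env_domains("MX_EXPORTER_K8S_DOMAINS", ["metax-tech"]))  # ['metax-tech', 'example.com']
```

Treating a blank variable as unset is a design choice in this sketch: an empty `MX_EXPORTER_PORT=` in a container spec then falls back to the built-in default instead of failing validation.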

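The change keeps the default behavior backward compatible. New diagnostics, configuration entry points, or result output are introduced only where they are explicitly needed, so that MetaX-adaptation problems surface early instead of only showing up at a later runtime stage.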

I have validated this branch in a MetaX GPU environment and added or ran the following related tests:

  • tests/test_env_defaults.py

gemini-code-assist bot left a comment

Code Review

This pull request adds support for configuring the exporter via environment variables (such as MX_EXPORTER_PORT and MX_EXPORTER_INTERVAL_MS) and includes corresponding unit tests. The reviewer identified a critical issue where environment variables are evaluated eagerly during argument definition. This causes the application to crash on startup if an environment variable is invalid, even if the user overrides it via command-line arguments. The reviewer suggested deferring environment variable evaluation until after parse_args() is called to ensure proper fallback behavior and robustness.

Comment thread on mx_exporter/__init__.py (outdated)
Comment on lines 135 to 146
```diff
+parser.add_argument("-p", "--port", type=check_port, default=get_env_default("MX_EXPORTER_PORT", 8000, check_port), help="HTTP listen port")
+parser.add_argument("-i", "--interval", type=check_interval, default=get_env_default("MX_EXPORTER_INTERVAL_MS", 10000, check_interval), help="Metrics gathering interval, unit:ms")
 parser.add_argument("-c", "--config-file", type=check_path, help="Path to metrics config file")
 parser.add_argument("-m", "--mode", type=int, choices=[0,1], default=1, help="Deprecated, keep for back compatibility")
 parser.add_argument("-lm", "--log-monitor", type=int, choices=[0,1], default=1, help="Deprecated, keep for back compatibility")
-parser.add_argument("-im", "--ib-monitor", type=int, choices=[0,1], default=0, help=argparse.SUPPRESS) # help="0/1 - Disable/Enable IB NIC counter monitoring"
-parser.add_argument("-mp", "--mount-point", type=check_path, default="/", help="Container mount point")
-parser.add_argument("-kp", "--kubelet-path", type=str, default="/var/lib/kubelet", help="Kubelet root dir")
-parser.add_argument("-kd", "--k8s-domains", nargs='+', type=str, default=["metax-tech"], help="Monitoring the k8s domains contains specified keywords, multi-keywords e.g. -kd domain1 domain2")
+parser.add_argument("-im", "--ib-monitor", type=int, choices=[0,1], default=get_env_default("MX_EXPORTER_IB_MONITOR", 0, int), help=argparse.SUPPRESS) # help="0/1 - Disable/Enable IB NIC counter monitoring"
+parser.add_argument("-mp", "--mount-point", type=check_path, default=get_env_default("MX_EXPORTER_MOUNT_POINT", "/", check_path), help="Container mount point")
+parser.add_argument("-kp", "--kubelet-path", type=str, default=get_env_default("MX_EXPORTER_KUBELET_PATH", "/var/lib/kubelet"), help="Kubelet root dir")
+parser.add_argument("-kd", "--k8s-domains", nargs='+', type=str, default=get_env_domains("MX_EXPORTER_K8S_DOMAINS", ["metax-tech"]), help="Monitoring the k8s domains contains specified keywords, multi-keywords e.g. -kd domain1 domain2")

 args = parser.parse_args()
 print(args)
```

Priority: high

Problem analysis
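In parser.add_argument, the value passed as default is computed immediately when the parser is defined (that is, when add_argument is called), not lazily when parse_args() runs.
This means that even when an argument is given explicitly on the command line (for example -p 8080), an invalid value in the environment variable MX_EXPORTER_PORT (for example "abc" or an out-of-range port) still makes get_env_default run and raise ValueError or argparse.ArgumentTypeError, crashing the program at startup. This violates the "explicit configuration takes precedence" principle and makes the program fragile whenever a stray environment variable is present, even one the command line already overrides.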
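This is plain Python evaluation order rather than anything specific to argparse: every argument expression, including default=..., is evaluated before add_argument() is called. The sketch below reproduces the crash with hypothetical stand-ins for check_port and get_env_default (their real bodies are not shown in this thread); the port given on the command line never gets a chance to win.

```python
import argparse
import os


def check_port(value):
    """Hypothetical stand-in for mx_exporter's check_port: accept 1-65535."""
    port = int(value)  # raises ValueError for strings like "abc"
    if not 1 <= port <= 65535:
        raise argparse.ArgumentTypeError(f"invalid port: {value}")
    return port


def get_env_default(name, default, parse=None):
    """Hypothetical stand-in: read env var `name`, falling back to `default`."""
    raw = os.environ.get(name)
    if raw is None:
        return default
    return parse(raw) if parse else raw


def build_parser():
    parser = argparse.ArgumentParser()
    # The default= expression runs HERE, while the parser is being built,
    # before parse_args() ever sees "-p 8080" on the command line.
    parser.add_argument("-p", "--port", type=check_port,
                        default=get_env_default("MX_EXPORTER_PORT", 8000, check_port))
    return parser


if __name__ == "__main__":
    os.environ["MX_EXPORTER_PORT"] = "abc"  # invalid value left in the environment
    try:
        args = build_parser().parse_args(["-p", "8080"])
        print("parsed:", args.port)
    except ValueError as exc:
        # Reached even though the command line explicitly sets the port.
        print("crashed before parsing:", exc)
```

Running it prints the ValueError from int("abc"), raised while build_parser() is still constructing the parser.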

Solution
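Move the environment-variable default resolution to after parser.parse_args(): if a command-line argument is None, read it from the environment variable, and only then fall back to the hard-coded default. This ensures that: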

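  1. Arguments given explicitly on the command line always take the highest precedence and can never be broken by an invalid environment variable.
  2. An environment variable is validated only when it is actually used, and errors are reported cleanly via parser.error() instead of crashing with an uncaught traceback.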
```python
parser.add_argument("-p", "--port", type=check_port, help="HTTP listen port")
parser.add_argument("-i", "--interval", type=check_interval, help="Metrics gathering interval, unit:ms")
parser.add_argument("-c", "--config-file", type=check_path, help="Path to metrics config file")
parser.add_argument("-m", "--mode", type=int, choices=[0,1], default=1, help="Deprecated, keep for back compatibility")
parser.add_argument("-lm", "--log-monitor", type=int, choices=[0,1], default=1, help="Deprecated, keep for back compatibility")
parser.add_argument("-im", "--ib-monitor", type=int, choices=[0,1], help=argparse.SUPPRESS) # help="0/1 - Disable/Enable IB NIC counter monitoring"
parser.add_argument("-mp", "--mount-point", type=check_path, help="Container mount point")
parser.add_argument("-kp", "--kubelet-path", type=str, help="Kubelet root dir")
parser.add_argument("-kd", "--k8s-domains", nargs='+', type=str, help="Monitoring the k8s domains contains specified keywords, multi-keywords e.g. -kd domain1 domain2")

args = parser.parse_args()

# Resolve defaults only after parsing: CLI value > environment variable > built-in default.
# Each env var is read (and validated) only when the matching option was not given.
try:
    args.port = args.port if args.port is not None else get_env_default("MX_EXPORTER_PORT", 8000, check_port)
    args.interval = args.interval if args.interval is not None else get_env_default("MX_EXPORTER_INTERVAL_MS", 10000, check_interval)

    # choices=[0,1] only validates the CLI value, so check the env-derived value explicitly.
    ib_monitor = args.ib_monitor if args.ib_monitor is not None else get_env_default("MX_EXPORTER_IB_MONITOR", 0, int)
    if ib_monitor not in [0, 1]:
        raise argparse.ArgumentTypeError("MX_EXPORTER_IB_MONITOR must be 0 or 1")
    args.ib_monitor = ib_monitor

    args.mount_point = args.mount_point if args.mount_point is not None else get_env_default("MX_EXPORTER_MOUNT_POINT", "/", check_path)
    args.kubelet_path = args.kubelet_path if args.kubelet_path is not None else get_env_default("MX_EXPORTER_KUBELET_PATH", "/var/lib/kubelet")
    args.k8s_domains = args.k8s_domains if args.k8s_domains is not None else get_env_domains("MX_EXPORTER_K8S_DOMAINS", ["metax-tech"])
except (ValueError, argparse.ArgumentTypeError) as e:
    # parser.error() prints the usage message and exits with status 2 instead of a traceback.
    parser.error(f"Configuration error: {e}")

print(args)
```
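The suggestion relies on mx_exporter helpers that are not shown here. To check the two guarantees listed above in isolation, here is a trimmed, self-contained sketch that handles only --port; check_port, get_env_default and parse_config are hypothetical stand-ins, not the project's code.

```python
import argparse
import os


def check_port(value):
    """Hypothetical stand-in for mx_exporter's check_port: accept 1-65535."""
    try:
        port = int(value)
    except ValueError:
        raise argparse.ArgumentTypeError(f"invalid port: {value!r}") from None
    if not 1 <= port <= 65535:
        raise argparse.ArgumentTypeError(f"port out of range: {port}")
    return port


def get_env_default(name, default, parse=None):
    """Hypothetical stand-in: read env var `name`, falling back to `default`."""
    raw = os.environ.get(name)
    if raw is None:
        return default
    return parse(raw) if parse else raw


def parse_config(argv):
    parser = argparse.ArgumentParser(prog="mx-exporter")
    # No default=: argparse leaves the attribute as None when -p is absent.
    parser.add_argument("-p", "--port", type=check_port)
    args = parser.parse_args(argv)
    try:
        # Precedence: CLI value > MX_EXPORTER_PORT > built-in 8000.
        if args.port is None:
            args.port = get_env_default("MX_EXPORTER_PORT", 8000, check_port)
    except (ValueError, argparse.ArgumentTypeError) as e:
        parser.error(f"Configuration error: {e}")  # prints usage, exits with status 2
    return args


if __name__ == "__main__":
    os.environ["MX_EXPORTER_PORT"] = "abc"
    print(parse_config(["-p", "8080"]).port)  # 8080: the invalid env var is never read
    os.environ["MX_EXPORTER_PORT"] = "9100"
    print(parse_config([]).port)              # 9100: env var used because -p is absent
```

Using None as the "not given" sentinel works here because no command-line string can produce None through check_port, so the post-parse step can tell an absent flag from any real value.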
