fix: NCO-coexistence, ns-override, FW-storage gate, GB300 preset#105
Conversation
Greptile SummaryThis PR bundles several targeted fixes for GB300-NVL bring-up with a co-existing NIC Configuration Operator (NICOP) install, plus the corresponding GB300 preset. The changes are well-scoped, well-commented, and the PR description provides explicit migration notes for the known limitation around pre-existing
Confidence Score: 5/5Safe to merge; all changes address concrete, real-world failures observed on GB300-NVL hardware with a co-existing NICOP install, and the logic is consistent across the affected code paths. Each fix is narrowly scoped and directly traceable to an observed failure mode. The cluster-wide NicDevice listing correctly handles the coexistence case. The namespace override threading is applied only after the config is confirmed non-nil, avoiding any panic path. The UpdateFW gate and NNP suppression are additive conditions that default to the safe/no-op state. No regressions are introduced on the existing code paths. pkg/networkoperatorplugin/networkoperator.go — the acknowledged zero-value migration issue for pre-existing docaDriver blocks is worth tracking in a follow-up PR (pointer-bool schema change). Important Files Changed
Reviews (2): Last reviewed commit: "fix: NCO-coexistence, ns-override, FW-st..." | Re-trigger Greptile |
* discover: list NicDevice CRs cluster-wide so an existing NIC Configuration Operator install in another namespace doesn't make discovery time out. The daemon's controller queries CRs cluster-wide by node name, so when a pre-existing install owns the CRs, the daemon writes status into them in their original namespace rather than minting fresh ones under the bootstrap namespace; the in-process bootstrap-namespace List filter has to follow. * discover: classify traffic via NicDevice.Status.DPU / Status.SuperNIC (operator-authoritative, INTERNAL_CPU_MODEL via mlxconfig) instead of the chip-ID heuristic, which mis-flagged BF3 DPU SKUs whose part numbers aren't in ns-product-ids as east-west SuperNICs. The ns-product-ids table stays as a positive-only secondary signal. * deploy / validate: honor --network-operator-namespace in the standalone subcommands. Previously the flag was only registered on generate/root, so a fresh `l8k deploy --network-operator-namespace foo` had no effect — the standalone path went through loadUserConfig with empty options, never reaching ApplyOptionsToConfig. * profiles: gate `nicFirmwareStorage` behind a new NicConfigurationOperator.UpdateFW field. Adds the gated block to the five non-Spectrum-X NCPs and widens the parent nicConfigurationOperator gate to (DeployNicInterfaceNameTemplate OR UpdateFW), so the storage PVC stays reachable when the user disables the name template but enables firmware update. Default UpdateFW=false → no PVC dependency for the common case. * profiles: suppress empty NicNodePolicy when DOCADriver.Enable is false on the sriov-ethernet-rdma / sriov-ib-rdma profiles. Their only NNP body section is ofedDriver, so disabling OFED leaves an NNP carrying only metadata + nodeSelector — a no-op CR. Tightened outer gate to (versionGE 26.4 AND DOCADriver.Enable); the existing empty-render guard in templates.go drops the file. * discover: default UnloadStorageModules + UnloadThirdPartyRDMAModules to true; drop the OFED-module probe block. The probe was OOM-prone (SIGKILL/137 on large nodes) and the auto-enable side effect is now redundant with default-on. SkipPreflightChecks defaulted to explicit false (matching the existing intent: surface hardware-incompat early). * presets: add GB300-NVL-NVIDIA-GB300 preset, derived from a real GB300 topology-collector report. NODE-fallback classification (no PIX between any GPU and east-west NIC), 4 east-west CX-8 PFs + 2 north-south BF3 ports. * presets: add a repo-root `presets` symlink to pkg/presets/data so the dev install path (`l8k preset list` lookup chain) picks the in-repo data up without a manual copy. Signed-off-by: Alexander Maslennikov <amaslennikov@nvidia.com>
9465b8e to
12a99c0
Compare
|
Re: Greptile P2 on Picked Greptile's softer remedy (document the limitation + migration path) rather than the schema change (
Force-pushed |
Summary
Group of fixes that surfaced while bringing l8k up on a GB300-NVL setup with a pre-existing NIC Configuration Operator install, plus the corresponding GB300 preset.
NicDevicelist. The daemon's controller queries CRs cluster-wide by node name, so when an existing NICOP install owns the CRs in another namespace, the daemon writes status into those CRs in their original namespace — restricting our List to the bootstrap namespace caused discovery to time out. Drops theclient.InNamespacefilter fromDiscoverClusterConfig+waitNicDevicesDiscovered.NicDevice.Status.DPU/.SuperNIC. The operator already readsINTERNAL_CPU_MODELvia mlxconfig (EMBEDDED_CPU(1)⇒ DPU) — use it. Replaces the chip-ID heuristic, which mis-flagged BF3 DPU SKUs whose part numbers weren't inns-product-idsas east-west SuperNICs.ns-product-idsstays as a positive-only secondary signal.--network-operator-namespacehonored standalone. The flag was only wired ongenerate/root; the standalone subcommands went throughloadUserConfig(options.Options{})and ignored the override. Threaded throughloadUserConfig(mirroringApplyOptionsToConfig's same branch) + registered on both subcommands.nicFirmwareStoragebehind a newNicConfigurationOperator.UpdateFW. Spectrum-X RA2.2 was rendering the PVC block unconditionally; the others didn't have it at all. Added the gated block to the five non-Spectrum-X NCPs and widened the parentnicConfigurationOperatorgate to(DeployNicInterfaceNameTemplate OR UpdateFW). DefaultUpdateFW=false→ no PVC/StorageClass dependency for the common case.NicNodePolicyon SR-IOV profiles when OFED is disabled.sriov-ethernet-rdma/sriov-ib-rdmahave onlyofedDriverin the NNP body (the SR-IOV device plugin policy is a separate CRD); disabling OFED left a no-op NNP carrying just metadata + nodeSelector. Outer gate now(versionGE 26.4 AND DOCADriver.Enable); the renderer's empty-output guard drops the file.UnloadStorageModulesandUnloadThirdPartyRDMAModulesnow default totrueat construction; the in-process probe (the holder-graph BFS) is removed because it was OOM-prone (SIGKILL/137 on large nodes) and the auto-enable side effect is redundant with default-on.SkipPreflightChecksdefaulted to explicitfalse.topology-collectorreport. NODE-fallback classification (no PIX between any GPU and any east-west NIC on this chassis). 4 east-west CX-8 PFs (one per NIC, master function — matchesl8k discover --collapse-nic-railsdefault) + 2 north-south BF3 ports.presets→pkg/presets/dataat repo root so the dev install path picks the in-repo data up without manual copies.Test plan
go build ./...cleango test ./pkg/networkoperatorplugin/... ./pkg/cmd/... ./pkg/config/...greengo test ./pkg/presets/...— only the known pre-existing environmental failures (TestGetPresetsDir_NotFound/TestGetPresetsDir_SkipsFiles, both fail when/usr/local/share/l8k/presets/exists from a priormake install— unrelated)l8k discoveragainst a cluster with a pre-existing NICOP install in another namespace returns the existingNicDeviceCRs instead of timing outl8k deploy --network-operator-namespace fooinstalls Helm intofool8k generatewithnicConfigurationOperator.updateFW: false(default) → nonicFirmwareStoragein the rendered NCP, no PVC creation at deploy timel8k generateforsriov-ethernet-rdmawithdocaDriver.enable: false→ no NNP file in the rendered manifestsl8k discoveron a GB300-NVL produces acluster-config.yamlmatching the new preset; preset-validation surfaces zero deviations