Configure maintenance operation concurrency #108
Greptile Summary
This PR introduces a top-level
Confidence Score: 5/5
Safe to merge. All changed template paths are covered by integration tests against real Helm charts and across the four relevant release versions. The normalization and validation logic is straightforward and idempotent; the new IntOrPercent type has a thorough YAML round-trip test; and the release-gating logic for requestor mode is exercised across every profile and release combination in the matrix test. No incorrect data flow or broken contract was found in the changed paths. No files require special attention.
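To make the IntOrPercent round trip mentioned above concrete, here is a minimal sketch of what such a type could look like. The struct layout, the `ParseIntOrPercent` helper, and the range checks are all assumptions for illustration; the PR's actual type and validation rules are not shown in this conversation.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// IntOrPercent is a hypothetical stand-in for the PR's type: it holds either
// an absolute count ("4") or a percentage ("25%").
type IntOrPercent struct {
	Value     int
	IsPercent bool
}

// ParseIntOrPercent accepts "4" or "25%". The range checks below are
// assumptions, not the PR's actual validation rules.
func ParseIntOrPercent(s string) (IntOrPercent, error) {
	s = strings.TrimSpace(s)
	isPercent := strings.HasSuffix(s, "%")
	n, err := strconv.Atoi(strings.TrimSuffix(s, "%"))
	if err != nil {
		return IntOrPercent{}, fmt.Errorf("invalid int-or-percent %q: %w", s, err)
	}
	if n < 0 || (isPercent && n > 100) {
		return IntOrPercent{}, fmt.Errorf("value %q out of range", s)
	}
	return IntOrPercent{Value: n, IsPercent: isPercent}, nil
}

// String renders the value in the same textual form that was parsed.
func (v IntOrPercent) String() string {
	if v.IsPercent {
		return strconv.Itoa(v.Value) + "%"
	}
	return strconv.Itoa(v.Value)
}

func main() {
	for _, in := range []string{"4", "25%", "abc", "150%"} {
		v, err := ParseIntOrPercent(in)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		// Round trip: parsing the rendered form yields the same value.
		again, _ := ParseIntOrPercent(v.String())
		fmt.Println(v.String(), again == v)
	}
}
```

Keeping `String` as the exact inverse of the parser is what would make a normalization pass idempotent: running it a second time on its own output changes nothing.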
Flowchart
%%{init: {'theme': 'neutral'}}%%
flowchart TD
A[LoadFullConfig / DefaultLaunchKitConfig] --> B[NormalizeMaintenance\nfill defaults + validate]
C[ValidateClusterConfig] --> B
D[GenerateProfileDeploymentFiles] --> B
B --> E{selectedRelease >= 26.1?}
E -- yes --> F[maintenanceOperator.enabled: true\noperator.maintenanceOperator.useRequestor\nuseDrainControllerRequestor\nsriov-network-operator.operator.externalDrainer]
E -- no --> G[legacy path]
F --> H[maintenance-operator-chart.operatorConfig\nmaxParallelOperations / maxUnavailable\nmaxNodeMaintenanceTimeSeconds]
G --> H
G --> I[35-sriovnetworkpoolconfig.yaml\nSriovNetworkPoolConfig.spec.maxUnavailable]
G --> J[10-nicclusterpolicy / 11-nicnodepolicy\nmaxParallelUpgrades OFED]
F --> J2[10-nicclusterpolicy / 11-nicnodepolicy\nmaxParallelUpgrades rendered but no effect]
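The `selectedRelease >= 26.1` decision in the flowchart can be sketched as a numeric major/minor comparison. The function name `requestorModeEnabled` and the parsing approach are assumptions for illustration only; the release strings are the ones named in the PR's validation notes.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// requestorModeEnabled reports whether a release such as "26.4" or "v26.1.1"
// is at or above 26.1, the threshold shown in the flowchart. The name and the
// parsing are illustrative assumptions, not the PR's implementation.
func requestorModeEnabled(release string) (bool, error) {
	parts := strings.SplitN(strings.TrimPrefix(release, "v"), ".", 3)
	if len(parts) < 2 {
		return false, fmt.Errorf("release %q must look like MAJOR.MINOR", release)
	}
	major, err := strconv.Atoi(parts[0])
	if err != nil {
		return false, fmt.Errorf("bad major version in %q: %w", release, err)
	}
	minor, err := strconv.Atoi(parts[1])
	if err != nil {
		return false, fmt.Errorf("bad minor version in %q: %w", release, err)
	}
	// Compare the components as integers so that "25.10" sorts below "26.1".
	return major > 26 || (major == 26 && minor >= 1), nil
}

func main() {
	for _, r := range []string{"v25.10.0", "v26.1.1", "26.4"} {
		ok, err := requestorModeEnabled(r)
		fmt.Println(r, ok, err)
	}
}
```

Comparing parsed integers rather than strings or floats avoids misordering releases with a two-digit minor version, for example a hypothetical 26.10 against 26.4.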
Reviews (2): Last reviewed commit: "Configure maintenance operation concurre..."
deploy: true
maxParallelOperations: "{{ .Maintenance.MaxParallelOperations.String }}"
maxUnavailable: "{{ .Maintenance.MaxUnavailable.String }}"
maxNodeMaintenanceTimeSeconds: "{{ .Maintenance.MaxNodeMaintenanceTimeSeconds }}"
maxNodeMaintenanceTimeSeconds is always a plain integer, yet it is wrapped in double-quotes like the IntOrPercent fields above it, making it a YAML string in the rendered values. The other two fields legitimately need string quoting because they can be percentage values (e.g. "25%"), but this field cannot. If the Maintenance Operator chart has a JSON schema that validates this key as an integer, Helm will reject the string value. Dropping the quotes produces the integer scalar the chart expects, and the template auto-dereferences the *int32 pointer correctly. The same change applies to all seven profile 00-values.yaml files (ipoib-rdma-shared, macvlan-rdma-shared, sriov-ethernet-rdma, sriov-ib-rdma, spectrum-x, spectrum-x-ra2.1).
- maxNodeMaintenanceTimeSeconds: "{{ .Maintenance.MaxNodeMaintenanceTimeSeconds }}"
+ maxNodeMaintenanceTimeSeconds: {{ .Maintenance.MaxNodeMaintenanceTimeSeconds }}
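The difference between the two lines can be reproduced with Go's `text/template` alone. The sketch below is a hedged illustration: the `Maintenance` and `Config` structs are hypothetical stand-ins that carry only the one field under discussion, and the value 1600 is an arbitrary example, not a default taken from the PR.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// Maintenance mirrors only the field discussed in the review; it is a
// hypothetical stand-in for the PR's config type.
type Maintenance struct {
	MaxNodeMaintenanceTimeSeconds *int32
}

// Config is a hypothetical template root exposing .Maintenance.
type Config struct {
	Maintenance Maintenance
}

// render executes a one-line template against a config whose pointer field
// refers to the given number of seconds.
func render(tmpl string, seconds int32) (string, error) {
	t, err := template.New("values").Parse(tmpl)
	if err != nil {
		return "", err
	}
	data := Config{Maintenance: Maintenance{MaxNodeMaintenanceTimeSeconds: &seconds}}
	var out strings.Builder
	if err := t.Execute(&out, data); err != nil {
		return "", err
	}
	return out.String(), nil
}

func main() {
	quoted := `maxNodeMaintenanceTimeSeconds: "{{ .Maintenance.MaxNodeMaintenanceTimeSeconds }}"`
	bare := `maxNodeMaintenanceTimeSeconds: {{ .Maintenance.MaxNodeMaintenanceTimeSeconds }}`
	for _, tmpl := range []string{quoted, bare} {
		s, err := render(tmpl, 1600) // 1600 is an arbitrary example value
		if err != nil {
			panic(err)
		}
		// text/template prints the pointed-to integer, not the address.
		fmt.Println(s)
	}
}
```

The quoted template yields `maxNodeMaintenanceTimeSeconds: "1600"`, which a YAML parser reads as a string, while the bare template yields `maxNodeMaintenanceTimeSeconds: 1600`, an integer scalar. A nil pointer would render as `<nil>` in both forms, so the bare version relies on the field always being populated before rendering.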
Expose Maintenance Operator, SR-IOV drain, and legacy OFED concurrency through one launch-kit config section. Enable requestor mode for OFED and SR-IOV flows starting with Network Operator 26.1.

Signed-off-by: Alexander Maslennikov <amaslennikov@nvidia.com>
1bd02c8 to f485f03
Addressed both review findings in amended commit
Verification remains green:
Summary
- maintenance config section with validated defaults of four concurrent nodes

Validation
- GOCACHE=/private/tmp/l8k-go-cache go test ./... -count=1 -skip "TestGetPresetsDir_(NotFound|SkipsFiles)"
- GOCACHE=/private/tmp/l8k-go-cache make build
- helm template passed with exact Network Operator v25.10.0 and v26.1.1 charts and the local 26.4 chart

The two skipped preset tests depend on /usr/local/share/l8k/presets being absent; they fail identically on unmodified main in this development environment.