CASMNET-2390 - canu generate network config silently emits wrong-cabinet VLANs on sw-cdu ports when a Mountain/Olympus cabinet is in the CCJ but not in SLS #760

Merged: spillerc-hpe merged 3 commits into main from CASMNET-2390 on Jun 9, 2026

@spillerc-hpe spillerc-hpe commented Jun 9, 2026

Summary and Scope

Fixes CASMNET-2390: when a Mountain/Olympus cabinet is present in the CCJ/SHCD but absent from SLS (Networks.NMN_MTN / HMN_MTN), canu generate switch config silently emitted CDU CMM/CEC port stanzas that trunked/accessed the wrong cabinet's VLANs — VLANs that the SLS-driven sections of the same template never defined on the switch.

Root cause: in canu/generate/switch/config/config.py::get_switch_nodes, the cmm/cec branches did not initialise nmn_mtn_vlan / hmn_mtn_vlan before the inner for cabinets in sls_variables[...] loops. When the CCJ cabinet had no SLS match, the locals retained the stale value from a prior outer-loop iteration (a different cabinet) and were silently emitted onto the new cabinet's ports. The node was appended unconditionally — there was no skip/warn/error guard. If the very first cmm/cec port hit had no SLS match, the same code path raised UnboundLocalError.
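
The failure mode described above can be reproduced in isolation. The following is a minimal sketch, not the actual get_switch_nodes code: the function name, the port and cabinet dictionaries, and their keys are all hypothetical stand-ins chosen to show the same loop shape.

```python
def resolve_vlans_buggy(ports, sls_cabinets):
    """Mimic the pre-fix pattern: the VLAN local is never reset per port."""
    rendered = []
    for port in ports:
        for cabinet in sls_cabinets:
            if cabinet["rack"] == port["rack"]:
                nmn_mtn_vlan = cabinet["vlan"]
        # No guard: the node is appended even when nothing matched above.
        rendered.append((port["name"], nmn_mtn_vlan))
    return rendered


sls = [{"rack": 1000, "vlan": 2000}]  # x1001 is missing from SLS

# Stale leak: x1001's port inherits x1000's VLAN from the previous iteration.
leaked = resolve_vlans_buggy(
    [
        {"name": "cmm-x1000-000", "rack": 1000},
        {"name": "cmm-x1001-000", "rack": 1001},
    ],
    sls,
)
print(leaked)

# First-port case: the local has never been assigned, so Python raises.
try:
    resolve_vlans_buggy([{"name": "cmm-x1001-000", "rack": 1001}], sls)
except UnboundLocalError as exc:
    print("UnboundLocalError:", exc)
```

Both symptoms come from the same missing initialisation, which is why a single per-port reset addresses them together.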

This PR:

  • Initialises nmn_mtn_vlan / hmn_mtn_vlan to None per port (kills both the stale-leak and the UnboundLocalError first-port case).
  • When no SLS cabinet matches the CCJ rack number, emits a red WARNING: Skipping … via click.secho naming the switch, port, lag, cabinet, and which SLS list is missing, then continues so the node is not appended. The switch will not silently get a misconfigured port stanza for an SLS-missing cabinet.
  • Adds a regression-guarding fixture pair and two new tests in tests/test_generate_switch_config_aruba_configs_csm_1_7.py (test_switch_config_cdu_primary_mtn_sls_mismatch and test_switch_config_cdu_secondary_mtn_sls_mismatch) modelled on the existing CDU tests, using --ccj (matching test_generate_switch_config_aruba_templates_csm_1_7.py).
  • Extends tests/scripts/regenerate_golden_configs_1.7.sh (and its README.md) with a Part 6/6 block that regenerates the new goldens; total bumped from 57 to 59.
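
The first two bullets can be sketched roughly as follows. This is an illustration under assumptions, not the code in config.py: the function name, data shapes, and shortened warning text are hypothetical, and a plain print with an ANSI red escape stands in for click.secho so the snippet needs only the standard library.

```python
def resolve_vlans_fixed(switch, ports, sls_cabinets, warn=None):
    """Return (port name, VLAN) pairs; skip and warn when SLS has no match."""
    if warn is None:
        def warn(message):
            # Red text, standing in for click.secho(message, fg="red").
            print(f"\033[31m{message}\033[0m")

    rendered = []
    for port in ports:
        nmn_mtn_vlan = None  # reset per port so nothing leaks between cabinets
        for cabinet in sls_cabinets:
            if cabinet["rack"] == port["rack"]:
                nmn_mtn_vlan = cabinet["vlan"]
        if nmn_mtn_vlan is None:
            warn(
                f"WARNING: Skipping CMM port {port['name']} on {switch} "
                f"for cabinet x{port['rack']}: cabinet is in the CCJ/SHCD "
                "but missing from SLS."
            )
            continue  # the node is never appended
        rendered.append((port["name"], nmn_mtn_vlan))
    return rendered


messages = []
result = resolve_vlans_fixed(
    "sw-cdu-001",
    [
        {"name": "cmm-x1000-000", "rack": 1000},
        {"name": "cmm-x1001-000", "rack": 1001},
    ],
    [{"rack": 1000, "vlan": 2000}],
    warn=messages.append,  # capture warnings instead of printing them
)
print(result)
print(messages)
```

Using None as the sentinel keeps the "no match" case explicit, so the skip decision does not depend on whatever value an earlier iteration happened to leave behind.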

The new tests and goldens cover only CSM 1.7 + Aruba CDU rendering, but the patched code path is shared across all CSM versions and vendors.

Out of scope — inverse scenario (cabinet in SLS but not in CCJ)

This PR does not change the inverse mismatch direction. The code already behaves correctly there: the outer port loop in get_switch_nodes only iterates CCJ-derived ports, so no CMM/CEC stanzas are rendered, and the SLS-driven VLAN/gateway build (around config.py:947) is gated by if sls_rack_int in destination_rack_list, which keeps SLS-only cabinets out of NMN_MTN_VLANS / HMN_MTN_VLANS. The cabinet is silently omitted in both halves of the render — the resulting config is valid, but the operator gets no signal that the SHCD/CCJ may be missing an intended cabinet. Adding a symmetric warning is a separate, lower-severity follow-up.
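
For the follow-up mentioned above, a symmetric check could be as small as a set comparison. This is a hypothetical sketch: the helper name and the plain integer rack lists are assumptions, and nothing like it is added by this PR.

```python
def cabinet_mismatches(ccj_racks, sls_racks):
    """Report rack numbers present on only one side of the CCJ/SLS pair."""
    ccj, sls = set(ccj_racks), set(sls_racks)
    return {
        "ccj_only": sorted(ccj - sls),  # the direction this PR now warns about
        "sls_only": sorted(sls - ccj),  # the inverse direction, still silent
    }


mismatches = cabinet_mismatches(ccj_racks=[1000, 1001], sls_racks=[1000, 1002])
print(mismatches)
```

A non-empty sls_only list would be the signal that the SHCD/CCJ may be missing an intended cabinet.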

  • I have added new tests to cover the new code
  • If adding a new file, I have updated pyinstaller.py (no product files added; new files are tests/fixtures, which pyinstaller.py already excludes via excludes=["tests"])
  • I have added entries in CHANGELOG.md for the changes in this PR

Issues and Related PRs

  • Resolves: CASMNET-2390

Testing

Reproduction (pre-fix, on main):

Two runs against the same CCJ with two Mountain cabinets (x1000, x1001):

canu generate network config --csm 1.7 \
  --ccj hela-ccj.json \
  --sls-file sls_input_file.json --folder output         # x1000 + x1001 in SLS
canu generate network config --csm 1.7 \
  --ccj hela-ccj.json \
  --sls-file sls_input_file_single.json --folder output2 # x1000 only in SLS

In output2, x1001's VLAN definitions and interface vlan stanzas correctly disappear, but the CMM/CEC ports for x1001 still render — trunking vlan trunk native 2000 / allowed 2000,3000 (x1000's VLANs) and vlan access 3000 instead of x1001's 2001 / 2001,3001 / 3001. No warning or error is printed. Excerpt from diff output/sw-cdu-001.cfg output2/sw-cdu-001.cfg:

< interface lag 9 multi-chassis static
<     no shutdown
<     description cmm-x1001-000:1<==sw-cdu-001
<     no routing
<     vlan trunk native 2000           # ← should be 2001 (x1001's NMN VLAN)
<     vlan trunk allowed 2000,3000     # ← should be 2001,3001
<     spanning-tree root-guard
< interface 1/1/46
<     no shutdown
<     description cec-x1001-000<==sw-cdu-001
<     vlan access 3000                 # ← should be 3001 (x1001's HMN VLAN)
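
A mismatch like the one in this excerpt can also be spotted mechanically. The sketch below is a hypothetical helper, not part of canu: it assumes the cmm-/cec- description format and the vlan access / vlan trunk line shapes seen in the excerpt, and it takes the expected per-cabinet VLAN sets as input.

```python
import re

DESCRIPTION = re.compile(r"^\s+description (?:cmm|cec)-(x\d+)-")
VLAN_REF = re.compile(
    r"^\s+vlan (?:access|trunk native|trunk allowed) (\d+(?:,\d+)*)$"
)


def wrong_cabinet_vlans(config_text, cabinet_vlans):
    """List (interface, cabinet, vlan) where a port uses another cabinet's VLAN."""
    findings = []
    interface = cabinet = None
    for raw in config_text.splitlines():
        line = raw.rstrip()
        if line.startswith("interface "):
            interface, cabinet = line, None
            continue
        if line and not line[0].isspace():
            interface = cabinet = None  # any other top-level line ends the stanza
            continue
        described = DESCRIPTION.match(line)
        if described:
            cabinet = described.group(1)
            continue
        reference = VLAN_REF.match(line)
        if reference and cabinet in cabinet_vlans:
            for vlan_id in map(int, reference.group(1).split(",")):
                if vlan_id not in cabinet_vlans[cabinet]:
                    findings.append((interface, cabinet, vlan_id))
    return findings


sample = "\n".join(
    [
        "interface lag 9 multi-chassis static",
        "    no shutdown",
        "    description cmm-x1001-000:1<==sw-cdu-001",
        "    no routing",
        "    vlan trunk native 2000",
        "    vlan trunk allowed 2000,3000",
        "    spanning-tree root-guard",
        "interface 1/1/46",
        "    no shutdown",
        "    description cec-x1001-000<==sw-cdu-001",
        "    vlan access 3000",
    ]
)
expected = {"x1000": {2000, 3000}, "x1001": {2001, 3001}}
findings = wrong_cabinet_vlans(sample, expected)
for finding in findings:
    print(finding)
```

VLAN ranges such as 2000-2005 are not handled; the pattern only covers the comma-separated form shown in the excerpt.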

Post-fix behaviour:

Same single-cabinet invocation now refuses to render the SLS-missing cabinet's ports and warns loudly per skipped port:

$ canu generate network config --csm 1.7 \
    --ccj hela-ccj.json \
    --sls-file sls_input_file_single.json --folder /tmp/output_pr
…
sw-leaf-002 Config Generated
sw-edge-001 Config Generated
sw-edge-002 Config Generated
WARNING: Skipping CMM port 9 (lag 9) on sw-cdu-002 for cabinet x1001: cabinet is in the CCJ/SHCD but missing from SLS (NMN_MTN_CABINETS, HMN_MTN_CABINETS). The CCJ and SLS are out of sync; reconcile them and regenerate the config.
WARNING: Skipping CMM port 10 (lag 10) on sw-cdu-002 for cabinet x1001: cabinet is in the CCJ/SHCD but missing from SLS (NMN_MTN_CABINETS, HMN_MTN_CABINETS). The CCJ and SLS are out of sync; reconcile them and regenerate the config.
WARNING: Skipping CMM port 11 (lag 11) on sw-cdu-002 for cabinet x1001: cabinet is in the CCJ/SHCD but missing from SLS (NMN_MTN_CABINETS, HMN_MTN_CABINETS). The CCJ and SLS are out of sync; reconcile them and regenerate the config.
WARNING: Skipping CMM port 12 (lag 12) on sw-cdu-002 for cabinet x1001: cabinet is in the CCJ/SHCD but missing from SLS (NMN_MTN_CABINETS, HMN_MTN_CABINETS). The CCJ and SLS are out of sync; reconcile them and regenerate the config.
WARNING: Skipping CMM port 13 (lag 13) on sw-cdu-002 for cabinet x1001: cabinet is in the CCJ/SHCD but missing from SLS (NMN_MTN_CABINETS, HMN_MTN_CABINETS). The CCJ and SLS are out of sync; reconcile them and regenerate the config.
WARNING: Skipping CMM port 14 (lag 14) on sw-cdu-002 for cabinet x1001: cabinet is in the CCJ/SHCD but missing from SLS (NMN_MTN_CABINETS, HMN_MTN_CABINETS). The CCJ and SLS are out of sync; reconcile them and regenerate the config.
WARNING: Skipping CMM port 15 (lag 15) on sw-cdu-002 for cabinet x1001: cabinet is in the CCJ/SHCD but missing from SLS (NMN_MTN_CABINETS, HMN_MTN_CABINETS). The CCJ and SLS are out of sync; reconcile them and regenerate the config.
WARNING: Skipping CMM port 16 (lag 16) on sw-cdu-002 for cabinet x1001: cabinet is in the CCJ/SHCD but missing from SLS (NMN_MTN_CABINETS, HMN_MTN_CABINETS). The CCJ and SLS are out of sync; reconcile them and regenerate the config.
WARNING: Skipping CEC port 46 on sw-cdu-002 for cabinet x1001: cabinet is in the CCJ/SHCD but not in SLS (HMN_MTN_CABINETS). The CCJ and SLS are out of sync; reconcile them and regenerate the config.
sw-cdu-002 Config Generated
WARNING: Skipping CMM port 9 (lag 9) on sw-cdu-001 for cabinet x1001: cabinet is in the CCJ/SHCD but missing from SLS (NMN_MTN_CABINETS, HMN_MTN_CABINETS). The CCJ and SLS are out of sync; reconcile them and regenerate the config.
WARNING: Skipping CMM port 10 (lag 10) on sw-cdu-001 for cabinet x1001: cabinet is in the CCJ/SHCD but missing from SLS (NMN_MTN_CABINETS, HMN_MTN_CABINETS). The CCJ and SLS are out of sync; reconcile them and regenerate the config.
WARNING: Skipping CMM port 11 (lag 11) on sw-cdu-001 for cabinet x1001: cabinet is in the CCJ/SHCD but missing from SLS (NMN_MTN_CABINETS, HMN_MTN_CABINETS). The CCJ and SLS are out of sync; reconcile them and regenerate the config.
WARNING: Skipping CMM port 12 (lag 12) on sw-cdu-001 for cabinet x1001: cabinet is in the CCJ/SHCD but missing from SLS (NMN_MTN_CABINETS, HMN_MTN_CABINETS). The CCJ and SLS are out of sync; reconcile them and regenerate the config.
WARNING: Skipping CMM port 13 (lag 13) on sw-cdu-001 for cabinet x1001: cabinet is in the CCJ/SHCD but missing from SLS (NMN_MTN_CABINETS, HMN_MTN_CABINETS). The CCJ and SLS are out of sync; reconcile them and regenerate the config.
WARNING: Skipping CMM port 14 (lag 14) on sw-cdu-001 for cabinet x1001: cabinet is in the CCJ/SHCD but missing from SLS (NMN_MTN_CABINETS, HMN_MTN_CABINETS). The CCJ and SLS are out of sync; reconcile them and regenerate the config.
WARNING: Skipping CMM port 15 (lag 15) on sw-cdu-001 for cabinet x1001: cabinet is in the CCJ/SHCD but missing from SLS (NMN_MTN_CABINETS, HMN_MTN_CABINETS). The CCJ and SLS are out of sync; reconcile them and regenerate the config.
WARNING: Skipping CMM port 16 (lag 16) on sw-cdu-001 for cabinet x1001: cabinet is in the CCJ/SHCD but missing from SLS (NMN_MTN_CABINETS, HMN_MTN_CABINETS). The CCJ and SLS are out of sync; reconcile them and regenerate the config.
WARNING: Skipping CEC port 46 on sw-cdu-001 for cabinet x1001: cabinet is in the CCJ/SHCD but not in SLS (HMN_MTN_CABINETS). The CCJ and SLS are out of sync; reconcile them and regenerate the config.
sw-cdu-001 Config Generated

The resulting /tmp/output_pr/sw-cdu-001.cfg and sw-cdu-002.cfg contain zero x1001 references; the only x1000-VLAN stanzas left are the legitimate x1000 CMM/CEC ports.

Other validation:

  1. The full-SLS invocation (--sls-file sls_input_file.json) produces a config byte-identical to the pre-fix good output — no regression on the happy path.
  2. make unit: 510 passed, 0 failed (was 508, +2 new). Both new tests fail when run against the pre-fix source (verified by temporarily reverting canu/generate/switch/config/config.py to main).
  3. tests/scripts/regenerate_golden_configs_1.7.sh runs end-to-end and produces 59 files; the only diffs in pre-existing goldens are CANU version-banner strings, which tests/lib/diff.py::diff_config_files already strips.
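
The idea of comparing goldens while ignoring version-banner lines can be sketched with difflib. This is not the implementation of tests/lib/diff.py::diff_config_files; the helper name, the ignore pattern, and the banner format in the sample are all hypothetical.

```python
import difflib
import re


def diff_ignoring(lines_a, lines_b, ignore_pattern):
    """Unified diff of two line lists after dropping lines that match a pattern."""
    ignore = re.compile(ignore_pattern)
    kept_a = [line for line in lines_a if not ignore.search(line)]
    kept_b = [line for line in lines_b if not ignore.search(line)]
    return list(difflib.unified_diff(kept_a, kept_b, lineterm=""))


# Hypothetical banner lines; the real banner text may differ.
old_golden = ["# CANU version: A", "hostname sw-cdu-001"]
new_golden = ["# CANU version: B", "hostname sw-cdu-001"]

banner_only = diff_ignoring(old_golden, new_golden, r"CANU version")
print(banner_only)
```

An empty result means the two files differ only in the ignored lines.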

Each new test asserts:

  1. exit_code == 0
  2. Generated config matches the new golden (which was produced by the fixed code, so this asserts the fix is sticky).
  3. Expected WARNING: Skipping … text appears in result.output covering x1001, the switch name, and the missing SLS list(s) for both CMM and CEC ports.
  4. The rendered config contains no x1001 references at all (defence-in-depth).
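
The four assertions can be expressed as one reusable check. The sketch below is an assumption about shape only: the helper name and parameters are hypothetical, the real tests drive the canu CLI, and the sample inputs are hand-written (the warning lines are copied from the post-fix output above).

```python
def check_mismatch_result(exit_code, output, rendered, golden, switch, cabinet):
    """Apply the four per-test checks to an already-captured CLI result."""
    assert exit_code == 0
    assert rendered == golden
    expected_lists = {
        "CMM": "NMN_MTN_CABINETS, HMN_MTN_CABINETS",
        "CEC": "HMN_MTN_CABINETS",
    }
    for kind, sls_lists in expected_lists.items():
        warnings = [
            line
            for line in output.splitlines()
            if line.startswith(f"WARNING: Skipping {kind} port")
        ]
        assert any(
            switch in w and cabinet in w and sls_lists in w for w in warnings
        ), f"no {kind} warning for {cabinet} on {switch}"
    assert cabinet not in rendered  # defence-in-depth


sample_output = "\n".join(
    [
        "WARNING: Skipping CMM port 9 (lag 9) on sw-cdu-001 for cabinet x1001: "
        "cabinet is in the CCJ/SHCD but missing from SLS "
        "(NMN_MTN_CABINETS, HMN_MTN_CABINETS). The CCJ and SLS are out of sync; "
        "reconcile them and regenerate the config.",
        "WARNING: Skipping CEC port 46 on sw-cdu-001 for cabinet x1001: "
        "cabinet is in the CCJ/SHCD but not in SLS (HMN_MTN_CABINETS). "
        "The CCJ and SLS are out of sync; reconcile them and regenerate the config.",
        "sw-cdu-001 Config Generated",
    ]
)
sample_config = "interface lag 1 multi-chassis static\n    description cmm-x1000-000:1<==sw-cdu-001\n"

check_mismatch_result(0, sample_output, sample_config, sample_config, "sw-cdu-001", "x1001")
print("all four checks passed")
```

Checking the warning text and the absence of x1001 separately means a regression in either the skip or the message would fail on its own.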

spillerc-hpe and others added 2 commits June 9, 2026 11:30
When a Mountain/Olympus cabinet is present in the CCJ/SHCD but absent
from the SLS input (NMN_MTN_CABINETS / HMN_MTN_CABINETS), the cmm/cec
branches of get_switch_nodes silently used the VLAN values left over
from a previous port iteration (a different cabinet), producing CDU
switch port configs that trunk/access VLANs not defined on the switch
and not belonging to that cabinet. If the very first cmm/cec port hit
had no SLS match the same code raised UnboundLocalError instead.

Initialise nmn_mtn_vlan / hmn_mtn_vlan to None per port. When no SLS
cabinet matches the CCJ rack number, emit a red WARNING via click.secho
naming the switch, port, lag, cabinet, and which SLS list is missing,
and skip the node so the wrong VLANs are never rendered.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

When a Mountain/Olympus cabinet is present in the CCJ/SHCD but absent
from SLS (NMN_MTN_CABINETS / HMN_MTN_CABINETS), the CDU config renderer
must skip the cabinet's CMM/CEC ports and warn, instead of stamping the
previous cabinet's VLANs onto them.

Adds:
* tests/data/Full_Architecture_Mountain_2cab.json -- CCJ with two
  Mountain cabinets (x1000, x1001).
* tests/data/sls_input_file_csm_1.7_mtn_mismatch.json -- SLS containing
  only x1000.
* tests/data/golden_configs/mtn_sls_mismatch_1.7/sw-cdu-{001,002}.cfg --
  golden output produced by the fixed renderer (no x1001 stanzas).
* test_switch_config_cdu_primary_mtn_sls_mismatch and
  test_switch_config_cdu_secondary_mtn_sls_mismatch in
  tests/test_generate_switch_config_aruba_configs_csm_1_7.py. Each
  asserts exit_code == 0, golden diff, presence of the per-port
  WARNING text, and absence of any x1001 references in the rendered
  config. Both tests fail against the pre-fix code.

Also extends tests/scripts/regenerate_golden_configs_1.7.sh (and its
README) with a Part 6/6 block that regenerates the new goldens via
--ccj, bumping the total from 57 to 59.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@spillerc-hpe spillerc-hpe requested a review from a team as a code owner June 9, 2026 11:40
@spillerc-hpe spillerc-hpe changed the title from "Casmnet 2390" to "CASMNET-2390 - canu generate network config silently emits wrong-cabinet VLANs on sw-cdu ports when a Mountain/Olympus cabinet is in the CCJ but not in SLS" on Jun 9, 2026
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

@trad511 trad511 left a comment

if config.py has any more if/then statements added, then it might explode.

@trad511 trad511 self-requested a review June 9, 2026 19:03
@spillerc-hpe spillerc-hpe merged commit 53d40ab into main Jun 9, 2026
18 checks passed
@spillerc-hpe spillerc-hpe deleted the CASMNET-2390 branch June 9, 2026 19:05