[EVPN-MH] Add kernel patches for EVPN VXLAN Multihoming support #540

Open
bdfriedman wants to merge 1 commit into sonic-net:master from bdfriedman:evpn_mh

bdfriedman commented Feb 25, 2026

Why I did it

This PR adds three critical Linux kernel patches required to enable EVPN VXLAN Multihoming in SONiC. These kernel enhancements provide the necessary infrastructure for:

  1. Extended neighbor flags for multi-homing peer synchronization
  2. Protocol field tracking in bridge FDB entries to distinguish control plane vs data plane learned MACs
  3. External validation flag to prevent kernel from invalidating externally managed neighbor entries

These patches are essential for implementing the EVPN-MH feature as described in the EVPN VXLAN Multihoming HLD.

Work item tracking
  • Microsoft ADO (number only):

How I did it

Added three kernel patches to the patches-sonic directory:

1. NDA_FLAGS_EXT Support with NTF_EXT_MH_PEER_SYNC (0001-vxlan-bridge-Add-NDA_FLAGS_EXT-support-with-NTF_EXT_.patch)

This patch adds extended flags support for VXLAN and bridge FDB entries to enable multi-homing peer synchronization:

  • New field: ext_flags in vxlan_fdb structure
  • New flag: NTF_EXT_MH_PEER_SYNC - Indicates FDB entry is synchronized across EVPN-MH peers
  • New neighbor update flag: NEIGH_UPDATE_F_EXT_MH_PEER_SYNC for propagating sync state
  • Modified functions:
    • vxlan_fdb_alloc() - Initialize ext_flags
    • vxlan_fdb_create() - Pass ext_flags parameter
    • vxlan_fdb_update_existing() - Handle ext_flags updates and notifications
    • vxlan_fdb_update_create() - Create FDB with ext_flags
    • vxlan_fdb_info() - Include NDA_FLAGS_EXT in netlink messages
    • Bridge FDB functions - Propagate ext_flags through bridge layer
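
As a rough illustration of the "handle ext_flags updates and notifications" step listed for vxlan_fdb_update_existing(), the user-space sketch below stores a new extended-flags word and reports whether a notification would be needed. The struct, the helper names, and the bit position chosen for the peer-sync flag are hypothetical; none of them are taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit position: the value the patch assigns to
 * NTF_EXT_MH_PEER_SYNC is not stated in this PR description. */
#define SK_NTF_EXT_MH_PEER_SYNC (1u << 3)

/* Stand-in for the ext_flags field added to struct vxlan_fdb. */
struct vxlan_fdb_sketch {
    uint32_t ext_flags;
};

/* Store the new extended flags. Returns true when the value changed,
 * which is when a caller would send a netlink notification. */
bool update_ext_flags(struct vxlan_fdb_sketch *f, uint32_t ext_flags)
{
    if (f->ext_flags == ext_flags)
        return false;
    f->ext_flags = ext_flags;
    return true;
}

/* Usage: the first update changes state, a repeated one is a no-op. */
void check_ext_flags_update(void)
{
    struct vxlan_fdb_sketch f = { 0 };

    assert(update_ext_flags(&f, SK_NTF_EXT_MH_PEER_SYNC));
    assert(!update_ext_flags(&f, SK_NTF_EXT_MH_PEER_SYNC));
    assert(update_ext_flags(&f, 0));
}
```

Whether the real patch notifies only on change, as assumed here, should be checked against the patch itself.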

Files modified:

  • drivers/net/vxlan/vxlan_core.c (140 lines)
  • drivers/net/vxlan/vxlan_private.h (21 lines)
  • drivers/net/vxlan/vxlan_vnifilter.c (11 lines)
  • include/net/neighbour.h (4 lines)
  • include/uapi/linux/neighbour.h (1 line)
  • net/bridge/br.c (4 lines)
  • net/bridge/br_fdb.c (35 lines)
  • net/bridge/br_private.h (5 lines)
  • net/core/neighbour.c (13 lines)

2. Protocol Field in Bridge FDB (0001-net-bridge-vxlan-Protocol-field-in-bridge-fdb.patch)

This patch introduces an optional "protocol" field for bridge FDB entries to distinguish between control plane and data plane learned MAC addresses:

Purpose: In EVPN Multihoming, MAC addresses can be learned via:

  • Control plane (ZEBRA protocol): Static MACs distributed by FRR/BGP
  • Data plane (HW protocol): Dynamic MACs learned from traffic with aging enabled

This distinction enables:

  • Proper state machine management during MAC transitions
  • Handling traffic hashing between EVPN-MH peers
  • Managing MAC mobility across EVPN peers
  • Synchronization between control and data planes

Implementation:

  • New field: protocol in net_bridge_fdb_entry and vxlan_fdb structures
  • Protocol values: Uses standard routing protocol values (RTPROT_UNSPEC, RTPROT_ZEBRA, RTPROT_KERNEL, etc.)
  • Default: RTPROT_UNSPEC when protocol not specified (backward compatible)
  • NDA_PROTOCOL attribute: Encoded in netlink messages for FDB entries

Usage Example:

# Add MAC with hardware protocol (data plane learned)
bridge fdb add 00:00:00:00:00:88 dev hostbond2 vlan 1000 master dynamic extern_learn proto hw

# Display with protocol field
bridge -d fdb show dev hostbond2
# Output: 00:00:00:00:00:88 vlan 1000 extern_learn master br1000 proto hw

# Transition to zebra (control plane)
bridge fdb replace 00:00:00:00:00:88 dev hostbond2 vlan 1000 master dynamic extern_learn proto zebra

Files modified:

  • drivers/net/vxlan/vxlan_core.c (55 lines)
  • drivers/net/vxlan/vxlan_private.h (5 lines)
  • drivers/net/vxlan/vxlan_vnifilter.c (4 lines)
  • net/bridge/br.c (2 lines)
  • net/bridge/br_fdb.c (55 lines)
  • net/bridge/br_private.h (5 lines)

3. NTF_EXT_VALIDATED Flag for External Validation (0001-neighbor-Add-NTF_EXT_VALIDATED-flag-for-externally-v.patch)

This patch adds a new "extern_valid" neighbor flag to indicate entries learned and validated externally that should not be invalidated by the kernel:

Background: In EVPN multi-homing:

  • Each host is multi-homed via Ethernet Segment (ES/LAG) to multiple VTEPs
  • Neighbor entries are distributed to ES peers using EVPN MAC/IP advertisement routes
  • When an ES link goes down, EVPN routes are withdrawn, causing intermittent failures

Solution (based on draft-rbickhart-evpn-ip-mac-proxy-adv-03):

  • ES peers install neighbor entries and inject proxy EVPN MAC/IP advertisements
  • When ES link goes down, ES peers start aging timers instead of immediately withdrawing
  • If an ES peer locally learns the entry (becomes "reachable"), it restarts the timer and removes the proxy indication
  • Prevents intermittent routing failures during ES link transitions

Implementation:

  • New flag: NTF_EXT_VALIDATED (extern_valid) - Entry is externally validated
  • Behavior:
    • Kernel will NOT remove or invalidate the entry
    • Kernel can probe the entry and notify user space when it becomes "reachable"
    • If no confirmation is received, the kernel returns the entry to the "stale" state (NOT the "failed" state)
    • Control plane (FRR) manages entry lifecycle
  • Initial state: "stale" when installed by control plane
  • State transitions: Kernel notifies control plane when entry becomes "reachable"

Use case: Required for EVPN-MH proxy advertisements where control plane needs full control over neighbor entry validity and removal decisions.
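
The behavior described above can be sketched as a small user-space model. This is only an illustration under simplifying assumptions: the enum, struct, and function names are hypothetical, and the real kernel tracks neighbor state with NUD_* bit flags and applies more conditions before garbage collection than this model does.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified neighbor states for illustration only. */
enum sk_state { SK_STALE, SK_PROBE, SK_REACHABLE, SK_FAILED };

struct neigh_sketch {
    enum sk_state state;
    bool ext_validated;   /* stands in for the extern_valid flag */
};

/* Result of a probe cycle: a confirmation makes the entry reachable.
 * Without one, an ordinary entry fails, while an externally validated
 * entry falls back to stale and stays under control-plane ownership. */
enum sk_state probe_result(const struct neigh_sketch *n, bool confirmed)
{
    if (confirmed)
        return SK_REACHABLE;
    return n->ext_validated ? SK_STALE : SK_FAILED;
}

/* In this model, garbage collection skips externally validated entries. */
bool gc_may_remove(const struct neigh_sketch *n)
{
    return !n->ext_validated;
}

/* Usage: compare an extern_valid entry with an ordinary one. */
void check_extern_valid(void)
{
    struct neigh_sketch ev = { SK_PROBE, true };
    struct neigh_sketch plain = { SK_PROBE, false };

    assert(probe_result(&ev, false) == SK_STALE);
    assert(probe_result(&plain, false) == SK_FAILED);
    assert(probe_result(&ev, true) == SK_REACHABLE);
    assert(!gc_may_remove(&ev));
    assert(gc_may_remove(&plain));
}
```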

Files modified:

  • Neighbor subsystem for external validation support
  • Netlink attributes for extern_valid flag
  • State machine modifications

How to verify it

  1. Build kernel with these patches applied:

    cd sonic-linux-kernel
    make BLDENV=bookworm
  2. Verify NDA_FLAGS_EXT support:

    # Add FDB entry with extended flags
    bridge fdb add <mac> dev <vxlan-dev> dst <vtep-ip> vni <vni> extern_learn
    
    # Verify in kernel via netlink dump
    bridge -d fdb show | grep <mac>
  3. Verify protocol field support:

    # Add MAC with specific protocol
    bridge fdb add <mac> dev <device> vlan <vid> master dynamic extern_learn proto hw
    
    # Verify protocol shows up
    bridge -d fdb show dev <device> | grep <mac>
    # Expected output includes: proto hw
    
    # Transition protocol
    bridge fdb replace <mac> dev <device> vlan <vid> master dynamic extern_learn proto zebra
    
    # Verify protocol changed
    bridge -d fdb show dev <device> | grep <mac>
    # Expected output includes: proto zebra
  4. Verify extern_valid flag:

    # Add neighbor with extern_valid flag (via FRR/control plane)
    # Entry should remain in "stale" state and not be removed by kernel GC
    
    # Monitor neighbor state transitions
    ip -d neigh show
  5. Integration testing with EVPN-MH:

    • Configure EVPN multi-homing with ES peers
    • Verify MAC/neighbor synchronization across peers
    • Test ES link failure scenarios
    • Verify proxy advertisements and aging behavior
    • Confirm no intermittent routing/ARP failures during transitions
  6. Compatibility testing:

    • Verify existing bridge/VXLAN functionality still works
    • Test backward compatibility (entries without new fields/flags)
    • Confirm no regressions in non-EVPN scenarios

Which release branch to backport (provide reason below if selected)

  • 202305
  • 202311
  • 202405
  • 202411
  • 202505
  • 202511

Tested branch (Please provide the tested image version)

Description for the changelog

Add kernel patches for EVPN VXLAN Multihoming: extended FDB flags (NTF_EXT_MH_PEER_SYNC), protocol field for bridge FDB entries, and extern_valid flag for externally validated neighbor entries

Link to config_db schema for YANG model changes

N/A - This PR only adds kernel patches, no CONFIG_DB schema changes

Depends on

Related upstream work

  • EVPN MAC/IP proxy advertisement draft: draft-rbickhart-evpn-ip-mac-proxy-adv-03
  • Kernel patch for protocol field: Authored by Mrinmoy Ghosh <mrghosh@cisco.com>

Summary:

  • Total patches: 3
  • Total lines added: +1,558
  • Kernel subsystems modified: VXLAN driver, bridge FDB, neighbor subsystem, netlink attributes
  • Backward compatible: Yes - all new fields/flags are optional with sensible defaults

Critical for EVPN-MH:
✅ Peer synchronization flag (NTF_EXT_MH_PEER_SYNC)
✅ Control/data plane MAC distinction (protocol field)
✅ External neighbor validation (extern_valid flag)
✅ Proxy advertisement support
✅ Prevents intermittent EVPN-MH failures

Signed-off-by: Barry Friedman (friedman) <friedman@cisco.com>
bdfriedman requested a review from a team as a code owner on February 25, 2026 22:19
@mssonicbld

/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

lguohan commented Mar 11, 2026

@bdfriedman, are these patches already upstreamed to the Linux kernel, or are they in the process of being upstreamed?

banidoru left a comment

  • Bug in br.c (patch 2): br_switchdev_event passes fdb_info->locked (bool) as the protocol parameter and RTPROT_UNSPEC as ext_flags — argument order is wrong after the signature change
  • Bug in br_fdb_add (patch 2): protocol is initialized to RTPROT_UNSPEC but never parsed from tb[NDA_PROTOCOL] — the add path silently ignores the user-supplied protocol, unlike the delete path which parses it correctly
  • Patch 1 (NDA_FLAGS_EXT) and Patch 3 (NTF_EXT_VALIDATED) look structurally sound
  • Patches are not upstream-accepted — these appear to be custom SONiC patches carrying significant kernel neighbor/FDB subsystem changes

}

- err = __br_fdb_delete(br, p, addr, vid);
+ err = __br_fdb_delete(br, p, addr, vid, protocol);

Bug: Wrong argument order. After adding protocol to br_fdb_external_learn_add's signature (as the 5th positional param after vid), this call site passes fdb_info->locked as protocol and appends RTPROT_UNSPEC at the end as ext_flags.

Current: (br, p, addr, vid, fdb_info->locked, false, 0, RTPROT_UNSPEC)
Expected: (br, p, addr, vid, RTPROT_UNSPEC, fdb_info->locked, false, 0)

When fdb_info->locked is true, the entry will get protocol=1 instead of RTPROT_UNSPEC, and locked will silently be forced to false.
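
A minimal user-space sketch of why this compiles without complaint yet misbehaves: C converts bool, u8, and u32 arguments implicitly, so shifted positions go unnoticed. The function below is a hypothetical stand-in that only mirrors the tail of the parameter list quoted above (protocol, locked, swdev_notify, ext_flags); it is not the kernel function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define RTPROT_UNSPEC 0   /* same value as in linux/rtnetlink.h */

struct learned {
    uint8_t  protocol;
    bool     locked;
    bool     swdev_notify;
    uint32_t ext_flags;
};

/* Hypothetical stand-in that records what each parameter received. */
struct learned external_learn_add(uint8_t protocol, bool locked,
                                  bool swdev_notify, uint32_t ext_flags)
{
    struct learned e = { protocol, locked, swdev_notify, ext_flags };
    return e;
}

void check_argument_order(void)
{
    bool locked = true;   /* plays the role of fdb_info->locked */

    /* Call order as flagged in the review: every argument is shifted. */
    struct learned bad = external_learn_add(locked, false, 0, RTPROT_UNSPEC);
    assert(bad.protocol == 1);   /* true was converted to protocol 1 */
    assert(!bad.locked);         /* the locked state is lost */

    /* Expected order. */
    struct learned good = external_learn_add(RTPROT_UNSPEC, locked, false, 0);
    assert(good.protocol == RTPROT_UNSPEC);
    assert(good.locked);
}
```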

- bool swdev_notify, u32 ext_flags)
+ const unsigned char *addr, u16 vid, u8 protocol,
+ bool locked, bool swdev_notify, u32 ext_flags)
{

Bug: protocol is never parsed from netlink in the add path. The variable is initialized to RTPROT_UNSPEC but there's no if (tb[NDA_PROTOCOL]) protocol = nla_get_u8(tb[NDA_PROTOCOL]); anywhere in br_fdb_add. Compare with br_fdb_delete which correctly parses it. This means bridge fdb add ... proto hw will silently ignore the protocol — it will always store RTPROT_UNSPEC.
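
For illustration, the missing parse step can be modeled in user space as follows. The attribute table here is a simplified stand-in (an array of payload pointers, NULL when the attribute is absent), and the SK_-prefixed names are hypothetical; in the kernel the read would go through nla_get_u8() on tb[NDA_PROTOCOL] as quoted above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define RTPROT_UNSPEC 0    /* same values as in linux/rtnetlink.h */
#define RTPROT_ZEBRA  11

/* Stand-in attribute indices for this sketch. */
enum { SK_NDA_PROTOCOL = 12, SK_NDA_MAX = 17 };

/* Default to RTPROT_UNSPEC and override only when the attribute is
 * present, so requests without a protocol keep working unchanged. */
uint8_t parse_protocol(const uint8_t *tb[])
{
    uint8_t protocol = RTPROT_UNSPEC;

    if (tb[SK_NDA_PROTOCOL])
        protocol = *tb[SK_NDA_PROTOCOL];
    return protocol;
}

void check_parse_protocol(void)
{
    const uint8_t *tb[SK_NDA_MAX + 1] = { NULL };
    uint8_t zebra = RTPROT_ZEBRA;

    /* Attribute absent: the default is kept. */
    assert(parse_protocol(tb) == RTPROT_UNSPEC);

    /* Attribute present: the supplied value is used. */
    tb[SK_NDA_PROTOCOL] = &zebra;
    assert(parse_protocol(tb) == RTPROT_ZEBRA);
}
```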

banidoru left a comment

  • Patch ordering concern: Patch 2 (protocol field) modifies br_fdb_external_learn_add signature from patch 1 but gets argument order wrong in br.c (already noted by prior reviewer).
  • Missing NDA_PROTOCOL parse in br_fdb_add: protocol is initialized to RTPROT_UNSPEC but never read from tb[NDA_PROTOCOL], so bridge fdb add silently ignores user-specified protocol (already noted).
  • Internal callers can silently overwrite protocol: vxlan_snoop and vxlan_fdb_external_learn_add pass RTPROT_UNSPEC — if the entry already has a real protocol set, it gets silently cleared on update.
  • Patches are well-structured overall: Clean kernel patch backports with proper commit messages and sign-offs. The NTF_EXT_VALIDATED and NTF_EXT_MH_PEER_SYNC patches look correct.

int err;

if (!(ndm->ndm_state & (NUD_PERMANENT|NUD_REACHABLE))) {
@@ -1281,7 +1296,7 @@ static int vxlan_fdb_add(struct ndmsg *ndm, struct nlattr *tb[],

Protocol overwrite on internal update. When vxlan_fdb_update_existing is called by internal callers like vxlan_snoop or vxlan_fdb_external_learn_add with RTPROT_UNSPEC, this block will overwrite any previously-set protocol value back to 0. Consider guarding with if (protocol != RTPROT_UNSPEC && f->protocol != protocol) to avoid unintentional protocol clearing by internal callers that don't carry protocol context.
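
A small user-space sketch of the suggested guard, assuming a hypothetical struct and helper name (the real update lives inside vxlan_fdb_update_existing):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define RTPROT_UNSPEC 0    /* same values as in linux/rtnetlink.h */
#define RTPROT_ZEBRA  11

/* Stand-in for the protocol field added to the FDB entry. */
struct fdb_sketch {
    uint8_t protocol;
};

/* Apply a protocol update only when the caller carries a real value.
 * Returns true when the stored protocol changed. */
bool update_protocol(struct fdb_sketch *f, uint8_t protocol)
{
    if (protocol != RTPROT_UNSPEC && f->protocol != protocol) {
        f->protocol = protocol;
        return true;
    }
    return false;
}

void check_update_protocol(void)
{
    struct fdb_sketch f = { RTPROT_ZEBRA };

    /* An internal caller without protocol context leaves it intact. */
    assert(!update_protocol(&f, RTPROT_UNSPEC));
    assert(f.protocol == RTPROT_ZEBRA);

    /* Re-applying the same value is not a change. */
    assert(!update_protocol(&f, RTPROT_ZEBRA));
}
```

One trade-off of this guard is that a caller can no longer reset an entry to RTPROT_UNSPEC through the update path; if that is a required operation, it would need a separate explicit mechanism.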

if (!fdb || READ_ONCE(fdb->dst) != p)
return -ENOENT;

+ /* If the delete comes from a different protocol type,

Non-extern_learn path doesn't propagate protocol. When __br_fdb_add takes the fdb_add_entry path (non-extern_learn), the protocol parameter is unused — fdb_add_entry never receives or sets it. This means bridge fdb add ... proto hw without extern_learn will silently store RTPROT_UNSPEC. Either fdb_add_entry should also accept and set protocol, or the kernel should reject protocol with non-extern_learn entries.

banidoru left a comment

  • VXLAN FDB protocol field not emitted in netlink dumps: vxlan_fdb_info() is not updated to include NDA_PROTOCOL in the protocol patch, so VXLAN FDB dumps won't show the protocol even though it's stored internally. Bridge side (fdb_fill_info) does this correctly.
  • Existing review comments correctly identify: wrong argument order in br_switchdev_event call, missing NDA_PROTOCOL parsing in br_fdb_add, protocol not propagated through fdb_add_entry path, and protocol overwrite risk from internal VXLAN callers.
  • NTF_EXT_VALIDATED patch is well-designed — proper state validation, GC exemption, stale-instead-of-failed fallback, and carrier-down protection.
  • NTF_EXT_MH_PEER_SYNC / ext_flags plumbing through VXLAN and bridge layers looks correct.
  • All three patches use 0001- prefix — consider renaming to 0001-/0002-/0003- for clarity in the series.

6 files changed, 85 insertions(+), 41 deletions(-)

diff --git a/drivers/net/vxlan/vxlan_core.c b/drivers/net/vxlan/vxlan_core.c
index de1b3fa96..c34b9f75c 100644

Missing: vxlan_fdb_info() does not emit NDA_PROTOCOL. The protocol field is stored in vxlan_fdb and parsed/updated correctly, but vxlan_fdb_info() (the function that builds netlink messages for VXLAN FDB dumps) is never updated to include nla_put_u8(skb, NDA_PROTOCOL, fdb->protocol). This means bridge fdb show for VXLAN devices won't display the protocol, even though it's correctly tracked internally.

Compare with the bridge side where fdb_fill_info() correctly adds NDA_PROTOCOL. The same needs to happen in vxlan_fdb_info() — likely right after the existing NDA_VNI put.
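
To make concrete what emitting the attribute adds to a dump message, here is a user-space sketch that appends a one-byte netlink attribute to a buffer using the standard layout (16-bit length, 16-bit type, payload, padding to a 4-byte boundary). The helper and the SK_-prefixed names are hypothetical; kernel code would call nla_put_u8() on the skb instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SK_NLA_HDRLEN     4                       /* u16 len + u16 type */
#define SK_NLA_ALIGN(len) (((len) + 3u) & ~3u)    /* pad to 4 bytes */
#define SK_NDA_PROTOCOL   12                      /* stand-in attribute type */

/* Append a u8 attribute at offset off. Returns the new offset, or 0
 * when the buffer has no room for the padded attribute. */
size_t put_u8_attr(uint8_t *buf, size_t off, size_t cap,
                   uint16_t type, uint8_t value)
{
    uint16_t len = (uint16_t)(SK_NLA_HDRLEN + sizeof(value));
    size_t total = SK_NLA_ALIGN(len);

    if (off + total > cap)
        return 0;
    memset(buf + off, 0, total);                 /* zero the padding too */
    memcpy(buf + off, &len, sizeof(len));        /* unpadded length */
    memcpy(buf + off + 2, &type, sizeof(type));
    buf[off + SK_NLA_HDRLEN] = value;
    return off + total;
}

void check_put_u8_attr(void)
{
    uint8_t buf[16];
    uint16_t len, type;
    size_t end = put_u8_attr(buf, 0, sizeof(buf), SK_NDA_PROTOCOL, 11);

    assert(end == 8);                            /* 5 bytes padded to 8 */
    memcpy(&len, buf, sizeof(len));
    memcpy(&type, buf + 2, sizeof(type));
    assert(len == 5 && type == SK_NDA_PROTOCOL && buf[4] == 11);

    /* Not enough room left: the helper reports failure. */
    assert(put_u8_attr(buf, 12, sizeof(buf), SK_NDA_PROTOCOL, 11) == 0);
}
```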
