[EVPN-MH] Add kernel patches for EVPN VXLAN Multihoming support#540
[EVPN-MH] Add kernel patches for EVPN VXLAN Multihoming support#540bdfriedman wants to merge 1 commit intosonic-net:masterfrom
Conversation
Signed-off-by: Barry Friedman (friedman) <friedman@cisco.com>
|
/azp run |
|
Azure Pipelines successfully started running 1 pipeline(s). |
|
@bdfriedman , are these PR upstreamed to linux kernel already? or in the process of upstream? |
banidoru
left a comment
There was a problem hiding this comment.
- Bug in
br.c(patch 2):br_switchdev_eventpassesfdb_info->locked(bool) as theprotocolparameter andRTPROT_UNSPECasext_flags— argument order is wrong after the signature change - Bug in
br_fdb_add(patch 2):protocolis initialized toRTPROT_UNSPECbut never parsed fromtb[NDA_PROTOCOL]— the add path silently ignores the user-supplied protocol, unlike the delete path which parses it correctly - Patch 1 (NDA_FLAGS_EXT) and Patch 3 (NTF_EXT_VALIDATED) look structurally sound
- Patches are not upstream-accepted — these appear to be custom SONiC patches carrying significant kernel neighbor/FDB subsystem changes
| } | ||
|
|
||
| - err = __br_fdb_delete(br, p, addr, vid); | ||
| + err = __br_fdb_delete(br, p, addr, vid, protocol); |
There was a problem hiding this comment.
Bug: Wrong argument order. After adding protocol to br_fdb_external_learn_add's signature (as the 5th positional param after vid), this call site passes fdb_info->locked as protocol and appends RTPROT_UNSPEC at the end as ext_flags.
Current: (br, p, addr, vid, fdb_info->locked, false, 0, RTPROT_UNSPEC)
Expected: (br, p, addr, vid, RTPROT_UNSPEC, fdb_info->locked, false, 0)
When fdb_info->locked is true, the entry will get protocol=1 instead of RTPROT_UNSPEC, and locked will silently be forced to false.
| - bool swdev_notify, u32 ext_flags) | ||
| + const unsigned char *addr, u16 vid, u8 protocol, | ||
| + bool locked, bool swdev_notify, u32 ext_flags) | ||
| { |
There was a problem hiding this comment.
Bug: protocol is never parsed from netlink in the add path. The variable is initialized to RTPROT_UNSPEC but there's no if (tb[NDA_PROTOCOL]) protocol = nla_get_u8(tb[NDA_PROTOCOL]); anywhere in br_fdb_add. Compare with br_fdb_delete which correctly parses it. This means bridge fdb add ... proto hw will silently ignore the protocol — it will always store RTPROT_UNSPEC.
banidoru
left a comment
There was a problem hiding this comment.
- Patch ordering concern: Patch 2 (protocol field) modifies
br_fdb_external_learn_addsignature from patch 1 but gets argument order wrong inbr.c(already noted by prior reviewer). - Missing NDA_PROTOCOL parse in br_fdb_add:
protocolis initialized toRTPROT_UNSPECbut never read fromtb[NDA_PROTOCOL], so bridge fdb add silently ignores user-specified protocol (already noted). - Internal callers can silently overwrite protocol:
vxlan_snoopandvxlan_fdb_external_learn_addpassRTPROT_UNSPEC— if the entry already has a real protocol set, it gets silently cleared on update. - Patches are well-structured overall: Clean kernel patch backports with proper commit messages and sign-offs. The NTF_EXT_VALIDATED and NTF_EXT_MH_PEER_SYNC patches look correct.
| int err; | ||
|
|
||
| if (!(ndm->ndm_state & (NUD_PERMANENT|NUD_REACHABLE))) { | ||
| @@ -1281,7 +1296,7 @@ static int vxlan_fdb_add(struct ndmsg *ndm, struct nlattr *tb[], |
There was a problem hiding this comment.
Protocol overwrite on internal update. When vxlan_fdb_update_existing is called by internal callers like vxlan_snoop or vxlan_fdb_external_learn_add with RTPROT_UNSPEC, this block will overwrite any previously-set protocol value back to 0. Consider guarding with if (protocol != RTPROT_UNSPEC && f->protocol != protocol) to avoid unintentional protocol clearing by internal callers that don't carry protocol context.
| if (!fdb || READ_ONCE(fdb->dst) != p) | ||
| return -ENOENT; | ||
|
|
||
| + /* If the delete comes from a different protocol type, |
There was a problem hiding this comment.
Non-extern_learn path doesn't propagate protocol. When __br_fdb_add takes the fdb_add_entry path (non-extern_learn), the protocol parameter is unused — fdb_add_entry never receives or sets it. This means bridge fdb add ... proto hw without extern_learn will silently store RTPROT_UNSPEC. Either fdb_add_entry should also accept and set protocol, or the kernel should reject protocol with non-extern_learn entries.
banidoru
left a comment
There was a problem hiding this comment.
- VXLAN FDB protocol field not emitted in netlink dumps:
vxlan_fdb_info()is not updated to includeNDA_PROTOCOLin the protocol patch, so VXLAN FDB dumps won't show the protocol even though it's stored internally. Bridge side (fdb_fill_info) does this correctly. - Existing review comments correctly identify: wrong argument order in
br_switchdev_eventcall, missingNDA_PROTOCOLparsing inbr_fdb_add, protocol not propagated throughfdb_add_entrypath, and protocol overwrite risk from internal VXLAN callers. - NTF_EXT_VALIDATED patch is well-designed — proper state validation, GC exemption, stale-instead-of-failed fallback, and carrier-down protection.
- NTF_EXT_MH_PEER_SYNC / ext_flags plumbing through VXLAN and bridge layers looks correct.
- All three patches use
0001-prefix — consider renaming to0001-/0002-/0003-for clarity in the series.
| 6 files changed, 85 insertions(+), 41 deletions(-) | ||
|
|
||
| diff --git a/drivers/net/vxlan/vxlan_core.c b/drivers/net/vxlan/vxlan_core.c | ||
| index de1b3fa96..c34b9f75c 100644 |
There was a problem hiding this comment.
Missing: vxlan_fdb_info() does not emit NDA_PROTOCOL. The protocol field is stored in vxlan_fdb and parsed/updated correctly, but vxlan_fdb_info() (the function that builds netlink messages for VXLAN FDB dumps) is never updated to include nla_put_u8(skb, NDA_PROTOCOL, fdb->protocol). This means bridge fdb show for VXLAN devices won't display the protocol, even though it's correctly tracked internally.
Compare with the bridge side where fdb_fill_info() correctly adds NDA_PROTOCOL. The same needs to happen in vxlan_fdb_info() — likely right after the existing NDA_VNI put.
Why I did it
This PR adds three critical Linux kernel patches required to enable EVPN VXLAN Multihoming in SONiC. These kernel enhancements provide the necessary infrastructure for:
These patches are essential for implementing the EVPN-MH feature as described in the EVPN VXLAN Multihoming HLD.
Work item tracking
How I did it
Added three kernel patches to patches-sonic directory:
1. NDA_FLAGS_EXT Support with NTF_EXT_MH_PEER_SYNC (0001-vxlan-bridge-Add-NDA_FLAGS_EXT-support-with-NTF_EXT_.patch)
This patch adds extended flags support for VXLAN and bridge FDB entries to enable multi-homing peer synchronization:
ext_flagsinvxlan_fdbstructureNTF_EXT_MH_PEER_SYNC- Indicates FDB entry is synchronized across EVPN-MH peersNEIGH_UPDATE_F_EXT_MH_PEER_SYNCfor propagating sync statevxlan_fdb_alloc()- Initialize ext_flagsvxlan_fdb_create()- Pass ext_flags parametervxlan_fdb_update_existing()- Handle ext_flags updates and notificationsvxlan_fdb_update_create()- Create FDB with ext_flagsvxlan_fdb_info()- Include NDA_FLAGS_EXT in netlink messagesFiles modified:
drivers/net/vxlan/vxlan_core.c(140 lines)drivers/net/vxlan/vxlan_private.h(21 lines)drivers/net/vxlan/vxlan_vnifilter.c(11 lines)include/net/neighbour.h(4 lines)include/uapi/linux/neighbour.h(1 line)net/bridge/br.c(4 lines)net/bridge/br_fdb.c(35 lines)net/bridge/br_private.h(5 lines)net/core/neighbour.c(13 lines)2. Protocol Field in Bridge FDB (0001-net-bridge-vxlan-Protocol-field-in-bridge-fdb.patch)
This patch introduces an optional "protocol" field for bridge FDB entries to distinguish between control plane and data plane learned MAC addresses:
Purpose: In EVPN Multihoming, MAC addresses can be learned via:
This distinction enables:
Implementation:
protocolinnet_bridge_fdb_entryandvxlan_fdbstructuresRTPROT_UNSPECwhen protocol not specified (backward compatible)Usage Example:
Files modified:
drivers/net/vxlan/vxlan_core.c(55 lines)drivers/net/vxlan/vxlan_private.h(5 lines)drivers/net/vxlan/vxlan_vnifilter.c(4 lines)net/bridge/br.c(2 lines)net/bridge/br_fdb.c(55 lines)net/bridge/br_private.h(5 lines)3. NTF_EXT_VALIDATED Flag for External Validation (0001-neighbor-Add-NTF_EXT_VALIDATED-flag-for-externally-v.patch)
This patch adds a new "extern_valid" neighbor flag to indicate entries learned and validated externally that should not be invalidated by the kernel:
Background: In EVPN multi-homing:
Solution (based on draft-rbickhart-evpn-ip-mac-proxy-adv-03):
Implementation:
NTF_EXT_VALIDATED(extern_valid) - Entry is externally validatedUse case: Required for EVPN-MH proxy advertisements where control plane needs full control over neighbor entry validity and removal decisions.
Files modified:
How to verify it
Build kernel with these patches applied:
cd sonic-linux-kernel make BLDENV=bookwormVerify NDA_FLAGS_EXT support:
Verify protocol field support:
Verify extern_valid flag:
Integration testing with EVPN-MH:
Compatibility testing:
Which release branch to backport (provide reason below if selected)
Tested branch (Please provide the tested image version)
Description for the changelog
Add kernel patches for EVPN VXLAN Multihoming: extended FDB flags (NTF_EXT_MH_PEER_SYNC), protocol field for bridge FDB entries, and extern_valid flag for externally validated neighbor entries
Link to config_db schema for YANG model changes
N/A - This PR only adds kernel patches, no CONFIG_DB schema changes
Depends on
Related upstream work
Summary:
Critical for EVPN-MH:
✅ Peer synchronization flag (NTF_EXT_MH_PEER_SYNC)
✅ Control/data plane MAC distinction (protocol field)
✅ External neighbor validation (extern_valid flag)
✅ Proxy advertisement support
✅ Prevents intermittent EVPN-MH failures