[LTS 9.4] CVE-2025-39697, CVE-2025-38248 #1210
Open
pvts-mat wants to merge 6 commits into ctrliq:ciqlts9_4 from
…functions

jira VULN-72330
cve-pre CVE-2025-38248
commit-author Yong Wang <yongwang@nvidia.com>
commit 4b30ae9

When a bridge port STP state is changed from BLOCKING/DISABLED to FORWARDING, the port's igmp query timer will NOT re-arm itself if the bridge has been configured as per-VLAN multicast snooping.

Solve this by choosing the correct multicast context(s) to enable/disable port multicast based on whether per-VLAN multicast snooping is enabled or not, i.e. using per-{port, VLAN} context in case of per-VLAN multicast snooping by re-implementing br_multicast_enable_port() and br_multicast_disable_port() functions.

Before the patch, the IGMP query does not happen in the last step of the following test sequence, i.e. no growth for tx counter:

 # ip link add name br1 up type bridge vlan_filtering 1 mcast_snooping 1 mcast_vlan_snooping 1 mcast_querier 1 mcast_stats_enabled 1
 # bridge vlan global set vid 1 dev br1 mcast_snooping 1 mcast_querier 1 mcast_query_interval 100 mcast_startup_query_count 0
 # ip link add name swp1 up master br1 type dummy
 # bridge link set dev swp1 state 0
 # ip -j -p stats show dev swp1 group xstats_slave subgroup bridge suite mcast | jq '.[]["multicast"]["igmp_queries"]["tx_v2"]'
 1
 # sleep 1
 # ip -j -p stats show dev swp1 group xstats_slave subgroup bridge suite mcast | jq '.[]["multicast"]["igmp_queries"]["tx_v2"]'
 1
 # bridge link set dev swp1 state 3
 # sleep 2
 # ip -j -p stats show dev swp1 group xstats_slave subgroup bridge suite mcast | jq '.[]["multicast"]["igmp_queries"]["tx_v2"]'
 1

After the patch, the IGMP query happens in the last step of the test:

 # ip link add name br1 up type bridge vlan_filtering 1 mcast_snooping 1 mcast_vlan_snooping 1 mcast_querier 1 mcast_stats_enabled 1
 # bridge vlan global set vid 1 dev br1 mcast_snooping 1 mcast_querier 1 mcast_query_interval 100 mcast_startup_query_count 0
 # ip link add name swp1 up master br1 type dummy
 # bridge link set dev swp1 state 0
 # ip -j -p stats show dev swp1 group xstats_slave subgroup bridge suite mcast | jq '.[]["multicast"]["igmp_queries"]["tx_v2"]'
 1
 # sleep 1
 # ip -j -p stats show dev swp1 group xstats_slave subgroup bridge suite mcast | jq '.[]["multicast"]["igmp_queries"]["tx_v2"]'
 1
 # bridge link set dev swp1 state 3
 # sleep 2
 # ip -j -p stats show dev swp1 group xstats_slave subgroup bridge suite mcast | jq '.[]["multicast"]["igmp_queries"]["tx_v2"]'
 3

Signed-off-by: Yong Wang <yongwang@nvidia.com>
Reviewed-by: Andy Roulin <aroulin@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 4b30ae9)
Signed-off-by: Marcin Wcisło <marcin.wcislo@conclusive.pl>
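The before/after comparison in the commit message above comes down to reading the tx_v2 counter twice and checking whether it grew. A minimal sketch of that check follows, reusing the ip/jq pipeline from the commit message. The function names read_tx_v2, counter_grew and report_rearm are hypothetical helpers for illustration; they are not part of the patch or of iproute2.

```shell
#!/bin/sh
# Hypothetical helpers (not part of the patch) for the test sequence above.

# Print the IGMPv2 query tx counter of a bridge port.
# Same ip/jq pipeline as in the commit message; needs iproute2 and jq.
read_tx_v2() {
    ip -j -p stats show dev "$1" group xstats_slave subgroup bridge suite mcast |
        jq '.[]["multicast"]["igmp_queries"]["tx_v2"]'
}

# Succeed if the counter grew between two readings (before, after).
counter_grew() {
    [ "$2" -gt "$1" ]
}

# Report whether the querier re-armed, given readings taken before and
# after the port was put back into the forwarding state.
report_rearm() {
    if counter_grew "$1" "$2"; then
        echo "re-armed: tx_v2 $1 -> $2"
    else
        echo "NOT re-armed: tx_v2 stuck at $2"
    fi
}
```

On a test machine this could be driven as `before=$(read_tx_v2 swp1)`, then `bridge link set dev swp1 state 3`, `sleep 2`, and `report_rearm "$before" "$(read_tx_v2 swp1)"`. With the readings quoted in the commit message, 1 followed by 1 corresponds to the unpatched kernel and 1 followed by 3 to the patched one.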
jira VULN-72330
cve-pre CVE-2025-38248
commit-author Yong Wang <yongwang@nvidia.com>
commit 6c13104

When the vlan STP state is changed, which could be manipulated by "bridge vlan" commands, similar to port STP state, this also impacts multicast behaviors such as igmp query. In the scenario of per-VLAN snooping, there's a need to update the corresponding multicast context to re-arm the port query timer when vlan state becomes "forwarding" etc.

Update br_vlan_set_state() function to enable vlan multicast context in such scenario.

Before the patch, the IGMP query does not happen in the last step of the following test sequence, i.e. no growth for tx counter:

 # ip link add name br1 up type bridge vlan_filtering 1 mcast_snooping 1 mcast_vlan_snooping 1 mcast_querier 1 mcast_stats_enabled 1
 # bridge vlan global set vid 1 dev br1 mcast_snooping 1 mcast_querier 1 mcast_query_interval 100 mcast_startup_query_count 0
 # ip link add name swp1 up master br1 type dummy
 # sleep 1
 # bridge vlan set vid 1 dev swp1 state 4
 # ip -j -p stats show dev swp1 group xstats_slave subgroup bridge suite mcast | jq '.[]["multicast"]["igmp_queries"]["tx_v2"]'
 1
 # sleep 1
 # ip -j -p stats show dev swp1 group xstats_slave subgroup bridge suite mcast | jq '.[]["multicast"]["igmp_queries"]["tx_v2"]'
 1
 # bridge vlan set vid 1 dev swp1 state 3
 # sleep 2
 # ip -j -p stats show dev swp1 group xstats_slave subgroup bridge suite mcast | jq '.[]["multicast"]["igmp_queries"]["tx_v2"]'
 1

After the patch, the IGMP query happens in the last step of the test:

 # ip link add name br1 up type bridge vlan_filtering 1 mcast_snooping 1 mcast_vlan_snooping 1 mcast_querier 1 mcast_stats_enabled 1
 # bridge vlan global set vid 1 dev br1 mcast_snooping 1 mcast_querier 1 mcast_query_interval 100 mcast_startup_query_count 0
 # ip link add name swp1 up master br1 type dummy
 # sleep 1
 # bridge vlan set vid 1 dev swp1 state 4
 # ip -j -p stats show dev swp1 group xstats_slave subgroup bridge suite mcast | jq '.[]["multicast"]["igmp_queries"]["tx_v2"]'
 1
 # sleep 1
 # ip -j -p stats show dev swp1 group xstats_slave subgroup bridge suite mcast | jq '.[]["multicast"]["igmp_queries"]["tx_v2"]'
 1
 # bridge vlan set vid 1 dev swp1 state 3
 # sleep 2
 # ip -j -p stats show dev swp1 group xstats_slave subgroup bridge suite mcast | jq '.[]["multicast"]["igmp_queries"]["tx_v2"]'
 3

Signed-off-by: Yong Wang <yongwang@nvidia.com>
Reviewed-by: Andy Roulin <aroulin@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 6c13104)
Signed-off-by: Marcin Wcisło <marcin.wcislo@conclusive.pl>
jira VULN-72330
cve CVE-2025-38248
commit-author Ido Schimmel <idosch@nvidia.com>
commit 7544f3f
upstream-diff Context conflicts due to missing 8fa7292 ("treewide: Switch/rename to timer_delete[_sync]()"). No real diffs from upstream

The bridge maintains a global list of ports behind which a multicast router resides. The list is consulted during forwarding to ensure multicast packets are forwarded to these ports even if the ports are not member in the matching MDB entry.

When per-VLAN multicast snooping is enabled, the per-port multicast context is disabled on each port and the port is removed from the global router port list:

 # ip link add name br1 up type bridge vlan_filtering 1 mcast_snooping 1
 # ip link add name dummy1 up master br1 type dummy
 # ip link set dev dummy1 type bridge_slave mcast_router 2
 $ bridge -d mdb show | grep router
 router ports on br1: dummy1
 # ip link set dev br1 type bridge mcast_vlan_snooping 1
 $ bridge -d mdb show | grep router

However, the port can be re-added to the global list even when per-VLAN multicast snooping is enabled:

 # ip link set dev dummy1 type bridge_slave mcast_router 0
 # ip link set dev dummy1 type bridge_slave mcast_router 2
 $ bridge -d mdb show | grep router
 router ports on br1: dummy1

Since commit 4b30ae9 ("net: bridge: mcast: re-implement br_multicast_{enable, disable}_port functions"), when per-VLAN multicast snooping is enabled, multicast disablement on a port will disable the per-{port, VLAN} multicast contexts and not the per-port one. As a result, a port will remain in the global router port list even after it is deleted. This will lead to a use-after-free [1] when the list is traversed (when adding a new port to the list, for example):

 # ip link del dev dummy1
 # ip link add name dummy2 up master br1 type dummy
 # ip link set dev dummy2 type bridge_slave mcast_router 2

Similarly, stale entries can also be found in the per-VLAN router port list. When per-VLAN multicast snooping is disabled, the per-{port, VLAN} contexts are disabled on each port and the port is removed from the per-VLAN router port list:

 # ip link add name br1 up type bridge vlan_filtering 1 mcast_snooping 1 mcast_vlan_snooping 1
 # ip link add name dummy1 up master br1 type dummy
 # bridge vlan add vid 2 dev dummy1
 # bridge vlan global set vid 2 dev br1 mcast_snooping 1
 # bridge vlan set vid 2 dev dummy1 mcast_router 2
 $ bridge vlan global show dev br1 vid 2 | grep router
 router ports: dummy1
 # ip link set dev br1 type bridge mcast_vlan_snooping 0
 $ bridge vlan global show dev br1 vid 2 | grep router

However, the port can be re-added to the per-VLAN list even when per-VLAN multicast snooping is disabled:

 # bridge vlan set vid 2 dev dummy1 mcast_router 0
 # bridge vlan set vid 2 dev dummy1 mcast_router 2
 $ bridge vlan global show dev br1 vid 2 | grep router
 router ports: dummy1

When the VLAN is deleted from the port, the per-{port, VLAN} multicast context will not be disabled since multicast snooping is not enabled on the VLAN. As a result, the port will remain in the per-VLAN router port list even after it is no longer member in the VLAN. This will lead to a use-after-free [2] when the list is traversed (when adding a new port to the list, for example):

 # ip link add name dummy2 up master br1 type dummy
 # bridge vlan add vid 2 dev dummy2
 # bridge vlan del vid 2 dev dummy1
 # bridge vlan set vid 2 dev dummy2 mcast_router 2

Fix these issues by removing the port from the relevant (global or per-VLAN) router port list in br_multicast_port_ctx_deinit(). The function is invoked during port deletion with the per-port multicast context and during VLAN deletion with the per-{port, VLAN} multicast context.

Note that deleting the multicast router timer is not enough as it only takes care of the temporary multicast router states (1 or 3) and not the permanent one (2).

[1]
BUG: KASAN: slab-out-of-bounds in br_multicast_add_router.part.0+0x3f1/0x560
Write of size 8 at addr ffff888004a67328 by task ip/384
[...]
Call Trace:
 <TASK>
 dump_stack_lvl+0x6f/0xa0
 print_address_description.constprop.0+0x6f/0x350
 print_report+0x108/0x205
 kasan_report+0xdf/0x110
 br_multicast_add_router.part.0+0x3f1/0x560
 br_multicast_set_port_router+0x74e/0xac0
 br_setport+0xa55/0x1870
 br_port_slave_changelink+0x95/0x120
 __rtnl_newlink+0x5e8/0xa40
 rtnl_newlink+0x627/0xb00
 rtnetlink_rcv_msg+0x6fb/0xb70
 netlink_rcv_skb+0x11f/0x350
 netlink_unicast+0x426/0x710
 netlink_sendmsg+0x75a/0xc20
 __sock_sendmsg+0xc1/0x150
 ____sys_sendmsg+0x5aa/0x7b0
 ___sys_sendmsg+0xfc/0x180
 __sys_sendmsg+0x124/0x1c0
 do_syscall_64+0xbb/0x360
 entry_SYSCALL_64_after_hwframe+0x4b/0x53

[2]
BUG: KASAN: slab-use-after-free in br_multicast_add_router.part.0+0x378/0x560
Read of size 8 at addr ffff888009f00840 by task bridge/391
[...]
Call Trace:
 <TASK>
 dump_stack_lvl+0x6f/0xa0
 print_address_description.constprop.0+0x6f/0x350
 print_report+0x108/0x205
 kasan_report+0xdf/0x110
 br_multicast_add_router.part.0+0x378/0x560
 br_multicast_set_port_router+0x6f9/0xac0
 br_vlan_process_options+0x8b6/0x1430
 br_vlan_rtm_process_one+0x605/0xa30
 br_vlan_rtm_process+0x396/0x4c0
 rtnetlink_rcv_msg+0x2f7/0xb70
 netlink_rcv_skb+0x11f/0x350
 netlink_unicast+0x426/0x710
 netlink_sendmsg+0x75a/0xc20
 __sock_sendmsg+0xc1/0x150
 ____sys_sendmsg+0x5aa/0x7b0
 ___sys_sendmsg+0xfc/0x180
 __sys_sendmsg+0x124/0x1c0
 do_syscall_64+0xbb/0x360
 entry_SYSCALL_64_after_hwframe+0x4b/0x53

Fixes: 2796d84 ("net: bridge: vlan: convert mcast router global option to per-vlan entry")
Fixes: 4b30ae9 ("net: bridge: mcast: re-implement br_multicast_{enable, disable}_port functions")
Reported-by: syzbot+7bfa4b72c6a5da128d32@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/all/684c18bd.a00a0220.279073.000b.GAE@google.com/T/
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Link: https://patch.msgid.link/20250619182228.1656906-1-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
(cherry picked from commit 7544f3f)
Signed-off-by: Marcin Wcisło <marcin.wcislo@conclusive.pl>
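The two sequences described in the commit message above can be collected into a pair of shell functions. This is only a sketch assembled from the commands quoted there, with the informational `bridge ... show` steps left out; it is not the replication script used in this PR. The `run` wrapper and the DRY_RUN switch are hypothetical conveniences: with DRY_RUN=1 the commands are printed instead of executed, so the sequence can be inspected without root privileges.

```shell
#!/bin/sh
# Sketch of the two reproduction sequences from the 7544f3f commit message.
# `run` and DRY_RUN are hypothetical: DRY_RUN=1 prints the commands only;
# otherwise they are executed (needs root; KASAN is needed to see reports).
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# [1]: stale entry in the global router port list.
repro_global_list() {
    run ip link add name br1 up type bridge vlan_filtering 1 mcast_snooping 1
    run ip link add name dummy1 up master br1 type dummy
    run ip link set dev dummy1 type bridge_slave mcast_router 2
    run ip link set dev br1 type bridge mcast_vlan_snooping 1
    # Re-add the port to the global list while per-VLAN snooping is enabled.
    run ip link set dev dummy1 type bridge_slave mcast_router 0
    run ip link set dev dummy1 type bridge_slave mcast_router 2
    # Delete the port, then traverse the list by adding another router port.
    run ip link del dev dummy1
    run ip link add name dummy2 up master br1 type dummy
    run ip link set dev dummy2 type bridge_slave mcast_router 2
}

# [2]: stale entry in the per-VLAN router port list.
repro_vlan_list() {
    run ip link add name br1 up type bridge vlan_filtering 1 mcast_snooping 1 mcast_vlan_snooping 1
    run ip link add name dummy1 up master br1 type dummy
    run bridge vlan add vid 2 dev dummy1
    run bridge vlan global set vid 2 dev br1 mcast_snooping 1
    run bridge vlan set vid 2 dev dummy1 mcast_router 2
    run ip link set dev br1 type bridge mcast_vlan_snooping 0
    # Re-add the port to the per-VLAN list while per-VLAN snooping is disabled.
    run bridge vlan set vid 2 dev dummy1 mcast_router 0
    run bridge vlan set vid 2 dev dummy1 mcast_router 2
    # Remove the VLAN from the port, then traverse the list.
    run ip link add name dummy2 up master br1 type dummy
    run bridge vlan add vid 2 dev dummy2
    run bridge vlan del vid 2 dev dummy1
    run bridge vlan set vid 2 dev dummy2 mcast_router 2
}
```

Both functions create a bridge named br1, so each is meant to run on a freshly booted test machine rather than back to back. Setting `DRY_RUN=1` before calling either function lists the commands without touching the network configuration.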
jira VULN-136536
cve-pre CVE-2025-39697
commit-author Trond Myklebust <trond.myklebust@hammerspace.com>
commit b193a78

Ensure that nfs_clear_request_commit() updates the correct counters when it removes them from the commit list.

Fixes: ed5d588 ("NFS: Try to join page groups before an O_DIRECT retransmission")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
(cherry picked from commit b193a78)
Signed-off-by: Marcin Wcisło <marcin.wcislo@conclusive.pl>
…ests

jira VULN-136536
cve-pre CVE-2025-39697
commit-author Christoph Hellwig <hch@lst.de>
commit 25edbca
upstream-diff Used linux-6.6.y backport 9a1963404cc2eef69d2f8a42861bdf63d087dd5d for the clean cherry pick

Fold nfs_page_group_lock_subrequests into nfs_lock_and_join_requests to prepare for future changes to this code, and move the helpers to write.c as well.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
(cherry picked from commit 9a1963404cc2eef69d2f8a42861bdf63d087dd5d)
Signed-off-by: Marcin Wcisło <marcin.wcislo@conclusive.pl>
jira VULN-136536
cve CVE-2025-39697
commit-author Trond Myklebust <trond.myklebust@hammerspace.com>
commit 76d2e38
upstream-diff Used linux-6.6.y backport 181feb41f0b268e6288bf9a7b984624d7fe2031d for the clean cherry pick

After nfs_lock_and_join_requests() tests for whether the request is still attached to the mapping, nothing prevents a call to nfs_inode_remove_request() from succeeding until we actually lock the page group. The reason is that whoever called nfs_inode_remove_request() doesn't necessarily have a lock on the page group head.

So in order to avoid races, let's take the page group lock earlier in nfs_lock_and_join_requests(), and hold it across the removal of the request in nfs_inode_remove_request().

Reported-by: Jeff Layton <jlayton@kernel.org>
Tested-by: Joe Quanaim <jdq@meta.com>
Tested-by: Andrew Steffen <aksteffen@meta.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Fixes: bd37d6f ("NFSv4: Convert nfs_lock_and_join_requests() to use nfs_page_find_head_request()")
Cc: stable@vger.kernel.org
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
(cherry picked from commit 181feb41f0b268e6288bf9a7b984624d7fe2031d)
Signed-off-by: Marcin Wcisło <marcin.wcislo@conclusive.pl>
[LTS 9.4]
Commits
CVE-2025-39697
0:
1:
2:
The fix 76d2e38 expands the locked range in the nfs_lock_and_join_requests() function, over code which in ciqlts9_4 was not inlined and remains in the nfs_page_group_lock_subrequests() function. Before it can be applied, the nfs_page_group_lock_subrequests() call must be inlined - this is what "nfs: fold nfs_page_group_lock_subrequests into nfs_lock_and_join_requests" accomplishes. The problem is that nfs_lock_and_join_requests() undergoes frequent changes, and at the time of upstream's 25edbca it differs substantially from the LTS 9.4 version. A stable Linux version exists with CVE-2025-39697 backported, whose NFS timeline closely matches that of LTS 9.4 - linux-6.6.y:

Except for handling the struct nfs_commit_info object, the nfs_lock_and_join_requests() function is the same in ciqlts9_4 and linux-6.6.y. Commit (2) "NFS: Use the correct commit info in nfs_join_page_group()" nulls out this difference and allows for a clean cherry pick of the linux-6.6.y version of 25edbca. This linux-6.6.y backport introduces an nfs_init_cinfo_from_inode() call which seems to be redundant, but it was left in for the sake of simplicity and compatibility with the following commit. With the linux-6.6.y-flavored 25edbca in place, the CVE-2025-39697 fix from linux-6.6.y applied cleanly as well.

The linux-6.6.y backports may be confusing, because they mix in elements from other commits as well. This can be best explained with the table of commits listing upstream changes to the nfs_lock_and_join_requests() function, from the upstream CVE-2025-39697 fix 76d2e38 down to the version found in ciqlts9_4:

The end result - at the moment of the "NFS: Fix a race when updating an existing write" fix - is the nfs_lock_and_join_requests() function being the same in linux-6.6.y and the upstream (kernel-src-tree/fs/nfs/write.c, lines 553 to 609 in 76d2e38), up to two lines:

inode object through folio_file_mapping() call instead of direct folio field access (compared between linux-6.6.y and kernel-mainline)
!folio_test_swapcache(folio) condition in the request removal test (compared between linux-6.6.y and kernel-mainline)

Both of these differences are the result of the lack of backported commit 7e8e78a - directly for the first one and indirectly for the second one, as the above branch was part of the inlined nfs_folio_find_and_lock_request() function, with the folio_test_swapcache(folio) check removed in that commit.

CVE-2025-38248
For CVE-2025-38248 on LTS 9.4 the situation is very similar to LTS 9.6. In summary, "bridge: mcast: Fix use-after-free during router port configuration" fixes the bug, but it assumes "net: bridge: mcast: re-implement br_multicast_{enable, disable}_port functions" is in place. That commit is part of the problem but not its only cause, so the bug applies to ciqlts9_4 even without it. "net: bridge: mcast: update multicast contex when vlan state is changed" is pulled in for completeness. See #1163 for details.

Bug replication was done with KASAN enabled:

Three versions of the kernel were tested: ciqlts9_4; ciqlts9_4 with "net: bridge: mcast: re-implement br_multicast_{enable, disable}_port functions" backported; and ciqlts9_4 as in this PR.

The 7544f3f commit addresses two use-after-free bugs, denoted [1] and [2], each with its own replication script (these may not be minimal, but are sufficient):
[1]:
[2]:
This gave 2 x 3 = 6 test results, which can be summarized as follows:
This confirms that ciqlts9_4 is affected by CVE-2025-38248 and that 7544f3f fixes the problem, while 4b30ae9 may exacerbate it temporarily, but may be needed for 7544f3f as a prerequisite.

kABI check: passed
Boot test: passed
boot-test.log
Kselftests: passed relative
Reference
kselftests–ciqlts9_4–run1.log
Patch
kselftests–ciqlts9_4-CVE-batch-31–run1.log
kselftests–ciqlts9_4-CVE-batch-31–run2.log
Comparison
The test results for the reference and the patch are the same.
selftests-cmp.txt
Footnotes
1 CVE-2025-38248-repl-1–ciqlts9_4.log
2 CVE-2025-38248-repl-2–ciqlts9_4.log
3 CVE-2025-38248-repl-1–ciqlts9_4-halfpatch.log
4 CVE-2025-38248-repl-2–ciqlts9_4-halfpatch.log
5 CVE-2025-38248-repl-1–ciqlts9_4-patch.log
6 CVE-2025-38248-repl-2–ciqlts9_4-patch.log
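
Replication logs like the ones listed in the footnotes could be checked for KASAN reports with a small helper. The sketch below assumes the logs contain kernel console output in the format quoted in the 7544f3f commit message ("BUG: KASAN: ... in br_multicast_add_router..."); the contents of the attached logs are not shown here, and the function names kasan_router_hits and kasan_verdict are hypothetical.

```shell
#!/bin/sh
# Hypothetical helpers: look for KASAN reports raised in
# br_multicast_add_router() in a saved console log.

# Print the number of matching report lines (0 if none or unreadable).
kasan_router_hits() {
    hits=$(grep -c 'BUG: KASAN: .* in br_multicast_add_router' "$1" 2>/dev/null) || hits=0
    echo "$hits"
}

# Print a one-line verdict for a log file.
kasan_verdict() {
    n=$(kasan_router_hits "$1")
    if [ "$n" -gt 0 ]; then
        echo "$1: $n KASAN report(s)"
    else
        echo "$1: clean"
    fi
}
```

Calling `kasan_verdict` on each of the six logs would give one line per kernel version and replication script, which is the shape of the 2 x 3 comparison described above.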