Description
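Lockdep reports a possible circular locking dependency in the BeeGFS kernel client module (beegfs). It is triggered from task dd (PID 897) during close(). Reporting call chain (outermost first): FhgfsInode_referenceHandle → __mutex_lock → lock_acquire → __lock_acquire → check_prev_add → check_noncircular → print_circular_bug. The cycle is an AB-BA inversion between FhgfsInode->fileHandlesLock (shown in the trace as lockdep class &this->mutex#9) and an rwsem identified here as a VFS i_rwsem (shown in the trace as lockdep class &this->rwSem#3). The two locks are taken in opposite orders by two paths. The open path (FhgfsOps_openReferenceHandle → FhgfsInode_referenceHandle) takes the rwsem via down_read, which lockdep recorded as depending on the mutex. The writeback path on close (filp_flush → __FhgfsOps_flush → FhgfsOpsHelper_flushCache → FhgfsOpsHelper_writeStatelessInode → FhgfsInode_referenceHandle) takes the rwsem first and then the mutex. Impact: if the two paths interleave so that each holds one lock and waits for the other, the kernel deadlocks, blocking the filesystem and any task that touches the affected mount.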
Version
- Linux kernel commit: 44331bd6a610
- BeeGFS version: 8.3.0
Backtrace (excerpt)
[ 274.037072] WARNING: possible circular locking dependency detected
[ 274.037681] 6.19.0-g990dca9c3b7a-dirty #48 Tainted: G O
[ 274.038338] ------------------------------------------------------
[ 274.038938] dd/897 is trying to acquire lock:
[ 274.039376] ff1100010971b3c8 (&this->mutex#9){+.+.}-{4:4}, at: FhgfsInode_referenceHandle+0x1c4/0x16a0 [beegfs]
[ 274.041228]
[ 274.041228] but task is already holding lock:
[ 274.041805] ff1100010971b668 (&this->rwSem#3){++++}-{4:4}, at: FhgfsOpsHelper_flushCache+0x22/0x50 [beegfs]
[ 274.043589]
[ 274.043589] which lock already depends on the new lock.
[ 274.043589]
[ 274.044367]
[ 274.044367] the existing dependency chain (in reverse order) is:
[ 274.045085]
[ 274.045085] -> #1 (&this->rwSem#3){++++}-{4:4}:
[ 274.045729] lock_acquire+0x150/0x2c0
[ 274.046201] down_read+0xa0/0x450
[ 274.046623] FhgfsInode_referenceHandle+0xc48/0x16a0 [beegfs]
[ 274.048049] FhgfsOps_openReferenceHandle+0x3d1/0x820 [beegfs]
[ 274.049471] do_dentry_open+0x69b/0x14d0
[ 274.049978] vfs_open+0x87/0x400
[ 274.050373] path_openat+0x216f/0x3160
[ 274.050856] do_file_open+0x22d/0x480
[ 274.051310] do_sys_openat2+0x106/0x1d0
[ 274.051776] __x64_sys_openat+0x146/0x200
[ 274.052247] do_syscall_64+0x111/0x690
[ 274.052697] entry_SYSCALL_64_after_hwframe+0x77/0x7f
[ 274.053293]
[ 274.053293] -> #0 (&this->mutex#9){+.+.}-{4:4}:
[ 274.053941] check_prev_add+0xeb/0xd00
[ 274.054387] __lock_acquire+0x1641/0x2260
[ 274.054869] lock_acquire+0x150/0x2c0
[ 274.055320] __mutex_lock+0x1a4/0x25c0
[ 274.055802] FhgfsInode_referenceHandle+0x1c4/0x16a0 [beegfs]
[ 274.057249] FhgfsOpsHelper_writeStatelessInode+0xc8/0x310 [beegfs]
[ 274.058779] __FhgfsOpsHelper_flushCacheUnlocked+0xfd/0x210 [beegfs]
[ 274.060286] FhgfsOpsHelper_flushCache+0x31/0x50 [beegfs]
[ 274.061699] __FhgfsOps_flush+0x331/0xd70 [beegfs]
[ 274.063061] filp_flush+0x12d/0x1d0
[ 274.063497] __x64_sys_close+0x84/0x120
[ 274.063962] do_syscall_64+0x111/0x690
[ 274.064409] entry_SYSCALL_64_after_hwframe+0x77/0x7f
[ 274.064976]
[ 274.064976] other info that might help us debug this:
[ 274.064976]
[ 274.065753] Possible unsafe locking scenario:
[ 274.065753]
[ 274.066326] CPU0 CPU1
[ 274.066785] ---- ----
[ 274.067229] lock(&this->rwSem#3);
[ 274.067606] lock(&this->mutex#9);
[ 274.068226] lock(&this->rwSem#3);
[ 274.068843] lock(&this->mutex#9);
[ 274.069223]
[ 274.069223] *** DEADLOCK ***
[ 274.069223]
[ 274.069803] 1 lock held by dd/897:
[ 274.070153] #0: ff1100010971b668 (&this->rwSem#3){++++}-{4:4}, at: FhgfsOpsHelper_flushCache+0x22/0x50 [beegfs]
[ 274.071983]
[ 274.071983] stack backtrace:
[ 274.072430] CPU: 7 UID: 0 PID: 897 Comm: dd Tainted: G O 6.19.0-g990dca9c3b7a-dirty #48 PREEMPT(lazy)
[ 274.072474] Tainted: [O]=OOT_MODULE
[ 274.072483] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1.1 04/01/2014
[ 274.072503] Call Trace:
[ 274.072520] <TASK>
[ 274.072535] dump_stack_lvl+0xc6/0x120
[ 274.072574] print_circular_bug+0x2d1/0x400
[ 274.072632] check_noncircular+0x146/0x160
[ 274.072691] check_prev_add+0xeb/0xd00
[ 274.072724] ? srso_alias_return_thunk+0x5/0xfbef5
[ 274.072758] ? get_reg+0x128/0x1a0
[ 274.072823] __lock_acquire+0x1641/0x2260
[ 274.072866] lock_acquire+0x150/0x2c0
[ 274.072894] ? FhgfsInode_referenceHandle+0x1c4/0x16a0 [beegfs]
[ 274.073722] ? __pfx___might_resched+0x10/0x10
[ 274.073768] __mutex_lock+0x1a4/0x25c0
[ 274.073813] ? FhgfsInode_referenceHandle+0x1c4/0x16a0 [beegfs]
[ 274.074619] ? srso_alias_return_thunk+0x5/0xfbef5
[ 274.074654] ? srso_alias_return_thunk+0x5/0xfbef5
[ 274.074689] ? FhgfsInode_referenceHandle+0x1c4/0x16a0 [beegfs]
[ 274.075505] ? __lock_acquire+0x466/0x2260
[ 274.075534] ? __pfx___mutex_lock+0x10/0x10
[ 274.075582] ? srso_alias_return_thunk+0x5/0xfbef5
[ 274.075615] ? __lock_acquire+0x466/0x2260
[ 274.075649] ? lock_acquire+0x150/0x2c0
[ 274.075676] ? is_bpf_text_address+0x2a/0x1e0
[ 274.075747] ? srso_alias_return_thunk+0x5/0xfbef5
[ 274.075781] ? lock_is_held_type+0x8f/0x100
[ 274.075832] ? FhgfsInode_referenceHandle+0x1c4/0x16a0 [beegfs]
[ 274.076640] ? srso_alias_return_thunk+0x5/0xfbef5
[ 274.076674] FhgfsInode_referenceHandle+0x1c4/0x16a0 [beegfs]
[ 274.077494] ? srso_alias_return_thunk+0x5/0xfbef5
[ 274.077528] ? filemap_get_folios_tag+0x428/0xab0
[ 274.077592] ? __pfx_FhgfsInode_referenceHandle+0x10/0x10 [beegfs]
[ 274.078417] ? __pfx_filemap_get_folios_tag+0x10/0x10
[ 274.078469] ? srso_alias_return_thunk+0x5/0xfbef5
[ 274.078503] ? _raw_spin_unlock+0x23/0x40
[ 274.078538] ? srso_alias_return_thunk+0x5/0xfbef5
[ 274.078575] ? writeback_single_inode+0x197/0x10d0
[ 274.078641] FhgfsOpsHelper_writeStatelessInode+0xc8/0x310 [beegfs]
[ 274.079466] ? __pfx_FhgfsOpsHelper_writeStatelessInode+0x10/0x10 [beegfs]
[ 274.080283] ? srso_alias_return_thunk+0x5/0xfbef5
[ 274.080317] ? lock_acquire+0x150/0x2c0
[ 274.080345] ? FhgfsOpsHelper_flushCache+0x22/0x50 [beegfs]
[ 274.081182] ? srso_alias_return_thunk+0x5/0xfbef5
[ 274.081223] __FhgfsOpsHelper_flushCacheUnlocked+0xfd/0x210 [beegfs]
[ 274.082050] FhgfsOpsHelper_flushCache+0x31/0x50 [beegfs]
[ 274.082873] __FhgfsOps_flush+0x331/0xd70 [beegfs]
[ 274.083692] ? __pfx___FhgfsOps_flush+0x10/0x10 [beegfs]
[ 274.084504] ? srso_alias_return_thunk+0x5/0xfbef5
[ 274.084551] ? file_close_fd+0x66/0x80
[ 274.084600] ? srso_alias_return_thunk+0x5/0xfbef5
[ 274.084633] ? lock_release+0xc7/0x270
[ 274.084658] ? srso_alias_return_thunk+0x5/0xfbef5
[ 274.084691] ? Logger_getLogLevel+0x12/0x40 [beegfs]
[ 274.085502] ? __pfx_FhgfsOps_flush+0x10/0x10 [beegfs]
[ 274.086316] filp_flush+0x12d/0x1d0
[ 274.086359] __x64_sys_close+0x84/0x120
[ 274.086391] do_syscall_64+0x111/0x690
[ 274.086425] entry_SYSCALL_64_after_hwframe+0x77/0x7f
[ 274.086455] RIP: 0033:0x7f02058b79e0
[ 274.086477] Code: 0d 00 00 00 eb b2 e8 0f f8 01 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 80 3d 01 2c 0e 00 00 74 17 b8 03 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 48 c3 0f 1f 80 00 00 00 00 48 83 ec 18 89 7c
[ 274.086504] RSP: 002b:00007ffcbdcbe8f8 EFLAGS: 00000202 ORIG_RAX: 0000000000000003
[ 274.086542] RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00007f02058b79e0
[ 274.086560] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000001
[ 274.086576] RBP: 00007f02057bc6c0 R08: 0000000000000007 R09: 000055af1e93b010
[ 274.086593] R10: 00007f02057d84f0 R11: 0000000000000202 R12: 0000000000000000
[ 274.086610] R13: 0000000000000000 R14: 0000000000000001 R15: 0000000000000001
[ 274.086647] </TASK>
[ 274.124619] beegfs: dd(897): NodeConn (acquire stream): Connected: beegfs-storage@10.0.0.2:8003 (protocol: TCP)
[ 274.147403] dd (897) used greatest stack depth: 20488 bytes left
[ 274.805132] beegfs: umount(905): App (stop components): Stopping components...
[ 277.671386] beegfs: beegfs_XNodeSyn(879): Deregistration: Node deregistration successful.
[ 277.672462] beegfs: umount(905): App (stop): All components stopped.
[ 277.685370] beegfs: umount(905): BeeGFS unmounted.
[ 282.292078] beegfs: mount(955): Built without NVFS RDMA support.
[ 282.294479]