Environment
- BeeGFS version: 8.2.2
- OS (servers): Ubuntu 22.04, kernel 5.15.0-164-generic
- OS (clients): Ubuntu 22.04, kernel 5.15.0-25-generic
- Hardware: 5 meta/storage servers (32 cores, 192GB RAM, NVMe, ConnectX-7 100GbE)
- Clients: 2 (one RDMA via ConnectX-6 Dx, one TCP-only 10GbE)
- Meta storage: ext4 on NVMe
Description
When a client opens files with 128 concurrent threads accessing the same directory, the metadata server occasionally returns ENOENT for files that definitely exist. The files are immediately accessible on retry.
Reproduction
- A directory containing 84-1400 files (e.g., JPEG images)
- A multi-threaded application opens files listed in a CSV, using 128 threads
- Approximately 1-2 out of every 9,000 open() calls return ENOENT
- The failing file varies randomly between runs; it is never the same file twice
- Immediately after the failure, the same file can be opened successfully
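A minimal reproducer in the spirit of the steps above can be sketched in Python. It is not the original application. The CSV layout (one path per row, in the first column) and the names `try_open`, `stress_open`, `load_paths` and `report` are hypothetical. Python releases the GIL while a thread is blocked in the open() system call, so a 128-worker thread pool does issue concurrent opens against the mount.

```python
import csv
import errno
import os
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def try_open(path):
    """Open and close one file; return the errno on failure, or None on success."""
    try:
        fd = os.open(path, os.O_RDONLY)
    except OSError as exc:
        return exc.errno
    os.close(fd)
    return None


def stress_open(paths, threads=128):
    """Open every path once from a pool of worker threads.

    Returns the list of paths whose open() failed with ENOENT, plus a
    Counter of every errno seen (to spot other error types as well).
    """
    with ThreadPoolExecutor(max_workers=threads) as pool:
        results = list(pool.map(try_open, paths))  # map preserves input order
    errors = Counter(e for e in results if e is not None)
    enoent = [p for p, e in zip(paths, results) if e == errno.ENOENT]
    return enoent, errors


def load_paths(csv_path, column=0):
    """Read file paths from one column of a CSV file (assumed layout)."""
    with open(csv_path, newline="") as f:
        return [row[column] for row in csv.reader(f) if row]


def report(csv_path, threads=128):
    """Run one pass over the CSV and print every path that got ENOENT."""
    paths = load_paths(csv_path)
    enoent, errors = stress_open(paths, threads)
    for p in enoent:
        # Check whether the "missing" file is visible again right away.
        print(f"ENOENT: {p} (exists now: {os.path.exists(p)})")
    print(f"{len(enoent)} ENOENT of {len(paths)} opens; errno counts: {dict(errors)}")
    return len(enoent)
```

Calling `report("files.csv")` (hypothetical file name) several times in a row should, if the behaviour matches this report, show a different path failing on each pass with `exists now: True`.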
Example output (128-thread application reading 9,428 files across multiple directories):
Warning: Failed to open file for reading!
File: /mnt/bee/data/ins_inf/ecu/edualexi/e7d6f1d67ea62cc29eb669e91dd1baf94b5a9f4a.jpg
Files: 9428 Templates: 9427
The file exists and is world-readable:
-rwxrwxrwx+ 1 ben ben 131113 Mar 6 2023 /mnt/bee/data/.../file.jpg
Key findings
- No client-side communication errors: dmesg on the client shows zero BeeGFS messages during the failure. The meta server sends a clean ENOENT response (not a connection timeout or retry).
- Happens across multiple meta targets: Failures occur on directories owned by different meta nodes (m:1, m:3, etc.), ruling out a single faulty server.
- Happens with both RDMA and TCP clients: Reproduced on a 100GbE RDMA client and a 10GbE TCP-only client.
- Not related to caching: Setting tuneENOENTCacheValidityMS=0, tuneFileSubentryCacheValidityMS=0, and tuneDirSubentryCacheValidityMS=0 on the client does not fix it.
- Not related to server capacity: Meta servers are 95-99% CPU idle during reproduction. Raising connMaxInternodeNum (64→256) and tuneNumStreamListeners (8→32), setting tuneNumWorkers to 128, and enabling tuneUsePerUserMsgQueues=true did not fix it.
- Workaround: An LD_PRELOAD shim that retries open() on ENOENT after a 2ms delay succeeds on the first retry every time, confirming that the ENOENT is transient and clears within the ~2ms retry delay.
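The shim's source is not part of this report. As a hedged sketch, the same retry policy can be expressed at the application level in Python. Only the 2 ms delay comes from the report; the function name `open_with_enoent_retry` and the `retries=3` limit are assumptions. Returning the number of attempts makes it easy to check the "succeeds on the first retry" observation (attempts == 2).

```python
import errno
import os
import time


def open_with_enoent_retry(path, flags=os.O_RDONLY, retries=3, delay_s=0.002):
    """Open path, retrying only when open() fails with ENOENT.

    Application-level analogue of the LD_PRELOAD shim's policy (assumed
    retry limit; 2 ms delay as in the report). Returns (fd, attempts).
    Any other error, or an ENOENT that persists after all retries,
    is re-raised unchanged.
    """
    for attempt in range(retries + 1):
        try:
            return os.open(path, flags), attempt + 1
        except OSError as exc:
            if exc.errno != errno.ENOENT or attempt == retries:
                raise
            time.sleep(delay_s)
```

Like the shim, this only masks the symptom on the client; a file that is genuinely absent still fails, just `retries * delay_s` seconds later.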
Expected behavior
The meta server should never return ENOENT for a file that exists, regardless of concurrent lookup load on the same directory.
Suspected cause
A race condition in the meta server's internal directory-entry lookup when many requests access the same directory concurrently. The per-directory locking or hash walk may briefly return "not found" while another operation on that directory is in progress.