Bug 2166364

Summary: NFS hang on large dirs with kernel 4.18.0-448.el8.x86_64
Product: Red Hat Enterprise Linux 8 Reporter: dm
Component: kernel Assignee: Benjamin Coddington <bcodding>
kernel sub component: NFS QA Contact: Yongcheng Yang <yoyang>
Status: CLOSED ERRATA Docs Contact:
Severity: high    
Priority: unspecified CC: aokuliar, bcodding, bstinson, jwboyer, nfs-team, wderick, xzhou, yieli, yoyang
Version: CentOS Stream Keywords: Regression, Triaged
Target Milestone: rc Flags: pm-rhel: mirror+
Target Release: 8.8   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: kernel-4.18.0-463.el8 Doc Type: If docs needed, set a value
Last Closed: 2023-05-16 09:01:04 UTC Type: Bug

Description dm 2023-02-01 15:35:16 UTC
Description of problem:
NFS hangs on large directory access.

Version-Release number of selected component (if applicable):
CentOS Stream 8, fully updated

How reproducible: always


Steps to Reproduce:
1. Mount an NFS export
2. cd into a large directory (~500000 files)
3. Run ls
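
The steps above can be scripted. The following is a minimal sketch, not the reporter's actual procedure: the function names are made up for illustration, and the target directory is assumed to live on the NFS mount. os.scandir on Linux normally ends up in the same getdents64 system call that appears in the trace below.

```python
import os
import time


def populate(directory, count):
    """Create `count` empty files in `directory` (assumed to be on the NFS export)."""
    os.makedirs(directory, exist_ok=True)
    for i in range(count):
        # Mode "w" creates the file if missing and is harmless on reruns.
        with open(os.path.join(directory, f"file{i:07d}"), "w"):
            pass


def count_entries(directory):
    """List the directory once and return (entry count, elapsed seconds)."""
    start = time.monotonic()
    with os.scandir(directory) as entries:
        n = sum(1 for _ in entries)
    return n, time.monotonic() - start
```

For example, populate("/mnt/nfs/bigdir", 500000) followed by count_entries("/mnt/nfs/bigdir"), where /mnt/nfs/bigdir is a hypothetical path. On an affected kernel the second call would be expected to block in state D rather than return.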

Actual results:
Never returns listings.
Unable to interrupt using Ctrl-C

Message in syslog:

Feb  1 09:49:45 centos50 kernel: INFO: task bash:1326 blocked for more than 120 seconds.
Feb  1 09:49:45 centos50 kernel:      Not tainted 4.18.0-448.el8.x86_64 #1
Feb  1 09:49:45 centos50 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Feb  1 09:49:45 centos50 kernel: task:bash            state:D stack:    0 pid: 1326 ppid:  1325 flags:0x00004084
Feb  1 09:49:45 centos50 kernel: Call Trace:
Feb  1 09:49:45 centos50 kernel: __schedule+0x2d1/0x870
Feb  1 09:49:45 centos50 kernel: schedule+0x55/0xf0
Feb  1 09:49:45 centos50 kernel: io_schedule+0x12/0x40
Feb  1 09:49:45 centos50 kernel: __lock_page+0x12d/0x230
Feb  1 09:49:45 centos50 kernel: ? file_fdatawait_range+0x20/0x20
Feb  1 09:49:45 centos50 kernel: pagecache_get_page+0x1e6/0x310
Feb  1 09:49:45 centos50 kernel: nfs_readdir_page_get_locked+0x38/0xe0 [nfs]
Feb  1 09:49:45 centos50 kernel: nfs_readdir_page_filler+0x215/0x410 [nfs]
Feb  1 09:49:45 centos50 kernel: nfs_readdir_xdr_to_array+0x2d9/0x310 [nfs]
Feb  1 09:49:45 centos50 kernel: nfs_readdir+0x26a/0xda0 [nfs]
Feb  1 09:49:45 centos50 kernel: ? update_load_avg+0x7e/0x710
Feb  1 09:49:45 centos50 kernel: iterate_dir+0x144/0x1a0
Feb  1 09:49:45 centos50 kernel: ksys_getdents64+0x9c/0x130
Feb  1 09:49:45 centos50 kernel: ? iterate_dir+0x1a0/0x1a0
Feb  1 09:49:45 centos50 kernel: __x64_sys_getdents64+0x16/0x20
Feb  1 09:49:45 centos50 kernel: do_syscall_64+0x5b/0x1b0
Feb  1 09:49:45 centos50 kernel: entry_SYSCALL_64_after_hwframe+0x61/0xc6
Feb  1 09:49:45 centos50 kernel: RIP: 0033:0x7fbe68a4436b
Feb  1 09:49:45 centos50 kernel: Code: Unable to access opcode bytes at RIP 0x7fbe68a44341.
Feb  1 09:49:45 centos50 kernel: RSP: 002b:00007fffe85a09a8 EFLAGS: 00000246 ORIG_RAX: 00000000000000d9
Feb  1 09:49:45 centos50 kernel: RAX: ffffffffffffffda RBX: 000055f0ef7af4e0 RCX: 00007fbe68a4436b
Feb  1 09:49:45 centos50 kernel: RDX: 0000000000100000 RSI: 000055f0ef7af510 RDI: 0000000000000003
Feb  1 09:49:45 centos50 kernel: RBP: 000055f0ef7af510 R08: 0000000000000005 R09: 00007fbe68d0ebc0
Feb  1 09:49:45 centos50 kernel: R10: 0000000000000007 R11: 0000000000000246 R12: ffffffffffffff78
Feb  1 09:49:45 centos50 kernel: R13: 0000000000000000 R14: 000055f0ef8af4e3 R15: 0000000000000000



Expected results:
Directory listing after reasonable time


Additional info:
Happens with kernel 4.18.0-448.el8.x86_64 only

For comparison:
- vanilla kernel 4.19.271 is OK
- previous kernel-4.18.0-408.el8.x86_64 is OK
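
To check a host against the builds named in this report, something like the sketch below may help. It rests on assumptions: only 408 (reported OK), 448 (reported hanging) and 463 (Fixed In Version) come from this bug, the exact build that introduced the regression is not stated here, and the function names are invented for illustration. Anything between the known-good and fixed builds is therefore only labelled "possibly affected".

```python
import re

LAST_KNOWN_GOOD = 408  # kernel-4.18.0-408.el8, reported OK in this bug
FIXED_IN = 463         # kernel-4.18.0-463.el8, the "Fixed In Version" field


def el8_build(release):
    """Return the build number from e.g. '4.18.0-448.el8.x86_64', else None."""
    m = re.match(r"4\.18\.0-(\d+)(?:\.\d+)*\.el8", release)
    return int(m.group(1)) if m else None


def classify(release):
    """Rough classification based only on the builds mentioned in this report."""
    build = el8_build(release)
    if build is None:
        return "unknown"
    if build >= FIXED_IN:
        return "fixed"
    if build <= LAST_KNOWN_GOOD:
        return "not reported affected"
    return "possibly affected"
```

On a live system the input could come from platform.release(), which returns the same string as uname -r.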

Comment 1 Benjamin Coddington 2023-02-01 21:17:56 UTC
Reproduced. We're waiting on a page we already locked.

Needs upstream commit:
648a4548d622 NFS: Don't deadlock when cookie hashes collide
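
The diagnosis above (a task waiting on a page it has already locked) can be pictured with a toy model. This is an illustration only, not the kernel code and not the upstream patch: the slot count, the modulo hash and the function names are all hypothetical, and the guard shown is just one possible way to avoid relocking.

```python
import threading

NUM_SLOTS = 8  # hypothetical size of the toy page cache


def slot_for(cookie):
    """Toy hash from a directory cookie to a cache slot (not the kernel's hash)."""
    return cookie % NUM_SLOTS


def fill_pages(cookies, guard_collisions):
    """Walk cookies, holding the current slot's lock while taking the next one.

    Returns the cookie at which a self-deadlock would occur, or None.
    """
    locks = [threading.Lock() for _ in range(NUM_SLOTS)]
    current = slot_for(cookies[0])
    locks[current].acquire()
    try:
        for cookie in cookies[1:]:
            nxt = slot_for(cookie)
            if guard_collisions and nxt == current:
                # Illustrative guard: the slot is already ours, so do not relock it.
                continue
            # A non-blocking acquire stands in for a blocking page lock; failure
            # here means a blocking call would wait forever on ourselves.
            if not locks[nxt].acquire(blocking=False):
                return cookie
            locks[current].release()
            current = nxt
        return None
    finally:
        locks[current].release()
```

With cookies [1, 9, 2], both 1 and 9 map to slot 1, so fill_pages([1, 9, 2], False) reports cookie 9 as the point of self-deadlock, while fill_pages([1, 9, 2], True) completes.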

Comment 14 Yongcheng Yang 2023-02-16 02:24:01 UTC
No new issue found from the regression tests.

Comment 15 dm 2023-03-09 14:12:38 UTC
As of today, CentOS Stream 8 still provides 4.18.0-448.el8 (affected by this NFS bug).

Do you have an expected release date for kernel-4.18.0-463.el8 on Stream 8?

Comment 18 errata-xmlrpc 2023-05-16 09:01:04 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: kernel security, bug fix, and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2023:2951