Bug 2166364 - NFS hang on large dirs with kernel 4.18.0-448.el8.x86_64
Summary: NFS hang on large dirs with kernel 4.18.0-448.el8.x86_64
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 8
Classification: Red Hat
Component: kernel
Version: CentOS Stream
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: rc
Target Release: 8.8
Assignee: Benjamin Coddington
QA Contact: Yongcheng Yang
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2023-02-01 15:35 UTC by dm
Modified: 2023-05-16 10:56 UTC

Fixed In Version: kernel-4.18.0-463.el8
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2023-05-16 09:01:04 UTC
Type: Bug
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Gitlab redhat/rhel/src/kernel rhel-8 merge_requests 4188 0 None None None 2023-02-01 21:28:41 UTC
Red Hat Issue Tracker RHELPLAN-147322 0 None None None 2023-02-01 15:36:14 UTC
Red Hat Product Errata RHSA-2023:2951 0 None None None 2023-05-16 09:02:24 UTC

Description dm 2023-02-01 15:35:16 UTC
Description of problem:
NFS hangs on large directory access.

Version-Release number of selected component (if applicable):
CentOS Stream 8, fully updated

How reproducible: always


Steps to Reproduce:
1. Mount an NFS export.
2. cd into a large directory (~500,000 files).
3. Run ls.
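
The steps above can be scripted roughly as follows. This is a minimal sketch, not something taken from the report: the helper names `populate` and `list_with_deadline` are hypothetical, and the directory path you pass in is assumed to sit on the NFS mount under test. The listing runs in a child process so that the parent can give up after a deadline even if the child ends up blocked in state D.

```python
import os
import subprocess
import sys


def populate(directory, count):
    """Create `count` empty files in `directory` (hypothetical helper)."""
    os.makedirs(directory, exist_ok=True)
    for i in range(count):
        # "w" creates the file, or truncates it if it is left over from a rerun.
        with open(os.path.join(directory, f"file{i:07d}"), "w"):
            pass


def list_with_deadline(directory, seconds):
    """Count directory entries in a child process.

    Returns the entry count, or None if the child did not finish in time.
    """
    child = subprocess.Popen(
        [sys.executable, "-c",
         "import os, sys; print(sum(1 for _ in os.scandir(sys.argv[1])))",
         directory],
        stdout=subprocess.PIPE,
        text=True,
    )
    try:
        out, _ = child.communicate(timeout=seconds)
    except subprocess.TimeoutExpired:
        # A task in uninterruptible sleep ignores signals, so do not wait
        # for the child after this; just report the timeout.
        child.kill()
        return None
    return int(out)
```

For example, `populate("/mnt/nfs/bigdir", 500000)` followed by `list_with_deadline("/mnt/nfs/bigdir", 300)` would be expected to return None on an affected kernel; the mount point and the 300-second deadline are placeholders.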

Actual results:
ls never returns a listing.
It cannot be interrupted with Ctrl-C.

Messages in syslog:

Feb  1 09:49:45 centos50 kernel: INFO: task bash:1326 blocked for more than 120 seconds.
Feb  1 09:49:45 centos50 kernel:      Not tainted 4.18.0-448.el8.x86_64 #1
Feb  1 09:49:45 centos50 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Feb  1 09:49:45 centos50 kernel: task:bash            state:D stack:    0 pid: 1326 ppid:  1325 flags:0x00004084
Feb  1 09:49:45 centos50 kernel: Call Trace:
Feb  1 09:49:45 centos50 kernel: __schedule+0x2d1/0x870
Feb  1 09:49:45 centos50 kernel: schedule+0x55/0xf0
Feb  1 09:49:45 centos50 kernel: io_schedule+0x12/0x40
Feb  1 09:49:45 centos50 kernel: __lock_page+0x12d/0x230
Feb  1 09:49:45 centos50 kernel: ? file_fdatawait_range+0x20/0x20
Feb  1 09:49:45 centos50 kernel: pagecache_get_page+0x1e6/0x310
Feb  1 09:49:45 centos50 kernel: nfs_readdir_page_get_locked+0x38/0xe0 [nfs]
Feb  1 09:49:45 centos50 kernel: nfs_readdir_page_filler+0x215/0x410 [nfs]
Feb  1 09:49:45 centos50 kernel: nfs_readdir_xdr_to_array+0x2d9/0x310 [nfs]
Feb  1 09:49:45 centos50 kernel: nfs_readdir+0x26a/0xda0 [nfs]
Feb  1 09:49:45 centos50 kernel: ? update_load_avg+0x7e/0x710
Feb  1 09:49:45 centos50 kernel: iterate_dir+0x144/0x1a0
Feb  1 09:49:45 centos50 kernel: ksys_getdents64+0x9c/0x130
Feb  1 09:49:45 centos50 kernel: ? iterate_dir+0x1a0/0x1a0
Feb  1 09:49:45 centos50 kernel: __x64_sys_getdents64+0x16/0x20
Feb  1 09:49:45 centos50 kernel: do_syscall_64+0x5b/0x1b0
Feb  1 09:49:45 centos50 kernel: entry_SYSCALL_64_after_hwframe+0x61/0xc6
Feb  1 09:49:45 centos50 kernel: RIP: 0033:0x7fbe68a4436b
Feb  1 09:49:45 centos50 kernel: Code: Unable to access opcode bytes at RIP 0x7fbe68a44341.
Feb  1 09:49:45 centos50 kernel: RSP: 002b:00007fffe85a09a8 EFLAGS: 00000246 ORIG_RAX: 00000000000000d9
Feb  1 09:49:45 centos50 kernel: RAX: ffffffffffffffda RBX: 000055f0ef7af4e0 RCX: 00007fbe68a4436b
Feb  1 09:49:45 centos50 kernel: RDX: 0000000000100000 RSI: 000055f0ef7af510 RDI: 0000000000000003
Feb  1 09:49:45 centos50 kernel: RBP: 000055f0ef7af510 R08: 0000000000000005 R09: 00007fbe68d0ebc0
Feb  1 09:49:45 centos50 kernel: R10: 0000000000000007 R11: 0000000000000246 R12: ffffffffffffff78
Feb  1 09:49:45 centos50 kernel: R13: 0000000000000000 R14: 000055f0ef8af4e3 R15: 0000000000000000
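
For triage it can help to reduce a trace like the one above to the blocked task and its reliable stack frames. The following is a rough sketch that assumes the syslog line layout shown in this report (`... kernel: symbol+0xOFF/0xSIZE [module]`); the function name `parse_hung_task` and the regular expressions are my own and may need adjusting for other log formats.

```python
import re

# Header line: "INFO: task bash:1326 blocked for more than 120 seconds."
HUNG = re.compile(
    r"INFO: task (?P<comm>\S+):(?P<pid>\d+) "
    r"blocked for more than (?P<secs>\d+) seconds"
)
# Frame line: "kernel: [? ]symbol+0x12d/0x230 [module]"
FRAME = re.compile(
    r"kernel: (?P<guess>\? )?(?P<sym>[A-Za-z_][\w.]*)"
    r"\+0x[0-9a-f]+/0x[0-9a-f]+(?: \[(?P<mod>\w+)\])?\s*$"
)


def parse_hung_task(lines):
    """Return a dict for the first hung-task report in `lines`, or None.

    Frames prefixed with "?" are unreliable guesses by the unwinder
    and are skipped.
    """
    report = None
    for line in lines:
        header = HUNG.search(line)
        if header:
            if report is not None:
                break  # a second report starts here; keep only the first
            report = {
                "comm": header.group("comm"),
                "pid": int(header.group("pid")),
                "secs": int(header.group("secs")),
                "frames": [],
            }
            continue
        frame = FRAME.search(line)
        if report is not None and frame and not frame.group("guess"):
            report["frames"].append((frame.group("sym"), frame.group("mod")))
    return report
```

Fed the log above, the frames tagged with module `nfs` are the ones that point at the readdir path.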



Expected results:
The directory listing is returned within a reasonable time.


Additional info:
Of the kernels tested, only 4.18.0-448.el8.x86_64 shows the hang.

For comparison:
- vanilla kernel 4.19.271 is OK.
- the previous kernel-4.18.0-408.el8.x86_64 is OK.
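
A quick way to check a host against the builds named in this report is sketched below. It only encodes what the report states: build 448 hangs, build 408 is OK, and the "Fixed In Version" field names build 463. Treating every el8 build from 463 onward as fixed is my assumption, and builds between 408 and 448 are deliberately reported as unknown. The function name `classify` is hypothetical.

```python
import re

FIXED_BUILD = 463   # from "Fixed In Version: kernel-4.18.0-463.el8"
KNOWN_BAD = {448}   # build reported to hang
KNOWN_GOOD = {408}  # build reported OK

# Matches e.g. "4.18.0-448.el8.x86_64" or "4.18.0-477.10.1.el8_8.x86_64".
EL8_RELEASE = re.compile(r"4\.18\.0-(\d+)(?:\.\d+)*\.el8")


def classify(release):
    """Classify a kernel release string against the builds in this bug."""
    match = EL8_RELEASE.match(release)
    if not match:
        return "not-el8"
    build = int(match.group(1))
    if build >= FIXED_BUILD:
        return "fixed"  # assumption: later builds keep the fix
    if build in KNOWN_BAD:
        return "affected"
    if build in KNOWN_GOOD:
        return "reported-ok"
    return "unknown"
```

On a live system, `classify(platform.release())` (after `import platform`) gives the verdict for the running kernel.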

Comment 1 Benjamin Coddington 2023-02-01 21:17:56 UTC
Reproduced: we're waiting on a page we already locked.

This needs the upstream commit:
648a4548d622 NFS: Don't deadlock when cookie hashes collide
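
To illustrate the failure mode described in this comment, here is a toy user-space model. It is not the kernel code: `ToyPageCache`, the modulo "hash", and the page count are all invented for illustration. It shows how a task that holds the lock for one cookie's page blocks on itself when a second cookie maps to the same page, and how a non-blocking try-lock lets the caller notice the collision instead. The try-lock is shown only as one common way to avoid this kind of self-deadlock; this report does not describe how the upstream commit does it.

```python
import threading

NUM_PAGES = 8  # hypothetical number of cache slots


class ToyPageCache:
    """Toy model: one non-reentrant lock per cache slot."""

    def __init__(self):
        self.locks = [threading.Lock() for _ in range(NUM_PAGES)]

    def index(self, cookie):
        # Stand-in for a cookie hash; distinct cookies can collide.
        return cookie % NUM_PAGES

    def get_locked(self, cookie, wait_seconds):
        """Blocking lookup; the timeout only keeps this demo from hanging."""
        lock = self.locks[self.index(cookie)]
        return lock if lock.acquire(timeout=wait_seconds) else None

    def try_get_locked(self, cookie):
        """Non-blocking lookup: returns None at once if the slot is locked."""
        lock = self.locks[self.index(cookie)]
        return lock if lock.acquire(blocking=False) else None


cache = ToyPageCache()
current = cache.get_locked(3, 0.1)      # lock the page for cookie 3
collided = cache.get_locked(11, 0.1)    # 11 % 8 == 3: same page, same task
print("blocking lookup got page:", collided is not None)
print("try-lock got page:", cache.try_get_locked(11) is not None)
current.release()
```

Without the timeout, the second `get_locked` call would wait forever, which matches the uninterruptible `__lock_page` wait seen in the trace.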

Comment 14 Yongcheng Yang 2023-02-16 02:24:01 UTC
No new issue found from the regression tests.

Comment 15 dm 2023-03-09 14:12:38 UTC
As of today, CentOS Stream 8 still provides 4.18.0-448.el8 (affected by this NFS bug).

Do you have an expected release date for kernel-4.18.0-463.el8 on Stream 8?

Comment 18 errata-xmlrpc 2023-05-16 09:01:04 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: kernel security, bug fix,
and enhancement update), and where to find the updated files, follow
the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2023:2951

