1905809 – [RHEL-9] WARNING: CPU: 0 PID: 13059 at fs/nfsd/nfs4proc.c:458 nfsd4_open+0x19c/0x4a0 [nfsd]

Summary: [RHEL-9] WARNING: CPU: 0 PID: 13059 at fs/nfsd/nfs4proc.c:458 nfsd4_open+0x19...

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Enterprise Linux 9
Classification:	Red Hat
Component:	kernel
Sub Component:
Version:	unspecified
Hardware:	x86_64
OS:	Linux
Priority:	unspecified
Severity:	unspecified
Target Milestone:	pre-dev-freeze
Target Release:	---
Assignee:	Benjamin Coddington
QA Contact:	JianHong Yin
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2020-12-09 06:30 UTC by JianHong Yin
Modified:	2022-11-15 12:09 UTC
CC List:	6 users
Fixed In Version:	kernel-5.14.0-129.el9
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:	2022-11-15 10:50:19 UTC
Type:	Bug
Target Upstream Version:
Embargoed:
Dependent Products:
Flags:	chuck.lever: needinfo-

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Gitlab	redhat/centos-stream/src/kernel centos-stream-9 merge_requests 979	0	None	opened	Make NFSv4 OPEN(CREATE) less brittle	2022-07-06 14:16:25 UTC
Red Hat Product Errata	RHSA-2022:8267	0	None	None	None	2022-11-15 10:50:48 UTC

Description JianHong Yin 2020-12-09 06:30:00 UTC

Description of problem:

Got a kernel WARNING while running the Lustre racer test on RHEL-9 NFS:
'''
[ 1623.977519] nfsd4_process_open2 failed to open newly-created file! status=5 
[ 1623.978951] WARNING: CPU: 0 PID: 13059 at fs/nfsd/nfs4proc.c:458 nfsd4_open+0x19c/0x4a0 [nfsd] 
'''

Version-Release number of selected component (if applicable):
kernel-5.9.0-40.el9.x86_64

How reproducible:
TBD

Steps to Reproduce:
1. TBD

Actual results:
'''
[ 1623.973811] ------------[ cut here ]------------ 
[ 1623.977519] nfsd4_process_open2 failed to open newly-created file! status=5 
[ 1623.978951] WARNING: CPU: 0 PID: 13059 at fs/nfsd/nfs4proc.c:458 nfsd4_open+0x19c/0x4a0 [nfsd] 
[ 1623.980620] Modules linked in: nfsv3 ext4 mbcache jbd2 loop nfsd nfs_acl rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace nfs_ssc fscache rfkill sunrpc intel_rapl_msr intel_rapl_common isst_if_common nfit libnvdimm pcspkr virtio_balloon joydev i2c_piix4 drm fuse ip_tables xfs libcrc32c ata_generic ata_piix libata crct10dif_pclmul crc32_pclmul crc32c_intel virtio_net ghash_clmulni_intel virtio_console net_failover serio_raw virtio_blk failover dm_mirror dm_region_hash dm_log dm_mod 
[ 1623.989192] CPU: 0 PID: 13059 Comm: nfsd Kdump: loaded Tainted: G               X --------- ---  5.9.0-40.el9.x86_64 #1 
[ 1623.991260] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 
[ 1623.992396] RIP: 0010:nfsd4_open+0x19c/0x4a0 [nfsd] 
[ 1623.993347] Code: ea 4c 89 f7 e8 b5 4e 01 00 41 89 c5 85 c0 74 1c 80 bd 15 01 00 00 00 74 13 44 89 ee 48 c7 c7 d8 2f a9 c0 0f ce e8 c0 7e 90 f8 <0f> 0b 4c 8b 34 24 4d 85 f6 74 73 4d 39 e6 74 6e 4c 89 e7 e8 7c d0 
[ 1623.996928] RSP: 0018:ffff9911c0bd7da0 EFLAGS: 00010286 
[ 1623.997951] RAX: 0000000000000000 RBX: ffff8b4a2228d000 RCX: 0000000000000000 
[ 1623.999339] RDX: ffff8b4ab02281e0 RSI: ffff8b4ab0218000 RDI: ffff8b4ab0218000 
[ 1624.000719] RBP: ffff8b4aad6211e0 R08: 0000000000000001 R09: 00000000000002e7 
[ 1624.002111] R10: 0000000000000000 R11: ffff9911c0bd7c3d R12: ffff8b4aad628070 
[ 1624.003492] R13: 0000000005000000 R14: ffff8b4aad618000 R15: ffffffffba1b9d00 
[ 1624.004880] FS:  0000000000000000(0000) GS:ffff8b4ab0200000(0000) knlGS:0000000000000000 
[ 1624.006452] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033 
[ 1624.007570] CR2: 00007f66771f5000 CR3: 000000010760e001 CR4: 00000000007706f0 
[ 1624.009077] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 
[ 1624.010606] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 
[ 1624.011985] PKRU: 55555554 
[ 1624.012529] Call Trace: 
[ 1624.013033]  nfsd4_proc_compound+0x394/0x720 [nfsd] 
[ 1624.013998]  ? nfsd4_decode_compound.constprop.0+0x3ac/0x460 [nfsd] 
[ 1624.015225]  nfsd_dispatch+0xcc/0x200 [nfsd] 
[ 1624.016086]  svc_process_common+0x479/0x650 [sunrpc] 
[ 1624.017072]  ? svc_handle_xprt+0x165/0x300 [sunrpc] 
[ 1624.018028]  ? nfsd_svc+0x130/0x130 [nfsd] 
[ 1624.018851]  svc_process+0xb7/0xf0 [sunrpc] 
[ 1624.019675]  nfsd+0xe3/0x140 [nfsd] 
[ 1624.020372]  ? nfsd_destroy+0x50/0x50 [nfsd] 
[ 1624.021208]  kthread+0x11b/0x140 
[ 1624.021859]  ? __kthread_bind_mask+0x60/0x60 
[ 1624.022699]  ret_from_fork+0x22/0x30 
[ 1624.023408] ---[ end trace 36f9925a7db892e6 ]--- 
'''

Expected results:
No warning.

Additional info:

Comment 3 J. Bruce Fields 2022-01-24 22:28:31 UTC

See also https://bugzilla.kernel.org/show_bug.cgi?id=195725

Comment 5 Chuck Lever 2022-04-22 14:35:45 UTC

Proposed upstream fix (against v5.18-rc3): https://git.kernel.org/pub/scm/linux/kernel/git/cel/linux.git/log/?h=bugzilla-1905809

I set up Lustre racer in my lab but haven't gotten it to reproduce this failure. Would it be possible for Red Hat to test my proposed fix?

Comment 6 JianHong Yin 2022-04-27 09:04:57 UTC

(In reply to Chuck Lever from comment #5)
> Proposed upstream fix (against v5.18-rc3):
> https://git.kernel.org/pub/scm/linux/kernel/git/cel/linux.git/log/?h=bugzilla-1905809
> 
> I set up Lustre racer in my lab but haven't gotten it to reproduce this
> failure. Would it be possible for Red Hat to test my proposed fix?

Hi Chuck. I'm working on it and will report the test results tomorrow.

Comment 7 JianHong Yin 2022-04-28 05:10:33 UTC

(In reply to JianHong Yin from comment #6)
> (In reply to Chuck Lever from comment #5)
> > Proposed upstream fix (against v5.18-rc3):
> > https://git.kernel.org/pub/scm/linux/kernel/git/cel/linux.git/log/?h=bugzilla-1905809
> > 
> > I set up Lustre racer in my lab but haven't gotten it to reproduce this
> > failure. Would it be possible for Red Hat to test my proposed fix?
> 
> Hi Chuck. I'm working on it and will report the test results tomorrow.

Follow-up: Hi Chuck, we got a new WARNING with the patched kernel.

'''
[11418.531472] ------------[ cut here ]------------ 
[11418.536114] WARNING: CPU: 23 PID: 873760 at fs/inode.c:388 inc_nlink+0x32/0x40 
[11418.543354] Modules linked in: nfsv3 rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache netfs ext4 mbcache jbd2 loop rpcrdma rdma_cm iw_cm ib_cm nfsd auth_rpcgss nfs_acl lockd grace rfkill sunrpc intel_rapl_msr dcdbas intel_rapl_common amd64_edac edac_mce_amd kvm_amd kvm irqbypass rapl pcspkr dell_smbios dell_wmi_descriptor wmi_bmof ipmi_ssif mgag200 i2c_algo_bit drm_shmem_helper mlx5_ib drm_kms_helper syscopyarea sysfillrect sysimgblt ib_uverbs fb_sys_fops ib_core k10temp i2c_piix4 acpi_ipmi ipmi_si ipmi_devintf ipmi_msghandler acpi_power_meter drm fuse xfs libcrc32c mlx5_core sd_mod t10_pi crc64_rocksoft_generic crc64_rocksoft crc64 sg ahci libahci crct10dif_pclmul mlxfw crc32_pclmul crc32c_intel libata tls ghash_clmulni_intel tg3 psample ccp megaraid_sas pci_hyperv_intf wmi dm_mirror dm_region_hash dm_log dm_mod 
[11418.615123] CPU: 23 PID: 873760 Comm: ln Kdump: loaded Not tainted 5.18.0-rc3+ #1 
[11418.622618] Hardware name: Dell Inc. PowerEdge R6515/07PXPY, BIOS 2.3.6 07/06/2021 
[11418.630204] RIP: 0010:inc_nlink+0x32/0x40 
[11418.634241] Code: 85 c0 74 07 83 c0 01 89 47 48 c3 f6 87 99 00 00 00 04 74 16 48 8b 47 28 f0 48 ff 88 48 04 00 00 8b 47 48 83 c0 01 89 47 48 c3 <0f> 0b eb e6 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 b8 01 00 
[11418.653006] RSP: 0018:ffffab4122027cc8 EFLAGS: 00010246 
[11418.658249] RAX: 0000000000000000 RBX: ffff9f18cc101e88 RCX: 0000000000000001 
[11418.665397] RDX: ffff9f18c6b88000 RSI: 0000000000000001 RDI: ffff9f18d1bcece8 
[11418.672545] RBP: ffffab4122027dc8 R08: 0000000000000000 R09: ffff9f1aae6273e8 
[11418.679695] R10: 0000000000000003 R11: 0000000000000003 R12: ffff9f18d1bced70 
[11418.686840] R13: ffff9f18d1bcece8 R14: 0000000100a9a299 R15: ffffab4122027d8c 
[11418.693987] FS:  00007f43bdd1b740(0000) GS:ffff9f277f3c0000(0000) knlGS:0000000000000000 
[11418.702089] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033 
[11418.707853] CR2: 00007ffa9538c0e0 CR3: 000000010d6ca000 CR4: 0000000000350ee0 
[11418.715001] Call Trace: 
[11418.717457]  <TASK> 
[11418.719624]  _nfs4_proc_link+0x20f/0x260 [nfsv4] 
[11418.724275]  nfs4_proc_link+0x65/0xa0 [nfsv4] 
[11418.728664]  nfs_link+0x6e/0x160 [nfs] 
[11418.732441]  vfs_link+0x20f/0x320 
[11418.735771]  do_linkat+0x212/0x2e0 
[11418.739184]  __x64_sys_linkat+0x56/0x70 
[11418.743031]  do_syscall_64+0x3b/0x90 
[11418.746619]  entry_SYSCALL_64_after_hwframe+0x44/0xae 
[11418.751679] RIP: 0033:0x7f43bda43a2e 
[11418.755262] Code: 48 8b 0d fd 63 1b 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 49 89 ca b8 09 01 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d ca 63 1b 00 f7 d8 64 89 01 48 
[11418.774097] RSP: 002b:00007ffd9efd8fb8 EFLAGS: 00000256 ORIG_RAX: 0000000000000109 
[11418.781682] RAX: ffffffffffffffda RBX: 00000000ffffffff RCX: 00007f43bda43a2e 
[11418.788829] RDX: 0000000000000003 RSI: 00007ffd9efdacee RDI: 00000000ffffff9c 
[11418.795972] RBP: 00007ffd9efdacee R08: 0000000000000000 R09: 0000000000000000 
[11418.803117] R10: 00005649754b1304 R11: 0000000000000256 R12: 0000000000000100 
[11418.810262] R13: 00007ffd9efdad00 R14: 00007ffd9efdacee R15: 00005649754b1304 
[11418.817418]  </TASK> 
[11418.819619] ---[ end trace 0000000000000000 ]--- 
[11436.131940] 8[1540067]: segfault at 8 ip 00007f3831ca9680 sp 00007ffcac7ed818 error 4 in ld-linux-x86-64.so.2[7f3831c8a000+26000] 
[11436.143630] Code: 5b 4c 89 f0 5d 41 5c 41 5d 41 5e c3 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 b8 41 ff ff 6f 29 d0 48 8b 54 c6 40 48 8b 46 68 <48> 8b 40 08 f6 86 1e 03 00 00 20 74 03 48 03 06 48 85 d2 74 2b 48
'''

Comment 8 Chuck Lever 2022-04-28 14:54:40 UTC

Thanks for exercising the proposed patch.

nfs4_proc_link is client-side code. IIRC your reproducer runs loopback -- client and server are the same system. At a guess I think this backtrace is a different issue entirely. Perhaps it should be reported as a separate bug.

Would it make sense to continue running the reproducer for a while to see if you can trigger any other failures?

Comment 9 JianHong Yin 2022-04-29 03:50:23 UTC

(In reply to Chuck Lever from comment #8)
> Thanks for exercising the proposed patch.
> 
> nfs4_proc_link is client-side code. IIRC your reproducer runs loopback --
> client and server are the same system. At a guess I think this backtrace is
> a different issue entirely. Perhaps it should be reported as a separate bug.
Hi Chuck, good to know.

> 
> Would it make sense to continue running the reproducer for a while to see if
> you can trigger any other failures?
No more failures were found after a new round of testing, and without the patch I can always reproduce the issue on the latest upstream kernel (5.18.0-0.rc4.20220427git46cf2c613f4b10e).

So the patch works well: VERIFIED. Good job!

Comment 10 Chuck Lever 2022-04-29 13:40:31 UTC

(In reply to JianHong Yin from comment #9)
> No more failures were found after a new round of testing, and without the
> patch I can always reproduce the issue on the latest upstream kernel
> (5.18.0-0.rc4.20220427git46cf2c613f4b10e).
> 
> So the patch works well: VERIFIED. Good job!

Thank you very much for your help. I will queue this series for v5.19 with a Tested-by.

Comment 26 errata-xmlrpc 2022-11-15 10:50:19 UTC

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Moderate: kernel security, bug fix, and enhancement update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:8267