Bug 1905809
| Summary: | [RHEL-9] WARNING: CPU: 0 PID: 13059 at fs/nfsd/nfs4proc.c:458 nfsd4_open+0x19c/0x4a0 [nfsd] | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 9 | Reporter: | JianHong Yin <jiyin> |
| Component: | kernel | Assignee: | Benjamin Coddington <bcodding> |
| kernel sub component: | NFS | QA Contact: | JianHong Yin <jiyin> |
| Status: | CLOSED ERRATA | Docs Contact: | |
| Severity: | unspecified | | |
| Priority: | unspecified | CC: | bcodding, chuck.lever, nfs-maint, xzhou, yieli, yoyang |
| Version: | unspecified | Keywords: | Patch, Triaged |
| Target Milestone: | pre-dev-freeze | Flags: | chuck.lever: needinfo-, pm-rhel: mirror+ |
| Target Release: | --- | | |
| Hardware: | x86_64 | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | kernel-5.14.0-129.el9 | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2022-11-15 10:50:19 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Description
JianHong Yin
2020-12-09 06:30:00 UTC
Proposed upstream fix (against v5.18-rc3):
https://git.kernel.org/pub/scm/linux/kernel/git/cel/linux.git/log/?h=bugzilla-1905809

I set up Lustre racer in my lab but haven't gotten it to reproduce this failure. Would it be possible for Red Hat to test my proposed fix?

(In reply to Chuck Lever from comment #5)
> Proposed upstream fix (against v5.18-rc3):
> https://git.kernel.org/pub/scm/linux/kernel/git/cel/linux.git/log/?h=bugzilla-1905809
>
> I set up Lustre racer in my lab but haven't gotten it to reproduce this
> failure. Would it be possible for Red Hat to test my proposed fix?

Hi Chuck. I'm working on it, will report test result tomorrow

(In reply to JianHong Yin from comment #6)
> (In reply to Chuck Lever from comment #5)
> > Proposed upstream fix (against v5.18-rc3):
> > https://git.kernel.org/pub/scm/linux/kernel/git/cel/linux.git/log/?h=bugzilla-1905809
> >
> > I set up Lustre racer in my lab but haven't gotten it to reproduce this
> > failure. Would it be possible for Red Hat to test my proposed fix?
>
> Hi Chuck. I'm working on it, will report test result tomorrow

continue:
Hi Chuck, we got a new WARNING with the new kernel.
'''
[11418.531472] ------------[ cut here ]------------
[11418.536114] WARNING: CPU: 23 PID: 873760 at fs/inode.c:388 inc_nlink+0x32/0x40
[11418.543354] Modules linked in: nfsv3 rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache netfs ext4 mbcache jbd2 loop rpcrdma rdma_cm iw_cm ib_cm nfsd auth_rpcgss nfs_acl lockd grace rfkill sunrpc intel_rapl_msr dcdbas intel_rapl_common amd64_edac edac_mce_amd kvm_amd kvm irqbypass rapl pcspkr dell_smbios dell_wmi_descriptor wmi_bmof ipmi_ssif mgag200 i2c_algo_bit drm_shmem_helper mlx5_ib drm_kms_helper syscopyarea sysfillrect sysimgblt ib_uverbs fb_sys_fops ib_core k10temp i2c_piix4 acpi_ipmi ipmi_si ipmi_devintf ipmi_msghandler acpi_power_meter drm fuse xfs libcrc32c mlx5_core sd_mod t10_pi crc64_rocksoft_generic crc64_rocksoft crc64 sg ahci libahci crct10dif_pclmul mlxfw crc32_pclmul crc32c_intel libata tls ghash_clmulni_intel tg3 psample ccp megaraid_sas pci_hyperv_intf wmi dm_mirror dm_region_hash dm_log dm_mod
[11418.615123] CPU: 23 PID: 873760 Comm: ln Kdump: loaded Not tainted 5.18.0-rc3+ #1
[11418.622618] Hardware name: Dell Inc. PowerEdge R6515/07PXPY, BIOS 2.3.6 07/06/2021
[11418.630204] RIP: 0010:inc_nlink+0x32/0x40
[11418.634241] Code: 85 c0 74 07 83 c0 01 89 47 48 c3 f6 87 99 00 00 00 04 74 16 48 8b 47 28 f0 48 ff 88 48 04 00 00 8b 47 48 83 c0 01 89 47 48 c3 <0f> 0b eb e6 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 b8 01 00
[11418.653006] RSP: 0018:ffffab4122027cc8 EFLAGS: 00010246
[11418.658249] RAX: 0000000000000000 RBX: ffff9f18cc101e88 RCX: 0000000000000001
[11418.665397] RDX: ffff9f18c6b88000 RSI: 0000000000000001 RDI: ffff9f18d1bcece8
[11418.672545] RBP: ffffab4122027dc8 R08: 0000000000000000 R09: ffff9f1aae6273e8
[11418.679695] R10: 0000000000000003 R11: 0000000000000003 R12: ffff9f18d1bced70
[11418.686840] R13: ffff9f18d1bcece8 R14: 0000000100a9a299 R15: ffffab4122027d8c
[11418.693987] FS: 00007f43bdd1b740(0000) GS:ffff9f277f3c0000(0000) knlGS:0000000000000000
[11418.702089] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[11418.707853] CR2: 00007ffa9538c0e0 CR3: 000000010d6ca000 CR4: 0000000000350ee0
[11418.715001] Call Trace:
[11418.717457] <TASK>
[11418.719624] _nfs4_proc_link+0x20f/0x260 [nfsv4]
[11418.724275] nfs4_proc_link+0x65/0xa0 [nfsv4]
[11418.728664] nfs_link+0x6e/0x160 [nfs]
[11418.732441] vfs_link+0x20f/0x320
[11418.735771] do_linkat+0x212/0x2e0
[11418.739184] __x64_sys_linkat+0x56/0x70
[11418.743031] do_syscall_64+0x3b/0x90
[11418.746619] entry_SYSCALL_64_after_hwframe+0x44/0xae
[11418.751679] RIP: 0033:0x7f43bda43a2e
[11418.755262] Code: 48 8b 0d fd 63 1b 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 49 89 ca b8 09 01 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d ca 63 1b 00 f7 d8 64 89 01 48
[11418.774097] RSP: 002b:00007ffd9efd8fb8 EFLAGS: 00000256 ORIG_RAX: 0000000000000109
[11418.781682] RAX: ffffffffffffffda RBX: 00000000ffffffff RCX: 00007f43bda43a2e
[11418.788829] RDX: 0000000000000003 RSI: 00007ffd9efdacee RDI: 00000000ffffff9c
[11418.795972] RBP: 00007ffd9efdacee R08: 0000000000000000 R09: 0000000000000000
[11418.803117] R10: 00005649754b1304 R11: 0000000000000256 R12: 0000000000000100
[11418.810262] R13: 00007ffd9efdad00 R14: 00007ffd9efdacee R15: 00005649754b1304
[11418.817418] </TASK>
[11418.819619] ---[ end trace 0000000000000000 ]---
[11436.131940] 8[1540067]: segfault at 8 ip 00007f3831ca9680 sp 00007ffcac7ed818 error 4 in ld-linux-x86-64.so.2[7f3831c8a000+26000]
[11436.143630] Code: 5b 4c 89 f0 5d 41 5c 41 5d 41 5e c3 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 b8 41 ff ff 6f 29 d0 48 8b 54 c6 40 48 8b 46 68 <48> 8b 40 08 f6 86 1e 03 00 00 20 74 03 48 03 06 48 85 d2 74 2b 48
'''

Thanks for exercising the proposed patch.

nfs4_proc_link is client-side code. IIRC your reproducer runs loopback -- client and server are the same system. At a guess I think this backtrace is a different issue entirely. Perhaps it should be reported as a separate bug.

Would it make sense to continue running the reproducer for a while to see if you can trigger any other failures?

(In reply to Chuck Lever from comment #8)
> Thanks for exercising the proposed patch.
>
> nfs4_proc_link is client-side code. IIRC your reproducer runs loopback --
> client and server are the same system. At a guess I think this backtrace is
> a different issue entirely. Perhaps it should be reported as a separate bug.

Hi Chuck, Good to know that.

>
> Would it make sense to continue running the reproducer for a while to see if
> you can trigger any other failures?

no more failures were found after new round of testing.
and if without the patch I can always reproduce the issue on latest upstream kernel (5.18.0-0.rc4.20220427git46cf2c613f4b10e)

so the patch works well, VERIFIED; good job!

(In reply to JianHong Yin from comment #9)
> no more failures were found after new round of testing.
> and if without the patch I can always reproduce the issue on latest upstream
> kernel (5.18.0-0.rc4.20220427git46cf2c613f4b10e)
>
> so the patch works well, VERIFIED; good job!

Thank you very much for your help.
I will queue this series for v5.19 with a Tested-by.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Moderate: kernel security, bug fix, and enhancement update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:8267
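
Comment #8 reasons that the inc_nlink backtrace is client-side because its frames sit in the NFS client code rather than in nfsd. On a loopback setup, where client and server share one kernel, that check can be made mechanically from the module tag the kernel prints in square brackets after each call-trace frame. The following is a minimal sketch of that triage. The function name `classify_trace` and the two module sets are hypothetical choices made for illustration; they are not part of any tool mentioned in this report.

```python
import re
from collections import Counter

# Assumption for illustration: which kernel modules count as NFS client code
# and which as NFS server code. Adjust the sets to the modules under test.
CLIENT_MODULES = {"nfs", "nfsv3", "nfsv4"}
SERVER_MODULES = {"nfsd"}

# Matches a call-trace frame such as "_nfs4_proc_link+0x20f/0x260 [nfsv4]".
# The trailing "[module]" tag is optional; built-in symbols carry none.
FRAME_RE = re.compile(
    r"(?P<sym>[A-Za-z_][\w.]*)\+0x[0-9a-f]+/0x[0-9a-f]+(?: \[(?P<mod>\w+)\])?"
)

def classify_trace(log_text):
    """Count call-trace frames by side: 'client', 'server', or 'core'."""
    sides = Counter()
    for match in FRAME_RE.finditer(log_text):
        mod = match.group("mod")
        if mod in CLIENT_MODULES:
            sides["client"] += 1
        elif mod in SERVER_MODULES:
            sides["server"] += 1
        else:
            # No module tag, or a module outside both sets.
            sides["core"] += 1
    return sides

# Frames copied from the call trace posted in this report.
sample = """
 _nfs4_proc_link+0x20f/0x260 [nfsv4]
 nfs4_proc_link+0x65/0xa0 [nfsv4]
 nfs_link+0x6e/0x160 [nfs]
 vfs_link+0x20f/0x320
 do_linkat+0x212/0x2e0
"""
result = classify_trace(sample)
print(result)
```

A trace with client frames and no nfsd frames, as in the sample, matches the reading in comment #8 that the new warning is separate from the nfsd4_open warning this bug tracks. The count is only a hint for triage, not a diagnosis.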