Description of problem:
rpc.mountd dumps core. Reverting to previous build makes the NFS server work again.

PID: 3056 (rpc.mountd)
UID: 0 (root)
GID: 0 (root)
Signal: 11 (SEGV)
Timestamp: Fri 2019-05-24 15:43:23 AEST (10min ago)
Command Line: /usr/sbin/rpc.mountd
Executable: /usr/sbin/rpc.mountd
Control Group: /system.slice/nfs-mountd.service
Unit: nfs-mountd.service
Slice: system.slice
Boot ID: <boot>
Machine ID: <machine>
Hostname: <host>
Storage: /var/lib/systemd/coredump/core.rpc\x2emountd.0.9a81e480746b479d9c8ae9618ff17404.3056.1558676603000000.lz4
Message: Process 3056 (rpc.mountd) of user 0 dumped core.

Stack trace of thread 3056:
#0 0x000055b227043f5f n/a (rpc.mountd)
#1 0x000055b22704418d n/a (rpc.mountd)
#2 0x000055b22703df13 n/a (rpc.mountd)
#3 0x000055b22703e34b n/a (rpc.mountd)
#4 0x000055b22703ae32 n/a (rpc.mountd)
#5 0x000055b22703ce93 n/a (rpc.mountd)
#6 0x000055b22703d3a0 n/a (rpc.mountd)
#7 0x000055b2270380f0 n/a (rpc.mountd)
#8 0x00007f15da422f33 __libc_start_main (libc.so.6)
#9 0x000055b22703823e n/a (rpc.mountd)

Version-Release number of selected component (if applicable):
2.3.4-1

How reproducible:
Always.

Steps to Reproduce:
1. Attempt to mount an NFS directory.
2. rpc.mountd crashes.

Actual results:
SIGSEGV.

Expected results:
Previous RPM 2.3.3-7.rc2 works fine
I tested this version with kernel 5.1.5 as well and got the same result. Core dump on the server side and a kernel Oops on the client side. Reverting to previous version makes things work. The client works against EFS, but not against itself on F30.
For posterity, client side Oops-es:
--------------------
May 27 17:51:14 host kernel: BUG: unable to handle kernel NULL pointer dereference at 0000000000000098
May 27 17:51:14 host kernel: #PF error: [normal kernel read fault]
May 27 17:51:14 host kernel: PGD 0 P4D 0
May 27 17:51:14 host kernel: Oops: 0000 [#1] SMP PTI
May 27 17:51:14 host kernel: CPU: 0 PID: 3415 Comm: automount Not tainted 5.1.5-300.fc30.x86_64 #1
May 27 17:51:14 host kernel: Hardware name: LENOVO 20BXCTO1WW/20BXCTO1WW, BIOS JBET72WW (1.36 ) 02/23/2019
May 27 17:51:14 host kernel: RIP: 0010:xprt_adjust_timeout+0x9/0xe0 [sunrpc]
May 27 17:51:14 host kernel: Code: 05 00 01 00 00 48 89 83 f8 00 00 00 5b 5d 41 5c 41 5d 41 5e c3 66 66 2e 0f 1f 84 00 00 00 00 00 90 0f 1f 44 00 00 41 54 55 53 <48> 8b 87 98 00 00 00 48 89 fb 48 8b 80 a8 00 00 00 48 8b 68 78 48
May 27 17:51:14 host kernel: RSP: 0018:ffffb99a0174baf0 EFLAGS: 00010207
May 27 17:51:14 host kernel: RAX: 00000000fffffff5 RBX: ffff94505f378a00 RCX: 0000000000000003
May 27 17:51:14 host kernel: RDX: ffff94506252cac0 RSI: 00000000fffffe01 RDI: 0000000000000000
May 27 17:51:14 host kernel: RBP: ffff94505ea04500 R08: ffff94506252ca90 R09: ffff94506252cac0
May 27 17:51:14 host kernel: R10: 0000000000000003 R11: ffffe6e10752ce20 R12: ffff944ffcbd3400
May 27 17:51:14 host kernel: R13: ffff94505ea04530 R14: 0000000000004080 R15: ffffffffc07b13f0
May 27 17:51:14 host kernel: FS: 00007f262a1e1700(0000) GS:ffff945065a00000(0000) knlGS:0000000000000000
May 27 17:51:14 host kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
May 27 17:51:14 host kernel: CR2: 0000000000000098 CR3: 000000021fb0a005 CR4: 00000000003606f0
May 27 17:51:14 host kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
May 27 17:51:14 host kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
May 27 17:51:14 host kernel: Call Trace:
May 27 17:51:14 host kernel: rpc_check_timeout+0x1e/0xe0 [sunrpc]
May 27 17:51:14 host kernel: call_decode+0x123/0x180 [sunrpc]
May 27 17:51:14 host kernel: __rpc_execute+0x7c/0x330 [sunrpc]
May 27 17:51:14 host kernel: ? recalibrate_cpu_khz+0x10/0x10
May 27 17:51:14 host kernel: rpc_run_task+0x10a/0x140 [sunrpc]
May 27 17:51:14 host kernel: nfs4_call_sync_sequence+0x68/0xa0 [nfsv4]
May 27 17:51:14 host kernel: _nfs4_proc_getattr+0xfb/0x120 [nfsv4]
May 27 17:51:14 host kernel: nfs4_proc_getattr+0x73/0x110 [nfsv4]
May 27 17:51:14 host kernel: __nfs_revalidate_inode+0x11a/0x2f0 [nfs]
May 27 17:51:14 host kernel: nfs_getattr+0x115/0x2a0 [nfs]
May 27 17:51:14 host kernel: ? security_inode_getattr+0x3a/0x50
May 27 17:51:14 host kernel: vfs_statx+0x94/0xf0
May 27 17:51:14 host kernel: __do_sys_newlstat+0x39/0x70
May 27 17:51:14 host kernel: do_syscall_64+0x5b/0x170
May 27 17:51:14 host kernel: entry_SYSCALL_64_after_hwframe+0x44/0xa9
May 27 17:51:14 host kernel: RIP: 0033:0x7f2642542599
May 27 17:51:14 host kernel: Code: ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 f3 0f 1e fa 49 89 f0 48 89 d6 83 ff 01 77 31 4c 89 c7 b8 06 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 07 c3 66 0f 1f 44 00 00 48 8b 15 b9 48 0d 00
May 27 17:51:14 host kernel: RSP: 002b:00007f262a1deb48 EFLAGS: 00000246 ORIG_RAX: 0000000000000006
May 27 17:51:14 host kernel: RAX: ffffffffffffffda RBX: 0000556f2b47f2b0 RCX: 00007f2642542599
May 27 17:51:14 host kernel: RDX: 00007f262a1deb70 RSI: 00007f262a1deb70 RDI: 00007f262a1dec50
May 27 17:51:14 host kernel: RBP: 0000556f2b47f2b0 R08: 00007f262a1dec50 R09: 00007f262a1deae0
May 27 17:51:14 host kernel: R10: 00007f262a1dfd20 R11: 0000000000000246 R12: 00007f262a1dec50
May 27 17:51:14 host kernel: R13: 0000000000000001 R14: 0000556f2a3a09a0 R15: 00007f262a1e0e40
May 27 17:51:14 host kernel: Modules linked in: rpcsec_gss_krb5 nfsv4 dns_resolver nfs lockd grace fscache fuse ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 ipt_REJECT nf_reject_ipv4 xt_conntrack ebtable_nat ip6table_nat ip6table_mangle ip6table_raw ip6table_security iptable_nat nf_nat iptable_mangle iptable_raw iptable_security nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c ip_set nfnetlink ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter ip_tables rmi_smbus rmi_core vfat fat intel_rapl x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm mei_wdt mei_hdcp irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel intel_cstate intel_uncore intel_rapl_perf joydev intel_pch_thermal i2c_i801 mei_me mei pcc_cpufreq auth_rpcgss sunrpc i915 rtsx_pci_sdmmc mmc_core i2c_algo_bit drm_kms_helper e1000e crc32c_intel drm serio_raw rtsx_pci wmi video
May 27 17:51:14 host kernel: CR2: 0000000000000098
May 27 17:51:14 host kernel: ---[ end trace 9dc0e1a830bdaf41 ]---
May 27 17:51:16 host kernel: BUG: unable to handle kernel NULL pointer dereference at 0000000000000098
May 27 17:51:16 host kernel: #PF error: [normal kernel read fault]
May 27 17:51:16 host kernel: PGD 0 P4D 0
May 27 17:51:16 host kernel: Oops: 0000 [#2] SMP PTI
May 27 17:51:16 host kernel: CPU: 3 PID: 166 Comm: kworker/u16:3 Tainted: G D 5.1.5-300.fc30.x86_64 #1
May 27 17:51:16 host kernel: Hardware name: LENOVO 20BXCTO1WW/20BXCTO1WW, BIOS JBET72WW (1.36 ) 02/23/2019
May 27 17:51:16 host kernel: Workqueue: rpciod rpc_async_schedule [sunrpc]
May 27 17:51:16 host kernel: RIP: 0010:xprt_adjust_timeout+0x9/0xe0 [sunrpc]
May 27 17:51:16 host kernel: Code: 05 00 01 00 00 48 89 83 f8 00 00 00 5b 5d 41 5c 41 5d 41 5e c3 66 66 2e 0f 1f 84 00 00 00 00 00 90 0f 1f 44 00 00 41 54 55 53 <48> 8b 87 98 00 00 00 48 89 fb 48 8b 80 a8 00 00 00 48 8b 68 78 48
May 27 17:51:16 host kernel: RSP: 0018:ffffb99a01067d68 EFLAGS: 00010207
May 27 17:51:16 host kernel: RAX: 00000000fffffff5 RBX: ffff94505f37b600 RCX: 0000000000000003
May 27 17:51:16 host kernel: RDX: ffff94506252cac0 RSI: 00000000fffffe01 RDI: 0000000000000000
May 27 17:51:16 host kernel: RBP: ffff94505fecc800 R08: ffff94506252ca90 R09: ffff94506252cac0
May 27 17:51:16 host kernel: R10: 0000000000000003 R11: 0000000000000018 R12: ffff94505f379000
May 27 17:51:16 host kernel: R13: ffff94505fecc830 R14: 0000000000005a81 R15: ffffffffc07b13f0
May 27 17:51:16 host kernel: FS: 0000000000000000(0000) GS:ffff945065ac0000(0000) knlGS:0000000000000000
May 27 17:51:16 host kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
May 27 17:51:16 host kernel: CR2: 0000000000000098 CR3: 000000014520e006 CR4: 00000000003606e0
May 27 17:51:16 host kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
May 27 17:51:16 host kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
May 27 17:51:16 host kernel: Call Trace:
May 27 17:51:16 host kernel: rpc_check_timeout+0x1e/0xe0 [sunrpc]
May 27 17:51:16 host kernel: call_decode+0x123/0x180 [sunrpc]
May 27 17:51:16 host kernel: __rpc_execute+0x7c/0x330 [sunrpc]
May 27 17:51:16 host kernel: rpc_async_schedule+0x29/0x40 [sunrpc]
May 27 17:51:16 host kernel: process_one_work+0x19d/0x380
May 27 17:51:16 host kernel: worker_thread+0x50/0x3b0
May 27 17:51:16 host kernel: kthread+0xfb/0x130
May 27 17:51:16 host kernel: ? process_one_work+0x380/0x380
May 27 17:51:16 host kernel: ? kthread_park+0x90/0x90
May 27 17:51:16 host kernel: ret_from_fork+0x35/0x40
May 27 17:51:16 host kernel: Modules linked in: rpcsec_gss_krb5 nfsv4 dns_resolver nfs lockd grace fscache fuse ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 ipt_REJECT nf_reject_ipv4 xt_conntrack ebtable_nat ip6table_nat ip6table_mangle ip6table_raw ip6table_security iptable_nat nf_nat iptable_mangle iptable_raw iptable_security nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c ip_set nfnetlink ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter ip_tables rmi_smbus rmi_core vfat fat intel_rapl x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm mei_wdt mei_hdcp irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel intel_cstate intel_uncore intel_rapl_perf joydev intel_pch_thermal i2c_i801 mei_me mei pcc_cpufreq auth_rpcgss sunrpc i915 rtsx_pci_sdmmc mmc_core i2c_algo_bit drm_kms_helper e1000e crc32c_intel drm serio_raw rtsx_pci wmi video
May 27 17:51:16 host kernel: CR2: 0000000000000098
May 27 17:51:16 host kernel: ---[ end trace 9dc0e1a830bdaf42 ]---
--------------------
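A note on reading the Oops-es above: both report the same faulting address, 0x98, with RDI = 0. The marked bytes in the Code: line, <48> 8b 87 98 00 00 00, decode on x86-64 as mov 0x98(%rdi),%rax, so the fault address is just RDI plus 0x98. Since RIP is only 9 bytes into xprt_adjust_timeout, RDI most likely still holds the function's first argument, which would make this a NULL pointer passed in by the caller. That last step is my reading of the dump, not something confirmed in this bug. The arithmetic can be checked with a small sketch (variable names are only for illustration):

```shell
#!/bin/sh
# Values copied from the first Oops above.
rdi=0x0000000000000000   # RDI at the time of the fault
disp=0x98                # displacement encoded in: 48 8b 87 98 00 00 00
cr2=0x0000000000000098   # faulting address reported in CR2

# The faulting instruction decodes as: mov 0x98(%rdi),%rax
# so the address it reads is RDI + 0x98.
fault_addr=$(printf '0x%x' $(( $rdi + $disp )))
echo "computed fault address: $fault_addr"

if [ $(( $fault_addr )) -eq $(( $cr2 )) ]; then
    echo "matches CR2: NULL base pointer plus offset 0x98"
fi
```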
I am not able to reproduce this...

(In reply to Bojan Smojver from comment #1)
> I tested this version with kernel 5.1.5 as well and got the same result.
> Core dump on the server side and a kernel Oops on the client side. Reverting
> to previous version makes things work.

Would it be possible to do a

dnf debuginfo-install nfs-utils-2.3.4-1.fc30.x86_64

which should produce a populated backtrace?

>
> The client works against EFS, but not against itself on F30.

I don't understand what this means...
Trace with debuginfo installed:
--------------------------
PID: 3704 (rpc.mountd)
UID: 0 (root)
GID: 0 (root)
Signal: 11 (SEGV)
Timestamp: Tue 2019-05-28 06:18:23 AEST (1min 12s ago)
Command Line: /usr/sbin/rpc.mountd
Executable: /usr/sbin/rpc.mountd
Control Group: /system.slice/nfs-mountd.service
Unit: nfs-mountd.service
Slice: system.slice
Boot ID: 0ecbf9e012f44868b6051b46f73caa50
Machine ID: <machine>
Hostname: <host>
Storage: /var/lib/systemd/coredump/core.rpc\x2emountd.0.0ecbf9e012f44868b6051b46f73caa50.3704.1558988303000000.lz4
Message: Process 3704 (rpc.mountd) of user 0 dumped core.

Stack trace of thread 3704:
#0 0x0000555be4766f5f DoMatch (rpc.mountd)
#1 0x0000555be476718d wildmat (rpc.mountd)
#2 0x0000555be4760f13 check_wildcard (rpc.mountd)
#3 0x0000555be476134b client_compose (rpc.mountd)
#4 0x0000555be475de32 auth_unix_ip (rpc.mountd)
#5 0x0000555be475fe93 cache_process_req (rpc.mountd)
#6 0x0000555be47603a0 my_svc_run (rpc.mountd)
#7 0x0000555be475b0f0 main (rpc.mountd)
#8 0x00007fea3deb0f33 __libc_start_main (libc.so.6)
#9 0x0000555be475b23e _start (rpc.mountd)
--------------------------

>> The client works against EFS, but not against itself on F30.
> I don't understand what this means...

Client F30, nfs-utils 2.3.4 --> Server F30, nfs-utils 2.3.4: server core dumps, client kernel Oops-es
Client F30, nfs-utils 2.3.4 --> Server Amazon EFS (i.e. NFSv4 essentially), works fine on both ends

I tried both of the above scenarios with kernel 5.1.4-1.300 and 5.1.5-1.300. Same.

My exports file (F30), if it matters:
--------------------------
#
/home/groups *.<my-domain>(rw,sync,sec=krb5)
/home/users *.<my-domain>(rw,sync,sec=krb5)
--------------------------

I tried without sec=krb5 too. Same result, if I remember correctly.

The FS on client side is mounted through autofs, but I tried by hand. Same.
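For anyone reading the trace without knowing the code: the top frames (check_wildcard, wildmat, DoMatch) are where mountd compares the connecting client's hostname against a wildcard client entry such as the *.<my-domain> lines in the exports file above. As a rough illustration of that kind of match only, here is a sketch using plain shell glob matching. It is not the wildmat() code from nfs-utils, the function name matches_export is made up for this example, and example.com stands in for the redacted domain.

```shell
#!/bin/sh
# Illustration only: shell glob matching, NOT the wildmat() code in nfs-utils.
# example.com is a placeholder for the redacted <my-domain>.
matches_export() {
    # $1 = client hostname, $2 = wildcard client spec from the exports file
    case "$1" in
        $2) return 0 ;;   # unquoted, so $2 is treated as a glob pattern
        *)  return 1 ;;
    esac
}

for host in client1.example.com client1.example.org; do
    if matches_export "$host" '*.example.com'; then
        echo "$host: matches wildcard entry"
    else
        echo "$host: no match"
    fi
done
```

The sketch says nothing about why DoMatch faulted; it only shows which kind of exports entry sends a request down this code path.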
(In reply to Bojan Smojver from comment #4)
> I tried both of the above scenarios with kernel 5.1.4-1.300 and 5.1.5-1.300.

Er, I meant -300 there, not -1.300, of course.
Could you please try this scratch build of nfs-utils https://koji.fedoraproject.org/koji/taskinfo?taskID=35111648 It contains an upstream patch that I believe fixes the problem.
(In reply to Steve Dickson from comment #6)
> Could you please try this scratch build of nfs-utils
> https://koji.fedoraproject.org/koji/taskinfo?taskID=35111648
>
> It contains an upstream patch that I believe fixes the problem.

Thanks for the quick turnaround, it works. No core dumps on the server.

I did see a kernel Oops once since the upgrade of the client, but not after I rebooted it. If it happens again, I'll let you know.
FEDORA-2019-06f611666c has been submitted as an update to Fedora 30. https://bodhi.fedoraproject.org/updates/FEDORA-2019-06f611666c
FEDORA-2019-4cefd3161a has been submitted as an update to Fedora 29. https://bodhi.fedoraproject.org/updates/FEDORA-2019-4cefd3161a
nfs-utils-2.3.4-2.fc30 has been pushed to the Fedora 30 testing repository. If problems still persist, please make note of it in this bug report. See https://fedoraproject.org/wiki/QA:Updates_Testing for instructions on how to install test updates. You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2019-06f611666c
nfs-utils-2.3.3-4.rc2.fc29 has been pushed to the Fedora 29 testing repository. If problems still persist, please make note of it in this bug report. See https://fedoraproject.org/wiki/QA:Updates_Testing for instructions on how to install test updates. You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2019-4cefd3161a
Steve, Can I ask a general question here about those kernel Oops that I posted in this bug and that have been reported using abrt. Surely, a buggy nfs-utils package should not be able to cause those in the kernel, right? These are kernel bugs, correct?
(In reply to Bojan Smojver from comment #14)
> Can I ask a general question here about those kernel Oops that I posted in
> this bug and that have been reported using abrt. Surely, a buggy nfs-utils
> package should not be able to cause those in the kernel, right? These are
> kernel bugs, correct?

Yes, that's a kernel bug probably unrelated to the mountd bug. I don't recognize it.
nfs-utils-2.3.4-2.fc30 has been pushed to the Fedora 30 stable repository. If problems still persist, please make note of it in this bug report.
nfs-utils-2.3.3-4.rc2.fc29 has been pushed to the Fedora 29 stable repository. If problems still persist, please make note of it in this bug report.