Bug 1713937

Summary: rpc.mountd dumps core with nfs-utils 2.3.4
Product: [Fedora] Fedora
Reporter: Bojan Smojver <bojan>
Component: nfs-utils
Assignee: Steve Dickson <steved>
Status: CLOSED ERRATA
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: high
Priority: high
Version: 30
CC: amessina, bfields, jlayton, steved
Hardware: x86_64
OS: Linux
Fixed In Version: nfs-utils-2.3.4-2.fc30 nfs-utils-2.3.3-4.rc2.fc29
Last Closed: 2019-06-25 01:25:17 UTC
Type: Bug

Description Bojan Smojver 2019-05-25 21:27:55 UTC
Description of problem:

rpc.mountd dumps core. Reverting to previous build makes the NFS server work again.

           PID: 3056 (rpc.mountd)
           UID: 0 (root)
           GID: 0 (root)
        Signal: 11 (SEGV)
     Timestamp: Fri 2019-05-24 15:43:23 AEST (10min ago)
  Command Line: /usr/sbin/rpc.mountd
    Executable: /usr/sbin/rpc.mountd
 Control Group: /system.slice/nfs-mountd.service
          Unit: nfs-mountd.service
         Slice: system.slice
       Boot ID: <boot>
    Machine ID: <machine>
      Hostname: <host>
       Storage: /var/lib/systemd/coredump/core.rpc\x2emountd.0.9a81e480746b479d9c8ae9618ff17404.3056.1558676603000000.lz4
       Message: Process 3056 (rpc.mountd) of user 0 dumped core.

            Stack trace of thread 3056:
            #0  0x000055b227043f5f n/a (rpc.mountd)
            #1  0x000055b22704418d n/a (rpc.mountd)
            #2  0x000055b22703df13 n/a (rpc.mountd)
            #3  0x000055b22703e34b n/a (rpc.mountd)
            #4  0x000055b22703ae32 n/a (rpc.mountd)
            #5  0x000055b22703ce93 n/a (rpc.mountd)
            #6  0x000055b22703d3a0 n/a (rpc.mountd)
            #7  0x000055b2270380f0 n/a (rpc.mountd)
            #8  0x00007f15da422f33 __libc_start_main (libc.so.6)
            #9  0x000055b22703823e n/a (rpc.mountd)


Version-Release number of selected component (if applicable):
2.3.4-1

How reproducible:
Always.


Steps to Reproduce:
1. Attempt to mount an NFS directory.
2. rpc.mountd crashes.

Actual results:
SIGSEGV.

Expected results:
rpc.mountd does not crash. The previous RPM, 2.3.3-7.rc2, works fine.

Comment 1 Bojan Smojver 2019-05-27 11:12:54 UTC
I tested this version with kernel 5.1.5 as well and got the same result. Core dump on the server side and a kernel Oops on the client side. Reverting to previous version makes things work.

The client works against EFS, but not against itself on F30.

Comment 2 Bojan Smojver 2019-05-27 12:11:46 UTC
For posterity, client side Oops-es:
--------------------
May 27 17:51:14 host kernel: BUG: unable to handle kernel NULL pointer dereference at 0000000000000098
May 27 17:51:14 host kernel: #PF error: [normal kernel read fault]
May 27 17:51:14 host kernel: PGD 0 P4D 0 
May 27 17:51:14 host kernel: Oops: 0000 [#1] SMP PTI
May 27 17:51:14 host kernel: CPU: 0 PID: 3415 Comm: automount Not tainted 5.1.5-300.fc30.x86_64 #1
May 27 17:51:14 host kernel: Hardware name: LENOVO 20BXCTO1WW/20BXCTO1WW, BIOS JBET72WW (1.36 ) 02/23/2019
May 27 17:51:14 host kernel: RIP: 0010:xprt_adjust_timeout+0x9/0xe0 [sunrpc]
May 27 17:51:14 host kernel: Code: 05 00 01 00 00 48 89 83 f8 00 00 00 5b 5d 41 5c 41 5d 41 5e c3 66 66 2e 0f 1f 84 00 00 00 00 00 90 0f 1f 44 00 00 41 54 55 53 <48> 8b 87 98 00 00 00 48 89 fb 48 8b 80 a8 00 00 00 48 8b 68 78 48
May 27 17:51:14 host kernel: RSP: 0018:ffffb99a0174baf0 EFLAGS: 00010207
May 27 17:51:14 host kernel: RAX: 00000000fffffff5 RBX: ffff94505f378a00 RCX: 0000000000000003
May 27 17:51:14 host kernel: RDX: ffff94506252cac0 RSI: 00000000fffffe01 RDI: 0000000000000000
May 27 17:51:14 host kernel: RBP: ffff94505ea04500 R08: ffff94506252ca90 R09: ffff94506252cac0
May 27 17:51:14 host kernel: R10: 0000000000000003 R11: ffffe6e10752ce20 R12: ffff944ffcbd3400
May 27 17:51:14 host kernel: R13: ffff94505ea04530 R14: 0000000000004080 R15: ffffffffc07b13f0
May 27 17:51:14 host kernel: FS:  00007f262a1e1700(0000) GS:ffff945065a00000(0000) knlGS:0000000000000000
May 27 17:51:14 host kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
May 27 17:51:14 host kernel: CR2: 0000000000000098 CR3: 000000021fb0a005 CR4: 00000000003606f0
May 27 17:51:14 host kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
May 27 17:51:14 host kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
May 27 17:51:14 host kernel: Call Trace:
May 27 17:51:14 host kernel:  rpc_check_timeout+0x1e/0xe0 [sunrpc]
May 27 17:51:14 host kernel:  call_decode+0x123/0x180 [sunrpc]
May 27 17:51:14 host kernel:  __rpc_execute+0x7c/0x330 [sunrpc]
May 27 17:51:14 host kernel:  ? recalibrate_cpu_khz+0x10/0x10
May 27 17:51:14 host kernel:  rpc_run_task+0x10a/0x140 [sunrpc]
May 27 17:51:14 host kernel:  nfs4_call_sync_sequence+0x68/0xa0 [nfsv4]
May 27 17:51:14 host kernel:  _nfs4_proc_getattr+0xfb/0x120 [nfsv4]
May 27 17:51:14 host kernel:  nfs4_proc_getattr+0x73/0x110 [nfsv4]
May 27 17:51:14 host kernel:  __nfs_revalidate_inode+0x11a/0x2f0 [nfs]
May 27 17:51:14 host kernel:  nfs_getattr+0x115/0x2a0 [nfs]
May 27 17:51:14 host kernel:  ? security_inode_getattr+0x3a/0x50
May 27 17:51:14 host kernel:  vfs_statx+0x94/0xf0
May 27 17:51:14 host kernel:  __do_sys_newlstat+0x39/0x70
May 27 17:51:14 host kernel:  do_syscall_64+0x5b/0x170
May 27 17:51:14 host kernel:  entry_SYSCALL_64_after_hwframe+0x44/0xa9
May 27 17:51:14 host kernel: RIP: 0033:0x7f2642542599
May 27 17:51:14 host kernel: Code: ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 f3 0f 1e fa 49 89 f0 48 89 d6 83 ff 01 77 31 4c 89 c7 b8 06 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 07 c3 66 0f 1f 44 00 00 48 8b 15 b9 48 0d 00
May 27 17:51:14 host kernel: RSP: 002b:00007f262a1deb48 EFLAGS: 00000246 ORIG_RAX: 0000000000000006
May 27 17:51:14 host kernel: RAX: ffffffffffffffda RBX: 0000556f2b47f2b0 RCX: 00007f2642542599
May 27 17:51:14 host kernel: RDX: 00007f262a1deb70 RSI: 00007f262a1deb70 RDI: 00007f262a1dec50
May 27 17:51:14 host kernel: RBP: 0000556f2b47f2b0 R08: 00007f262a1dec50 R09: 00007f262a1deae0
May 27 17:51:14 host kernel: R10: 00007f262a1dfd20 R11: 0000000000000246 R12: 00007f262a1dec50
May 27 17:51:14 host kernel: R13: 0000000000000001 R14: 0000556f2a3a09a0 R15: 00007f262a1e0e40
May 27 17:51:14 host kernel: Modules linked in: rpcsec_gss_krb5 nfsv4 dns_resolver nfs lockd grace fscache fuse ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 ipt_REJECT nf_reject_ipv4 xt_conntrack ebtable_nat ip6table_nat ip6table_mangle ip6table_raw ip6table_security iptable_nat nf_nat iptable_mangle iptable_raw iptable_security nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c ip_set nfnetlink ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter ip_tables rmi_smbus rmi_core vfat fat intel_rapl x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm mei_wdt mei_hdcp irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel intel_cstate intel_uncore intel_rapl_perf joydev intel_pch_thermal i2c_i801 mei_me mei pcc_cpufreq auth_rpcgss sunrpc i915 rtsx_pci_sdmmc mmc_core i2c_algo_bit drm_kms_helper e1000e crc32c_intel drm serio_raw rtsx_pci wmi video
May 27 17:51:14 host kernel: CR2: 0000000000000098
May 27 17:51:14 host kernel: ---[ end trace 9dc0e1a830bdaf41 ]---
May 27 17:51:16 host kernel: BUG: unable to handle kernel NULL pointer dereference at 0000000000000098
May 27 17:51:16 host kernel: #PF error: [normal kernel read fault]
May 27 17:51:16 host kernel: PGD 0 P4D 0 
May 27 17:51:16 host kernel: Oops: 0000 [#2] SMP PTI
May 27 17:51:16 host kernel: CPU: 3 PID: 166 Comm: kworker/u16:3 Tainted: G      D           5.1.5-300.fc30.x86_64 #1
May 27 17:51:16 host kernel: Hardware name: LENOVO 20BXCTO1WW/20BXCTO1WW, BIOS JBET72WW (1.36 ) 02/23/2019
May 27 17:51:16 host kernel: Workqueue: rpciod rpc_async_schedule [sunrpc]
May 27 17:51:16 host kernel: RIP: 0010:xprt_adjust_timeout+0x9/0xe0 [sunrpc]
May 27 17:51:16 host kernel: Code: 05 00 01 00 00 48 89 83 f8 00 00 00 5b 5d 41 5c 41 5d 41 5e c3 66 66 2e 0f 1f 84 00 00 00 00 00 90 0f 1f 44 00 00 41 54 55 53 <48> 8b 87 98 00 00 00 48 89 fb 48 8b 80 a8 00 00 00 48 8b 68 78 48
May 27 17:51:16 host kernel: RSP: 0018:ffffb99a01067d68 EFLAGS: 00010207
May 27 17:51:16 host kernel: RAX: 00000000fffffff5 RBX: ffff94505f37b600 RCX: 0000000000000003
May 27 17:51:16 host kernel: RDX: ffff94506252cac0 RSI: 00000000fffffe01 RDI: 0000000000000000
May 27 17:51:16 host kernel: RBP: ffff94505fecc800 R08: ffff94506252ca90 R09: ffff94506252cac0
May 27 17:51:16 host kernel: R10: 0000000000000003 R11: 0000000000000018 R12: ffff94505f379000
May 27 17:51:16 host kernel: R13: ffff94505fecc830 R14: 0000000000005a81 R15: ffffffffc07b13f0
May 27 17:51:16 host kernel: FS:  0000000000000000(0000) GS:ffff945065ac0000(0000) knlGS:0000000000000000
May 27 17:51:16 host kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
May 27 17:51:16 host kernel: CR2: 0000000000000098 CR3: 000000014520e006 CR4: 00000000003606e0
May 27 17:51:16 host kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
May 27 17:51:16 host kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
May 27 17:51:16 host kernel: Call Trace:
May 27 17:51:16 host kernel:  rpc_check_timeout+0x1e/0xe0 [sunrpc]
May 27 17:51:16 host kernel:  call_decode+0x123/0x180 [sunrpc]
May 27 17:51:16 host kernel:  __rpc_execute+0x7c/0x330 [sunrpc]
May 27 17:51:16 host kernel:  rpc_async_schedule+0x29/0x40 [sunrpc]
May 27 17:51:16 host kernel:  process_one_work+0x19d/0x380
May 27 17:51:16 host kernel:  worker_thread+0x50/0x3b0
May 27 17:51:16 host kernel:  kthread+0xfb/0x130
May 27 17:51:16 host kernel:  ? process_one_work+0x380/0x380
May 27 17:51:16 host kernel:  ? kthread_park+0x90/0x90
May 27 17:51:16 host kernel:  ret_from_fork+0x35/0x40
May 27 17:51:16 host kernel: Modules linked in: rpcsec_gss_krb5 nfsv4 dns_resolver nfs lockd grace fscache fuse ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 ipt_REJECT nf_reject_ipv4 xt_conntrack ebtable_nat ip6table_nat ip6table_mangle ip6table_raw ip6table_security iptable_nat nf_nat iptable_mangle iptable_raw iptable_security nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c ip_set nfnetlink ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter ip_tables rmi_smbus rmi_core vfat fat intel_rapl x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm mei_wdt mei_hdcp irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel intel_cstate intel_uncore intel_rapl_perf joydev intel_pch_thermal i2c_i801 mei_me mei pcc_cpufreq auth_rpcgss sunrpc i915 rtsx_pci_sdmmc mmc_core i2c_algo_bit drm_kms_helper e1000e crc32c_intel drm serio_raw rtsx_pci wmi video
May 27 17:51:16 host kernel: CR2: 0000000000000098
May 27 17:51:16 host kernel: ---[ end trace 9dc0e1a830bdaf42 ]---
--------------------
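
A note on reading the Oops above: the fault address (CR2) is 0x98 and RDI is 0. On x86-64, where RDI conventionally carries a function's first argument, that is consistent with xprt_adjust_timeout() receiving a NULL pointer and reading a field at offset 0x98 from it. This is an interpretation of the log, not something confirmed in this report. The Python sketch below only extracts those fields from three lines copied from the log; the helper name summarize_oops is made up for illustration.

```python
import re

# Three lines copied verbatim from the first Oops in this comment.
OOPS_LINES = [
    "May 27 17:51:14 host kernel: BUG: unable to handle kernel NULL pointer dereference at 0000000000000098",
    "May 27 17:51:14 host kernel: RIP: 0010:xprt_adjust_timeout+0x9/0xe0 [sunrpc]",
    "May 27 17:51:14 host kernel: RDX: ffff94506252cac0 RSI: 00000000fffffe01 RDI: 0000000000000000",
]


def summarize_oops(lines):
    """Hypothetical helper: collect the fault address, faulting symbol and RDI."""
    info = {}
    for line in lines:
        m = re.search(r"NULL pointer dereference at ([0-9a-f]+)", line)
        if m:
            info["fault_addr"] = int(m.group(1), 16)
        # 0010 is the kernel code segment; the later "RIP: 0033:" line is user space.
        m = re.search(r"RIP: 0010:(\w+)\+(0x[0-9a-f]+)/0x[0-9a-f]+ \[(\w+)\]", line)
        if m:
            info["function"] = m.group(1)
            info["offset"] = int(m.group(2), 16)
            info["module"] = m.group(3)
        m = re.search(r"\bRDI: ([0-9a-f]+)", line)
        if m:
            info["rdi"] = int(m.group(1), 16)
    return info


summary = summarize_oops(OOPS_LINES)
print(summary)

# If RDI is NULL, the faulting address is simply the field offset being read.
field_offset = summary["fault_addr"] - summary["rdi"]
print(f"read at offset {field_offset:#x} from a NULL base in {summary['function']}")
```

Both Oopses in this comment report the same RIP, CR2 and RDI values, so the same summary applies to the second one.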

Comment 3 Steve Dickson 2019-05-27 19:08:36 UTC
I am not able to reproduce this... 


(In reply to Bojan Smojver from comment #1)
> I tested this version with kernel 5.1.5 as well and got the same result.
> Core dump on the server side and a kernel Oops on the client side. Reverting
> to previous version makes things work.
Would it be possible to run dnf debuginfo-install nfs-utils-2.3.4-1.fc30.x86_64?
That should produce a populated backtrace.

> 
> The client works against EFS, but not against itself on F30.
I don't understand what this means...

Comment 4 Bojan Smojver 2019-05-27 20:27:42 UTC
Trace with debuginfo installed:
--------------------------
           PID: 3704 (rpc.mountd)
           UID: 0 (root)
           GID: 0 (root)
        Signal: 11 (SEGV)
     Timestamp: Tue 2019-05-28 06:18:23 AEST (1min 12s ago)
  Command Line: /usr/sbin/rpc.mountd
    Executable: /usr/sbin/rpc.mountd
 Control Group: /system.slice/nfs-mountd.service
          Unit: nfs-mountd.service
         Slice: system.slice
       Boot ID: 0ecbf9e012f44868b6051b46f73caa50
    Machine ID: <machine>
      Hostname: <host>
       Storage: /var/lib/systemd/coredump/core.rpc\x2emountd.0.0ecbf9e012f44868b6051b46f73caa50.3704.1558988303000000.lz4
       Message: Process 3704 (rpc.mountd) of user 0 dumped core.
                
                Stack trace of thread 3704:
                #0  0x0000555be4766f5f DoMatch (rpc.mountd)
                #1  0x0000555be476718d wildmat (rpc.mountd)
                #2  0x0000555be4760f13 check_wildcard (rpc.mountd)
                #3  0x0000555be476134b client_compose (rpc.mountd)
                #4  0x0000555be475de32 auth_unix_ip (rpc.mountd)
                #5  0x0000555be475fe93 cache_process_req (rpc.mountd)
                #6  0x0000555be47603a0 my_svc_run (rpc.mountd)
                #7  0x0000555be475b0f0 main (rpc.mountd)
                #8  0x00007fea3deb0f33 __libc_start_main (libc.so.6)
                #9  0x0000555be475b23e _start (rpc.mountd)
--------------------------
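
The unsymbolized trace in the description and the trace above look like the same crash. Assuming both came from the same nfs-utils 2.3.4-1 binary loaded at different ASLR base addresses, every rpc.mountd frame should differ by one constant offset between the two traces. The Python sketch below checks that using the addresses from this report (frame #8 is in libc.so.6 and is left out); the variable names are illustrative only.

```python
# rpc.mountd frames #0-#7 and #9 from the unsymbolized trace in the description.
UNSYMBOLIZED = [
    0x000055B227043F5F, 0x000055B22704418D, 0x000055B22703DF13,
    0x000055B22703E34B, 0x000055B22703AE32, 0x000055B22703CE93,
    0x000055B22703D3A0, 0x000055B2270380F0, 0x000055B22703823E,
]

# The same frames from the trace taken with debuginfo installed.
SYMBOLIZED = [
    (0x0000555BE4766F5F, "DoMatch"),
    (0x0000555BE476718D, "wildmat"),
    (0x0000555BE4760F13, "check_wildcard"),
    (0x0000555BE476134B, "client_compose"),
    (0x0000555BE475DE32, "auth_unix_ip"),
    (0x0000555BE475FE93, "cache_process_req"),
    (0x0000555BE47603A0, "my_svc_run"),
    (0x0000555BE475B0F0, "main"),
    (0x0000555BE475B23E, "_start"),
]

# A single shared delta means the frames sit at identical offsets in the binary.
deltas = {sym - raw for raw, (sym, _name) in zip(UNSYMBOLIZED, SYMBOLIZED)}
same_layout = len(deltas) == 1

names = []
if same_layout:
    for raw, (_sym, name) in zip(UNSYMBOLIZED, SYMBOLIZED):
        names.append(name)
        print(f"{raw:#018x} {name}")
else:
    print("frame offsets differ; the traces may come from different binaries")
```

If the offsets match, the n/a frames in the first trace can be read with the function names from the second.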

>> The client works against EFS, but not against itself on F30.
> I don't understand what this means...

Client F30, nfs-utils 2.3.4 --> Server F30, nfs-utils 2.3.4: server core dumps, client kernel Oops-es
Client F30, nfs-utils 2.3.4 --> Server Amazon EFS (i.e. NFSv4 essentially), works fine on both ends

I tried both of the above scenarios with kernel 5.1.4-1.300 and 5.1.5-1.300. Same.

My exports file (F30), if it matters:
--------------------------
#
/home/groups *.<my-domain>(rw,sync,sec=krb5)
/home/users *.<my-domain>(rw,sync,sec=krb5)
--------------------------

I tried without sec=krb5 too. Same result, if I remember correctly. The FS on the client side is normally mounted through autofs, but I also tried mounting it by hand. Same result.
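
For context, the exports entries above use a wildcard host pattern, and the backtrace shows the crash inside mountd's wildcard matching (check_wildcard, wildmat, DoMatch). The Python sketch below illustrates the general idea of matching a client hostname against such a pattern. It is not nfs-utils code: it uses Python's fnmatch rather than the wildmat implementation, example.com stands in for the redacted <my-domain>, and host_matches_export is a hypothetical name. The empty-hostname guard is ordinary defensive style here, not a statement about the root cause, which this report does not describe.

```python
from fnmatch import fnmatchcase

# Placeholder for the "*.<my-domain>" pattern in the exports file above.
EXPORT_PATTERN = "*.example.com"


def host_matches_export(hostname, pattern):
    """Return True if hostname matches an exports-style wildcard pattern.

    Hostnames are compared case-insensitively. As with wildcards in
    exports(5), "*" here also matches dots, so subdomains match too.
    """
    if not hostname:
        # No resolved hostname: treat as no match instead of dereferencing it.
        return False
    return fnmatchcase(hostname.lower(), pattern.lower())


for candidate in ("client1.example.com", "a.b.example.com",
                  "client1.example.org", None):
    print(candidate, host_matches_export(candidate, EXPORT_PATTERN))
```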

Comment 5 Bojan Smojver 2019-05-27 20:37:18 UTC
(In reply to Bojan Smojver from comment #4)

> I tried both of the above scenarios with kernel 5.1.4-1.300 and 5.1.5-1.300.

Er, I meant -300 there, not -1.300, of course.

Comment 6 Steve Dickson 2019-05-28 14:19:04 UTC
Could you please try this scratch build of nfs-utils
   https://koji.fedoraproject.org/koji/taskinfo?taskID=35111648 

It contains an upstream patch that I believe fixes the problem.

Comment 7 Bojan Smojver 2019-05-28 22:28:44 UTC
(In reply to Steve Dickson from comment #6)
> Could you please try this scratch build of nfs-utils
>    https://koji.fedoraproject.org/koji/taskinfo?taskID=35111648 
> 
> It contains an upstream patch that I believe fixes the problem.

Thanks for the quick turnaround, it works. No core dumps on the server.

I did see a kernel Oops once since the upgrade of the client, but not after I rebooted it. If it happens again, I'll let you know.

Comment 8 Fedora Update System 2019-05-29 18:31:33 UTC
FEDORA-2019-06f611666c has been submitted as an update to Fedora 30. https://bodhi.fedoraproject.org/updates/FEDORA-2019-06f611666c

Comment 10 Fedora Update System 2019-05-29 18:42:45 UTC
FEDORA-2019-4cefd3161a has been submitted as an update to Fedora 29. https://bodhi.fedoraproject.org/updates/FEDORA-2019-4cefd3161a

Comment 12 Fedora Update System 2019-05-30 13:57:29 UTC
nfs-utils-2.3.4-2.fc30 has been pushed to the Fedora 30 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2019-06f611666c

Comment 13 Fedora Update System 2019-05-30 15:34:41 UTC
nfs-utils-2.3.3-4.rc2.fc29 has been pushed to the Fedora 29 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2019-4cefd3161a

Comment 14 Bojan Smojver 2019-06-06 23:22:23 UTC
Steve,

Can I ask a general question here about those kernel Oops that I posted in this bug and that have been reported using abrt. Surely, a buggy nfs-utils package should not be able to cause those in the kernel, right? These are kernel bugs, correct?

Comment 15 J. Bruce Fields 2019-06-07 00:19:50 UTC
(In reply to Bojan Smojver from comment #14)
> Can I ask a general question here about those kernel Oops that I posted in
> this bug and that have been reported using abrt. Surely, a buggy nfs-utils
> package should not be able to cause those in the kernel, right? These are
> kernel bugs, correct?

Yes, that's a kernel bug, probably unrelated to the mountd bug. I don't recognize it.

Comment 16 Fedora Update System 2019-06-25 01:25:17 UTC
nfs-utils-2.3.4-2.fc30 has been pushed to the Fedora 30 stable repository. If problems still persist, please make note of it in this bug report.

Comment 17 Fedora Update System 2019-11-24 01:54:58 UTC
nfs-utils-2.3.3-4.rc2.fc29 has been pushed to the Fedora 29 stable repository. If problems still persist, please make note of it in this bug report.