Description of problem:
gssproxy's client code hangs in a read() on its socket, but only when rpc.gssd is started by systemd. Here is the hung thread:

(gdb) info threads
  Id   Target Id         Frame
  2    Thread 0x7fd0d12cd700 (LWP 1866) "rpc.gssd" 0x00007fd0d2dd176d in read () at ../sysdeps/unix/syscall-template.S:81
* 1    Thread 0x7fd0d3fc1840 (LWP 780) "rpc.gssd" 0x00007fd0d2af4f93 in epoll_wait () at ../sysdeps/unix/syscall-template.S:81
(gdb) thread 2
[Switching to thread 2 (Thread 0x7fd0d12cd700 (LWP 1866))]
#0  0x00007fd0d2dd176d in read () at ../sysdeps/unix/syscall-template.S:81
81      T_PSEUDO (SYSCALL_SYMBOL, SYSCALL_NAME, SYSCALL_NARGS)
(gdb) bt
#0  0x00007fd0d2dd176d in read () at ../sysdeps/unix/syscall-template.S:81
#1  0x00007fd0d1917bd4 in read (__nbytes=4, __buf=0x7fd0d11cc04c, __fd=<optimized out>) at /usr/include/bits/unistd.h:44
#2  gpm_recv_buffer (gpmctx=<optimized out>, length=<optimized out>, buffer=<optimized out>) at src/client/gpm_common.c:243
#3  gpm_make_call (proc=proc@entry=6, arg=arg@entry=0x7fd0d12cc210, res=res@entry=0x7fd0d12cc170) at src/client/gpm_common.c:460
#4  0x00007fd0d1913fd9 in gpm_acquire_cred (minor_status=minor_status@entry=0x7fd0d12cc438, in_cred_handle=in_cred_handle@entry=0x0, desired_name=<optimized out>, time_req=time_req@entry=4294967295, desired_mechs=desired_mechs@entry=0x7fd0cc002440, cred_usage=cred_usage@entry=1, impersonate=impersonate@entry=false, output_cred_handle=0x7fd0cc0021e0, actual_mechs=actual_mechs@entry=0x0, time_rec=time_rec@entry=0x0) at src/client/gpm_acquire_cred.c:104
#5  0x00007fd0d1918b50 in gssi_acquire_cred_from (minor_status=0x7fd0d12cc5d4,

Unfortunately the hung thread is holding a mutex, which means all other NFS mounts hang as well. It's not clear whether this is a gssproxy issue or a systemd issue... I'll start with gssproxy since that's where the code is hung.
Version-Release number of selected component (if applicable):
nfs-utils-1.3.0-0.52.el7.x86_64
krb5-libs-1.15.1-17.el7.x86_64
gssproxy-0.7.0-13.el7.x86_64

How reproducible: 100%

Steps to Reproduce:
1. Set up an IPA server to be used as the KDC.
2. Register the NFS client and server as hosts and services with the IPA server.
3. On the NFS client: mount -o sec=krb5 nfsserver:/mnt/tmp /mnt/tmp
The entire back trace:

(gdb) thread 2
[Switching to thread 2 (Thread 0x7fd0d12cd700 (LWP 1866))]
#0  0x00007fd0d2dd176d in read () at ../sysdeps/unix/syscall-template.S:81
81      T_PSEUDO (SYSCALL_SYMBOL, SYSCALL_NAME, SYSCALL_NARGS)
(gdb) bt
#0  0x00007fd0d2dd176d in read () at ../sysdeps/unix/syscall-template.S:81
#1  0x00007fd0d1917bd4 in read (__nbytes=4, __buf=0x7fd0d11cc04c, __fd=<optimized out>) at /usr/include/bits/unistd.h:44
#2  gpm_recv_buffer (gpmctx=<optimized out>, length=<optimized out>, buffer=<optimized out>) at src/client/gpm_common.c:243
#3  gpm_make_call (proc=proc@entry=6, arg=arg@entry=0x7fd0d12cc210, res=res@entry=0x7fd0d12cc170) at src/client/gpm_common.c:460
#4  0x00007fd0d1913fd9 in gpm_acquire_cred (minor_status=minor_status@entry=0x7fd0d12cc438, in_cred_handle=in_cred_handle@entry=0x0, desired_name=<optimized out>, time_req=time_req@entry=4294967295, desired_mechs=desired_mechs@entry=0x7fd0cc002440, cred_usage=cred_usage@entry=1, impersonate=impersonate@entry=false, output_cred_handle=0x7fd0cc0021e0, actual_mechs=actual_mechs@entry=0x0, time_rec=time_rec@entry=0x0) at src/client/gpm_acquire_cred.c:104
#5  0x00007fd0d1918b50 in gssi_acquire_cred_from (minor_status=0x7fd0d12cc5d4, desired_name=0x0, time_req=4294967295, desired_mechs=0x7fd0cc002440, cred_usage=1, cred_store=0x0, output_cred_handle=0x7fd0d12cc500, actual_mechs=0x0, time_rec=0x0) at src/mechglue/gpp_acquire_cred.c:165
#6  0x00007fd0d3938aa0 in gss_add_cred_from (minor_status=minor_status@entry=0x7fd0d12cc5d4, input_cred_handle=<optimized out>, desired_name=desired_name@entry=0x0, desired_mech=<optimized out>, cred_usage=cred_usage@entry=1, initiator_time_req=initiator_time_req@entry=4294967295, acceptor_time_req=acceptor_time_req@entry=4294967295, cred_store=cred_store@entry=0x0, output_cred_handle=output_cred_handle@entry=0x0, actual_mechs=actual_mechs@entry=0x0, initiator_time_rec=0x0, acceptor_time_rec=0x0) at g_acquire_cred.c:455
#7  0x00007fd0d39390e9 in gss_acquire_cred_from (minor_status=minor_status@entry=0x7fd0d12cc6cc, desired_name=desired_name@entry=0x0, time_req=time_req@entry=4294967295, desired_mechs=desired_mechs@entry=0x7fd0d12cc6d0, cred_usage=cred_usage@entry=1, cred_store=cred_store@entry=0x0, output_cred_handle=output_cred_handle@entry=0x7fd0d12cc7c0, actual_mechs=actual_mechs@entry=0x0, time_rec=time_rec@entry=0x0) at g_acquire_cred.c:190
#8  0x00007fd0d3939304 in gss_acquire_cred (minor_status=minor_status@entry=0x7fd0d12cc6cc, desired_name=desired_name@entry=0x0, time_req=time_req@entry=4294967295, desired_mechs=desired_mechs@entry=0x7fd0d12cc6d0, cred_usage=cred_usage@entry=1, output_cred_handle=output_cred_handle@entry=0x7fd0d12cc7c0, actual_mechs=actual_mechs@entry=0x0, time_rec=time_rec@entry=0x0) at g_acquire_cred.c:107
#9  0x00005569456af1df in gssd_acquire_krb5_cred (gss_cred=gss_cred@entry=0x7fd0d12cc7c0) at krb5_util.c:1362
#10 0x00005569456b1389 in limit_krb5_enctypes (sec=sec@entry=0x7fd0d12cc7b0) at krb5_util.c:1426
#11 0x00005569456adc6b in create_auth_rpc_client (clp=clp@entry=0x556945db8450, tgtname=tgtname@entry=0x0, clnt_return=clnt_return@entry=0x7fd0d12cccf0, auth_return=auth_return@entry=0x7fd0d12ccc48, uid=uid@entry=0, cred=cred@entry=0x0, authtype=0) at gssd_proc.c:337
#12 0x00005569456ae4e6 in krb5_use_machine_creds (clp=clp@entry=0x556945db8450, uid=uid@entry=0, tgtname=tgtname@entry=0x0, service=service@entry=0x556945dc7330 "*", rpc_clnt=rpc_clnt@entry=0x7fd0d12cccf0) at gssd_proc.c:558
#13 0x00005569456ae68d in process_krb5_upcall (clp=clp@entry=0x556945db8450, uid=uid@entry=0, fd=9, tgtname=tgtname@entry=0x0, service=service@entry=0x556945dc7330 "*") at gssd_proc.c:645
#14 0x00005569456aeed9 in handle_gssd_upcall (info=0x556945dc7310) at gssd_proc.c:804
#15 0x00007fd0d2dcadd5 in start_thread (arg=0x7fd0d12cd700) at pthread_create.c:308
#16 0x00007fd0d2af49bd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113
The workaround for this problem is setting GSS_USE_PROXY="no" in /etc/sysconfig/nfs.
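Applied by hand, the workaround amounts to the sketch below. It is demonstrated against a temp copy of the config so it is safe to run anywhere; on a real client the file is /etc/sysconfig/nfs, and the sample RPCGSSDARGS line is just illustrative filler.

```shell
# Sketch of the GSS_USE_PROXY="no" workaround, shown on a temp copy of
# the config file; on a real client edit /etc/sysconfig/nfs itself.
conf=$(mktemp)
printf 'RPCGSSDARGS=""\nGSS_USE_PROXY="yes"\n' > "$conf"   # sample contents

# Replace an existing GSS_USE_PROXY setting, or append one if absent.
if grep -q '^GSS_USE_PROXY=' "$conf"; then
    sed -i 's/^GSS_USE_PROXY=.*/GSS_USE_PROXY="no"/' "$conf"
else
    echo 'GSS_USE_PROXY="no"' >> "$conf"
fi

grep GSS_USE_PROXY "$conf"
# rpc.gssd only reads this file at startup, so on the real system follow
# up with:  systemctl restart rpc-gssd
```

Note that the restart matters: a gssd that is already wedged in the gssproxy read keeps holding its mutex until it is killed.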
1. Does it happen in Fedora?
2. How do I reproduce this?
(In reply to Robbie Harwood from comment #4)
> 1. Does it happen in Fedora?

IDK... I have not tried... I'll look into it.

> 2. How do I reproduce this?

Steps to Reproduce:
1. Set up an IPA server to be used as the KDC.
2. Register the NFS client and server as hosts and services with the IPA server.
3. On the NFS client: mount -o sec=krb5 nfsserver:/mnt/tmp /mnt/tmp

Question: What is the read from the socket trying to get? Maybe something is not running or has failed to come up?
It is easily reproduced on a newly installed machine. The first appearance was on October 30 (1016.n.1), but it did not happen very often. This week the bug always reproduces in our automated tests (new system). The fatal message: Failed to lock byte 100
(In reply to Robbie Harwood from comment #4)
> 1. Does it happen in Fedora?

Updated F27: No (nfs-utils-2.2.1-1.rc1.fc27.x86_64, gssproxy-0.7.0-24.fc27.x86_64)

Updated F26: Yes (nfs-utils-2.2.1-0.fc28.x86_64, gssproxy-0.7.0-9.fc26.x86_64)

(gdb) info threads
  Id   Target Id         Frame
  1    Thread 0x7fb552932600 (LWP 716) "rpc.gssd" 0x00007fb551241383 in epoll_wait () from /lib64/libc.so.6
  2    Thread 0x7fb54fe29700 (LWP 1092) "rpc.gssd" 0x00007fb55173028d in read () from /lib64/libpthread.so.0
* 3    Thread 0x7fb54f21f700 (LWP 1095) "rpc.gssd" 0x00007fb55173028d in read () from /lib64/libpthread.so.0
  4    Thread 0x7fb54ea1e700 (LWP 1096) "rpc.gssd" 0x00007fb55173028d in read () from /lib64/libpthread.so.0
  5    Thread 0x7fb54e21d700 (LWP 1184) "rpc.gssd" 0x00007fb55173028d in read () from /lib64/libpthread.so.0

Updated F25: No (nfs-utils-2.1.1-5.rc4.fc27.x86_64, gssproxy-0.7.0-9.fc25.x86_64)

Updated rawhide: Yes, but hung in epoll_wait instead of read:

(gdb) bt
#0  0x00007fe5939198d7 in epoll_wait () from /lib64/libc.so.6
#1  0x00007fe592b54a63 in gpm_epoll_wait.constprop () from /usr/lib64/gssproxy/proxymech.so
#2  0x00007fe592b55162 in gpm_make_call () from /usr/lib64/gssproxy/proxymech.so
#3  0x00007fe592b512ef in gpm_acquire_cred () from /usr/lib64/gssproxy/proxymech.so
#4  0x00007fe592b55fe5 in gssi_acquire_cred_from () from /usr/lib64/gssproxy/proxymech.so
#5  0x00007fe594971ca0 in gss_add_cred_from () from /lib64/libgssapi_krb5.so.2
#6  0x00007fe59497222e in gss_acquire_cred_from () from /lib64/libgssapi_krb5.so.2
#7  0x00007fe5949724b6 in gss_acquire_cred () from /lib64/libgssapi_krb5.so.2
#8  0x0000563d21f4d0b2 in gssd_acquire_krb5_cred ()
#9  0x0000563d21f4f534 in limit_krb5_enctypes ()
#10 0x0000563d21f4bd65 in create_auth_rpc_client.constprop ()
#11 0x0000563d21f4c458 in krb5_use_machine_creds ()
#12 0x0000563d21f4c622 in process_krb5_upcall ()
#13 0x0000563d21f4ce58 in handle_gssd_upcall ()
#14 0x00007fe593dfa55b in start_thread () from /lib64/libpthread.so.0
#15 0x00007fe5939195af in clone () from /lib64/libc.so.6

nfs-utils-2.2.1-1.rc1.fc28.1.x86_64
gssproxy-0.7.0-24.fc28.x86_64

IHTH...
There's no difference between the f27 and the rawhide gssproxy (hence the matching NVR), so that's kind of odd. My test setup for f27 is working as of this morning's fixes, and rawhide is as well (gssproxy-0.7.0-25, nfs-utils-2.2.1-1.rc1.fc28.1.x86_64). (This is with a non-IPA KDC.) (I'm also running without KCM (yum erase sssd-kcm) because it's broken at the moment for this use case.) If you could double-check with -25 that would be helpful; otherwise I'll check with a freeipa KDC, but it may be a while since I don't have that env handy.
Steve, did you restart rpc.gssd after you updated gssproxy, before repeating your tests?
(In reply to Robbie Harwood from comment #8)
> If you could double-check with -25 that would be helpful; otherwise I'll
> check with a freeipa KDC, but it may be a while since I don't have that env
> handy.

With both -24 and -25, the -o sec=krb5 mounts on F27 do succeed, but very slowly. The mount.nfs process hangs in the same spot, gss_cred_init(), but some timeout must be firing that causes the process to wake up and continue. On rawhide it's the same thing, except the mount takes even longer. Here is what I'm looking at:

f27# time mount rhel7srv:/home/tmp /mnt/tmp
real    0m30.163s
user    0m0.004s
sys     0m0.021s

fedora# time mount -o sec=krb5 rhel7srv:/home/tmp /mnt/tmp
real    1m0.185s
user    0m0.010s
sys     0m0.029s
(In reply to Simo Sorce from comment #9)
> Steve did you restart rpc.gssd after you updated gssproxy before repeating
> your tests ?

Yes... I updated and rebooted both VMs.
I can't reproduce on rawhide or f27:

[root@nfs-client vagrant]# time mount -v nfs-server.mivehind.net:/home /mnt/nfs/home -o vers=4.2
mount.nfs: timeout set for Mon Dec  4 19:59:13 2017
mount.nfs: trying text-based options 'vers=4.2,addr=192.168.56.10,clientaddr=192.168.56.15'

real    0m1.644s
user    0m0.006s
sys     0m0.024s
[root@nfs-client vagrant]# mount | grep /mnt/nfs/home
nfs-server.mivehind.net:/home on /mnt/nfs/home type nfs4 (rw,relatime,vers=4.2,rsize=262144,wsize=262144,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=krb5,clientaddr=192.168.56.15,local_lock=none,addr=192.168.56.10)
[root@nfs-client vagrant]# rpm -qa | grep gssproxy
gssproxy-debuginfo-0.7.0-25.fc27.x86_64
gssproxy-debugsource-0.7.0-25.fc27.x86_64
gssproxy-0.7.0-25.fc27.x86_64
[root@nfs-client vagrant]# rpm -qa | grep sssd-kcm
[root@nfs-client vagrant]# rpm -qa | grep nfs-utils
nfs-utils-2.2.1-1.rc1.fc27.x86_64
[root@nfs-client vagrant]#
This bug has caused many automated test cases to fail. Please give it more attention. Thanks.
The gssproxy fix in this bz https://bugzilla.redhat.com/show_bug.cgi?id=1507817 should fix this problem....
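For reference, the effect of leaving that bug unfixed is visible even in plain shell: a read from a FIFO with no writer blocks indefinitely, exactly like the hung read() threads in the traces above, unless something bounds the wait. The sketch below only mimics the general idea of a bounded wait (it is not the actual patch from that bz):

```shell
# A read on a FIFO with no writer blocks forever, like the hung gssproxy
# client read; bounding it with `timeout` makes the caller recover.
fifo=$(mktemp -u)
mkfifo "$fifo"

# Open the FIFO read/write so open() itself does not block, then read;
# with no data ever arriving, the read blocks until `timeout` kills it.
if timeout 2 sh -c "exec 3<>'$fifo'; read -r line <&3"; then
    outcome="read returned"
else
    outcome="read timed out"   # expected: no writer ever sends data
fi
echo "$outcome"
rm -f "$fifo"
```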
(In reply to Steve Dickson from comment #14)
> The gssproxy fix in this bz
> https://bugzilla.redhat.com/show_bug.cgi?id=1507817
>
> should fix this problem....

Steve,

Can we mark this verified too, since https://bugzilla.redhat.com/show_bug.cgi?id=1507817 seems to be verified as per https://bugzilla.redhat.com/show_bug.cgi?id=1507817#c96?
(In reply to Kaleem from comment #17)
> Can we mark this verified too, since
> https://bugzilla.redhat.com/show_bug.cgi?id=1507817 seems to be verified as
> per https://bugzilla.redhat.com/show_bug.cgi?id=1507817#c96?

I would think so...
Marking verified as per #c18
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2018:0709