Bug 1507817

Summary: [rhel-7.5] krb5 mounting starts to fail with "access denied by server"
Product: Red Hat Enterprise Linux 7
Reporter: Yongcheng Yang <yoyang>
Component: gssproxy
Assignee: Robbie Harwood <rharwood>
Status: CLOSED ERRATA
QA Contact: ipa-qe <ipa-qe>
Severity: urgent
Docs Contact:
Priority: urgent
Version: 7.5
CC: abokovoy, ajmitchell, amessina, fs-qe, ikent, ipa-qe, jiyin, jstodola, ksiddiqu, lkocman, lmiksik, ovasik, pasik, ppiotrowski, rharwood, rob.verduijn, salmy, steved, swhiteho, xdong, xzhou, yoyang
Target Milestone: beta
Keywords: Regression, TestBlocker
Target Release: ---
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version: gssproxy-0.7.0-17.el7
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2018-04-10 11:09:51 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1514241
Bug Blocks:

Description Yongcheng Yang 2017-10-31 08:55:09 UTC
Description of problem:
During early testing of RHEL-7.5 (only nightly composes were available),
some krb5 mounting tests started to fail that had passed on RHEL-7.4.


Version-Release number of selected component (if applicable):
gssproxy-0.7.0-9.el7.x86_64
libini_config-1.3.1-29.el7.x86_64


How reproducible:
100%


Steps to Reproduce:
1. update gssproxy (and libini_config) to the RHEL-7.5 versions
2. reboot the machine (or use some other method to get it freshly started)
3. mount -t nfs4 -o sec=krb5 $nfs_server:$export_path $mount_point


Actual results:

### Previous krb5 test in RHEL-7.4 ###
######################################
[04:06:10 root@ ~~]# klist -a
Ticket cache: FILE:/tmp/krb5cc_0
Default principal: root

Valid starting     Expires            Service principal
31/10/17 04:06:10  01/11/17 04:06:10  krbtgt/RHQE.COM
        renew until 07/11/17 03:06:10
        Addresses: (none)
[04:06:10 root@ ~~]# service nfs restart
Redirecting to /bin/systemctl restart nfs.service
[04:06:10 root@ ~~]# exportfs -v -i hp-dl388g8-20.rhts.eng.pek2.redhat.com:/exportDir-krb5-common-local -o rw,sec=sys:krb5:krb5i:krb5p
exporting hp-dl388g8-20.rhts.eng.pek2.redhat.com:/exportDir-krb5-common-local
[04:06:10 root@ ~~]# exportfs -v
/exportDir-krb5-common-local
                hp-dl388g8-20.rhts.eng.pek2.redhat.com(rw,sync,wdelay,hide,no_subtree_check,sec=sys:krb5:krb5i:krb5p,secure,root_squash,no_all_squash)
[04:06:10 root@ ~~]# mount -t nfs4 -o sec=sys hp-dl388g8-20.rhts.eng.pek2.redhat.com:/exportDir-krb5-common-local /mnt/nfsmp-krb5-common-local
[04:06:10 root@ ~~]# cat /proc/mounts | grep /mnt/nfsmp-krb5-common-local
hp-dl388g8-20.rhts.eng.pek2.redhat.com:/exportDir-krb5-common-local /mnt/nfsmp-krb5-common-local nfs4 rw,relatime,vers=4.1,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,port=0,timeo=600,retrans=2,sec=sys,clientaddr=10.73.4.191,local_lock=none,addr=10.73.4.191 0 0
[04:06:10 root@ ~~]# umount /mnt/nfsmp-krb5-common-local
[04:06:11 root@ ~~]# mount -t nfs4 -o sec=krb5 hp-dl388g8-20.rhts.eng.pek2.redhat.com:/exportDir-krb5-common-local /mnt/nfsmp-krb5-common-local
[04:06:11 root@ ~~]# cat /proc/mounts | grep /mnt/nfsmp-krb5-common-local
hp-dl388g8-20.rhts.eng.pek2.redhat.com:/exportDir-krb5-common-local /mnt/nfsmp-krb5-common-local nfs4 rw,relatime,vers=4.1,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,port=0,timeo=600,retrans=2,sec=krb5,clientaddr=10.73.4.191,local_lock=none,addr=10.73.4.191 0 0
[04:06:11 root@ ~~]# umount /mnt/nfsmp-krb5-common-local
[04:06:11 root@ ~~]# mount -t nfs4 -o sec=krb5i hp-dl388g8-20.rhts.eng.pek2.redhat.com:/exportDir-krb5-common-local /mnt/nfsmp-krb5-common-local
[04:06:11 root@ ~~]# cat /proc/mounts | grep /mnt/nfsmp-krb5-common-local
hp-dl388g8-20.rhts.eng.pek2.redhat.com:/exportDir-krb5-common-local /mnt/nfsmp-krb5-common-local nfs4 rw,relatime,vers=4.1,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,port=0,timeo=600,retrans=2,sec=krb5i,clientaddr=10.73.4.191,local_lock=none,addr=10.73.4.191 0 0
[04:06:11 root@ ~~]# umount /mnt/nfsmp-krb5-common-local
[04:06:11 root@ ~~]# mount -t nfs4 -o sec=krb5p hp-dl388g8-20.rhts.eng.pek2.redhat.com:/exportDir-krb5-common-local /mnt/nfsmp-krb5-common-local
[04:06:11 root@ ~~]# cat /proc/mounts | grep /mnt/nfsmp-krb5-common-local
hp-dl388g8-20.rhts.eng.pek2.redhat.com:/exportDir-krb5-common-local /mnt/nfsmp-krb5-common-local nfs4 rw,relatime,vers=4.1,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,port=0,timeo=600,retrans=2,sec=krb5p,clientaddr=10.73.4.191,local_lock=none,addr=10.73.4.191 0 0
[04:06:11 root@ ~~]# umount /mnt/nfsmp-krb5-common-local
[04:06:11 root@ ~~]#
[04:06:11 root@ ~~]#
[04:06:11 root@ ~~]# rpm -q gssproxy
gssproxy-0.7.0-4.el7.x86_64
[04:06:11 root@ ~~]#

#### Updating package gssproxy ###
##################################
[04:06:12 root@ ~~]# cat /etc/yum.repos.d/beaker-Server.repo
[beaker-Server]
name=beaker-Server
baseurl=http://download.eng.pek2.redhat.com/pub/rhel/rel-eng/RHEL-7.4-20170711.0/compose/Server/x86_64/os
enabled=1
gpgcheck=0
skip_if_unavailable=1
[04:06:12 root@ ~~]# vim /etc/yum.repos.d/beaker-Server.repo 
[04:06:12 root@ ~~]# grep baseurl /etc/yum.repos.d/beaker-Server.repo
baseurl=http://download.eng.pek2.redhat.com/pub/rhel/nightly/RHEL-7.5-20171026.n.0/compose/Server/x86_64/os
[04:06:12 root@ ~~]# 
[04:06:12 root@ ~~]# yum update gssproxy
Loaded plugins: product-id, search-disabled-repos, subscription-manager
This system is not registered with an entitlement server. You can use subscription-manager to register.
Resolving Dependencies
--> Running transaction check
---> Package gssproxy.x86_64 0:0.7.0-4.el7 will be updated
---> Package gssproxy.x86_64 0:0.7.0-9.el7 will be an update
--> Processing Dependency: libini_config >= 1.3.1-28 for package: gssproxy-0.7.0-9.el7.x86_64
--> Running transaction check
---> Package libini_config.x86_64 0:1.3.0-27.el7 will be updated
---> Package libini_config.x86_64 0:1.3.1-29.el7 will be an update
--> Processing Conflict: gssproxy-0.7.0-9.el7.x86_64 conflicts selinux-policy < 3.13.1-166.el7.noarch
--> Restarting Dependency Resolution with new changes.
--> Running transaction check
---> Package selinux-policy.noarch 0:3.13.1-166.el7 will be updated
--> Processing Dependency: selinux-policy = 3.13.1-166.el7 for package: selinux-policy-targeted-3.13.1-166.el7.noarch
--> Processing Dependency: selinux-policy = 3.13.1-166.el7 for package: selinux-policy-targeted-3.13.1-166.el7.noarch
---> Package selinux-policy.noarch 0:3.13.1-174.el7 will be an update
--> Running transaction check
---> Package selinux-policy-targeted.noarch 0:3.13.1-166.el7 will be updated
---> Package selinux-policy-targeted.noarch 0:3.13.1-174.el7 will be an update
--> Finished Dependency Resolution

Dependencies Resolved

=============================================================================================================================================================
 Package                                       Arch                         Version                                Repository                           Size
=============================================================================================================================================================
Updating:
 gssproxy                                      x86_64                       0.7.0-9.el7                            beaker-Server                       104 k
 selinux-policy                                noarch                       3.13.1-174.el7                         beaker-Server                       442 k
Updating for dependencies:
 libini_config                                 x86_64                       1.3.1-29.el7                           beaker-Server                        63 k
 selinux-policy-targeted                       noarch                       3.13.1-174.el7                         beaker-Server                       6.5 M

Transaction Summary
=============================================================================================================================================================
Upgrade  2 Packages (+2 Dependent packages)

Total download size: 7.1 M
Is this ok [y/d/N]: y
Downloading packages:
Delta RPMs disabled because /usr/bin/applydeltarpm not installed.
(1/4): gssproxy-0.7.0-9.el7.x86_64.rpm                                                                                                | 104 kB  00:00:00
(2/4): libini_config-1.3.1-29.el7.x86_64.rpm                                                                                          |  63 kB  00:00:00
(3/4): selinux-policy-3.13.1-174.el7.noarch.rpm                                                                                       | 442 kB  00:00:00
(4/4): selinux-policy-targeted-3.13.1-174.el7.noarch.rpm                                                                              | 6.5 MB  00:00:00
-------------------------------------------------------------------------------------------------------------------------------------------------------------
Total                                                                                                                         32 MB/s | 7.1 MB  00:00:00
Running transaction check
Running transaction test
Transaction test succeeded
Running transaction
  Updating   : selinux-policy-3.13.1-174.el7.noarch                                                                                                      1/8
  Updating   : libini_config-1.3.1-29.el7.x86_64                                                                                                         2/8
  Updating   : gssproxy-0.7.0-9.el7.x86_64                                                                                                               3/8
  Updating   : selinux-policy-targeted-3.13.1-174.el7.noarch                                                                                             4/8
Invalid policycap (cgroup_seclabel) at /etc/selinux/targeted/tmp/modules/100/base/cil:10
Failed to verify cil database
Failed to verify cil database
/usr/sbin/semodule:  Failed!
  Cleanup    : selinux-policy-targeted-3.13.1-166.el7.noarch                                                                                             5/8
  Cleanup    : gssproxy-0.7.0-4.el7.x86_64                                                                                                               6/8
  Cleanup    : selinux-policy-3.13.1-166.el7.noarch                                                                                                      7/8
  Cleanup    : libini_config-1.3.0-27.el7.x86_64                                                                                                         8/8
  Verifying  : libini_config-1.3.1-29.el7.x86_64                                                                                                         1/8
  Verifying  : gssproxy-0.7.0-9.el7.x86_64                                                                                                               2/8
  Verifying  : selinux-policy-3.13.1-174.el7.noarch                                                                                                      3/8
  Verifying  : selinux-policy-targeted-3.13.1-174.el7.noarch                                                                                             4/8
  Verifying  : libini_config-1.3.0-27.el7.x86_64                                                                                                         5/8
  Verifying  : selinux-policy-3.13.1-166.el7.noarch                                                                                                      6/8
  Verifying  : selinux-policy-targeted-3.13.1-166.el7.noarch                                                                                             7/8
  Verifying  : gssproxy-0.7.0-4.el7.x86_64                                                                                                               8/8

Updated:
  gssproxy.x86_64 0:0.7.0-9.el7                                            selinux-policy.noarch 0:3.13.1-174.el7

Dependency Updated:
  libini_config.x86_64 0:1.3.1-29.el7                                     selinux-policy-targeted.noarch 0:3.13.1-174.el7

Complete!
# It seems a reboot is needed to trigger the following failure
[04:16:35 root@ ~~]# reboot

### After rebooting ###
#######################
[04:18:35 root@ ~~]# klist -a
Ticket cache: FILE:/tmp/krb5cc_0
Default principal: root

Valid starting     Expires            Service principal
31/10/17 04:18:35  01/11/17 04:18:35  krbtgt/RHQE.COM
        renew until 07/11/17 03:18:35
        Addresses: (none)
[04:18:35 root@ ~~]# service nfs restart
Redirecting to /bin/systemctl restart nfs.service
[04:18:35 root@ ~~]# exportfs -v -i hp-dl388g8-20.rhts.eng.pek2.redhat.com:/exportDir-krb5-common-local -o rw,sec=sys:krb5:krb5i:krb5p
exporting hp-dl388g8-20.rhts.eng.pek2.redhat.com:/exportDir-krb5-common-local
[04:18:35 root@ ~~]# exportfs -v
/exportDir-krb5-common-local
                hp-dl388g8-20.rhts.eng.pek2.redhat.com(rw,sync,wdelay,hide,no_subtree_check,sec=sys:krb5:krb5i:krb5p,secure,root_squash,no_all_squash)
[04:18:35 root@ ~~]# mount -t nfs4 -o sec=sys hp-dl388g8-20.rhts.eng.pek2.redhat.com:/exportDir-krb5-common-local /mnt/nfsmp-krb5-common-local
[04:18:36 root@ ~~]# cat /proc/mounts | grep /mnt/nfsmp-krb5-common-local
hp-dl388g8-20.rhts.eng.pek2.redhat.com:/exportDir-krb5-common-local /mnt/nfsmp-krb5-common-local nfs4 rw,relatime,vers=4.1,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,port=0,timeo=600,retrans=2,sec=sys,clientaddr=10.73.4.191,local_lock=none,addr=10.73.4.191 0 0
[04:18:36 root@ ~~]# umount /mnt/nfsmp-krb5-common-local
[04:18:36 root@ ~~]# mount -t nfs4 -o sec=krb5 hp-dl388g8-20.rhts.eng.pek2.redhat.com:/exportDir-krb5-common-local /mnt/nfsmp-krb5-common-local
mount.nfs4: access denied by server while mounting hp-dl388g8-20.rhts.eng.pek2.redhat.com:/exportDir-krb5-common-local
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[04:18:36 root@ ~~]# mount -t nfs4 -o sec=krb5i hp-dl388g8-20.rhts.eng.pek2.redhat.com:/exportDir-krb5-common-local /mnt/nfsmp-krb5-common-local
mount.nfs4: access denied by server while mounting hp-dl388g8-20.rhts.eng.pek2.redhat.com:/exportDir-krb5-common-local
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[04:18:36 root@ ~~]# mount -t nfs4 -o sec=krb5p hp-dl388g8-20.rhts.eng.pek2.redhat.com:/exportDir-krb5-common-local /mnt/nfsmp-krb5-common-local
mount.nfs4: access denied by server while mounting hp-dl388g8-20.rhts.eng.pek2.redhat.com:/exportDir-krb5-common-local
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[04:18:36 root@ ~~]# klist -a
Ticket cache: FILE:/tmp/krb5cc_0
Default principal: root

Valid starting     Expires            Service principal
31/10/17 04:18:35  01/11/17 04:18:35  krbtgt/RHQE.COM
        renew until 07/11/17 03:18:35
        Addresses: (none)
[04:18:36 root@ ~~]# rpm -q libini_config gssproxy
libini_config-1.3.1-29.el7.x86_64
gssproxy-0.7.0-9.el7.x86_64


Additional info:
Looks like the selinux-policy update has *NO* effect, as it still fails
after disabling SELinux.

Comment 3 Robbie Harwood 2017-10-31 15:27:42 UTC
Please try with gssproxy-0.7.0-11 and selinux DONTAUDIT rules disabled.  Thanks!

Comment 15 Robbie Harwood 2017-11-16 18:05:50 UTC
Sure, I'll take a stab.

gssproxy-0.7.0-7.el7 updates our unit file (gssproxy.service) to match that of the Fedora unit file, which I believe was changed back in May, with steved taking care of the nfs-utils side.  Nothing else changed between -6 and -7.

In particular, this removes the line `Requires=proc-fs-nfsd.mount` (which gssproxy cannot have, since it can't depend on nfs-utils).  To counter this, it adds the ability to reload the gssproxy daemon (`systemctl try-reload-or-restart` in Fedora; `systemctl reload-or-try-restart` in RHEL).  These commands cause gssproxy to start (if it is not already running) and bring up the kernel interface.

I believe the current failure yoyang is reporting is that everything works fine after upgrading packages *until* the machine is restarted.  This is consistent with a unit dependency problem.
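
For reference, the shape of the change (a paraphrased sketch, not the literal spec diff):

# gssproxy.service, -6 and earlier (line dropped in -7):
#   [Unit]
#   Requires=proc-fs-nfsd.mount
# and, on the nfs-utils side, a reload hook along these lines:
systemctl reload-or-try-restart gssproxy.service   # Fedora spells it try-reload-or-restart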

Comment 16 James 2017-11-16 21:51:12 UTC
Broken in Fedora with gssproxy-0.7.0-24.fc27.x86_64. It works when I manually start gssproxy.

Comment 18 Steve Dickson 2017-11-17 16:27:33 UTC
I'm wondering if this problem is due to
    https://bugzilla.redhat.com/show_bug.cgi?id=1497267

Does setting GSS_USE_PROXY="no" in /etc/sysconfig/nfs
and then restarting rpcgssd help?
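
That is, something like this (sketch; adjust to however your config is laid out):

# set GSS_USE_PROXY to "no" and restart rpc-gssd
sed -i 's/^GSS_USE_PROXY=.*/GSS_USE_PROXY="no"/' /etc/sysconfig/nfs
systemctl restart rpc-gssd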

Comment 19 James 2017-11-17 19:23:50 UTC
(In reply to Steve Dickson from comment #18)
> I'm wondering if this problem is due to
>     https://bugzilla.redhat.com/show_bug.cgi?id=1497267
> 
> Does setting GSS_USE_PROXY="no" in /etc/sysconfig/nfs
> and then restarting rpcgssd help?

I modified /etc/sysconfig/nfs as described and rebooted. The mount works with gssproxy-0.7.0-24.fc27.x86_64.

Comment 20 Yongcheng Yang 2017-11-20 10:47:23 UTC
(In reply to Steve Dickson from comment #18)
> I'm wondering if this problem is due to
>     https://bugzilla.redhat.com/show_bug.cgi?id=1497267
> 
> Does setting GSS_USE_PROXY="no" in /etc/sysconfig/nfs
> and then restarting rpcgssd help?

I'm afraid this issue is a bit different.

Setting GSS_USE_PROXY="no" does not resolve my problem:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
[root@krb5-nfs-server ~]# grep GSS_USE_PROXY /etc/sysconfig/nfs 
GSS_USE_PROXY="no"
[root@krb5-nfs-server ~]# rpm -q gssproxy
gssproxy-0.7.0-7.el7.x86_64
[root@krb5-nfs-server ~]# mount -t nfs4 -o sec=krb5 krb5-nfs-server:/exportDir-krb5-common-local /mnt/
mount.nfs4: access denied by server while mounting krb5-nfs-server:/exportDir-krb5-common-local
[root@krb5-nfs-server ~]# klist -a
Ticket cache: FILE:/tmp/krb5cc_0
Default principal: root

Valid starting     Expires            Service principal
20/11/17 10:27:22  21/11/17 10:27:22  krbtgt/RHQE.COM
	renew until 27/11/17 10:27:22
	Addresses: (none)
[root@krb5-nfs-server ~]# yum downgrade -y gssproxy-0.7.0-6.el7.x86_64.rpm &>/dev/null
[root@krb5-nfs-server ~]# rpm -q gssproxy
gssproxy-0.7.0-6.el7.x86_64
[root@krb5-nfs-server ~]# reboot
Connection to krb5-nfs-server closed by remote host.
Connection to krb5-nfs-server closed.
...

After downgrading to gssproxy-0.7.0-6.el7, it works:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
[root@krb5-nfs-server ~]# grep GSS_USE_PROXY /etc/sysconfig/nfs 
GSS_USE_PROXY="yes"
[root@krb5-nfs-server ~]# rpm -q gssproxy
gssproxy-0.7.0-6.el7.x86_64
[root@krb5-nfs-server ~]# mount -t nfs4 -o sec=krb5 krb5-nfs-server:/exportDir-krb5-common-local /mnt/
[root@krb5-nfs-server ~]# echo $?
0
[root@krb5-nfs-server ~]# klist -a
Ticket cache: FILE:/tmp/krb5cc_0
Default principal: host/krb5-nfs-server

Valid starting     Expires            Service principal
31/12/69 19:00:00  31/12/69 19:00:00  Encrypted/Credentials/v1@X-GSSPROXY:
	Addresses: (none)
[root@krb5-nfs-server ~]# cat /proc/mounts | grep mnt
krb5-nfs-server:/exportDir-krb5-common-local /mnt nfs4 rw,relatime,vers=4.1,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,port=0,timeo=600,retrans=2,sec=krb5,clientaddr=10.73.4.155,local_lock=none,addr=10.73.4.155 0 0
[root@krb5-nfs-server ~]# umount /mnt/

Comment 21 Steve Dickson 2017-11-20 15:52:19 UTC
(In reply to Yongcheng Yang from comment #20)
> (In reply to Steve Dickson from comment #18)
> > I'm wondering if this problem is due to
> >     https://bugzilla.redhat.com/show_bug.cgi?id=1497267
> > 
> > Does setting GSS_USE_PROXY="no" in /etc/sysconfig/nfs
> > and then restarting rpcgssd help?
> 
> I'm afraid this issue is a bit different.
> 
> Setting GSS_USE_PROXY="no" does not resolve my problem:
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>
> After downgrading to gssproxy-0.7.0-6.el7, it works:
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
So how is this an nfs-utils problem when downgrading gssproxy fixes
the problem? Nothing has changed in nfs-utils, but there obviously
has been some change in the gssproxy code.

Comment 22 Yongcheng Yang 2017-11-21 01:05:06 UTC
(In reply to Steve Dickson from comment #21)
> (In reply to Yongcheng Yang from comment #20)
> > After downgrading to gssproxy-0.7.0-6.el7, it works:
> > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> So how is this an nfs-utils problem when downgrading gssproxy fixes
> the problem? Nothing has changed in nfs-utils, but there obviously
> has been some change in the gssproxy code.

Maybe Robbie's investigation in comment #5, comment #10 and
comment #15 explains it. Sorry, I cannot fully understand it.

Comment 23 James 2017-11-28 21:20:30 UTC
(In reply to James from comment #19)
> (In reply to Steve Dickson from comment #18)
> > I'm wondering if this problem is due to
> >     https://bugzilla.redhat.com/show_bug.cgi?id=1497267
> > 
> > Does setting GSS_USE_PROXY="no" in /etc/sysconfig/nfs
> > and then restarting rpcgssd help?
> 
> I modified /etc/sysconfig/nfs as described and rebooted. The mount works
> with gssproxy-0.7.0-24.fc27.x86_64.

Seems like I spoke too soon. This only works on one of my machines. Don't know what's different about the others.

Comment 25 Steve Dickson 2017-11-29 15:04:07 UTC
I'm not seeing this problem... 

rhel7# mount -o sec=krb5 rhel7srv:/home/tmp /mnt/tmp
rhel7# umount /mnt/tmp

Using the following:

# rpm -q nfs-utils gssproxy
nfs-utils-1.3.0-0.52.el7.x86_64
gssproxy-0.7.0-13.el7.x86_64

But I am seeing this being logged. 


Nov 29 09:59:41 rhel7 python: SELinux is preventing /usr/sbin/rpc.gssd from connectto access on the unix_stream_socket /var/lib/gssproxy/default.sock.

*****  Plugin catchall (100. confidence) suggests   **************************

If you believe that rpc.gssd should be allowed connectto access on the default.sock unix_stream_socket by default.
Then you should report this as a bug.
You can generate a local policy module to allow this access.
Do allow this access for now by executing:
# ausearch -c 'rpc.gssd' --raw | audit2allow -M my-rpcgssd
# semodule -i my-rpcgssd.pp

Could this possibly be an SELinux issue?

Comment 26 Steve Dickson 2017-11-29 15:32:45 UTC
But... now after a complete RHEL7.5 nightly update I am seeing a hang on
a -o krb5 mount:

rhel7# ps ax | grep mount.nfs
 1921 pts/0    D+     0:00 /sbin/mount.nfs rhel7srv:/home/tmp /mnt/tmp -o rw,sec=krb5
 1968 pts/2    S+     0:00 grep mount.nfs
rhel7# cat /proc/1921/stack
[<ffffffffc04cfaf4>] rpc_wait_bit_killable+0x24/0xb0 [sunrpc]
[<ffffffffc04cfacd>] __rpc_wait_for_completion_task+0x2d/0x30 [sunrpc]
[<ffffffffc0837cf4>] _nfs4_proc_exchange_id+0x3a4/0x490 [nfsv4]
[<ffffffffc083e8b9>] nfs4_proc_exchange_id+0x39/0x60 [nfsv4]
[<ffffffffc084fb61>] nfs41_discover_server_trunking+0x21/0xb0 [nfsv4]
[<ffffffffc08501ad>] nfs4_discover_server_trunking+0x7d/0x2e0 [nfsv4]
[<ffffffffc0858dc6>] nfs4_init_client+0x126/0x300 [nfsv4]
[<ffffffffc07ee6d6>] nfs_get_client+0x2c6/0x3e0 [nfs]
[<ffffffffc08582a8>] nfs4_set_client+0x98/0x130 [nfsv4]
[<ffffffffc0859a9e>] nfs4_create_server+0x13e/0x3b0 [nfsv4]
[<ffffffffc0850bee>] nfs4_remote_mount+0x2e/0x60 [nfsv4]
[<ffffffffb8213989>] mount_fs+0x39/0x1b0
[<ffffffffb82307f7>] vfs_kern_mount+0x67/0x110
[<ffffffffc0850b16>] nfs_do_root_mount+0x86/0xc0 [nfsv4]
[<ffffffffc0850f14>] nfs4_try_mount+0x44/0xc0 [nfsv4]
[<ffffffffc07fbaf5>] nfs_fs_mount+0x4c5/0xd90 [nfs]
[<ffffffffb8213989>] mount_fs+0x39/0x1b0
[<ffffffffb82307f7>] vfs_kern_mount+0x67/0x110
[<ffffffffb8232d03>] do_mount+0x233/0xaf0
[<ffffffffb8233946>] SyS_mount+0x96/0xf0
[<ffffffffb86ea909>] system_call_fastpath+0x16/0x1b
[<ffffffffffffffff>] 0xffffffffffffffff

Now the odd thing is, if I start rpc.gssd from the shell I do not see the hang:
rhel7# systemctl stop rpc-gssd
rhel7# rpc.gssd -vv
rhel7# mount -o sec=krb5 rhel7srv:/home/tmp /mnt/tmp
rhel7# umount /mnt/tmp

Any idea what this means?

Comment 27 Steve Dickson 2017-11-29 16:15:11 UTC
Here is the current rpc-gssd.service file:

[Unit]
Description=RPC security service for NFS client and server
DefaultDependencies=no
Conflicts=umount.target
Requires=rpc_pipefs.target
After=rpc_pipefs.target gssproxy.service

ConditionPathExists=/etc/krb5.keytab

PartOf=nfs-utils.service

Wants=nfs-config.service
After=nfs-config.service

[Service]
EnvironmentFile=-/run/sysconfig/nfs-utils

Type=forking
ExecStart=/usr/sbin/rpc.gssd $GSSDARGS

Is there something missing that is needed by systemd?

Comment 28 Steve Dickson 2017-11-29 19:10:51 UTC
Question... has the credential cache moved??

That could explain the failure that is being seen....

Comment 30 Steve Dickson 2017-11-30 12:51:54 UTC
I was finally able to reproduce the access denied problem... Not sure
why all the systemd noise went away, although I did disable SELinux,
which might have done the trick...

It turns out stopping gssproxy makes the problem go away.

rhel7# systemctl start gssproxy
rhel7# mount -o sec=krb5 rhel7srv:/home/tmp /mnt/tmp
mount.nfs: access denied by server while mounting rhel7srv:/home/tmp
rhel7# systemctl stop gssproxy
rhel7# mount -o sec=krb5 rhel7srv:/home/tmp /mnt/tmp
rhel7# umount /mnt/tmp

Yongcheng, can you please verify this is the case in your testing?

If verified, the workaround to move the alpha forward is to
disable gssproxy.

Comment 31 Yongcheng Yang 2017-11-30 13:52:40 UTC
(In reply to Steve Dickson from comment #30)
 
> It turns out stopping gssproxy makes the problem go away.

Hi Steve, our test case sets up the krb5 server and client on the same
machine, so stopping gssproxy may cause the whole Kerberos function to
stop working. So IMO this workaround cannot solve our test case's
failure (and I have checked it).

However, if the client is not co-located with the server, it may help. I'll check that.

Comment 32 Steve Dickson 2017-11-30 14:08:41 UTC
(In reply to Yongcheng Yang from comment #31)
> (In reply to Steve Dickson from comment #30)
>  
> > It turns out stopping gssproxy makes the problem go away.
> 
> Hi Steve, our test case sets up the krb5 server and client on the same
> machine, so stopping gssproxy may cause the whole Kerberos function to
> stop working. So IMO this workaround cannot solve our test case's
> failure (and I have checked it).
> 
> However, if the client is not co-located with the server, it may help. I'll check that.

My setup is between two different VMs... one being the client and the
other being the server... I think that is a more realistic setup
than loopback... Very few customers (if any) use loopback mounts
in production, especially secure mounts.

Comment 33 Steve Dickson 2017-11-30 17:39:14 UTC
Do we know this is not the same problem as
    https://bugzilla.redhat.com/show_bug.cgi?id=1514241

Comment 34 Steve Dickson 2017-11-30 20:14:09 UTC
(In reply to Steve Dickson from comment #26)
> But... now after a complete RHEL7.5 nightly update I am seeing a hang on
> a -o krb5 mount:
> 
> rhel7# ps ax | grep mount.nfs
>  1921 pts/0    D+     0:00 /sbin/mount.nfs rhel7srv:/home/tmp /mnt/tmp -o
> rw,sec=krb5
>  1968 pts/2    S+     0:00 grep mount.nfs
> rhel7# cat /proc/1921/stack
> [<ffffffffc04cfaf4>] rpc_wait_bit_killable+0x24/0xb0 [sunrpc]
> [<ffffffffc04cfacd>] __rpc_wait_for_completion_task+0x2d/0x30 [sunrpc]
> [<ffffffffc0837cf4>] _nfs4_proc_exchange_id+0x3a4/0x490 [nfsv4]
> [<ffffffffc083e8b9>] nfs4_proc_exchange_id+0x39/0x60 [nfsv4]
> [<ffffffffc084fb61>] nfs41_discover_server_trunking+0x21/0xb0 [nfsv4]
> [<ffffffffc08501ad>] nfs4_discover_server_trunking+0x7d/0x2e0 [nfsv4]
> [<ffffffffc0858dc6>] nfs4_init_client+0x126/0x300 [nfsv4]
> [<ffffffffc07ee6d6>] nfs_get_client+0x2c6/0x3e0 [nfs]
> [<ffffffffc08582a8>] nfs4_set_client+0x98/0x130 [nfsv4]
> [<ffffffffc0859a9e>] nfs4_create_server+0x13e/0x3b0 [nfsv4]
> [<ffffffffc0850bee>] nfs4_remote_mount+0x2e/0x60 [nfsv4]
> [<ffffffffb8213989>] mount_fs+0x39/0x1b0
> [<ffffffffb82307f7>] vfs_kern_mount+0x67/0x110
> [<ffffffffc0850b16>] nfs_do_root_mount+0x86/0xc0 [nfsv4]
> [<ffffffffc0850f14>] nfs4_try_mount+0x44/0xc0 [nfsv4]
> [<ffffffffc07fbaf5>] nfs_fs_mount+0x4c5/0xd90 [nfs]
> [<ffffffffb8213989>] mount_fs+0x39/0x1b0
> [<ffffffffb82307f7>] vfs_kern_mount+0x67/0x110
> [<ffffffffb8232d03>] do_mount+0x233/0xaf0
> [<ffffffffb8233946>] SyS_mount+0x96/0xf0
> [<ffffffffb86ea909>] system_call_fastpath+0x16/0x1b
> [<ffffffffffffffff>] 0xffffffffffffffff
> 
> Now the odd thing is, if I start rpc.gssd from the shell I do not see the hang:
> rhel7# systemctl stop rpc-gssd
> rhel7# rpc.gssd -vv
> rhel7# mount -o sec=krb5 rhel7srv:/home/tmp /mnt/tmp
> rhel7# umount /mnt/tmp
> 
> Any idea what this means?

This hang comes from bug 1519511

Comment 35 Steve Dickson 2017-11-30 20:44:50 UTC
I'm beginning to think this might be a configuration issue.

QE is using krb5kdc as their KDC and I'm using an IPA server
as mine; that is one difference. And mounts are working in my
world and not in theirs... that is the other difference :-)

A third difference is that QE is running their KDC on their
NFS server, which may not be a big deal other than making
it harder to debug because there is no network between
them; it's just another difference.

Now, I'm not saying we should switch over to an IPA server;
we definitely should get the krb5kdc environment working,
but I don't think we should hold the alpha for it.

Here is why I think it's a configuration issue... in the
IPA world, when the client does the mount I see the following
network traffic to the KDC.

No.  Time           Source       Destination   Protocol Length Info
2857 8.926648824    172.31.1.58  172.31.1.3    KRB5     282    AS-REQ
2859 8.932487365    172.31.1.3   172.31.1.58   KRB5     404    KRB Error: KRB5KDC_ERR_PREAUTH_REQUIRED
2886 8.983945212    172.31.1.58  172.31.1.3    KRB5     1125   TGS-REQ
2888 8.992263978    172.31.1.3   172.31.1.58   KRB5     1045   TGS-REP

I don't see any of this type of traffic between their client and KDC.

Note, we should also be testing against a Windows KDC, since that
is the majority of the KDCs out there...

Comment 36 Alexander Bokovoy 2017-12-01 08:08:09 UTC
The exchange AS-REQ / pre-auth-required error / TGS-REQ / TGS-REP is the correct one in IPA (and admittedly in any other contemporary Kerberos setup). If your KDC does not require pre-auth on the principals, including krbtgt/REALM@REALM, you wouldn't see that first error response, but overall, running without pre-auth is insecure. See, for example, https://social.technet.microsoft.com/wiki/contents/articles/23559.kerberos-pre-authentication-why-it-should-not-be-disabled.aspx.

Can we get more details of how krb5kdc is set up in QE? I tried to find build_krb5_server.sh and failed.

Comment 41 Robbie Harwood 2017-12-01 20:25:44 UTC
I have built a new version of gssproxy - gssproxy-0.7.0-14.el7.  If people could re-test with that, that would be very helpful.  I'll try to reproduce otherwise.

Comment 42 Robbie Harwood 2017-12-01 20:54:56 UTC
Also please note that any time you upgrade gssproxy for testing, you need to restart rpc-gssd, else it will still be using the old gssproxy library.
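
That is, after every test upgrade (sketch):

yum update gssproxy
systemctl restart rpc-gssd   # pick up the new gssproxy client library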

Comment 43 Simo Sorce 2017-12-01 21:27:28 UTC
Alexander, all,
preauth is not required for random keys by default by the MIT KDC, as it is not necessary. We did enable preauth on all keys in IPA just in case, but it is definitely not a requirement.

Preauth is required for keys that are generated from a password, because dictionary-based brute force attacks are possible with human-chosen passwords. That is generally not the case with service keys, as they are fully random, so a brute force is equivalent to an exhaustive search, which is infeasible in the case of AES encryption.

TL;DR: ignore the preauth thing; it's a red herring and completely unrelated to the bug at hand.

HTH.

Comment 51 Ian Kent 2017-12-05 00:29:52 UTC
(In reply to Robbie Harwood from comment #15)
> Sure, I'll take a stab.
> 
> gssproxy-0.7.0-7.el7 updates our unit file (gssproxy.service) to match that
> of the Fedora unit file, which I believe was changed back in May, with
> steved taking care of the nfs-utils side.  Nothing else changes between -6
> and -7.
> 
> In particular, this removes a line `Requires=proc-fs-nfsd.mount` (which
> gssproxy cannot have since it can't depend on nfs-utils).  To counter this,

Why remove it?

You don't need to remove a "Requires" that's too strong for your
needs, changing it to "Wants" should be enough.

Yongcheng, what happens if you do this?
Change the "Requires" to "Wants" instead of removing the line
in the gssproxy.service file?
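
One way to try that without editing the packaged unit is a drop-in (an untested sketch; the drop-in file name is arbitrary, and After= is added here only because Wants= alone does not impose ordering):

mkdir -p /etc/systemd/system/gssproxy.service.d
cat > /etc/systemd/system/gssproxy.service.d/nfsd.conf <<'EOF'
[Unit]
Wants=proc-fs-nfsd.mount
After=proc-fs-nfsd.mount
EOF
systemctl daemon-reload
systemctl restart gssproxy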

Comment 52 Yongcheng Yang 2017-12-05 03:24:14 UTC
(In reply to Ian Kent from comment #51)

> Yongcheng, what happens if you do this?
> Change the "Requires" to "Wants" instead of removing the line
> in the gssproxy.service file?

Hello Ian, just a quick update here: it works in my environment!

After gssproxy removed "Requires=proc-fs-nfsd.mount", we did find that
the module "nfsd" was not loaded on a newly booted machine (/proc/fs/nfsd not
mounted either) until NFS was started a while later.  However, I didn't know
before that it was related to the krb5 configuration.

After adding the line "Wants=proc-fs-nfsd.mount", "nfsd" loads
automatically, even though gssproxy.service is disabled. And then my
test passes.
~~~~~~~~~~~~~~
[root@~]# mount | grep nfsd
nfsd on /proc/fs/nfsd type nfsd (rw,relatime)
[root@~]# lsmod | grep nfsd
nfsd                  342943  1 
auth_rpcgss            59415  1 nfsd
nfs_acl                12837  1 nfsd
lockd                  93827  1 nfsd
grace                  13515  2 nfsd,lockd
sunrpc                349214  7 nfsd,auth_rpcgss,lockd,nfs_acl
[root@~]# systemctl is-enabled gssproxy
disabled
...

Maybe this is the reason why this issue cannot be reproduced in
some environments. And I also find it always fails on a newly installed
machine but can sometimes pass otherwise.

Comment 56 Robbie Harwood 2017-12-05 15:02:45 UTC
(In reply to Ian Kent from comment #54)
> (In reply to Yongcheng Yang from comment #52)
> > (In reply to Ian Kent from comment #51)
> > 
> > > Yongcheng, what happens if you do this?
> > > Change the "Requires" to "Wants" instead of removing the line
> > > in the gssproxy.service file?
> > 
> > Hello Ian, just a quick update here: it works in my environment!
> > 
> > After gssproxy removed "Requires=proc-fs-nfsd.mount", we did find that
> > the module "nfsd" was not loaded on a newly booted machine (/proc/fs/nfsd not
> > mounted either) until NFS was started a while later.  However, I didn't know
> > before that it was related to the krb5 configuration.
> 
> I'm only going by what I've read in the comments.
> 
> Namely, works with the "Requires" line present and gssproxy cannot
> depend on nfs-utils (the reason the Requires was removed).
> 
> So "Wants" appears to be appropriate in that it won't fail the
> service gssproxy if proc-fs-nfsd.mount fails or is not present.
> 
> The presence of the "Wants" will affect dependency order and
> that might be why using Wants does what's needed.
> 
> Perhaps there is some other sensible reason gssproxy cannot
> use this?
> 
> Ian

This change came at the request of https://bugzilla.redhat.com/show_bug.cgi?id=1326440 and several other bugs.

But basically: gssproxy is an application that is *consumed by* NFS, not the other way around.  NFS should be encoding its dependency on gssproxy; it is conceptually incorrect for gssproxy to cart around a list of applications that depend on it.

Again, can QE please try gssproxy-0.7.0-14 and tell me if the issue is resolved?

Comment 57 Ian Kent 2017-12-06 00:54:32 UTC
(In reply to Robbie Harwood from comment #56)
> (In reply to Ian Kent from comment #54)
> > (In reply to Yongcheng Yang from comment #52)
> > > (In reply to Ian Kent from comment #51)
> > > 
> > > > Yongcheng, what happens if you do this?
> > > > Change the "Requires" to "Wants" instead of removing the line
> > > > in the gssproxy.service file?
> > > 
> > > Hello Ian, just a quick update here: it works in my environment!
> > > 
> > > After gssproxy removed "Requires=proc-fs-nfsd.mount", we did find that
> > > the module "nfsd" was not loaded on a newly booted machine (/proc/fs/nfsd not
> > > mounted either) until NFS was started a while later.  However, I didn't know
> > > before that it was related to the krb5 configuration.
> > 
> > I'm only going by what I've read in the comments.
> > 
> > Namely, works with the "Requires" line present and gssproxy cannot
> > depend on nfs-utils (the reason the Requires was removed).
> > 
> > So "Wants" appears to be appropriate in that it won't fail the
> > service gssproxy if proc-fs-nfsd.mount fails or is not present.
> > 
> > The presence of the "Wants" will affect dependency order and
> > that might be why using Wants does what's needed.
> > 
> > Perhaps there is some other sensible reason gssproxy cannot
> > use this?
> > 
> > Ian
> 
> This change came at the request of
> https://bugzilla.redhat.com/show_bug.cgi?id=1326440 and several other bugs.

Indeed, and the problem statement contains a contradiction.

"But in no sense does gssproxy depend on NFS" and just before
that an admission that gssproxy does depend on NFS "Yes, if
NFS is in use, proc-fs-nfsd.mount should be started before
gssproxy.service".

That second statement is the problem description and essentially
says that it is a gssproxy problem.

And there is no discussion about what this person actually
needs and why the bug has been logged.

> 
> But basically: gssproxy is an application that is *consumed by* NFS, not the
> other way around.  NFS should be encoding its dependency on gssproxy; it is
> conceptually incorrect for gssproxy to cart around a list of applications
> that depend on it.

NFS shouldn't need to concern itself with whether gssproxy is too
clueless to provide services to NFS if the proc file system entry
isn't available when it's started.

Documenting this fact without adding a hard dependency seems like
the right thing to do if gssproxy can't be updated to handle this
case.

Is it the hard dependency that bug 1326440 is about or something
else?

What really concerns me about what I see here is that it may be this
type of approach that has caused some of the more questionable
practices within the NFS service units, such as calling systemctl
to start a service from within another service unit. Sure, you
can do that, and most of the time it isn't a problem, until it is!

In fact I have an essentially unresolvable problem with autofs
because of exactly this.

It is essential that the dependency declaration features of
systemd be used in a sensible way to document these types of
dependencies, rather than trying to do "smart" things in service
units that could and do have side effects.

Ian

Comment 58 Yongcheng Yang 2017-12-06 02:50:24 UTC
(In reply to Robbie Harwood from comment #56)

> Again, can QE please try gssproxy-0.7.0-14 and tell me if the issue is
> resolved?

Seems gssproxy-0.7.0-14.el7 still can't resolve this problem.

Please see comment #53 for the test logs.

Comment 59 Robbie Harwood 2017-12-06 20:11:33 UTC
(In reply to Yongcheng Yang from comment #58)
> (In reply to Robbie Harwood from comment #56)
> 
> > Again, can QE please try gssproxy-0.7.0-14 and tell me if the issue is
> > resolved?
> 
> Seems gssproxy-0.7.0-14.el7 still can't resolve this problem.
> 
> Please see comment #53 for the test logs.

Where are the test logs?  I see logs showing that tests were run and failed, but no output from them.  (What I'd really like is a reproducer that doesn't require that I stand up beaker.)

Comment 60 Ian Kent 2017-12-06 23:33:33 UTC
(In reply to Robbie Harwood from comment #59)
> (In reply to Yongcheng Yang from comment #58)
> > (In reply to Robbie Harwood from comment #56)
> > 
> > > Again, can QE please try gssproxy-0.7.0-14 and tell me if the issue is
> > > resolved?
> > 
> > Seems gssproxy-0.7.0-14.el7 still can't resolve this problem.
> > 
> > Please see comment #53 for the test logs.
> 
> Where are the test logs?  I see logs showing that tests were run and failed,
> but no output from them.  (What I'd really like is a reproducer that doesn't
> require that I stand up beaker.)

I don't know what that last sentence means.

It would be good if QE could get those logs; hopefully they
will.

The other thing you can do is have a look at the test that
was run.

I had a quick look at the nfs-utils test here and it isn't
too hard to follow but ...

QE use a couple of libraries of scripts which are hard to
locate. You would need to ask QE questions if it wasn't
clear what these scripts or functions did, or perhaps have
QE direct you to where they are located so you can see for
yourself.

Other than that, most of the scripted test infrastructure is
fairly straightforward to follow, so problems can be
manually reproduced based on what the test does, either on
Beaker reserved hosts or on local machines.

Assuming you have a REDHAT.COM TGT ticket, the nfs-utils tests
repository can be checked out using:
rhpkg clone tests/nfs-utils

and the test is at:
cd nfs-utils/krb5-test/krb5-common-local

look at it with:
vi runtest.sh

Ian

Comment 61 Yongcheng Yang 2017-12-07 09:18:32 UTC
(In reply to Robbie Harwood from comment #59)

> Where are the test logs?  I see logs showing that tests were run and failed,
> but no output from them.  (What I'd really like is a reproducer that doesn't
> require that I stand up beaker.)

Hi Robbie, please find "TESTOUT.log" on the Beaker page for the detailed logs.

What's more, the following are the reproduction steps I used:

# > /etc/exports
# > /etc/sysconfig/nfs
# build_krb5_server.sh (comment #37)
# config_krb5_client.sh $HOSTNAME (comment #37)
# systemctl restart gssproxy.service
# systemctl restart rpc-gssd.service
# systemctl restart nfs.service
# mkdir /export_test
# exportfs -v -i $HOSTNAME:/export_test -o rw,sec=sys:krb5:krb5i:krb5p
# mount -t nfs4 -o sec=krb5 $HOSTNAME:/export_test /mnt
mount.nfs4: access denied by server while mounting ...

I can always hit this problem on a newly installed 7.5 machine.
Sometimes, if nfs.service is started before "build_krb5_server.sh"
(maybe some other issue), the problem goes away, and then I cannot reproduce
it any more on that specific machine.  Sorry that I still cannot
figure out a simple reproducer.

Anyway, it can be triggered consistently on a newly installed machine.

Comment 62 Alice Mitchell 2017-12-07 09:45:17 UTC
I too can now reliably reproduce this in a manner similar to comment #61; the newer gssproxy -14 makes no difference.

One observation is that no amount of restarting services seems to fix this, only a reboot, and I have checked that the exact same set of services is running before and after that reboot.

Comment 64 Alice Mitchell 2017-12-11 15:09:22 UTC
The more I look at this, the more I am reminded of the conclusions in bug 1452881: although gssproxy does not require nfsd to be running, if it is not, gssproxy puts itself into a state which is not updated if nfsd is later started.  The change between release 6 and 7 removed the requirement that nfsd be started before gssproxy, allowing this to happen.

I suppose what is required is a way for nfs-server.service to request that gssproxy be restarted if and only if it is already running.
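
systemd does have a verb of roughly that shape, e.g. (untested sketch):

systemctl try-restart gssproxy.service   # restarts the unit only if it is already running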

Comment 65 Alice Mitchell 2017-12-11 17:29:25 UTC
nfs-server.service does try to reload gssproxy by sending it a HUP, but this is not enough: it does not reload everything, and it doesn't change the /proc/net/rpc/use-gss-proxy file, amongst other things.

I would propose the following workaround to nfs-server.service; whilst not ideal (as it calls systemctl), I see no existing mechanism that would do the same without adding back the dependencies we were trying to remove.

This will ensure that if gssproxy was already running when NFS is started, as appears to be happening in this bug, it will be restarted, but it does not create a dependency in either direction.

--- nfs-server.service	2017-12-11 16:29:18.010517239 +0000
+++ nfs-server.service.orig	2017-12-11 16:27:00.988323934 +0000
@@ -25,8 +25,8 @@
 Type=oneshot
 RemainAfterExit=yes
 ExecStartPre=-/usr/sbin/exportfs -r
+ExecStartPre=-/bin/sh -c '/bin/kill -HUP `cat /run/gssproxy.pid`'
 ExecStart=/usr/sbin/rpc.nfsd $RPCNFSDARGS
-ExecStartPost=-/bin/sh -c 'if systemctl -q is-active gssproxy; then systemctl restart gssproxy; fi'
 ExecStop=/usr/sbin/rpc.nfsd 0
 ExecStopPost=/usr/sbin/exportfs -au
 ExecStopPost=/usr/sbin/exportfs -f

Comment 66 Yongcheng Yang 2017-12-12 02:49:17 UTC
(In reply to Justin Mitchell from comment #65)

I just did a simple test by hand against the workaround comment #65
and it dose resolve the problem. Thanks Justin for the investigating!

Comment 67 Steve Whitehouse 2017-12-12 11:12:18 UTC
Can we get this done today in that case? We need the patch in today if it is to make beta. SteveD should be able to assist.

Comment 68 Steve Dickson 2017-12-12 15:28:40 UTC
(In reply to Justin Mitchell from comment #65)
> nfs-server.service does try to reload gssproxy by sending it a HUP, but this
> is not enough, it does not reload everything and doesn't change the
> /proc/net/rpc/use-gss-proxy file amongst other things.
> 
> I would propose the following work around to nfs-server.service, whilst not
> being ideal (as it calls systemctl) i see no existing mechanism that would
> do the same without adding back the dependencies we were trying to remove.
> 
> This will make ensure that if gssproxy was already running when nfs is
> started, as appears to be happening in this bug, that it will be restarted,
> but does not create a dependency in either direction.
> 
> --- nfs-server.service	2017-12-11 16:29:18.010517239 +0000
> +++ nfs-server.service.orig	2017-12-11 16:27:00.988323934 +0000
> @@ -25,8 +25,8 @@
>  Type=oneshot
>  RemainAfterExit=yes
>  ExecStartPre=-/usr/sbin/exportfs -r
> +ExecStartPre=-/bin/sh -c '/bin/kill -HUP `cat /run/gssproxy.pid`'
>  ExecStart=/usr/sbin/rpc.nfsd $RPCNFSDARGS
> -ExecStartPost=-/bin/sh -c 'if systemctl -q is-active gssproxy; then systemctl restart gssproxy; fi'
>  ExecStop=/usr/sbin/rpc.nfsd 0
>  ExecStopPost=/usr/sbin/exportfs -au
>  ExecStopPost=/usr/sbin/exportfs -f

Where does this version of nfs-server.service come from?
Looking at the latest version:

 
[Service]
EnvironmentFile=-/run/sysconfig/nfs-utils

Type=oneshot
RemainAfterExit=yes
ExecStartPre=-/usr/sbin/exportfs -r
ExecStartPre=-/bin/sh -c '/bin/kill -HUP `cat /run/gssproxy.pid`'
ExecStart=/usr/sbin/rpc.nfsd $RPCNFSDARGS
ExecStop=/usr/sbin/rpc.nfsd 0
ExecStopPost=/usr/sbin/exportfs -au
ExecStopPost=/usr/sbin/exportfs -f

That line exists... actually it was added by the
nfs-utils-1.3.0-systemd-gssproxy.patch in 7.4...

Comment 69 Steve Dickson 2017-12-12 15:47:42 UTC
(In reply to Justin Mitchell from comment #65)
> nfs-server.service does try to reload gssproxy by sending it a HUP, but this
> is not enough, it does not reload everything and doesn't change the
> /proc/net/rpc/use-gss-proxy file amongst other things.
> 
> I would propose the following work around to nfs-server.service, whilst not
> being ideal (as it calls systemctl) i see no existing mechanism that would
> do the same without adding back the dependencies we were trying to remove.
> 
> This will make ensure that if gssproxy was already running when nfs is
> started, as appears to be happening in this bug, that it will be restarted,
> but does not create a dependency in either direction.
> 
> --- nfs-server.service	2017-12-11 16:29:18.010517239 +0000
> +++ nfs-server.service.orig	2017-12-11 16:27:00.988323934 +0000
> @@ -25,8 +25,8 @@
>  Type=oneshot
>  RemainAfterExit=yes
>  ExecStartPre=-/usr/sbin/exportfs -r
> +ExecStartPre=-/bin/sh -c '/bin/kill -HUP `cat /run/gssproxy.pid`'
>  ExecStart=/usr/sbin/rpc.nfsd $RPCNFSDARGS
> -ExecStartPost=-/bin/sh -c 'if systemctl -q is-active gssproxy; then systemctl restart gssproxy; fi'
>  ExecStop=/usr/sbin/rpc.nfsd 0
>  ExecStopPost=/usr/sbin/exportfs -au
>  ExecStopPost=/usr/sbin/exportfs -f

So I guess the patch is telling me to add the systemctl and remove
the -HUP...

The -HUP did work in 7.4... what changed in gssproxy to break it???

Comment 70 Steve Dickson 2017-12-12 15:59:15 UTC
Here is the proposed patch
--- nfs-server.service.orig	2017-11-03 10:14:43.000000000 -0400
+++ nfs-server.service	2017-12-12 10:50:55.083472070 -0500
@@ -25,8 +25,8 @@ EnvironmentFile=-/run/sysconfig/nfs-util
 Type=oneshot
 RemainAfterExit=yes
 ExecStartPre=-/usr/sbin/exportfs -r
-ExecStartPre=-/bin/sh -c '/bin/kill -HUP `cat /run/gssproxy.pid`'
 ExecStart=/usr/sbin/rpc.nfsd $RPCNFSDARGS
+ExecStartPost=-/bin/sh -c 'if systemctl -q is-active gssproxy; then systemctl restart gssproxy ; fi'
 ExecStop=/usr/sbin/rpc.nfsd 0
 ExecStopPost=/usr/sbin/exportfs -au
 ExecStopPost=/usr/sbin/exportfs -f

Comment 71 Alice Mitchell 2017-12-12 16:01:43 UTC
The change is based on nfs-utils-1.3.0-0.52.el7, which was in the 7.5-alpha that I tested on.

gssproxy-0.7.0-6.el7 (comment #15) used to depend on nfsd, so that gssproxy always started up in a specific mode. That dependency caused the issues listed elsewhere, so it was removed. So now gssproxy can start up in a state where nfsd was not already running, maybe not even enabled yet, which is fine in itself.

If you later configure and run nfsd, as the QE tests do, the service file sends a HUP to gssproxy to reload its config. However, that reload is not a complete one, and krb5 remains in a broken state until the next reboot, when the normal service ordering has nfsd start before gssproxy.

Short version: systemd boot ordering works, but starting the services manually fails if nfsd comes up after gssproxy, because -HUP does not sufficiently reset gssproxy's state.

Comment 72 Steve Dickson 2017-12-12 16:11:02 UTC
With the patch in Comment 70 and after restarting the NFS server,
I'm seeing the following gssproxy error:

Stopping GSSAPI Proxy Daemon...
[2017/12/12 16:09:02]: Exiting after receiving a signal
Starting GSSAPI Proxy Daemon...
[2017/12/12 16:09:02]: Debug Enabled (level: 9)
[2017/12/12 16:09:02]: Problem with kernel communication!  NFS server will not work
[2017/12/12 16:09:02]: Failed to get peer's SELinux context (92:Protocol not available)
[2017/12/12 16:09:02]: Client connected (fd = 11)[2017/12/12 16:09:02]:  (pid = 2045) (uid = 0) (gid = 0)[2017/12/12 16:09:02]:
Started GSSAPI Proxy Daemon.

Comment 73 Robbie Harwood 2017-12-12 18:53:57 UTC
Per discussion on IRC, I have a gssproxy version that is slow but workable.

No changes from nfs-utils should be required at this time.  If I have missed any other questions in the above, please re-ask them.

Comment 77 Robbie Harwood 2017-12-12 20:34:24 UTC
BZ has taken my flags away during component reassign, so I need those back please.

Anyway, there's probably a bug related to timeout behavior that will be incoming; it should get a separate ticket and be triaged separately.  Slow mounts shouldn't block beta.

Comment 78 Steve Dickson 2017-12-12 23:02:01 UTC
(In reply to Robbie Harwood from comment #73)
> Per discussion on IRC, I have a gssproxy version that is slow but workable.
> 
> No changes from nfs-utils should be required at this time.  If I have missed
> any other questions in the above, please re-ask them.

I'm thinking if we make the following change that slowness will go away
diff -up ./src/client/gpm_common.c.orig ./src/client/gpm_common.c
--- ./src/client/gpm_common.c.orig	2017-12-12 16:23:25.856802016 -0500
+++ ./src/client/gpm_common.c	2017-12-12 17:39:23.740147375 -0500
@@ -48,7 +48,7 @@ static void gpm_init_once(void)
     pthread_mutex_init(&gpm_global_ctx.lock, &attr);
 
     gpm_global_ctx.fd = -1;
-    gpm_global_ctx.epollfd = -1;
+    gpm_global_ctx.epollfd = 10000;
     gpm_global_ctx.timerfd = -1;
 
     seedp = time(NULL) + getpid() + pthread_self();

Comment 79 Yongcheng Yang 2017-12-13 09:27:25 UTC
(In reply to Robbie Harwood from comment #73)
> Per discussion on IRC, I have a gssproxy version that is slow but workable.
> 
> No changes from nfs-utils should be required at this time.  If I have missed
> any other questions in the above, please re-ask them.

Just went through the IRC logs; it seems gssproxy-0.7.0-16.el7 fixes
the hang issue, i.e. bug 1519511 (mentioned in comment #26, comment #34
and comment #46), but not this one.  And the reproducer of comment #61
still fails, as Justin says.

For this "gssproxy started before nfsd" issue, thanks Justin for the
debugging.  I just found that when we get into the misconfigured situation,
the file "/proc/net/rpc/use-gss-proxy" does not contain "1", which does NOT match
https://www.kernel.org/doc/Documentation/filesystems/nfs/rpc-server-gss.txt
However, I don't know whose duty it is to update that file :-)
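
A quick way to inspect the kernel side (based on that doc and the values seen later in this bug):

lsmod | grep nfsd                  # the proc file only appears once the nfsd module is loaded
cat /proc/net/rpc/use-gss-proxy    # "1" once gssproxy has announced itself; "-1" in the broken state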

Comment 80 Alice Mitchell 2017-12-13 09:48:16 UTC
(In reply to Yongcheng Yang from comment #79)
> For this "gssproy started before nfsd" issue, thanks Justin for the
> debugging, just found that when get into the mis-configured situation,
> file "/proc/net/rpc/use-gss-proxy" doesn't contain "1", which NOT meet
> https://www.kernel.org/doc/Documentation/filesystems/nfs/rpc-server-gss.txt
> However, i don't know it's whose duty to update that file :-)

I do not know if it is the only program that touches it, but gssproxy has code to set the use-gss-proxy value; it does so only once at startup and will not change it later on a HUP or other input.

i.e. init_proc_nfsd() in gp_init.c is called only once from main()

I am not sure yet if this is the cause or just a symptom, but I do know that restarting gssproxy after you first start nfsd does solve the stated problem, hence why I called it a workaround rather than a fix.

We just never used to see this because previous versions always started nfs before starting gssproxy; likewise, you won't see it if both services are enabled and you reboot, as the ordering dependencies will start nfs first.

Comment 81 Steve Dickson 2017-12-13 13:13:13 UTC
(In reply to Steve Dickson from comment #78)
> (In reply to Robbie Harwood from comment #73)
> > Per discussion on IRC, I have a gssproxy version that is slow but workable.
> > 
> > No changes from nfs-utils should be required at this time.  If I have missed
> > any other questions in the above, please re-ask them.
> 
> I'm thinking if we make the following change that slowness will go away
> diff -up ./src/client/gpm_common.c.orig ./src/client/gpm_common.c
> --- ./src/client/gpm_common.c.orig	2017-12-12 16:23:25.856802016 -0500
> +++ ./src/client/gpm_common.c	2017-12-12 17:39:23.740147375 -0500
> @@ -48,7 +48,7 @@ static void gpm_init_once(void)
>      pthread_mutex_init(&gpm_global_ctx.lock, &attr);
>  
>      gpm_global_ctx.fd = -1;
> -    gpm_global_ctx.epollfd = -1;
> +    gpm_global_ctx.epollfd = 10000;
>      gpm_global_ctx.timerfd = -1;
>  
>      seedp = time(NULL) + getpid() + pthread_self();

Without the above fix, here are the mount speeds:
rhel7# time mount -o sec=krb5 rhel7srv:/home/tmp /mnt/tmp

real	0m45.322s
user	0m0.003s
sys	0m0.011s
rhel7# umount /mnt/tmp
rhel7# time mount -o sec=krb5 rhel7srv:/home/tmp /mnt/tmp

real	1m0.122s
user	0m0.001s
sys	0m0.005s
rhel7# umount /mnt/tmp

With the above fix, here are the mount speeds:

rhel7# time mount -o sec=krb5 rhel7srv:/home/tmp /mnt/tmp

real	0m0.205s
user	0m0.002s
sys	0m0.013s
rhel7# umount /mnt/tmp
rhel7# time mount -o sec=krb5 rhel7srv:/home/tmp /mnt/tmp

real	0m0.106s
user	0m0.001s
sys	0m0.014s

Comment 82 Simo Sorce 2017-12-13 13:28:10 UTC
Steve,
epollfd is a *file descriptor*; you cannot assign random values to it. Maybe you want to change the timeouts to be shorter.

Comment 83 Steve Dickson 2017-12-13 13:39:21 UTC
(In reply to Simo Sorce from comment #82)
> Steve,
> epollfd is a *file descriptor*, you cannot assign random values to it. Maybe
> you want to change the timeouts to be shorter.

Maybe I got it wrong... I thought that was what it was suggested I do...

My original idea was this, which worked quite well too:

diff -up ./src/client/gpm_common.c.orig ./src/client/gpm_common.c
--- ./src/client/gpm_common.c.orig	2017-12-12 16:23:25.856802016 -0500
+++ ./src/client/gpm_common.c	2017-12-13 08:37:37.085290157 -0500
@@ -269,7 +269,7 @@ static int gpm_epoll_wait(struct gpm_ctx
     }
 
     do {
-        epoll_ret = epoll_wait(gpmctx->epollfd, events, 2, -1);
+        epoll_ret = epoll_wait(gpmctx->epollfd, events, 2, 10000);
     } while (epoll_ret < 0 && errno == EINTR);
 
     if (epoll_ret < 0) {

Comment 84 Steve Dickson 2017-12-13 13:51:41 UTC
Now that QE has acked... Let's see what they see... If they don't see this 1m delay, then fine... it's something in my world... But if they see the 1m delay for each mount, we make one of the above changes....

Not that most people put betas in production, but a 1m delay on every single
NFS mount will cause pain, especially with hundreds of mounts happening at once.
We should try to avoid this pain... IMHO...

Comment 85 Steve Dickson 2017-12-13 14:20:17 UTC
This gssproxy fix should have been made for this bz:
     https://bugzilla.redhat.com/show_bug.cgi?id=1519511

So Comment 77 through Comment 84 should have been made in bug 1519511,
since they fix the NFS client mounts hanging in the gssproxy code.

The patch proposed in Comment 74 of this bz mostly fixes the access denied
by server issue QE was originally seeing.

Comment 86 Robbie Harwood 2017-12-13 15:22:49 UTC
Please, please, my kingdom for a separate bug for the slowness.

Also, on IRC you told me you were mucking with SELinux state as well, not just doing things to the epoll_wait call.

Comment 90 Robbie Harwood 2017-12-13 17:46:35 UTC
Per IRC discussion: I would rather fix this in gssproxy if you're seeing the HUP not be sufficient.

However, I would again like to ask that NFS consider using `systemctl reload` instead of digging up the gssproxy pidfile and shelling out to kill.

Therefore, I am taking this bug, and requesting that you not do what was proposed in Comment #70.

For reference: my understanding of the bug is that if NFS is started after gssproxy, then we see access denied messages.  But that if we start gssproxy after NFS is up, everything works.  And there is additional evidence to suggest that gssproxy is failing to poke the proc file after restart.

Comment 92 Steve Dickson 2017-12-13 17:57:23 UTC
(In reply to Robbie Harwood from comment #90)
> Per IRC discussion: I would rather fix this in gssproxy if you're seeing the
> HUP not be sufficient.
> 
> However, I would again like to ask that NFS consider using `systemctl
> reload` instead of digging up the gssproxy pidfile and shelling out to kill.
Ok.... Please file a bz on this.... I'll make that happen later on down the road..

> 
> Therefore, I am taking this bug, and requesting that you not do what was
> proposed in Comment #70.
It's all yours... I can't do any commits now... No bz :-)

> 
> For reference: my understanding of the bug is that if NFS is started after
> gssproxy, then we see access denied messages.  But that if we start gssproxy
> after NFS is up, everything works.  And there is additional evidence to
> suggest that gssproxy is failing to poke the proc file after restart.
I think you nailed it

Comment 93 Alice Mitchell 2017-12-13 18:42:51 UTC
I would further refine this: instead of having nfs-utils restart gssproxy, if you modify gssproxy to poke proc after a HUP, then the existing nfs-utils script is enough, and this also appears to fix the issue.

This works in my tests, applied to gssproxy-0.7.0-16
diff -ur gssproxy-0.7.0.orig/src/gssproxy.c gssproxy-0.7.0.wip/src/gssproxy.c
--- gssproxy-0.7.0.orig/src/gssproxy.c	2017-12-13 15:37:45.496066981 +0000
+++ gssproxy-0.7.0.wip/src/gssproxy.c	2017-12-13 15:40:56.557956682 +0000
@@ -143,6 +143,7 @@
     free_config(&old_config);
 
     GPDEBUG("New config loaded successfully.\n");
+    init_proc_nfsd(gpctx->config);
     return;
 }
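
A quick sanity check of the HUP path with this applied (sketch, mirroring the failing order from comment #61):

systemctl start gssproxy            # deliberately start gssproxy before nfsd
systemctl start nfs                 # loads nfsd; nfs-server.service sends gssproxy a HUP
cat /proc/net/rpc/use-gss-proxy     # expect "1" now that the reload re-runs init_proc_nfsd()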

Comment 94 Robbie Harwood 2017-12-13 19:43:16 UTC
Need a qe_ack before I can do a proper build.

In the meantime, here's a scratch-build that should fix the issue: https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=14751067

(The logic should match pretty closely to Justin's patch, but do please complain if it doesn't work.)

Comment 96 Yongcheng Yang 2017-12-14 08:01:22 UTC
(In reply to Robbie Harwood from comment #94)

> In the meantime, here's a scratch-build that should fix the issue:
> https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=14751067
> 
> (The logic should match pretty closely to Justin's patch, but do please
> complain if it doesn't work.)

Thanks for it! I have verified that the above version does fix the problem.
P.S. As there is a Beaker incident, I just tested it by hand for now.

What's more, here is a simple test of the proc file:
~~~~~~~~~~~~~~~~~~~~~~~~~
[root@~]# rpm -q gssproxy
gssproxy-0.7.0-13.el7.x86_64
[root@~]# cat /proc/net/rpc/use-gss-proxy
cat: /proc/net/rpc/use-gss-proxy: No such file or directory
[root@~]# lsmod | grep nfsd
[root@~]# systemctl restart gssproxy
[root@~]# systemctl restart nfs
[root@~]# cat /proc/net/rpc/use-gss-proxy
-1
^^^
[root@~]#
[root@~]# rpm -Uvh gssproxy-0.7.0-17.el7.x86_64.rpm >/dev/null <<< updating
[root@~]# sh krb5-common-local/rm_nfsd.sh  <<< remove the module nfsd
[root@~]# rpm -q gssproxy
gssproxy-0.7.0-17.el7.x86_64
[root@~]# cat /proc/net/rpc/use-gss-proxy
cat: /proc/net/rpc/use-gss-proxy: No such file or directory
[root@~]# lsmod | grep nfsd
[root@~]# systemctl restart gssproxy
[root@~]# systemctl restart nfs
[root@~]# cat /proc/net/rpc/use-gss-proxy
1
^^^

Comment 97 Alice Mitchell 2017-12-14 09:10:16 UTC
Checked it on my reproducer, works for me as well, thanks.

Comment 98 Kaleem 2018-01-02 09:09:13 UTC
(In reply to Yongcheng Yang from comment #96)
> (In reply to Robbie Harwood from comment #94)
> 
> > In the meantime, here's a scratch-build that should fix the issue:
> > https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=14751067
> > 
> > (The logic should match pretty closely to Justin's patch, but do please
> > complain if it doesn't work.)
> 
> Thanks for it! I have verified that the above version does fix the problem.
> P.S. As there is a Beaker incident, I just tested it by hand for now.
> 
> What's more, here is a simple test of the proc file:
> ~~~~~~~~~~~~~~~~~~~~~~~~~
> [root@~]# rpm -q gssproxy
> gssproxy-0.7.0-13.el7.x86_64
> [root@~]# cat /proc/net/rpc/use-gss-proxy
> cat: /proc/net/rpc/use-gss-proxy: No such file or directory
> [root@~]# lsmod | grep nfsd
> [root@~]# systemctl restart gssproxy
> [root@~]# systemctl restart nfs
> [root@~]# cat /proc/net/rpc/use-gss-proxy
> -1
> ^^^
> [root@~]#
> [root@~]# rpm -Uvh gssproxy-0.7.0-17.el7.x86_64.rpm >/dev/null <<< updating
> [root@~]# sh krb5-common-local/rm_nfsd.sh  <<< remove the module nfsd
> [root@~]# rpm -q gssproxy
> gssproxy-0.7.0-17.el7.x86_64
> [root@~]# cat /proc/net/rpc/use-gss-proxy
> cat: /proc/net/rpc/use-gss-proxy: No such file or directory
> [root@~]# lsmod | grep nfsd
> [root@~]# systemctl restart gssproxy
> [root@~]# systemctl restart nfs
> [root@~]# cat /proc/net/rpc/use-gss-proxy
> 1
> ^^^

Yongcheng Yang,

Can we mark this verified on the basis of above outputs?

Comment 99 Yongcheng Yang 2018-01-02 11:05:05 UTC
(In reply to Kaleem from comment #98)

> Yongcheng Yang,
> 
> Can we mark this verified on the basis of above outputs?

Sure! I have verified that gssproxy-0.7.0-17.el7 has solved this problem.

Comment 105 errata-xmlrpc 2018-04-10 11:09:51 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:0709