**** Description of problem:

Hi all, I need to get an NFSv4 server working with Kerberos and AutoFS, but I have a problem: if the NFS server goes down, I get a very long mount timeout on the NFSv4 client.

Since I need to mount several (3 to 6) directories during the user logon process, a hanging mount also hangs the user logon. So I want to configure each mount attempt to time out after 10-15 seconds at most when the server is down.

When I try to access a mount point using AutoFS (proto=tcp OR proto=udp), it hangs for 189 seconds (real 3m9.001s) before showing the error:

mount: mount to NFS server '172.16.0.10' failed: timed out (giving up)

Mounting manually using NFSv4 gives the same timeouts as AutoFS. The only way to get an acceptable timeout is to use proto=udp,retry=0 without sec=krb5; with any other combination I get 3m9s.

I'm using these packages (server and client side):
autofs-5.0.1-0.rc2.102.el5_3.1
nfs-utils-1.0.9-40.el5
kernel-2.6.18-128.1.16.el5

**** Version-Release number of selected component (if applicable):

autofs-5.0.1-0.rc2.102.el5_3.1

**** Steps to Reproduce:

1. Install the OS (5.2 or 5.3, updated or not, it doesn't matter).
2. Configure AutoFS, or just try to manually mount from a server that is down (or a random IP).
2.1 # time mount 1.2.3.4:/blabla /tmp/ -t nfs4 -o sec=krb5p,proto=tcp,retry=0

**** Actual results:

mount: mount to NFS server '1.2.3.4' failed: timed out (giving up).
real 3m9.000s
user 0m0.000s
sys 0m0.002s

**** Expected results:

The mount should fail after something around 10 seconds.

**** Additional info:

The "retry" option just doesn't work when using Kerberos and/or proto=tcp. mount only obeys it when using proto=udp without Kerberos (and even then I want a timeout of around 10 seconds; 21s still isn't good enough).

# time mount 1.2.3.4:/blabla /tmp/ -t nfs4 -o proto=udp,retry=0
mount: mount to NFS server '1.2.3.4' failed: timed out (giving up).
real 0m21.003s
user 0m0.000s
sys 0m0.003s
Hi Carlos,

I have added the patch I posted in our email conversation and done a RHEL-5 build. You can find it at:
http://people.redhat.com/~ikent/autofs-5.0.1-0.rc2.131.bz517349.1.el5

Please give this a try without using Kerberos to start with, as I'm not in a position to test it, and you reported this happening without it anyway.

This is not without its own problem though. I probably should have chased this when I first noticed it with the timed umount changes. Anyway, when autofs sends a TERM signal to the mount(8) process that it has spawned, mount(8) terminates OK but its child, mount.nfs(8) (or mount.nfs4), doesn't terminate. The mount.nfs process does respond to signals and would go away, but for autofs to look up its pid and signal it is a big pain.

So, to start with, let's see if this will at least return within the mount timeout specified in the autofs configuration. It should, and it did for me in all the cases I tried, with or without the intr option on the mount.

Ian
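A minimal sketch of the behaviour being tested here, reproduced outside autofs: run a mount attempt under a deadline and send TERM when the deadline expires. It assumes timeout(1) from GNU coreutils is installed, which is not part of autofs and may be missing on RHEL-5-era systems. `mount_with_deadline` is a hypothetical helper name, not something taken from the patch.

```shell
#!/bin/sh
# Hypothetical helper, not part of autofs or the patch above.
# Usage: mount_with_deadline SECONDS COMMAND [ARGS...]
# Runs COMMAND and sends it SIGTERM if it is still running after SECONDS.
# Assumes GNU coreutils timeout(1), which exits with status 124 on expiry.
mount_with_deadline() {
    secs=$1
    shift
    timeout -s TERM "$secs" "$@"
}

# Example (commented out so that sourcing this file has no side effects):
#   mount_with_deadline 10 mount -t nfs4 -o proto=tcp,retry=0 1.2.3.4:/blabla /tmp/
#   echo "exit status: $?"    # 124 would mean the deadline was hit
```

According to the timeout(1) man page, children of the command are only left alone in --foreground mode, so by default the TERM should also reach a mount.nfs child. That has not been verified against the hang in this report.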
Hi Ian!

Still no good news for me :(

Here are my tests:

# newer kernel 2.6.18-164.el5 + autofs-5.0.1-0.rc2.131.bz517349.1.el5.i386.rpm
# ----------------------------------------------------
[root@KSTATION areas_comuns]# uname -r
2.6.18-164.el5
[root@KSTATION areas_comuns]# automount -V
Linux automount version 5.0.1-0.rc2.131.bz517349.1.el5
----------------------------------------------------
NOT WORKING - AUTOFS CMD:
-fstype=nfs4,rw,acl,sec=krb5p 172.x.y.z:/areas_comuns/test
----------------------------------------------------
[root@KSTATION areas_comuns]# time ls -la testedown
ls: testedown: No such file or directory
real 3m9.005s
user 0m0.000s
sys 0m0.001s
----------------------------------------------------
NOT WORKING - AUTOFS CMD (W/O KERBEROS):
-fstype=nfs4,rw,acl 172.x.y.z:/areas_comuns/test
----------------------------------------------------
[root@KSTATION areas_comuns]# time ls -la testedown
ls: testedown: No such file or directory
real 3m9.004s
user 0m0.000s
sys 0m0.002s
----------------------------------------------------
NOT WORKING - AUTOFS CMD (W/O KERBEROS AND NFS4):
-rw,acl 172.x.y.z:/areas_comuns/test
----------------------------------------------------
[root@KSTATION areas_comuns]# time ls -la testedown
ls: testedown: No such file or directory
real 9m27.022s
user 0m0.000s
sys 0m0.002s
----------------------------------------------------

(I killed automount AND mount, and restarted autofs, for each test.)

Sep 24 14:51:47 KSTATION automount[3808]: mount(nfs): nfs: mount failure 172.x.y.z:/areas_comuns/test on /misc/areas_comuns/testedown
Sep 24 14:51:47 KSTATION automount[3808]: ioctl_send_fail: token = 19
Sep 24 14:51:47 KSTATION automount[3808]: failed to mount /misc/areas_comuns/testedown

Any ideas? :)

Thanks a lot!
Carlos.
(In reply to comment #2)
> Hi Ian!
>
> Still no good news for me :(

*sigh*

snip ...

The timeouts you saw are the same as I got on the command line, so at least that is consistent.

> Any idea ? :)

What timeout did you put in /etc/sysconfig/autofs?
Can you show me the line exactly as it is in the config file please.

The other thing we can do is to check the mount processes.

Open another window, look for the mount processes, there should be two, one for mount and one for mount.nfs(4). Some time after the timeout you set in the config has passed send a TERM signal to the mount process and see if it goes away. After checking that also send a TERM signal to the mount.nfs(4) process and check it.

Let me know what happens.

If the processes aren't catching the signals we will have to start looking at the mount code and kernel changes.

Ian
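The process check described above might look like this in practice. This is only a sketch: `filter_mount_procs` is a hypothetical helper, and the pid placeholders in the example have to be filled in by hand from the listing.

```shell
#!/bin/sh
# Hypothetical helper: read `ps -eo pid,ppid,comm` output on stdin and keep
# the header plus any row whose command name is mount, mount.nfs or mount.nfs4.
filter_mount_procs() {
    awk 'NR == 1 || $3 ~ /^mount(\.nfs4?)?$/'
}

# Example session (commented out; substitute the real pids):
#   ps -eo pid,ppid,comm | filter_mount_procs
#   kill -TERM <pid of mount>        # then re-run the ps line
#   kill -TERM <pid of mount.nfs4>   # then re-run the ps line
```

Printing the parent pid alongside each row makes it easy to confirm that the mount.nfs(4) process really is the child of the mount process autofs spawned.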
> What timeout did you put in /etc/sysconfig/autofs?
> Can you show me the line exactly as it is in the config file please.

MOUNT_WAIT=10

(I've tried 1, 2 and 3 too, without any change in autofs timeout...)

> The other thing we can do is to check the mount processes.
>
> Open another window, look for the mount processes, there should be
> two, one for mount and one for mount.nfs(4). Some time after the
> timeout you set in the config has passed send a TERM signal to
> the mount process and see if it goes away. After checking that
> also send a TERM signal to the mount.nfs(4) process and check it.
>
> Let me know what happens.

-----------------------------------------
[root@KSTATION /]# time ls -la /misc/areas_comuns/testedown & sleep 15; killall -TERM mount.nfs4
[1] 6545
[root@KSTATION /]# ls: /misc/areas_comuns/testedown: No such file or directory

real 0m15.006s
user 0m0.000s
sys 0m0.002s
-----------------------------------------
[root@KSTATION /]# time ls -la /misc/areas_comuns/testedown & sleep 20; killall -TERM mount
[1] 6590
ls: : No such file or directory
[root@KSTATION /]#
real 0m20.008s
user 0m0.000s
sys 0m0.002s
-----------------------------------------

Yeah :) (using: -fstype=nfs4,rw,acl,sec=krb5p)

> If the processes aren't catching the signals we will have to
> start looking at the mount code and kernel changes.
>
> Ian
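To rule out a typo or a commented-out line, the effective setting can be read back from the file. A sketch, assuming the file uses plain NAME=value lines like the MOUNT_WAIT=10 line quoted above; `get_mount_wait` is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical helper: print the last uncommented MOUNT_WAIT value found in
# a shell-style config file. Defaults to the path named in this thread.
# Lines starting with '#' are ignored; quotes around the value are kept.
get_mount_wait() {
    sed -n 's/^[[:space:]]*MOUNT_WAIT=\(.*\)$/\1/p' "${1:-/etc/sysconfig/autofs}" | tail -n 1
}

# Example (commented out):
#   get_mount_wait                       # reads /etc/sysconfig/autofs
#   get_mount_wait /path/to/other/file
```

An empty result would mean no active MOUNT_WAIT line was found in the file.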
OK, I'll add some directed logging and build a new test package so we can try and find out what's going on.
Thanks Ian :)
I have added some logging to the package. You can find it at: http://people.redhat.com/~ikent/autofs-5.0.1-0.rc2.131.bz517349.2 Please give this a try so we can find out what is and isn't happening with the timeout and sub-process signalling. Ian
Hah! I figured it out... it's SELinux's fault :P It's denying the TERM signal sent by autofs.

Sep 29 10:07:13 KSTATION automount[31977]: timed_read: poll(), ret 0
Sep 29 10:07:13 KSTATION automount[31977]: do_spawn: process done, errn -110
Sep 29 10:07:13 KSTATION automount[31977]: do_spawn: read timed out, sending TERM
Sep 29 10:07:13 KSTATION automount[31977]: do_spawn: wait for pid 32023

type=AVC msg=audit(1254229633.475:1195): avc: denied { signal } for pid=32022 comm="automount" scontext=root:system_r:automount_t:s0 tcontext=root:system_r:mount_t:s0 tclass=process

[root@KSTATION /]# setenforce 0
[root@KSTATION /]# time ls -la /misc/areas_comuns/testedown
ls: /misc/areas_comuns/testedown: No such file or directory
real 0m10.008s
user 0m0.000s
sys 0m0.002s
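The AVC record above carries everything needed to see who was denied what. As a rough sketch, a small filter can reduce such records to source context, target context, class and permission. `avc_summary` is a hypothetical helper, and the pattern assumes the field order shown in the record above.

```shell
#!/bin/sh
# Hypothetical helper: read audit log lines on stdin and print one summary
# per AVC denial, in the form:
#   <scontext> -> <tcontext> : <tclass> { <permissions> }
# Lines that are not AVC denials in the expected layout are skipped.
avc_summary() {
    sed -n 's/.*avc: *denied *{ *\([^}]*[^} ]\) *} .*scontext=\([^ ]*\) tcontext=\([^ ]*\) tclass=\([^ ]*\).*/\2 -> \3 : \4 { \1 }/p'
}

# Example (commented out; assumes the audit tools are installed):
#   ausearch -m avc -c automount | avc_summary
```

For the record in this comment the summary shows automount_t being denied the signal permission on a mount_t process, which matches the TERM that do_spawn logs just before it.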
So now it's a problem for the SELinux devel team, right? They'll need to apply a label for this behavior in the next SELinux policy patch.
(In reply to comment #9)
> Then, now it's a SELinux devel team problem, right?
> They'll need apply a label for this behavior on next SELinux policy patch.

Maybe, but I tested this on RHEL-5.4 with SELinux in enforcing mode without problem. We will need to compare selinux policy versions before we send it over to them.

Ian
I'm using (selinux-policy-2.4.6-203.el5):

Name        : selinux-policy    Relocations: (not relocatable)
Version     : 2.4.6             Vendor: CentOS
Release     : 203.el5           Build Date: Wed 21 Jan 2009 08:49:15 AM BRT
Install Date: Wed 27 May 2009 10:59:34 AM BRT    Build Host: builder10.centos.org

And I made no changes to the SELinux policies for those tests (until now).
(In reply to comment #11)
> I'm using (selinux-policy-2.4.6-203.el5):
> Name : selinux-policy Relocations: (not relocatable)
> Version : 2.4.6 Vendor: CentOS
> Release : 203.el5 Build Date: Wed 21 Jan 2009
> 08:49:15 AM BRT
> Install Date: Wed 27 May 2009 10:59:34 AM BRT Build Host:
> builder10.centos.org
>
> And made no changes in SELinux policies for those tests (until now).

That looks like the RHEL-5.3 policy, but we had the timed umount in 5.3 so it should work. Maybe I missed that it didn't actually work in enforcing mode, not sure.

We have rev 248 in RHEL-5.4.

Let me test 5.3 tomorrow and get back to you.

Ian
(In reply to comment #12)
> (In reply to comment #11)
> > I'm using (selinux-policy-2.4.6-203.el5):
> > Name : selinux-policy Relocations: (not relocatable)
> > Version : 2.4.6 Vendor: CentOS
> > Release : 203.el5 Build Date: Wed 21 Jan 2009
> > 08:49:15 AM BRT
> > Install Date: Wed 27 May 2009 10:59:34 AM BRT Build Host:
> > builder10.centos.org
> >
> > And made no changes in SELinux policies for those tests (until now).
>
> That looks like a the RHEL-5.3 policy but we had the timed
> umount in 5.3 so it should work. Maybe I missed that it didn't
> actually work in enforcing mode, not sure.
>
> We have rev 248 in RHEL-5.4.
>
> Let me test 5.3 tomorrow and get back to you.
>
> Ian

Yes, it IS RHEL 5.3 (I'm using CentOS 5.3) :) If you want me to do any testing, just say...

Thanks a lot!
(In reply to comment #13)
> > We have rev 248 in RHEL-5.4.
> >
> > Let me test 5.3 tomorrow and get back to you.
> >
> > Ian
>
> Yes, It IS RHEL5.3 (i'm using CentOS 5.3) :) If you want me to do any testing,
> just say...

It appears that RHEL-5.4 has selinux-policy rev 255 but this change worked for me with rev 248 on RHEL-5.4.

Updating to selinux-policy rev 255 (along with its dependencies) on RHEL-5.3 allowed this autofs change to work for me. I can't say in what revision this was fixed but a number of autofs policy changes went into revs 228 and 229.

So, given that the timed mount is a change that would be targeted at RHEL-5.5, the selinux policy isn't an issue for us here.

You will need to get hold of CentOS-5.4 selinux packages and test to make sure they don't introduce unexpected side effects, otherwise the selinux aspect of this change is a CentOS support issue.

Ian
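A quick way to compare an installed policy against the revisions mentioned above is to look at the RPM release number. The sketch below uses 248 as its threshold only because that is the lowest release reported to work in this thread (on RHEL-5.4); the revision that actually fixed the denial is not known, so treat the result as a hint. `policy_release_ok` is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical helper: succeed if an selinux-policy RPM release string such
# as "259.el5" is at or above 248, the lowest release reported to work in
# this thread. The revision that actually fixed the denial is not known.
policy_release_ok() {
    rev=${1%%.*}          # "259.el5" -> "259"
    [ "$rev" -ge 248 ]
}

# Example (commented out; requires rpm):
#   policy_release_ok "$(rpm -q --qf '%{RELEASE}' selinux-policy)" && echo ok
```

Release 203.el5, the CentOS 5.3 policy from the earlier comment, falls below this threshold and is the one that showed the denial.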
I've updated to selinux-policy-2.4.6-259.el5...

Your package + selinux-policy-2.4.6-259.el5 = problem is solved :)

[root@KSTATION /]#
[root@KSTATION /]# time ls -la /misc/areas_comuns/testedown
ls: /misc/areas_comuns/testedown: No such file or directory
real 0m15.008s
user 0m0.000s
sys 0m0.002s
[root@KSTATION /]# getenforce
Enforcing
[root@KSTATION /]# rpm -qi selinux-policy
Name        : selinux-policy    Relocations: (not relocatable)
Version     : 2.4.6             Vendor: Red Hat, Inc.
Release     : 259.el5           Build Date: Tue 29 Sep 2009 04:38:47 PM BRT
Install Date: Wed 30 Sep 2009 11:21:43 AM BRT    Build Host: js20-bc2-10.build.redhat.com
Group       : System Environment/Base    Source RPM: selinux-policy-2.4.6-259.el5.src.rpm
Size        : 0                 License: GPL
Signature   : (none)
Packager    : Red Hat, Inc. <http://bugzilla.redhat.com/bugzilla>
URL         : http://serefpolicy.sourceforge.net
Summary     : SELinux policy configuration
Description :
SELinux Reference Policy - modular.
[root@KSTATION /]#

Will your package be official only in RHEL 5.5, or will RHEL 5.4 get an update?
(In reply to comment #15)
> Your package will be oficial just on RHEL5.5 ? or RHEL5.4 will get a update?

Engineering generally doesn't drive the process of proposing updates for previous RHEL versions, we're more concerned with current development. The process is driven by our support groups and is based on the perceived importance and potential impact of the problem.

Once it is decided that an update is required there are two ways it can be delivered, assuming it is approved by the relevant groups (obviously including Engineering). One is a hot fix which is essentially a release which includes the fix that is provided directly to the customer and is supported until the next product release. The second is a little harder to get approved and is available via RHN to customers that subscribe to that service.

Clearly, both of these cases require a support subscription of some sort.

Ian
Well, let's wait for 5.5 (or 5.6 :P) :D Meanwhile i'm using: autofs-5.0.1-0.rc2.131.bz517349.1.el5 + selinux-policy-2.4.6-259.el5 Thanks again :)
(In reply to comment #17)
> Well, let's wait for 5.5 (or 5.6 :P) :D
>
> Meanwhile i'm using: autofs-5.0.1-0.rc2.131.bz517349.1.el5
> + selinux-policy-2.4.6-259.el5

I should get this into 5.5.
That's much the same as what you would get with a hotfix anyway, ;)

Ian
UPDATE: Now with CentOS 5.4 (selinux-policy-2.4.6-255) we just need to use autofs-5.0.1-0.rc2.131.bz517349.1.el5 :)

Now, let's wait for 5.5 :P

Thanks :)
Created attachment 373664
Patch - add mount wait parameter
Build autofs-5.0.1-0.rc2.133.el5 of autofs contains the changes discussed here. The RHTS test bz517349 within the Bugzilla workflow tests detects the issue resolved by this change. In addition, be aware of the SELinux dependencies discussed in comments #12 through #17.
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2010-0265.html