Bug 517349 - NFSv4 / Kerberos / AutoFS (mount) Try mounting a down server takes TOO MUCH time to fail (timeout).
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: autofs
Version: 5.0
Hardware: All
OS: Linux
Priority: low
Severity: high
Target Milestone: rc
Assignee: Ian Kent
QA Contact: BaseOS QE
Reported: 2009-08-13 15:16 UTC by Carlos André
Modified: 2010-03-30 08:37 UTC (History)

Fixed In Version: autofs-5.0.1-0.rc2.133.el5
Doc Type: Bug Fix
Last Closed: 2010-03-30 08:37:16 UTC


Attachments
Patch - add mount wait parameter (5.25 KB, patch), 2009-11-25 06:00 UTC, Ian Kent


Links
Red Hat Product Errata RHBA-2010:0265: autofs bug fix update (priority: normal, status: SHIPPED_LIVE, last updated 2010-03-29 12:54:19 UTC)

Description Carlos André 2009-08-13 15:16:28 UTC
**** Description of problem:

I need to set up an NFSv4 server with Kerberos and AutoFS, but I have hit a problem: if the NFS server goes down, mounts on the NFSv4 client take a very long time to time out.

I need to mount several (3 to 6) directories during user logon, so if a mount hangs, the logon hangs. I want each mount attempt against a down server to time out after 10-15 seconds at most.

When I access the mount point through AutoFS (proto=tcp OR proto=udp), it hangs for 189 seconds (real 3m9.001s) before showing the error: mount: mount to NFS server '172.16.0.10' failed: timed out (giving up)

Mounting NFSv4 manually gives the same timeouts as AutoFS.

The only way to get an acceptable timeout is to use proto=udp,retry=0 without sec=krb5; any other combination takes 3m9s.




I'm using these packages (server and client side):
autofs-5.0.1-0.rc2.102.el5_3.1
nfs-utils-1.0.9-40.el5
kernel-2.6.18-128.1.16.el5



**** Version-Release number of selected component (if applicable):
autofs-5.0.1-0.rc2.102.el5_3.1


**** Steps to Reproduce:
1. Install the OS (5.2 or 5.3, updated or not; it doesn't matter).
2. Configure AutoFS, or just try to manually mount a down server (or a random IP).
2.1 # time mount 1.2.3.4:/blabla /tmp/ -t nfs4 -o sec=krb5p,proto=tcp,retry=0

  
**** Actual results:
mount: mount to NFS server '1.2.3.4' failed: timed out (giving up).

real    3m9.000s
user    0m0.000s
sys     0m0.002s


**** Expected results:
The mount should fail after around 10 seconds.


**** Additional info:
The "retry" option has no effect when using Kerberos and/or proto=tcp. mount only honors it with proto=udp and no Kerberos, and even then the mount takes 21 seconds to fail, which is still longer than the roughly 10 seconds I want.

# time mount 1.2.3.4:/blabla /tmp/ -t nfs4 -o proto=udp,retry=0
mount: mount to NFS server '1.2.3.4' failed: timed out (giving up).

real    0m21.003s
user    0m0.000s
sys     0m0.003s
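
The behaviour asked for here, a mount attempt that gives up after a fixed number of seconds, can be approximated from the shell by running the command in the background and sending it a TERM signal once a deadline passes. The following is only a sketch under that assumption: run_with_deadline is a hypothetical helper name, not part of mount, nfs-utils or autofs, and it signals only the process it started, not any helper that process may have spawned.

```shell
#!/bin/sh
# Hypothetical helper: run_with_deadline SECONDS COMMAND [ARGS...]
# Runs COMMAND in the background and sends it SIGTERM if it is still
# running after SECONDS. Returns the command's exit status.
run_with_deadline() {
    limit=$1
    shift
    "$@" &
    cmd_pid=$!
    # Watchdog: sleep, then signal the command. Its output is discarded so
    # a leftover sleep cannot hold the caller's stdout open.
    ( sleep "$limit"; kill -TERM "$cmd_pid" ) >/dev/null 2>&1 &
    watchdog_pid=$!
    status=0
    wait "$cmd_pid" || status=$?
    # The command finished (or was terminated); stop the watchdog.
    kill -TERM "$watchdog_pid" 2>/dev/null || true
    wait "$watchdog_pid" 2>/dev/null || true
    return "$status"
}

# Example (needs root and is not run here):
# run_with_deadline 10 mount -t nfs4 -o sec=krb5p,proto=tcp,retry=0 1.2.3.4:/blabla /tmp/
```

A command that is killed by the TERM signal is reported by the shell with status 143 (128 + 15), which lets a caller tell a deadline expiry apart from an ordinary mount failure.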

Comment 1 Ian Kent 2009-09-23 07:22:30 UTC
Hi Carlos,

I have added the patch I posted in our email conversation and
done a RHEL-5 build. You can find it at:
http://people.redhat.com/~ikent/autofs-5.0.1-0.rc2.131.bz517349.1.el5.

Please give this a try without using Kerberos to start with, as
I'm not in a position to test it, and you reported this happening
without it anyway.

This is not without its own problems though.

I probably should have chased this when I first noticed it with
the timed umount changes. Anyway, when autofs sends a TERM signal
to the mount(8) process that it has spawned, that process terminates
OK but its child mount.nfs(8) (or mount.nfs4) doesn't. The
mount.nfs process does respond to signals and would go away but,
for autofs to look up its pid and signal it is a big pain.

So, to start with, let's see if this will at least return within
the mount timeout specified in the autofs configuration.
It should and it did for me in all the cases I tried, with or
without the intr option on the mount.

Ian
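
As described above, a TERM signal sent to mount(8) does not reach its mount.nfs child, and finding that child's pid is the awkward part. One possible approach, shown only as a sketch and not as what the autofs patch does, is to list the children while the parent is still alive and then signal both. term_with_children is a hypothetical helper name, and pgrep -P (from procps) is assumed to be available.

```shell
#!/bin/sh
# Hypothetical helper: term_with_children PID
# Sends SIGTERM to PID and to its direct children.
term_with_children() {
    parent=$1
    # Look up the children BEFORE signalling the parent: once the parent
    # exits, its children are re-parented and "pgrep -P" no longer finds them.
    children=$(pgrep -P "$parent" || true)
    kill -TERM "$parent" 2>/dev/null || true
    for child in $children; do
        kill -TERM "$child" 2>/dev/null || true
    done
}
```

In the scenario above, the argument would be the pid of the mount(8) process spawned by automount.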

Comment 2 Carlos André 2009-09-24 17:57:25 UTC
Hi Ian!

Still no good news for me :(

Here are my tests:

# newer kernel 2.6.18-164.el5 + autofs-5.0.1-0.rc2.131.bz517349.1.el5.i386.rpm #
----------------------------------------------------
[root@KSTATION areas_comuns]# uname -r
2.6.18-164.el5
[root@KSTATION areas_comuns]# automount -V

Linux automount version 5.0.1-0.rc2.131.bz517349.1.el5
----------------------------------------------------

NOT WORKING - AUTOFS CMD: -fstype=nfs4,rw,acl,sec=krb5p 172.x.y.z:/areas_comuns/test
----------------------------------------------------
[root@KSTATION areas_comuns]# time ls -la testedown
ls: testedown: No such file or directory

real    3m9.005s
user    0m0.000s
sys     0m0.001s
----------------------------------------------------

NOT WORKING - AUTOFS CMD (W/O KERBEROS): -fstype=nfs4,rw,acl 172.x.y.z:/areas_comuns/test
----------------------------------------------------
[root@KSTATION areas_comuns]# time ls -la testedown
ls: testedown: No such file or directory

real    3m9.004s
user    0m0.000s
sys     0m0.002s
----------------------------------------------------

NOT WORKING - AUTOFS CMD (W/O KERBEROS AND NFS4): -rw,acl 172.x.y.z:/areas_comuns/test
----------------------------------------------------
[root@KSTATION areas_comuns]# time ls -la testedown
ls: testedown: No such file or directory

real    9m27.022s
user    0m0.000s
sys     0m0.002s
----------------------------------------------------

(I killed automount AND mount, and restarted autofs, before each test)

Sep 24 14:51:47 KSTATION automount[3808]: mount(nfs): nfs: mount failure 172.x.y.z:/areas_comuns/test on /misc/areas_comuns/testedown
Sep 24 14:51:47 KSTATION automount[3808]: ioctl_send_fail: token = 19
Sep 24 14:51:47 KSTATION automount[3808]: failed to mount /misc/areas_comuns/testedown




Any ideas? :)

Thanks a lot!

Carlos.

Comment 3 Ian Kent 2009-09-25 03:03:01 UTC
(In reply to comment #2)
> Hi Ian!
> 
> Still no good news for me :(

*sigh*

snip ...

The timeouts you saw are the same as I got on the command line,
so at least that is consistent.

> 
> Any idea ? :)

What timeout did you put in /etc/sysconfig/autofs?
Can you show me the line exactly as it is in the config file please.

The other thing we can do is to check the mount processes. 

Open another window, look for the mount processes, there should be
two, one for mount and one for mount.nfs(4). Some time after the
timeout you set in the config has passed send a TERM signal to
the mount process and see if it goes away. After checking that
also send a TERM signal to the mount.nfs(4) process and check it.

Let me know what happens.

If the processes aren't catching the signals we will have to
start looking at the mount code and kernel changes.

Ian

Comment 4 Carlos André 2009-09-25 11:18:46 UTC
> What timeout did you put in /etc/sysconfig/autofs?
> Can you show me the line exactly as it is in the config file please.

MOUNT_WAIT=10 (I've tried 1, 2 and 3 too, without any change in autofs timeout...)
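
For context, the setting under test is a single shell-style assignment in /etc/sysconfig/autofs. A minimal fragment, assuming the value is interpreted as seconds (which matches the 10-second result reported later in this thread) and that autofs is restarted after the file is edited:

```shell
# /etc/sysconfig/autofs (fragment)
# Time to wait for mount(8) before giving up on a mount attempt.
# Assumption: the unit is seconds.
MOUNT_WAIT=10
```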


> The other thing we can do is to check the mount processes. 
> 
> Open another window, look for the mount processes, there should be
> two, one for mount and one for mount.nfs(4). Some time after the
> timeout you set in the config has passed send a TERM signal to
> the mount process and see if it goes away. After checking that
> also send a TERM signal to the mount.nfs(4) process and check it.
> 
> Let me know what happens.


-----------------------------------------
[root@KSTATION /]# time ls -la /misc/areas_comuns/testedown & sleep 15; killall -TERM mount.nfs4
[1] 6545
[root@KSTATION /]# ls: /misc/areas_comuns/testedown: No such file or directory

real    0m15.006s
user    0m0.000s
sys     0m0.002s
-----------------------------------------
[root@KSTATION /]# time ls -la /misc/areas_comuns/testedown & sleep 20; killall -TERM mount
[1] 6590
ls: : No such file or directory
[root@KSTATION /]#
real    0m20.008s
user    0m0.000s
sys     0m0.002s
-----------------------------------------
Yeah :) 
(using: -fstype=nfs4,rw,acl,sec=krb5p)

> 
> If the processes aren't catching the signals we will have to
> start looking at the mount code and kernel changes.
> 
> Ian

Comment 5 Ian Kent 2009-09-25 14:05:07 UTC
OK, I'll add some directed logging and build a new test package
so we can try and find out what's going on.

Comment 6 Carlos André 2009-09-25 14:52:08 UTC
Thanks Ian :)

Comment 7 Ian Kent 2009-09-29 07:09:53 UTC
I have added some logging to the package. You can find it at:
http://people.redhat.com/~ikent/autofs-5.0.1-0.rc2.131.bz517349.2

Please give this a try so we can find out what is and isn't
happening with the timeout and sub-process signalling.

Ian

Comment 8 Carlos André 2009-09-29 13:13:18 UTC
Hah! I figured it out... it's SELinux's fault :P It's denying the TERM signal sent by autofs.
Sep 29 10:07:13 KSTATION automount[31977]: timed_read: poll(), ret 0
Sep 29 10:07:13 KSTATION automount[31977]: do_spawn: process done, errn -110
Sep 29 10:07:13 KSTATION automount[31977]: do_spawn: read timed out, sending TERM
Sep 29 10:07:13 KSTATION automount[31977]: do_spawn: wait for pid 32023

type=AVC msg=audit(1254229633.475:1195): avc:  denied  { signal } for  pid=32022 comm="automount" scontext=root:system_r:automount_t:s0 tcontext=root:system_r:mount_t:s0 tclass=process


[root@KSTATION /]# setenforce 0
[root@KSTATION /]# time ls -la /misc/areas_comuns/testedown
ls: /misc/areas_comuns/testedown: No such file or directory

real    0m10.008s
user    0m0.000s
sys     0m0.002s
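
The AVC record above carries the three facts that matter for diagnosis: the denied permission (signal), the source domain (automount_t) and the target domain (mount_t). The sketch below pulls those fields out of audit records with sed. avc_summary is a hypothetical helper name, and the pattern assumes the user:role:type:level context layout seen in the record above.

```shell
#!/bin/sh
# Hypothetical helper: avc_summary
# Reads audit records on stdin and prints, for each AVC denial,
# "PERMISSION SOURCE_TYPE -> TARGET_TYPE class=CLASS".
avc_summary() {
    sed -n 's/.*avc: *denied *{ \([^}]*\) }.*scontext=[^:]*:[^:]*:\([^:]*\):.*tcontext=[^:]*:[^:]*:\([^:]*\):.* tclass=\([^ ]*\).*/\1 \2 -> \3 class=\4/p'
}

# Example (the audit log location is an assumption; it may differ per system):
# avc_summary < /var/log/audit/audit.log
```

Fed the record from this comment, the function prints "signal automount_t -> mount_t class=process", that is, automount was not allowed to signal a process in the mount domain.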

Comment 9 Carlos André 2009-09-29 14:55:04 UTC
So now it's a problem for the SELinux development team, right?
They'll need to allow this behavior in the next SELinux policy update.

Comment 10 Ian Kent 2009-09-29 16:07:50 UTC
(In reply to comment #9)
> Then, now it's a SELinux devel team problem, right?
> They'll need apply a label for this behavior on next SELinux policy patch.  

Maybe, but I tested this on RHEL-5.4 with SELinux in enforcing
mode without problems.

We will need to compare selinux policy versions before we send
it over to them.

Ian

Comment 11 Carlos André 2009-09-29 16:17:03 UTC
I'm using (selinux-policy-2.4.6-203.el5):
Name        : selinux-policy               Relocations: (not relocatable)
Version     : 2.4.6                             Vendor: CentOS
Release     : 203.el5                       Build Date: Wed 21 Jan 2009 08:49:15 AM BRT
Install Date: Wed 27 May 2009 10:59:34 AM BRT      Build Host: builder10.centos.org

And I made no changes to the SELinux policies for these tests (until now).

Comment 12 Ian Kent 2009-09-29 16:39:36 UTC
(In reply to comment #11)
> I'm using (selinux-policy-2.4.6-203.el5):
> Name        : selinux-policy               Relocations: (not relocatable)
> Version     : 2.4.6                             Vendor: CentOS
> Release     : 203.el5                       Build Date: Wed 21 Jan 2009
> 08:49:15 AM BRT
> Install Date: Wed 27 May 2009 10:59:34 AM BRT      Build Host:
> builder10.centos.org
> 
> And made no changes in SELinux policies for those tests (until now).  

That looks like the RHEL-5.3 policy, but we had the timed
umount in 5.3 so it should work. Maybe I missed that it didn't
actually work in enforcing mode, not sure.

We have rev 248 in RHEL-5.4.

Let me test 5.3 tomorrow and get back to you.

Ian

Comment 13 Carlos André 2009-09-29 17:17:00 UTC
(In reply to comment #12)
> (In reply to comment #11)
> > I'm using (selinux-policy-2.4.6-203.el5):
> > Name        : selinux-policy               Relocations: (not relocatable)
> > Version     : 2.4.6                             Vendor: CentOS
> > Release     : 203.el5                       Build Date: Wed 21 Jan 2009
> > 08:49:15 AM BRT
> > Install Date: Wed 27 May 2009 10:59:34 AM BRT      Build Host:
> > builder10.centos.org
> > 
> > And made no changes in SELinux policies for those tests (until now).  
> 
> That looks like a the RHEL-5.3 policy but we had the timed
> umount in 5.3 so it should work. Maybe I missed that it didn't
> actually work in enforcing mode, not sure.
> 
> We have rev 248 in RHEL-5.4.
> 
> Let me test 5.3 tomorrow and get back to you.
> 
> Ian  

Yes, it IS the RHEL 5.3 policy (I'm using CentOS 5.3) :) If you want me to do any testing, just say so.

Thanks a lot!

Comment 14 Ian Kent 2009-09-30 06:41:16 UTC
(In reply to comment #13)
> > 
> > We have rev 248 in RHEL-5.4.
> > 
> > Let me test 5.3 tomorrow and get back to you.
> > 
> > Ian  
> 
> Yes, It IS RHEL5.3 (i'm using CentOS 5.3) :) If you want me to do any testing,
> just say...

It appears that RHEL-5.4 has selinux-policy rev 255 but this
change worked for me with rev 248 on RHEL-5.4.

Updating to selinux-policy rev 255 (along with its dependencies)
on RHEL-5.3 allowed this autofs change to work for me. I can't
say in what revision this was fixed but a number of autofs policy
changes went into revs 228 and 229.

So, given that the timed mount is a change that would be targeted
at RHEL-5.5, the selinux policy isn't an issue for us here. You
will need to get hold of CentOS-5.4 selinux packages and test to
make sure they don't introduce unexpected side effects, otherwise
the selinux aspect of this change is a CentOS support issue.

Ian
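
Since the outcome depends on the selinux-policy revision, a quick check of the installed release against a revision reported to work can save a test cycle. The sketch below parses a package string of the form printed by rpm -q selinux-policy. policy_rev and policy_known_good are hypothetical helper names, and the threshold 248 is an assumption taken from this thread (203 was reported failing, 248 and later were reported working); the revision that actually carried the fix is not known.

```shell
#!/bin/sh
# Assumption from this thread: lowest revision reported to work.
KNOWN_GOOD_REV=248

# Hypothetical helper: policy_rev PACKAGE
# Prints the release revision, e.g. selinux-policy-2.4.6-203.el5 -> 203.
policy_rev() {
    printf '%s\n' "$1" |
        sed -n 's/^selinux-policy-[0-9.]*-\([0-9][0-9]*\)\..*$/\1/p'
}

# Hypothetical helper: policy_known_good PACKAGE
# Succeeds if the revision is at or above KNOWN_GOOD_REV.
policy_known_good() {
    rev=$(policy_rev "$1")
    [ -n "$rev" ] && [ "$rev" -ge "$KNOWN_GOOD_REV" ]
}

# Example (requires rpm, not run here):
# policy_known_good "$(rpm -q selinux-policy)" || echo "policy may be too old"
```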

Comment 15 Carlos André 2009-09-30 14:35:57 UTC
I've updated to selinux-policy-2.4.6-259.el5... 
Your package + selinux-policy-2.4.6-259.el5 = problem is solved :)


[root@KSTATION /]#
[root@KSTATION /]# time ls -la /misc/areas_comuns/testedown
ls: /misc/areas_comuns/testedown: No such file or directory

real    0m15.008s
user    0m0.000s
sys     0m0.002s
[root@KSTATION /]# getenforce
Enforcing
[root@KSTATION /]# rpm -qi selinux-policy
Name        : selinux-policy               Relocations: (not relocatable)
Version     : 2.4.6                             Vendor: Red Hat, Inc.
Release     : 259.el5                       Build Date: Tue 29 Sep 2009 04:38:47 PM BRT
Install Date: Wed 30 Sep 2009 11:21:43 AM BRT      Build Host: js20-bc2-10.build.redhat.com
Group       : System Environment/Base       Source RPM: selinux-policy-2.4.6-259.el5.src.rpm
Size        : 0                                License: GPL
Signature   : (none)
Packager    : Red Hat, Inc. <http://bugzilla.redhat.com/bugzilla>
URL         : http://serefpolicy.sourceforge.net
Summary     : SELinux policy configuration
Description :
SELinux Reference Policy - modular.
[root@KSTATION /]#       

Will your package become official only in RHEL 5.5, or will RHEL 5.4 get an update?

Comment 16 Ian Kent 2009-10-01 04:08:03 UTC
(In reply to comment #15)
> Your package will be oficial just on RHEL5.5 ? or RHEL5.4 will get a update?  

Engineering generally doesn't drive the process of proposing
updates for previous RHEL versions, we're more concerned with
current development.

The process is driven by our support groups and is based on
the perceived importance and potential impact of the problem.

Once it is decided that an update is required there are two ways
it can be delivered, assuming it is approved by the relevant
groups (obviously including Engineering). One is a hot fix
which is essentially a release which includes the fix that is
provided directly to the customer and is supported until the
next product release. The second is a little harder to get
approved and is available via RHN to customers that subscribe
to that service.

Clearly, both of these cases require a support subscription of
some sort.

Ian

Comment 17 Carlos André 2009-10-01 11:24:08 UTC
Well, let's wait for 5.5 (or 5.6 :P) :D

Meanwhile I'm using: autofs-5.0.1-0.rc2.131.bz517349.1.el5
 + selinux-policy-2.4.6-259.el5 

Thanks again :)

Comment 18 Ian Kent 2009-10-01 14:24:45 UTC
(In reply to comment #17)
> Well, let's wait for 5.5 (or 5.6 :P) :D
> 
> Meanwhile i'm using: autofs-5.0.1-0.rc2.131.bz517349.1.el5
>  + selinux-policy-2.4.6-259.el5 

I should get this into 5.5.
That's much the same as what you would get with a hotfix anyway, ;)

Ian

Comment 20 Carlos André 2009-10-27 17:31:54 UTC
UPDATE:
- Now with CentOS 5.4 (selinux-policy-2.4.6-255) we just need to use autofs-5.0.1-0.rc2.131.bz517349.1.el5 :)

Now, let's wait for 5.5 :P

Thanks :)

Comment 21 Ian Kent 2009-11-25 06:00:15 UTC
Created attachment 373664 [details]
Patch - add mount wait parameter

Comment 22 Ian Kent 2009-12-21 02:23:54 UTC
Build autofs-5.0.1-0.rc2.133.el5 of autofs contains the changes
discussed here.

The RHTS test bz517349, within the bugzillas workflow tests,
detects the issue resolved by this change.

In addition, be aware of the SELinux dependencies discussed in
comments #12 through #17.

Comment 30 errata-xmlrpc 2010-03-30 08:37:16 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2010-0265.html

