Bug 834641

Summary:

autofs requires portmapper on server for NFSv4 mounts

Product:

Red Hat Enterprise Linux 6

Reporter:

bcodding

Component:

autofs

Assignee:

Ian Kent <ikent>

Status:

CLOSED ERRATA

QA Contact:

yanfu,wang <yanwang>

Severity:

low

Priority:

unspecified

Version:

6.3

CC:

david.halliwell, flakrat, igeorgex, ikent, jcpunk, jonathan.underwood, mishu, pasteur, Per.t.Sjoholm, rik.theys, rmainz, yanwang

Hardware:

All

OS:

Linux

Fixed In Version:

autofs-5.0.5-55.el6

Doc Type:

Bug Fix

Last Closed:

2013-02-21 10:53:18 UTC

Type:

Bug

Bug Blocks:

846852

Attachments:

Description	Flags
Longer debug log	none
Patch - fix nfs4 contacts portmap	none
Debug log -- after patch - fixed.	none
/var/log/messages for autofs-5.0.5-54.bz834641.1.el6 and MOUNT_WAIT as default value	none

Description bcodding 2012-06-22 16:07:03 UTC

After upgrading from RHEL 6.2 to 6.3, when automounting nfs4 to a server not running portmapper, automount fails with

Jun 22 12:01:03 gnu automount[21099]: mount_mount: mount(nfs): nfs options="sec=krb5,actimeo=5,timeo=60", nosymlink=1, ro=0
Jun 22 12:01:03 gnu automount[21099]: get_nfs_info: called with host nfs.uvm.edu(10.214.10.214) proto tcp version 0x40
Jun 22 12:01:03 gnu automount[21099]: get_nfs_info: nfs v4 rpc ping time: 0.000430
Jun 22 12:01:03 gnu automount[21099]: get_nfs_info: host nfs.uvm.edu cost 429 weight 0
Jun 22 12:01:03 gnu automount[21099]: mount(nfs): no hosts available

With options:
-fstype=nfs4,sec=krb5,actimeo=5,timeo=60

We found autofs failing after attempting to contact portmapper on the server.  The problem is well discussed here:
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=675798

Workaround: explicitly specifying the port in the mount map disables autofs' portmapper attempts:

-port=2049,-fstype=nfs4,sec=krb5,actimeo=5,timeo=60
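
The behaviour reported here can be modelled roughly as follows. This is a minimal sketch, not autofs code: the function names and the `fixed` flag are hypothetical, and the only rules encoded are the ones stated in this report (an explicit port avoids the portmapper query, and the reporter's expectation is that `-fstype=nfs4` alone should avoid it too).

```python
def parse_map_options(options: str) -> dict:
    """Split an autofs map option string into a {name: value} dict.

    Hypothetical helper for illustration; leading dashes are dropped so
    "-port=2049,-fstype=nfs4" and "-fstype=nfs4,sec=krb5" both parse.
    """
    parsed = {}
    for token in options.split(","):
        token = token.strip().lstrip("-")
        if not token:
            continue
        name, _, value = token.partition("=")
        parsed[name] = value
    return parsed


def contacts_portmapper(options: str, fixed: bool = False) -> bool:
    """Model whether a portmapper query is expected for these options.

    fixed=False models the behaviour reported in this bug: only an
    explicit port avoids the query. fixed=True models the expected
    behaviour, where fstype=nfs4 avoids it as well.
    """
    opts = parse_map_options(options)
    if "port" in opts:
        return False
    if fixed and opts.get("fstype") == "nfs4":
        return False
    return True
```

With the options from this report, `contacts_portmapper("-fstype=nfs4,sec=krb5,actimeo=5,timeo=60")` models the failing case, and adding `-port=2049` models the workaround.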

Comment 2 Ian Kent 2012-06-23 02:34:01 UTC

(In reply to comment #0)
> After upgrading from RHEL 6.2 to 6.3, when automounting nfs4 to a server not
> running portmapper, automount fails with
> 
> Jun 22 12:01:03 gnu automount[21099]: mount_mount: mount(nfs): nfs
> options="sec=krb5,actimeo=5,timeo=60", nosymlink=1, ro=0
> Jun 22 12:01:03 gnu automount[21099]: get_nfs_info: called with host
> nfs.uvm.edu(10.214.10.214) proto tcp version 0x40
> Jun 22 12:01:03 gnu automount[21099]: get_nfs_info: nfs v4 rpc ping time:
> 0.000430
> Jun 22 12:01:03 gnu automount[21099]: get_nfs_info: host nfs.uvm.edu cost
> 429 weight 0
> Jun 22 12:01:03 gnu automount[21099]: mount(nfs): no hosts available

You will need to post the debug log from start until some time
after the problem happens for it to be useful to me.

> 
> With options:
> -fstype=nfs4,sec=krb5,actimeo=5,timeo=60

Are you sure these are the options?

> 
> We found autofs failing after attempting to contact portmapper on the
> server.  The problem is well discussed here:
> http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=675798

I only had a quick look at that bug since much of it was taken
from comments I made about the issue on the autofs mailing list.
Not everything that I said is reflected in the bug and some
things appear to not be entirely accurately presented.

One thing that I did see is that the conclusion of the bug was
that using the "-fstype=nfs4" option does in fact cause automount
to bypass port lookup. Although I didn't look closely I don't
remember seeing discussion about the hosts map, which is a
special case. I did however see some talk about the MOUNT_WAIT
configuration entry but didn't notice if it mentioned that can
be used to restore the previous behaviour but with a timeout on
waiting for mounts to complete.

The posters in the bug do not appear to appreciate the
problem that the changes are meant to help with, and the need
to offer a way to revert to the previous behaviour without
introducing unacceptable wait times for mounts to servers that
aren't responding.

The problem that led to this change is that there can be lengthy
waits for mounts (2-3 minutes) due to changes to mount.nfs(8) and
the kernel. I've been aware of the changed behaviour of the kernel
for some time and managed to have the situation improved some after
reporting my difficulty. But not all the difficulties could be
resolved. Now that mount.nfs(8) passes the mount options to the
kernel, and the kernel performs the RPC operations that mount did
previously, delays on mounts to servers that aren't available can
be significant.

I had to find a way to, at the very least, improve that.

Ian

Comment 3 bcodding 2012-06-25 13:59:53 UTC

Created attachment 594192 [details]
Longer debug log

Comment 4 bcodding 2012-06-25 14:03:26 UTC

> You will need to post the debug log from start until some time
> after the problem happens for it to be useful to me.

Ok.  I hope 20 minutes is enough for you.

> > With options:
> > -fstype=nfs4,sec=krb5,actimeo=5,timeo=60
> 
> Are you sure these are the options?

Yes.  The log should also reassure you.

> > We found autofs failing after attempting to contact portmapper on the
> > server.  The problem is well discussed here:
> > http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=675798
> 
> I only had a quick look at that bug since much of it was taken
> from comments I made about the issue on the autofs mailing list.
> Not everything that I said is reflected in the bug and some
> things appear to not be entirely accurately presented.

Where can I find what you've said about this?  I've posted the bug so other
sysadmins can find the workaround after the upgrade breaks their systems,
but now I am legitimately curious about the internals of the issue.

I understand that kernel can cause automount to wait for some time in
mount.nfs, and that wait is unacceptable.  I can think of several ways to
work around waiting for a process, but I don't want to assume too much about
why exactly those options aren't considered; maybe you can tell us?

I think that for the nfsv4 case a portmapper contact would be unnecessary,
can you tell me why that thinking is wrong?

> One thing that I did see is that the conclusion of the bug was
> that using the "-fstype=nfs4" option does in fact cause automount
> to bypass port lookup.

Not in our experience; only specifying the port causes the lookup to be
bypassed.

>  ...
> I had to find a way to, at the very least, improve that.

Ok, let's find a way to fix this now.  I can test patches for you, if you'd
like to save time not reproducing.

Comment 5 Ian Kent 2012-06-25 15:08:18 UTC

(In reply to comment #4)
> > You will need to post the debug log from start until some time
> > after the problem happens for it to be useful to me.
> 
> Ok.  I hope 20 minutes is enough for you.

That's great, the log starts with the startup of autofs so I
know there is nothing that I might miss. Very good, thanks.

> 
> > > With options:
> > > -fstype=nfs4,sec=krb5,actimeo=5,timeo=60
> > 
> > Are you sure these are the options?
> 
> Yes.  The log should also reassure you.

Indeed that is so, good.

> 
> > > We found autofs failing after attempting to contact portmapper on the
> > > server.  The problem is well discussed here:
> > > http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=675798
> > 
> > I only had a quick look at that bug since much of it was taken
> > from comments I made about the issue on the autofs mailing list.
> > Not everything that I said is reflected in the bug and some
> > things appear to not be entirely accurately presented.
> 
> Where can I find what you've said about this?  I've posted the bug so other
> sysadmins can find the workaround after the upgrade breaks their systems,
> but now I am legitimately curious about the internals of the issue.

They should be on mailing list mirrors around the place.
If you really want to pursue it I can forward the mails but
somehow I suspect you won't need to pursue it once we
sort things out working on this bug.

> 
> I understand that kernel can cause automount to wait for some time in
> mount.nfs, and that wait is unacceptable.  I can think of several ways to
> work around waiting for a process, but I don't want to assume too much about
> why exactly those options aren't considered; maybe you can tell us?

Yes, it is unacceptable for an interactive application.
The actual problem is that the kernel must wait for RPCs to
time out most of the time because of the "best effort" needed
to avoid potential corruption when using NFS.

The consequence of that for autofs is that it must now probe
the server (including for simple mounts, which it didn't do
previously) to see if it is available, which is a reasonably
quick process, before handing off to mount(8).
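
The log in comment #0 shows this probe as an "nfs v4 rpc ping". A minimal sketch of what such a ping looks like on the wire, assuming a standard ONC RPC (RFC 5531) NULL-procedure call to the NFS program over TCP with record marking. The function name is hypothetical and this is not the autofs implementation; it only builds the request bytes.

```python
import struct

NFS_PROGRAM = 100003  # ONC RPC program number assigned to NFS
RPC_VERSION = 2       # ONC RPC protocol version
MSG_CALL = 0          # message type: call (as opposed to reply)
AUTH_NONE = 0         # no credentials, no verifier
NULL_PROC = 0         # procedure 0 does nothing and returns nothing


def build_null_call(xid: int, version: int = 4) -> bytes:
    """Build a record-marked RPC NULL call for the NFS program."""
    body = struct.pack(
        ">10I",
        xid, MSG_CALL, RPC_VERSION, NFS_PROGRAM, version, NULL_PROC,
        AUTH_NONE, 0,  # credentials: flavor, zero-length body
        AUTH_NONE, 0,  # verifier: flavor, zero-length body
    )
    # TCP record mark: high bit flags the last fragment, low 31 bits
    # carry the fragment length.
    record_mark = struct.pack(">I", 0x80000000 | len(body))
    return record_mark + body
```

Sending these bytes to the NFS port and timing the reply is enough to tell whether the service answers, without asking the portmapper where it lives.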

> 
> I think that for the nfsv4 case a portmapper contact would be unnecessary,
> can you tell me why that thinking is wrong?

Your thinking is not wrong, there is in fact a mistake in
the code which I found fairly quickly thanks to the log you
provided.

Unfortunately, sometimes it can be hard to communicate with
people and real problems can't be properly identified. Then
someone comes along and provides exactly what I ask for and
the issue is then found. I guess I owe the mailing list
person an apology ;)

> 
> > One thing that I did see is that the conclusion of the bug was
> > that using the "-fstype=nfs4" option does in fact cause automount
> > to bypass port lookup.
> 
> Not in our experience; only specifying the port causes the lookup to be
> bypassed.

Again, that's right, I really wonder how that came about since
I was wrong about it in the first place.

> 
> >  ...
> > I had to find a way to, at the very least, improve that.
> 
> Ok, let's find a way to fix this now.  I can test patches for you, if you'd
> like to save time not reproducing.

It's late here so I'll just post the patch and make a test
package tomorrow, unless you "really" need a test package.

Ian

Comment 6 bcodding 2012-06-25 15:13:23 UTC

Just the patch would be perfect.  I'll rebuild and test and send you my logs.

Comment 7 Ian Kent 2012-06-25 15:17:02 UTC

(In reply to comment #5)
> 
> Your thinking is not wrong, there is in fact a mistake in
> the code which I found fairly quickly thanks to the log you
> provided.

It is not a good excuse but I would also like to add that the
mistake has been present in the code for quite a while and
wasn't actually introduced by the recent change.

Comment 8 Ian Kent 2012-06-25 15:21:24 UTC

Created attachment 594218 [details]
Patch  - fix nfs4 contacts portmap

Comment 9 Ian Kent 2012-06-25 15:26:48 UTC

It's also worth remembering that if the MOUNT_WAIT configuration
option is given a value other than -1 (default, wait until mount
returns) then autofs won't perform the probe at all for simple
mounts.

Setting it isn't recommended if your servers are frequently not contactable
but, set to a sensible value, it will cause autofs to behave the
way it did before the change but also will not wait for a blocked
mount to complete.
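
The rule above can be sketched as follows. This is a minimal illustration, assuming the autofs configuration is a file of shell-style KEY=VALUE lines; both function names are hypothetical and nothing here is taken from the autofs source.

```python
def parse_mount_wait(config_text: str, default: int = -1) -> int:
    """Return the MOUNT_WAIT value from KEY=VALUE configuration text.

    Comment lines are ignored, the last assignment wins (as it would
    in a sourced shell file), and -1 is returned when it is unset.
    """
    value = default
    for line in config_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, raw = line.partition("=")
        if sep and key.strip() == "MOUNT_WAIT":
            value = int(raw.strip().strip('"'))
    return value


def probes_simple_mounts(mount_wait: int) -> bool:
    """Model the rule stated above: the availability probe for simple
    mounts is only done when MOUNT_WAIT is left at -1."""
    return mount_wait == -1
```

For example, a configuration containing `MOUNT_WAIT=10` parses to 10, for which `probes_simple_mounts` returns False.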

Comment 10 bcodding 2012-06-25 18:04:19 UTC

That patch has fixed it; fixed log attached.  Thanks.

Comment 11 bcodding 2012-06-25 18:05:18 UTC

Created attachment 594251 [details]
Debug log -- after patch - fixed.

Comment 12 Ian Kent 2012-06-27 04:03:52 UTC

A test package with the patch posted in this bug has been built.
It is available at:
http://people.redhat.com/~ikent/autofs-5.0.5-54.bz834641.1.el6

Please test and report your results.

Comment 13 bcodding 2012-06-27 13:14:53 UTC

Ian, your test package also fixes the problem.  Same results as Comment 11.

Comment 14 Mike Hanby 2012-07-17 17:31:18 UTC

I encountered the same issue with our autofs mounted NFSv4 home directories. Installing the test package on the clients resolved the problem (testing the automount map with -port=2049 also successfully mounted it).

Comment 15 Ian Kent 2012-09-06 01:42:12 UTC

*** Bug 846320 has been marked as a duplicate of this bug. ***

Comment 16 Jonathan Underwood 2012-09-06 21:28:58 UTC

I just hit this problem too with autofs-5.0.5-54.el6.x86_64.

I tried Ian's test package (autofs-5.0.5-54.bz834641.1.el6) and that didn't change matters for me. In both cases I had MOUNT_WAIT=-1. 

However, in both cases, if I change to MOUNT_WAIT=10, directories are automounted successfully.

I am curious as to:

1) What is actually the correct fix - adjust MOUNT_WAIT, or specify the port, and/or specify -fstype=nfs4 ?

2) Will there be a fixed package pushed as an errata?

Comment 17 Jonathan Underwood 2012-09-06 22:35:03 UTC

@bcodding: are you sure that Ian's patch/updated package fixes the problem for you *without* any of the other workarounds?

Comment 18 bcodding 2012-09-07 00:16:54 UTC

(In reply to comment #17)
> @bcodding: are you sure that Ian's patch/updated package fixes the problem
> for you *without* any of the other workarounds?

Yes, I am sure.

Comment 19 Ian Kent 2012-09-07 02:13:21 UTC

(In reply to comment #16)
> I just hit this problem too with autofs-5.0.5-54.el6.x86_64.
> 
> I tried Ian's test package (autofs-5.0.5-54.bz834641.1.el6) and that didn't
> change matters for me. In both cases I had MOUNT_WAIT=-1.

It should have, we'll need to work out why that is the case.
How about posting a debug log.
 
> 
> However, in both cases, if I change to MOUNT_WAIT=10, directories are
> automounted successfully.

Setting the MOUNT_WAIT is meant to restore the previous behaviour
without also exposing you to possible long mount timeouts. So you
can consider it a workaround but it is no accident it works that
way.
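
The idea of bounding the wait rather than probing first can be sketched like this. It is a minimal illustration using a generic child process, assuming a negative value means "wait until the command returns"; the function name is hypothetical and this is not how autofs itself is implemented.

```python
import subprocess
import sys


def run_with_mount_wait(command, mount_wait=-1):
    """Run a command, giving up after mount_wait seconds.

    A negative mount_wait means no limit. Returns True only if the
    command finished in time with exit status 0.
    """
    timeout = None if mount_wait < 0 else mount_wait
    try:
        # subprocess.run kills the child before re-raising on timeout.
        result = subprocess.run(command, timeout=timeout)
    except subprocess.TimeoutExpired:
        return False
    return result.returncode == 0


if __name__ == "__main__":
    # Stand-in for a mount that blocks on an unresponsive server.
    blocked = [sys.executable, "-c", "import time; time.sleep(30)"]
    print(run_with_mount_wait(blocked, mount_wait=2))
```

The trade-off is the one described above: no probe is needed, but a mount to a dead server still costs up to the full wait time before it is abandoned.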

> 
> I am curious as to:
> 
> 1) What is actually the correct fix - adjust MOUNT_WAIT, or specify the
> port, and/or specify -fstype=nfs4 ?

For this specific issue the patch here should be sufficient.
There are some other issues but they are not specific to
contacting the port mapper.

I've already talked about MOUNT_WAIT.

Specifying "-fstype=nfs4" should also have the desired effect
because it says this is an NFSv4 only mount. The mount option
"-t nfs4" will be added to the mount command and fallback to
NFSv3 won't be attempted by mount.nfs.

If you have an NFSv4 only environment or your servers export
NFSv4 mounts without using the global root you can set
MOUNT_NFS_DEFAULT_PROTOCOL=4 in the autofs configuration
(which is the default on install).

> 
> 2) Will there be a fixed package pushed as an errata?

You can see that by looking at the bug.

It's set to be an update for RHEL-6.4, other than that it's
not my call. Also note that if you have access to async updates
or you want a hotfix prior to RHEL-6.4 then the issue needs to
be logged via support and the fix requested. You can't do that
with issues logged directly in Bugzilla.

Ian

Comment 20 Ian Kent 2012-09-07 02:16:16 UTC

(In reply to comment #19)
> > 
> > 2) Will there be a fixed package pushed as an errata?
> 
> You can see that by looking at the bug.
> 
> It's set to be an update for RHEL-6.4, other than that it's
> not my call. Also note that if you have access to async updates

Well, it was set to be updated but the flags are cleared now
and that wasn't my doing. Bugzilla is misbehaving a lot lately!

Comment 21 Jonathan Underwood 2012-09-07 13:51:03 UTC

Created attachment 610722 [details]
/var/log/messages for autofs-5.0.5-54.bz834641.1.el6 and MOUNT_WAIT as default value

Comment 22 Jonathan Underwood 2012-09-07 13:54:53 UTC

The log file in Comment #21 is /var/log/messages after installing autofs-5.0.5-54.bz834641.1.el6, unsetting MOUNT_WAIT (so it takes its default), restarting autofs, and logging in as a user with an automounted home directory. As you can see, the mount fails. Setting MOUNT_WAIT=10 allows the home directory to mount successfully.

I should say, this test machine is running Scientific Linux 6.3, not RH.

Comment 23 Jonathan Underwood 2012-09-07 13:56:29 UTC

I should also add that I have MOUNT_NFS_DEFAULT_PROTOCOL=4 in all my testing.

Comment 24 Ian Kent 2012-09-07 14:36:23 UTC

Are you working in a TCP only NFS environment?

Comment 25 Jonathan Underwood 2012-09-07 14:46:29 UTC

(In reply to comment #24)
> Are you working in a TCP only NFS environment?

Yes, all servers are NFSv4 only.

Comment 26 Ian Kent 2012-09-07 15:15:39 UTC

(In reply to comment #25)
> (In reply to comment #24)
> > Are you working in a TCP only NFS environment?
> 
> Yes, all servers are NFSv4 only.

That's not what I asked.

Comment 27 Jonathan Underwood 2012-09-07 15:22:53 UTC

(In reply to comment #26)
> (In reply to comment #25)
> > (In reply to comment #24)
> > > Are you working in a TCP only NFS environment?
> > 
> > Yes, all servers are NFSv4 only.
> 
> That's not what I asked.

OK - I didn't until just now realize you could allow NFSv4 over UDP! None of the servers have -o udp or allow incoming/outgoing udp in their firewall configuration. So, yes, all TCP.

Comment 28 Ian Kent 2012-09-07 15:26:20 UTC

Could you please try the package at:
http://people.redhat.com/~ikent/autofs-5.0.5-55.el6

Comment 29 Jonathan Underwood 2012-09-07 15:32:14 UTC

(In reply to comment #28)
> Could you please try the package at:
> http://people.redhat.com/~ikent/autofs-5.0.5-55.el6

Same result I am afraid - mount fails unless I specify MOUNT_WAIT=10.

Comment 30 Ian Kent 2012-09-10 01:50:05 UTC

(In reply to comment #29)
> (In reply to comment #28)
> > Could you please try the package at:
> > http://people.redhat.com/~ikent/autofs-5.0.5-55.el6
> 
> Same result I am afraid - mount fails unless I specify MOUNT_WAIT=10.

I think you'll need to use "-fstype=nfs4".

It looks like either I start undoing what's been done to get
reasonable interactive response times following the recent
changes to mount.nfs or the fstype pseudo option will be
required. The change to mount.nfs essentially passes most
tasks for mounting to the kernel.

Comment 31 Ian Kent 2012-09-10 01:54:30 UTC

(In reply to comment #30)
> (In reply to comment #29)
> > (In reply to comment #28)
> > > Could you please try the package at:
> > > http://people.redhat.com/~ikent/autofs-5.0.5-55.el6
> > 
> > Same result I am afraid - mount fails unless I specify MOUNT_WAIT=10.
> 
> I think you'll need to use "-fstype=nfs4".
> 
> It looks like either I start undoing what's been done to get
> reasonable interactive response times following the recent
> changes to mount.nfs or the fstype pseudo option will be
> required. The change to mount.nfs essentially passes most
> tasks for mounting to the kernel.

That is, reasonable interactive response time for mount when
servers that are not responding are encountered.

Of course the MOUNT_WAIT option can be used to tell autofs to
limit mount wait time instead of probing availability before
mounting.

Comment 32 Jonathan Underwood 2012-09-10 12:47:33 UTC

Would a better design not be to introduce a flag which enables/disables the check that automount does to see if the server is up before handing off to mount? I realize that MOUNT_WAIT does this already, but it presently seems to serve two (somewhat orthogonal) purposes.

Comment 33 Jonathan Underwood 2012-09-10 15:57:21 UTC

For others that might be reading this bug, the following thread contains a lot of useful information about this situation (I wish I'd found this earlier):

http://www.spinics.net/lists/autofs/msg00132.html

Reading through that, it does seem to me that the following are worthwhile suggestions to consider implementing to make this situation easier to deal with:

1) As suggested by Michael Tokarev, try a TCP probe of port 2049 on the server before bothering to contact portmap - that way if nfs4 is available that will be used (unless another version has been explicitly specified)

2) As I suggested above, add an extra switch that disables the portmap probing, rather than tying it into the MOUNT_WAIT variable.
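
Suggestion 1 amounts to a plain TCP connect test. A minimal sketch, assuming that a successful connection to port 2049 is taken as "NFS is reachable"; the function name and the default timeout are hypothetical, and a successful connect does not by itself prove that NFSv4 is served or that any export is mountable.

```python
import socket


def tcp_port_open(host: str, port: int = 2049, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds in time."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name lookup failures.
        return False
```

A caller would try `tcp_port_open(server)` first and only fall back to a portmapper query when it returns False.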

Comment 34 Ian Kent 2012-09-19 03:02:28 UTC

(In reply to comment #33)
> For others that might be reading this bug, the following thread contains a
> lot of useful information about this situation (I wish I'd found this
> earlier):
> 
> http://www.spinics.net/lists/autofs/msg00132.html
> 
> Reading through that, it does seem to me that the following are worth while
> suggestions to consider implementing to make this situation easier to deal
> with:
> 
> 1) As suggested by Michael Tokarev, try a TCP probe of port 2049 on the
> server before bothering to contact portmap - that way if nfs4 is available
> that will be used (unless another version has been explicitly specified)

That's not sensible because, for the hosts map, autofs needs to
contact mountd.

> 
> 2) As I suggested above, add an extra switch that disables the portmap
> probing, rather than tying it into the MOUNT_WAIT variable.

But there's already such an option, "fstype=nfs4" is meant to be
used to ensure the portmapper is not contacted but it also says
that you're using nfsv4 only so there is no requirement to be able
to fall back to earlier nfs protocol versions. That last bit is
important.

There was a bug when specifying fstype which has been fixed now.

Ian

Comment 36 yanfu,wang 2013-01-16 03:15:38 UTC

I couldn't reproduce using comment #0, but I could reproduce using the related bug 846852 steps:
Reproduced on autofs-5.0.5-54.el6:
nfs server:
[root@hp-dl388g8-06 ~]# cat /etc/exports
/tmp *(rw)
[root@hp-dl388g8-06 ~]# service nfs restart
[root@hp-dl388g8-06 ~]# iptables -A INPUT -m state --state NEW -m udp -p udp --dport 111 -j DROP

client:
Guarantee hosts map enabled:
[root@ibm-x3550m3-05 ~]# service autofs restart
Stopping automount:                                        [  OK  ]
Starting automount:                                        [  OK  ]
[root@ibm-x3550m3-05 ~]# ls -l /net/hp-dl388g8-06.rhts.eng.nay.redhat.com
note: ls hung there and from /var/log/messages:
Jan 13 22:40:26 ibm-x3550m3-05 automount[6704]: lookup_read_master: lookup(nisplus): couldn't locate nis+ table auto.master
Jan 13 22:40:50 ibm-x3550m3-05 kernel: automount[6716]: segfault at 28 ip 00007faf780dd862 sp 00007faf7b92f960 error 4 in lookup_hosts.so[7faf780d4000+1c000]
Jan 13 22:40:50 ibm-x3550m3-05 abrtd: Directory 'ccpp-2013-01-13-22:40:50-6704' creation detected
Jan 13 22:40:50 ibm-x3550m3-05 abrt[6717]: Saved core dump of pid 6704 (/usr/sbin/automount) to /var/spool/abrt/ccpp-2013-01-13-22:40:50-6704 (34787328 bytes)
Jan 13 22:40:51 ibm-x3550m3-05 kernel: Bridge firewalling registered


Verified on autofs-5.0.5-72.el6:
[root@ibm-x3550m3-05 ~]# ls -l /net/hp-dl388g8-06.rhts.eng.nay.redhat.com
total 0
drwxr-xr-x. 2 root root 0 Jan 15 22:56 tmp

Comment 37 errata-xmlrpc 2013-02-21 10:53:18 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2013-0462.html