Bug 834641
Summary: | autofs requires portmapper on server for NFSv4 mounts | ||
---|---|---|---|
Product: | Red Hat Enterprise Linux 6 | Reporter: | bcodding |
Component: | autofs | Assignee: | Ian Kent <ikent> |
Status: | CLOSED ERRATA | QA Contact: | yanfu,wang <yanwang> |
Severity: | low | Docs Contact: | |
Priority: | unspecified | ||
Version: | 6.3 | CC: | david.halliwell, flakrat, igeorgex, ikent, jcpunk, jonathan.underwood, mishu, pasteur, Per.t.Sjoholm, rik.theys, rmainz, yanwang |
Target Milestone: | rc | ||
Target Release: | --- | ||
Hardware: | All | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | autofs-5.0.5-55.el6 | Doc Type: | Bug Fix |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2013-02-21 10:53:18 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | |||
Bug Depends On: | |||
Bug Blocks: | 846852 | ||
Attachments: |
Description
bcodding
2012-06-22 16:07:03 UTC
(In reply to comment #0)
> After upgrading from RHEL 6.2 to 6.3, when automounting nfs4 to a server not
> running portmapper, automount fails with
>
> Jun 22 12:01:03 gnu automount[21099]: mount_mount: mount(nfs): nfs
> options="sec=krb5,actimeo=5,timeo=60", nosymlink=1, ro=0
> Jun 22 12:01:03 gnu automount[21099]: get_nfs_info: called with host
> nfs.uvm.edu(10.214.10.214) proto tcp version 0x40
> Jun 22 12:01:03 gnu automount[21099]: get_nfs_info: nfs v4 rpc ping time:
> 0.000430
> Jun 22 12:01:03 gnu automount[21099]: get_nfs_info: host nfs.uvm.edu cost
> 429 weight 0
> Jun 22 12:01:03 gnu automount[21099]: mount(nfs): no hosts available

You will need to post the debug log from start until some time after the problem happens for it to be useful to me.

>
> With options:
> -fstype=nfs4,sec=krb5,actimeo=5,timeo=60

Are you sure these are the options?

>
> We found autofs failing after attempting to contact portmapper on the
> server. The problem is well discussed here:
> http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=675798

I only had a quick look at that bug since much of it was taken from comments I made about the issue on the autofs mailing list. Not everything that I said is reflected in the bug and some things appear to not be entirely accurately presented.

One thing that I did see is that the conclusion of the bug was that using the "-fstype=nfs4" option does in fact cause automount to bypass port lookup.

Although I didn't look closely, I don't remember seeing discussion about the hosts map, which is a special case. I did however see some talk about the MOUNT_WAIT configuration entry but didn't notice whether it mentioned that it can be used to restore the previous behaviour but with a timeout on waiting for mounts to complete.

The posters in the bug do not appear to appreciate the problem that the changes are meant to help with, and the need to offer a way to revert to the previous behaviour without introducing unacceptable wait times for mounts to servers that aren't responding.

The problem that led to this change is that there can be lengthy waits for mounts (2-3 minutes) due to changes to mount.nfs(8) and the kernel. I've been aware of the changed behaviour of the kernel for some time and managed to have the situation improved somewhat after reporting my difficulty. But not all the difficulties could be resolved.

Now that mount.nfs(8) passes the mount options to the kernel, and the kernel performs the RPC operations that mount did previously, delays on mounts to servers that aren't available can be significant.

I had to find a way to, at the very least, improve that.

Ian

Created attachment 594192 [details]
Longer debug log

> You will need to post the debug log from start until some time
> after the problem happens for it to be useful to me.

Ok. I hope 20 minutes is enough for you.

> > With options:
> > -fstype=nfs4,sec=krb5,actimeo=5,timeo=60
>
> Are you sure these are the options?

Yes. The log should also reassure you.

> > We found autofs failing after attempting to contact portmapper on the
> > server. The problem is well discussed here:
> > http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=675798
>
> I only had a quick look at that bug since much of it was taken
> from comments I made about the issue on the autofs mailing list.
> Not everything that I said is reflected in the bug and some
> things appear to not be entirely accurately presented.

Where can I find what you've said about this? I've posted the bug so other sysadmins can find the workaround after the upgrade breaks their systems, but now I am legitimately curious about the internals of the issue.

I understand that the kernel can cause automount to wait for some time in mount.nfs, and that wait is unacceptable. I can think of several ways to work around waiting for a process, but I don't want to assume too much about why exactly those options aren't considered; maybe you can tell us?

I think that for the nfsv4 case a portmapper contact would be unnecessary, can you tell me why that thinking is wrong?

> One thing that I did see is that the conclusion of the bug was
> that using the "-fstype=nfs4" option does in fact cause automount
> to bypass port lookup.

Not in our experience; only specifying the port causes the lookup to be bypassed.

> ...
> I had to find a way to, at the very least, improve that.

Ok, let's find a way to fix this now. I can test patches for you, if you'd like to save time not reproducing.

(In reply to comment #4)
> > You will need to post the debug log from start until some time
> > after the problem happens for it to be useful to me.
>
> Ok. I hope 20 minutes is enough for you.

That's great, the log starts with the startup of autofs so I know there is nothing that I might miss. Very good, thanks.

>
> > > With options:
> > > -fstype=nfs4,sec=krb5,actimeo=5,timeo=60
> >
> > Are you sure these are the options?
>
> Yes. The log should also reassure you.

Indeed that is so, good.

>
> > > We found autofs failing after attempting to contact portmapper on the
> > > server. The problem is well discussed here:
> > > http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=675798
> >
> > I only had a quick look at that bug since much of it was taken
> > from comments I made about the issue on the autofs mailing list.
> > Not everything that I said is reflected in the bug and some
> > things appear to not be entirely accurately presented.
>
> Where can I find what you've said about this? I've posted the bug so other
> sysadmins can find the workaround after the upgrade breaks their systems,
> but now I am legitimately curious about the internals of the issue.

They should be on mailing list mirrors around the place. If you really want to pursue it I can forward the mails, but somehow I suspect you won't need to pursue it once we sort things out working on this bug.

>
> I understand that the kernel can cause automount to wait for some time in
> mount.nfs, and that wait is unacceptable. I can think of several ways to
> work around waiting for a process, but I don't want to assume too much about
> why exactly those options aren't considered; maybe you can tell us?

Yes, it is unacceptable for an interactive application.

The actual problem is that the kernel must wait for RPCs to time out most of the time because of the "best effort" needed to avoid potential corruption when using NFS. The consequence of that for autofs is that it must now probe the server (including for simple mounts, which it didn't do previously) to see if it is available, which is a reasonably quick process, before handing off to mount(8).

>
> I think that for the nfsv4 case a portmapper contact would be unnecessary,
> can you tell me why that thinking is wrong?

Your thinking is not wrong, there is in fact a mistake in the code which I found fairly quickly thanks to the log you provided.

Unfortunately, sometimes it can be hard to communicate with people and real problems can't be properly identified. Then someone comes along and provides exactly what I ask for and the issue is then found. I guess I owe the mailing list person an apology, ;)

>
> > One thing that I did see is that the conclusion of the bug was
> > that using the "-fstype=nfs4" option does in fact cause automount
> > to bypass port lookup.
>
> Not in our experience; only specifying the port causes the lookup to be
> bypassed.

Again, that's right, I really wonder how that came about since I was wrong about it in the first place.

>
> > ...
> > I had to find a way to, at the very least, improve that.
>
> Ok, let's find a way to fix this now. I can test patches for you, if you'd
> like to save time not reproducing.

It's late here so I'll just post the patch and make a test package tomorrow, unless you "really" need a test package.

Ian

Just the patch would be perfect. I'll rebuild and test and send you my logs.

(In reply to comment #5)
>
> Your thinking is not wrong, there is in fact a mistake in
> the code which I found fairly quickly thanks to the log you
> provided.

It is not a good excuse but I would also like to add that the mistake has been present in the code for quite a while and wasn't actually introduced by the recent change.

Created attachment 594218 [details]
Patch - fix nfs4 contacts portmap

It's also worth remembering that if the MOUNT_WAIT configuration option is given a value other than -1 (default, wait until mount returns) then autofs won't perform the probe at all for simple mounts.

It isn't recommended if you have servers frequently not contactable but, set to a sensible value, it will cause autofs to behave the way it did before the change but also will not wait for a blocked mount to complete.

That patch has fixed it; fixed log attached. Thanks.

Created attachment 594251 [details]
Debug log -- after patch - fixed.
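
For readers who want to try the MOUNT_WAIT behaviour described above, a minimal sketch of the relevant configuration fragment follows. It assumes the autofs configuration lives in /etc/sysconfig/autofs (the usual location on RHEL 6, but not stated in this bug), and the value 10 is only an example of a "sensible value", not a recommendation.

```shell
# Sketch of an /etc/sysconfig/autofs fragment (path is an assumption;
# check where your distribution keeps the autofs configuration).

# Default: -1 means wait until mount(8) returns. With this setting autofs
# probes the server for availability before handing off to mount.
# MOUNT_WAIT=-1

# Any other value is a timeout (in seconds) on waiting for mount to finish.
# Per the comment above, autofs then skips the probe for simple mounts.
# The value 10 is illustrative only.
MOUNT_WAIT=10
```

After changing the file, autofs needs to be restarted (for example with `service autofs restart`) for the setting to take effect.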

A test package with the patch posted in this bug has been built. It is available at:
http://people.redhat.com/~ikent/autofs-5.0.5-54.bz834641.1.el6

Please test and report your results.

Ian, your test package also fixes the problem. Same results as Comment 11.

I encountered the same issue with our autofs mounted NFSv4 home directories. Installing the test package on the clients resolved the problem (testing the automount map with -port=2049 also successfully mounted it).

*** Bug 846320 has been marked as a duplicate of this bug. ***

I just hit this problem too with autofs-5.0.5-54.el6.x86_64.

I tried Ian's test package (autofs-5.0.5-54.bz834641.1.el6) and that didn't change matters for me. In both cases I had MOUNT_WAIT=-1.

However, in both cases, if I change to MOUNT_WAIT=10, directories are automounted successfully.

I am curious as to:

1) What is actually the correct fix - adjust MOUNT_WAIT, or specify the port, and/or specify -fstype=nfs4 ?

2) Will there be a fixed package pushed as an errata?

@bcodding: are you sure that Ian's patch/updated package fixes the problem for you *without* any of the other workarounds?

(In reply to comment #17)
> @bcodding: are you sure that Ian's patch/updated package fixes the problem
> for you *without* any of the other workarounds?

Yes, I am sure.

(In reply to comment #16)
> I just hit this problem too with autofs-5.0.5-54.el6.x86_64.
>
> I tried Ian's test package (autofs-5.0.5-54.bz834641.1.el6) and that didn't
> change matters for me. In both cases I had MOUNT_WAIT=-1.

It should have, we'll need to work out why that is the case. How about posting a debug log?

>
> However, in both cases, if I change to MOUNT_WAIT=10, directories are
> automounted successfully.

Setting MOUNT_WAIT is meant to restore the previous behaviour without also exposing you to possible long mount timeouts. So you can consider it a workaround but it is no accident it works that way.

>
> I am curious as to:
>
> 1) What is actually the correct fix - adjust MOUNT_WAIT, or specify the
> port, and/or specify -fstype=nfs4 ?

For this specific issue the patch here should be sufficient. There are some other issues but they are not specific to contacting the port mapper.

I've already talked about MOUNT_WAIT.

Specifying "-fstype=nfs4" should also have the desired effect because it says this is an NFSv4 only mount. The mount option "-t nfs4" will be added to the mount command and fallback to NFSv3 won't be attempted by mount.nfs.

If you have an NFSv4 only environment or your servers export NFSv4 mounts without using the global root you can set MOUNT_NFS_DEFAULT_PROTOCOL=4 in the autofs configuration (which is the default on install).

>
> 2) Will there be a fixed package pushed as an errata?

You can see that by looking at the bug.

It's set to be an update for RHEL-6.4, other than that it's not my call. Also note that if you have access to async updates or you want a hotfix prior to RHEL-6.4 then the issue needs to be logged via support and the fix requested. You can't do that with issues logged directly in Bugzilla.

Ian

(In reply to comment #19)
> >
> > 2) Will there be a fixed package pushed as an errata?
>
> You can see that by looking at the bug.
>
> It's set to be an update for RHEL-6.4, other than that it's
> not my call. Also note that if you have access to async updates

Well, it was set to be updated but the flags are cleared now and that wasn't my doing. Bugzilla is misbehaving a lot lately!

Created attachment 610722 [details]
/var/log/messages for autofs-5.0.5-54.bz834641.1.el6 and MOUNT_WAIT as default value

The log file in Comment #21 is /var/log/messages after installing autofs-5.0.5-54.bz834641.1.el6, unsetting MOUNT_WAIT (so it takes its default), restarting autofs, and logging in as a user with an automounted home directory. As you can see, the mount fails. Setting MOUNT_WAIT=10 allows the home directory to mount successfully.

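
As an illustration of the "-fstype=nfs4" workaround discussed above, the sketch below prints an autofs map entry in the usual `key -options location` form. The helper name `nfs4_map_entry`, the key `alice`, the server `nfs.example.com` and the export path are all hypothetical; only the option names come from this bug. It shows the shape of a map line, not what any particular site should use.

```shell
# Hypothetical helper: print one autofs map entry that forces NFSv4.
#   $1 = map key, $2 = server, $3 = exported path,
#   $4 = extra mount options, comma separated (optional)
nfs4_map_entry() {
    # ${4:+,$4} appends ",<options>" only when extra options were given
    printf '%s\t-fstype=nfs4%s\t%s:%s\n' "$1" "${4:+,$4}" "$2" "$3"
}

# Example using the options quoted in the original report
# (key, server name and path are made up for illustration):
nfs4_map_entry alice nfs.example.com /export/home/alice sec=krb5,actimeo=5,timeo=60
```

The printed line could be placed in an indirect map such as a home directory map. The `-port=2049` option that one commenter tested would be passed the same way, as an extra option in the fourth argument.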
I should say, this test machine is running Scientific Linux 6.3, not RH.

I should also add that I have MOUNT_NFS_DEFAULT_PROTOCOL=4 in all my testing.

Are you working in a TCP only NFS environment?

(In reply to comment #24)
> Are you working in a TCP only NFS environment?

Yes, all servers are NFSv4 only.

(In reply to comment #25)
> (In reply to comment #24)
> > Are you working in a TCP only NFS environment?
>
> Yes, all servers are NFSv4 only.

That's not what I asked.

(In reply to comment #26)
> (In reply to comment #25)
> > (In reply to comment #24)
> > > Are you working in a TCP only NFS environment?
> >
> > Yes, all servers are NFSv4 only.
>
> That's not what I asked.

OK - I didn't realize until just now that you could allow NFSv4 over UDP! None of the servers have -o udp or allow incoming/outgoing udp in their firewall configuration. So, yes, all TCP.

Could you please try the package at:
http://people.redhat.com/~ikent/autofs-5.0.5-55.el6

(In reply to comment #28)
> Could you please try the package at:
> http://people.redhat.com/~ikent/autofs-5.0.5-55.el6

Same result I am afraid - mount fails unless I specify MOUNT_WAIT=10.

(In reply to comment #29)
> (In reply to comment #28)
> > Could you please try the package at:
> > http://people.redhat.com/~ikent/autofs-5.0.5-55.el6
>
> Same result I am afraid - mount fails unless I specify MOUNT_WAIT=10.

I think you'll need to use "-fstype=nfs4".

It looks like either I start undoing what's been done to get reasonable interactive response times following the recent changes to mount.nfs or the fstype pseudo option will be required. The change to mount.nfs essentially passes most tasks for mounting to the kernel.

(In reply to comment #30)
> (In reply to comment #29)
> > (In reply to comment #28)
> > > Could you please try the package at:
> > > http://people.redhat.com/~ikent/autofs-5.0.5-55.el6
> >
> > Same result I am afraid - mount fails unless I specify MOUNT_WAIT=10.
>
> I think you'll need to use "-fstype=nfs4".
>
> It looks like either I start undoing what's been done to get
> reasonable interactive response times following the recent
> changes to mount.nfs or the fstype pseudo option will be
> required. The change to mount.nfs essentially passes most
> tasks for mounting to the kernel.

That is, reasonable interactive response time for mount when servers that are not responding are encountered.

Of course the MOUNT_WAIT option can be used to tell autofs to limit mount wait time instead of probing availability before mounting.

Would a better design not be to introduce a flag which enables/disables the check that automount does to see if the server is up before handing off to mount? I realize that MOUNT_WAIT does this already, but it presently seems to serve two (somewhat orthogonal) purposes.

For others that might be reading this bug, the following thread contains a lot of useful information about this situation (I wish I'd found this earlier):

http://www.spinics.net/lists/autofs/msg00132.html

Reading through that, it does seem to me that the following are worth while suggestions to consider implementing to make this situation easier to deal with:

1) As suggested by Michael Tokarev, try a TCP probe of port 2049 on the server before bothering to contact portmap - that way if nfs4 is available that will be used (unless another version has been explicitly specified)

2) As I suggested above, add an extra switch that disables the portmap probing, rather than tying it into the MOUNT_WAIT variable.

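
To make suggestion 1) concrete, here is a rough sketch of what a TCP probe of port 2049 could look like from a shell. This is not how autofs works and not the patch in this bug; the function name `nfs4_port_open` and its defaults are hypothetical. It relies on bash's `/dev/tcp` redirection and on `timeout` from coreutils. A successful connection only shows that something is listening on the port, not that NFSv4 is actually served there.

```shell
# Hypothetical helper: succeed (exit 0) if a TCP connection to HOST:PORT
# can be opened within WAIT seconds, fail otherwise.
#   $1 = host, $2 = port (default 2049), $3 = wait in seconds (default 3)
nfs4_port_open() {
    host=$1
    port=${2:-2049}
    wait_secs=${3:-3}
    # bash opens a TCP socket when redirecting to /dev/tcp/<host>/<port>;
    # timeout(1) bounds the attempt so an unresponsive server cannot hang us.
    # Inside the -c script, $0 is the host and $1 is the port.
    timeout "$wait_secs" bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$host" "$port" 2>/dev/null
}

# Usage (server name is made up):
#   if nfs4_port_open nfs.example.com; then echo "port 2049 reachable"; fi
```

Bounding the attempt with a timeout matters here because the whole motivation in this thread is avoiding long waits on servers that are not responding.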
(In reply to comment #33)
> For others that might be reading this bug, the following thread contains a
> lot of useful information about this situation (I wish I'd found this
> earlier):
>
> http://www.spinics.net/lists/autofs/msg00132.html
>
> Reading through that, it does seem to me that the following are worth while
> suggestions to consider implementing to make this situation easier to deal
> with:
>
> 1) As suggested by Michael Tokarev, try a TCP probe of port 2049 on the
> server before bothering to contact portmap - that way if nfs4 is available
> that will be used (unless another version has been explicitly specified)

That's not sensible because, for the hosts map, autofs needs to contact mountd.

>
> 2) As I suggested above, add an extra switch that disables the portmap
> probing, rather than tying it into the MOUNT_WAIT variable.

But there's already such an option, "fstype=nfs4" is meant to be used to ensure the portmapper is not contacted but it also says that you're using nfsv4 only so there is no requirement to be able to fall back to earlier nfs protocol versions. That last bit is important.

There was a bug when specifying fstype which has been fixed now.

Ian

I couldn't reproduce using comment #0, but I could reproduce using the related bug 846852 steps:

Reproduced on autofs-5.0.5-54.el6:

nfs server:
[root@hp-dl388g8-06 ~]# cat /etc/exports
/tmp *(rw)
[root@hp-dl388g8-06 ~]# service nfs restart
[root@hp-dl388g8-06 ~]# iptables -A INPUT -m state --state NEW -m udp -p udp --dport 111 -j DROP

client:
Guarantee hosts map enabled:
[root@ibm-x3550m3-05 ~]# service autofs restart
Stopping automount: [ OK ]
Starting automount: [ OK ]
[root@ibm-x3550m3-05 ~]# ls -l /net/hp-dl388g8-06.rhts.eng.nay.redhat.com

note: ls hung there and from /var/log/messages:
Jan 13 22:40:26 ibm-x3550m3-05 automount[6704]: lookup_read_master: lookup(nisplus): couldn't locate nis+ table auto.master
Jan 13 22:40:50 ibm-x3550m3-05 kernel: automount[6716]: segfault at 28 ip 00007faf780dd862 sp 00007faf7b92f960 error 4 in lookup_hosts.so[7faf780d4000+1c000]
Jan 13 22:40:50 ibm-x3550m3-05 abrtd: Directory 'ccpp-2013-01-13-22:40:50-6704' creation detected
Jan 13 22:40:50 ibm-x3550m3-05 abrt[6717]: Saved core dump of pid 6704 (/usr/sbin/automount) to /var/spool/abrt/ccpp-2013-01-13-22:40:50-6704 (34787328 bytes)
Jan 13 22:40:51 ibm-x3550m3-05 kernel: Bridge firewalling registered

Verified on autofs-5.0.5-72.el6:
[root@ibm-x3550m3-05 ~]# ls -l /net/hp-dl388g8-06.rhts.eng.nay.redhat.com
total 0
drwxr-xr-x. 2 root root 0 Jan 15 22:56 tmp

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2013-0462.html