Bug 1287894
| Summary: | Trying to automount a non-existent directory using localhost results in a ~4 minute hang | ||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Product: | Red Hat Enterprise Linux 6 | Reporter: | Heather Barcomb <hbarcomb> | ||||||||||
| Component: | nfs-utils | Assignee: | Steve Dickson <steved> | ||||||||||
| Status: | CLOSED ERRATA | QA Contact: | Yongcheng Yang <yoyang> | ||||||||||
| Severity: | high | Docs Contact: | |||||||||||
| Priority: | high | ||||||||||||
| Version: | 6.7 | CC: | eguan, fs-qe, hbarcomb, ikent, jiyin, sai.sakhamuri, steved | ||||||||||
| Target Milestone: | rc | ||||||||||||
| Target Release: | --- | ||||||||||||
| Hardware: | All | ||||||||||||
| OS: | Linux | ||||||||||||
| Whiteboard: | |||||||||||||
| Fixed In Version: | nfs-utils-1.2.3-67.el6 | Doc Type: | Bug Fix | ||||||||||
| Doc Text: | Story Points: | --- | |||||||||||
| Clone Of: | Environment: | ||||||||||||
| Last Closed: | 2016-05-11 00:04:43 UTC | Type: | Bug | ||||||||||
| Regression: | --- | Mount Type: | --- | ||||||||||
| Documentation: | --- | CRM: | |||||||||||
| Verified Versions: | Category: | --- | |||||||||||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||||||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||||||||
| Embargoed: | |||||||||||||
| Attachments: |
|
||||||||||||
Those debug traces look like automount is falling back to NFS for failed bind mounts. My first thought about it is the difference between rhel-5 and rhel-6 is due to the kernel changes that take much of the option parsing and mounting into the kernel in rhel-6. The kernel RPC is much more strict and will wait for timeouts for more cases than the glibc RPC implementation. I pointed this out around the time this was done and some changes were made to improve it but it wasn't possible to make it behave the same as the glibc RPC. Although I haven't actually looked yet, I suspect that means I'll need to re-think the wisdom of falling back to NFS when bind mounts fail. Ian Right, I do fallback to trying an NFS mount if the bind mount fails. Another aspect of this is that if a hostname is given in the map entry and it is determined to belong to a local interface (including localhost) it might be that users expect it to fail over and try a local NFS mount. We just don't know. That means if we don't fail over to trying a local NFS mount users would need to specify the "nobind" option explicitly to make that happen. OTOH the interactive impact of this wait is significant so I'm inclined to just not fall back to NFS as if a bind mount fails the likelihood an NFS mount will fail too is quite high. ..... (In reply to Heather Barcomb from comment #0) I find that the manual mount will hang in the same way as autofs when autofs isn't running. > > This hang also occurs when trying to manually mount the non-existent > directory if autofs is running, but it fails immediately if autofs is not > running: The test below doesn't quite reflect what autofs does. We expect the manual mount to hang if autofs is running because autofs will interfere with the mount point path so lets not persue that for now. > > > [root@rhel64 ~]# service autofs status > automount is stopped > > [root@rhel64 ~]# time mount localhost:/export/testhome/sub6 /testhome/sub6 > mount.nfs: mount point /testhome/sub6 does not exist > > real 0m0.005s <======== > user 0m0.002s > sys 0m0.001s The difference between this and what autofs does is autofs will create the mount point /testhome/sub6 and then call mount(8). But since there isn't any NFS service running the manual mount should fail immediately bit it doesn't, it hangs. On Fedora 22 it does fail immediately. Steve, Is this something we want to investigate in mount.nfs(8) and fix if it isn't working the way we expect it to work? (In reply to Ian Kent from comment #4) > Steve, > > Is this something we want to investigate in mount.nfs(8) and > fix if it isn't working the way we expect it to work? It looks like RHEL6 needs this commit commit 65ac59cdef494f4a439feace258e35f226ae210b Author: NeilBrown <neilb> Date: Tue Mar 11 11:11:10 2014 -0400 [PATCH - v3] mount.nfs: Fix fallback from tcp to udp Created attachment 1103368 [details]
The backported patch
(In reply to Heather Barcomb from comment #0) > Description of problem: > > * When trying to automount a non-existent directory using localhost results > in a ~4 minute hang > * The hang can be shorted by setting a low MOUNT_WAIT value in > etc/sysconfig/autofs > * This same setup in RHEL5 fails immediately Is the NFS server running on localhost when this is seen? If the NFS server is running I can't reproduce this behaviour. Ian Created attachment 1112011 [details]
logs when reproduced with RHEL-6.4
Created attachment 1112012 [details]
logs when verified with RHEL-6.4 nfs-utils-1.2.3-67.el6
Created attachment 1112013 [details]
logs when test with RHEL-6.7 nfs-utils-1.2.3-67.el6
(In reply to Yongcheng Yang from comment #22) > Created attachment 1112013 [details] > logs when test with RHEL-6.7 nfs-utils-1.2.3-67.el6 Are you sure this was done with nfs-utils revision 67. Certainly the patch of this bug in included in that revision. And mount.nfs is being called by autofs but it did not return an error as it should. So I don't think this is an autofs problem. I was able to verify the behaviour of mount.nfs on rhel-6.7 manually without difficulty. Ian Move to VERIFIED as Comment 37 Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHBA-2016-0868.html |
Description of problem: * When trying to automount a non-existent directory using localhost results in a ~4 minute hang * The hang can be shorted by setting a low MOUNT_WAIT value in etc/sysconfig/autofs * This same setup in RHEL5 fails immediately Version-Release number of selected component (if applicable): * kernel-2.6.32-358.el6.x86_64 * autofs-5.0.5-73.el6.x86_64 * nfs-utils-1.2.3-64.el6.x86_64 How reproducible: * consistently Steps to Reproduce: 1. Configure autofs to mount using localhost and try to automount a non-existant director (see below) Actual results: * The command used to automount (cd, ls, etc) will hang for a few minutes Expected results: * An immediate "No such file or directory" error Additional info: ===================== My test environments: ===================== [root@rhel59 ~]# uname -r 2.6.18-348.el5 [root@rhel64 ~]# uname -r 2.6.32-358.el6.x86_64 both machines only have /export/testhome/sub1, sub2, and sub3 (so anything sub4 or higher is a non-existent directory) ===================== RHEL5: [root@rhel59 ~]# grep -v ^# /etc/auto.master /misc /etc/auto.misc /net -hosts /testhome /etc/auto.testhome [root@rhel59 ~]# cat /etc/auto.testhome * localhost:/export/testhome/& [root@rhel59 ~]# ls /export/testhome sub1 sub2 sub3 [root@rhel59 ~]# date;ls -l /testhome/sub1;date <== valid subdirectory Wed Dec 2 14:27:33 EST 2015 total 0 Wed Dec 2 14:27:33 EST 2015 [root@rhel59 ~]# date;ls -l /testhome/sub5;date <== non-valid subdirectory Wed Dec 2 14:27:37 EST 2015 ls: /testhome/sub5: No such file or directory Wed Dec 2 14:27:37 EST 2015 <==== fails within 1 second ========================== RHEL6: [root@rhel64 ~]# grep -v ^# /etc/auto.master /misc /etc/auto.misc /net -hosts /testhome /etc/auto.testhome [root@rhel64 ~]# cat /etc/auto.testhome * localhost:/export/testhome/& [root@rhel64 ~]# ls /export/testhome sub1 sub2 sub3 [root@rhel64 ~]# date;ls -l /testhome/sub1;date <== valid subdirectory Wed Dec 2 14:27:32 EST 2015 total 0 Wed Dec 2 14:27:32 EST 2015 [root@rhel64 ~]# date;ls -l /testhome/sub5;date <== non-valid subdirectory Wed Dec 2 14:27:36 EST 2015 ls: cannot access /testhome/sub5: No such file or directory Wed Dec 2 14:31:46 EST 2015 <==== takes 4:10 to fail ============================== running automount in debug/foreground mode shows: ============================== RHEL 5: handle_packet: type = 3 handle_packet_missing_indirect: token 16, name sub8, request pid 4317 attempting to mount entry /testhome/sub8 lookup_mount: lookup(file): looking up sub8 lookup_mount: lookup(file): sub8 -> localhost:/export/testhome/& parse_mount: parse(sun): expanded entry: localhost:/export/testhome/sub8 parse_mount: parse(sun): gathered options: parse_mount: parse(sun): dequote("localhost:/export/testhome/sub8") -> localhost:/export/testhome/sub8 parse_mount: parse(sun): core of entry: options=, loc=localhost:/export/testhome/sub8 sun_mount: parse(sun): mounting root /testhome, mountpoint sub8, what localhost:/export/testhome/sub8, fstype nfs, options (null) mount_mount: mount(nfs): root=/testhome name=sub8 what=localhost:/export/testhome/sub8, fstype=nfs, options=(null) mount_mount: mount(nfs): calling mkdir_path /testhome/sub8 mount_mount: mount(nfs): sub8 is local, attempt bind mount mount_mount: mount(bind): calling mkdir_path /testhome/sub8 mount_mount: mount(bind): calling mount --bind -s -o defaults /export/testhome/sub8 /testhome/sub8 >> mount: special device /export/testhome/sub8 does not exist mount_mount: mount(nfs): calling mount -t nfs localhost:/export/testhome/sub8 /testhome/sub8 >> mount: mount to NFS server 'localhost' failed: RPC Error: Program not registered. mount(nfs): nfs: mount failure localhost:/export/testhome/sub8 on /testhome/sub8 dev_ioctl_send_fail: token = 16 handle_packet: type = 3 handle_packet_missing_indirect: token 17, name sub8, request pid 4317 dev_ioctl_send_fail: token = 17 handle_packet: type = 3 handle_packet_missing_indirect: token 18, name sub8, request pid 4317 dev_ioctl_send_fail: token = 18 failed to mount /testhome/sub8 ======================== RHEL 6: handle_packet: type = 3 handle_packet_missing_indirect: token 19, name sub6, request pid 3409 attempting to mount entry /testhome/sub6 lookup_mount: lookup(file): looking up sub6 lookup_mount: lookup(file): sub6 -> localhost:/export/testhome/& parse_mount: parse(sun): expanded entry: localhost:/export/testhome/sub6 parse_mount: parse(sun): gathered options: parse_mount: parse(sun): dequote("localhost:/export/testhome/sub6") -> localhost:/export/testhome/sub6 parse_mount: parse(sun): core of entry: options=, loc=localhost:/export/testhome/sub6 sun_mount: parse(sun): mounting root /testhome, mountpoint sub6, what localhost:/export/testhome/sub6, fstype nfs, options (null) mount_mount: mount(nfs): root=/testhome name=sub6 what=localhost:/export/testhome/sub6, fstype=nfs, options=(null) mount_mount: mount(nfs): calling mkdir_path /testhome/sub6 mount_mount: mount(nfs): sub6 is local, attempt bind mount mount_mount: mount(bind): calling mkdir_path /testhome/sub6 mount_mount: mount(bind): calling mount --bind -s -o defaults /export/testhome/sub6 /testhome/sub6 >> mount: special device /export/testhome/sub6 does not exist mount_mount: mount(nfs): calling mount -t nfs localhost:/export/testhome/sub6 /testhome/sub6 ... >> mount.nfs: Connection timed out mount_mount: mount(nfs): sub6 is local, attempt bind mount mount_mount: mount(bind): calling mkdir_path /testhome/sub6 mount_mount: mount(bind): calling mount --bind -s -o defaults /export/testhome/sub6 /testhome/sub6 >> mount: special device /export/testhome/sub6 does not exist mount_mount: mount(nfs): calling mount -t nfs localhost:/export/testhome/sub6 /testhome/sub6 ... >> mount.nfs: Connection timed out mount(nfs): nfs: mount failure localhost:/export/testhome/sub6 on /testhome/sub6 dev_ioctl_send_fail: token = 19 failed to mount /testhome/sub6 ======================== This hang also occurs when trying to manually mount the non-existent directory if autofs is running, but it fails immediately if autofs is not running: [root@rhel64 ~]# service autofs status automount is stopped [root@rhel64 ~]# time mount localhost:/export/testhome/sub6 /testhome/sub6 mount.nfs: mount point /testhome/sub6 does not exist real 0m0.005s <======== user 0m0.002s sys 0m0.001s [root@rhel64 ~]# service autofs start Starting automount: [ OK ] [root@rhel64 ~]# time mount localhost:/export/testhome/sub6 /testhome/sub6 mount.nfs: mount point /testhome/sub6 does not exist real 4m10.036s <======== user 0m0.000s sys 0m0.001s