Bug 1287894 - Trying to automount a non-existent directory using localhost results in a ~4 minute hang
Trying to automount a non-existent directory using localhost results in a ~4 ...
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: nfs-utils (Show other bugs)
6.7
All Linux
high Severity high
: rc
: ---
Assigned To: Steve Dickson
Yongcheng Yang
:
Depends On:
Blocks:
  Show dependency treegraph
 
Reported: 2015-12-02 18:18 EST by Heather Barcomb
Modified: 2016-11-18 04:20 EST (History)
7 users (show)

See Also:
Fixed In Version: nfs-utils-1.2.3-67.el6
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2016-05-10 20:04:43 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments (Terms of Use)
The backported patch (8.26 KB, text/plain)
2015-12-07 15:28 EST, Steve Dickson
no flags Details
logs when reproduced with RHEL-6.4 (13.87 KB, text/plain)
2016-01-05 23:29 EST, Yongcheng Yang
no flags Details
logs when verified with RHEL-6.4 nfs-utils-1.2.3-67.el6 (17.82 KB, text/plain)
2016-01-05 23:31 EST, Yongcheng Yang
no flags Details
logs when test with RHEL-6.7 nfs-utils-1.2.3-67.el6 (19.09 KB, text/plain)
2016-01-05 23:33 EST, Yongcheng Yang
no flags Details


External Trackers
Tracker ID Priority Status Summary Last Updated
Red Hat Knowledge Base (Solution) 2074883 None None None Never

  None (edit)
Description Heather Barcomb 2015-12-02 18:18:06 EST
Description of problem:
 
 * When trying to automount a non-existent directory using localhost results in a ~4 minute hang
 * The hang can be shorted by setting a low MOUNT_WAIT value in etc/sysconfig/autofs 
 * This same setup in RHEL5 fails immediately 

Version-Release number of selected component (if applicable):

 * kernel-2.6.32-358.el6.x86_64
 * autofs-5.0.5-73.el6.x86_64
 * nfs-utils-1.2.3-64.el6.x86_64

How reproducible:

 * consistently 

Steps to Reproduce:
1. Configure autofs to mount using localhost and try to automount a non-existant director (see below)

Actual results:

 * The command used to automount (cd, ls, etc) will hang for a few minutes

Expected results:

 * An immediate "No such file or directory" error

Additional info:

=====================
My test environments:
=====================
[root@rhel59 ~]# uname -r
2.6.18-348.el5

[root@rhel64 ~]# uname -r
2.6.32-358.el6.x86_64

both machines only have /export/testhome/sub1, sub2, and sub3 (so anything sub4 or higher is a non-existent directory)

=====================
RHEL5:


[root@rhel59 ~]# grep -v ^# /etc/auto.master
/misc	/etc/auto.misc
/net	-hosts
/testhome	/etc/auto.testhome

[root@rhel59 ~]# cat /etc/auto.testhome
* localhost:/export/testhome/&

[root@rhel59 ~]# ls /export/testhome
sub1  sub2  sub3

[root@rhel59 ~]# date;ls -l /testhome/sub1;date  <== valid subdirectory
Wed Dec  2 14:27:33 EST 2015
total 0
Wed Dec  2 14:27:33 EST 2015

[root@rhel59 ~]# date;ls -l /testhome/sub5;date  <== non-valid subdirectory
Wed Dec  2 14:27:37 EST 2015
ls: /testhome/sub5: No such file or directory       
Wed Dec  2 14:27:37 EST 2015                      <==== fails within 1 second


==========================
RHEL6: 


[root@rhel64 ~]# grep -v ^# /etc/auto.master
/misc	/etc/auto.misc
/net	-hosts
/testhome	/etc/auto.testhome

[root@rhel64 ~]# cat /etc/auto.testhome
* localhost:/export/testhome/&

[root@rhel64 ~]# ls /export/testhome
sub1  sub2  sub3

[root@rhel64 ~]# date;ls -l /testhome/sub1;date  <== valid subdirectory
Wed Dec  2 14:27:32 EST 2015
total 0
Wed Dec  2 14:27:32 EST 2015

[root@rhel64 ~]# date;ls -l /testhome/sub5;date  <== non-valid subdirectory
Wed Dec  2 14:27:36 EST 2015
ls: cannot access /testhome/sub5: No such file or directory
Wed Dec  2 14:31:46 EST 2015                       <==== takes 4:10 to fail


==============================
running automount in debug/foreground mode shows:
==============================
RHEL 5:


handle_packet: type = 3
handle_packet_missing_indirect: token 16, name sub8, request pid 4317
attempting to mount entry /testhome/sub8
lookup_mount: lookup(file): looking up sub8
lookup_mount: lookup(file): sub8 -> localhost:/export/testhome/&
parse_mount: parse(sun): expanded entry: localhost:/export/testhome/sub8
parse_mount: parse(sun): gathered options: 
parse_mount: parse(sun): dequote("localhost:/export/testhome/sub8") -> localhost:/export/testhome/sub8
parse_mount: parse(sun): core of entry: options=, loc=localhost:/export/testhome/sub8
sun_mount: parse(sun): mounting root /testhome, mountpoint sub8, what localhost:/export/testhome/sub8, fstype nfs, options (null)
mount_mount: mount(nfs): root=/testhome name=sub8 what=localhost:/export/testhome/sub8, fstype=nfs, options=(null)
mount_mount: mount(nfs): calling mkdir_path /testhome/sub8
mount_mount: mount(nfs): sub8 is local, attempt bind mount
mount_mount: mount(bind): calling mkdir_path /testhome/sub8
mount_mount: mount(bind): calling mount --bind -s  -o defaults /export/testhome/sub8 /testhome/sub8
>> mount: special device /export/testhome/sub8 does not exist
mount_mount: mount(nfs): calling mount -t nfs localhost:/export/testhome/sub8 /testhome/sub8
>> mount: mount to NFS server 'localhost' failed: RPC Error: Program not registered.
mount(nfs): nfs: mount failure localhost:/export/testhome/sub8 on /testhome/sub8
dev_ioctl_send_fail: token = 16
handle_packet: type = 3
handle_packet_missing_indirect: token 17, name sub8, request pid 4317
dev_ioctl_send_fail: token = 17
handle_packet: type = 3
handle_packet_missing_indirect: token 18, name sub8, request pid 4317
dev_ioctl_send_fail: token = 18
failed to mount /testhome/sub8

========================
RHEL 6:


handle_packet: type = 3
handle_packet_missing_indirect: token 19, name sub6, request pid 3409
attempting to mount entry /testhome/sub6
lookup_mount: lookup(file): looking up sub6
lookup_mount: lookup(file): sub6 -> localhost:/export/testhome/&
parse_mount: parse(sun): expanded entry: localhost:/export/testhome/sub6
parse_mount: parse(sun): gathered options: 
parse_mount: parse(sun): dequote("localhost:/export/testhome/sub6") -> localhost:/export/testhome/sub6
parse_mount: parse(sun): core of entry: options=, loc=localhost:/export/testhome/sub6
sun_mount: parse(sun): mounting root /testhome, mountpoint sub6, what localhost:/export/testhome/sub6, fstype nfs, options (null)
mount_mount: mount(nfs): root=/testhome name=sub6 what=localhost:/export/testhome/sub6, fstype=nfs, options=(null)
mount_mount: mount(nfs): calling mkdir_path /testhome/sub6
mount_mount: mount(nfs): sub6 is local, attempt bind mount
mount_mount: mount(bind): calling mkdir_path /testhome/sub6
mount_mount: mount(bind): calling mount --bind -s  -o defaults /export/testhome/sub6 /testhome/sub6
>> mount: special device /export/testhome/sub6 does not exist
mount_mount: mount(nfs): calling mount -t nfs localhost:/export/testhome/sub6 /testhome/sub6
...
>> mount.nfs: Connection timed out
mount_mount: mount(nfs): sub6 is local, attempt bind mount
mount_mount: mount(bind): calling mkdir_path /testhome/sub6
mount_mount: mount(bind): calling mount --bind -s  -o defaults /export/testhome/sub6 /testhome/sub6
>> mount: special device /export/testhome/sub6 does not exist
mount_mount: mount(nfs): calling mount -t nfs localhost:/export/testhome/sub6 /testhome/sub6
...
>> mount.nfs: Connection timed out
mount(nfs): nfs: mount failure localhost:/export/testhome/sub6 on /testhome/sub6
dev_ioctl_send_fail: token = 19
failed to mount /testhome/sub6

========================

This hang also occurs when trying to manually mount the non-existent directory if autofs is running, but it fails immediately if autofs is not running:


 [root@rhel64 ~]# service autofs status
 automount is stopped

 [root@rhel64 ~]# time mount localhost:/export/testhome/sub6 /testhome/sub6
 mount.nfs: mount point /testhome/sub6 does not exist

 real	0m0.005s  <========
 user	0m0.002s
 sys	0m0.001s

 [root@rhel64 ~]# service autofs start
 Starting automount:                                        [  OK  ]

 [root@rhel64 ~]# time mount localhost:/export/testhome/sub6 /testhome/sub6
 mount.nfs: mount point /testhome/sub6 does not exist

 real	4m10.036s  <========
 user	0m0.000s
 sys	0m0.001s
Comment 1 Ian Kent 2015-12-02 19:24:45 EST
Those debug traces look like automount is falling back to NFS
for failed bind mounts.

My first thought about it is the difference between rhel-5
and rhel-6 is due to the kernel changes that take much of
the option parsing and mounting into the kernel in rhel-6.

The kernel RPC is much more strict and will wait for timeouts
for more cases than the glibc RPC implementation.

I pointed this out around the time this was done and some
changes were made to improve it but it wasn't possible to
make it behave the same as the glibc RPC.

Although I haven't actually looked yet, I suspect that means
I'll need to re-think the wisdom of falling back to NFS when
bind mounts fail.

Ian
Comment 2 Ian Kent 2015-12-02 19:59:02 EST
Right, I do fallback to trying an NFS mount if the bind
mount fails.

Another aspect of this is that if a hostname is given in
the map entry and it is determined to belong to a local
interface (including localhost) it might be that users
expect it to fail over and try a local NFS mount.

We just don't know.

That means if we don't fail over to trying a local NFS
mount users would need to specify the "nobind" option
explicitly to make that happen.

OTOH the interactive impact of this wait is significant
so I'm inclined to just not fall back to NFS as if a
bind mount fails the likelihood an NFS mount will fail
too is quite high.

.....
Comment 3 Ian Kent 2015-12-03 05:47:54 EST
(In reply to Heather Barcomb from comment #0)

I find that the manual mount will hang in the same way as autofs
when autofs isn't running.

> 
> This hang also occurs when trying to manually mount the non-existent
> directory if autofs is running, but it fails immediately if autofs is not
> running:

The test below doesn't quite reflect what autofs does.

We expect the manual mount to hang if autofs is running because
autofs will interfere with the mount point path so lets not persue
that for now.

> 
> 
>  [root@rhel64 ~]# service autofs status
>  automount is stopped
> 
>  [root@rhel64 ~]# time mount localhost:/export/testhome/sub6 /testhome/sub6
>  mount.nfs: mount point /testhome/sub6 does not exist
> 
>  real	0m0.005s  <========
>  user	0m0.002s
>  sys	0m0.001s

The difference between this and what autofs does is autofs will
create the mount point /testhome/sub6 and then call mount(8).

But since there isn't any NFS service running the manual mount
should fail immediately bit it doesn't, it hangs.

On Fedora 22 it does fail immediately.
Comment 4 Ian Kent 2015-12-03 05:50:45 EST
Steve,

Is this something we want to investigate in mount.nfs(8) and
fix if it isn't working the way we expect it to work?
Comment 5 Steve Dickson 2015-12-07 15:00:56 EST
(In reply to Ian Kent from comment #4)
> Steve,
> 
> Is this something we want to investigate in mount.nfs(8) and
> fix if it isn't working the way we expect it to work?

It looks like RHEL6 needs this commit
commit 65ac59cdef494f4a439feace258e35f226ae210b
Author: NeilBrown <neilb@suse.de>
Date:   Tue Mar 11 11:11:10 2014 -0400

    [PATCH - v3] mount.nfs: Fix fallback from tcp to udp
Comment 7 Steve Dickson 2015-12-07 15:28 EST
Created attachment 1103368 [details]
The backported patch
Comment 12 Ian Kent 2015-12-31 20:14:32 EST
(In reply to Heather Barcomb from comment #0)
> Description of problem:
>  
>  * When trying to automount a non-existent directory using localhost results
> in a ~4 minute hang
>  * The hang can be shorted by setting a low MOUNT_WAIT value in
> etc/sysconfig/autofs 
>  * This same setup in RHEL5 fails immediately 

Is the NFS server running on localhost when this is seen?
If the NFS server is running I can't reproduce this behaviour.

Ian
Comment 20 Yongcheng Yang 2016-01-05 23:29 EST
Created attachment 1112011 [details]
logs when reproduced with RHEL-6.4
Comment 21 Yongcheng Yang 2016-01-05 23:31 EST
Created attachment 1112012 [details]
logs when verified with RHEL-6.4 nfs-utils-1.2.3-67.el6
Comment 22 Yongcheng Yang 2016-01-05 23:33 EST
Created attachment 1112013 [details]
logs when test with RHEL-6.7 nfs-utils-1.2.3-67.el6
Comment 24 Ian Kent 2016-01-06 01:51:12 EST
(In reply to Yongcheng Yang from comment #22)
> Created attachment 1112013 [details]
> logs when test with RHEL-6.7 nfs-utils-1.2.3-67.el6

Are you sure this was done with nfs-utils revision 67.
Certainly the patch of this bug in included in that revision. 

And mount.nfs is being called by autofs but it did not return
an error as it should. So I don't think this is an autofs
problem.

I was able to verify the behaviour of mount.nfs on rhel-6.7
manually without difficulty.

Ian
Comment 38 Yongcheng Yang 2016-02-17 22:07:00 EST
Move to VERIFIED as Comment 37
Comment 40 errata-xmlrpc 2016-05-10 20:04:43 EDT
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-0868.html

Note You need to log in before you can comment on or make changes to this bug.