Bug 1049017

Summary: Regression: autofs mounts hang if maps are reloaded while the mount is expiring
Product: Red Hat Enterprise Linux 5
Reporter: David Halliwell <david.halliwell>
Component: autofs
Assignee: Ian Kent <ikent>
Status: CLOSED ERRATA
QA Contact: JianHong Yin <jiyin>
Severity: high
Docs Contact:
Priority: high
Version: 5.10
CC: bgollahe, cww, david.halliwell, eguan, ikent, jherrman, ksquizza, pyaduvan, swhiteho
Target Milestone: rc
Keywords: Regression
Target Release: ---
Hardware: All
OS: Linux
Whiteboard:
Fixed In Version: autofs-5.0.1-0.rc2.184.el5
Doc Type: Bug Fix
Doc Text:
In a previous version, a check for mounted file systems was removed from autofs mount control if a miscellaneous device was not used. However, a subsequent update introduced a mount export function that requires this check. As a consequence, autofs mounts sometimes became unresponsive when re-reading the mount map. This update fixes the bug and autofs mounts no longer hang in the scenario described.
Last Closed: 2014-09-16 00:17:23 UTC
Type: Bug
Bug Blocks: 1144746    
Attachments:
Description                                                Flags
autofs debug messages of issue reproduced                  none
Patch - check for existing offset mount before mounting    none

Description David Halliwell 2014-01-06 19:33:26 UTC
Description of problem:
-----------------------
If autofs maps are reloaded while a /net (indirect) mount is in the process of expiring, the mount hangs, blocking all further I/O.

Version-Release number of selected component (if applicable):
-------------------------------------------------------------
Affected version:
autofs-5.0.1-0.rc2.183.el5.x86_64

Based on my testing, this does not occur with the previous version:
autofs-5.0.1-0.rc2.177.el5.x86_64
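
To check a machine against the builds named in this report, a small shell sketch like the following could be used. The function names are hypothetical, and the classification relies only on what this report states: build 183 is affected, build 177 did not show the problem in the reporter's testing, and build 184 is listed under "Fixed In Version". Builds between 177 and 183 are not mentioned here, so they are reported as unknown.

```shell
# Extract the build number (e.g. 183) from an el5 autofs package string
# such as "autofs-5.0.1-0.rc2.183.el5.x86_64". Prints nothing on no match.
autofs_build_number() {
    printf '%s\n' "$1" |
        sed -n 's/^autofs-5\.0\.1-0\.rc2\.\([0-9][0-9]*\)\.el5.*$/\1/p'
}

# Classify a package string using only the builds named in this report.
classify_autofs_build() {
    build=$(autofs_build_number "$1")
    case "$build" in
        '')  echo unknown ;;      # not an el5 autofs 5.0.1 rc2 package string
        183) echo affected ;;     # build reported as hanging
        177) echo unaffected ;;   # build the reporter could not reproduce on
        *)
            if [ "$build" -ge 184 ]; then
                echo fixed        # at or after "Fixed In Version"
            else
                echo unknown      # builds this report says nothing about
            fi
            ;;
    esac
}
```

On a live system the input would typically come from the package database, for example: classify_autofs_build "$(rpm -q autofs)"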

How reproducible:
-----------------
Reproducible with the steps below.

Steps to Reproduce:
-------------------
#autofs version:  autofs-5.0.1-0.rc2.183.el5.x86_64

# Edit /etc/sysconfig/autofs
Set debugging:  OPTIONS="-d"
Set timeout:    DEFAULT_TIMEOUT=30

# Start autofs
/etc/init.d/autofs start

# Mount the remote filesystem using /net
ls -l /net/nfsserver/exportedfs/dir

# Wait for ~30 secs until the NFS filesystem is expired/unmounted
watch -n1 mount

# The indirect mount is still present at this point
cat /proc/mounts

# Reload the maps (do this quickly before the indirect mount expires)
/etc/init.d/autofs reload

# Now wait a while for the indirect (autofs) mount to expire, then try to access the remote filesystem again
ls -l /net/nfsserver/exportedfs/dir
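
The "still present" check in the steps above can be made less error-prone than reading the full /proc/mounts output by eye. The following is a minimal sketch, assuming the usual /proc/mounts layout (mount point in field 2, filesystem type in field 3). The function name and the optional second argument are hypothetical and exist only so the filter can be tried against a saved copy of the mount table. Mount points containing escaped characters such as \040 are not handled.

```shell
# List autofs-type mount points at or below DIR.
# Usage: list_autofs_mounts DIR [MOUNTS_FILE]
# MOUNTS_FILE defaults to /proc/mounts.
list_autofs_mounts() {
    dir=${1%/}                        # drop one trailing slash, if any
    mounts_file=${2:-/proc/mounts}
    awk -v dir="$dir" '
        # Field 2 is the mount point, field 3 the filesystem type.
        # Match DIR itself or anything below DIR/, but not siblings
        # that merely share the prefix (e.g. /network for /net).
        $3 == "autofs" && ($2 == dir || index($2, dir "/") == 1) { print $2 }
    ' "$mounts_file"
}
```

For the reproducer, running list_autofs_mounts /net/nfsserver just before the reload shows whether the offset mounts are still in place, and running it again afterwards shows whether they were mounted a second time.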

Actual results:
---------------
All further I/O on the mountpoint /net/nfsserver/exportedfs/dir hangs.

If you dig through the debug messages, you can see the following steps occur:

1/ The offset (indirect) mounts for each exported filesystem are created in /net/nfsserver
2/ The NFS (direct) mount is created at the appropriate offset
3/ This is timed out by autofs after ~30 secs
4/ The NFS (direct) filesystem is unmounted

5/ At this stage we tell autofs to reload the maps.
* For some reason, this causes the offset (indirect) directories to be mounted again *

6/ The expiry continues, but the offset directories are now in use again so the expiry fails

Expected results:
-----------------
The mountpoint should not hang.
On older versions of autofs, step 5/ does not seem to cause the offset directories to be mounted again.

Comment 3 David Halliwell 2014-01-27 15:45:33 UTC
This bug does not occur if the misc device is enabled.  So in the config file, set:
USE_MISC_DEVICE=no

We have a support case open for this bug.  Support services have managed to recreate it, and we're now trying to push for a fix.

Comment 6 Ian Kent 2014-01-27 22:50:44 UTC
(In reply to David Halliwell from comment #3)
> This bug does not occur if the misc device is enabled.  So in the config
> file, set:
> USE_MISC_DEVICE=no

Not terribly relevant since the misc device should always
be used.

Comment 9 Kyle Squizzato 2014-01-29 22:40:07 UTC
Created attachment 857283
autofs debug messages of issue reproduced

Comment 10 Ian Kent 2014-01-30 00:43:32 UTC
(In reply to Kyle Squizzato from comment #9)
> Created attachment 857283
> autofs debug messages of issue reproduced

Thanks for that Kyle.
The log is what I needed but I'm still not sure what is going
on, I'll have a look.

Comment 13 Ian Kent 2014-01-30 08:03:36 UTC
(In reply to Ian Kent from comment #10)
> (In reply to Kyle Squizzato from comment #9)
> > Created attachment 857283
> > autofs debug messages of issue reproduced
> 
> Thanks for that Kyle.
> The log is what I needed but I'm still not sure what is going
> on, I'll have a look.

This log is rather interesting.

I can see where the second mount is done and I'm slowly
remembering what I did with the change that led to this.

But the change that allows the second mount to happen was
done a much longer time ago.

Looking at it I can't see why I commented out the mounted
check ...... it's certainly needed by the later change.

But more puzzling is the hang.
In theory it should mount on top of the second mount leaving
us none the wiser that we have a bug ...... puzzling.

Ian

Comment 14 Ian Kent 2014-01-30 08:59:17 UTC
Created attachment 857392
Patch - check for existing offset mount before mounting

We can check if this resolves the problem.
Looks like it might.
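
For readers following along, the idea in the patch title (check for an existing offset mount before mounting) can be illustrated outside the daemon. The sketch below is not the attached patch, which is not reproduced in this report and lives in the autofs C code. It is a shell illustration under the assumption that an already-mounted offset shows up in /proc/mounts as an autofs entry at the same mount point. The function names and the optional mounts-file argument are hypothetical.

```shell
# Succeed if MOUNT_POINT already has a mount of type FSTYPE.
# Usage: is_mounted_as MOUNT_POINT FSTYPE [MOUNTS_FILE]
is_mounted_as() {
    mount_point=$1
    fstype=$2
    mounts_file=${3:-/proc/mounts}
    awk -v mp="$mount_point" -v ft="$fstype" '
        $2 == mp && $3 == ft { found = 1 }
        END { exit !found }           # exit status 0 only when a match was seen
    ' "$mounts_file"
}

# Print "skip" when an autofs offset mount is already present at
# MOUNT_POINT, otherwise "mount". No mount is actually performed here.
offset_mount_action() {
    if is_mounted_as "$1" autofs "${2:-/proc/mounts}"; then
        echo skip
    else
        echo mount
    fi
}
```

In the failing scenario from the description, the map reload mounts the offset directories a second time. A guard of this shape would report "skip" for /net/nfsserver/exportedfs while its offset mount is still present, which is the behaviour the reporter saw on the older package.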

Comment 15 Ian Kent 2014-01-30 11:16:03 UTC
A package with the above patch is available for testing at:
http://people.redhat.com/~ikent/autofs-5.0.1-0.rc2.183.bz1049017.1

Please check if this resolves the problem and report back.

Comment 18 David Halliwell 2014-01-31 14:36:42 UTC
(In reply to Ian Kent from comment #15)
> A package with the above patch is available for testing at:
> http://people.redhat.com/~ikent/autofs-5.0.1-0.rc2.183.bz1049017.1
> 
> Please check if this resolves the problem and report back.

Thanks, I can confirm that this patch seems to resolve the problem for us.

Comment 31 errata-xmlrpc 2014-09-16 00:17:23 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2014-1240.html