175687 – autofs doesn't attempt to remount failed mount points

Summary: autofs doesn't attempt to remount failed mount points

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Enterprise Linux 4
Classification:	Red Hat
Component:	kernel
Sub Component:
Version:	4.3
Hardware:	x86_64
OS:	Linux
Priority:	medium
Severity:	medium
Target Milestone:	---
Target Release:	---
Assignee:	Jeff Moyer
QA Contact:	Brian Brock
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	168430

Reported:	2005-12-14 00:58 UTC by Curtis Zinzilieta
Modified:	2007-11-30 22:07 UTC
CC List:	2 users
Fixed In Version:	RHSA-2006-0132
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2006-03-07 21:04:01 UTC
Target Upstream Version:
Embargoed:

Attachments
results of adding "daemon.*  /var/log/debugautofs" to /etc/syslog.conf (33.23 KB, text/plain) 2005-12-14 01:07 UTC, Curtis Zinzilieta	no flags	Details
/etc/auto.master file used while generating the debug output (465 bytes, text/plain) 2005-12-14 01:09 UTC, Curtis Zinzilieta	no flags	Details
/etc/auto.garage file (240 bytes, text/plain) 2005-12-14 19:16 UTC, Curtis Zinzilieta	no flags	Details
strace as described above (3.24 KB, text/plain) 2005-12-14 19:17 UTC, Curtis Zinzilieta	no flags	Details
results of sysrq t output (131.78 KB, text/plain) 2005-12-14 19:53 UTC, Curtis Zinzilieta	no flags	Details
Correctly expire negative dentries. (1.36 KB, patch) 2005-12-14 19:56 UTC, Jeff Moyer	no flags	Details | Diff
Remove negative dentry caching logic from autofs4. (1.48 KB, patch) 2006-01-03 23:10 UTC, Jeff Moyer	no flags	Details | Diff
expire negative patch for 2.6.9-27 for review (1.48 KB, patch) 2006-01-06 21:30 UTC, Curtis Zinzilieta	no flags	Details | Diff
fixed expire patch pointing to /fs/autofs4/ directory (1.48 KB, patch) 2006-01-07 00:12 UTC, Curtis Zinzilieta	no flags	Details | Diff
really correct expire patch (1.65 KB, patch) 2006-01-07 00:14 UTC, Curtis Zinzilieta	no flags	Details | Diff

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Product Errata	RHSA-2006:0132	0	qe-ready	SHIPPED_LIVE	Moderate: Updated kernel packages available for Red Hat Enterprise Linux 4 Update 3	2006-03-09 16:31:00 UTC

Description Curtis Zinzilieta 2005-12-14 00:58:08 UTC

From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686 (x86_64); en-US; rv:1.7.12) Gecko/20050921 Red Hat/1.0.7-1.4.1 Firefox/1.0.7

Description of problem:
When attempting to cd into a submount point that previously failed but now exists, the automounter does not find the directory.  The expectation is that the automounter picks up the newly created directory immediately, regardless of previous attempts.  Kernel 2.6.9-22.15.1 does not exhibit this problem and finds the directory immediately upon creation.  Stopping and restarting the automounter on -24 allows the directory to be seen correctly.

As a possibly related point, we sometimes see automount points that were previously working begin exhibiting the symptoms below, requiring a restart of the automounter.  This is not yet repeatable or well defined, other than that it only happens on systems with the -24 kernel, and never on the -22.15.1.


Version-Release number of selected component (if applicable):
kernel-2.6.9-24

How reproducible:
Always

Steps to Reproduce:
on 2.6.9-24 and 2.6.9-22.15.1, x86_64, RHEL4, current up2date:

-(-24) 'cd /apps/test'  <fails>
-(-22.15.1) 'cd /apps/test'  <fails>
-create the automount point via a different computer system
-(-22.15.1) 'cd /apps/test'  <works fine now>
-(-24) 'cd /apps/test'  <fails>
-(-24) 'kill -HUP 4902'  (4902 is the PID for the /apps automounter)
-(-24) 'cd /apps/test'  <fails>
-(-24) 'strace -p 4902'  (start an strace of that automount process)
-(-24) 'cd /apps/test'  <fails> (nothing appears in strace output)
-(-24) 'cd /apps/test'  <fails> (still nothing in strace)
-(-24) 'cd /apps/dist'  <ok> (directory already existed/gives strace output)
-(-24) 'cd /apps/test'  <fails> (nothing appears in strace output)
-(-24) 'umount /apps/dist /apps/foo /apps/bar ...'  <umount all /apps>
-(-24) 'kill -9 4902'  <kill the automounter process for /apps>
-(-24) 'service autofs reload'  <restart any missing automounters>
-(-24) 'strace -p 6864' <strace the new /apps automounter>
-(-24) 'cd /apps/test'  <now works, generates strace output>


Actual Results:  prior to restart of autofs, cannot change to directory. Following autofs restart, change works correctly.

Expected Results:  Should immediately see new mountpoint.  Should not require autofs restart.
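
The repeated 'cd' checks in the steps above could also be scripted. The following is only a minimal sketch, not part of autofs or of the original report: the helper names probe_dir and count_failures are hypothetical, and it assumes that opening the path as a directory exercises the same lookup that 'cd' does.

```c
#include <assert.h>
#include <dirent.h>
#include <errno.h>
#include <stddef.h>

/*
 * Hypothetical helper (not part of autofs): try to open `path` as a
 * directory, which is roughly what `cd` needs in order to succeed.
 * Returns 0 on success, or a negative errno value such as -ENOENT.
 */
int probe_dir(const char *path)
{
    DIR *dir;

    assert(path != NULL);
    dir = opendir(path);
    if (dir == NULL)
        return -errno;
    closedir(dir);
    return 0;
}

/*
 * Probe `path` `attempts` times and return how many attempts failed.
 * The last error seen (0 if none) is stored in *last_err when non-NULL.
 */
int count_failures(const char *path, int attempts, int *last_err)
{
    int failures = 0;
    int err = 0;
    int i;

    for (i = 0; i < attempts; i++) {
        int rc = probe_dir(path);

        if (rc != 0) {
            failures++;
            err = rc;
        }
    }
    if (last_err != NULL)
        *last_err = err;
    return failures;
}
```

Called from a small main() with a path such as /apps/test before and after the directory is created on the server, a helper like this would show whether the failures persist once the mount point has become valid.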

Additional info:

Attaching auto.master, and debug logfiles.

'rpm -q autofs' = autofs-4.1.3-155
'uname -r' = 2.6.9-24.ELsmp

Of particular note is that after the first attempt, an strace on the controlling automount daemon gives no output.

Comment 1 Curtis Zinzilieta 2005-12-14 01:07:41 UTC

Created attachment 122209 [details]
results of adding "daemon.*  /var/log/debugautofs" to /etc/syslog.conf

Comment 2 Curtis Zinzilieta 2005-12-14 01:09:01 UTC

Created attachment 122210 [details]
/etc/auto.master file used while generating the debug output

Comment 4 Jason Baron 2005-12-14 02:20:43 UTC

hmm what's in 22.15.1 ?

Comment 5 Jeff Moyer 2005-12-14 03:18:35 UTC

The strace gives _no_ output?  Can you get the output from sysrq-t when the
system is in this state? (You can do this with the following command sequence:
sysctl -w kernel/sysrq=1; echo t > /proc/sysrq-trigger )

What are the exact steps (including the required environment) to reproduce this
problem?  I'd like to give it a try here.

Could you also provide the line in auto.apps for the directory key "test," please?

Thanks.

Comment 6 Curtis Zinzilieta 2005-12-14 19:14:54 UTC

Have made a simpler test case:

Environment:
Client: RHEL4-u2, fully up2date, 2.6.9-24 x86_64 kernel.  Use same auto.master
as already attached here.  See newly attached auto.garage file.
Server: Panasas NAS, or Sun Solaris.  Testing below is from the Solaris box
for convenience.  Created a directory /export/test/foo, export as /export/test
in /etc/dfs/dfstab and /etc/dfs/sharetab.  Similar steps on the Panasas result
in the same problem.  This directory is mounted on the client to /garage, and
controlled via auto.garage.

cd /garage/test  <works without problem>
<reboot client>
on solaris server:
cd /export/test
mv foo foo.hold

on newly rebooted client:
open a terminal session, S1
within S1:
ps ax | grep automount | grep garage
note PID of automounter process
strace -p PID -o /tmp/strace_test.txt
open another terminal session, S2
within S2:
cd /garage/test  <fails>
<Note that there is output in the strace running in S1>

cd /garage/test  <fails>
<No additional output added to strace>
cd /garage/prod  <works, this was always a valid mount point>
<Note new output in strace>

on solaris server:
cd /export/test
mv foo.hold foo

on client, in S2:
cd /garage/test  <fails, but should work now>
<No additional output added to strace>
cd /garage/soft  <works, this was always a valid mount point>
<Added data to strace in S1>
cd /garage/test  <fails, should still work>
<No additional output added to strace>

Stop strace in S1.
See attached strace as strace_test.txt

Note that in -22.15.1, the attempt to 'cd /garage/test' will work every time,
as long as the directory actually exists.  Removing the directory causes the
'cd /garage/test' to fail, but it works again immediately as soon as the
directory is recreated.

Comment 7 Curtis Zinzilieta 2005-12-14 19:16:09 UTC

Created attachment 122239 [details]
/etc/auto.garage file

Comment 8 Curtis Zinzilieta 2005-12-14 19:17:03 UTC

Created attachment 122240 [details]
strace as described above

Comment 9 Jeff Moyer 2005-12-14 19:46:17 UTC

sysrq-t output?

Comment 10 Jeff Moyer 2005-12-14 19:53:33 UTC

Sounds like this bug, actually:
  https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=172986

I'll post a patch, here.

Comment 11 Curtis Zinzilieta 2005-12-14 19:53:54 UTC

Created attachment 122244 [details]
results of sysrq t output

Gathered sysrq-t output while system is in a state where 'cd /garage/test'
should work but doesn't

Comment 12 Jeff Moyer 2005-12-14 19:56:09 UTC

Created attachment 122245 [details]
Correctly expire negative dentries.

This patch will likely fix your problem.  Please give it a try.

Comment 13 Curtis Zinzilieta 2005-12-14 20:04:57 UTC

That patch is already in 2.6.9-24.  Looking at SPEC file, appears it was added
at -22.26.

Comment 14 Jay Hilliard 2005-12-15 01:06:09 UTC

We can duplicate the behavior mentioned above by first trying to cd to an
invalid mount point, then making it valid, and it then continues to fail with no
output from an strace on the automount process.

We are also seeing similar (maybe this should be a separate bug) behavior on
-22.15.1 and -24 where, after some time, automount just stops working on
previously working mount points.

(see previously posted auto.garage for our automount map)

cd /garage/temp
/garage/temp: No such file or directory

This was working an hour ago, and no changes were made to the automount map or
NFS server.  We tried setting automount's timeout to 0 so it would never try to
unmount a filesystem.  The results were the same.

cd /garage/soft (still mounted from earlier, never unmounted)
cd /garage/sys  (won't mount and strace is quiet on the garage automount process)

when this happens, the only response I get from the strace on garage's automount
process is when I cd to a mount point that doesn't match anything in our
automount map (/garage/willfail: No such file or directory).
If I try mount points that *should* work (/garage/sys) the strace is quiet as if
automount never knew it was expected to do anything.

Comment 15 Curtis Zinzilieta 2005-12-15 17:34:19 UTC

regarding specific example in #6 above, I have narrowed the issue down to this
specific change which causes this problem (from file linux-2.6.9-autofs.patch):

        /* Negative dentry.. invalidate if "old" */
        if (dentry->d_inode == NULL)
-               return (dentry->d_time - jiffies <= AUTOFS_NEGATIVE_TIMEOUT);
+               return time_after_eq(jiffies, dentry->d_time);

After removing this change from the patch, recompiling, and rebooting, the
system no longer exhibits the problem.
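
The two return expressions in the hunk above can be compared in user space. The following is only a sketch under stated assumptions: it assumes d_time holds a deadline of "time of the failed mount plus the negative timeout", which this report does not state, and HZ and NEGATIVE_TIMEOUT are illustrative stand-ins rather than the kernel's values. time_after_eq_ul is a local re-implementation of the wrap-safe comparison, not the kernel macro itself.

```c
#include <assert.h>

/* Illustrative stand-ins for this sketch, not the kernel's definitions. */
#define HZ 1000UL
#define NEGATIVE_TIMEOUT (HZ / 5)

/* User-space equivalent of a wrap-safe "a is at or after b" comparison. */
int time_after_eq_ul(unsigned long a, unsigned long b)
{
    return (long)(a - b) >= 0;
}

/* Removed line: nonzero while d_time is at most one timeout ahead of now. */
int old_check(unsigned long d_time, unsigned long now)
{
    return d_time - now <= NEGATIVE_TIMEOUT;
}

/* Added line: nonzero once now has reached or passed d_time. */
int new_check(unsigned long d_time, unsigned long now)
{
    return time_after_eq_ul(now, d_time);
}

/* Walk through points in time around the assumed deadline. */
void compare_checks(void)
{
    unsigned long failed_at = 1000UL;
    /* Assumption: deadline = time of failure + negative timeout. */
    unsigned long d_time = failed_at + NEGATIVE_TIMEOUT;

    /* Before the deadline: old expression is nonzero, new one is zero. */
    assert(old_check(d_time, d_time - 100) == 1);
    assert(new_check(d_time, d_time - 100) == 0);

    /* Exactly at the deadline: both are nonzero. */
    assert(old_check(d_time, d_time) == 1);
    assert(new_check(d_time, d_time) == 1);

    /* After the deadline: the unsigned subtraction wraps, so old is zero. */
    assert(old_check(d_time, d_time + 1) == 0);
    assert(new_check(d_time, d_time + 1) == 1);
}
```

Under that assumption the two expressions give opposite answers on either side of the deadline. Whether this is the actual mechanism behind the behaviour reported here is not established in this report.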

Comment 16 Jeff Moyer 2005-12-15 17:39:45 UTC

Instead of removing this change, can you try the following:

if (dentry->d_inode == NULL)
        return 1;

It's essentially the same thing.  What is causing you pain is that autofs is
caching negative dentries.  I'm not convinced that it should, and this is a
change in behaviour from previous releases.  So, I am in favor of the code
snippet above.

Thanks for taking the time to narrow the issue down.

Comment 17 Curtis Zinzilieta 2005-12-16 00:33:18 UTC

Using "return 1" does not work correctly in either -24 or -25.  Switched to -25
to see if there was any difference there.  Leaving this line as "return
(dentry->d_time - jiffies <= AUTOFS_NEGATIVE_TIMEOUT)" continues to work
correctly in both versions.

Continuing to research issue mentioned in #14 above, to see if these are related
problems, or if we need to open a new bugzilla.

Comment 18 Curtis Zinzilieta 2005-12-16 20:38:00 UTC

Additional testing proves that this change (back to the original line, as
detailed in #17 above) also resolves the apparent timeout issue detailed in item
#14 above as well.

Comment 22 Jeff Moyer 2006-01-03 23:10:48 UTC

Created attachment 122728 [details]
Remove negative dentry caching logic from autofs4.

OK, it's pointless to try to fix the expiry logic, here, since it's clear that
it results in unexpected behaviour from the user's point of view.  This patch
removes the negative dentry timeout entirely.  Please test when you have a
chance.

Thanks.

Comment 23 Curtis Zinzilieta 2006-01-06 21:26:50 UTC

The last patch you uploaded appears to be for RHEL3.  We're testing with RHEL4,
2.6.9-27 now.  Uploading the patch we've applied to -27 for your approval.

Comment 24 Curtis Zinzilieta 2006-01-06 21:30:43 UTC

Created attachment 122893 [details]
expire negative patch for 2.6.9-27 for review

Comment 25 Jeff Moyer 2006-01-06 21:37:10 UTC

Wow, sorry about that!  Let me know how your testing goes, please.  The patch
you uploaded looks fine.

Thanks!

Comment 26 Curtis Zinzilieta 2006-01-06 22:40:39 UTC

This patch did not work.  We are back to the old action of being unable to see a
mount point if we've ever tried to mount it while it wasn't valid, and then made
it a valid mount point.

Would you like us to do any additional debugging, or regress back to our own
patch for this (as outlined in comment #17 above)?

Comment 27 Jeff Moyer 2006-01-06 22:43:34 UTC

I'll work on it from my end.  Thanks for the quick testing turn-around.

Comment 28 Curtis Zinzilieta 2006-01-07 00:09:40 UTC

I noticed a problem with the patch I uploaded: it was applied against
/fs/autofs/ rather than /fs/autofs4/ as it should have been.  When applied
correctly, this patch does indeed appear to fix the problems.  Will continue
testing, but initial indications are that this is indeed fixed.

Comment 29 Curtis Zinzilieta 2006-01-07 00:12:24 UTC

Created attachment 122899 [details]
fixed expire patch pointing to /fs/autofs4/ directory

Comment 30 Curtis Zinzilieta 2006-01-07 00:14:33 UTC

Created attachment 122900 [details]
really correct expire patch

If I would just check before uploading...

This time for sure.

Comment 35 Red Hat Bugzilla 2006-03-07 21:04:01 UTC

An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2006-0132.html
