From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686 (x86_64); en-US; rv:1.7.12) Gecko/20050921 Red Hat/1.0.7-1.4.1 Firefox/1.0.7

Description of problem:
When attempting to cd into a submount point that had once failed, but now exists, the automounter will not find the directory. Expected that the automounter should immediately change to the newly created directory, regardless of previous attempts.

2.6.9-22.15.1 does not exhibit this problem, and finds the directory immediately upon creation. Stopping and restarting the automounter on -24 will allow the directory to be seen correctly.

As a possibly related point, we sometimes see automount points that were previously working begin exhibiting the symptoms below, requiring a restart of the automounter. This is not yet repeatable or well defined, other than that it only happens on systems with the -24 kernel, and never on the -22.15.1.

Version-Release number of selected component (if applicable):
kernel-2.6.9-24

How reproducible:
Always

Steps to Reproduce:
on 2.6.9-24 and 2.6.9-22.15.1, x86_64, RHEL4, current up2date:
-(-24) 'cd /apps/test' <fails>
-(-22.15.1) 'cd /apps/test' <fails>
-create the automount point via a different computer system
-(-22.15.1) 'cd /apps/test' <works fine now>
-(-24) 'cd /apps/test' <fails>
-(-24) 'kill -HUP 4902' (4902 is the PID for the /apps automounter)
-(-24) 'cd /apps/test' <fails>
-(-24) 'strace -p 4902' (start an strace of that automount process)
-(-24) 'cd /apps/test' <fails> (nothing appears in strace output)
-(-24) 'cd /apps/test' <fails> (still nothing in strace)
-(-24) 'cd /apps/dist' <ok> (directory already existed/gives strace output)
-(-24) 'cd /apps/test' <fails> (nothing appears in strace output)
-(-24) 'umount /apps/dist /apps/foo /apps/bar ...' <umount all /apps>
-(-24) 'kill -9 4902' <kill the automounter process for /apps>
-(-24) 'service autofs reload' <restart any missing automounters>
-(-24) 'strace -p 6864' <strace the new /apps automounter>
-(-24) 'cd /apps/test' <now works, generates strace output>

Actual Results:
prior to restart of autofs, cannot change to directory. Following autofs restart, change works correctly.

Expected Results:
Should immediately see new mountpoint. Should not require autofs restart.

Additional info:
Attaching auto.master, and debug logfiles.
'rpm -q autofs' = autofs-4.1.3-155
'uname -r' = 2.6.9-24.ELsmp
Of particular note is that after the first attempt, an strace on the controlling automount daemon gives no output.
Created attachment 122209 [details] results of adding "daemon.* /var/log/debugautofs" to /etc/syslog.conf
Created attachment 122210 [details] /etc/auto.master file used while generating the debug output
hmm what's in 22.15.1 ?
The strace gives _no_ output? Can you get the output from sysrq-t when the system is in this state? (You can do this with the following command sequence: sysctl -w kernel/sysrq=1; echo t > /proc/sysrq-trigger) What are the exact steps (including the required environment) to reproduce this problem? I'd like to give it a try here. Could you also provide the line in auto.apps for the directory key "test", please? Thanks.
Have made a simpler test case:

Environment:
Client: RHEL4-u2, fully up2date, 2.6.9-24 x86_64 kernel. Use same auto.master as already attached here. See newly attached auto.garage file.
Server: Panasas NAS, or Sun Solaris. Testing below is from the Solaris box for convenience. Created a directory /export/test/foo, export as /export/test in /etc/dfs/dfstab and /etc/dfs/sharetab. Similar steps on the Panasas result in the same problem. This directory is mounted on the client to /garage, and controlled via auto.garage.

cd /garage/test <works without problem>
<reboot client>

on solaris server:
cd /export/test
mv foo foo.hold

on newly rebooted client:
open a terminal session, S1
within S1:
ps ax | grep automount | grep garage
note PID of automounter process
strace -p PID -o /tmp/strace_test.txt

open another terminal session, S2
within S2:
cd /garage/test <fails> <Note that there is output in the strace running in S1>
cd /garage/test <fails> <No additional output added to strace>
cd /garage/prod <works, this was always a valid mount point> <Note new output in strace>

on solaris server:
cd /export/test
mv foo.hold foo

on client, in S2:
cd /garage/test <fails, but should work now> <No additional output added to strace>
cd /garage/soft <works, this was always a valid mount point> <Added data to strace in S1>
cd /garage/test <fails, should still work> <No additional output added to strace>

Stop strace in S1. See attached strace as strace_test.txt

Note that in -22.15.1, the attempt to 'cd /garage/test' will work every time, as long as the directory actually exists. Removing the directory causes the 'cd /garage/test' to fail, but it works again immediately as soon as the directory is recreated.
Created attachment 122239 [details] /etc/auto.garage file
Created attachment 122240 [details] strace as described above
sysrq-t output?
Sounds like this bug, actually: https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=172986 I'll post a patch, here.
Created attachment 122244 [details] results of sysrq t output
Gathered sysrq-t output while system is in a state where 'cd /garage/test' should work but doesn't
Created attachment 122245 [details] Correctly expire negative dentries.
This patch will likely fix your problem. Please give it a try.
That patch is already in 2.6.9-24. Looking at SPEC file, appears it was added at -22.26.
We can duplicate the behavior mentioned above by first trying to cd to an invalid mount point and then making it valid: it continues to fail, with no output from an strace on the automount process.

We are also seeing similar behavior (maybe this should be a separate bug) on -22.15.1 and -24 where, after some time, automount just stops working on previously working mount points (see the previously posted auto.garage for our automount map):

cd /garage/temp
/garage/temp: No such file or directory

This happens even though it was working an hour ago and no changes were made to the automount map or NFS server. We tried setting automount's timeout to 0 so it would never try to unmount a filesystem. The results were the same.

cd /garage/soft (still mounted from earlier, never unmounted)
cd /garage/sys (won't mount, and strace is quiet on the garage automount process)

When this happens, the only response I get from the strace on garage's automount process is when I cd to a mount point that doesn't match anything in our automount map (/garage/willfail: No such file or directory). If I try mount points that *should* work (/garage/sys), the strace is quiet, as if automount never knew it was expected to do anything.
Regarding the specific example in #6 above, I have narrowed the issue down to this specific change, which causes this problem (from file linux-2.6.9-autofs.patch):

    /* Negative dentry.. invalidate if "old" */
    if (dentry->d_inode == NULL)
-       return (dentry->d_time - jiffies <= AUTOFS_NEGATIVE_TIMEOUT);
+       return time_after_eq(jiffies, dentry->d_time);

Removing this change from the patch, recompiling and rebooting, and the system no longer exhibits the problem.
Instead of removing this change, can you try the following:

    if (dentry->d_inode == NULL)
        return 1;

It's essentially the same thing. What is causing you pain is that autofs is caching negative dentries. I'm not convinced that it should, and this is a change in behaviour from previous releases. So, I am in favor of the code snippet above. Thanks for taking the time to narrow the issue down.
Using "return 1" does not work correctly in either -24 or -25. Switched to -25 to see if there was any difference there. Leaving this line as "return (dentry->d_time - jiffies <= AUTOFS_NEGATIVE_TIMEOUT)" continues to work correctly in both versions. Continuing to research issue mentioned in #14 above, to see if these are related problems, or if we need to open a new bugzilla.
Additional testing shows that this change (back to the original line, as detailed in #17 above) also resolves the apparent timeout issue detailed in item #14 above.
Created attachment 122728 [details] Remove negative dentry caching logic from autofs4.
OK, it's pointless to try to fix the expiry logic, here, since it's clear that it results in unexpected behaviour from the user's point of view. This patch removes the negative dentry timeout entirely. Please test when you have a chance. Thanks.
The last patch you uploaded appears to be for RHEL3. We're testing with RHEL4, 2.6.9-27 now. Uploading the patch we've applied to -27 for your approval.
Created attachment 122893 [details] expire negative patch for 2.6.9-27 for review
Wow, sorry about that! Let me know how your testing goes, please. The patch you uploaded looks fine. Thanks!
This patch did not work. We are back to the old action of being unable to see a mount point if we've ever tried to mount it while it wasn't valid, and then made it a valid mount point. Would you like us to do any additional debugging, or regress back to our own patch for this (as outlined in comment #17 above)?
I'll work on it from my end. Thanks for the quick testing turn-around.
Noted a problem with the patch I uploaded: it was applied against /fs/autofs/ rather than /fs/autofs4/, as it should have been. When applied correctly, this patch does indeed appear to fix the problems. Will continue testing, but initial indications are that this is indeed fixed.
Created attachment 122899 [details] fixed expire patch pointing to /fs/autofs4/ directory
Created attachment 122900 [details] really correct expire patch
If I would just check before uploading... This time for sure.
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHSA-2006-0132.html