Issue 68758

We upgraded our environment to U4, after which we are seeing automounter mount points failing. Also, sometimes it gives the effect of directories not being found.

Error Message:
==============
Jan 18 22:24:24 stajf16 automount[7647]: >> nfs server reported service unavailable: Connection timed out
Jan 18 22:24:24 stajf16 automount[7647]: mount(nfs): nfs: mount failure stlinma3.us.oracle.com:/vol/ade_linux on /ade_autofs/ade_linux
Jan 18 22:24:24 stajf16 automount[7647]: failed to mount /ade_autofs/ade_linux

System Info:
============
bash-2.05# uname -a
Linux stajf16 2.4.21-27.ELsmp #1 SMP Wed Dec 1 21:59:02 EST 2004 i686 i686 i386 GNU/Linux
bash-2.05# rpm -qa | grep autofs
autofs-4.1.3-47
bash-2.05# cat /etc/redhat-release
Red Hat Enterprise Linux AS release 3 (Taroon Update 4)
This test case involved two Linux Red Hat 3.0 U4 machines.

Setup for the test case:
========================
1) Take two Linux machines, <automountclient> and <nfsserver>. (Both should have Red Hat 3.0 U4 installed.)
2) Log in to <automountclient> as root.
   a) Make sure the autofs version is autofs-4.1.3-47.
   b) Make the following entry in /etc/auto.master:
      /test /etc/test_autofs tcp,retrans=5 --ghost --debug
   c) Make the following entry in /etc/test_autofs:
      autotest -ro,intr,timeo=600,actimeo=1200,rsize=32768,wsize=32768 \
      <nfsserver>:/testdir
   d) cd /
   e) umount -a -t nfs;/sbin/service autofs stop
   f) /sbin/service autofs start
3) Log in to <nfsserver> as root.
   a) Make the following entry in /etc/exports:
      /testdir *(rw)
   b) mkdir /testdir
   c) mkdir -p /testdir/test1/test2
   d) /sbin/service nfs stop
   e) /sbin/service nfs start

How to reproduce the case?
==========================
1) Open two connections, one each to <automountclient> and <nfsserver>, and log in as root.
2) Let's assume "Window 1" is connected to <automountclient> and "Window 2" is connected to <nfsserver>.
3) On "Window 1" do the following:
   ls -l /test/autotest/test1
   Output will show the ls -l info of test2.
4) Using the mount command, check whether /test/autotest has been automatically unmounted.
5) On "Window 2" do the following:
   /sbin/service nfs stop
6) On "Window 1" do the following:
   ls -l /test/autotest/test1
   Output will show "directory not found."
7) On "Window 2" do the following:
   /sbin/service nfs start
8) On "Window 1" do the following:
   ls -l /test/autotest/test1
   Output will show "directory not found." (This is the case even after the nfs server is up and running.)
The test case listed here only works if you do not umount the autofs-mounted partition by hand, but rather shut off the target nfsd and wait for autofs to time out the volume by itself. The failure seems to be in the cleanup after an unsuccessful unmount.
I'll take a look.
I can't reproduce the problem with U5, so please test with U5. If the problem persists, then provide all of the information requested under the "Filing bug reports" section of the following URL: http://people.redhat.com/jmoyer/ Thanks.
bash-2.05# cat /etc/auto.master
------------------------------------------------------
# $Id: auto.master,v 1.2 1997/10/06 21:52:03 hpa Exp $
# Sample auto.master file
# Format of this file:
# mountpoint map options
# For details of the format look at autofs(8).
# /misc /etc/auto.misc --timeout=60
#
# ST Specific mount maps
#
##########################################################################
# WARNING: Each field must be separated by exactly ONE SPACE
# and ONE SPACE ONLY or a restart/reload of autofs
# will create multiple automount processes.
##########################################################################
### Master auto_master map - currently not used
### +auto_master
### Home directory mappings
/home yp:auto_home_adc tcp,intr,timeo=600,rsize=8192,wsize=8192,retrans=5 --ghost
### Mapping for /net - work around
/net /etc/auto.net --ghost
### Does not apply (We use DNS - we do - really)
#/xfn -xfn
# Auto Direct mappings
/usr/local/redhat /etc/auto_redhat tcp,retrans=5 --timeout=0 --ghost
/usr/local/solaris /etc/auto_solaris tcp,retrans=5 --ghost
/usr/local/remote /etc/auto_remote tcp,retrans=5 --timeout=0 --ghost
### ADE - label map
/ade_autofs /etc/ade_autofs tcp,retrans=5 --ghost
Jan 18 19:49:33 stajf13 automount[23763]: >> nfs server reported service unavailable: Connection timed out
Jan 18 19:49:33 stajf13 automount[23763]: mount(nfs): nfs: mount failure stdmlina4:/vol/home1/aime on /home/aime
Jan 18 19:49:33 stajf13 automount[23763]: failed to mount /home/aime
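To get a quick per-path count of these failures from a client's syslog, something like the following may help. It is only a sketch: the function name failed_mounts is made up for illustration, and the default log path /var/log/messages is an assumption about how syslog is configured on these hosts.

```shell
# Count automount "failed to mount <path>" messages, grouped by path.
# $1: log file to scan (default /var/log/messages is an assumption).
failed_mounts() {
    awk '/automount\[[0-9]+\]: failed to mount / { count[$NF]++ }
         END { for (p in count) print count[p], p }' "${1:-/var/log/messages}" | sort -k2
}
```

Each output line is a count followed by the mount path, so paths that fail repeatedly stand out.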
Note that the mount point DOES recover if you cd to the directory and do an 'ls'; however, the directory is reported as empty, which breaks things like symlinked executables until that ls command is issued.
*** JAPATEL 06/20/05 05:35 pm ***
autofs 4.1.3-130 does not seem to have a fix for the autofs problem. The problem still persists in the reproducible case above.
I've managed to reproduce on a U5 install. To be clear, the sequence of events necessary to trigger the bug is as follows:
1) cause mount point to be automounted
2) stop the nfs service on the server
3) wait until the mount point is expired
4) access *a path element within the automount point* (i.e. not the root)
   At this point, you will receive a No such file or directory error.
5) start the nfs service on the server
6) access a non-root directory or file within the automount point
   At this point, you will receive a No such file or directory error.
This appears to be a kernel bug, and only affects the browsable map case. Please file this through the proper support channels so this can be scheduled for an update release.
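For anyone who wants to rerun that sequence, here is a rough sketch of it as a script to be sourced on the client. Several things in it are assumptions rather than anything from this report: the helper names (run, is_mounted, reproduce), passwordless root ssh to the NFS server, and the /test/autotest/test1 path borrowed from the original test case. The expiry wait simply polls /proc/mounts until the nfs mount disappears. By default it only prints the commands.

```shell
# Sketch of the reproduction sequence; all helper names are hypothetical.
NFS_SERVER=${NFS_SERVER:-nfsserver}         # placeholder host name
MOUNT_POINT=${MOUNT_POINT:-/test/autotest}  # automounted directory (from the test case)
SUBDIR=${SUBDIR:-test1}                     # non-root path element inside the mount
DRY_RUN=${DRY_RUN:-1}                       # 1 = only print the commands

# Run a command, or just print it when DRY_RUN=1.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Succeed if $1 is listed as a mounted nfs filesystem in the mounts table $2.
is_mounted() {
    awk -v mp="$1" '$2 == mp && $3 == "nfs" { found = 1 } END { exit !found }' \
        "${2:-/proc/mounts}"
}

reproduce() {
    run ls -l "$MOUNT_POINT/$SUBDIR"                    # 1) trigger the automount
    run ssh "root@$NFS_SERVER" /sbin/service nfs stop   # 2) stop nfs on the server
    if [ "$DRY_RUN" != 1 ]; then                        # 3) wait for the mount to expire
        while is_mounted "$MOUNT_POINT"; do sleep 10; done
    fi
    run ls -l "$MOUNT_POINT/$SUBDIR"                    # 4) fails while the server is down
    run ssh "root@$NFS_SERVER" /sbin/service nfs start  # 5) bring nfs back
    run ls -l "$MOUNT_POINT/$SUBDIR"                    # 6) with the bug, this still fails
}
```

After sourcing the file, call reproduce to review the steps, then set DRY_RUN=0 and call it again to actually run them. Step 6 is the interesting one: on an affected kernel with --ghost it should still report No such file or directory.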
autofs version 4.0.0pre10 does not support ghosting. This problem only occurs when ghosting is enabled. As such, why is this issue holding up deployment of U4? If you disable ghosting (i.e. don't specify --ghost), then you get the same behaviour you had before.
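If you do try running without ghosting, it helps to first see which master map entries currently carry --ghost. A minimal sketch, where ghost_entries is a made-up function name and the only assumption about the file is the whitespace-separated "mountpoint map options" layout that auto.master already documents:

```shell
# Print the mount point of every non-comment master map line that has --ghost.
# $1: master map file (defaults to /etc/auto.master).
ghost_entries() {
    awk '$1 !~ /^#/ {
             for (i = 2; i <= NF; i++)
                 if ($i == "--ghost") { print $1; break }
         }' "${1:-/etc/auto.master}"
}
```

Removing --ghost from the listed entries and restarting autofs is the workaround described above. The script only reports the entries, it does not edit the file.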
--------------------------------------
After disabling the ghost option we do not see the autofs problem. What is missed if we disable ghosting?
-------------------------------------
To answer this question clearly, here is an example.
1) The NFS server server1 is sharing /scratch, which contains the following file path:
   /scratch/test1/test1/myfile
2) The autofs client on server2 has the following entry in /etc/auto.master:
   /jay /etc/ade_jay tcp,retrans=5 --debug
   and /etc/ade_jay has the following entry:
   jay -ro,intr,timeo=600,actimeo=1200,rsize=32768,wsize=32768 \
   server1:/scratch
3) cd /jay
   ls
   There is no output from the above ls command.
4) cd jay
   ls
   The ls command does show output in this case.
We have a strong requirement that the ls in 3) should show output. Hence the autofs problem should be fixed with the ghosting option enabled.
I posted this issue upstream, and got a response from the autofs maintainer. His solution would reintroduce a problem with ghosted direct maps. I've pointed this out to him, and am awaiting a response.
Ian Kent posted a patch while I was out of the country. I tested his patch, and it fixes the problem in my environment. I want to spend some additional time verifying that the patch does not introduce regressions. I'll post the patch that Ian posted in this bugzilla. It may or may not apply cleanly to a RHEL 3 kernel tree.
Created attachment 120173
fix failed lookup when nfs server is back online
Created attachment 120231
Patch to fix autofs4 against RHEL3 U6 kernel

This patch cleans up our autofs issues! Same as previous patch, but against RHEL3u6.
Great. Thanks for testing this. This patch went into the rawhide kernel build last night. It's up to you, but I'd like to give it some soak time there before releasing it as a hotfix. The hope is that if it introduces any regressions, we will hear about it in short order. Is that acceptable for you guys?
We've deployed the patch on a testing basis to some racks in the farm, will let you know how it goes with wider deployment.
Hi, Greg, Any news? Does it work? Does it fail in new and exciting ways?
The upstream maintainer found a regression introduced by this patch, though he was scant on the details. For now, I would not deploy this in a production environment.
Let me know when you have more information about the regression. We're seeing no problems with the patch deployed in our environment.
Hi, Greg. Do you wish for this bugzilla to remain confidential to Oracle? If not, please uncheck the "Oracle Confidential Group" box below to make the bug public. If so, just let me know so that I can add appropriate Red Hat accessibility. Thanks in advance. -ernie
OK, Ian claims that the problem was specific to his environment (he was running a patched autofs4 module). So, I'll do some further testing, and will likely propose this patch, as-is, for inclusion in the next update.
Likewise here, we're not seeing any problems (and definitely looks fixed) with the new patch. Let me know what update this will go into.
Greg, I just looked through your backported patch to RHEL 3 U6, and it misses a hunk. Here is the hunk from the original patch that you are missing:

@@ -269,7 +269,7 @@ static struct dentry *autofs4_expire(str
             goto next;
         }
 
-        if ( simple_empty(dentry) )
+        if (simple_empty(dentry))
             goto next;
 
         /* Case 2: tree mount, expire iff entire tree is not busy */

All of my testing has been with a full patch. I'm going to roll a kernel for you to test.
-Jeff
Boy, can you tell it's Friday? Disregard that last comment. I don't think the white-space change is critical. =P
A fix for this problem has just been committed to the RHEL3 U7 patch pool this evening (in kernel version 2.4.21-37.11.EL).
Jason Willeford, please create a new bugzilla for the problem you are investigating and then relink IT 83498 to that one (and remove it from this BZ 161875). Thanks in advance.
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHSA-2006-0144.html