Bug 144729
Summary: | automount stops responding and fails to mount | ||
---|---|---|---|
Product: | Red Hat Enterprise Linux 3 | Reporter: | Jeremy Rosengren <jeremy> |
Component: | autofs | Assignee: | Jeff Moyer <jmoyer> |
Status: | CLOSED INSUFFICIENT_DATA | QA Contact: | |
Severity: | high | Docs Contact: | |
Priority: | high | ||
Version: | 3.0 | CC: | jeremy, raju.singh |
Target Milestone: | --- | ||
Target Release: | --- | ||
Hardware: | i686 | ||
OS: | Linux | ||
Whiteboard: | RHEL3U7NAK | ||
Fixed In Version: | Doc Type: | Bug Fix | |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2006-08-03 12:41:06 UTC | Type: | --- |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: |
Description
Jeremy Rosengren
2005-01-10 23:35:48 UTC
What do the logs look like when the failure occurs? It also appears you have applied patches to autofs. What patches have you applied?

The default logging shows nothing when the error occurs, which makes it hard for us to determine exactly what is causing it. We put "--debug" in the DAEMONOPTIONS to try to get more information, but the last time it failed there was nothing in the logs that referenced /tools at all, beyond previous successful mount attempts. After the problem starts occurring (i.e., "No such file or directory" when attempting to mount), nothing gets written to the messages file for that automount.

We haven't patched autofs ourselves; we're using the autofs package from Update 4. We did get that package a bit earlier than RHN got it, due to another bug with replicated host mounting, but that shouldn't matter.

We've determined that the --ghost option in DAEMONOPTIONS has something to do with the cause of this behavior. We were able to confirm this by removing --ghost and restarting: the problem has not resurfaced since. We also found machines that never had the --ghost option enabled and have never seen the problem. We were never able to catch the problem as it was happening, so we were never able to come up with a set of steps to replicate the problem.

Please add the --debug option to your /tools entry in auto.master. Then configure syslog to send all messages to a debug log by adding a line like the following to your /etc/syslog.conf:

*.* /var/log/debug

Restart syslogd and the automount (please be sure the automounter did actually stop and start again). When the problem next occurs, attach the logs to this bugzilla.

There was a patch committed upstream recently which claims to fix problems such as this. I'll create a new kernel rpm for testing and post it when ready.

Hi Jeffrey,

We are also facing the same problem. We are using Red Hat Enterprise Linux 3 Update 4. We also upgraded to Update 5, but the scenario is the same.
Jeremy reported that after removing the --ghost switch from the /etc/sysconfig/autofs file, the problem did not recur. We never enabled the --ghost option, and automount runs without it on all our machines, yet we still face the problem. Our configuration is as follows:

1) /etc/sysconfig/autofs
------ starts ---------
#
LOCALOPTIONS=""
DAEMONOPTIONS="--timeout=60"
# UNDERSCORETODOT changes auto_home to auto.home and auto_mnt to auto.mnt
UNDERSCORETODOT=1
DISABLE_DIRECT=1
-------- END ------------

2) /etc/nsswitch.conf
automount: files nis

3) /etc/auto.master
+auto.master

4) Output of % ypcat -k auto.master
------ starts -------
/design auto.design tcp,rsize=32768,wsize=32768
/proj auto.proj tcp,rsize=32768,wsize=32768
/home auto.home tcp,rsize=32768,wsize=32768
/data auto.data tcp,rsize=32768,wsize=32768
/sw auto.sw tcp,rsize=32768,wsize=32768
--------- END ----------

The NFS servers for these file systems are mainly NetApp, HP Cluster, and Solaris enterprise machines.

5) Error log with automount in debug mode:
-------- Starts ---------------
May 26 18:36:57 del11frd automount[20523]: attempting to mount entry /home/piyushj
May 26 18:36:57 del11frd automount[15681]: mount(nfs): no host elected
May 26 18:36:57 del11frd automount[15681]: failed to mount /home/piyushj
May 26 12:10:22 del15frd automount[2122]: attempting to mount entry /data/rd192a
May 26 12:10:22 del15frd automount[5349]: mount(nfs): no host elected
May 26 12:10:22 del15frd automount[5349]: failed to mount /data/rd192a
------ END --------

We carried out the following tests:

1) We used a local /etc/auto.master file as given below:
--------- starts ---------
/design auto.design --timeout=60 tcp
/proj auto.proj --timeout=60 tcp
/home auto.home --timeout=300 tcp
/data auto.data --timeout==300 tcp
/sw auto.sw --timeout=0 tcp
-------- ENDS -----------------
In this case the problem did not resurface for the auto.sw map, but there was no change for any of the other maps.
2) We made the machine a NIS slave instead of a NIS client, but all in vain.
3) The same problem exists in RHEL 3.0 Update 5 and in RHEL 4, but it seems to occur less often in RHEL 3.0 Update 5 than in RHEL 3.0 Update 4.
4) We are now working with the --ghost option, so let's see what results come.

Raju,

Your problem description is difficult to follow. The bug you describe seems similar to that reported in bz #150690. Please look at the following URL: http://people.redhat.com/jmoyer and update bug #150690 with all of the information requested under the "Filing bug reports" section.

Jeffrey,

In bug report #150690 the problem appeared after an upgrade, and it occurs when the reporter does cd /home/<mountdir>; in our case that does not happen. The "failed to mount" problem does not occur every time. As Jeremy said, "An unknown event triggers the automount process for one of our automount maps to stop trying to perform automounts": sometimes automount fails to mount the requested directory, and if the same request comes again after some time, it works. So it is an intermittent problem.

We were facing this problem in RHEL 3.0 Update 4, so we upgraded to Update 5, but the problem still exists. We also installed RHEL 4 on some machines, but there was no change.

If you consider this problem different from bug #144729, please suggest what I should do: either open a new bug or update bug #150690, which seems to me to be somewhat different from our problem.

Some extra details follow:
-------------------------------------
% rpm -qa autofs
autofs-4.1.3-130
% uname -r
2.4.21-32.ELsmp

If you need more details, let me know.

Thanks & Regards,
Raju

Hi Jeffrey,

This is an update to comment #8, in which I wrote that I had enabled the "--ghost" option for the automount daemon on some machines.
With the "--ghost" option we observed that the error in the log file appears one level deeper, as shown below:
------------- starts ---------------
Jun 2 12:53:23 del16frd automount[3466]: failed to mount /data/rdnetifl/LIBIO
--------- ENDS ----------
PS: here the mount point is /data and the key is rdnetifl.

Other than this, I am also noticing some "kernel: NFS" messages in the log file, but I am not sure whether they are linked to this problem. They are given below:
------------ Starts ---------
Jun 2 14:21:16 del16frd kernel: NFS: Buggy server - nlink == 0!
Jun 2 14:21:16 del16frd kernel: __nfs_fhget: iget failed
Jun 2 14:21:27 del16frd kernel: NFS: Buggy server - nlink == 0!
Jun 2 14:21:27 del16frd kernel: __nfs_fhget: iget failed
Jun 2 14:21:32 del16frd kernel: NFS: Buggy server - nlink == 0!
Jun 2 14:21:32 del16frd kernel: __nfs_fhget: iget failed
---------------- ENDs ------------

We are continuously working to resolve this problem and get more details on it, and will update you if we find anything related.

There is a patch for the problem you are seeing with the --ghost option. Please try the kernel patch available here: https://bugzilla.redhat.com/bugzilla//attachment.cgi?id=114991

Actually we are not using "--ghost" mode, so we are not applying this patch. We enabled the "--ghost" option on two or three machines purely for testing, to see how the error behaves. We are using automount without the --ghost option, and the error status on our side is unchanged. Is there any update from your side?

I have been able to reproduce the problem without ghosting enabled, and the patch corrects it. Please try the patch.

Hi Jeffrey,

This seems to be a kernel patch, so please let me know whether I also have to rebuild the kernel. It would be more helpful for us if you could provide the patch in rpm form.

Hi Jeffrey,

Any updates on comment #17? I am waiting for your reply regarding the patch in rpm form.

Hello Jeffrey,

I compiled the module with this patch and kept it under observation for a week.
But the problem is still there, so please let us know what to do.

Thanks & Regards,
Raju Singh

I'm putting together the next round of patches for submission. Raju, I'll post a test kernel for you when that is complete.

Hi, Jeremy and Raju,

Sorry for the delay on this. There is a kernel rpm available on my people page which contains all recent autofs patches: http://people.redhat.com/jmoyer/.bz144729/

Please give it a try and let me know whether it solves your problems. Thanks!

When we initially found that disabling the --ghost feature prevented this problem, we disabled --ghost on all our production machines. It may not be possible to do this testing in the same environment where I was seeing the issue, but I'll see what I can do.

Thanks Jeff

Still waiting for testing feedback from interested parties.
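
For anyone gathering the debug logs requested in this bug, here is a minimal sketch of a filter that pulls the "failed to mount" lines out of a syslog file and counts them per host and path, which may help show which maps are affected and how often. The line format is assumed from the log excerpts quoted in the comments above; the function name and the regular expression are illustrative only and are not part of autofs.

```python
import re
from collections import Counter

# Pattern assumed from the excerpts quoted in this bug, for example:
# "May 26 18:36:57 del11frd automount[15681]: failed to mount /home/piyushj"
FAILED_MOUNT = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s+[\d:]{8})\s+(?P<host>\S+)\s+"
    r"automount\[(?P<pid>\d+)\]:\s+failed to mount\s+(?P<path>\S+)"
)


def count_failed_mounts(lines):
    """Count 'failed to mount' entries, keyed by (host, path).

    `lines` is any iterable of syslog lines, such as an open file.
    Lines that do not match the assumed format are ignored.
    """
    counts = Counter()
    for line in lines:
        match = FAILED_MOUNT.match(line.strip())
        if match:
            counts[(match.group("host"), match.group("path"))] += 1
    return counts
```

To use it on the debug log configured earlier, pass an open file, e.g. `count_failed_mounts(open("/var/log/debug", errors="replace"))`, and print the result of `.most_common()`. Other messages such as "no host elected" are skipped here on purpose, since they do not carry the mount path.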