From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.5) Gecko/20041217 Galeon/1.3.19

Description of problem:
This may be a duplicate of 134399. That bug was unclear about whether the directory was being removed before or after it was unmounted.

On a network of machines I have /home/<dirs> automounted with the following yp auto.master and auto.home maps (respectively):

  # ypcat -k auto.master
  /home yp:auto.home
  /misc yp:auto.misc
  # ypcat -k auto.home
  * -rw,soft,intr srv0:/export/home/&

Boy, was I surprised to find my home directory (/home/brian) completely empty at some point yesterday, and then to see it emptied of any new files I put in it every 60 seconds. After a thorough investigation I discovered that automount on a number (but far from all) of the hosts was emptying my home directory. From looking at the code, it was obvious that umount_multi() was doing this "cleaning out" in this block of code:

  if (left == 0) {
      if ((!ap.ghost) ||
          (ap.state == ST_SHUTDOWN_PENDING ||
           ap.state == ST_SHUTDOWN))
          rm_unwanted(path, incl, 1);
      else if (ap.ghost && (ap.type == LKP_INDIRECT))
          rm_unwanted(path, 0, 1);
  }

"left" is assigned just before that block, depending on whether the filesystem in question was found in /etc/mtab and, if it was, whether the shared filesystem was successfully unmounted. When I looked at the mount table on one of the hosts doing the cleaning, /home/brian was clearly still mounted from the server. So either the code enumerating /etc/mtab or the code attempting the unmount has a defect that made automount believe the filesystem was either not mounted or successfully unmounted. This is a very evil chunk of code if it goes wrong!
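To make the failure mode concrete, the decision in the quoted block can be restated as a small pure function. This is a hypothetical model, not autofs code: the enums and the `cleanup_action()` helper are illustrative stand-ins for `ap.ghost`, `ap.state` and `ap.type`, and the enum values only mimic the autofs names. It shows that once `left` is zero, a non-ghosted map always reaches the first `rm_unwanted()` call, whether or not the filesystem is really still mounted.

```c
/* Hypothetical stand-ins for the autofs fields used in umount_multi(). */
enum ap_state { ST_INIT, ST_READY, ST_EXPIRE, ST_SHUTDOWN_PENDING, ST_SHUTDOWN };
enum lkp_type { LKP_DIRECT, LKP_INDIRECT };

/* Which rm_unwanted() call the quoted block would make. */
enum cleanup {
    CLEANUP_NONE,        /* no removal at all */
    CLEANUP_WITH_INCL,   /* rm_unwanted(path, incl, 1) */
    CLEANUP_NO_INCL      /* rm_unwanted(path, 0, 1) */
};

/* Mirrors the control flow of the quoted block. 'left' is the number of
   mounts umount_multi() believes are still present under 'path'. */
static enum cleanup cleanup_action(int left, int ghost, enum ap_state st,
                                   enum lkp_type type)
{
    if (left != 0)
        return CLEANUP_NONE;
    if (!ghost || st == ST_SHUTDOWN_PENDING || st == ST_SHUTDOWN)
        return CLEANUP_WITH_INCL;
    if (ghost && type == LKP_INDIRECT)
        return CLEANUP_NO_INCL;
    return CLEANUP_NONE;
}
```

Assuming ghosting is off for /home (nothing in the auto.master map shown enables it), `cleanup_action(0, 0, ST_EXPIRE, LKP_INDIRECT)` yields `CLEANUP_WITH_INCL`: a single wrong value of `left` during a routine expire is enough to start removing files.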
I have now built and installed the -99 release of the autofs package from rawhide, along with a safety plug that does not actually remove any files found in the tree.

Version-Release number of selected component (if applicable):
autofs-4.1.3-28

How reproducible:
Couldn't Reproduce

Steps to Reproduce:
Unknown as of yet.

Additional info:
I don't see any obvious errors in the code. To clear things up for me, do you export the home directories with no_root_squash? The automount daemon runs as root, and if the files in your home directory are owned by you, then the daemon should not be able to unlink them. Oh, and bug number 134399 does not exist. Was that a typo? Thanks!
I didn't see anything obviously wrong either. It could be a subtle bug in the mtab parsing. If I see this happen (or attempt to happen) again, I will put the mtab parsing code in a loop and see what it's doing. As for bug 134399: https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=134399
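A loop of the kind described here might look like the following minimal sketch, which uses the glibc `setmntent()`/`getmntent()`/`endmntent()` API. The helper names `mounted_in_table()` and `watch_mount()` are hypothetical, and cross-checking `/etc/mtab` against the kernel's `/proc/mounts` is an assumption added for illustration: on systems of this era `/etc/mtab` was a regular file maintained by mount(8), so the two could disagree.

```c
#define _GNU_SOURCE
#include <mntent.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Returns 1 if 'dir' appears as a mount directory in the mount table file
   'table' (e.g. "/etc/mtab" or "/proc/mounts"), 0 if not, -1 on error. */
static int mounted_in_table(const char *table, const char *dir)
{
    FILE *f = setmntent(table, "r");
    struct mntent *m;
    int found = 0;

    if (!f)
        return -1;
    while ((m = getmntent(f)) != NULL) {
        if (strcmp(m->mnt_dir, dir) == 0) {
            found = 1;
            break;
        }
    }
    endmntent(f);
    return found;
}

/* Hypothetical debugging aid: poll both tables every 'interval' seconds
   and report whenever they disagree about 'dir'. */
static void watch_mount(const char *dir, int iterations, unsigned interval)
{
    for (int i = 0; i < iterations; i++) {
        int in_mtab = mounted_in_table("/etc/mtab", dir);
        int in_proc = mounted_in_table("/proc/mounts", dir);

        if (in_mtab != in_proc)
            fprintf(stderr, "%s: /etc/mtab says %d, /proc/mounts says %d\n",
                    dir, in_mtab, in_proc);
        sleep(interval);
    }
}
```

For example, `watch_mount("/home/brian", 600, 1)` would check once per second for ten minutes.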
You didn't answer my question about your NFS server setup.
Ah, yes. My apologies: we do export with no_root_squash:

  /export/home *(rw,sync,no_root_squash)

I think root needs to be able to write into some places under /home, and possibly into other places that we also export and automount. So while root squashing could be a band-aid to prevent "devastation", so could my short-circuit of the removal process. Both are just workarounds for the bug, of course.
The reason for my query was not to find a way around the problem; I was trying to make sure that autofs actually was able to do such a thing in the first place. Could you enable debugging? You can do that by adding --debug to the entry in your auto.master:

  /home yp:auto.home --debug

Then configure syslog to capture debug logs. You can do this by simply adding this line to the end of /etc/syslog.conf:

  *.* /var/log/debug

You will, of course, have to either HUP or restart syslogd. autofs will need to be restarted as well. These logs will be necessary if the problem rears its ugly head again.

My initial guess is that this is a race between mount and umount: autofs expires a directory, and then another process accesses the directory while we're doing the post-expiry cleanup. I'll look into this further.

Thanks!
Jeff

Oh, and the bugzilla you mentioned above is not the same problem.
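If the mount/umount race Jeff suspects is the cause, one mitigation is to re-check, immediately before any cleanup, whether something has been mounted on the directory again. A common technique (not taken from autofs) is to compare the directory's `st_dev` with that of its parent: a different device means another filesystem is mounted there. This is a minimal sketch with a hypothetical `is_mount_point()` helper. It narrows the window but cannot close it, since a mount can still arrive between the check and the removal, and it does not detect bind mounts from the same filesystem.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <sys/stat.h>

/* Returns 1 if 'path' is a mount point, 0 if it is not, and -1 on error
   or if 'path' is not a directory. A directory whose st_dev differs from
   that of its parent ("path/..") is the root of another filesystem; "/",
   which is its own parent, is treated as a mount point as well. */
static int is_mount_point(const char *path)
{
    char parent[4096];          /* assumed large enough for this sketch */
    struct stat st, pst;

    if (lstat(path, &st) < 0 || !S_ISDIR(st.st_mode))
        return -1;
    if (snprintf(parent, sizeof parent, "%s/..", path) >= (int)sizeof parent)
        return -1;
    if (stat(parent, &pst) < 0)
        return -1;
    if (st.st_dev != pst.st_dev)
        return 1;
    /* Same device: only the root directory is its own parent. */
    return st.st_ino == pst.st_ino;
}
```

A cleanup routine would call this on each directory just before removing it and abort if it returns anything other than 0.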
After inspecting the code, it seems remotely possible that this bug was caused by a locking issue in autofs. The autofs locking has changed in 4.1.4, which should resolve this issue. We will be updating our package to this version; when that happens, I will post to this bugzilla where to obtain the package.
Have you seen this issue crop up again in your environment? We are currently working on some patches that should a) keep autofs from unlinking files, and b) provide more debugging information if we run into this bug. Would you be interested in running a debug version of autofs? Thanks.
I'm afraid in the environment that this was seen in (and is no longer being seen for whatever reason) I can't drop debug versions of autofs in. :-(
We've experienced data loss with autofs-4.1.3-131 on RHEL4 U1 x86_64. strace showed automount doing rmdir("/data/ada83/CHIC/char/.....etc"); it was walking the path and unlinking files. The mount point in this case was /data/ada83, over NFSv3. Here's the NIS automount entry for /data/ADA98:

  ada83 -rw,intr,hard,timeo=600,nfsvers=3,tcp,rsize=32768,wsize=32768 fa:/panfs/fa/ada83

We've looked at the recent autofs-4.1.3-149.src.rpm and are still concerned about the following:

  ap.ioctlfd = open(path, O_RDONLY);
  if (ap.ioctlfd < 0) {
      umount_autofs(1);
      return -1;
  }

  stat(path, &st);
  ap.dev = st.st_dev;

Here's a case where umount_autofs() is called, which eventually checks whether we're using the same device (via ap.dev), but ap.dev hasn't been assigned yet. So it ends up comparing against an uninitialized variable. I'm really concerned about the safety of our data before using RHEL4 U1 in our environment. Your help is appreciated.
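The ordering problem raised here can be avoided by initializing every field before the first early return and taking the device number from the descriptor that was just opened. This is a hypothetical sketch: `struct mount_root`, `open_mount_root()` and `same_device()` are illustrative names, not autofs code, and the `dev_valid` flag is an assumed way of marking `dev` as unset. It also checks the result of the stat call, which the quoted snippet ignores.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical subset of the fields discussed above. */
struct mount_root {
    int ioctlfd;
    dev_t dev;
    int dev_valid;   /* set only once 'dev' holds a real st_dev */
};

/* Opens the mount root and records its device. Every field is set before
   the first early return, so no error path can see garbage. */
static int open_mount_root(struct mount_root *mr, const char *path)
{
    struct stat st;

    mr->ioctlfd = -1;
    mr->dev = 0;
    mr->dev_valid = 0;

    mr->ioctlfd = open(path, O_RDONLY);
    if (mr->ioctlfd < 0)
        return -1;
    /* fstat() on the open descriptor: its result is checked, and it
       describes the directory actually opened, even if the path is
       remounted in between. */
    if (fstat(mr->ioctlfd, &st) < 0) {
        close(mr->ioctlfd);
        mr->ioctlfd = -1;
        return -1;
    }
    mr->dev = st.st_dev;
    mr->dev_valid = 1;
    return 0;
}

/* Cleanup-side check: never treat an unset device as a match. */
static int same_device(const struct mount_root *mr, dev_t other)
{
    return mr->dev_valid && mr->dev == other;
}
```

As a later reply in this bug notes, autofs is in ST_INIT on this path and does not unmount anything, so the sketch shows a defensive pattern rather than a fix for an observed failure.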
Unfortunately, we've been unable to replicate the problems you've seen. However, we have removed the code that unlinks files in -149 and later. So if autofs does get confused, it will error out and should not accidentally start removing files. I'm investigating the code snippet you found to see if I can cause autofs to fail.
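The behavior described here, cleanup that can fail but cannot destroy data, can be sketched as a walk that only ever calls rmdir(). This is a hypothetical illustration using POSIX nftw(), not the actual -149 change; `remove_empty_dirs()` and `rmdir_only()` are made-up names. rmdir() refuses non-empty directories (ENOTEMPTY) and directories with something mounted on them (EBUSY), so a confused daemon reports an error instead of deleting files.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <ftw.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

static int cleanup_failed;

/* nftw() callback: remove directories only, never files. */
static int rmdir_only(const char *path, const struct stat *sb,
                      int typeflag, struct FTW *ftwbuf)
{
    (void)sb;
    (void)ftwbuf;
    if (typeflag == FTW_DP) {
        if (rmdir(path) < 0) {
            fprintf(stderr, "cleanup: rmdir %s: %s\n", path, strerror(errno));
            cleanup_failed = 1;
        }
    } else {
        /* A file, symlink or unreadable entry: something is still here,
           so report it and leave it alone instead of unlinking it. */
        fprintf(stderr, "cleanup: refusing to remove %s\n", path);
        cleanup_failed = 1;
    }
    return 0;   /* keep walking so every problem gets reported */
}

/* Removes the empty directory skeleton rooted at 'top', including 'top'
   itself, depth first and without crossing into other filesystems.
   Returns 0 if everything was removed, -1 if anything was left behind. */
static int remove_empty_dirs(const char *top)
{
    cleanup_failed = 0;
    if (nftw(top, rmdir_only, 16, FTW_DEPTH | FTW_PHYS | FTW_MOUNT) < 0)
        return -1;
    return cleanup_failed ? -1 : 0;
}
```

With FTW_MOUNT, a filesystem mounted below `top` is skipped entirely; its parent directory then stays non-empty, so the walk fails loudly rather than descending into it.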
In the case mentioned above, state will be equal to ST_INIT. In this case, we do not attempt to unmount anything.
Moving bug to MODIFIED state. (We were unable to replicate, but have modified the code to prevent accidental deletion of files.)