Bug 150116

Summary: autofs removed all files in mounted directory!
Product: [Fedora] Fedora
Reporter: Brian J. Murrell <brian>
Component: autofs
Assignee: Chris Feist <cfeist>
Status: CLOSED ERRATA
Severity: high
Priority: medium
Version: 3
CC: cfeist, jay.hilliard, jmoyer
Hardware: i386   
OS: Linux   
Doc Type: Bug Fix
Last Closed: 2006-04-19 21:05:52 UTC

Description Brian J. Murrell 2005-03-02 16:57:21 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.5)
Gecko/20041217 Galeon/1.3.19

Description of problem:
This may be a duplicate of 134399.  That bug was unclear about whether
the dir was being removed before or after it was unmounted.

On a network of machines I have /home/<dirs> being automounted with
the following yp auto.master and auto.home maps (respectively):

# ypcat -k auto.master
/home yp:auto.home
/misc yp:auto.misc

# ypcat -k auto.home
* -rw,soft,intr srv0:/export/home/&

Boy was I surprised to find my home directory (/home/brian) completely
empty at some point yesterday, and then to see any new files I put in
it removed every 60 seconds.

After a thorough investigation I discovered that it was automount,
running on a number (but far from all) of the hosts, that was emptying
my home directory.

From looking at the code it was obvious that umount_multi() was doing
this "cleaning out" in this block of code:

    if (left == 0) {
        if ((!ap.ghost) ||
            (ap.state == ST_SHUTDOWN_PENDING ||
             ap.state == ST_SHUTDOWN))
            rm_unwanted(path, incl, 1);
        else if (ap.ghost && (ap.type == LKP_INDIRECT))
            rm_unwanted(path, 0, 1);
    } 

"left" is assigned just before that block, based on whether the
mounted filesystem was found in /etc/mtab and, if it was, whether it
was successfully unmounted.
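To make the control flow of the quoted block easier to follow, here is a minimal sketch that reduces it to a pure decision function. The names decide_cleanup(), enum cleanup and its values are hypothetical, and only ST_SHUTDOWN_PENDING, ST_SHUTDOWN and LKP_INDIRECT come from the quoted code (ST_INIT appears later in this thread); ST_READY and LKP_DIRECT are placeholders. The meaning given to `left` is assumed from the description above. The point it shows is that `left` is the only guard: whenever it is 0 and the map is not ghosted, the removal branch runs, regardless of whether the NFS filesystem is really gone.

```c
#include <stdbool.h>

/* Hypothetical subset of autofs state and lookup-type values. Only
 * ST_SHUTDOWN_PENDING, ST_SHUTDOWN and LKP_INDIRECT come from the quoted
 * block; the other members are placeholders for this sketch. */
enum ap_state { ST_INIT, ST_READY, ST_SHUTDOWN_PENDING, ST_SHUTDOWN };
enum lkp_type { LKP_INDIRECT, LKP_DIRECT };

/* Which rm_unwanted() call, if any, the quoted block would make. */
enum cleanup {
    CLEANUP_NONE,    /* no rm_unwanted() call */
    CLEANUP_INCL,    /* rm_unwanted(path, incl, 1) */
    CLEANUP_NO_INCL  /* rm_unwanted(path, 0, 1) */
};

/* left:  assumed non-zero when a mount under path was found in
 *        /etc/mtab and could not be unmounted
 * ghost: mirrors ap.ghost */
enum cleanup decide_cleanup(int left, bool ghost,
                            enum ap_state state, enum lkp_type type)
{
    /* The only guard against walking into live data: if left is wrongly
     * computed as 0, the branches below run on a still-mounted tree. */
    if (left != 0)
        return CLEANUP_NONE;

    if (!ghost || state == ST_SHUTDOWN_PENDING || state == ST_SHUTDOWN)
        return CLEANUP_INCL;
    else if (ghost && type == LKP_INDIRECT)
        return CLEANUP_NO_INCL;

    return CLEANUP_NONE;
}
```

For example, decide_cleanup(0, false, ST_READY, LKP_INDIRECT) yields CLEANUP_INCL; nothing among the inputs says whether /home/brian is still mounted.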

When I looked at the mount table on one of the hosts doing the
cleaning, it was clear that /home/brian was indeed still mounted from
the server.  So either the code that enumerates the /etc/mtab file or
the code that tries to do the unmount has a defect that made automount
believe the filesystem was either not mounted or had been successfully
unmounted.

This indeed is a very evil chunk of code if it goes wrong!

I have built and installed the -99 release of the autofs package from
rawhide, along with a safety plug that prevents it from actually
removing any files it finds in the tree.

Version-Release number of selected component (if applicable):
autofs-4.1.3-28

How reproducible:
Couldn't Reproduce

Steps to Reproduce:
Unknown as of yet.

Additional info:

Comment 1 Jeff Moyer 2005-03-02 17:25:49 UTC
I don't see any obvious errors in the code.  To clear things up for
me, do you export the home directories with no_root_squash?  The
automount daemon runs as root, and if the files in your home directory
are owned by you, then the daemon should not be able to unlink them.

Oh, and bug number 134399 does not exist.  Was that a typo?

Thanks!

Comment 2 Brian J. Murrell 2005-03-02 17:32:02 UTC
I didn't see anything obviously wrong either.  It could be some subtle
bug in the mtab parsing, perhaps.  If I see this happen (or attempt to
happen) again, I will put the small mtab-parsing routine in a loop and
see what it's doing.

As for bug 134399:
https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=134399

Comment 3 Jeff Moyer 2005-03-02 17:33:57 UTC
You didn't answer my question about your NFS server setup.

Comment 4 Brian J. Murrell 2005-03-02 18:43:19 UTC
Ah.  Yes.  My apologies:

We do export no_root_squash:

/export/home                    *(rw,sync,no_root_squash)

I think there are requirements for root to be able to write into some
places under /home, if not into other places that we also export and
automount.  So while exporting with root_squash could be a band-aid to
prevent "devastation", so could my short-circuit out of the removal
process.  Both are just workarounds for the bug, of course.

Comment 5 Jeff Moyer 2005-03-02 20:30:16 UTC
The reason for my query was not to try to get around the problem.  I
was trying to make sure that autofs actually was able to do such a
thing in the first place.

Could you enable debugging?  You can do that by adding --debug to the
entry in your auto.master:

/home yp:auto.home --debug

Then, configure syslog to capture debug logs.  You can do this by
simply adding this line to the end of /etc/syslog.conf:

*.*    /var/log/debug

You will, of course, have to either HUP or restart syslogd.  autofs
will need to be restarted as well.

These logs will be necessary if the problem rears its ugly head again.

My initial guess at what is happening is that it is a race between
mount and umount.  Autofs expires a directory, and then another
process accesses the directory while we're doing the post-expiry
cleanup.  I'll look into this further.

Thanks!

Jeff

Oh, and the bugzilla you mentioned above is not the same problem.

Comment 6 Jeff Moyer 2005-04-11 22:19:47 UTC
After inspecting the code, it seems remotely possible that this bug was
caused by a locking issue in autofs.  The autofs locking has changed in
4.1.4, which should resolve this issue.  We will be updating our package
to this version.  When this happens, I will post information to this
bugzilla on where to obtain the package.

Comment 8 Jeff Moyer 2005-07-06 20:46:09 UTC
Have you seen this issue crop up again in your environment?  We are currently
working on some patches that should a) keep autofs from unlinking files, and b)
provide more debugging information if we run into this bug.  Would you be
interested in running a debug version of autofs?

Thanks.

Comment 9 Brian J. Murrell 2005-07-19 16:10:47 UTC
I'm afraid in the environment that this was seen in (and is no longer being seen
for whatever reason) I can't drop debug versions of autofs in.  :-(

Comment 10 Jay Hilliard 2005-08-19 23:45:05 UTC
We've experienced data loss with autofs-4.1.3-131 on RHEL4 U1 x86_64.
strace showed automount calling rmdir("/data/ada83/CHIC/char/.....etc"). It
was walking the path and unlinking files. The mount point in this case was
/data/ada83, over NFSv3.

Here's the NIS automount entry for /data/ADA98
ada83 -rw,intr,hard,timeo=600,nfsvers=3,tcp,rsize=32768,wsize=32768 fa:/panfs/fa/ada83

We've looked at the recent autofs-4.1.3-149.src.rpm and are still concerned
about the following:

         ap.ioctlfd = open(path, O_RDONLY);
         if (ap.ioctlfd < 0) {
                 umount_autofs(1);
                 return -1;
         }

         stat(path, &st);
         ap.dev = st.st_dev;

Here's a case where umount_autofs() is called, which eventually tries
to check whether we're using the same device (via ap.dev), but ap.dev
hasn't been assigned yet.  So it ends up comparing against an
uninitialized variable.
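A minimal defensive sketch of the ordering problem described above, assuming the remedy is simply to record the device number immediately after the open and to track whether it has been recorded. struct mount_ctx, open_mount_point(), same_device() and the dev_valid flag are hypothetical names, not autofs code. It also checks the result of fstat() on the descriptor just opened, whereas the quoted stat() call ignores its return value. Comment 12 notes that the real code does not unmount anything in the ST_INIT state, so this is only a defensive pattern, not a claimed fix.

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical stand-in for the ap.ioctlfd / ap.dev fields. */
struct mount_ctx {
    int ioctlfd;
    dev_t dev;
    int dev_valid; /* 0 until dev has actually been filled in */
};

/* Open the mount point and record its device number before any other
 * code can consult it. Returns 0 on success, -1 on failure; on failure
 * dev_valid stays 0 so cleanup code can tell dev was never set. */
int open_mount_point(struct mount_ctx *ctx, const char *path)
{
    struct stat st;

    ctx->dev_valid = 0;
    ctx->ioctlfd = open(path, O_RDONLY);
    if (ctx->ioctlfd < 0)
        return -1;

    /* fstat() the descriptor we just opened and check the result. */
    if (fstat(ctx->ioctlfd, &st) != 0) {
        close(ctx->ioctlfd);
        ctx->ioctlfd = -1;
        return -1;
    }
    ctx->dev = st.st_dev;
    ctx->dev_valid = 1;
    return 0;
}

/* Cleanup-side check: only trust dev if it was recorded. */
int same_device(const struct mount_ctx *ctx, const char *path)
{
    struct stat st;

    if (!ctx->dev_valid || lstat(path, &st) != 0)
        return 0;
    return st.st_dev == ctx->dev;
}
```

With this shape, an early failure path leaves dev_valid at 0, so same_device() answers "no" instead of comparing against garbage.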

I'm really concerned about the safety of our data before using RHEL4 U1 in our
environment.  Your help is appreciated.

Comment 11 Chris Feist 2005-08-22 16:25:03 UTC
Unfortunately, we've been unable to replicate the problems you've seen.  But we
have removed the code that unlinks files in -149 and beyond.  So if autofs
does get confused, it will error out and should not accidentally start removing files.

I'm investigating the code snippet you found to see if I can cause autofs to fail.

Comment 12 Jeff Moyer 2005-08-22 17:16:51 UTC
In the case mentioned above, state will be equal to ST_INIT.  In this case, we
do not attempt to unmount anything.


Comment 14 Chris Feist 2005-10-07 14:20:48 UTC
Moving bug to MODIFIED state.  (We were unable to replicate, but have modified
the code to prevent accidental deletion of files.)