Bug 493223 - Mounts aren't being auto expired correctly.
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: autofs
Version: 5.3
Hardware: All
OS: Linux
Priority: low
Severity: medium
Target Milestone: rc
Assigned To: Ian Kent
QA Contact: BaseOS QE
Depends On:
Blocks:
 
Reported: 2009-03-31 19:41 EDT by Ray Van Dolson
Modified: 2010-10-23 04:44 EDT
CC List: 8 users

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Previously, a stack variable was not initialized on entry to the create_udp_client() or create_tcp_client() functions. During an error exit, the stack variable was checked, and the corresponding file descriptor was closed if the variable had a value other than -1. This could result in incorrectly closing a file descriptor still in use. The stack variable is now initialized and descriptors currently in use should not be closed.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2009-09-02 08:00:01 EDT
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:


Attachments
autofs log (766.75 KB, application/x-gzip) -- 2009-04-03 01:35 EDT, Ray Van Dolson
Contents of /proc/mounts (16.54 KB, text/plain) -- 2009-04-04 10:03 EDT, Ray Van Dolson
lsof | grep automount (3.93 KB, text/plain) -- 2009-04-06 13:29 EDT, Ray Van Dolson
automounter logs and mounts (199.83 KB, application/octet-stream) -- 2009-04-08 17:25 EDT, Ray Van Dolson
Logs from second run against autofs-5.0.1-0.rc2.102.bz493233.2 (1000.93 KB, application/octet-stream) -- 2009-04-09 19:26 EDT, Ray Van Dolson
Log from deadline at line 77 in cache.c (4.50 MB, text/plain) -- 2009-04-10 17:03 EDT, Ray Van Dolson
Logs from over the weekend from autofs-5.0.1-0.rc2.102.bz493233.4 (2.31 MB, application/octet-stream) -- 2009-04-13 12:04 EDT, Ray Van Dolson
Logs from autofs-5.0.1-0.rc2.102.bz493233.4 and 2.6.18-128.1.1.el5.bz493223.1xen (2.68 MB, application/octet-stream) -- 2009-04-13 18:23 EDT, Ray Van Dolson
Logs from autofs-5.0.1-0.rc2.102.bz493233.5 and 2.6.18-128.1.1.el5.bz493223.1xen (8.46 MB, application/octet-stream) -- 2009-04-14 19:42 EDT, Ray Van Dolson

Description Ray Van Dolson 2009-03-31 19:41:49 EDT
Description of problem:
Some mounts (NFS) are not being correctly expired despite no open files under them (they can all be manually unmounted).


Version-Release number of selected component (if applicable):
autofs-5.0.1-0.rc2.102

How reproducible:
Always so far (on a sufficiently busy system)


Steps to Reproduce:
1. Set up automounter for /net
2. Have many clients spawn connections to various /net resources
3. Have clients release these resources periodically
  
Actual results:
Some mountpoints are not expired and we see the following:

Mar 31 16:26:37 tashi2 automount[7934]: expiring path /net/blade
Mar 31 16:26:37 tashi2 automount[7934]: some offset mounts still present under /net/blade
Mar 31 16:26:37 tashi2 automount[7934]: couldn't complete expire of /net/blade

repeatedly

Expected results:
Unused mountpoints should be unmounted.

Additional info:
This is happening on a machine we're using as a Samba server to re-export, via Samba, NFS shares from other, older Unix machines.

I have the following set in /etc/sysconfig/autofs:

TIMEOUT=60
UMOUNT_WAIT=12

Things work well for some period of time, but then I'll start seeing more and more entries in /proc/mounts just hanging around, and entries in the logs like the above.

Initially I'll see -hosts entries in /proc/mounts for /net/blade/blade1, etc and can unmount those manually, but still the automatic attempt to unmount /net/blade fails.

I've seen a few mailing list posts [1] suggesting that this might be fixed upstream?

Any advice on this?  Current solution is to periodically shutdown Samba, automounter and clean things up by hand.  I've really only seen this happen on a server with a lot of clients hitting Samba shares and thus opening a lot of automounter connections.

[1] http://www.mail-archive.com/autofs@linux.kernel.org/msg07685.html
Comment 1 Ian Kent 2009-03-31 21:40:10 EDT
(In reply to comment #0)
> Description of problem:
> Some mounts (NFS) are not being correctly expired despite no open files under
> them (they can all be manually unmounted).
> 
> 
> Version-Release number of selected component (if applicable):
> autofs-5.0.1-0.rc2.102

Are you using the RHEL-5.3 kernel, revision 128?

snip...

> I've seen a few mailing lists posts[1] alluding to the fact that this might be
> fixed upstream?

Not if you're using kernel rev 128.

> 
> Any advice on this?  Current solution is to periodically shutdown Samba,
> automounter and clean things up by hand.  I've really only seen this happen on
> a server with a lot of clients hitting Samba shares and thus opening a lot of
> automounter connections.

We will need to get a debug log.
Make sure you're sending daemon.* to whatever log you use for this.

Ian
Comment 2 Ray Van Dolson 2009-04-01 13:46:48 EDT
(In reply to comment #1)
> (In reply to comment #0)
> > Description of problem:
> > Some mounts (NFS) are not being correctly expired despite no open
> > files under them (they can all be manually unmounted).
> > 
> > 
> > Version-Release number of selected component (if applicable):
> > autofs-5.0.1-0.rc2.102
> 
> Are you using the RHEL-5.3 kernel, revision 128?

Yes.

$ uname -r
2.6.18-128.1.1.el5xen

> 
> snip...
> 
> > I've seen a few mailing lists posts[1] alluding to the fact that this
> > might be fixed upstream?
> 
> Not if your using kernel rev 128.
> 
> > 
> > Any advice on this?  Current solution is to periodically shutdown
> > Samba, automounter and clean things up by hand.  I've really only
> > seen this happen on a server with a lot of clients hitting Samba
> > shares and thus opening a lot of automounter connections.
> 
> We will need to get a debug log.  Make sure you're sending daemon.* to
> whatever log you use for this.

I'll work on getting this to you guys.  Also have opened an SR for this
(1908272).  I'd love to get this issue quashed before we switch back to
Solaris for our CIFS/NFS gateway box.

Thanks,
Ray
Comment 3 Ray Van Dolson 2009-04-02 14:00:03 EDT
Ian, are you able to look at attachments on SR's?  I uploaded my automount debug log replete with many "offset mounts still present" errors to SR 1908272.

The obvious question that comes to mind is: are there file handles still open on the offset mounts?  My thought is no, based on the following observations:

Given path /net/servername

- smbstatus | grep servername returns nothing
- fuser and lsof | grep servername return nothing
- I can umount /net/servername/offset1 just fine
- At this point autofs still cannot clear the mount out
- However I can umount /net/servername just fine manually

I should also note that automount ran for about 6-7 hours without generating any of these errors (you can see this reflected in the log).  Once the first error showed up however, they continued on pretty much unabated.

Let me know what else I can provide for you.
Comment 4 Ian Kent 2009-04-02 21:20:37 EDT
(In reply to comment #3)
> Ian, are you able to look at attachments on SR's?  I uploaded my automount
> debug log replete with many "offset mounts still present" errors to SR 1908272.

No.

Normally bugs get logged as a result of an SR proving to be worthy
of escalation to engineering. I can't see SRs at all and that's the
way it is supposed to be because of the verification and
evaluation process prior to escalation.

However, I'm keen to see your debug log, so post it here.

Ian
Comment 5 Ray Van Dolson 2009-04-03 01:34:32 EDT
Understood.  I was a bit worried about privacy, but I don't think there's anything particularly critical in this log.

Attaching it now.
Comment 6 Ray Van Dolson 2009-04-03 01:35:41 EDT
Created attachment 337978 [details]
autofs log
Comment 7 Ian Kent 2009-04-04 04:23:01 EDT
What is in /proc/mounts when this is happening?
Comment 8 Ray Van Dolson 2009-04-04 10:03:10 EDT
Created attachment 338168 [details]
Contents of /proc/mounts

Here is /proc/mounts as of "now".  This post-dates the syslog entries by a full day obviously, but the log entries haven't ceased showing up since I sent you that log and likely won't until I bounce automount.

I will note that the *functionality* of the server has remained intact this time around.  In the past, it seems as though automount stopped responding completely and wouldn't create new mounts.  This time, though I see the errors, this hasn't occurred.  Makes me wonder if the extra debug logging going on is creating enough of a delay to mask some timing or race issue.  Anyways, just speculation.

Thanks.
Comment 9 Ian Kent 2009-04-04 23:44:25 EDT
(In reply to comment #8)
> Created an attachment (id=338168) [details]
> Contents of /proc/mounts
> 
> Here is /proc/mounts as of "now".  This post-dates the syslog entries by a full
> day obviously, but the log entries haven't ceased showing up since I sent you
> that log and likely won't until I bounce automount.

Right, but I just wanted to see if the mounts generating those
errors actually had the mounts on them that appear in the expire
messages. They do, and the mount list looks as it should.

> 
> I will note that the *functionality* of the server has remained intact this
> time around.  In the past, it seems as though automount stopped responding
> completely and wouldn't create new mounts.  This time, though I see the errors,
> this hasn't occurred.  Makes me wonder if the extra debug logging going on is
> creating enough of a delay to mask some timing or race issue.  Anyways, just
> speculation.

I looked at the code, based on what's happening in the log, all day
yesterday and, while there are a couple of slightly suspect things, the
timing and order of execution looks like there shouldn't be a race.

But, in the log, this is distinctly wrong:
automount[15407]: send_ready: token = 61014
automount[15407]: send_ready:604: AUTOFS_IOC_READY: error Bad file descriptor

The file descriptor here should have been opened as part of the
mount request, unless the mount on top of the offset was manually
umounted, in which case the existing descriptor is re-used. I can't
find anywhere that closes the descriptor without clearing it first.
If the file descriptor is somehow incorrect that could cause something
like this. It may well be coincidence that we only see this once and
the other incorrect descriptors are actually valid but just not opened
against the correct offset trigger. That's a really long shot and
little more than a guess because I really can't see how it could
happen.

At this stage, all I can do is make a couple of changes to the bits
that might not be quite right and have you use that to collect another
log.

Ian
Comment 10 Ian Kent 2009-04-05 00:34:53 EDT
Oh .. and before you restart autofs could you post the output
from "lsof|grep automount" please.
Comment 11 Ray Van Dolson 2009-04-06 13:25:31 EDT
(In reply to comment #10)
> Oh .. and before you restart autofs could you post the output
> from "lsof|grep automount" please.  

Well, this could shed some light on things.  lsof doesn't complete, and sure
enough it's waiting for NFS mounts that have "gone away":

(strace of lsof):

write(5, "/net/karatachi/karatachi4\0", 26) = 26
write(5, "`\0\0\0", 4)                  = 4
read(6, 0xbffab22c, 4)                  = ? ERESTARTSYS (To be restarted)
--- SIGALRM (Alarm clock) @ 0 (0) ---
alarm(0)                                = 0
rt_sigaction(SIGALRM, {SIG_DFL, [ALRM], SA_RESTART}, {0x8058070, [ALRM], SA_RESTART}, 8) = 0
close(5)                                = 0
close(6)                                = 0
rt_sigaction(SIGALRM, {0x8058070, [ALRM], SA_RESTART}, {SIG_DFL, [ALRM], SA_RESTART}, 8) = 0
alarm(5)                                = 0
wait4(-1, 

Unfortunately I don't know the timing here, but I imagine this is a somewhat
typical occurrence as many of these mounts are end user machines and I can't
always guarantee they'll be available.

It's also one of the reasons I set the timeout to 60 seconds in the hopes that
these sorts of stale mounts would be avoided better.

So maybe all the weirdness I'm seeing is coming as the result of standard
hardmount behavior.  Or maybe it's just ancillary...

Would the automounter get hung up on any hard mounted NFS filesystems as it
cycles through looking for things to auto clean up?  I'd love to be able to
tell it to "force" or even lazy unmount hung filesystems like this (they're
all mounted with the "intr" flag)

Anyways, guess I'll go through and force unmount all of these directories and
see if I can get lsof and df to work again.
Comment 12 Ray Van Dolson 2009-04-06 13:29:28 EDT
Created attachment 338365 [details]
lsof | grep automount

Turns out there was only the one hung NFS mount.  I lazy-unmounted it and lsof | grep automount completed.
Comment 13 Ian Kent 2009-04-06 22:54:30 EDT
(In reply to comment #12)
> Created an attachment (id=338365) [details]
> lsof | grep automount
> 
> Turns out there was only the one hung NFS mount.  lazy unmounted it and lsof |
> grep automount completed.  

In the listing of /proc/mounts above there are more than 40
apparently active NFS mounts but the lsof output shows only
12 (and that file handle for / doesn't look right either).
Given that this was taken at a different time does the number
of file handles in the lsof output match what /proc/mounts
shows as active NFS mounts?
Comment 14 Ian Kent 2009-04-06 23:25:57 EDT
(In reply to comment #11)
> 
> Unfortunately I don't know the timing here, but I imagine this is a somewhat
> typical occurrence as many of these mounts are end user machines and I can't
> always guarantee they'll be available.

Hahaha, been there done that!

> 
> It's also one of the reasons I set the timeout to 60 seconds in the hopes that
> these sorts of stale mounts would be avoided better.
> 
> So maybe all the weirdness I'm seeing is coming as the result of standard
> hardmount behavior.  Or maybe it's just ancillary...
> 
> Would the automounter get hung up on any hard mounted NFS filesystems as it
> cycles through looking for things to auto clean up?  I'd love to be able to
> tell it to "force" or even lazy unmount hung filesystems like this (they're
> all mounted with the "intr" flag)

That's not entirely straightforward to answer.
It depends a lot on mount(8) and the kernel.

If umount(8) doesn't hang when umounting these then they should
expire just fine with the version of autofs in 5.3. The issue of
umount(8) changing behaviour from time to time is a concern for
autofs but I haven't done anything about it because, in RHEL,
I've not seen it as a problem but I could be mistaken.

Similarly, attempting to mount from a server that has gone away
should fail fairly quickly on RHEL but this behaviour also changes
over time with different kernels and distributions.
  
In earlier RHEL releases autofs did have a problem with mounts
to servers that had gone away because, during the expire, it
used a couple of system calls that required contacting the server
and they would hang, then fail, essentially causing expiration to
stop working for that automount.

The other problem we see from time to time is mount and umount
not returning the correct status under some conditions which
can cause autofs to get a little confused.
Comment 15 Ian Kent 2009-04-07 06:49:59 EDT
(In reply to comment #14)
> (In reply to comment #11)
> > 
> > Unfortunately I don't know the timing here, but I imagine this is a somewhat
> > typical occurrence as many of these mounts are end user machines and I can't
> > always guarantee they'll be available.
> 
> Hahaha, been there done that!
> 
> > 
> > It's also one of the reasons I set the timeout to 60 seconds in the hopes that
> > these sorts of stale mounts would be avoided better.
> > 
> > So maybe all the weirdness I'm seeing is coming as the result of standard
> > hardmount behavior.  Or maybe it's just ancillary...
> > 
> > Would the automounter get hung up on any hard mounted NFS filesystems as it
> > cycles through looking for things to auto clean up?  I'd love to be able to
> > tell it to "force" or even lazy unmount hung filesystems like this (they're
> > all mounted with the "intr" flag)
> 
> That's not an entirely straight forward to answer.
> It depends a lot on mount(8) and the kernel.
> 
> If umount(8) doesn't hang when umounting these then they should
> expire just fine with the version of autofs in 5.3. The issue of
> umount(8) changing behaviour from time to time is a concern for
> autofs but I haven't done anything about it because, in RHEL,
> I've not seen it as a problem but I could be mistaken.
> 
> Similarly, attempting to mount from a server that has gone away
> should fail fairly quickly on RHEL but this behaviour also changes
> over time with different kernels and distributions.
> 
> In earlier RHEL releases autofs did have a problem with mounts
> to servers that had gone away because, during the expire, it
> used a couple of system calls that required contacting the server
> and they would hang, then fail, essentially causing expiration to
> stop working for that automount.
> 
> The other problem we see from time to time is mount and umount
> not returning the correct status under some conditions which
> can cause autofs to get a little confused.  

If you're willing to persevere with this a little longer I would
like to try and find out if the ioctl file descriptor is being
incorrectly set or cleared during mounts or expires. We can do
this by adding a couple of checks and logging the result in a
couple of strategic locations. I can make a scratch build that
will do this if you can test it.

Sorry for this hassle, Ian
Comment 16 Ray Van Dolson 2009-04-07 16:42:16 EDT
Absolutely, more than happy to test.  I'm leaving our production Samba server running on RHEL 5.3 a bit longer than planned specifically to try and get this issue tracked down.  It's the only reason we're still tied to Solaris for our Samba/NFS server needs.

Point me to the scratch build and I'll be happy to give it a try.

I could potentially write a cron job to make copies of /proc/mounts every 5 minutes or so, so that the next debug log would let you see how the mount table looked at a given time.  Would that be useful?
Comment 17 Ray Van Dolson 2009-04-07 17:09:59 EDT
Also, I might be able to arrange to get you remote access to this box if you think it'd make things any easier.
Comment 18 Ian Kent 2009-04-07 23:57:36 EDT
(In reply to comment #16)
> Absolutely, more than happy to test.  I'm leaving our production Samba server
> running on RHEL 5.3 a bit longer than planned specifically to try and get this
> issue tracked down.  It's the only reason we're still tied to Solaris for our
> Samba/NFS server needs.
> 
> Point me to the scratch build and I'll be happy to give it a try.

I have put a scratch build with a couple of patches at:
http://people.redhat.com/~ikent/autofs-5.0.1-0.rc2.102.bz493223.1

The package has two additional patches.

One addresses the only issue with ioctl fd corruption I've been
able to find so far. Unfortunately, your setup doesn't use this
code at all, so that's not our problem. I added it anyway, for
completeness.

The second patch makes some changes to code that is used by your
setup. It changes the locking (a little) during mounts and expires
and adds a check and log message for incorrectly setting or
clearing the ioctl fd needed by active mounts.

Hopefully I haven't made any mistakes, ;)

> 
> I could potentially write a cron job to make copies of /proc/mounts every 5
> minutes or so so the next debug log would let you see how the mount table
> looked at a given time.  Would that be useful?  

Maybe.

At this point it's critical to identify what is causing this as I
can't see any reason for it in the code that is being executed,
according to the log output we have anyway.

Right now I'm just trying to establish that we have the situation
where there are real mounts (the NFS ones) shown in /proc/mounts
that are mounted on top of corresponding autofs mounts (offset or
trigger mounts) that do not have an open ioctl fd (which we need
for expires). That was the reason I wanted to compare /proc/mounts
with the lsof output. From the log we know for sure that, internally,
autofs thinks it has an open ioctl fd for these mounts but the kernel
module sees the mount as not busy and this is the cause of those
"some offset mounts still present under <path>" messages. So we need
to confirm that, in fact, the ioctl fds are being incorrectly closed.

To answer the question above, we only need to see one entry in
/proc/mounts that has an active NFS mount (above an autofs offset
mount) that has no open ioctl fd to confirm this.

The second patch above should tell us more about where this may be
happening (or not happening).

Ian
Comment 19 Ian Kent 2009-04-07 23:59:03 EDT
(In reply to comment #17)
> Also, I might be able to arrange to get you remote access to this box if you
> think it'd make things any easier.  

Thanks very much for the offer.
That may well be useful, but for now let's run the test package
and see what information we can get from it.

Ian
Comment 20 Ray Van Dolson 2009-04-08 13:41:45 EDT
Alright, scratch build of autofs is installed and running.  I'll keep an eye
on things to see when and if the errors we saw before begin showing up
(usually takes a few hours).

Ray
Comment 21 Ray Van Dolson 2009-04-08 17:25:01 EDT
Created attachment 338805 [details]
automounter logs and mounts

Well, this actually popped back up again within 2 hours or so.  The first
error shows up at 12:17:42 regarding mescalero and continues on.

lsof | grep automount doesn't show any reference to mescalero.

(See lsof.txt)

I included a log of /proc/mounts entries, but it might have been more useful to
keep track of lsof | grep automount to see when mescalero disappeared as an
open file handle.

Ray
Comment 22 Ray Van Dolson 2009-04-08 17:58:18 EDT
So maybe you can help me explain how autofs works a bit.

I observed the following behavior just now:

  - Complaint about inability to clean up submount /net/mescalero
    - lsof showed no references to this
    - df showed it as mounted
    - it was in /proc/mounts
    - I umount'd /net/mescalero/mescalero1
    - Error messages stopped

However, next:

  - Complaint about /net/cjmtksol
    - lsof shows no references
    - df showed it as mounted
    - it was in /proc/mounts
    - I umount'd /net/cjmtksol/cjmtksol1
    - Error messages... continued!
    - No longer in df
    - No longer in /proc/mounts

Does autofs cache mount information?  I can't figure out why it would try and
unmount /net/cjmtksol to begin with since it's not even listed in /proc/mounts
:)
Comment 23 Ian Kent 2009-04-09 02:38:59 EDT
(In reply to comment #22)
> So maybe you can help me explain how autofs works a bit.

Sure.

> 
> I observed the following behavior just now:
> 
>   - Complaint about inability to clean up submount /net/mescalero
>     - lsof showed no references to this
>     - df showed it as mounted
>     - it was in /proc/mounts
>     - I umount'd /net/mescalero/mescalero1
>     - Error messages stopped
> 
> However, next:
> 
>   - Complaint about /net/cjmtksol
>     - lsof shows now references
>     - df showed it as mounted
>     - it was in /proc/mounts
>     - I umount'd /net/cjmtksol/cjmtksol1
>     - Error messages... continued!
>     - No longer in df
>     - No longer in /proc/mounts

Though, this doesn't appear to correlate to the log you posted.

> 
> Does autofs cache mount information?  I can't figure out why it would try and
> unmount /net/cjmtksol to begin with since it's not even listed in /proc/mounts
> :)  

I'll explain the way this works but leave out some of the detail related
to the nesting of mounts for the purpose of clarity. The post will be
long enough as it is and we don't have multiple levels of nesting with
the cases we're looking at here anyway. I'll also remove the extraneous
quote characters that you would see in the log, for the same reason.
 
When we get a multi-mount map entry (as we do when we grab the exports
list for a -hosts lookup) like:

mescalero /mescalero2 mescalero:/mescalero2 \
          /mescalero1 mescalero:/mescalero1

Two internal autofs mounts (they show up as mounts with the offset
option in /proc/mounts) are done for /net/mescalero/mescalero2
and /net/mescalero/mescalero1 to serve as triggers for the real
mounts they correspond to.

When a mount is triggered it is mounted on top of one of these
offsets but before that is done a file handle is opened on the
offset so we can send expire ioctl requests to it (obviously we
can't use open() once we have a mount on top of the offset;
although that is changing, it's that way now, so we have to
live with it). This file handle is recorded for the mount
internally by autofs.

Once the mount successfully expires the file handle is closed
and cleared for the offset.
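
As a rough sketch in C (illustrative only, not the actual daemon source;
the struct and helper names here are made up, and only the
open/ioctl/close pattern reflects the scheme described above), the
lifecycle looks something like this:

#include <fcntl.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/auto_fs4.h>     /* AUTOFS_IOC_EXPIRE_MULTI (assumed header) */

struct offset_mount {
        const char *path;       /* e.g. /net/mescalero/mescalero1 */
        int ioctlfd;            /* -1 while nothing is mounted on top */
};

/* Before the real NFS mount goes on top of the offset, open a
 * descriptor on the offset itself so expire requests can still be
 * sent to it once it is covered. */
static int trigger_mount(struct offset_mount *om)
{
        om->ioctlfd = open(om->path, O_RDONLY);
        if (om->ioctlfd < 0)
                return -1;
        /* ... mount the real filesystem on top of om->path here ... */
        return 0;
}

/* When the covering mount expires, the descriptor is closed and
 * cleared so the offset is marked as no longer covered. */
static int expire_offset(struct offset_mount *om, int how)
{
        if (om->ioctlfd == -1)
                return 0;       /* nothing mounted on top */
        if (ioctl(om->ioctlfd, AUTOFS_IOC_EXPIRE_MULTI, &how) < 0)
                return -1;      /* still busy, try again later */
        close(om->ioctlfd);
        om->ioctlfd = -1;
        return 0;
}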

This shows the first problem we have with this scheme, that being
the handling of manual umounts. To check and correct this out-of-sync
situation we have two options: one is to use a check that
involves statfs() and the other is to scan the mount table. Using
statfs() will hang if a mount is actually present and the remote
machine has gone away. But, scanning the /proc/mounts mount table
often can bring autofs to a halt if very large maps are being
used so it is avoided where possible. Scanning /etc/mtab is much
faster but we have seen endless problems with the mtab locking
done by mount in the past, which can result in mtab becoming
corrupt so we try to avoid that as much as possible as well.

Note that there is no check to do this re-sync at all in the
expire code that is used here so I need to do something about
that now, see below.

Second note, work is in progress to allow us to do this "is it
mounted" check without scanning any mount table. That kernel
code went into mainline recently and is posted for inclusion
in RHEL-5.4. But that functionality is just a small part of
the problem that submission is meant to resolve, so increased
use of this low-overhead mounted check that's now
available will be introduced over time.

Note three, the handling of manual umounts gets much more
complex with multi-mounts with multiple levels and nesting
but let's leave that discussion for another day since we don't
appear to have that problem here.

So the bottom line for this first issue is that autofs handles
manual umount of multi-mount entries poorly but we will improve
that situation here and now.

The second thing that has been difficult to handle is the case
where the multi-mount has no real mount at the root, such as the
one above. For example, if the mount above had a real mount at
its root it might look something like:

mescalero /  mescalero:/mescalero0 \
          /mescalero2 mescalero:/mescalero2 \
          /mescalero1 mescalero:/mescalero1

When there is no real mount at the root of the multi-mount the
last thing we still need to do is umount any remaining offset
mount triggers. To do this we calculate a suitable maximum
number of times to send an expire ioctl to the top of the
multi-mount, in this case to /net/mescalero. Estimating how
many times to send an expire is complicated because, at some
point the 2.6 kernel started to have a latency between the
closing of a file handle and the mount being seen as no longer
busy. So the number of times we call the ioctl (with small
delays in between) is increased by an arbitrary amount that
has shown to be effective in getting rid of mounts that really
aren't busy any more. This actually works quite well even
though it sounds a bit odd and is the mechanism that has been
used effectively in version 4 for ages.

But there's a gotcha: when we do this final expire on the
multi-mount we can't afford to allow the expire to complete
if it looks like there are busy mounts. If we do, it ultimately
leads to partial expires and things go downhill rapidly
from there to the point where it isn't possible to recover
at all.

The "couldn't complete expire of /net/<host name>" message
results when an ioctl file handle is not -1, meaning there
is an alleged active mount, when we are trying to complete
the final expire.
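
Continuing the sketch above (again illustrative; the helper names are
invented and EXPIRE_RETRIES_PAD is just a placeholder for the arbitrary
padding mentioned earlier), the final expire pass looks roughly like this:

#define EXPIRE_RETRIES_PAD 10           /* arbitrary extra attempts, see above */

static int expire_multi_mount(int top_ioctlfd, struct offset_mount *offsets,
                              int noffsets, int how)
{
        /* Padded upper bound to ride out the latency between a file
         * handle being closed and the kernel seeing the mount as no
         * longer busy. */
        int retries = noffsets + EXPIRE_RETRIES_PAD;

        while (retries--) {
                /* Ask the kernel to expire what it can under the top
                 * of the multi-mount (e.g. /net/mescalero). */
                ioctl(top_ioctlfd, AUTOFS_IOC_EXPIRE_MULTI, &how);
                usleep(100000);         /* small delay between attempts */
        }

        /* If any offset still has an ioctl fd recorded there is an
         * alleged active mount, so the final expire must not be
         * allowed to complete. */
        for (int i = 0; i < noffsets; i++) {
                if (offsets[i].ioctlfd != -1) {
                        /* "couldn't complete expire of /net/<host name>" */
                        return -1;
                }
        }
        return 0;
}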

Bottom line, autofs handles manual umount of mounts poorly
and has a problem of sensitivity to inaccurate returns from
mount and umount.

But, based on our discussion here, adding an "is it mounted" check
to the file handle check during this final expire would be
quite effective, as the overhead should be low because it
would be called as an exception rather than the rule. So that
will be more effective than any other mechanism we could use
to recover from a manual umount in a good number of cases, the
case here for a start.

Finally, going through the log you posted, we don't see any
evidence of the log messages I added. These were placed to
catch cases where autofs incorrectly set or cleared a file
handle, specifically assuming that the return of the mount
and umount operation was accurate.

So let me add some logging to check if we are getting inaccurate
returns from the mount and umount operations and also have a go
at the fix to recover from manual umounts I described.

It's interesting to note that if we aren't getting occasional
bad returns from the mounts and umounts then suddenly things
get much harder. This is because there really aren't other
places I have been able to track down where the file handle
is changed in the code that is being executed for the setup
we have here.

Ian
Comment 24 Ian Kent 2009-04-09 04:43:48 EDT
(In reply to comment #23)
> So let me add some logging to check if we are getting inaccurate
> returns from the mount and umount operations and also have a go
> at the fix to recover from manual umounts I described.
> 

Arrgh!

I've been here before!
This method for manual umount recovery can't work.
This check needs to be done much earlier in the process.
I'll need to come up with another way to do it.
Comment 25 Ian Kent 2009-04-09 06:46:42 EDT
I have put another scratch build:
http://people.redhat.com/~ikent/autofs-5.0.1-0.rc2.102.bz493223.2

The package has two additional patches.

This time the first one attempts to address the manual umount issue.

The second patch now adds some debug logging to check for incorrect
returns from mount and umount as well as the previous checks.

Hopefully I haven't made any mistakes this time either, ;)
Comment 26 Ray Van Dolson 2009-04-09 10:36:52 EDT
(In reply to comment #23)
> Though, this doesn't appear to correlate to the log you posted.

True, this was based on observations I made after submitting the updated log to you... :-)

> snip

Thanks for the very informative reply.

I've updated autofs on our system (as of a few hours ago) and will send back results shortly no doubt.
Comment 27 Ray Van Dolson 2009-04-09 19:26:31 EDT
Created attachment 339019 [details]
Logs from second run against autofs-5.0.1-0.rc2.102.bz493233.2

Well, this time it managed to make it nearly the entire day without an error showing up.  One finally did -- at 15:59:36 local time (obiwan).

You'll note though that there are several errors for "hep" (these are still ongoing).  I included output from a cronjob that not only outputs /proc/mounts but also lsof... you can see that hep was present in the lsof output from 1555 and 1600 localtime, but not at 1605 or later, yet we see offset mount errors in the log from 1602 onwards (there are entries for hep in /proc/mounts however).

Anyways, let me know what you think!
Comment 28 Ian Kent 2009-04-10 02:09:52 EDT
(In reply to comment #27)
> Created an attachment (id=339019) [details]
> Logs from second run against autofs-5.0.1-0.rc2.102.bz493233.2
> 
> Well, this time it managed to make it nearly the entire day without an error
> showing up.  One finally did -- at 15:59:36 local time (obiwan).
> 
> You'll note though that there are several errors for "hep" (these are still
> ongoin).  I included output from a cronjob that not only outputs /proc/mounts
> but also lsof... you can see that hep was present in the lsof output from 1555
> and 1600 localtime, but not at 1605 or later, yet we see offset mount errors in
> the log from 1602 onwards (there are entries for hep in /proc/mounts however).
> 
> Anyways, let me know what you think!  

I think the obiwan problem is related to the bits I added to detect
manual umounts.

I have added more debug prints to a third debug revision of autofs.
Same place as the others, but with a 3 on the end of the directory.

This is puzzling, that's for sure.

Ian
Comment 29 Ray Van Dolson 2009-04-10 17:03:10 EDT
Created attachment 339134 [details]
Log from deadline at line 77 in cache.c

automount crashed with the following:

Apr 10 13:16:29 tashi2 automount[25065]: mapent cache rwlock lock failed
Apr 10 13:16:29 tashi2 automount[25065]: deadlock detected at line 77 in cache.c, dumping core.

(full log attached)
Comment 30 Ray Van Dolson 2009-04-10 17:08:16 EDT
(gdb) bt
#0  0x0043a402 in __kernel_vsyscall ()
#1  0x00a25a01 in raise () from /lib/i686/nosegneg/libpthread.so.0
#2  0x006a6fe4 in dump_core () at spawn.c:49
#3  0x006adba3 in cache_writelock (mc=0x8558e98) at cache.c:77
#4  0x006a135e in expire_proc_indirect (arg=0x85769b8) at indirect.c:474
#5  0x00a1d4d2 in start_thread () from /lib/i686/nosegneg/libpthread.so.0
#6  0x00b7c48e in clone () from /lib/i686/nosegneg/libc.so.6

(gdb) bt full
#0  0x0043a402 in __kernel_vsyscall ()
No symbol table info available.
#1  0x00a25a01 in raise () from /lib/i686/nosegneg/libpthread.so.0
No symbol table info available.
#2  0x006a6fe4 in dump_core () at spawn.c:49
	segv = {__val = {1024, 0 <repeats 31 times>}}
#3  0x006adba3 in cache_writelock (mc=0x8558e98) at cache.c:77
	status = 35
#4  0x006a135e in expire_proc_indirect (arg=0x85769b8) at indirect.c:474
	ind_key = <value optimized out>
	ret = <value optimized out>
	ap = (struct autofs_point *) 0x8558d88
	me = <value optimized out>
	mnts = <value optimized out>
	next = (struct mnt_list *) 0x857c0f8
	ec = {mutex = {__data = {__lock = 139958264, __count = 0, __owner = 0, __kind = 0, __nusers = 139958256, {__spins = 0, __list = {__next = 0x0}}}, 
    __size = "�\227W\b", '\0' <repeats 12 times>, "�\227W\b\000\000\000", __align = 139958264}, cond = {__data = {__lock = 0, __futex = 0, __total_seq = 601116166692049568, 
      __wakeup_seq = 13218051334235750405, __woken_seq = 601116166691893520, __mutex = 0x0, __nwaiters = 0, __broadcast_seq = 0}, 
    __size = "\000\000\000\000\000\000\000\000�\222l\000�\227W\b\005\000\000\000��o�\0201j\000�\227W\b", '\0' <repeats 12 times>, "���", __align = 0}, signaled = 10666368, ap = 0x8558d88, 
  state = 12308596, when = 139958264, status = -1}
	now = 0
	submnts = <value optimized out>
	count = <value optimized out>
	ioctlfd = <value optimized out>
	cur_state = 0
	status = <value optimized out>
	ret = <value optimized out>
	left = 18
	__FUNCTION__ = "expire_proc_indirect"
#5  0x00a1d4d2 in start_thread () from /lib/i686/nosegneg/libpthread.so.0
No symbol table info available.
#6  0x00b7c48e in clone () from /lib/i686/nosegneg/libc.so.6
No symbol table info available.
Comment 31 Ian Kent 2009-04-10 22:41:22 EDT
(In reply to comment #29)
> Created an attachment (id=339134) [details]
> Log from deadline at line 77 in cache.c
> 
> automount crashed with the following:
> 
> Apr 10 13:16:29 tashi2 automount[25065]: mapent cache rwlock lock failed
> Apr 10 13:16:29 tashi2 automount[25065]: deadlock detected at line 77 in
> cache.c, dumping core.

Sorry, typo.

But that also means that there may be mismatches between what's
in /etc/mtab and /proc/mounts. That's not good but might not
actually cause much of a problem. It's odd we don't see any error
messages from mount or umount saying this though. Anyway, that's
a different issue, so let's stay focused.

The deadlock is probably a good thing since you wouldn't have
wasted much time with this then. When I was trying this package
on my F-10 system I got the "couldn't complete expire ..."
message here, purely by coincidence, and haven't seen it since
after running all night. In my case autofs recovered on the next
expire run but I'm using a much later kernel. The output from
the debug logging I added for this has sent me looking further
afield. Your situation may be different but the fact remains
that the kernel incorrectly told me I could expire the mount
which according to log output was wrong.

Anyway, I have placed a corrected autofs package on
people.redhat.com in autofs-5.0.1-0.rc2.102.bz493223.4 and have
built a kernel with some debug logging added as well and placed
that in kernel-2.6.18-128.1.1.el5.bz493223.1. I don't have quota
to post all the built kernel packages so if you need another let
me know and I'll post it.

The goal now is firstly to check the newly added log output when
the "couldn't ...." message appears to see if the file handles
and device and inode numbers are corrupted but now we also want to
match that output to the expire check in the kernel. Ideally you
would send kern.* to a separate log file as you have done with
daemon.* and post both.
Comment 32 Ray Van Dolson 2009-04-13 12:04:07 EDT
Created attachment 339337 [details]
Logs from over the weekend from autofs-5.0.1-0.rc2.102.bz493233.4

Logs from over the weekend (which explains why it's so large).

Although I have kern.* logging to kern.log (separate logfile), it doesn't look like anything was sent to this facility.  Not sure if that's expected or not.
Comment 33 Ian Kent 2009-04-13 12:29:42 EDT
(In reply to comment #32)
> Created an attachment (id=339337) [details]
> Logs from over the weekend from autofs-5.0.1-0.rc2.102.bz493233.4
> 
> Logs from over the weekend (which explains why it's so large).
> 
> Although I have kern.* logging to kern.log (separate logfile), it doesn't look
> like anything was sent to this facility.  Not sure if that's expected or not.  

It's expected if you didn't use the kernel I mentioned in
comment #31. Also, you will have been running without a
correction that was included in that kernel, and we won't
be able to correlate the autofs debug output.
Comment 34 Ian Kent 2009-04-13 13:09:29 EDT
(In reply to comment #32)
> Created an attachment (id=339337) [details]
> Logs from over the weekend from autofs-5.0.1-0.rc2.102.bz493233.4
> 
> Logs from over the weekend (which explains why it's so large).

It looks like you've started to manually umount autofs internal
mounts as well now. Manually umounting autofs mounts is generally
not supported because there is a limited amount that can be done
to recover from that, but I have tried to help out. Umounting
autofs internal mounts has rendered this log useless as far as
the information I'm looking for goes. The correction contained
in the kernel I recommended above may allow these mounts to
recover on their next expire but until that happens that mount
tree will not work and will return mount errors.
Comment 35 Ray Van Dolson 2009-04-13 13:21:13 EDT
D'oh.  My bad on the kernel aspect, totally missed that on Friday as I was at home and a bit distracted.  Just applied your autofs patch and wandered off. :)

I'll get that going now and have some better results for you in a little while.
Comment 36 Ray Van Dolson 2009-04-13 18:23:43 EDT
Created attachment 339393 [details]
Logs from autofs-5.0.1-0.rc2.102.bz493233.4 and 2.6.18-128.1.1.el5.bz493223.1xen

Haven't seen the errors yet in my daemon.log, but kern log was getting rather large so I'm sending it now in case you can glean anything useful from it.
Comment 37 Ray Van Dolson 2009-04-13 20:08:05 EDT
autofs crashed:

Apr 13 16:27:36 tashi2 automount[1685]: mapent cache rwlock lock failed
Apr 13 16:27:36 tashi2 automount[1685]: deadlock detected at line 65 in cache.c, dumping core.

This is with the .4 release.

I've reverted to .2 for the time being.
Comment 38 Ian Kent 2009-04-13 21:45:17 EDT
(In reply to comment #37)
> autofs crashed:
> 
> Apr 13 16:27:36 tashi2 automount[1685]: mapent cache rwlock lock failed
> Apr 13 16:27:36 tashi2 automount[1685]: deadlock detected at line 65 in
> cache.c, dumping core.

But where did it deadlock?
This is just the lock function, it's called from heaps of places.
Comment 39 Ian Kent 2009-04-13 22:38:41 EDT
(In reply to comment #38)
> (In reply to comment #37)
> > autofs crashed:
> > 
> > Apr 13 16:27:36 tashi2 automount[1685]: mapent cache rwlock lock failed
> > Apr 13 16:27:36 tashi2 automount[1685]: deadlock detected at line 65 in
> > cache.c, dumping core.
> 
> But where did it deadlock?
> This is just the lock function, it's called from heaps of places.  

Arrgh, never mind that.
Somehow the change between rev 3 and 4 didn't get applied.
That's happened to me before, something for me to check later.

The kernel log is really just to try and check why we are
attempting to expire what looks like a busy mount when we
get those "couldn't complete ..." messages. The correction
may have made that issue go away, we'll see. If you don't
have any occurrences of "couldn't ..." messages then you
can  "cp /dev/null kern.log" without losing any useful
information.

btw, you can expect to see these messages when trying to
umount mounts at the leaves of the tree. To be busy a mount
only needs an elevated reference count. If a path walk occurs
after a mount is selected for expire but before it is umounted
the path walk will block, holding a reference to the mount,
until the expire completes so the expire won't be able to
complete. On a busy system you may well see this quite a bit.
The other thing to note is that a mount won't be selected for
expire if a path walk is occurring at the time of the expire
check, as the reference count is elevated. So, again, on a
busy system most used mounts will appear to take longer to
expire.
Comment 40 Ian Kent 2009-04-13 23:50:58 EDT
(In reply to comment #38)
> (In reply to comment #37)
> > autofs crashed:
> > 
> > Apr 13 16:27:36 tashi2 automount[1685]: mapent cache rwlock lock failed
> > Apr 13 16:27:36 tashi2 automount[1685]: deadlock detected at line 65 in
> > cache.c, dumping core.
> 
> But where did it deadlock?
> This is just the lock function, it's called from heaps of places.  

This should be fixed now (at last it should be).
The autofs packages are in directory autofs-5.0.1-0.rc2.102.bz493223.5
under my account on people.redhat.com as usual.
Comment 41 Ray Van Dolson 2009-04-14 12:45:57 EDT
Alright, new autofs applied.  If any crashes happen again I'll be sure to provide a backtrace.
Comment 42 Ray Van Dolson 2009-04-14 19:42:31 EDT
Created attachment 339597 [details]
Logs from autofs-5.0.1-0.rc2.102.bz493233.5 and 2.6.18-128.1.1.el5.bz493223.1xen

Errors start in daemon.log around 16:21:07 -- "qalinux".  lsof entry disappears after 16:20:01 entry.
Comment 43 Ian Kent 2009-04-15 00:00:10 EDT
(In reply to comment #42)
> Created an attachment (id=339597) [details]
> Logs from autofs-5.0.1-0.rc2.102.bz493233.5 and
> 2.6.18-128.1.1.el5.bz493223.1xen
> 
> Errors start in daemon.log around 16:21:07 -- "qalinux".  lsof entry disappears
> after 16:20:01 entry.  

Well done, this is very interesting.
Your perseverance might have paid off but it's a bit of a long
shot so we'll have to wait and see.

I notice in the log that things go bad just after:

lookup_mount: lookup(hosts): fetchng export list for prague
create_udp_client:90: hostname lookup failed: Operation not permitted
create_tcp_client:299: hostname lookup failed: Operation not permitted
lookup_mount: exports lookup failed for prague

which is the first occurrence of this.

I had noticed this behaviour in the very first logs we looked
at but, after checking for a known bug in these functions, I
couldn't see how this could be our problem.

But the evidence we now have shows that a descriptor is being
incorrectly closed so I looked more closely at the create_udp_client()
and create_tcp_client() functions. In each of these functions the
stack variable, fd, that holds the socket descriptor isn't initialized
on entry to the functions but is checked upon an error exit and if
it's not equal to -1 a close is done. Even if it was zero we shouldn't
be doing a close on it but if it isn't zero we could be closing a
descriptor that is in use. So, at the very least this is wrong and
perhaps this is in fact causing our problem.
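
To illustrate the pattern (simplified stand-in code, not the real
create_udp_client(); only the uninitialized fd and the
"if (fd != -1) close(fd)" error exit mirror the real functions, and the
lookup/socket calls here are just placeholders for the RPC client setup):

#include <sys/socket.h>
#include <netdb.h>
#include <unistd.h>

static int create_client_buggy(const char *host)
{
        int fd;                         /* BUG: never initialized */
        struct addrinfo *ai = NULL;

        if (getaddrinfo(host, NULL, NULL, &ai) != 0)
                goto out_close;         /* fd still holds stack garbage */

        fd = socket(ai->ai_family, SOCK_DGRAM, 0);
        if (fd < 0)
                goto out_close;

        /* ... RPC client setup would go here ... */
        freeaddrinfo(ai);
        return fd;

out_close:
        if (ai)
                freeaddrinfo(ai);
        if (fd != -1)                   /* garbage is rarely -1, so this */
                close(fd);              /* can close an unrelated, in-use
                                         * descriptor (e.g. an offset
                                         * ioctl fd) */
        return -1;
}

/* The fix is simply to initialize the descriptor on entry:
 *
 *      int fd = -1;
 *
 * so the error path never closes a descriptor it did not open. */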

I'll add the patch for the known bug, correct the initialization
and build another test package.

Ian
Comment 44 Ian Kent 2009-04-15 00:01:49 EDT
(In reply to comment #43)
> 
> I notice in the log that things go bad just after:
> 
> lookup_mount: lookup(hosts): fetchng export list for prague
> create_udp_client:90: hostname lookup failed: Operation not permitted
> create_tcp_client:299: hostname lookup failed: Operation not permitted
> lookup_mount: exports lookup failed for prague

Btw, I know the error message isn't quite right, I'll get around
to fixing that at some point.
Comment 45 Ian Kent 2009-04-15 02:03:31 EDT
(In reply to comment #43)
> 
> I'll add the patch for the known bug, correct the initialization
> and build another test package.

Here we go again.
autofs-5.0.1-0.rc2.102.bz493223.6 is in the expected location
on people.redhat.com.
Comment 46 Ray Van Dolson 2009-04-15 19:02:39 EDT
(In reply to comment #45)
> autofs-5.0.1-0.rc2.102.bz493223.6 is in the expected location
> on people.redhat.com.  

So far so good.  Going on 10 hours now with no errors in daemon.log.
Comment 47 Ray Van Dolson 2009-04-16 21:00:18 EDT
No errors yet....
Comment 48 Ray Van Dolson 2009-04-19 11:54:54 EDT
All still well.
Comment 49 Ian Kent 2009-04-19 21:22:56 EDT
(In reply to comment #48)
> All still well.  

Great, sounds like that last problem was the root cause.

The addition of the change to help with manual umount
recovery (as far as we can anyway) is working OK as well.

So these two changes will constitute the resolution of
this bug.

I will need to open another bug for the mistake found in
the kernel code but, although it is a mistake, I suspect 
it doesn't actually make much difference here.

It would be worth changing back to the kernel you were
originally using to verify that before we consider this
issue resolved.

Thanks
Ian
Comment 50 Ray Van Dolson 2009-04-23 01:26:04 EDT
Reverted to stock kernel on the 19th; all is still running fine.
Comment 51 Ray Van Dolson 2009-04-27 14:08:58 EDT
All still well after a week.  Looks like this bug is quashed! :-)
Comment 52 Ian Kent 2009-05-20 23:06:08 EDT
This issue has been fixed in the latest autofs package
autofs-5.0.1-0.rc2.125.

The autofs RHTS test bugzillas/bz493223 can be used to
verify this correction.
Comment 53 Ian Kent 2009-05-21 00:29:51 EDT
Note that the RHTS test above tests only the improvement to
handling of manual umounts of leaf mounts of an autofs
multi-mount. The root cause of this issue is virtually
impossible to reproduce as it is due to a stack variable
having a non-zero value when allocated on entry to the
errant function.

Fortunately the reporter tested the correction for some time
and found it functioned as required.
Comment 59 Chris Ward 2009-07-03 14:28:29 EDT
~~ Attention - RHEL 5.4 Beta Released! ~~

RHEL 5.4 Beta has been released! There should be a fix present in the Beta release that addresses this particular request. Please test and report back results here, at your earliest convenience. RHEL 5.4 General Availability release is just around the corner!

If you encounter any issues while testing Beta, please describe the issues you have encountered and set the bug into NEED_INFO. If you encounter new issues, please clone this bug to open a new issue and request it be reviewed for inclusion in RHEL 5.4 or a later update, if it is not of urgent severity.

Please do not flip the bug status to VERIFIED. Only post your verification results, and if available, update Verified field with the appropriate value.

Questions can be posted to this bug or your customer or partner representative.
Comment 61 Ruediger Landmann 2009-08-31 19:24:58 EDT
Release note added. If any revisions are required, please set the 
"requires_release_notes" flag to "?" and edit the "Release Notes" field accordingly.
All revisions will be proofread by the Engineering Content Services team.

New Contents:
Changes in the mount command left autofs sometimes unable to unmount mount points to which it had lost its connection. This resulted in a gradual accumulation of stale mounts. Now, autofs uses a miscellaneous device node to route ioctl commands to mount points. This allows autofs to reconstruct a mount tree from existing mounts and reconnect mounts as it needs to.
Comment 62 Ruediger Landmann 2009-08-31 22:06:42 EDT
Sorry -- I had the underlying cause confused with a different bug.
Comment 63 Ruediger Landmann 2009-08-31 22:06:42 EDT
Release note updated. If any revisions are required, please set the 
"requires_release_notes"  flag to "?" and edit the "Release Notes" field accordingly.
All revisions will be proofread by the Engineering Content Services team.

Diffed Contents:
@@ -1 +1 @@
-Changes in the mount command left autofs sometimes unable to unmount mount points to which it had lost its connection. This resulted in a gradual accumulation of stale mounts. Now, autofs uses a miscellaneous device node to route ioctl commands to mount points. This allows autofs to reconstruct a mount tree from existing mounts and reconnect mounts as it needs to.+Previously, a stack variable was not initialized on entry to the create_udp_client() or create_tcp_client() functions. During an error exit, the stack variable was checked, and the corresponding file descriptor was closed if the variable had a value other than -1. This could result in incorrectly closing a file descriptor still in use. The stack variable is now initialized and descriptors currently in use should not be closed.
Comment 64 errata-xmlrpc 2009-09-02 08:00:01 EDT
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2009-1397.html
