Red Hat Bugzilla – Bug 151431
automount hangs due to unsafe call in signal handler
Last modified: 2007-11-30 17:07:06 EST
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.4.2) Gecko/20040301
Description of problem:
automount daemon seems to hang, and will not mount (or expire) anything.
attached strace to automount, saw the following ...
[root@acnlin86 tmp]# strace -p 3719
Process 3719 attached - interrupt to quit
futex(0x24720c, FUTEX_WAIT, 2, NULL
Tried to mount filesystems while strace was attached,
saw absolutely no activity.
Version-Release number of selected component (if applicable):
Couldn't reproduce, but the problem seems to have started
when a file server exporting a filesystem went down.
Automount never recovered after that.
Is there a hung umount process? Can you manually umount the filesystem that was
mounted from the server that went down? What is the output of alt-sysrq-t when
this happens? What do the logs show?
Unfortunately didn't try all the things you mentioned.
Will keep this in mind next time.
But I could manually mount filesystems,
though didn't try the one that had previously failed.
The machine was up and functioning so I didn't do an alt-sysrq-t.
/proc/mounts and /etc/mtab didn't show the filesystem that
had gone bad, nor did they show any of the ones that couldn't be mounted.
So I didn't try unmounting it.
If they had been previously mounted then they were okay, but no
new ones could be mounted.
When we tried to reboot, we got a lot of these messages...
NXNODE 1.3.2-25: ERROR: file match line: cannot open file
such file or directory 'main:nxnode_ee:4383'
kernel: VFS: Busy inodes after unmount. Self-destruct in 5 seconds. Have a nice
Followed by a kernel panic
Wrote down the following stack info...
I can send the /var/log/message file if that helps
We actually had several machines crash with similar messages,
i.e. "Have a nice day", when the machine exporting the filesystem went bad.
We got messages like this:
MVFS: Busy inodes after unmount. Self-destruct in 5 seconds. Have a nice day...
automount: >> mount: RPC: Port mapper failure - RPC: Timed out
automount: mount(nfs): nfs: mount failure acnlin31.pbn.bnl.gov:/cfsi on
We seem to be in the same state now, i.e. automount not expiring or mounting
anything new. No problematic filesystems this time.
Unmounting any mounted filesystem sort of works,
i.e. it disappears from /proc/mounts but is still present in /etc/mtab
and the mount point still exists in the auto.xxx directory.
I see no umount msgs in /var/log/messages
Automount daemon looks hung in....
[root@acnlin86 root]# strace -p 3745
Process 3745 attached - interrupt to quit
futex(0x3f320c, FUTEX_WAIT, 2, NULL
We have 2 automount daemons, we are only having problems with one of them
and it's always for the same filesystem ??
Here are some results you asked for:
acnlin86 102:ps -elf | grep mount
1 S root 3743 1 0 75 0 - 441 - Mar17 ? 00:00:02
/usr/sbin/automount --timeout=60 --debug /misc file /etc/auto.misc
1 S root 3877 1 0 85 0 - 438 - Mar17 ? 00:00:00
1 S root 3745 1 0 75 0 - 440 - Mar17 ? 00:00:00
/usr/sbin/automount --timeout=60 --debug /cfs file /etc/auto.cfs
Let me know what other info I can get to you while the machine is in this state.
Should we try restarting autofs ?
Debug logs. I see you have debugging enabled. Do you also send all messages
to a debug log? Something like this in your syslog.conf would do the trick:
You mentioned 2 different versions of the kernel and automounter. When you post
test results, please let me know which versions you are running.
The busy inodes after umount issue is being tracked in bz #124600. You may want
to add yourself to the CC list there, though that isn't the main bug you are
So, in summary, please get me debug logs.
Created attachment 112138
gzipped debug file for autofs
debug file created by automount
The info I am (and have been) sending is for
WS release 3 (Taroon Update 3).
The first set of info I sent was for autofs-4.1.3-47
We then upgraded to autofs-4.1.3-104,
So the second set of info was for autofs-4.1.3-104
Created attachment 112511
gzipped autofs debug log file #2
The problem is continuing and consistent on only one of our machines.
Rebooting does not help, since it quickly reverts to the bad state,
where the daemon hangs in a futex wait, and it no longer expires or mounts
Even stranger is the fact that the problem seems to occur mostly with only one
automount daemon on this system.
I will attach the debug log in case any one is still looking into this problem.
Currently, we have to manually mount the filesystems on this machine.
Yes, I'm still working on this. Could you please try the following kernel:
This will likely not resolve your autofs issues, but I would like to know if you
still get the panics and the busy inode after umount messages.
I'm looking at your logs now.
Could you post the map file for the troublesome automount?
Created attachment 112518
tar file of autofs map files for problem system
attached is a tar file containing the map files for our problem system.
Not sure about the kernel upgrade, can't reproduce the panics at will.
I noticed some comments in issue 12 of autofs Digest about a hanging autofs
"It's possible for an event wait request to arrive before the event
requestor. If this happens the daemon never gets notified and autofs
Could this problem be behind our hanging autofs as well,
i.e. in bug 151431?
I'm not sure. I've requested more information on this specific patch.
This may be a duplicate of bz #144729.
The symptoms seem to be similar.
However, it mentions the problem went away when --ghost option was removed.
We do not use that option, so that won't help.
It would have been interesting to see if the daemon in bz 144729 was stuck on a
futex, but I saw no mention.
Oh, duh! The futex.... Thanks for mentioning that again. It seems that autofs
issues syslog(3) calls from within a signal handler. This is a no-no, since
syslog(3) is not async-signal-safe, and it can result in the automount process
hanging.
See bug 154224. I put together this patch:
But it is against 4.1.4_beta2. I'll put together a patch against our package
and post it for you to try.
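
For readers unfamiliar with the hazard: syslog(3) is not on the POSIX list of
async-signal-safe functions. If a handler interrupts the process while it is
already inside syslog(3), the handler can block forever on a lock that the
interrupted code still holds, which would be consistent with the futex wait
seen in the strace output. A common way around this is to have the handler
only record the signal and to do the logging later from normal context. The
following is a minimal sketch of that pattern, not the autofs patch; the names
pending_signal, install_handler, and drain_pending_signal are hypothetical.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <signal.h>
#include <string.h>
#include <syslog.h>

/* Written by the handler, read by the main loop. volatile sig_atomic_t is
 * the object type a handler may portably write to. */
static volatile sig_atomic_t pending_signal = 0;

/* Handler: records the signal number and nothing else. No syslog here. */
static void on_signal(int signo)
{
    pending_signal = signo;
}

/* Hypothetical helper: install on_signal for signo. Returns 0 on success. */
int install_handler(int signo)
{
    struct sigaction sa;

    assert(signo > 0);              /* caller error otherwise */
    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_signal;
    sigemptyset(&sa.sa_mask);
    return sigaction(signo, &sa, NULL);
}

/* Hypothetical helper: call from the main loop (normal context).
 * Logs and clears a recorded signal; returns its number, or 0 if none. */
int drain_pending_signal(void)
{
    int signo = pending_signal;

    if (signo != 0) {
        pending_signal = 0;
        syslog(LOG_DEBUG, "caught signal %d", signo);   /* safe here */
    }
    return signo;
}
```

The sketch keeps only the most recent signal, and a signal arriving between
the read and the clear in drain_pending_signal would be lost. A real daemon
would more likely block signals around that window or use a pipe to queue
events.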
Ok, looking forward to the patched package.
I don't seem to have permission to view bug 154224, which you mentioned.
Since today is our maintenance day,
I was wondering if you got around to putting together a patch for us. Thanks
Created attachment 113359
comment out syslogs in signal handler context
Dan Berrange put together this patch to verify the problem. If you apply this,
the problem should go away, but we won't get any of the log information from
signal handlers. In other words, this patch is by no means the solution, but
it should help to verify we are addressing the right problem in your
I'm currently working with upstream to resolve the problem in a more permanent
fashion. The proper fix will take another week or two to hammer out.
Please try this patch, and let me know if it resolves your issues.
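
If a message really has to be emitted from the handler itself, one option is
to avoid syslog(3) and stdio entirely and use only write(2), which is
async-signal-safe. The sketch below illustrates that idea under stated
assumptions: it is not the reentrant syslog being discussed with upstream, and
log_fd, format_signal_msg, and install_logging_handler are hypothetical names
chosen for the example.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

/* Assumption: a descriptor opened before any handler is installed. */
static int log_fd = STDERR_FILENO;

/* Builds "caught signal N\n" in out using only plain loops (no stdio, no
 * locks). Returns the number of bytes written, or 0 if signo is negative
 * or cap is too small. The result is not NUL-terminated. */
size_t format_signal_msg(char *out, size_t cap, int signo)
{
    static const char prefix[] = "caught signal ";
    char digits[16];
    size_t n = 0, len = 0, i;
    unsigned v;

    if (signo < 0 || cap < sizeof(prefix) + 16)
        return 0;
    for (i = 0; i < sizeof(prefix) - 1; i++)
        out[len++] = prefix[i];
    v = (unsigned)signo;
    do {                                /* digits come out least significant first */
        digits[n++] = (char)('0' + v % 10);
        v /= 10;
    } while (v != 0);
    while (n > 0)                       /* copy them back in reading order */
        out[len++] = digits[--n];
    out[len++] = '\n';
    return len;
}

/* Handler: formats on its own stack and calls write(2) only. */
static void on_signal(int signo)
{
    int saved_errno = errno;            /* write(2) may change errno */
    char msg[64];
    size_t len = format_signal_msg(msg, sizeof(msg), signo);

    if (len > 0 && write(log_fd, msg, len) < 0) {
        /* Nothing safe to do about a failed write from here. */
    }
    errno = saved_errno;
}

/* Hypothetical helper: install the logging handler. Returns 0 on success. */
int install_logging_handler(int signo)
{
    struct sigaction sa;

    assert(signo > 0);                  /* caller error otherwise */
    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_signal;
    sigemptyset(&sa.sa_mask);
    return sigaction(signo, &sa, NULL);
}
```

This trades the convenience of syslog for a raw descriptor, so the messages
bypass syslogd unless something else collects them.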
Would like to try it.
But we don't have source for autofs-4.1.3-104.
Would you happen to have an rpm package ready to go ?
Created attachment 113364
rpm with syslog patch applied
Here is an i386 rpm, based on autofs-4.1.3-120, which includes the syslog
patch. Please give this a try.
Will do, I'll keep you posted.
Does this patch resolve your hangs? Did you have a chance to try it?
Yes, it did.
Let me know when there is a permanent fix.
Can you tell me if the current release of autofs, 4.1.3-130,
contains a fix for this problem?
autofs-4.1.3-130 does not contain the fix for this problem.
I was wondering if you could provide an rpm for 4.1.3-130
with the patch you sent us earlier.
This way we can upgrade autofs on some of our systems
experiencing mount problems.
Could you advise on the best course of action for us?
We have a large number of systems on which we need to upgrade autofs
to prevent failed mounts or hung daemons.
Any idea when the fix above will be released?
Should we upgrade with the patched version you gave us earlier ?
Or, is there a more recent version that we could use ?
I can't release anything in a supported fashion. If you would like, I can apply
the patch I made for 4.1.3-120 to the 4.1.3-130 RPM. Most likely, the solution
will be the reentrant syslog implementation that is being developed upstream. I
am targeting U6 for that bug fix. Can you wait that long?
Any idea of a time frame for U6 ?
Is there an alternative to waiting ?
If not, then I guess trying the latest version
with the patch applied makes some sense.
If you have a support contract with Red Hat, then the proper method for getting
this resolved more quickly is to go through Issue Tracker (or through your TAM).
I'll put together an RPM for you with the patch listed in this bugzilla. This,
unfortunately, will not be a supported RPM. What that means is that before you
report any bugs on autofs, you'll have to reproduce them on an unpatched
Not familiar with Issue Tracker.
How do I get there?
*** Bug 154224 has been marked as a duplicate of this bug. ***
A fix for this was built into autofs version 4.1.3-138.
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.