Bug 151431 - automount hangs due to unsafe call in signal handler
Product: Red Hat Enterprise Linux 3
Classification: Red Hat
Component: autofs
Hardware: i386 Linux
Priority: medium
Severity: medium
Assigned To: Jeffrey Moyer
QA Contact: Brock Organ
Duplicates: 154224
Depends On:
Blocks: 156321
Reported: 2005-03-17 15:39 EST by Sev Binello
Modified: 2007-11-30 17:07 EST (History)
CC List: 3 users

See Also:
Fixed In Version: RHBA-2005-654
Doc Type: Bug Fix
Last Closed: 2005-09-28 15:10:33 EDT

Attachments
gzipped debug file for autofs (1.66 MB, text/plain)
2005-03-18 15:39 EST, Sev Binello
gzipped autofs debug log file #2 (5.83 MB, text/plain)
2005-03-31 09:23 EST, Sev Binello
tar file of autofs map files for problem system (10.00 KB, text/plain)
2005-03-31 11:50 EST, Sev Binello
comment out syslogs in signal handler context (3.13 KB, patch)
2005-04-19 10:31 EDT, Jeffrey Moyer
rpm with syslog patch applied (193.82 KB, application/octet-stream)
2005-04-19 11:21 EDT, Jeffrey Moyer

Description Sev Binello 2005-03-17 15:39:10 EST
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.4.2) Gecko/20040301

Description of problem:
The automount daemon seems to hang and will not mount (or expire) anything.
I attached strace to automount and saw the following:
[root@acnlin86 tmp]# strace -p 3719
Process 3719 attached - interrupt to quit
futex(0x24720c, FUTEX_WAIT, 2, NULL

Tried to mount filesystems while strace was attached;
saw absolutely no activity.

Version-Release number of selected component (if applicable):
autofs-4.1.3-47/2.4.21-27.0.2  autofs-4.1.3-104/2.4.21-27.0.1

How reproducible:
Couldn't Reproduce

Additional info:

Couldn't reproduce, but the problem seems to have started
when a file server exporting a filesystem went down.
Automount never recovered after that.
Comment 1 Jeffrey Moyer 2005-03-17 17:01:29 EST
Is there a hung umount process?  Can you manually umount the filesystem that was
mounted from the server that went down?  What is the output of alt-sysrq-t when
this happens?  What do the logs show?
Comment 2 Sev Binello 2005-03-17 17:41:42 EST
Unfortunately didn't try all the things you mentioned. 
Will keep this in mind next time.

But I could manually mount filesystems, 
though didn't try the one that had previously failed.

The machine was up and functioning, so I didn't do an alt-sysrq-t.

/proc/mounts and /etc/mtab didn't show the filesystem that
had gone bad, nor did they show any of the ones that couldn't be mounted.
So I didn't try unmounting it.

If they had been previously mounted then they were okay, but no
new ones could be mounted.

When we tried to reboot, we got a lot of these messages...
 NXNODE 1.3.2-25[28966]: ERROR: file match line: cannot open file
'/.nx/C-acnlin86.pbn.bnl.gov-1114-0BD1438F69351E511DE69789FE2A43B4/session': No
such file or directory 'main:nxnode_ee:4383'
kernel: VFS: Busy inodes after unmount. Self-destruct in 5 seconds.  Have a nice

Followed by a kernel panic
Wrote down the following stack info...

eip @destroy_inode

I can send the /var/log/message file if that helps

We actually had several machines crash with similar messages
(i.e. "Have a nice day") when the machine exporting the filesystem went bad.

We got messages like this:
MVFS: Busy inodes after unmount. Self-destruct in 5 seconds.  Have a nice day...
automount[25838]: >> mount: RPC: Port mapper failure - RPC: Timed out
automount[25838]: mount(nfs): nfs: mount failure acnlin31.pbn.bnl.gov:/cfsi on

Comment 3 Sev Binello 2005-03-18 10:36:39 EST
We seem to be in the same state now, i.e. automount is not expiring or mounting
anything new. No problematic filesystems this time.
Unmounting any mounted filesystem sort of works,
i.e. it disappears from /proc/mounts but is still present in /etc/mtab,
and the mount point still exists in the auto.xxx directory.
I see no umount messages in /var/log/messages.

Automount daemon looks hung in....
[root@acnlin86 root]# strace -p 3745
Process 3745 attached - interrupt to quit
futex(0x3f320c, FUTEX_WAIT, 2, NULL

We have 2 automount daemons; we are only having problems with one of them,
and it's always for the same filesystem.

Here are some of the results you asked for:
acnlin86 102:ps -elf | grep mount
1 S root      3743     1  0  75   0    -   441 -      Mar17 ?        00:00:02
/usr/sbin/automount --timeout=60 --debug /misc file /etc/auto.misc
1 S root      3877     1  0  85   0    -   438 -      Mar17 ?        00:00:00
1 S root      3745     1  0  75   0    -   440 -      Mar17 ?        00:00:00
/usr/sbin/automount --timeout=60 --debug /cfs file /etc/auto.cfs

Let me know what other info I can get to you while the machine is in this state.
Should we try restarting autofs ?

Comment 4 Jeffrey Moyer 2005-03-18 15:27:54 EST
Debug logs.  I see you have debugging enabled.   Do you also send all messages
to a debug log?  Something like this in your syslog.conf would do the trick:

*.*    /var/log/debug

You mentioned 2 different versions of the kernel and automounter.  When you post
test results, please let me know which versions you are running.

The busy inodes after umount issue is being tracked in bz #124600.  You may want
to add yourself to the CC list there, though that isn't the main bug you are
running into.

So, in summary, please get me debug logs.

Comment 5 Sev Binello 2005-03-18 15:39:30 EST
Created attachment 112138 [details]
gzipped debug file for autofs

debug file created by automount
Comment 6 Sev Binello 2005-03-18 15:49:06 EST
The info I am (and have been) sending is for
kernel 2.4.21-27.0.2.EL
WS release 3 (Taroon Update 3). 

The first set of info I sent was for autofs-4.1.3-47
We then upgraded to autofs-4.1.3-104,
So the second set of info was for  autofs-4.1.3-104
Comment 7 Sev Binello 2005-03-31 09:23:43 EST
Created attachment 112511 [details]
gzipped autofs debug log file #2

The problem is continuing, and it is consistent on only one of our machines.
Rebooting does not help, since it quickly reverts to the bad state,
where the daemon hangs in a futex wait and no longer expires or mounts anything.
Even stranger is the fact that the problem seems to occur mostly with only one
automount daemon on this system.
I will attach the debug log in case anyone is still looking into this problem.

Currently, we have to manually mount the filesystems on this machine.
Comment 8 Jeffrey Moyer 2005-03-31 11:00:31 EST
Yes, I'm still working on this.  Could you please try the following kernel:


This will likely not resolve your autofs issues, but I would like to know if you
still get the panics and the busy inode after umount messages.

I'm looking at your logs now.
Comment 9 Jeffrey Moyer 2005-03-31 11:04:34 EST
Could you post the map file for the troublesome automount?

Comment 10 Sev Binello 2005-03-31 11:50:49 EST
Created attachment 112518 [details]
tar file of autofs map files for problem system

Attached is a tar file containing the map files for our problem system.
Not sure about the kernel upgrade; we can't reproduce the panics at will.
Comment 11 Sev Binello 2005-04-11 10:02:11 EDT
Jeff -

    I noticed some comments in issue 12 of autofs Digest about a hanging autofs

"It's possible for an event wait request to arrive before the event
requestor. If this happens the daemon never gets notified and autofs

Could this problem be behind our hanging autofs as well
(i.e. bug 151431)?

Comment 12 Jeffrey Moyer 2005-04-11 13:52:22 EDT
I'm not sure.  I've requested more information on this specific patch.
Comment 13 Jeffrey Moyer 2005-04-11 18:22:37 EDT
This may be a duplicate of bz #144729.
Comment 14 Sev Binello 2005-04-12 10:13:05 EDT
The symptoms seem to be similar.
However, it mentions the problem went away when --ghost option was removed.
We do not use that option, so that won't help.
It would have been interesting to see if the daemon in bz 144729 was stuck on a
futex, but I saw no mention.
Comment 15 Jeffrey Moyer 2005-04-12 10:19:09 EDT
Oh, duh!  The futex....  Thanks for mentioning that again.  It seems that autofs
will issue syslog(3) calls while in a signal handler.  This is a no no, and can
result in the automount process hanging.

See bug 154224.  I put together this patch:


But it is against 4.1.4_beta2.  I'll put together a patch against our package
and post it for you to try.
Comment 16 Sev Binello 2005-04-12 14:57:58 EDT
Ok, looking forward to the patched package.
I don't seem to have permission to view bug 154224, which you mentioned.
Comment 17 Sev Binello 2005-04-19 10:21:51 EDT
Hi Jeff-
   Since today is our maintenance day,
   I was wondering if you got around to putting together a patch for us. Thanks
Comment 18 Jeffrey Moyer 2005-04-19 10:31:56 EDT
Created attachment 113359 [details]
comment out syslogs in signal handler context

Dan Berrange put together this patch to verify the problem.  If you apply this,
the problem should go away, but we won't get any of the log information from
signal handlers.  In other words, this patch is by no means the solution, but
it should help to verify we are addressing the right problem in your

I'm currently working with upstream to resolve the problem in a more permanent
fashion.  The proper fix will take another week or two to hammer out.

Please try this patch, and let me know if it resolves your issues.

Comment 19 Sev Binello 2005-04-19 11:15:08 EDT
Would like to try it.
But we don't have source for autofs-4.1.3-104.
Would you happen to have an rpm package ready to go ?
Comment 20 Jeffrey Moyer 2005-04-19 11:21:32 EDT
Created attachment 113364 [details]
rpm with syslog patch applied

Here is an i386 rpm, based on autofs-4.1.3-120, which includes the syslog
patch.  Please give this a try.

Comment 21 Sev Binello 2005-04-19 11:36:38 EDT
Will do, I'll keep you posted.
Comment 22 Jeffrey Moyer 2005-05-12 15:27:59 EDT
Does this patch resolve your hangs?  Did you have a chance to try it?

Comment 23 Sev Binello 2005-05-12 16:04:54 EDT
Yes, it did.
Let me know when there is a permanent fix.
Comment 25 Sev Binello 2005-05-24 11:35:44 EDT
Jeff -
Can you tell me if the current release of autofs, 4.1.3-130,
contains a fix for this problem?
Comment 26 Jeffrey Moyer 2005-05-24 11:56:59 EDT
autofs-4.1.3-130 does not contain the fix for this problem.
Comment 27 Sev Binello 2005-06-01 10:10:06 EDT
I was wondering if you could provide an rpm for 4.1.3-130
with the patch you sent us earlier.
This way we can upgrade autofs on some of our systems
experiencing mount problems. 
Comment 28 Sev Binello 2005-06-06 15:41:26 EDT
Could you advise on the best course of action for us?
We have a large number of systems on which we need to upgrade autofs
to prevent failed mounts or hung daemons.
Any idea when the fix above will be released?
Should we upgrade with the patched version you gave us earlier?
Or is there a more recent version that we could use?

Comment 29 Jeffrey Moyer 2005-06-07 09:59:29 EDT
I can't release anything in a supported fashion.  If you would like, I can apply
the patch I made for 4.1.3-120 to the 4.1.3-130 RPM.  Most likely, the solution
will be the reentrant syslog implementation that is being developed upstream.  I
am targeting U6 for that bug fix.  Can you wait that long?
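
The upstream reentrant syslog implementation mentioned above is not shown in this report. As a rough illustration of the general idea only, a handler can emit a message safely if it restricts itself to async-signal-safe calls such as write(2) and formats the text by hand instead of going through stdio or syslog(3). The helper names below (`format_signal_msg`, `log_signal_to_fd`) are hypothetical, and the message format is an assumption:

```c
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/* Build "signal <n>\n" in buf without stdio, locks, or heap allocation.
 * Returns the number of bytes stored (never more than cap). */
size_t format_signal_msg(char *buf, size_t cap, int signo)
{
    static const char prefix[] = "signal ";
    char digits[12];
    size_t n = 0, d = 0;
    unsigned int v = signo < 0 ? 0u : (unsigned int)signo;

    for (size_t i = 0; prefix[i] != '\0' && n < cap; i++)
        buf[n++] = prefix[i];
    do {                        /* collect digits, least significant first */
        digits[d++] = (char)('0' + v % 10);
        v /= 10;
    } while (v != 0);
    while (d > 0 && n < cap)    /* copy them back in reading order */
        buf[n++] = digits[--d];
    if (n < cap)
        buf[n++] = '\n';
    return n;
}

/* Intended to be callable from a signal handler: the only system call
 * is write(2), and errno is restored for the interrupted code. */
void log_signal_to_fd(int fd, int signo)
{
    int saved_errno = errno;
    char buf[32];
    size_t len = format_signal_msg(buf, sizeof(buf), signo);
    ssize_t rc = write(fd, buf, len);

    (void)rc;                   /* a failed write cannot be reported here */
    errno = saved_errno;
}
```

The file descriptor could be a log file or the write end of a pipe that the main loop drains and forwards to syslog(3) outside signal context.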
Comment 30 Sev Binello 2005-06-07 10:55:48 EDT
Any idea of a time frame for U6 ?
Is there an alternative to waiting ?
If not, then I guess trying the latest version 
with the patch applied makes some sense.
Comment 31 Jeffrey Moyer 2005-06-07 11:09:08 EDT
If you have a support contract with Red Hat, then the proper method for getting
this resolved more quickly is to go through Issue Tracker (or through your TAM).

I'll put together an RPM for you with the patch listed in this bugzilla.  This,
unfortunately, will not be a supported RPM.  What that means is that before you
report any bugs on autofs, you'll have to reproduce them on an unpatched
Comment 32 Sev Binello 2005-06-07 11:26:42 EDT
Ok thanks.
Comment 33 Sev Binello 2005-06-07 11:37:56 EDT
Not familiar with Issue Tracker.
How do I get there?
Comment 34 Jeffrey Moyer 2005-06-07 12:21:17 EDT
Comment 41 Jeffrey Moyer 2005-06-08 13:02:18 EDT
*** Bug 154224 has been marked as a duplicate of this bug. ***
Comment 51 Jeffrey Moyer 2005-07-12 11:30:58 EDT
A fix for this was built into autofs version 4.1.3-138.
Comment 58 Red Hat Bugzilla 2005-09-28 15:10:33 EDT
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

