Red Hat Bugzilla – Bug 132162
NFS intr flag prevents core dumps
Last modified: 2007-11-30 17:07:04 EST
Description of problem:
A process whose working directory is on an NFS filesystem mounted
with the 'intr' flag produces no core dump when it receives a SIGSEGV.
Steps to Reproduce:
1. mount -o 'rw,proto=tcp,intr' filadelfia.carmen.se:/raidtest /mnt/test
2. cd /mnt/test
3. cat > foo&
4. killall -11 cat
5. Note that no core dump was produced
6. Now try again, this time without the 'intr' flag and there will be
a nice core file waiting for you
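Before running the steps above, it can help to confirm which options a mount point actually carries. The sketch below is a minimal, hypothetical helper (`mount_has_option` is not part of this report). It assumes the kernel lists `intr` in the fourth field of /proc/mounts; the optional third argument only exists so that a saved copy of the table can be checked instead.

```shell
# Hypothetical helper: succeed if mount point $1 carries option $2.
# $3 is the mounts table to read (assumed default: /proc/mounts).
mount_has_option() {
    awk -v mp="$1" -v opt="$2" '
        $2 == mp {
            # Field 4 is the comma-separated option list.
            n = split($4, opts, ",")
            found = 0
            for (i = 1; i <= n; i++) if (opts[i] == opt) found = 1
        }
        END { exit(found ? 0 : 1) }
    ' "${3:-/proc/mounts}"
}
```

For example, `mount_has_option /mnt/test intr && echo "intr is set"`. If the same mount point appears more than once in the table, the last entry decides the result.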
I could reproduce the problem, and discussed the issue internally. This
is the normal behaviour.
Because the program is stuck in an interruptible read, when it
receives the SEGV signal it disconnects from the NFS server, and
therefore cannot write the core dump to that server. Most Unix
systems need nointr to be able to write a core dump. Some
OSes, like HP-UX, cannot even attach a debugger to a program
running on an "intr" NFS mount.
I'm re-opening this since LLNL has reported the same problem. We
disagree with what Bastien posted above. This behavior has changed
between RHEL2.1 and RHEL3. Under RHEL2.1 I was able to produce a core
file when using the intr option. However w/ RHEL3 and RHEL4 it
Furthermore LLNL reports that upstream 2.4.20, 2.4.27, and Debian
2.4.27 kernels do not have this behavior.
I am not sure if it is related, but in our configuration
of dual Opterons, a core dump in an NFS directory generates
a kernel panic, and the machine hangs.
This is under RHEL3.
For now, we have core dump generation disabled.
Could you please post that panic backtrace?
Steve, can we clarify whether or not the original described behavior is a bug?
Example (on RHEL3-U3):
1. mount the same nfs volume in two places on the same machine
(one w/ 'intr', the other without)
2. enable core dumps
3. in each of the mountpoints, do the following:
sleep 20 &
killall -11 sleep
date # this is just to show the death of the backgrounded sleep
What you'll find is that the sequence yields a core file when run in the nointr
mount, but not in the intr mount. Is this not a bug?
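The comparison above can be scripted. This is a rough sketch, not part of the original report: `segv_in_dir` and `count_cores` are hypothetical helpers, and the `*core*` file pattern is an assumption, since core file naming differs between systems.

```shell
# Hypothetical helper: kill a backgrounded sleep with SIGSEGV inside
# directory $1 and print the wait status (128 + signal number).
# The body runs in a subshell so the cd does not affect the caller.
segv_in_dir() (
    cd "$1" || exit 1
    sleep 20 >/dev/null 2>&1 &
    pid=$!
    sleep 1                      # give sleep time to start before signalling it
    kill -s SEGV "$pid"
    status=0
    wait "$pid" || status=$?
    echo "$status"
)

# Hypothetical helper: count regular files in directory $1 whose name
# contains "core" (assumed naming; adjust to the local core pattern).
count_cores() {
    find "$1" -maxdepth 1 -type f -name '*core*' | wc -l
}
```

Run `count_cores` on each mount point before and after `segv_in_dir` on the same path. A printed status of 139 (128 + 11) confirms the process died from SIGSEGV; an unchanged count means no core file was written there.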
Well, if "ulimit -c" says cores should be dropped and
they are not, then it's probably a bug.
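A small sketch of that check, assuming a POSIX-style shell whose `ulimit` builtin supports `-c`. The helper name `core_limit_allows_dump` is made up for illustration.

```shell
# Hypothetical helper: succeed if the soft core-size limit is non-zero,
# i.e. the shell's children are allowed to write a core file.
core_limit_allows_dump() {
    # ulimit -c prints the soft limit, or "unlimited".
    [ "$(ulimit -c)" != "0" ]
}
```

If this succeeds in the shell that starts the test program and no core file appears after a fatal signal, the limit is not what suppresses the dump.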
Created attachment 108550
The patch sets the process flag, PF_DUMPCORE, in do_coredump
which signals the RPC that a core is being written. Since
'intr' has been set and the process has been signaled(), we don't
want to get hung up (by a dead server) writing the core out, so there
is a limit (3) on how many retries.
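The retry limit described above is a bounded-retry pattern. The sketch below is only a userspace analogy in shell, not the kernel patch: `retry_limited` is a hypothetical helper that runs a command up to a fixed number of times and gives up afterwards, so that a dead server cannot stall the caller forever.

```shell
# Hypothetical helper (an analogy, not the attached patch):
# run the command given after $1 up to $1 times, stopping at the
# first success; fail once the attempts are used up.
retry_limited() {
    max=$1
    shift
    attempt=1
    while [ "$attempt" -le "$max" ]; do
        "$@" && return 0
        attempt=$((attempt + 1))
    done
    return 1
}
```

For example, `retry_limited 3 cp core.1234 /mnt/test/` (paths are placeholders) makes at most three copy attempts before reporting failure.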
LLNL reports that this patch allows core dumps to be generated, however they are
I tried this patch and it fixes the problem with filp_open().
However, the corefile produced in NFS is now always truncated, and therefore
# toad13 /usr/local/tmp > sleep 20
Quit (core dumped)
# toad13 /usr/local/tmp > ls -hl *sleep*core
-rw------- 1 root root 677 Dec 17 10:54 toad13-sleep-1100.core
# toad13 /usr/local/tmp > cd /tmp
# toad13 /tmp > sleep 20
Quit (core dumped)
# toad13 /tmp > ls -hl *sleep*core
-rw------- 1 root root 248K Dec 17 10:55 toad13-sleep-1104.core
Yes, I also see the size difference, which is strange, but using
ethereal to examine the NFS traffic (basically counting the writes) it
appears NFS is writing the correct amount of data, and gdb was able
to use the core file.
Question: is the smaller core file usable by gdb?
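One quick way to check this without loading gdb is to look at the ELF header: a core file starts with the bytes 0x7f 'E' 'L' 'F' and carries e_type ET_CORE (4) in the 16-bit field at offset 16. The sketch below assumes `od` is available; `is_elf_core` is a hypothetical helper, and it only inspects the header, so a truncated file with an intact header would still pass.

```shell
# Hypothetical helper: succeed if file $1 has an ELF header with
# e_type == ET_CORE. Only the first 18 bytes are examined.
is_elf_core() {
    # Dump the first 18 bytes as space-separated hex pairs.
    set -- $(od -An -tx1 -N18 "$1" 2>/dev/null)
    [ "$#" -eq 18 ] || return 1
    # e_ident[0..3] must be 0x7f 'E' 'L' 'F'.
    [ "$1$2$3$4" = "7f454c46" ] || return 1
    # e_ident[5] gives the byte order: 1 = little-endian, 2 = big-endian.
    # e_type occupies bytes 16 and 17; ET_CORE is 4.
    case "$6" in
        01) [ "${17}${18}" = "0400" ] ;;
        02) [ "${17}${18}" = "0004" ] ;;
        *)  return 1 ;;
    esac
}
```

For example, `is_elf_core core.2932 || echo "not a core file"`. A file that fails this check has a damaged or missing header; a file that passes may still be incomplete further in.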
No, the core files are not valid. Here's what GDB says:
[dave@sideshowbob nfstest]$ gdb xload core.2932
GNU gdb Red Hat Linux (6.1post-1.20040607.52rh)
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and
welcome to change it and/or distribute copies of it under certain
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for
This GDB was configured as "i386-redhat-linux-gnu"...(no debugging
symbols found)...Using host libthread_db library
"/mnt/nfstest/core.2932" is not a core dump: File format not recognized
So how are you causing xload to drop core?
here's what I did:
1. xload &
2. kill -s sigabrt <pid>
and here's another way that LLNL was testing this (from the NFS
1. sleep 20 &
2. killall -11 sleep
hmm.... I wonder if something up above is noticing the
signaled flag is set and not completing the write.....
Created attachment 109435
There was one place in the NFS code (nfs_wait_event() to
be exact) that needed to check the PF_DUMPCORE bit.
Steve - I just tested w/ this updated patch and was able to drop a
valid core file. I've asked LLNL to provide confirmation as well, but
from my quick test this seems to do the trick. Thanks much!
A fix for this problem has just been committed to the RHEL3 U5
patch pool this afternoon (in kernel version 2.4.21-27.9.EL).
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.