Bug 132162 - NFS intr flag prevents core dumps
Product: Red Hat Enterprise Linux 3
Classification: Red Hat
Component: kernel
Hardware: All Linux
Priority: medium, Severity: medium
Assigned To: Steve Dickson
Reported: 2004-09-09 10:07 EDT by David Juran
Modified: 2007-11-30 17:07 EST

Doc Type: Bug Fix
Last Closed: 2005-05-18 09:27:57 EDT

Attachments
Proposed patch (1.62 KB, patch)
2004-12-14 14:11 EST, Steve Dickson
Updated patch (4.05 KB, patch)
2005-01-06 13:36 EST, Steve Dickson

External Trackers
Tracker: Red Hat Product Errata
ID: RHSA-2005:294
Priority: normal
Status: SHIPPED_LIVE
Summary: Moderate: Updated kernel packages available for Red Hat Enterprise Linux 3 Update 5
Last Updated: 2005-05-18 00:00:00 EDT

Description David Juran 2004-09-09 10:07:06 EDT
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.4.3)

Description of problem:
A process whose working directory is on an NFS filesystem mounted
with the 'intr' flag produces no core dump when it receives a SIGSEGV.

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1. mount -o 'rw,proto=tcp,intr' filadelfia.carmen.se:/raidtest /mnt/test
2. cd /mnt/test
3. cat > foo&
4. killall -11 cat
5. Note that no core dump was produced
6. Now try again, this time without the 'intr' flag and there will be
a nice core file waiting for you 
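
The steps above can be wrapped in a small helper so the same check can be
repeated on different mounts. This is only a sketch: the function name
try_core_dump is made up for illustration, it uses sleep instead of cat as
the victim process, and it assumes core files land in the working directory
with a name starting with "core" (this depends on the kernel's core file
naming settings).

```shell
#!/bin/sh
# Hypothetical helper: start a process in DIR, send it SIGNAL (default SEGV),
# then print its wait status and whether a new core* file appeared in DIR.
try_core_dump() {
    dir=$1
    sig=${2:-SEGV}
    (
        cd "$dir" || exit 1
        # Ask for core files; this can fail if the hard limit is lower.
        ulimit -c unlimited 2>/dev/null || true
        before=$(ls -d core* 2>/dev/null | wc -l)
        sleep 20 &
        pid=$!
        sleep 1                      # give the child time to start
        kill -s "$sig" "$pid"
        wait "$pid"
        status=$?
        after=$(ls -d core* 2>/dev/null | wc -l)
        # Unquoted on purpose: strips padding some wc implementations add.
        if [ $after -gt $before ]; then made=yes; else made=no; fi
        echo "status=$status core=$made"
    )
}
```

For example, "try_core_dump /mnt/test" run once with the 'intr' mount and
once without. In bash and dash a status of 139 (128 + 11) means the process
was killed by SIGSEGV; the core= field shows whether a file was written.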

Additional info:
Comment 1 Bastien Nocera 2004-10-01 07:30:22 EDT
I could replicate the problem, and discussed the issue internally. It
is the normal behaviour.
Because the program is blocked in an interruptible read when it
receives the SEGV signal, it disconnects from the NFS server and
therefore cannot write the core dump to the NFS server. On most Unix
systems, nointr is necessary to be able to write a core dump. Some
OSes, like HP-UX, can't even attach a debugger to a program
running on an "intr" NFS mount.
Comment 3 Dave Maley 2004-11-30 15:38:32 EST
I'm re-opening this since LLNL has reported the same problem.  We
disagree with what Bastien posted above.  This behavior has changed
between RHEL2.1 and RHEL3.  Under RHEL2.1 I was able to produce a core
file when using the intr option.  However, with RHEL3 and RHEL4 it
doesn't work.

Furthermore LLNL reports that upstream 2.4.20, 2.4.27, and Debian
2.4.27 kernels do not have this behavior.
Comment 4 Marcelo Matus 2004-12-02 19:34:48 EST
I am not sure if it is related, but in our configuration
of dual Opterons, a core dump in an NFS directory generates
a kernel panic, and the machine hangs.

This is under RHEL3.

For now, we have core dump generation disabled.
Comment 5 Steve Dickson 2004-12-03 06:28:43 EST
Could you please post that panic backtrace?
Comment 6 David Lehman 2004-12-06 17:01:19 EST
Steve, can we clarify whether or not the original described behavior is a bug?

Example (on RHEL3-U3):

 1. mount the same nfs volume in two places on the same machine 
    (one w/ 'intr', the other without)
 2. enable core dumps
 3. in each of the mountpoints, do the following:

        sleep 20 &
        killall -11 sleep
        date # this is just to show the death of the backgrounded sleep

What you'll find is that the sequence yields a core file when run in the nointr
mount, but not in the intr mount. Is this not a bug?
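
When comparing two mountpoints like this, it helps to confirm which one
actually carries 'intr'. A minimal sketch, assuming the options are available
as a comma-separated string (for example the fourth field of the matching
line in /proc/mounts); the function name has_mount_option is hypothetical,
and whether 'intr' is listed at all depends on the kernel in use.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the comma-separated option string OPTS
# contains OPT as a whole entry, so that "nointr" does not match "intr".
has_mount_option() {
    case ",$1," in
        *,"$2",*) return 0 ;;
        *)        return 1 ;;
    esac
}

# Example with the option string from the original report:
if has_mount_option "rw,proto=tcp,intr" intr; then
    echo "mounted with intr"
fi
```

Matching on whole entries matters here because a plain substring search for
"intr" would also report a 'nointr' mount as interruptible.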
Comment 7 Steve Dickson 2004-12-06 18:07:32 EST
Well, if "ulimit -c" says cores should be dropped and
they are not, then it's probably a bug....
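
To rule out the limit itself before blaming NFS, the "ulimit -c" value can
be checked first. A small sketch; the function name core_limit_allows_dump
is invented for illustration, and it only interprets the value printed by
the shell, nothing NFS-specific.

```shell
#!/bin/sh
# Hypothetical helper: interpret a "ulimit -c" value.
# Returns 0 if core files are allowed, 1 if the limit is zero,
# and 2 if the value is not recognized.
core_limit_allows_dump() {
    case "$1" in
        unlimited)   return 0 ;;
        ''|*[!0-9]*) return 2 ;;        # empty or not a plain number
        *)           [ "$1" -gt 0 ] ;;  # any positive size allows a dump
    esac
}

if core_limit_allows_dump "$(ulimit -c)"; then
    echo "core dumps are enabled in this shell"
else
    echo "core dumps are disabled or the limit is unknown"
fi
```

A small non-zero limit still counts as "enabled" here, although it can by
itself produce a truncated core file.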
Comment 8 Steve Dickson 2004-12-14 14:11:55 EST
Created attachment 108550 [details]
Proposed patch

The patch sets the process flag, PF_DUMPCORE, in do_coredump
which signals to the RPC layer that a core is being written. Since
'intr' has been set and the process has been signaled(), we don't
want to get hung up (by a dead server) writing the core out, so there
is a limit (3) on the number of retries.
Comment 9 Dave Maley 2004-12-17 14:10:57 EST
LLNL reports that this patch allows core dumps to be generated, however they are
always truncated:

I tried this patch and it fixes the problem with filp_open().
However, the corefile produced in NFS is now always truncated, and therefore

# toad13 /usr/local/tmp > sleep 20
Quit (core dumped)
# toad13 /usr/local/tmp > ls -hl *sleep*core
-rw-------    1 root     root          677 Dec 17 10:54 toad13-sleep-1100.core
# toad13 /usr/local/tmp > cd /tmp
# toad13 /tmp > sleep 20
Quit (core dumped)
# toad13 /tmp > ls -hl *sleep*core
-rw-------    1 root     root         248K Dec 17 10:55 toad13-sleep-1104.core
Comment 10 Steve Dickson 2005-01-03 10:45:55 EST
Yes, I also see the size difference, which is strange, but using
ethereal to examine the NFS traffic (basically count the writes) it
appears NFS is writing the correct amount of data and gdb was able
to use the core file.

Question: is the smaller core file usable by gdb?
Comment 11 Dave Maley 2005-01-03 16:13:17 EST
No, the core files are not valid.  Here's what GDB says:

[dave@sideshowbob nfstest]$ gdb xload core.2932
GNU gdb Red Hat Linux (6.1post-1.20040607.52rh)
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and
you are
welcome to change it and/or distribute copies of it under certain
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for
This GDB was configured as "i386-redhat-linux-gnu"...(no debugging
symbols found)...Using host libthread_db library

"/mnt/nfstest/core.2932" is not a core dump: File format not recognized
(gdb) q
[dave@sideshowbob nfstest]$
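
A quick first check that does not need gdb is to look at the ELF header of
the file. The sketch below (function name is_elf_core is hypothetical) reads
the first 18 bytes with od and tests for the ELF magic and an e_type of
ET_CORE (4). It is only a sanity check: a truncated core may still have an
intact header, so passing it does not prove gdb will accept the file.

```shell
#!/bin/sh
# Hypothetical helper: succeed if FILE starts with an ELF header whose
# e_type field says ET_CORE (4).
is_elf_core() {
    # Dump the first 18 bytes as hex and split them into $1..$18.
    set -- $(od -An -tx1 -N18 "$1" 2>/dev/null)
    [ $# -eq 18 ] || return 1
    # Bytes 0-3: ELF magic 0x7f 'E' 'L' 'F'.
    [ "$1$2$3$4" = "7f454c46" ] || return 1
    # Byte 5 (EI_DATA) gives the byte order of e_type at offsets 16-17.
    case "$6" in
        01) [ "${17}${18}" = "0400" ] ;;   # little-endian
        02) [ "${17}${18}" = "0004" ] ;;   # big-endian
        *)  return 1 ;;
    esac
}
```

For example, "is_elf_core core.2932 && echo looks like a core". Comparing
the file size against a core from a local filesystem, as done in comment 9,
remains the more direct sign of truncation.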
Comment 12 Steve Dickson 2005-01-03 17:06:49 EST
So how are you causing xload to drop core?
Comment 13 Dave Maley 2005-01-03 17:38:30 EST
here's what I did:

1. xload &
2. kill -s sigabrt <pid>

and here's another way that LLNL was testing this (from the NFS
mounted dir):

1. sleep 20 &
2. killall -11 sleep
Comment 14 Steve Dickson 2005-01-04 06:43:55 EST
hmm.... I wonder if something up above is noticing the
signaled flag is set and not completing the write..... 
Comment 16 Steve Dickson 2005-01-06 13:36:08 EST
Created attachment 109435 [details]
Updated patch

There was one place in the NFS code (nfs_wait_event() to
be exact) that needed to check the PF_DUMPCORE bit.
Comment 17 Dave Maley 2005-01-06 15:24:43 EST
Steve - I just tested w/ this updated patch and was able to drop a
valid core file.  I've asked LLNL to provide confirmation as well, but
from my quick test this seems to do the trick.  Thanks much!
Comment 21 Ernie Petrides 2005-01-25 18:32:09 EST
A fix for this problem has just been committed to the RHEL3 U5
patch pool this afternoon (in kernel version 2.4.21-27.9.EL).
Comment 22 Tim Powers 2005-05-18 09:27:57 EDT
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

