Bug 242976 - BUG at exec.c:1324 while coredumping multithreaded app
Status: CLOSED WONTFIX
Product: Red Hat Enterprise Linux 3
Classification: Red Hat
Component: kernel
Version: 3.8
Hardware: All
OS: Linux
Priority: urgent
Severity: high
Assigned To: Dave Anderson
QA Contact: Martin Jenner
Reported: 2007-06-06 15:11 EDT by Bryn M. Reeves
Modified: 2007-11-16 20:14 EST

Doc Type: Bug Fix
Last Closed: 2007-06-15 06:33:58 EDT
Comment 4 Dave Anderson 2007-06-07 17:03:44 EDT

12118 -- do_coredump() called first:
  ...
  down_write(mm->mmap_sem) -> successful
  is_dumpable() true because:
    (shared) mm->dumpable was 1 at that point 
    task->dumpable still is 1 (for all pe tasks)
  mm->dumpable set to 0
  ...
  coredump_wait() called:
    mm->core_waiters += 1
    zap_threads() sends SIGKILL to all "pe" tasks
      and bumps mm->core_waiters for each one
    mm->core_waiters - 1 > 0, so up_write(mm->mmap_sem)
      and calls wait_for_completion()

12119 -- do_coredump() called after 12118 gets there:
  ...
  down_write(mm->mmap_sem) blocks on 12118
  < 12118 does the up_write() in coredump_wait() > 
  is_dumpable() fails because mm->dumpable is (now) 0:

        if (!is_dumpable(current))
        {
                if(!core_setuid_ok || !current->task_dumpable) {
                        up_write(&mm->mmap_sem);
                        goto fail;
                }
                current->fsuid = 0;
        }
  
  core_setuid_ok is 1, and current->task_dumpable is 1, so it
  does *not* goto fail, and proceeds headlong into the BUG() in
  coredump_wait().

I'm probably missing something, but I don't see how this could
*ever* work properly with two threads in do_coredump(), i.e.,
where the second one is not subject to the SIGKILL from the
zap_threads() call.  It makes no sense to let the second one
continue under any circumstances.

Maybe most customers leave /proc/sys/kernel/core_setuid_ok at
its default setting of 0?

Comment 5 Dave Anderson 2007-06-07 17:05:50 EDT
It would seem that maybe a check for whether mm->core_startup_done
has been initialized could be added to the "goto fail" check?
Comment 6 Dave Anderson 2007-06-07 17:12:28 EDT
Ernie -- what do you think about that (comment #5), i.e.,

        if (!is_dumpable(current))
        {
                if(!core_setuid_ok || !current->task_dumpable ||
                    mm->core_startup_done) {
                        up_write(&mm->mmap_sem);
                        goto fail;
                }
                current->fsuid = 0;
        }

I suppose the mm->core_waiters could also be checked?  It would
have to be at least 1 due to the zap_threads() call against the
running thread.
Comment 7 Ernie Petrides 2007-06-07 18:33:43 EDT
Dave, I agree that you and Bryn have identified the crux of the bug.  But
I think the synchronization of core dump initiation is (should be) via the
mm->dumpable value.  In other words, the 2nd "if" statement should be:

        if (!core_setuid_ok || !mm->dumpable) {


Please note that I would NAK this fix for a post-U9 security erratum.

This bug only occurs when core_setuid_ok is set, which should never be the
case on a production system.  I believe its non-zero setting should only be
used by software developers doing debugging (possibly on system daemons or
special utilities).

Since the customer has an easy workaround, which is the recommended mode of
operation, the severity of the issue tracker should be downgraded.

I believe this bug will be automatically closed next week.
Comment 9 Bryn M. Reeves 2007-06-15 06:33:58 EDT
I'm closing this as WONTFIX: we have a root cause, and there is no plan to
address this in RHEL3 since a straightforward workaround exists.

