Bug 242976 - BUG at exec.c:1324 while coredumping multithreaded app
Status: CLOSED WONTFIX
Alias: None
Product: Red Hat Enterprise Linux 3
Classification: Red Hat
Component: kernel
Version: 3.8
Hardware: All
OS: Linux
Priority: urgent
Severity: high
Target Milestone: ---
Assignee: Dave Anderson
QA Contact: Martin Jenner
URL:
Whiteboard:
Keywords:
Depends On:
Blocks:
Reported: 2007-06-06 19:11 UTC by Bryn M. Reeves
Modified: 2007-11-17 01:14 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2007-06-15 10:33:58 UTC


Comment 4 Dave Anderson 2007-06-07 21:03:44 UTC

12118 -- do_coredump() called first:
  ...
  down_write(mm->mmap_sem) -> successful
  is_dumpable() true because:
    (shared) mm->dumpable was 1 at that point
    task->task_dumpable is still 1 (for all "pe" tasks)
  mm->dumpable set to 0
  ...
  coredump_wait() called:
    mm->core_waiters += 1
    zap_threads() sends SIGKILL to all "pe" tasks
      and bumps mm->core_waiters for each one
    mm->core_waiters - 1 > 0, so up_write(mm->mmap_sem)
      and calls wait_for_completion()

12119 do_coredump() called after 12118 gets there:
  ...
  down_write(mm->mmap_sem) blocks on 12118
  < 12118 does the up_write() in coredump_wait() > 
  is_dumpable() fails because mm->dumpable is (now) 0:

        if (!is_dumpable(current))
        {
                if(!core_setuid_ok || !current->task_dumpable) {
                        up_write(&mm->mmap_sem);
                        goto fail;
                }
                current->fsuid = 0;
        }
  
  core_setuid_ok is 1, and current->task_dumpable is 1, so it
  does *not* goto fail, and proceeds headlong into the BUG() in
  coredump_wait().

I'm probably missing something, but I don't see how this could
*ever* work properly if two threads are in do_coredump() at the
same time, i.e., such that the second one is not subject to the
SIGKILL from the zap_threads() call.  It makes no sense to let
the second one continue under any circumstances.

Maybe most customers leave /proc/sys/kernel/core_setuid_ok at
its default setting of 0?
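
The sequence above can be replayed in user space to see why the second
thread gets past the check.  This is only a sketch: the model_* structs
and functions are hypothetical stand-ins, not kernel code, and
model_is_dumpable() assumes is_dumpable() requires both the per-task
flag and the shared mm flag, as the trace above implies.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space stand-ins for the kernel state named above. */
struct model_mm {
    int dumpable;               /* models the shared mm->dumpable */
};

struct model_task {
    int task_dumpable;          /* models current->task_dumpable */
    struct model_mm *mm;        /* shared by every thread of the app */
};

/* Assumed behaviour of is_dumpable(): both flags must be set. */
bool model_is_dumpable(const struct model_task *t)
{
    assert(t != NULL && t->mm != NULL);
    return t->task_dumpable && t->mm->dumpable;
}

/*
 * Models the check quoted above.  Returns false for "goto fail" and
 * true when the caller carries on toward coredump_wait().
 */
bool model_passes_dump_check(const struct model_task *t, int core_setuid_ok)
{
    if (!model_is_dumpable(t)) {
        if (!core_setuid_ok || !t->task_dumpable)
            return false;
        /* the kernel code sets current->fsuid = 0 at this point */
    }
    return true;
}

/*
 * Replays the trace: the first thread passes the check and clears
 * mm->dumpable, then the second thread runs the same check.
 * Returns whether the second thread is allowed to continue.
 */
bool model_second_thread_proceeds(int core_setuid_ok)
{
    struct model_mm mm = { .dumpable = 1 };
    struct model_task first = { .task_dumpable = 1, .mm = &mm };
    struct model_task second = { .task_dumpable = 1, .mm = &mm };

    if (!model_passes_dump_check(&first, core_setuid_ok))
        return false;
    mm.dumpable = 0;            /* first thread: "mm->dumpable set to 0" */

    return model_passes_dump_check(&second, core_setuid_ok);
}
```

Under these assumptions model_second_thread_proceeds(1) is true and
model_second_thread_proceeds(0) is false, which matches the observation
that the problem needs core_setuid_ok to be set.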


Comment 5 Dave Anderson 2007-06-07 21:05:50 UTC
It would seem that maybe a check for whether mm->core_startup_done
has been initialized could be added to the "goto fail" check?

Comment 6 Dave Anderson 2007-06-07 21:12:28 UTC
Ernie -- what do you think about that (comment #5), i.e.,

        if (!is_dumpable(current))
        {
                if(!core_setuid_ok || !current->task_dumpable ||
                    mm->core_startup_done) {
                        up_write(&mm->mmap_sem);
                        goto fail;
                }
                current->fsuid = 0;
        }

I suppose the mm->core_waiters could also be checked?  It would
have to be at least 1 due to the zap_threads() call against the
running thread.
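
A rough user-space sketch of the two variants suggested here, one
testing core_startup_done and one testing core_waiters.  The model_*
names are hypothetical, and the sketch assumes the first thread has set
both fields before it releases mmap_sem in coredump_wait(); that
ordering would need to be confirmed against the real source.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the mm and task fields discussed above. */
struct model_mm {
    int dumpable;                   /* models mm->dumpable */
    int core_waiters;               /* models mm->core_waiters */
    const void *core_startup_done;  /* non-NULL once a dump has started */
};

struct model_task {
    int task_dumpable;              /* models current->task_dumpable */
    struct model_mm *mm;
};

/* State left behind by the first thread once it is in coredump_wait(). */
void model_first_thread_started(struct model_mm *mm, int nr_other_threads)
{
    static const char started_marker = 1;

    assert(mm != NULL);
    mm->dumpable = 0;
    /* "core_waiters += 1", plus one bump per thread hit by zap_threads() */
    mm->core_waiters = 1 + nr_other_threads;
    mm->core_startup_done = &started_marker;
}

/* Variant 1: also fail if core_startup_done has been initialized. */
bool model_check_with_startup_done(const struct model_task *t,
                                   int core_setuid_ok)
{
    assert(t != NULL && t->mm != NULL);
    if (!(t->task_dumpable && t->mm->dumpable)) {
        if (!core_setuid_ok || !t->task_dumpable ||
            t->mm->core_startup_done != NULL)
            return false;           /* models "goto fail" */
    }
    return true;
}

/* Variant 2: also fail if any core waiters have been counted. */
bool model_check_with_core_waiters(const struct model_task *t,
                                   int core_setuid_ok)
{
    assert(t != NULL && t->mm != NULL);
    if (!(t->task_dumpable && t->mm->dumpable)) {
        if (!core_setuid_ok || !t->task_dumpable ||
            t->mm->core_waiters > 0)
            return false;           /* models "goto fail" */
    }
    return true;
}
```

In this model either variant turns the second thread away once
model_first_thread_started() has run, even with core_setuid_ok set.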


Comment 7 Ernie Petrides 2007-06-07 22:33:43 UTC
Dave, I agree that you and Bryn have identified the crux of the bug.  But
I think the synchronization of core dump initiation is (should be) via the
mm->dumpable value.  In other words, the 2nd "if" statement should be:

        if (!core_setuid_ok || !mm->dumpable) {
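
For illustration, a user-space model of the check with that inner
condition.  The model_* names are hypothetical, and the outer test
assumes is_dumpable() requires both the task flag and the mm flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel fields involved. */
struct model_mm {
    int dumpable;           /* models the shared mm->dumpable */
};

struct model_task {
    int task_dumpable;      /* models current->task_dumpable */
    struct model_mm *mm;
};

/*
 * Models the check with the inner condition keyed on mm->dumpable.
 * Returns false for "goto fail", true when the caller continues.
 */
bool model_check_mm_dumpable(const struct model_task *t, int core_setuid_ok)
{
    assert(t != NULL && t->mm != NULL);
    if (!(t->task_dumpable && t->mm->dumpable)) {
        if (!core_setuid_ok || !t->mm->dumpable)
            return false;
        /* the kernel code sets current->fsuid = 0 at this point */
    }
    return true;
}
```

In this model a second thread that finds mm->dumpable already cleared
fails the check regardless of core_setuid_ok, while a task whose own
task_dumpable flag is 0 still takes the setuid path when the mm is
dumpable.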


Please note that I would NAK this fix for a post-U9 security erratum.

This bug only occurs when core_setuid_ok is set, which should never be the
case on a production system.  I believe its non-zero setting should only be
used by software developers doing debugging (possibly on system daemons or
special utilities).

Since the customer has an easy work-around, which is the recommended mode of
operation, the severity of the issue tracker should be down-graded.
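
Whether a given system is exposed could be checked with something like
the sketch below, which reads the /proc/sys/kernel/core_setuid_ok file
mentioned in comment #4.  The function names are made up for this
example, and the file is only expected to exist on kernels that provide
this sysctl.

```c
#include <assert.h>
#include <stdio.h>

#define CORE_SETUID_OK_PATH "/proc/sys/kernel/core_setuid_ok"

/*
 * Reads one decimal integer from path.
 * Returns 0 on success, -1 if the file cannot be opened or does not
 * start with an integer.
 */
int read_int_file(const char *path, int *value)
{
    assert(path != NULL && value != NULL);

    FILE *fp = fopen(path, "r");
    if (fp == NULL)
        return -1;

    int ok = (fscanf(fp, "%d", value) == 1);
    fclose(fp);
    return ok ? 0 : -1;
}

/*
 * Returns 1 if the setting is non-zero (the case in which this bug can
 * occur), 0 if it is at the recommended value of 0, and -1 if the value
 * could not be read.
 */
int core_setuid_ok_enabled(const char *path)
{
    int value;

    if (read_int_file(path, &value) != 0)
        return -1;
    return value != 0;
}
```

Calling core_setuid_ok_enabled(CORE_SETUID_OK_PATH) on an affected
system would be expected to return 1; after the setting is put back to
0 it should return 0.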

I believe this bug will be automatically closed next week.


Comment 9 Bryn M. Reeves 2007-06-15 10:33:58 UTC
I'm closing this as WONTFIX since we have a root cause & there is no plan to
address this in RHEL3 since a straightforward workaround exists.



