Bug 504263 (CVE-2009-1388) - CVE-2009-1388 kernel: do_coredump() vs ptrace_start() deadlock
Summary: CVE-2009-1388 kernel: do_coredump() vs ptrace_start() deadlock
Alias: CVE-2009-1388
Product: Security Response
Classification: Other
Component: vulnerability
Version: unspecified
Hardware: All
OS: Linux
Target Milestone: ---
Assignee: Red Hat Product Security
QA Contact:
Depends On: 504157 504265
Reported: 2009-06-05 08:30 UTC by Eugene Teo (Security Response)
Modified: 2021-11-12 19:57 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Last Closed: 2021-10-19 09:07:27 UTC

Attachments
Proposed patch from OpenVZ (2.76 KB, patch)
2009-06-05 08:35 UTC, Eugene Teo (Security Response)
fix do_coredump() vs ptrace_start() deadlock (458 bytes, patch)
2009-06-06 05:25 UTC, Oleg Nesterov

System | ID | Private | Priority | Status | Summary | Last Updated
Red Hat Product Errata | RHSA-2009:1193 | 0 | normal | SHIPPED_LIVE | Important: kernel security and bug fix update | 2009-08-04 13:15:15 UTC

Description Eugene Teo (Security Response) 2009-06-05 08:30:13 UTC
Description of problem:
The OpenVZ Linux kernel team has found a deadlock between the ptrace and
coredump code. No root privileges are required to trigger it.

2.6.18-128.1.10.el5 is affected; the exploit is in the attachment.

    --- SysRq-T trace:

    expl_zap3     R ffff81003fa6cd70     0  8645  10409  8646              
     0000000000000068 ffffffff8006358b ffff81007e13f980 0000000000000000
     ffff81003fa6cd70 ffff81003fa6cd70 0000013e867d5d52 0000027bcf133f75
     ffff81003fa6cf80 ffffffff8046b780 ffffffff802fa680 ffff81003d0f1ec8
    Call Trace:
     [<ffffffff8006358b>] __sched_text_start+0x11b/0xfcb
     [<ffffffff800863c2>] task_rq_lock+0x26/0x45
     [<ffffffff80088be2>] sys_sched_yield+0xb1/0xb8
     [<ffffffff800c8fa8>] ptrace_start+0x3bd/0x465
     [<ffffffff80029223>] do_wait+0xafd/0xb99
     [<ffffffff800c9cfb>] sys_ptrace+0x48/0x1f7
     [<ffffffff80060477>] ptregscall_common+0x67/0xac
     [<ffffffff80060166>] system_call+0x7e/0x83

    expl_zap3     D ffff81007ff395c0     0  8647   8645                8646
     ffff81007f6abc58 0000000000000086 0000000000000006 ffff81007b944c30
     ffff81007ff395c0 ffff81007ff33500 0000004ed2555339 0000009d55ffa665
     ffff81007ff397c8 ffffffff8046b780 0000004ed2554f5e ffff81007ff33500
    Call Trace:
     [<ffffffff80086900>] __activate_task+0x92/0x157
     [<ffffffff800492ac>] try_to_wake_up+0x3ce/0x3e0
     [<ffffffff80064691>] wait_for_completion+0x79/0xa2
     [<ffffffff80088417>] default_wake_function+0x0/0xe
     [<ffffffff800ecbb5>] do_coredump+0x341/0x8a1
     [<ffffffff800a32e5>] ub_slab_uncharge+0xd0/0xdb
     [<ffffffff800a3493>] do_ub_siginfo_uncharge+0x42/0x55
     [<ffffffff80096391>] recalc_sigpending+0xe/0x25
     [<ffffffff8002c00a>] get_signal_to_deliver+0x434/0x46b
     [<ffffffff8005d8f6>] do_notify_resume+0xd0/0x7e3
     [<ffffffff80096f63>] __group_send_sig_info+0x89/0x94
     [<ffffffff8005d26a>] group_send_sig_info+0x76/0x83
     [<ffffffff80088a0d>] vcpu_put+0x8e/0x16e
     [<ffffffff8004ea1d>] sys_kill+0x19f/0x1b2
     [<ffffffff80088a0d>] vcpu_put+0x8e/0x16e
     [<ffffffff800601ef>] sysret_signal+0x1c/0x27
     [<ffffffff80060477>] ptregscall_common+0x67/0xac

Comment 1 Eugene Teo (Security Response) 2009-06-05 08:35:21 UTC
Created attachment 346615
Proposed patch from OpenVZ

Comment 7 Oleg Nesterov 2009-06-06 05:25:56 UTC
Created attachment 346742
fix do_coredump() vs ptrace_start() deadlock

This is not as simple as I thought...

I suspect the patch from OpenVZ is not exactly right.

PF_SIGNALED is always set when the thread is killed by a fatal
signal. If we check this flag in ptrace_start(), I'm afraid we can
have a false positive when the exiting tracee calls

Also, there is no guarantee that the tracee has PF_SIGNALED set when
we are about to deadlock. Suppose the tracee is simply exiting and sleeps
in TASK_TRACED because of PTRACE_EVENT_EXIT. The tracer calls
ptrace_start(). After that, another thread which shares the same
->mm starts the coredump. zap_process() wakes up the tracee, which
calls exit_mm()->wait_for_completion() and sleeps in D state, but
without PF_SIGNALED.

We could add the SIGNAL_GROUP_EXIT check:

        --- a/kernel/ptrace.c
        +++ b/kernel/ptrace.c
        @@ -933,7 +933,8 @@ ptrace_start(long pid, long request,
                while (child->state != TASK_TRACED && child->state != TASK_STOPPED) {
        -               if (child->exit_state) {
        +               if (child->exit_state ||
        +                  (current->signal->flags & SIGNAL_GROUP_EXIT)) {
                                goto out_tsk;

If we race with a coredumping thread which shares the same ->mm,
the tracer should be killed by SIGKILL too. In that case we can just
return; the error code does not matter because we will never return
to user space.

(This patch could also help if an rt tracer preempts the tracee, in
 which case we can spin forever. At least with this check we can kill
 the tracer. However, without fixing wait_task_inactive() this doesn't
 really help.)
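
To make the effect of the SIGNAL_GROUP_EXIT check easier to see, here is a
small user-space model of the wait loop. It is only a sketch, not kernel
code: every name in it (model_task, model_ptrace_wait, the MODEL_* constants,
the spin budget) is hypothetical, and the spin budget merely stands in for
"spins forever".

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for kernel task states; not the real definitions. */
enum model_state {
    MODEL_RUNNING,
    MODEL_TRACED,
    MODEL_STOPPED,
    MODEL_UNINTERRUPTIBLE   /* "D" state, e.g. sleeping in wait_for_completion() */
};

/* Possible outcomes of the modelled wait loop. */
enum model_result {
    MODEL_WAIT_STOPPED = 0, /* tracee reached a stopped state */
    MODEL_WAIT_BAIL = -1,   /* loop gave up (the "goto out_tsk" path) */
    MODEL_WAIT_SPIN = -2    /* budget exhausted; models spinning forever */
};

struct model_task {
    enum model_state state;
    bool exit_state;        /* models child->exit_state != 0 */
    bool group_exit;        /* models signal->flags & SIGNAL_GROUP_EXIT */
};

/*
 * Models the tracer waiting for the tracee to stop. When check_group_exit
 * is true, the loop also bails out if the tracer itself is being killed as
 * part of a group exit, which is the extra condition discussed above.
 */
int model_ptrace_wait(const struct model_task *child,
                      const struct model_task *tracer,
                      bool check_group_exit, int max_spins)
{
    assert(child != 0 && tracer != 0);

    for (int i = 0; i < max_spins; i++) {
        if (child->state == MODEL_TRACED || child->state == MODEL_STOPPED)
            return MODEL_WAIT_STOPPED;
        if (child->exit_state ||
            (check_group_exit && tracer->group_exit))
            return MODEL_WAIT_BAIL;
        /* the real loop would yield the CPU here and retry */
    }
    return MODEL_WAIT_SPIN;
}
```

With a tracee stuck in MODEL_UNINTERRUPTIBLE and a tracer that has
group_exit set, the model returns MODEL_WAIT_SPIN without the extra check
and MODEL_WAIT_BAIL with it. A tracer that is not part of the exiting
group still gets MODEL_WAIT_SPIN either way.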

The patch above should fix this deadlock, but unfortunately it does
not solve all problems. Note that with this test-case the tracer,
tracee, and the coredumping thread share the same ->mm. This is
because it was written originally to exploit another problem fixed
by 5ecfbae093f0c37311e89b29bfc0c9d586eace87.

But what if the tracer does not participate in coredumping? In that
case the tracer is not killed, and we still have problems. The tracer
will spin until the coredump completes. So I'd suggest this patch.

Untested, not even compiled. Just for review/discussion.

Also, upstream checks mm->core_state in may_ptrace_stop() to
prevent another deadlock (and it is still needed afaics, despite
the fact that schedule() now checks signal_pending_state()). I wonder
if RHEL needs something like this check. utrace_quiescent() checks
sigkill_pending(), but if SIGKILL was already dequeued it can
return false. This means that if the tracer, tracee, and the
coredumping thread share the same ->mm, we can deadlock. Fortunately,
sigkill_pending() returning false also means we can send a private
SIGKILL to the tracee and wake it up, but I guess this won't be obvious
to the admin.
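
For readers unfamiliar with the mm->core_state check mentioned above, the
idea can be sketched in user space roughly as follows. This is a model under
assumptions, not the upstream function: the types and names (sketch_mm,
sketch_task, sketch_may_ptrace_stop) are hypothetical, and the comparison of
the tracee's mm with the tracer's mm reflects my understanding of the
upstream logic rather than anything stated in this report.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of an address space shared between threads. */
struct sketch_mm {
    bool core_dump_in_progress;  /* models mm->core_state != NULL */
};

/* Hypothetical model of a task and its tracer. */
struct sketch_task {
    struct sketch_mm *mm;
    struct sketch_task *tracer;  /* NULL when nobody traces this task */
    bool ptraced;
};

/*
 * Returns true if the tracee may safely sleep waiting for its tracer.
 * It refuses to stop when a coredump is in progress on an mm that the
 * tracer shares: in that case the tracer is itself being killed by the
 * dump and may never come back to resume the tracee.
 */
bool sketch_may_ptrace_stop(const struct sketch_task *t)
{
    assert(t != NULL && t->mm != NULL);

    if (!t->ptraced || t->tracer == NULL)
        return false;
    if (t->mm->core_dump_in_progress && t->mm == t->tracer->mm)
        return false;
    return true;
}
```

In this model, a tracee whose tracer shares the dumping mm is told not to
stop, while a tracee whose tracer lives in a different mm may still stop
as usual.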

Comment 17 errata-xmlrpc 2009-08-04 13:15:44 UTC
This issue has been addressed in the following products:

  Red Hat Enterprise Linux 5

Via RHSA-2009:1193 https://rhn.redhat.com/errata/RHSA-2009-1193.html
