Bug 504263 (CVE-2009-1388)

Summary: CVE-2009-1388 kernel: do_coredump() vs ptrace_start() deadlock
Product: [Other] Security Response
Reporter: Eugene Teo (Security Response) <eteo>
Component: vulnerability
Assignee: Red Hat Product Security <security-response-team>
Status: CLOSED ERRATA
Severity: medium
Priority: medium
Version: unspecified
CC: atangrin, dhoward, khlebnikov, khorenko, onestero, rkhan, security-response-team
Keywords: Security
Hardware: All
OS: Linux
Doc Type: Bug Fix
Last Closed: 2021-10-19 09:07:27 UTC
Bug Depends On: 504157, 504265
Attachments:
  Proposed patch from OpenVZ (flags: none)
  fix do_coredump() vs ptrace_start() deadlock (flags: none)

Description Eugene Teo (Security Response) 2009-06-05 08:30:13 UTC
Description of problem:
The OpenVZ Linux kernel team has found a deadlock between the ptrace and
coredump code. No root privileges are required to trigger it.

2.6.18-128.1.10.el5 is affected; an exploit is attached.

    --- SysRq-T trace:

    expl_zap3     R ffff81003fa6cd70     0  8645  10409  8646               (NOTLB)
     0000000000000068 ffffffff8006358b ffff81007e13f980 0000000000000000
     ffff81003fa6cd70 ffff81003fa6cd70 0000013e867d5d52 0000027bcf133f75
     ffff81003fa6cf80 ffffffff8046b780 ffffffff802fa680 ffff81003d0f1ec8
    Call Trace:
     [<ffffffff8006358b>] __sched_text_start+0x11b/0xfcb
     [<ffffffff800863c2>] task_rq_lock+0x26/0x45
     [<ffffffff80088be2>] sys_sched_yield+0xb1/0xb8
     [<ffffffff800c8fa8>] ptrace_start+0x3bd/0x465
     [<ffffffff80029223>] do_wait+0xafd/0xb99
     [<ffffffff800c9cfb>] sys_ptrace+0x48/0x1f7
     [<ffffffff80060477>] ptregscall_common+0x67/0xac
     [<ffffffff80060166>] system_call+0x7e/0x83

    expl_zap3     D ffff81007ff395c0     0  8647   8645                8646 (NOTLB)
     ffff81007f6abc58 0000000000000086 0000000000000006 ffff81007b944c30
     ffff81007ff395c0 ffff81007ff33500 0000004ed2555339 0000009d55ffa665
     ffff81007ff397c8 ffffffff8046b780 0000004ed2554f5e ffff81007ff33500
    Call Trace:
     [<ffffffff80086900>] __activate_task+0x92/0x157
     [<ffffffff800492ac>] try_to_wake_up+0x3ce/0x3e0
     [<ffffffff80064691>] wait_for_completion+0x79/0xa2
     [<ffffffff80088417>] default_wake_function+0x0/0xe
     [<ffffffff800ecbb5>] do_coredump+0x341/0x8a1
     [<ffffffff800a32e5>] ub_slab_uncharge+0xd0/0xdb
     [<ffffffff800a3493>] do_ub_siginfo_uncharge+0x42/0x55
     [<ffffffff80096391>] recalc_sigpending+0xe/0x25
     [<ffffffff8002c00a>] get_signal_to_deliver+0x434/0x46b
     [<ffffffff8005d8f6>] do_notify_resume+0xd0/0x7e3
     [<ffffffff80096f63>] __group_send_sig_info+0x89/0x94
     [<ffffffff8005d26a>] group_send_sig_info+0x76/0x83
     [<ffffffff80088a0d>] vcpu_put+0x8e/0x16e
     [<ffffffff8004ea1d>] sys_kill+0x19f/0x1b2
     [<ffffffff80088a0d>] vcpu_put+0x8e/0x16e
     [<ffffffff800601ef>] sysret_signal+0x1c/0x27
     [<ffffffff80060477>] ptregscall_common+0x67/0xac

Comment 1 Eugene Teo (Security Response) 2009-06-05 08:35:21 UTC
Created attachment 346615
Proposed patch from OpenVZ

Comment 7 Oleg Nesterov 2009-06-06 05:25:56 UTC
Created attachment 346742
fix do_coredump() vs ptrace_start() deadlock

This is not as simple as I thought...

I suspect the patch from OpenVZ is not exactly right.

PF_SIGNALED is always set when a thread is killed by a fatal
signal. If we check this flag in ptrace_start(), I'm afraid we can
get a false positive when the exiting tracee calls
tracehook_report_exit().

Also, there is no guarantee that the tracee has PF_SIGNALED set when
we are about to deadlock. Suppose the tracee just exits and sleeps
in TASK_TRACED because of PTRACE_EVENT_EXIT. The tracer calls
ptrace_start(). After that, another thread which shares the same
->mm starts the coredump. zap_process() wakes up the tracee, which
calls exit_mm()->wait_for_completion() and sleeps in D state, but
without PF_SIGNALED.


We could add the SIGNAL_GROUP_EXIT check:

        --- a/kernel/ptrace.c
        +++ b/kernel/ptrace.c
        @@ -933,7 +933,8 @@ ptrace_start(long pid, long request,
                 */
                wait_task_inactive(child);
                while (child->state != TASK_TRACED && child->state != TASK_STOPPED) {
        -               if (child->exit_state) {
        +               if (child->exit_state ||
        +                  (current->signal->flags & SIGNAL_GROUP_EXIT)) {
                                __ptrace_state_free(state);
                                goto out_tsk;
                        }

If we race with the coredumping thread which shares the same ->mm,
the tracer should be killed by SIGKILL too. In that case we can just
return; the error code does not matter because we will never return
to user space.

(This patch could also help if a real-time tracer preempts the tracee;
 we can spin forever in that case. At least with this check we can kill
 the tracer. However, without fixing wait_task_inactive() this doesn't
 really help.)
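
To make the effect of the extra check easier to reason about, here is a
small userspace model of the polling loop from the hunk above. It is only
a sketch: the struct, the enum names, the numeric flag value and the
max_spins bound are all hypothetical stand-ins, not kernel code, and the
real loop yields the CPU instead of counting iterations.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for kernel task state; values are illustrative. */
enum model_state {
    MODEL_RUNNING,
    MODEL_UNINTERRUPTIBLE,  /* "D" state, e.g. blocked in wait_for_completion() */
    MODEL_STOPPED,
    MODEL_TRACED
};

/* Models SIGNAL_GROUP_EXIT; the numeric value is an assumption of this sketch. */
#define MODEL_SIGNAL_GROUP_EXIT 0x8u

struct model_task {
    enum model_state state;     /* models task->state */
    int exit_state;             /* models task->exit_state (nonzero once exiting) */
    unsigned int signal_flags;  /* models task->signal->flags */
};

enum wait_result {
    WAIT_TRACEE_QUIESCENT,  /* tracee reached TRACED or STOPPED */
    WAIT_BAIL_OUT,          /* loop took the "goto out_tsk" path */
    WAIT_GAVE_UP            /* model bound hit; the real loop would keep spinning */
};

/*
 * Models the wait loop: poll until the tracee is TRACED or STOPPED, bail
 * out if the tracee is exiting, and (only when patched) also bail out if
 * the tracer's own thread group is already being killed.
 */
enum wait_result model_ptrace_wait(const struct model_task *tracer,
                                   const struct model_task *child,
                                   bool patched, unsigned int max_spins)
{
    assert(tracer != NULL && child != NULL);

    for (unsigned int i = 0; i < max_spins; i++) {
        if (child->state == MODEL_TRACED || child->state == MODEL_STOPPED)
            return WAIT_TRACEE_QUIESCENT;
        if (child->exit_state)
            return WAIT_BAIL_OUT;
        if (patched && (tracer->signal_flags & MODEL_SIGNAL_GROUP_EXIT))
            return WAIT_BAIL_OUT;
        /* The real loop yields the CPU here and polls again. */
    }
    return WAIT_GAVE_UP;
}
```

In this model, a tracer whose group is exiting while the tracee sits in D
state never leaves the unpatched loop (WAIT_GAVE_UP stands for "spins
forever"), but bails out at once with the check enabled. A tracer whose
signal flags are clear gets no help from the check and keeps polling.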

The patch above should fix this deadlock, but unfortunately it does
not solve all problems. Note that with this test-case the tracer,
tracee, and the coredumping thread share the same ->mm. This is
because it was written originally to exploit another problem fixed
by 5ecfbae093f0c37311e89b29bfc0c9d586eace87.

But what if the tracer does not participate in coredumping? In that
case the tracer is not killed, and we still have problems. The tracer
will spin until the coredump completes. So I'd suggest this patch.

Untested, not even compiled. Just for review/discussion.


Also, upstream checks mm->core_state in may_ptrace_stop() to
prevent another deadlock (and it is still needed afaics, despite
the fact that schedule() now checks signal_pending_state()). I wonder
if RHEL needs something like this check. utrace_quiescent() checks
sigkill_pending(), but if SIGKILL was already dequeued it can
return false. This means that if the tracer, tracee, and the
coredumping thread share the same ->mm, we can deadlock. Fortunately,
sigkill_pending() returning false also means we can send a private
SIGKILL to the tracee and wake it up, but I guess this won't be
obvious to the admin.
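
For discussion, the kind of check meant here can be modelled in userspace
roughly as follows. This is a simplified sketch and not the upstream
function: the struct layout and field names are hypothetical, and the
rule that stopping is refused only when the tracer shares the dumping
->mm is my reading of the upstream logic, so treat it as an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of an address space; not the kernel's mm_struct. */
struct model_mm {
    bool core_dump_in_progress;  /* models mm->core_state != NULL */
};

/* Hypothetical model of the task that is about to stop for its tracer. */
struct model_tracee {
    bool ptraced;                      /* models current->ptrace != 0 */
    const struct model_mm *mm;         /* models current->mm */
    const struct model_mm *tracer_mm;  /* models the tracer's ->mm */
};

/*
 * Returns true if the tracee may safely stop and wait for its tracer.
 * Stopping is refused when a coredump of the tracee's mm is in progress
 * and the tracer shares that mm: the tracer is then being killed as part
 * of the dump and would never resume the tracee.
 */
bool model_may_ptrace_stop(const struct model_tracee *t)
{
    assert(t != NULL && t->mm != NULL);

    if (!t->ptraced)
        return false;
    if (t->mm->core_dump_in_progress && t->mm == t->tracer_mm)
        return false;
    return true;
}
```

With a tracer in a different address space the model still allows the
stop during a coredump, which corresponds to the case where the tracer
survives the dump and can resume the tracee later.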

Comment 17 errata-xmlrpc 2009-08-04 13:15:44 UTC
This issue has been addressed in the following products:

  Red Hat Enterprise Linux 5

Via RHSA-2009:1193 https://rhn.redhat.com/errata/RHSA-2009-1193.html