Bug 468089 - utrace: PTRACE_O_TRACEVFORKDONE does not work
Summary: utrace: PTRACE_O_TRACEVFORKDONE does not work
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: 9
Hardware: x86_64
OS: Linux
high
high
Target Milestone: ---
Assignee: Roland McGrath
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks: 468395
TreeView+ depends on / blocked
 
Reported: 2008-10-22 18:17 UTC by Jan Kratochvil
Modified: 2009-07-14 16:44 UTC (History)
4 users (show)

Fixed In Version: 2.6.26.6-79.fc9.i686
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2009-07-14 16:44:46 UTC
Type: ---
Embargoed:


Attachments (Terms of Use)
Fix (relative 2.6.26.6-79) (1.17 KB, patch)
2008-11-04 14:01 UTC, Denys Vlasenko
no flags Details | Diff

Description Jan Kratochvil 2008-10-22 18:17:02 UTC
Description of problem:
Unable to debug GDB with GDB.  Probably also some other programs but I hit just this case.

Version-Release number of selected component (if applicable):
kernel-2.6.27-17.fc10.x86_64 BAD
kernel-2.6.27.3-30.rc1.fc10.x86_64 BAD
kernel-2.6.26.6-75.fc9.x86_64 BAD
kernel-2.6.26.5-45.fc9.x86_64 OK
kernel-2.6.25.10-86.fc9.x86_64 OK

How reproducible:
Always.

Steps to Reproduce:
wget -q -O o_tracevforkdone.c http://sources.redhat.com/cgi-bin/cvsweb.cgi/~checkout~/tests/ptrace-tests/tests/o_tracevforkdone.c?cvsroot=systemtap; gcc -o o_tracevforkdone o_tracevforkdone.c -Wall -ggdb2 -D_GNU_SOURCE; ./o_tracevforkdone; echo $?

Actual results:
1

Expected results:
0

Additional info:
Upstream GDB CVS HEAD severly crashes on such kernels:
$ ./gdb -nx -ex r --args ./gdb -nx -ex r true
...
warning: Unexpected waitpid result 00117f when waiting for vfork-done
linux-nat.c:1945: internal-error: unknown ptrace event 5
...

Comment 1 Chuck Ebbert 2008-10-23 05:09:10 UTC
Should be fixed in 2.6.26.6-79, already in updates-testing. And same as bug #464520 in F8 AFAICT...

Comment 2 Chuck Ebbert 2008-10-23 05:11:23 UTC
Actually this is a new bug introduced by the latest utrace update.

Comment 3 Lennart Poettering 2008-10-23 19:38:50 UTC
Please please please with cream on top, give us a working debugger! How are we supposed to fix bugs for F10 if gdb is completely broken?

Comment 4 Jan Kratochvil 2008-10-24 15:11:23 UTC
Verified as fixed also in Rawhide - kernel-2.6.27.3-39.fc10.i686.

Comment 5 Jan Kratochvil 2008-10-25 07:51:25 UTC
The problem is still present in both F9 and F10.
It is just x86_64 specific:
kernel-2.6.27.3-39.fc10.x86_64 BAD
kernel-2.6.26.6-79.fc9.x86_64 BAD

Comment 6 Denys Vlasenko 2008-10-27 14:55:20 UTC
Hmmmm.... it works for me on 2.6.26.6-79.fc9.x86_64

Comment 9 Denys Vlasenko 2008-10-30 18:10:02 UTC
On affected machines, when I run o_tracevforkdone.c, syscall_trace_leave() is not called. Thus after vfork parent does not get notified.

I built instrumented kernel and am comparing how this testcase runs on my laptop and on affected machine.

Good dmesg output:
[3473] utrace_set_events: new:0 old:0
[3473] utrace_set_events: new:0 old:0
[3473] ptrace_report_clone 1
[3473] utrace_set_events: new:80 old:0
[3473] utrace_set_events 2: new:80 old:0
[3473] thread_info->flags:889 TIF_SYSCALL_TRACE:0
[3473] ptrace_report_clone 2: ret:0 engine->flags:1f8d
[3473] A thread_info->flags:889
[3473] B thread_info->flags:80000889
 [3474] sys_exit: thread_info->flags:80
[3473] syscall_trace_leave: 31th bit is set
[3473] syscall_trace_leave: TIF_SYSCALL_TRACE is set
[3473] tracehook_report_syscall_exit 1
[3473] tracehook_report_syscall_exit 2
[3473] utrace_set_events: new:0 old:80
[3473] ptrace_report_syscall_exit 1
[3473] ptrace_report_syscall_exit 2
[3473] utrace_reset: clear TIF_SYSCALL_TRACE
[3473] sys_exit: thread_info->flags:80000080

The interesting things are after we see vfork child exiting. Parent calls syscall_trace_leave() and all is well. Then it exits.

Bad:
[2393] utrace_set_events: new:0 old:0
[2393] utrace_set_events: new:0 old:0
[2393] ptrace_report_clone 1
[2393] utrace_set_events: new:80 old:0
[2393] utrace_set_events 2: new:80 old:0
[2393] thread_info->flags:1 TIF_SYSCALL_TRACE:0
[2393] ptrace_report_clone 2: ret:0 engine->flags:1f8d
[2393] A thread_info->flags:1
[2393] B thread_info->flags:80000001
 [2394] sys_exit: thread_info->flags:0
[2393] sys_exit: thread_info->flags:80000001

syscall_trace_leave was not called. Look at bit 0 (TIF_SYSCALL_TRACE)
still being set in thread_info->flags when parent exits after vfork!
 How entry_64.S managed to not notice it last time it was exiting a syscall (vfork)?!
NB: adding more syscalls still does not make it stop.
Need to look into entry_64.S - how does it miss TIF_SYSCALL_TRACE bit?

Comment 10 Denys Vlasenko 2008-10-31 17:15:04 UTC
More instrumentation revealed the following. On x86_64 in entry_64.S there are two paths out of syscalls - "sysret" path and "iretd" (or "int") path.
"sysret" path fails to check TIF_SYSCALL_TRACE bit in thread_info->flags, but this isn't a bug, because "sysret" path should not be taken in TIF_SYSCALL_TRACE case, we should be in "int" path. Please compare these, and notice that "good" trace has "tracesys" instead of "system_call_after_swapgs" in stack dump. That means that somehow this part of entry_64.S code prior to entry to vfork syscall:

        testl $_TIF_WORK_SYSCALL_ENTRY,threadinfo_flags(%rcx)
        jnz tracesys

did not trigger in "bad" case and we erroneously did not go to tracesys label but went into "sysret" fastpath which then have lost TIF_SYSCALL_TRACE bit.

Good output:
[3482] utrace_set_events: new:0 old:0
[3482] utrace_set_events: new:0 old:0
[3482] ptrace_report_clone 1         
[3482] utrace_set_events: new:80 old:0
[3482] utrace_set_events 2: new:80 old:0
[3482] thread_info->flags:81 TIF_SYSCALL_TRACE:0
[3482] ptrace_report_clone 2: ret:0 engine->flags:1f8d
[3482] A thread_info->flags:81                        
[3482] B thread_info->flags:80000081                  
Pid: 3482, comm: o_tracevforkdon Not tainted 2.6.26.6-vda #14
Call Trace:                                                  
 [<ffffffff8103f15d>] ptrace_report_clone+0x1c0/0x21f        
 [<ffffffff81069db3>] utrace_report_clone+0xa3/0x115         
 [<ffffffff81036644>] do_fork+0x16b/0x254                    
 [<ffffffff8100c377>] ? tracesys+0xd0/0xd5                   
 [<ffffffff8100a5c4>] sys_vfork+0x20/0x22                    
 [<ffffffff8100c5a7>] ptregscall_common+0x67/0xb0            
[3483] sys_exit: thread_info->flags:80                       
In int_very_careful!                                         
[3482] syscall_trace_leave: 31th bit is set                  
[3482] syscall_trace_leave: TIF_SYSCALL_TRACE is set         
[3482] tracehook_report_syscall_exit 1                       
[3482] tracehook_report_syscall_exit 2                       
[3482] utrace_set_events: new:0 old:80                       
[3482] ptrace_report_syscall_exit 1                          
[3482] ptrace_report_syscall_exit 2                          
In int_very_careful!                                         
[3482] utrace_reset: clear TIF_SYSCALL_TRACE                 
[3482] sys_exit: thread_info->flags:80000080                 


Bad:
[2403] utrace_set_events: new:0 old:0
[2403] utrace_set_events: new:0 old:0
[2403] ptrace_report_clone 1         
[2403] utrace_set_events: new:80 old:0
[2403] utrace_set_events 2: new:80 old:0
[2403] thread_info->flags:809 TIF_SYSCALL_TRACE:0
[2403] ptrace_report_clone 2: ret:0 engine->flags:1f8d
[2403] A thread_info->flags:809                       
[2403] B thread_info->flags:80000809                  
Pid: 2403, comm: o_tracevforkdon Not tainted 2.6.26.6-vda #13
Call Trace:                                                  
 [<ffffffff8103f15d>] ptrace_report_clone+0x1c0/0x21f        
 [<ffffffff81069db3>] utrace_report_clone+0xa3/0x115         
 [<ffffffff81036644>] do_fork+0x16b/0x254                    
 [<ffffffff8100c11a>] ? system_call_after_swapgs+0x8a/0x8f
 [<ffffffff8100a5c4>] sys_vfork+0x20/0x22
 [<ffffffff8100c5a7>] ptregscall_common+0x67/0xb0
[2404] sys_exit: thread_info->flags:0
In sysret_check!
In sysret_signal!
[2403] sys_exit: thread_info->flags:80000001



NB: 31th bit of thread_info->flags is set by instrumentation code. Bit 0 is TIF_SYSCALL_TRACE. Other bits vary from run to run and we generally are not interested in those. First sys_exit in the trace (with different PID) belongs to exiting vfork child.

Comment 11 Denys Vlasenko 2008-11-03 18:17:56 UTC
in fork.c in do_fork() lies pre-utrace point where PTRACE_O_TRACEVFORKDONE was handled:

                if (clone_flags & CLONE_VFORK) {
                        freezer_do_not_count();
                        wait_for_completion(&vfork);
                        freezer_count();
==>                     tracehook_report_vfork_done(p, nr);
                }

But it's a nop for utrace kernels. utrace seems to attempt handling it on syscall exit, but fails as described above.

I managed to reproduce the bug on my machine by switching off selinux and syscall auditing.

Attempted another "route of attack" on this bug by making _TIF_SYSCALL_TRACE bit be present in _TIF_WORK_MASK:

/* work to do on interrupt/exception return */
#define _TIF_WORK_MASK                                                  \
        (0x0000FFFF & ~(/*_TIF_SYSCALL_TRACE |*/ _TIF_SYSCALL_AUDIT |   \
                        _TIF_SINGLESTEP | _TIF_SECCOMP | _TIF_SYSCALL_EMU))

This makes "return from syscall" code path notice _TIF_SYSCALL_TRACE bit which was just set by vfork() syscall we return from.

Unfortunately, in this particular case vfork() returns because [recall when vfork() returns...] child has died (not execed), and this makes parent receive SIGCHLD, which collides with, and overrides, PTRACE_O_TRACEVFORKDONE notification - we get SIGCHLD instead of SIGTRAP.

We also race with clearing of PT_VFORKING and if ptrace_report_signal wins, sometimes this WARN_ON triggers:

static u32 ptrace_report_syscall_exit(...)
{
        if (!(engine->flags & UTRACE_EVENT(SYSCALL_ENTRY))) {
                int ret = utrace_set_events(
                        task, engine,
                        engine->flags & ~UTRACE_EVENT(SYSCALL_EXIT));
                WARN_ON(ret);
===>            WARN_ON(!(task->ptrace & PT_VFORKING));
                task->ptrace &= ~PT_VFORKING;

Comment 12 Denys Vlasenko 2008-11-04 14:01:04 UTC
Created attachment 322425 [details]
Fix (relative 2.6.26.6-79)

syscall entry/return code path has two major branches, "syscall/sysret" for the most usual case and "int" branch which is slower but allows for more manipulations with CPU state.

syscall tracing uses "int" branch. "syscall/sysret" branch does not handle syscall entry/exit tracing because it is never entered with syscall tracing on. This includes vfork syscall too. This appears to be by design, because 32-bit version does the same.

But in the case of vfork + PTRACE_O_TRACEVFORKDONE, vfork syscall turns ON syscall tracing bit (_TIF_SYSCALL_TRACE) when it is unblocked.

The bug is that even though bit is set, "syscall/sysret" branch does not handle it, and PTRACE_O_TRACEVFORKDONE is not delivered to tracer.

This patch fixes the issue by checking for _TIF_SYSCALL_TRACE bit in thread flags in sysret_signal part of entry_64.S, and switching to "int" syscall return code path.

Since vfork can be unblocked by child execing, *or exiting*, in the second case we (parent-which-is-being-traced) are getting usual SIGCHLD, and since we are traced, we need to report *both* PTRACE_O_TRACEVFORKDONE and the fact that we got SIGCHLD. It seems that the intention is to report PTRACE_O_TRACEVFORKDONE first, and then (after tracer PTRACE_CONTs or PTRACE_SYSCALLs us) report SIGCHLD.

This second notification is sometimes lost. I ran o_tracevforkdone test in the loop:

while sleep 0.05; do echo -n .; ./o_tracevforkdone || { echo BAD!; exit; }; done

and it loses SIGCHLD in ~1% of cases. It happens here:

      l = ptrace (PTRACE_CONT, child, NULL, NULL);
      assert_perror (errno);
      assert (l == 0);
      errno = 0;
      pid = waitpid (child, &status, 0);
      assert_perror (errno);
      assert (pid == child);
      assert (WIFSTOPPED (status)); <====

I checked status and it indicates that process has exited with exit code 42. Definitely the case of SIGCHLD being lost.

Comment 13 Denys Vlasenko 2008-11-04 15:22:57 UTC
Might be related to bug 469684 "signals not delivered when tracing system calls"

Comment 14 Roland McGrath 2008-11-16 21:51:04 UTC
I modified the o_tracevforkdone test case (in ptrace-tests cvs) so it also tests using PTRACE_SYSCALL on the child to enter the vfork system call.  This means the parent task will already have TIF_SYSCALL_TRACE set.  Both variants of the test fail, which would seem to contradict the whole comment#10 theory.

I don't think there is any evidence of a problem in the system call return path.

The problem I see is that the traced thread doing clone does not stop in as many distinct places as ptrace wants it to.  In utrace, there is no stopping point between the vfork-parent block and the quiesce/signal callback.  In non-vfork clones, there is no stopping point after the clone callback before the quiesce/signal callbacks either.  So ptrace gets the multiple callbacks but does not actually stop (enter TASK_TRACED) until report_signal or report_quiesce returns UTRACE_STOP, so the intermediate values of ->exit_code are missed.
This also explains bug 469684, I think (no stop between syscall-exit and quiesce).

I'll need to rework the ptrace code to handle the multiple stops differently.

Comment 15 Denys Vlasenko 2008-11-17 18:31:21 UTC
> I don't think there is any evidence of a problem in the system call return
path.

I observed that bug doesn't manifest itself when syscall audit is on (normal state of Fedora 9 with selinux enabled). And the difference is that with syscall audit we always go through "int" path in entry_64.S

This is how we fail to catch TIF_SYSCALL_TRACE on exit from vfork:


entry_64.S
ENTRY(system_call)
        SWAPGS_UNSAFE_STACK
ENTRY(system_call_after_swapgs)
        movq    %rsp,%gs:pda_oldrsp
        movq    %gs:pda_kernelstack,%rsp
        ENABLE_INTERRUPTS(CLBR_NONE)
        SAVE_ARGS 8,1
        movq  %rax,ORIG_RAX-ARGOFFSET(%rsp)
        movq  %rcx,RIP-ARGOFFSET(%rsp)
        GET_THREAD_INFO(%rcx)
        testl $_TIF_WORK_SYSCALL_ENTRY,TI_flags(%rcx)
        jnz tracesys

Above: 
#define _TIF_WORK_SYSCALL_ENTRY \
        (_TIF_SYSCALL_TRACE | _TIF_SYSCALL_EMU | \
         _TIF_SYSCALL_AUDIT | _TIF_SECCOMP | _TIF_SINGLESTEP)
and if SYSCALL_AUDIT is on, we jump to tracesys, and eventually go thru "int" path, where vforkdone notification will be caught. (I do not know why same does not work with SYSCALL_TRACE, this is not what I am talking about here). Let's see what will happen if we did not jump:

system_call_fastpath:
        cmpq $__NR_syscall_max,%rax
        ja badsys
        movq %r10,%rcx
        call *sys_call_table(,%rax,8)  # XXX:    rip relative

Here we do vfork. When we return, SYSCALL_TRACE is set by vfork. Now let's see whether we will notice this:

        movq %rax,RAX-ARGOFFSET(%rsp)
ret_from_sys_call:
        movl $_TIF_ALLWORK_MASK,%edi
sysret_check:
        LOCKDEP_SYS_EXIT
        GET_THREAD_INFO(%rcx)
        DISABLE_INTERRUPTS(CLBR_NONE)
        TRACE_IRQS_OFF
        movl TI_flags(%rcx),%edx
        andl %edi,%edx
        jnz  sysret_careful

TIF_ALLWORK_MASK contain all bits including SYSCALL_TRACE, we jump here

sysret_careful:
        bt $TIF_NEED_RESCHED,%edx
        jnc sysret_signal

Let's assume NEED_RESCHED is not set and we jump, skipping these instructions:

        TRACE_IRQS_ON
        ENABLE_INTERRUPTS(CLBR_NONE)
        pushq %rdi
        call schedule
        popq  %rdi
        jmp sysret_check

sysret_signal:
        TRACE_IRQS_ON
        ENABLE_INTERRUPTS(CLBR_NONE)
#ifdef CONFIG_AUDITSYSCALL
        bt $TIF_SYSCALL_AUDIT,%edx
        jc sysret_audit

We do not jump here

#endif
        leaq do_notify_resume(%rip),%rax
        leaq -ARGOFFSET(%rsp),%rdi # &pt_regs -> arg1
        xorl %esi,%esi # oldset -> arg2
        call ptregscall_common
        movl $_TIF_WORK_MASK,%edi   <=========== NB!
        DISABLE_INTERRUPTS(CLBR_NONE)
        TRACE_IRQS_OFF
        jmp int_with_check

Obviously we jump here. NB: %edi == TIF_WORK_MASK, which is:
#define _TIF_WORK_MASK                                                  \
        (0x0000FFFF &                                                   \
         ~(_TIF_SYSCALL_TRACE|_TIF_SYSCALL_AUDIT|                       \
           _TIF_SINGLESTEP|_TIF_SECCOMP|_TIF_SYSCALL_EMU))
It does not contain SYSCALL_TRACE bit.
Ok, now we are here:

int_with_check:
        LOCKDEP_SYS_EXIT_IRQ
        GET_THREAD_INFO(%rcx)
        movl TI_flags(%rcx),%edx
        andl %edi,%edx
        jnz   int_careful

Here we test thread flags against %edi, *and %edi does not have SYSCALL_TRACE bit*. So we do not jump to int_careful (and that is the place where we eventually would "testl $_TIF_WORK_SYSCALL_EXIT,%edx" and catch SYSCALL_TRACE). Instead, we continue:

        andl    $~TS_COMPAT,TI_status(%rcx)
        jmp   retint_swapgs
...
retint_swapgs:          /* return to user-space */
        DISABLE_INTERRUPTS(CLBR_ANY)
        TRACE_IRQS_IRETQ
        SWAPGS
        jmp restore_args
...
restore_args:
        RESTORE_ARGS 0,8,0
irq_return:
        INTERRUPT_RETURN


Bingo. SYSCALL_TRACE was not caught.

The patch in comment #12 prevents this, and I did run-test it.

However, I do not insist on my fix being correct. Maybe it's better to fix it another way (say, make vfork not abuse SYSCALL_TRACE?)

I merely insist that in current code SYSCALL_TRACE is indeed not caught. :)

Comment 16 Denys Vlasenko 2009-01-14 13:48:22 UTC
kernel-2.6.29-0.28.rc1.fc11.x86_64 appears to have it fixed. I ran a looping test of o_tracevforkdone testcase and it does not fail.

Comment 17 Bug Zapper 2009-06-10 03:03:12 UTC
This message is a reminder that Fedora 9 is nearing its end of life.
Approximately 30 (thirty) days from now Fedora will stop maintaining
and issuing updates for Fedora 9.  It is Fedora's policy to close all
bug reports from releases that are no longer maintained.  At that time
this bug will be closed as WONTFIX if it remains open with a Fedora 
'version' of '9'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version prior to Fedora 9's end of life.

Bug Reporter: Thank you for reporting this issue and we are sorry that 
we may not be able to fix it before Fedora 9 is end of life.  If you 
would still like to see this bug fixed and are able to reproduce it 
against a later version of Fedora please change the 'version' of this 
bug to the applicable version.  If you are unable to change the version, 
please add a comment here and someone will do it for you.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events.  Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

The process we are following is described here: 
http://fedoraproject.org/wiki/BugZappers/HouseKeeping

Comment 18 Bug Zapper 2009-07-14 16:44:46 UTC
Fedora 9 changed to end-of-life (EOL) status on 2009-07-10. Fedora 9 is 
no longer maintained, which means that it will not receive any further 
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of 
Fedora please feel free to reopen this bug against that version.

Thank you for reporting this bug and we are sorry it could not be fixed.


Note You need to log in before you can comment on or make changes to this bug.