Description of problem: Unable to debug GDB with GDB. Probably also some other programs but I hit just this case. Version-Release number of selected component (if applicable): kernel-2.6.27-17.fc10.x86_64 BAD kernel-2.6.27.3-30.rc1.fc10.x86_64 BAD kernel-2.6.26.6-75.fc9.x86_64 BAD kernel-2.6.26.5-45.fc9.x86_64 OK kernel-2.6.25.10-86.fc9.x86_64 OK How reproducible: Always. Steps to Reproduce: wget -q -O o_tracevforkdone.c http://sources.redhat.com/cgi-bin/cvsweb.cgi/~checkout~/tests/ptrace-tests/tests/o_tracevforkdone.c?cvsroot=systemtap; gcc -o o_tracevforkdone o_tracevforkdone.c -Wall -ggdb2 -D_GNU_SOURCE; ./o_tracevforkdone; echo $? Actual results: 1 Expected results: 0 Additional info: Upstream GDB CVS HEAD severly crashes on such kernels: $ ./gdb -nx -ex r --args ./gdb -nx -ex r true ... warning: Unexpected waitpid result 00117f when waiting for vfork-done linux-nat.c:1945: internal-error: unknown ptrace event 5 ...
Should be fixed in 2.6.26.6-79, already in updates-testing. And same as bug #464520 in F8 AFAICT...
Actually this is a new bug introduced by the latest utrace update.
Please please please with cream on top, give us a working debugger! How are we supposed to fix bugs for F10 if gdb is completely broken?
Verified as fixed also in Rawhide - kernel-2.6.27.3-39.fc10.i686.
The problem is still present in both F9 and F10. It is just x86_64 specific: kernel-2.6.27.3-39.fc10.x86_64 BAD kernel-2.6.26.6-79.fc9.x86_64 BAD
Hmmmm.... it works for me on 2.6.26.6-79.fc9.x86_64
On affected machines, when I run o_tracevforkdone.c, syscall_trace_leave() is not called. Thus after vfork parent does not get notified. I built instrumented kernel and am comparing how this testcase runs on my laptop and on affected machine. Good dmesg output: [3473] utrace_set_events: new:0 old:0 [3473] utrace_set_events: new:0 old:0 [3473] ptrace_report_clone 1 [3473] utrace_set_events: new:80 old:0 [3473] utrace_set_events 2: new:80 old:0 [3473] thread_info->flags:889 TIF_SYSCALL_TRACE:0 [3473] ptrace_report_clone 2: ret:0 engine->flags:1f8d [3473] A thread_info->flags:889 [3473] B thread_info->flags:80000889 [3474] sys_exit: thread_info->flags:80 [3473] syscall_trace_leave: 31th bit is set [3473] syscall_trace_leave: TIF_SYSCALL_TRACE is set [3473] tracehook_report_syscall_exit 1 [3473] tracehook_report_syscall_exit 2 [3473] utrace_set_events: new:0 old:80 [3473] ptrace_report_syscall_exit 1 [3473] ptrace_report_syscall_exit 2 [3473] utrace_reset: clear TIF_SYSCALL_TRACE [3473] sys_exit: thread_info->flags:80000080 The interesting things are after we see vfork child exiting. Parent calls syscall_trace_leave() and all is well. Then it exits. Bad: [2393] utrace_set_events: new:0 old:0 [2393] utrace_set_events: new:0 old:0 [2393] ptrace_report_clone 1 [2393] utrace_set_events: new:80 old:0 [2393] utrace_set_events 2: new:80 old:0 [2393] thread_info->flags:1 TIF_SYSCALL_TRACE:0 [2393] ptrace_report_clone 2: ret:0 engine->flags:1f8d [2393] A thread_info->flags:1 [2393] B thread_info->flags:80000001 [2394] sys_exit: thread_info->flags:0 [2393] sys_exit: thread_info->flags:80000001 syscall_trace_leave was not called. Look at bit 0 (TIF_SYSCALL_TRACE) still being set in thread_info->flags when parent exits after vfork! How entry_64.S managed to not notice it last time it was exiting a syscall (vfork)?! NB: adding more syscalls still does not make it stop. Need to look into entry_64.S - how does it miss TIF_SYSCALL_TRACE bit?
More instrumentation revealed the following. On x86_64 in entry_64.S there are two paths out of syscalls - "sysret" path and "iretd" (or "int") path. "sysret" path fails to check TIF_SYSCALL_TRACE bit in thread_info->flags, but this isn't a bug, because "sysret" path should not be taken in TIF_SYSCALL_TRACE case, we should be in "int" path. Please compare these, and notice that "good" trace has "tracesys" instead of "system_call_after_swapgs" in stack dump. That means that somehow this part of entry_64.S code prior to entry to vfork syscall: testl $_TIF_WORK_SYSCALL_ENTRY,threadinfo_flags(%rcx) jnz tracesys did not trigger in "bad" case and we erroneously did not go to tracesys label but went into "sysret" fastpath which then have lost TIF_SYSCALL_TRACE bit. Good output: [3482] utrace_set_events: new:0 old:0 [3482] utrace_set_events: new:0 old:0 [3482] ptrace_report_clone 1 [3482] utrace_set_events: new:80 old:0 [3482] utrace_set_events 2: new:80 old:0 [3482] thread_info->flags:81 TIF_SYSCALL_TRACE:0 [3482] ptrace_report_clone 2: ret:0 engine->flags:1f8d [3482] A thread_info->flags:81 [3482] B thread_info->flags:80000081 Pid: 3482, comm: o_tracevforkdon Not tainted 2.6.26.6-vda #14 Call Trace: [<ffffffff8103f15d>] ptrace_report_clone+0x1c0/0x21f [<ffffffff81069db3>] utrace_report_clone+0xa3/0x115 [<ffffffff81036644>] do_fork+0x16b/0x254 [<ffffffff8100c377>] ? tracesys+0xd0/0xd5 [<ffffffff8100a5c4>] sys_vfork+0x20/0x22 [<ffffffff8100c5a7>] ptregscall_common+0x67/0xb0 [3483] sys_exit: thread_info->flags:80 In int_very_careful! [3482] syscall_trace_leave: 31th bit is set [3482] syscall_trace_leave: TIF_SYSCALL_TRACE is set [3482] tracehook_report_syscall_exit 1 [3482] tracehook_report_syscall_exit 2 [3482] utrace_set_events: new:0 old:80 [3482] ptrace_report_syscall_exit 1 [3482] ptrace_report_syscall_exit 2 In int_very_careful! [3482] utrace_reset: clear TIF_SYSCALL_TRACE [3482] sys_exit: thread_info->flags:80000080 Bad: [2403] utrace_set_events: new:0 old:0 [2403] utrace_set_events: new:0 old:0 [2403] ptrace_report_clone 1 [2403] utrace_set_events: new:80 old:0 [2403] utrace_set_events 2: new:80 old:0 [2403] thread_info->flags:809 TIF_SYSCALL_TRACE:0 [2403] ptrace_report_clone 2: ret:0 engine->flags:1f8d [2403] A thread_info->flags:809 [2403] B thread_info->flags:80000809 Pid: 2403, comm: o_tracevforkdon Not tainted 2.6.26.6-vda #13 Call Trace: [<ffffffff8103f15d>] ptrace_report_clone+0x1c0/0x21f [<ffffffff81069db3>] utrace_report_clone+0xa3/0x115 [<ffffffff81036644>] do_fork+0x16b/0x254 [<ffffffff8100c11a>] ? system_call_after_swapgs+0x8a/0x8f [<ffffffff8100a5c4>] sys_vfork+0x20/0x22 [<ffffffff8100c5a7>] ptregscall_common+0x67/0xb0 [2404] sys_exit: thread_info->flags:0 In sysret_check! In sysret_signal! [2403] sys_exit: thread_info->flags:80000001 NB: 31th bit of thread_info->flags is set by instrumentation code. Bit 0 is TIF_SYSCALL_TRACE. Other bits vary from run to run and we generally are not interested in those. First sys_exit in the trace (with different PID) belongs to exiting vfork child.
in fork.c in do_fork() lies pre-utrace point where PTRACE_O_TRACEVFORKDONE was handled: if (clone_flags & CLONE_VFORK) { freezer_do_not_count(); wait_for_completion(&vfork); freezer_count(); ==> tracehook_report_vfork_done(p, nr); } But it's a nop for utrace kernels. utrace seems to attempt handling it on syscall exit, but fails as described above. I managed to reproduce the bug on my machine by switching off selinux and syscall auditing. Attempted another "route of attack" on this bug by making _TIF_SYSCALL_TRACE bit be present in _TIF_WORK_MASK: /* work to do on interrupt/exception return */ #define _TIF_WORK_MASK \ (0x0000FFFF & ~(/*_TIF_SYSCALL_TRACE |*/ _TIF_SYSCALL_AUDIT | \ _TIF_SINGLESTEP | _TIF_SECCOMP | _TIF_SYSCALL_EMU)) This makes "return from syscall" code path notice _TIF_SYSCALL_TRACE bit which was just set by vfork() syscall we return from. Unfortunately, in this particular case vfork() returns because [recall when vfork() returns...] child has died (not execed), and this makes parent receive SIGCHLD, which collides with, and overrides, PTRACE_O_TRACEVFORKDONE notification - we get SIGCHLD instead of SIGTRAP. We also race with clearing of PT_VFORKING and if ptrace_report_signal wins, sometimes this WARN_ON triggers: static u32 ptrace_report_syscall_exit(...) { if (!(engine->flags & UTRACE_EVENT(SYSCALL_ENTRY))) { int ret = utrace_set_events( task, engine, engine->flags & ~UTRACE_EVENT(SYSCALL_EXIT)); WARN_ON(ret); ===> WARN_ON(!(task->ptrace & PT_VFORKING)); task->ptrace &= ~PT_VFORKING;
Created attachment 322425 [details] Fix (relative 2.6.26.6-79) syscall entry/return code path has two major branches, "syscall/sysret" for the most usual case and "int" branch which is slower but allows for more manipulations with CPU state. syscall tracing uses "int" branch. "syscall/sysret" branch does not handle syscall entry/exit tracing because it is never entered with syscall tracing on. This includes vfork syscall too. This appears to be by design, because 32-bit version does the same. But in the case of vfork + PTRACE_O_TRACEVFORKDONE, vfork syscall turns ON syscall tracing bit (_TIF_SYSCALL_TRACE) when it is unblocked. The bug is that even though bit is set, "syscall/sysret" branch does not handle it, and PTRACE_O_TRACEVFORKDONE is not delivered to tracer. This patch fixes the issue by checking for _TIF_SYSCALL_TRACE bit in thread flags in sysret_signal part of entry_64.S, and switching to "int" syscall return code path. Since vfork can be unblocked by child execing, *or exiting*, in the second case we (parent-which-is-being-traced) are getting usual SIGCHLD, and since we are traced, we need to report *both* PTRACE_O_TRACEVFORKDONE and the fact that we got SIGCHLD. It seems that the intention is to report PTRACE_O_TRACEVFORKDONE first, and then (after tracer PTRACE_CONTs or PTRACE_SYSCALLs us) report SIGCHLD. This second notification is sometimes lost. I ran o_tracevforkdone test in the loop: while sleep 0.05; do echo -n .; ./o_tracevforkdone || { echo BAD!; exit; }; done and it loses SIGCHLD in ~1% of cases. It happens here: l = ptrace (PTRACE_CONT, child, NULL, NULL); assert_perror (errno); assert (l == 0); errno = 0; pid = waitpid (child, &status, 0); assert_perror (errno); assert (pid == child); assert (WIFSTOPPED (status)); <==== I checked status and it indicates that process has exited with exit code 42. Definitely the case of SIGCHLD being lost.
Might be related to bug 469684 "signals not delivered when tracing system calls"
I modified the o_tracevforkdone test case (in ptrace-tests cvs) so it also tests using PTRACE_SYSCALL on the child to enter the vfork system call. This means the parent task will already have TIF_SYSCALL_TRACE set. Both variants of the test fail, which would seem to contradict the whole comment#10 theory. I don't think there is any evidence of a problem in the system call return path. The problem I see is that the traced thread doing clone does not stop in as many distinct places as ptrace wants it to. In utrace, there is no stopping point between the vfork-parent block and the quiesce/signal callback. In non-vfork clones, there is no stopping point after the clone callback before the quiesce/signal callbacks either. So ptrace gets the multiple callbacks but does not actually stop (enter TASK_TRACED) until report_signal or report_quiesce returns UTRACE_STOP, so the intermediate values of ->exit_code are missed. This also explains bug 469684, I think (no stop between syscall-exit and quiesce). I'll need to rework the ptrace code to handle the multiple stops differently.
> I don't think there is any evidence of a problem in the system call return path. I observed that bug doesn't manifest itself when syscall audit is on (normal state of Fedora 9 with selinux enabled). And the difference is that with syscall audit we always go through "int" path in entry_64.S This is how we fail to catch TIF_SYSCALL_TRACE on exit from vfork: entry_64.S ENTRY(system_call) SWAPGS_UNSAFE_STACK ENTRY(system_call_after_swapgs) movq %rsp,%gs:pda_oldrsp movq %gs:pda_kernelstack,%rsp ENABLE_INTERRUPTS(CLBR_NONE) SAVE_ARGS 8,1 movq %rax,ORIG_RAX-ARGOFFSET(%rsp) movq %rcx,RIP-ARGOFFSET(%rsp) GET_THREAD_INFO(%rcx) testl $_TIF_WORK_SYSCALL_ENTRY,TI_flags(%rcx) jnz tracesys Above: #define _TIF_WORK_SYSCALL_ENTRY \ (_TIF_SYSCALL_TRACE | _TIF_SYSCALL_EMU | \ _TIF_SYSCALL_AUDIT | _TIF_SECCOMP | _TIF_SINGLESTEP) and if SYSCALL_AUDIT is on, we jump to tracesys, and eventually go thru "int" path, where vforkdone notification will be caught. (I do not know why same does not work with SYSCALL_TRACE, this is not what I am talking about here). Let's see what will happen if we did not jump: system_call_fastpath: cmpq $__NR_syscall_max,%rax ja badsys movq %r10,%rcx call *sys_call_table(,%rax,8) # XXX: rip relative Here we do vfork. When we return, SYSCALL_TRACE is set by vfork. Now let's see whether we will notice this: movq %rax,RAX-ARGOFFSET(%rsp) ret_from_sys_call: movl $_TIF_ALLWORK_MASK,%edi sysret_check: LOCKDEP_SYS_EXIT GET_THREAD_INFO(%rcx) DISABLE_INTERRUPTS(CLBR_NONE) TRACE_IRQS_OFF movl TI_flags(%rcx),%edx andl %edi,%edx jnz sysret_careful TIF_ALLWORK_MASK contain all bits including SYSCALL_TRACE, we jump here sysret_careful: bt $TIF_NEED_RESCHED,%edx jnc sysret_signal Let's assume NEED_RESCHED is not set and we jump, skipping these instructions: TRACE_IRQS_ON ENABLE_INTERRUPTS(CLBR_NONE) pushq %rdi call schedule popq %rdi jmp sysret_check sysret_signal: TRACE_IRQS_ON ENABLE_INTERRUPTS(CLBR_NONE) #ifdef CONFIG_AUDITSYSCALL bt $TIF_SYSCALL_AUDIT,%edx jc sysret_audit We do not jump here #endif leaq do_notify_resume(%rip),%rax leaq -ARGOFFSET(%rsp),%rdi # &pt_regs -> arg1 xorl %esi,%esi # oldset -> arg2 call ptregscall_common movl $_TIF_WORK_MASK,%edi <=========== NB! DISABLE_INTERRUPTS(CLBR_NONE) TRACE_IRQS_OFF jmp int_with_check Obviously we jump here. NB: %edi == TIF_WORK_MASK, which is: #define _TIF_WORK_MASK \ (0x0000FFFF & \ ~(_TIF_SYSCALL_TRACE|_TIF_SYSCALL_AUDIT| \ _TIF_SINGLESTEP|_TIF_SECCOMP|_TIF_SYSCALL_EMU)) It does not contain SYSCALL_TRACE bit. Ok, now we are here: int_with_check: LOCKDEP_SYS_EXIT_IRQ GET_THREAD_INFO(%rcx) movl TI_flags(%rcx),%edx andl %edi,%edx jnz int_careful Here we test thread flags against %edi, *and %edi does not have SYSCALL_TRACE bit*. So we do not jump to int_careful (and that is the place where we eventually would "testl $_TIF_WORK_SYSCALL_EXIT,%edx" and catch SYSCALL_TRACE). Instead, we continue: andl $~TS_COMPAT,TI_status(%rcx) jmp retint_swapgs ... retint_swapgs: /* return to user-space */ DISABLE_INTERRUPTS(CLBR_ANY) TRACE_IRQS_IRETQ SWAPGS jmp restore_args ... restore_args: RESTORE_ARGS 0,8,0 irq_return: INTERRUPT_RETURN Bingo. SYSCALL_TRACE was not caught. The patch in comment #12 prevents this, and I did run-test it. However, I do not insist on my fix being correct. Maybe it's better to fix it another way (say, make vfork not abuse SYSCALL_TRACE?) I merely insist that in current code SYSCALL_TRACE is indeed not caught. :)
kernel-2.6.29-0.28.rc1.fc11.x86_64 appears to have it fixed. I ran a looping test of o_tracevforkdone testcase and it does not fail.
This message is a reminder that Fedora 9 is nearing its end of life. Approximately 30 (thirty) days from now Fedora will stop maintaining and issuing updates for Fedora 9. It is Fedora's policy to close all bug reports from releases that are no longer maintained. At that time this bug will be closed as WONTFIX if it remains open with a Fedora 'version' of '9'. Package Maintainer: If you wish for this bug to remain open because you plan to fix it in a currently maintained version, simply change the 'version' to a later Fedora version prior to Fedora 9's end of life. Bug Reporter: Thank you for reporting this issue and we are sorry that we may not be able to fix it before Fedora 9 is end of life. If you would still like to see this bug fixed and are able to reproduce it against a later version of Fedora please change the 'version' of this bug to the applicable version. If you are unable to change the version, please add a comment here and someone will do it for you. Although we aim to fix as many bugs as possible during every release's lifetime, sometimes those efforts are overtaken by events. Often a more recent Fedora release includes newer upstream software that fixes bugs or makes them obsolete. The process we are following is described here: http://fedoraproject.org/wiki/BugZappers/HouseKeeping
Fedora 9 changed to end-of-life (EOL) status on 2009-07-10. Fedora 9 is no longer maintained, which means that it will not receive any further security or bug fix updates. As a result we are closing this bug. If you can reproduce this bug against a currently maintained version of Fedora please feel free to reopen this bug against that version. Thank you for reporting this bug and we are sorry it could not be fixed.