The Debian folks reported that the kernel doesn't ensure the direction flag (DF) is cleared upon entry to a signal handler, which violates both the i?86 and x86_64 ABIs. Old GCCs conservatively emitted cld before any instructions that use that flag, but GCC 4.3 no longer does that; it relies on the ABI guarantee that the direction flag is clear on entry to a function. See http://gcc.gnu.org/ml/gcc-patches/2006-12/msg00276.html Anything that uses the std instruction must execute cld again before calling another function or before returning. Unfortunately, if an async signal arrives while a thread has DF set, the kernel will start the signal handler with DF still set. The fix is to add regs->eflags &= ~X86_EFLAGS_DF; or similar in setup_frame/setup_rt_frame (i386, x86_64, x86_64 32-bit support). While only code compiled with GCC 4.3 and later is affected by this bug, RHEL5 kernels are often used with later Fedora userland (e.g. koji buildboxes), so IMHO the fix should, if at all possible, be backported to RHEL5.2 and perhaps even RHEL4.7 kernels.
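To illustrate the std/cld rule described above, here is a minimal sketch of hand-written assembly that sets DF for a backward copy and clears it again before returning. The function name copy_backward is hypothetical and is not taken from any of the patches; the non-x86 fallback is only there so the sketch compiles elsewhere.

```c
#include <stddef.h>

/* Hypothetical helper: copy n bytes from src to dst, highest address first,
 * so that an overlapping move towards higher addresses is safe. */
void copy_backward(unsigned char *dst, const unsigned char *src, size_t n)
{
    if (n == 0)
        return;
#if defined(__x86_64__) || defined(__i386__)
    const unsigned char *s = src + n - 1; /* last source byte */
    unsigned char *d = dst + n - 1;       /* last destination byte */

    /* std makes rep movsb walk downwards; cld restores the state the ABI
     * requires before this function returns or calls anything else. */
    __asm__ volatile("std\n\t"
                     "rep movsb\n\t"
                     "cld"
                     : "+S"(s), "+D"(d), "+c"(n)
                     :
                     : "memory", "cc");
#else
    /* Portable fallback with the same copy order. */
    while (n--)
        dst[n] = src[n];
#endif
}
```

A signal that arrives between the std and the cld is exactly the window in which an unfixed kernel enters the handler with DF still set.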
Upstream patch: http://git.kernel.org/?p=linux/kernel/git/x86/linux-2.6-x86.git;a=commitdiff;h=52c841e1012b8e73cc04b53f92fb933db580fb42 The RHEL5 patch will be different, as RHEL5 doesn't have the unified x86 tree, etc.
Testcase for QA is in http://gcc.gnu.org/ml/gcc/2008-03/msg00267.html (on x86_64, the testcase needs to be tested compiled and linked with both -m32 and -m64).
Whenever I run the test case (sending SIGUSR1 to the test process), I get 'DF = 1', indicating that the DF flag is set. I expected it to be unset because, from reading this, gcc clears the DF flag. I've tried older distros and compat-gcc and I still get DF = 1. What am I missing? Thanks.
If you get DF = 1, then that means the kernel is buggy. The psABI says that the DF flag must be clear upon entry to a function and also on exit from a function. GCC <= 4.2.x would add cld just in case whenever it used some string instruction (movs*/stos*/lods*/cmps* etc.); GCC 4.3.0 relies on the ABI guarantee. As GCC itself never emits the std insn, three things are needed: inline or out-of-line assembly has to execute cld after it did std (AFAIK all such assembly I saw does that), the kernel has to start a process with DF cleared (also done), and the kernel's signal delivery code needs to clear DF before entering the signal handler (this is the bug).
Created attachment 297246 [details] rhel5 clear df flags patch
This looks like what we want... I'm going to go test it. It's very similar to the upstream patch modulo the flags->eflags rename and the file locations.
Wouldn't:

--- linux-2.6.18/arch/i386/kernel/signal.c.jj 2008-03-07 16:16:32.000000000 -0500
+++ linux-2.6.18/arch/i386/kernel/signal.c 2008-03-07 16:26:44.000000000 -0500
@@ -540,7 +540,7 @@ handle_signal(unsigned long sig, siginfo
 		 * The tracer may want to single-step inside the
 		 * handler too.
 		 */
-		regs->eflags &= ~TF_MASK;
+		regs->eflags &= ~(TF_MASK | X86_EFLAGS_DF);
 		tracehook_report_handle_signal(sig, ka, oldset, regs);
 	}
--- linux-2.6.18/arch/x86_64/kernel/signal.c.jj 2008-03-07 16:16:34.000000000 -0500
+++ linux-2.6.18/arch/x86_64/kernel/signal.c 2008-03-07 16:24:38.000000000 -0500
@@ -384,7 +384,7 @@ handle_signal(unsigned long sig, siginfo
 		 * The tracer may want to single-step inside the
 		 * handler too.
 		 */
-		regs->eflags &= ~TF_MASK;
+		regs->eflags &= ~(TF_MASK | X86_EFLAGS_DF);
 		tracehook_report_handle_signal(sig, ka, oldset, regs);
 	}

be better? It is shorter and doesn't add any extra instructions; it just changes the andl immediate operands. The places you were adding regs->eflags &= ~X86_EFLAGS_DF; to all return 0; very soon afterwards, and these two places are the only places the *setup_*frame functions return to.
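The equivalence argued here can be checked with a small user-space model. This is only a sketch: struct fake_regs and the three function names are hypothetical stand-ins rather than kernel code, and the flag values are the architectural EFLAGS bit positions for TF (bit 8) and DF (bit 10).

```c
#include <stdint.h>

#define TF_MASK       0x00000100u /* trap flag, EFLAGS bit 8 */
#define X86_EFLAGS_DF 0x00000400u /* direction flag, EFLAGS bit 10 */

/* Hypothetical stand-in for the kernel's saved user register frame. */
struct fake_regs {
    uint32_t eflags;
};

/* One mask per flag: DF cleared in one place, TF in another. */
void clear_separately(struct fake_regs *regs)
{
    regs->eflags &= ~X86_EFLAGS_DF;
    regs->eflags &= ~TF_MASK;
}

/* Single combined mask, as in the shorter patch. */
void clear_combined(struct fake_regs *regs)
{
    regs->eflags &= ~(TF_MASK | X86_EFLAGS_DF);
}

/* Returns 1 if both variants leave the same eflags value. */
int masks_agree(uint32_t eflags)
{
    struct fake_regs a = { eflags };
    struct fake_regs b = { eflags };

    clear_separately(&a);
    clear_combined(&b);
    return a.eflags == b.eflags;
}
```

The model only shows that the resulting flag values match; whether the combined form is placed on every path that enters a handler still has to be judged against the real handle_signal code.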
Created attachment 297257 [details] 32-bit test case
Created attachment 297942 [details] Modified reproducer (runs on both {32,64} architectures) -- Thanks Jakub for help.
in kernel-2.6.18-87.el5
You can download this test kernel from http://people.redhat.com/dzickus/el5
Confirmed DF is being cleared with the test on the -89.el5 kernel, tested both 32-bit and 64-bit on an x86_64 box.
*** Bug 444178 has been marked as a duplicate of this bug. ***
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHBA-2008-0314.html