Red Hat Bugzilla – Bug 547893
floating point register state corruption after handling SIGSEGV
Last modified: 2011-12-16 06:08:31 EST
The context save/restore done by the kernel for a signal handler will cause floating point register state corruption in certain circumstances. This can be demonstrated by a test program that does roughly the following:
- alloc a page; mark it as not writable.
- create a bunch of threads that have a SIGSEGV signal handler
- the threads do the following in a loop:
  - store a marker value in xmm0
  - attempt to write to the non-writable page to trigger a #GP
  - the kernel's #GP handler calls the thread's SIGSEGV signal handler
  - the SIGSEGV signal handler:
    - writes a different marker value in xmm0
    - marks the page as writable
  - on return from the signal handler:
    - if xmm0 does not have the original marker value then issue an error message and cause the program to exit
    - else mark page as not writable
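For readers without access to the attachment, the loop above can be sketched roughly as follows. This is not the attached xmmtest4.c; it is a single-threaded approximation that assumes x86_64 Linux and GCC/Clang inline assembly. The function names and both marker constants are made up for illustration, and it counts mismatches instead of exiting on the first one.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical marker values, chosen only for this sketch. */
#define USER_MARKER    0x1111222233334444ULL  /* value the thread keeps in xmm0 */
#define HANDLER_MARKER 0xaaaabbbbccccddddULL  /* value the handler writes to xmm0 */

static long page_size;

/* SIGSEGV handler: overwrite xmm0, then make the faulting page writable. */
static void on_segv(int sig, siginfo_t *si, void *ctx)
{
    uint64_t v = HANDLER_MARKER;
    (void)sig;
    (void)ctx;
    __asm__ volatile("movq %0, %%xmm0" : : "r"(v) : "xmm0");
    void *page = (void *)((uintptr_t)si->si_addr & ~((uintptr_t)page_size - 1));
    if (mprotect(page, (size_t)page_size, PROT_READ | PROT_WRITE) != 0)
        _exit(2);  /* unexpected fault address: avoid faulting forever */
}

/* Load the marker into xmm0, store to the page (this faults and runs the
 * handler), then read xmm0 back. All three steps are in one asm statement
 * so the compiler cannot touch xmm0 in between. */
static uint64_t store_and_read_xmm0(volatile unsigned char *page, uint64_t marker)
{
    uint64_t after;
    __asm__ volatile(
        "movq %[m], %%xmm0\n\t"
        "movb $1, (%[p])\n\t"
        "movq %%xmm0, %[out]\n\t"
        : [out] "=r"(after)
        : [m] "r"(marker), [p] "r"(page)
        : "xmm0", "memory");
    return after;
}

/* Returns the number of iterations in which xmm0 did not survive the
 * signal handler, or -1 on setup failure. */
long count_xmm0_mismatches(long iterations)
{
    struct sigaction sa, old;
    long bad = 0;

    page_size = sysconf(_SC_PAGESIZE);
    unsigned char *page = mmap(NULL, (size_t)page_size, PROT_READ,
                               MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (page == MAP_FAILED)
        return -1;

    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = on_segv;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGSEGV, &sa, &old) != 0) {
        munmap(page, (size_t)page_size);
        return -1;
    }

    for (long i = 0; i < iterations; i++) {
        if (store_and_read_xmm0(page, USER_MARKER) != USER_MARKER)
            bad++;
        mprotect(page, (size_t)page_size, PROT_READ);  /* re-arm the fault */
    }

    sigaction(SIGSEGV, &old, NULL);
    munmap(page, (size_t)page_size);
    return bad;
}
```

On a kernel without the bug, count_xmm0_mismatches() should return 0. The attached program runs the equivalent loop in many threads at once, which is what makes preemption at the wrong moment likely; a single thread as sketched here may need far longer to hit the problem.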
The test program running under EL4/EL5 (any version) fails within 10 seconds in both a PV guest and on bare metal.
The bug and the fix are described below. I will attach a patch for EL4 and EL5, and attach the test program described above. Upstream Linux has a similar fix for x86_64 although the code has been changed quite a bit.
The 32-bit path is similar to 64-bit but I have not been able to reproduce the bug with a 32-bit kernel. I'll follow up with another bugz if needed for 32-bit.
A test program causes the following scenario:
- user thread writes to xmm0. This will cause PF_USED_MATH to be set in struct task_struct.flags indicating that this task has used the FPU (there is state in FP regs such as xmm0). used_math() will return a non-zero value if the current task's PF_USED_MATH is set.
- user thread attempts to write to a non-writable page. The user thread has a SIGSEGV handler. The kernel's general protection exception handler is invoked.
- kernel #GP path sets up to call user thread SIGSEGV handler.
- used_math(), called from setup_rt_frame(), returns non-zero (current task's PF_USED_MATH is set) so save_i387() is called to save the user thread's FP state so that it can be restored after returning from the signal handler.
This code sequence in save_i387() is not correct:
- call clear_used_math()
- save FP state in a user buffer
- return 1
clear_used_math() clears PF_USED_MATH indicating that the task does not have FP state that needs to be saved before switching to another context. If we get preempted before we save our FP state, our FP state would be cleared because PF_USED_MATH is not set. For example, we could get a page fault attempting to save our FP state to the user's buffer. When the page fault handling completes and returns to the interrupted context (the #GP handler), the user thread FP state is not restored (because it wasn't saved) and the xmm registers are cleared. The original #GP path continues, saving the now cleared FP state. That cleared state will be "restored" when we return to the user thread from the signal handler. This causes the test program to complain that xmm0's value is zero rather than the non-zero value it had prior to the general protection fault.
This problem affects 64-bit flavors of EL3, EL4 and EL5, both bare metal and OVM guests.
The fix is to move the call to clear_used_math() after the save of the FP state.
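The ordering problem can be illustrated with a toy user-space model. This is not kernel code and not the attached patch: struct toy_task, toy_preempt() and the two toy_save_*() functions are invented for this sketch, and the real FP state is reduced to a single 64-bit value standing in for xmm0. The only assumption carried over from the description above is that a task whose PF_USED_MATH is clear gets a freshly initialized FPU on its next FP use.

```c
#include <stdint.h>

#define PF_USED_MATH 0x1u

/* Toy stand-in for the pieces of task state discussed above. */
struct toy_task {
    unsigned flags;        /* PF_USED_MATH lives here */
    uint64_t fpu_xmm0;     /* live register contents */
    uint64_t thread_xmm0;  /* copy kept by the context-switch code */
};

/* Model of being scheduled out and back in (for example while sleeping in a
 * page fault). On the next FP use the registers are reloaded from the saved
 * copy only if the task is marked as having FP state; otherwise the FPU is
 * reinitialized, which is modelled here as xmm0 = 0. */
static void toy_preempt(struct toy_task *t)
{
    t->thread_xmm0 = t->fpu_xmm0;
    if (t->flags & PF_USED_MATH)
        t->fpu_xmm0 = t->thread_xmm0;
    else
        t->fpu_xmm0 = 0;
}

/* Buggy order: clear the flag, then save. Returns the value that would be
 * written to the signal frame. */
uint64_t toy_save_buggy(struct toy_task *t, int preempted)
{
    t->flags &= ~PF_USED_MATH;      /* clear_used_math() first */
    if (preempted)
        toy_preempt(t);             /* FP state is lost here */
    return t->fpu_xmm0;
}

/* Fixed order: save first, clear the flag afterwards. */
uint64_t toy_save_fixed(struct toy_task *t, int preempted)
{
    if (preempted)
        toy_preempt(t);             /* flag still set, state survives */
    uint64_t saved = t->fpu_xmm0;
    t->flags &= ~PF_USED_MATH;      /* clear_used_math() after the save */
    return saved;
}
```

Starting from a task with PF_USED_MATH set and a non-zero fpu_xmm0, toy_save_buggy() returns 0 when a preemption is simulated, while toy_save_fixed() returns the original value. Without preemption both orders give the same result, which matches the bug only showing up under load.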
If we needed to save FP state prior to calling a signal handler (used_math() was true) then we saved it to the address pointed to by struct sigcontext.fpstate. If !used_math() then we set fpstate = NULL. restore_sigcontext() is called to restore that context after returning from the signal handler. If fpstate != NULL then restore_i387() is called to restore FP state.
The problem is that TS_USEDFPU may not be set. When set, TS_USEDFPU indicates that the FP state for the task is in the FP registers. We just restored state to the FP registers so TS_USEDFPU must also be set. Otherwise, if we are preempted after restoring FP state then context switching code will assume that there is no context in the FP registers that needs to be saved. It will assume that the FP context for this task has been saved to struct task_struct.thread.i387.fxsave, which has a stale context. It could be from the signal handler causing us to inherit the signal handler's FP values. The correct state is in the FP registers.
The fix is to check if TS_USEDFPU is set. If not, set it and make sure CR0.TS is not set (clts()) to indicate that the FPU has this task's context and is ready for use. We must also clear TS_USEDFPU when we save our FP context in save_i387() and set CR0.TS (stts()) to indicate that the FPU must be initialized on first use by the signal handler.
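The restore-side problem can be modelled the same way. Again this is a toy user-space sketch, not the kernel code or the patch: struct ts_task, ts_preempt() and ts_restore() are invented names, thread_xmm0 stands in for struct task_struct.thread.i387.fxsave, and CR0.TS handling is left out entirely.

```c
#include <stdint.h>

#define TS_USEDFPU 0x1u

struct ts_task {
    unsigned status;       /* TS_USEDFPU: live registers hold this task's state */
    uint64_t fpu_xmm0;     /* live register contents */
    uint64_t thread_xmm0;  /* stand-in for thread.i387.fxsave */
};

/* Model of a context switch away and back. The live registers are saved
 * only if TS_USEDFPU says they belong to this task; the next FP use then
 * reloads whatever is in the saved copy. */
static void ts_preempt(struct ts_task *t)
{
    if (t->status & TS_USEDFPU) {
        t->thread_xmm0 = t->fpu_xmm0;
        t->status &= ~TS_USEDFPU;
    }
    t->fpu_xmm0 = t->thread_xmm0;   /* reload on next FP use */
    t->status |= TS_USEDFPU;
}

/* Model of restoring FP state from the signal frame, followed by a
 * preemption. set_flag selects the fixed behaviour (mark the registers as
 * live after loading them). Returns what the user thread sees in xmm0. */
uint64_t ts_restore(struct ts_task *t, uint64_t frame_xmm0, int set_flag)
{
    t->fpu_xmm0 = frame_xmm0;       /* load state from the signal frame */
    if (set_flag)
        t->status |= TS_USEDFPU;    /* the fix */
    ts_preempt(t);
    return t->fpu_xmm0;
}
```

With TS_USEDFPU initially clear and thread_xmm0 holding the signal handler's value, ts_restore() without the flag returns the handler's stale value, and with the flag it returns the value from the signal frame.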
Steps to Reproduce:
1. Run the attached test program on a 64-bit EL4 or EL5 PV guest or on bare metal. It will detect an unexpected value in register xmm0 within one minute.
Actual Results:
The test program will detect an unexpected value in register xmm0. It is not the value placed in xmm0 prior to invoking a SIGSEGV handler.
Expected Results:
The value placed in register xmm0 prior to invoking a SIGSEGV handler should be there after returning from the handler.
Created attachment 378650
Patch for EL4u8
Created attachment 378651
Patch for EL5u4
Created attachment 378652
C program to demonstrate FP corruption
Runs on 64-bit EL4 or EL5.
Compile with "gcc -lpthread xmmtest4.c -o xmmtest4".
This is the same bug as https://bugzilla.redhat.com/show_bug.cgi?id=560891
which is hopefully fixed.
*** This bug has been marked as a duplicate of bug 560891 ***
Bug 560891 is private. Could someone please copy the outcome of 560891 here?