Quentin Casasnovas reported on http://seclists.org/oss-sec/2015/q1/877:
Jamie and I discovered there was a flaw in the way the xsave/xrstor (and
their alternative instructions) have been protected against a fault in
kernel space since Linux 3.15. The problem was introduced in commit f31a9f7
("x86/xsaves: Use xsaves/xrstors to save and restore xsave area") which
ends up protecting the .altinstr_replacement from faulting instead of the
target of the alternative in .text, leaving the instruction un-protected.
You can find a reproducer (thanks to Allan for his help with/comments on
it!) triggering the fault in kernel space attached to this e-mail but it
should be noted there are a few different places where these instructions
are used un-protected and the reproducer only uses one of them present in
the kvm code. You can find a list of all such places in the attached
unprotected_xsave_faults attachment which was generated against a v4.0-rc1
defconfig + CONFIG_KVM vmlinux.o (the most concerning one probably being in
__switch_to()). The reproducer is a patch to apply on top of lkvm
(https://github.com/penberg/linux-kvm) but it should be trivial to write as
a standalone C application.
It should be noted that this vulnerability is present even if the hardware
does not support xsaveS.
This is fixed by upstream commit 06c8173eb.
A more detailed problem description follows:
We have two different mechanisms in the kernel that are involved here:
- the alternative instructions: they allow the Linux kernel to self-modify
its running code to use optimized instructions when they are available
on the host CPU. The way it works is that the initial instruction in
.text is supported by all CPU variants and we add an optimized version of
the instruction in the .altinstr_replacement section. At boot, or when
loading a kernel module, the kernel will replace the instruction in
.text by its optimized version from .altinstr_replacement if the CPU
supports it. We can see it as something like this:
memcpy(.text, .altinstr_replacement, sizeof(instruction));
The CPU will never have its instruction pointer pointing in
.altinstr_replacement, this is just used as a memory source when
applying the alternative instructions.
- the ex_table entries: they allow the kernel to mark addresses where a
fault might occur but should not cause a panic. It works by storing in
the __ex_table section a pair of addresses, the first one being the
address of the instruction which could fault, and the second is where
the execution should continue when that fault happens. On page fault in
kernel space, the page fault handler will check if there is an ex_table
entry corresponding to where the fault happened, and if that's the case,
will restore the CPU context with RIP pointing to that second address.
This is how copy_from_user() does not panic the kernel when the userland
pointer given as argument is borked for example.
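The ex_table lookup described above can be sketched in a few lines of C. This is only an illustration under simplifying assumptions: the struct ex_entry layout and the find_fixup() helper are hypothetical names invented for this example, the table holds absolute addresses, and the search is linear. It is not the kernel's actual implementation.

```c
#include <assert.h>  /* only needed if you add checks like the ones below */
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified exception-table entry (illustration only). */
struct ex_entry {
    uintptr_t insn;   /* address of the instruction that may fault */
    uintptr_t fixup;  /* address where execution should resume */
};

/*
 * Return the fixup address registered for fault_ip, or 0 when no entry
 * matches. A result of 0 models the "unhandled fault" case.
 */
uintptr_t find_fixup(const struct ex_entry *table, size_t n,
                     uintptr_t fault_ip)
{
    for (size_t i = 0; i < n; i++) {
        if (table[i].insn == fault_ip)
            return table[i].fixup;
    }
    return 0;
}
```

A fault handler built on this sketch would set the instruction pointer to the returned address when it is non-zero, and treat the fault as fatal otherwise.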
Now, the xsave instruction (or its alternatives xsaveopt and xsaves) could
fault for different reasons (unaligned memory operands, non-canonical
address memory operand, ...), so we want to have an ex_table entry pointing
to the xsave instruction in .text so that if it faults, the kernel does not
die but simply continues its normal flow and returns an error to the caller.
The problem with the above commit is that it correctly added an ex_table
entry to prevent this, but the pointer to the instruction which might fault
was not pointing to .text but to .altinstr_replacement. The effect is that
if userspace manages to trigger the fault on xsave (which is in .text), the
page fault handler will never find a corresponding ex_table entry and will
consider this as an unhandled fault. The fix is to make the ex_table entry
properly point to .text and not to .altinstr_replacement so the kernel
properly handles the fault and does not die.
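To make the mismatch concrete, here is a small user-space model of the bug. Everything in it is an assumption made for illustration: the two byte arrays stand in for the .text and .altinstr_replacement sections, the byte values are placeholders rather than real opcodes, and struct ex_ref, apply_alternative() and fault_is_handled() are hypothetical names, not kernel symbols.

```c
#include <assert.h>  /* only needed if you add checks like the ones below */
#include <string.h>

enum { INSN_LEN = 4 };

/* Stand-ins for the two sections; the bytes are placeholders, not opcodes. */
unsigned char text_insn[INSN_LEN]            = { 0x01, 0x01, 0x01, 0x01 };
unsigned char altinstr_replacement[INSN_LEN] = { 0x02, 0x02, 0x02, 0x02 };

/* Hypothetical ex_table-like record: which instruction address is covered. */
struct ex_ref {
    const void *insn;
};

/* Model of alternative patching: copy the replacement bytes over .text. */
void apply_alternative(void)
{
    memcpy(text_insn, altinstr_replacement, INSN_LEN);
}

/* A fault is handled only if the entry names the faulting address. */
int fault_is_handled(const struct ex_ref *entry, const void *fault_ip)
{
    return entry->insn == fault_ip;
}
```

After apply_alternative() runs, the patched instruction still executes from text_insn, so that is the only address a fault can report. An entry whose insn field is altinstr_replacement (the buggy case) never matches it, while an entry whose insn field is text_insn (the fixed case) does.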
Created kernel tracking bugs for this issue:
Affects: fedora-all [bug 1204724]
This issue does not affect the Linux kernel packages as shipped with Red Hat Enterprise Linux 5, 6, 7 and Red Hat Enterprise MRG 2.