203894 – 4/4GB split issue in MCE handler

Summary: 4/4GB split issue in MCE handler

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Enterprise Linux 4
Classification:	Red Hat
Component:	kernel
Sub Component:
Version:	4.4
Hardware:	i686
OS:	Linux
Priority:	medium
Severity:	urgent
Target Milestone:	---
Target Release:	---
Assignee:	Neil Horman
QA Contact:	Brian Brock
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2006-08-24 09:44 UTC by Vasily Averin
Modified:	2008-01-09 17:29 UTC
CC List:	2 users
Fixed In Version:	RHBA-2007-0304
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2007-05-08 03:23:22 UTC
Target Upstream Version:
Embargoed:

Attachments
fixed by attached patch (638 bytes, patch) 2006-08-24 09:50 UTC, Vasily Averin	no flags
alternate patch to fix the read of kernel space memory in user space context (1.46 KB, patch) 2006-11-08 13:33 UTC, Neil Horman	no flags

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Product Errata	RHBA-2007:0304	0	normal	SHIPPED_LIVE	Updated kernel packages available for Red Hat Enterprise Linux 4 Update 5	2007-04-28 18:58:50 UTC

Description Vasily Averin 2006-08-24 09:44:41 UTC

The SWsoft Virtuozzo/OpenVZ Linux kernel team has found an issue related to the 4/4GB split:
the machine check exception handler accesses kernel-space memory
(machine_check_vector) before switching to the kernel-space context.

If an MCE interrupts a userspace application, this usually leads to a non-fatal
oops message. However, if the application has that address mapped, the kernel
will jump through a bogus pointer, which can lead to various problems such as
memory corruption, hangs, or reboots.

Comment 1 Vasily Averin 2006-08-24 09:47:58 UTC

Oops example from the Virtuozzo/OpenVZ kernel 2.6.8-022stab078.9-enterprise
(with the 4/4GB split patch):

Unable to handle kernel paging request at virtual address 024a8e10
fffad03e
*pde = 0063c027
Oops: 0000 [#1]
CPU:    0, VCPU: 1:2
EIP:    0060:[<fffad03e>]    Tainted:  P
Using defaults from ksymoops -t elf32-i386 -a i386
EFLAGS: 00010206   (2.6.8-022stab078.9-enterprise)
eax: 409e55b8   ebx: 40c52660   ecx: 00000200   edx: 40331468
esi: 40cd0018   edi: 403317e8   ebp: bdffe2cc   esp: d0e9ffe8
ds: 007b   es: 007b   ss: 0068
Stack: 00000000 08217e13 00000073 00000206 bdffe2a4 0000007b
Call Trace:
Code: ff 35 10 8e 4a 02 e9 ab fd ff ff 8d 76 00 6a 00 68 50 7a 10


>>EIP; fffad03e <machine_check+2/10>   <=====

>>eax; 409e55b8 <pg0+3e3aa5b8/fd971000>
>>ebx; 40c52660 <pg0+3e617660/fd971000>
>>edx; 40331468 <pg0+3dcf6468/fd971000>
>>esi; 40cd0018 <pg0+3e695018/fd971000>
>>edi; 403317e8 <pg0+3dcf67e8/fd971000>
>>ebp; bdffe2cc <pg0+bb9c32cc/fd971000>
>>esp; d0e9ffe8 <pg0+ce864fe8/fd971000>

Code;  fffad03e <machine_check+2/10>
00000000 <_EIP>:
Code;  fffad03e <machine_check+2/10>   <=====
   0:   ff 35 10 8e 4a 02         pushl  0x24a8e10   <=====
Code;  fffad044 <machine_check+8/10>
   6:   e9 ab fd ff ff            jmp    fffffdb6 <_EIP+0xfffffdb6>
Code;  fffad049 <machine_check+d/10>
   b:   8d 76 00                  lea    0x0(%esi),%esi
Code;  fffad04c <spurious_interrupt_bug+0/50fb4>
   e:   6a 00                     push   $0x0
Code;  fffad04e <spurious_interrupt_bug+2/50fb4>
  10:   68 50 7a 10 00            push   $0x107a50

From System.map:
024a8e10 D machine_check_vector

Comment 2 Vasily Averin 2006-08-24 09:50:05 UTC

Created attachment 134797
fixed by attached patch

Comment 3 Vasily Averin 2006-10-03 07:01:34 UTC

Ingo,
could you please comment on this bug?
From our point of view it may explain the various problems on nodes running a
kernel with the 4G split patch: memory corruption, hangs, and reboots without
any diagnostics.

Comment 4 Neil Horman 2006-11-06 13:49:08 UTC

Please try with our latest RHEL4 kernel. The kernel you are reporting this
problem on isn't a RHEL kernel, and all of the RHEL4 kernels appear to properly
switch to kernel space via the error_code path in entry.S before calling the
vector pushed onto the stack.

Comment 5 Vasily Averin 2006-11-07 12:18:56 UTC

OK, please look at arch/i386/kernel/entry.S.
I've copied it from our 2.6.9-42.0.3 kernel:

ENTRY(alignment_check)
        pushl $do_alignment_check
        jmp error_code

ENTRY(page_fault)
        pushl $do_page_fault
        jmp error_code

#ifdef CONFIG_X86_MCE
ENTRY(machine_check)
        pushl $0
        pushl machine_check_vector
        jmp error_code
#endif

Here we see three interrupt vectors. Please note that in the first two cases we
push constants onto the stack: $do_alignment_check and $do_page_fault.

But in the machine_check case we read the contents of the kernel-space _variable_
machine_check_vector, and we do it _before_ jumping to error_code, which is
where the context is switched from user space to kernel space.

Therefore we access a kernel-space address in user-space context. Usually this
address is not mapped, and we get an oops message. But if this address is
mapped in user space, the load succeeds: we read whatever the application has
there as machine_check_vector and push it onto the stack.

Then we jump to error_code, which switches the context and calls through the
wrong pointer, in kernel-space context, with unpredictable results.

error_code:
...
        movl ORIG_EAX(%esp), %esi       # get the error code
        movl ES(%esp), %edi             # get the function address
...
        __SWITCH_KERNELSPACE

        leal 4(%esp), %edx              # prepare pt_regs
        pushl %edx                      # push pt_regs

        call *%edi


I would note that this is a real issue: we (the SWsoft Virtuozzo/OpenVZ kernel
team) have received two such oopses from our customers.
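
A toy C model of the sequence described in this comment may make the three possible outcomes easier to see. It is a simplified sketch, not kernel code: each address space is assumed to be a short list of mapped 32-bit words, only the machine_check_vector address (024a8e10) comes from the System.map above, and the handler address and the user data word are hypothetical. The cases are an MCE arriving in kernel mode, in user mode with the address unmapped (the oops shown in comment 1), and in user mode with the address mapped by the application (the silent jump through a bogus pointer).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model of one 4G/4G address space: a short list of mapped words.
 * Real page tables are far more involved; this keeps only "mapped or not". */
struct mapping {
    uint32_t vaddr;
    uint32_t value;
};

struct addr_space {
    const struct mapping *maps;
    size_t n;
};

/* Returns 1 and stores the word if vaddr is mapped in `as`, 0 if the load would fault. */
static int read_word(const struct addr_space *as, uint32_t vaddr, uint32_t *out)
{
    for (size_t i = 0; i < as->n; i++) {
        if (as->maps[i].vaddr == vaddr) {
            *out = as->maps[i].value;
            return 1;
        }
    }
    return 0;
}

enum mce_outcome { MCE_OK, MCE_OOPS, MCE_WRONG_HANDLER };

#define MCV_ADDR     0x024a8e10u /* machine_check_vector, from the System.map above */
#define REAL_HANDLER 0x02100000u /* hypothetical address of the real MCE handler */

/* Example address spaces with hypothetical contents. */
static const struct mapping kernel_maps[] = { { MCV_ADDR, REAL_HANDLER } };
static const struct mapping user_maps_plain[] = { { 0x08048000u, 0u } };      /* vector address unmapped */
static const struct mapping user_maps_clash[] = { { MCV_ADDR, 0x41414141u } }; /* app maps that address */

static const struct addr_space kernel_space = { kernel_maps, 1 };
static const struct addr_space user_plain = { user_maps_plain, 1 };
static const struct addr_space user_clash = { user_maps_clash, 1 };

/* Models the stub: "pushl machine_check_vector" runs in whatever address space
 * was live when the MCE arrived; error_code switches to kernel space only
 * afterwards and then calls the value that was pushed. */
static enum mce_outcome machine_check_entry(const struct addr_space *live)
{
    uint32_t pushed;

    assert(live != NULL);
    if (!read_word(live, MCV_ADDR, &pushed))
        return MCE_OOPS; /* unmapped in user space: page fault, oops */
    /* __SWITCH_KERNELSPACE happens here, too late to change `pushed`. */
    return pushed == REAL_HANDLER ? MCE_OK : MCE_WRONG_HANDLER;
}
```

Calling machine_check_entry(&user_clash) returns MCE_WRONG_HANDLER: the early load succeeds, so nothing faults at entry, which is why this case is much harder to diagnose than the oops.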

Comment 6 Neil Horman 2006-11-07 17:08:39 UTC

OK, my bad, I see what you're doing now. I was hung up thinking you were worried
about an alignment issue with the movl ORIG_EAX(%esp) in error_code, since
machine_check pushes a 0 error code onto the stack; I see now that you
aren't. This makes sense to me now. You're getting the oops because
machine_check_vector lives at a kernel-space address, and we're trying to read
it from user-space context. Your patch fixes that by placing the
machine_check_vector variable in the entry.text trampoline area, so that it can
be safely accessed from user space.
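
As a rough illustration of why placing the variable in the trampoline helps, here is a toy C model (an assumption for illustration, not the attached 638-byte patch). It relies only on the property described above: the entry.text trampoline is mapped at the same virtual address in both the user and kernel address spaces, so the early load returns the same word whichever space is live, even when the application has mapped the variable's old address. The trampoline slot address and the handler value are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct mapping { uint32_t vaddr; uint32_t value; };
struct addr_space { const struct mapping *maps; size_t n; };

#define TRAMP_MCV_ADDR 0xfffad100u /* hypothetical slot inside the entry.text trampoline */
#define REAL_HANDLER   0x02100000u /* hypothetical handler address */

/* The trampoline is mapped identically in every address space, so it is
 * modelled as a table consulted before the per-space mappings. */
static const struct mapping trampoline[] = { { TRAMP_MCV_ADDR, REAL_HANDLER } };

static int lookup(const struct mapping *maps, size_t n, uint32_t vaddr, uint32_t *out)
{
    for (size_t i = 0; i < n; i++) {
        if (maps[i].vaddr == vaddr) {
            *out = maps[i].value;
            return 1;
        }
    }
    return 0;
}

/* Returns 1 and stores the word if vaddr is readable from `as`, 0 on a fault. */
static int read_word(const struct addr_space *as, uint32_t vaddr, uint32_t *out)
{
    return lookup(trampoline, sizeof trampoline / sizeof trampoline[0], vaddr, out) ||
           lookup(as->maps, as->n, vaddr, out);
}

/* The value the stub would push: reading before the switch is now harmless. */
static uint32_t pushed_vector(const struct addr_space *live)
{
    uint32_t v = 0;
    int ok = read_word(live, TRAMP_MCV_ADDR, &v);

    assert(ok); /* the trampoline is always mapped in this model */
    (void)ok;
    return v;
}

/* The application maps the vector's old kernel address; it no longer matters. */
static const struct mapping user_maps[] = { { 0x024a8e10u, 0x41414141u } };
static const struct mapping kernel_maps[] = { { 0x024a8e10u, 0u } };
static const struct addr_space user_space = { user_maps, 1 };
static const struct addr_space kernel_space = { kernel_maps, 1 };
```

The cost of this approach, as the thread goes on to note, is that the variable now occupies the shared trampoline, which has to stay small and, with a writable variable in it, cannot be write-protected.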

The patch looks fine to me, although it appears that Dave Jones is proposing to
handle it slightly differently upstream:
http://www.uwsg.iu.edu/hypermail/linux/kernel/0210.3/0669.html

It appears he's trying to keep excess stuff out of the trampoline area. Unless
you have a particular objection, I'm going to try to do this in line with
upstream (assuming his patch gets taken; no sign of it yet). I'll post a patch
here for testing shortly.

Comment 7 Neil Horman 2006-11-07 20:10:41 UTC

Scratch that last comment; I didn't see the date on that post. So it actually
looks like this needs to go upstream as well.

Comment 8 Vasily Averin 2006-11-08 06:48:47 UTC

I'm not sure this patch will be accepted into mainline.

I would note that this is a 4G/4G split-related issue. Mainline Linux kernels
are not vulnerable, because the 4G/4G split patch is not included in the
mainline kernel.

As far as I know, the 4G split patch is currently used only in RHEL i386 hugemem
kernels and in our Virtuozzo/OpenVZ kernels. Old FC kernels used it but dropped
it a long time ago.

Do you (or Ingo Molnar) know of any other vendors who use the 4G split patch?

Comment 9 Neil Horman 2006-11-08 13:33:31 UTC

Created attachment 140648
alternate patch to fix the read of kernel space memory in user space context

Please test this alternate patch and confirm that it solves the problem equally
well.  I'm not sure which patch is more appropriate yet (add a few dozen bytes
to kernel text, or 4 bytes to the shared trampoline area, which should be as
small as possible), but I'd like to have both alternatives available when I
propose a solution.  Thanks
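
The attached alternate patch is not shown in this report, so the following is only a hedged C model of one way a kernel-text variant could work, consistent with the description of adding a few dozen bytes to kernel text: the entry stub pushes a link-time constant, as it already does for $do_page_fault, and a small wrapper (the hypothetical machine_check_dispatch) reads machine_check_vector only once error_code has switched to kernel space. The wrapper, handler, and struct names are illustrative and not taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct pt_regs; only its address matters here. */
struct regs_stub {
    long orig_eax;
};

typedef void (*mce_handler_t)(struct regs_stub *regs, long error_code);

static bool kernel_space_live; /* models whether __SWITCH_KERNELSPACE has run */
static int handler_calls;      /* counts real handler invocations for the demo */

/* Hypothetical CPU-specific handler installed at boot. */
static void example_mce_handler(struct regs_stub *regs, long error_code)
{
    (void)regs;
    (void)error_code;
    handler_calls++;
}

/* The kernel-space variable that the original stub read too early. */
static mce_handler_t machine_check_vector = example_mce_handler;

/* Hypothetical wrapper in ordinary kernel text. Its address is a link-time
 * constant, so the stub can push it like $do_page_fault, and the variable
 * is dereferenced only after the address-space switch. */
static void machine_check_dispatch(struct regs_stub *regs, long error_code)
{
    assert(kernel_space_live); /* under 4G/4G the load is only valid here */
    machine_check_vector(regs, error_code);
}

/* Models error_code: switch address spaces first, then call the pushed function. */
static void error_code_path(mce_handler_t fn, long error_code)
{
    struct regs_stub regs = { error_code };

    kernel_space_live = true;  /* __SWITCH_KERNELSPACE */
    fn(&regs, error_code);     /* call *%edi */
    kernel_space_live = false; /* switch back before returning to user mode */
}

/* Models a fixed stub: pushl $0; pushl $machine_check_dispatch; jmp error_code. */
static void machine_check_entry(void)
{
    error_code_path(machine_check_dispatch, 0);
}
```

In this shape, only a constant is consumed before the switch, so nothing in the trampoline needs to hold writable data, which is the write-protection advantage raised later in this report.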

Comment 10 Kirill Korotaev 2006-11-08 14:23:45 UTC

Both patches are correct. There is no real difference in efficiency; however,
your patch looks a bit better to me.

Comment 11 Vasily Averin 2006-11-09 11:13:39 UTC

Neil,

From my point of view your patch is better. Our patch is unclean and looks like
a hack, while yours looks like the correct solution: it fixes the root cause of
this error, namely the access to a kernel-space variable before error_code. No
other entry stub does this, and I assume there is an important reason for that.
What do you think, could this cause other problems as well? Perhaps there is no
guarantee that some segment registers are valid before error_code? Perhaps when
an MCE interrupts the CPU in VM86 mode? If so, we have a chance of getting your
patch into mainline.

I would also note that my patch requires write permission on the trampoline
area: in our version, a variable placed in this area gets modified, which is not
very good in principle. With your patch the trampoline area can be
write-protected, which is yet another small advantage of your patch.

Comment 12 Neil Horman 2006-11-09 12:14:20 UTC

I hadn't considered the read/write dilemma of your patch, but I would guess
that entry.text needs to be read/write in the case of signal handler returns
(not sure, though). Either way, if you're comfortable with my variant of the
patch, I'll propose it upstream and, if it is accepted, I'll get it into Update
5 ASAP.

Comment 13 Neil Horman 2006-11-09 14:22:27 UTC

Posted upstream for review.

Comment 14 Neil Horman 2006-11-10 18:52:34 UTC

Just got pulled into -mm. I'll post it internally soon.

Comment 15 Jason Baron 2006-11-14 18:30:05 UTC

Committed in stream U5 build 42.25. A test kernel with this patch is available
from http://people.redhat.com/~jbaron/rhel4/

Comment 16 RHEL Program Management 2006-11-28 03:34:44 UTC

This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.

Comment 19 Jay Turner 2006-12-18 14:47:40 UTC

QE ack for RHEL4.5.

Comment 22 Red Hat Bugzilla 2007-05-08 03:23:22 UTC

An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2007-0304.html
