Bug 207669 - RHEL4 PV x86_64 race condition revolving around TLS (probably)
Summary: RHEL4 PV x86_64 race condition revolving around TLS (probably)
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 4
Classification: Red Hat
Component: kernel
Version: 4.0
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Chris Lalancette
QA Contact: Brian Brock
URL:
Whiteboard:
Depends On:
Blocks: 201781
 
Reported: 2006-09-22 13:41 UTC by Chris Lalancette
Modified: 2007-11-30 22:07 UTC (History)

Fixed In Version: RHBA-2007-0304
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2007-05-08 03:38:50 UTC
Target Upstream Version:
Embargoed:


Attachments
Fix for do_arch_prctl crash in RHEL4 PV (3.07 KB, patch)
2006-10-05 23:02 UTC, Chris Lalancette


Links
System: Red Hat Product Errata
ID: RHBA-2007:0304
Private: 0
Priority: normal
Status: SHIPPED_LIVE
Summary: Updated kernel packages available for Red Hat Enterprise Linux 4 Update 5
Last Updated: 2007-04-28 18:58:50 UTC

Description Chris Lalancette 2006-09-22 13:41:22 UTC
Description of problem:

RHEL4 PV x86_64 has some sort of race condition that is trivially reproduced by
executing "lvm.static" in a tight loop.  The bug manifests itself as a crash in
do_arch_prctl, in this piece of code:

                if (addr <= 0xffffffff) {
                        set_32bit_tls(task, FS_TLS, addr);
                        if (doit) {
                                load_TLS(&task->thread, cpu);
                                asm volatile("movl %0,%%fs" :: "r" (FS_TLS_SEL));
                        }
                        task->thread.fsindex = FS_TLS_SEL;
                        task->thread.fs = 0;
                } else {
                        task->thread.fsindex = 0;
                        task->thread.fs = addr;
                        if (doit) {
                                /* set the selector to 0 to not confuse
                                   __switch_to */
                                asm volatile("movl %0,%%fs" :: "r" (0));
                                ret = HYPERVISOR_set_segment_base(SEGBASE_FS,
                                                                addr);
                        }
                }

If we make the first if statement look like this:

                if (0 && addr <= 0xffffffff) {

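we no longer see the crash, since every address then takes the else branch and
goes through HYPERVISOR_set_segment_base.  So there seems to be some sort of
race condition with setting up the TLS completely locally (the set_32bit_tls /
load_TLS path).  riel also notes that init occasionally fails to start on
x86_64, and he believes it is related to this.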
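
To make the effect of the "0 &&" change concrete, here is a minimal user-space
sketch of the branch selection only.  The enum, the function name
choose_fs_path and the force_hypercall flag are made up for illustration; this
is not kernel code and it does not touch any segment register.

```c
#include <stdint.h>

/* Hypothetical labels for the two branches of the snippet above. */
enum fs_path {
    FS_PATH_GDT_TLS,    /* set_32bit_tls() + load_TLS() branch */
    FS_PATH_HYPERCALL   /* HYPERVISOR_set_segment_base() branch */
};

/*
 * Model of the branch condition only.  A non-zero force_hypercall
 * corresponds to the diagnostic "if (0 && addr <= 0xffffffff)" edit.
 */
enum fs_path choose_fs_path(uint64_t addr, int force_hypercall)
{
    if (!force_hypercall && addr <= 0xffffffffULL)
        return FS_PATH_GDT_TLS;
    return FS_PATH_HYPERCALL;
}
```

With force_hypercall set, a low address such as 0x1000 no longer selects the
GDT-based path, which matches the observation that the crash disappears.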

Comment 1 Chris Lalancette 2006-10-05 23:01:18 UTC
OK, I think I have the fix here; I'll attach the patch.  Basically, although we
set up a GDT for every CPU in arch/x86_64/kernel/head-xen.S, we only use the one
for CPU 0.  The GDTs for the other CPUs are allocated by get_free_page in
drivers/xen/core/smpboot.c and marked as read-only pages in Xen.  Those were all
set up properly, but we were actually indexing into the old table, which wasn't
set up, so we crashed whenever arch_prctl was executed on any CPU but the boot
one.  The attached patch fixes it.
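
As a rough illustration of the indexing mistake described above, here is a
small user-space model.  It is not the attached patch and not kernel code: all
names (model_static_gdt, model_live_gdt, MODEL_TLS_SLOT and so on) are
hypothetical, a second static array stands in for the pages allocated in
smpboot.c, and the model ignores that the real pages are read-only under Xen.

```c
#include <stdint.h>

#define MODEL_NR_CPUS     4
#define MODEL_GDT_ENTRIES 16
#define MODEL_TLS_SLOT    8   /* hypothetical slot index */

typedef uint64_t model_desc_t;

/* Table laid out at build time; only CPU 0 really uses its row. */
static model_desc_t model_static_gdt[MODEL_NR_CPUS][MODEL_GDT_ENTRIES];

/* Stand-in for the tables allocated at boot for the other CPUs. */
static model_desc_t model_boot_gdt[MODEL_NR_CPUS][MODEL_GDT_ENTRIES];

/* The table each CPU actually has loaded. */
static model_desc_t *model_live_gdt[MODEL_NR_CPUS];

void model_boot(void)
{
    int cpu;

    model_live_gdt[0] = model_static_gdt[0];
    for (cpu = 1; cpu < MODEL_NR_CPUS; cpu++)
        model_live_gdt[cpu] = model_boot_gdt[cpu];
}

/* Buggy variant: always indexes the old static table. */
void model_set_tls_buggy(int cpu, model_desc_t desc)
{
    model_static_gdt[cpu][MODEL_TLS_SLOT] = desc;
}

/* Fixed variant: writes to the table the CPU has loaded. */
void model_set_tls_fixed(int cpu, model_desc_t desc)
{
    model_live_gdt[cpu][MODEL_TLS_SLOT] = desc;
}

/* What the CPU would find in its TLS slot (0 means "not set up"). */
model_desc_t model_cpu_sees(int cpu)
{
    return model_live_gdt[cpu][MODEL_TLS_SLOT];
}
```

In this model the buggy variant works on CPU 0, where both tables are the same
object, but leaves the live table of every other CPU untouched.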

Comment 2 Chris Lalancette 2006-10-05 23:02:46 UTC
Created attachment 137880
Fix for do_arch_prctl crash in RHEL4 PV

Comment 3 Chris Lalancette 2006-10-05 23:07:42 UTC
Just as an additional note, my test case here was to run lvm.static in a tight
loop until it was run on a CPU other than 0 (usually a matter of a couple of
minutes).  Before this patch, that would always crash.  After applying this
patch, I can run lvm.static in a loop indefinitely (OK, at least for 20 minutes,
but I also verified that it got run on a CPU > 0).
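
For anyone who wants to avoid waiting for the scheduler to pick a CPU other
than 0, a possible variation of the test is to pin the loop to a chosen CPU
first.  The following is only a sketch under that assumption; it is not the
test actually used here, and the helper names pin_to_cpu and run_once are
invented for this example.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Restrict the calling process to a single CPU.  Children created
 * afterwards inherit the mask.  Returns 0 on success, -1 on error.
 */
int pin_to_cpu(int cpu)
{
    cpu_set_t set;

    if (cpu < 0 || cpu >= CPU_SETSIZE)
        return -1;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    return sched_setaffinity(0, sizeof(set), &set);
}

/*
 * Run argv[0] once (looked up via PATH).  Returns the exit status of
 * the child, or -1 if fork/waitpid failed or the child died from a
 * signal.
 */
int run_once(char *const argv[])
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0) {
        execvp(argv[0], argv);
        _exit(127);             /* only reached if exec failed */
    }
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Calling pin_to_cpu(1) and then run_once() in a loop with an argv of
{ "lvm.static", NULL } would keep every run on CPU 1, assuming that CPU exists
and is online in the guest.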

Comment 4 Jason Baron 2006-10-10 15:27:38 UTC
Committed in stream U5 build 42.17. A test kernel with this patch is available
from http://people.redhat.com/~jbaron/rhel4/


Comment 5 RHEL Program Management 2006-10-12 23:46:45 UTC
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.

Comment 6 Jay Turner 2006-10-17 15:39:05 UTC
QE ack for 4.5.

Comment 9 Red Hat Bugzilla 2007-05-08 03:38:50 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2007-0304.html

