Bug 207669 - RHEL4 PV x86_64 race condition revolving around TLS (probably)
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 4
Classification: Red Hat
Component: kernel
Version: 4.0
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Assigned To: Chris Lalancette
QA Contact: Brian Brock
Depends On:
Blocks: 201781
Reported: 2006-09-22 09:41 EDT by Chris Lalancette
Modified: 2007-11-30 17:07 EST
Fixed In Version: RHBA-2007-0304
Doc Type: Bug Fix
Last Closed: 2007-05-07 23:38:50 EDT

Attachments
Fix for do_arch_prctl crash in RHEL4 PV (3.07 KB, patch)
2006-10-05 19:02 EDT, Chris Lalancette

Description Chris Lalancette 2006-09-22 09:41:22 EDT
Description of problem:

RHEL4 PV x86_64 has some sort of race condition that is trivially reproduced by
executing "lvm.static" in a tight loop.  The bug manifests itself as a crash in
do_arch_prctl, and the crash is in this piece of code:

                if (addr <= 0xffffffff) {
                        set_32bit_tls(task, FS_TLS, addr);
                        if (doit) {
                                load_TLS(&task->thread, cpu);
                                asm volatile("movl %0,%%fs" :: "r" (FS_TLS_SEL));
                        }
                        task->thread.fsindex = FS_TLS_SEL;
                        task->thread.fs = 0;
                } else {
                        task->thread.fsindex = 0;
                        task->thread.fs = addr;
                        if (doit) {
                                /* set the selector to 0 to not confuse
                                   __switch_to */
                                asm volatile("movl %0,%%fs" :: "r" (0));
                                ret = HYPERVISOR_set_segment_base(SEGBASE_FS,
                                                                addr);
                        }
                }

If we make the first if statement look like this:

                if (0 && addr <= 0xffffffff) {

we no longer see the crash.  So there is some sort of race condition with
setting up the TLS completely locally.  riel also notes that every once in a
while, init fails to start on x86_64, and he believes it is related to this.
Comment 1 Chris Lalancette 2006-10-05 19:01:18 EDT
OK, I think I have the fix here.  I'll attach the patch.  Basically, although we
set up a GDT for every CPU in arch/x86_64/kernel/head-xen.S, we only use the one
for CPU 0.  The rest are allocated by get_free_page in
drivers/xen/core/smpboot.c, and marked as readonly pages in Xen.  These were all
set up properly, but we were actually indexing into the old table, which wasn't
set up, so we crashed whenever arch_prctl was executed on any CPU but the
boot one.  The attached patch fixes it.
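
To illustrate the failure mode described in this comment, here is a minimal user-space model in C. It is only a sketch, not the attached patch: every name in it (static_gdt, alloc_gdt, live_gdt, set_tls_buggy, set_tls_fixed, FS_TLS_SLOT and the sizes) is hypothetical, and plain array writes stand in for the real descriptor updates, which in a Xen guest would have to go through the hypervisor because the pages are read-only. The point it tries to show is that a write indexed into the static per-CPU table is only visible on the boot CPU, because the other CPUs have loaded separately allocated tables.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define NR_CPUS      4   /* hypothetical: small CPU count for the model */
#define GDT_ENTRIES  16  /* hypothetical: descriptors per table */
#define FS_TLS_SLOT  3   /* hypothetical: slot holding the FS TLS descriptor */

/* One static table per CPU, standing in for the tables laid out at build time. */
static uint64_t static_gdt[NR_CPUS][GDT_ENTRIES];
/* Separately allocated tables, standing in for the get_free_page() pages. */
static uint64_t alloc_gdt[NR_CPUS][GDT_ENTRIES];
/* The table each CPU has actually loaded. */
static uint64_t *live_gdt[NR_CPUS];

/* Model of boot: CPU 0 keeps its static table, the others use allocated ones. */
void model_boot(void)
{
    memset(static_gdt, 0, sizeof static_gdt);
    memset(alloc_gdt, 0, sizeof alloc_gdt);
    live_gdt[0] = static_gdt[0];
    for (int cpu = 1; cpu < NR_CPUS; cpu++)
        live_gdt[cpu] = alloc_gdt[cpu];
}

/* Buggy variant: always indexes the static table, whatever the CPU loaded. */
void set_tls_buggy(int cpu, uint64_t desc)
{
    assert(cpu >= 0 && cpu < NR_CPUS);
    static_gdt[cpu][FS_TLS_SLOT] = desc;
}

/* Fixed variant: writes through the pointer to the table the CPU really uses. */
void set_tls_fixed(int cpu, uint64_t desc)
{
    assert(cpu >= 0 && cpu < NR_CPUS);
    live_gdt[cpu][FS_TLS_SLOT] = desc;
}

/* What the CPU would find in the TLS slot when it reloads FS. */
uint64_t cpu_sees_tls(int cpu)
{
    assert(cpu >= 0 && cpu < NR_CPUS);
    return live_gdt[cpu][FS_TLS_SLOT];
}
```

In this model, after model_boot(), set_tls_buggy(0, d) is seen by cpu_sees_tls(0), while set_tls_buggy(1, d) leaves cpu_sees_tls(1) at zero; only set_tls_fixed(1, d) reaches the table CPU 1 has loaded. That matches the observation that the crash only appeared once the process ran on a CPU other than the boot one.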
Comment 2 Chris Lalancette 2006-10-05 19:02:46 EDT
Created attachment 137880
Fix for do_arch_prctl crash in RHEL4 PV
Comment 3 Chris Lalancette 2006-10-05 19:07:42 EDT
Just as an additional note, my test case here was to run lvm.static in a tight
loop until it was run on a CPU other than 0 (usually a matter of a couple of
minutes).  Before this patch, that would always crash.  After applying this
patch, I can run lvm.static in a loop indefinitely (OK, at least for 20 minutes,
but I also verified that it got run on a CPU > 0).
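
For anyone wanting to repeat a test of this shape, the loop can be sketched in C as below. This is an illustrative harness, not the one used for this comment: run_once and loop_until_failure are hypothetical names, and the command to run (for example lvm.static) is supplied by the caller. Note that a harness like this can only notice a command that exits non-zero or is killed by a signal; a kernel crash would end the loop by taking the guest down, and the harness does not check which CPU the command ran on.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Run argv[0] once and wait for it.
 * Returns the exit status, 128 + signal number if the child was killed
 * by a signal, 127 if exec failed, or -1 if fork/waitpid failed.
 */
int run_once(char *const argv[])
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        execvp(argv[0], argv);
        _exit(127);              /* only reached if exec failed */
    }

    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    if (WIFEXITED(status))
        return WEXITSTATUS(status);
    if (WIFSIGNALED(status))
        return 128 + WTERMSIG(status);
    return -1;
}

/*
 * Run the command up to `limit` times in a tight loop.
 * Returns the zero-based index of the first failing run,
 * or -1 if every run succeeded.
 */
int loop_until_failure(char *const argv[], int limit)
{
    assert(limit >= 0);
    for (int i = 0; i < limit; i++) {
        if (run_once(argv) != 0)
            return i;
    }
    return -1;
}
```

A caller would build an argv array such as { "lvm.static", NULL } and pass it with a large limit; a return value of -1 means no run failed within that many iterations.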
Comment 4 Jason Baron 2006-10-10 11:27:38 EDT
Committed in stream U5 build 42.17. A test kernel with this patch is available
from http://people.redhat.com/~jbaron/rhel4/
Comment 5 RHEL Product and Program Management 2006-10-12 19:46:45 EDT
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.
Comment 6 Jay Turner 2006-10-17 11:39:05 EDT
QE ack for 4.5.
Comment 9 Red Hat Bugzilla 2007-05-07 23:38:50 EDT
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2007-0304.html
