Bug 207669

Summary: RHEL4 PV x86_64 race condition revolving around TLS (probably)
Product: Red Hat Enterprise Linux 4 Reporter: Chris Lalancette <clalance>
Component: kernel Assignee: Chris Lalancette <clalance>
Status: CLOSED ERRATA QA Contact: Brian Brock <bbrock>
Severity: medium Docs Contact:
Priority: medium    
Version: 4.0 CC: clalance, ddutile, jbaron, riel
Target Milestone: ---   
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: RHBA-2007-0304 Doc Type: Bug Fix
Last Closed: 2007-05-08 03:38:50 UTC Type: ---
Bug Blocks: 201781    
Attachments:
Description Flags
Fix for do_arch_prctl crash in RHEL4 PV none

Description Chris Lalancette 2006-09-22 13:41:22 UTC
Description of problem:

RHEL4 PV x86_64 has some sort of race condition that is trivially reproduced by
executing "lvm.static" in a tight loop.  The bug manifests itself as a crash in
do_arch_prctl, and the crash is in this piece of code:

                if (addr <= 0xffffffff) {
                        set_32bit_tls(task, FS_TLS, addr);
                        if (doit) {
                                load_TLS(&task->thread, cpu);
                                asm volatile("movl %0,%%fs" :: "r" (FS_TLS_SEL));
                        }
                        task->thread.fsindex = FS_TLS_SEL;
                        task->thread.fs = 0;
                } else {
                        task->thread.fsindex = 0;
                        task->thread.fs = addr;
                        if (doit) {
                                /* set the selector to 0 to not confuse
                                   __switch_to */
                                asm volatile("movl %0,%%fs" :: "r" (0));
                                ret = HYPERVISOR_set_segment_base(SEGBASE_FS,
                                                                addr);
                        }
                }

If we make the first if statement look like this:

                if (0 && addr <= 0xffffffff) {

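we no longer see the crash, because every request is then forced through the
else branch, which sets the FS base via HYPERVISOR_set_segment_base.  So there
is some sort of race condition with setting up the TLS completely locally.
riel also notes that every once in a while, init fails to start on x86_64, and
he believes it is related to this.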

Comment 1 Chris Lalancette 2006-10-05 23:01:18 UTC
OK, I think I have the fix here.  I'll attach the patch.  Basically, although we
set up a GDT for every CPU in arch/x86_64/kernel/head-xen.S, we only use the one
for CPU 0.  The GDTs for the other CPUs are allocated by get_free_page in
drivers/xen/core/smpboot.c and marked as read-only pages in Xen.  Those were all
set up properly, but we were actually indexing into the old table, which was
never set up for those CPUs, so we crashed whenever arch_prctl was executed on
any CPU but the boot one.  The attached patch fixes it.
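
To make the failure mode concrete, here is a minimal user-space model of the
indexing mistake described above.  It is only a sketch, not the attached patch:
boot_gdt, live_gdt, TLS_SLOT, the table sizes and the function names are all
hypothetical stand-ins, and calloc stands in for the get_free_page allocation.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define NR_CPUS     4   /* hypothetical CPU count for the model */
#define GDT_ENTRIES 16  /* hypothetical table size */
#define TLS_SLOT    12  /* hypothetical index of the TLS descriptor */

/* Statically reserved tables, one per CPU; only entry 0 is really in use. */
uint64_t boot_gdt[NR_CPUS][GDT_ENTRIES];

/* The table each CPU actually has loaded. */
uint64_t *live_gdt[NR_CPUS];

/* CPU 0 keeps the static table; every other CPU gets a fresh allocation. */
int setup_gdts(void)
{
    live_gdt[0] = boot_gdt[0];
    for (int cpu = 1; cpu < NR_CPUS; cpu++) {
        live_gdt[cpu] = calloc(GDT_ENTRIES, sizeof(uint64_t));
        if (live_gdt[cpu] == NULL)
            return -1;
    }
    return 0;
}

/* Buggy lookup: indexes the static table, which is stale for cpu > 0. */
uint64_t *tls_slot_buggy(int cpu)
{
    assert(cpu >= 0 && cpu < NR_CPUS);
    return &boot_gdt[cpu][TLS_SLOT];
}

/* Fixed lookup: indexes the table that the CPU really uses. */
uint64_t *tls_slot_fixed(int cpu)
{
    assert(cpu >= 0 && cpu < NR_CPUS);
    return &live_gdt[cpu][TLS_SLOT];
}
```

In this model the two lookups return the same slot on CPU 0 and different slots
on every other CPU, which matches the observation that the crash only showed up
once the process ran on a CPU other than the boot one.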

Comment 2 Chris Lalancette 2006-10-05 23:02:46 UTC
Created attachment 137880 [details]
Fix for do_arch_prctl crash in RHEL4 PV

Comment 3 Chris Lalancette 2006-10-05 23:07:42 UTC
Just as an additional note, my test case here was to run lvm.static in a tight
loop until it was run on a CPU other than 0 (usually a matter of a couple of
minutes).  Before this patch, that would always crash.  After applying this
patch, I can run lvm.static in a loop indefinitely (OK, at least for 20 minutes,
but I also verified that it got run on a CPU > 0).
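
The wait for the scheduler to pick a CPU other than 0 could probably be avoided
by pinning the child before it execs.  The following is a sketch of such a
harness, not the test that was actually run here; run_pinned and hammer are
hypothetical names, and it assumes a Linux guest with glibc's sched_setaffinity.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Run argv[0] once with the child pinned to `cpu`.
 * Returns the child's exit status, or -1 if it could not be started
 * or did not exit normally.
 */
int run_pinned(int cpu, char *const argv[])
{
    assert(argv != NULL && argv[0] != NULL);

    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        cpu_set_t set;
        CPU_ZERO(&set);
        CPU_SET(cpu, &set);
        /* The affinity mask is kept across exec. */
        if (sched_setaffinity(0, sizeof(set), &set) != 0)
            _exit(126);
        execvp(argv[0], argv);
        _exit(127); /* only reached if exec failed */
    }

    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}

/*
 * Repeat the pinned run until one fails or `iterations` runs succeed.
 * Returns the number of successful runs.
 */
int hammer(int cpu, char *const argv[], int iterations)
{
    int ok = 0;
    while (ok < iterations && run_pinned(cpu, argv) == 0)
        ok++;
    return ok;
}
```

For this bug the call would be something like hammer(1, argv, 1000) with argv
set to { "lvm.static", NULL }, so that every run lands on a non-boot CPU.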

Comment 4 Jason Baron 2006-10-10 15:27:38 UTC
committed in stream U5 build 42.17. A test kernel with this patch is available
from http://people.redhat.com/~jbaron/rhel4/


Comment 5 RHEL Program Management 2006-10-12 23:46:45 UTC
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.

Comment 6 Jay Turner 2006-10-17 15:39:05 UTC
QE ack for 4.5.

Comment 9 Red Hat Bugzilla 2007-05-08 03:38:50 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2007-0304.html