Description of problem:
RHEL4 PV x86_64 has some sort of race condition that is trivially reproduced by executing "lvm.static" in a tight loop. The bug manifests itself as a crash in do_arch_prctl, and the crash is in this piece of code:

    if (addr <= 0xffffffff) {
        set_32bit_tls(task, FS_TLS, addr);
        if (doit) {
            load_TLS(&task->thread, cpu);
            asm volatile("movl %0,%%fs" :: "r" (FS_TLS_SEL));
        }
        task->thread.fsindex = FS_TLS_SEL;
        task->thread.fs = 0;
    } else {
        task->thread.fsindex = 0;
        task->thread.fs = addr;
        if (doit) {
            /* set the selector to 0 to not confuse __switch_to */
            asm volatile("movl %0,%%fs" :: "r" (0));
            ret = HYPERVISOR_set_segment_base(SEGBASE_FS, addr);
        }
    }

If we make the first if statement look like this:

    if (0 && addr <= 0xffffffff) {

we no longer see the crash. So there is some sort of race condition with setting up the TLS completely locally.

riel also notes that every once in a while, init fails to start on x86_64, and he believes it is related to this.
OK, I think I have the fix here. I'll attach the patch. Basically, although we set up a GDT for every CPU in arch/x86_64/kernel/head-xen.S, we only use the one for CPU 0. The GDTs for the other CPUs are allocated by get_free_page in drivers/xen/core/smpboot.c and marked as read-only pages in Xen. Those were all set up properly, but we were actually indexing into the old table, which wasn't set up, so we crashed whenever arch_prctl was executed on any CPU other than the boot one. The attached patch fixes it.
Created attachment 137880 [details] Fix for do_arch_prctl crash in RHEL4 PV
Just as an additional note, my test case here was to run lvm.static in a tight loop until it was run on a CPU other than 0 (usually a matter of a couple of minutes). Before this patch, that would always crash. After applying this patch, I can run lvm.static in a loop indefinitely (OK, at least for 20 minutes, but I also verified that it got run on a CPU > 0).
Committed in stream U5 build 42.17. A test kernel with this patch is available from http://people.redhat.com/~jbaron/rhel4/
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.
QE ack for 4.5.
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHBA-2007-0304.html