Description of problem:

gcc44-gfortran-4.4.0-6.el5

Reproducible with kernel-xen-2.6.18-157.el5 (Linux athlon5.rhts.bos.redhat.com 2.6.18-157.el5xen #1 SMP Mon Jul 6 18:26:42 EDT 2009 i686 athlon i386 GNU/Linux):

* gfortran44, kernel-xen, *no* -mno-tls-direct-seg-refs:

[root@athlon5 445666-OpenMP-segv]# ./reproducer44-kernel-xen
1
Segmentation fault

* gfortran44, kernel-xen, *with* -mno-tls-direct-seg-refs:

[root@athlon5 445666-OpenMP-segv]# gfortran44 -o reproducer reproducer.f90 -mno-tls-direct-seg-refs
[root@athlon5 445666-OpenMP-segv]# ./reproducer
1
2

* without OpenMP it works fine

[root@athlon5 445666-OpenMP-segv]# gfortran44 -o reproducer-mnowak reproducer.f90 -O1
[root@athlon5 445666-OpenMP-segv]# ./reproducer-mnowak
1
2

[root@athlon5 445666-OpenMP-segv]# cat reproducer.f90
program foo
  implicit none
  common /bobcom/ bob(2)
!$omp threadprivate (/bobcom/)
  integer i
  real*8 bob
  do i=1,2
    write(*,*) i
    bob(i)=0.0d0
  enddo
end program

Testcase: /tools/gcc/Regression/OpenMP/445666-OpenMP-segv

kernel-xen-2.6.18-128.1.18.el5 is the same as well as 2.6.18-126.el5xen.
Actually, to really reproduce the bug, you need to use the above program but compile with:

gfortran -g -fopenmp -o chris rep.f90

(the important bit is the -fopenmp). Also, it's important to note that this also fails for a -128 kernel, as far as I can tell, so it's not a regression.

Chris Lalancette
I've been able to reduce the test case a bit further:

program foo
  implicit none
  common /bobcom/ bob(2)
!$omp threadprivate (/bobcom/)
  real*8 bob
  write(*,*) 1
  bob(1)=0.0d0
end program

Interestingly, the important piece is the threadprivate stuff for openmp. That seems to generate this bit of assembly:

  bob(1)=0.0d0
 804865b:  d9 ee                 fldz
 804865d:  65 dd 1d f0 ff ff ff  fstpl  %gs:0xfffffff0
 8048664:  c9                    leave
 8048665:  c3                    ret
 8048666:  90                    nop
 8048667:  90                    nop
 8048668:  90                    nop
 8048669:  90                    nop
 804866a:  90                    nop
 804866b:  90                    nop
 804866c:  90                    nop
 804866d:  90                    nop
 804866e:  90                    nop
 804866f:  90                    nop

I'm guessing that we aren't properly emulating the fldz and/or fstpl instructions, and that is what is causing the failure. Indeed, while the upstream hypervisor emulates those instructions, we do not. However, after some brief debugging, it doesn't seem we are entering the emulator properly at all. It will need more looking at.

Chris Lalancette
The problem is the fstpl. With the following reduced C test case (which does not require -fopenmp BTW):

__thread double x;
double y;
int main()
{
  x = y * 0.0;
}

I get a segmentation fault. With just "x = y" (which does not use fstpl), I get the Xen warning instead.
Actually if I compile the C code without -O2 I get an infinite loop instead. The encoding of the problematic instruction is 65 dd 1d f8 ff ff ff.
Reproducible with Xen 3.2 on RHEL kernel, but not with Xen 3.2 on XenLiveCD kernel 2.6.26.
Actually I was wrong, it is still reproducible with the XenLiveCD's kernel 2.6.26.
And also with the upstream hypervisor, despite the many changes that went in for x87 emulation (16859 16860 17120 17175 17180 17183 17474 17475 17924).
And also with upstream hypervisor _and_ kernel.
Created attachment 355065 [details]
patch to fix the bug

Aha, I confused instruction emulation with TLS segment fixup. The patch is trivial since the segment fixup code cares only about the operands of the instruction, not about its semantics. I'm submitting the patch upstream.
Committed upstream at http://xenbits.xensource.com/xen-unstable.hg?rev/19985
Created attachment 355680 [details]
patch matching what was applied upstream
I've uploaded a test kernel that should have a fix for this problem here:

http://people.redhat.com/clalance/virttest/

Can the reporters who are having problems please download and try out this test kernel?

Thanks,
Chris Lalancette
I'll have a look with 5.4 + your kernel.
Kernel version: 2.6.18-164.el5xen
=================================

::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
:: [   LOG    ] :: [gfortran44] Testing the executable
::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::

:: [   PASS   ] :: [gfortran44] Compile the testcase
1
/usr/lib/beakerlib//testing.sh: line 575: 2636 Segmentation fault ./reproducer
:: [   FAIL   ] :: [gfortran44] Checking we have a working executable (Expected 0, got 139)
1
/tools/gcc/Regression/OpenMP/445666-OpenMP-segv/-gfortran44-Testing-the-executable result: FAIL

Kernel version: 2.6.18-164.el5virttest17xen
===========================================

::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
:: [   LOG    ] :: [gfortran44] Testing the executable
::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::

:: [   PASS   ] :: [gfortran44] Compile the testcase
1
2
:: [   PASS   ] :: [gfortran44] Checking we have a working executable
0
/tools/gcc/Regression/OpenMP/445666-OpenMP-segv/-gfortran44-Testing-the-executable result: PASS

FIXED. Paolo, Chris, thanks!
in kernel-2.6.18-170.el5

You can download this test kernel from http://people.redhat.com/dzickus/el5

Please do NOT transition this bugzilla state to VERIFIED until our QE team has sent specific instructions indicating when to do so. However feel free to provide a comment indicating that this fix has been verified.
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2010-0178.html