Bug 510225 - Segfault/Infinite loop in TLS double access
Summary: Segfault/Infinite loop in TLS double access
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel-xen
Version: 5.4
Hardware: All
OS: Linux
Priority: low
Severity: medium
Target Milestone: rc
Target Release: ---
Assignee: Paolo Bonzini
QA Contact: Red Hat Kernel QE team
URL:
Whiteboard:
Depends On:
Blocks: 526775 526946
 
Reported: 2009-07-08 12:32 UTC by Michal Nowak
Modified: 2013-03-08 02:06 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2010-03-30 07:45:20 UTC
Target Upstream Version:
Embargoed:


Attachments
patch to fix the bug (4.77 KB, patch)
2009-07-24 16:32 UTC, Paolo Bonzini
patch matching what was applied upstream (4.96 KB, patch)
2009-07-30 14:09 UTC, Paolo Bonzini


Links
System: Red Hat Product Errata
ID: RHSA-2010:0178
Private: 0
Priority: normal
Status: SHIPPED_LIVE
Summary: Important: Red Hat Enterprise Linux 5.5 kernel security and bug fix update
Last Updated: 2010-03-29 12:18:21 UTC

Description Michal Nowak 2009-07-08 12:32:42 UTC
Description of problem:

gcc44-gfortran-4.4.0-6.el5

Reproducible with kernel-xen-2.6.18-157.el5 (Linux
athlon5.rhts.bos.redhat.com 2.6.18-157.el5xen #1 SMP Mon Jul 6 18:26:42 EDT
2009 i686 athlon i386 GNU/Linux):

* gfortran44, kernel-xen, *no* -mno-tls-direct-seg-refs:

[root@athlon5 445666-OpenMP-segv]# ./reproducer44-kernel-xen 
           1
Segmentation fault

* gfortran44, kernel-xen, *with* -mno-tls-direct-seg-refs:

[root@athlon5 445666-OpenMP-segv]# gfortran44 -o reproducer reproducer.f90 -mno-tls-direct-seg-refs
[root@athlon5 445666-OpenMP-segv]# ./reproducer
           1
           2


* without OpenMP it works fine

[root@athlon5 445666-OpenMP-segv]# gfortran44 -o reproducer-mnowak reproducer.f90 -O1 
[root@athlon5 445666-OpenMP-segv]# ./reproducer-mnowak 
           1
           2

[root@athlon5 445666-OpenMP-segv]# cat reproducer.f90
program foo
        implicit none
        common /bobcom/ bob(2)
!$omp threadprivate (/bobcom/)

        integer i
        real*8 bob

        do i=1,2
        write(*,*) i
        bob(i)=0.0d0
        enddo

        end program




Testcase: /tools/gcc/Regression/OpenMP/445666-OpenMP-segv

kernel-xen-2.6.18-128.1.18.el5 behaves the same, as does 2.6.18-126.el5xen.

Comment 1 Chris Lalancette 2009-07-08 12:41:49 UTC
Actually, to really reproduce the bug, you need to use the above program but compile it with:

gfortran -g -fopenmp -o chris rep.f90

(the important bit is the -fopenmp).

Also, it's important to note that this also fails for a -128 kernel, as far as I can tell, so it's not a regression.

Chris Lalancette

Comment 2 Chris Lalancette 2009-07-09 11:30:13 UTC
I've been able to reduce the test case a bit further:

program foo
        implicit none
        common /bobcom/ bob(2)
!$omp threadprivate (/bobcom/)
        real*8 bob

        write(*,*) 1
        bob(1)=0.0d0

        end program

Interestingly, the important piece is the threadprivate stuff for openmp.  That seems to generate this bit of assembly:

        bob(1)=0.0d0
 804865b:	d9 ee                	fldz   
 804865d:	65 dd 1d f0 ff ff ff 	fstpl  %gs:0xfffffff0
 8048664:	c9                   	leave  
 8048665:	c3                   	ret    
 8048666:	90                   	nop    
 8048667:	90                   	nop    
 8048668:	90                   	nop    
 8048669:	90                   	nop    
 804866a:	90                   	nop    
 804866b:	90                   	nop    
 804866c:	90                   	nop    
 804866d:	90                   	nop    
 804866e:	90                   	nop    
 804866f:	90                   	nop    

I'm guessing that we aren't properly emulating the fldz and/or fstpl instructions, and that is what is causing the failure.  Indeed, while the upstream hypervisor emulates those instructions, we do not.  However, after some brief debugging, it doesn't seem we are entering the emulator properly at all.  It will need more looking at.

Chris Lalancette

Comment 3 Paolo Bonzini 2009-07-09 16:48:24 UTC
The problem is the fstpl.

With the following reduced C test case (which does not require -fopenmp BTW):

  __thread double x;
  double y;
  int main()
  {
    x = y * 0.0;
  }

I get a segmentation fault. With just "x = y" (which does not use fstpl), I get the Xen warning.

Comment 4 Paolo Bonzini 2009-07-09 18:33:45 UTC
Actually if I compile the C code without -O2 I get an infinite loop instead.

The encoding of the problematic instruction is 65 dd 1d f8 ff ff ff.
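
For reference, the byte sequence above can be decoded by hand: 0x65 is the %gs segment-override prefix, 0xdd with a ModRM reg field of 3 is fstp with a 64-bit memory operand, and mod=00 with rm=101 selects a bare 32-bit displacement in 32-bit mode. Below is a minimal sketch in C of that decoding. The names `tls_operand` and `decode_abs_operand` are made up for illustration and are not taken from Xen or the kernel, and the sketch handles only this one addressing form.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Illustrative only: fields of one "[%gs prefix] opcode ModRM disp32" instruction. */
struct tls_operand {
    int     gs_override;   /* 1 if a 0x65 (%gs) segment prefix was present */
    uint8_t opcode;        /* primary opcode byte, e.g. 0xdd */
    uint8_t mod, reg, rm;  /* the three ModRM fields */
    int32_t disp;          /* signed 32-bit displacement */
};

/*
 * Decode the absolute-address form (mod=00, rm=101) as used in 32-bit mode.
 * Returns 0 on success, -1 for anything this sketch does not handle.
 */
static int decode_abs_operand(const uint8_t *insn, size_t len,
                              struct tls_operand *out)
{
    size_t i = 0;
    uint8_t modrm;
    uint32_t raw;

    memset(out, 0, sizeof(*out));

    /* Optional %gs segment-override prefix. */
    if (i < len && insn[i] == 0x65) {
        out->gs_override = 1;
        i++;
    }

    /* Need opcode + ModRM + 4 displacement bytes. */
    if (len - i < 6)
        return -1;

    out->opcode = insn[i++];
    modrm = insn[i++];
    out->mod = modrm >> 6;
    out->reg = (modrm >> 3) & 7;
    out->rm  = modrm & 7;

    /* Only the "disp32, no base register" form is handled here. */
    if (out->mod != 0 || out->rm != 5)
        return -1;

    /* Little-endian 32-bit displacement, reinterpreted as signed. */
    raw = (uint32_t)insn[i]
        | ((uint32_t)insn[i + 1] << 8)
        | ((uint32_t)insn[i + 2] << 16)
        | ((uint32_t)insn[i + 3] << 24);
    memcpy(&out->disp, &raw, sizeof(raw));
    return 0;
}
```

Feeding it the seven bytes above gives a %gs override, opcode 0xdd, reg 3 and a displacement of -8, so the store targets the 8 bytes just below the thread pointer. The Fortran build in comment 2 differs only in the displacement (0xfffffff0, which is -16).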

Comment 5 Paolo Bonzini 2009-07-10 14:38:05 UTC
Reproducible with Xen 3.2 on RHEL kernel, but not with Xen 3.2 on XenLiveCD kernel 2.6.26.

Comment 6 Paolo Bonzini 2009-07-16 14:13:43 UTC
Actually I was wrong, it is still reproducible with the XenLiveCD's kernel 2.6.26.

Comment 7 Paolo Bonzini 2009-07-20 13:04:47 UTC
And also with the upstream hypervisor, even though a lot of changes went in for x87 emulation (16859 16860 17120 17175 17180 17183 17474 17475 17924).

Comment 8 Paolo Bonzini 2009-07-20 16:29:08 UTC
And also with upstream hypervisor _and_ kernel.

Comment 9 Paolo Bonzini 2009-07-24 16:32:55 UTC
Created attachment 355065
patch to fix the bug

Aha, I confused instruction emulation with TLS segment fixup.

The patch is trivial since the segment fixup code cares only about the operands of the instruction, not about its semantics.  I'm submitting the patch upstream.
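
To illustrate why the operands are all that matter: every x87 escape opcode (0xd8 through 0xdf) is followed by a ModRM byte, and whenever its mod field is not 3 the instruction has a memory operand encoded the same way as for any integer instruction. Here is a rough sketch of that check, assuming nothing about how the real fixup code is organized. The function name `x87_has_memory_operand` is hypothetical, and the attached patch remains the authoritative change.

```c
#include <stdint.h>

/*
 * Hypothetical helper, not the Xen implementation: report whether an
 * opcode/ModRM pair is an x87 instruction that references memory.
 * What the instruction does with the value (load, store, arithmetic)
 * is irrelevant for locating the memory operand.
 */
static int x87_has_memory_operand(uint8_t opcode, uint8_t modrm)
{
    /* x87 escape opcodes occupy 0xd8..0xdf. */
    if (opcode < 0xd8 || opcode > 0xdf)
        return 0;

    /* mod == 3 selects a register-only form such as fldz (d9 ee). */
    return (modrm >> 6) != 3;
}
```

With the bytes from this report, the fstpl store (dd 1d) has a memory operand, while fldz (d9 ee) does not.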

Comment 10 Paolo Bonzini 2009-07-30 09:41:13 UTC
Committed upstream at http://xenbits.xensource.com/xen-unstable.hg?rev/19985

Comment 11 Paolo Bonzini 2009-07-30 14:09:55 UTC
Created attachment 355680
patch matching what was applied upstream

Comment 12 Chris Lalancette 2009-08-25 10:00:29 UTC
I've uploaded a test kernel that should have a fix for this problem here:

http://people.redhat.com/clalance/virttest/

Can the reporters who are having problems please download and try out this test kernel?

Thanks,
Chris Lalancette

Comment 13 Michal Nowak 2009-08-25 11:02:35 UTC
I'll have a look with 5.4 + your kernel.

Comment 14 Michal Nowak 2009-08-25 12:27:08 UTC
Kernel version: 2.6.18-164.el5xen
=================================

::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
:: [   LOG    ] :: [gfortran44] Testing the executable
::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::

:: [   PASS   ] :: [gfortran44] Compile the testcase
           1
/usr/lib/beakerlib//testing.sh: line 575:  2636 Segmentation fault      ./reproducer
:: [   FAIL   ] :: [gfortran44] Checking we have a working executable (Expected 0, got 139)
1
/tools/gcc/Regression/OpenMP/445666-OpenMP-segv/-gfortran44-Testing-the-executable result: FAIL


Kernel version: 2.6.18-164.el5virttest17xen
===========================================

::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
:: [   LOG    ] :: [gfortran44] Testing the executable
::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::

:: [   PASS   ] :: [gfortran44] Compile the testcase
           1
           2
:: [   PASS   ] :: [gfortran44] Checking we have a working executable
0
/tools/gcc/Regression/OpenMP/445666-OpenMP-segv/-gfortran44-Testing-the-executable result: PASS



FIXED.

Paolo, Chris, thanks!

Comment 15 Don Zickus 2009-10-21 19:12:13 UTC
in kernel-2.6.18-170.el5
You can download this test kernel from http://people.redhat.com/dzickus/el5

Please do NOT transition this bugzilla state to VERIFIED until our QE team
has sent specific instructions indicating when to do so.  However feel free
to provide a comment indicating that this fix has been verified.

Comment 21 errata-xmlrpc 2010-03-30 07:45:20 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2010-0178.html

