Bug 510225
| Summary: | Segfault/Infinite loop in TLS double access | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 5 | Reporter: | Michal Nowak <mnowak> |
| Component: | kernel-xen | Assignee: | Paolo Bonzini <pbonzini> |
| Status: | CLOSED ERRATA | QA Contact: | Red Hat Kernel QE team <kernel-qe> |
| Severity: | medium | Priority: | low |
| Version: | 5.4 | CC: | clalance, cward, dzickus, ohudlick, pbonzini, xen-maint |
| Target Milestone: | rc | Target Release: | --- |
| Hardware: | All | OS: | Linux |
| Doc Type: | Bug Fix | Story Points: | --- |
| Last Closed: | 2010-03-30 07:45:20 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | Category: | --- |
| oVirt Team: | --- | Cloudforms Team: | --- |
| Bug Blocks: | 526775, 526946 | | |
Actually, to really reproduce the bug, you need to use the above program but compile with:
gfortran -g -fopenmp -o chris rep.f90
(the important bit is the -fopenmp). Also, it's important to note that this also fails for a -128 kernel, as far as I can tell, so it's not a regression.
Chris Lalancette
I've been able to reduce the test case a bit further:
program foo
implicit none
common /bobcom/ bob(2)
!$omp threadprivate (/bobcom/)
real*8 bob
write(*,*) 1
bob(1)=0.0d0
end program
Interestingly, the important piece is the threadprivate directive for OpenMP. That seems to generate this bit of assembly:
bob(1)=0.0d0
804865b: d9 ee fldz
804865d: 65 dd 1d f0 ff ff ff fstpl %gs:0xfffffff0
8048664: c9 leave
8048665: c3 ret
8048666: 90 nop
8048667: 90 nop
8048668: 90 nop
8048669: 90 nop
804866a: 90 nop
804866b: 90 nop
804866c: 90 nop
804866d: 90 nop
804866e: 90 nop
804866f: 90 nop
I'm guessing that we aren't properly emulating the fldz and/or fstpl instructions, and that is what is causing the failure. Indeed, while the upstream hypervisor emulates those instructions, we do not. However, after some brief debugging, it doesn't seem we are entering the emulator properly at all. It will need more looking at.
Chris Lalancette
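For readers who want to see what the bytes on the fstpl line of the disassembly mean, here is a minimal C sketch that splits `65 dd 1d f0 ff ff ff` into prefix, opcode, ModRM fields and displacement. The `struct decoded` type and the `decode_seg_disp32` name are made up for this illustration; nothing here comes from the hypervisor source.

```c
#include <stdint.h>
#include <string.h>

/* Fields of a "prefix + opcode + ModRM + disp32" instruction.
 * Hypothetical helper type, for illustration only. */
struct decoded {
    uint8_t prefix;  /* 0x65 is the GS segment-override prefix */
    uint8_t opcode;  /* 0xDD is one of the x87 escape opcodes */
    uint8_t mod;     /* ModRM bits 7..6 */
    uint8_t reg;     /* ModRM bits 5..3; with opcode 0xDD, /3 is fstpl */
    uint8_t rm;      /* ModRM bits 2..0 */
    int32_t disp;    /* signed 32-bit displacement */
};

/* Decode exactly the 7-byte form seen in the disassembly.
 * Returns 0 on success, -1 if the ModRM byte is not "disp32 only". */
int decode_seg_disp32(const uint8_t insn[7], struct decoded *out)
{
    uint32_t raw;

    out->prefix = insn[0];
    out->opcode = insn[1];
    out->mod = insn[2] >> 6;
    out->reg = (insn[2] >> 3) & 7;
    out->rm  = insn[2] & 7;

    /* In 32-bit addressing, mod=0 with rm=5 means a bare disp32 operand. */
    if (out->mod != 0 || out->rm != 5)
        return -1;

    /* The displacement is stored little-endian. */
    raw = (uint32_t)insn[3] |
          ((uint32_t)insn[4] << 8) |
          ((uint32_t)insn[5] << 16) |
          ((uint32_t)insn[6] << 24);
    memcpy(&out->disp, &raw, sizeof raw);  /* reinterpret as signed */
    return 0;
}
```

Fed the bytes above, this yields prefix 0x65, opcode 0xDD, reg field 3 and a displacement of -16, i.e. a store to an address 16 bytes below the %gs base, which matches the size of the two-element real*8 common block.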
The problem is the fstpl.
With the following reduced C test case (which does not require -fopenmp BTW):
__thread double x;
double y;
int main()
{
x = y * 0.0;
}
I get a segmentation fault; with just "x = y" (which does not use fstpl) I get the Xen warning.
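A variant of the test case above that returns the stored value can be handy in a regression harness, since it distinguishes "ran and stored 0.0" from a crash or hang. This is only a sketch: the function name `tls_store_probe` is invented here, and whether the compiler emits the %gs-relative fstpl depends on the target (a 32-bit build is assumed), the optimization level and the TLS model.

```c
/* Thread-local and ordinary globals, as in the reduced test case. */
__thread double x;
double y;

/* Store y * 0.0 into the thread-local x and read it back.
 * On an affected kernel-xen guest the store may fault or loop
 * instead of returning. Hypothetical helper, not part of the
 * original test suite. */
double tls_store_probe(double input)
{
    y = input;
    x = y * 0.0;
    return x;
}
```

Calling `tls_store_probe(3.5)` from `main` and checking for a 0.0 result is enough; on a fixed kernel the call simply returns.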
Actually if I compile the C code without -O2 I get an infinite loop instead. The encoding of the problematic instruction is 65 dd 1d f8 ff ff ff.
Reproducible with Xen 3.2 on RHEL kernel, but not with Xen 3.2 on XenLiveCD kernel 2.6.26.
Actually I was wrong, it is still reproducible with the XenLiveCD's kernel 2.6.26. And also with the upstream hypervisor, despite the many changes that went in for x87 emulation (16859 16860 17120 17175 17180 17183 17474 17475 17924).
And also with upstream hypervisor _and_ kernel.
Created attachment 355065
patch to fix the bug
Aha, I confused instruction emulation with TLS segment fixup.
The patch is trivial since the segment fixup code cares only about the operands of the instruction, not about its semantics. I'm submitting the patch upstream.
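To picture why such a fix can be small, assume (this is not taken from the attached patch or from the Xen source) that the fixup code only needs to know which opcodes describe their operand with a ModRM byte. Under that assumption, covering fstpl amounts to accepting the x87 escape opcodes; the reg field, which selects the actual x87 operation, never has to be interpreted. All names below are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical list of one-byte opcodes whose operand is described by
 * a ModRM byte. Illustration only, not the hypervisor's real table. */
bool opcode_has_modrm(uint8_t opcode, bool include_x87)
{
    /* mov r/m8,r8 ; mov r/m32,r32 ; mov r8,r/m8 ; mov r32,r/m32 */
    if (opcode >= 0x88 && opcode <= 0x8b)
        return true;
    /* The x87 escape opcodes D8..DF all carry a ModRM byte. */
    if (include_x87 && opcode >= 0xd8 && opcode <= 0xdf)
        return true;
    return false;
}

/* If insn is "GS prefix + accepted opcode + ModRM(mod=0, rm=5) + disp32",
 * store the 32-bit offset and return true. The ModRM reg field is
 * deliberately ignored: only the operand matters here. */
bool gs_operand_offset(const uint8_t *insn, bool include_x87,
                       uint32_t *offset)
{
    if (insn[0] != 0x65)
        return false;
    if (!opcode_has_modrm(insn[1], include_x87))
        return false;
    if ((insn[2] >> 6) != 0 || (insn[2] & 7) != 5)
        return false;

    *offset = (uint32_t)insn[3] |
              ((uint32_t)insn[4] << 8) |
              ((uint32_t)insn[5] << 16) |
              ((uint32_t)insn[6] << 24);
    return true;
}
```

With `include_x87` set to false, the bytes `65 dd 1d f8 ff ff ff` are rejected while a plain `mov %gs:0xfffffff8,%eax` (`65 8b 05 f8 ff ff ff`) is accepted; with it set to true, both yield the same offset 0xfffffff8.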
Committed upstream at http://xenbits.xensource.com/xen-unstable.hg?rev/19985
Created attachment 355680
patch matching what was applied upstream
I've uploaded a test kernel that should have a fix for this problem here:
http://people.redhat.com/clalance/virttest/
Can the reporters who are having problems please download and try out this test kernel?
Thanks,
Chris Lalancette
I'll have a look with 5.4 + your kernel.
Kernel version: 2.6.18-164.el5xen
=================================
::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
:: [ LOG ] :: [gfortran44] Testing the executable
::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
:: [ PASS ] :: [gfortran44] Compile the testcase
1
/usr/lib/beakerlib//testing.sh: line 575: 2636 Segmentation fault ./reproducer
:: [ FAIL ] :: [gfortran44] Checking we have a working executable (Expected 0, got 139)
1
/tools/gcc/Regression/OpenMP/445666-OpenMP-segv/-gfortran44-Testing-the-executable result: FAIL
Kernel version: 2.6.18-164.el5virttest17xen
===========================================
::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
:: [ LOG ] :: [gfortran44] Testing the executable
::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
:: [ PASS ] :: [gfortran44] Compile the testcase
1
2
:: [ PASS ] :: [gfortran44] Checking we have a working executable
0
/tools/gcc/Regression/OpenMP/445666-OpenMP-segv/-gfortran44-Testing-the-executable result: PASS
FIXED.
Paolo, Chris, thanks!
in kernel-2.6.18-170.el5
You can download this test kernel from http://people.redhat.com/dzickus/el5
Please do NOT transition this bugzilla state to VERIFIED until our QE team has sent specific instructions indicating when to do so. However feel free to provide a comment indicating that this fix has been verified.
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.
http://rhn.redhat.com/errata/RHSA-2010-0178.html
Description of problem:
gcc44-gfortran-4.4.0-6.el5
Reproducible with kernel-xen-2.6.18-157.el5 (Linux athlon5.rhts.bos.redhat.com 2.6.18-157.el5xen #1 SMP Mon Jul 6 18:26:42 EDT 2009 i686 athlon i386 GNU/Linux):
* gfortran44, kernel-xen, *no* -mno-tls-direct-seg-refs:
[root@athlon5 445666-OpenMP-segv]# ./reproducer44-kernel-xen
1
Segmentation fault
* gfortran44, kernel-xen, *with* -mno-tls-direct-seg-refs:
[root@athlon5 445666-OpenMP-segv]# gfortran44 -o reproducer reproducer.f90 -mno-tls-direct-seg-refs
[root@athlon5 445666-OpenMP-segv]# ./reproducer
1
2
* without OpenMP it works fine
[root@athlon5 445666-OpenMP-segv]# gfortran44 -o reproducer-mnowak reproducer.f90 -O1
[root@athlon5 445666-OpenMP-segv]# ./reproducer-mnowak
1
2
[root@athlon5 445666-OpenMP-segv]# cat reproducer.f90
program foo
implicit none
common /bobcom/ bob(2)
!$omp threadprivate (/bobcom/)
integer i
real*8 bob
do i=1,2
write(*,*) i
bob(i)=0.0d0
enddo
end program
Testcase: /tools/gcc/Regression/OpenMP/445666-OpenMP-segv
kernel-xen-2.6.18-128.1.18.el5 is the same as well as 2.6.18-126.el5xen.