Bug 790399 - Segmentation fault in dynamic loader on AVX enabled CPU
Summary: Segmentation fault in dynamic loader on AVX enabled CPU
Status: CLOSED DUPLICATE of bug 752122
Alias: None
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: glibc
Version: 6.1
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: urgent
Target Milestone: rc
Target Release: ---
Assignee: Jeff Law
QA Contact: qe-baseos-tools
URL:
Whiteboard:
Keywords:
Depends On:
Blocks:
Reported: 2012-02-14 12:42 UTC by Steven Haigh
Modified: 2016-11-24 16:04 UTC
CC List: 8 users

Clone Of: 720176
Last Closed: 2012-02-14 16:14:59 UTC


Attachments


External Trackers
Tracker      ID      Priority  Status  Summary  Last Updated
Debian BTS   646549  None      None    None     Never

Description Steven Haigh 2012-02-14 12:42:10 UTC
+++ This bug was initially created as a clone of Bug #720176 +++
I believe this issue may also affect EL6 using the following:
kernel-2.6.32-220.4.1.el6.x86_64
glibc-2.12-1.47.el6_2.5.x86_64
mod_perl-2.0.4-10.el6.x86_64
httpd-2.2.15-9.sl6.3.x86_64

Whenever I try to run a Perl script via mod_perl for web output on a particular Sandy Bridge machine (i5-2500K CPU, Gigabyte Z68M-D2H mainboard), I get the following in the Apache logs:
[Tue Feb 14 23:30:02 2012] [notice] child pid 20491 exit signal Illegal instruction (4)

The exact same software versions on an older Intel Xeon-based IBM x3650 do not show this error and function normally.

Below is the transcript of what may be the same bug, from F13...

Description of problem:

A problem was reported with glibc 2.12 on Fedora 13 on AVX-enabled hardware:

http://sourceware.org/bugzilla/show_bug.cgi?format=multiple&id=12113

Seems to also affect F13's glibc 2.13.  I have attached the "small reproducer" which was attached to the other bug report, as it does reproduce the issue.

I ran across this after my OpenVZ VPS was migrated to Xeon "Sandy Bridge" hardware and Apache mod_perl applications crashed. Here are the top frames of the backtrace (#0 to #5) from running Apache in gdb after triggering the problem.

Program received signal SIGILL, Illegal instruction.
_dl_x86_64_save_sse () at ../sysdeps/x86_64/dl-trampoline.S:189
189		vmovdqa %ymm0, %fs:RTLD_SAVESPACE_SSE+0*YMM_SIZE
(gdb) bt
#0  _dl_x86_64_save_sse () at ../sysdeps/x86_64/dl-trampoline.S:189
#1  0x00002aaaaad0f3e0 in add_dependency (undef_name=0x2aaabc349503 "Perl_Istack_sp_ptr", undef_map=0x2aaab612d230, 
    ref=0x7fffffffd4d0, symbol_scope=0x2aaab612d588, version=0x0, type_class=1, flags=1, skip_map=0x0)
    at dl-lookup.c:628
#2  _dl_lookup_symbol_x (undef_name=0x2aaabc349503 "Perl_Istack_sp_ptr", undef_map=0x2aaab612d230, ref=0x7fffffffd4d0, 
    symbol_scope=0x2aaab612d588, version=0x0, type_class=1, flags=1, skip_map=0x0) at dl-lookup.c:831
#3  0x00002aaaaad12220 in _dl_fixup (l=<value optimized out>, reloc_arg=<value optimized out>)
    at ../elf/dl-runtime.c:118
#4  0x00002aaaaad18db5 in _dl_runtime_resolve () at ../sysdeps/x86_64/dl-trampoline.S:41
#5  0x00002aaabc349b16 in boot_Apache2__Const (my_perl=0x2aaab60425f0, cv=<value optimized out>) at Const.c:89

This is identical to the initial backtrace posted by Vitaly Slobodskoy on the sourceware.org report. According to Wikipedia, these new "Sandy Bridge" Xeons support the AVX extensions.


Version-Release number of selected component (if applicable):

glibc 2.13


How reproducible:

1) Execute a simple mod_perl module via Apache using Apache2::Const.
OR
2) Compile and run attachment glibc_avx.zip.

Actual results from #2:

Hello, World!
Illegal instruction


Expected results:
Hello, World!
Hello, World!

--- Additional comment from mk@cognitivedissonance.ca on 2011-07-10 12:03:42 EDT ---

(In reply to comment #0)
> Seems to also affect F13's glibc 2.13.  I have attached the "small reproducer"
> which was attached to the other bug report, as it does reproduce the issue.

Sorry, that should read "F14", not "F13".

--- Additional comment from mk@cognitivedissonance.ca on 2011-07-10 12:06:54 EDT ---

Created attachment 512095 [details]
test case for SIGILL

--- Additional comment from mk@cognitivedissonance.ca on 2011-07-10 12:08:42 EDT ---

(In reply to comment #0)
> 2) Compile and run attachment glibc_avx.zip.

Sorry, the .zip referred to was not attached to the first message. The attachment labelled "test case for SIGILL" is a tar.gz containing the same code.

--- Additional comment from schwab@redhat.com on 2011-07-11 04:51:25 EDT ---

Please provide /proc/cpuinfo.

--- Additional comment from mk@cognitivedissonance.ca on 2011-07-11 09:10:46 EDT ---

(In reply to comment #4)
> Please provide /proc/cpuinfo.

This is the first of four identical cores:

processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 42
model name      :           Intel(R) Xeon(R) CPU E31230 @ 3.20GHz
stepping        : 7
cpu MHz         : 3192.925
cache size      : 8192 KB
physical id     : 0
siblings        : 8
core id         : 0
cpu cores       : 4
apicid          : 0
fpu             : yes
fpu_exception   : yes
cpuid level     : 13
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm syscall nx rdtscp lm constant_tsc ida nonstop_tsc arat pni monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr sse4_1 sse4_2 popcnt lahf_lm
bogomips        : 6385.85
clflush size    : 64
cache_alignment : 64
address sizes   : 36 bits physical, 48 bits virtual
power management: [8]

--- Additional comment from schwab@redhat.com on 2011-07-11 09:26:54 EDT ---

It doesn't look like the cpu supports AVX.
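A minimal sketch of how to check what the kernel advertises. It assumes the usual `key : value` layout of /proc/cpuinfo shown above. The function names are hypothetical and not part of any tool mentioned in this bug. The flags line above contains neither `avx` nor `xsave`, which is consistent with the remark that the CPU, as seen by this kernel, does not appear to support AVX.

```python
"""Report whether the running kernel advertises AVX in /proc/cpuinfo (sketch)."""
from __future__ import annotations

import os


def cpu_flags(cpuinfo_text: str) -> set[str]:
    """Return the flags from the first 'flags' line of /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def kernel_reports_avx(cpuinfo_text: str) -> bool:
    """True only if the kernel lists both 'avx' and 'xsave' for the first CPU."""
    return {"avx", "xsave"} <= cpu_flags(cpuinfo_text)


if __name__ == "__main__" and os.path.exists("/proc/cpuinfo"):
    with open("/proc/cpuinfo") as f:
        text = f.read()
    print("avx/xsave reported by kernel:", kernel_reports_avx(text))
```

This only reflects the kernel's view, which is filtered by any hypervisor. It does not show what the raw CPUID instruction returns to user space.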

--- Additional comment from mk@cognitivedissonance.ca on 2011-07-11 11:51:14 EDT ---

Okay, although the E31230 is listed as supporting AVX here:

http://en.wikipedia.org/wiki/Sandy_Bridge#Server_processors
http://en.wikipedia.org/wiki/Advanced_Vector_Extensions#CPUs_with_AVX

But I have no further confirmation of that. The provider definitely seems to believe the new hardware uses "Sandy Bridge", etc.

Anyway, presuming that it actually does *not* support AVX, or that the kernel does not consider it to, what could cause this problem? This is on an OpenVZ slice, so I did not (and cannot) compile the kernel (2.6.32, built with gcc 4.1.2).

Your patience is appreciated, and please excuse my ignorance here.  This does not look like my mistake. Is there something else that could produce the identical backtrace and cause that "Hello World" test to receive a SIGILL?
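Some background that is not stated in this thread. The CPUID AVX bit only says the processor implements AVX. Intel's documented detection procedure (Software Developer's Manual, Vol. 1, section on detecting AVX instructions) also requires the OSXSAVE bit and XCR0 bits 1 and 2, meaning the OS saves XMM and YMM state. A CPU can support AVX while the kernel or hypervisor has not enabled it, and executing `vmovdqa %ymm0, ...` then faults with SIGILL.

The sketch below shows only that decision logic. The constant and function names are hypothetical, it is not glibc's code, and it assumes the register values were obtained elsewhere (CPUID leaf 1 and XGETBV with ECX=0).

```python
"""Interpret CPUID/XCR0 values to decide whether AVX is usable (sketch)."""
from __future__ import annotations

# Hypothetical names for bits from Intel's documented AVX detection procedure.
CPUID1_ECX_OSXSAVE = 1 << 27  # OS set CR4.OSXSAVE, so XGETBV may be executed
CPUID1_ECX_AVX = 1 << 28      # processor implements AVX instructions
XCR0_SSE_STATE = 1 << 1       # OS saves/restores XMM registers
XCR0_AVX_STATE = 1 << 2       # OS saves/restores upper halves of YMM registers


def avx_usable(cpuid1_ecx: int, xcr0: int | None) -> bool:
    """Return True only if both the CPU and the OS support AVX.

    xcr0 should be None when OSXSAVE is clear, because XGETBV itself
    raises an invalid-opcode fault (SIGILL) in that case.
    """
    if not (cpuid1_ecx & CPUID1_ECX_AVX):
        return False  # hardware lacks AVX
    if not (cpuid1_ecx & CPUID1_ECX_OSXSAVE):
        return False  # OS has not enabled XSAVE, so YMM state is unmanaged
    if xcr0 is None:
        return False
    needed = XCR0_SSE_STATE | XCR0_AVX_STATE
    return (xcr0 & needed) == needed


if __name__ == "__main__":
    # AVX advertised by CPUID, but OSXSAVE clear: checking CPUID alone would mislead.
    print(avx_usable(CPUID1_ECX_AVX, None))
```

Separating the bit tests from the hardware queries keeps the logic testable on any machine. The CPUID and XGETBV values themselves would have to come from a small native helper.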

--- Additional comment from schwab@redhat.com on 2011-07-12 02:26:01 EDT ---

What is "openVZ slice"?

--- Additional comment from mk@cognitivedissonance.ca on 2011-07-12 08:26:17 EDT ---

(In reply to comment #8)
> What is "openVZ slice"?

OpenVZ is akin to Xen; it's a method for "container-based virtualization". That is, the system in question is a VPS (virtual private server) "slice". I have never used the OpenVZ software myself, but I do work on small remote servers that depend on it or Xen.

http://wiki.openvz.org/Main_Page

Since the kernel used may be a factor, I'll ask on the OpenVZ mailing list whether anyone has thoughts about possible causes, and perhaps on the kernel development list too.

--- Additional comment from updates@fedoraproject.org on 2011-08-15 06:58:13 EDT ---

glibc-2.14.90-5 has been submitted as an update for Fedora 16.
https://admin.fedoraproject.org/updates/glibc-2.14.90-5

--- Additional comment from updates@fedoraproject.org on 2011-08-15 16:25:42 EDT ---

Package glibc-2.14.90-5:
* should fix your issue,
* was pushed to the Fedora 16 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing glibc-2.14.90-5'
as soon as you are able to.
Please go to the following url:
https://admin.fedoraproject.org/updates/glibc-2.14.90-5
then log in and leave karma (feedback).

--- Additional comment from updates@fedoraproject.org on 2011-08-24 11:27:33 EDT ---

glibc-2.14.90-6 has been submitted as an update for Fedora 16.
https://admin.fedoraproject.org/updates/glibc-2.14.90-6

--- Additional comment from mk@cognitivedissonance.ca on 2011-08-27 17:25:06 EDT ---

I upgraded to glibc-2.14.90-5 on the AVX hardware under OpenVZ with Fedora 14, and the test case succeeded. After recompiling Apache, Perl and mod_perl, the original issue was also resolved.

Thanks much.

--- Additional comment from updates@fedoraproject.org on 2011-09-02 03:19:16 EDT ---

glibc-2.14.90-7 has been submitted as an update for Fedora 16.
https://admin.fedoraproject.org/updates/glibc-2.14.90-7

--- Additional comment from updates@fedoraproject.org on 2011-09-13 02:11:25 EDT ---

glibc-2.14.90-8 has been pushed to the Fedora 16 stable repository.  If problems still persist, please make note of it in this bug report.

Comment 1 Steven Haigh 2012-02-14 12:46:57 UTC
I should also note that both systems (the working and the failing) are Xen DomU configurations.

The output of /proc/cpuinfo for 1 CPU:
# cat /proc/cpuinfo
processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 42
model name      : Intel(R) Core(TM) i5-2500K CPU @ 3.30GHz
stepping        : 7
cpu MHz         : 3309.776
cache size      : 6144 KB
fpu             : yes
fpu_exception   : yes
cpuid level     : 13
wp              : yes
flags           : fpu de tsc msr pae cx8 cmov pat clflush mmx fxsr sse sse2 ss ht syscall nx lm up rep_good aperfmperf unfair_spinlock pni pclmulqdq ssse3 cx16 sse4_1 sse4_2 popcnt aes hypervisor lahf_lm ida arat epb pln pts dts
bogomips        : 6619.55
clflush size    : 64
cache_alignment : 64
address sizes   : 36 bits physical, 48 bits virtual
power management:

Comment 3 Steven Haigh 2012-02-14 13:04:19 UTC
As further confirmation of my suspicions, I downloaded the test case program attached to bug #720176 and got the same output:

# ./exe
Hello, World!
Illegal instruction

Comment 4 Steven Haigh 2012-02-14 13:22:48 UTC
Added a link to the same bug on bugs.debian.org, which has a patch attached.

Comment 5 Jeff Law 2012-02-14 16:14:59 UTC
This is a duplicate of 752122, which is scheduled to be fixed in Red Hat Enterprise Linux 6.3.

*** This bug has been marked as a duplicate of bug 752122 ***

