Bug 829011 - Python fails to run properly on Xen because it does not check for Advanced Vector Instructions before using them
Status: CLOSED ERRATA
Alias: None
Product: Fedora
Classification: Fedora
Component: glibc
Version: 17
Hardware: Unspecified
OS: Unspecified
Target Milestone: ---
Assignee: Jeff Law
QA Contact: Fedora Extras Quality Assurance
Reported: 2012-06-05 18:53 UTC by W. Michael Petullo
Modified: 2016-11-24 12:44 UTC (History)
Last Closed: 2012-09-10 02:58:40 UTC
Type: Bug

Description W. Michael Petullo 2012-06-05 18:53:22 UTC
Description of problem:
I have installed Fedora 17 using a Kickstart script that uses --nobase to install a very minimal set of packages. Following this install, yum will not work.

Version-Release number of selected component (if applicable):
yum-3.4.3-23.fc17.noarch

How reproducible:
Every time

Steps to Reproduce:
1. Install a minimal Fedora 17 using Kickstart's --nobase flag and:

avahi
nss-mdns
ntp
system-config-firewall-base
xen
yum
gcc
git
glibc.i686
glibc-devel.i686
glibc-devel.x86_64
glibc-static.i686
glibc-static.x86_64
libgcc.i686
make
tar
xen-devel

2. Run yum.
  
Actual results:
Yum crashes with "Illegal instruction"

Expected results:
Yum should run.

Additional info:
The machine has an AMD FX-4170 quad-core processor.

I tried to figure out what was causing the crash, so I ran the Python interpreter manually and found this:

$ python
>>> import yum
Illegal instruction

That is, the crash happens upon attempting to load the yum module.

I thought there might be a problem with the .pyc/.pyo file installed by the yum RPMs. So I removed the .pyc/.pyo files installed by the yum RPM and then the rpm RPM. This had no effect. But when I removed all .pyc/.pyo files, the Python interpreter itself started crashing with "Illegal instruction".

Comment 1 James Antill 2012-06-07 14:26:31 UTC
This likely means it's a module that yum imports which is causing a problem (presumably due to something weird on your system).

Does "import rpm" work, how about "import sqlite3" ... or "import sqlitecachec"?

Comment 2 W. Michael Petullo 2012-06-25 21:55:19 UTC
All of the import statements mentioned in comment 1 work. I tried to look into the yum module and import all of the modules the yum module imports. As I was going down the list, I found that "import tempfile" caused an "Illegal instruction" crash.

I did the same for tempfile.py and found "import random" crashed.
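
The manual walk through the import chain can be automated. The following is a minimal sketch, assuming a Python 3 interpreter is available (the report itself used Python 2.7); the helper name find_crashing_imports is hypothetical and not part of yum or Python. Each import runs in its own child process, so a crash such as SIGILL kills only the child, and on POSIX a negative return code identifies the signal.

```python
import signal
import subprocess
import sys


def find_crashing_imports(modules, python=sys.executable):
    """Import each module in a fresh interpreter.

    Returns a dict mapping module name to the name of the signal that
    killed the child, or None if the child was not killed by a signal
    (the import may still have failed with an ordinary ImportError).
    Hypothetical helper for illustration only.
    """
    results = {}
    for name in modules:
        proc = subprocess.run(
            [python, "-c", "import " + name],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        if proc.returncode < 0:
            # On POSIX, -N means the child was terminated by signal N.
            results[name] = signal.Signals(-proc.returncode).name
        else:
            results[name] = None
    return results
```

Calling find_crashing_imports(["rpm", "sqlite3", "tempfile", "random"]) on an affected system would be expected to report SIGILL for the modules that reach the faulting code and None for the others.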

Of course, this is a pain because I can't yum install gdb or the Python debuginfo package.

Comment 3 W. Michael Petullo 2012-06-25 22:12:05 UTC
I also tried to downgrade Python (and gdbm) to the Fedora 16 packages, otherwise running on Fedora 17. I did this, but "import random" continues to crash with "Illegal instruction". This is despite the fact that I have an identical machine running Fedora 16, and it works fine.

Comment 4 W. Michael Petullo 2012-06-28 19:32:05 UTC
I just noticed that there is only a problem when I boot Fedora 17 as a Xen Dom0 guest. I tried booting Fedora 17 on bare metal and that worked fine. When running on Xen, /proc/cpuinfo reports:

fpu de tsc msr pae cx8 apic cmov pat clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt lm constant_tsc rep_good nopl nonstop_tsc extd_apicid aperfmperf pni pclmulqdq ssse3 cx16 sse4_1 sse4_2 popcnt aes hypervisor lahf_lm cmp_legacy extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch xop fma4 perfctr_core arat cpb hw_pstate
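
The flag list above includes fma4 and xop but neither avx nor xsave. A minimal sketch of how to spot that combination follows. It assumes the Linux /proc/cpuinfo flag names, and it assumes that FMA4 code should only run when the avx and xsave flags are also present (FMA4 instructions are VEX-encoded). The helper name missing_prerequisites is hypothetical.

```python
# Flags reported under Xen, copied from the comment above.
XEN_FLAGS = (
    "fpu de tsc msr pae cx8 apic cmov pat clflush mmx fxsr sse sse2 ht "
    "syscall nx mmxext fxsr_opt lm constant_tsc rep_good nopl nonstop_tsc "
    "extd_apicid aperfmperf pni pclmulqdq ssse3 cx16 sse4_1 sse4_2 popcnt "
    "aes hypervisor lahf_lm cmp_legacy extapic cr8_legacy abm sse4a "
    "misalignsse 3dnowprefetch xop fma4 perfctr_core arat cpb hw_pstate"
)


def missing_prerequisites(flags_line, feature, prerequisites=("xsave", "avx")):
    """Return the prerequisite flags that are absent although `feature` is present.

    Returns None when `feature` itself is not advertised.
    The default prerequisites are an assumption made for this illustration.
    """
    flags = set(flags_line.split())
    if feature not in flags:
        return None
    return [p for p in prerequisites if p not in flags]


print(missing_prerequisites(XEN_FLAGS, "fma4"))
```

On a live system the same function could be fed the "flags" line read from /proc/cpuinfo instead of the constant.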

Comment 5 W. Michael Petullo 2012-06-28 21:03:25 UTC
I ran Python in gdb and found that it uses Advanced Vector Instructions without checking to see if they are available.

As stated before, "gdb python", "run", and "import random" causes an illegal operation exception. I then used "info address __ieee754_exp_fma4" to obtain the address of the function that caused the exception. Sure enough, "disassemble 0x..." showed a vmovsd %xmm0,-0x20(%rsp) instruction.

Problems with AVX on Xen have been discussed elsewhere:

http://lists.xen.org/archives/html/xen-devel/2012-05/msg00216.html

Comment 6 Dave Malcolm 2012-06-29 14:29:51 UTC
(In reply to comment #5)
> I ran Python in gdb and found that it uses Advanced Vector Instructions
> without checking to see if they are available.
> 
> As stated before, "gdb python", "run", and "import random" causes an illegal
> operation exception. I then used "info address __ieee754_exp_fma4" to obtain
> the address of the function that caused the exception. Sure enough,
> "disassemble 0x..." showed a vmovsd %xmm0,-0x20(%rsp) instruction.
Where *exactly* is it using the instruction?  Which source file, and which DSO?

Comment 7 W. Michael Petullo 2012-07-02 14:45:39 UTC
Here is a partial backtrace on "import random" (if I install the Python and glibc debuginfo packages, gdb itself crashes):

(gdb) ba
#0  0x00007ffff71474ec in __ieee754_exp_fma4 () from /lib64/libm.so.6
#1  0x00007ffff009e415 in ?? () from /usr/lib64/python2.7/lib-dynload/math.so
#2  0x00007ffff7b00053 in PyEval_EvalFrameEx () from /lib64/libpython2.7.so.1.0
#3  0x00007ffff7b00b2f in PyEval_EvalCodeEx () from /lib64/libpython2.7.so.1.0
#4  0x00007ffff7b00c02 in PyEval_EvalCode () from /lib64/libpython2.7.so.1.0
#5  0x00007ffff7b1017d in PyImport_ExecCodeModuleEx () from /lib64/libpython2.7.so.1.0
#6  0x00007ffff7b10947 in ?? () from /lib64/libpython2.7.so.1.0
#7  0x00007ffff7b11486 in ?? () from /lib64/libpython2.7.so.1.0
#8  0x00007ffff7b11700 in ?? () from /lib64/libpython2.7.so.1.0
#9  0x00007ffff7b11c6f in ?? () from /lib64/libpython2.7.so.1.0
#10 0x00007ffff7b1221a in PyImport_ImportModuleLevel () from /lib64/libpython2.7.so.1.0
#11 0x00007ffff7af860f in ?? () from /lib64/libpython2.7.so.1.0
#12 0x00007ffff7a6ca7e in PyObject_Call () from /lib64/libpython2.7.so.1.0
#13 0x00007ffff7afa1d7 in PyEval_CallObjectWithKeywords () from /lib64/libpython2.7.so.1.0
#14 0x00007ffff7afc051 in PyEval_EvalFrameEx () from /lib64/libpython2.7.so.1.0
#15 0x00007ffff7b00b2f in PyEval_EvalCodeEx () from /lib64/libpython2.7.so.1.0
#16 0x00007ffff7b00c02 in PyEval_EvalCode () from /lib64/libpython2.7.so.1.0
#17 0x00007ffff7b19baa in ?? () from /lib64/libpython2.7.so.1.0
#18 0x00007ffff7b1b834 in PyRun_InteractiveOneFlags () from /lib64/libpython2.7.so.1.0
#19 0x00007ffff7b1ba0e in PyRun_InteractiveLoopFlags () from /lib64/libpython2.7.so.1.0
#20 0x00007ffff7b1bfac in PyRun_AnyFileExFlags () from /lib64/libpython2.7.so.1.0
#21 0x00007ffff7b2c892 in Py_Main () from /lib64/libpython2.7.so.1.0
#22 0x00007ffff6d6f735 in __libc_start_main () from /lib64/libc.so.6
#23 0x00000000004006f1 in _start ()

Comment 8 Dave Malcolm 2012-07-02 14:53:05 UTC
Thanks!

Looking at comment #7, I see that the __ieee754_exp_fma4 is within /lib64/libm.so.6, which is part of glibc - reassigning the component accordingly.

I believe that you're seeing it within "import random" because python's random.py imports math.exp as _exp, and executes this line:
   NV_MAGICCONST = 4 * _exp(-0.5)/_sqrt(2.0)
later on as part of the import of random.py; math.exp is implemented in terms of the C standard library's exp function.

A more minimal reproducer may be:
  python -c"from math import exp; print exp(1)"
albeit within the Xen environment you're using (which I'm not expert at).
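
For reference, the same module-level expression can be evaluated on its own. This is a small sketch written for Python 3 (the report used Python 2.7); it only restates the line quoted from random.py and shows that a call to exp happens as soon as the expression is evaluated at import time.

```python
from math import exp as _exp, sqrt as _sqrt

# Same expression that random.py evaluates at module import time.
# On an affected system the call to _exp() is where the crash would occur.
NV_MAGICCONST = 4 * _exp(-0.5) / _sqrt(2.0)
print(NV_MAGICCONST)
```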

Comment 9 W. Michael Petullo 2012-07-02 15:06:07 UTC
Yes, "python -c 'from math import exp; exp (1)'" causes an illegal instruction.

Comment 10 Dave Malcolm 2012-07-02 15:16:43 UTC
Thanks - this is a glibc vs xen issue then: Python's math.exp is just a thin wrapper around the C standard library's:
   double exp(double);
entrypoint in libm.
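
To take Python's math module out of the picture, libm's exp can be called directly through ctypes. This is a minimal sketch, assuming a Linux system with glibc where the math library can be found as "m" or loaded as libm.so.6; the wrapper name c_exp is hypothetical.

```python
import ctypes
import ctypes.util
import math

# Assumption: glibc on Linux. Fall back to the usual soname if lookup fails.
libm = ctypes.CDLL(ctypes.util.find_library("m") or "libm.so.6")

# Declare the C prototype: double exp(double);
libm.exp.argtypes = [ctypes.c_double]
libm.exp.restype = ctypes.c_double


def c_exp(x):
    """Call the C library's exp() directly, bypassing Python's math module."""
    return libm.exp(x)


print(c_exp(1.0), math.exp(1.0))
```

If the direct call faults in the same way as math.exp, the problem lies in libm rather than in the Python extension module.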

Comment 11 Jeff Law 2012-07-02 21:54:29 UTC
Xen vs AVX2, round N.

There's a very very good chance this works in Rawhide.  I was really focused on the Xen dom0 booting problem due to incorrect AVX checking.  The same problems we had with AVX checking affect the FMA4 checks.

Comment 12 Jeff Law 2012-07-03 20:07:55 UTC
This is definitely the FMA4 detection problem.  Test builds with a patch fix the problem.  Official builds are spinning.
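
As an illustration of the kind of check involved, here is a sketch of the gating logic in Python. It is not the glibc patch. It assumes the documented x86 bit positions (CPUID leaf 1 ECX bit 27 for OSXSAVE and bit 28 for AVX, CPUID leaf 0x80000001 ECX bit 16 for FMA4, XCR0 bits 1 and 2 for SSE and AVX state), and it takes the conservative position that FMA4 code paths should be selected only when AVX is also usable. The function name fma4_usable is hypothetical, and the register values are passed in as plain integers because Python cannot execute CPUID or XGETBV itself.

```python
CPUID_1_ECX_OSXSAVE = 1 << 27       # OS has enabled XSAVE/XGETBV
CPUID_1_ECX_AVX = 1 << 28           # CPU advertises AVX
CPUID_80000001_ECX_FMA4 = 1 << 16   # CPU advertises FMA4
XCR0_SSE_STATE = 1 << 1             # OS saves/restores XMM registers
XCR0_AVX_STATE = 1 << 2             # OS saves/restores upper YMM halves


def fma4_usable(leaf1_ecx, ext_leaf1_ecx, xcr0):
    """Decide whether FMA4 code paths may be selected (illustrative only).

    leaf1_ecx:     ECX from CPUID leaf 1
    ext_leaf1_ecx: ECX from CPUID leaf 0x80000001
    xcr0:          value XGETBV would return for XCR0 (ignored without OSXSAVE)
    """
    if not (ext_leaf1_ecx & CPUID_80000001_ECX_FMA4):
        return False
    if not (leaf1_ecx & CPUID_1_ECX_AVX):
        return False
    if not (leaf1_ecx & CPUID_1_ECX_OSXSAVE):
        # Without OSXSAVE, XGETBV is not available, so YMM state cannot be
        # confirmed as enabled.
        return False
    needed = XCR0_SSE_STATE | XCR0_AVX_STATE
    return (xcr0 & needed) == needed
```

With the flags shown in comment 4 (fma4 present, avx and xsave absent), this logic returns False, which is the outcome that avoids selecting __ieee754_exp_fma4.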

Comment 13 Fedora Update System 2012-07-03 21:12:46 UTC
glibc-2.15-51.fc17 has been submitted as an update for Fedora 17.
https://admin.fedoraproject.org/updates/glibc-2.15-51.fc17

Comment 14 Fedora Update System 2012-07-19 09:15:11 UTC
glibc-2.15-51.fc17 has been pushed to the Fedora 17 stable repository.  If problems still persist, please make note of it in this bug report.

