Bug 707707 - pypy crash (core dump due to illegal instruction) on startup
Summary: pypy crash (core dump due to illegal instruction) on startup
Keywords:
Status: CLOSED RAWHIDE
Alias: None
Product: Fedora
Classification: Fedora
Component: pypy
Version: 15
Hardware: i686
OS: Linux
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Dave Malcolm
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2011-05-25 18:17 UTC by Eike Hein
Modified: 2011-12-17 12:15 UTC
CC List: 2 users

Fixed In Version: pypy-1.7-2.fc17
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2011-12-16 22:21:56 UTC
Type: ---
Embargoed:


Attachments
gdb backtrace of pypy crash (4.81 KB, text/plain)
2011-05-25 18:44 UTC, Eike Hein
Second gdb backtrace including assembler dump (8.41 KB, text/plain)
2011-05-25 18:47 UTC, Eike Hein
Third backtrace including assembler dump and register info (8.90 KB, text/plain)
2011-05-25 18:49 UTC, Eike Hein

Description Eike Hein 2011-05-25 18:17:52 UTC
Description of problem:
I get "Illegal instruction (core dumped)" when trying to start "pypy" on F15 i686:


Version-Release number of selected component (if applicable):

[sho@ehs1 ~ 34M]$ rpm -q pypy
pypy-1.5-1.fc15.i686


How reproducible:

Always.


Steps to Reproduce:

[sho@ehs1 ~ 23M]$ pypy
Illegal instruction (core dumped)


Additional info:

The same works fine on the F15 x86_64 I have on another machine, so the problem appears to be i686-specific.

The people in #pypy on Freenode don't seem to have seen this before, so it might be Fedora-specific. Or machine-specific, of course.

cpuinfo:

processor       : 0
vendor_id       : AuthenticAMD
cpu family      : 6
model           : 10
model name      : AMD Athlon(tm) XP 2800+
stepping        : 0
cpu MHz         : 2088.184
cache size      : 512 KB
fdiv_bug        : no
hlt_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 1
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic mtrr pge mca cmov pat pse36 mmx fxsr sse syscall mmxext 3dnowext 3dnow up
bogomips        : 4176.36
clflush size    : 32
cache_alignment : 32
address sizes   : 34 bits physical, 32 bits virtual
power management: ts


A bit of strace leading up to the crash:

[...]
open("/proc/filesystems", O_RDONLY|O_LARGEFILE) = 3
fstat64(3, {st_mode=S_IFREG|0444, st_size=0, ...}) = 0
mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb784f000
read(3, "nodev\tsysfs\nnodev\trootfs\nnodev\tb"..., 1024) = 294
read(3, "", 1024)                       = 0
close(3)                                = 0
munmap(0xb784f000, 4096)                = 0
mmap2(NULL, 266240, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb77ec000
ioctl(2, SNDCTL_TMR_TIMEBASE or SNDRV_TIMER_IOCTL_NEXT_DEVICE or TCGETS, {B38400 opost isig icanon echo ...}) = 0
mmap2(NULL, 204800, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb77ba000
--- {si_signo=SIGILL, si_code=ILL_ILLOPN, si_addr=0x80cf47b} (Illegal instruction) ---
+++ killed by SIGILL (core dumped) +++
Illegal instruction (core dumped)

Comment 1 Dave Malcolm 2011-05-25 18:27:21 UTC
Thanks for filing this bug report.

Please can you install pypy-debuginfo:
   debuginfo-install pypy
and generate a stack backtrace, and attach it to this bug, as per:

http://fedoraproject.org/wiki/StackTraces#How_do_I_generate_a_backtrace.3F

Comment 2 Eike Hein 2011-05-25 18:44:26 UTC
Created attachment 500915 [details]
gdb backtrace of pypy crash

Thanks for the swift response :); here's the requested trace.

Comment 3 Eike Hein 2011-05-25 18:47:49 UTC
Created attachment 500917 [details]
Second gdb backtrace including assembler dump

Comment 4 Eike Hein 2011-05-25 18:49:57 UTC
Created attachment 500918 [details]
Third backtrace including assembler dump and register info

As requested over on IRC.

Comment 5 Dave Malcolm 2011-05-25 19:37:57 UTC
Thanks for the data.

If I'm reading this correctly, the crash is happening on this instruction:

=> 0x080cf47b <+91>:	movsd  0x30(%ebx),%xmm0

which seems to correspond to this autogenerated C code:

    l_v114035 = RPyField(l_self_7466, mmgc_inst_min_heap_size);

which appears to be reading "self.min_heap_size" as part of:
  RPython source '/builddir/build/BUILD/pypy-1.5-src/pypy/rpython/memory/gc/minimark.py'
  370 :     def allocate_nursery(self):                                              
  ...
  379 :         self.min_heap_size = max(self.min_heap_size, self.nursery_size *     
  380 :                                               self.major_collection_threshold

during the early stages of initializing PyPy (setting up the allocator/garbage collector, in fact).
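
A quick cross-check that the instruction identified above really is the one that raised the signal: the si_addr reported by strace in the description should equal the address on gdb's current-instruction line. The snippet below is only a sketch; both input strings are copied from this report, and the helper names are made up for illustration.

```python
import re

# Signal line copied from the strace output in the description.
STRACE_LINE = (
    "--- {si_signo=SIGILL, si_code=ILL_ILLOPN, "
    "si_addr=0x80cf47b} (Illegal instruction) ---"
)
# Current-instruction line copied from the gdb disassembly quoted above.
GDB_LINE = "=> 0x080cf47b <+91>:\tmovsd  0x30(%ebx),%xmm0"


def fault_address(strace_line):
    """Return the si_addr field of an strace signal line as an int."""
    match = re.search(r"si_addr=(0x[0-9a-fA-F]+)", strace_line)
    if match is None:
        raise ValueError("no si_addr field found")
    return int(match.group(1), 16)


def current_instruction_address(gdb_line):
    """Return the address on a gdb '=>' disassembly line as an int."""
    match = re.match(r"=>\s+(0x[0-9a-fA-F]+)", gdb_line)
    if match is None:
        raise ValueError("not a current-instruction line")
    return int(match.group(1), 16)


same = fault_address(STRACE_LINE) == current_instruction_address(GDB_LINE)
print(same)  # True
```

The two addresses differ only in zero padding, so the SIGILL was raised by the movsd itself rather than by something nearby.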

Comment 6 Dave Malcolm 2011-05-25 19:42:02 UTC
(gdb) p (void*)&(l_self_7466->mmgc_inst_min_heap_size) - (void*)l_self_7466
$19 = 48

and 48 is indeed 0x30

(gdb) p /x $ebx
$22 = 0x8c45820
(gdb) p l_self_7466
$23 = (struct pypy_pypy_rpython_memory_gc_minimark_MiniMarkGC0 *) 0x8c45820

This confirms that:

  0x080cf47b <+91>: movsd  0x30(%ebx),%xmm0

is indeed a read of l_self_7466's field "mmgc_inst_min_heap_size".
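
The arithmetic behind that check, written out as a small sketch. The constants are the values printed by gdb above; the variable names are only illustrative.

```python
EBX = 0x8C45820       # value of $ebx, equal to the l_self_7466 pointer
FIELD_OFFSET = 48     # offset of mmgc_inst_min_heap_size reported by gdb
DISPLACEMENT = 0x30   # displacement in "movsd 0x30(%ebx),%xmm0"

# The field offset and the instruction's displacement are the same number.
assert FIELD_OFFSET == DISPLACEMENT

# Address the faulting instruction would have loaded the double from.
effective_address = EBX + DISPLACEMENT
print(hex(effective_address))  # 0x8c45850
```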

Comment 7 Eike Hein 2011-05-25 20:01:05 UTC
Speculation on IRC is that the CPU may be too old to support one of the instructions that get issued, which was my hunch as well, hence the cpuinfo print in the initial report:

<daniel_hozac> dmalcolm: requires SSE2, and that person seems to only have SSE.
<daniel_hozac> dmalcolm: wait, no, my bad. that should work on any machine with SSE.

It's an Athlon XP with a Barton core though, which does seem to support SSE, as the kernel claims. gcc -march=native concurs:

cc -march=native -E -v - </dev/null 2>&1 | grep cc1

produces:

/usr/libexec/gcc/i686-redhat-linux/4.6.0/cc1 -E -quiet -v - -march=athlon-4 --param l1-cache-size=64 --param l1-cache-line-size=64 --param l2-cache-size=512 -mtune=athlon

gcc docs for athlon-4: "Improved AMD Athlon CPU with MMX, 3dNOW!, enhanced 3dNOW! and full SSE instruction set support."
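
The instruction-set question can also be checked directly against the cpuinfo flags from the description. To the best of my knowledge, the scalar double-precision form of movsd that moves data into an xmm register was introduced with SSE2 rather than SSE, so the flag to look for would be "sse2". The sketch below assumes that; the flags line is copied from this report and the helper name is made up.

```python
# Flags line copied from the cpuinfo output in the description.
FLAGS_LINE = (
    "flags           : fpu vme de pse tsc msr pae mce cx8 apic mtrr pge mca "
    "cmov pat pse36 mmx fxsr sse syscall mmxext 3dnowext 3dnow up"
)


def cpu_flags(line):
    """Split a /proc/cpuinfo 'flags' line into a set of flag names."""
    _, _, value = line.partition(":")
    return set(value.split())


flags = cpu_flags(FLAGS_LINE)
print("sse" in flags, "sse2" in flags)  # True False
```

So this CPU advertises SSE but not SSE2, which would be consistent with an illegal-instruction trap on a double-precision SSE load.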

Comment 8 Dave Malcolm 2011-05-25 20:42:09 UTC
Looking at the build log:
http://kojipkgs.fedoraproject.org/packages/pypy/1.5/1.fc15/data/logs/i686/build.log

I see that:
   CFLAGS=' -g -pipe -Wall     -m32   '
which doesn't contain Fedora's standard:
  -march=i686 -mtune=atom

The absence of those CPU flags is related to bug 666966, so this may be my fault for overriding Fedora's standard build flags - I'll have another look at that bug.
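
A sketch of the same comparison done mechanically, in case it is useful for checking other build logs. The CFLAGS string and the expected flags are the ones quoted in this comment; the function names are hypothetical.

```python
import shlex

# CFLAGS as they appear in the i686 build log quoted above.
BUILD_CFLAGS = " -g -pipe -Wall     -m32   "
# Fedora's standard i686 arch flags, as quoted above.
EXPECTED_FLAGS = ["-march=i686", "-mtune=atom"]


def arch_flags(cflags):
    """Return any -march=/-mtune= options present in a CFLAGS string."""
    tokens = shlex.split(cflags)
    return [t for t in tokens if t.startswith(("-march=", "-mtune="))]


def missing_flags(cflags, expected):
    """Return the expected options that do not appear in cflags."""
    present = set(shlex.split(cflags))
    return [flag for flag in expected if flag not in present]


print(arch_flags(BUILD_CFLAGS))                     # []
print(missing_flags(BUILD_CFLAGS, EXPECTED_FLAGS))  # ['-march=i686', '-mtune=atom']
```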

Comment 9 Dave Malcolm 2011-05-25 20:54:36 UTC
Am attempting a scratch build that reinstates those flags:
  http://koji.fedoraproject.org/koji/taskinfo?taskID=3092719

Note to self: see also http://fedoraproject.org/wiki/Features/F12X86Support

Comment 10 Dave Malcolm 2011-05-25 22:39:41 UTC
(In reply to comment #9)
> Am attempting a scratch build that reinstates those flags:
>   http://koji.fedoraproject.org/koji/taskinfo?taskID=3092719

This build failed with:
[translation:ERROR] 	Traceback (most recent call last):
[translation:ERROR] 	  File "/builddir/build/BUILD/pypy-1.5-src/pypy/translator/c/gcc/trackgcroot.py", line 1951, in <module>
[translation:ERROR] 	    tracker.process(f, g, entrypoint=entrypoint, filename=fn)
[translation:ERROR] 	  File "/builddir/build/BUILD/pypy-1.5-src/pypy/translator/c/gcc/trackgcroot.py", line 1842, in process
[translation:ERROR] 	    tracker = parser.process_function(lines, entrypoint, filename)
[translation:ERROR] 	  File "/builddir/build/BUILD/pypy-1.5-src/pypy/translator/c/gcc/trackgcroot.py", line 1350, in process_function
[translation:ERROR] 	    table = tracker.computegcmaptable(self.verbose)
[translation:ERROR] 	  File "/builddir/build/BUILD/pypy-1.5-src/pypy/translator/c/gcc/trackgcroot.py", line 54, in computegcmaptable
[translation:ERROR] 	    self.fixlocalvars()
[translation:ERROR] 	  File "/builddir/build/BUILD/pypy-1.5-src/pypy/translator/c/gcc/trackgcroot.py", line 295, in fixlocalvars
[translation:ERROR] 	    setattr(insn, name, fixvar(localvar))
[translation:ERROR] 	  File "/builddir/build/BUILD/pypy-1.5-src/pypy/translator/c/gcc/trackgcroot.py", line 277, in fixvar
[translation:ERROR] 	    assert localvar != 0    # that's the return address
[translation:ERROR] 	AssertionError
[translation:ERROR] 	make: *** [testing_1.gcmap] Error 1

i.e.:
Traceback (most recent call last):
  File "/builddir/build/BUILD/pypy-1.5-src/pypy/translator/c/gcc/trackgcroot.py", line 1951, in <module>
    tracker.process(f, g, entrypoint=entrypoint, filename=fn)
  File "/builddir/build/BUILD/pypy-1.5-src/pypy/translator/c/gcc/trackgcroot.py", line 1842, in process
    tracker = parser.process_function(lines, entrypoint, filename)
  File "/builddir/build/BUILD/pypy-1.5-src/pypy/translator/c/gcc/trackgcroot.py", line 1350, in process_function
    table = tracker.computegcmaptable(self.verbose)
  File "/builddir/build/BUILD/pypy-1.5-src/pypy/translator/c/gcc/trackgcroot.py", line 54, in computegcmaptable
    self.fixlocalvars()
  File "/builddir/build/BUILD/pypy-1.5-src/pypy/translator/c/gcc/trackgcroot.py", line 295, in fixlocalvars
    setattr(insn, name, fixvar(localvar))
  File "/builddir/build/BUILD/pypy-1.5-src/pypy/translator/c/gcc/trackgcroot.py", line 277, in fixvar
    assert localvar != 0    # that's the return address
AssertionError
make: *** [testing_1.gcmap] Error 1

Comment 11 Dave Malcolm 2011-12-16 22:21:56 UTC
As of 1.7-2, I'm using --gcrootfinder=shadowstack (to avoid relying on implementation details of gcc), and using the distro-wide compilation flags:
http://pkgs.fedoraproject.org/gitweb/?p=pypy.git;a=commitdiff;h=416c353a67345f32fa7da698b18eef5ad1d1e77e

This should ensure that we get the correct arch flags (see comment #8) and thus get machine code that runs on every machine Fedora targets.
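
For readers unfamiliar with the term, here is a toy illustration of the general shadow-stack idea: the program itself records its live GC references on an explicit side stack, so the collector can find roots by walking that stack instead of depending on how the C compiler laid out the machine stack (which is what the failing trackgcroot.py step in comment #10 has to work out from gcc's output). This is a conceptual sketch only, not PyPy's implementation; every name in it is hypothetical.

```python
class ShadowStack:
    """Toy explicit root stack (hypothetical, for illustration only)."""

    def __init__(self):
        self._roots = []

    def push(self, obj):
        self._roots.append(obj)

    def pop(self):
        return self._roots.pop()

    def roots(self):
        """Return a snapshot of the references a collector would scan."""
        return list(self._roots)


def call_with_roots(stack, live_objects, func):
    """Register live_objects as roots for the duration of func()."""
    for obj in live_objects:
        stack.push(obj)
    try:
        return func()
    finally:
        # Unregister the same number of roots, even if func() raised.
        for _ in live_objects:
            stack.pop()


stack = ShadowStack()
seen = call_with_roots(stack, ["a", "b"], lambda: stack.roots())
print(seen)           # ['a', 'b']
print(stack.roots())  # []
```

The trade-off in this picture is some extra bookkeeping around calls in exchange for not having to interpret compiler output.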

Comment 12 Eike Hein 2011-12-17 12:15:57 UTC
Thanks Dave. Unfortunately I cannot test this, because the machine I had the problem on has died in the meantime and the CPUs in my remaining machines are too new to reproduce it. Perhaps QEMU could help ...

