1330098 – FTBFS: segmentation fault during build

Bug 1330098 - FTBFS: segmentation fault during build

Summary: FTBFS: segmentation fault during build

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Fedora
Classification:	Fedora
Component:	coq
Sub Component:
Version:	24
Hardware:	ppc64le
OS:	Linux
Priority:	medium
Severity:	unspecified
Target Milestone:	---
Assignee:	Alan Dunn
QA Contact:	Fedora Extras Quality Assurance
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	PPCTracker
TreeView+	depends on / blocked

Reported:	2016-04-25 12:10 UTC by Karsten Hopp
Modified:	2017-01-23 05:19 UTC (History)
CC List:	5 users (show)
Fixed In Version:	coq-8.6-1.fc25
Clone Of:
Environment:
Last Closed:	2017-01-23 05:19:15 UTC
Type:	Bug
Embargoed:
Dependent Products:

Attachments	(Terms of Use)

Description Karsten Hopp 2016-04-25 12:10:28 UTC

Description of problem:
bin/coqtop -boot   -native-compiler -compile theories/Init/Datatypes.v  -noinit -R theories Coq
rm -f theories/Init/Logic_Type.glob
bin/coqtop -boot   -native-compiler -compile theories/Init/Logic_Type.v  -noinit -R theories Coq
rm -f theories/Init/Nat.glob
bin/coqtop -boot   -native-compiler -compile theories/Init/Nat.v  -noinit -R theories Coq
Makefile.build:590: recipe for target 'theories/Init/Nat.vo' failed
make[1]: *** [theories/Init/Nat.vo] Segmentation fault (core dumped)
make[1]: *** [theories/Init/Nat.vo] Deleting file 'theories/Init/Nat.glob'
make[1]: Leaving directory '/builddir/build/BUILD/coq-8.5pl1'
Makefile:158: recipe for target 'submake' failed
make: *** [submake] Error 2

Version-Release number of selected component (if applicable):
coq-8.5pl1-1.fc24.

How reproducible:
always

Steps to Reproduce:
1. ppc-koj ibuild --scratch f24 coq-8.5pl1-1.fc24.src.rpm
2.
3.

Actual results:


Expected results:


Additional info:

Comment 1 Jerry James 2016-04-27 14:16:18 UTC

This is probably related to the new native compilation feature of coq.  I'll try to track it down but, if unsuccessful, may have to turn that feature off for ppc64le.

Comment 2 Jerry James 2016-05-06 22:22:42 UTC

Since I already had a ppc64 VM that I used to track down a gap bug, I tried building coq there, but it succeeded.  So this bug appears on ppc64le, but not ppc64, unless my VM is somehow significantly different from real hardware.  I'll try creating a ppc64le VM next.  (Although access to real hardware would be very much appreciated, as these emulated CPUs are really slooooooow.)

Comment 3 Dan Horák 2016-05-09 05:57:58 UTC

Oh, when you spoke about VM I thought you already have access to real HW VM. There are OpenPOWER hubs across the world that provide VMs to developers, like the one we manage at Brno Technical University. You can sign up at https://docs.google.com/forms/d/1py0Fg2x9xoQP9yYzOm4DSnJ1buyQBBCWN9Sj_OZuXek/viewform

Comment 4 Jerry James 2016-05-20 23:51:20 UTC

I suspect an ocaml code generation bug on ppc64le.  This is not related to the new native compilation feature of coq.  The segfault happens with that turned off.  The segfault occurs in the function nat_of_int (see plugins/syntax/nat_syntax.ml).

On ppc64, which works, that function starts like this:

    1d40:       7c 08 02 a6     mflr    r0
    1d44:       f8 01 00 10     std     r0,16(r1)
    1d48:       f8 21 ff 51     stdu    r1,-176(r1)
    1d4c:       f8 81 00 88     std     r4,136(r1)
    1d50:       f8 61 00 80     std     r3,128(r1)
    1d54:       7c 83 23 78     mr      r3,r4
    1d58:       e9 62 81 08     ld      r11,-32504(r2)
    1d5c:       f8 41 00 28     std     r2,40(r1)
    1d60:       e8 4b 00 08     ld      r2,8(r11)
    1d64:       e9 6b 00 00     ld      r11,0(r11)
    1d68:       7d 69 03 a6     mtctr   r11
    1d6c:       4e 80 04 21     bctrl

On ppc64le, which segfaults, that function starts like this:

    1aa0:       02 00 4c 3c     addis   r2,r12,2
    1aa4:       60 63 42 38     addi    r2,r2,25440
    1aa8:       a6 02 08 7c     mflr    r0
    1aac:       10 00 01 f8     std     r0,16(r1)
    1ab0:       51 ff 21 f8     stdu    r1,-176(r1)
    1ab4:       88 00 81 f8     std     r4,136(r1)
    1ab8:       80 00 61 f8     std     r3,128(r1)
    1abc:       78 23 83 7c     mr      r3,r4
    1ac0:       08 81 82 e9     ld      r12,-32504(r2)
    1ac4:       18 00 41 f8     std     r2,24(r1)
    1ac8:       a6 03 89 7d     mtctr   r12
    1acc:       21 04 80 4e     bctrl

The bctrl instruction is the one that faults, because a bogus address has been loaded into the ctr register.  I notice that on ppc64, after loading r11 with -32504(r2), an additional load into r11 of 0(r11) is done, but there is no such corresponding additional load of r12 in the ppc64le code.  I also notice that on ppc, the value of r2 appears to come from the caller, where on ppc64le, there are 2 additional instructions at the top of the function that set r2 from the value of r12.

Of course, I basically know squat about ppc assembly, so all of that might be utter nonsense.

I'm adding Richard Jones to the CC list.  Richard, are you aware of any ocaml code generation bugs on ppc64le?  Perhaps involving recursive local functions?

Comment 5 Richard W.M. Jones 2016-05-21 08:55:16 UTC

It wouldn't surprise me if there are still lurking ppc64le codegen
bugs.  Note that Fedora 24 is still using the out-of-tree code
generator written by IBM.  New in OCaml 4.03 is an in-tree code
generator for POWER written from scratch by Xavier Leroy et al.

Therefore I'm not inclined to fix code gen bugs in this version
of OCaml unless they are simple/obvious ones.  We will get the
new code generator in F25.

Comment 6 Richard W.M. Jones 2016-05-21 08:56:45 UTC

(In reply to Jerry James from comment #4)
> Perhaps involving recursive local functions?

BTW don't overlook simple stack overflows.  These are much more
common on POWER because the stack frames are larger.  Try doing:

ulimit -s 65536

before the build.

Comment 7 Fedora Update System 2017-01-13 21:18:57 UTC

why-2.36-1.fc25 frama-c-1.13-7.fc25 why3-0.87.3-1.fc25 gappalib-coq-1.3.2-2.fc25 zenon-0.8.2-5.fc25 flocq-2.5.2-4.fc25 coq-8.6-1.fc25 has been submitted as an update to Fedora 25. https://bodhi.fedoraproject.org/updates/FEDORA-2017-033c8000df

Comment 8 Fedora Update System 2017-01-14 06:22:31 UTC

coq-8.6-1.fc25, flocq-2.5.2-4.fc25, frama-c-1.13-7.fc25, gappalib-coq-1.3.2-2.fc25, why-2.36-1.fc25, why3-0.87.3-1.fc25, zenon-0.8.2-5.fc25 has been pushed to the Fedora 25 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2017-033c8000df

Comment 9 Fedora Update System 2017-01-23 05:19:15 UTC

coq-8.6-1.fc25, flocq-2.5.2-4.fc25, frama-c-1.13-7.fc25, gappalib-coq-1.3.2-2.fc25, why-2.36-1.fc25, why3-0.87.3-1.fc25, zenon-0.8.2-5.fc25 has been pushed to the Fedora 25 stable repository. If problems still persist, please make note of it in this bug report.

Note You need to log in before you can comment on or make changes to this bug.