Bug 843731

Summary: ocamlopt.opt (from Rawhide) segfaults when run on a RHEL 6 32 bit kernel
Product: [Fedora] Fedora
Component: ocaml
Version: 19
Hardware: Unspecified
OS: Unspecified
Status: CLOSED RAWHIDE
Severity: unspecified
Priority: unspecified
Reporter: Richard W.M. Jones <rjones>
Assignee: Richard W.M. Jones <rjones>
QA Contact: Fedora Extras Quality Assurance <extras-qa>
CC: acathrow, c.david86, fedora-ocaml-list, mbooth, rjones, tcallawa
Doc Type: Bug Fix
Type: Bug
Last Closed: 2014-01-06 19:40:17 UTC

Description Richard W.M. Jones 2012-07-27 08:10:51 UTC
Description of problem:

ocamlopt.opt got signal and exited
make[2]: *** [inspect_vm] Error 2
make[2]: *** Waiting for unfinished jobs....

(probably when building ocaml/examples)

Version-Release number of selected component (if applicable):

libguestfs 1.19.26, Rawhide when building in Koji

How reproducible:

Happened twice (three times??)

Comment 1 Richard W.M. Jones 2012-07-27 09:29:05 UTC
Cannot reproduce on Rawhide (64 bit) even with latest OCaml
and glibc.

Builds which failed:
http://koji.fedoraproject.org/koji/taskinfo?taskID=4332739 (i686)
http://koji.fedoraproject.org/koji/taskinfo?taskID=4332668 (i686)

Comment 2 Richard W.M. Jones 2012-07-27 09:50:27 UTC
http://koji.fedoraproject.org/koji/taskinfo?taskID=4335505 (also 32 bit)

3 out of 3 failed only on 32 bit, so I'm just installing a
32 bit Rawhide VM for testing.

Comment 3 Richard W.M. Jones 2012-07-27 16:22:52 UTC
I cannot reproduce this in a VM.

Comment 4 seth vidal 2012-07-27 18:08:58 UTC
Can you try reproducing it in a RHEL 6 VM?

also:

ocamlopt.opt[19472]: segfault at 55b67514 ip 0000000008196fe3 sp 00000000ffb3823c error 4 in ocamlopt.opt[8048000+163000]
conftest[17385]: segfault at 1 ip 00007f9c4d49df05 sp 00007fff66d87540 error 4 in libc-client.so.2007[7f9c4d45c000+105000]
conftest[10530]: segfault at 1 ip 00007ffe966d5f05 sp 00007fffbad02040 error 4 in libc-client.so.2007[7ffe96694000+105000]
php[32450]: segfault at 0 ip 000000000044ff0c sp 00007fff297faa70 error 4 in php[400000+309000]
ocamlopt.opt[15091]: segfault at 55b67514 ip 0000000008196fe3 sp 00000000ff826bec error 4 in ocamlopt.opt[8048000+163000]


is what I see in dmesg when I do the build using mock directly on one of the builders.

this isn't koji-specific at least.
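
For anyone reading the dmesg lines above, here is a minimal sketch of how the fields can be decoded. The function names (parse_segfault, decode_error) are made up for this illustration. It assumes the usual x86 Linux message layout (fault address, ip, sp, page-fault error code in hex, then mapping[base+size]) and the standard meaning of the low three error-code bits.

```python
import re

# Layout of the x86 kernel's user-space segfault message, as in the dmesg
# lines quoted above. All numeric fields except the pid are hexadecimal.
SEGV_RE = re.compile(
    r"(?P<comm>\S+)\[(?P<pid>\d+)\]: segfault at (?P<addr>[0-9a-f]+) "
    r"ip (?P<ip>[0-9a-f]+) sp (?P<sp>[0-9a-f]+) error (?P<err>[0-9a-f]+)"
    r"(?: in (?P<obj>\S+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\])?"
)


def decode_error(err):
    """Decode the low three bits of the x86 page-fault error code."""
    return {
        "page_present": bool(err & 1),  # 0 means the page was not mapped
        "write": bool(err & 2),         # 0 means the access was a read
        "user_mode": bool(err & 4),     # 1 means the fault came from user space
    }


def parse_segfault(line):
    """Return a dict describing one segfault line, or None if it does not match."""
    m = SEGV_RE.search(line)
    if m is None:
        return None
    info = {key: int(m[key], 16) for key in ("addr", "ip", "sp", "err")}
    info["comm"] = m["comm"]
    info["pid"] = int(m["pid"])
    info.update(decode_error(info["err"]))
    if m["obj"]:
        info["obj"] = m["obj"]
        # Offset of the faulting instruction from the start of the mapping.
        info["offset"] = info["ip"] - int(m["base"], 16)
    return info


line = ("ocamlopt.opt[19472]: segfault at 55b67514 ip 0000000008196fe3 "
        "sp 00000000ffb3823c error 4 in ocamlopt.opt[8048000+163000]")
info = parse_segfault(line)
print(hex(info["addr"]), hex(info["offset"]), info["user_mode"], info["write"])
```

Read this way, "error 4" on both ocamlopt.opt lines is a user-mode read of an unmapped page, at the same fault address and the same instruction pointer in both runs.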

Comment 5 Richard W.M. Jones 2012-07-27 18:22:41 UTC
Thanks for testing.

Could be the same thing as the bug that stops coq from building,
which we think is a bug in the i686 code generator in OCaml 4.00.0.

Comment 6 Richard W.M. Jones 2012-07-27 21:59:48 UTC
The location of the segfault is in _C_ code (not
generated OCaml code).

asmrun/compact.c: invert_pointer_at  line 80:

     while (Ecolor (*hp) == 0) hp = (word *) *hp;

(specifically it happens while dereferencing *hp).

However this is the garbage collector 'compact' module so
this probably just indicates that some OCaml code corrupted
the OCaml heap and we don't find out until the GC runs.
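
To illustrate why heap corruption would only surface here, below is a toy model of the loop quoted above. It is a sketch, not the runtime's code: memory is a Python dict from address to word, Ecolor is assumed to test the low two bits of a word, and the Segfault exception and follow_chain name are invented for the example.

```python
class Segfault(Exception):
    """Stands in for the kernel delivering SIGSEGV on an unmapped address."""


def ecolor(word):
    # Assumption: Ecolor looks at the low two bits of the word.
    return word & 3


def follow_chain(memory, hp):
    """Toy model of: while (Ecolor (*hp) == 0) hp = (word *) *hp;

    `memory` maps mapped addresses to the word stored there. Reading an
    address that is not in the dict models dereferencing an unmapped page.
    """
    while True:
        if hp not in memory:
            raise Segfault(hex(hp))
        word = memory[hp]      # this is the *hp dereference
        if ecolor(word) != 0:
            return hp          # reached a word that is not a chain link
        hp = word              # treat the word as the next pointer


# Intact chain: two links, then a word whose low bits are non-zero.
good = {0x1000: 0x1008, 0x1008: 0x1010, 0x1010: 0x0403}
print(hex(follow_chain(good, 0x1000)))

# Corrupted chain: one word was overwritten with a value that still has
# zero low bits, so the loop follows it and faults on the next read.
bad = {0x1000: 0x1008, 0x1008: 0x55B67514}
try:
    follow_chain(bad, 0x1000)
except Segfault as exc:
    print("segfault at", exc)
```

The fault address in the dmesg output, 55b67514, has its low two bits clear, so under this model it is the kind of value the loop would follow as a pointer. That fits a word having been overwritten earlier, although it does not prove it.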

Comment 7 Richard W.M. Jones 2012-07-28 15:41:54 UTC
I updated Rawhide to OCaml 4.00.0 official release, but
the bug still manifests itself exactly the same way.

Comment 8 Richard W.M. Jones 2012-07-30 15:32:42 UTC
(In reply to comment #4)
> Can you try reproducing it in a RHEL 6 VM?

I took the F18 32 bit guest over and booted it on a RHEL 6.3
host.  libguestfs builds correctly (ie. the bug is not exhibited).

However I'm wondering how closely my environment matches the
Koji environment:

 (1) What host kernel is used?
     => in my case: 2.6.32-279.el6.x86_64

 (2) What guest kernel is used (ie. the environment where mock runs)?
     => in my case: 3.3.4-5.fc17.i686.PAE (Rawhide kernel doesn't boot
        for unrelated reasons)

I suspect that on the real Koji, (2) is different because what
Koji does is to boot a RHEL 6 guest with a mock chroot containing
F18 packages, whereas I've got a real F17/18 guest.

Comment 9 Richard W.M. Jones 2012-07-31 18:56:10 UTC
I managed to reproduce this.

I used a RHEL 6, 32 bit VM.  I installed mock and built
libguestfs-1.19.26-2.fc18.src.rpm in Rawhide, ie:

$ ls -l /etc/mock/default.cfg 
lrwxrwxrwx. 1 root root 23 Jul 31 13:16 /etc/mock/default.cfg -> fedora-rawhide-i386.cfg
$ mock -D '%libguestfs_buildnet 1' -D '%libguestfs_runtests 0' --rebuild libguestfs-1.19.26-2.fc18.src.rpm

So the bug has something to do with ocamlopt.opt from Rawhide
when run on a RHEL 6 32 bit kernel.

Comment 10 Richard W.M. Jones 2012-08-01 10:00:10 UTC
Needless to say, going into the mock chroot and building
by hand does not exhibit the bug.  Gahhhhhh ....

Comment 11 Fedora End Of Life 2013-04-03 17:23:03 UTC
This bug appears to have been reported against 'rawhide' during the Fedora 19 development cycle.
Changing version to '19'.

(As we did not run this process for some time, it could also affect bugs from development
cycles before Fedora 19. We are very sorry. It will help us with cleanup during Fedora 19 End Of Life. Thank you.)

More information and reason for this action is here:
https://fedoraproject.org/wiki/BugZappers/HouseKeeping/Fedora19

Comment 12 Richard W.M. Jones 2014-01-06 19:40:17 UTC
This is a stack alignment problem.  Worked around in Rawhide:

http://pkgs.fedoraproject.org/cgit/ocaml.git/commit/?id=179ac32d01818da5252cc100e9b97f347568727d

Upstream is working on a fix:

http://caml.inria.fr/mantis/view.php?id=6038