Bug 843731 - ocamlopt.opt (from Rawhide) segfaults when run on a RHEL 6 32 bit kernel
Summary: ocamlopt.opt (from Rawhide) segfaults when run on a RHEL 6 32 bit kernel
Keywords:
Status: CLOSED RAWHIDE
Alias: None
Product: Fedora
Classification: Fedora
Component: ocaml
Version: 19
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Richard W.M. Jones
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2012-07-27 08:10 UTC by Richard W.M. Jones
Modified: 2014-01-06 19:40 UTC

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2014-01-06 19:40:17 UTC
Type: Bug
Embargoed:



Description Richard W.M. Jones 2012-07-27 08:10:51 UTC
Description of problem:

ocamlopt.opt got signal and exited
make[2]: *** [inspect_vm] Error 2
make[2]: *** Waiting for unfinished jobs....

(probably when building ocaml/examples)

Version-Release number of selected component (if applicable):

libguestfs 1.19.26, Rawhide when building in Koji

How reproducible:

Happened twice (three times??)

Comment 1 Richard W.M. Jones 2012-07-27 09:29:05 UTC
Cannot reproduce on Rawhide (64 bit) even with latest OCaml
and glibc.

Builds which failed:
http://koji.fedoraproject.org/koji/taskinfo?taskID=4332739 (i686)
http://koji.fedoraproject.org/koji/taskinfo?taskID=4332668 (i686)

Comment 2 Richard W.M. Jones 2012-07-27 09:50:27 UTC
http://koji.fedoraproject.org/koji/taskinfo?taskID=4335505 (also 32 bit)

3 out of 3 failed only on 32 bit, so I'm just installing a
32 bit Rawhide VM for testing.

Comment 3 Richard W.M. Jones 2012-07-27 16:22:52 UTC
I cannot reproduce this in a VM.

Comment 4 seth vidal 2012-07-27 18:08:58 UTC
Can you try reproducing it in a RHEL 6 VM?

Also:

ocamlopt.opt[19472]: segfault at 55b67514 ip 0000000008196fe3 sp 00000000ffb3823c error 4 in ocamlopt.opt[8048000+163000]
conftest[17385]: segfault at 1 ip 00007f9c4d49df05 sp 00007fff66d87540 error 4 in libc-client.so.2007[7f9c4d45c000+105000]
conftest[10530]: segfault at 1 ip 00007ffe966d5f05 sp 00007fffbad02040 error 4 in libc-client.so.2007[7ffe96694000+105000]
php[32450]: segfault at 0 ip 000000000044ff0c sp 00007fff297faa70 error 4 in php[400000+309000]
ocamlopt.opt[15091]: segfault at 55b67514 ip 0000000008196fe3 sp 00000000ff826bec error 4 in ocamlopt.opt[8048000+163000]


This is what I see in dmesg when I run the build using mock directly on one of the builders.

So this isn't Koji-specific, at least.
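
For readers who want to pick apart the segfault lines above, here is a minimal sketch of a decoder. It assumes the usual x86 kernel log layout (all numbers in hex) and the standard x86 page-fault error bits (bit 0 = page was present, bit 1 = write access, bit 2 = fault taken in user mode). The names SEGFAULT_RE and decode_segfault are made up for this illustration and are not part of any tool mentioned in this bug.

```python
import re

# Assumed layout of the kernel's x86 "segfault at" log line; all numbers are hex.
SEGFAULT_RE = re.compile(
    r"(?P<comm>[^\[\s]+)\[(?P<pid>\d+)\]: segfault at (?P<addr>[0-9a-f]+) "
    r"ip (?P<ip>[0-9a-f]+) sp (?P<sp>[0-9a-f]+) error (?P<err>[0-9a-f]+)"
    r"(?: in (?P<obj>[^\[\s]+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\])?"
)


def decode_segfault(line):
    """Return a dict describing one segfault log line, or None if it does not match."""
    m = SEGFAULT_RE.search(line)
    if m is None:
        return None
    err = int(m["err"], 16)
    info = {
        "comm": m["comm"],
        "pid": int(m["pid"]),
        "fault_addr": int(m["addr"], 16),
        "ip": int(m["ip"], 16),
        "sp": int(m["sp"], 16),
        # x86 page-fault error code bits (assumed standard meaning).
        "page_present": bool(err & 1),
        "write": bool(err & 2),
        "user_mode": bool(err & 4),
        "object": m["obj"],
        "offset": None,
    }
    if m["base"] is not None:
        # Offset of the faulting instruction inside the mapped object.
        info["offset"] = info["ip"] - int(m["base"], 16)
    return info


if __name__ == "__main__":
    line = (
        "ocamlopt.opt[19472]: segfault at 55b67514 ip 0000000008196fe3 "
        "sp 00000000ffb3823c error 4 in ocamlopt.opt[8048000+163000]"
    )
    info = decode_segfault(line)
    print(info["object"], hex(info["ip"]), hex(info["offset"]))
    print("user-mode read of an unmapped page:",
          info["user_mode"] and not info["write"] and not info["page_present"])
```

Under those assumptions, "error 4" reads as a user-mode read from an unmapped page. Both ocamlopt.opt entries show the same ip and the same faulting address, which suggests the crash is deterministic rather than random memory damage.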

Comment 5 Richard W.M. Jones 2012-07-27 18:22:41 UTC
Thanks for testing.

Could be the same thing as the bug that stops coq from building,
which we think is a bug in the i686 code generator in OCaml 4.00.0.

Comment 6 Richard W.M. Jones 2012-07-27 21:59:48 UTC
The location of the segfault is in _C_ code (not
generated OCaml code).

asmrun/compact.c: invert_pointer_at  line 80:

     while (Ecolor (*hp) == 0) hp = (word *) *hp;

(specifically it happens while dereferencing *hp).

However, this is the garbage collector's 'compact' module, so
it probably just indicates that some OCaml code corrupted
the OCaml heap and we don't find out until the GC runs.
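
To illustrate why the crash shows up in the collector rather than at the point of corruption, here is a toy model of the pointer-chasing loop quoted above. It is only a sketch: memory is modelled as a Python dict from address to word, the low two bits of a word are assumed to be the tag that Ecolor tests, and follow_chain and its step guard are hypothetical names that do not exist in the OCaml runtime.

```python
def ecolor(word):
    # Assumption: the low two bits of a word are the tag that ends the chain.
    return word & 3


def follow_chain(memory, addr):
    """Mimic `while (Ecolor (*hp) == 0) hp = (word *) *hp;` on a dict-based memory.

    Returns (final_address, steps). Raises LookupError when the chain leaves
    the modelled memory, which stands in for the segfault.
    """
    steps = 0
    while True:
        if addr not in memory:
            raise LookupError(f"wild pointer {addr:#x} after {steps} step(s)")
        word = memory[addr]
        if ecolor(word) != 0:
            return addr, steps
        if steps > len(memory):
            # Guard for this toy model only; the C loop has no such check.
            raise RuntimeError("cycle in chain")
        addr = word
        steps += 1


if __name__ == "__main__":
    # A healthy chain: two links, then a word whose low bits are non-zero.
    healthy = {0x1000: 0x1008, 0x1008: 0x1010, 0x1010: 0x403}
    print(follow_chain(healthy, 0x1000))

    # One link overwritten earlier by buggy code; nothing fails until the walk.
    corrupted = dict(healthy)
    corrupted[0x1008] = 0x55B67514  # the faulting address reported in dmesg
    try:
        follow_chain(corrupted, 0x1000)
    except LookupError as exc:
        print("crash during walk:", exc)
```

The write that damages the chain succeeds silently. The failure only appears when something later walks the chain, which is consistent with the fault being reported inside the compactor.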

Comment 7 Richard W.M. Jones 2012-07-28 15:41:54 UTC
I updated Rawhide to OCaml 4.00.0 official release, but
the bug still manifests itself exactly the same way.

Comment 8 Richard W.M. Jones 2012-07-30 15:32:42 UTC
(In reply to comment #4)
> Can you try reproducing it in a RHEL 6 VM?

I took the F18 32 bit guest over and booted it on a RHEL 6.3
host.  libguestfs builds correctly (i.e. the bug is not exhibited).

However I'm wondering how closely my environment matches the
Koji environment:

 (1) What host kernel is used?
     => in my case: 2.6.32-279.el6.x86_64

 (2) What guest kernel is used (ie. the environment where mock runs)?
     => in my case: 3.3.4-5.fc17.i686.PAE (Rawhide kernel doesn't boot
        for unrelated reasons)

I suspect that on the real Koji, (2) is different because what
Koji does is to boot a RHEL 6 guest with a mock chroot containing
F18 packages, whereas I've got a real F17/18 guest.

Comment 9 Richard W.M. Jones 2012-07-31 18:56:10 UTC
I managed to reproduce this.

I used a RHEL 6, 32 bit VM.  I installed mock and built
libguestfs-1.19.26-2.fc18.src.rpm in Rawhide, ie:

$ ls -l /etc/mock/default.cfg 
lrwxrwxrwx. 1 root root 23 Jul 31 13:16 /etc/mock/default.cfg -> fedora-rawhide-i386.cfg
$ mock -D '%libguestfs_buildnet 1' -D '%libguestfs_runtests 0' --rebuild libguestfs-1.19.26-2.fc18.src.rpm

So the bug has something to do with ocamlopt.opt from Rawhide
when run on a RHEL 6 32 bit kernel.

Comment 10 Richard W.M. Jones 2012-08-01 10:00:10 UTC
Needless to say, going into the mock chroot and building
by hand does not exhibit the bug.  Gahhhhhh ....

Comment 11 Fedora End Of Life 2013-04-03 17:23:03 UTC
This bug appears to have been reported against 'rawhide' during the Fedora 19 development cycle.
Changing version to '19'.

(As we did not run this process for some time, it could affect also pre-Fedora 19 development
cycle bugs. We are very sorry. It will help us with cleanup during Fedora 19 End Of Life. Thank you.)

More information and reason for this action is here:
https://fedoraproject.org/wiki/BugZappers/HouseKeeping/Fedora19

Comment 12 Richard W.M. Jones 2014-01-06 19:40:17 UTC
This is a stack alignment problem.  Worked around in Rawhide:

http://pkgs.fedoraproject.org/cgit/ocaml.git/commit/?id=179ac32d01818da5252cc100e9b97f347568727d

Upstream is working on a fix:

http://caml.inria.fr/mantis/view.php?id=6038

