Bug 1292969 - ocaml programs on ppc64le segfault when linked with -pie
Status: CLOSED RAWHIDE
Product: Fedora
Classification: Fedora
Component: ocaml
Version: 23
Hardware: ppc64le
OS: Linux
Priority: unspecified
Severity: medium
Assigned To: Richard W.M. Jones
QA Contact: Fedora Extras Quality Assurance
Blocks: PPCTracker
 
Reported: 2015-12-18 16:48 EST by Antony Messerli
Modified: 2016-12-01 03:56 EST
CC List: 7 users

Fixed In Version: ocaml-4.04.0-7.fc26.ppc64le
Doc Type: Bug Fix
Last Closed: 2016-11-30 15:52:38 EST
Type: Bug


Description Antony Messerli 2015-12-18 16:48:56 EST
Description of problem:

virt-top segfaults when run.

Version-Release number of selected component (if applicable):
virt-top-1.0.8-16.fc23.ppc64le

How reproducible:
Every time on F23 (kernel 4.2.6-301.fc23.ppc64le).

Steps to Reproduce:
1. run virt-top

Actual results:
[root@host ~]# virt-top
Segmentation fault (core dumped)

[78078.494470] virt-top[61562]: unhandled signal 11 at 0000000056db3150 nip 0000000056ba29e4 lr 0000000056c9b9d8 code 30001

Dec 18 21:44:54 host systemd-coredump[61576]: Process 61575 (virt-top) of user 0 dumped core.
Stack trace of thread 61575:
#0  0x000000005e1129e4 n/a (virt-top)
#1  0x000000005e20b9d8 n/a (virt-top)
#2  0x000000005e20f788 n/a (virt-top)
#3  0x000000005e1818a4 n/a (virt-top)

Expected results:

virt-top should run normally.

Additional info:
Comment 1 Richard W.M. Jones 2015-12-18 17:35:08 EST
Hmm, I thought we'd fixed all the OCaml/ppc64le problems.  Are there
more debug symbols you could install?
Comment 2 Richard W.M. Jones 2015-12-20 06:35:04 EST
It took me 48 hours to build a Fedora 23 ppc64le instance and compile
virt-top on it, but I finally have a stack trace:

#0  0x00003fffb6350754 in OPENSSL_crypto207_probe () at ppccpuid.s:23
#1  0x00003fffb6350c5c in OPENSSL_cpuid_setup () at ppccap.c:152
#2  0x00003fffb63fdbc8 in OPENSSL_add_all_algorithms_noconf () at c_all.c:82
#3  0x00003fffb6f59fc4 in libssh2_init (flags=<optimized out>) at global.c:48
#4  0x00003fffb6eb5714 in curl_global_init (flags=3) at easy.c:273
#5  0x00003fffb7cd6fdc in virGlobalInit () at libvirt.c:386
#6  0x00003fffb7941a34 in __pthread_once_slow (
    once_control=0x3fffb7f670c0 <virGlobalOnce>, 
    init_routine=0x3fffb7cd6f80 <virGlobalInit>) at pthread_once.c:116
#7  0x00003fffb7c2a278 in virOnce (once=<optimized out>, init=<optimized out>)
    at util/virthread.c:48
#8  0x00003fffb7cd7cb8 in virInitialize () at libvirt.c:476
#9  0x000000002001c890 in ocaml_libvirt_init ()
#10 0x0000000020063494 in caml_interprete ()
#11 0x00000000200420bc in caml_main ()
#12 0x000000002001589c in main ()

So it looks like an openssl problem.

A simple reproducer is:

#include <stdio.h>
#include <stdlib.h>
#include <libvirt/libvirt.h>

int
main (void)
{
  /* virInitialize() runs curl/libssh2/OpenSSL initialization
     (see the backtrace below). */
  virInitialize ();
  exit (0);
}

gcc test.c -o test `pkg-config --cflags --libs libvirt`
gdb ./test

Program received signal SIGILL, Illegal instruction.
0x00003fffb6400754 in OPENSSL_crypto207_probe () at ppccpuid.s:23
23		blr	
Missing separate debuginfos, use: dnf debuginfo-install bzip2-libs-1.0.6-17.fc23.ppc64le device-mapper-libs-1.02.109-2.fc23.ppc64le libgcc-5.1.1-4.fc23.ppc64le libnghttp2-1.3.3-1.fc23.ppc64le
(gdb) bt
#0  0x00003fffb6400754 in OPENSSL_crypto207_probe () at ppccpuid.s:23
#1  0x00003fffb6400c5c in OPENSSL_cpuid_setup () at ppccap.c:152
#2  0x00003fffb64adbc8 in OPENSSL_add_all_algorithms_noconf () at c_all.c:82
#3  0x00003fffb7169fc4 in libssh2_init (flags=<optimized out>) at global.c:48
#4  0x00003fffb70c5714 in curl_global_init (flags=3) at easy.c:273
#5  0x00003fffb7cd6fdc in virGlobalInit () at libvirt.c:386
#6  0x00003fffb6ee1a34 in __pthread_once_slow (
    once_control=0x3fffb7f670c0 <virGlobalOnce>, 
    init_routine=0x3fffb7cd6f80 <virGlobalInit>) at pthread_once.c:116
#7  0x00003fffb7c2a278 in virOnce (once=<optimized out>, init=<optimized out>)
    at util/virthread.c:48
#8  0x00003fffb7cd7cb8 in virInitialize () at libvirt.c:476
#9  0x0000000010000860 in main ()
Comment 3 Tomas Mraz 2015-12-21 05:00:52 EST
I do not think this is the real source of the crash when the program is not run under GDB. OPENSSL_crypto207_probe intentionally uses instructions that might not be present, but openssl installs a SIGILL handler which should take care of that. So the SIGILL in OPENSSL_crypto207_probe needs to be ignored in GDB, and tracing needs to continue to find out whether there is another, unhandled SIGILL.
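For readers unfamiliar with the technique described above, here is a minimal sketch of SIGILL-based CPU feature probing in C. It is not OpenSSL's actual code: the names probe_instruction, probe_ok and probe_illegal are hypothetical, and raise(SIGILL) merely stands in for executing an instruction the CPU does not implement.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <string.h>

/* Jump target restored by the SIGILL handler. */
static sigjmp_buf probe_env;

static void ill_handler(int sig)
{
    (void) sig;
    siglongjmp(probe_env, 1);   /* abandon the probe, resume at sigsetjmp */
}

/* Run PROBE with a temporary SIGILL handler installed.
   Returns 1 if PROBE completed normally, 0 if it raised SIGILL. */
static int probe_instruction(void (*probe)(void))
{
    struct sigaction sa, old;
    int supported;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = ill_handler;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGILL, &sa, &old);

    if (sigsetjmp(probe_env, 1) == 0) {
        probe();            /* may execute an unsupported instruction */
        supported = 1;
    } else {
        supported = 0;      /* the handler jumped back here */
    }

    sigaction(SIGILL, &old, NULL);   /* restore the previous handler */
    return supported;
}

/* Hypothetical stand-ins for real instruction probes. */
static void probe_ok(void) { }
static void probe_illegal(void) { raise(SIGILL); }
```

Under gdb the debugger stops on each SIGILL before the program's own handler runs; "handle SIGILL nostop noprint pass" lets the signal reach the handler so tracing can continue to the real crash.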
Comment 4 Richard W.M. Jones 2016-01-04 10:40:42 EST
You're right.  Moving back to virt-top per comment 3.
Comment 5 Richard W.M. Jones 2016-01-04 12:10:38 EST
Finally I managed to reproduce this properly.  Here is the
three-line reproducer:

echo 'let () = print_endline "hello"' > test.ml
ocamlfind opt -package unix,extlib,curses,str,libvirt,gettext-stub,xml-light,csv,calendar -g -linkpkg test.ml -o test
./test

The ./test command segfaults on Fedora 23.  The stack trace is:

#0  0x000000004b3e2984 in 0000054a.plt_call.strlen@@GLIBC_2.17 ()
#1  0x000000004b448578 in caml_register_named_value ()
#2  0x000000004b44c328 in caml_c_call ()
#3  0x000000004b3f67d8 in camlPervasives__entry ()
#4  0x000000004b3e2f8c in caml_program ()
#5  0x000000004b44c4ac in caml_start_program ()
#6  0x000000004b42ea28 in caml_main ()
#7  0x000000004b3e2d0c in main ()

which is the same as the stack trace for failing virt-top (when
I compile it locally).

This fails on Fedora 23 / ppc64le.  It does not fail on x86-64.
I didn't try RHEL yet.
Comment 6 Richard W.M. Jones 2016-01-05 10:26:49 EST
I can fix this by removing:

  -specs=/usr/lib/rpm/redhat/redhat-hardened-ld

from LDFLAGS (all other CFLAGS & LDFLAGS are unchanged).  I still
don't understand why the failure happens.
Comment 7 Richard W.M. Jones 2016-01-05 10:34:31 EST
Here's the final reproducer:

$ cat test.ml 
let () = print_endline "hello"
$ ocamlopt.opt -g test.ml -cclib -pie -o test
$ ./test 
Segmentation fault (core dumped)

The stack trace is:

#0  0x000000002ff76ac4 in 000000a8.plt_call.strlen@@GLIBC_2.17 ()
#1  0x000000002ff8ed38 in caml_register_named_value ()
#2  0x000000002ff91e64 in caml_c_call ()
#3  0x000000002ff7ab90 in camlPervasives__entry ()
#4  0x000000002ff7703c in caml_program ()
#5  0x000000002ff91fe8 in caml_start_program ()
#6  0x000000002ff92778 in caml_main ()
#7  0x000000002ff76dbc in main ()

Removing the '-cclib -pie' flags fixes this.

Basically I suspect that our ppc64le backend doesn't support PIE
at all.
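To check whether a given binary was actually linked as PIE (for example ./test built with and without '-cclib -pie'), one can look at the ELF header: PIE executables have type ET_DYN, while fixed-address executables have ET_EXEC ("readelf -h ./test" reports the same field as "Type:"). A minimal sketch in C, assuming glibc's <elf.h>; elf_is_pie is a hypothetical helper. Shared libraries are also ET_DYN, so this is only a heuristic for files already known to be executables.

```c
#include <assert.h>
#include <elf.h>
#include <stdio.h>
#include <string.h>

/* Returns 1 if PATH has ELF type ET_DYN (PIE), 0 if ET_EXEC
   (fixed-address executable), -1 on error or non-ELF input. */
static int elf_is_pie(const char *path)
{
    unsigned char hdr[EI_NIDENT + 2];   /* e_ident followed by e_type */
    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return -1;
    size_t n = fread(hdr, 1, sizeof hdr, f);
    fclose(f);
    if (n != sizeof hdr || memcmp(hdr, ELFMAG, SELFMAG) != 0)
        return -1;

    /* e_type is the 16-bit field right after e_ident in both ELF32 and
       ELF64, stored in the byte order given by e_ident[EI_DATA]
       (little-endian on ppc64le, big-endian on ppc64). */
    unsigned type = hdr[EI_DATA] == ELFDATA2MSB
        ? (unsigned) (hdr[EI_NIDENT] << 8 | hdr[EI_NIDENT + 1])
        : (unsigned) (hdr[EI_NIDENT] | hdr[EI_NIDENT + 1] << 8);

    if (type == ET_DYN)
        return 1;
    if (type == ET_EXEC)
        return 0;
    return -1;
}
```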

Upstream there is a new POWER backend (covers both 32 bit and 64 bit,
BE and LE) which was written independently.  We plan to switch over
to using that backend at some point in the fairly near future.
Comment 8 Menanteau Guy 2016-09-16 08:44:28 EDT
The same problem occurs during make check in the ocaml-tplib package build for ppc64le:
http://ppc.koji.fedoraproject.org/koji/taskinfo?taskID=3642666
Comment 9 Fedora End Of Life 2016-11-24 09:21:20 EST
This message is a reminder that Fedora 23 is nearing its end of life.
Approximately 4 (four) weeks from now Fedora will stop maintaining
and issuing updates for Fedora 23. It is Fedora's policy to close all
bug reports from releases that are no longer maintained. At that time
this bug will be closed as EOL if it remains open with a Fedora 'version'
of '23'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not
able to fix it before Fedora 23 reaches end of life. If you would still like
to see this bug fixed and are able to reproduce it against a later version
of Fedora, you are encouraged to change the 'version' to a later Fedora
version before this bug is closed as described in the policy above.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events. Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.
Comment 10 Richard W.M. Jones 2016-11-30 15:52:38 EST
This is fixed in the Rawhide version of OCaml, on both
ppc64 and ppc64le.
