We observe autoconf FTBFS on rawhide (testsuite failures). One of the testsuite failures is related to Erlang & autoconf, but it appears only on i686. I tried to cut the related testcase out into the segfault-i686.tar.gz reproducer:

$ tar -xf segfault-i686.tar.gz
$ cd segfault-i686
$ make && make run
erlc -b beam my_testsuite.erl
cd lib && ./compile
erl -pa ./lib -s my_testsuite test
Erlang/OTP 17 [erts-6.3] [source] [smp:4:4] [async-threads:10] [hipe] [kernel-poll:false]

Eshell V6.3 (abort with ^G)
1> All 3 tests passed.
Makefile:6: recipe for target 'run' failed
make: *** [run] Segmentation fault (core dumped)

The segfault ^^ breaks the autoconf testsuite, but I'm not able to diagnose it properly. Any help appreciated; let me know if you need any other info.

FTBFS: https://kojipkgs.fedoraproject.org//work/tasks/7404/10217404/build.log

Pavel
Created attachment 1049095: Reproducer
This bug appears to have been reported against 'rawhide' during the Fedora 23 development cycle. Changing version to '23'. (As we did not run this process for some time, it could affect also pre-Fedora 23 development cycle bugs. We are very sorry. It will help us with cleanup during Fedora 23 End Of Life. Thank you.) More information and reason for this action is here: https://fedoraproject.org/wiki/BugZappers/HouseKeeping/Fedora23
I have also been experiencing this bug, and am unfortunately unable to run the test suites on my packages for i686. I see this issue on Rawhide.
OK, now I've got the same issue both in F-23 and in Rawhide. The latest failure is here: http://koji.fedoraproject.org/koji/taskinfo?taskID=12592117 Surprisingly, I can see it only in Koji. If I run the build manually (with rpmbuild), everything is fine.
Unfortunately this reproducer doesn't reproduce the issue on my machine. Everything is fine:

[petro@fedora32i686 segfault-i686]$ make run
erl -pa ./lib -s my_testsuite test
Erlang/OTP 18 [erts-7.2.1] [source] [async-threads:10] [hipe] [kernel-poll:false]

Eshell V7.2.1 (abort with ^G)
1> All 3 tests passed.
[petro@fedora32i686 segfault-i686]$
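Since the crash shows up on some machines and not on others, it can help to run the failing command many times and count how often it dies from SIGSEGV. Below is a minimal shell sketch of that idea. `count_segfaults` is a hypothetical helper, not part of the reproducer tarball. It relies on the shell convention that a process killed by signal 11 is reported as exit status 139 (128 + 11). Because `make run` exits with status 2 when its recipe fails, the erl command from the recipe is run directly.

```shell
#!/bin/sh
# count_segfaults N CMD [ARGS...]
# Hypothetical helper: run CMD N times and print how many of the runs
# were killed by SIGSEGV (shell exit status 139 = 128 + signal 11).
count_segfaults() {
    runs=$1
    shift
    crashes=0
    i=0
    while [ "$i" -lt "$runs" ]; do
        # Discard the command's output; only the exit status matters here.
        "$@" >/dev/null 2>&1
        if [ "$?" -eq 139 ]; then
            crashes=$((crashes + 1))
        fi
        i=$((i + 1))
    done
    echo "$crashes"
}

# Example (run inside the unpacked segfault-i686 directory):
#   count_segfaults 20 erl -pa ./lib -s my_testsuite test
```

A result of 0 on one machine and a non-zero count on another would confirm that the difference is in the environment rather than in the test itself.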
(In reply to Pavel Raiskup from comment #0)
> We observe autoconf FTBFS on rawhide (testsuite failures). One of the
> testsuite failures is related to Erlang & autoconf, but it appears only on
> i686. I tried to cut related testcase out into segfault-i686.tar.gz
> reproducer:
>
> $ tar -xf segfault-i686.tar.gz
> $ cd segfault-i686
> $ make && make run
> erlc -b beam my_testsuite.erl
> cd lib && ./compile
> erl -pa ./lib -s my_testsuite test
> Erlang/OTP 17 [erts-6.3] [source] [smp:4:4] [async-threads:10] [hipe]
> [kernel-poll:false]
>
> Eshell V6.3 (abort with ^G)
> 1> All 3 tests passed.
> Makefile:6: recipe for target 'run' failed
> make: *** [run] Segmentation fault (core dumped)
>
> The segfault ^^ breaks autoconf testsuite, but I'm not able to diagnose
> properly. Any help appreciated, let me know if you need some other info.
>
> FTBFS:
> https://kojipkgs.fedoraproject.org//work/tasks/7404/10217404/build.log
>
> Pavel

Pavel, I've just checked - the issue is still there. Unfortunately I can't reproduce it on my hardware (KVM VM) - only in Fedora Koji. Do you have access to a machine where it's possible to reproduce the issue? I really have no clue what's going on there.
I'm not able to reproduce this now.
Hello Peter! Is it possible that the recent update to Erlang 18 fixed this issue? P.S. Now we really have to get ejabberd updated, as it doesn't seem to work with Erlang 18 ☺ If you have some time, jcline and I have a few package review requests waiting. We CAN review each other's if necessary, but we'd rather that someone with more Erlang experience than we have review them if you or anyone else has the time. Oh, if we only had more time, right?
(In reply to Randy Barlow from comment #8)
> Hello Peter!
>
> Is it possible that the recent update to Erlang 18 fixed this issue?

Randy, it's certainly not fixed yet. And I'm afraid this issue has something to do with the Koji build system itself (hardware + software + configuration) rather than with Erlang. I'm still trying to find an Erlang-related cause, but I have failed to reproduce it anywhere (with Erlang on a native i686 Rawhide, with mock builds for Rawhide on RHEL6/RHEL7) on the machines available to me. The only place where I can reproduce this issue 100% of the time is the Fedora Koji build system. This makes me very suspicious.
Hi Peter! Interesting, working on problems that are hard to reproduce is tricky. I am sad to say that I am out of ideas. If you can think of a way I can assist, I am happy to!
Filip, you mentioned in bug 1221824#c20 that you have a reproducer. Could you please run it again with strace or gdb attached? We really need your help here. :)
Hi Peter, sure, will do in the evening, when I get to my fedora box. f.
Hi,

I have been commenting on the other issue (https://bugzilla.redhat.com/show_bug.cgi?id=1221824), sorry :-) Copying the most important parts here:

* strace -- useless, the VM crashes in userspace (https://bugzilla.redhat.com/attachment.cgi?id=1116279)

* gdb stacktrace

(gdb) bt
#0  0x56798d70 in ethr_dw_atomic_cmpxchg () at ../include/internal/i386/atomic.h:177
#1  0x566103ce in ethr_dw_atomic_cmpxchg_nob (xchg=0xf4e0609c, new=0xf4e060a4, var=0x568688f0 <erts_proc+48>) at beam/erl_threads.h:1456
#2  erts_atomic64_inc_read_nob (var=0x568688f0 <erts_proc+48>) at beam/erl_threads.h:1646
#3  step_interval_nob (icp=0x568688f0 <erts_proc+48>) at beam/utils.c:4954
#4  erts_smp_step_interval_nob (icp=icp@entry=0x568688f0 <erts_proc+48>) at beam/utils.c:5004
#5  0x5671572b in ptab_list_bif_engine (c_p=c_p@entry=0xf6dc0218, res_accp=res_accp@entry=0xf4e06178, mbp=mbp@entry=0xf1f80a88) at beam/erl_ptab.c:927
#6  0x56716a5d in erts_ptab_list (c_p=c_p@entry=0xf6dc0218, ptab=0x568688c0 <erts_proc>) at beam/erl_ptab.c:766
#7  0x5661be76 in processes_0 (A__p=0xf6dc0218, BIF__ARGS=0xf7483100) at beam/bif.c:3841
#8  0x5659978b in process_main () at beam/beam_emu.c:3690
#9  0x56638784 in sched_thread_func (vesdp=0xf6087dc0) at beam/erl_process.c:8021
#10 0x567a19cc in thr_wrapper (vtwd=0xffffd1b4) at pthread/ethread.c:114
#11 0xf7f164be in start_thread (arg=0xf4e06b40) at pthread_create.c:333
#12 0xf7e2a3fe in clone () at ../sysdeps/unix/sysv/linux/i386/clone.S:114

* the problem seems to be triggered by the i686 build using the -mtune=atom flag. I tried the following change and the resulting binary doesn't have the same problem:

%ifarch %{ix86}
%global optflags -mtune=generic
%endif

Build: http://koji.fedoraproject.org/koji/taskinfo?taskID=12621253

Now the erlang:processes() command executes successfully:

$ mock -r fedora-rawhide-i386 --no-clean --shell
INFO: mock.py version 1.2.14 starting (python version = 3.4.2)...
Start: init plugins
INFO: selinux enabled
Finish: init plugins
Start: run
Start: chroot init
INFO: calling preinit hooks
INFO: enabled root cache
INFO: enabled dnf cache
Start: cleaning dnf metadata
Finish: cleaning dnf metadata
INFO: enabled ccache
Finish: chroot init
Start: shell
<mock-chroot>sh-4.3# erl
Erlang/OTP 18 [erts-7.2.1] [source] [smp:4:4] [async-threads:10] [hipe] [kernel-poll:false]

Eshell V7.2.1 (abort with ^G)
1> erlang:processes().
[<0.0.0>,<0.3.0>,<0.6.0>,<0.7.0>,<0.9.0>,<0.10.0>,<0.11.0>,
 <0.12.0>,<0.14.0>,<0.15.0>,<0.16.0>,<0.17.0>,<0.18.0>,
 <0.20.0>,<0.21.0>,<0.22.0>,<0.23.0>,<0.24.0>,<0.25.0>,
 <0.26.0>,<0.27.0>,<0.28.0>,<0.29.0>,<0.30.0>,<0.34.0>]
2>

Summary: There seems to be an error in the fallback implementation of ethr_dw_atomic_cmpxchg. I'm not sure whether these binaries would run on an Atom processor though (and I don't have the means to test it). I guess I could ask on the erlang-bugs mailing list, but I will leave it to you to decide whether building for a generic processor (instead of Atom) is a viable workaround.
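To check whether a particular Koji build was compiled with -mtune=atom or with a different tuning flag, the flags can be pulled out of a downloaded build log. A minimal sketch follows. `mtune_values` is a hypothetical helper name, build.log stands for wherever the Koji log was saved, and `grep -o` is assumed to be available (it is in GNU and BSD grep, but it is not required by POSIX).

```shell
#!/bin/sh
# mtune_values LOGFILE
# Hypothetical helper: print each distinct -mtune=... value that appears
# in a build log, one per line, sorted.
mtune_values() {
    # -e is needed because the pattern itself starts with a dash.
    grep -o -e '-mtune=[A-Za-z0-9_.-]*' "$1" | sort -u
}

# Example:
#   mtune_values build.log
# The default flags rpm would use on the current machine can be shown with:
#   rpm --eval '%{optflags}'
```

If the crashing build lists -mtune=atom and the working one lists only -mtune=generic, that supports the tuning flag being the trigger.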
Found a way to see the actual stacktrace. Run erl in GDB as shown above. When you get a SIGSEGV, the stack will be corrupted. First we need to recover it by adding/removing values to/from the $esp register (stack pointer). I believe those who know Intel assembly already know which values to try first. I tried stepping by 4 in each direction until I realized that I had to add 32. So, please, do:

(gdb) set $pc = *(void **)$esp
(gdb) set $esp = $esp + 32
(gdb) bt
#0  0x568688f0 in erts_proc ()
#1  0x566103ce in ethr_dw_atomic_cmpxchg_nob (xchg=0xf461609c, new=0xf46160a4, var=0x568688f0 <erts_proc+48>) at beam/erl_threads.h:1456
#2  erts_atomic64_inc_read_nob (var=0x568688f0 <erts_proc+48>) at beam/erl_threads.h:1646
#3  step_interval_nob (icp=0x568688f0 <erts_proc+48>) at beam/utils.c:4954
#4  erts_smp_step_interval_nob (icp=icp@entry=0x568688f0 <erts_proc+48>) at beam/utils.c:5004
#5  0x5671572b in ptab_list_bif_engine (c_p=c_p@entry=0xf6d80218, res_accp=res_accp@entry=0xf4616178, mbp=mbp@entry=0xf1f816a0) at beam/erl_ptab.c:927
#6  0x56716a5d in erts_ptab_list (c_p=c_p@entry=0xf6d80218, ptab=0x568688c0 <erts_proc>) at beam/erl_ptab.c:766
#7  0x5661be76 in processes_0 (A__p=0xf6d80218, BIF__ARGS=0xf74861c0) at beam/bif.c:3841
#8  0x5659978b in process_main () at beam/beam_emu.c:3690
#9  0x56638784 in sched_thread_func (vesdp=0xf608e000) at beam/erl_process.c:8021
#10 0x567a19cc in thr_wrapper (vtwd=0xffffd184) at pthread/ethread.c:114
#11 0xf7f184be in start_thread (arg=0xf4616b40) at pthread_create.c:333
#12 0xf7e2c3fe in clone () at ../sysdeps/unix/sysv/linux/i386/clone.S:114
(gdb)

See - a nice, clean stacktrace! erts_proc in frame #0 is a bogus value. The stack gets corrupted after calling ethr_dw_atomic_cmpxchg_nob. That's all I've got for today.
Possible workaround: https://github.com/erlang/otp/commit/fd7fa46
Fixed in Rawhide already. Will do builds for (both affected) F22 and F23 shortly.
erlang-17.4-6.fc23 has been submitted as an update to Fedora 23. https://bodhi.fedoraproject.org/updates/FEDORA-2016-a79a47efb0
erlang-17.4-6.fc22 has been submitted as an update to Fedora 22. https://bodhi.fedoraproject.org/updates/FEDORA-2016-18e2827992
Peter, thanks so much for looking into this difficult issue. You are the man!
erlang-17.4-6.fc22 has been pushed to the Fedora 22 testing repository. If problems still persist, please make note of it in this bug report. See https://fedoraproject.org/wiki/QA:Updates_Testing for instructions on how to install test updates. You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2016-18e2827992
erlang-17.4-6.fc23 has been pushed to the Fedora 23 testing repository. If problems still persist, please make note of it in this bug report. See https://fedoraproject.org/wiki/QA:Updates_Testing for instructions on how to install test updates. You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2016-a79a47efb0
erlang-17.4-6.fc22 has been pushed to the Fedora 22 stable repository. If problems still persist, please make note of it in this bug report.
erlang-17.4-6.fc23 has been pushed to the Fedora 23 stable repository. If problems still persist, please make note of it in this bug report.