Bug 803698

Summary: FTBFS: ruby segmentation faults during self check on PPC
Product: [Fedora] Fedora
Reporter: Karsten Hopp <karsten>
Component: ruby
Assignee: Jeroen van Meeuwen <vanmeeuwen+fedora>
Status: CLOSED ERRATA
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: high
Priority: high
Version: 17
CC: jeremy, mmorsi, mtasaka, pknirsch, tagoh, vanmeeuwen+fedora, vondruch
Target Milestone: ---   
Target Release: ---   
Hardware: powerpc   
OS: Linux   
URL: http://bugs.ruby-lang.org/issues/6344
Doc Type: Bug Fix
Last Closed: 2013-01-20 16:14:14 UTC
Attachments:
Description                          Flags
build log                            none
Trial patch to avoid segv on ppc     none

Description Karsten Hopp 2012-03-15 12:54:49 UTC
Created attachment 570285 [details]
build log

Description of problem:
ruby-1.9.3.0-7.fc17 fails to build on PPC. It aborts with a segmentation fault during the self checks:
test_massign.rb ................................(eval):2: [BUG] Segmentation fault
ruby 1.9.3p0 (2011-10-30) [powerpc-linux]
-- Control frame information -----------------------------------------------
c:0007 p:---- s:0020 b:0020 l:000019 d:000019 CFUNC  :resume
c:0006 p:0039 s:0017 b:0017 l:000b14 d:0024d4 EVAL   (eval):2
c:0005 p:---- s:0015 b:0015 l:000014 d:000014 FINISH
c:0004 p:---- s:0013 b:0013 l:000012 d:000012 CFUNC  :eval
c:0003 p:0053 s:0009 b:0008 l:000b14 d:0000cc EVAL   bootstraptest.tmp.rb:9
c:0002 p:---- s:0004 b:0004 l:000003 d:000003 FINISH
c:0001 p:0000 s:0002 b:0002 l:000b14 d:000b14 TOP   
-- Ruby level backtrace information ----------------------------------------
bootstraptest.tmp.rb:9:in `<main>'
bootstraptest.tmp.rb:9:in `eval'
(eval):2:in `<main>'
(eval):2:in `resume'

Version-Release number of selected component (if applicable):
ruby-1.9.3.0-7.fc17

How reproducible:
always

Steps to Reproduce:
1. ppc-koji build --scratch f17 ruby-1.9.3.0-7.fc17
  
Actual results:
http://ppc.koji.fedoraproject.org/koji/taskinfo?taskID=438663

Expected results:


Additional info:
build log attached so it doesn't vanish after the next koji cleanup

Comment 1 Vít Ondruch 2012-04-02 10:21:39 UTC
Hm, interesting: the error seems to be exactly the same as [1], but I am afraid we will need to find a different solution.


[1] http://bugs.ruby-lang.org/issues/5122

Comment 2 Vít Ondruch 2012-04-02 10:24:04 UTC
Maybe we will need a newer point release, since [1], which is the parent issue of the one mentioned above, was closed just a month ago.


[1] http://bugs.ruby-lang.org/issues/5076

Comment 3 Karsten Hopp 2012-04-09 14:07:14 UTC
still happens with ruby-1.9.3.125-2.fc18

Comment 4 Mamoru TASAKA 2012-04-20 01:58:24 UTC
Actually 100% reproducible on a ppc64 machine which I can access.

Backtrace:
[tasaka@localhost ruby-1.9.3-p125]$ cat rubydev-32581.rb 
  a,s=[],"aaa"
  300.times { a<<s; s=s.succ }
  eval <<-END__
  GC.stress=true
  Fiber.new do
    #{ a.join(",") },*zzz=1
  end.resume
  END__
[tasaka@localhost ruby-1.9.3-p125]$ gdb ./miniruby
...
...
This GDB was configured as "ppc64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /home/tasaka/rpmbuild/BUILD/ruby-1.9.3-p125/miniruby...done.
(gdb) run  -Ilib -I. --disable-gems ./rubydev-32581.rb 
Starting program: /home/tasaka/rpmbuild/BUILD/ruby-1.9.3-p125/miniruby -Ilib -I. --disable-gems ./rubydev-32581.rb
[Thread debugging using libthread_db enabled]
[New Thread 0xfffb1dff200 (LWP 1759)]

Program received signal SIGSEGV, Segmentation fault.
0x00000080db556b20 in .__makecontext () from /lib64/libc.so.6
Missing separate debuginfos, use: debuginfo-install nss-softokn-freebl-3.12.9-3.el7.ppc64
(gdb) thread apply all bt

Thread 2 (Thread 0xfffb1dff200 (LWP 1759)):
#0  0x00000080db5fe054 in .__select () from /lib64/libc.so.6
#1  0x000000001018911c in thread_timer (p=0x10280f68) at thread_pthread.c:1155
#2  0x00000080db70b330 in start_thread (arg=0xfffb1dff200) at pthread_create.c:299
#3  0x00000080db6076ec in .__clone () from /lib64/libc.so.6

Thread 1 (Thread 0x80db4d7010 (LWP 1756)):
#0  0x00000080db556b20 in .__makecontext () from /lib64/libc.so.6
#1  0x000000001018ff80 in fiber_initialize_machine_stack_context (fib=0x103ab970, size=65536) at cont.c:606
#2  0x0000000010190094 in fiber_setcontext (newfib=0x103ab970, oldfib=0x103ac940) at cont.c:623
#3  0x0000000010190214 in fiber_store (next_fib=0x103ab970) at cont.c:1234
#4  0x00000000101903f8 in fiber_switch (fibval=271105960, argc=<value optimized out>, argv=0xfffb1e00098) at cont.c:1319
#5  rb_fiber_resume (fibval=271105960, argc=<value optimized out>, argv=0xfffb1e00098) at cont.c:1347
#6  0x00000000101905e4 in rb_fiber_m_resume (argc=<value optimized out>, argv=<value optimized out>, fib=<value optimized out>) at cont.c:1404
#7  0x000000001016c754 in call_cfunc (func=@0x10255a90: 0x101905c0 <rb_fiber_m_resume>, recv=271105960, len=<value optimized out>, argc=<value optimized out>, 
    argv=<value optimized out>) at vm_insnhelper.c:326
#8  0x0000000010171c74 in vm_call_cfunc (th=0x10281560, cfp=0xfffb1effe00, num=<value optimized out>, blockptr=<value optimized out>, flag=0, 
    id=<value optimized out>, me=0x1039b8f0, recv=271105960) at vm_insnhelper.c:404
#9  vm_call_method (th=0x10281560, cfp=0xfffb1effe00, num=<value optimized out>, blockptr=<value optimized out>, flag=0, id=<value optimized out>, me=0x1039b8f0, 
    recv=271105960) at vm_insnhelper.c:534
#10 0x00000000101734f4 in vm_exec_core (th=0x10281560, initial=<value optimized out>) at insns.def:1015
#11 0x0000000010178da8 in vm_exec (th=0x10281560) at vm.c:1220
#12 0x0000000010179480 in eval_string_with_cref (self=271477440, src=271373360, scope=4, cref=0x0, file=0x101b7fd8 "(eval)", line=1) at vm_eval.c:1050
#13 0x0000000010179b20 in eval_string (argc=<value optimized out>, argv=<value optimized out>, self=271477440) at vm_eval.c:1091
#14 rb_f_eval (argc=<value optimized out>, argv=<value optimized out>, self=271477440) at vm_eval.c:1139
#15 0x000000001016c754 in call_cfunc (func=@0x10254660: 0x101799a0 <rb_f_eval>, recv=271477440, len=<value optimized out>, argc=<value optimized out>, 
    argv=<value optimized out>) at vm_insnhelper.c:326
#16 0x0000000010171c74 in vm_call_cfunc (th=0x10281560, cfp=0xfffb1efff08, num=<value optimized out>, blockptr=<value optimized out>, flag=8, 
    id=<value optimized out>, me=0x1030f710, recv=271477440) at vm_insnhelper.c:404
#17 vm_call_method (th=0x10281560, cfp=0xfffb1efff08, num=<value optimized out>, blockptr=<value optimized out>, flag=8, id=<value optimized out>, me=0x1030f710, 
    recv=271477440) at vm_insnhelper.c:534
#18 0x00000000101734f4 in vm_exec_core (th=0x10281560, initial=<value optimized out>) at insns.def:1015
#19 0x0000000010178da8 in vm_exec (th=0x10281560) at vm.c:1220
#20 0x0000000010179078 in rb_iseq_eval_main (iseqval=271386440) at vm.c:1461
#21 0x00000000100559a8 in ruby_exec_internal (n=0x102d0748) at eval.c:204
#22 0x00000000100559f8 in ruby_exec_node (n=value has been optimized out
) at eval.c:251
#23 0x0000000010057650 in ruby_run_node (n=0x102d0748) at eval.c:244
#24 0x0000000010015664 in main (argc=5, argv=0xffffffff528) at main.c:38
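
For readers who want to see what the script above actually feeds to eval, here is a minimal sketch that only rebuilds the generated statement and runs a reduced, 3-target version of it inside a Fiber, without GC.stress. The helper names massign_source and run_small_massign are made up for this illustration and are not part of the Ruby test suite; the sketch is not expected to reproduce the crash by itself.

```ruby
# Rebuild the source text the reproducer generates: `count` plain
# assignment targets ("aaa", "aab", ...) followed by one splat target.
# Hypothetical helper, for illustration only.
def massign_source(count)
  names = []
  name = "aaa"
  count.times { names << name; name = name.succ }
  ["#{names.join(",")},*zzz=1", names]
end

# Run a reduced version of the statement inside a Fiber, as the
# reproducer does, but with few targets and without GC.stress.
# Hypothetical helper, for illustration only.
def run_small_massign(count)
  source, names = massign_source(count)
  eval(<<-END__)
    Fiber.new do
      #{source}
      [#{names.join(",")}, zzz]
    end.resume
  END__
end

source, names = massign_source(300)
puts "targets: #{names.size} (#{names.first} .. #{names.last}) plus *zzz"
puts "statement length: #{source.length} characters"
p run_small_massign(3)
```

The full reproducer differs from this sketch in that it evaluates all 300 targets with GC.stress=true, which is what triggers the fault on ppc/ppc64 here.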

Comment 5 Vít Ondruch 2012-04-20 10:38:08 UTC
p194 was released today; we will see if that helps. I hope to be able to submit an update soon (Monday at the latest, I hope).

Comment 6 Mamoru TASAKA 2012-04-22 13:23:40 UTC
(In reply to comment #5)
> p194 was released today; we will see if that helps. I hope to be able to
> submit an update soon (Monday at the latest, I hope).

Still no good.
http://ppc.koji.fedoraproject.org/koji/taskinfo?taskID=507117

Comment 7 Mamoru TASAKA 2012-04-22 13:30:43 UTC
Created attachment 579297 [details]
Trial patch to avoid segv on ppc

From the backtrace I posted in comment 4, I suspected
that the buffer for storing the context is too small (see man
makecontext). Increasing the buffer size "seems" to suppress
the segv.

http://ppc.koji.fedoraproject.org/koji/taskinfo?taskID=507128
http://koji.fedoraproject.org/koji/taskinfo?taskID=4013190

... with the following diff to the spec file:
--- a/ruby.spec
+++ b/ruby.spec
@@ -56,7 +56,7 @@ Version: %{ruby_version_patch_level}
 # we cannot reset the release number to 1 even when the main (ruby) version
 # is updated - because it may be that the versions of sub-components don't
 # change.
-Release: 10.1%{?dist}
+Release: 10.900%{?dist}
 Group: Development/Languages
 License: Ruby or BSD
 URL: http://ruby-lang.org/
@@ -90,6 +90,8 @@ Patch8: ruby-1.9.3-custom-rubygems-location.patch
 Patch9: rubygems-1.8.11-binary-extensions.patch
 # Make mkmf verbose by default
 Patch12: ruby-1.9.3-mkmf-verbose.patch
+# Fix ppc test suite failure
+Patch13: ruby-1.9.3-p125-fiber-stack-size.patch
 
 Requires: %{name}-libs%{?_isa} = %{version}-%{release}
 Requires: ruby(rubygems) >= %{rubygems_version}
@@ -316,6 +318,7 @@ Tcl/Tk interface for the object-oriented scripting language Ruby.
 %patch8 -p1
 %patch9 -p1
 %patch12 -p1
+%patch13 -p1
 
 %build
 autoconf
@@ -435,10 +438,17 @@ sed -i '2 a\
 # https://bugzilla.redhat.com/show_bug.cgi?id=789410
 # https://bugs.ruby-lang.org/issues/6011
 # same for ppc(64), RH bugzilla #803698
-%ifnarch %{arm} ppc ppc64
+%ifnarch %{arm}
 # OpenSSL 1.0.1 is breaking the drb test suite.
 # https://bugs.ruby-lang.org/issues/6221
-make check TESTS="-v -x test_drbssl.rb"
+SKIPOPTS="-x test_drbssl.rb"
+# On ppc koji server, webrick/test_cgi.rb fails with Internal Server Error.
+# Also webrick/test_filehandler.rb returns 500 instead of 200.
+# No problem with local build, perhaps with network issue
+%ifarch ppc ppc64
+SKIPOPTS="$SKIPOPTS -x test_cgi.rb -x test_filehandler.rb"
+%endif
+make check TESTS="-v $SKIPOPTS"
 %endif
 
 %post libs -p /sbin/ldconfig

Comment 8 Vít Ondruch 2012-04-23 13:37:09 UTC
I can confirm that the tests pass with the following patch from mtasaka:

--- ruby-1.9.3-p125/cont.c.debug	2012-02-16 07:34:48.000000000 +0900
+++ ruby-1.9.3-p125/cont.c	2012-04-20 02:36:15.245278117 +0900
@@ -47,7 +47,7 @@
 #define RB_PAGE_SIZE (pagesize)
 #define RB_PAGE_MASK (~(RB_PAGE_SIZE - 1))
 static long pagesize;
-#define FIBER_MACHINE_STACK_ALLOCATION_SIZE  (0x10000)
+#define FIBER_MACHINE_STACK_ALLOCATION_SIZE  (0x20000)
 #endif
 
 #define CAPTURE_JUST_VALID_VM_STACK 1
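
As a quick sanity check on the numbers in this patch (a sketch added for illustration, not part of the patch itself): 0x10000 is 65536 bytes, which matches the size=65536 argument in frame #1 of the backtrace in comment 4, and 0x20000 doubles that. The constant and helper names below are made up for the example.

```ruby
# Hypothetical names; the values are the old and new
# FIBER_MACHINE_STACK_ALLOCATION_SIZE from the diff above.
OLD_FIBER_STACK_SIZE = 0x10000
NEW_FIBER_STACK_SIZE = 0x20000

# Convert a byte count to whole KiB.
def kib(bytes)
  bytes / 1024
end

puts "old: #{OLD_FIBER_STACK_SIZE} bytes (#{kib(OLD_FIBER_STACK_SIZE)} KiB)"
puts "new: #{NEW_FIBER_STACK_SIZE} bytes (#{kib(NEW_FIBER_STACK_SIZE)} KiB)"
puts "growth factor: #{NEW_FIBER_STACK_SIZE / OLD_FIBER_STACK_SIZE}"
```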


Mamoru, will you report it upstream or should I?

BTW, the same error hits the latest upstream ruby 2.0.0dev (2012-04-23 trunk 35432) [x86_64-linux], and the same medicine helps.

Comment 9 Mamoru TASAKA 2012-04-23 14:29:17 UTC
Reported upstream.
http://bugs.ruby-lang.org/issues/6344

Comment 10 Phil Knirsch 2013-01-20 16:14:14 UTC
We got an F17 updates build of ruby that builds fine on ppc64, so closing this bug as ERRATA.

Thanks & regards, Phil