Description of problem:

Here's some info from Jim Meyering [1]:

# gdb miniruby
Reading symbols from /root/Meyering/RHEL-6/ruby-1.8.7.299/ruby-1.8.7-p299/miniruby...done.
(gdb) r -I./lib -I.ext/powerpc64-linux server.rb
Starting program: /root/Meyering/RHEL-6/ruby-1.8.7.299/ruby-1.8.7-p299/miniruby -I./lib -I.ext/powerpc64-linux server.rb
[Thread debugging using libthread_db enabled]
[2010-08-27 10:42:29] INFO WEBrick 1.3.1
[2010-08-27 10:42:29] INFO ruby 1.8.7 (2010-06-23) [powerpc64-linux]
[2010-08-27 10:42:29] WARN TCPServer Error: Address already in use - bind(2)
[2010-08-27 10:42:29] INFO WEBrick::HTTPServer#start: pid=4574 port=2000
^C
Program received signal SIGINT, Interrupt.
0x00000080affcf428 in ___newselect_nocancel () from /lib64/libc.so.6
Missing separate debuginfos, use: debuginfo-install glibc-2.12-1.6.el6.ppc64 libgcc-4.4.4-13.el6.ppc64 nss-softokn-freebl-3.12.4-19.el6.ppc64 openssl-1.0.0-4.el6.ppc64 zlib-1.2.3-25.el6.ppc64
(gdb) watch curr_thread
No symbol "curr_thread" in current context.
(gdb) c
Continuing.
curr_thread: 0x0x10198a70
[New Thread 0xfffb7bbf200 (LWP 4578)]
curr_thread: 0x0x10198a70
curr_thread: 0x0x10180a30

Program received signal SIGSEGV, Segmentation fault.
0x0000000010037fe0 in rb_thread_start_0 (
    fn=@0x10145780: 0x10038610 <rb_thread_yield>, arg=0xfffb7cfbfb8, th=0x10180a30) at eval.c:12444
12444       if (THREAD_SAVE_CONTEXT(curr_thread)) {
(gdb) l
12439       if (OBJ_FROZEN(curr_thread->thgroup)) {
12440           rb_raise(rb_eThreadError,
12441                    "can't start a new thread (frozen ThreadGroup)");
12442       }
12443
12444       if (THREAD_SAVE_CONTEXT(curr_thread)) {
12445           return thread;
12446       }
12447
12448       if (ruby_block) {           /* should nail down higher blocks */
(gdb) p curr_thread
$1 = (rb_thread_t) 0x10198a70
(gdb) p *curr_thread
$2 = {next = 0x10180a30, prev = 0x10180a30, context = {{context = {
        uc_flags = 0, uc_link = 0x0, uc_stack = {ss_sp = 0x0, ss_flags = 0, ss_size = 0},
        uc_sigmask = {__val = {0 <repeats 16 times>}}, uc_mcontext = {__unused = {0, 0, 0, 0},
        signal = 0, __pad0 = 0, handler = 0, oldmask = 0, regs = 0x10198b68,
        gp_regs = {270109312, 17592185962496, 552709206616, 0, 17592185962288, 77840,
          268447856, 1, 0, 269972664, 270109296, 552707714176, 17592185962288,
          552707220368, 0 <repeats 13 times>, 552707160960, 552707168448, 552707160768,
          17592185977744, 17592185962496, 268664796, 0, 0, 552707713448, 268664796, 0,
          603980834, 0, 0, 0, 0, 0, 0, 0, 0, 0},
        fp_regs = {0 <repeats 11 times>, 2.9841565008811291e-319, 4.9406564584124654e-324,
          2, 0 <repeats 18 times>, -nan(0x8000082002000)},
        v_regs = 0x10198e00, vmx_reserve = {0, 0, 65536, 0 <repeats 19 times>, 65536, -65064,
          0 <repeats 42 times>, 65536, -4294967296, 0}}}, status = 1}},
  result = 0, stk_len = 9730, stk_max = 9730, stk_ptr = 0x103944d0, stk_pos = 0xffffffebf30,
  frame = 0xffffffec9b0, scope = 0xfffb7cfc2d8, dyna_vars = 0x0, block = 0xffffffee640,
  iter = 0xffffffec980, tag = 0xffffffeec28, klass = 17590977005440, wrapper = 0,
  cref = 0xfffb7ef85c8, flags = 0, node = 0xfffb7f2ec90, tracing = 0, errinfo = 4,
  last_status = 4, last_line = 0, last_match = 4, safe = 0, status = THREAD_RUNNABLE,
  wait_for = 0, fd = 0, readfds = {fds_bits = {0 <repeats 16 times>}},
  writefds = {fds_bits = {0 <repeats 16 times>}}, exceptfds = {fds_bits = {0 <repeats 16 times>}},
  select_value = 0, delay = 0, join = 0x0, abort = 0, priority = 0, thgroup = 17590977856400,
  locals = 0x10229b30, thread = 17590977856360, sandbox = 4}
(gdb) bt
#0 0x0000000010037fe0 in rb_thread_start_0 (
    fn=@0x10145780: 0x10038610 <rb_thread_yield>, arg=0xfffb7cfbfb8, th=0x10180a30) at eval.c:12444
#1 0x00000000100389e4 in rb_thread_start (klass=17590977860680, args=17590974922680) at eval.c:12652
#2 0x00000000100224c4 in call_cfunc (
    func=@0x101457b0: 0x10038980 <rb_thread_start>, recv=17590977860680, len=-2, argc=0, argv=0x0) at eval.c:5775
#3 0x0000000010023974 in rb_call0 (klass=17590977860640, recv=17590977860680, id=5313, oid=5313, argc=0, argv=0x0, body=0xfffb7fc9330, flags=0) at eval.c:5928
#4 0x0000000010024d48 in rb_call (klass=17590977860640, recv=17590977860680, mid=5313, argc=0, argv=0x0, scope=0, self=17590974950640) at eval.c:6176
#5 0x00000000100198dc in rb_eval (self=17590974950640, n=0xfffb7f2ec90) at eval.c:3506
#6 0x00000000100182d0 in rb_eval (self=17590974950640, n=0xfffb7f2ed08) at eval.c:3236
#7 0x0000000010024520 in rb_call0 (klass=17590977005440, recv=17590974950640, id=16105, oid=16105, argc=0, argv=0xffffffefb98, body=0xfffb7f2ed08, flags=2) at eval.c:6079
#8 0x0000000010024d48 in rb_call (klass=17590977005440, recv=17590974950640, mid=16105, argc=1, argv=0xffffffefb90, scope=1, self=17590974950640) at eval.c:6176
#9 0x0000000010019bf8 in rb_eval (self=17590974950640, n=0xfffb7f7bec8) at eval.c:3521
#10 0x000000001002e5cc in block_pass (self=17590974950640, node=0xfffb7f7c030) at eval.c:9173
#11 0x000000001001800c in rb_eval (self=17590974950640, n=0xfffb7f7c030) at eval.c:3222
#12 0x000000001001addc in rb_eval (self=17590974950640, n=0xfffb7f7a690) at eval.c:3701
#13 0x000000001001f664 in rb_yield_0 (val=17590974942920, self=17590974950640, klass=0, flags=0, avalue=0) at eval.c:5095
#14 0x000000001001fc40 in rb_yield (val=17590974942920) at eval.c:5179
#15 0x00000000100f4dcc in rb_ary_each (ary=17590974925680) at array.c:1261
#16 0x0000000010022544 in call_cfunc (
    func=@0x1014c8c0: 0x100f4d38 <rb_ary_each>, recv=17590974925680, len=0, argc=0, argv=0x0) at eval.c:5781
#17 0x0000000010023974 in rb_call0 (klass=17590977829840, recv=17590974925680, id=4001, oid=4001, argc=0, argv=0x0, body=0xfffb7fc1388, flags=0) at eval.c:5928
#18 0x0000000010024d48 in rb_call (klass=17590977829840, recv=17590974925680, mid=4001, argc=0, argv=0x0, scope=0, self=17590974950640) at eval.c:6176
#19 0x00000000100198dc in rb_eval (self=17590974950640, n=0xfffb7f7c558) at eval.c:3506
#20 0x00000000100182d0 in rb_eval (self=17590974950640, n=0xfffb7f7a190) at eval.c:3236
#21 0x0000000010018998 in rb_eval (self=17590974950640, n=0xfffb7f77ad0) at eval.c:3322
#22 0x0000000010017c78 in rb_eval (self=17590974950640, n=0xfffb7f7d7c8) at eval.c:3160
#23 0x000000001001f664 in rb_yield_0 (val=6, self=17590974950640, klass=0, flags=0, avalue=0) at eval.c:5095
#24 0x0000000010018898 in rb_eval (self=17590977006800, n=0xfffb7f8ab80) at eval.c:3311
#25 0x0000000010024520 in rb_call0 (klass=17590977006280, recv=17590977006800, id=5313, oid=5313, argc=0, argv=0x0, body=0xfffb7f8ab80, flags=0) at eval.c:6079
#26 0x0000000010024d48 in rb_call (klass=17590977006280, recv=17590977006800, mid=5313, argc=0, argv=0x0, scope=0, self=17590974950640) at eval.c:6176
#27 0x00000000100198dc in rb_eval (self=17590974950640, n=0xfffb7f7e5d8) at eval.c:3506
#28 0x00000000100182d0 in rb_eval (self=17590974950640, n=0xfffb7f7f528) at eval.c:3236
#29 0x0000000010024520 in rb_call0 (klass=17590977005440, recv=17590974950640, id=5313, oid=5313, argc=0, argv=0x0, body=0xfffb7f7f528, flags=0) at eval.c:6079
#30 0x0000000010024d48 in rb_call (klass=17590977005440, recv=17590974950640, mid=5313, argc=0, argv=0x0, scope=0, self=17590977884840) at eval.c:6176
#31 0x00000000100198dc in rb_eval (self=17590977884840, n=0xfffb7fada40) at eval.c:3506
#32 0x0000000010012220 in eval_node (self=17590977884840, node=0xfffb7fada40) at eval.c:1449
#33 0x0000000010012d5c in ruby_exec_internal () at eval.c:1654
#34 0x0000000010012e18 in ruby_exec () at eval.c:1674
#35 0x0000000010012e7c in ruby_run () at eval.c:1684
#36 0x000000001000ed88 in main (argc=4, argv=0xffffffff3a8, envp=0xffffffff3d0) at main.c:48

==============================================
Valgrind corroborates:

# valgrind ./miniruby -I./lib -I.ext/powerpc64-linux server.rb
...
... many use-uninit and "Conditional jump or move depends on uninitialised value(s)" ...
==4874== Use of uninitialised value of size 8
==4874==    at 0x10037FE0: rb_thread_start_0 (eval.c:12444)
==4874==    by 0x100389E3: rb_thread_start (eval.c:12652)
==4874==    by 0x100224C3: call_cfunc (eval.c:5775)
==4874==    by 0x10023973: rb_call0 (eval.c:5928)
==4874==    by 0x10024D47: rb_call (eval.c:6176)
==4874==    by 0x100198DB: rb_eval (eval.c:3506)
==4874==    by 0x100182CF: rb_eval (eval.c:3236)
==4874==    by 0x1002451F: rb_call0 (eval.c:6079)
==4874==    by 0x10024D47: rb_call (eval.c:6176)
==4874==    by 0x10019BF7: rb_eval (eval.c:3521)
==4874==    by 0x1002E5CB: block_pass (eval.c:9173)
==4874==    by 0x1001800B: rb_eval (eval.c:3222)
==4874==
==4874== Invalid read of size 8
==4874==    at 0x10037FE0: rb_thread_start_0 (eval.c:12444)
==4874==    by 0x100389E3: rb_thread_start (eval.c:12652)
==4874==    by 0x100224C3: call_cfunc (eval.c:5775)
==4874==    by 0x10023973: rb_call0 (eval.c:5928)
==4874==    by 0x10024D47: rb_call (eval.c:6176)
==4874==    by 0x100198DB: rb_eval (eval.c:3506)
==4874==    by 0x100182CF: rb_eval (eval.c:3236)
==4874==    by 0x1002451F: rb_call0 (eval.c:6079)
==4874==    by 0x10024D47: rb_call (eval.c:6176)
==4874==    by 0x10019BF7: rb_eval (eval.c:3521)
==4874==    by 0x1002E5CB: block_pass (eval.c:9173)
==4874==    by 0x1001800B: rb_eval (eval.c:3222)
==4874==  Address 0xffffffffffff97a5 is not stack'd, malloc'd or (recently) free'd
==4874==
./lib/webrick/server.rb:162: [BUG] Segmentation fault
ruby 1.8.7 (2010-06-23 patchlevel 299) [powerpc64-linux]

The above makes it look like there is a stack-relative reference with 0 as the stack pointer.

This is not specific to webrick. Running ruby's "make check" rule fails immediately on PPC:

root@auto-ppcp-002# , check
./miniruby -I./lib -I.ext/common -I./- -r./ext/purelib.rb ./runruby.rb --extout=.ext -- "./test/runner.rb" --basedir="./test" --runner=console
test succeeded
/root/Meyering/RHEL-6/ruby-1.8.7.299/ruby-1.8.7-p299/lib/rinda/tuplespace.rb:618: [BUG] Segmentation fault
ruby 1.8.7 (2010-06-23 patchlevel 299) [powerpc64-linux]
make: *** [test-all] Aborted (core dumped)
[Exit 2]

[1] https://bugzilla.redhat.com/show_bug.cgi?id=605418#c8
Created attachment 445801 [details]
Save stack after getcontext

------- Comment on attachment From michael.neuling@au1.ibm.com 2010-09-07 18:51 EDT-------
The problem is in Ruby's thread switching. It takes a snapshot of the stack and a snapshot of the GPR state, but the two are not taken at the same time. When getcontext is called (to save the GPR state), r2 (the Table of Contents (TOC) pointer on ppc64) is saved onto the stack. However, getcontext is called after the stack has already been saved for this thread. Hence, when the thread is restored and r2 is reloaded from the saved stack, it gets a stale value, which leads to the segfault.

The attached patch is a simple fix that saves the stack after getcontext. This solves the problem on ppc64, but it may affect other architectures.
Proposing for 6.0.z stream based on GSS input
(In reply to comment #8)
> Created attachment 445801 [details]
> Save stack after getcontext
> ------- Comment on attachment From michael.neuling@au1.ibm.com
>
> The problem is in ruby's thread switching....

Hi Michael,

Thanks for the analysis and patch. I'll look into it soon.
Technical note added. If any revisions are required, please edit the "Technical Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team.

New Contents:
On PPC64 systems, Ruby is not saving context correctly when switching threads. Thus when a thread is restored it gets a stale value which generally leads to a segfault.
I've made a scratch build with Michael's patch: https://brewweb.devel.redhat.com/taskinfo?taskID=2770958
------- Comment From sbest@us.ibm.com 2010-09-24 16:44 EDT-------
I added the RPM with the fix to my GSA directory. Please test it and let us know your results:
http://pokgsa.ibm.com/~sbest/public/66803/
ruby-1.8.7.299-5.el6.ppc64.rpm
Technical note updated. If any revisions are required, please edit the "Technical Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team.

Diffed Contents:
@@ -1,2 +1 @@
-On PPC64 systems, Ruby is not saving context correctly when switching threads.
+Under some circumstances on the PPC64 architecture, Ruby does not save the context correctly when switching threads. Consequently, when a thread is restored, it has a stale value, which might cause a segmentation fault.
-Thus when a thread is restored it gets a stale value which generally leads to a segfault.
Could I get a build with the fix, please? Trying to run Puppet on RHEL 6 ppc64 boxes results in a segfault, which is preventing me from configuring the Fedora ppc64 builders.
Hi Dennis, This should do what you want: https://brewweb.devel.redhat.com/buildinfo?buildID=149690
Upstream patch: https://bugs.ruby-lang.org/projects/ruby-18/repository/revisions/32542