Bug 628715 - ruby segfaults on ppc
Summary: ruby segfaults on ppc
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: ruby
Version: 6.0
Hardware: powerpc
OS: Linux
Priority: urgent
Severity: urgent
Target Milestone: rc
: ---
Assignee: Vít Ondruch
QA Contact: Aleš Mareček
URL:
Whiteboard:
Depends On: 629274
Blocks: 653824
Reported: 2010-08-30 20:09 UTC by Ondrej Moriš
Modified: 2018-06-25 16:59 UTC
CC List: 11 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Under some circumstances on the PPC64 architecture, Ruby does not save the context correctly when switching threads. Consequently, when a thread is restored, it may contain a stale value, which can cause a segmentation fault.
Clone Of:
Environment:
Last Closed: 2011-07-11 09:02:30 UTC
Target Upstream Version:


Attachments
Save stack after getcontext (479 bytes, text/plain)
2010-09-07 23:00 UTC, IBM Bug Proxy
no flags


Links
System                       ID     Private  Priority  Status  Summary  Last Updated
IBM Linux Technology Center  66803  0        None      None    None     Never

Description Ondrej Moriš 2010-08-30 20:09:08 UTC
Description of problem:

Here's some info from Jim Meyering [1]:

# gdb miniruby

Reading symbols from /root/Meyering/RHEL-6/ruby-1.8.7.299/ruby-1.8.7-p299/miniruby...done.
(gdb) r -I./lib -I.ext/powerpc64-linux server.rb
Starting program: /root/Meyering/RHEL-6/ruby-1.8.7.299/ruby-1.8.7-p299/miniruby -I./lib -I.ext/powerpc64-linux server.rb
[Thread debugging using libthread_db enabled]
[2010-08-27 10:42:29] INFO  WEBrick 1.3.1
[2010-08-27 10:42:29] INFO  ruby 1.8.7 (2010-06-23) [powerpc64-linux]
[2010-08-27 10:42:29] WARN  TCPServer Error: Address already in use - bind(2)
[2010-08-27 10:42:29] INFO  WEBrick::HTTPServer#start: pid=4574 port=2000
^C
Program received signal SIGINT, Interrupt.
0x00000080affcf428 in ___newselect_nocancel () from /lib64/libc.so.6
Missing separate debuginfos, use: debuginfo-install glibc-2.12-1.6.el6.ppc64 libgcc-4.4.4-13.el6.ppc64 nss-softokn-freebl-3.12.4-19.el6.ppc64 openssl-1.0.0-4.el6.ppc64 zlib-1.2.3-25.el6.ppc64
(gdb) watch curr_thread
No symbol "curr_thread" in current context.
(gdb) c
Continuing.
curr_thread: 0x0x10198a70
[New Thread 0xfffb7bbf200 (LWP 4578)]
curr_thread: 0x0x10198a70
curr_thread: 0x0x10180a30

Program received signal SIGSEGV, Segmentation fault.
0x0000000010037fe0 in rb_thread_start_0 (
    fn=@0x10145780: 0x10038610 <rb_thread_yield>, arg=0xfffb7cfbfb8,
    th=0x10180a30) at eval.c:12444  
12444       if (THREAD_SAVE_CONTEXT(curr_thread)) {
(gdb) l
12439       if (OBJ_FROZEN(curr_thread->thgroup)) {
12440           rb_raise(rb_eThreadError,
12441                    "can't start a new thread (frozen ThreadGroup)");
12442       }
12443
12444       if (THREAD_SAVE_CONTEXT(curr_thread)) {
12445           return thread;
12446       }
12447
12448       if (ruby_block) {           /* should nail down higher blocks */
(gdb) p curr_thread
$1 = (rb_thread_t) 0x10198a70
(gdb) p *curr_thread
$2 = {next = 0x10180a30, prev = 0x10180a30, context = {{context = {
        uc_flags = 0, uc_link = 0x0, uc_stack = {ss_sp = 0x0, ss_flags = 0,
          ss_size = 0}, uc_sigmask = {__val = {0 <repeats 16 times>}},
        uc_mcontext = {__unused = {0, 0, 0, 0}, signal = 0, __pad0 = 0,
          handler = 0, oldmask = 0, regs = 0x10198b68, gp_regs = {270109312,
            17592185962496, 552709206616, 0, 17592185962288, 77840, 268447856,
            1, 0, 269972664, 270109296, 552707714176, 17592185962288,
            552707220368, 0 <repeats 13 times>, 552707160960, 552707168448,
            552707160768, 17592185977744, 17592185962496, 268664796, 0, 0,
            552707713448, 268664796, 0, 603980834, 0, 0, 0, 0, 0, 0, 0, 0, 0},
          fp_regs = {0 <repeats 11 times>, 2.9841565008811291e-319,
            4.9406564584124654e-324, 2, 0 <repeats 18 times>,
            -nan(0x8000082002000)}, v_regs = 0x10198e00, vmx_reserve = {0, 0,
            65536, 0 <repeats 19 times>, 65536, -65064, 0 <repeats 42 times>,
            65536, -4294967296, 0}}}, status = 1}}, result = 0, stk_len = 9730,
  stk_max = 9730, stk_ptr = 0x103944d0, stk_pos = 0xffffffebf30,
  frame = 0xffffffec9b0, scope = 0xfffb7cfc2d8, dyna_vars = 0x0,
  block = 0xffffffee640, iter = 0xffffffec980, tag = 0xffffffeec28,
  klass = 17590977005440, wrapper = 0, cref = 0xfffb7ef85c8, flags = 0,
  node = 0xfffb7f2ec90, tracing = 0, errinfo = 4, last_status = 4,
  last_line = 0, last_match = 4, safe = 0, status = THREAD_RUNNABLE,
  wait_for = 0, fd = 0, readfds = {fds_bits = {0 <repeats 16 times>}},
  writefds = {fds_bits = {0 <repeats 16 times>}}, exceptfds = {fds_bits = {
      0 <repeats 16 times>}}, select_value = 0, delay = 0, join = 0x0,
  abort = 0, priority = 0, thgroup = 17590977856400, locals = 0x10229b30,
  thread = 17590977856360, sandbox = 4}
(gdb) bt
#0  0x0000000010037fe0 in rb_thread_start_0 (
    fn=@0x10145780: 0x10038610 <rb_thread_yield>, arg=0xfffb7cfbfb8,
    th=0x10180a30) at eval.c:12444  
#1  0x00000000100389e4 in rb_thread_start (klass=17590977860680,
    args=17590974922680) at eval.c:12652
#2  0x00000000100224c4 in call_cfunc (
    func=@0x101457b0: 0x10038980 <rb_thread_start>, recv=17590977860680,
    len=-2, argc=0, argv=0x0) at eval.c:5775
#3  0x0000000010023974 in rb_call0 (klass=17590977860640, recv=17590977860680,
    id=5313, oid=5313, argc=0, argv=0x0, body=0xfffb7fc9330, flags=0)
    at eval.c:5928
#4  0x0000000010024d48 in rb_call (klass=17590977860640, recv=17590977860680,
    mid=5313, argc=0, argv=0x0, scope=0, self=17590974950640) at eval.c:6176
#5  0x00000000100198dc in rb_eval (self=17590974950640, n=0xfffb7f2ec90)
    at eval.c:3506
#6  0x00000000100182d0 in rb_eval (self=17590974950640, n=0xfffb7f2ed08)
    at eval.c:3236
#7  0x0000000010024520 in rb_call0 (klass=17590977005440, recv=17590974950640,
    id=16105, oid=16105, argc=0, argv=0xffffffefb98, body=0xfffb7f2ed08,
    flags=2) at eval.c:6079
#8  0x0000000010024d48 in rb_call (klass=17590977005440, recv=17590974950640,
    mid=16105, argc=1, argv=0xffffffefb90, scope=1, self=17590974950640)
    at eval.c:6176
#9  0x0000000010019bf8 in rb_eval (self=17590974950640, n=0xfffb7f7bec8)
    at eval.c:3521
#10 0x000000001002e5cc in block_pass (self=17590974950640, node=0xfffb7f7c030)
    at eval.c:9173
#11 0x000000001001800c in rb_eval (self=17590974950640, n=0xfffb7f7c030)
    at eval.c:3222
#12 0x000000001001addc in rb_eval (self=17590974950640, n=0xfffb7f7a690)
    at eval.c:3701
#13 0x000000001001f664 in rb_yield_0 (val=17590974942920, self=17590974950640,
    klass=0, flags=0, avalue=0) at eval.c:5095
#14 0x000000001001fc40 in rb_yield (val=17590974942920) at eval.c:5179
#15 0x00000000100f4dcc in rb_ary_each (ary=17590974925680) at array.c:1261
#16 0x0000000010022544 in call_cfunc (
    func=@0x1014c8c0: 0x100f4d38 <rb_ary_each>, recv=17590974925680, len=0,
    argc=0, argv=0x0) at eval.c:5781
#17 0x0000000010023974 in rb_call0 (klass=17590977829840, recv=17590974925680,
    id=4001, oid=4001, argc=0, argv=0x0, body=0xfffb7fc1388, flags=0)
    at eval.c:5928
#18 0x0000000010024d48 in rb_call (klass=17590977829840, recv=17590974925680,
    mid=4001, argc=0, argv=0x0, scope=0, self=17590974950640) at eval.c:6176
#19 0x00000000100198dc in rb_eval (self=17590974950640, n=0xfffb7f7c558)
    at eval.c:3506
#20 0x00000000100182d0 in rb_eval (self=17590974950640, n=0xfffb7f7a190)
    at eval.c:3236
#21 0x0000000010018998 in rb_eval (self=17590974950640, n=0xfffb7f77ad0)
    at eval.c:3322
#22 0x0000000010017c78 in rb_eval (self=17590974950640, n=0xfffb7f7d7c8)
    at eval.c:3160
#23 0x000000001001f664 in rb_yield_0 (val=6, self=17590974950640, klass=0,
    flags=0, avalue=0) at eval.c:5095
#24 0x0000000010018898 in rb_eval (self=17590977006800, n=0xfffb7f8ab80)
    at eval.c:3311
#25 0x0000000010024520 in rb_call0 (klass=17590977006280, recv=17590977006800,
    id=5313, oid=5313, argc=0, argv=0x0, body=0xfffb7f8ab80, flags=0)
    at eval.c:6079
#26 0x0000000010024d48 in rb_call (klass=17590977006280, recv=17590977006800,
    mid=5313, argc=0, argv=0x0, scope=0, self=17590974950640) at eval.c:6176
#27 0x00000000100198dc in rb_eval (self=17590974950640, n=0xfffb7f7e5d8)
    at eval.c:3506
#28 0x00000000100182d0 in rb_eval (self=17590974950640, n=0xfffb7f7f528)
    at eval.c:3236
#29 0x0000000010024520 in rb_call0 (klass=17590977005440, recv=17590974950640,
    id=5313, oid=5313, argc=0, argv=0x0, body=0xfffb7f7f528, flags=0)
    at eval.c:6079
#30 0x0000000010024d48 in rb_call (klass=17590977005440, recv=17590974950640,
    mid=5313, argc=0, argv=0x0, scope=0, self=17590977884840) at eval.c:6176
#31 0x00000000100198dc in rb_eval (self=17590977884840, n=0xfffb7fada40)
    at eval.c:3506
#32 0x0000000010012220 in eval_node (self=17590977884840, node=0xfffb7fada40)
    at eval.c:1449
#33 0x0000000010012d5c in ruby_exec_internal () at eval.c:1654
#34 0x0000000010012e18 in ruby_exec () at eval.c:1674
#35 0x0000000010012e7c in ruby_run () at eval.c:1684
#36 0x000000001000ed88 in main (argc=4, argv=0xffffffff3a8, envp=0xffffffff3d0)
    at main.c:48
==============================================

Valgrind corroborates:

# valgrind ./miniruby -I./lib -I.ext/powerpc64-linux server.rb
...
... many use-uninit and "Conditional jump or move depends on uninitialised value(s)"
...
==4874== Use of uninitialised value of size 8
==4874==    at 0x10037FE0: rb_thread_start_0 (eval.c:12444)
==4874==    by 0x100389E3: rb_thread_start (eval.c:12652)
==4874==    by 0x100224C3: call_cfunc (eval.c:5775)
==4874==    by 0x10023973: rb_call0 (eval.c:5928)
==4874==    by 0x10024D47: rb_call (eval.c:6176)
==4874==    by 0x100198DB: rb_eval (eval.c:3506)
==4874==    by 0x100182CF: rb_eval (eval.c:3236)
==4874==    by 0x1002451F: rb_call0 (eval.c:6079)
==4874==    by 0x10024D47: rb_call (eval.c:6176)
==4874==    by 0x10019BF7: rb_eval (eval.c:3521)
==4874==    by 0x1002E5CB: block_pass (eval.c:9173)
==4874==    by 0x1001800B: rb_eval (eval.c:3222)
==4874==
==4874== Invalid read of size 8
==4874==    at 0x10037FE0: rb_thread_start_0 (eval.c:12444)
==4874==    by 0x100389E3: rb_thread_start (eval.c:12652)
==4874==    by 0x100224C3: call_cfunc (eval.c:5775)
==4874==    by 0x10023973: rb_call0 (eval.c:5928)
==4874==    by 0x10024D47: rb_call (eval.c:6176)
==4874==    by 0x100198DB: rb_eval (eval.c:3506)
==4874==    by 0x100182CF: rb_eval (eval.c:3236)
==4874==    by 0x1002451F: rb_call0 (eval.c:6079)
==4874==    by 0x10024D47: rb_call (eval.c:6176)
==4874==    by 0x10019BF7: rb_eval (eval.c:3521)
==4874==    by 0x1002E5CB: block_pass (eval.c:9173)
==4874==    by 0x1001800B: rb_eval (eval.c:3222)
==4874==  Address 0xffffffffffff97a5 is not stack'd, malloc'd or (recently) free'd
==4874==
./lib/webrick/server.rb:162: [BUG] Segmentation fault
ruby 1.8.7 (2010-06-23 patchlevel 299) [powerpc64-linux]

The above makes it look like there is a stack-relative reference with 0 as the
stack pointer.

This is not specific to webrick.
Running ruby's "make check" rule fails immediately on PPC:

  root@auto-ppcp-002# , check
  ./miniruby -I./lib -I.ext/common -I./- -r./ext/purelib.rb  ./runruby.rb --extout=.ext  -- "./test/runner.rb" --basedir="./test" --runner=console
  test succeeded
  /root/Meyering/RHEL-6/ruby-1.8.7.299/ruby-1.8.7-p299/lib/rinda/tuplespace.rb:618: [BUG] Segmentation fault
  ruby 1.8.7 (2010-06-23 patchlevel 299) [powerpc64-linux]

  make: *** [test-all] Aborted (core dumped)
  [Exit 2]

[1] https://bugzilla.redhat.com/show_bug.cgi?id=605418#c8

Comment 8 IBM Bug Proxy 2010-09-07 23:00:38 UTC
Created attachment 445801 [details]
Save stack after getcontext


------- Comment on attachment From michael.neuling@au1.ibm.com 2010-09-07 18:51 EDT-------


The problem is in Ruby's thread switching. It takes a snapshot of the
stack and of the GPR state, but these are not taken at the same time.
When getcontext is called (to save the GPR state), r2 (the Table of
Contents (TOC) pointer on ppc64) is saved onto the stack, but getcontext
is called after the stack has already been saved for this thread. Hence,
when the thread is restored and r2 is reloaded from the stack, it gets a
stale value, which leads to the segfault.

Below is a simple patch which saves the stack after getcontext. This
solves the problem for ppc64, but it's possible it might affect other
archs.

Comment 9 Denise Dumas 2010-09-08 19:36:33 UTC
Proposing for 6.0.z stream based on GSS input

Comment 11 Jim Meyering 2010-09-09 12:11:03 UTC
(In reply to comment #8)
> Created attachment 445801 [details]
> Save stack after getcontext
> ------- Comment on attachment From michael.neuling@au1.ibm.com
> 
> The problem is in ruby's thread switching....

Hi Michael,
Thanks for the analysis and patch.
I'll look into it soon.

Comment 14 Denise Dumas 2010-09-16 15:38:40 UTC
    Technical note added. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    New Contents:
On PPC64 systems, Ruby is not saving context correctly when switching threads. 
Thus when a thread is restored it gets a stale value which generally leads to a segfault.

Comment 15 Jim Meyering 2010-09-20 20:29:08 UTC
I've made a scratch build with Michael's patch:

    https://brewweb.devel.redhat.com/taskinfo?taskID=2770958

Comment 16 IBM Bug Proxy 2010-09-24 20:51:08 UTC
------- Comment From sbest@us.ibm.com 2010-09-24 16:44 EDT-------
I added the RPM with the fix to my GSA dir. Please test and let us know your results.
http://pokgsa.ibm.com/~sbest/public/66803/ ruby-1.8.7.299-5.el6.ppc64.rpm

Comment 23 Ryan Lerch 2010-11-04 05:11:25 UTC
    Technical note updated. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    Diffed Contents:
@@ -1,2 +1 @@
-On PPC64 systems, Ruby is not saving context correctly when switching threads. 
-Thus when a thread is restored it gets a stale value which generally leads to a segfault.
+Under some circumstances on the PPC64 architechture, Ruby does not save the context correctly when switching threads. Consequently, when a thread is restored it has a stale value which might return a segmentaion fault.

Comment 25 Dennis Gilmore 2010-12-06 22:48:08 UTC
Could I get a build with the fix, please? Trying to run puppet on RHEL 6 ppc64 boxes results in a segfault, which prevents me from configuring the Fedora ppc64 builders.

Comment 26 Jim Meyering 2010-12-07 07:47:06 UTC
Hi Dennis,

This should do what you want:
https://brewweb.devel.redhat.com/buildinfo?buildID=149690

Comment 30 Vít Ondruch 2014-05-09 15:41:22 UTC
Upstream patch: https://bugs.ruby-lang.org/projects/ruby-18/repository/revisions/32542

