Bug 734180

Summary: Ruby hangs when making certain uses of fork
Product: Red Hat Enterprise Linux 6 Reporter: Casey Dahlin <cdahlin>
Component: ruby Assignee: Vít Ondruch <vondruch>
Status: CLOSED WONTFIX QA Contact: BaseOS QE - Apps <qe-baseos-apps>
Severity: high Docs Contact:
Priority: high    
Version: 6.1 CC: james.brown, jduncan, lwang, rprice, vanhoof
Target Milestone: rc   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2014-10-07 22:34:56 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 782183    

Description Casey Dahlin 2011-08-29 16:10:05 UTC
The customer is noticing hangs in the following test script:

http://redmine.ruby-lang.org/attachments/929/forktest.rb

Which should indicate the presence of this upstream bug:

http://redmine.ruby-lang.org/issues/2739

The bug was supposedly fixed in this commit:

http://redmine.ruby-lang.org/projects/ruby-187/repository/revisions/28203

And included in the next release:

http://svn.ruby-lang.org/repos/ruby/tags/v1_8_7_352/ChangeLog

But our packages of that release (which haven't been sent out as an update yet) don't seem to resolve the issue for the customer.

Comment 2 Vít Ondruch 2011-08-30 08:12:24 UTC
I tested the reproducer with the latest Ruby in Fedora and I can reproduce the issue. Let's see what upstream says about it [1].

[1] http://redmine.ruby-lang.org/issues/3100

Comment 6 KOSAKI Motohiro 2011-11-24 04:11:27 UTC
Hi

I handled this issue upstream a year ago, and I can't reproduce it on either the ruby_1_8_7 branch or on RHEL 6.2 latest and the internal RC. forktest.rb works completely.

Can anyone provide detailed reproduction instructions?

Comment 7 Vít Ondruch 2011-11-24 17:36:08 UTC
(In reply to comment #6)
> Hi
> 
> I handled this issue upstream a year ago, and I can't reproduce it on either
> the ruby_1_8_7 branch or on RHEL 6.2 latest and the internal RC.
> forktest.rb works completely.
> 
> Can anyone provide detailed reproduction instructions?

Hi, good to see you in Red Hat :) I hope I am not too late to notice that.

I can reproduce it with:

1) Fedora 16
$ rpm -q ruby
ruby-1.8.7.352-1.fc16.x86_64

$ ruby forktest.rb 5

It wrote about 5 lines of dots before it froze.

2) Mock for RHEL-6.2 on F16, prepared using the following brew repository:
baseurl=http://download.englab.brq.redhat.com/brewroot/repos/RHEL-6.2-build/latest/x86_64

# ruby -v
ruby 1.8.7 (2011-06-30 patchlevel 352) [x86_64-linux]

# ruby forktest.rb 5

About a whole page of dots before the lockup.

3) RHEL 6.1 VM with Ruby 1.8.7 from RHEL-6.2, from the following build:
https://brewweb.devel.redhat.com/buildinfo?buildID=175740

$ rpm -q ruby
ruby-1.8.7.352-3.el6.x86_64

$ ruby forktest.rb 10000

I can run forktest even with 10000 and it locks up immediately: it spawns the 4 children and runs the info.each do |r,w,*e| loop once, but hangs on the r.gets line the second time. Out of a few attempts, it got any further only once.
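
The attachment itself is not reproduced in this report. As a rough illustration of the pattern described above (several forked children, each talked to over a pipe pair, with the parent blocking in r.gets), here is a minimal sketch written for a current Ruby. It is not the upstream forktest.rb: the child count of 4 comes from the comment above, while the echo protocol and the names spawn_echo_child and run_rounds are assumptions made for the example.

```ruby
# Hypothetical sketch of the fork-and-pipe pattern; NOT the upstream forktest.rb.
CHILDREN = 4 # the comment above mentions 4 children

# Fork one child that answers every line it receives with "ok <line>".
# `inherited` lists parent-side pipe ends of earlier children, which the new
# child closes so that it does not keep those pipes open by accident.
# Returns [reader, writer, pid] for the parent's side.
def spawn_echo_child(inherited = [])
  to_child_r, to_child_w = IO.pipe
  from_child_r, from_child_w = IO.pipe
  pid = fork do
    inherited.each(&:close)
    to_child_w.close
    from_child_r.close
    while (line = to_child_r.gets)
      from_child_w.puts("ok #{line.chomp}")
      from_child_w.flush
    end
    exit!(0) # leave without running the parent's at_exit handlers
  end
  to_child_r.close
  from_child_w.close
  [from_child_r, to_child_w, pid]
end

# Send one line to every child per round and count the replies read back.
def run_rounds(rounds)
  info = []
  CHILDREN.times do
    info << spawn_echo_child(info.flat_map { |r, w, _pid| [r, w] })
  end
  replies = 0
  rounds.times do |i|
    info.each do |r, w, *_rest|
      w.puts("ping #{i}")
      w.flush
      replies += 1 if r.gets # a blocking read; the reported hang was at a read like this
    end
  end
  info.each { |r, w, _pid| w.close; r.close }
  info.each { |_r, _w, pid| Process.waitpid(pid) }
  replies
end
```

On an unaffected interpreter, run_rounds(n) should return after reading n replies from each child; a hang of the kind reported would show up as the call never returning.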

Comment 8 KOSAKI Motohiro 2011-11-24 20:14:31 UTC
Hmmm.. I still have no luck. Does this issue depend on the hardware configuration?

Comment 9 Vít Ondruch 2011-11-28 16:50:45 UTC
I have reserved one virtual machine in beaker: sgi-xe500-01.rhts.eng.bos.redhat.com

I ran the test several times and observed two scenarios.

1) The test fails with the following error:

http://pastebin.test.redhat.com/69235

2) Hang, sluggish response
The hang is of a strange nature. Once it happened, I tried to open another ssh connection to see what was going on, and I had to wait about two minutes before the prompt appeared. In the end, before I was able to do anything else, the test suddenly continued. So this might be something completely unrelated to Ruby, though similar to the issue the original reporter observes.

However, you might want to try it yourself by running "$ ssh root.eng.bos.redhat.com 'ruby forktest.rb 2'". The machine should be available for at least another 95 hours and it is possible to extend the period.
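
Since the two scenarios above (an outright failure versus a hang) are easy to confuse when running the reproducer by hand, a small wrapper with a deadline can help tell them apart. The following is only a sketch for a current Ruby: the helper name classify_run and the time limit are assumptions, and killing the process on timeout is a choice made for the example, not something prescribed in this report.

```ruby
require "timeout"

# Run a command (given as an argv array) and classify the outcome:
#   :ok     - exited with status 0 within `limit` seconds
#   :failed - exited with a non-zero status within `limit` seconds
#   :hung   - still running after `limit` seconds (the process is then killed)
def classify_run(argv, limit)
  pid = Process.spawn(*argv, out: File::NULL, err: File::NULL)
  begin
    status = Timeout.timeout(limit) { Process.waitpid2(pid).last }
    status.success? ? :ok : :failed
  rescue Timeout::Error
    Process.kill("KILL", pid)
    Process.waitpid(pid) # reap the killed process
    :hung
  end
end
```

For example, classify_run(["ruby", "forktest.rb", "2"], 120) would report :hung if the reproducer is still running after two minutes. A slow but progressing run would be misreported as a hang, so the limit needs to be generous.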

Comment 10 KOSAKI Motohiro 2011-12-02 17:26:06 UTC
Hi

I spent some time on sgi-xe500-01 and could only reproduce (1). Thank you.
Unfortunately, I have no time for a while; I have another serious issue now. I plan to
resume this investigation in 1 or 2 weeks.

My current guess is that it is a tty-related issue, because my KVM guest couldn't reproduce
the issue, but I don't have any evidence yet.


my memo
  sgi-xe500-01
  RHEL6.2-20111117.0
  x86_64

Comment 14 Jamie Duncan 2012-02-28 22:07:18 UTC
(In reply to comment #8)
> Hmmm.. I still have no luck. Does this issue depend on the hardware
> configuration?

I was able to reproduce this on a KVM virtual machine running RHEL 6.2. Can provide specs if desired.

Comment 15 Jamie Duncan 2012-02-28 22:09:01 UTC
# rpm -qa ruby
ruby-1.8.7.352-4.el6_2.x86_64
# uname -a
Linux test.duncan.net 2.6.32-220.4.2.el6.x86_64 #1 SMP Mon Feb 6 16:39:28 EST 2012 x86_64 x86_64 x86_64 GNU/Linux

from /usr/share/doc/ruby-1.8.7.352/ChangeLog

Tue Jun  8 12:37:56 2010  NAKAMURA Usaku  <usa>

        * eval.c (thread_timer, rb_thread_stop_timer): check the timing of
          stopping timer.  patch from KOSAKI Motohiro <kosaki.motohiro _AT_
          jp.fujitsu.com> via IRC.

        * eval.c (rb_thread_start_timer): NetBSD5 seems to be hung when calling
          pthread_create() from pthread_atfork()'s parent handler.

        * io.c (pipe_open): workaround for NetBSD5. stop timer thread before
          fork(), and restart it after fork() on parent, and on child if
          needed.

        * process.c (rb_f_fork, rb_f_system): ditto.

          these changes are tested by naruse.  fixed [ruby-dev:40074]

from http://bugs.ruby-lang.org/projects/ruby-187/repository/revisions/28203/:

merge revision(s) 26371,26373,26374,26972: 

* eval.c (thread_timer, rb_thread_stop_timer): check the timing of stopping timer. patch from KOSAKI Motohiro <kosaki.motohiro _AT_ jp.fujitsu.com> 

* eval.c (rb_thread_start_timer): NetBSD5 seems to be hung when calling pthread_create() from pthread_atfork()'s parent handler. 

* io.c (pipe_open): workaround for NetBSD5. stop timer thread before fork(), and start it if needed. 

* process.c (rb_f_fork, rb_f_system): ditto. these changes are tested by naruse. fixed [ruby-dev:40074]

* io.c, eval.c, process.c: add linux to r26371's condition. patched by Motohiro KOSAKI [ruby-core:28151]

So it SEEMS to have been addressed in the current RHEL 6 Ruby release.
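
The ChangeLog entries above describe the shape of the fix: stop the timer thread before fork(), then restart it afterwards in the parent (and in the child if needed). The real change is in the interpreter's C sources (eval.c, io.c, process.c) and is not shown in this report. Purely as an analogy at the Ruby level, and not as the upstream implementation, the same ordering can be sketched as follows for a current Ruby; the Ticker class and fork_with_ticker_paused are hypothetical names invented for the example.

```ruby
# Ruby-level ANALOGY of "stop the timer thread before fork, restart it after".
# Ticker and fork_with_ticker_paused are hypothetical; this is not eval.c.
class Ticker
  def initialize(interval)
    @interval = interval
    @thread = nil
    @running = false
  end

  # Start the background thread unless it is already running.
  def start
    return if running?
    @running = true
    @thread = Thread.new { sleep(@interval) while @running }
  end

  # Ask the background thread to finish and wait until it has.
  def stop
    @running = false
    @thread.join if @thread
    @thread = nil
  end

  def running?
    !@thread.nil? && @thread.alive?
  end
end

# Pause the ticker, fork a child that runs the given block, then restart
# the ticker in the parent. Returns the child's pid.
def fork_with_ticker_paused(ticker, &child_body)
  ticker.stop # no background thread exists at the moment of fork
  pid = fork do
    child_body.call
    exit!(0)
  end
  ticker.start # parent side: resume the background thread
  pid
end
```

The point of the ordering is that no helper thread is in the middle of its work when the process is duplicated; the child in this sketch does not restart the ticker, which corresponds to the "if needed" case in the ChangeLog.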

Tried to reproduce the issue with http://redmine.ruby-lang.org/attachments/929/forktest.rb
(actually http://bugs.ruby-lang.org/attachments/download/929/forktest.rb)

root      2459  0.0  0.0 103300   816 pts/5    S+   09:21   0:00 grep ruby
[root@rhev-m ~]# ps aux |grep ruby
root      2416  0.1  0.0  40356  2720 pts/0    Sl+  09:19   0:00 ruby forktest.rb 1
root      2420  0.0  0.0  40200  1580 pts/2    Ss+  09:19   0:00 ruby forktest.rb 1
root      2461  0.0  0.0 103300   820 pts/5    S+   09:21   0:00 grep ruby
[root@rhev-m ~]# strace 2420
strace: 2420: command not found
[root@rhev-m ~]# strace -p 2420
Process 2420 attached - interrupt to quit
futex(0x7f3bb3e4aa20, FUTEX_WAIT_PRIVATE, 2, NULL^C <unfinished ...>
Process 2420 detached
[root@rhev-m ~]# strace -p 2416
Process 2416 attached - interrupt to quit
select(6, [5], [], [], {0, 479205})     = 0 (Timeout)
clock_gettime(CLOCK_MONOTONIC, {2054, 723746600}) = 0
select(6, [5], [], [], {0, 0})          = 0 (Timeout)
wait4(2420, 0x7fff5e21923c, WNOHANG, NULL) = 0
select(6, [5], [], [], {0, 0})          = 0 (Timeout)
clock_gettime(CLOCK_MONOTONIC, {2054, 723916265}) = 0
clock_gettime(CLOCK_MONOTONIC, {2054, 723969007}) = 0
select(6, [5], [], [], {0, 999947})     = 0 (Timeout)
clock_gettime(CLOCK_MONOTONIC, {2055, 725152353}) = 0
select(6, [5], [], [], {0, 0})          = 0 (Timeout)
wait4(2420, 0x7fff5e21923c, WNOHANG, NULL) = 0
select(6, [5], [], [], {0, 0})          = 0 (Timeout)
clock_gettime(CLOCK_MONOTONIC, {2055, 725469671}) = 0
clock_gettime(CLOCK_MONOTONIC, {2055, 725501775}) = 0
select(6, [5], [], [], {0, 999967})     = 0 (Timeout)
clock_gettime(CLOCK_MONOTONIC, {2056, 726757643}) = 0
select(6, [5], [], [], {0, 0})          = 0 (Timeout)
wait4(2420, 0x7fff5e21923c, WNOHANG, NULL) = 0
select(6, [5], [], [], {0, 0})          = 0 (Timeout)
clock_gettime(CLOCK_MONOTONIC, {2056, 726927175}) = 0
clock_gettime(CLOCK_MONOTONIC, {2056, 726960758}) = 0
select(6, [5], [], [], {0, 999966})     = 0 (Timeout)
clock_gettime(CLOCK_MONOTONIC, {2057, 728139687}) = 0
select(6, [5], [], [], {0, 0})          = 0 (Timeout)
wait4(2420, 0x7fff5e21923c, WNOHANG, NULL) = 0
select(6, [5], [], [], {0, 0})          = 0 (Timeout)
clock_gettime(CLOCK_MONOTONIC, {2057, 728415349}) = 0
clock_gettime(CLOCK_MONOTONIC, {2057, 728449740}) = 0
select(6, [5], [], [], {0, 999965})     = 0 (Timeout)
clock_gettime(CLOCK_MONOTONIC, {2058, 729603479}) = 0
select(6, [5], [], [], {0, 0})          = 0 (Timeout)
wait4(2420, 0x7fff5e21923c, WNOHANG, NULL) = 0
select(6, [5], [], [], {0, 0})          = 0 (Timeout)
clock_gettime(CLOCK_MONOTONIC, {2058, 729772152}) = 0
clock_gettime(CLOCK_MONOTONIC, {2058, 729805878}) = 0
select(6, [5], [], [], {0, 999966})     = 0 (Timeout)
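
One way to read the two traces: process 2420 (the child) is blocked in futex() and never wakes up, while process 2416 (the parent) keeps cycling through a select() with a timeout of about one second on fd 5 followed by a non-blocking wait4(), which returns 0 because the child has not exited. This is an interpretation of the strace output, not a statement about the interpreter's internals. A Ruby-level loop of the same shape, with the hypothetical helper name poll_child and written for a current Ruby, might look like this:

```ruby
# Hypothetical sketch of a poll loop shaped like the parent's strace output.
# Waits up to `interval` seconds for data on `reader`, then checks without
# blocking whether the child `pid` has exited; gives up after `max_polls`.
def poll_child(reader, pid, interval, max_polls)
  max_polls.times do
    if IO.select([reader], nil, nil, interval)   # cf. select(6, [5], ...)
      line = reader.gets
      return line ? [:data, line] : [:eof, nil]
    end
    # cf. wait4(pid, ..., WNOHANG): returns nil while the child is running
    return [:exited, nil] if Process.waitpid(pid, Process::WNOHANG)
  end
  [:stalled, nil] # no data and no exit: the situation the traces show
end
```

In terms of this sketch, the output above corresponds to the :stalled outcome: every select() times out and every wait4() reports that the child is still alive.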

Comment 16 RHEL Program Management 2012-07-10 06:09:02 UTC
This request was not resolved in time for the current release.
Red Hat invites you to ask your support representative to
propose this request, if still desired, for consideration in
the next release of Red Hat Enterprise Linux.

Comment 17 RHEL Program Management 2012-07-11 01:52:40 UTC
This request was erroneously removed from consideration in Red Hat Enterprise Linux 6.4, which is currently under development.  This request will be evaluated for inclusion in Red Hat Enterprise Linux 6.4.

Comment 19 Tom Lavigne 2012-09-18 15:24:47 UTC
This request was evaluated by Red Hat Product Management for
inclusion in the current release of Red Hat Enterprise Linux.
Because the affected component is not scheduled to be updated
in the current release, Red Hat is unable to address this
request at this time.
    
Red Hat invites you to ask your support representative to
propose this request, if appropriate, in the next release of
Red Hat Enterprise Linux.

Comment 20 RHEL Program Management 2013-10-14 01:00:51 UTC
This request was evaluated by Red Hat Product Management for
inclusion in the current release of Red Hat Enterprise Linux.
Because the affected component is not scheduled to be updated
in the current release, Red Hat is unable to address this
request at this time.

Red Hat invites you to ask your support representative to
propose this request, if appropriate, in the next release of
Red Hat Enterprise Linux.