Hide Forgot
Description of problem: recreate: [root@qeblade31 ~]# /etc/init.d/deltacloud-core start Starting deltacloud-core-mock: >> Deleting stale PID file /var/run/deltacloud/deltacloud-core-mock.pid [ OK ] [root@qeblade31 ~]# ll /var/run/deltacloud/deltacloud-core-mock.pid -rw-r--r--. 1 root root 5 Jan 17 13:21 /var/run/deltacloud/deltacloud-core-mock.pid [root@qeblade31 ~]# ps -ef |grep deltacloud nobody 27041 1 2 13:21 ? 00:00:00 thin server (localhost:3002) [deltacloud-mock] root 27046 24791 0 13:22 pts/0 00:00:00 grep deltacloud [root@qeblade31 ~]# tail /var/log/deltacloud-core/mock.log thin server (localhost:3002) [deltacloud-mock][26967]: 127.0.0.1 - - [17/Jan/2012 13:20:11] "GET /api/instances/ HTTP/1.1" 404 18 0.0041 /usr/lib/ruby/1.8/set.rb:201: [BUG] Segmentation fault ruby 1.8.7 (2011-06-30 patchlevel 352) [x86_64-linux] >> Writing PID to /var/run/deltacloud/deltacloud-core-mock.pid >> Changing process privilege to nobody:nobody thin server (localhost:3002) [deltacloud-mock][27041]: >> Thin web server (v1.2.11 codename Bat-Shit Crazy) thin server (localhost:3002) [deltacloud-mock][27041]: >> Debugging ON thin server (localhost:3002) [deltacloud-mock][27041]: >> Maximum connections set to 1024 thin server (localhost:3002) [deltacloud-mock][27041]: >> Listening on localhost:3002, CTRL+C to stop [root@qeblade31 ~]# [root@qeblade31 ~]# [root@qeblade31 ~]# /etc/init.d/deltacloud-core restart Shutting down deltacloud-core: [ OK ] Starting deltacloud-core-mock: [ OK ] [root@qeblade31 ~]# tail /var/log/deltacloud-core/mock.log thin server (localhost:3002) [deltacloud-mock][27041]: >> Debugging ON thin server (localhost:3002) [deltacloud-mock][27041]: >> Maximum connections set to 1024 thin server (localhost:3002) [deltacloud-mock][27041]: >> Listening on localhost:3002, CTRL+C to stop thin server (localhost:3002) [deltacloud-mock][27041]: >> Exiting! >> Writing PID to /var/run/deltacloud/deltacloud-core-mock.pid >> Changing process privilege to nobody:nobody thin server (localhost:3002) [deltacloud-mock][27065]: >> Thin web server (v1.2.11 codename Bat-Shit Crazy) thin server (localhost:3002) [deltacloud-mock][27065]: >> Debugging ON thin server (localhost:3002) [deltacloud-mock][27065]: >> Maximum connections set to 1024 thin server (localhost:3002) [deltacloud-mock][27065]: >> Listening on localhost:3002, CTRL+C to stop [root@qeblade31 ~]# ps -ef | grep deltacloud nobody 27065 1 5 13:22 ? 00:00:00 thin server (localhost:3002) [deltacloud-mock] root 27069 24791 0 13:22 pts/0 00:00:00 grep deltacloud [root@qeblade31 ~]# [root@qeblade31 ~]# [root@qeblade31 ~]# [root@qeblade31 ~]# [root@qeblade31 ~]# ####### NOW HERE"S THE PROBLEM ###########################^C [root@qeblade31 ~]# [root@qeblade31 ~]# aeolus-restart-services Stoping ntpd ... Success: Shutting down ntpd: [ OK ] Stoping imagefactory ... Success: Stopping imagefactory: [ OK ] Stoping conductor-dbomatic ... [ OK ]: Shutting down conductor-dbomatic: [ OK ] Stoping aeolus-conductor ... Success: Shutting down aeolus-conductor: [ OK ] Stoping libvirtd ... Success: Stopping libvirtd daemon: [ OK ] Stoping deltacloud-core ... Success: Shutting down deltacloud-core: [ OK ] Stoping qpidd ... Success: Stopping Qpid AMQP daemon: [ OK ] Stoping httpd ... Success: Stopping httpd: [ OK ] Stoping postgresql ... Success: Stopping postgresql service: [ OK ] Stoping iwhd ... [ OK ]: Stopping iwhd daemon: [ OK ] Stoping mongod ... Success: Stopping mongod: [ OK ] Starting mongod ... Success: Starting mongod: [ OK ] Starting iwhd ... Success: waiting for mongod to listen on localhost:27017[ OK ] Starting iwhd daemon: [ OK ] Starting postgresql ... Success: Starting postgresql service: [ OK ] Starting httpd ... Success: Starting httpd: [ OK ] Starting qpidd ... Success: Starting Qpid AMQP daemon: [ OK ] Starting deltacloud-core ... Success: Starting deltacloud-core-mock: [ OK ] Starting libvirtd ... Success: Starting libvirtd daemon: [ OK ] Starting aeolus-conductor ... Success: Starting aeolus-conductor: [ OK ] Starting conductor-dbomatic ... [ OK ]: Starting conductor-dbomatic: [ OK ] Starting imagefactory ... Success: Starting imagefactory: [ OK ] Starting ntpd ... Success: Starting ntpd: [ OK ] Statusing mongod ... Success: mongod (pid 27258) is running... [root@qeblade31 ~]# tail /var/log/deltacloud-core/mock.log thin server (localhost:3002) [deltacloud-mock][27065]: >> Debugging ON thin server (localhost:3002) [deltacloud-mock][27065]: >> Maximum connections set to 1024 thin server (localhost:3002) [deltacloud-mock][27065]: >> Listening on localhost:3002, CTRL+C to stop thin server (localhost:3002) [deltacloud-mock][27065]: >> Exiting! >> Writing PID to /var/run/deltacloud/deltacloud-core-mock.pid >> Changing process privilege to nobody:nobody thin server (localhost:3002) [deltacloud-mock][27421]: >> Thin web server (v1.2.11 codename Bat-Shit Crazy) thin server (localhost:3002) [deltacloud-mock][27421]: >> Debugging ON thin server (localhost:3002) [deltacloud-mock][27421]: >> Maximum connections set to 1024 thin server (localhost:3002) [deltacloud-mock][27421]: >> Listening on localhost:3002, CTRL+C to stop [root@qeblade31 ~]# [root@qeblade31 ~]# [root@qeblade31 ~]# [root@qeblade31 ~]# /etc/init.d/deltacloud-core status deltacloud-core dead but pid file exists [root@qeblade31 ~]# tail /var/log/deltacloud-core/mock.log thin server (localhost:3002) [deltacloud-mock][27065]: >> Exiting! >> Writing PID to /var/run/deltacloud/deltacloud-core-mock.pid >> Changing process privilege to nobody:nobody thin server (localhost:3002) [deltacloud-mock][27421]: >> Thin web server (v1.2.11 codename Bat-Shit Crazy) thin server (localhost:3002) [deltacloud-mock][27421]: >> Debugging ON thin server (localhost:3002) [deltacloud-mock][27421]: >> Maximum connections set to 1024 thin server (localhost:3002) [deltacloud-mock][27421]: >> Listening on localhost:3002, CTRL+C to stop /usr/lib/ruby/gems/1.8/gems/aws-2.4.5/lib/awsbase/utils.rb:100: [BUG] Segmentation fault ruby 1.8.7 (2011-06-30 patchlevel 352) [x86_64-linux] [root@qeblade31 ~]# [root@qeblade31 ~]# rpm -qa | grep aeolus aeolus-conductor-daemons-0.8.0-7.el6.noarch aeolus-conductor-doc-0.8.0-7.el6.noarch rubygem-aeolus-cli-0.3.0-3.el6.noarch rubygem-aeolus-image-0.3.0-2.el6.noarch aeolus-all-0.8.0-7.el6.noarch aeolus-configure-2.5.0-4.el6.noarch aeolus-conductor-0.8.0-7.el6.noarch [root@qeblade31 ~]# rpm -qa | grep delta deltacloud-core-ec2-0.5.0-1.rc1.el6.noarch deltacloud-core-vsphere-0.5.0-1.rc1.el6.noarch deltacloud-core-0.5.0-1.rc1.el6.noarch rubygem-deltacloud-client-0.4.0-3.el6.noarch deltacloud-core-rhevm-0.5.0-1.rc1.el6.noarch
So.. from what I can tell.. restarting dc w/ just the init script works.. restart w/ aeolus-restart-services and it fails.. wth?
hrm.. it may not be aeolus-restart-services.. which didnt make sense to me anyway looking into it..
adding some color to the bug.. prior to detecting that deltacloud segfaulted I was removing and adding datastores/clusters to the rhevm realm
Created attachment 555838 [details] rails log
thin server (localhost:3002) [deltacloud-mock][28856]: Accept: application/xml thin server (localhost:3002) [deltacloud-mock][28856]: ESC[1;29mProcessing /apiESC[0m (for 127.0.0.1 at Tue Jan 17 13:53:34 -0500 2012) [GET] [ESC[1;29mMockESC[0m] thin server (localhost:3002) [deltacloud-mock][28856]: Parameters: {} thin server (localhost:3002) [deltacloud-mock][28856]: Authentication: Basic thin server (localhost:3002) [deltacloud-mock][28856]: Server: thin 1.2.11 codename Bat-Shit Crazy thin server (localhost:3002) [deltacloud-mock][28856]: Accept: application/xml thin server (localhost:3002) [deltacloud-mock][28856]: ESC[1;29mProcessing /apiESC[0m (for 127.0.0.1 at Tue Jan 17 13:53:34 -0500 2012) [GET] [ESC[1;29mMockESC[0m] thin server (localhost:3002) [deltacloud-mock][28856]: Parameters: {} thin server (localhost:3002) [deltacloud-mock][28856]: Authentication: Basic thin server (localhost:3002) [deltacloud-mock][28856]: Server: thin 1.2.11 codename Bat-Shit Crazy thin server (localhost:3002) [deltacloud-mock][28856]: Accept: application/xml thin server (localhost:3002) [deltacloud-mock][28856]: Completed in ESC[1;29m0.000000ESC[0m | 0.000000 | 200 | ESC[1;36mapplication/xmlESC[0m | http://localhost:3002/api thin server (localhost:3002) [deltacloud-mock][28856]: Completed in ESC[1;29m0.000000ESC[0m | 0.000000 | 200 | ESC[1;36mapplication/xmlESC[0m | http://localhost:3002/api thin server (localhost:3002) [deltacloud-mock][28856]: Completed in ESC[1;29m0.000000ESC[0m | 0.000000 | 200 | ESC[1;36mapplication/xmlESC[0m | http://localhost:3002/api thin server (localhost:3002) [deltacloud-mock][28856]: ESC[1;29mProcessing /apiESC[0m (for 127.0.0.1 at Tue Jan 17 13:53:34 -0500 2012) [GET] [ESC[1;29mEC2 ESC[0m] thin server (localhost:3002) [deltacloud-mock][28856]: Parameters: {} thin server (localhost:3002) [deltacloud-mock][28856]: Authentication: Basic thin server (localhost:3002) [deltacloud-mock][28856]: Server: thin 1.2.11 codename Bat-Shit Crazy thin server (localhost:3002) [deltacloud-mock][28856]: Accept: application/xml thin server (localhost:3002) [deltacloud-mock][28856]: ESC[1;29mProcessing /apiESC[0m (for 127.0.0.1 at Tue Jan 17 13:53:34 -0500 2012) [GET] [ESC[1;29mRHEVMESC[0m] thin server (localhost:3002) [deltacloud-mock][28856]: Parameters: {} thin server (localhost:3002) [deltacloud-mock][28856]: Authentication: Basic thin server (localhost:3002) [deltacloud-mock][28856]: Server: thin 1.2.11 codename Bat-Shit Crazy thin server (localhost:3002) [deltacloud-mock][28856]: Accept: application/xml thin server (localhost:3002) [deltacloud-mock][28856]: ESC[1;29mProcessing /apiESC[0m (for 127.0.0.1 at Tue Jan 17 13:53:34 -0500 2012) [GET] [ESC[1;29mVSphereESC[0m] thin server (localhost:3002) [deltacloud-mock][28856]: Parameters: {} thin server (localhost:3002) [deltacloud-mock][28856]: Authentication: Basic thin server (localhost:3002) [deltacloud-mock][28856]: Server: thin 1.2.11 codename Bat-Shit Crazy thin server (localhost:3002) [deltacloud-mock][28856]: Accept: application/xml /usr/lib/ruby/gems/1.8/gems/aws-2.4.5/lib/awsbase/errors.rb:32: [BUG] Segmentation fault ruby 1.8.7 (2011-06-30 patchlevel 352) [x86_64-linux]
Just using the deltacloud init script shows the following status for the service: >> Start service [root@localhost /]# /etc/init.d/deltacloud-core start Starting deltacloud-core (via systemctl): [ OK ] [root@localhost /]# /etc/init.d/deltacloud-core status deltacloud-core.service - LSB: Deltacloud Core API deamon Loaded: loaded (/etc/rc.d/init.d/deltacloud-core) Active: active (running) since Tue, 17 Jan 2012 08:03:43 -0500; 5h 46min ago Process: 1177 ExecStart=/etc/rc.d/init.d/deltacloud-core start (code=exited, status=0/SUCCESS) Main PID: 1379 (deltacloudd) CGroup: name=systemd:/system/deltacloud-core.service └ 1379 thin server (localhost:3002) [deltacloud-ec2] ... >> Stop service [root@localhost /]# /etc/init.d/deltacloud-core stop Stopping deltacloud-core (via systemctl): [ OK ] [root@localhost /]# /etc/init.d/deltacloud-core status deltacloud-core.service - LSB: Deltacloud Core API deamon Loaded: loaded (/etc/rc.d/init.d/deltacloud-core) Active: failed since Tue, 17 Jan 2012 13:50:23 -0500; 3s ago Process: 6232 ExecStop=/etc/rc.d/init.d/deltacloud-core stop (code=exited, status=0/SUCCESS) Process: 1177 ExecStart=/etc/rc.d/init.d/deltacloud-core start (code=exited, status=0/SUCCESS) Main PID: 1379 (code=killed, signal=KILL) CGroup: name=systemd:/system/deltacloud-core.service >> Restart service [root@localhost /]# /etc/init.d/deltacloud-core start Starting deltacloud-core (via systemctl): [ OK ] [root@localhost /]# /etc/init.d/deltacloud-core status deltacloud-core.service - LSB: Deltacloud Core API deamon Loaded: loaded (/etc/rc.d/init.d/deltacloud-core) Active: active (exited) since Tue, 17 Jan 2012 13:51:06 -0500; 7s ago Process: 6232 ExecStop=/etc/rc.d/init.d/deltacloud-core stop (code=exited, status=0/SUCCESS) Process: 6261 ExecStart=/etc/rc.d/init.d/deltacloud-core start (code=exited, status=0/SUCCESS) Main PID: 6265 (code=exited, status=0/SUCCESS) CGroup: name=systemd:/system/deltacloud-core.service └ 6266 thin server (localhost:3002) [deltacloud-ec2] ... Init script tested has 'kill -9' in the stop code: stop() { echo -n $"Shutting down $prog: " ... kill -9 `cat $PIDFILE` retval=$? if [ $retval -eq 0 ] && rm -f $LOCKFILE $PIDFILE; then ... fi }
Playing with gdb using fvollero reveals these details: #0 0x0000003b01232885 in raise (sig=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64 #1 0x0000003b01234065 in abort () at abort.c:92 #2 0x0000003b00e2dac3 in rb_bug (fmt=0x3b00ec608c "Segmentation fault") at error.c:213 #3 0x0000003b00e95ddc in sigsegv (sig=11) at signal.c:634 #4 <signal handler called> #5 0x0000003b00e991c4 in st_foreach (table=0x3b00e38bdc, func=0x3b00e51260 <mark_entry>, arg=0) at st.c:486 #6 0x0000003b00e50caf in mark_locations_array (x=<value optimized out>, n=55194) at gc.c:690 #7 0x0000003b00e2ec39 in thread_mark (th=0x2929cb0) at eval.c:10541 #8 0x0000003b00e50fb0 in gc_mark_children (ptr=<value optimized out>, lev=10) at gc.c:1012 #9 0x0000003b00e5126e in mark_entry (key=<value optimized out>, value=<value optimized out>, lev=<value optimized out>) at gc.c:712 #10 0x0000003b00e991dd in st_foreach (table=0x244e3b0, func=0x3b00e51260 <mark_entry>, arg=9) at st.c:487 #11 0x0000003b00e50e88 in mark_tbl (ptr=140710230565000, lev=9) at gc.c:722 #12 gc_mark_children (ptr=140710230565000, lev=9) at gc.c:998 #13 0x0000003b00e5126e in mark_entry (key=<value optimized out>, value=<value optimized out>, lev=<value optimized out>) at gc.c:712 #14 0x0000003b00e991dd in st_foreach (table=0x217f710, func=0x3b00e51260 <mark_entry>, arg=8) at st.c:487 #15 0x0000003b00e50e88 in mark_tbl (ptr=140710237645400, lev=8) at gc.c:722 #16 gc_mark_children (ptr=140710237645400, lev=8) at gc.c:998 #17 0x0000003b00e50efe in gc_mark_children (ptr=140710237499080, lev=7) at gc.c:882 #18 0x0000003b00e5126e in mark_entry (key=<value optimized out>, value=<value optimized out>, lev=<value optimized out>) at gc.c:712 #19 0x0000003b00e991dd in st_foreach (table=0x217fa30, func=0x3b00e51260 <mark_entry>, arg=6) at st.c:487 #20 0x0000003b00e50e70 in mark_tbl (ptr=140710237645320, lev=6) at gc.c:722 #21 gc_mark_children (ptr=140710237645320, lev=6) at gc.c:997 #22 0x0000003b00e5126e in mark_entry (key=<value optimized out>, value=<value optimized out>, lev=<value optimized out>) at gc.c:712 #23 0x0000003b00e991dd in st_foreach (table=0x217ffa0, func=0x3b00e51260 <mark_entry>, arg=5) at st.c:487 #24 0x0000003b00e50e88 in mark_tbl (ptr=140710237645200, lev=5) at gc.c:722 #25 gc_mark_children (ptr=140710237645200, lev=5) at gc.c:998 #26 0x0000003b00e50d31 in gc_mark_children (ptr=140710237624280, lev=4) at gc.c:992 #27 0x0000003b00e50d31 in gc_mark_children (ptr=140710237624320, lev=3) at gc.c:992 #28 0x0000003b00e50d31 in gc_mark_children (ptr=140710204844320, lev=2) at gc.c:992 #29 0x0000003b00e50efe in gc_mark_children (ptr=140710204844360, lev=1) at gc.c:882 #30 0x0000003b00e514a8 in garbage_collect () at gc.c:1473 #31 0x0000003b00e521cf in ruby_xmalloc (size=<value optimized out>) at gc.c:153 #32 0x0000003b00e9ac80 in rb_str_buf_new (capa=128) at string.c:234 #33 0x0000003b00e581cc in appendline (fptr=0x3a91cd0, delim=10, strp=0x7fff89b26ab8) at io.c:1567 #34 0x0000003b00e5833f in rb_io_getline_fast (fptr=0x3a91cd0, delim=10 '\n') at io.c:1690 #35 0x0000003b00e7a597 in lex_getline () at parse.y:2743 #36 nextc () at parse.y:2795 #37 ruby_yylex () at parse.y:3469 #38 ruby_yyparse () at parse.c:4376 #39 0x0000003b00e7f11b in yycompile (f=0x3a8dd30 "/usr/lib/ruby/gems/1.8/gems/activesupport-3.0.10/lib/active_support/core_ext/module/synchronization.rb", line=<value optimized out>) at parse.y:2692 #40 0x0000003b00e95791 in load_file (fname=0x3a8dd30 "/usr/lib/ruby/gems/1.8/gems/activesupport-3.0.10/lib/active_support/core_ext/module/synchronization.rb", script=0) at ruby.c:982 #41 0x0000003b00e4a1ef in rb_load (fname=140710230753280, wrap=<value optimized out>) at eval.c:7135 #42 0x0000003b00e4a90a in rb_require_safe (fname=140710204844320, safe=0) at eval.c:7519 For us it seems like a 'ruby' bug in garbage collector. No clue about how to fix it :-( Any clues?
Michal found this blog post, it appears there is a problem with mri ruby itself in how it interacts with gcc, or at least the api it provides for c extensions. This appears to be very bad news: http://timetobleed.com/the-broken-promises-of-mrireeyarv/
might be related https://bugzilla.redhat.com/show_bug.cgi?id=772708
https://bugzilla.redhat.com/show_bug.cgi?id=765611
I've got that narrowed down to just the ruby interpreter - on the one machine where this is reproducible, it happens when running the Deltacloud server under WEBRick and using json/pure Investigation still ongoing
*** Bug 785166 has been marked as a duplicate of this bug. ***
*** Bug 786432 has been marked as a duplicate of this bug. ***
*** Bug 786499 has been marked as a duplicate of this bug. ***
additional details from QE in the three above closed bugs
currently one can probably recreate the segfault by starting and stopping applications in the conductor ui
Moving back to assigned -- Wes found another reproducer.
My error - Jim informs me that this one is in fact fixed, and that there is a different issue, https://bugzilla.redhat.com/show_bug.cgi?id=786432, that we should be tracking. Moving this back to on_qa.
Created attachment 560050 [details] thin C backtrace Wes H. set this up. Provoked by running aeolus-restart-services, and then "delete an ec2 instance from the web-UI"
TL;DR thin's app-deletion code has been fixed so that it no longer triggers the still-present bug in ruby. ---------------------- We've been plagued by ruby-1.8.7 (RHEL-6.2) bugs. First it was one that made deltacloud segfault consistently. That was fixed by this (year-old!) marshalling patch: http://bugs.ruby-lang.org/issues/4339 [resolved in CE by http://bugzilla.redhat.com/782573] Then there was a harder-to-reproduce one. One way is the to run a conductor test repeatedly: while :; do cucumber features/provider.feature; sleep 2; done Another way to reproduce it is to set up all services, provision and create an EC2 application, and delete that app via the UI. Just like the preceding segfaults, it happens hundreds of levels down the stack (400-600) in garbage collection, usually during the marking phase. Turning on GC-always makes the tool so slow that it is unusable. At first, we suspected that some gem-with-C-compiled bits was making the same mistake fixed by the marshalling patch. Last I heard, David Lutterkort had eliminated most such gems as the possible cause. I also audited ruby itself, looking for similar abuse to what was fixed by the marshalling patch. prelude.c had one, but it's very special, and in fact, ok. Michal Fojtik pointed me to this site: https://sites.google.com/site/brentsrubypatches/ which seemed very promising. I've applied 4 patches (1,1a,2,3) from his git repo and made a new rhel6 rpm. With it, there is no more segfault. [ FTR, applying just 1 and 1a did not help. I don't remember if I tried also with just 1..2, so it's quite possible that patch 3 is not needed. ] With that segfault out of the way, we found that we could now exercise the app-deletion code, which had a problem: The deletion request that was segfaulting before now completes normally, but takes 2m44s, during which thin spawns over 5000 threads. Fixing the thread storm (done by http://bugzilla.redhat.com/791195) makes it so this ruby problem is no longer triggered. Here's description from the brentsrubypatches link above for the patch that seems most likely to be fixing our problem: MBARI3.patch: Ruby's conservative garbage collector cannot tell whether machine words on the 'C' stack are object pointers or integers, etc. because there is no type information associated with them. A conservative collector works by "conserving" every object to which there could possibly be a reference. In the 1.8 and 1.6 series Ruby implementations, this means scanning the stack of each Thread and Continuation assuming that every word is an object pointer if it has a value could be so interpreted. In practice, this is not as bad is it may seem, as Ruby's collector does not consider pointers "inside" an object to be valid -- only those that point to its exact base address. So, even assuming thousands of live objects, a 32-bit address space will remain very sparsely populated with valid object pointers. The garbage collector's leaking memory is not really its own fault. The trouble is that the 'C' machine stack is filled with object references. The main reason for this is that gcc compilers create overly large stack frames and do not initialize many values in them. Certain 'C' constructs used in the Ruby interpreter's core recursive expression evaluator generate especially large, sparse stack frames. The function rb_eval() is the worst offender, creating kilobyte sized stack frames for each invocation of a function that may call itself hundreds of times. This results in stacks that are hundreds of kilobytes, often full of old, dead object references that may never go away. If there were a gcc compiler option to initialize all local variables to zero whever a new stack frame is built, that would let the collector do its work properly, but no such option exists. The MBARI3 patch tracks the maximum depth of the 'C' stack at critical points in the interpreter's execution so that, whenever the stack shrinks, its old contents can be replaced with zeros. One might think this would have no effect because each new stack frame would simply overwrite all the zeros. But, as noted in the previous paragraph, gcc does not optimize that way, rather it optimizes for time by writing only those words in each frame actually needed for a specific execution path through the function. If those values are left uninitialized, when the garbage collector runs, the collector interprets some of them as valid object pointers that must be preserved. Even though these "ghost values" were written by previous iterations of functions that are no longer active. This patch ensures that almost all those hitherto uninitialized values are zero when the collector runs. In practice, for MBARI's ESP application, I observed that the process size increased slowly without bound over many hours of operation. After patching, the process size stayed below 10MB for days. Before, I had seen it increase to >24Mbytes. (Our ARM targets have only 32MB of DRAM!) Your mileage may vary, but I expect every long running application will see significant reduction in process size although some may run slightly slower due to the overhead of clearing the stack. In case anyone wants to play, RPMs (src and x86_64) of the patched ruby are here: http://people.redhat.com/meyering/tmp/
unable to recreate in ^C [root@qeblade30 ~]# rpm -qa | grep aeolus rubygem-aeolus-image-0.3.0-10.el6.noarch aeolus-conductor-doc-0.8.0-36.el6.noarch rubygem-aeolus-cli-0.3.0-10.el6.noarch aeolus-all-0.8.0-36.el6.noarch aeolus-conductor-0.8.0-36.el6.noarch aeolus-configure-2.5.0-15.el6.noarch aeolus-conductor-daemons-0.8.0-36.el6.noarch [root@qeblade30 ~]#
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHEA-2012-0587.html
This is another good reason not to use Ruby 1.8.7: https://bugzilla.redhat.com/show_bug.cgi?id=843188#c6