Bug 844123

Summary: /etc/init.d/httpd graceful is being run simultaneously
Product: OKD
Reporter: Thomas Wiest <twiest>
Component: Containers
Assignee: Rob Millner <rmillner>
Status: CLOSED CURRENTRELEASE
QA Contact: libra bugs <libra-bugs>
Severity: urgent
Docs Contact:
Priority: high
Version: 2.x
CC: jialiu, mfisher, mmcgrath, rchopra
Target Milestone: ---
Keywords: Triaged
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: devenv_1927
Doc Type: Bug Fix
Last Closed: 2012-08-07 20:42:48 UTC
Type: Bug
Attachments:
Description                                                      Flags
Daemon server to coalesce 'graceful' restarts of apache          none
Service command to be run from cartridges for graceful restart   none

Description Thomas Wiest 2012-07-29 05:47:51 UTC
Description of problem:
/etc/init.d/httpd graceful is being run simultaneously when lots of apps are created at a time on the same box, and also when that box is sluggish.

This is a problem because after a while these processes start stacking up and eat a lot of RAM. We've seen boxes where over 20 of these processes are running and free memory on the system is critically low as a result.

When this happens, it becomes a vicious cycle: the sluggish box causes even more "httpd graceful" processes to stack up.

The proper fix would be to have the cartridges use some sort of lock file or queueing system so that multiple cartridges aren't trying to restart Apache all at once.

Also of note: "/etc/init.d/httpd configtest" processes pile up as well. Whatever fix is used for "httpd graceful" should be applied to configtest too.


Version-Release number of selected component (if applicable):
rhc-node-0.95.8-1.el6_3.x86_64


How reproducible:
In PROD, it's quite reproducible.


Steps to Reproduce:
1. Create a lot of apps simultaneously on a node
2. Run: ps -ef|grep http
3. Notice that graceful is running multiple times
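
Step 3 can also be checked by counting matches instead of eyeballing the output. The following is a minimal sketch, assuming the text of `ps -ef` has already been captured into a string; the function name `count_graceful` and the sample lines (with made-up PIDs) are illustrative only and do not come from a real box.

```python
def count_graceful(ps_output: str,
                   needle: str = "/etc/init.d/httpd graceful") -> int:
    """Count lines of `ps -ef` output whose command contains `needle`."""
    return sum(1 for line in ps_output.splitlines() if needle in line)


# Illustrative sample in `ps -ef` layout (PIDs are invented for the example).
sample = """\
root       101     1  0 08:38 ?        00:00:00 /bin/bash /etc/init.d/httpd graceful
root       102     1  0 08:38 ?        00:00:00 /bin/bash /etc/init.d/httpd graceful
root       103   101  0 08:38 ?        00:00:00 /usr/sbin/httpd -k graceful
"""

# Only the two init-script lines match; the `httpd -k graceful` child does not.
print(count_graceful(sample))
```

A count above 1 corresponds to the "Actual results" reported here; the expected count is at most 1.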
  
Actual results:
Many "/etc/init.d/httpd graceful" processes run at a time.


Expected results:
Only 1 "/etc/init.d/httpd graceful" process should be running at a time.

Comment 1 Mike McGrath 2012-07-30 13:10:32 UTC
Until we can start using mod_express (with a more modern version of Apache) I suspect we'll need to detect and auto-correct this issue. An httpd reload might do it; an httpd restart might as well. If that doesn't work we'll have to manually kill the graceful processes and issue a reload.

Another thought is to have stickshift issue a restart when it detects a graceful is already running.

Comment 2 Rob Millner 2012-07-30 18:23:54 UTC
At the least, we should add a lock to the restart_httpd_graceful function that cartridges use, maybe through an external script.  That should take care of the dominant cause of the issue being observed.
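
To illustrate the locking idea, here is a minimal sketch of an external wrapper that serializes a command behind an exclusive file lock. It is not the script that was eventually merged; the lock path, the function name `run_locked`, and the harmless demo command are all assumptions made for the example. In real use the command would be something like `["/sbin/service", "httpd", "graceful"]`.

```python
import fcntl
import os
import subprocess
import sys
import tempfile

# Hypothetical lock location, chosen only for this sketch.
LOCK_PATH = os.path.join(tempfile.gettempdir(), "httpd_graceful_sketch.lock")


def run_locked(cmd, lock_path=LOCK_PATH):
    """Run `cmd` while holding an exclusive lock on `lock_path`.

    Concurrent callers block in flock() and run one after another,
    so at most one instance of `cmd` is active at a time.
    """
    with open(lock_path, "w") as lock_file:
        fcntl.flock(lock_file, fcntl.LOCK_EX)  # blocks until the lock is free
        try:
            return subprocess.call(cmd)
        finally:
            fcntl.flock(lock_file, fcntl.LOCK_UN)


# Demo with a harmless stand-in command so the sketch runs anywhere on Unix.
exit_code = run_locked([sys.executable, "-c", "pass"])
print(exit_code)
```

With this approach every caller still gets its own restart; the lock only guarantees they do not overlap.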

Comment 3 Rajat Chopra 2012-07-30 21:27:22 UTC
Created attachment 601344 [details]
Daemon server to coalesce 'graceful' restarts of apache

Comment 4 Rajat Chopra 2012-07-30 21:27:55 UTC
Created attachment 601345 [details]
Service command to be run from cartridges for graceful restart

Comment 5 Rajat Chopra 2012-07-30 21:30:42 UTC
Proposed solution:
   Have a daemon that serves the 'graceful' requests, queuing them up and coalescing them. If there is a 'graceful' process running, and another 'N' applications want a graceful restart, this method will ensure that one 'graceful' addresses all 'N' at once.
Locks may be inefficient as they will still run N+1 graceful restarts in the above scenario (versus 2 with coalescing).
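
The N+1 versus 2 arithmetic can be shown with a small in-process model. This is only a sketch of the coalescing idea, not the attached daemon (which binds a socket); the class name `CoalescingRunner` and the stand-in `fake_graceful` action are hypothetical. One simplification to note: a request that gets folded into a later run returns immediately rather than waiting for that run to finish.

```python
import threading


class CoalescingRunner:
    """Fold requests that arrive during a run into a single follow-up run."""

    def __init__(self, action):
        self._action = action
        self._lock = threading.Lock()
        self._running = False
        self._pending = False
        self.runs = 0  # how many times the action actually executed

    def request(self):
        with self._lock:
            if self._running:
                self._pending = True  # will be served by the next run
                return
            self._running = True
        while True:
            self._action()  # lock is not held while the action runs
            with self._lock:
                self.runs += 1
                if not self._pending:
                    self._running = False
                    return
                self._pending = False  # one more run serves all waiters


def fake_graceful():
    # Stand-in for the real restart. During the first run, simulate
    # N = 5 more applications asking for a graceful restart.
    if runner.runs == 0:
        for _ in range(5):
            runner.request()


runner = CoalescingRunner(fake_graceful)
runner.request()
print(runner.runs)  # 2 runs serve all 6 requests; plain locking would need 6
```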

Comment 6 Mike McGrath 2012-07-30 21:38:11 UTC
We don't want an entire daemon that sits and watches this.

Comment 7 Mike McGrath 2012-07-30 21:42:18 UTC
(In reply to comment #2)
> At the least, we should add a lock to the restart_httpd_graceful function
> that cartridges use, maybe through an external script.  That should take care
> of the dominant cause of the issue being observed.

So it seems that a graceful (which I think is non-blocking) just sits in the process queue until someone intervenes.

Would the lock mean that no applications could be created until after the graceful/lock clears?  I'm not sure if the multiple gracefuls are a symptom or the problem.

Comment 8 Rajat Chopra 2012-07-30 21:50:26 UTC
(In reply to comment #6)
> We don't want an entire daemon that sits and watches this.

Why not? 
It's more efficient than locks, and it also reduces the number of 'graceful' processes that end up running.
It is not watching any process, just queuing up the requests on a single socket bind.

Comment 9 Rob Millner 2012-07-30 23:54:23 UTC
Sorry - noticed the additional comments on this ticket after I wrote and started testing the lock script.

Queuing up the graceful requests and servicing them all with one call is a lot more efficient; as we discovered when the idler was re-written to do that.

The benefit of processing them once-per-application with a wrapper is that it's really simple.

Neither of these solutions, nor what we do now, makes it clear which application may have broken the Apache configuration when there's a lot of simultaneous reconfiguration.

I'm going to finish testing the lock script and issue a pull request for it.  If it leads to restart times which are too long, then we should look at making the wrapper script coalesce requests into one configuration check and graceful restart.

Comment 10 John Poelstra 2012-07-31 17:10:05 UTC
open pull request

Comment 12 Rob Millner 2012-07-31 23:54:33 UTC
Pull requests accepted.

Comment 13 Johnny Liu 2012-08-01 13:10:26 UTC
Verified this bug on devenv_1931: PASS.

On the old instance (devenv_stage_226), this issue can be reproduced.
1. Create 10 apps at the same time.
2. On the instance, run the following command to watch the httpd restart processes.
<--snip-->
------
root     26113 25202  0 08:38 ?        00:00:00 /bin/sh /sbin/service httpd graceful
root     26121 26113  0 08:38 ?        00:00:00 /bin/bash /etc/init.d/httpd graceful
root     26129 26121  0 08:38 ?        00:00:00 /bin/sh /usr/sbin/apachectl graceful
root     26148 25245  0 08:38 ?        00:00:00 /bin/sh /sbin/service httpd graceful
root     26160 26148  0 08:38 ?        00:00:00 /bin/bash /etc/init.d/httpd graceful
root     26166 26129  4 08:38 ?        00:00:00 /usr/sbin/httpd -k graceful
root     26167 26160  0 08:38 ?        00:00:00 /bin/sh /usr/sbin/apachectl graceful
------
<--snip-->

Found that at least 2 "/etc/init.d/httpd graceful" processes are running at the same time.


On the new instance (devenv_1931), repeat the steps above:
# while :; do ps -ef|grep grace|grep -v grep;sleep 1; echo ------; done
<--snip-->
------
root      1568   712  6 08:58 ?        00:00:00 ruby /usr/libexec/stickshift/cartridges/abstract/info/bin/httpd_singular graceful
root      1790  1568  0 08:58 ?        00:00:00 /bin/sh /sbin/service httpd graceful
root      1801  1790  0 08:58 ?        00:00:00 /bin/bash /etc/init.d/httpd graceful
root      1804  1801  0 08:58 ?        00:00:00 /bin/sh /usr/sbin/apachectl graceful
------
root      1568   712  6 08:58 ?        00:00:00 ruby /usr/libexec/stickshift/cartridges/abstract/info/bin/httpd_singular graceful
root      1790  1568  0 08:58 ?        00:00:00 /bin/sh /sbin/service httpd graceful
root      1801  1790  0 08:58 ?        00:00:00 /bin/bash /etc/init.d/httpd graceful
root      1804  1801  0 08:58 ?        00:00:00 /bin/sh /usr/sbin/apachectl graceful
------
root      1568   712  6 08:58 ?        00:00:00 ruby /usr/libexec/stickshift/cartridges/abstract/info/bin/httpd_singular graceful
root      1790  1568  0 08:58 ?        00:00:00 /bin/sh /sbin/service httpd graceful
root      1801  1790  0 08:58 ?        00:00:00 /bin/bash /etc/init.d/httpd graceful
root      1804  1801  0 08:58 ?        00:00:00 /bin/sh /usr/sbin/apachectl graceful
root      1832  1804  0 08:58 ?        00:00:00 /usr/sbin/httpd -k graceful
------
root      1568   712  6 08:58 ?        00:00:00 ruby /usr/libexec/stickshift/cartridges/abstract/info/bin/httpd_singular graceful
root      1790  1568  0 08:58 ?        00:00:00 /bin/sh /sbin/service httpd graceful
root      1801  1790  0 08:58 ?        00:00:00 /bin/bash /etc/init.d/httpd graceful
root      1804  1801  0 08:58 ?        00:00:00 /bin/sh /usr/sbin/apachectl graceful
root      1832  1804  0 08:58 ?        00:00:00 /usr/sbin/httpd -k graceful
------
<--snip-->

Found that multiple "/etc/init.d/httpd graceful" processes no longer run at the same time.