Description of problem: HAProxy's current method of determining when to scale is suboptimal. It scales based on a spot check of the number of current connections, with unconfigurable thresholds. This should be improved by backporting upstream PR https://github.com/openshift/origin-server/pull/4438, which introduces a moving average of the current connections (much more stable than a single spot check) and the ability to configure the thresholds.

commit 47428027c64d7e282e1b91b682e1d05ea11fceb3
Author: Dan McPherson <dmcphers>
Date:   Wed Jan 8 21:24:58 2014 -0500
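For reference, the moving-average technique the PR introduces can be sketched roughly as follows. This is a minimal illustration of the general idea only, not the actual haproxy_ctld code; the class name, window size, and method names are assumptions.

```ruby
# Sketch of a simple moving average over recent session spot checks.
# Illustrative only -- the real haproxy_ctld implementation and its
# window size differ.
class SessionAverager
  def initialize(window_size = 10)
    @window_size = window_size
    @samples = []
  end

  # Record the latest spot check of current sessions, keeping only
  # the most recent window_size samples.
  def record(sessions)
    @samples << sessions
    @samples.shift while @samples.size > @window_size
  end

  # Average over the retained window; scaling decisions based on this
  # are far less jumpy than ones based on a single spot check.
  def average
    return 0.0 if @samples.empty?
    @samples.sum.to_f / @samples.size
  end
end
```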
*** This bug has been marked as a duplicate of bug 1056700 ***
Moving notes from 1056700 here to be publicly available.
-------------------------------------------------------
This includes the following upstream commits:

commit 47428027c64d7e282e1b91b682e1d05ea11fceb3
Author: Dan McPherson <dmcphers>
Date:   Wed Jan 8 21:24:58 2014 -0500

    Make sessions per gear configurable and use moving average for num sessions

commit 05d52c0b06301d6e24b5ac1a31cd107ff16e31c2
Author: Dan McPherson <dmcphers>
Date:   Fri Jan 10 14:14:20 2014 -0500

    Bug 1051446

commit f116af85558db377d849246c089824c670215c15
Author: Dan McPherson <dmcphers>
Date:   Mon Jan 13 13:09:12 2014 -0500

    Bisect the scale up/down threshold more evenly for lower scale numbers

commit 8dd6002c5578930efce2d19a0fe486f8d309940a
Author: Dan McPherson <dmcphers>
Date:   Wed Jan 22 11:06:35 2014 -0500

    Bug 1056483 - Better error messaging with direct usage of haproxy_ctld

commit c362b17efe2037ee6b5a84e0dabe5aa6b447362c
Author: Ben Parees <bparees>
Date:   Fri Nov 15 17:11:46 2013 -0500

    Bug 1029679: handle connection refused error with clean error message
----------------------------------------------------------------
PR: https://github.com/openshift/enterprise-server/pull/204

After applying this update and restarting mcollective, admins will need to run the following _on the broker_:

rm -rf /tmp/oo-upgrade
oo-admin-upgrade upgrade-node --version 2.0.3
*** Bug 1056700 has been marked as a duplicate of this bug. ***
Just FYI, no puddle has been created with this bug yet, even though it is marked ON_QA. Will create the new puddle tomorrow.
Verified this bug with package openshift-origin-cartridge-haproxy-1.17.3.2-1.el6op.noarch. Auto scaling up/down works well with the moving average algorithm.

...
D, [2014-01-25T22:18:38.029551 #30094] DEBUG -- : Local sessions 4
D, [2014-01-25T22:18:38.029656 #30094] DEBUG -- : Got stats from 0 remote proxies.
D, [2014-01-25T22:18:38.030687 #30094] DEBUG -- : Local sessions 8
D, [2014-01-25T22:18:38.030748 #30094] DEBUG -- : Got stats from 0 remote proxies.
D, [2014-01-25T22:18:38.030830 #30094] DEBUG -- : GEAR_INFO - capacity: 50.0% gear_count: 1 sessions: 8 up/remove_thresh: 90.0%/1.0% sec_left_til_remove: 0 gear_remove_thresh: 0/20
D, [2014-01-25T22:18:43.041988 #30094] DEBUG -- : Local sessions 12
D, [2014-01-25T22:18:43.042125 #30094] DEBUG -- : Got stats from 0 remote proxies.
D, [2014-01-25T22:18:43.052249 #30094] DEBUG -- : Local sessions 17
D, [2014-01-25T22:18:43.052371 #30094] DEBUG -- : Got stats from 0 remote proxies.
D, [2014-01-25T22:18:43.052460 #30094] DEBUG -- : GEAR_INFO - capacity: 106.25% gear_count: 1 sessions: 17 up/remove_thresh: 90.0%/1.0% sec_left_til_remove: 0 gear_remove_thresh: 0/20
D, [2014-01-25T22:18:48.053394 #30094] DEBUG -- : Local sessions 17
D, [2014-01-25T22:18:48.053535 #30094] DEBUG -- : Got stats from 0 remote proxies.
I, [2014-01-25T22:18:48.053630 #30094] INFO -- : add-gear - capacity: 106.25% gear_count: 1 sessions: 17 up_thresh: 90.0%
I, [2014-01-25T22:19:38.080141 #30094] INFO -- : add-gear - exit_code: 0 output:
D, [2014-01-25T22:19:38.081117 #30094] DEBUG -- : Local sessions 17
D, [2014-01-25T22:19:38.081172 #30094] DEBUG -- : Got stats from 0 remote proxies.
D, [2014-01-25T22:19:38.081255 #30094] DEBUG -- : GEAR_INFO - capacity: 53.125% gear_count: 2 sessions: 17 up/remove_thresh: 90.0%/31.5% sec_left_til_remove: 550 gear_remove_thresh: 0/20
...
----------------------------------------------------------------------
...
D, [2014-01-25T22:28:48.559652 #30094] DEBUG -- : GEAR_INFO - capacity: 0.0% gear_count: 2 sessions: 0 up/remove_thresh: 90.0%/31.5% sec_left_til_remove: 0 gear_remove_thresh: 20/20
D, [2014-01-25T22:28:53.560616 #30094] DEBUG -- : Local sessions 0
D, [2014-01-25T22:28:53.560736 #30094] DEBUG -- : Got stats from 0 remote proxies.
I, [2014-01-25T22:28:53.560882 #30094] INFO -- : remove-gear - capacity: 0.0% gear_count: 2 sessions: 0 remove_thresh: 31.5%
I, [2014-01-25T22:29:04.925786 #30094] INFO -- : remove-gear - exit_code: 0 output:
D, [2014-01-25T22:29:04.927707 #30094] DEBUG -- : Local sessions 0
D, [2014-01-25T22:29:04.927767 #30094] DEBUG -- : Got stats from 0 remote proxies.
D, [2014-01-25T22:29:04.927839 #30094] DEBUG -- : GEAR_INFO - capacity: 0.0% gear_count: 1 sessions: 0 up/remove_thresh: 90.0%/1.0% sec_left_til_remove: 0 gear_remove_thresh: 20/20
...

And the issue mentioned in BZ#1051446 did not appear. While the app is stopped:

[app3-yes.ose-0116.com 52e47aa60ca874772c000007]\> haproxy_ctld -u
An error occurred; try again later: Could not connect to the application. Check if the application is stopped.

[root@broker openshift]# rhc cartridge scale -c python-2.7 -a app3 --min 2 --max 2
This operation will run until the application is at the minimum scale and may take several minutes.
Setting scale range for python-2.7 ... done

[app3-yes.ose-0116.com 52e47aa60ca874772c000007]\> haproxy_ctld -d
Cannot remove gear because min limit '2' reached.
[app3-yes.ose-0116.com 52e47aa60ca874772c000007]\> echo $?
1
[app3-yes.ose-0116.com 52e47aa60ca874772c000007]\> haproxy_ctld -u
Cannot add gear because max limit '2' reached.
[app3-yes.ose-0116.com 52e47aa60ca874772c000007]\> echo $?
1

So moving this bug to VERIFIED.
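The capacity figures in the verification logs are consistent with sessions divided by gear_count times a per-gear session limit of 16 (which the backported change makes configurable): 8/16 = 50.0%, 17/16 = 106.25%, and 17/32 = 53.125% after the second gear is added. A hedged sketch of that arithmetic and the threshold comparison, with illustrative names rather than the cartridge's actual code:

```ruby
# Illustrative capacity calculation consistent with the logged values;
# not the actual haproxy_ctld code. sessions_per_gear = 16 is inferred
# from the log (8 sessions on 1 gear reported as 50.0% capacity).
def capacity_pct(sessions, gear_count, sessions_per_gear = 16)
  sessions * 100.0 / (gear_count * sessions_per_gear)
end

# Compare capacity against the up/remove thresholds seen in the log
# (up_thresh 90.0%, remove_thresh 1.0% for a single gear).
def scale_decision(capacity, up_thresh: 90.0, remove_thresh: 1.0)
  return :add_gear    if capacity > up_thresh
  return :remove_gear if capacity < remove_thresh
  :hold
end
```

With these assumed defaults, 17 sessions on one gear yields 106.25% capacity and an :add_gear decision, matching the add-gear event in the log.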
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHBA-2014-0209.html