Bug 1057183 - Backport HAproxy auto-scaling enhancements
Summary: Backport HAproxy auto-scaling enhancements
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: ImageStreams
Version: 2.0.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: ---
Assignee: Brenton Leanhardt
QA Contact: libra bugs
URL:
Whiteboard:
Duplicates: 1056700
Depends On:
Blocks: 990500 1036728
 
Reported: 2014-01-23 15:24 UTC by Luke Meyer
Modified: 2017-03-08 17:36 UTC
CC: 5 users

Fixed In Version: openshift-origin-cartridge-haproxy-1.17.3.2-1.el6op
Doc Type: Enhancement
Doc Text:
Previously, the way the HAProxy cartridge determined when to scale an application was not optimal because it checked the number of connections against a fixed threshold, which could impact stability or performance. This enhancement improves the HAProxy cartridge so that it uses a moving average of the number of current connections and provides a configurable threshold. The following command must be run after applying this fix:

# oo-admin-upgrade upgrade-node --version=2.0.3

See the Solution section in the errata advisory for full details.
Clone Of:
Environment:
Last Closed: 2014-02-25 15:43:47 UTC
Target Upstream Version:
Embargoed:


Attachments


Links
Red Hat Product Errata RHBA-2014:0209 (SHIPPED_LIVE): Red Hat OpenShift Enterprise 2.0.3 bugfix and enhancement update, last updated 2014-02-25 20:40:32 UTC

Description Luke Meyer 2014-01-23 15:24:49 UTC
Description of problem:
HAproxy's current method of determining when to scale is suboptimal. It scales based on a spot-check of the number of current connections, with unconfigurable thresholds.

This should be improved by backporting upstream PR https://github.com/openshift/origin-server/pull/4438, which introduces a moving average of the current connections (much more stable than a spot check) and makes the thresholds configurable.
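As a rough sketch of the approach described above: smooth the spot-checked session counts with a moving average before comparing against a configurable scale-up threshold. All names, the sample size, and the sessions-per-gear value here are illustrative assumptions, not the actual haproxy_ctld implementation.

```ruby
# Minimal sketch of moving-average-based scaling, assuming a fixed
# per-gear session capacity and a configurable scale-up threshold.
class ScalingMonitor
  def initialize(sessions_per_gear: 16, up_threshold: 90.0, sample_size: 10)
    @sessions_per_gear = sessions_per_gear  # assumed capacity of one gear
    @up_threshold = up_threshold            # capacity % that triggers add-gear
    @sample_size = sample_size              # how many spot checks to average
    @samples = []
  end

  # Record one spot check of current sessions; return the smoothed value.
  def record(sessions)
    @samples << sessions
    @samples.shift while @samples.size > @sample_size
    @samples.sum.to_f / @samples.size
  end

  # Capacity as a percentage of what the current gears can handle,
  # based on the moving average rather than a single spot check.
  def capacity(gear_count)
    avg = @samples.sum.to_f / @samples.size
    avg / (gear_count * @sessions_per_gear) * 100.0
  end

  def scale_up?(gear_count)
    capacity(gear_count) >= @up_threshold
  end
end
```

Because the decision is made on the average of recent samples, a single traffic spike does not immediately trigger add-gear, which is the stability gain over the old spot-check behavior.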

commit 47428027c64d7e282e1b91b682e1d05ea11fceb3
Author: Dan McPherson <dmcphers>
Date:   Wed Jan 8 21:24:58 2014 -0500

Comment 1 Luke Meyer 2014-01-23 15:26:44 UTC

*** This bug has been marked as a duplicate of bug 1056700 ***

Comment 2 Luke Meyer 2014-01-23 19:32:13 UTC
Moving notes from 1056700 here to be publicly available.

-------------------------------------------------------

This includes the following upstream commits:

commit 47428027c64d7e282e1b91b682e1d05ea11fceb3
Author: Dan McPherson <dmcphers>
Date:   Wed Jan 8 21:24:58 2014 -0500

    Make sessions per gear configurable and use moving average for num sessions

commit 05d52c0b06301d6e24b5ac1a31cd107ff16e31c2
Author: Dan McPherson <dmcphers>
Date:   Fri Jan 10 14:14:20 2014 -0500

    Bug 1051446

commit f116af85558db377d849246c089824c670215c15
Author: Dan McPherson <dmcphers>
Date:   Mon Jan 13 13:09:12 2014 -0500

    Bisect the scale up/down threshold more evenly for lower scale numbers

commit 8dd6002c5578930efce2d19a0fe486f8d309940a
Author: Dan McPherson <dmcphers>
Date:   Wed Jan 22 11:06:35 2014 -0500

    Bug 1056483 - Better error messaging with direct usage of haproxy_ctld

commit c362b17efe2037ee6b5a84e0dabe5aa6b447362c
Author: Ben Parees <bparees>
Date:   Fri Nov 15 17:11:46 2013 -0500

    Bug 1029679: handle connection refused error with clean error message



----------------------------------------------------------------

PR: https://github.com/openshift/enterprise-server/pull/204


After applying this update and restarting mcollective, admins will need to run the following _on the broker_:

rm -rf /tmp/oo-upgrade
oo-admin-upgrade upgrade-node --version 2.0.3

Comment 3 Luke Meyer 2014-01-23 19:33:11 UTC
*** Bug 1056700 has been marked as a duplicate of this bug. ***

Comment 6 John W. Lamb 2014-01-23 21:55:34 UTC
Just FYI, no puddle has been created with this bug yet, even though it is marked ON_QA. Will create the new puddle tomorrow.

Comment 7 Gaoyun Pei 2014-01-26 04:07:31 UTC
Verified this bug with package openshift-origin-cartridge-haproxy-1.17.3.2-1.el6op.noarch.

Auto scaling up/down works well with the moving average algorithm.

...
D, [2014-01-25T22:18:38.029551 #30094] DEBUG -- : Local sessions 4
D, [2014-01-25T22:18:38.029656 #30094] DEBUG -- : Got stats from 0 remote proxies.
D, [2014-01-25T22:18:38.030687 #30094] DEBUG -- : Local sessions 8
D, [2014-01-25T22:18:38.030748 #30094] DEBUG -- : Got stats from 0 remote proxies.
D, [2014-01-25T22:18:38.030830 #30094] DEBUG -- : GEAR_INFO - capacity: 50.0% gear_count: 1 sessions: 8 up/remove_thresh: 90.0%/1.0% sec_left_til_remove: 0 gear_remove_thresh: 0/20
D, [2014-01-25T22:18:43.041988 #30094] DEBUG -- : Local sessions 12
D, [2014-01-25T22:18:43.042125 #30094] DEBUG -- : Got stats from 0 remote proxies.
D, [2014-01-25T22:18:43.052249 #30094] DEBUG -- : Local sessions 17
D, [2014-01-25T22:18:43.052371 #30094] DEBUG -- : Got stats from 0 remote proxies.
D, [2014-01-25T22:18:43.052460 #30094] DEBUG -- : GEAR_INFO - capacity: 106.25% gear_count: 1 sessions: 17 up/remove_thresh: 90.0%/1.0% sec_left_til_remove: 0 gear_remove_thresh: 0/20
D, [2014-01-25T22:18:48.053394 #30094] DEBUG -- : Local sessions 17
D, [2014-01-25T22:18:48.053535 #30094] DEBUG -- : Got stats from 0 remote proxies.
I, [2014-01-25T22:18:48.053630 #30094]  INFO -- : add-gear - capacity: 106.25% gear_count: 1 sessions: 17 up_thresh: 90.0%
I, [2014-01-25T22:19:38.080141 #30094]  INFO -- : add-gear - exit_code: 0  output:
D, [2014-01-25T22:19:38.081117 #30094] DEBUG -- : Local sessions 17
D, [2014-01-25T22:19:38.081172 #30094] DEBUG -- : Got stats from 0 remote proxies.
D, [2014-01-25T22:19:38.081255 #30094] DEBUG -- : GEAR_INFO - capacity: 53.125% gear_count: 2 sessions: 17 up/remove_thresh: 90.0%/31.5% sec_left_til_remove: 550 gear_remove_thresh: 0/20
...

----------------------------------------------------------------------
...
D, [2014-01-25T22:28:48.559652 #30094] DEBUG -- : GEAR_INFO - capacity: 0.0% gear_count: 2 sessions: 0 up/remove_thresh: 90.0%/31.5% sec_left_til_remove: 0 gear_remove_thresh: 20/20
D, [2014-01-25T22:28:53.560616 #30094] DEBUG -- : Local sessions 0
D, [2014-01-25T22:28:53.560736 #30094] DEBUG -- : Got stats from 0 remote proxies.
I, [2014-01-25T22:28:53.560882 #30094]  INFO -- : remove-gear - capacity: 0.0% gear_count: 2 sessions: 0 remove_thresh: 31.5%
I, [2014-01-25T22:29:04.925786 #30094]  INFO -- : remove-gear - exit_code: 0  output:
D, [2014-01-25T22:29:04.927707 #30094] DEBUG -- : Local sessions 0
D, [2014-01-25T22:29:04.927767 #30094] DEBUG -- : Got stats from 0 remote proxies.
D, [2014-01-25T22:29:04.927839 #30094] DEBUG -- : GEAR_INFO - capacity: 0.0% gear_count: 1 sessions: 0 up/remove_thresh: 90.0%/1.0% sec_left_til_remove: 0 gear_remove_thresh: 20/20
...
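The capacity figures in the GEAR_INFO lines above are consistent with capacity being the session count divided by total gear capacity. The 16-sessions-per-gear value below is inferred from the logged numbers, not taken from the cartridge source:

```ruby
# Reproducing the GEAR_INFO capacity values from the log excerpts,
# assuming 16 sessions per gear (an inferred value).
SESSIONS_PER_GEAR = 16.0

def capacity(sessions, gear_count)
  sessions / (gear_count * SESSIONS_PER_GEAR) * 100.0
end

capacity(8, 1)   # 50.0    -> matches "capacity: 50.0%  gear_count: 1 sessions: 8"
capacity(17, 1)  # 106.25  -> matches "capacity: 106.25% gear_count: 1 sessions: 17"
capacity(17, 2)  # 53.125  -> matches "capacity: 53.125% gear_count: 2 sessions: 17"
```

This also explains the add-gear event at 106.25%: capacity exceeded the logged up_thresh of 90.0%, and after the new gear was added the same session count dropped to 53.125% of the doubled capacity.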

The issue mentioned in BZ#1051446 did not appear.


While the app is stopped:
[app3-yes.ose-0116.com 52e47aa60ca874772c000007]\> haproxy_ctld -u
An error occurred; try again later: Could not connect to the application.  Check if the application is stopped.


[root@broker openshift]# rhc cartridge scale -c python-2.7 -a app3 --min 2 --max 2
This operation will run until the application is at the minimum scale and may take several minutes.
Setting scale range for python-2.7 ... done

[app3-yes.ose-0116.com 52e47aa60ca874772c000007]\> haproxy_ctld -d
Cannot remove gear because min limit '2' reached.
[app3-yes.ose-0116.com 52e47aa60ca874772c000007]\> echo $?
1
[app3-yes.ose-0116.com 52e47aa60ca874772c000007]\> haproxy_ctld -u
Cannot add gear because max limit '2' reached.
[app3-yes.ose-0116.com 52e47aa60ca874772c000007]\> echo $?
1

So, moving this bug to VERIFIED.

Comment 9 errata-xmlrpc 2014-02-25 15:43:47 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2014-0209.html

