Bug 1057183 - Backport HAproxy auto-scaling enhancements
Status: CLOSED ERRATA
Product: OpenShift Container Platform
Classification: Red Hat
Component: Image
Version: 2.0.0
Hardware: Unspecified Unspecified
Priority: unspecified
Severity: medium
Assigned To: Brenton Leanhardt
QA Contact: libra bugs
Keywords: Reopened
Duplicates: 1056700
Depends On:
Blocks: 990500 1036728
Reported: 2014-01-23 10:24 EST by Luke Meyer
Modified: 2017-03-08 12 EST

See Also:
Fixed In Version: openshift-origin-cartridge-haproxy-1.17.3.2-1.el6op
Doc Type: Enhancement
Doc Text:
Previously, the way the HAProxy cartridge determined when to scale an application was not optimal because it checked the number of connections against a fixed threshold, which could impact stability or performance. This enhancement improves the HAProxy cartridge so that it uses a moving average of the number of current connections and provides a configurable threshold.

The following command must be run after applying this fix:

# oo-admin-upgrade upgrade-node --version=2.0.3

See the Solution section in the errata advisory for full details.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2014-02-25 10:43:47 EST
Type: Bug
Regression: ---


External Trackers
Tracker: Red Hat Product Errata
ID: RHBA-2014:0209
Priority: normal
Status: SHIPPED_LIVE
Summary: Red Hat OpenShift Enterprise 2.0.3 bugfix and enhancement update
Last Updated: 2014-02-25 15:40:32 EST

Description Luke Meyer 2014-01-23 10:24:49 EST
Description of problem:
HAProxy's current method of determining when to scale is suboptimal: it scales based on a spot check of the number of current connections, and the thresholds cannot be configured.

This should be improved by backporting upstream PR https://github.com/openshift/origin-server/pull/4438, which introduces a moving average of the current connections (which should be far more stable than the spot check) and makes the thresholds configurable.

commit 47428027c64d7e282e1b91b682e1d05ea11fceb3
Author: Dan McPherson <dmcphers@redhat.com>
Date:   Wed Jan 8 21:24:58 2014 -0500
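
To illustrate why a moving average should be more stable than a spot check, here is a minimal Python sketch. It is not the cartridge's code: the names MovingAverage and should_scale_up, the window size, and the sample readings are all hypothetical, and the 16 sessions per gear and the 90% threshold are assumed values chosen to match the verification log later in this report. Whether the real comparison is ">=" or ">" is also an assumption.

```python
from collections import deque


class MovingAverage:
    """Simple moving average over the last `window` samples (hypothetical helper)."""

    def __init__(self, window):
        if window < 1:
            raise ValueError("window must be at least 1")
        self.samples = deque(maxlen=window)

    def add(self, value):
        """Record a sample and return the average of the retained samples."""
        self.samples.append(value)
        return sum(self.samples) / len(self.samples)


def should_scale_up(sessions, gear_count, sessions_per_gear, up_thresh_pct):
    """Return True when capacity (as a percentage) reaches the threshold."""
    capacity_pct = 100.0 * sessions / (gear_count * sessions_per_gear)
    return capacity_pct >= up_thresh_pct


# Illustrative readings: one brief spike among otherwise low traffic.
readings = [4, 5, 20, 4, 5]
avg = MovingAverage(window=5)
smoothed = [avg.add(r) for r in readings]

# A spot check taken during the spike would request a scale-up...
spot_decision = should_scale_up(20, 1, 16, 90.0)
# ...while the averaged value at the same moment would not.
avg_decision = should_scale_up(smoothed[2], 1, 16, 90.0)
```

With these assumed numbers, the single reading of 20 sessions exceeds the capacity of one gear, but the average over the samples seen so far stays well under the threshold, so a short burst alone would not add a gear.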
Comment 1 Luke Meyer 2014-01-23 10:26:44 EST

*** This bug has been marked as a duplicate of bug 1056700 ***
Comment 2 Luke Meyer 2014-01-23 14:32:13 EST
Moving notes from 1056700 here to be publicly available.

-------------------------------------------------------

This includes the following upstream commits:

commit 47428027c64d7e282e1b91b682e1d05ea11fceb3
Author: Dan McPherson <dmcphers@redhat.com>
Date:   Wed Jan 8 21:24:58 2014 -0500

    Make sessions per gear configurable and use moving average for num sessions

commit 05d52c0b06301d6e24b5ac1a31cd107ff16e31c2
Author: Dan McPherson <dmcphers@redhat.com>
Date:   Fri Jan 10 14:14:20 2014 -0500

    Bug 1051446

commit f116af85558db377d849246c089824c670215c15
Author: Dan McPherson <dmcphers@redhat.com>
Date:   Mon Jan 13 13:09:12 2014 -0500

    Bisect the scale up/down threshold more evenly for lower scale numbers

commit 8dd6002c5578930efce2d19a0fe486f8d309940a
Author: Dan McPherson <dmcphers@redhat.com>
Date:   Wed Jan 22 11:06:35 2014 -0500

    Bug 1056483 - Better error messaging with direct usage of haproxy_ctld

commit c362b17efe2037ee6b5a84e0dabe5aa6b447362c
Author: Ben Parees <bparees@redhat.com>
Date:   Fri Nov 15 17:11:46 2013 -0500

    Bug 1029679: handle connection refused error with clean error message



----------------------------------------------------------------

PR: https://github.com/openshift/enterprise-server/pull/204


After applying this update and restarting mcollective, admins will need to run the following _on the broker_:

rm -rf /tmp/oo-upgrade
oo-admin-upgrade upgrade-node --version 2.0.3
Comment 3 Luke Meyer 2014-01-23 14:33:11 EST
*** Bug 1056700 has been marked as a duplicate of this bug. ***
Comment 6 John W. Lamb 2014-01-23 16:55:34 EST
Just FYI, no puddle has been created with this bug yet, even though it is marked ON_QA. Will create the new puddle tomorrow.
Comment 7 Gaoyun Pei 2014-01-25 23:07:31 EST
Verified this bug with package openshift-origin-cartridge-haproxy-1.17.3.2-1.el6op.noarch.

Auto scaling up/down works well with the moving average algorithm.

...
D, [2014-01-25T22:18:38.029551 #30094] DEBUG -- : Local sessions 4
D, [2014-01-25T22:18:38.029656 #30094] DEBUG -- : Got stats from 0 remote proxies.
D, [2014-01-25T22:18:38.030687 #30094] DEBUG -- : Local sessions 8
D, [2014-01-25T22:18:38.030748 #30094] DEBUG -- : Got stats from 0 remote proxies.
D, [2014-01-25T22:18:38.030830 #30094] DEBUG -- : GEAR_INFO - capacity: 50.0% gear_count: 1 sessions: 8 up/remove_thresh: 90.0%/1.0% sec_left_til_remove: 0 gear_remove_thresh: 0/20
D, [2014-01-25T22:18:43.041988 #30094] DEBUG -- : Local sessions 12
D, [2014-01-25T22:18:43.042125 #30094] DEBUG -- : Got stats from 0 remote proxies.
D, [2014-01-25T22:18:43.052249 #30094] DEBUG -- : Local sessions 17
D, [2014-01-25T22:18:43.052371 #30094] DEBUG -- : Got stats from 0 remote proxies.
D, [2014-01-25T22:18:43.052460 #30094] DEBUG -- : GEAR_INFO - capacity: 106.25% gear_count: 1 sessions: 17 up/remove_thresh: 90.0%/1.0% sec_left_til_remove: 0 gear_remove_thresh: 0/20
D, [2014-01-25T22:18:48.053394 #30094] DEBUG -- : Local sessions 17
D, [2014-01-25T22:18:48.053535 #30094] DEBUG -- : Got stats from 0 remote proxies.
I, [2014-01-25T22:18:48.053630 #30094]  INFO -- : add-gear - capacity: 106.25% gear_count: 1 sessions: 17 up_thresh: 90.0%
I, [2014-01-25T22:19:38.080141 #30094]  INFO -- : add-gear - exit_code: 0  output:
D, [2014-01-25T22:19:38.081117 #30094] DEBUG -- : Local sessions 17
D, [2014-01-25T22:19:38.081172 #30094] DEBUG -- : Got stats from 0 remote proxies.
D, [2014-01-25T22:19:38.081255 #30094] DEBUG -- : GEAR_INFO - capacity: 53.125% gear_count: 2 sessions: 17 up/remove_thresh: 90.0%/31.5% sec_left_til_remove: 550 gear_remove_thresh: 0/20
...

----------------------------------------------------------------------
...
D, [2014-01-25T22:28:48.559652 #30094] DEBUG -- : GEAR_INFO - capacity: 0.0% gear_count: 2 sessions: 0 up/remove_thresh: 90.0%/31.5% sec_left_til_remove: 0 gear_remove_thresh: 20/20
D, [2014-01-25T22:28:53.560616 #30094] DEBUG -- : Local sessions 0
D, [2014-01-25T22:28:53.560736 #30094] DEBUG -- : Got stats from 0 remote proxies.
I, [2014-01-25T22:28:53.560882 #30094]  INFO -- : remove-gear - capacity: 0.0% gear_count: 2 sessions: 0 remove_thresh: 31.5%
I, [2014-01-25T22:29:04.925786 #30094]  INFO -- : remove-gear - exit_code: 0  output:
D, [2014-01-25T22:29:04.927707 #30094] DEBUG -- : Local sessions 0
D, [2014-01-25T22:29:04.927767 #30094] DEBUG -- : Got stats from 0 remote proxies.
D, [2014-01-25T22:29:04.927839 #30094] DEBUG -- : GEAR_INFO - capacity: 0.0% gear_count: 1 sessions: 0 up/remove_thresh: 90.0%/1.0% sec_left_til_remove: 0 gear_remove_thresh: 20/20
...

The issue mentioned in BZ#1051446 did not appear.
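
The capacity figures in the GEAR_INFO lines above can be cross-checked with a short Python sketch. The parser and the formula are not taken from haproxy_ctld; they are an illustration, and the value of 16 sessions per gear is an assumption inferred from the log itself (8 sessions on 1 gear is reported as 50.0%).

```python
import re

# Assumption: 16 sessions per gear, inferred from the log
# (8 sessions on 1 gear is reported as capacity 50.0%).
SESSIONS_PER_GEAR = 16

GEAR_INFO_RE = re.compile(
    r"GEAR_INFO - capacity: (?P<capacity>[\d.]+)% "
    r"gear_count: (?P<gear_count>\d+) "
    r"sessions: (?P<sessions>\d+)"
)


def parse_gear_info(line):
    """Extract capacity, gear_count and sessions from a GEAR_INFO log line."""
    match = GEAR_INFO_RE.search(line)
    if match is None:
        return None
    return {
        "capacity": float(match.group("capacity")),
        "gear_count": int(match.group("gear_count")),
        "sessions": int(match.group("sessions")),
    }


def expected_capacity(sessions, gear_count, sessions_per_gear=SESSIONS_PER_GEAR):
    """Capacity as a percentage of the sessions all gears can hold together."""
    return 100.0 * sessions / (gear_count * sessions_per_gear)


# One of the GEAR_INFO lines from the log, logged after the second gear was added.
sample = (
    "D, [2014-01-25T22:19:38.081255 #30094] DEBUG -- : GEAR_INFO - "
    "capacity: 53.125% gear_count: 2 sessions: 17 up/remove_thresh: "
    "90.0%/31.5% sec_left_til_remove: 550 gear_remove_thresh: 0/20"
)
info = parse_gear_info(sample)
recomputed = expected_capacity(info["sessions"], info["gear_count"])
```

Under that assumption the recomputed value agrees with the logged capacity for this line, and the same formula gives 50.0% and 106.25% for the earlier single-gear lines with 8 and 17 sessions.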


While the app is stopped:
[app3-yes.ose-0116.com 52e47aa60ca874772c000007]\> haproxy_ctld -u
An error occurred; try again later: Could not connect to the application.  Check if the application is stopped.


[root@broker openshift]# rhc cartridge scale -c python-2.7 -a app3 --min 2 --max 2
This operation will run until the application is at the minimum scale and may take several minutes.
Setting scale range for python-2.7 ... done

[app3-yes.ose-0116.com 52e47aa60ca874772c000007]\> haproxy_ctld -d
Cannot remove gear because min limit '2' reached.
[app3-yes.ose-0116.com 52e47aa60ca874772c000007]\> echo $?
1
[app3-yes.ose-0116.com 52e47aa60ca874772c000007]\> haproxy_ctld -u
Cannot add gear because max limit '2' reached.
[app3-yes.ose-0116.com 52e47aa60ca874772c000007]\> echo $?
1
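
The limit checks exercised above can be summarised in a small Python sketch. The function check_scale_request is hypothetical and only mirrors the messages and exit codes shown in the transcript; it is not how haproxy_ctld is implemented.

```python
def check_scale_request(direction, gear_count, min_gears, max_gears):
    """Return (exit_code, message) for a manual scale request.

    Hypothetical helper: it mirrors the messages in the transcript above,
    not the actual haproxy_ctld logic.
    """
    if direction == "up":
        if gear_count >= max_gears:
            return 1, f"Cannot add gear because max limit '{max_gears}' reached."
        return 0, ""
    if direction == "down":
        if gear_count <= min_gears:
            return 1, f"Cannot remove gear because min limit '{min_gears}' reached."
        return 0, ""
    raise ValueError(f"unknown direction: {direction!r}")


# With the scale range pinned to min 2 / max 2, both requests are refused.
down_result = check_scale_request("down", gear_count=2, min_gears=2, max_gears=2)
up_result = check_scale_request("up", gear_count=2, min_gears=2, max_gears=2)
```

Both calls return exit code 1 together with the corresponding message, matching the "echo $?" results in the transcript.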

Moving this bug to VERIFIED.
Comment 9 errata-xmlrpc 2014-02-25 10:43:47 EST
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2014-0209.html
