Bug 1120887 - Haproxy gear ratio is not respected on restart
Summary: Haproxy gear ratio is not respected on restart
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Image
Version: 2.1.0
Hardware: All
OS: Linux
Priority: high
Severity: medium
Target Milestone: ---
Target Release: ---
Assignee: Luke Meyer
QA Contact: libra bugs
URL:
Whiteboard:
Depends On: 1121139
Blocks:
Reported: 2014-07-17 22:31 UTC by Timothy Williams
Modified: 2018-12-09 18:11 UTC

Fixed In Version: openshift-origin-cartridge-haproxy-1.23.5.4-1.el6op
Doc Type: Bug Fix
Doc Text:
When an application is scaled, the OPENSHIFT_HAPROXY_GEAR_RATIO environment variable determines when the HAProxy load balancer gears remove collocated framework gears from rotation. However, this variable was not consulted during an application start or restart and the default value "3" was used instead, resulting in unintended gear rotations. This bug fix updates the control script to consult the variable at application start up, and scaled applications now have the expected load balancer configuration when restarted.
Clone Of:
Clones: 1121139
Environment:
Last Closed: 2014-08-04 13:27:53 UTC


Links
System: Red Hat Product Errata
ID: RHBA-2014:0999
Priority: normal
Status: SHIPPED_LIVE
Summary: Red Hat OpenShift Enterprise 2.1.4 bug fix and enhancement update
Last Updated: 2014-08-04 17:26:43 UTC

Description Timothy Williams 2014-07-17 22:31:45 UTC
Description of problem:

When setting OPENSHIFT_HAPROXY_GEAR_RATIO to one (in /etc/openshift/env/OPENSHIFT_HAPROXY_GEAR_RATIO), a 503 is seen when creating an application that is scaled to a single gear. This is expected behavior, as the framework cartridge will be disabled due to the haproxy ratio.

However, after this application is restarted, it responds normally and no longer returns 503s. The haproxy-status page shows that the framework cartridge has been enabled.

It appears as though the only time OPENSHIFT_HAPROXY_GEAR_RATIO is considered is when a gear is added to the application. This is why we are seeing the behavior where the gear is disabled at first, but enabled on app-restart.

When a gear is added to the application, we go through 'update-cluster' and consider the gear ratio:

cartridges/openshift-origin-cartridge-haproxy/usr/bin/update-cluster
-=~~~~~~~~~~~~~~~~~~~~~~~~~~=-
echo "Web/Proxy gears ratio $ratio"
if [ "$ratio" -ge ${OPENSHIFT_HAPROXY_GEAR_RATIO-"3"} ]; then
    echo "Disabling colocated gears ${info[@]}"
    nohup $OPENSHIFT_HAPROXY_DIR/usr/bin/disable-colocated-gears ${info[@]} &
else
    echo "No disabling required"
fi
-=~~~~~~~~~~~~~~~~~~~~~~~~~~=-
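
For illustration, the comparison above can be isolated in a small shell sketch. The function names (gear_ratio_threshold, should_disable_colocated) are hypothetical and not part of the cartridge; only the variable name and the default of 3 come from update-cluster. Note that the ${VAR-default} form falls back to the default only when the variable is unset, not when it is set to an empty string.

```shell
#!/bin/sh
# Hypothetical helpers that mirror the comparison quoted from update-cluster.
# They are an illustration only, not code from the haproxy cartridge.

# Print the threshold: the environment value if set, otherwise 3.
gear_ratio_threshold() {
    # ${VAR-3} substitutes 3 only when VAR is unset;
    # ${VAR:-3} would also cover the empty-string case.
    echo "${OPENSHIFT_HAPROXY_GEAR_RATIO-3}"
}

# Print "disable" when the ratio reaches the threshold, "keep" otherwise.
should_disable_colocated() {
    ratio=$1
    if [ "$ratio" -ge "$(gear_ratio_threshold)" ]; then
        echo "disable"
    else
        echo "keep"
    fi
}
```

With the variable unset, 'should_disable_colocated 1' prints "keep"; with OPENSHIFT_HAPROXY_GEAR_RATIO set to 1 it prints "disable", which matches the behavior seen on creation.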

On creation, with OPENSHIFT_HAPROXY_GEAR_RATIO = 1, we can see that the framework gear is disabled:

platform-trace.log
-=~~~~~~~~~~~~~~~~~~~~~~~~~~=-
July 17 10:54:07 INFO oo_spawn running /sbin/runuser -s /bin/sh 53c7e383e3c9c3a31c0000ba -c "exec /usr/bin/runcon 'unconfined_u:system_r:openshift_t:s0:c5,c976' /bin/sh -c \"set -e; /var/lib/openshift/53c7e383e3c9c3a31c0000ba/haproxy/bin/control update-cluster 01141730-admin.voyager.com\|node1.voyager.com:63411\"": {:unsetenv_others=>true, :close_others=>true, :in=>"/dev/null", :chdir=>"/var/lib/openshift/53c7e383e3c9c3a31c0000ba/haproxy", :out=>#<IO:fd 12>, :err=>#<IO:fd 8>}
July 17 10:54:10 INFO oo_spawn buffer(11/) Web/Proxy gears ratio 1
Disabling colocated gears 53c7e383e3c9c3a31c0000ba
-=~~~~~~~~~~~~~~~~~~~~~~~~~~=-

When the application is restarted, the code that uses OPENSHIFT_HAPROXY_GEAR_RATIO is never reached. Instead, we see 'control enable-server' used:

-=~~~~~~~~~~~~~~~~~~~~~~~~~~=-
July 17 10:54:56 INFO oo_spawn running /sbin/runuser -s /bin/sh 53c7e383e3c9c3a31c0000ba -c "exec /usr/bin/runcon 'unconfined_u:system_r:openshift_t:s0:c5,c976' /bin/sh -c \"set -e; /var/lib/openshift/53c7e383e3c9c3a31c0000ba/haproxy/bin/control restart \"": {:unsetenv_others=>true, :close_others=>true, :in=>"/dev/null", :chdir=>"/var/lib/openshift/53c7e383e3c9c3a31c0000ba/haproxy", :out=>#<IO:fd 12>, :err=>#<IO:fd 8>}
July 17 10:54:56 INFO oo_spawn buffer(11/) Restarted HAProxy instance

July 17 10:54:56 INFO oo_spawn running /sbin/runuser -s /bin/sh 53c7e383e3c9c3a31c0000ba -c "exec /usr/bin/runcon 'unconfined_u:system_r:openshift_t:s0:c5,c976' /bin/sh -c \"set -e; /var/lib/openshift/53c7e383e3c9c3a31c0000ba/haproxy/bin/control enable-server 53c7e383e3c9c3a31c0000ba\"": {:unsetenv_others=>true, :close_others=>true, :in=>"/dev/null", :chdir=>"/var/lib/openshift/53c7e383e3c9c3a31c0000ba/haproxy", :out=>#<IO:fd 12>, :err=>#<IO:fd 8>}
July 17 10:54:56 INFO oo_spawn buffer(11/) Enabling server 53c7e383e3c9c3a31c0000ba
-=~~~~~~~~~~~~~~~~~~~~~~~~~~=-

Then, the framework gear is enabled. 

Looking at the code, the only place we actually consider the OPENSHIFT_HAPROXY_GEAR_RATIO is in update-cluster:

$ grep -R HAPROXY_GEAR_RATIO /enterprise-server/cartridges/openshift-origin-cartridge-haproxy
./usr/bin/update-cluster:if [ "$ratio" -ge ${OPENSHIFT_HAPROXY_GEAR_RATIO-"3"} ]; then

Version-Release number of selected component (if applicable):
2.1.3

How reproducible:
Always

Steps to Reproduce:
1. On each node, create the file '/etc/openshift/env/OPENSHIFT_HAPROXY_GEAR_RATIO' containing just '1'
2. Create a scaled application of any type.
3. Curl the application to ensure a 503 is returned (as expected)
4. Restart the application
5. Curl the application again

Actual results:
200 returned

Expected results:
503 returned
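
The curl checks in steps 3 and 5 could be scripted roughly as follows. This is a minimal sketch: the helper names (http_status, expect_status) and the APP_URL placeholder are assumptions made for illustration and do not come from this report.

```shell
#!/bin/sh
# Hypothetical reproduction helpers; APP_URL is a placeholder for the
# scaled application's URL and must be supplied by the tester.

# Print only the HTTP status code returned for a URL.
http_status() {
    curl -s -o /dev/null -w '%{http_code}' "$1"
}

# Succeed if the URL answers with the expected status code, fail otherwise.
expect_status() {
    url=$1
    expected=$2
    actual=$(http_status "$url")
    if [ "$actual" = "$expected" ]; then
        echo "OK: $url returned $actual"
    else
        echo "MISMATCH: $url returned $actual, expected $expected"
        return 1
    fi
}
```

Running 'expect_status "$APP_URL" 503' once after creating the application and once more after restarting it covers both curl steps; with this bug present, the second call reports a mismatch because 200 is returned.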

Additional info:
Spun off from investigation in bugzilla # 1119338

Comment 2 Luke Meyer 2014-07-22 18:16:39 UTC
Cherry-picking the upstream commit and bumping cartridge version.

commit c26abe3019a65def4375ce86f44d8acdbcd07598
Author: Ben Parees <bparees@redhat.com>
Date:   Fri Jul 18 14:28:27 2014 -0400

     Haproxy gear ratio is only considered when adding a gear, not when
    start an existing application
    
    https://bugzilla.redhat.com/show_bug.cgi?id=1121139

Comment 9 Gaoyun Pei 2014-07-24 06:29:54 UTC
Checked on puddle 2.1.z/2014-07-23.4; the node still has
openshift-origin-cartridge-haproxy-1.23.5.3-1.el6op.noarch.rpm installed.
Waiting for package openshift-origin-cartridge-haproxy-1.23.5.4-1.el6op.

Comment 10 Brenton Leanhardt 2014-07-24 12:57:10 UTC
It turns out the package was never attached to the advisory.  I'm rebuilding the puddle now. :/

Comment 15 Gaoyun Pei 2014-07-25 05:21:04 UTC
Verified this bug with package openshift-origin-cartridge-haproxy-1.23.5.4-1.el6op.

Steps:
1. Set OPENSHIFT_HAPROXY_GEAR_RATIO to 1 on each node with 'echo 1 > /etc/openshift/env/OPENSHIFT_HAPROXY_GEAR_RATIO'

2. Create a scalable app. Visiting the app returns a 503 error.
   The haproxy configuration of this gear shows local-gear as disabled.
   The app's haproxy-status page also shows local-gear as disabled.

3. Restart the app with 'rhc app restart <app_name>'. Visiting the app still returns 503.
   Both the haproxy configuration of this gear and the app's haproxy-status page show the gear as disabled.

4. Make the app HA. Visiting the app still returns 503.
   Both gears of the app are disabled in the gears' haproxy configuration and on the app's haproxy-status page.

5. Scale up the app to 3 gears. The app becomes available. The third gear is up in the gears' haproxy configuration and on the app's haproxy-status page.

6. ssh into the first head gear and run 'ctl_app restart' to restart the haproxy cartridge.
   The haproxy configuration of this gear and the app's haproxy-status page still show this gear as disabled.

7. ssh into the second head gear and run 'ctl_app restart' to restart the haproxy cartridge.
   The haproxy configuration of this gear and the app's haproxy-status page still show this gear as disabled.

There is no inconsistency before and after the restart, between the gears' haproxy configurations, or between the haproxy configuration and the haproxy-status page, so moving this bug to VERIFIED.
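
The "check the haproxy configuration" part of the steps above could be scripted along these lines. This is a sketch under assumptions: server_is_disabled is a hypothetical helper, the configuration file path is passed in by the caller because its location inside the gear is not stated in this report, and the check relies on the 'disabled' keyword that HAProxy accepts on a 'server' line.

```shell
#!/bin/sh
# Hypothetical check: does the named server line in an HAProxy
# configuration file carry the "disabled" keyword?
# Usage: server_is_disabled <config-file> <server-name>
server_is_disabled() {
    cfg=$1
    name=$2
    awk -v name="$name" '
        # Match "server <name> ..." lines and scan the remaining fields.
        $1 == "server" && $2 == name {
            for (i = 3; i <= NF; i++)
                if ($i == "disabled") found = 1
        }
        # Exit 0 if the keyword was seen, 1 otherwise.
        END { exit (found ? 0 : 1) }
    ' "$cfg"
}
```

For example, 'server_is_disabled /path/to/haproxy.cfg local-gear' returns success when local-gear is marked disabled, so running it before and after a restart shows whether the state was preserved.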

Comment 17 errata-xmlrpc 2014-08-04 13:27:53 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2014-0999.html

