Bug 1119338 - the script of update-cluster has wrong bash calculation
Summary: the script of update-cluster has wrong bash calculation
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Containers
Version: 2.1.0
Hardware: All
OS: Linux
medium
medium
Target Milestone: ---
: ---
Assignee: Brenton Leanhardt
QA Contact: libra bugs
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2014-07-14 14:43 UTC by Kenjiro Nakayama
Modified: 2018-12-09 18:09 UTC (History)
7 users (show)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-04-20 19:23:48 UTC


Attachments (Terms of Use)

Description Kenjiro Nakayama 2014-07-14 14:43:12 UTC
Description of problem:

$OSE_SRC/cartridges/openshift-origin-cartridge-haproxy/usr/bin/update-cluster has wrong bash calculation

------
The bug has three points. I might have to report separately, but they are relevant. So I report it all here.
------

bug #1. First of all, update-cluster has wrong calcuation as below.

  126 if [ "$ratio" -ge ${OPENSHIFT_HAPROXY_GEAR_RATIO-"3"} ]; then

should be

  126 if [ "$ratio" -ge $(($OPENSHIFT_HAPROXY_GEAR_RATIO-3)) ]; then


bug #2. Even if we fix #1, when we set as OPENSHIFT_HAPROXY_GEAR_RATIO=1, the gear will be stopped at the time created.

 *how to reproduce*
~~~
  [root@vm ~]# cat /etc/openshift/env/OPENSHIFT_HAPROXY_GEAR_RATIO 
  1
  [root@vm ~]# rhc app create -s ptest php-5.3 
  ... snip ... 
  Run 'rhc show-app ptest' for more details about your app.
  [root@vm ~]# curl http://ptest-sample.openshift.example.com/
  <html><body><h1>503 Service Unavailable</h1>
  No server is available to handle this request.
  </body></html>
~~~

bug #3. There are no explanation for the value of OPENSHIFT_HAPROXY_GEAR_RATIO. There should be some doc or somment for the source code.

Comment 1 Luke Meyer 2014-07-14 15:18:07 UTC
${OPENSHIFT_HAPROXY_GEAR_RATIO-"3"} is not a calculation; it just means to use the default of "3" if OPENSHIFT_HAPROXY_GEAR_RATIO is not set, which (if you grep the codebase) it's not set anywhere unless the user sets it explicitly for their app. Perhaps there should be a comment to this effect since the bash syntax here does look like it's supposed to subtract.

The second bug - true, that is the behavior. I don't know to what extent we should keep users from shooting their own foot here, but it would probably make sense to ensure the value is at least 2, as having a non-functional app is not likely to be their intent.

Definitely agree this should be documented.

Comment 2 Kenjiro Nakayama 2014-07-15 12:11:26 UTC
Thanks, I understand first bug.

About second bug, I think I am not completely clear. The OPENSHIFT_HAPROXY_GEAR_RATIO should be OPENSHIFT_HAPROXY_GEAR_MAX_RATIO and use "-gt" to compare ratio?

eg).
  126 if [ "$ratio" -ge ${OPENSHIFT_HAPROXY_GEAR_RATIO-"3"} ]; then

should be

  126 if [ "$ratio" -gt ${OPENSHIFT_HAPROXY_GEAR_MAX_RATIO-"3"} ]; then

I think it is strange that OPENSHIFT_HAPROXY_GEAR_RATIO=1 prohibits start new created gear.

Comment 3 Luke Meyer 2014-07-16 14:34:09 UTC
This would all probably clearer with docs, so maybe let's start here :)

OPENSHIFT_HAPROXY_GEAR_RATIO determines when the framework cartridge running on the LB gear(s) is taken out of the LB rotation. The idea is that when there is enough traffic to scale up to this level, the LB gears should be concentrating on handling traffic, not actually serving requests. So when the ratio of total framework gears to LB gears reaches this level, the LB gears are rotated out. It should be noted that the framework cartridge on LB gears is never stopped or removed, it just stops receiving traffic.

You have to dig into the cart SDK to observe how the ratio is actually calculated. It turns out that it's calculated and rounded in ruby. Practically speaking, the effect is this:

With OPENSHIFT_HAPROXY_GEAR_RATIO=3 (the default):
For standard scaled app, once scaled to 3 framework gears, the head gear is rotated out.
For standard HA app with 2 LB gears, once scaled to 5 framework gears (5/2 rounds up to 3), the 2 LB gears are rotated out. Any further LB gears that are created by further scaling would never be rotated in.

With OPENSHIFT_HAPROXY_GEAR_RATIO=10:
For standard scaled app, once scaled to 10 framework gears, the head gear is rotated out.
For standard HA app with 2 LB gears, once scaled to 19 framework gears (19/2 rounds up to 10), the 2 LB gears are rotated out. Any further LB gears that are created by further scaling would never be rotated in.
For HA app with multiplier 8 (one LB cart per 8 framework gears), at 19 the first 2 LB gears would be rotated out as before. Then at 24 gears, since 24/8=3 a 3rd LB gear would be created, at which point the LB ratio would be 24/3=8 < 10, so all 3 LB gears would be rotated in. Then at 29 gears, the ratio is 29/3 which rounds up to 10, so the 3 LBs are rotated out. Then at 32 gears, a 4th LB gear is created, now the ratio is 32/4 = 8 < 10, so the LB gears are rotated back in... This crazy behavior is why it probably doesn't make sense to set a multiplier less than the ratio. I just included the example to illustrate the concept.

With OPENSHIFT_HAPROXY_GEAR_RATIO=1:
For any scaled app, once scaled to 1 framework gear, the head gear is rotated out. Thus at 1 gear it does not route to itself and just returns 503 responses. Once scaled up past the min LB gear(s), the app will function.


This final behavior is an example of shooting yourself in the foot. It's unlikely any user would actually want to do this, so it's probably reasonable to just make sure the ratio is at least 2 (at which point a scaled app will rotate out the head gear once there is a second gear). Arguably there is still a potential problem with ratio 2, because an HA app creates a 2nd LB gear with the second gear, and would then rotate both out. So an HA app would work with 1 gear, return 503s with 2 gears, and start working again with 3 gears. There is a limit to how well you can prevent foot-shooting.


So to address this bug, I would propose the following:
1. Update
   126 if [ "$ratio" -ge ${OPENSHIFT_HAPROXY_GEAR_RATIO-"3"} ];
to
   126 if [ "$ratio" -ge ${OPENSHIFT_HAPROXY_GEAR_RATIO:-3} ];
which is at least slightly less confusing. Also briefly document what this is for in the code.

2. Ensure that $OPENSHIFT_HAPROXY_GEAR_RATIO is at least 2 first.

3. Include some of the explanation above in the docs bug 1118766
 for explaining HA apps.

Comment 5 Timothy Williams 2014-07-17 16:34:16 UTC
It appears as though the only time OPENSHIFT_HAPROXY_GEAR_RATIO is considered is when a gear is added to the application. This is why we are seeing the behavior where the gear is disabled at first, but enabled on app-restart.

When a gear is added to the application, we go through 'update-cluster' and consider the gear ratio:

cartridges/openshift-origin-cartridge-haproxy/usr/bin/update-cluster
-=~~~~~~~~~~~~~~~~~~~~~~~~~~=-
echo "Web/Proxy gears ratio $ratio"
if [ "$ratio" -ge ${OPENSHIFT_HAPROXY_GEAR_RATIO-"3"} ]; then
    echo "Disabling colocated gears ${info[@]}"
    nohup $OPENSHIFT_HAPROXY_DIR/usr/bin/disable-colocated-gears ${info[@]} &
else
    echo "No disabling required"
fi
-=~~~~~~~~~~~~~~~~~~~~~~~~~~=-

We can see that on creation, with OPENSHIFT_HAPROXY_GEAR_RATIO = 1, that the framework gear is desabled:

platform-trace.log
-=~~~~~~~~~~~~~~~~~~~~~~~~~~=-
July 17 10:54:07 INFO oo_spawn running /sbin/runuser -s /bin/sh 53c7e383e3c9c3a31c0000ba -c "exec /usr/bin/runcon 'unconfined_u:system_r:openshift_t:s0:c5,c976' /bin/sh -c \"set -e; /var/lib/openshift/53c7e383e3c9c3a31c0000ba/haproxy/bin/control update-cluster 01141730-admin.voyager.com\|node1.voyager.com:63411\"": {:unsetenv_others=>true, :close_others=>true, :in=>"/dev/null", :chdir=>"/var/lib/openshift/53c7e383e3c9c3a31c0000ba/haproxy", :out=>#<IO:fd 12>, :err=>#<IO:fd 8>}
July 17 10:54:10 INFO oo_spawn buffer(11/) Web/Proxy gears ratio 1
Disabling colocated gears 53c7e383e3c9c3a31c0000ba
-=~~~~~~~~~~~~~~~~~~~~~~~~~~=-

When the application is restarted, the bit of code that uses OPENSHIFT_HARPROXY_GEAR_RATIO is never touched. Instead, we see control/eneable-gear used:

-=~~~~~~~~~~~~~~~~~~~~~~~~~~=-
July 17 10:54:56 INFO oo_spawn running /sbin/runuser -s /bin/sh 53c7e383e3c9c3a31c0000ba -c "exec /usr/bin/runcon 'unconfined_u:system_r:openshift_t:s0:c5,c976' /bin/sh -c \"set -e; /var/lib/openshift/53c7e383e3c9c3a31c0000ba/haproxy/bin/control restart \"": {:unsetenv_others=>true, :close_others=>true, :in=>"/dev/null", :chdir=>"/var/lib/openshift/53c7e383e3c9c3a31c0000ba/haproxy", :out=>#<IO:fd 12>, :err=>#<IO:fd 8>}
July 17 10:54:56 INFO oo_spawn buffer(11/) Restarted HAProxy instance

July 17 10:54:56 INFO oo_spawn running /sbin/runuser -s /bin/sh 53c7e383e3c9c3a31c0000ba -c "exec /usr/bin/runcon 'unconfined_u:system_r:openshift_t:s0:c5,c976' /bin/sh -c \"set -e; /var/lib/openshift/53c7e383e3c9c3a31c0000ba/haproxy/bin/control enable-server 53c7e383e3c9c3a31c0000ba\"": {:unsetenv_others=>true, :close_others=>true, :in=>"/dev/null", :chdir=>"/var/lib/openshift/53c7e383e3c9c3a31c0000ba/haproxy", :out=>#<IO:fd 12>, :err=>#<IO:fd 8>}
July 17 10:54:56 INFO oo_spawn buffer(11/) Enabling server 53c7e383e3c9c3a31c0000ba
-=~~~~~~~~~~~~~~~~~~~~~~~~~~=-

Then, the framework gear is enabled. 

Looking at the code, the only place we actually consider the OPENSHIFT_HAPROXY_GEAR_RATIO is in update-cluster:

$ grep -R HAPROXY_GEAR_RATIO /enterprise-server/cartridges/openshift-origin-cartridge-haproxy
./usr/bin/update-cluster:if [ "$ratio" -ge ${OPENSHIFT_HAPROXY_GEAR_RATIO-"3"} ]; then

Unsure if this is something we need to actually fix. I guess we should see if it has an affect on higher (and more expected) OPENSHIFT_HAPROXY_GEAR_RATIO values. Having trouble actually testing this. Coming back to this at a later time.

- Tim

Comment 7 Timothy Williams 2016-04-20 19:23:48 UTC
I don't believe that there is anything left to be done for this bug. At this point in the v2's life, we aren't likely to make these changes. I'm closing this for now. Feel free to re-open and argue for a change if you deem it necessary.


Note You need to log in before you can comment on or make changes to this bug.