Bug 1120887
| Summary: | Haproxy gear ratio is not respected on restart | |||
|---|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Timothy Williams <tiwillia> | |
| Component: | ImageStreams | Assignee: | Luke Meyer <lmeyer> | |
| Status: | CLOSED ERRATA | QA Contact: | libra bugs <libra-bugs> | |
| Severity: | medium | Docs Contact: | ||
| Priority: | high | |||
| Version: | 2.1.0 | CC: | adellape, bleanhar, bperkins, cryan, gpei, jialiu, jokerman, libra-onpremise-devel, mmccomas, tiwillia | |
| Target Milestone: | --- | Keywords: | Upstream | |
| Target Release: | --- | |||
| Hardware: | All | |||
| OS: | Linux | |||
| Whiteboard: | ||||
| Fixed In Version: | openshift-origin-cartridge-haproxy-1.23.5.4-1.el6op | Doc Type: | Bug Fix | |
| Doc Text: |
When an application is scaled, the OPENSHIFT_HAPROXY_GEAR_RATIO
environment variable determines when the HAProxy load balancer
gears remove collocated framework gears from rotation. However,
this variable was not consulted during an application start or
restart and the default value "3" was used instead, resulting in
unintended gear rotations. This bug fix updates the control
script to consult the variable at application start up, and
scaled applications now have the expected load balancer
configuration when restarted.
|
Story Points: | --- | |
| Clone Of: | ||||
| : | 1121139 | Environment: | |
| Last Closed: | 2014-08-04 13:27:53 UTC | Type: | Bug | |
| Regression: | --- | Mount Type: | --- | |
| Documentation: | --- | CRM: | ||
| Verified Versions: | Category: | --- | ||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
| Cloudforms Team: | --- | Target Upstream Version: | ||
| Embargoed: | ||||
| Bug Depends On: | 1121139 | |||
| Bug Blocks: | ||||
Cherry-picking the upstream commit and bumping cartridge version.
commit c26abe3019a65def4375ce86f44d8acdbcd07598
Author: Ben Parees <bparees>
Date: Fri Jul 18 14:28:27 2014 -0400
Haproxy gear ratio is only considered when adding a gear, not when
start an existing application
https://bugzilla.redhat.com/show_bug.cgi?id=1121139
Checked on puddle 2.1.z/2014-07-23.4: openshift-origin-cartridge-haproxy-1.23.5.3-1.el6op.noarch.rpm is still installed on the node. Waiting for package openshift-origin-cartridge-haproxy-1.23.5.4-1.el6op.

It turns out the package was never attached to the advisory. I'm rebuilding the puddle now. :/

Verified this bug with package openshift-origin-cartridge-haproxy-1.23.5.4-1.el6op.
Steps:
1. Set OPENSHIFT_HAPROXY_GEAR_RATIO to 1 on each node with 'echo 1 > /etc/openshift/env/OPENSHIFT_HAPROXY_GEAR_RATIO'.
2. Create a scalable app. Visiting the app returns a 503 error. The haproxy configuration of this gear shows local-gear as disabled, and the app's haproxy-status page also shows local-gear as disabled.
3. Restart the app with 'rhc app restart <app_name>'. Visiting the app still returns 503. The gear is shown as disabled in both the gear's haproxy configuration and the app's haproxy-status page.
4. Make the app HA. Visiting the app still returns 503. Both gears of the app are disabled in the gears' haproxy configuration and on the app's haproxy-status page.
5. Scale up the app to 3 gears. The app becomes available. The third gear is up in the gears' haproxy configuration and on the app's haproxy-status page.
6. ssh into the first head gear and run 'ctl_app restart' to restart the haproxy cartridge. The haproxy configuration of this gear and the app's haproxy-status page still show this gear as disabled.
7. ssh into the second head gear and run 'ctl_app restart' to restart the haproxy cartridge. The haproxy configuration of this gear and the app's haproxy-status page still show this gear as disabled.
There is no inconsistency before and after the restart, no inconsistency between the gears' haproxy configurations, and no inconsistency between the haproxy configuration and the haproxy-status page. Moving this bug to VERIFIED.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.
For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.
http://rhn.redhat.com/errata/RHBA-2014-0999.html
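The before/after comparison in the verification steps can be made less error-prone by extracting the per-server state from the gear's HAProxy configuration instead of reading it by eye. The following is a minimal sketch under stated assumptions: the helper name `server_states` is hypothetical, it only looks at the standard HAProxy `disabled` keyword on `server` lines, and the location of the gear's configuration file is left to the reader because this report does not give it.

```shell
# Hypothetical helper (not part of the cartridge): reads an HAProxy
# configuration on stdin and prints "<server-name> enabled|disabled"
# for every "server" line. A server counts as disabled when the
# "disabled" keyword appears among its parameters.
server_states() {
    awk '$1 == "server" {
        state = "enabled"
        # $2 is the server name, $3 the address; parameters start at $4.
        for (i = 4; i <= NF; i++) {
            if ($i == "disabled") state = "disabled"
        }
        print $2, state
    }'
}
```

Running it against the configuration once before and once after the restart, then comparing the two outputs with `diff`, gives a direct check that the restart did not change which gears are in rotation.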
Description of problem:
When OPENSHIFT_HAPROXY_GEAR_RATIO is set to one (in /etc/openshift/env/OPENSHIFT_HAPROXY_GEAR_RATIO), a 503 is seen when creating an application that is scaled to a single gear. This is expected behavior, as the framework cartridge will be disabled due to the haproxy ratio. However, when this application is restarted, it works fine and no longer returns 503s. The haproxy-status page shows that the framework cartridge has been enabled.

It appears that the only time OPENSHIFT_HAPROXY_GEAR_RATIO is considered is when a gear is added to the application. This is why the gear is disabled at first, but enabled on app restart.

When a gear is added to the application, we go through 'update-cluster' and consider the gear ratio:

cartridges/openshift-origin-cartridge-haproxy/usr/bin/update-cluster
-=~~~~~~~~~~~~~~~~~~~~~~~~~~=-
echo "Web/Proxy gears ratio $ratio"
if [ "$ratio" -ge ${OPENSHIFT_HAPROXY_GEAR_RATIO-"3"} ]; then
    echo "Disabling colocated gears ${info[@]}"
    nohup $OPENSHIFT_HAPROXY_DIR/usr/bin/disable-colocated-gears ${info[@]} &
else
    echo "No disabling required"
fi
-=~~~~~~~~~~~~~~~~~~~~~~~~~~=-

We can see that on creation, with OPENSHIFT_HAPROXY_GEAR_RATIO = 1, the framework gear is disabled:

platform-trace.log
-=~~~~~~~~~~~~~~~~~~~~~~~~~~=-
July 17 10:54:07 INFO oo_spawn running /sbin/runuser -s /bin/sh 53c7e383e3c9c3a31c0000ba -c "exec /usr/bin/runcon 'unconfined_u:system_r:openshift_t:s0:c5,c976' /bin/sh -c \"set -e; /var/lib/openshift/53c7e383e3c9c3a31c0000ba/haproxy/bin/control update-cluster 01141730-admin.voyager.com\|node1.voyager.com:63411\"": {:unsetenv_others=>true, :close_others=>true, :in=>"/dev/null", :chdir=>"/var/lib/openshift/53c7e383e3c9c3a31c0000ba/haproxy", :out=>#<IO:fd 12>, :err=>#<IO:fd 8>}
July 17 10:54:10 INFO oo_spawn buffer(11/) Web/Proxy gears ratio 1 Disabling colocated gears 53c7e383e3c9c3a31c0000ba
-=~~~~~~~~~~~~~~~~~~~~~~~~~~=-

When the application is restarted, the code that uses OPENSHIFT_HAPROXY_GEAR_RATIO is never reached. Instead, we see 'control enable-server' used:

-=~~~~~~~~~~~~~~~~~~~~~~~~~~=-
July 17 10:54:56 INFO oo_spawn running /sbin/runuser -s /bin/sh 53c7e383e3c9c3a31c0000ba -c "exec /usr/bin/runcon 'unconfined_u:system_r:openshift_t:s0:c5,c976' /bin/sh -c \"set -e; /var/lib/openshift/53c7e383e3c9c3a31c0000ba/haproxy/bin/control restart \"": {:unsetenv_others=>true, :close_others=>true, :in=>"/dev/null", :chdir=>"/var/lib/openshift/53c7e383e3c9c3a31c0000ba/haproxy", :out=>#<IO:fd 12>, :err=>#<IO:fd 8>}
July 17 10:54:56 INFO oo_spawn buffer(11/) Restarted HAProxy instance
July 17 10:54:56 INFO oo_spawn running /sbin/runuser -s /bin/sh 53c7e383e3c9c3a31c0000ba -c "exec /usr/bin/runcon 'unconfined_u:system_r:openshift_t:s0:c5,c976' /bin/sh -c \"set -e; /var/lib/openshift/53c7e383e3c9c3a31c0000ba/haproxy/bin/control enable-server 53c7e383e3c9c3a31c0000ba\"": {:unsetenv_others=>true, :close_others=>true, :in=>"/dev/null", :chdir=>"/var/lib/openshift/53c7e383e3c9c3a31c0000ba/haproxy", :out=>#<IO:fd 12>, :err=>#<IO:fd 8>}
July 17 10:54:56 INFO oo_spawn buffer(11/) Enabling server 53c7e383e3c9c3a31c0000ba
-=~~~~~~~~~~~~~~~~~~~~~~~~~~=-

Then, the framework gear is enabled.

Looking at the code, the only place we actually consider OPENSHIFT_HAPROXY_GEAR_RATIO is in update-cluster:

$ grep -R HAPROXY_GEAR_RATIO /enterprise-server/cartridges/openshift-origin-cartridge-haproxy
./usr/bin/update-cluster:if [ "$ratio" -ge ${OPENSHIFT_HAPROXY_GEAR_RATIO-"3"} ]; then

Version-Release number of selected component (if applicable):
2.1.3

How reproducible:
Always

Steps to Reproduce:
1. On each node, create the file '/etc/openshift/env/OPENSHIFT_HAPROXY_GEAR_RATIO' containing just '1'
2. Create a scaled application of any type.
3. Curl the application to ensure a 503 is returned (as expected)
4. Restart the application
5. Curl the application again

Actual results:
200 returned

Expected results:
503 returned

Additional info:
Spun off from investigation in bugzilla # 1119338
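
As a rough illustration of the behavior this report expects, the ratio check from update-cluster could be factored into a helper that the restart path also consults before re-enabling the local gear. This is only a sketch and not the upstream fix: the function names `should_disable_colocated` and `after_restart` are hypothetical, and the echo statements stand in for the real disable-colocated-gears and enable-server calls.

```shell
# Hypothetical helper: succeeds (exit 0) when colocated framework gears
# should be taken out of rotation for the given web/proxy gear ratio.
should_disable_colocated() {
    ratio="$1"
    # Same fallback as update-cluster: use 3 when the variable is unset.
    # As in the original, a set-but-empty value is not defaulted.
    threshold="${OPENSHIFT_HAPROXY_GEAR_RATIO-3}"
    [ "$ratio" -ge "$threshold" ]
}

# Hypothetical restart hook: consult the ratio instead of unconditionally
# enabling the local gear. The echo lines are placeholders for the real
# cartridge actions.
after_restart() {
    ratio="$1"
    gear="$2"
    if should_disable_colocated "$ratio"; then
        echo "Disabling colocated gears $gear"
    else
        echo "Enabling server $gear"
    fi
}
```

With OPENSHIFT_HAPROXY_GEAR_RATIO set to 1, `after_restart 1 <gear>` takes the disabling branch, which matches the expected 503 after a restart in the reproduction steps. With the variable unset, the default of 3 applies and the same call keeps the gear enabled.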