Bug 1054944 - If deployment registration fails in a scaled app, haproxy needs restarting
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Containers
Version: 2.0.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: ---
Target Release: ---
Assignee: Luke Meyer
QA Contact: libra bugs
URL:
Whiteboard:
Depends On: 1055653
Blocks:
 
Reported: 2014-01-17 19:19 UTC by Luke Meyer
Modified: 2017-03-08 17:36 UTC
CC List: 7 users

Fixed In Version: rubygem-openshift-origin-node-1.17.5.7-1
Doc Type: Bug Fix
Doc Text:
When an application deployment is performed using the git push command, a REST API call registers the new deployment with the broker. If this call fails for any reason, the HAProxy cartridge in a scalable application is not correctly restarted, and the application is unavailable until the HAProxy cartridge is restarted. This bug fix adds logic to allow the HAProxy cartridge to restart during the deployment even if the registration failed. Therefore, in the event that the registration fails, the application is correctly deployed and remains available. Because all known deployments are reported each time, the broker receives a fully updated list after the next successful deployment registration.
Clone Of:
Cloned to: 1055653
Environment:
Last Closed: 2014-02-25 15:43:09 UTC
Target Upstream Version:
Embargoed:




Links
Red Hat Product Errata RHBA-2014:0209 (normal, SHIPPED_LIVE): Red Hat OpenShift Enterprise 2.0.3 bugfix and enhancement update (last updated 2014-02-25 20:40:32 UTC)

Description Luke Meyer 2014-01-17 19:19:26 UTC
Description of problem:
When a git push to a gear is performed, the new deployment is registered with the broker. If this registration fails for any reason, activation of the deployment fails and HAProxy is left in a state where it returns 503 errors until it is restarted.

How reproducible:
100% so far

Steps to Reproduce:
1. Create a scaled app.
2. On the broker: service httpd stop
3. git push a change to the app.
4. Try to access the app. To clearly see what's happening, start httpd on the broker again, port-forward from the app, and curl -I each of the forwarded ports. The port forwarded from HAProxy will return 503 even though the framework cartridge itself returns 200.

Actual results:
App is unavailable, error 503

Expected results:
App is available, even though the broker is not.


Additional info:
Haven't tested with non-scaled apps; it might be an issue there too. Need to test whether this occurs on OpenShift Online as well.

Comment 4 Luke Meyer 2014-01-18 11:00:08 UTC
This doesn't appear to be a problem with non-scaled apps. It's just HAProxy that doesn't survive the deployment; perhaps there's an HAProxy reconfigure step that's supposed to complete after the deployment is registered?

Comment 5 Luke Meyer 2014-01-20 17:18:08 UTC
It occurs on OpenShift Online too. Fortunately our brokers are never down.

Comment 6 Luke Meyer 2014-01-28 19:45:35 UTC
Fix by cherry-picking from origin-server:

commit 19e2995306bff7bea037823675f5cf279bafe880
Author: Paul Morie <pmorie>
Date:   Tue Jan 21 16:05:29 2014 -0500

    Fix bug 1055653 and improve post-receive output readability

commit 1fa84300ec27093f0f7f10643f4d46ecd1ba8eec
Author: Paul Morie <pmorie>
Date:   Thu Jan 23 11:06:51 2014 -0500

    Fix bug 1055653: handle exceptions from RestClient

commit 2a7ca5491b59bbcbbaa7504cd0c383215b28465a
Author: Paul Morie <pmorie>
Date:   Mon Jan 27 10:26:16 2014 -0500

    Fix bug 1055653 for cases when httpd is down
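
[Editorial note] As an illustration only, here is a minimal Ruby sketch of the approach these commits take, as described in the Doc Text: rescue registration failures and restart HAProxy regardless, while reporting all known deployments on each call so the broker catches up after the next success. The broker URL and helper names below are hypothetical, not the real origin-server API.

require 'rest-client'
require 'json'

# Placeholder endpoint; the real node talks to the broker's REST API.
BROKER_URL = 'https://broker.example.com/broker/rest/deployments'

def restart_haproxy
  # Placeholder for restarting the HAProxy cartridge via the node platform.
  puts 'restarting haproxy cartridge'
end

def register_deployments(deployments)
  # Report ALL known deployments, not just the newest one, so the broker
  # ends up with a fully updated list after the next successful call.
  RestClient.post(BROKER_URL, deployments.to_json, content_type: :json)
end

def activate(deployments)
  begin
    register_deployments(deployments)
  rescue RestClient::Exception, SystemCallError, SocketError => e
    # Broker down, connection refused, timed out, or HTTP error:
    # log and keep going instead of aborting activation.
    warn "Deployment registration failed, continuing: #{e.message}"
  end

  # Restart HAProxy even when registration failed, so the scaled app
  # keeps serving traffic instead of returning 503s until a manual restart.
  restart_haproxy
end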

Comment 8 Luke Meyer 2014-01-28 19:52:38 UTC
In the meantime, workarounds are:
1) Make sure node can always communicate with the broker
2) If a deployment to a scaled app fails activation with messages similar to the following, restart the haproxy cartridge (rhc cartridge restart haproxy):

remote: Activation status: failure
remote: Activation failed for the following gears:
remote: <uuid> (Error activating gear: Connection refused - connect(2))
remote: Deployment completed with status: failure
remote: postreceive failed

"Error activating gear:" may also indicate other errors, e.g. connection timed out, status 401, 502, 503 depending on the problem with reaching the broker.

Comment 10 Peter Ruan 2014-01-31 21:46:46 UTC
Verified with puddle-2014-01-30.

With the openshift-broker service down and rhc port-forward running:

[pruan@homer-linux <DEV> mynodejsapp1]# curl -I 127.0.0.1:8082
HTTP/1.1 200 OK
X-Powered-By: Express
Content-Type: text/html
Content-Length: 5235
Date: Fri, 31 Jan 2014 21:12:00 GMT
Connection: keep-alive

[pruan@homer-linux <DEV> mynodejsapp1]# curl -I 127.0.0.1:8082
HTTP/1.1 200 OK
X-Powered-By: Express
Content-Type: text/html
Content-Length: 5235
Date: Fri, 31 Jan 2014 21:12:36 GMT
Connection: keep-alive

[pruan@homer-linux <DEV> mynodejsapp1]# curl -I 127.0.0.1:8081
HTTP/1.0 200 OK
Cache-Control: no-cache
Connection: close
Content-Type: text/html

[pruan@homer-linux <DEV> mynodejsapp1]# curl -I 127.0.0.1:8080
HTTP/1.1 200 OK
X-Powered-By: Express
Content-Type: text/html
Content-Length: 5235
Date: Fri, 31 Jan 2014 21:12:42 GMT
Set-Cookie: GEAR=local-52eb49663eefa979ea000001; path=/
Cache-control: private

Comment 12 errata-xmlrpc 2014-02-25 15:43:09 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2014-0209.html

