Bug 1270660
Summary: | Haproxy health check should be in sync with rolling updates in EWS | ||||||
---|---|---|---|---|---|---|---|
Product: | OpenShift Container Platform | Reporter: | Jaspreet Kaur <jkaur> | ||||
Component: | ImageStreams | Assignee: | Timothy Williams <tiwillia> | ||||
Status: | CLOSED ERRATA | QA Contact: | DeShuai Ma <dma> | ||||
Severity: | urgent | Docs Contact: | |||||
Priority: | urgent | ||||||
Version: | 2.2.0 | CC: | adellape, aos-bugs, bperkins, erich, gpei, jokerman, misalunk, mmccomas, nicholas_schuetz, pep, tiwillia | ||||
Target Milestone: | --- | ||||||
Target Release: | --- | ||||||
Hardware: | Unspecified | ||||||
OS: | Unspecified | ||||||
Whiteboard: | |||||||
Fixed In Version: | openshift-origin-cartridge-jbossews-1.35.3.2-1.el6op | Doc Type: | Enhancement | ||||
Doc Text: |
Previously, the EWS cartridge started an application's gear and marked the application as "up" in HAProxy. If the application took some time to deploy, this caused an outage window because traffic was routed to an application that was not yet ready. The outage ended either when HAProxy executed its next health check and disabled the application or when the deployment finished. As a result, it was possible for EWS cartridge deployments to be marked "up" and routable when in fact they were not yet ready to service requests. This bug fix introduces the OPENSHIFT_JBOSSEWS_START_DELAY EWS environment variable, which allows application owners to delay the registration of their deployment with HAProxy. Set this variable to a delay (in seconds) to make deployments pause after a gear start, in a manner similar to the EAP cartridge. The difference between the EWS and EAP cartridges is that with EWS a sleep (or hang) is used, because Tomcat, unlike EAP, does not have a management interface (https://access.redhat.com/solutions/901043) to interact with to check that deployments have finished. Using the OPENSHIFT_JBOSSEWS_START_DELAY variable can cause application deployments to take longer, but it can be used to avoid outages with new deployments.
|
Story Points: | --- | ||||
Clone Of: | |||||||
: | 1285084 (view as bug list) | Environment: | |||||
Last Closed: | 2015-12-17 17:11:00 UTC | Type: | Bug | ||||
Regression: | --- | Mount Type: | --- | ||||
Documentation: | --- | CRM: | |||||
Verified Versions: | Category: | --- | |||||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||
Cloudforms Team: | --- | Target Upstream Version: | |||||
Embargoed: | |||||||
Bug Depends On: | |||||||
Bug Blocks: | 1273542, 1285084 | ||||||
Attachments: |
|
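The Doc Text above describes delaying HAProxy registration by a fixed number of seconds after a gear start. The following is a minimal Python sketch of that idea only, not the cartridge's actual implementation. The function names `read_start_delay` and `start_then_register`, and the `start_gear` and `register_with_haproxy` callbacks, are hypothetical; only the variable name OPENSHIFT_JBOSSEWS_START_DELAY comes from this bug.

```python
import os
import time


def read_start_delay(environ=os.environ, default=0):
    """Return the delay in seconds from OPENSHIFT_JBOSSEWS_START_DELAY.

    Falls back to `default` when the variable is unset or not an integer.
    Negative values are treated as no delay. (Assumed behavior, for illustration.)
    """
    raw = environ.get("OPENSHIFT_JBOSSEWS_START_DELAY", "")
    try:
        delay = int(raw)
    except ValueError:
        return default
    return max(delay, 0)


def start_then_register(start_gear, register_with_haproxy,
                        environ=os.environ, sleep=time.sleep):
    """Start the gear, wait for the configured delay, then register it.

    `start_gear` and `register_with_haproxy` are hypothetical callbacks that
    stand in for whatever the cartridge really does at these two points.
    """
    start_gear()
    delay = read_start_delay(environ)
    if delay:
        # No management interface to poll, so simply wait before registering.
        sleep(delay)
    register_with_haproxy()
```

Passing `environ` and `sleep` as parameters is a design choice made here so the ordering (start, wait, register) can be exercised without real waiting.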
Description
Jaspreet Kaur
2015-10-12 05:14:24 UTC
Verify this bug with openshift-origin-cartridge-jbossews-1.35.3.2-1.el6op.noarch

Steps:
1. Create a scalable EWS app:
    rhc app create bin jbossews-2.0 -s --no-git
2. Scale up this app:
    rhc cartridge scale jbossews-2.0 -a bin --min 2 --max 4
3. Configure this app as binary deployment type:
    rhc app configure -a bin --deployment-type binary
4. Run the binary deployment:
    [root@broker ~]# rhc deploy new.tar.gz -a bin
    Deployment of file '/root/new.tar.gz' in progress for application bin ...
    Starting deploy for binary artifact
    Stopping gear
    Stopping jbossews cartridge
    Creating new deployment directory
    Preparing deployment
    Preparing build for deployment
    Deployment id is f9492a36
    Distributing deployment
    Distributing deployment to child gears
    Distribution status: success
    Activating deployment
    HAProxy already running
    CLIENT_RESULT: HAProxy instance is started
    Starting jbossews cartridge
    Found 127.3.192.1:8080 listening port
    Activation status: success
    Deployment status: success
    Success

    [root@broker ~]# rhc app show bin --gears
    ID        State   Cartridges               Size  SSH URL
    --------- ------- ------------------------ ----- -----------------------------------------
    yes-bin-1 started haproxy-1.4 jbossews-2.0 small yes-bin-1.com.cn
    yes-bin-2 started haproxy-1.4 jbossews-2.0 small yes-bin-2.com.cn

In app-root/logs/haproxy.log:
    [WARNING] 322/032732 (32529) : Proxy stats stopped (FE: 1 conns, BE: 0 conns).
    [WARNING] 322/032732 (32529) : Proxy express stopped (FE: 0 conns, BE: 0 conns).
    [WARNING] 322/034851 (1544) : Server express/local-gear is DOWN for maintenance.
    [WARNING] 322/034911 (1544) : Server express/local-gear is UP (leaving maintenance).
    [WARNING] 322/034911 (1544) : Server express/gear-yes-bin-2-yes is DOWN for maintenance.
    [WARNING] 322/034913 (1544) : Server express/local-gear is DOWN, reason: Layer7 wrong status, code: 404, info: "HTTP status check returned code <404>", check duration: 1ms. 0 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
    [ALERT] 322/034913 (1544) : proxy 'express' has no server available!
    [WARNING] 322/034929 (1544) : Server express/gear-yes-bin-2-yes is UP (leaving maintenance).
    [WARNING] 322/034929 (1544) : Server express/gear-yes-bin-2-yes is DOWN, reason: Layer7 wrong status, code: 404, info: "HTTP status check returned code <404>", check duration: 2ms. 0 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
    [ALERT] 322/034929 (1544) : proxy 'express' has no server available!

The app couldn't be accessed.
    [root@broker ~]# curl -I http://bin-yes.ose22-auto.com.cn/
    HTTP/1.1 503 Service Unavailable
    Date: Thu, 19 Nov 2015 08:54:28 GMT
    Cache-Control: no-cache
    Content-Type: text/html; charset=UTF-8
    Connection: close

QA: I apologize for the broken reproducer. Please follow the steps below to verify that this bug is resolved:

1. Create a jbossews-2.0 scalable application:
    rhc app create jbtest jbossews-2.0 -s
2. Set a new environment variable on the application to increase the number of tries jbossews will make while waiting for a deployment to complete before starting:
    rhc set-env OPENSHIFT_JBOSSEWS_START_TRIES=30 jbtest
3. Scale the cartridge up to at least two gears:
    rhc cartridge scale jbossews-2.0 -a jbtest --min 2 --max 4
4. In the application's git repository, add the attached 'SlowStart.java' file to the './src/main/java/' directory:
    mv SlowStart.java jbtest/src/main/java/
5. Make the following changes to the src/main/webapp/WEB-INF/web.xml file:
    diff --git a/src/main/webapp/WEB-INF/web.xml b/src/main/webapp/WEB-INF/web.xml
    index fa91269..84f53f9 100755
    --- a/src/main/webapp/WEB-INF/web.xml
    +++ b/src/main/webapp/WEB-INF/web.xml
    @@ -6,6 +6,15 @@
     xsi:schemaLocation="http://java.sun.com/xml/ns/javaee http://java.sun.com/xml/ns/javaee/web-app_3_0.xsd"
     metadata-complete="false">
    + <servlet>
    + <servlet-name>SlowStart</servlet-name>
    + <servlet-class>SlowStart</servlet-class>
    + </servlet>
    +
    + <servlet-mapping>
    + <servlet-name>SlowStart</servlet-name>
    + <url-pattern>/SlowStart</url-pattern>
    + <load-on-startup>1</load-on-startup>
    + </servlet-mapping>
     </web-app>
6. Be ready to watch the haproxy.log on the head gear and ensure that, when the changes are pushed, no gear is started and then taken back down due to a 503 or 404. The gears should not be made available until they have actually finished starting. This process will take ~30 seconds.
7. Git add, commit, and push the changes to the application. Watch the haproxy.log for the failures.

Created attachment 1097883 [details]
Reproducer Class
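
Step 6 above asks the tester to watch haproxy.log for a gear that is brought up and then taken back down by a failed health check. As a rough aid, the following Python sketch scans log text for that pattern. It is an assumption-laden illustration: the function name `find_premature_ups` is hypothetical, and the regular expression is based only on the log line shapes quoted in this bug, so DOWN events without a `code:` field are not detected.

```python
import re

# Matches the three "Server ..." event shapes quoted in this bug report.
EVENT = re.compile(
    r"Server (?P<server>\S+) is "
    r"(?P<state>UP \(leaving maintenance\)"
    r"|DOWN for maintenance"
    r"|DOWN, reason: [^,]+, code: (?P<code>\d+))"
)


def find_premature_ups(log_text):
    """Return (server, status_code) pairs for servers that left maintenance
    and whose next logged event was a health-check failure."""
    last_state = {}
    failures = []
    for match in EVENT.finditer(log_text):
        server = match.group("server")
        state = match.group("state")
        went_down_on_check = state.startswith("DOWN, reason")
        was_just_up = last_state.get(server, "").startswith("UP")
        if went_down_on_check and was_just_up:
            failures.append((server, int(match.group("code"))))
        last_state[server] = state
    return failures


if __name__ == "__main__":
    import sys

    # Usage (hypothetical): python check_haproxy_log.py app-root/logs/haproxy.log
    with open(sys.argv[1]) as handle:
        for server, code in find_premature_ups(handle.read()):
            print(f"{server} was marked UP, then failed its check with {code}")
```

An empty result on the head gear's log after a push would be consistent with the expected behavior in step 6, although it does not by itself prove that no requests failed.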
There was an error in the steps provided. In step #5, the diff is incorrect. The url-pattern specified is '/SlowStart', but should just be '/'. Below is the correct diff:

    diff --git a/src/main/webapp/WEB-INF/web.xml b/src/main/webapp/WEB-INF/web.xml
    index fa91269..84f53f9 100755
    --- a/src/main/webapp/WEB-INF/web.xml
    +++ b/src/main/webapp/WEB-INF/web.xml
    @@ -6,6 +6,15 @@
     xsi:schemaLocation="http://java.sun.com/xml/ns/javaee http://java.sun.com/xml/ns/javaee/web-app_3_0.xsd"
     metadata-complete="false">
    + <servlet>
    + <servlet-name>SlowStart</servlet-name>
    + <servlet-class>SlowStart</servlet-class>
    + </servlet>
    +
    + <servlet-mapping>
    + <servlet-name>SlowStart</servlet-name>
    + <url-pattern>/</url-pattern>
    + <load-on-startup>1</load-on-startup>
    + </servlet-mapping>
     </web-app>

Tested this bug by following the steps in Comment 14: the gears eventually return to UP, and the app is available after the git push.

    ==> app-root/logs/haproxy.log <==
    [WARNING] 327/020259 (11151) : Server express/local-gear is DOWN for maintenance.
    [WARNING] 327/020700 (11151) : Server express/local-gear is UP (leaving maintenance).
    [WARNING] 327/020700 (11151) : Server express/gear-yes-jbtest-2-yes is DOWN for maintenance.
    [WARNING] 327/020824 (11151) : Server express/gear-yes-jbtest-2-yes is UP (leaving maintenance).

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2015-2666.html