Description of problem:
A number of the 'quick-start' applications have a 30 second delay before the first readiness check. Additionally, the routers have a default 30 second timeout before returning an error page. With the combination of these two behaviours, we've had reports that applications "return an error the first time they're accessed" (503 Service Unavailable).

Version-Release number of selected component (if applicable):
OCP 3.4

How reproducible:

Steps to Reproduce:
1. Create an application using the JBoss quickstart provided as a default application in the templates.
2. Idle the application: oc idle <application-service>
   The application is correctly idled.
3. Access the route.

Actual results:
It returns 503 errors.

Expected results:
Accessing the route should wake up the pod and should not return an error.

Additional info:
Tried timeout changes, but the error still occurred. It's an annoyance rather than a major issue, but combined with the lack of an easy way to tell when a route was last accessed, it results in a reduced customer experience.
Commit pushed to master at https://github.com/openshift/origin

https://github.com/openshift/origin/commit/845e285645adc5ddea1eb68d953c8045cf5da621
Increased the time the proxy will hold connections when unidling

Before we would wait 30 seconds for a pod to come live before dropping the connections. That time is too short, so we have increased it to 120 seconds.

Fixes bug 1416037 (https://bugzilla.redhat.com/show_bug.cgi?id=1416037)
This has been merged into ocp and is in OCP v3.5.0.16 or newer.
Tested on OCP 3.5.0.16.
It throws a 504 error before the pod has started.

[root@openshift-127 application-templates]# oc idle dc helloworld
warning: continuing on for valid scalable resources, but an error occurred while finding scalable resources to idle: endpoints "dc" not found
Marked service default/helloworld to unidle resource DeploymentConfig default/helloworld (unidle to 1 replicas)
Idled DeploymentConfig default/helloworld

[root@openshift-127 application-templates]# time curl -v http://helloworld-default.0203-s5y.qe.rhcloud.com/
* About to connect() to helloworld-default.0203-s5y.qe.rhcloud.com port 80 (#0)
*   Trying 10.14.6.106...
* Connected to helloworld-default.0203-s5y.qe.rhcloud.com (10.14.6.106) port 80 (#0)
> GET / HTTP/1.1
> User-Agent: curl/7.29.0
> Host: helloworld-default.0203-s5y.qe.rhcloud.com
> Accept: */*
>
* HTTP 1.0, assume close after body
< HTTP/1.0 504 Gateway Time-out
< Cache-Control: no-cache
< Connection: close
< Content-Type: text/html
<
<html><body><h1>504 Gateway Time-out</h1>
The server didn't respond in time.
</body></html>
* Closing connection 0

real    0m30.138s
user    0m0.007s
sys     0m0.007s
If you are going through a router then you will need to change the router's timeout. The default is 30 seconds. You can either set the router env ROUTER_DEFAULT_SERVER_TIMEOUT, or just change the route by setting the annotation haproxy.router.openshift.io/timeout. Set the time to 120s to get the maximum wait the service proxy will allow.
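The two options above could be applied roughly as follows. This is only a sketch under assumptions: the router DeploymentConfig is assumed to be dc/router in the default project, and the route is assumed to be named helloworld in the default project (as in the test above); adjust these to your cluster. The run helper and the DRY_RUN switch are illustrative and not part of oc. By default the script only prints the commands.

```shell
#!/bin/sh
# Assumed names; override through the environment to match your cluster.
TIMEOUT="${TIMEOUT:-120s}"                  # matches the proxy's 120s unidle hold
ROUTER_DC="${ROUTER_DC:-dc/router}"         # assumption: default router DeploymentConfig
ROUTER_NAMESPACE="${ROUTER_NAMESPACE:-default}"
ROUTE="${ROUTE:-helloworld}"                # assumption: route name from the test above
ROUTE_NAMESPACE="${ROUTE_NAMESPACE:-default}"

# Illustrative helper: print the command unless DRY_RUN=0 is set.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Option 1: raise the default server timeout for every route on the router.
set_router_timeout() {
  run oc set env "$ROUTER_DC" "ROUTER_DEFAULT_SERVER_TIMEOUT=$TIMEOUT" -n "$ROUTER_NAMESPACE"
}

# Option 2: raise the timeout for a single route through its annotation.
set_route_timeout() {
  run oc annotate route "$ROUTE" --overwrite "haproxy.router.openshift.io/timeout=$TIMEOUT" -n "$ROUTE_NAMESPACE"
}

set_router_timeout
set_route_timeout
```

Run it with DRY_RUN=0 to actually execute the commands. Changing the router env affects all routes served by that router, while the annotation only affects the one route, so the annotation is probably the less intrusive choice when only a few applications are idled.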
Tested with OCP 3.5.0.17. After setting the router env, access to an unidling application waits for up to 120s. Marking the bug as verified.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2017:0884