Description of problem: A number of the 'quick-start' applications have a 30-second delay before the first readiness check. Additionally, the router has a default 30-second timeout before it returns an error page.
Because of the combination of these two behaviours, we have had reports that applications "return an error the first time they're accessed".
503 Service Unavailable
Version-Release number of selected component (if applicable): OCP 3.4
Steps to Reproduce:
1. Create an application from the JBoss quickstart that is provided as a default application template.
2. Idle the application:
oc idle <application-service>
The application is idled correctly.
3. Now access the route.
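The reproduction steps above can be scripted roughly as follows. This is a sketch that assumes a logged-in `oc` client; the helper name `first_access_after_idle` and the 180-second `--max-time` cap are illustrative choices, not part of the original report. Note that `oc idle` takes service (endpoints) names, so the call is `oc idle helloworld`; passing `dc helloworld` is what produced the `endpoints "dc" not found` warning in the test log.

```shell
#!/usr/bin/env bash
# Sketch: reproduce "the first access after idling returns an error".
# Assumptions (hypothetical): the oc client is logged in to the right project,
# and the service name and route host are passed as arguments.

# first_access_after_idle SERVICE ROUTE_HOST
# Idles SERVICE, then times one request to the route and prints
# "<http_code> <elapsed_seconds>".
first_access_after_idle() {
  local service="$1" host="$2"
  local start end code

  # Step 2: scale the resources behind the service down to zero.
  oc idle "$service" >/dev/null || return 1

  start=$(date +%s)
  # Step 3: access the route. -s silences progress, -o /dev/null discards
  # the body, -w prints only the HTTP status code ("000" if curl gives up).
  # --max-time bounds the wait so the script cannot hang indefinitely.
  code=$(curl -s -o /dev/null -w '%{http_code}' --max-time 180 "http://${host}/")
  end=$(date +%s)

  printf '%s %s\n' "$code" "$((end - start))"
}
```

For example, `first_access_after_idle helloworld helloworld-default.0203-s5y.qe.rhcloud.com`. On an affected cluster the first request would be expected to report a 503 or 504 code, while a second request made once the pod is ready should return 200.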
Actual results: The route returns a 503 error.
Expected results: Accessing the route should wake up the pod and should not return an error.
Additional info: Changing timeouts did not help; the error is still returned. It's an annoyance rather than a major issue, but combined with the lack of an easy way to tell when a route was last accessed, it results in a reduced customer experience.
Commit pushed to master at https://github.com/openshift/origin
Increased the time the proxy will hold connections when unidling
Before we would wait 30 seconds for a pod to come live before dropping
the connections. That time is too short, so we have increased it to
Fixes bug 1416037 (https://bugzilla.redhat.com/show_bug.cgi?id=1416037)
This has been merged into OCP and is in OCP v18.104.22.168 or newer.
Tested on OCP 22.214.171.124
It returns a 504 error before the pod has started.
[root@openshift-127 application-templates]# oc idle dc helloworld
warning: continuing on for valid scalable resources, but an error occurred while finding scalable resources to idle: endpoints "dc" not found
Marked service default/helloworld to unidle resource DeploymentConfig default/helloworld (unidle to 1 replicas)
Idled DeploymentConfig default/helloworld
[root@openshift-127 application-templates]# time curl -v http://helloworld-default.0203-s5y.qe.rhcloud.com/
* About to connect() to helloworld-default.0203-s5y.qe.rhcloud.com port 80 (#0)
* Trying 10.14.6.106...
* Connected to helloworld-default.0203-s5y.qe.rhcloud.com (10.14.6.106) port 80 (#0)
> GET / HTTP/1.1
> User-Agent: curl/7.29.0
> Host: helloworld-default.0203-s5y.qe.rhcloud.com
> Accept: */*
* HTTP 1.0, assume close after body
< HTTP/1.0 504 Gateway Time-out
< Cache-Control: no-cache
< Connection: close
< Content-Type: text/html
<html><body><h1>504 Gateway Time-out</h1>
The server didn't respond in time.
* Closing connection 0
If you are going through a router, you will need to change the router's timeout. The default is 30 seconds.
You can either set the router environment variable ROUTER_DEFAULT_SERVER_TIMEOUT, or change only this route by setting the annotation haproxy.router.openshift.io/timeout. Set the timeout to 120s to get the maximum wait the service proxy will allow.
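Both options can be applied from the command line. The sketch below assumes the router runs as the DeploymentConfig `dc/router` in the `default` project, which is the usual layout for a default installation but should be checked on your cluster; the helper names are hypothetical. `oc set env` is the current form of the environment command, and older clients provide the same operation as `oc env`. Because the router cuts the client off at its own timeout, the wait a client actually gets is the smaller of the router timeout and the proxy's limit, which is why 120s is the useful maximum.

```shell
#!/usr/bin/env bash
# Sketch of the two router-timeout workarounds.
# Assumption: the router is dc/router in the "default" project; override
# ROUTER_DC / ROUTER_NS if your cluster differs.
ROUTER_DC="${ROUTER_DC:-dc/router}"
ROUTER_NS="${ROUTER_NS:-default}"

# Option 1: raise the default server timeout for every route the router serves.
set_router_default_timeout() {
  local timeout="$1"   # e.g. 120s
  oc set env "$ROUTER_DC" -n "$ROUTER_NS" "ROUTER_DEFAULT_SERVER_TIMEOUT=${timeout}"
}

# Option 2: raise the timeout for a single route only.
set_route_timeout() {
  local route="$1" timeout="$2"   # e.g. helloworld 120s
  oc annotate route "$route" --overwrite "haproxy.router.openshift.io/timeout=${timeout}"
}
```

For example, `set_route_timeout helloworld 120s` affects only the helloworld route, whereas `set_router_default_timeout 120s` changes the default for all routes handled by that router.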
Tested with OCP 126.96.36.199: after setting the router environment variable, a request that triggers unidling waits for up to 120s.
Bug verified.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.
If the solution does not work for you, open a new bug report.