Description of problem:

An application [1] is deployed on the evg dedicated cluster in two ways:

* a pre-built Docker image [2]
* s2i using EAP [3]

The pre-built container is deployed using:

```
oc new-app docker.io/osevg/workshopper:ui \
  -e WORKSHOPS_URLS="http://workshopper.pixy.io/export/a8ce3820425f41caa843e0e1842b8a70" \
  -e CONTENT_URL_PREFIX="https://raw.githubusercontent.com/osevg/workshopper-content/master/" \
  --name work
oc expose svc work
```

When a user tries to load the page (tested on at least 4 different connections), requests are randomly answered with 503 errors by haproxy. When running the same image on localhost I do not see this behaviour.

[1] https://github.com/osevg/workshopper
[2] http://work-workshopper.e203.evg.openshiftapps.com/
[3] http://workshopper-workshopper.e203.evg.openshiftapps.com/

Version-Release number of selected component (if applicable):
OpenShift Master is v3.4.1.7

How reproducible:
Intermittent, but very often

Steps to Reproduce:
1. Visit the route URL from [2] or [3] and view the request responses in the browser

Actual results:
Either the route request itself returns a 503, or component requests (.css, .js) return 503. For example, from the Chrome console:

'http://work-workshopper.e203.evg.openshiftapps.com/css/coreui.css Failed to load resource: the server responded with a status of 503 (Service Unavailable)'

The routes/resources that fail are not consistent between page reloads.

Expected results:
No 503s - the route and component requests should resolve properly.

Additional info:
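For intermittent 503s from the router, a first check is whether the service behind the route actually has healthy endpoints at the moment of failure (an empty endpoints list is the most common cause of haproxy 503s). A minimal diagnostic sketch, assuming the `work` service/route names from the description and the `workshopper` project:

```shell
# Switch to the project hosting the app (project name assumed)
oc project workshopper

# Confirm the pods backing the service are Running and Ready
oc get pods -o wide

# If ENDPOINTS is empty here, the router has no backend to send to -> 503
oc get endpoints work

# Check the route's target service and any admission/host conflicts
oc describe route work
```

If the endpoints are consistently populated while 503s still occur, the problem is more likely in the router pod itself, which is what the later comments point to.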
Is there anything interesting in the logs from the router pod? What does 'oc logs router...' say?
Shawn: Is this still happening? Can you work with Ops to get some logs from the router pod when this happens?
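To capture the router-side evidence asked for in the comments above, something like the following could be run against the cluster while the 503s are being reproduced (a sketch; the `default` namespace is the usual location of the router in OCP 3.x, but may differ on this cluster):

```shell
# Find the router pod(s)
oc get pods -n default -l deploymentconfig=router

# Tail the router logs while reproducing the 503s
oc logs dc/router -n default --tail=200 -f

# Optionally raise the router's log verbosity to capture more detail
# (triggers a redeploy of the router pods)
oc set env dc/router -n default ROUTER_LOG_LEVEL=debug
```

Logs captured this way would distinguish a stale haproxy config in one router replica from a backend with no endpoints.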
(In reply to Abhishek Gupta from comment #2)
> Shawn: Is this still happening? Can you work with Ops to get some logs from
> the router pod when this happens?

This appears to have been resolved by a router pod restart performed by Operations - one of the pods was experiencing issues related to a separate bug/issue. This was also affecting metrics, which appears to be working properly now as well. The same issue was present on the engint cluster after the upgrade to 3.4 and was likewise resolved by a router pod restart.
Ben: is this still under investigation or can this be moved over to QE?
This is either https://bugzilla.redhat.com/show_bug.cgi?id=1434574 or https://bugzilla.redhat.com/show_bug.cgi?id=1419771 . I was hoping to have the logs to identify which. But I'm going to guess 1419771 and dupe it. This will be in the next 3.4 bugfix release. *** This bug has been marked as a duplicate of bug 1419771 ***