Bug 1430863 - Requests are randomly responding with 503 errors by haproxy
Summary: Requests are randomly responding with 503 errors by haproxy
Keywords:
Status: CLOSED DUPLICATE of bug 1419771
Alias: None
Product: OpenShift Online
Classification: Red Hat
Component: Routing
Version: 3.x
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: ---
Assignee: Ben Bennett
QA Contact: zhaozhanqi
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2017-03-09 18:03 UTC by Shawn Purtell
Modified: 2017-03-29 15:11 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-03-29 15:11:41 UTC
Target Upstream Version:
Embargoed:



Description Shawn Purtell 2017-03-09 18:03:08 UTC
Description of problem:

An application [1] is deployed on the evg dedicated cluster in two ways:

* pre-built Docker image [2]
* s2i using EAP [3]

The pre-built container is deployed using:

```
oc new-app docker.io/osevg/workshopper:ui -e WORKSHOPS_URLS="http://workshopper.pixy.io/export/a8ce3820425f41caa843e0e1842b8a70" -e CONTENT_URL_PREFIX="https://raw.githubusercontent.com/osevg/workshopper-content/master/" --name work
oc expose svc work
```

When a user tries to load the page (tested on at least 4 different connections),
haproxy randomly responds to requests with 503 errors. When running the same image
on localhost I do not see this behaviour.

[1] https://github.com/osevg/workshopper
[2] http://work-workshopper.e203.evg.openshiftapps.com/
[3] http://workshopper-workshopper.e203.evg.openshiftapps.com/

Version-Release number of selected component (if applicable):
OpenShift Master is v3.4.1.7

How reproducible:
intermittent, but very often

Steps to Reproduce:
1. Open the route URL from [2] or [3] and inspect the request responses in the browser

Actual results:
Either the route request itself returns a 503, or requests for page components (.css, .js) return a 503. For example, from the Chrome console:

'http://work-workshopper.e203.evg.openshiftapps.com/css/coreui.css Failed to load resource: the server responded with a status of 503 (Service Unavailable)'

The routes/resources that fail differ from one page reload to the next.
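
The failure rate can also be measured outside the browser. The following is a minimal sketch, not part of the original report: `probe` and `count_503` are hypothetical helper names, and the default of 50 requests is an arbitrary assumption. `probe` requests a URL repeatedly with curl and prints each HTTP status code; `count_503` tallies how many of them were 503.

```shell
# Read one HTTP status code per line on stdin and
# print "<number of 503s>/<total responses>".
count_503() {
  awk '$1 == "503" { bad++ } NF { total++ } END { printf "%d/%d\n", bad, total }'
}

# Request a URL n times (default 50, an arbitrary choice) and print the
# status code of each response, one per line. curl prints 000 when no
# HTTP response was received at all.
probe() {
  url="$1"
  n="${2:-50}"
  i=0
  while [ "$i" -lt "$n" ]; do
    curl -s -o /dev/null -w '%{http_code}\n' "$url"
    i=$((i + 1))
  done
}
```

For example, `probe http://work-workshopper.e203.evg.openshiftapps.com/css/coreui.css 100 | count_503` would print the share of 503 responses out of 100 requests for the stylesheet named in the console message above.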

Expected results:
No 503s - route and component requests should resolve properly.

Additional info:

Comment 1 Ben Bennett 2017-03-09 18:39:21 UTC
Is there anything interesting in the logs from the router pod?  What does 'oc logs router...' say?
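
For whoever has access to the cluster, gathering those logs might look like the sketch below. It rests on assumptions that are not confirmed anywhere in this bug: that the router runs in the `default` project and that its pods carry the label `router=router`. The function name `collect_router_logs` and the output directory `router-logs` are hypothetical.

```shell
# Save the logs of every router pod to <outdir>/<pod-name>.log.
# Assumptions (not from this bug report): the router runs in the
# "default" project and its pods are labelled router=router.
collect_router_logs() {
  ns="${1:-default}"
  selector="${2:-router=router}"
  outdir="${3:-router-logs}"
  mkdir -p "$outdir" || return 1
  # "oc get pods -o name" prints one "<type>/<name>" entry per line.
  for pod in $(oc get pods -n "$ns" -l "$selector" -o name); do
    # Strip the "<type>/" prefix to build the log file name.
    oc logs -n "$ns" "$pod" > "$outdir/${pod##*/}.log"
  done
}
```

Running `collect_router_logs` with no arguments uses the assumed defaults; a different project, selector, or output directory can be passed as the first, second, and third arguments. Collecting from every router pod matters here because only some of them may be misbehaving.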

Comment 2 Abhishek Gupta 2017-03-28 16:40:01 UTC
Shawn: Is this still happening? Can you work with Ops to get some logs from the router pod when this happens?

Comment 3 Shawn Purtell 2017-03-29 13:49:32 UTC
(In reply to Abhishek Gupta from comment #2)
> Shawn: Is this still happening? Can you work with Ops to get some logs from
> the router pod when this happens?

This appears to have been resolved through a router pod restart by Operations - one of the pods was experiencing issues related to a separate bug/issue. This was also affecting metrics, which appear to be working properly now as well.

The same issue was also present on the engint cluster after the upgrade to 3.4, but was resolved with a router pod restart.

Comment 4 Abhishek Gupta 2017-03-29 15:02:30 UTC
Ben: is this still under investigation or can this be moved over to QE?

Comment 5 Ben Bennett 2017-03-29 15:11:41 UTC
This is either https://bugzilla.redhat.com/show_bug.cgi?id=1434574 or https://bugzilla.redhat.com/show_bug.cgi?id=1419771 . I was hoping to have the logs to identify which.  But I'm going to guess 1419771 and dupe it.  This will be in the next 3.4 bugfix release.

*** This bug has been marked as a duplicate of bug 1419771 ***

