Bug 1609751 - [starter-ca-central-1] [starter-us-east-1] random 503s when accessing exposed services externally
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: OpenShift Online
Classification: Red Hat
Component: Routing
Version: 3.x
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Target Release: ---
Assignee: Ivan Chavero
QA Contact: zhaozhanqi
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2018-07-30 11:24 UTC by Jiří Fiala
Modified: 2018-08-15 14:39 UTC

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-08-15 14:39:56 UTC
Target Upstream Version:
Embargoed:
jfiala: needinfo-


Attachments
example app 200/503 response rate (6.51 KB, text/plain)
2018-07-30 11:24 UTC, Jiří Fiala
no flags
In some cases routing to services fails entirely. (102.01 KB, image/png)
2018-07-31 08:24 UTC, Paco Boga
no flags

Description Jiří Fiala 2018-07-30 11:24:24 UTC
Created attachment 1471504
example app 200/503 response rate

Description of problem:
Users reported seemingly random but quite frequent 503s when accessing applications over routes on starter-ca-central-1 (v3.10.14). The issue may have started occurring after the recent upgrade to 3.10.14 on July 25th; Starter clusters on v3.10.9 do not seem to be affected. The application itself appears to be running properly in all cases, which suggests the issue is caused by the router.
I reproduced this by deploying the Node.js example app and requesting its default page every two seconds; the attached file shows the resulting 200/503 response rate.

Version-Release number of selected component (if applicable):
Server https://api.starter-ca-central-1.openshift.com:443
openshift v3.10.14
kubernetes v1.10.0+b81c8f8

How reproducible:
Appears to be consistently reproducible.

Steps to Reproduce:
1. Deploy a new app on starter-ca-central-1 (or use an already existing one)
2. Expose a service and wait for the route to be admitted by the router
3. Hit the route repeatedly
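
Step 3 can be scripted. The following is only a minimal sketch of one way to do it, not the script used to produce the attachment: it requests a URL at a fixed interval (two seconds, matching the description above) and counts the status codes returned. The function names and the "error" bucket for connection failures are assumptions made for illustration.

```python
import time
import urllib.error
import urllib.request
from collections import Counter


def fetch_status(url, timeout=10.0):
    """Return the HTTP status code of a GET request, or "error" on connection failure."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # urllib raises on 4xx/5xx responses; the code itself is what we count.
        return err.code
    except OSError:
        # DNS failures, refused connections and timeouts (URLError is an OSError).
        return "error"


def tally_statuses(url, attempts, interval=2.0, fetch=fetch_status, sleep=time.sleep):
    """Request url `attempts` times, pausing `interval` seconds between requests.

    Returns a Counter mapping each observed status code to its number of occurrences.
    `fetch` and `sleep` are injectable so the loop can be exercised without a network.
    """
    counts = Counter()
    for i in range(attempts):
        counts[fetch(url)] += 1
        if i < attempts - 1:
            sleep(interval)
    return counts
```

Calling tally_statuses() with the route URL of the exposed service and, say, attempts=30 returns the per-status counts; running the same loop against the service address from a pod inside the cluster gives the baseline that the expected results refer to.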

Actual results:
At least some 503s while the app is running properly

Expected results:
Consistent response, same as when hitting the service from within the cluster

Comment 1 Paco Boga 2018-07-31 08:24:58 UTC
Created attachment 1471717
In some cases routing to services fails entirely.

Comment 5 Jiří Fiala 2018-08-03 06:04:59 UTC
This issue was reproduced on starter-us-east-1 too, so it does not seem to be specific to v3.10.14, contrary to what I suggested in my first comment.

Comment 6 Casey Callendrello 2018-08-03 12:19:15 UTC
Kicking over to the Router team, which is now separate from SDN.

Comment 9 Dan Mace 2018-08-15 14:39:56 UTC
Closing per https://bugzilla.redhat.com/show_bug.cgi?id=1609751#c8; if the issue recurs please feel free to re-open with new details.

