Created attachment 1471504 [details]
example app 200/503 response rate

Description of problem:
Users reported seemingly random but frequent 503s when accessing applications over routes on starter-ca-central-1 (v3.10.14). The issue may have started occurring after the recent upgrade to 3.10.14 on July 25th; v3.10.9 Starter clusters do not seem to be affected. The application itself appears to be running properly in all cases, which suggests the issue is caused by the router. I reproduced this by deploying the node.js example app and requesting the default page every two seconds.

Version-Release number of selected component (if applicable):
Server https://api.starter-ca-central-1.openshift.com:443
openshift v3.10.14
kubernetes v1.10.0+b81c8f8

How reproducible:
Appears to be consistently reproducible

Steps to Reproduce:
1. Deploy a new app on starter-ca-central-1 (or use an existing one)
2. Expose a service and wait for the route to be admitted by the router
3. Hit the route repeatedly

Actual results:
At least some 503s while the app is running properly

Expected results:
Consistent responses, the same as when hitting the service from within the cluster
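For anyone trying to reproduce step 3, a minimal polling sketch follows. It is not the script used for the attachment above; it only assumes that "hit the route repeatedly" means one GET every two seconds with the status codes tallied. `ROUTE_URL`, `fetch_status` and `poll` are hypothetical names, and the hostname is a placeholder to be replaced with the admitted route's host.

```python
import time
import urllib.error
import urllib.request
from collections import Counter

# Hypothetical placeholder: substitute the hostname of the admitted route.
ROUTE_URL = "http://nodejs-example.example.com/"


def fetch_status(url, timeout=5.0):
    """Return the HTTP status code of a GET on url, or 0 if no response arrived."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # urllib raises for 4xx/5xx responses, so a 503 ends up here.
        return err.code
    except (urllib.error.URLError, OSError):
        # DNS failure, refused connection, timeout, and similar.
        return 0


def poll(url, attempts, interval=2.0, fetch=fetch_status, sleep=time.sleep):
    """Request url `attempts` times, pausing `interval` seconds between requests.

    Returns a Counter mapping status code to the number of times it was seen.
    """
    counts = Counter()
    for i in range(attempts):
        counts[fetch(url)] += 1
        if i < attempts - 1:
            sleep(interval)
    return counts
```

Calling `poll(ROUTE_URL, attempts=30)` takes about a minute and returns the tally; any count under key 503 while the pod is healthy matches the behaviour described in this report. Running the same loop against the service address from inside the cluster gives the baseline named under "Expected results".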
Created attachment 1471717 [details]
In some cases routing to services fails completely.
This issue was reproduced on starter-us-east-1 too, so it does not seem to be specific to v3.10.14, contrary to what I suggested in my first comment.
Kicking over to the Router team, which is now separate from SDN.
Closing per https://bugzilla.redhat.com/show_bug.cgi?id=1609751#c8; if the issue recurs please feel free to re-open with new details.