Bug 1471957

Summary: When performing a rolling deployment of a large container on Online us-east-1, some router instances don't include endpoints
Product: OpenShift Online Reporter: Clayton Coleman <ccoleman>
Component: Routing Assignee: Ben Bennett <bbennett>
Status: CLOSED DUPLICATE QA Contact: zhaozhanqi <zzhao>
Severity: high Docs Contact:
Priority: unspecified    
Version: 3.x CC: aos-bugs, xtian
Target Milestone: --- Keywords: OnlineStarter
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2017-08-31 17:33:20 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Clayton Coleman 2017-07-17 19:16:32 UTC
We have a Prometheus instance in us-east-1 that, when updated, takes 20 minutes to 3 hours for each router instance to start including it again. The endpoints are in place for the service (the new pod IP is in endpoints), but only some of the routers show the new endpoint.

Project: openshift-devops-monitor, route: prometheus

Comment 1 Clayton Coleman 2017-07-17 19:17:06 UTC
Only some of the router instances return the app; the others return a 503 for a very long time. This is 3.6.126/8.
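
One way to see which router instances are still returning a 503 is to send the same request to each router directly, setting the Host header to the route's hostname. The following is a minimal sketch, not the procedure used in this bug: the router IPs and the route hostname are hypothetical placeholders, and it assumes the routers are reachable over plain HTTP on port 80.

```python
import urllib.error
import urllib.request


def probe(router_ip, route_host, path="/", timeout=5.0):
    """Return the HTTP status one router gives for route_host.

    Returns None if the router cannot be reached at all.
    """
    # Talk to the router by IP, but ask for the route by its hostname.
    req = urllib.request.Request(
        f"http://{router_ip}{path}", headers={"Host": route_host}
    )
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses (including the 503 described above) land here.
        return err.code
    except (urllib.error.URLError, OSError):
        return None


def split_by_status(statuses):
    """Split {router_ip: status} into (serving, failing) sorted lists.

    A router counts as failing if it was unreachable or returned 5xx.
    """
    serving = sorted(
        ip for ip, code in statuses.items() if code is not None and code < 500
    )
    failing = sorted(
        ip for ip, code in statuses.items() if code is None or code >= 500
    )
    return serving, failing


if __name__ == "__main__":
    # Hypothetical values; substitute the real router IPs and route hostname.
    routers = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]
    host = "prometheus.example.com"
    serving, failing = split_by_status({ip: probe(ip, host) for ip in routers})
    print("serving:", serving)
    print("failing:", failing)
```

Running this repeatedly after a rollout would show how long each router takes to leave the failing list.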

Comment 2 Ben Bennett 2017-07-28 19:17:47 UTC
Please see the comment at https://bugzilla.redhat.com/show_bug.cgi?id=1471899#c2 for a way to tune things to work around this problem for the short term.

Comment 3 Ben Bennett 2017-08-31 17:33:20 UTC

*** This bug has been marked as a duplicate of bug 1471899 ***