Bug 1471957

Summary:	When performing a rolling deployment of a large container on Online us-east-1, some router instances don't include endpoints
Product:	OpenShift Online	Reporter:	Clayton Coleman <ccoleman>
Component:	Routing	Assignee:	Ben Bennett <bbennett>
Status:	CLOSED DUPLICATE	QA Contact:	zhaozhanqi <zzhao>
Severity:	high	Docs Contact:
Priority:	unspecified
Version:	3.x	CC:	aos-bugs, xtian
Target Milestone:	---	Keywords:	OnlineStarter
Target Release:	---
Hardware:	Unspecified
OS:	Unspecified
Whiteboard:
Fixed In Version:		Doc Type:	If docs needed, set a value
Doc Text:		Story Points:	---
Clone Of:		Environment:
Last Closed:	2017-08-31 17:33:20 UTC	Type:	Bug
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:

Description Clayton Coleman 2017-07-17 19:16:32 UTC

We have a Prometheus instance in us-east-1 that, when updated, takes anywhere from 20 minutes to 3 hours before each router instance starts including it again. The endpoints are in place for the service (the new pod IP is in the endpoints), but only some of the routers pick it up.

Project: openshift-devops-monitor, route: prometheus

Comment 1 Clayton Coleman 2017-07-17 19:17:06 UTC

Only some of the router instances return the app; the others return a 503 for a very long time. This is on 3.6.126/8.

Comment 2 Ben Bennett 2017-07-28 19:17:47 UTC

Please see the comment at https://bugzilla.redhat.com/show_bug.cgi?id=1471899#c2 for a way to tune the routers and work around this problem in the short term.

Comment 3 Ben Bennett 2017-08-31 17:33:20 UTC


*** This bug has been marked as a duplicate of bug 1471899 ***