Bug 1566671

Summary:	Routers unresponsive
Product:	OpenShift Container Platform	Reporter:	Robert Bost <rbost>
Component:	Networking	Assignee:	Ben Bennett <bbennett>
Networking sub component:	router	QA Contact:	zhaozhanqi <zzhao>
Status:	CLOSED DUPLICATE	Docs Contact:
Severity:	urgent
Priority:	urgent	CC:	aos-bugs, rhowe
Version:	3.3.1
Target Milestone:	---
Target Release:	---
Hardware:	Unspecified
OS:	Unspecified
Whiteboard:
Fixed In Version:		Doc Type:	If docs needed, set a value
Doc Text:		Story Points:	---
Clone Of:		Environment:
Last Closed:	2018-04-12 21:55:00 UTC	Type:	Bug
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:

Description Robert Bost 2018-04-12 18:27:57 UTC

Description of problem:

The customer is experiencing an outage because their router pods are unresponsive. Removing the health checks allows the pods to start, but a manual health check (curl http://localhost:1936/healthz) still fails after a long pause with a connection timeout.
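For anyone reproducing the manual check, the following is a minimal sketch of the same probe with an explicit timeout, so that a hang is reported as a timeout rather than blocking indefinitely. The function name probe_healthz and the 5-second default are illustrative assumptions, not part of the router or of this report. Only the URL comes from the curl command above.

```python
import socket
import urllib.error
import urllib.request


def probe_healthz(url="http://localhost:1936/healthz", timeout=5.0):
    """Issue one GET against the health endpoint and return (ok, detail).

    Hypothetical helper for illustration; the default URL is the one
    used in the manual curl check.
    """
    # Bypass any proxy configured in the environment: the check targets
    # the local host directly.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url, timeout=timeout) as resp:
            return resp.status == 200, f"HTTP {resp.status}"
    except urllib.error.HTTPError as exc:
        # The server answered, but with an error status.
        return False, f"HTTP {exc.code}"
    except (socket.timeout, TimeoutError):
        # Connected, but no response arrived within the timeout.
        return False, f"timed out after {timeout}s"
    except urllib.error.URLError as exc:
        # Connect-phase failures arrive wrapped in URLError.
        if isinstance(exc.reason, (socket.timeout, TimeoutError)):
            return False, f"timed out after {timeout}s"
        return False, f"connection failed: {exc.reason}"
    except OSError as exc:
        # Anything else at the socket level, e.g. a reset connection.
        return False, f"connection failed: {exc}"
```

Calling probe_healthz() on the node running the router returns a pair such as (False, "timed out after 5.0s") when the endpoint accepts the connection but never answers, which distinguishes a hang from a refused connection.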

Unfortunately, there are no events and no log output from haproxy.

Version-Release number of selected component (if applicable): atomic-openshift-3.3.1.17-1.git.0.b82e86c


How reproducible: Happens constantly for the customer.


Actual results: Unable to utilize pod routes.

Comment 3 Ben Bennett 2018-04-12 18:58:09 UTC

I think the iptables errors are a red herring since the router runs with host networking and a service is not being used to access it.

What's the CPU utilization of haproxy?  Is it under high load?

Comment 4 Ryan Howe 2018-04-12 21:55:00 UTC


*** This bug has been marked as a duplicate of bug 1384746 ***