Bug 1566671

Summary: Routers unresponsive
Product: OpenShift Container Platform
Reporter: Robert Bost <rbost>
Component: Networking
Assignee: Ben Bennett <bbennett>
Networking sub component: router
QA Contact: zhaozhanqi <zzhao>
Status: CLOSED DUPLICATE
Docs Contact:
Severity: urgent
Priority: urgent
CC: aos-bugs, rhowe
Version: 3.3.1
Target Milestone: ---
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2018-04-12 21:55:00 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Robert Bost 2018-04-12 18:27:57 UTC
Description of problem:

The customer is having an outage because their router pods are unresponsive. Removing the health checks allows the pods to start, but running the health check manually (curl http://localhost:1936/healthz) still fails after a long pause with a connection timeout.
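
To put a number on the "long pause", the manual check can be repeated with an explicit timeout and a timer. The following is only a sketch: the function name `probe_healthz` and the 5-second default are illustrative assumptions, not part of the router or of this report. It bypasses any configured HTTP proxy so the request goes straight to the local stats port.

```python
import socket
import time
import urllib.error
import urllib.request


def probe_healthz(url="http://localhost:1936/healthz", timeout=5.0):
    """Issue one GET against url and return (outcome, seconds_elapsed).

    probe_healthz is a hypothetical helper name; the default URL is the
    one used for the manual curl check in this report.
    """
    # An empty ProxyHandler makes urllib ignore http_proxy/https_proxy.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    start = time.monotonic()
    try:
        with opener.open(url, timeout=timeout) as resp:
            outcome = "http %d" % resp.status
    except urllib.error.HTTPError as exc:
        # The server answered, but with an error status.
        outcome = "http %d" % exc.code
    except urllib.error.URLError as exc:
        # Connect-phase failures (refused, timed out) arrive wrapped in URLError.
        if isinstance(exc.reason, (socket.timeout, TimeoutError)):
            outcome = "timeout"
        else:
            outcome = "error: %s" % exc.reason
    except (socket.timeout, TimeoutError):
        # Timeouts while reading the response are raised directly.
        outcome = "timeout"
    except OSError as exc:
        # Any other socket-level failure, such as a connection reset.
        outcome = "error: %s" % exc
    return outcome, time.monotonic() - start
```

Calling `probe_healthz()` from inside the router pod (the router uses host networking, so the node works too) returns the outcome together with the time taken, which shows whether the request hangs for the full timeout or fails quickly.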

Unfortunately, there are no events or log output from haproxy.

Version-Release number of selected component (if applicable): atomic-openshift-3.3.1.17-1.git.0.b82e86c


How reproducible: Happening constantly for customer.


Actual results: Unable to utilize pod routes.

Comment 3 Ben Bennett 2018-04-12 18:58:09 UTC
I think the iptables errors are a red herring since the router runs with host networking and a service is not being used to access it.

What's the CPU utilization of haproxy?  Is it under high load?
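
One way to answer the CPU question without extra tooling is to sample the process's counters in /proc twice. This is a rough sketch that assumes a Linux host and that the haproxy PID has already been found by other means. The helper names `cpu_ticks` and `cpu_percent` are made up for illustration.

```python
import os
import time


def cpu_ticks(stat_line):
    """Return utime + stime (in clock ticks) from one /proc/<pid>/stat line."""
    # The comm field is parenthesised and may itself contain spaces or ')',
    # so split after the last ')'. The remainder starts at field 3 (state);
    # utime and stime are fields 14 and 15, i.e. indexes 11 and 12 here.
    rest = stat_line.rsplit(")", 1)[1].split()
    return int(rest[11]) + int(rest[12])


def cpu_percent(pid, interval=1.0):
    """Sample pid twice and return its CPU use as a percentage of one core."""
    ticks_per_sec = os.sysconf("SC_CLK_TCK")
    path = "/proc/%d/stat" % pid
    with open(path) as f:
        before = cpu_ticks(f.read())
    time.sleep(interval)
    with open(path) as f:
        after = cpu_ticks(f.read())
    return 100.0 * (after - before) / ticks_per_sec / interval
```

A result close to 100 from `cpu_percent(pid)` would mean the process is saturating one core, which for a single haproxy process would be consistent with it being too busy to answer the health check.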

Comment 4 Ryan Howe 2018-04-12 21:55:00 UTC

*** This bug has been marked as a duplicate of bug 1384746 ***