Bug 2002295

Summary: API intermittently not reachable - Cisco ACI integrated OpenShift cluster
Product: OpenShift Container Platform Reporter: Manish Pandey <mapandey>
Component: Networking Assignee: Ben Nemec <bnemec>
Networking sub component: runtime-cfg QA Contact: Victor Voronkov <vvoronko>
Status: CLOSED NOTABUG Docs Contact:
Severity: urgent    
Priority: urgent CC: adubey, ancollin, augol, bbennett, bnemec, eduen, palonsor, vpickard
Version: 4.6.z Keywords: Triaged
Target Milestone: --- Flags: mapandey: needinfo-
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Last Closed: 2021-09-20 20:21:18 UTC Type: Bug

Comment 10 Ben Nemec 2021-09-14 16:27:04 UTC
I think the first and simplest thing we should try here is to manually apply the changes in https://github.com/openshift/machine-config-operator/pull/2741. Based on everything I've seen about this bug, I strongly suspect that it's triggered by ungraceful keepalived stops. Eliminating those stops won't fix the underlying problem, but it should keep the bug from being triggered under normal circumstances.
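
For context on why graceful versus ungraceful stops differ at the VRRP level, here is a rough sketch using the VRRPv2 timer definitions from RFC 3768. It is not taken from the PR and does not model the underlying problem in this bug. The function names, the backup priority of 128, and the 1-second advertisement interval are assumptions for illustration, not the values used on this cluster.

```python
# Sketch of VRRPv2 (RFC 3768) backup timers.
# All names and numeric values are illustrative assumptions.


def skew_time(priority: int) -> float:
    """Skew_Time in seconds, defined as (256 - Priority) / 256."""
    return (256 - priority) / 256


def takeover_delay(backup_priority: int, advert_int: float, graceful: bool) -> float:
    """Approximate upper bound, in seconds, before a backup claims the VIP."""
    if graceful:
        # A master that shuts down cleanly sends a priority-0 advertisement,
        # so the backup only waits Skew_Time before taking over.
        return skew_time(backup_priority)
    # A master that is killed sends nothing, so the backup waits the full
    # Master_Down_Interval = 3 * Advertisement_Interval + Skew_Time.
    return 3 * advert_int + skew_time(backup_priority)


if __name__ == "__main__":
    for graceful in (True, False):
        delay = takeover_delay(backup_priority=128, advert_int=1.0, graceful=graceful)
        print(f"graceful={graceful}: backup takes over within about {delay} s")
```

With these assumed values, the graceful case hands over within roughly 0.5 s, while the ungraceful case leaves the VIP unowned for up to about 3.5 s.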