1880451 – Node becoming NotReady frequently with openshift IPI on vsphere

Summary: Node becoming NotReady frequently with openshift IPI on vsphere

Keywords:
Status:	CLOSED INSUFFICIENT_DATA
Alias:	None
Product:	OpenShift Container Platform
Classification:	Red Hat
Component:	Machine Config Operator
Sub Component:
Version:	4.5
Hardware:	Unspecified
OS:	Unspecified
Priority:	high
Severity:	high
Target Milestone:	---
Target Release:	4.9.0
Assignee:	Ben Nemec
QA Contact:	Jian Zhang
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2020-09-18 15:19 UTC by puraut
Modified:	2023-09-15 00:48 UTC
CC List:	19 users
Fixed In Version:
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:	2021-10-14 20:09:01 UTC
Target Upstream Version:
Embargoed:

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Knowledge Base (Solution)	5438831	0	None	None	None	2020-09-28 09:30:14 UTC

Description puraut 2020-09-18 15:19:07 UTC

Description of problem:
RHOCP 4.5 nodes frequently become NotReady.

Version-Release number of selected component (if applicable):
RHOCP 4.5.8

How reproducible:
Every time on vSphere

Steps to Reproduce:
1. Install a 4.5.8 cluster and let it run for more than 24 hours.

Actual results:
1] Node NotReady: the kubelet is unable to post node status even though it is running.
2] Unable to resolve api-int on an IPI installation:
nslookup  api-int.ocp46aipi.ocp.gsslab.pnq2.redhat.com
;; Got recursion not available from 10.73.105.242, trying next server                   <----------- the first resolver (the node IP) is tried but does not answer
Server:		10.73.2.107                                                             <----------  the query falls through to a server outside the cluster
Address:	10.73.2.107#53

** server can't find api-int.ocp46aipi.ocp.gsslab.pnq2.redhat.com: NXDOMAIN

nslookup succeeds when the node IP is queried directly:

nslookup -debug api-int.ocp46aipi.ocp.gsslab.pnq2.redhat.com 10.73.105.242
Server:		10.73.105.242         <-- node IP
Address:	10.73.105.242#53

Name:	api-int.ocp46aipi.ocp.gsslab.pnq2.redhat.com
Address: 10.73.105.98
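
The difference between the two lookups above can be made explicit with a small script. This is only a rough sketch, not part of the original report: the function name `summarize_nslookup` and the dictionary keys are hypothetical, and it assumes the nslookup text layout shown above (a `Server:` line, an optional "Got recursion not available from ..." notice, and either `NXDOMAIN` or a `Name:`/`Address:` answer).

```python
import re


def summarize_nslookup(output: str) -> dict:
    """Summarize nslookup text output (hypothetical helper for illustration).

    Reports which resolvers were skipped, which server finally answered,
    and whether the name resolved.
    """
    # Resolvers that nslookup gave up on before trying the next one.
    skipped = re.findall(r"Got recursion not available from ([^\s,]+)", output)

    # The server that produced the final response.
    server_match = re.search(r"^Server:\s+(\S+)", output, re.MULTILINE)
    server = server_match.group(1) if server_match else None

    # The resolver's own "Address: ip#53" line comes before "Name:",
    # so answer addresses are only searched for after the "Name:" line.
    _, _, answer_part = output.partition("Name:")
    addresses = re.findall(r"^Address:\s+(\S+)", answer_part, re.MULTILINE)

    return {
        "skipped_servers": skipped,
        "answering_server": server,
        "nxdomain": "NXDOMAIN" in output,
        "addresses": addresses,
    }


if __name__ == "__main__":
    import subprocess
    import sys

    # Usage: python check_dns.py <name> [resolver]
    cmd = ["nslookup"] + sys.argv[1:]
    result = subprocess.run(cmd, capture_output=True, text=True)
    print(summarize_nslookup(result.stdout))
```

Run once without a resolver argument and once with the node IP as the resolver: a non-empty `skipped_servers` list together with `nxdomain` set to true in the first run, and a resolved address in the second, would match the behaviour described in this report.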

Expected results:
Cluster should be up and running 

Additional info:

Attaching a sosreport of the node and the cluster details.

Comment 3 Andrew McDermott 2020-09-21 17:18:44 UTC

Target set to next release version while investigation is either ongoing or pending. Will be considered for earlier release versions when diagnosed and resolved.

Comment 27 Ben Nemec 2021-10-14 20:09:01 UTC

Apologies for the long delay on this. In the meantime, the reporter's account seems to have been disabled so I can't request an update on the status of this problem. We've made a number of improvements to the keepalived configuration since 4.5, so if the problem was keepalived it's possible it has been fixed in 4.6 or a later release.

Given that the reporter isn't available to continue debugging this I'm going to close it, but feel free to reopen if anyone else is hitting this.

Comment 28 Red Hat Bugzilla 2023-09-15 00:48:25 UTC

The needinfo request[s] on this closed bug have been removed as they have been unresolved for 500 days
