Bug 1880451 - Node becoming NotReady frequently with OpenShift IPI on vSphere
Summary: Node becoming NotReady frequently with OpenShift IPI on vSphere
Status: CLOSED INSUFFICIENT_DATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Machine Config Operator
Version: 4.5
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: 4.9.0
Assignee: Ben Nemec
QA Contact: Jian Zhang
Reported: 2020-09-18 15:19 UTC by puraut
Modified: 2023-09-15 00:48 UTC
CC List: 19 users

Last Closed: 2021-10-14 20:09:01 UTC


Links
System: Red Hat Knowledge Base (Solution)
ID: 5438831
Private: 0
Priority: None
Status: None
Summary: None
Last Updated: 2020-09-28 09:30:14 UTC

Description puraut 2020-09-18 15:19:07 UTC
Description of problem:
RHOCP 4.5 nodes frequently become NotReady.

Version-Release number of selected component (if applicable):
RHOCP 4.5.8

How reproducible:
Every time on vSphere.

Steps to Reproduce:
1. Install a 4.5.8 cluster and let it run for more than 24 hours.

Actual results:
1] Node is NotReady: the kubelet is unable to post node status even though it is running.
2] Unable to resolve api-int on an IPI installation:
nslookup  api-int.ocp46aipi.ocp.gsslab.pnq2.redhat.com
;; Got recursion not available from 10.73.105.242, trying next server                   <----------- first resolver (node IP) is not working
Server:		10.73.2.107                                                             <----------  query fell through to a resolver outside the cluster
Address:	10.73.2.107#53

** server can't find api-int.ocp46aipi.ocp.gsslab.pnq2.redhat.com: NXDOMAIN

nslookup succeeds when the node IP is queried directly:

nslookup -debug api-int.ocp46aipi.ocp.gsslab.pnq2.redhat.com 10.73.105.242
Server:		10.73.105.242         <-- node IP
Address:	10.73.105.242#53

Name:	api-int.ocp46aipi.ocp.gsslab.pnq2.redhat.com
Address: 10.73.105.98

Expected results:
The cluster should be up and running.

Additional info:

Attaching an sosreport from the node and the cluster details.

Comment 3 Andrew McDermott 2020-09-21 17:18:44 UTC
Target set to next release version while investigation is either ongoing or pending. Will be considered for earlier release versions when diagnosed and resolved.

Comment 27 Ben Nemec 2021-10-14 20:09:01 UTC
Apologies for the long delay on this. In the meantime, the reporter's account seems to have been disabled so I can't request an update on the status of this problem. We've made a number of improvements to the keepalived configuration since 4.5, so if the problem was keepalived it's possible it has been fixed in 4.6 or a later release.

Given that the reporter isn't available to continue debugging this I'm going to close it, but feel free to reopen if anyone else is hitting this.

Comment 28 Red Hat Bugzilla 2023-09-15 00:48:25 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 500 days


Note You need to log in before you can comment on or make changes to this bug.