Description of problem:
RHOCP 4.5 nodes frequently become not ready.

Version-Release number of selected component (if applicable):
RHOCP 4.5.8

How reproducible:
Every time on vSphere

Steps to Reproduce:
1. Install a 4.5.8 cluster and let it run for more than 24 hours.

Actual results:
1] Node not ready: the kubelet is unable to post status even though it is running.
2] Unable to resolve api-int with an IPI installation:

nslookup api-int.ocp46aipi.ocp.gsslab.pnq2.redhat.com
;; Got recursion not available from 10.73.105.242, trying next server   <----------- the first resolver is not working
Server:   10.73.2.107   <---------- the query went outside the cluster
Address:  10.73.2.107#53

** server can't find api-int.ocp46aipi.ocp.gsslab.pnq2.redhat.com: NXDOMAIN

nslookup succeeds when the node IP is queried directly:

nslookup -debug api-int.ocp46aipi.ocp.gsslab.pnq2.redhat.com 10.73.105.242
Server:   10.73.105.242   <-- node IP
Address:  10.73.105.242#53

Name:     api-int.ocp46aipi.ocp.gsslab.pnq2.redhat.com
Address:  10.73.105.98

Expected results:
The cluster should be up and running.

Additional info:
Attaching the sosreport of the node and the cluster details.
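For anyone comparing the two lookups above on their own nodes, the following is a minimal sketch of a helper that summarizes plain nslookup output: which resolvers were skipped, which server finally answered, and whether the result was NXDOMAIN. The function name `summarize_nslookup` and the two sample constants are hypothetical and introduced only for illustration; the sample text is copied from the output in this report without the inline annotations. It only parses text and does not perform any DNS queries.

```python
import re

# Sample outputs taken from this report (annotations removed).
FAILING = """\
;; Got recursion not available from 10.73.105.242, trying next server
Server:   10.73.2.107
Address:  10.73.2.107#53

** server can't find api-int.ocp46aipi.ocp.gsslab.pnq2.redhat.com: NXDOMAIN
"""

WORKING = """\
Server:   10.73.105.242
Address:  10.73.105.242#53

Name:     api-int.ocp46aipi.ocp.gsslab.pnq2.redhat.com
Address:  10.73.105.98
"""


def summarize_nslookup(output: str) -> dict:
    """Summarize nslookup text output (hypothetical helper, parsing only)."""
    # Resolvers that refused recursion before nslookup moved on.
    skipped = re.findall(r"Got recursion not available from ([^\s,]+)", output)
    # The server that produced the final response, if any.
    server = re.search(r"^Server:\s*(\S+)", output, re.MULTILINE)
    # "Address:" lines with "#port" describe the server, not the answer.
    addresses = re.findall(r"^Address:\s*(\S+)", output, re.MULTILINE)
    answers = [a for a in addresses if "#" not in a]
    return {
        "skipped": skipped,
        "server": server.group(1) if server else None,
        "nxdomain": "NXDOMAIN" in output,
        "answers": answers,
    }


if __name__ == "__main__":
    print(summarize_nslookup(FAILING))
    print(summarize_nslookup(WORKING))
```

With the samples from this report, the first summary shows the node-local resolver being skipped and the upstream server returning NXDOMAIN, while the second shows the node IP answering with the api-int address.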
Target set to next release version while investigation is either ongoing or pending. Will be considered for earlier release versions when diagnosed and resolved.
Apologies for the long delay on this. In the meantime, the reporter's account seems to have been disabled so I can't request an update on the status of this problem. We've made a number of improvements to the keepalived configuration since 4.5, so if the problem was keepalived it's possible it has been fixed in 4.6 or a later release. Given that the reporter isn't available to continue debugging this I'm going to close it, but feel free to reopen if anyone else is hitting this.
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 500 days