Description of problem:
-----------------------
In a cluster whose nodes use static IP addresses, one node became NotReady after upgrading from OCP 4.3.0 to OCP 4.3.18, and an additional node named localhost with the same IP address is present.

The monitoring and network cluster operators are Progressing and Degraded; all other cluster operators are at 4.3.18 and healthy.

NAME         VERSION   AVAILABLE   PROGRESSING   DEGRADED   SINCE
monitoring   4.3.18    False       True          True       2d17h
network      4.3.18    True        True          True       62d

Version-Release number:
-----------------------
Server Version: 4.3.18
Kubernetes Version: v1.16.2

Additional info:
----------------
This can probably be worked around with:

$ oc delete node localhost
$ oc delete node worker-04
(then restart worker-04)

However, it is important for us to know the root cause.

- There are many pending CSRs for the localhost node.
Reassigning to RHCOS. There is a bug in which the hostname is set to localhost in certain situations.
Additional info:
================
Deleting the localhost and worker nodes with oc delete and restarting the worker node did not solve it. The worker node rejoined the cluster as localhost, but this time the localhost node is a regular member of the cluster with the worker node's IP address. DNS and PTR records are set correctly.

After restarting another worker node, that node also comes up NotReady and registered as localhost:

$ openssl x509 -text -in /var/lib/kubelet/pki/kubelet-client-current.pem | grep CN
        Issuer: CN = kube-csr-signer_@1588779788
        Subject: O = system:nodes, CN = system:node:localhost

From the network-online logs, the hostname appears to be set correctly, but after 4 seconds it is back to localhost:
--------------------------------
[debug node]
$ journalctl -u network-online.target
-- Logs begin at Thu 2020-05-21 10:30:01 UTC, end at Fri 2020-05-22 13:43:38 UTC. --
May 22 10:40:21 worker-04.example.com systemd[1]: Stopped target Network is Online.
-- Reboot --
May 22 10:43:11 localhost systemd[1]: Reached target Network is Online.
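A minimal sketch for checking other workers the same way, assuming it is run as root on the node (for example from `oc debug node/...` after `chroot /host`) and that `openssl` prints the subject in the OpenSSL 1.1 style shown in the output above. The helper name `node_name_from_subject` is hypothetical. The script compares the node's current hostname with the node name embedded in the kubelet client certificate's CN (`system:node:<name>`):

```shell
#!/bin/sh
# Sketch: report whether the kubelet client certificate on this node was
# issued for "localhost" instead of the node's real hostname.

CERT=/var/lib/kubelet/pki/kubelet-client-current.pem

# Hypothetical helper: extract the node name from a subject line such as
#   subject=O = system:nodes, CN = system:node:localhost
# (assumes the OpenSSL 1.1 "KEY = value" subject format).
node_name_from_subject() {
  printf '%s\n' "$1" | sed -n 's/.*CN = system:node:\([^,]*\).*/\1/p'
}

# Only run the check when the certificate is readable (i.e. on a node).
if [ -r "$CERT" ]; then
  subject=$(openssl x509 -noout -subject -in "$CERT")
  cert_node=$(node_name_from_subject "$subject")
  echo "hostname:         $(hostname)"
  echo "certificate node: $cert_node"
  if [ "$cert_node" = "localhost" ]; then
    echo "WARNING: kubelet client certificate was issued for localhost"
  fi
fi
```

A node that shows `certificate node: localhost` while `hostname` reports its real FQDN (or the reverse) is in the state described above.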
Setting priority to medium and targeting 4.6. There are a handful of other BZs about hostname handling that may be related to this one. We will investigate and triage this issue more thoroughly when capacity allows.
This is a duplicate of bug 1809345. The backport was released via https://github.com/openshift/machine-config-operator/commit/0b2741b3c0d735446cedb3d2494d85a4cbd74b90

*** This bug has been marked as a duplicate of bug 1809345 ***