Description of problem:
The cluster was healthy when first set up; this issue appeared after it had been running for 3 days. The cluster has 3 masters + 3 infra + 3 compute nodes and runs on OpenStack behind a proxy. Restarting the node service brings the IP back.

Version-Release number of selected component (if applicable):
openshift v3.11.0-0.11.0
kubernetes v1.11.0+d4cacc0

How reproducible:
N/A

Steps to Reproduce:
1. Install an HA OCP 3.11 cluster and run it for 3 days with multiple users.

Actual results:
5 out of 9 nodes lost their IP addresses.

# oc get node -o wide
NAME                              STATUS    ROLES     AGE       VERSION           INTERNAL-IP     EXTERNAL-IP    OS-IMAGE                                      KERNEL-VERSION               CONTAINER-RUNTIME
preserve-sharefr2-master-etcd-1   Ready     master    3d        v1.11.0+d4cacc0   <none>          <none>         Red Hat Enterprise Linux Server 7.4 (Maipo)   3.10.0-693.21.1.el7.x86_64   cri-o://1.11.1
preserve-sharefr2-master-etcd-2   Ready     master    3d        v1.11.0+d4cacc0   172.16.120.8    10.8.247.142   Red Hat Enterprise Linux Server 7.4 (Maipo)   3.10.0-693.21.1.el7.x86_64   cri-o://1.11.1
preserve-sharefr2-master-etcd-3   Ready     master    3d        v1.11.0+d4cacc0   <none>          <none>         Red Hat Enterprise Linux Server 7.4 (Maipo)   3.10.0-693.21.1.el7.x86_64   cri-o://1.11.1
preserve-sharefr2-node-1          Ready     compute   3d        v1.11.0+d4cacc0   <none>          <none>         Red Hat Enterprise Linux Server 7.4 (Maipo)   3.10.0-693.21.1.el7.x86_64   cri-o://1.11.1
preserve-sharefr2-node-2          Ready     compute   3d        v1.11.0+d4cacc0   172.16.120.78   10.8.251.177   Red Hat Enterprise Linux Server 7.4 (Maipo)   3.10.0-693.21.1.el7.x86_64   cri-o://1.11.1
preserve-sharefr2-node-3          Ready     compute   3d        v1.11.0+d4cacc0   172.16.120.79   10.8.247.128   Red Hat Enterprise Linux Server 7.4 (Maipo)   3.10.0-693.21.1.el7.x86_64   cri-o://1.11.1
preserve-sharefr2-node-infra-1    Ready     infra     3d        v1.11.0+d4cacc0   <none>          <none>         Red Hat Enterprise Linux Server 7.4 (Maipo)   3.10.0-693.21.1.el7.x86_64   cri-o://1.11.1
preserve-sharefr2-node-infra-2    Ready     infra     3d        v1.11.0+d4cacc0   <none>          <none>         Red Hat Enterprise Linux Server 7.4 (Maipo)   3.10.0-693.21.1.el7.x86_64   cri-o://1.11.1
preserve-sharefr2-node-infra-3    Ready     infra     3d        v1.11.0+d4cacc0   172.16.120.6    10.8.241.153   Red Hat Enterprise Linux Server 7.4 (Maipo)   3.10.0-693.21.1.el7.x86_64   cri-o://1.11.1

Lots of pods are stuck in Terminating status.

Expected results:
All node IPs remain, and pods are not stuck in Terminating status for a long time.

Additional info:
heapster logs:
E0810 06:46:05.000408       1 summary.go:376] Node preserve-sharefr2-master-etcd-3 has no valid hostname and/or IP address: preserve-sharefr2-master-etcd-3
E0810 06:46:05.000420       1 summary.go:376] Node preserve-sharefr2-node-infra-2 has no valid hostname and/or IP address: preserve-sharefr2-node-infra-2
E0810 06:46:05.000432       1 summary.go:376] Node preserve-sharefr2-node-1 has no valid hostname and/or IP address: preserve-sharefr2-node-1
E0810 06:46:05.000437       1 summary.go:376] Node preserve-sharefr2-node-infra-1 has no valid hostname and/or IP address: preserve-sharefr2-node-infra-1
E0810 06:46:05.000442       1 summary.go:376] Node preserve-sharefr2-master-etcd-1 has no valid hostname and/or IP address: preserve-sharefr2-master-etcd-1
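The affected nodes can be spotted by filtering the INTERNAL-IP column of `oc get node -o wide` for `<none>`. Below is a minimal sketch of such a check; the function name `nodes_missing_ip` is hypothetical, and sample output is embedded here in place of a live cluster (on a real cluster you would pipe `oc get node -o wide` into the function instead).

```shell
# Hypothetical helper: read 'oc get node -o wide' output on stdin and
# print the names of nodes whose INTERNAL-IP column (field 6) is <none>.
nodes_missing_ip() {
  awk 'NR > 1 && $6 == "<none>" { print $1 }'
}

# Sample data standing in for a live 'oc get node -o wide' call.
cat <<'EOF' | nodes_missing_ip
NAME     STATUS  ROLES    AGE  VERSION          INTERNAL-IP    EXTERNAL-IP
node-a   Ready   master   3d   v1.11.0+d4cacc0  <none>         <none>
node-b   Ready   compute  3d   v1.11.0+d4cacc0  172.16.120.78  10.8.251.177
EOF
# prints: node-a
```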
Clusters are already set up; if the issue does not recur within 4 days of running, I will mark this bug as fixed.
Fixed.
openshift v3.11.0-0.20.0
kubernetes v1.11.0+d4cacc0
Ran 2 clusters for 4 days and did not hit the node IP loss issue.
Kernel Version: 3.10.0-862.9.1.el7.x86_64
Operating System: Red Hat Enterprise Linux Server 7.5 (Maipo)
Kernel Version: 3.10.0-933.el7.x86_64
Operating System: Red Hat Enterprise Linux Server 7.6 Beta (Maipo)
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2018:2652