Changed the title to reflect what is actually happening. On AWS, as additional IP addresses are added, they are appended to the end of the list of IP addresses, as expected. However, when the new address is added to the Node's list of InternalIP addresses, the ordering of IP addresses in the list is not preserved. Kubernetes/OpenShift always uses the first address in the list as the node address for cluster operations. When this address changes (because of the reordering), the cluster loses access to the node.
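To make the expected behavior concrete, here is a minimal Python sketch of an order-preserving merge. It is not the upstream patch and makes no claim about where the reordering happens in the cloud provider code; the function names `merge_addresses_ordered` and `node_address` are hypothetical, and addresses are assumed to be plain strings.

```python
from typing import Iterable, List


def merge_addresses_ordered(existing: Iterable[str], reported: Iterable[str]) -> List[str]:
    """Merge two address lists, keeping first-seen order and dropping duplicates."""
    seen = set()
    merged: List[str] = []
    for addr in list(existing) + list(reported):
        if addr not in seen:
            seen.add(addr)
            merged.append(addr)
    return merged


def node_address(internal_ips: List[str]) -> str:
    """Model the behavior described above: the first InternalIP is the node address."""
    if not internal_ips:
        raise ValueError("node has no InternalIP addresses")
    return internal_ips[0]
```

With a merge like this, adding secondary addresses can only append to the list, so `node_address` returns the same value before and after the change.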
Fixed in upstream master. Not sure if we want to do backports upstream or just backport it here? The patch ought to apply pretty cleanly back to 3.11.
Upstream PR is linked in the External trackers above, but I'll paste here too: https://github.com/kubernetes/kubernetes/pull/79391
This bug would probably have been reported sooner and by more people, except that in 3.x you can mostly work around it with --node-ip. In 4.x that option is not available, so more people are running into it (https://github.com/openshift/machine-config-operator/issues/944). We may need to backport this to 4.1.z.
Verified on 4.2.0-0.nightly-2019-07-18-120653.

From the AWS console, added 4 secondary IP addresses to node ip-10-0-137-237.us-east-2.compute.internal. Also rebooted the node; it is still using the original IP address.

$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.2.0-0.nightly-2019-07-18-120653   True        False         170m    Cluster version is 4.2.0-0.nightly-2019-07-18-120653

$ oc get nodes -o wide
NAME                                         STATUS   ROLES    AGE    VERSION             INTERNAL-IP    EXTERNAL-IP   OS-IMAGE                                                   KERNEL-VERSION               CONTAINER-RUNTIME
ip-10-0-137-203.us-east-2.compute.internal   Ready    master   145m   v1.14.0+bbfcbc8ac   10.0.137.203   <none>        Red Hat Enterprise Linux CoreOS 420.8.20190718.1 (Ootpa)   4.18.0-80.4.2.el8_0.x86_64   cri-o://1.14.8-3.rhaos4.2.el8
ip-10-0-137-237.us-east-2.compute.internal   Ready    worker   136m   v1.14.0+bbfcbc8ac   10.0.137.237   <none>        Red Hat Enterprise Linux CoreOS 420.8.20190718.1 (Ootpa)   4.18.0-80.4.2.el8_0.x86_64   cri-o://1.14.8-3.rhaos4.2.el8
ip-10-0-145-136.us-east-2.compute.internal   Ready    worker   136m   v1.14.0+bbfcbc8ac   10.0.145.136   <none>        Red Hat Enterprise Linux CoreOS 420.8.20190718.1 (Ootpa)   4.18.0-80.4.2.el8_0.x86_64   cri-o://1.14.8-3.rhaos4.2.el8
ip-10-0-154-217.us-east-2.compute.internal   Ready    master   145m   v1.14.0+bbfcbc8ac   10.0.154.217   <none>        Red Hat Enterprise Linux CoreOS 420.8.20190718.1 (Ootpa)   4.18.0-80.4.2.el8_0.x86_64   cri-o://1.14.8-3.rhaos4.2.el8
ip-10-0-173-64.us-east-2.compute.internal    Ready    master   145m   v1.14.0+bbfcbc8ac   10.0.173.64    <none>        Red Hat Enterprise Linux CoreOS 420.8.20190718.1 (Ootpa)   4.18.0-80.4.2.el8_0.x86_64   cri-o://1.14.8-3.rhaos4.2.el8

$ aws ec2 describe-instances --instance-ids i-0632805cfdff1e0c4 --query 'Reservations[].Instances[].NetworkInterfaces[].PrivateIpAddresses'
[
    [
        {
            "Primary": true,
            "PrivateDnsName": "ip-10-0-134-237.ap-south-1.compute.internal",
            "PrivateIpAddress": "10.0.134.237"
        },
        {
            "Primary": false,
            "PrivateDnsName": "ip-10-0-137-187.ap-south-1.compute.internal",
            "PrivateIpAddress": "10.0.137.187"
        },
        {
            "Primary": false,
            "PrivateDnsName": "ip-10-0-131-93.ap-south-1.compute.internal",
            "PrivateIpAddress": "10.0.131.93"
        },
        {
            "Primary": false,
            "PrivateDnsName": "ip-10-0-132-193.ap-south-1.compute.internal",
            "PrivateIpAddress": "10.0.132.193"
        },
        {
            "Primary": false,
            "PrivateDnsName": "ip-10-0-135-119.ap-south-1.compute.internal",
            "PrivateIpAddress": "10.0.135.119"
        }
    ]
]

$ oc describe node ip-10-0-137-237.us-east-2.compute.internal
Name:               ip-10-0-137-237.us-east-2.compute.internal
...
Addresses:
  InternalIP:   10.0.137.237
  InternalDNS:  ip-10-0-137-237.us-east-2.compute.internal
  Hostname:     ip-10-0-137-237.us-east-2.compute.internal
Ignore the previous comment; its output was from the wrong node. Below is the output from the correct node.

Verified on 4.2.0-0.nightly-2019-07-18-120653.

From the AWS console, added 4 secondary IP addresses to node ip-10-0-145-136.us-east-2.compute.internal. Also rebooted the node; it is still using the original IP address.

$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.2.0-0.nightly-2019-07-18-120653   True        False         170m    Cluster version is 4.2.0-0.nightly-2019-07-18-120653

$ oc get nodes -o wide
NAME                                         STATUS   ROLES    AGE    VERSION             INTERNAL-IP    EXTERNAL-IP   OS-IMAGE                                                   KERNEL-VERSION               CONTAINER-RUNTIME
ip-10-0-137-203.us-east-2.compute.internal   Ready    master   145m   v1.14.0+bbfcbc8ac   10.0.137.203   <none>        Red Hat Enterprise Linux CoreOS 420.8.20190718.1 (Ootpa)   4.18.0-80.4.2.el8_0.x86_64   cri-o://1.14.8-3.rhaos4.2.el8
ip-10-0-137-237.us-east-2.compute.internal   Ready    worker   136m   v1.14.0+bbfcbc8ac   10.0.137.237   <none>        Red Hat Enterprise Linux CoreOS 420.8.20190718.1 (Ootpa)   4.18.0-80.4.2.el8_0.x86_64   cri-o://1.14.8-3.rhaos4.2.el8
ip-10-0-145-136.us-east-2.compute.internal   Ready    worker   136m   v1.14.0+bbfcbc8ac   10.0.145.136   <none>        Red Hat Enterprise Linux CoreOS 420.8.20190718.1 (Ootpa)   4.18.0-80.4.2.el8_0.x86_64   cri-o://1.14.8-3.rhaos4.2.el8
ip-10-0-154-217.us-east-2.compute.internal   Ready    master   145m   v1.14.0+bbfcbc8ac   10.0.154.217   <none>        Red Hat Enterprise Linux CoreOS 420.8.20190718.1 (Ootpa)   4.18.0-80.4.2.el8_0.x86_64   cri-o://1.14.8-3.rhaos4.2.el8
ip-10-0-173-64.us-east-2.compute.internal    Ready    master   145m   v1.14.0+bbfcbc8ac   10.0.173.64    <none>        Red Hat Enterprise Linux CoreOS 420.8.20190718.1 (Ootpa)   4.18.0-80.4.2.el8_0.x86_64   cri-o://1.14.8-3.rhaos4.2.el8

Added 4 secondary IPs:

sh-4.4# curl -s http://169.254.169.254/latest/meta-data/network/interfaces/macs/06:8b:1b:6c:bc:24/local-ipv4s
10.0.145.136
10.0.150.98
10.0.156.131
10.0.146.68
10.0.155.234

$ aws ec2 describe-instances --instance-ids i-0c27102bad9011ea1 --query 'Reservations[].Instances[].NetworkInterfaces[].PrivateIpAddresses'
[
    [
        {
            "Primary": true,
            "PrivateDnsName": "ip-10-0-145-136.us-east-2.compute.internal",
            "PrivateIpAddress": "10.0.145.136"
        },
        {
            "Primary": false,
            "PrivateDnsName": "ip-10-0-150-98.us-east-2.compute.internal",
            "PrivateIpAddress": "10.0.150.98"
        },
        {
            "Primary": false,
            "PrivateDnsName": "ip-10-0-156-131.us-east-2.compute.internal",
            "PrivateIpAddress": "10.0.156.131"
        },
        {
            "Primary": false,
            "PrivateDnsName": "ip-10-0-146-68.us-east-2.compute.internal",
            "PrivateIpAddress": "10.0.146.68"
        },
        {
            "Primary": false,
            "PrivateDnsName": "ip-10-0-155-234.us-east-2.compute.internal",
            "PrivateIpAddress": "10.0.155.234"
        }
    ]
]

$ oc describe node ip-10-0-145-136.us-east-2.compute.internal
Name:               ip-10-0-145-136.us-east-2.compute.internal
...
Addresses:
  InternalIP:   10.0.145.136
  InternalIP:   10.0.150.98
  InternalIP:   10.0.156.131
  InternalIP:   10.0.146.68
  InternalIP:   10.0.155.234
  InternalDNS:  ip-10-0-145-136.us-east-2.compute.internal
  Hostname:     ip-10-0-145-136.us-east-2.compute.internal

$ oc get nodes -o wide
NAME                                         STATUS   ROLES    AGE     VERSION             INTERNAL-IP    EXTERNAL-IP   OS-IMAGE                                                   KERNEL-VERSION               CONTAINER-RUNTIME
ip-10-0-137-203.us-east-2.compute.internal   Ready    master   4h35m   v1.14.0+bbfcbc8ac   10.0.137.203   <none>        Red Hat Enterprise Linux CoreOS 420.8.20190718.1 (Ootpa)   4.18.0-80.4.2.el8_0.x86_64   cri-o://1.14.8-3.rhaos4.2.el8
ip-10-0-137-237.us-east-2.compute.internal   Ready    worker   4h26m   v1.14.0+bbfcbc8ac   10.0.137.237   <none>        Red Hat Enterprise Linux CoreOS 420.8.20190718.1 (Ootpa)   4.18.0-80.4.2.el8_0.x86_64   cri-o://1.14.8-3.rhaos4.2.el8
ip-10-0-145-136.us-east-2.compute.internal   Ready    worker   4h26m   v1.14.0+bbfcbc8ac   10.0.145.136   <none>        Red Hat Enterprise Linux CoreOS 420.8.20190718.1 (Ootpa)   4.18.0-80.4.2.el8_0.x86_64   cri-o://1.14.8-3.rhaos4.2.el8
ip-10-0-154-217.us-east-2.compute.internal   Ready    master   4h35m   v1.14.0+bbfcbc8ac   10.0.154.217   <none>        Red Hat Enterprise Linux CoreOS 420.8.20190718.1 (Ootpa)   4.18.0-80.4.2.el8_0.x86_64   cri-o://1.14.8-3.rhaos4.2.el8
ip-10-0-173-64.us-east-2.compute.internal    Ready    master   4h35m   v1.14.0+bbfcbc8ac   10.0.173.64    <none>        Red Hat Enterprise Linux CoreOS 420.8.20190718.1 (Ootpa)   4.18.0-80.4.2.el8_0.x86_64   cri-o://1.14.8-3.rhaos4.2.el8
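The manual comparison in this verification (primary private IP from AWS versus the first InternalIP on the node) could be scripted. The following is a rough Python sketch, not part of the actual verification. It assumes the AWS data comes from the same `--query` as above, and that the node is fetched as JSON (for example with `oc get node <name> -o json`, which is an assumption here since the comment used `oc describe`). All function names are hypothetical.

```python
import json
from typing import List


def primary_private_ip(describe_instances_json: str) -> str:
    """Return the primary private IP from the PrivateIpAddresses query output."""
    # The query yields a list of interfaces, each a list of address entries.
    for interface in json.loads(describe_instances_json):
        for entry in interface:
            if entry["Primary"]:
                return entry["PrivateIpAddress"]
    raise ValueError("no primary private IP found")


def internal_ips(node_json: str) -> List[str]:
    """Return InternalIP addresses in the order listed in status.addresses."""
    node = json.loads(node_json)
    return [
        addr["address"]
        for addr in node["status"]["addresses"]
        if addr["type"] == "InternalIP"
    ]


def node_keeps_primary(describe_instances_json: str, node_json: str) -> bool:
    """True if the node's first InternalIP is the instance's primary private IP."""
    ips = internal_ips(node_json)
    return bool(ips) and ips[0] == primary_private_ip(describe_instances_json)
```

For the node checked above, such a check would compare 10.0.145.136 (marked `"Primary": true`) against the first `InternalIP` entry in the node's address list.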
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2019:2922