Bug 1696628

Summary: On AWS, when a secondary IP address is added, the order of the list of InternalIPs is not preserved.
Product: OpenShift Container Platform
Reporter: Oscar Casal Sanchez <ocasalsa>
Component: Node
Assignee: Robert Krawitz <rkrawitz>
Status: CLOSED ERRATA
QA Contact: Weinan Liu <weinliu>
Severity: medium
Docs Contact:
Priority: unspecified
Version: 3.11.0
CC: aos-bugs, bbennett, danw, dcbw, dmoessne, florin-alexandru.peter, jokerman, lmartinh, mmccomas, pcameron, rkrawitz, rsandu, schoudha, weliang
Target Milestone: ---   
Target Release: 4.2.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: The algorithm for merging additional IP addresses into a node's address list was incorrect.
Consequence: After an additional IP address was added to a node, the list of addresses was out of order, and the node was unable to talk to the API server.
Fix: The merge algorithm for addresses no longer reorders the addresses.
Result: Adding secondary IP addresses to a node no longer changes the ordering, and the node is able to continue communicating with the API server.
Story Points: ---
Clone Of:
: 1729276
Environment:
Last Closed: 2019-10-16 06:28:05 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1729276, 1734385    

Comment 37 Phil Cameron 2019-05-14 15:14:55 UTC
Changed the title to reflect what is actually happening. On AWS, new additional IP addresses are added to the end of the list of IP addresses, as expected. However, when a new address is added to the Node's list of InternalIP addresses, the ordering of IP addresses in the list is not preserved. Kubernetes/OpenShift always uses the first address in the list as the node address for cluster operations. When this address changes (because of the reordering), the cluster loses access to the node.
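
To illustrate the property at stake, here is a minimal sketch of an order-preserving merge. It is not the upstream patch; the function name `merge_internal_ips` and the plain list-of-strings representation are assumptions made only for this example. Addresses already on the node keep their positions, and newly reported ones are appended, so the first entry (the one used as the node address) does not change.

```python
from typing import List


def merge_internal_ips(current: List[str], reported: List[str]) -> List[str]:
    """Hypothetical helper: merge newly reported IPs into the current list.

    Addresses that are already known and still reported keep their existing
    relative order; addresses seen for the first time are appended at the end.
    """
    still_reported = set(reported)
    # Keep known addresses in their current order, dropping any that vanished.
    merged = [ip for ip in current if ip in still_reported]
    # Append new addresses in the order they were reported.
    for ip in reported:
        if ip not in merged:
            merged.append(ip)
    return merged


# The node starts with its primary address; a secondary IP is then reported,
# here deliberately listed before the primary one.
current = ["10.0.145.136"]
reported = ["10.0.150.98", "10.0.145.136"]
print(merge_internal_ips(current, reported)[0])  # first entry stays the primary
```

A merge that rebuilds the list from an unordered collection (a set or map, for example) gives no such guarantee about which address ends up first.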

Comment 38 Dan Winship 2019-07-08 16:20:21 UTC
Fixed in upstream master. Not sure if we want to do backports upstream or just backport it here? The patch ought to apply pretty cleanly back to 3.11.

Comment 39 Dan Williams 2019-07-09 14:57:36 UTC
Upstream PR is linked in the External trackers above, but I'll paste here too: https://github.com/kubernetes/kubernetes/pull/79391

Comment 41 Dan Winship 2019-07-11 14:59:02 UTC
It looks like this bug would have been reported sooner and by more people, except for the fact that you can mostly use --node-ip to work around it in 3.x. But in 4.x that's not available, and so more people are running into this (https://github.com/openshift/machine-config-operator/issues/944). We may need to backport this to 4.1.z.

Comment 42 Sunil Choudhary 2019-07-19 11:22:57 UTC
Verified on 4.2.0-0.nightly-2019-07-18-120653.

From the AWS console, added a 4th secondary IP address to node ip-10-0-137-237.us-east-2.compute.internal. Also rebooted the node; it is still using the original IP address.

$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.2.0-0.nightly-2019-07-18-120653   True        False         170m    Cluster version is 4.2.0-0.nightly-2019-07-18-120653

$ oc get nodes -o wide
NAME                                         STATUS   ROLES    AGE    VERSION             INTERNAL-IP    EXTERNAL-IP   OS-IMAGE                                                   KERNEL-VERSION               CONTAINER-RUNTIME
ip-10-0-137-203.us-east-2.compute.internal   Ready    master   145m   v1.14.0+bbfcbc8ac   10.0.137.203   <none>        Red Hat Enterprise Linux CoreOS 420.8.20190718.1 (Ootpa)   4.18.0-80.4.2.el8_0.x86_64   cri-o://1.14.8-3.rhaos4.2.el8
ip-10-0-137-237.us-east-2.compute.internal   Ready    worker   136m   v1.14.0+bbfcbc8ac   10.0.137.237   <none>        Red Hat Enterprise Linux CoreOS 420.8.20190718.1 (Ootpa)   4.18.0-80.4.2.el8_0.x86_64   cri-o://1.14.8-3.rhaos4.2.el8
ip-10-0-145-136.us-east-2.compute.internal   Ready    worker   136m   v1.14.0+bbfcbc8ac   10.0.145.136   <none>        Red Hat Enterprise Linux CoreOS 420.8.20190718.1 (Ootpa)   4.18.0-80.4.2.el8_0.x86_64   cri-o://1.14.8-3.rhaos4.2.el8
ip-10-0-154-217.us-east-2.compute.internal   Ready    master   145m   v1.14.0+bbfcbc8ac   10.0.154.217   <none>        Red Hat Enterprise Linux CoreOS 420.8.20190718.1 (Ootpa)   4.18.0-80.4.2.el8_0.x86_64   cri-o://1.14.8-3.rhaos4.2.el8
ip-10-0-173-64.us-east-2.compute.internal    Ready    master   145m   v1.14.0+bbfcbc8ac   10.0.173.64    <none>        Red Hat Enterprise Linux CoreOS 420.8.20190718.1 (Ootpa)   4.18.0-80.4.2.el8_0.x86_64   cri-o://1.14.8-3.rhaos4.2.el8

$ aws ec2 describe-instances --instance-ids i-0632805cfdff1e0c4 --query 'Reservations[].Instances[].NetworkInterfaces[].PrivateIpAddresses'
[
    [
        {
            "Primary": true,
            "PrivateDnsName": "ip-10-0-134-237.ap-south-1.compute.internal",
            "PrivateIpAddress": "10.0.134.237"
        },
        {
            "Primary": false,
            "PrivateDnsName": "ip-10-0-137-187.ap-south-1.compute.internal",
            "PrivateIpAddress": "10.0.137.187"
        },
        {
            "Primary": false,
            "PrivateDnsName": "ip-10-0-131-93.ap-south-1.compute.internal",
            "PrivateIpAddress": "10.0.131.93"
        },
        {
            "Primary": false,
            "PrivateDnsName": "ip-10-0-132-193.ap-south-1.compute.internal",
            "PrivateIpAddress": "10.0.132.193"
        },
        {
            "Primary": false,
            "PrivateDnsName": "ip-10-0-135-119.ap-south-1.compute.internal",
            "PrivateIpAddress": "10.0.135.119"
        }
    ]
]

$ oc describe node ip-10-0-137-237.us-east-2.compute.internal
Name:               ip-10-0-137-237.us-east-2.compute.internal
...
Addresses:
  InternalIP:   10.0.137.237
  InternalDNS:  ip-10-0-137-237.us-east-2.compute.internal
  Hostname:     ip-10-0-137-237.us-east-2.compute.internal

Comment 43 Sunil Choudhary 2019-07-19 11:52:14 UTC
Ignore the previous comment; its output was from the wrong node. Below is the output from the correct node. Verified on 4.2.0-0.nightly-2019-07-18-120653.

From the AWS console, added a 4th secondary IP address to node ip-10-0-145-136.us-east-2.compute.internal. Also rebooted the node; it is still using the original IP address.

$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.2.0-0.nightly-2019-07-18-120653   True        False         170m    Cluster version is 4.2.0-0.nightly-2019-07-18-120653

$ oc get nodes -o wide
NAME                                         STATUS   ROLES    AGE    VERSION             INTERNAL-IP    EXTERNAL-IP   OS-IMAGE                                                   KERNEL-VERSION               CONTAINER-RUNTIME
ip-10-0-137-203.us-east-2.compute.internal   Ready    master   145m   v1.14.0+bbfcbc8ac   10.0.137.203   <none>        Red Hat Enterprise Linux CoreOS 420.8.20190718.1 (Ootpa)   4.18.0-80.4.2.el8_0.x86_64   cri-o://1.14.8-3.rhaos4.2.el8
ip-10-0-137-237.us-east-2.compute.internal   Ready    worker   136m   v1.14.0+bbfcbc8ac   10.0.137.237   <none>        Red Hat Enterprise Linux CoreOS 420.8.20190718.1 (Ootpa)   4.18.0-80.4.2.el8_0.x86_64   cri-o://1.14.8-3.rhaos4.2.el8
ip-10-0-145-136.us-east-2.compute.internal   Ready    worker   136m   v1.14.0+bbfcbc8ac   10.0.145.136   <none>        Red Hat Enterprise Linux CoreOS 420.8.20190718.1 (Ootpa)   4.18.0-80.4.2.el8_0.x86_64   cri-o://1.14.8-3.rhaos4.2.el8
ip-10-0-154-217.us-east-2.compute.internal   Ready    master   145m   v1.14.0+bbfcbc8ac   10.0.154.217   <none>        Red Hat Enterprise Linux CoreOS 420.8.20190718.1 (Ootpa)   4.18.0-80.4.2.el8_0.x86_64   cri-o://1.14.8-3.rhaos4.2.el8
ip-10-0-173-64.us-east-2.compute.internal    Ready    master   145m   v1.14.0+bbfcbc8ac   10.0.173.64    <none>        Red Hat Enterprise Linux CoreOS 420.8.20190718.1 (Ootpa)   4.18.0-80.4.2.el8_0.x86_64   cri-o://1.14.8-3.rhaos4.2.el8

Added 4 secondary IPs

sh-4.4# curl -s http://169.254.169.254/latest/meta-data/network/interfaces/macs/06:8b:1b:6c:bc:24/local-ipv4s
10.0.145.136
10.0.150.98
10.0.156.131
10.0.146.68
10.0.155.234

$ aws ec2 describe-instances --instance-ids i-0c27102bad9011ea1 --query 'Reservations[].Instances[].NetworkInterfaces[].PrivateIpAddresses'
[
    [
        {
            "Primary": true,
            "PrivateDnsName": "ip-10-0-145-136.us-east-2.compute.internal",
            "PrivateIpAddress": "10.0.145.136"
        },
        {
            "Primary": false,
            "PrivateDnsName": "ip-10-0-150-98.us-east-2.compute.internal",
            "PrivateIpAddress": "10.0.150.98"
        },
        {
            "Primary": false,
            "PrivateDnsName": "ip-10-0-156-131.us-east-2.compute.internal",
            "PrivateIpAddress": "10.0.156.131"
        },
        {
            "Primary": false,
            "PrivateDnsName": "ip-10-0-146-68.us-east-2.compute.internal",
            "PrivateIpAddress": "10.0.146.68"
        },
        {
            "Primary": false,
            "PrivateDnsName": "ip-10-0-155-234.us-east-2.compute.internal",
            "PrivateIpAddress": "10.0.155.234"
        }
    ]
]

$ oc describe node ip-10-0-145-136.us-east-2.compute.internal
Name:               ip-10-0-145-136.us-east-2.compute.internal
...
Addresses:
  InternalIP:   10.0.145.136
  InternalIP:   10.0.150.98
  InternalIP:   10.0.156.131
  InternalIP:   10.0.146.68
  InternalIP:   10.0.155.234
  InternalDNS:  ip-10-0-145-136.us-east-2.compute.internal
  Hostname:     ip-10-0-145-136.us-east-2.compute.internal

$ oc get nodes -o wide
NAME                                         STATUS   ROLES    AGE     VERSION             INTERNAL-IP    EXTERNAL-IP   OS-IMAGE                                                   KERNEL-VERSION               CONTAINER-RUNTIME
ip-10-0-137-203.us-east-2.compute.internal   Ready    master   4h35m   v1.14.0+bbfcbc8ac   10.0.137.203   <none>        Red Hat Enterprise Linux CoreOS 420.8.20190718.1 (Ootpa)   4.18.0-80.4.2.el8_0.x86_64   cri-o://1.14.8-3.rhaos4.2.el8
ip-10-0-137-237.us-east-2.compute.internal   Ready    worker   4h26m   v1.14.0+bbfcbc8ac   10.0.137.237   <none>        Red Hat Enterprise Linux CoreOS 420.8.20190718.1 (Ootpa)   4.18.0-80.4.2.el8_0.x86_64   cri-o://1.14.8-3.rhaos4.2.el8
ip-10-0-145-136.us-east-2.compute.internal   Ready    worker   4h26m   v1.14.0+bbfcbc8ac   10.0.145.136   <none>        Red Hat Enterprise Linux CoreOS 420.8.20190718.1 (Ootpa)   4.18.0-80.4.2.el8_0.x86_64   cri-o://1.14.8-3.rhaos4.2.el8
ip-10-0-154-217.us-east-2.compute.internal   Ready    master   4h35m   v1.14.0+bbfcbc8ac   10.0.154.217   <none>        Red Hat Enterprise Linux CoreOS 420.8.20190718.1 (Ootpa)   4.18.0-80.4.2.el8_0.x86_64   cri-o://1.14.8-3.rhaos4.2.el8
ip-10-0-173-64.us-east-2.compute.internal    Ready    master   4h35m   v1.14.0+bbfcbc8ac   10.0.173.64    <none>        Red Hat Enterprise Linux CoreOS 420.8.20190718.1 (Ootpa)   4.18.0-80.4.2.el8_0.x86_64   cri-o://1.14.8-3.rhaos4.2.el8
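
The manual check above (first InternalIP on the node equals the EC2 primary private IP) could be scripted roughly as follows. This is only a sketch: the function names are hypothetical, the first input is assumed to be the JSON printed by the `aws ec2 describe-instances ... --query` command shown above, and the second is assumed to be the JSON from `oc get node <name> -o json`, whose `status.addresses` entries carry `type` and `address` fields.

```python
import json
from typing import List


def primary_private_ip(describe_output: str) -> str:
    """Return the address flagged "Primary": true in the describe-instances output."""
    for interface_addresses in json.loads(describe_output):
        for entry in interface_addresses:
            if entry["Primary"]:
                return entry["PrivateIpAddress"]
    raise ValueError("no primary private IP found")


def internal_ips(node_json: str) -> List[str]:
    """Return the node's InternalIP addresses in the order the API lists them."""
    node = json.loads(node_json)
    return [a["address"] for a in node["status"]["addresses"]
            if a["type"] == "InternalIP"]


def first_internal_ip_is_primary(describe_output: str, node_json: str) -> bool:
    """True when the node's first InternalIP is the instance's primary private IP."""
    ips = internal_ips(node_json)
    return bool(ips) and ips[0] == primary_private_ip(describe_output)
```

With the outputs from this comment, the first InternalIP (10.0.145.136) matches the primary private IP, so the check would pass; on an affected build it would fail once a secondary address moved to the front of the list.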

Comment 44 errata-xmlrpc 2019-10-16 06:28:05 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:2922