+++ This bug was initially created as a clone of Bug #1734319 +++

Description of problem:
We tested 4.1.8, which should include the fix for bug 1696628, but there are still nodes with the wrong order of addresses.

Version-Release number of selected component (if applicable):
4.1.8

How reproducible:
Install a cluster with machineCIDR 192.168.32.0/19.

Steps to Reproduce:
1. Add additional subnets for all 3 AZs: 192.168.56.0/23, 192.168.58.0/23 and 192.168.60.0/23
2. Add an additional interface to each node
3. Reboot the whole cluster

Actual results:

$ oc get node
NAME                                              STATUS     ROLES            AGE   VERSION
ip-192-168-48-169.eu-central-1.compute.internal   Ready      infra,worker     20h   v1.13.4+ab8449285
ip-192-168-48-9.eu-central-1.compute.internal     Ready      master           20h   v1.13.4+ab8449285
ip-192-168-49-118.eu-central-1.compute.internal   NotReady   primary,worker   20h   v1.13.4+ab8449285
ip-192-168-50-9.eu-central-1.compute.internal     Ready      master           20h   v1.13.4+ab8449285
ip-192-168-51-0.eu-central-1.compute.internal     NotReady   infra,worker     20h   v1.13.4+ab8449285
ip-192-168-51-1.eu-central-1.compute.internal     Ready      primary,worker   20h   v1.13.4+ab8449285
ip-192-168-52-9.eu-central-1.compute.internal     Ready      master           20h   v1.13.4+ab8449285
ip-192-168-53-77.eu-central-1.compute.internal    NotReady   primary,worker   20h   v1.13.4+ab8449285
ip-192-168-53-97.eu-central-1.compute.internal    NotReady   infra,worker     20h   v1.13.4+ab8449285

$ oc get node ip-192-168-53-97.eu-central-1.compute.internal -o yaml
apiVersion: v1
kind: Node
metadata:
  annotations:
    machine.openshift.io/machine: openshift-machine-api/t102-7g2zc-infra-m4large-eu-central-1c-7r6rk
    machineconfiguration.openshift.io/currentConfig: rendered-worker-d9f0d4c1d4ad58f19abd3eb7a491fa47
    machineconfiguration.openshift.io/desiredConfig: rendered-worker-d9f0d4c1d4ad58f19abd3eb7a491fa47
    machineconfiguration.openshift.io/state: Done
    volumes.kubernetes.io/controller-managed-attach-detach: "true"
  creationTimestamp: "2019-07-29T12:19:21Z"
  labels:
    beta.kubernetes.io/arch: amd64
    beta.kubernetes.io/instance-type: m4.large
    beta.kubernetes.io/os: linux
    failure-domain.beta.kubernetes.io/region: eu-central-1
    failure-domain.beta.kubernetes.io/zone: eu-central-1c
    kubernetes.io/hostname: ip-192-168-53-97
    node-role.kubernetes.io/infra: ""
    node-role.kubernetes.io/worker: ""
    node.openshift.io/os_id: rhcos
    node.openshift.io/os_version: "4.1"
  name: ip-192-168-53-97.eu-central-1.compute.internal
  resourceVersion: "475138"
  selfLink: /api/v1/nodes/ip-192-168-53-97.eu-central-1.compute.internal
  uid: 18131f11-b1fb-11e9-8c9e-0627ac171b24
spec:
  providerID: aws:///eu-central-1c/i-0070510acd0211610
  taints:
  - effect: NoSchedule
    key: node.kubernetes.io/not-ready
    timeAdded: "2019-07-29T19:16:20Z"
  - effect: NoExecute
    key: node.kubernetes.io/not-ready
    timeAdded: "2019-07-30T07:16:07Z"
status:
  addresses:
  - address: 192.168.61.113
    type: InternalIP
  - address: 192.168.53.97
    type: InternalIP
  - address: ip-192-168-53-97.eu-central-1.compute.internal
    type: InternalDNS
  - address: ip-192-168-53-97.eu-central-1.compute.internal
    type: Hostname
  allocatable:
    attachable-volumes-aws-ebs: "39"
    cpu: 1500m
    hugepages-1Gi: "0"
    hugepages-2Mi: "0"
    memory: 7548500Ki
    pods: "250"
  capacity:
    attachable-volumes-aws-ebs: "39"
    cpu: "2"
    hugepages-1Gi: "0"
    hugepages-2Mi: "0"
    memory: 8162900Ki
    pods: "250"
  conditions:
  - lastHeartbeatTime: "2019-07-30T08:52:06Z"
    lastTransitionTime: "2019-07-29T12:19:21Z"
    message: kubelet has sufficient memory available
    reason: KubeletHasSufficientMemory
    status: "False"
    type: MemoryPressure
  - lastHeartbeatTime: "2019-07-30T08:52:06Z"
    lastTransitionTime: "2019-07-29T12:19:21Z"
    message: kubelet has no disk pressure
    reason: KubeletHasNoDiskPressure
    status: "False"
    type: DiskPressure
  - lastHeartbeatTime: "2019-07-30T08:52:06Z"
    lastTransitionTime: "2019-07-29T12:19:21Z"
    message: kubelet has sufficient PID available
    reason: KubeletHasSufficientPID
    status: "False"
    type: PIDPressure
  - lastHeartbeatTime: "2019-07-30T08:52:06Z"
    lastTransitionTime: "2019-07-29T19:16:01Z"
    message:
      'runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady
      message:Network plugin returns error: cni config uninitialized'
    reason: KubeletNotReady
    status: "False"
    type: Ready
  daemonEndpoints:
    kubeletEndpoint:
      Port: 10250
  images:
  - names:
    - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:922b6a35a8fccd340acfadc06ed4d61b3bb52ccbbde266308d0d4c9f33cbd5bf
    - <none>:<none>
    sizeBytes: 759456116
  - names:
    - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:29f6d90c6b0130aeb4adfa554630feb56220cf2b253f63225e01c214739bd11f
    - <none>:<none>
    sizeBytes: 393935893
  - names:
    - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:11503fe3d8ac3a25a7b58575f8c0a0d9923780c56765f8fb805274f6e6eeb3fc
    - <none>:<none>
    sizeBytes: 326787011
  - names:
    - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:6d80264c4d43421fa342c43d6fdc7f953e7d4086d82b1314c4c757d02ef57cc7
    - <none>:<none>
    sizeBytes: 309636152
  - names:
    - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:b06aa13d7d69c5e40f83b3c5ab448c23981debd037357befb94cd8b0a76539cd
    - <none>:<none>
    sizeBytes: 308816069
  - names:
    - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:9fb0aee3654a5a3085cf68e1733d954bce0fcd29259c472b90d96f0ab3b6cae5
    - <none>:<none>
    sizeBytes: 308244944
  - names:
    - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:315edc515624d3ea914ca8ff29bb696ff867e2201567aed3b512663c7d180235
    - <none>:<none>
    sizeBytes: 307705316
  - names:
    - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:e615166bd1582c2df4d60953988bb46f569e5c8ccf3297c3adeaa0ca2a655589
    - <none>:<none>
    sizeBytes: 305536176
  - names:
    - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:ba9de202096a20fb2fd5d90c461b25d0f716b10c39f7cc97efdd6e8ecd238288
    - <none>:<none>
    sizeBytes: 290850353
  - names:
    - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:2bb650de48f0591ee1a96ad49ec1ae7fee6c63ced2841f2bc10a935344339524
    - <none>:<none>
    sizeBytes: 289813489
  - names:
    - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:9338f63ccf727b4f541148086a26d6cceb73112b1b5335c9deadfdb383a5ba44
    - <none>:<none>
    sizeBytes: 288704566
  - names:
    - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:22ed5f9b584b127735cf3885cf68402a99e1eebc9be9ca5232136e9ceb0b0c6e
    - <none>:<none>
    sizeBytes: 288032413
  - names:
    - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:00b5664ace1f77f884b2f3c74b5672b0024ada16799faf053ff342385bc1af7d
    - <none>:<none>
    sizeBytes: 272086341
  - names:
    - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:1df6aa92e871743fdf3d7531207db99242dcdfe2d7f2aee7ca4e444b98e73789
    - <none>:<none>
    sizeBytes: 269911675
  - names:
    - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:be6439d5942d4749fe19c933613e785e087fc89d3588b0fb0a6d438e74602488
    - <none>:<none>
    sizeBytes: 269204107
  - names:
    - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:3d9537613c6069794d3522dc7c296f1460f3b92bf6227f24d278ab46a2be5110
    - <none>:<none>
    sizeBytes: 259800180
  - names:
    - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:4272ef33e1dd99f1579d50645cd92c30a06e3e7ec141626b82c7ede1b62299db
    - <none>:<none>
    sizeBytes: 254212177
  - names:
    - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:bba5dc11e50b1bbb5785e5585aad2d49db2e9418bdfc47e569b904e816880c9e
    - <none>:<none>
    sizeBytes: 251498500
  - names:
    - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:2c7d5f372ef8831f843e18098b08364ef4eecd7e3a20feabc9d8bdcc04f94a68
    - <none>:<none>
    sizeBytes: 240643246
  - names:
    - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:38a2a4472fb0f56d61cd7f8a269b28c065a87eaf33f13d5df18c88533ded65ce
    - <none>:<none>
    sizeBytes: 240182053
  - names:
    - mtr.external.otc.telekomcloud.com/ocp4/mcs-cluster-health@sha256:74540595cd6b7c8a94ffc04e7ef95f084b149ac287046b06c6e25c67c0bb99f7
    - mtr.external.otc.telekomcloud.com/ocp4/mcs-cluster-health:v1.0.24
    sizeBytes: 146449258
  - names:
    - mtr.external.otc.telekomcloud.com/ocp4/mcs-letsencrypt@sha256:19d8e3a6f4b01240e647561485e81ab1ec37a62c72ec3a7dbd1bbe675f18e79b
    - mtr.external.otc.telekomcloud.com/ocp4/mcs-letsencrypt:v1.0.16
    sizeBytes: 139951428
  nodeInfo:
    architecture: amd64
    bootID: 82d3eb6c-2cee-43f8-8045-fe0c109b71c9
    containerRuntimeVersion: cri-o://1.13.10-0.1.dev.rhaos4.1.git9e2e1de.el8-dev
    kernelVersion: 4.18.0-80.4.2.el8_0.x86_64
    kubeProxyVersion: v1.13.4+ab8449285
    kubeletVersion: v1.13.4+ab8449285
    machineID: 951d900f24a84243806a44286a7fc01d
    operatingSystem: linux
    osImage: Red Hat Enterprise Linux CoreOS 410.8.20190724.0 (Ootpa)
    systemUUID: ec2dfcda-dd34-ba52-02ec-1e788553beb9

Expected results:

Additional info:

--- Additional comment from Florin Peter on 2019-07-30 04:58:05 EDT ---

--- Additional comment from Dan Winship on 2019-07-30 06:35:48 EDT ---

> 2. Add an additional interface to each node

The specific thing that bug 1696628 was fixing was that if you had a *single* interface with multiple IPs, the cloud provider would report the IPs in the correct order, but they would end up listed in the node in a different order.

It's possible that there is a different bug for multiple interfaces, where the cloud provider is just not reporting the IPs in the correct order to begin with.
Can you get the output (from the node) of:

ip link

curl -s http://169.254.169.254/latest/meta-data/network/interfaces/macs/

for MAC in $(curl -s http://169.254.169.254/latest/meta-data/network/interfaces/macs/); do
    curl -s http://169.254.169.254/latest/meta-data/network/interfaces/macs/$MAC/device-number
    echo ": $MAC"
done

--- Additional comment from Florin Peter on 2019-07-30 06:43:20 EDT ---

Hey Dan, thanks for picking this up, I really appreciate it ;)

[root@ip-192-168-53-97 core]# ip link
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: ens3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9001 qdisc mq state UP mode DEFAULT group default qlen 1000
    link/ether 0a:e5:72:94:4e:cc brd ff:ff:ff:ff:ff:ff
3: ens4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9001 qdisc mq state UP mode DEFAULT group default qlen 1000
    link/ether 0a:21:c9:18:5b:38 brd ff:ff:ff:ff:ff:ff
4: ovs-system: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether 86:2d:75:71:3a:8b brd ff:ff:ff:ff:ff:ff
5: tun0: <BROADCAST,MULTICAST> mtu 8951 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether 1e:fa:a6:8e:03:8c brd ff:ff:ff:ff:ff:ff
6: vxlan_sys_4789: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 65000 qdisc noqueue master ovs-system state UNKNOWN mode DEFAULT group default qlen 1000
    link/ether d6:c5:31:0b:11:d2 brd ff:ff:ff:ff:ff:ff
7: br0: <BROADCAST,MULTICAST> mtu 8951 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether a6:3c:4e:61:2f:40 brd ff:ff:ff:ff:ff:ff

[root@ip-192-168-53-97 core]# curl -s http://169.254.169.254/latest/meta-data/network/interfaces/macs/
0a:21:c9:18:5b:38/
0a:e5:72:94:4e:cc/

[root@ip-192-168-53-97 core]# for MAC in $(curl -s http://169.254.169.254/latest/meta-data/network/interfaces/macs/); do
>     curl -s http://169.254.169.254/latest/meta-data/network/interfaces/macs/$MAC/device-number
>     echo ": $MAC"
> done
1: 0a:21:c9:18:5b:38/
0: 0a:e5:72:94:4e:cc/

--- Additional comment from Dan Winship on 2019-07-30 07:34:22 EDT ---

> 2: ens3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9001 qdisc mq state UP mode DEFAULT group default qlen 1000
>     link/ether 0a:e5:72:94:4e:cc brd ff:ff:ff:ff:ff:ff
> 3: ens4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9001 qdisc mq state UP mode DEFAULT group default qlen 1000
>     link/ether 0a:21:c9:18:5b:38 brd ff:ff:ff:ff:ff:ff

> 1: 0a:21:c9:18:5b:38/
> 0: 0a:e5:72:94:4e:cc/

OK, so the AWS cloud provider code returns IP addresses sorted by interface in the order that http://169.254.169.254/latest/meta-data/network/interfaces/macs/ returns them, which appears to be sorted lexicographically by MAC rather than by device number. So you end up with the ens4 IPs before the ens3 ones.

This can be fixed, but it will require another upstream patch and backport. As a workaround, you could delete and recreate your extra interfaces until you end up with MAC addresses that sort correctly.

--- Additional comment from Florin Peter on 2019-07-30 07:44:28 EDT ---

Thanks for the explanation. Unfortunately, the workaround is not really helpful for us, because we add the additional interfaces automatically. We need the extra interface to satisfy a requirement from our security team, which means we are currently blocked by this issue. Do you think it is possible to get this fixed and backported for 4.1?

--- Additional comment from Seth Jennings on 2019-07-30 11:02:27 EDT ---

Aligning component with assignee.

--- Additional comment from Dan Winship on 2019-07-30 11:37:26 EDT ---

Not really POST yet; I linked to the upstream bug, not an Origin PR.

--- Additional comment from Dan Winship on 2019-08-23 11:33:39 EDT ---

This needs an upstream commit that hasn't merged yet, which will then need to be backported. It's not going to make it in time for 4.2.0. We can get it into a 4.2.z release as soon as it's fixed in master.
Oops, no, we're only backporting this to 4.1.
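For illustration, the ordering problem diagnosed above can be sketched as follows. The MAC-to-device-number pairs are taken from the instance metadata output pasted in the comments; the sketch is a simplified model of the sort behavior, not the actual AWS cloud provider code:

```python
# MAC -> device-number pairs, as reported by the EC2 instance metadata
# service on this node (see the comments above).
device_numbers = {
    "0a:21:c9:18:5b:38": 1,  # ens4, the additional interface
    "0a:e5:72:94:4e:cc": 0,  # ens3, the primary interface
}

# Observed (buggy) ordering: the metadata endpoint lists MACs
# lexicographically, so ens4 (device 1) is enumerated before
# ens3 (device 0) and its IPs end up first in the node's address list.
lexicographic = sorted(device_numbers)
print(lexicographic)  # ['0a:21:c9:18:5b:38', '0a:e5:72:94:4e:cc']

# Intended ordering: sort interfaces by device number, so the primary
# interface's addresses come first.
by_device_number = sorted(device_numbers, key=device_numbers.get)
print(by_device_number)  # ['0a:e5:72:94:4e:cc', '0a:21:c9:18:5b:38']
```

With these two MACs the lexicographic sort happens to invert the device order, which matches the suggested workaround: recreating the extra interface until it gets a MAC that sorts after the primary one would mask the bug.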