Bug 2072439

Summary: openshift-cloud-network-config-controller reports wrong range of IP addresses for Azure worker nodes
Product: OpenShift Container Platform Reporter: Andreas Karis <akaris>
Component: Networking Assignee: Andreas Karis <akaris>
Networking sub component: ovn-kubernetes QA Contact: jechen <jechen>
Status: CLOSED ERRATA Docs Contact:
Severity: low    
Priority: low CC: jechen
Version: 4.11   
Target Milestone: ---   
Target Release: 4.11.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2022-08-10 11:04:00 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 2073475, 2075444    

Description Andreas Karis 2022-04-06 10:16:45 UTC
Description of problem:
openshift-cloud-network-config-controller reports wrong range of IP addresses for Azure worker nodes

Version-Release number of selected component (if applicable):
nightly latest

~~~
[akaris@linux origin (egressip-tests-option3)]$ oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.11.0-0.nightly-2022-04-01-172551   True        False         79m     Cluster version is 4.11.0-0.nightly-2022-04-01-172551
~~~

Additional info:

The range in the annotation `cloud.network.openshift.io/egress-ipconfig` should be considered guidance for administrators. However, on Azure, we report the master subnet instead of the worker subnet:
~~~
[akaris@linux origin (egressip-tests-option3)]$ oc get nodes -o yaml ci-ln-9yqgdmt-1d09d-m9kx7-worker-eastus21-vwvpc | grep cloud
    cloud.network.openshift.io/egress-ipconfig: '[{"interface":"ci-ln-9yqgdmt-1d09d-m9kx7-worker-eastus21-vwvpc-nic","ifaddr":{"ipv4":"10.0.0.0/16"},"capacity":{"ip":255}}]'
[akaris@linux origin (egressip-tests-option3)]$ oc get nodes -o wide
NAME                                              STATUS   ROLES    AGE    VERSION           INTERNAL-IP   EXTERNAL-IP   OS-IMAGE                                                        KERNEL-VERSION                 CONTAINER-RUNTIME
ci-ln-9yqgdmt-1d09d-m9kx7-master-0                Ready    master   98m    v1.23.3+54654d2   10.0.0.7      <none>        Red Hat Enterprise Linux CoreOS 411.85.202203242008-0 (Ootpa)   4.18.0-348.20.1.el8_5.x86_64   cri-o://1.24.0-5.rhaos4.11.gitd020fdb.el8
ci-ln-9yqgdmt-1d09d-m9kx7-master-1                Ready    master   101m   v1.23.3+54654d2   10.0.0.8      <none>        Red Hat Enterprise Linux CoreOS 411.85.202203242008-0 (Ootpa)   4.18.0-348.20.1.el8_5.x86_64   cri-o://1.24.0-5.rhaos4.11.gitd020fdb.el8
ci-ln-9yqgdmt-1d09d-m9kx7-master-2                Ready    master   101m   v1.23.3+54654d2   10.0.0.6      <none>        Red Hat Enterprise Linux CoreOS 411.85.202203242008-0 (Ootpa)   4.18.0-348.20.1.el8_5.x86_64   cri-o://1.24.0-5.rhaos4.11.gitd020fdb.el8
ci-ln-9yqgdmt-1d09d-m9kx7-worker-eastus21-vwvpc   Ready    worker   87m    v1.23.3+54654d2   10.0.128.4    <none>        Red Hat Enterprise Linux CoreOS 411.85.202203242008-0 (Ootpa)   4.18.0-348.20.1.el8_5.x86_64   cri-o://1.24.0-5.rhaos4.11.gitd020fdb.el8
ci-ln-9yqgdmt-1d09d-m9kx7-worker-eastus22-s2gfc   Ready    worker   87m    v1.23.3+54654d2   10.0.128.5    <none>        Red Hat Enterprise Linux CoreOS 411.85.202203242008-0 (Ootpa)   4.18.0-348.20.1.el8_5.x86_64   cri-o://1.24.0-5.rhaos4.11.gitd020fdb.el8
ci-ln-9yqgdmt-1d09d-m9kx7-worker-eastus23-cdq5r   Ready    worker   80m    v1.23.3+54654d2   10.0.128.6    <none>        Red Hat Enterprise Linux CoreOS 411.85.202203242008-0 (Ootpa)   4.18.0-348.20.1.el8_5.x86_64   cri-o://1.24.0-5.rhaos4.11.gitd020fdb.el8
~~~

As a result, when an EgressIP is picked from that range, Azure rejects the request because the address lies in the master subnet, not the worker subnet:
~~~
[akaris@linux origin (egressip-tests-option3)]$ oc logs -n openshift-cloud-network-config-controller          cloud-network-config-controller-65cc8949f8-wfb9v | tail -1
E0406 10:09:50.739630       1 controller.go:165] error syncing '10.0.0.4': error assigning CloudPrivateIPConfig: "10.0.0.4" to node: "ci-ln-9yqgdmt-1d09d-m9kx7-worker-eastus23-cdq5r", err: network.InterfacesClient#CreateOrUpdate: Failure sending request: StatusCode=0 -- Original Error: Code="PrivateIPAddressNotInSubnet" Message="Private static IP address 10.0.0.4 does not belong to the range of subnet prefix 10.0.128.0/17." Details=[], requeuing in cloud-private-ip-config workqueue
~~~
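
The mismatch can be reproduced offline with Python's standard `ipaddress` module. This is only an illustrative sketch using the values from the output above (the annotated range, the subnet prefix named in the Azure error, and the requested address). The helper name `in_range` is hypothetical and is not part of the controller.

```python
import ipaddress

# Values copied from the report above.
annotated_range = ipaddress.ip_network("10.0.0.0/16")   # range from the node annotation
worker_subnet = ipaddress.ip_network("10.0.128.0/17")   # subnet prefix from the Azure error
requested_ip = ipaddress.ip_address("10.0.0.4")         # EgressIP that failed to assign
worker_node_ip = ipaddress.ip_address("10.0.128.4")     # INTERNAL-IP of a worker node


def in_range(ip, network):
    """Return True if the address lies inside the given network (hypothetical helper)."""
    return ip in network


# The requested address is inside the annotated range, so it looks valid...
print(in_range(requested_ip, annotated_range))
# ...but it is outside the subnet the worker NIC actually belongs to.
print(in_range(requested_ip, worker_subnet))
# The worker's own address is inside that subnet, as expected.
print(in_range(worker_node_ip, worker_subnet))
```

The first check passes while the second fails, which matches the `PrivateIPAddressNotInSubnet` error in the log.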

Comment 4 ffernand 2022-04-18 18:44:15 UTC
*** Bug 2073180 has been marked as a duplicate of this bug. ***

Comment 7 jechen 2022-04-19 22:48:43 UTC
Verified in 4.11.0-0.nightly-2022-04-16-163450

$ oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.11.0-0.nightly-2022-04-16-163450   True        False         19m     Cluster version is 4.11.0-0.nightly-2022-04-16-163450

$ oc get nodes -o yaml jechen-0419b-22ljj-worker-northcentralus-1 | grep cloud
    cloud.network.openshift.io/egress-ipconfig: '[{"interface":"jechen-0419b-22ljj-worker-northcentralus-1-nic","ifaddr":{"ipv4":"10.0.1.0/24"},"capacity":{"ip":255}}]'


$ oc get -n openshift-cloud-network-config-controller all
NAME                                                   READY   STATUS    RESTARTS   AGE
pod/cloud-network-config-controller-69c955cb75-5r99z   1/1     Running   0          45m

NAME                                              READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/cloud-network-config-controller   1/1     1            1           45m

NAME                                                         DESIRED   CURRENT   READY   AGE
replicaset.apps/cloud-network-config-controller-69c955cb75   1         1         1       45m


$ oc logs -n openshift-cloud-network-config-controller cloud-network-config-controller-69c955cb75-5r99z | tail -10
I0419 21:36:52.427548       1 node_controller.go:106] Setting annotation: 'cloud.network.openshift.io/egress-ipconfig: [{"interface":"jechen-0419b-22ljj-master-2-nic","ifaddr":{"ipv4":"10.0.0.0/24"},"capacity":{"ip":255}}]' on node: jechen-0419b-22ljj-master-2
I0419 21:36:52.432235       1 controller.go:160] Dropping key 'jechen-0419b-22ljj-master-0' from the node workqueue
I0419 21:36:52.443103       1 controller.go:160] Dropping key 'jechen-0419b-22ljj-master-1' from the node workqueue
I0419 21:36:52.491550       1 controller.go:160] Dropping key 'jechen-0419b-22ljj-master-2' from the node workqueue
I0419 21:47:48.397719       1 controller.go:182] Assigning key: jechen-0419b-22ljj-worker-northcentralus-1 to node workqueue
I0419 21:47:48.655711       1 node_controller.go:106] Setting annotation: 'cloud.network.openshift.io/egress-ipconfig: [{"interface":"jechen-0419b-22ljj-worker-northcentralus-1-nic","ifaddr":{"ipv4":"10.0.1.0/24"},"capacity":{"ip":255}}]' on node: jechen-0419b-22ljj-worker-northcentralus-1
I0419 21:47:48.682529       1 controller.go:160] Dropping key 'jechen-0419b-22ljj-worker-northcentralus-1' from the node workqueue
I0419 21:48:59.986957       1 controller.go:182] Assigning key: jechen-0419b-22ljj-worker-northcentralus-2 to node workqueue
I0419 21:49:00.146596       1 node_controller.go:106] Setting annotation: 'cloud.network.openshift.io/egress-ipconfig: [{"interface":"jechen-0419b-22ljj-worker-northcentralus-2-nic","ifaddr":{"ipv4":"10.0.1.0/24"},"capacity":{"ip":255}}]' on node: jechen-0419b-22ljj-worker-northcentralus-2
I0419 21:49:00.176475       1 controller.go:160] Dropping key 'jechen-0419b-22ljj-worker-northcentralus-2' from the node workqueue



$ oc logs -n openshift-cloud-network-config-controller  -l app=cloud-network-config-controller -f
I0419 21:36:52.427548       1 node_controller.go:106] Setting annotation: 'cloud.network.openshift.io/egress-ipconfig: [{"interface":"jechen-0419b-22ljj-master-2-nic","ifaddr":{"ipv4":"10.0.0.0/24"},"capacity":{"ip":255}}]' on node: jechen-0419b-22ljj-master-2
I0419 21:36:52.432235       1 controller.go:160] Dropping key 'jechen-0419b-22ljj-master-0' from the node workqueue
I0419 21:36:52.443103       1 controller.go:160] Dropping key 'jechen-0419b-22ljj-master-1' from the node workqueue
I0419 21:36:52.491550       1 controller.go:160] Dropping key 'jechen-0419b-22ljj-master-2' from the node workqueue
I0419 21:47:48.397719       1 controller.go:182] Assigning key: jechen-0419b-22ljj-worker-northcentralus-1 to node workqueue
I0419 21:47:48.655711       1 node_controller.go:106] Setting annotation: 'cloud.network.openshift.io/egress-ipconfig: [{"interface":"jechen-0419b-22ljj-worker-northcentralus-1-nic","ifaddr":{"ipv4":"10.0.1.0/24"},"capacity":{"ip":255}}]' on node: jechen-0419b-22ljj-worker-northcentralus-1
I0419 21:47:48.682529       1 controller.go:160] Dropping key 'jechen-0419b-22ljj-worker-northcentralus-1' from the node workqueue
I0419 21:48:59.986957       1 controller.go:182] Assigning key: jechen-0419b-22ljj-worker-northcentralus-2 to node workqueue
I0419 21:49:00.146596       1 node_controller.go:106] Setting annotation: 'cloud.network.openshift.io/egress-ipconfig: [{"interface":"jechen-0419b-22ljj-worker-northcentralus-2-nic","ifaddr":{"ipv4":"10.0.1.0/24"},"capacity":{"ip":255}}]' on node: jechen-0419b-22ljj-worker-northcentralus-2
I0419 21:49:00.176475       1 controller.go:160] Dropping key 'jechen-0419b-22ljj-worker-northcentralus-2' from the node workqueue


$ oc get node
NAME                                         STATUS   ROLES    AGE   VERSION
jechen-0419b-22ljj-master-0                  Ready    master   65m   v1.23.3+54654d2
jechen-0419b-22ljj-master-1                  Ready    master   65m   v1.23.3+54654d2
jechen-0419b-22ljj-master-2                  Ready    master   65m   v1.23.3+54654d2
jechen-0419b-22ljj-worker-northcentralus-1   Ready    worker   50m   v1.23.3+54654d2
jechen-0419b-22ljj-worker-northcentralus-2   Ready    worker   49m   v1.23.3+54654d2


$ oc patch hostsubnet  jechen-0419b-22ljj-worker-northcentralus-1 --type=merge -p '{"egressCIDRs":["10.0.1.0/24"]}'
hostsubnet.network.openshift.io/jechen-0419b-22ljj-worker-northcentralus-1 patched

$ oc new-project test


$ oc patch netnamespace test --type=merge -p '{"egressIPs":["10.0.1.101"]}'
netnamespace.network.openshift.io/test patched

Created some test pods:

$ oc get pod -n test
NAME                READY   STATUS              RESTARTS   AGE
pod/test-rc-tqrg9   0/1     ContainerCreating   0          5s
pod/test-rc-xmcr4   0/1     ContainerCreating   0          5s
pod/test-rc-zjplg   0/1     ContainerCreating   0          6s


$ oc rsh test-rc-tqrg9
~ $ curl 10.0.99.4:9152
10.0.1.101~ $ 
~ $ 
~ $ curl 10.0.99.4:9152
10.0.1.101~ $ 
~ $ curl 10.0.99.4:9152
10.0.1.101~ $ 


EgressIP works correctly.
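
As a rough cross-check of the verification above, the annotation value can be parsed and the chosen egress IP tested against the reported range. This is a minimal sketch, assuming the annotation keeps the JSON shape shown in the output above. The function name `egress_ip_allowed` is hypothetical and not an OpenShift API.

```python
import ipaddress
import json

# Annotation value copied from the verified worker node above.
annotation = (
    '[{"interface":"jechen-0419b-22ljj-worker-northcentralus-1-nic",'
    '"ifaddr":{"ipv4":"10.0.1.0/24"},"capacity":{"ip":255}}]'
)


def egress_ip_allowed(annotation_value, egress_ip):
    """Return True if egress_ip lies in any IPv4 range listed in the annotation.

    Hypothetical helper: it only inspects the "ifaddr"/"ipv4" field of each entry.
    """
    ip = ipaddress.ip_address(egress_ip)
    for entry in json.loads(annotation_value):
        cidr = entry.get("ifaddr", {}).get("ipv4")
        if cidr and ip in ipaddress.ip_network(cidr):
            return True
    return False


# The egress IP used in the test falls inside the reported worker range.
print(egress_ip_allowed(annotation, "10.0.1.101"))
# An address from the 10.0.0.0/24 range reported for the masters does not.
print(egress_ip_allowed(annotation, "10.0.0.4"))
```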

Comment 9 errata-xmlrpc 2022-08-10 11:04:00 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: OpenShift Container Platform 4.11.0 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:5069