Created attachment 1751320
MachineSet configuration

Description of problem:
Environment is RHOSP 16.1, OCP 4.6.12 with IPI deployment. When adding a custom MachineSet with a secondary network attached, we see inconsistent interface naming, and kuryr-cni picks the wrong interface to bind to. kuryr.conf always contains link_iface = ens3, even when the interfaces are swapped so that the secondary network is bound to ens3 and the primary network is bound to ens4.

Version-Release number of selected component (if applicable):
OCP 4.6.12

How reproducible:
Sometimes

Steps to Reproduce:
1. Add a MachineSet with the following network config in the providerSpec:

$ yq e '.spec.template.spec.providerSpec.value.networks' ocp-machineset.yaml
- filter: {}
  subnets:
    - filter:
        name: shiftstack-kbw6f-nodes
        tags: openshiftClusterID=shiftstack-kbw6f
    - filter:
        name: data-subnet

We observed inconsistent interface ordering, and kuryr-cni always binds to the first interface (ens3), even when that interface is bound to our secondary network.

Actual results:
kuryr-cni always selects ens3 for its link_iface.

Expected results:
kuryr-cni should select the interface bound to the primary network used by the kubelet.

Additional info:
Changing the network configuration in the MachineSet to the following value gives us consistent interface ordering, with the secondary network always bound to the secondary interface ens4, and kuryr-cni then operates correctly.

$ yq e '.spec.template.spec.providerSpec.value.networks' ocp-machineset-consistent.yaml
- filter: {}
  subnets:
    - filter:
        name: shiftstack-kbw6f-nodes
        tags: openshiftClusterID=shiftstack-kbw6f
- filter: {}
  subnets:
    - filter:
        name: data-subnet
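To illustrate the expected behaviour (pick the interface by the address the kubelet uses, not by a fixed name), here is a minimal Python sketch. It is not how kuryr-cni is implemented. The function name iface_for_ip, the sample addresses, and the idea of parsing `ip -o -4 addr show` output are all assumptions made for illustration only.

```python
import ipaddress
from typing import Optional


def iface_for_ip(ip_addr_output: str, node_ip: str) -> Optional[str]:
    """Return the name of the interface that carries node_ip, or None.

    ip_addr_output is expected to look like the one-line-per-address
    output of `ip -o -4 addr show`, e.g.:
        2: ens3    inet 192.0.2.17/24 brd 192.0.2.255 scope global ens3
    """
    target = ipaddress.ip_address(node_ip)
    for line in ip_addr_output.splitlines():
        fields = line.split()
        # fields: [index, ifname, family, address/prefix, ...]
        if len(fields) < 4 or fields[2] != "inet":
            continue
        if ipaddress.ip_interface(fields[3]).ip == target:
            return fields[1]
    return None


# Made-up sample for the "swapped" case from the report: the secondary
# network landed on ens3 and the primary (node) network on ens4.
SAMPLE = """\
1: lo    inet 127.0.0.1/8 scope host lo
2: ens3    inet 192.0.2.17/24 brd 192.0.2.255 scope global ens3
3: ens4    inet 198.51.100.10/24 brd 198.51.100.255 scope global ens4
"""

if __name__ == "__main__":
    # The node IP here is hypothetical; on a real cluster it would be the
    # INTERNAL-IP shown by `oc get nodes -o wide`.
    print(iface_for_ip(SAMPLE, "198.51.100.10"))
```

With a lookup like this, the result follows the node address regardless of whether the primary network ends up on ens3 or ens4.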
link_iface seems to be hard-coded here: https://github.com/openshift/cluster-network-operator/blob/master/bindata/network/kuryr/003-config.yaml#L16
Verified on OCP4.8.0-0.nightly-2021-02-21-102854 over OSP13 (2021-01-20.1) with amphora provider.

The link_iface attribute is no longer hardcoded in kuryr.conf on the kuryr-cni pod running on the worker.

$ oc get nodes -o wide | grep data
ostest-dzghr-data-0-d7rzc   Ready   worker   30m   v1.20.0+01ab7fd   172.16.40.235   <none>   Red Hat Enterprise Linux CoreOS 48.83.202102200501-0 (Ootpa)   4.18.0-240.15.1.el8_3.x86_64   cri-o://1.20.0-0.rhaos4.7.gitb422fc2.el8.54

$ ssh core.40.235
[core@ostest-dzghr-data-0-d7rzc ~]$ sudo crictl ps -a | grep kuryr-cni
eead0d2fc54a7   b564c3aa454c05132272774d626c5a99e6c22ab189b2c405ccfbbeef59d97c48   29 minutes ago   Running   kuryr-cni   0   432c957f5714e
[core@ostest-dzghr-data-0-d7rzc ~]$ sudo crictl exec -it eead0d2fc54a7 cat /etc/kuryr/kuryr.conf | grep link_iface
[core@ostest-dzghr-data-0-d7rzc ~]$

Furthermore, kuryr-tempest tests, NP tests and conformance tests passed for this build. Please refer to the attachment on https://bugzilla.redhat.com/show_bug.cgi?id=1927244#c6
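The grep check on kuryr.conf could also be scripted. The following is a minimal Python sketch, assuming kuryr.conf is INI-style text that configparser can read. The helper name find_link_iface and the two sample configs are hypothetical, and the [binding] section name in the samples is an assumption, which is why the helper scans every section instead of relying on it.

```python
import configparser
from typing import Dict


def find_link_iface(conf_text: str) -> Dict[str, str]:
    """Return {section: value} for every section that sets link_iface."""
    # interpolation=None keeps any literal '%' in values from raising errors;
    # strict=False tolerates repeated sections or options.
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read_string(conf_text)
    found = {}
    for section in parser.sections():
        if parser.has_option(section, "link_iface"):
            found[section] = parser.get(section, "link_iface")
    return found


# Hypothetical config resembling the behaviour reported on OCP 4.6.12.
OLD_CONF = """
[DEFAULT]
debug = false

[binding]
link_iface = ens3
"""

# Hypothetical config resembling the verified build, with no link_iface set.
NEW_CONF = """
[DEFAULT]
debug = false

[binding]
"""

if __name__ == "__main__":
    print(find_link_iface(OLD_CONF))
    print(find_link_iface(NEW_CONF))
```

An empty result corresponds to the empty grep output shown in the verification above.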
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2021:2438