Bug 1921184 - kuryr-cni binds to wrong interface on machine with two interfaces
Summary: kuryr-cni binds to wrong interface on machine with two interfaces
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 4.6.z
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: medium
Target Milestone: ---
Target Release: 4.8.0
Assignee: Michał Dulko
QA Contact: GenadiC
URL:
Whiteboard:
Depends On:
Blocks: 1928028
 
Reported: 2021-01-27 17:14 UTC by Martin Eggen
Modified: 2021-07-27 22:37 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-07-27 22:36:45 UTC
Target Upstream Version:
Embargoed:


Attachments
MachineSet configuration (3.97 KB, text/plain)
2021-01-27 17:14 UTC, Martin Eggen


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | openshift cluster-network-operator pull 969 | 0 | None | closed | Bug 1921184: Kuryr: Let Kuryr autodetect primary CNI interface | 2021-02-15 23:02:30 UTC
Red Hat Product Errata | RHSA-2021:2438 | 0 | None | None | None | 2021-07-27 22:37:15 UTC

Description Martin Eggen 2021-01-27 17:14:49 UTC
Created attachment 1751320
MachineSet configuration

Description of problem:

Environment is RHOSP 16.1, OCP 4.6.12 with IPI deployment. 

When we added a custom MachineSet with a secondary network attached, we saw inconsistent interface naming, and kuryr-cni picked the wrong interface to bind to.

kuryr.conf always contained
link_iface = ens3

even when the interfaces were swapped, with the secondary network bound to ens3 and the primary network bound to ens4.

Version-Release number of selected component (if applicable):

OCP 4.6.12

How reproducible:

Sometimes

Steps to Reproduce:
1. Add a MachineSet with the following network configuration in the providerSpec:
$ yq e '.spec.template.spec.providerSpec.value.networks' ocp-machineset.yaml
- filter: {}
  subnets:
    - filter:
        name: shiftstack-kbw6f-nodes
        tags: openshiftClusterID=shiftstack-kbw6f
    - filter:
        name: data-subnet

We observed inconsistent interface ordering, and kuryr-cni always binds to the first interface (ens3), even when that interface is bound to our secondary network.

Actual results:
kuryr-cni would always select ens3 for its link_iface.

Expected results:

kuryr-cni should select the interface bound to the primary network used by the kubelet.
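
To illustrate the expected behaviour, here is a minimal sketch of one way to find the interface that carries the node's primary IP. It is not Kuryr's actual detection code. The function name, the sample addresses, and the choice of parsing `ip -o -4 addr show` output are assumptions made for illustration only.

```python
from typing import Optional


def find_iface_for_ip(ip_addr_output: str, node_ip: str) -> Optional[str]:
    """Return the interface whose IPv4 address equals node_ip, or None.

    Expects text in the one-line-per-address format printed by
    `ip -o -4 addr show`.
    """
    for line in ip_addr_output.splitlines():
        fields = line.split()
        # Expected layout: "<index>: <ifname> inet <addr>/<prefix> ..."
        if len(fields) < 4 or fields[2] != "inet":
            continue
        address = fields[3].split("/")[0]
        if address == node_ip:
            return fields[1]
    return None


# Hypothetical sample output; the addresses are invented for illustration.
# Here the primary (kubelet) network ended up on ens4, as in the swapped case.
SAMPLE = """\
1: lo    inet 127.0.0.1/8 scope host lo
2: ens3    inet 192.0.2.10/24 brd 192.0.2.255 scope global dynamic ens3
3: ens4    inet 10.196.0.15/16 brd 10.196.255.255 scope global dynamic ens4
"""

print(find_iface_for_ip(SAMPLE, "10.196.0.15"))
```

With the interfaces swapped as in the sample, a lookup by node IP returns ens4, whereas a hardcoded link_iface = ens3 would point at the secondary network.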

Additional info:

Changing the network configuration in the MachineSet to the following gives us consistent interface ordering: the secondary network is always bound to the secondary interface (ens4), and kuryr-cni operates correctly.

$ yq e '.spec.template.spec.providerSpec.value.networks' ocp-machineset-consistent.yaml
- filter: {}
  subnets:
    - filter:
        name: shiftstack-kbw6f-nodes
        tags: openshiftClusterID=shiftstack-kbw6f
- filter: {}
  subnets:
    - filter:
        name: data-subnet
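
The difference between the two configurations is that the first lists both subnets under a single network entry, while the second gives each subnet its own entry. The sketch below flags entries of the first kind. It encodes only what we observed in this environment, not a documented rule, and the function name is hypothetical. The configurations are written as Python literals to avoid a YAML dependency.

```python
from typing import Any, Dict, List


def entries_with_multiple_subnets(networks: List[Dict[str, Any]]) -> List[int]:
    """Return the indexes of network entries that list more than one subnet."""
    return [
        index
        for index, entry in enumerate(networks)
        if len(entry.get("subnets") or []) > 1
    ]


nodes_subnet = {
    "filter": {
        "name": "shiftstack-kbw6f-nodes",
        "tags": "openshiftClusterID=shiftstack-kbw6f",
    }
}
data_subnet = {"filter": {"name": "data-subnet"}}

# Configuration that gave inconsistent interface ordering in this report.
inconsistent = [{"filter": {}, "subnets": [nodes_subnet, data_subnet]}]

# Configuration that gave consistent interface ordering in this report.
consistent = [
    {"filter": {}, "subnets": [nodes_subnet]},
    {"filter": {}, "subnets": [data_subnet]},
]

print(entries_with_multiple_subnets(inconsistent))
print(entries_with_multiple_subnets(consistent))
```

The first call reports entry 0 and the second reports nothing.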

Comment 5 rlobillo 2021-02-23 12:31:10 UTC
Verified on OCP4.8.0-0.nightly-2021-02-21-102854 over OSP13 (2021-01-20.1) with amphora provider.

The link_iface attribute is no longer hardcoded in kuryr.conf on the kuryr-cni pod running on the worker.

$ oc get nodes -o wide | grep data
ostest-dzghr-data-0-d7rzc     Ready    worker   30m    v1.20.0+01ab7fd   172.16.40.235   <none>        Red Hat Enterprise Linux CoreOS 48.83.202102200501-0 (Ootpa)   4.18.0-240.15.1.el8_3.x86_64   cri-o://1.20.0-0.rhaos4.7.gitb422fc2.el8.54

$ ssh core@172.16.40.235
[core@ostest-dzghr-data-0-d7rzc ~]$ sudo crictl ps -a | grep kuryr-cni
eead0d2fc54a7       b564c3aa454c05132272774d626c5a99e6c22ab189b2c405ccfbbeef59d97c48                                                         29 minutes ago      Running             kuryr-cni                        0                   432c957f5714e

[core@ostest-dzghr-data-0-d7rzc ~]$ sudo crictl exec -it eead0d2fc54a7 cat /etc/kuryr/kuryr.conf | grep link_iface
[core@ostest-dzghr-data-0-d7rzc ~]$ 
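
The empty grep output above can also be checked programmatically. The following is a rough sketch that scans the text of a kuryr.conf for uncommented link_iface settings. The function name and the sample contents (including the [binding] section header) are assumptions for illustration, not copied from the cluster.

```python
from typing import List


def link_iface_lines(conf_text: str) -> List[str]:
    """Return the uncommented lines of conf_text that set link_iface."""
    hits = []
    for raw in conf_text.splitlines():
        line = raw.strip()
        # Skip blank lines and INI-style comments.
        if not line or line.startswith(("#", ";")):
            continue
        key = line.split("=", 1)[0].strip()
        if key == "link_iface":
            hits.append(line)
    return hits


# Hypothetical file contents, for illustration only.
before_fix = "[binding]\nlink_iface = ens3\n"
after_fix = "[binding]\n# link_iface = ens3\n"

print(link_iface_lines(before_fix))
print(link_iface_lines(after_fix))
```

An empty list corresponds to the empty grep result shown above.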


Furthermore, kuryr-tempest tests, NP tests and conformance tests
passed for this build. Please refer to the attachment on 
https://bugzilla.redhat.com/show_bug.cgi?id=1927244#c6

Comment 8 errata-xmlrpc 2021-07-27 22:36:45 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:2438

