Bug 1810574 - NetlinkError: (17, 'File exists') when kuryr-daemon gets restarted in a wrong moment
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 4.4
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 4.5.0
Assignee: Michał Dulko
QA Contact: GenadiC
URL:
Whiteboard:
Duplicates: 1846262
Depends On: 1846225
Blocks: 1779163
Reported: 2020-03-05 14:25 UTC by Michał Dulko
Modified: 2020-08-04 18:04 UTC

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of: 1779163
Clones: 1846225
Environment:
Last Closed: 2020-08-04 18:04:16 UTC
Target Upstream Version:


Attachments
CNI logs on 4.5.0-0.nightly-2020-05-29-005153 (543.81 KB, application/gzip)
2020-06-02 09:15 UTC, rlobillo


Links
System | ID | Priority | Status | Summary | Last Updated
Github | openshift kuryr-kubernetes pull 179 | None | closed | Bug 1810574: Nested CNI: Remove interfaces on DEL requests | 2020-07-21 09:29:29 UTC
Github | openshift kuryr-kubernetes pull 244 | None | closed | Bug 1810574: CNI: Confirm pods in cache before connecting | 2020-07-21 09:29:29 UTC
Github | openshift kuryr-kubernetes pull 277 | None | closed | [release-4.5] Bug 1810574: CNI: Don't wait for missing pods on DEL | 2020-07-21 09:29:29 UTC
Red Hat Product Errata | RHBA-2020:2409 | None | None | None | 2020-08-04 18:04:19 UTC

Comment 3 Jon Uriarte 2020-04-07 13:31:14 UTC
Failed QA in OCP 4.5.0-0.nightly-2020-04-07-040639 on OSP 16 RHOS_TRUNK-16.0-RHEL-8-20200403.n.1 (with OVS).

The installer fails while waiting for the cluster to initialize, and one kuryr-cni pod (kuryr-cni-bg2hj, 57 restarts) remains in a crashloop:

$ oc get pods -n openshift-kuryr
NAME                              READY   STATUS    RESTARTS   AGE
kuryr-cni-7sbvw                   1/1     Running   0          5h11m
kuryr-cni-bg2hj                   1/1     Running   57         5h23m
kuryr-cni-nlxld                   1/1     Running   4          5h43m
kuryr-cni-sc922                   1/1     Running   2          5h43m
kuryr-cni-vfb7n                   1/1     Running   2          5h43m
kuryr-cni-vsvnc                   1/1     Running   0          5h11m
kuryr-controller-9dff6456-7kjw9   1/1     Running   1          5h43m
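
Every pod in the listing above reports Running at the moment of the snapshot, so the crashlooping one is only visible in the RESTARTS column. A minimal Python sketch for filtering on that column, assuming the default `oc get pods` column layout shown here; the function name and the threshold of 10 are illustrative and not part of any tool:

```python
def pods_with_many_restarts(listing, threshold=10):
    """Return names of pods whose RESTARTS count exceeds the threshold."""
    lines = [ln for ln in listing.splitlines() if ln.strip()]
    header = lines[0].split()
    name_col = header.index("NAME")
    restarts_col = header.index("RESTARTS")
    flagged = []
    for line in lines[1:]:
        fields = line.split()
        if int(fields[restarts_col]) > threshold:
            flagged.append(fields[name_col])
    return flagged


# Listing copied from the comment above.
LISTING = """
NAME                              READY   STATUS    RESTARTS   AGE
kuryr-cni-7sbvw                   1/1     Running   0          5h11m
kuryr-cni-bg2hj                   1/1     Running   57         5h23m
kuryr-cni-nlxld                   1/1     Running   4          5h43m
kuryr-cni-sc922                   1/1     Running   2          5h43m
kuryr-cni-vfb7n                   1/1     Running   2          5h43m
kuryr-cni-vsvnc                   1/1     Running   0          5h11m
kuryr-controller-9dff6456-7kjw9   1/1     Running   1          5h43m
"""

print(pods_with_many_restarts(LISTING))  # ['kuryr-cni-bg2hj']
```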


ERROR kuryr_kubernetes.cni.binding.nested [-] Creation of pod interface failed, most likely due to duplicated VLAN id. This will probably cause kuryr-daemon to crashloop. Trying to gat
 pyroute2.netlink.exceptions.NetlinkError: (17, 'File exists')
ERROR kuryr_kubernetes.cni.binding.nested Traceback (most recent call last):
ERROR kuryr_kubernetes.cni.binding.nested   File "/usr/lib/python3.6/site-packages/kuryr_kubernetes/cni/binding/nested.py", line 86, in connect
ERROR kuryr_kubernetes.cni.binding.nested     iface.net_ns_fd = utils.convert_netns(netns) 
ERROR kuryr_kubernetes.cni.binding.nested   File "/usr/lib/python3.6/site-packages/pyroute2/ipdb/transactional.py", line 209, in __exit__
ERROR kuryr_kubernetes.cni.binding.nested     self.commit()
ERROR kuryr_kubernetes.cni.binding.nested   File "/usr/lib/python3.6/site-packages/pyroute2/ipdb/interfaces.py", line 650, in commit
ERROR kuryr_kubernetes.cni.binding.nested     raise newif
ERROR kuryr_kubernetes.cni.binding.nested   File "/usr/lib/python3.6/site-packages/pyroute2/ipdb/interfaces.py", line 589, in commit
ERROR kuryr_kubernetes.cni.binding.nested     self.nl.link('add', **request)
ERROR kuryr_kubernetes.cni.binding.nested   File "/usr/lib/python3.6/site-packages/pyroute2/iproute/linux.py", line 1163, in link
ERROR kuryr_kubernetes.cni.binding.nested     msg_flags=msg_flags)
ERROR kuryr_kubernetes.cni.binding.nested   File "/usr/lib/python3.6/site-packages/pyroute2/netlink/nlsocket.py", line 373, in nlm_request
ERROR kuryr_kubernetes.cni.binding.nested     return tuple(self._genlm_request(*argv, **kwarg))
ERROR kuryr_kubernetes.cni.binding.nested   File "/usr/lib/python3.6/site-packages/pyroute2/netlink/nlsocket.py", line 864, in nlm_request
ERROR kuryr_kubernetes.cni.binding.nested     callback=callback):
ERROR kuryr_kubernetes.cni.binding.nested   File "/usr/lib/python3.6/site-packages/pyroute2/netlink/nlsocket.py", line 376, in get
ERROR kuryr_kubernetes.cni.binding.nested     return tuple(self._genlm_get(*argv, **kwarg))
ERROR kuryr_kubernetes.cni.binding.nested   File "/usr/lib/python3.6/site-packages/pyroute2/netlink/nlsocket.py", line 701, in get
ERROR kuryr_kubernetes.cni.binding.nested     raise msg['header']['error']
ERROR kuryr_kubernetes.cni.binding.nested pyroute2.netlink.exceptions.NetlinkError: (17, 'File exists')
ERROR kuryr_kubernetes.cni.binding.nested 
ERROR kuryr_kubernetes.cni.binding.nested [-] List of host interfaces: {1: {'address': '00:00:00:00:00:00', 'broadcast': '00:00:00:00:00:00', 'ifname': 'lo', 'mtu': 65536, 'qdisc': 'no 
queue', 'txqlen': 1000, 'operstate': 'UNKNOWN', 'linkmode': 0, 'group': 0, 'promiscuity': 0, 'num_tx_queues': 1, 'num_rx_queues': 1, 'carrier': 1, 'carrier_changes': 0, 'proto_down': 0, 'gso_max_segs': 65535, 'g
so_max_size': 65536, 'xdp': '05:00:02:00:00:00:00:00', 'carrier_up_count': 0, 'carrier_down_count': 0, 'index': 1, 'flags': 65609, 'ipdb_scope': 'system', 'ipdb_priority': 0, 'vlans': (), 'ipaddr': (('::1', 128)
, ('127.0.0.1', 8)), 'ports': (), 'family': 0, 'ifi_type': 772, 'state': 'up', 'neighbours': ('0.0.0.0',)}, 'lo': {'address': '00:00:00:00:00:00', 'broadcast': '00:00:00:00:00:00', 'ifname': 'lo', 'mtu': 65536,
'qdisc': 'noqueue', 'txqlen': 1000, 'operstate': 'UNKNOWN', 'linkmode': 0, 'group': 0, 'promiscuity': 0, 'num_tx_queues': 1, 'num_rx_queues': 1, 'carrier': 1, 'carrier_changes': 0, 'proto_down': 0, 'gso_max_segs
': 65535, 'gso_max_size': 65536, 'xdp': '05:00:02:00:00:00:00:00', 'carrier_up_count': 0, 'carrier_down_count': 0, 'index': 1, 'flags': 65609, 'ipdb_scope': 'system', 'ipdb_priority': 0, 'vlans': (), 'ipaddr': (
('::1', 128), ('127.0.0.1', 8)), 'ports': (), 'family': 0, 'ifi_type': 772, 'state': 'up', 'neighbours': ('0.0.0.0',)}, 2: {'address': 'fa:16:3e:ff:5e:0f', 'broadcast': 'ff:ff:ff:ff:ff:ff', 'ifname': 'ens3', 'mt
u': 1450, 'qdisc': 'fq_codel', 'txqlen': 1000, 'operstate': 'UP', 'linkmode': 0, 'group': 0, 'promiscuity': 0, 'num_tx_queues': 1, 'num_rx_queues': 1, 'carrier': 1, 'carrier_changes': 2, 'proto_down': 0, 'gso_ma
x_segs': 65535, 'gso_max_size': 65536, 'xdp': '05:00:02:00:00:00:00:00', 'carrier_up_count': 1, 'carrier_down_count': 1, 'index': 2, 'flags': 69699, 'ipdb_scope': 'system', 'ipdb_priority': 0, 'vlans': (), 'ipad
dr': (('fe80::a777:cb9f:a473:bc9a', 64), ('10.196.1.28', 16)), 'ports': (), 'family': 0, 'ifi_type': 1, 'state': 'up', 'neighbours': ('10.196.0.12', '10.196.3.50', '10.196.0.102', '10.196.0.11', '224.0.0.22', '1
0.196.0.1', '10.196.0.5', '224.0.0.18', '10.196.0.7', '10.196.1.206', '224.0.0.251')}, 'ens3': {'address': 'fa:16:3e:ff:5e:0f', 'broadcast': 'ff:ff:ff:ff:ff:ff', 'ifname': 'ens3', 'mtu': 1450, 'qdisc': 'fq_codel
', 'txqlen': 1000, 'operstate': 'UP', 'linkmode': 0, 'group': 0, 'promiscuity': 0, 'num_tx_queues': 1, 'num_rx_queues': 1, 'carrier': 1, 'carrier_changes': 2, 'proto_down': 0, 'gso_max_segs': 65535, 'gso_max_siz
e': 65536, 'xdp': '05:00:02:00:00:00:00:00', 'carrier_up_count': 1, 'carrier_down_count': 1, 'index': 2, 'flags': 69699, 'ipdb_scope': 'system', 'ipdb_priority': 0, 'vlans': (), 'ipaddr': (('fe80::a777:cb9f:a473:bc9a', 64), ('10.196.1.28', 16)), 'ports': (), 'family': 0, 'ifi_type': 1, 'state': 'up', 'neighbours': ('10.196.0.12', '10.196.3.50', '10.196.0.102', '10.196.0.11', '224.0.0.22', '10.196.0.1', '10.196.0.5', '2
24.0.0.18', '10.196.0.7', '10.196.1.206', '224.0.0.251')}}: pyroute2.netlink.exceptions.NetlinkError: (17, 'File exists')
ERROR kuryr_kubernetes.cni.binding.nested [-] List of pod namespace interfaces: {1: {'address': '00:00:00:00:00:00', 'broadcast': '00:00:00:00:00:00', 'ifname': 'lo', 'mtu': 65536, 'qd
isc': 'noqueue', 'txqlen': 1000, 'operstate': 'UNKNOWN', 'linkmode': 0, 'group': 0, 'promiscuity': 0, 'num_tx_queues': 1, 'num_rx_queues': 1, 'carrier': 1, 'carrier_changes': 0, 'proto_down': 0, 'gso_max_segs':
65535, 'gso_max_size': 65536, 'xdp': '05:00:02:00:00:00:00:00', 'carrier_up_count': 0, 'carrier_down_count': 0, 'index': 1, 'flags': 65609, 'ipdb_scope': 'system', 'ipdb_priority': 0, 'vlans': (), 'ipaddr': ((':
:1', 128), ('127.0.0.1', 8)), 'ports': (), 'family': 0, 'ifi_type': 772, 'state': 'up', 'neighbours': ()}, 'lo': {'address': '00:00:00:00:00:00', 'broadcast': '00:00:00:00:00:00', 'ifname': 'lo', 'mtu': 65536, '
qdisc': 'noqueue', 'txqlen': 1000, 'operstate': 'UNKNOWN', 'linkmode': 0, 'group': 0, 'promiscuity': 0, 'num_tx_queues': 1, 'num_rx_queues': 1, 'carrier': 1, 'carrier_changes': 0, 'proto_down': 0, 'gso_max_segs'
: 65535, 'gso_max_size': 65536, 'xdp': '05:00:02:00:00:00:00:00', 'carrier_up_count': 0, 'carrier_down_count': 0, 'index': 1, 'flags': 65609, 'ipdb_scope': 'system', 'ipdb_priority': 0, 'vlans': (), 'ipaddr': ((
'::1', 128), ('127.0.0.1', 8)), 'ports': (), 'family': 0, 'ifi_type': 772, 'state': 'up', 'neighbours': ()}}: pyroute2.netlink.exceptions.NetlinkError: (17, 'File exists')
 ERROR kuryr_kubernetes.cni.daemon.service [-] Error when processing addNetwork request. CNI Params: {'CNI_IFNAME': 'eth0', 'CNI_NETNS': '/proc/134124/ns/net', 'CNI_PATH': '/opt/multus/
bin:/var/lib/cni/bin', 'CNI_COMMAND': 'ADD', 'CNI_CONTAINERID': '97359d9fb8a375cb2d0dcb86baa6981366363d04f912b910a3c3b4308baf1396', 'CNI_ARGS': 'IgnoreUnknown=true;K8S_POD_NAMESPACE=openshift-monitoring;K8S_POD_
NAME=thanos-querier-6ccdccb756-znrtz;K8S_POD_INFRA_CONTAINER_ID=97359d9fb8a375cb2d0dcb86baa6981366363d04f912b910a3c3b4308baf1396'}: pyroute2.netlink.exceptions.NetlinkError: (17, 'File exists')
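
Error code 17 in the traceback above is EEXIST: the kernel rejected the link 'add' request because a conflicting interface already exists (per the log message, most likely a duplicated VLAN ID). A rough sketch of how a caller might tell this case apart from other netlink failures, assuming the exception exposes the errno as a `code` attribute, as pyroute2's NetlinkError does. `FakeNetlinkError` and `is_already_exists` are hypothetical stand-ins so that the snippet runs without pyroute2; this is not the fix that was merged for this bug.

```python
import errno
import os


class FakeNetlinkError(Exception):
    """Hypothetical stand-in that mimics the (code, message) error shape."""

    def __init__(self, code, msg=None):
        super().__init__(code, msg or os.strerror(code))
        self.code = code


def is_already_exists(exc):
    """True if the exception carries errno EEXIST (17, 'File exists')."""
    return getattr(exc, "code", None) == errno.EEXIST


try:
    # Simulates the failure raised by the link 'add' call in the traceback.
    raise FakeNetlinkError(errno.EEXIST)
except FakeNetlinkError as exc:
    if is_already_exists(exc):
        # A real handler could log the existing interfaces or clean up here.
        print("link already exists:", exc)
    else:
        raise
```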

Comment 5 rlobillo 2020-06-02 09:15:59 UTC
Created attachment 1694358
CNI logs on 4.5.0-0.nightly-2020-05-29-005153

Netlink errors observed

Comment 6 rlobillo 2020-06-02 09:19:25 UTC
Failed QA in 4.5.0-0.nightly-2020-05-29-005153 with OSP13 2020-05-19.2. 

Installation is OK, but errors are observed while running NP tests. CNI logs are attached in comment 5.

Comment 7 Luis Tomas Bolivar 2020-06-11 08:46:50 UTC
*** Bug 1846262 has been marked as a duplicate of this bug. ***

Comment 9 Wei Sun 2020-06-17 07:36:39 UTC
Today we successfully set up OCP 4.5 on OSP with Kuryr using 4.5.0-0.nightly-2020-06-16-205659. Moving the bug to VERIFIED.

Comment 11 errata-xmlrpc 2020-08-04 18:04:16 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.5 image release advisory), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:2409

