Bug 1929170 - kuryr-cni pods in crashloop after updating OCP due to RuntimeError caused by attempting to delete eth0 host interface
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 3.11.0
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: urgent
Target Milestone: ---
Target Release: 3.11.z
Assignee: Michał Dulko
QA Contact: Itzik Brown
URL:
Whiteboard:
Depends On:
Blocks: 1873346
Reported: 2021-02-16 10:49 UTC by Itzik Brown
Modified: 2021-03-03 12:28 UTC

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-03-03 12:28:00 UTC
Target Upstream Version:
Embargoed:


Links
System ID Private Priority Status Summary Last Updated
Github openshift kuryr-kubernetes pull 456 0 None open Bug 1929170: CNI: Protect from '' being passed as CNI_NETNS 2021-02-16 10:54:08 UTC
Red Hat Product Errata RHSA-2021:0637 0 None None None 2021-03-03 12:28:12 UTC

Description Itzik Brown 2021-02-16 10:49:40 UTC
Description of problem:
When updating from v3.11.346 to v3.11.386 I got the following:

(shiftstack) [stack@undercloud-0 ~]$ oc get pods
NAME                       READY     STATUS    RESTARTS   AGE
demo-68dbc445d-8dt5m       1/1       Running   0          7h
demo-68dbc445d-cw8p5       1/1       Running   0          7h
demo-68dbc445d-nrfxt       0/1       Error     0          7h
docker-registry-1-cm2wk    1/1       Running   0          8h
registry-console-1-h2lv9   0/1       Error     0          8h
router-1-8mkt2             1/1       Running   0          8h
router-1-9mtbp             1/1       Running   0          8h
router-1-bkcjf             1/1       Running   0          8h

and 
(shiftstack) [stack@undercloud-0 ~]$ oc get pods -n kuryr
NAME                                READY     STATUS             RESTARTS   AGE
kuryr-cni-ds-4g78t                  1/2       CrashLoopBackOff   21         1h
kuryr-cni-ds-565df                  2/2       Running            0          8h
kuryr-cni-ds-7gm75                  1/2       CrashLoopBackOff   19         1h
kuryr-cni-ds-j4nrl                  2/2       Running            0          8h
kuryr-cni-ds-jqt4j                  1/2       CrashLoopBackOff   23         1h
kuryr-cni-ds-l99xw                  2/2       Running            0          8h
kuryr-cni-ds-n5n8h                  2/2       Running            0          8h
kuryr-cni-ds-q9fr7                  2/2       Running            0          8h
kuryr-controller-74c988b946-tldhv   0/1       Running            21         1h

The demo pods were created before the update.
Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:

Comment 1 Michał Dulko 2021-02-16 10:53:16 UTC
Here's the kuryr-cni log indicating the problem:

2021-02-16 09:46:13.310 176 INFO werkzeug [-] 127.0.0.1 - - [16/Feb/2021 09:46:13] "POST /delNetwork HTTP/1.1" 500 -
2021-02-16 09:46:19.255 181 WARNING kuryr_kubernetes.cni.binding.base [-] Found hanging interface eth0 inside  netns. Most likely it is a leftover from a kuryr-daemon restart. Trying to delete it.
2021-02-16 09:46:19.306 181 ERROR kuryr_kubernetes.cni.daemon.service [-] Error when processing delNetwork request. CNI Params: {'CNI_IFNAME': u'eth0', 'CNI_NETNS': u'', 'CNI_PATH': u'/opt/cni/bin', 'CNI_ARGS': u'IgnoreUnknown=1;K8S_POD_NAMESPACE=openshift-console;K8S_POD_NAME=console-65bdb545-j57hr;K8S_POD_INFRA_CONTAINER_ID=9c9a87b6f17a7e24df4b0089febb8e7da76656e7e0874c61ae3a310c12fc6269', 'CNI_COMMAND': u'DEL', 'CNI_CONTAINERID': u'9c9a87b6f17a7e24df4b0089febb8e7da76656e7e0874c61ae3a310c12fc6269'}.: RuntimeError
2021-02-16 09:46:19.306 181 ERROR kuryr_kubernetes.cni.daemon.service Traceback (most recent call last):
2021-02-16 09:46:19.306 181 ERROR kuryr_kubernetes.cni.daemon.service   File "/usr/lib/python2.7/site-packages/kuryr_kubernetes/cni/daemon/service.py", line 103, in delete
2021-02-16 09:46:19.306 181 ERROR kuryr_kubernetes.cni.daemon.service     self.plugin.delete(params)
2021-02-16 09:46:19.306 181 ERROR kuryr_kubernetes.cni.daemon.service   File "/usr/lib/python2.7/site-packages/kuryr_kubernetes/cni/plugins/k8s_cni_registry.py", line 127, in delete
2021-02-16 09:46:19.306 181 ERROR kuryr_kubernetes.cni.daemon.service     self._do_work(params, b_base.disconnect, 5)
2021-02-16 09:46:19.306 181 ERROR kuryr_kubernetes.cni.daemon.service   File "/usr/lib/python2.7/site-packages/kuryr_kubernetes/cni/plugins/k8s_cni_registry.py", line 179, in _do_work
2021-02-16 09:46:19.306 181 ERROR kuryr_kubernetes.cni.daemon.service     container_id=params.CNI_CONTAINERID)
2021-02-16 09:46:19.306 181 ERROR kuryr_kubernetes.cni.daemon.service   File "/usr/lib/python2.7/site-packages/kuryr_kubernetes/cni/binding/base.py", line 128, in disconnect
2021-02-16 09:46:19.306 181 ERROR kuryr_kubernetes.cni.daemon.service     driver.disconnect(vif, ifname, netns, container_id)
2021-02-16 09:46:19.306 181 ERROR kuryr_kubernetes.cni.daemon.service   File "/usr/lib/python2.7/site-packages/kuryr_kubernetes/cni/binding/nested.py", line 109, in disconnect
2021-02-16 09:46:19.306 181 ERROR kuryr_kubernetes.cni.daemon.service     self._remove_ifaces(c_ipdb, (vif.vif_name, ifname), netns)
2021-02-16 09:46:19.306 181 ERROR kuryr_kubernetes.cni.daemon.service   File "/usr/lib/python2.7/site-packages/kuryr_kubernetes/cni/binding/base.py", line 49, in _remove_ifaces
2021-02-16 09:46:19.306 181 ERROR kuryr_kubernetes.cni.daemon.service     iface.remove()
2021-02-16 09:46:19.306 181 ERROR kuryr_kubernetes.cni.daemon.service   File "/usr/lib/python2.7/site-packages/pyroute2/ipdb/transactional.py", line 209, in __exit__
2021-02-16 09:46:19.306 181 ERROR kuryr_kubernetes.cni.daemon.service     self.commit()
2021-02-16 09:46:19.306 181 ERROR kuryr_kubernetes.cni.daemon.service   File "/usr/lib/python2.7/site-packages/pyroute2/ipdb/interfaces.py", line 1078, in commit
2021-02-16 09:46:19.306 181 ERROR kuryr_kubernetes.cni.daemon.service     raise error
2021-02-16 09:46:19.306 181 ERROR kuryr_kubernetes.cni.daemon.service RuntimeError
2021-02-16 09:46:19.306 181 ERROR kuryr_kubernetes.cni.daemon.service
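
In the log above, the DEL request arrives with CNI_NETNS set to an empty string, and the daemon then tries to remove eth0 and fails with RuntimeError. The linked pull request title ("Protect from '' being passed as CNI_NETNS") suggests the fix guards against that value. The following is only a minimal sketch of such a guard, not the actual patch: CNIParams, should_skip_disconnect and handle_delete are hypothetical names made up for illustration and are not part of kuryr-kubernetes.

```python
from collections import namedtuple

# Hypothetical stand-in for the CNI parameters seen in the log; the real
# kuryr-kubernetes parameter object may look different.
CNIParams = namedtuple(
    "CNIParams",
    ["CNI_COMMAND", "CNI_IFNAME", "CNI_NETNS", "CNI_CONTAINERID"],
)


def should_skip_disconnect(params):
    """Return True when a DEL request carries no usable network namespace.

    Assumption: an empty or missing CNI_NETNS means there is no container
    namespace left to clean up, so touching interfaces could affect the
    host's own eth0, as the bug title describes.
    """
    netns = getattr(params, "CNI_NETNS", None)
    return params.CNI_COMMAND == "DEL" and not netns


def handle_delete(params, disconnect):
    """Call disconnect(params) unless the request has no netns."""
    if should_skip_disconnect(params):
        return "skipped"
    disconnect(params)
    return "disconnected"


if __name__ == "__main__":
    # Same shape as the failing request: DEL for eth0 with CNI_NETNS == ''.
    bad = CNIParams("DEL", "eth0", "", "example-container-id")
    print(handle_delete(bad, lambda p: None))
```

With a guard of this kind, a request shaped like the failing one is acknowledged without any interface removal, while requests that carry a real namespace path still reach the disconnect step.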

Comment 5 Itzik Brown 2021-02-22 20:22:27 UTC
When updating from v3.11.346 to v3.11.394, all kuryr pods seem to be OK:

(shiftstack) [stack@undercloud-0 ~]$ oc get pods -n kuryr
NAME                                READY     STATUS    RESTARTS   AGE
kuryr-cni-ds-8mrqb                  2/2       Running   0          1h
kuryr-cni-ds-8sg4j                  2/2       Running   0          1h
kuryr-cni-ds-m7hgw                  2/2       Running   0          1h
kuryr-cni-ds-pkkd7                  2/2       Running   0          1h
kuryr-cni-ds-r9ncs                  2/2       Running   0          1h
kuryr-cni-ds-vrtcn                  2/2       Running   0          1h
kuryr-cni-ds-w978z                  2/2       Running   0          1h
kuryr-cni-ds-z5dk6                  2/2       Running   0          1h
kuryr-controller-6bf6f8958f-8lbb8   1/1       Running   0          1h
(shiftstack) [stack@undercloud-0 ~]$ oc version
oc v3.11.388
kubernetes v1.11.0+d4cacc0
features: Basic-Auth GSSAPI Kerberos SPNEGO

Server https://console.openshift.example.com:8443
openshift v3.11.394
kubernetes v1.11.0+d4cacc0

Comment 7 errata-xmlrpc 2021-03-03 12:28:00 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: OpenShift Container Platform 3.11.394 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:0637

