Bug 1786298
Summary: | NetlinkError: (17, 'File exists') when kuryr-daemon gets restarted in a wrong moment | ||||||
---|---|---|---|---|---|---|---|
Product: | OpenShift Container Platform | Reporter: | Luis Tomas Bolivar <ltomasbo> | ||||
Component: | Networking | Assignee: | Michał Dulko <mdulko> | ||||
Networking sub component: | kuryr | QA Contact: | GenadiC <gcheresh> | ||||
Status: | CLOSED CURRENTRELEASE | Docs Contact: | |||||
Severity: | medium | ||||||
Priority: | medium | CC: | chaoyang, gcheresh, itbrown, juriarte, mdulko, rlobillo, scuppett, vlaad | ||||
Version: | 4.2.0 | Keywords: | Reopened | ||||
Target Milestone: | --- | ||||||
Target Release: | 4.3.z | ||||||
Hardware: | All | ||||||
OS: | Linux | ||||||
Whiteboard: | |||||||
Fixed In Version: | Doc Type: | No Doc Update | |||||
Doc Text: | Story Points: | --- | |||||
Clone Of: | 1779163 | Environment: | |||||
Last Closed: | 2021-01-20 14:50:26 UTC | Type: | --- | ||||
Regression: | --- | Mount Type: | --- | ||||
Documentation: | --- | CRM: | |||||
Verified Versions: | Category: | --- | |||||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||
Cloudforms Team: | --- | Target Upstream Version: | |||||
Embargoed: | |||||||
Bug Depends On: | 1779163 | ||||||
Bug Blocks: | |||||||
Attachments: |
|
Description
Luis Tomas Bolivar
2019-12-24 09:45:23 UTC
*** Bug 1786218 has been marked as a duplicate of this bug. *** Hi, Please add information about how to verify. Taking a look at the kuryr cni pod logs, while for instance running the OCP QE tests. And then check that those errors are now not there even if the kuryr-cni pods gets restarted Hi I reproduced this issue on 4.3.0-0.nightly-2020-01-06-185654 2020-01-08 09:03:24.660 37 ERROR kuryr_kubernetes.cni.daemon.service [-] Error when processing addNetwork request. CNI Params: {'CNI_IFNAME': 'eth0', 'CNI_NETNS': '/proc/1909009/ns/net', 'CNI_PATH': '/opt/multus/bin:/var/lib/cni/bin', 'CNI_COMMAND': 'ADD', 'CNI_CONTAINERID': '4de18b6bcd163bea42cae9f619052080aad749474228630827cd3edaa2bf532b', 'CNI_ARGS': 'IgnoreUnknown=true;K8S_POD_NAMESPACE=openshift-kuryr;K8S_POD_NAME=iscsi-target;K8S_POD_INFRA_CONTAINER_ID=4de18b6bcd163bea42cae9f619052080aad749474228630827cd3edaa2bf532b'}: pyroute2.netlink.exceptions.NetlinkError: (17, 'File exists') 2020-01-08 09:03:24.660 37 ERROR kuryr_kubernetes.cni.daemon.service Traceback (most recent call last): 2020-01-08 09:03:24.660 37 ERROR kuryr_kubernetes.cni.daemon.service File "/usr/lib/python3.6/site-packages/kuryr_kubernetes/cni/daemon/service.py", line 81, in add 2020-01-08 09:03:24.660 37 ERROR kuryr_kubernetes.cni.daemon.service vif = self.plugin.add(params) 2020-01-08 09:03:24.660 37 ERROR kuryr_kubernetes.cni.daemon.service File "/usr/lib/python3.6/site-packages/kuryr_kubernetes/cni/plugins/k8s_cni_registry.py", line 51, in add 2020-01-08 09:03:24.660 37 ERROR kuryr_kubernetes.cni.daemon.service vifs = self._do_work(params, b_base.connect) 2020-01-08 09:03:24.660 37 ERROR kuryr_kubernetes.cni.daemon.service File "/usr/lib/python3.6/site-packages/kuryr_kubernetes/cni/plugins/k8s_cni_registry.py", line 155, in _do_work 2020-01-08 09:03:24.660 37 ERROR kuryr_kubernetes.cni.daemon.service container_id=params.CNI_CONTAINERID) 2020-01-08 09:03:24.660 37 ERROR kuryr_kubernetes.cni.daemon.service File "/usr/lib/python3.6/site-packages/kuryr_kubernetes/cni/binding/base.py", line 143, in connect 2020-01-08 09:03:24.660 37 ERROR kuryr_kubernetes.cni.daemon.service driver.connect(vif, ifname, netns, container_id) 2020-01-08 09:03:24.660 37 ERROR kuryr_kubernetes.cni.daemon.service File "/usr/lib/python3.6/site-packages/kuryr_kubernetes/cni/binding/nested.py", line 70, in connect 2020-01-08 09:03:24.660 37 ERROR kuryr_kubernetes.cni.daemon.service iface.net_ns_fd = utils.convert_netns(netns) 2020-01-08 09:03:24.660 37 ERROR kuryr_kubernetes.cni.daemon.service File "/usr/lib/python3.6/site-packages/pyroute2/ipdb/transactional.py", line 209, in __exit__ 2020-01-08 09:03:24.660 37 ERROR kuryr_kubernetes.cni.daemon.service self.commit() 2020-01-08 09:03:24.660 37 ERROR kuryr_kubernetes.cni.daemon.service File "/usr/lib/python3.6/site-packages/pyroute2/ipdb/interfaces.py", line 649, in commit 2020-01-08 09:03:24.660 37 ERROR kuryr_kubernetes.cni.daemon.service raise newif 2020-01-08 09:03:24.660 37 ERROR kuryr_kubernetes.cni.daemon.service File "/usr/lib/python3.6/site-packages/pyroute2/ipdb/interfaces.py", line 588, in commit 2020-01-08 09:03:24.660 37 ERROR kuryr_kubernetes.cni.daemon.service self.nl.link('add', **request) 2020-01-08 09:03:24.660 37 ERROR kuryr_kubernetes.cni.daemon.service File "/usr/lib/python3.6/site-packages/pyroute2/iproute/linux.py", line 1115, in link 2020-01-08 09:03:24.660 37 ERROR kuryr_kubernetes.cni.daemon.service msg_flags=msg_flags) 2020-01-08 09:03:24.660 37 ERROR kuryr_kubernetes.cni.daemon.service File "/usr/lib/python3.6/site-packages/pyroute2/netlink/nlsocket.py", line 352, in nlm_request 2020-01-08 09:03:24.660 37 ERROR kuryr_kubernetes.cni.daemon.service return tuple(self._genlm_request(*argv, **kwarg)) 2020-01-08 09:03:24.660 37 ERROR kuryr_kubernetes.cni.daemon.service File "/usr/lib/python3.6/site-packages/pyroute2/netlink/nlsocket.py", line 836, in nlm_request 2020-01-08 09:03:24.660 37 ERROR kuryr_kubernetes.cni.daemon.service callback=callback): 2020-01-08 09:03:24.660 37 ERROR kuryr_kubernetes.cni.daemon.service File "/usr/lib/python3.6/site-packages/pyroute2/netlink/nlsocket.py", line 355, in get 2020-01-08 09:03:24.660 37 ERROR kuryr_kubernetes.cni.daemon.service return tuple(self._genlm_get(*argv, **kwarg)) 2020-01-08 09:03:24.660 37 ERROR kuryr_kubernetes.cni.daemon.service File "/usr/lib/python3.6/site-packages/pyroute2/netlink/nlsocket.py", line 679, in get 2020-01-08 09:03:24.660 37 ERROR kuryr_kubernetes.cni.daemon.service raise msg['header']['error'] 2020-01-08 09:03:24.660 37 ERROR kuryr_kubernetes.cni.daemon.service pyroute2.netlink.exceptions.NetlinkError: (17, 'File exists') 2020-01-08 09:03:24.660 37 ERROR kuryr_kubernetes.cni.daemon.service 2020-01-08 09:03:24.667 37 INFO werkzeug [-] 127.0.0.1 - - [08/Jan/2020 09:03:24] "POST /addNetwork HTTP/1.1" 500 - 2020-01-08 09:03:24.696 52 INFO os_vif [-] Successfully unplugged vif VIFVlanNested(active=True,address=fa:16:3e:54:cf:9d,has_traffic_filtering=False,id=03464583-05be-4cfd-bbef-ecc485c3658c,network=Network(d049a1a6-bb45-411d-b5e1-124a17dd301b),plugin='noop',port_profile=<?>,preserve_on_delete=False,vif_name='tap03464583-05',vlan_id=3386) 2020-01-08 09:03:24.703 52 INFO werkzeug [-] 127.0.0.1 - - [08/Jan/2020 09:03:24] "POST /delNetwork HTTP/1.1" 204 - 2020-01-08 09:03:24.766 54 INFO os_vif [-] Successfully unplugged vif VIFVlanNested(active=True,address=fa:16:3e:54:cf:9d,has_traffic_filtering=False,id=03464583-05be-4cfd-bbef-ecc485c3658c,network=Network(d049a1a6-bb45-411d-b5e1-124a17dd301b),plugin='noop',port_profile=<?>,preserve_on_delete=False,vif_name='tap03464583-05',vlan_id=3386) 2020-01-08 09:03:24.773 54 INFO werkzeug [-] 127.0.0.1 - - [08/Jan/2020 09:03:24] "POST /delNetwork HTTP/1.1" 204 - 2020-01-08 09:03:27.887 31 INFO kuryr_kubernetes.cni.health [-] CNI driver readiness verified. 2020-01-08 09:03:27.887 31 INFO werkzeug [-] 10.196.0.14 - - [08/Jan/2020 09:03:27] "GET /ready HTTP/1.1" 200 - 2020-01-08 09:03:32.648 31 INFO werkzeug [-] 10.196.0.14 - - [08/Jan/2020 09:03:32] "GET /alive HTTP/1.1" 200 - 2020-01-08 09:03:37.880 31 INFO kuryr_kubernetes.cni.health [-] CNI driver readiness verified. 2020-01-08 09:03:37.880 31 INFO werkzeug [-] 10.196.0.14 - - [08/Jan/2020 09:03:37] "GET /ready HTTP/1.1" 200 - 2020-01-08 09:03:42.649 31 INFO werkzeug [-] 10.196.0.14 - - [08/Jan/2020 09:03:42] "GET /alive HTTP/1.1" 200 - 2020-01-08 09:03:43.604 66 INFO os_vif [-] Successfully plugged vif VIFVlanNested(active=True,address=fa:16:3e:54:cf:9d,has_traffic_filtering=False,id=03464583-05be-4cfd-bbef-ecc485c3658c,network=Network(d049a1a6-bb45-411d-b5e1-124a17dd301b),plugin='noop',port_profile=<?>,preserve_on_delete=False,vif_name='tap03464583-05',vlan_id=3386) 2020-01-08 09:03:47.877 31 INFO kuryr_kubernetes.cni.health [-] CNI driver readiness verified. 2020-01-08 09:03:47.878 31 INFO werkzeug [-] 10.196.0.14 - - [08/Jan/2020 09:03:47] "GET /ready HTTP/1.1" 200 - 2020-01-08 09:03:52.649 31 INFO werkzeug [-] 10.196.0.14 - - [08/Jan/2020 09:03:52] "GET /alive HTTP/1.1" 200 - (In reply to Chao Yang from comment #5) > Hi > > I reproduced this issue on > > 4.3.0-0.nightly-2020-01-06-185654 > > 2020-01-08 09:03:24.660 37 ERROR kuryr_kubernetes.cni.daemon.service [-] > Error when processing addNetwork request. CNI Params: {'CNI_IFNAME': 'eth0', > 'CNI_NETNS': '/proc/1909009/ns/net', 'CNI_PATH': > '/opt/multus/bin:/var/lib/cni/bin', 'CNI_COMMAND': 'ADD', 'CNI_CONTAINERID': > '4de18b6bcd163bea42cae9f619052080aad749474228630827cd3edaa2bf532b', > 'CNI_ARGS': > 'IgnoreUnknown=true;K8S_POD_NAMESPACE=openshift-kuryr;K8S_POD_NAME=iscsi- > target; > K8S_POD_INFRA_CONTAINER_ID=4de18b6bcd163bea42cae9f619052080aad749474228630827 > cd3edaa2bf532b'}: pyroute2.netlink.exceptions.NetlinkError: (17, 'File > exists') > 2020-01-08 09:03:24.660 37 ERROR kuryr_kubernetes.cni.daemon.service > Traceback (most recent call last): > 2020-01-08 09:03:24.660 37 ERROR kuryr_kubernetes.cni.daemon.service File > "/usr/lib/python3.6/site-packages/kuryr_kubernetes/cni/daemon/service.py", > line 81, in add > 2020-01-08 09:03:24.660 37 ERROR kuryr_kubernetes.cni.daemon.service vif > = self.plugin.add(params) > 2020-01-08 09:03:24.660 37 ERROR kuryr_kubernetes.cni.daemon.service File > "/usr/lib/python3.6/site-packages/kuryr_kubernetes/cni/plugins/ > k8s_cni_registry.py", line 51, in add > 2020-01-08 09:03:24.660 37 ERROR kuryr_kubernetes.cni.daemon.service > vifs = self._do_work(params, b_base.connect) > 2020-01-08 09:03:24.660 37 ERROR kuryr_kubernetes.cni.daemon.service File > "/usr/lib/python3.6/site-packages/kuryr_kubernetes/cni/plugins/ > k8s_cni_registry.py", line 155, in _do_work > 2020-01-08 09:03:24.660 37 ERROR kuryr_kubernetes.cni.daemon.service > container_id=params.CNI_CONTAINERID) > 2020-01-08 09:03:24.660 37 ERROR kuryr_kubernetes.cni.daemon.service File > "/usr/lib/python3.6/site-packages/kuryr_kubernetes/cni/binding/base.py", > line 143, in connect > 2020-01-08 09:03:24.660 37 ERROR kuryr_kubernetes.cni.daemon.service > driver.connect(vif, ifname, netns, container_id) > 2020-01-08 09:03:24.660 37 ERROR kuryr_kubernetes.cni.daemon.service File > "/usr/lib/python3.6/site-packages/kuryr_kubernetes/cni/binding/nested.py", > line 70, in connect > 2020-01-08 09:03:24.660 37 ERROR kuryr_kubernetes.cni.daemon.service > iface.net_ns_fd = utils.convert_netns(netns) > 2020-01-08 09:03:24.660 37 ERROR kuryr_kubernetes.cni.daemon.service File > "/usr/lib/python3.6/site-packages/pyroute2/ipdb/transactional.py", line 209, > in __exit__ > 2020-01-08 09:03:24.660 37 ERROR kuryr_kubernetes.cni.daemon.service > self.commit() > 2020-01-08 09:03:24.660 37 ERROR kuryr_kubernetes.cni.daemon.service File > "/usr/lib/python3.6/site-packages/pyroute2/ipdb/interfaces.py", line 649, in > commit > 2020-01-08 09:03:24.660 37 ERROR kuryr_kubernetes.cni.daemon.service > raise newif > 2020-01-08 09:03:24.660 37 ERROR kuryr_kubernetes.cni.daemon.service File > "/usr/lib/python3.6/site-packages/pyroute2/ipdb/interfaces.py", line 588, in > commit > 2020-01-08 09:03:24.660 37 ERROR kuryr_kubernetes.cni.daemon.service > self.nl.link('add', **request) > 2020-01-08 09:03:24.660 37 ERROR kuryr_kubernetes.cni.daemon.service File > "/usr/lib/python3.6/site-packages/pyroute2/iproute/linux.py", line 1115, in > link > 2020-01-08 09:03:24.660 37 ERROR kuryr_kubernetes.cni.daemon.service > msg_flags=msg_flags) > 2020-01-08 09:03:24.660 37 ERROR kuryr_kubernetes.cni.daemon.service File > "/usr/lib/python3.6/site-packages/pyroute2/netlink/nlsocket.py", line 352, > in nlm_request > 2020-01-08 09:03:24.660 37 ERROR kuryr_kubernetes.cni.daemon.service > return tuple(self._genlm_request(*argv, **kwarg)) > 2020-01-08 09:03:24.660 37 ERROR kuryr_kubernetes.cni.daemon.service File > "/usr/lib/python3.6/site-packages/pyroute2/netlink/nlsocket.py", line 836, > in nlm_request > 2020-01-08 09:03:24.660 37 ERROR kuryr_kubernetes.cni.daemon.service > callback=callback): > 2020-01-08 09:03:24.660 37 ERROR kuryr_kubernetes.cni.daemon.service File > "/usr/lib/python3.6/site-packages/pyroute2/netlink/nlsocket.py", line 355, > in get > 2020-01-08 09:03:24.660 37 ERROR kuryr_kubernetes.cni.daemon.service > return tuple(self._genlm_get(*argv, **kwarg)) > 2020-01-08 09:03:24.660 37 ERROR kuryr_kubernetes.cni.daemon.service File > "/usr/lib/python3.6/site-packages/pyroute2/netlink/nlsocket.py", line 679, > in get > 2020-01-08 09:03:24.660 37 ERROR kuryr_kubernetes.cni.daemon.service > raise msg['header']['error'] > 2020-01-08 09:03:24.660 37 ERROR kuryr_kubernetes.cni.daemon.service > pyroute2.netlink.exceptions.NetlinkError: (17, 'File exists') > 2020-01-08 09:03:24.660 37 ERROR kuryr_kubernetes.cni.daemon.service > 2020-01-08 09:03:24.667 37 INFO werkzeug [-] 127.0.0.1 - - [08/Jan/2020 > 09:03:24] "POST /addNetwork HTTP/1.1" 500 - > 2020-01-08 09:03:24.696 52 INFO os_vif [-] Successfully unplugged vif > VIFVlanNested(active=True,address=fa:16:3e:54:cf:9d, > has_traffic_filtering=False,id=03464583-05be-4cfd-bbef-ecc485c3658c, > network=Network(d049a1a6-bb45-411d-b5e1-124a17dd301b),plugin='noop', > port_profile=<?>,preserve_on_delete=False,vif_name='tap03464583-05', > vlan_id=3386) > 2020-01-08 09:03:24.703 52 INFO werkzeug [-] 127.0.0.1 - - [08/Jan/2020 > 09:03:24] "POST /delNetwork HTTP/1.1" 204 - > 2020-01-08 09:03:24.766 54 INFO os_vif [-] Successfully unplugged vif > VIFVlanNested(active=True,address=fa:16:3e:54:cf:9d, > has_traffic_filtering=False,id=03464583-05be-4cfd-bbef-ecc485c3658c, > network=Network(d049a1a6-bb45-411d-b5e1-124a17dd301b),plugin='noop', > port_profile=<?>,preserve_on_delete=False,vif_name='tap03464583-05', > vlan_id=3386) > 2020-01-08 09:03:24.773 54 INFO werkzeug [-] 127.0.0.1 - - [08/Jan/2020 > 09:03:24] "POST /delNetwork HTTP/1.1" 204 - > 2020-01-08 09:03:27.887 31 INFO kuryr_kubernetes.cni.health [-] CNI driver > readiness verified. > 2020-01-08 09:03:27.887 31 INFO werkzeug [-] 10.196.0.14 - - [08/Jan/2020 > 09:03:27] "GET /ready HTTP/1.1" 200 - > 2020-01-08 09:03:32.648 31 INFO werkzeug [-] 10.196.0.14 - - [08/Jan/2020 > 09:03:32] "GET /alive HTTP/1.1" 200 - > 2020-01-08 09:03:37.880 31 INFO kuryr_kubernetes.cni.health [-] CNI driver > readiness verified. > 2020-01-08 09:03:37.880 31 INFO werkzeug [-] 10.196.0.14 - - [08/Jan/2020 > 09:03:37] "GET /ready HTTP/1.1" 200 - > 2020-01-08 09:03:42.649 31 INFO werkzeug [-] 10.196.0.14 - - [08/Jan/2020 > 09:03:42] "GET /alive HTTP/1.1" 200 - > 2020-01-08 09:03:43.604 66 INFO os_vif [-] Successfully plugged vif > VIFVlanNested(active=True,address=fa:16:3e:54:cf:9d, > has_traffic_filtering=False,id=03464583-05be-4cfd-bbef-ecc485c3658c, > network=Network(d049a1a6-bb45-411d-b5e1-124a17dd301b),plugin='noop', > port_profile=<?>,preserve_on_delete=False,vif_name='tap03464583-05', > vlan_id=3386) > 2020-01-08 09:03:47.877 31 INFO kuryr_kubernetes.cni.health [-] CNI driver > readiness verified. > 2020-01-08 09:03:47.878 31 INFO werkzeug [-] 10.196.0.14 - - [08/Jan/2020 > 09:03:47] "GET /ready HTTP/1.1" 200 - > 2020-01-08 09:03:52.649 31 INFO werkzeug [-] 10.196.0.14 - - [08/Jan/2020 > 09:03:52] "GET /alive HTTP/1.1" 200 - Weird, that interface is supposed to be removed by the remove_ifaces method. Was the kuryr-cni got restarted? In the logs you posted, is there any WARNING message including: "Found hanging interface %(ifname)s inside...."? Moving this BZ back to Assigned after some quick debugging with Luis. The kuryr-cni pod running in a master node is restarting continuously, as it keeps unhealthy. The WARNING message including: "Found hanging interface %(ifname)s inside...." is not present in kuryr-cni pod logs. We've checked the fix [1] is included in the deployment (4.3.0-0.nightly-2020-01-06-185654). $ oc get pods -n openshift-kuryr -o wide NAME READY STATUS RESTARTS AGE IP NODE iscsi-target 0/1 ContainerCreating 0 5h44m <none> ostest-vqq5d-worker-6jcw4 kuryr-cni-gqpgb 1/1 Running 0 24h 10.196.0.39 ostest-vqq5d-worker-9tn5z kuryr-cni-hbtj9 1/1 Running 0 25h 10.196.0.13 ostest-vqq5d-master-0 kuryr-cni-m7vrl 0/1 CrashLoopBackOff 59 5h44m 10.196.0.14 ostest-vqq5d-worker-6jcw4 kuryr-cni-qwlxw 1/1 Running 0 24h 10.196.0.28 ostest-vqq5d-worker-ltmfm kuryr-cni-rzp9g 1/1 Running 0 25h 10.196.0.33 ostest-vqq5d-master-1 kuryr-cni-t4c7g 1/1 Running 0 25h 10.196.0.26 ostest-vqq5d-master-2 kuryr-controller-bdd967688-h9nzx 1/1 Running 8 25h 10.196.0.13 ostest-vqq5d-master-0 kuryr-dns-admission-controller-6d979 1/1 Running 0 25h 10.128.46.13 ostest-vqq5d-master-1 kuryr-dns-admission-controller-mqrwv 1/1 Running 0 25h 10.128.46.10 ostest-vqq5d-master-2 kuryr-dns-admission-controller-rgqss 1/1 Running 0 25h 10.128.46.21 ostest-vqq5d-master-0 $ oc get nodes NAME STATUS ROLES AGE VERSION ostest-vqq5d-master-0 Ready master 25h v1.16.2 ostest-vqq5d-master-1 Ready master 25h v1.16.2 ostest-vqq5d-master-2 Ready master 25h v1.16.2 ostest-vqq5d-worker-6jcw4 Ready worker 24h v1.16.2 ostest-vqq5d-worker-9tn5z Ready,SchedulingDisabled worker 24h v1.16.2 ostest-vqq5d-worker-ltmfm Ready,SchedulingDisabled worker 24h v1.16.2 The pod iscsi-target cannot be created since the network is failing. $ oc get pods -n openshift-kuryr describe pod iscsi-target [9/1910] NAME READY STATUS RESTARTS AGE iscsi-target 0/1 ContainerCreating 0 6h3m Error from server (NotFound): pods "describe" not found Error from server (NotFound): pods "pod" not found (shiftstack) [stack@undercloud-0 ~]$ oc -n openshift-kuryr describe pod iscsi-target Name: iscsi-target Namespace: openshift-kuryr Priority: 0 Node: ostest-vqq5d-worker-6jcw4/10.196.0.14 Start Time: Wed, 08 Jan 2020 03:39:24 -0500 Labels: storage=iscsi-target Annotations: k8s.v1.cni.cncf.io/networks-status: openshift.io/description: iscsi target openstack.org/kuryr-pod-label: {"storage": "iscsi-target"} openstack.org/kuryr-vif: {"versioned_object.changes": ["default_vif"], "versioned_object.data": {"additional_vifs": {}, "default_vif": {"versioned_object.changes":... Status: Pending IP: IPs: <none> Containers: iscsi-target: Container ID: Image: aosqe/iscsi Image ID: Port: <none> Host Port: <none> State: Waiting Reason: ContainerCreating Ready: False Restart Count: 0 Readiness: exec [targetcli ls /iscsi/iqn.2016-04.test.com:storage.target00/tpg1] delay=30s timeout=1s period=10s #success=1 #failure=3 Environment: <none> Mounts: /lib/modules from kernel (rw) /var/run/secrets/kubernetes.io/serviceaccount from default-token-ctxjw (ro) Conditions: Type Status Initialized True Ready False ContainersReady False PodScheduled True Volumes: kernel: Type: HostPath (bare host directory volume) Path: /lib/modules HostPathType: default-token-ctxjw: Type: Secret (a volume populated by a Secret) SecretName: default-token-ctxjw Optional: false QoS Class: BestEffort Node-Selectors: <none> Tolerations: node.kubernetes.io/not-ready:NoExecute for 300s node.kubernetes.io/unreachable:NoExecute for 300s Events: Type Reason Age From Message ---- ------ ---- ---- ------- Warning FailedCreatePodSandBox 3m18s (x885 over 6h1m) kubelet, ostest-vqq5d-worker-6jcw4 (combined from similar events): Failed create pod sandbox: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_iscsi-target_openshift-kuryr_cc05883f-cdc1-47d7-b740-d2a8350c8052_0(0b7b71eb8b381dc70af6d18693386480f9a93dcd414d9be3ab5e7b683a82fe90): Multus: error adding pod to network "kuryr": delegateAdd: error invoking DelegateAdd - "kuryr-cni": error in getting result from AddNetwork: CNI Daemon returned error 500 [1] https://github.com/openshift/kuryr-kubernetes/pull/120 Verified in 4.3.0-0.nightly-2020-03-23-130439 on top of OSP 16 RHOS_TRUNK-16.0-RHEL-8-20200226.n.1 After running kubernetes Network Policy tests and conformance tests, I couldn't find the error described in the BZ in any of the kuryr cni pods. As https://bugzilla.redhat.com/show_bug.cgi?id=1779163#c4 says, this bug is really hard and random to reproduce so do not hesitate to reopen this issue if it appears again. Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:0930 Reopening as it was reproduced in OCP 4.5 and the fix needs to be backported: Failed QA in OCP 4.5.0-0.nightly-2020-04-07-040639 on OSP 16 RHOS_TRUNK-16.0-RHEL-8-20200403.n.1 (with OVS). The installer fails waiting for the cluster to initialize, and one kury-cni pod remains in crashloop: $ oc get pods -n openshift-kuryr NAME READY STATUS RESTARTS AGE kuryr-cni-7sbvw 1/1 Running 0 5h11m kuryr-cni-bg2hj 1/1 Running 57 5h23m kuryr-cni-nlxld 1/1 Running 4 5h43m kuryr-cni-sc922 1/1 Running 2 5h43m kuryr-cni-vfb7n 1/1 Running 2 5h43m kuryr-cni-vsvnc 1/1 Running 0 5h11m kuryr-controller-9dff6456-7kjw9 1/1 Running 1 5h43m ERROR kuryr_kubernetes.cni.binding.nested [-] Creation of pod interface failed, most likely due to duplicated VLAN id. This will probably cause kuryr-daemon to crashloop. Trying to gat pyroute2.netlink.exceptions.NetlinkError: (17, 'File exists') ERROR kuryr_kubernetes.cni.binding.nested Traceback (most recent call last): ERROR kuryr_kubernetes.cni.binding.nested File "/usr/lib/python3.6/site-packages/kuryr_kubernetes/cni/binding/nested.py", line 86, in connect ERROR kuryr_kubernetes.cni.binding.nested iface.net_ns_fd = utils.convert_netns(netns) ERROR kuryr_kubernetes.cni.binding.nested File "/usr/lib/python3.6/site-packages/pyroute2/ipdb/transactional.py", line 209, in __exit__ ERROR kuryr_kubernetes.cni.binding.nested self.commit() ERROR kuryr_kubernetes.cni.binding.nested File "/usr/lib/python3.6/site-packages/pyroute2/ipdb/interfaces.py", line 650, in commit ERROR kuryr_kubernetes.cni.binding.nested raise newif ERROR kuryr_kubernetes.cni.binding.nested File "/usr/lib/python3.6/site-packages/pyroute2/ipdb/interfaces.py", line 589, in commit ERROR kuryr_kubernetes.cni.binding.nested self.nl.link('add', **request) ERROR kuryr_kubernetes.cni.binding.nested File "/usr/lib/python3.6/site-packages/pyroute2/iproute/linux.py", line 1163, in link ERROR kuryr_kubernetes.cni.binding.nested msg_flags=msg_flags) ERROR kuryr_kubernetes.cni.binding.nested File "/usr/lib/python3.6/site-packages/pyroute2/netlink/nlsocket.py", line 373, in nlm_request ERROR kuryr_kubernetes.cni.binding.nested return tuple(self._genlm_request(*argv, **kwarg)) ERROR kuryr_kubernetes.cni.binding.nested File "/usr/lib/python3.6/site-packages/pyroute2/netlink/nlsocket.py", line 864, in nlm_request ERROR kuryr_kubernetes.cni.binding.nested callback=callback): ERROR kuryr_kubernetes.cni.binding.nested File "/usr/lib/python3.6/site-packages/pyroute2/netlink/nlsocket.py", line 376, in get ERROR kuryr_kubernetes.cni.binding.nested return tuple(self._genlm_get(*argv, **kwarg)) ERROR kuryr_kubernetes.cni.binding.nested File "/usr/lib/python3.6/site-packages/pyroute2/netlink/nlsocket.py", line 701, in get ERROR kuryr_kubernetes.cni.binding.nested raise msg['header']['error'] ERROR kuryr_kubernetes.cni.binding.nested pyroute2.netlink.exceptions.NetlinkError: (17, 'File exists') ERROR kuryr_kubernetes.cni.binding.nested ERROR kuryr_kubernetes.cni.binding.nested [-] List of host interfaces: {1: {'address': '00:00:00:00:00:00', 'broadcast': '00:00:00:00:00:00', 'ifname': 'lo', 'mtu': 65536, 'qdisc': 'no queue', 'txqlen': 1000, 'operstate': 'UNKNOWN', 'linkmode': 0, 'group': 0, 'promiscuity': 0, 'num_tx_queues': 1, 'num_rx_queues': 1, 'carrier': 1, 'carrier_changes': 0, 'proto_down': 0, 'gso_max_segs': 65535, 'g so_max_size': 65536, 'xdp': '05:00:02:00:00:00:00:00', 'carrier_up_count': 0, 'carrier_down_count': 0, 'index': 1, 'flags': 65609, 'ipdb_scope': 'system', 'ipdb_priority': 0, 'vlans': (), 'ipaddr': (('::1', 128) , ('127.0.0.1', 8)), 'ports': (), 'family': 0, 'ifi_type': 772, 'state': 'up', 'neighbours': ('0.0.0.0',)}, 'lo': {'address': '00:00:00:00:00:00', 'broadcast': '00:00:00:00:00:00', 'ifname': 'lo', 'mtu': 65536, 'qdisc': 'noqueue', 'txqlen': 1000, 'operstate': 'UNKNOWN', 'linkmode': 0, 'group': 0, 'promiscuity': 0, 'num_tx_queues': 1, 'num_rx_queues': 1, 'carrier': 1, 'carrier_changes': 0, 'proto_down': 0, 'gso_max_segs ': 65535, 'gso_max_size': 65536, 'xdp': '05:00:02:00:00:00:00:00', 'carrier_up_count': 0, 'carrier_down_count': 0, 'index': 1, 'flags': 65609, 'ipdb_scope': 'system', 'ipdb_priority': 0, 'vlans': (), 'ipaddr': ( ('::1', 128), ('127.0.0.1', 8)), 'ports': (), 'family': 0, 'ifi_type': 772, 'state': 'up', 'neighbours': ('0.0.0.0',)}, 2: {'address': 'fa:16:3e:ff:5e:0f', 'broadcast': 'ff:ff:ff:ff:ff:ff', 'ifname': 'ens3', 'mt u': 1450, 'qdisc': 'fq_codel', 'txqlen': 1000, 'operstate': 'UP', 'linkmode': 0, 'group': 0, 'promiscuity': 0, 'num_tx_queues': 1, 'num_rx_queues': 1, 'carrier': 1, 'carrier_changes': 2, 'proto_down': 0, 'gso_ma x_segs': 65535, 'gso_max_size': 65536, 'xdp': '05:00:02:00:00:00:00:00', 'carrier_up_count': 1, 'carrier_down_count': 1, 'index': 2, 'flags': 69699, 'ipdb_scope': 'system', 'ipdb_priority': 0, 'vlans': (), 'ipad dr': (('fe80::a777:cb9f:a473:bc9a', 64), ('10.196.1.28', 16)), 'ports': (), 'family': 0, 'ifi_type': 1, 'state': 'up', 'neighbours': ('10.196.0.12', '10.196.3.50', '10.196.0.102', '10.196.0.11', '224.0.0.22', '1 0.196.0.1', '10.196.0.5', '224.0.0.18', '10.196.0.7', '10.196.1.206', '224.0.0.251')}, 'ens3': {'address': 'fa:16:3e:ff:5e:0f', 'broadcast': 'ff:ff:ff:ff:ff:ff', 'ifname': 'ens3', 'mtu': 1450, 'qdisc': 'fq_codel ', 'txqlen': 1000, 'operstate': 'UP', 'linkmode': 0, 'group': 0, 'promiscuity': 0, 'num_tx_queues': 1, 'num_rx_queues': 1, 'carrier': 1, 'carrier_changes': 2, 'proto_down': 0, 'gso_max_segs': 65535, 'gso_max_siz e': 65536, 'xdp': '05:00:02:00:00:00:00:00', 'carrier_up_count': 1, 'carrier_down_count': 1, 'index': 2, 'flags': 69699, 'ipdb_scope': 'system', 'ipdb_priority': 0, 'vlans': (), 'ipaddr': (('fe80::a777:cb9f:a473:bc9a', 64), ('10.196.1.28', 16)), 'ports': (), 'family': 0, 'ifi_type': 1, 'state': 'up', 'neighbours': ('10.196.0.12', '10.196.3.50', '10.196.0.102', '10.196.0.11', '224.0.0.22', '10.196.0.1', '10.196.0.5', '2 24.0.0.18', '10.196.0.7', '10.196.1.206', '224.0.0.251')}}: pyroute2.netlink.exceptions.NetlinkError: (17, 'File exists') ERROR kuryr_kubernetes.cni.binding.nested [-] List of pod namespace interfaces: {1: {'address': '00:00:00:00:00:00', 'broadcast': '00:00:00:00:00:00', 'ifname': 'lo', 'mtu': 65536, 'qd isc': 'noqueue', 'txqlen': 1000, 'operstate': 'UNKNOWN', 'linkmode': 0, 'group': 0, 'promiscuity': 0, 'num_tx_queues': 1, 'num_rx_queues': 1, 'carrier': 1, 'carrier_changes': 0, 'proto_down': 0, 'gso_max_segs': 65535, 'gso_max_size': 65536, 'xdp': '05:00:02:00:00:00:00:00', 'carrier_up_count': 0, 'carrier_down_count': 0, 'index': 1, 'flags': 65609, 'ipdb_scope': 'system', 'ipdb_priority': 0, 'vlans': (), 'ipaddr': ((': :1', 128), ('127.0.0.1', 8)), 'ports': (), 'family': 0, 'ifi_type': 772, 'state': 'up', 'neighbours': ()}, 'lo': {'address': '00:00:00:00:00:00', 'broadcast': '00:00:00:00:00:00', 'ifname': 'lo', 'mtu': 65536, ' qdisc': 'noqueue', 'txqlen': 1000, 'operstate': 'UNKNOWN', 'linkmode': 0, 'group': 0, 'promiscuity': 0, 'num_tx_queues': 1, 'num_rx_queues': 1, 'carrier': 1, 'carrier_changes': 0, 'proto_down': 0, 'gso_max_segs' : 65535, 'gso_max_size': 65536, 'xdp': '05:00:02:00:00:00:00:00', 'carrier_up_count': 0, 'carrier_down_count': 0, 'index': 1, 'flags': 65609, 'ipdb_scope': 'system', 'ipdb_priority': 0, 'vlans': (), 'ipaddr': (( '::1', 128), ('127.0.0.1', 8)), 'ports': (), 'family': 0, 'ifi_type': 772, 'state': 'up', 'neighbours': ()}}: pyroute2.netlink.exceptions.NetlinkError: (17, 'File exists') ERROR kuryr_kubernetes.cni.daemon.service [-] Error when processing addNetwork request. CNI Params: {'CNI_IFNAME': 'eth0', 'CNI_NETNS': '/proc/134124/ns/net', 'CNI_PATH': '/opt/multus/ bin:/var/lib/cni/bin', 'CNI_COMMAND': 'ADD', 'CNI_CONTAINERID': '97359d9fb8a375cb2d0dcb86baa6981366363d04f912b910a3c3b4308baf1396', 'CNI_ARGS': 'IgnoreUnknown=true;K8S_POD_NAMESPACE=openshift-monitoring;K8S_POD_ NAME=thanos-querier-6ccdccb756-znrtz;K8S_POD_INFRA_CONTAINER_ID=97359d9fb8a375cb2d0dcb86baa6981366363d04f912b910a3c3b4308baf1396'}: pyroute2.netlink.exceptions.NetlinkError: (17, 'File exists') Hm, seems like bot got stuck with this one? Created attachment 1702172 [details]
NP test results
Verified on 4.3.0-0.nightly-2020-07-22-001315 over OSP16.1 (4.3.0-0.nightly-2020-07-22-001315) with OVN. Installation and NP tests run OK. After installation: [stack@undercloud-0 ~]$ oc get all -n openshift-kuryr NAME READY STATUS RESTARTS AGE pod/kuryr-cni-5mzzr 1/1 Running 0 20m pod/kuryr-cni-7m6xd 1/1 Running 0 37m pod/kuryr-cni-d4hvs 1/1 Running 0 20m pod/kuryr-cni-gn4w2 1/1 Running 0 37m pod/kuryr-cni-kp2hb 1/1 Running 0 21m pod/kuryr-cni-sv7j4 1/1 Running 0 37m pod/kuryr-controller-8997796c9-pdzcn 1/1 Running 0 37m NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE SELECTOR AGE daemonset.apps/kuryr-cni 6 6 6 6 6 <none> 37m NAME READY UP-TO-DATE AVAILABLE AGE deployment.apps/kuryr-controller 1/1 1 1 37m NAME DESIRED CURRENT READY AGE replicaset.apps/kuryr-controller-8997796c9 1 1 1 37m After NP tests run: [stack@undercloud-0 ~]$ oc get all -n openshift-kuryr NAME READY STATUS RESTARTS AGE pod/kuryr-cni-5mzzr 1/1 Running 0 174m pod/kuryr-cni-7m6xd 1/1 Running 0 3h11m pod/kuryr-cni-d4hvs 1/1 Running 0 175m pod/kuryr-cni-gn4w2 1/1 Running 0 3h11m pod/kuryr-cni-kp2hb 1/1 Running 2 175m pod/kuryr-cni-sv7j4 1/1 Running 0 3h11m pod/kuryr-controller-8997796c9-pdzcn 1/1 Running 0 3h11m NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE SELECTOR AGE daemonset.apps/kuryr-cni 6 6 6 6 6 <none> 3h11m NAME READY UP-TO-DATE AVAILABLE AGE deployment.apps/kuryr-controller 1/1 1 1 3h11m NAME DESIRED CURRENT READY AGE replicaset.apps/kuryr-controller-8997796c9 1 1 1 3h11m 1 NetlinkError observed on pod/kuryr-cni-sv7j4 without restart so the pod recovered by itself and functionality is not affected. (overcloud) [stack@undercloud-0 ~]$ for i in $(oc get pods -n openshift-kuryr -l app=kuryr-cni -o NAME ); do echo "# $i"; oc logs -n openshift-kuryr $i | grep "NetlinkError"; done # pod/kuryr-cni-5mzzr # pod/kuryr-cni-7m6xd # pod/kuryr-cni-d4hvs # pod/kuryr-cni-gn4w2 # pod/kuryr-cni-kp2hb # pod/kuryr-cni-sv7j4 2020-07-22 13:49:24.211 1170 ERROR kuryr_kubernetes.cni.binding.nested [-] Creation of pod interface failed due to VLAN ID (vlan_info={'kind': 'vlan', 'vlan_id': 3000}) conflict. Probably the CRI had not cleaned up the network namespace of deleted pods. This should not be a permanent issue but may cause restart of kuryr-cni pod.: pyroute2.netlink.exceptions.NetlinkError: (17, 'File exists') 2020-07-22 13:49:24.211 1170 ERROR kuryr_kubernetes.cni.binding.nested pyroute2.netlink.exceptions.NetlinkError: (17, 'File exists') 2020-07-22 13:49:24.221 1170 ERROR kuryr_kubernetes.cni.daemon.service [-] Error when processing addNetwork request. CNI Params: {'CNI_IFNAME': 'eth0', 'CNI_NETNS': '/proc/28181/ns/net', 'CNI_PATH': '/opt/multus/bin:/var/lib/cni/bin:/usr/libexec/cni', 'CNI_COMMAND': 'ADD', 'CNI_CONTAINERID': '0887fed1ab99a3e623635395b4594d42cf0bf3ca95e44fd66d586e62e7a48fa3', 'CNI_ARGS': 'IgnoreUnknown=true;K8S_POD_NAMESPACE=openshift-kube-apiserver;K8S_POD_NAME=installer-2-ostest-z4b6x-master-1;K8S_POD_INFRA_CONTAINER_ID=0887fed1ab99a3e623635395b4594d42cf0bf3ca95e44fd66d586e62e7a48fa3'}: pyroute2.netlink.exceptions.NetlinkError: (17, 'File exists') 2020-07-22 13:49:24.221 1170 ERROR kuryr_kubernetes.cni.daemon.service pyroute2.netlink.exceptions.NetlinkError: (17, 'File exists') [stack@undercloud-0 ~]$ for i in $(oc get pods -n openshift-kuryr -l app=kuryr-cni -o NAME ); do echo "# $i"; oc logs -n openshift-kuryr $i -p| grep "NetlinkError"; done # pod/kuryr-cni-5mzzr Error from server (BadRequest): previous terminated container "kuryr-cni" in pod "kuryr-cni-5mzzr" not found # pod/kuryr-cni-7m6xd Error from server (BadRequest): previous terminated container "kuryr-cni" in pod "kuryr-cni-7m6xd" not found # pod/kuryr-cni-d4hvs Error from server (BadRequest): previous terminated container "kuryr-cni" in pod "kuryr-cni-d4hvs" not found # pod/kuryr-cni-gn4w2 Error from server (BadRequest): previous terminated container "kuryr-cni" in pod "kuryr-cni-gn4w2" not found # pod/kuryr-cni-kp2hb # pod/kuryr-cni-sv7j4 Error from server (BadRequest): previous terminated container "kuryr-cni" in pod "kuryr-cni-sv7j4" not found [stack@undercloud-0 ~]$ NP tests logs on np_results_4.3.0-0.nightly-2020-07-22-001315.tgz Please note: (errata-xmlrpc from comment #19) > If the solution does not work for you, open a new bug report. You will need to clone this bug if a further fix is required after the errata ships; by design we cannot re-ship a bug that has already shipped. |