Bug 1786298
| Summary: | NetlinkError: (17, 'File exists') when kuryr-daemon gets restarted in a wrong moment | ||||||
|---|---|---|---|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Luis Tomas Bolivar <ltomasbo> | ||||
| Component: | Networking | Assignee: | Michał Dulko <mdulko> | ||||
| Networking sub component: | kuryr | QA Contact: | GenadiC <gcheresh> | ||||
| Status: | CLOSED CURRENTRELEASE | Docs Contact: | |||||
| Severity: | medium | ||||||
| Priority: | medium | CC: | chaoyang, gcheresh, itbrown, juriarte, mdulko, rlobillo, scuppett, vlaad | ||||
| Version: | 4.2.0 | Keywords: | Reopened | ||||
| Target Milestone: | --- | ||||||
| Target Release: | 4.3.z | ||||||
| Hardware: | All | ||||||
| OS: | Linux | ||||||
| Whiteboard: | |||||||
| Fixed In Version: | Doc Type: | No Doc Update | |||||
| Doc Text: | Story Points: | --- | |||||
| Clone Of: | 1779163 | Environment: | |||||
| Last Closed: | 2021-01-20 14:50:26 UTC | Type: | --- | ||||
| Regression: | --- | Mount Type: | --- | ||||
| Documentation: | --- | CRM: | |||||
| Verified Versions: | Category: | --- | |||||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||
| Embargoed: | |||||||
| Bug Depends On: | 1779163 | ||||||
| Bug Blocks: | |||||||
| Attachments: |
|
||||||
|
Description
Luis Tomas Bolivar
2019-12-24 09:45:23 UTC
*** Bug 1786218 has been marked as a duplicate of this bug. *** Hi, Please add information about how to verify. Taking a look at the kuryr cni pod logs, while for instance running the OCP QE tests. And then check that those errors are now not there even if the kuryr-cni pods gets restarted Hi
I reproduced this issue on
4.3.0-0.nightly-2020-01-06-185654
2020-01-08 09:03:24.660 37 ERROR kuryr_kubernetes.cni.daemon.service [-] Error when processing addNetwork request. CNI Params: {'CNI_IFNAME': 'eth0', 'CNI_NETNS': '/proc/1909009/ns/net', 'CNI_PATH': '/opt/multus/bin:/var/lib/cni/bin', 'CNI_COMMAND': 'ADD', 'CNI_CONTAINERID': '4de18b6bcd163bea42cae9f619052080aad749474228630827cd3edaa2bf532b', 'CNI_ARGS': 'IgnoreUnknown=true;K8S_POD_NAMESPACE=openshift-kuryr;K8S_POD_NAME=iscsi-target;K8S_POD_INFRA_CONTAINER_ID=4de18b6bcd163bea42cae9f619052080aad749474228630827cd3edaa2bf532b'}: pyroute2.netlink.exceptions.NetlinkError: (17, 'File exists')
2020-01-08 09:03:24.660 37 ERROR kuryr_kubernetes.cni.daemon.service Traceback (most recent call last):
2020-01-08 09:03:24.660 37 ERROR kuryr_kubernetes.cni.daemon.service File "/usr/lib/python3.6/site-packages/kuryr_kubernetes/cni/daemon/service.py", line 81, in add
2020-01-08 09:03:24.660 37 ERROR kuryr_kubernetes.cni.daemon.service vif = self.plugin.add(params)
2020-01-08 09:03:24.660 37 ERROR kuryr_kubernetes.cni.daemon.service File "/usr/lib/python3.6/site-packages/kuryr_kubernetes/cni/plugins/k8s_cni_registry.py", line 51, in add
2020-01-08 09:03:24.660 37 ERROR kuryr_kubernetes.cni.daemon.service vifs = self._do_work(params, b_base.connect)
2020-01-08 09:03:24.660 37 ERROR kuryr_kubernetes.cni.daemon.service File "/usr/lib/python3.6/site-packages/kuryr_kubernetes/cni/plugins/k8s_cni_registry.py", line 155, in _do_work
2020-01-08 09:03:24.660 37 ERROR kuryr_kubernetes.cni.daemon.service container_id=params.CNI_CONTAINERID)
2020-01-08 09:03:24.660 37 ERROR kuryr_kubernetes.cni.daemon.service File "/usr/lib/python3.6/site-packages/kuryr_kubernetes/cni/binding/base.py", line 143, in connect
2020-01-08 09:03:24.660 37 ERROR kuryr_kubernetes.cni.daemon.service driver.connect(vif, ifname, netns, container_id)
2020-01-08 09:03:24.660 37 ERROR kuryr_kubernetes.cni.daemon.service File "/usr/lib/python3.6/site-packages/kuryr_kubernetes/cni/binding/nested.py", line 70, in connect
2020-01-08 09:03:24.660 37 ERROR kuryr_kubernetes.cni.daemon.service iface.net_ns_fd = utils.convert_netns(netns)
2020-01-08 09:03:24.660 37 ERROR kuryr_kubernetes.cni.daemon.service File "/usr/lib/python3.6/site-packages/pyroute2/ipdb/transactional.py", line 209, in __exit__
2020-01-08 09:03:24.660 37 ERROR kuryr_kubernetes.cni.daemon.service self.commit()
2020-01-08 09:03:24.660 37 ERROR kuryr_kubernetes.cni.daemon.service File "/usr/lib/python3.6/site-packages/pyroute2/ipdb/interfaces.py", line 649, in commit
2020-01-08 09:03:24.660 37 ERROR kuryr_kubernetes.cni.daemon.service raise newif
2020-01-08 09:03:24.660 37 ERROR kuryr_kubernetes.cni.daemon.service File "/usr/lib/python3.6/site-packages/pyroute2/ipdb/interfaces.py", line 588, in commit
2020-01-08 09:03:24.660 37 ERROR kuryr_kubernetes.cni.daemon.service self.nl.link('add', **request)
2020-01-08 09:03:24.660 37 ERROR kuryr_kubernetes.cni.daemon.service File "/usr/lib/python3.6/site-packages/pyroute2/iproute/linux.py", line 1115, in link
2020-01-08 09:03:24.660 37 ERROR kuryr_kubernetes.cni.daemon.service msg_flags=msg_flags)
2020-01-08 09:03:24.660 37 ERROR kuryr_kubernetes.cni.daemon.service File "/usr/lib/python3.6/site-packages/pyroute2/netlink/nlsocket.py", line 352, in nlm_request
2020-01-08 09:03:24.660 37 ERROR kuryr_kubernetes.cni.daemon.service return tuple(self._genlm_request(*argv, **kwarg))
2020-01-08 09:03:24.660 37 ERROR kuryr_kubernetes.cni.daemon.service File "/usr/lib/python3.6/site-packages/pyroute2/netlink/nlsocket.py", line 836, in nlm_request
2020-01-08 09:03:24.660 37 ERROR kuryr_kubernetes.cni.daemon.service callback=callback):
2020-01-08 09:03:24.660 37 ERROR kuryr_kubernetes.cni.daemon.service File "/usr/lib/python3.6/site-packages/pyroute2/netlink/nlsocket.py", line 355, in get
2020-01-08 09:03:24.660 37 ERROR kuryr_kubernetes.cni.daemon.service return tuple(self._genlm_get(*argv, **kwarg))
2020-01-08 09:03:24.660 37 ERROR kuryr_kubernetes.cni.daemon.service File "/usr/lib/python3.6/site-packages/pyroute2/netlink/nlsocket.py", line 679, in get
2020-01-08 09:03:24.660 37 ERROR kuryr_kubernetes.cni.daemon.service raise msg['header']['error']
2020-01-08 09:03:24.660 37 ERROR kuryr_kubernetes.cni.daemon.service pyroute2.netlink.exceptions.NetlinkError: (17, 'File exists')
2020-01-08 09:03:24.660 37 ERROR kuryr_kubernetes.cni.daemon.service
2020-01-08 09:03:24.667 37 INFO werkzeug [-] 127.0.0.1 - - [08/Jan/2020 09:03:24] "POST /addNetwork HTTP/1.1" 500 -
2020-01-08 09:03:24.696 52 INFO os_vif [-] Successfully unplugged vif VIFVlanNested(active=True,address=fa:16:3e:54:cf:9d,has_traffic_filtering=False,id=03464583-05be-4cfd-bbef-ecc485c3658c,network=Network(d049a1a6-bb45-411d-b5e1-124a17dd301b),plugin='noop',port_profile=<?>,preserve_on_delete=False,vif_name='tap03464583-05',vlan_id=3386)
2020-01-08 09:03:24.703 52 INFO werkzeug [-] 127.0.0.1 - - [08/Jan/2020 09:03:24] "POST /delNetwork HTTP/1.1" 204 -
2020-01-08 09:03:24.766 54 INFO os_vif [-] Successfully unplugged vif VIFVlanNested(active=True,address=fa:16:3e:54:cf:9d,has_traffic_filtering=False,id=03464583-05be-4cfd-bbef-ecc485c3658c,network=Network(d049a1a6-bb45-411d-b5e1-124a17dd301b),plugin='noop',port_profile=<?>,preserve_on_delete=False,vif_name='tap03464583-05',vlan_id=3386)
2020-01-08 09:03:24.773 54 INFO werkzeug [-] 127.0.0.1 - - [08/Jan/2020 09:03:24] "POST /delNetwork HTTP/1.1" 204 -
2020-01-08 09:03:27.887 31 INFO kuryr_kubernetes.cni.health [-] CNI driver readiness verified.
2020-01-08 09:03:27.887 31 INFO werkzeug [-] 10.196.0.14 - - [08/Jan/2020 09:03:27] "GET /ready HTTP/1.1" 200 -
2020-01-08 09:03:32.648 31 INFO werkzeug [-] 10.196.0.14 - - [08/Jan/2020 09:03:32] "GET /alive HTTP/1.1" 200 -
2020-01-08 09:03:37.880 31 INFO kuryr_kubernetes.cni.health [-] CNI driver readiness verified.
2020-01-08 09:03:37.880 31 INFO werkzeug [-] 10.196.0.14 - - [08/Jan/2020 09:03:37] "GET /ready HTTP/1.1" 200 -
2020-01-08 09:03:42.649 31 INFO werkzeug [-] 10.196.0.14 - - [08/Jan/2020 09:03:42] "GET /alive HTTP/1.1" 200 -
2020-01-08 09:03:43.604 66 INFO os_vif [-] Successfully plugged vif VIFVlanNested(active=True,address=fa:16:3e:54:cf:9d,has_traffic_filtering=False,id=03464583-05be-4cfd-bbef-ecc485c3658c,network=Network(d049a1a6-bb45-411d-b5e1-124a17dd301b),plugin='noop',port_profile=<?>,preserve_on_delete=False,vif_name='tap03464583-05',vlan_id=3386)
2020-01-08 09:03:47.877 31 INFO kuryr_kubernetes.cni.health [-] CNI driver readiness verified.
2020-01-08 09:03:47.878 31 INFO werkzeug [-] 10.196.0.14 - - [08/Jan/2020 09:03:47] "GET /ready HTTP/1.1" 200 -
2020-01-08 09:03:52.649 31 INFO werkzeug [-] 10.196.0.14 - - [08/Jan/2020 09:03:52] "GET /alive HTTP/1.1" 200 -
(In reply to Chao Yang from comment #5) > Hi > > I reproduced this issue on > > 4.3.0-0.nightly-2020-01-06-185654 > > 2020-01-08 09:03:24.660 37 ERROR kuryr_kubernetes.cni.daemon.service [-] > Error when processing addNetwork request. CNI Params: {'CNI_IFNAME': 'eth0', > 'CNI_NETNS': '/proc/1909009/ns/net', 'CNI_PATH': > '/opt/multus/bin:/var/lib/cni/bin', 'CNI_COMMAND': 'ADD', 'CNI_CONTAINERID': > '4de18b6bcd163bea42cae9f619052080aad749474228630827cd3edaa2bf532b', > 'CNI_ARGS': > 'IgnoreUnknown=true;K8S_POD_NAMESPACE=openshift-kuryr;K8S_POD_NAME=iscsi- > target; > K8S_POD_INFRA_CONTAINER_ID=4de18b6bcd163bea42cae9f619052080aad749474228630827 > cd3edaa2bf532b'}: pyroute2.netlink.exceptions.NetlinkError: (17, 'File > exists') > 2020-01-08 09:03:24.660 37 ERROR kuryr_kubernetes.cni.daemon.service > Traceback (most recent call last): > 2020-01-08 09:03:24.660 37 ERROR kuryr_kubernetes.cni.daemon.service File > "/usr/lib/python3.6/site-packages/kuryr_kubernetes/cni/daemon/service.py", > line 81, in add > 2020-01-08 09:03:24.660 37 ERROR kuryr_kubernetes.cni.daemon.service vif > = self.plugin.add(params) > 2020-01-08 09:03:24.660 37 ERROR kuryr_kubernetes.cni.daemon.service File > "/usr/lib/python3.6/site-packages/kuryr_kubernetes/cni/plugins/ > k8s_cni_registry.py", line 51, in add > 2020-01-08 09:03:24.660 37 ERROR kuryr_kubernetes.cni.daemon.service > vifs = self._do_work(params, b_base.connect) > 2020-01-08 09:03:24.660 37 ERROR kuryr_kubernetes.cni.daemon.service File > "/usr/lib/python3.6/site-packages/kuryr_kubernetes/cni/plugins/ > k8s_cni_registry.py", line 155, in _do_work > 2020-01-08 09:03:24.660 37 ERROR kuryr_kubernetes.cni.daemon.service > container_id=params.CNI_CONTAINERID) > 2020-01-08 09:03:24.660 37 ERROR kuryr_kubernetes.cni.daemon.service File > "/usr/lib/python3.6/site-packages/kuryr_kubernetes/cni/binding/base.py", > line 143, in connect > 2020-01-08 09:03:24.660 37 ERROR kuryr_kubernetes.cni.daemon.service > driver.connect(vif, ifname, netns, container_id) > 2020-01-08 09:03:24.660 37 ERROR kuryr_kubernetes.cni.daemon.service File > "/usr/lib/python3.6/site-packages/kuryr_kubernetes/cni/binding/nested.py", > line 70, in connect > 2020-01-08 09:03:24.660 37 ERROR kuryr_kubernetes.cni.daemon.service > iface.net_ns_fd = utils.convert_netns(netns) > 2020-01-08 09:03:24.660 37 ERROR kuryr_kubernetes.cni.daemon.service File > "/usr/lib/python3.6/site-packages/pyroute2/ipdb/transactional.py", line 209, > in __exit__ > 2020-01-08 09:03:24.660 37 ERROR kuryr_kubernetes.cni.daemon.service > self.commit() > 2020-01-08 09:03:24.660 37 ERROR kuryr_kubernetes.cni.daemon.service File > "/usr/lib/python3.6/site-packages/pyroute2/ipdb/interfaces.py", line 649, in > commit > 2020-01-08 09:03:24.660 37 ERROR kuryr_kubernetes.cni.daemon.service > raise newif > 2020-01-08 09:03:24.660 37 ERROR kuryr_kubernetes.cni.daemon.service File > "/usr/lib/python3.6/site-packages/pyroute2/ipdb/interfaces.py", line 588, in > commit > 2020-01-08 09:03:24.660 37 ERROR kuryr_kubernetes.cni.daemon.service > self.nl.link('add', **request) > 2020-01-08 09:03:24.660 37 ERROR kuryr_kubernetes.cni.daemon.service File > "/usr/lib/python3.6/site-packages/pyroute2/iproute/linux.py", line 1115, in > link > 2020-01-08 09:03:24.660 37 ERROR kuryr_kubernetes.cni.daemon.service > msg_flags=msg_flags) > 2020-01-08 09:03:24.660 37 ERROR kuryr_kubernetes.cni.daemon.service File > "/usr/lib/python3.6/site-packages/pyroute2/netlink/nlsocket.py", line 352, > in nlm_request > 2020-01-08 09:03:24.660 37 ERROR kuryr_kubernetes.cni.daemon.service > return tuple(self._genlm_request(*argv, **kwarg)) > 2020-01-08 09:03:24.660 37 ERROR kuryr_kubernetes.cni.daemon.service File > "/usr/lib/python3.6/site-packages/pyroute2/netlink/nlsocket.py", line 836, > in nlm_request > 2020-01-08 09:03:24.660 37 ERROR kuryr_kubernetes.cni.daemon.service > callback=callback): > 2020-01-08 09:03:24.660 37 ERROR kuryr_kubernetes.cni.daemon.service File > "/usr/lib/python3.6/site-packages/pyroute2/netlink/nlsocket.py", line 355, > in get > 2020-01-08 09:03:24.660 37 ERROR kuryr_kubernetes.cni.daemon.service > return tuple(self._genlm_get(*argv, **kwarg)) > 2020-01-08 09:03:24.660 37 ERROR kuryr_kubernetes.cni.daemon.service File > "/usr/lib/python3.6/site-packages/pyroute2/netlink/nlsocket.py", line 679, > in get > 2020-01-08 09:03:24.660 37 ERROR kuryr_kubernetes.cni.daemon.service > raise msg['header']['error'] > 2020-01-08 09:03:24.660 37 ERROR kuryr_kubernetes.cni.daemon.service > pyroute2.netlink.exceptions.NetlinkError: (17, 'File exists') > 2020-01-08 09:03:24.660 37 ERROR kuryr_kubernetes.cni.daemon.service > 2020-01-08 09:03:24.667 37 INFO werkzeug [-] 127.0.0.1 - - [08/Jan/2020 > 09:03:24] "POST /addNetwork HTTP/1.1" 500 - > 2020-01-08 09:03:24.696 52 INFO os_vif [-] Successfully unplugged vif > VIFVlanNested(active=True,address=fa:16:3e:54:cf:9d, > has_traffic_filtering=False,id=03464583-05be-4cfd-bbef-ecc485c3658c, > network=Network(d049a1a6-bb45-411d-b5e1-124a17dd301b),plugin='noop', > port_profile=<?>,preserve_on_delete=False,vif_name='tap03464583-05', > vlan_id=3386) > 2020-01-08 09:03:24.703 52 INFO werkzeug [-] 127.0.0.1 - - [08/Jan/2020 > 09:03:24] "POST /delNetwork HTTP/1.1" 204 - > 2020-01-08 09:03:24.766 54 INFO os_vif [-] Successfully unplugged vif > VIFVlanNested(active=True,address=fa:16:3e:54:cf:9d, > has_traffic_filtering=False,id=03464583-05be-4cfd-bbef-ecc485c3658c, > network=Network(d049a1a6-bb45-411d-b5e1-124a17dd301b),plugin='noop', > port_profile=<?>,preserve_on_delete=False,vif_name='tap03464583-05', > vlan_id=3386) > 2020-01-08 09:03:24.773 54 INFO werkzeug [-] 127.0.0.1 - - [08/Jan/2020 > 09:03:24] "POST /delNetwork HTTP/1.1" 204 - > 2020-01-08 09:03:27.887 31 INFO kuryr_kubernetes.cni.health [-] CNI driver > readiness verified. > 2020-01-08 09:03:27.887 31 INFO werkzeug [-] 10.196.0.14 - - [08/Jan/2020 > 09:03:27] "GET /ready HTTP/1.1" 200 - > 2020-01-08 09:03:32.648 31 INFO werkzeug [-] 10.196.0.14 - - [08/Jan/2020 > 09:03:32] "GET /alive HTTP/1.1" 200 - > 2020-01-08 09:03:37.880 31 INFO kuryr_kubernetes.cni.health [-] CNI driver > readiness verified. > 2020-01-08 09:03:37.880 31 INFO werkzeug [-] 10.196.0.14 - - [08/Jan/2020 > 09:03:37] "GET /ready HTTP/1.1" 200 - > 2020-01-08 09:03:42.649 31 INFO werkzeug [-] 10.196.0.14 - - [08/Jan/2020 > 09:03:42] "GET /alive HTTP/1.1" 200 - > 2020-01-08 09:03:43.604 66 INFO os_vif [-] Successfully plugged vif > VIFVlanNested(active=True,address=fa:16:3e:54:cf:9d, > has_traffic_filtering=False,id=03464583-05be-4cfd-bbef-ecc485c3658c, > network=Network(d049a1a6-bb45-411d-b5e1-124a17dd301b),plugin='noop', > port_profile=<?>,preserve_on_delete=False,vif_name='tap03464583-05', > vlan_id=3386) > 2020-01-08 09:03:47.877 31 INFO kuryr_kubernetes.cni.health [-] CNI driver > readiness verified. > 2020-01-08 09:03:47.878 31 INFO werkzeug [-] 10.196.0.14 - - [08/Jan/2020 > 09:03:47] "GET /ready HTTP/1.1" 200 - > 2020-01-08 09:03:52.649 31 INFO werkzeug [-] 10.196.0.14 - - [08/Jan/2020 > 09:03:52] "GET /alive HTTP/1.1" 200 - Weird, that interface is supposed to be removed by the remove_ifaces method. Was the kuryr-cni got restarted? In the logs you posted, is there any WARNING message including: "Found hanging interface %(ifname)s inside...."? Moving this BZ back to Assigned after some quick debugging with Luis.
The kuryr-cni pod running in a master node is restarting continuously, as it keeps unhealthy.
The WARNING message including: "Found hanging interface %(ifname)s inside...." is not present in kuryr-cni pod logs.
We've checked the fix [1] is included in the deployment (4.3.0-0.nightly-2020-01-06-185654).
$ oc get pods -n openshift-kuryr -o wide
NAME READY STATUS RESTARTS AGE IP NODE
iscsi-target 0/1 ContainerCreating 0 5h44m <none> ostest-vqq5d-worker-6jcw4
kuryr-cni-gqpgb 1/1 Running 0 24h 10.196.0.39 ostest-vqq5d-worker-9tn5z
kuryr-cni-hbtj9 1/1 Running 0 25h 10.196.0.13 ostest-vqq5d-master-0
kuryr-cni-m7vrl 0/1 CrashLoopBackOff 59 5h44m 10.196.0.14 ostest-vqq5d-worker-6jcw4
kuryr-cni-qwlxw 1/1 Running 0 24h 10.196.0.28 ostest-vqq5d-worker-ltmfm
kuryr-cni-rzp9g 1/1 Running 0 25h 10.196.0.33 ostest-vqq5d-master-1
kuryr-cni-t4c7g 1/1 Running 0 25h 10.196.0.26 ostest-vqq5d-master-2
kuryr-controller-bdd967688-h9nzx 1/1 Running 8 25h 10.196.0.13 ostest-vqq5d-master-0
kuryr-dns-admission-controller-6d979 1/1 Running 0 25h 10.128.46.13 ostest-vqq5d-master-1
kuryr-dns-admission-controller-mqrwv 1/1 Running 0 25h 10.128.46.10 ostest-vqq5d-master-2
kuryr-dns-admission-controller-rgqss 1/1 Running 0 25h 10.128.46.21 ostest-vqq5d-master-0
$ oc get nodes
NAME STATUS ROLES AGE VERSION
ostest-vqq5d-master-0 Ready master 25h v1.16.2
ostest-vqq5d-master-1 Ready master 25h v1.16.2
ostest-vqq5d-master-2 Ready master 25h v1.16.2
ostest-vqq5d-worker-6jcw4 Ready worker 24h v1.16.2
ostest-vqq5d-worker-9tn5z Ready,SchedulingDisabled worker 24h v1.16.2
ostest-vqq5d-worker-ltmfm Ready,SchedulingDisabled worker 24h v1.16.2
The pod iscsi-target cannot be created since the network is failing.
$ oc get pods -n openshift-kuryr describe pod iscsi-target [9/1910]
NAME READY STATUS RESTARTS AGE
iscsi-target 0/1 ContainerCreating 0 6h3m
Error from server (NotFound): pods "describe" not found
Error from server (NotFound): pods "pod" not found
(shiftstack) [stack@undercloud-0 ~]$ oc -n openshift-kuryr describe pod iscsi-target
Name: iscsi-target
Namespace: openshift-kuryr
Priority: 0
Node: ostest-vqq5d-worker-6jcw4/10.196.0.14
Start Time: Wed, 08 Jan 2020 03:39:24 -0500
Labels: storage=iscsi-target
Annotations: k8s.v1.cni.cncf.io/networks-status:
openshift.io/description: iscsi target
openstack.org/kuryr-pod-label: {"storage": "iscsi-target"}
openstack.org/kuryr-vif:
{"versioned_object.changes": ["default_vif"], "versioned_object.data": {"additional_vifs": {}, "default_vif": {"versioned_object.changes":...
Status: Pending
IP:
IPs: <none>
Containers:
iscsi-target:
Container ID:
Image: aosqe/iscsi
Image ID:
Port: <none>
Host Port: <none>
State: Waiting
Reason: ContainerCreating
Ready: False
Restart Count: 0
Readiness: exec [targetcli ls /iscsi/iqn.2016-04.test.com:storage.target00/tpg1] delay=30s timeout=1s period=10s #success=1 #failure=3
Environment: <none>
Mounts:
/lib/modules from kernel (rw)
/var/run/secrets/kubernetes.io/serviceaccount from default-token-ctxjw (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
kernel:
Type: HostPath (bare host directory volume)
Path: /lib/modules
HostPathType:
default-token-ctxjw:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-ctxjw
Optional: false
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedCreatePodSandBox 3m18s (x885 over 6h1m) kubelet, ostest-vqq5d-worker-6jcw4 (combined from similar events): Failed create pod sandbox: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_iscsi-target_openshift-kuryr_cc05883f-cdc1-47d7-b740-d2a8350c8052_0(0b7b71eb8b381dc70af6d18693386480f9a93dcd414d9be3ab5e7b683a82fe90): Multus: error adding pod to network "kuryr": delegateAdd: error invoking DelegateAdd - "kuryr-cni": error in getting result from AddNetwork: CNI Daemon returned error 500
[1] https://github.com/openshift/kuryr-kubernetes/pull/120
Verified in 4.3.0-0.nightly-2020-03-23-130439 on top of OSP 16 RHOS_TRUNK-16.0-RHEL-8-20200226.n.1 After running kubernetes Network Policy tests and conformance tests, I couldn't find the error described in the BZ in any of the kuryr cni pods. As https://bugzilla.redhat.com/show_bug.cgi?id=1779163#c4 says, this bug is really hard and random to reproduce so do not hesitate to reopen this issue if it appears again. Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:0930 Reopening as it was reproduced in OCP 4.5 and the fix needs to be backported:
Failed QA in OCP 4.5.0-0.nightly-2020-04-07-040639 on OSP 16 RHOS_TRUNK-16.0-RHEL-8-20200403.n.1 (with OVS).
The installer fails waiting for the cluster to initialize, and one kury-cni pod remains in crashloop:
$ oc get pods -n openshift-kuryr
NAME READY STATUS RESTARTS AGE
kuryr-cni-7sbvw 1/1 Running 0 5h11m
kuryr-cni-bg2hj 1/1 Running 57 5h23m
kuryr-cni-nlxld 1/1 Running 4 5h43m
kuryr-cni-sc922 1/1 Running 2 5h43m
kuryr-cni-vfb7n 1/1 Running 2 5h43m
kuryr-cni-vsvnc 1/1 Running 0 5h11m
kuryr-controller-9dff6456-7kjw9 1/1 Running 1 5h43m
ERROR kuryr_kubernetes.cni.binding.nested [-] Creation of pod interface failed, most likely due to duplicated VLAN id. This will probably cause kuryr-daemon to crashloop. Trying to gat
pyroute2.netlink.exceptions.NetlinkError: (17, 'File exists')
ERROR kuryr_kubernetes.cni.binding.nested Traceback (most recent call last):
ERROR kuryr_kubernetes.cni.binding.nested File "/usr/lib/python3.6/site-packages/kuryr_kubernetes/cni/binding/nested.py", line 86, in connect
ERROR kuryr_kubernetes.cni.binding.nested iface.net_ns_fd = utils.convert_netns(netns)
ERROR kuryr_kubernetes.cni.binding.nested File "/usr/lib/python3.6/site-packages/pyroute2/ipdb/transactional.py", line 209, in __exit__
ERROR kuryr_kubernetes.cni.binding.nested self.commit()
ERROR kuryr_kubernetes.cni.binding.nested File "/usr/lib/python3.6/site-packages/pyroute2/ipdb/interfaces.py", line 650, in commit
ERROR kuryr_kubernetes.cni.binding.nested raise newif
ERROR kuryr_kubernetes.cni.binding.nested File "/usr/lib/python3.6/site-packages/pyroute2/ipdb/interfaces.py", line 589, in commit
ERROR kuryr_kubernetes.cni.binding.nested self.nl.link('add', **request)
ERROR kuryr_kubernetes.cni.binding.nested File "/usr/lib/python3.6/site-packages/pyroute2/iproute/linux.py", line 1163, in link
ERROR kuryr_kubernetes.cni.binding.nested msg_flags=msg_flags)
ERROR kuryr_kubernetes.cni.binding.nested File "/usr/lib/python3.6/site-packages/pyroute2/netlink/nlsocket.py", line 373, in nlm_request
ERROR kuryr_kubernetes.cni.binding.nested return tuple(self._genlm_request(*argv, **kwarg))
ERROR kuryr_kubernetes.cni.binding.nested File "/usr/lib/python3.6/site-packages/pyroute2/netlink/nlsocket.py", line 864, in nlm_request
ERROR kuryr_kubernetes.cni.binding.nested callback=callback):
ERROR kuryr_kubernetes.cni.binding.nested File "/usr/lib/python3.6/site-packages/pyroute2/netlink/nlsocket.py", line 376, in get
ERROR kuryr_kubernetes.cni.binding.nested return tuple(self._genlm_get(*argv, **kwarg))
ERROR kuryr_kubernetes.cni.binding.nested File "/usr/lib/python3.6/site-packages/pyroute2/netlink/nlsocket.py", line 701, in get
ERROR kuryr_kubernetes.cni.binding.nested raise msg['header']['error']
ERROR kuryr_kubernetes.cni.binding.nested pyroute2.netlink.exceptions.NetlinkError: (17, 'File exists')
ERROR kuryr_kubernetes.cni.binding.nested
ERROR kuryr_kubernetes.cni.binding.nested [-] List of host interfaces: {1: {'address': '00:00:00:00:00:00', 'broadcast': '00:00:00:00:00:00', 'ifname': 'lo', 'mtu': 65536, 'qdisc': 'no
queue', 'txqlen': 1000, 'operstate': 'UNKNOWN', 'linkmode': 0, 'group': 0, 'promiscuity': 0, 'num_tx_queues': 1, 'num_rx_queues': 1, 'carrier': 1, 'carrier_changes': 0, 'proto_down': 0, 'gso_max_segs': 65535, 'g
so_max_size': 65536, 'xdp': '05:00:02:00:00:00:00:00', 'carrier_up_count': 0, 'carrier_down_count': 0, 'index': 1, 'flags': 65609, 'ipdb_scope': 'system', 'ipdb_priority': 0, 'vlans': (), 'ipaddr': (('::1', 128)
, ('127.0.0.1', 8)), 'ports': (), 'family': 0, 'ifi_type': 772, 'state': 'up', 'neighbours': ('0.0.0.0',)}, 'lo': {'address': '00:00:00:00:00:00', 'broadcast': '00:00:00:00:00:00', 'ifname': 'lo', 'mtu': 65536,
'qdisc': 'noqueue', 'txqlen': 1000, 'operstate': 'UNKNOWN', 'linkmode': 0, 'group': 0, 'promiscuity': 0, 'num_tx_queues': 1, 'num_rx_queues': 1, 'carrier': 1, 'carrier_changes': 0, 'proto_down': 0, 'gso_max_segs
': 65535, 'gso_max_size': 65536, 'xdp': '05:00:02:00:00:00:00:00', 'carrier_up_count': 0, 'carrier_down_count': 0, 'index': 1, 'flags': 65609, 'ipdb_scope': 'system', 'ipdb_priority': 0, 'vlans': (), 'ipaddr': (
('::1', 128), ('127.0.0.1', 8)), 'ports': (), 'family': 0, 'ifi_type': 772, 'state': 'up', 'neighbours': ('0.0.0.0',)}, 2: {'address': 'fa:16:3e:ff:5e:0f', 'broadcast': 'ff:ff:ff:ff:ff:ff', 'ifname': 'ens3', 'mt
u': 1450, 'qdisc': 'fq_codel', 'txqlen': 1000, 'operstate': 'UP', 'linkmode': 0, 'group': 0, 'promiscuity': 0, 'num_tx_queues': 1, 'num_rx_queues': 1, 'carrier': 1, 'carrier_changes': 2, 'proto_down': 0, 'gso_ma
x_segs': 65535, 'gso_max_size': 65536, 'xdp': '05:00:02:00:00:00:00:00', 'carrier_up_count': 1, 'carrier_down_count': 1, 'index': 2, 'flags': 69699, 'ipdb_scope': 'system', 'ipdb_priority': 0, 'vlans': (), 'ipad
dr': (('fe80::a777:cb9f:a473:bc9a', 64), ('10.196.1.28', 16)), 'ports': (), 'family': 0, 'ifi_type': 1, 'state': 'up', 'neighbours': ('10.196.0.12', '10.196.3.50', '10.196.0.102', '10.196.0.11', '224.0.0.22', '1
0.196.0.1', '10.196.0.5', '224.0.0.18', '10.196.0.7', '10.196.1.206', '224.0.0.251')}, 'ens3': {'address': 'fa:16:3e:ff:5e:0f', 'broadcast': 'ff:ff:ff:ff:ff:ff', 'ifname': 'ens3', 'mtu': 1450, 'qdisc': 'fq_codel
', 'txqlen': 1000, 'operstate': 'UP', 'linkmode': 0, 'group': 0, 'promiscuity': 0, 'num_tx_queues': 1, 'num_rx_queues': 1, 'carrier': 1, 'carrier_changes': 2, 'proto_down': 0, 'gso_max_segs': 65535, 'gso_max_siz
e': 65536, 'xdp': '05:00:02:00:00:00:00:00', 'carrier_up_count': 1, 'carrier_down_count': 1, 'index': 2, 'flags': 69699, 'ipdb_scope': 'system', 'ipdb_priority': 0, 'vlans': (), 'ipaddr': (('fe80::a777:cb9f:a473:bc9a', 64), ('10.196.1.28', 16)), 'ports': (), 'family': 0, 'ifi_type': 1, 'state': 'up', 'neighbours': ('10.196.0.12', '10.196.3.50', '10.196.0.102', '10.196.0.11', '224.0.0.22', '10.196.0.1', '10.196.0.5', '2
24.0.0.18', '10.196.0.7', '10.196.1.206', '224.0.0.251')}}: pyroute2.netlink.exceptions.NetlinkError: (17, 'File exists')
ERROR kuryr_kubernetes.cni.binding.nested [-] List of pod namespace interfaces: {1: {'address': '00:00:00:00:00:00', 'broadcast': '00:00:00:00:00:00', 'ifname': 'lo', 'mtu': 65536, 'qd
isc': 'noqueue', 'txqlen': 1000, 'operstate': 'UNKNOWN', 'linkmode': 0, 'group': 0, 'promiscuity': 0, 'num_tx_queues': 1, 'num_rx_queues': 1, 'carrier': 1, 'carrier_changes': 0, 'proto_down': 0, 'gso_max_segs':
65535, 'gso_max_size': 65536, 'xdp': '05:00:02:00:00:00:00:00', 'carrier_up_count': 0, 'carrier_down_count': 0, 'index': 1, 'flags': 65609, 'ipdb_scope': 'system', 'ipdb_priority': 0, 'vlans': (), 'ipaddr': ((':
:1', 128), ('127.0.0.1', 8)), 'ports': (), 'family': 0, 'ifi_type': 772, 'state': 'up', 'neighbours': ()}, 'lo': {'address': '00:00:00:00:00:00', 'broadcast': '00:00:00:00:00:00', 'ifname': 'lo', 'mtu': 65536, '
qdisc': 'noqueue', 'txqlen': 1000, 'operstate': 'UNKNOWN', 'linkmode': 0, 'group': 0, 'promiscuity': 0, 'num_tx_queues': 1, 'num_rx_queues': 1, 'carrier': 1, 'carrier_changes': 0, 'proto_down': 0, 'gso_max_segs'
: 65535, 'gso_max_size': 65536, 'xdp': '05:00:02:00:00:00:00:00', 'carrier_up_count': 0, 'carrier_down_count': 0, 'index': 1, 'flags': 65609, 'ipdb_scope': 'system', 'ipdb_priority': 0, 'vlans': (), 'ipaddr': ((
'::1', 128), ('127.0.0.1', 8)), 'ports': (), 'family': 0, 'ifi_type': 772, 'state': 'up', 'neighbours': ()}}: pyroute2.netlink.exceptions.NetlinkError: (17, 'File exists')
ERROR kuryr_kubernetes.cni.daemon.service [-] Error when processing addNetwork request. CNI Params: {'CNI_IFNAME': 'eth0', 'CNI_NETNS': '/proc/134124/ns/net', 'CNI_PATH': '/opt/multus/
bin:/var/lib/cni/bin', 'CNI_COMMAND': 'ADD', 'CNI_CONTAINERID': '97359d9fb8a375cb2d0dcb86baa6981366363d04f912b910a3c3b4308baf1396', 'CNI_ARGS': 'IgnoreUnknown=true;K8S_POD_NAMESPACE=openshift-monitoring;K8S_POD_
NAME=thanos-querier-6ccdccb756-znrtz;K8S_POD_INFRA_CONTAINER_ID=97359d9fb8a375cb2d0dcb86baa6981366363d04f912b910a3c3b4308baf1396'}: pyroute2.netlink.exceptions.NetlinkError: (17, 'File exists')
Hm, seems like bot got stuck with this one? Created attachment 1702172 [details]
NP test results
Verified on 4.3.0-0.nightly-2020-07-22-001315 over OSP16.1 (4.3.0-0.nightly-2020-07-22-001315) with OVN.
Installation and NP tests run OK.
After installation:
[stack@undercloud-0 ~]$ oc get all -n openshift-kuryr
NAME READY STATUS RESTARTS AGE
pod/kuryr-cni-5mzzr 1/1 Running 0 20m
pod/kuryr-cni-7m6xd 1/1 Running 0 37m
pod/kuryr-cni-d4hvs 1/1 Running 0 20m
pod/kuryr-cni-gn4w2 1/1 Running 0 37m
pod/kuryr-cni-kp2hb 1/1 Running 0 21m
pod/kuryr-cni-sv7j4 1/1 Running 0 37m
pod/kuryr-controller-8997796c9-pdzcn 1/1 Running 0 37m
NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE SELECTOR AGE
daemonset.apps/kuryr-cni 6 6 6 6 6 <none> 37m
NAME READY UP-TO-DATE AVAILABLE AGE
deployment.apps/kuryr-controller 1/1 1 1 37m
NAME DESIRED CURRENT READY AGE
replicaset.apps/kuryr-controller-8997796c9 1 1 1 37m
After NP tests run:
[stack@undercloud-0 ~]$ oc get all -n openshift-kuryr
NAME READY STATUS RESTARTS AGE
pod/kuryr-cni-5mzzr 1/1 Running 0 174m
pod/kuryr-cni-7m6xd 1/1 Running 0 3h11m
pod/kuryr-cni-d4hvs 1/1 Running 0 175m
pod/kuryr-cni-gn4w2 1/1 Running 0 3h11m
pod/kuryr-cni-kp2hb 1/1 Running 2 175m
pod/kuryr-cni-sv7j4 1/1 Running 0 3h11m
pod/kuryr-controller-8997796c9-pdzcn 1/1 Running 0 3h11m
NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE SELECTOR AGE
daemonset.apps/kuryr-cni 6 6 6 6 6 <none> 3h11m
NAME READY UP-TO-DATE AVAILABLE AGE
deployment.apps/kuryr-controller 1/1 1 1 3h11m
NAME DESIRED CURRENT READY AGE
replicaset.apps/kuryr-controller-8997796c9 1 1 1 3h11m
1 NetlinkError observed on pod/kuryr-cni-sv7j4 without restart so the pod recovered by itself and functionality is not affected.
(overcloud) [stack@undercloud-0 ~]$ for i in $(oc get pods -n openshift-kuryr -l app=kuryr-cni -o NAME ); do echo "# $i"; oc logs -n openshift-kuryr $i | grep "NetlinkError"; done
# pod/kuryr-cni-5mzzr
# pod/kuryr-cni-7m6xd
# pod/kuryr-cni-d4hvs
# pod/kuryr-cni-gn4w2
# pod/kuryr-cni-kp2hb
# pod/kuryr-cni-sv7j4
2020-07-22 13:49:24.211 1170 ERROR kuryr_kubernetes.cni.binding.nested [-] Creation of pod interface failed due to VLAN ID (vlan_info={'kind': 'vlan', 'vlan_id': 3000}) conflict. Probably the CRI had not cleaned up the network namespace of deleted pods. This should not be a permanent issue but may cause restart of kuryr-cni pod.: pyroute2.netlink.exceptions.NetlinkError: (17, 'File exists')
2020-07-22 13:49:24.211 1170 ERROR kuryr_kubernetes.cni.binding.nested pyroute2.netlink.exceptions.NetlinkError: (17, 'File exists')
2020-07-22 13:49:24.221 1170 ERROR kuryr_kubernetes.cni.daemon.service [-] Error when processing addNetwork request. CNI Params: {'CNI_IFNAME': 'eth0', 'CNI_NETNS': '/proc/28181/ns/net', 'CNI_PATH': '/opt/multus/bin:/var/lib/cni/bin:/usr/libexec/cni', 'CNI_COMMAND': 'ADD', 'CNI_CONTAINERID': '0887fed1ab99a3e623635395b4594d42cf0bf3ca95e44fd66d586e62e7a48fa3', 'CNI_ARGS': 'IgnoreUnknown=true;K8S_POD_NAMESPACE=openshift-kube-apiserver;K8S_POD_NAME=installer-2-ostest-z4b6x-master-1;K8S_POD_INFRA_CONTAINER_ID=0887fed1ab99a3e623635395b4594d42cf0bf3ca95e44fd66d586e62e7a48fa3'}: pyroute2.netlink.exceptions.NetlinkError: (17, 'File exists')
2020-07-22 13:49:24.221 1170 ERROR kuryr_kubernetes.cni.daemon.service pyroute2.netlink.exceptions.NetlinkError: (17, 'File exists')
[stack@undercloud-0 ~]$ for i in $(oc get pods -n openshift-kuryr -l app=kuryr-cni -o NAME ); do echo "# $i"; oc logs -n openshift-kuryr $i -p| grep "NetlinkError"; done
# pod/kuryr-cni-5mzzr
Error from server (BadRequest): previous terminated container "kuryr-cni" in pod "kuryr-cni-5mzzr" not found
# pod/kuryr-cni-7m6xd
Error from server (BadRequest): previous terminated container "kuryr-cni" in pod "kuryr-cni-7m6xd" not found
# pod/kuryr-cni-d4hvs
Error from server (BadRequest): previous terminated container "kuryr-cni" in pod "kuryr-cni-d4hvs" not found
# pod/kuryr-cni-gn4w2
Error from server (BadRequest): previous terminated container "kuryr-cni" in pod "kuryr-cni-gn4w2" not found
# pod/kuryr-cni-kp2hb
# pod/kuryr-cni-sv7j4
Error from server (BadRequest): previous terminated container "kuryr-cni" in pod "kuryr-cni-sv7j4" not found
[stack@undercloud-0 ~]$
NP tests logs on np_results_4.3.0-0.nightly-2020-07-22-001315.tgz
Please note: (errata-xmlrpc from comment #19) > If the solution does not work for you, open a new bug report. You will need to clone this bug if a further fix is required after the errata ships; by design we cannot re-ship a bug that has already shipped. |