Description of problem:
During the upgrade from 4.5.40 to 4.6.31, the CNI is restarting because it cannot plug the provided VIF, which is already being used by another Pod.

2021-07-16 10:55:02.580 232 ERROR kuryr_kubernetes.cni.daemon.service [-] Error when processing addNetwork request. CNI Params: {'CNI_IFNAME': 'eth0', 'CNI_NETNS': '/var/run/netns/0420f2a3-d2fe-40e6-86f0-9a38a17c933a', 'CNI_PATH': '/opt/multus/bin:/var/lib/cni/bin:/usr/libexec/cni', 'CNI_COMMAND': 'ADD', 'CNI_CONTAINERID': '73eee9240ae6bcfec8b539fa2b12c8e82f51f8a95f29aaaedc95e4e05f7cb734', 'CNI_ARGS': 'IgnoreUnknown=true;K8S_POD_NAMESPACE=openshift-monitoring;K8S_POD_NAME=prometheus-k8s-0;K8S_POD_INFRA_CONTAINER_ID=73eee9240ae6bcfec8b539fa2b12c8e82f51f8a95f29aaaedc95e4e05f7cb734'}: pyroute2.netlink.exceptions.NetlinkError: (17, 'File exists')
2021-07-16 10:55:02.580 232 ERROR kuryr_kubernetes.cni.daemon.service Traceback (most recent call last):
2021-07-16 10:55:02.580 232 ERROR kuryr_kubernetes.cni.daemon.service   File "/usr/lib/python3.6/site-packages/kuryr_kubernetes/cni/daemon/service.py", line 82, in add
2021-07-16 10:55:02.580 232 ERROR kuryr_kubernetes.cni.daemon.service     vif = self.plugin.add(params)
2021-07-16 10:55:02.580 232 ERROR kuryr_kubernetes.cni.daemon.service   File "/usr/lib/python3.6/site-packages/kuryr_kubernetes/cni/plugins/k8s_cni_registry.py", line 75, in add
2021-07-16 10:55:02.580 232 ERROR kuryr_kubernetes.cni.daemon.service     vifs = self._do_work(params, b_base.connect, timeout)
2021-07-16 10:55:02.580 232 ERROR kuryr_kubernetes.cni.daemon.service   File "/usr/lib/python3.6/site-packages/kuryr_kubernetes/cni/plugins/k8s_cni_registry.py", line 184, in _do_work
2021-07-16 10:55:02.580 232 ERROR kuryr_kubernetes.cni.daemon.service     container_id=params.CNI_CONTAINERID)
2021-07-16 10:55:02.580 232 ERROR kuryr_kubernetes.cni.daemon.service   File "/usr/lib/python3.6/site-packages/kuryr_kubernetes/cni/binding/base.py", line 156, in connect
2021-07-16 10:55:02.580 232 ERROR kuryr_kubernetes.cni.daemon.service     driver.connect(vif, ifname, netns, container_id)
2021-07-16 10:55:02.580 232 ERROR kuryr_kubernetes.cni.daemon.service   File "/usr/lib/python3.6/site-packages/kuryr_kubernetes/cni/binding/nested.py", line 126, in connect
2021-07-16 10:55:02.580 232 ERROR kuryr_kubernetes.cni.daemon.service     iface.net_ns_fd = utils.convert_netns(netns)
2021-07-16 10:55:02.580 232 ERROR kuryr_kubernetes.cni.daemon.service   File "/usr/lib/python3.6/site-packages/pyroute2/ipdb/transactional.py", line 209, in __exit__
2021-07-16 10:55:02.580 232 ERROR kuryr_kubernetes.cni.daemon.service     self.commit()
2021-07-16 10:55:02.580 232 ERROR kuryr_kubernetes.cni.daemon.service   File "/usr/lib/python3.6/site-packages/pyroute2/ipdb/interfaces.py", line 650, in commit
2021-07-16 10:55:02.580 232 ERROR kuryr_kubernetes.cni.daemon.service     raise newif
2021-07-16 10:55:02.580 232 ERROR kuryr_kubernetes.cni.daemon.service   File "/usr/lib/python3.6/site-packages/pyroute2/ipdb/interfaces.py", line 589, in commit
2021-07-16 10:55:02.580 232 ERROR kuryr_kubernetes.cni.daemon.service     self.nl.link('add', **request)
2021-07-16 10:55:02.580 232 ERROR kuryr_kubernetes.cni.daemon.service   File "/usr/lib/python3.6/site-packages/pyroute2/iproute/linux.py", line 1163, in link
2021-07-16 10:55:02.580 232 ERROR kuryr_kubernetes.cni.daemon.service     msg_flags=msg_flags)
2021-07-16 10:55:02.580 232 ERROR kuryr_kubernetes.cni.daemon.service   File "/usr/lib/python3.6/site-packages/pyroute2/netlink/nlsocket.py", line 373, in nlm_request
2021-07-16 10:55:02.580 232 ERROR kuryr_kubernetes.cni.daemon.service     return tuple(self._genlm_request(*argv, **kwarg))
2021-07-16 10:55:02.580 232 ERROR kuryr_kubernetes.cni.daemon.service   File "/usr/lib/python3.6/site-packages/pyroute2/netlink/nlsocket.py", line 864, in nlm_request
2021-07-16 10:55:02.580 232 ERROR kuryr_kubernetes.cni.daemon.service     callback=callback):
2021-07-16 10:55:02.580 232 ERROR kuryr_kubernetes.cni.daemon.service   File "/usr/lib/python3.6/site-packages/pyroute2/netlink/nlsocket.py", line 376, in get
2021-07-16 10:55:02.580 232 ERROR kuryr_kubernetes.cni.daemon.service     return tuple(self._genlm_get(*argv, **kwarg))
2021-07-16 10:55:02.580 232 ERROR kuryr_kubernetes.cni.daemon.service   File "/usr/lib/python3.6/site-packages/pyroute2/netlink/nlsocket.py", line 701, in get
2021-07-16 10:55:02.580 232 ERROR kuryr_kubernetes.cni.daemon.service     raise msg['header']['error']
2021-07-16 10:55:02.580 232 ERROR kuryr_kubernetes.cni.daemon.service pyroute2.netlink.exceptions.NetlinkError: (17, 'File exists')
2021-07-16 10:55:02.580 232 ERROR kuryr_kubernetes.cni.daemon.service
2021-07-16 10:55:02.585 232 INFO werkzeug [-] 127.0.0.1 - - [16/Jul/2021 10:55:02] "POST /addNetwork HTTP/1.1" 500 -
2021-07-16 10:55:02.656 251 INFO os_vif [-] Successfully unplugged vif VIFVlanNested(active=True,address=fa:16:3e:c1:cd:25,has_traffic_filtering=False,id=88bdb7f9-65e6-4c54-83d1-73341876da08,network=Network(cc5c0761-5f89-42b8-a4fc-0d829eba818d),plugin='noop',port_profile=<?>,preserve_on_delete=False,vif_name='tap88bdb7f9-65',vlan_id=2482)

The prometheus Pod is configured to use the same IP as the alertmanager Pod, and the alertmanager Pod is using an IP different from the one specified in its annotation:

[stack@undercloud-0 ~]$ oc get po prometheus-k8s-0 -n openshift-monitoring -o yaml
apiVersion: v1
kind: Pod
metadata:
  annotations:
    openshift.io/scc: anyuid
    openstack.org/kuryr-pod-label: '{"app": "prometheus", "controller-revision-hash": "prometheus-k8s-5949f47544", "prometheus": "k8s", "statefulset.kubernetes.io/pod-name": "prometheus-k8s-0"}'
    openstack.org/kuryr-vif: '{"versioned_object.changes": ["default_vif"], "versioned_object.data": {"additional_vifs": {}, "default_vif": {"versioned_object.changes": ["has_traffic_filtering", "plugin", "active", "vif_name", "preserve_on_delete", "network", "id", "address", "vlan_id"], "versioned_object.data": {"active": true, "address": "fa:16:3e:c1:cd:25", "has_traffic_filtering":
      false, "id": "88bdb7f9-65e6-4c54-83d1-73341876da08", "network": {"versioned_object.changes": ["mtu", "multi_host", "subnets", "label", "id", "should_provide_bridge", "should_provide_vlan"], "versioned_object.data": {"id": "cc5c0761-5f89-42b8-a4fc-0d829eba818d", "label": "ns/openshift-monitoring-net", "mtu": 1442, "multi_host": false, "should_provide_bridge": false, "should_provide_vlan": false, "subnets": {"versioned_object.changes": ["objects"], "versioned_object.data": {"objects": [{"versioned_object.changes": ["ips", "gateway", "routes", "cidr", "dns"], "versioned_object.data": {"cidr": "10.128.8.0/23", "dns": [], "gateway": "10.128.8.1", "ips": {"versioned_object.changes": ["objects"], "versioned_object.data": {"objects": [{"versioned_object.changes": ["address"], "versioned_object.data": {"address": "10.128.9.175"}, "versioned_object.name": "FixedIP", "versioned_object.namespace": "os_vif", "versioned_object.version": "1.0"}]}, "versioned_object.name": "FixedIPList", "versioned_object.namespace": "os_vif", "versioned_object.version": "1.0"}, "routes": {"versioned_object.changes": ["objects"], "versioned_object.data": {"objects": []}, "versioned_object.name": "RouteList", "versioned_object.namespace": "os_vif", "versioned_object.version": "1.0"}}, "versioned_object.name": "Subnet", "versioned_object.namespace": "os_vif", "versioned_object.version": "1.0"}]}, "versioned_object.name": "SubnetList", "versioned_object.namespace": "os_vif", "versioned_object.version": "1.0"}}, "versioned_object.name": "Network", "versioned_object.namespace": "os_vif", "versioned_object.version": "1.1"}, "plugin": "noop", "preserve_on_delete": false, "vif_name": "tap88bdb7f9-65", "vlan_id": 2482}, "versioned_object.name": "VIFVlanNested", "versioned_object.namespace": "os_vif", "versioned_object.version": "1.0"}}, "versioned_object.name": "PodState", "versioned_object.namespace": "os_vif", "versioned_object.version": "1.0"}'
  creationTimestamp: "2021-07-15T12:24:52Z"
  generateName: prometheus-k8s-
  labels:
    app: prometheus
    controller-revision-hash: prometheus-k8s-5949f47544
    prometheus: k8s
    statefulset.kubernetes.io/pod-name: prometheus-k8s-0
  name: prometheus-k8s-0
  namespace: openshift-monitoring
  ownerReferences:
  - apiVersion: apps/v1
    blockOwnerDeletion: true
    controller: true
    kind: StatefulSet
    name: prometheus-k8s
    uid: 08334f30-2552-499b-9245-e4f61fe92a76
  resourceVersion: "112100"
  selfLink: /api/v1/namespaces/openshift-monitoring/pods/prometheus-k8s-0
  uid: 1087ca8f-9f00-486a-8471-60956e9c27a4

[stack@undercloud-0 ~]$ oc get po -A -o wide |grep 10.128.9.175
openshift-monitoring   alertmanager-main-2   5/5   Running   0   22h   10.128.9.175   ostest-f57bt-worker-vprrk   <none>   <none>

[stack@undercloud-0 ~]$ oc get po alertmanager-main-2 -n openshift-monitoring -o yaml
apiVersion: v1
kind: Pod
metadata:
  annotations:
    k8s.v1.cni.cncf.io/network-status: |-
      [{
          "name": "kuryr",
          "interface": "eth0",
          "ips": [
              "10.128.9.175"
          ],
          "mac": "fa:16:3e:c1:cd:25",
          "default": true,
          "dns": {}
      }]
    k8s.v1.cni.cncf.io/networks-status: |-
      [{
          "name": "kuryr",
          "interface": "eth0",
          "ips": [
              "10.128.9.175"
          ],
          "mac": "fa:16:3e:c1:cd:25",
          "default": true,
          "dns": {}
      }]
    openshift.io/scc: anyuid
    openstack.org/kuryr-pod-label: '{"alertmanager": "main", "app": "alertmanager", "controller-revision-hash": "alertmanager-main-5548759bbd", "statefulset.kubernetes.io/pod-name": "alertmanager-main-2"}'
    openstack.org/kuryr-vif: '{"versioned_object.changes": ["default_vif"], "versioned_object.data": {"additional_vifs": {}, "default_vif": {"versioned_object.changes": ["active", "has_traffic_filtering", "network", "address", "id", "preserve_on_delete", "vlan_id", "plugin", "vif_name"], "versioned_object.data": {"active": true, "address": "fa:16:3e:77:a3:12", "has_traffic_filtering": false, "id": "f6dd52db-40e1-4339-a7e6-1e2bd2f6f772", "network": {"versioned_object.changes": ["multi_host", "label", "should_provide_vlan", "should_provide_bridge", "mtu", "id", "subnets"], "versioned_object.data": {"id": "cc5c0761-5f89-42b8-a4fc-0d829eba818d", "label": "ns/openshift-monitoring-net", "mtu": 1442, "multi_host": false, "should_provide_bridge": false, "should_provide_vlan": false, "subnets": {"versioned_object.changes": ["objects"], "versioned_object.data": {"objects": [{"versioned_object.changes": ["routes", "dns", "cidr", "gateway", "ips"], "versioned_object.data": {"cidr": "10.128.8.0/23", "dns": [], "gateway": "10.128.8.1", "ips": {"versioned_object.changes": ["objects"], "versioned_object.data": {"objects": [{"versioned_object.changes": ["address"], "versioned_object.data": {"address": "10.128.9.238"}, "versioned_object.name": "FixedIP", "versioned_object.namespace": "os_vif", "versioned_object.version": "1.0"}]}, "versioned_object.name": "FixedIPList", "versioned_object.namespace": "os_vif", "versioned_object.version": "1.0"}, "routes": {"versioned_object.changes": ["objects"], "versioned_object.data": {"objects": []}, "versioned_object.name": "RouteList", "versioned_object.namespace": "os_vif", "versioned_object.version": "1.0"}}, "versioned_object.name": "Subnet", "versioned_object.namespace": "os_vif", "versioned_object.version": "1.0"}]}, "versioned_object.name": "SubnetList", "versioned_object.namespace": "os_vif", "versioned_object.version": "1.0"}}, "versioned_object.name": "Network", "versioned_object.namespace": "os_vif", "versioned_object.version": "1.1"}, "plugin": "noop", "preserve_on_delete": false, "vif_name": "tapf6dd52db-40", "vlan_id": 3914}, "versioned_object.name": "VIFVlanNested", "versioned_object.namespace": "os_vif", "versioned_object.version": "1.0"}}, "versioned_object.name": "PodState", "versioned_object.namespace": "os_vif", "versioned_object.version": "1.0"}'
  creationTimestamp: "2021-07-15T12:23:41Z"
  generateName: alertmanager-main-
  labels:
    alertmanager: main
    app: alertmanager
    controller-revision-hash: alertmanager-main-5548759bbd
    statefulset.kubernetes.io/pod-name: alertmanager-main-2
  name: alertmanager-main-2
  namespace: openshift-monitoring

(shiftstack) [stack@undercloud-0 ~]$ openstack port list |grep 10.128.9.175
| 88bdb7f9-65e6-4c54-83d1-73341876da08 | | fa:16:3e:c1:cd:25 | ip_address='10.128.9.175', subnet_id='a4ee6044-8ddd-4dbf-bcd3-22f95ec4ce16' | ACTIVE |
(shiftstack) [stack@undercloud-0 ~]$ openstack port list |grep 10.128.9.238
| f6dd52db-40e1-4339-a7e6-1e2bd2f6f772 | | fa:16:3e:77:a3:12 | ip_address='10.128.9.238', subnet_id='a4ee6044-8ddd-4dbf-bcd3-22f95ec4ce16' | ACTIVE |

(shiftstack) [stack@undercloud-0 ~]$ oc get co
NAME                                      VERSION   AVAILABLE   PROGRESSING   DEGRADED   SINCE
authentication                            4.6.31    True        False         False      22h
cloud-credential                          4.6.31    True        False         False      26h
cluster-autoscaler                        4.6.31    True        False         False      25h
config-operator                           4.6.31    True        False         False      25h
console                                   4.6.31    True        False         False      22h
csi-snapshot-controller                   4.6.31    True        False         False      25h
dns                                       4.5.40    True        False         False      25h
etcd                                      4.6.31    True        False         False      25h
image-registry                            4.6.31    True        False         False      25h
ingress                                   4.6.31    True        False         False      22h
insights                                  4.6.31    True        False         False      25h
kube-apiserver                            4.6.31    True        False         False      25h
kube-controller-manager                   4.6.31    True        False         False      25h
kube-scheduler                            4.6.31    True        False         False      25h
kube-storage-version-migrator             4.6.31    True        False         False      25h
machine-api                               4.6.31    True        False         False      25h
machine-approver                          4.6.31    True        False         False      25h
machine-config                            4.5.40    True        False         False      23h
marketplace                               4.6.31    True        False         False      22h
monitoring                                4.5.40    False       True          True       22h
network                                   4.5.40    True        True          False      25h
node-tuning                               4.6.31    True        False         False      22h
openshift-apiserver                       4.6.31    True        False         False      25h
openshift-controller-manager              4.6.31    True        False         False      22h
openshift-samples                         4.6.31    True        False         False      22h
operator-lifecycle-manager                4.6.31    True        False         False      25h
operator-lifecycle-manager-catalog        4.6.31    True        False         False      25h
operator-lifecycle-manager-packageserver  4.6.31    True        False         False      22h
service-ca                                4.6.31    True        False         False      25h
storage                                   4.6.31    True        False         False      22h

(shiftstack) [stack@undercloud-0 ~]$ oc get po -A -o wide |grep 10.128.9.238 |wc -l
0

The same issue could also occur on 3.11, since it is likewise based on annotations.

Version-Release number of selected component (if applicable):
Red Hat OpenStack Platform release 16.1.6 GA

How reproducible:

Steps to Reproduce:
1.
2.
3.

Actual results:

Expected results:

Additional info:
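The mismatch described in this report (the MAC and IP recorded in the openstack.org/kuryr-vif annotation differ from what k8s.v1.cni.cncf.io/network-status reports as actually plugged) can be checked per Pod with a short script. The following is only a sketch: it assumes the annotation layout shown in the output above, and the helper names annotated_vif, plugged_vif and mismatch are hypothetical, not part of kuryr-kubernetes.

```python
import json

KURYR_VIF = "openstack.org/kuryr-vif"
NET_STATUS = "k8s.v1.cni.cncf.io/network-status"


def _data(obj):
    """Unwrap the 'versioned_object.data' payload of a serialized object."""
    return obj["versioned_object.data"]


def annotated_vif(annotations):
    """Return (mac, [ips]) recorded in the kuryr-vif annotation, or None.

    Assumes the PodState layout seen in this report (hypothetical helper).
    """
    raw = annotations.get(KURYR_VIF)
    if raw is None:
        return None
    vif = _data(_data(json.loads(raw))["default_vif"])
    ips = []
    for subnet in _data(_data(vif["network"])["subnets"])["objects"]:
        for fixed_ip in _data(_data(subnet)["ips"])["objects"]:
            ips.append(_data(fixed_ip)["address"])
    return vif["address"], ips


def plugged_vif(annotations):
    """Return (mac, [ips]) of the default interface from network-status, or None."""
    raw = annotations.get(NET_STATUS)
    if raw is None:
        return None
    for entry in json.loads(raw):
        if entry.get("default"):
            return entry.get("mac"), entry.get("ips", [])
    return None


def mismatch(annotations):
    """True when both annotations exist and disagree on MAC or IPs."""
    annotated = annotated_vif(annotations)
    plugged = plugged_vif(annotations)
    if annotated is None or plugged is None:
        return False
    return annotated[0] != plugged[0] or sorted(annotated[1]) != sorted(plugged[1])
```

To use it, load the output of "oc get po <pod> -n <namespace> -o json" with json.load and pass the metadata.annotations dictionary to mismatch(). For alertmanager-main-2 as shown above, the two annotations disagree (fa:16:3e:77:a3:12 / 10.128.9.238 versus fa:16:3e:c1:cd:25 / 10.128.9.175), so it would be flagged.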
Removing the Triaged keyword because:
* the QE automation assessment (flag qe_test_coverage) is missing
Verified by deleting the prometheus-k8s-x and alertmanager-main-x pods several times and confirming that they were recreated successfully.

OCP 4.11.0-0.nightly-2022-05-25-193227
OSP RHOS-16.1-RHEL-8-2022032
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Important: OpenShift Container Platform 4.11.0 bug fix and security update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:5069