Bug 1983056

Summary: IP conflict while recreating Pod with fixed name
Product: OpenShift Container Platform Reporter: Maysa Macedo <mdemaced>
Component: Networking Assignee: Michał Dulko <mdulko>
Networking sub component: kuryr QA Contact: Itzik Brown <itbrown>
Status: CLOSED ERRATA Docs Contact:
Severity: low    
Priority: low CC: itbrown
Version: 4.5 Keywords: Triaged
Target Milestone: ---   
Target Release: 4.11.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: No Doc Update
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2022-08-10 10:36:52 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Maysa Macedo 2021-07-16 11:04:47 UTC
Description of problem:

During the upgrade from 4.5.40 to 4.6.31, the CNI is restarting because it is unable to plug the provided VIF, which is already being used by another Pod.

2021-07-16 10:55:02.580 232 ERROR kuryr_kubernetes.cni.daemon.service [-] Error when processing addNetwork request. CNI Params: {'CNI_IFNAME': 'eth0', 'CNI_NETNS': '/var/run/netns/0420f2a3-d2fe-40e6-86f0-9a38a17c933a', 'CNI_PATH': '/opt/multus/bin:/var/lib/cni/bin:/usr/libexec/cni', 'CNI_COMMAND': 'ADD', 'CNI_CONTAINERID': '73eee9240ae6bcfec8b539fa2b12c8e82f51f8a95f29aaaedc95e4e05f7cb734', 'CNI_ARGS': 'IgnoreUnknown=true;K8S_POD_NAMESPACE=openshift-monitoring;K8S_POD_NAME=prometheus-k8s-0;K8S_POD_INFRA_CONTAINER_ID=73eee9240ae6bcfec8b539fa2b12c8e82f51f8a95f29aaaedc95e4e05f7cb734'}: pyroute2.netlink.exceptions.NetlinkError: (17, 'File exists')
2021-07-16 10:55:02.580 232 ERROR kuryr_kubernetes.cni.daemon.service Traceback (most recent call last):
2021-07-16 10:55:02.580 232 ERROR kuryr_kubernetes.cni.daemon.service   File "/usr/lib/python3.6/site-packages/kuryr_kubernetes/cni/daemon/service.py", line 82, in add
2021-07-16 10:55:02.580 232 ERROR kuryr_kubernetes.cni.daemon.service     vif = self.plugin.add(params)
2021-07-16 10:55:02.580 232 ERROR kuryr_kubernetes.cni.daemon.service   File "/usr/lib/python3.6/site-packages/kuryr_kubernetes/cni/plugins/k8s_cni_registry.py", line 75, in add
2021-07-16 10:55:02.580 232 ERROR kuryr_kubernetes.cni.daemon.service     vifs = self._do_work(params, b_base.connect, timeout)
2021-07-16 10:55:02.580 232 ERROR kuryr_kubernetes.cni.daemon.service   File "/usr/lib/python3.6/site-packages/kuryr_kubernetes/cni/plugins/k8s_cni_registry.py", line 184, in _do_work
2021-07-16 10:55:02.580 232 ERROR kuryr_kubernetes.cni.daemon.service     container_id=params.CNI_CONTAINERID)
2021-07-16 10:55:02.580 232 ERROR kuryr_kubernetes.cni.daemon.service   File "/usr/lib/python3.6/site-packages/kuryr_kubernetes/cni/binding/base.py", line 156, in connect
2021-07-16 10:55:02.580 232 ERROR kuryr_kubernetes.cni.daemon.service     driver.connect(vif, ifname, netns, container_id)
2021-07-16 10:55:02.580 232 ERROR kuryr_kubernetes.cni.daemon.service   File "/usr/lib/python3.6/site-packages/kuryr_kubernetes/cni/binding/nested.py", line 126, in connect
2021-07-16 10:55:02.580 232 ERROR kuryr_kubernetes.cni.daemon.service     iface.net_ns_fd = utils.convert_netns(netns)
2021-07-16 10:55:02.580 232 ERROR kuryr_kubernetes.cni.daemon.service   File "/usr/lib/python3.6/site-packages/pyroute2/ipdb/transactional.py", line 209, in __exit__
2021-07-16 10:55:02.580 232 ERROR kuryr_kubernetes.cni.daemon.service     self.commit()
2021-07-16 10:55:02.580 232 ERROR kuryr_kubernetes.cni.daemon.service   File "/usr/lib/python3.6/site-packages/pyroute2/ipdb/interfaces.py", line 650, in commit
2021-07-16 10:55:02.580 232 ERROR kuryr_kubernetes.cni.daemon.service     raise newif
2021-07-16 10:55:02.580 232 ERROR kuryr_kubernetes.cni.daemon.service   File "/usr/lib/python3.6/site-packages/pyroute2/ipdb/interfaces.py", line 589, in commit
2021-07-16 10:55:02.580 232 ERROR kuryr_kubernetes.cni.daemon.service     self.nl.link('add', **request)
2021-07-16 10:55:02.580 232 ERROR kuryr_kubernetes.cni.daemon.service   File "/usr/lib/python3.6/site-packages/pyroute2/iproute/linux.py", line 1163, in link
2021-07-16 10:55:02.580 232 ERROR kuryr_kubernetes.cni.daemon.service     msg_flags=msg_flags)
2021-07-16 10:55:02.580 232 ERROR kuryr_kubernetes.cni.daemon.service   File "/usr/lib/python3.6/site-packages/pyroute2/netlink/nlsocket.py", line 373, in nlm_request
2021-07-16 10:55:02.580 232 ERROR kuryr_kubernetes.cni.daemon.service     return tuple(self._genlm_request(*argv, **kwarg))
2021-07-16 10:55:02.580 232 ERROR kuryr_kubernetes.cni.daemon.service   File "/usr/lib/python3.6/site-packages/pyroute2/netlink/nlsocket.py", line 864, in nlm_request
2021-07-16 10:55:02.580 232 ERROR kuryr_kubernetes.cni.daemon.service     callback=callback):
2021-07-16 10:55:02.580 232 ERROR kuryr_kubernetes.cni.daemon.service   File "/usr/lib/python3.6/site-packages/pyroute2/netlink/nlsocket.py", line 376, in get
2021-07-16 10:55:02.580 232 ERROR kuryr_kubernetes.cni.daemon.service     return tuple(self._genlm_get(*argv, **kwarg))
2021-07-16 10:55:02.580 232 ERROR kuryr_kubernetes.cni.daemon.service   File "/usr/lib/python3.6/site-packages/pyroute2/netlink/nlsocket.py", line 701, in get
2021-07-16 10:55:02.580 232 ERROR kuryr_kubernetes.cni.daemon.service     raise msg['header']['error']
2021-07-16 10:55:02.580 232 ERROR kuryr_kubernetes.cni.daemon.service pyroute2.netlink.exceptions.NetlinkError: (17, 'File exists')
2021-07-16 10:55:02.580 232 ERROR kuryr_kubernetes.cni.daemon.service 
2021-07-16 10:55:02.585 232 INFO werkzeug [-] 127.0.0.1 - - [16/Jul/2021 10:55:02] "POST /addNetwork HTTP/1.1" 500 -
2021-07-16 10:55:02.656 251 INFO os_vif [-] Successfully unplugged vif VIFVlanNested(active=True,address=fa:16:3e:c1:cd:25,has_traffic_filtering=False,id=88bdb7f9-65e6-4c54-83d1-73341876da08,network=Network(cc5c0761-5f89-42b8-a4fc-0d829eba818d),plugin='noop',port_profile=<?>,preserve_on_delete=False,vif_name='tap88bdb7f9-65',vlan_id=2482)
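
For reference, error code 17 in the traceback above is EEXIST on Linux: the kernel rejected the link 'add' request because the link already exists, which is consistent with the VIF already being plugged for another Pod. A minimal sketch for pulling that code out of a kuryr CNI log line; the helper name netlink_errno and the regular expression are only illustrative and are not part of kuryr-kubernetes or pyroute2:

```python
import errno
import re

# Matches the tail of the log line, e.g. "NetlinkError: (17, 'File exists')".
NETLINK_ERROR_RE = re.compile(r"NetlinkError: \((\d+), '([^']*)'\)")


def netlink_errno(log_line):
    """Return (code, symbolic errno name, message) for a NetlinkError line, or None."""
    match = NETLINK_ERROR_RE.search(log_line)
    if match is None:
        return None
    code = int(match.group(1))
    # errno.errorcode maps numeric codes to names such as "EEXIST".
    return code, errno.errorcode.get(code, "UNKNOWN"), match.group(2)


line = (
    "2021-07-16 10:55:02.580 232 ERROR kuryr_kubernetes.cni.daemon.service "
    "pyroute2.netlink.exceptions.NetlinkError: (17, 'File exists')"
)
print(netlink_errno(line))
```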

The prometheus Pod is configured to use the same IP as the alertmanager Pod, and the alertmanager Pod is using an IP different from the one specified in its annotation:

[stack@undercloud-0 ~]$ oc get po prometheus-k8s-0 -n openshift-monitoring -o yaml
apiVersion: v1
kind: Pod
metadata:
  annotations:
    openshift.io/scc: anyuid
    openstack.org/kuryr-pod-label: '{"app": "prometheus", "controller-revision-hash":
      "prometheus-k8s-5949f47544", "prometheus": "k8s", "statefulset.kubernetes.io/pod-name":
      "prometheus-k8s-0"}'
    openstack.org/kuryr-vif: '{"versioned_object.changes": ["default_vif"], "versioned_object.data":
      {"additional_vifs": {}, "default_vif": {"versioned_object.changes": ["has_traffic_filtering",
      "plugin", "active", "vif_name", "preserve_on_delete", "network", "id", "address",
      "vlan_id"], "versioned_object.data": {"active": true, "address": "fa:16:3e:c1:cd:25",
      "has_traffic_filtering": false, "id": "88bdb7f9-65e6-4c54-83d1-73341876da08",
      "network": {"versioned_object.changes": ["mtu", "multi_host", "subnets", "label",
      "id", "should_provide_bridge", "should_provide_vlan"], "versioned_object.data":
      {"id": "cc5c0761-5f89-42b8-a4fc-0d829eba818d", "label": "ns/openshift-monitoring-net",
      "mtu": 1442, "multi_host": false, "should_provide_bridge": false, "should_provide_vlan":
      false, "subnets": {"versioned_object.changes": ["objects"], "versioned_object.data":
      {"objects": [{"versioned_object.changes": ["ips", "gateway", "routes", "cidr",
      "dns"], "versioned_object.data": {"cidr": "10.128.8.0/23", "dns": [], "gateway":
      "10.128.8.1", "ips": {"versioned_object.changes": ["objects"], "versioned_object.data":
      {"objects": [{"versioned_object.changes": ["address"], "versioned_object.data":
      {"address": "10.128.9.175"}, "versioned_object.name": "FixedIP", "versioned_object.namespace":
      "os_vif", "versioned_object.version": "1.0"}]}, "versioned_object.name": "FixedIPList",
      "versioned_object.namespace": "os_vif", "versioned_object.version": "1.0"},
      "routes": {"versioned_object.changes": ["objects"], "versioned_object.data":
      {"objects": []}, "versioned_object.name": "RouteList", "versioned_object.namespace":
      "os_vif", "versioned_object.version": "1.0"}}, "versioned_object.name": "Subnet",
      "versioned_object.namespace": "os_vif", "versioned_object.version": "1.0"}]},
      "versioned_object.name": "SubnetList", "versioned_object.namespace": "os_vif",
      "versioned_object.version": "1.0"}}, "versioned_object.name": "Network", "versioned_object.namespace":
      "os_vif", "versioned_object.version": "1.1"}, "plugin": "noop", "preserve_on_delete":
      false, "vif_name": "tap88bdb7f9-65", "vlan_id": 2482}, "versioned_object.name":
      "VIFVlanNested", "versioned_object.namespace": "os_vif", "versioned_object.version":
      "1.0"}}, "versioned_object.name": "PodState", "versioned_object.namespace":
      "os_vif", "versioned_object.version": "1.0"}'
  creationTimestamp: "2021-07-15T12:24:52Z"
  generateName: prometheus-k8s-
  labels:
    app: prometheus
    controller-revision-hash: prometheus-k8s-5949f47544
    prometheus: k8s
    statefulset.kubernetes.io/pod-name: prometheus-k8s-0
  name: prometheus-k8s-0
  namespace: openshift-monitoring
  ownerReferences:
  - apiVersion: apps/v1
    blockOwnerDeletion: true
    controller: true
    kind: StatefulSet
    name: prometheus-k8s
    uid: 08334f30-2552-499b-9245-e4f61fe92a76
  resourceVersion: "112100"
  selfLink: /api/v1/namespaces/openshift-monitoring/pods/prometheus-k8s-0
  uid: 1087ca8f-9f00-486a-8471-60956e9c27a4


[stack@undercloud-0 ~]$ oc get po -A -o wide |grep 10.128.9.175
openshift-monitoring                               alertmanager-main-2                                          5/5     Running             0          22h     10.128.9.175     ostest-f57bt-worker-vprrk   <none>           <none>
[stack@undercloud-0 ~]$ oc get po alertmanager-main-2 -n openshift-monitoring -o yaml
apiVersion: v1
kind: Pod
metadata:
  annotations:
    k8s.v1.cni.cncf.io/network-status: |-
      [{
          "name": "kuryr",
          "interface": "eth0",
          "ips": [
              "10.128.9.175"
          ],
          "mac": "fa:16:3e:c1:cd:25",
          "default": true,
          "dns": {}
      }]
    k8s.v1.cni.cncf.io/networks-status: |-
      [{
          "name": "kuryr",
          "interface": "eth0",
          "ips": [
              "10.128.9.175"
          ],
          "mac": "fa:16:3e:c1:cd:25",
          "default": true,
          "dns": {}
      }]
    openshift.io/scc: anyuid
    openstack.org/kuryr-pod-label: '{"alertmanager": "main", "app": "alertmanager",
      "controller-revision-hash": "alertmanager-main-5548759bbd", "statefulset.kubernetes.io/pod-name":
      "alertmanager-main-2"}'
    openstack.org/kuryr-vif: '{"versioned_object.changes": ["default_vif"], "versioned_object.data":
      {"additional_vifs": {}, "default_vif": {"versioned_object.changes": ["active",
      "has_traffic_filtering", "network", "address", "id", "preserve_on_delete", "vlan_id",
      "plugin", "vif_name"], "versioned_object.data": {"active": true, "address":
      "fa:16:3e:77:a3:12", "has_traffic_filtering": false, "id": "f6dd52db-40e1-4339-a7e6-1e2bd2f6f772",
      "network": {"versioned_object.changes": ["multi_host", "label", "should_provide_vlan",
      "should_provide_bridge", "mtu", "id", "subnets"], "versioned_object.data": {"id":
      "cc5c0761-5f89-42b8-a4fc-0d829eba818d", "label": "ns/openshift-monitoring-net",
      "mtu": 1442, "multi_host": false, "should_provide_bridge": false, "should_provide_vlan":
      false, "subnets": {"versioned_object.changes": ["objects"], "versioned_object.data":
      {"objects": [{"versioned_object.changes": ["routes", "dns", "cidr", "gateway",
      "ips"], "versioned_object.data": {"cidr": "10.128.8.0/23", "dns": [], "gateway":
      "10.128.8.1", "ips": {"versioned_object.changes": ["objects"], "versioned_object.data":
      {"objects": [{"versioned_object.changes": ["address"], "versioned_object.data":
      {"address": "10.128.9.238"}, "versioned_object.name": "FixedIP", "versioned_object.namespace":
      "os_vif", "versioned_object.version": "1.0"}]}, "versioned_object.name": "FixedIPList",
      "versioned_object.namespace": "os_vif", "versioned_object.version": "1.0"},
      "routes": {"versioned_object.changes": ["objects"], "versioned_object.data":
      {"objects": []}, "versioned_object.name": "RouteList", "versioned_object.namespace":
      "os_vif", "versioned_object.version": "1.0"}}, "versioned_object.name": "Subnet",
      "versioned_object.namespace": "os_vif", "versioned_object.version": "1.0"}]},
      "versioned_object.name": "SubnetList", "versioned_object.namespace": "os_vif",
      "versioned_object.version": "1.0"}}, "versioned_object.name": "Network", "versioned_object.namespace":
      "os_vif", "versioned_object.version": "1.1"}, "plugin": "noop", "preserve_on_delete":
      false, "vif_name": "tapf6dd52db-40", "vlan_id": 3914}, "versioned_object.name":
      "VIFVlanNested", "versioned_object.namespace": "os_vif", "versioned_object.version":
      "1.0"}}, "versioned_object.name": "PodState", "versioned_object.namespace":
      "os_vif", "versioned_object.version": "1.0"}'
  creationTimestamp: "2021-07-15T12:23:41Z"
  generateName: alertmanager-main-
  labels:
    alertmanager: main
    app: alertmanager
    controller-revision-hash: alertmanager-main-5548759bbd
    statefulset.kubernetes.io/pod-name: alertmanager-main-2
  name: alertmanager-main-2
  namespace: openshift-monitoring

(shiftstack) [stack@undercloud-0 ~]$ openstack port list |grep 10.128.9.175
| 88bdb7f9-65e6-4c54-83d1-73341876da08 |                                                      | fa:16:3e:c1:cd:25 | ip_address='10.128.9.175', subnet_id='a4ee6044-8ddd-4dbf-bcd3-22f95ec4ce16'   | ACTIVE |

(shiftstack) [stack@undercloud-0 ~]$ openstack port list |grep 10.128.9.238
| f6dd52db-40e1-4339-a7e6-1e2bd2f6f772 |                                                      | fa:16:3e:77:a3:12 | ip_address='10.128.9.238', subnet_id='a4ee6044-8ddd-4dbf-bcd3-22f95ec4ce16'   | ACTIVE |

(shiftstack) [stack@undercloud-0 ~]$ oc get co
NAME                                       VERSION   AVAILABLE   PROGRESSING   DEGRADED   SINCE
authentication                             4.6.31    True        False         False      22h
cloud-credential                           4.6.31    True        False         False      26h
cluster-autoscaler                         4.6.31    True        False         False      25h
config-operator                            4.6.31    True        False         False      25h
console                                    4.6.31    True        False         False      22h
csi-snapshot-controller                    4.6.31    True        False         False      25h
dns                                        4.5.40    True        False         False      25h
etcd                                       4.6.31    True        False         False      25h
image-registry                             4.6.31    True        False         False      25h
ingress                                    4.6.31    True        False         False      22h
insights                                   4.6.31    True        False         False      25h
kube-apiserver                             4.6.31    True        False         False      25h
kube-controller-manager                    4.6.31    True        False         False      25h
kube-scheduler                             4.6.31    True        False         False      25h
kube-storage-version-migrator              4.6.31    True        False         False      25h
machine-api                                4.6.31    True        False         False      25h
machine-approver                           4.6.31    True        False         False      25h
machine-config                             4.5.40    True        False         False      23h
marketplace                                4.6.31    True        False         False      22h
monitoring                                 4.5.40    False       True          True       22h
network                                    4.5.40    True        True          False      25h
node-tuning                                4.6.31    True        False         False      22h
openshift-apiserver                        4.6.31    True        False         False      25h
openshift-controller-manager               4.6.31    True        False         False      22h
openshift-samples                          4.6.31    True        False         False      22h
operator-lifecycle-manager                 4.6.31    True        False         False      25h
operator-lifecycle-manager-catalog         4.6.31    True        False         False      25h
operator-lifecycle-manager-packageserver   4.6.31    True        False         False      22h
service-ca                                 4.6.31    True        False         False      25h
storage                                    4.6.31    True        False         False      22h

(shiftstack) [stack@undercloud-0 ~]$ oc get po -A -o wide |grep 10.128.9.238 |wc -l
0
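
The inconsistency shown in the outputs above (the interface plugged into alertmanager-main-2 does not match the VIF in its own annotation) could also be checked programmatically. A minimal sketch, assuming the Pod has been loaded as a dict (for example from `oc get po <name> -o json`) and that both annotations have the layout seen in the dumps above. All function names here are hypothetical and not part of kuryr-kubernetes:

```python
import json

DATA = "versioned_object.data"  # key wrapping each serialized os_vif object


def annotated_vif(pod):
    """Return (MAC, set of fixed IPs) from the openstack.org/kuryr-vif annotation."""
    raw = pod["metadata"]["annotations"]["openstack.org/kuryr-vif"]
    vif = json.loads(raw)[DATA]["default_vif"][DATA]
    subnets = vif["network"][DATA]["subnets"][DATA]["objects"]
    ips = {
        ip[DATA]["address"]
        for subnet in subnets
        for ip in subnet[DATA]["ips"][DATA]["objects"]
    }
    return vif["address"], ips


def plugged_vif(pod, network="kuryr"):
    """Return (MAC, set of IPs) from the network-status annotation, or None."""
    raw = pod["metadata"]["annotations"].get("k8s.v1.cni.cncf.io/network-status")
    if raw is None:
        return None  # Pod has not been plugged yet
    for entry in json.loads(raw):
        if entry.get("name") == network:
            return entry.get("mac"), set(entry.get("ips", []))
    return None


def vif_mismatch(pod):
    """True when the plugged interface differs from the annotated VIF."""
    plugged = plugged_vif(pod)
    if plugged is None:
        return False  # nothing plugged, nothing to compare
    return plugged != annotated_vif(pod)


def make_pod(ann_mac, ann_ip, real_mac=None, real_ip=None):
    """Build a trimmed-down Pod dict carrying only the two annotations."""
    fixed_ip = {DATA: {"address": ann_ip}}
    subnet = {DATA: {"ips": {DATA: {"objects": [fixed_ip]}}}}
    network = {DATA: {"subnets": {DATA: {"objects": [subnet]}}}}
    state = {DATA: {"default_vif": {DATA: {"address": ann_mac, "network": network}}}}
    annotations = {"openstack.org/kuryr-vif": json.dumps(state)}
    if real_mac is not None:
        status = [{"name": "kuryr", "interface": "eth0",
                   "ips": [real_ip], "mac": real_mac}]
        annotations["k8s.v1.cni.cncf.io/network-status"] = json.dumps(status)
    return {"metadata": {"annotations": annotations}}


# Values taken from the alertmanager-main-2 dump above.
alertmanager = make_pod("fa:16:3e:77:a3:12", "10.128.9.238",
                        "fa:16:3e:c1:cd:25", "10.128.9.175")
print(vif_mismatch(alertmanager))
```

Running the same check over every Pod in a namespace would flag Pods such as alertmanager-main-2 before another Pod is handed the port they are actually using.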

The same issue would be possible on 3.11 as it's also based on Annotations.

Version-Release number of selected component (if applicable):

Red Hat OpenStack Platform release 16.1.6 GA
How reproducible:


Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:

Comment 2 ShiftStack Bugwatcher 2022-03-05 07:07:11 UTC
Removing the Triaged keyword because:
* the QE automation assessment (flag qe_test_coverage) is missing

Comment 7 Itzik Brown 2022-06-01 13:19:37 UTC
Verified by deleting the prometheus-k8s-x and alertmanager-main-x pods several times and confirming that they were recreated successfully.
OCP 4.11.0-0.nightly-2022-05-25-193227
OSP RHOS-16.1-RHEL-8-2022032

Comment 9 errata-xmlrpc 2022-08-10 10:36:52 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: OpenShift Container Platform 4.11.0 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:5069