Description of problem: When editing an EgressIP: > oc edit egressip egressip ~~~ 1 # Please edit the object below. Lines beginning with a '#' will be ignored, 2 # and an empty file will abort the edit. If an error occurs while saving this file will be 3 # reopened with the relevant failures. 4 # 5 apiVersion: k8s.ovn.org/v1 6 kind: EgressIP 7 metadata: 8 annotations: 9 kubectl.kubernetes.io/last-applied-configuration: | 10 {"apiVersion":"k8s.ovn.org/v1","kind":"EgressIP","metadata":{"annotations":{},"name":"egressip"},"spec":{"egressIPs":["10.0.129.5"],"namespaceSelector":{"matchLabels":{"kubernetes.io/metadata.name":"egressip"}}}} 11 creationTimestamp: "2022-07-09T19:48:26Z" 12 generation: 8 13 name: egressip 14 resourceVersion: "38677" 15 uid: 84cb1880-39ce-4ffa-bdd6-1adaeb8507ce 16 spec: 17 egressIPs: 18 - 10.0.129.7 19 namespaceSelector: 20 matchLabels: 21 kubernetes.io/metadata.name: egressip 22 status: 23 items: 24 - egressIP: 10.0.129.7 25 node: ci-ln-birng3t-1d09d-qkr8s-worker-centralus1-28kjf ~~~ To: ~~~ 1 # Please edit the object below. Lines beginning with a '#' will be ignored, 2 # and an empty file will abort the edit. If an error occurs while saving this file will be 3 # reopened with the relevant failures. 4 # 5 apiVersion: k8s.ovn.org/v1 6 kind: EgressIP 7 metadata: 8 annotations: 9 kubectl.kubernetes.io/last-applied-configuration: | 10 {"apiVersion":"k8s.ovn.org/v1","kind":"EgressIP","metadata":{"annotations":{},"name":"egressip"},"spec":{"egressIPs":["10.0.129.5"],"namespaceSelector":{"matchLabels":{"kubernetes.io/metadata.name":"egressip"}}}} 11 creationTimestamp: "2022-07-09T19:48:26Z" 12 generation: 8 13 name: egressip 14 resourceVersion: "38677" 15 uid: 84cb1880-39ce-4ffa-bdd6-1adaeb8507ce 16 spec: 17 egressIPs: 18 - 10.0.129.7 19 namespaceSelector: 20 matchLabels: 21 kubernetes.io/metadata.name: egressip 22 status: 23 items: 24 - egressIP: 10.0.129.10 25 node: ci-ln-birng3t-1d09d-qkr8s-worker-centralus1-28kjf ~~~ The code is subject to races whenever executeCloudPrivateIPConfigChange executed the following code ... ~~~ oc.eIPC.pendingCloudPrivateIPConfigsOps[egressIPName] = ops ~~~ ... before removePendingOpsAndGetResyncs called by reconcileCloudPrivateIPConfig has a chance to process the delete operation. This happens because the add operation unconditionally overwrites the current entry in oc.eIPC.pendingCloudPrivateIPConfigsOps[egressIPName]. The delete operation thus reports a failure and is requeued and the following log message is the result: ~~~ [akaris@linux 2101992]$ oc logs -c ovnkube-master -n openshift-ovn-kubernetes ovnkube-master-gtwvn -f | grep "no pending operation found" I0709 19:54:42.118862 1 obj_retry.go:1119] *v1.CloudPrivateIPConfig retry delete failed for 10.0.129.7, will try again later: no pending operation found for EgressIP: egressip I0709 19:55:12.129453 1 obj_retry.go:1119] *v1.CloudPrivateIPConfig retry delete failed for 10.0.129.7, will try again later: no pending operation found for EgressIP: egressip I0709 19:55:42.114429 1 obj_retry.go:1119] *v1.CloudPrivateIPConfig retry delete failed for 10.0.129.7, will try again later: no pending operation found for EgressIP: egressip I0709 19:56:12.118145 1 obj_retry.go:1119] *v1.CloudPrivateIPConfig retry delete failed for 10.0.129.7, will try again later: no pending operation found for EgressIP: egressip I0709 19:57:42.119788 1 obj_retry.go:1119] *v1.CloudPrivateIPConfig retry delete failed for 10.0.129.7, will try again later: no pending operation found for EgressIP: egressip I0709 19:59:12.117858 1 obj_retry.go:1119] *v1.CloudPrivateIPConfig retry delete failed for 10.0.129.7, will try again later: no pending operation found for EgressIP: egressip I0709 20:00:42.116703 1 obj_retry.go:1119] *v1.CloudPrivateIPConfig retry delete failed for 10.0.129.7, will try again later: no pending operation found for EgressIP: egressip ~~~ Version-Release number of selected component (if applicable): How reproducible: Steps to Reproduce: 1. 2. 3. Actual results: Expected results: Additional info:
*** Bug 2105712 has been marked as a duplicate of this bug. ***
How to reproduce: * spawn cluster on Azure ~~~ [akaris@linux 2101992]$ oc get clusterversion NAME VERSION AVAILABLE PROGRESSING SINCE STATUS version 4.11.0-0.nightly-2022-07-08-231743 True False 30m Cluster version is 4.11.0-0.nightly-2022-07-08-231743 ~~~ * create an egressip setup, e.g. [0] * edit the egressIP and change the IP address, e.g. from: ~~~ egressIPs: - 10.0.129.7 ~~~ to: ~~~ egressIPs: - 10.0.129.8 ~~~ * check the ovnkube-master leader's logs for message: ~~~ [akaris@linux 2101992]$ p=ovnkube-master-wxp4z; oc logs -n openshift-ovn-kubernetes $p -c ovnkube-master -f | grep 'will try again later: no pending operation found for EgressIP: egressip' I0710 17:47:11.513301 1 obj_retry.go:1119] *v1.CloudPrivateIPConfig retry delete failed for 10.0.129.7, will try again later: no pending operation found for EgressIP: egressip I0710 17:47:41.498750 1 obj_retry.go:1119] *v1.CloudPrivateIPConfig retry delete failed for 10.0.129.7, will try again later: no pending operation found for EgressIP: egressip I0710 17:48:11.500208 1 obj_retry.go:1119] *v1.CloudPrivateIPConfig retry delete failed for 10.0.129.7, will try again later: no pending operation found for EgressIP: egressip I0710 17:48:41.502791 1 obj_retry.go:1119] *v1.CloudPrivateIPConfig retry delete failed for 10.0.129.7, will try again later: no pending operation found for EgressIP: egressip I0710 17:50:11.499971 1 obj_retry.go:1119] *v1.CloudPrivateIPConfig retry delete failed for 10.0.129.7, will try again later: no pending operation found for EgressIP: egressip I0710 17:51:41.499073 1 obj_retry.go:1119] *v1.CloudPrivateIPConfig retry delete failed for 10.0.129.7, will try again later: no pending operation found for EgressIP: egressip ~~~ That message shows even though the update was successful: ~~~ [akaris@linux 2101992]$ oc get egressip NAME EGRESSIPS ASSIGNED NODE ASSIGNED EGRESSIPS egressip 10.0.129.8 ci-ln-4p9hg2b-1d09d-c4pt7-worker-eastus21-g2f59 10.0.129.8 ~~~ This is a race, so several updates to the IP might be needed. Potentially you might have to restart the ovnkube-master pod if you can't hit it, at all. ================================================== [0] [akaris@linux 2101992]$ cat fedora.yaml # oc adm policy add-scc-to-user privileged -z default apiVersion: apps/v1 kind: Deployment metadata: name: fedora-deployment labels: app: fedora-deployment spec: replicas: 1 selector: matchLabels: app: fedora-pod template: metadata: labels: app: fedora-pod spec: containers: - name: fedora image: fedora command: - sleep - infinity imagePullPolicy: IfNotPresent securityContext: runAsUser: 0 capabilities: add: - "SETFCAP" - "CAP_NET_RAW" - "CAP_NET_ADMIN" [akaris@linux 2101992]$ cat egressip.yaml apiVersion: k8s.ovn.org/v1 kind: EgressIP metadata: name: egressip spec: egressIPs: - 10.0.129.7 namespaceSelector: matchLabels: kubernetes.io/metadata.name: egressip [akaris@linux 2101992]$ cat label.sh #!/bin/bash EGRESS_ASSIGNABLE_LABEL="k8s.ovn.org/egress-assignable" n=$(oc get nodes -o name -l node-role.kubernetes.io/worker= | head -1) oc label $n ${EGRESS_ASSIGNABLE_LABEL}="" --overwrite echo "When picking the egressip, select it from the following range:" oc get -o yaml $n | grep egress-ipconfig oc new-project egressip oc adm policy add-scc-to-user privileged -z default oc apply -f fedora.yaml oc apply -f egressip.yaml
Downstream merge https://github.com/openshift/ovn-kubernetes/pull/1231 landed
$ oc get clusterversion NAME VERSION AVAILABLE PROGRESSING SINCE STATUS version 4.12.0-0.nightly-2022-08-15-092951 True False 65m Cluster version is 4.12.0-0.nightly-2022-08-15-092951 $ oc get node NAME STATUS ROLES AGE VERSION jechen-0815a-7nnn7-master-0 Ready control-plane,master 98m v1.24.0+da80cd0 jechen-0815a-7nnn7-master-1 Ready control-plane,master 98m v1.24.0+da80cd0 jechen-0815a-7nnn7-master-2 Ready control-plane,master 99m v1.24.0+da80cd0 jechen-0815a-7nnn7-worker-southcentralus-1 Ready worker 81m v1.24.0+da80cd0 jechen-0815a-7nnn7-worker-southcentralus-2 Ready worker 82m v1.24.0+da80cd0 jechen-0815a-7nnn7-worker-southcentralus-3 Ready worker 82m v1.24.0+da80cd0 $ oc label node jechen-0815a-7nnn7-worker-southcentralus-1 "k8s.ovn.org/egress-assignable"="" node/jechen-0815a-7nnn7-worker-southcentralus-1 labeled $ cat config_egressip1_ovn_ns_team_red_azure.yaml apiVersion: k8s.ovn.org/v1 kind: EgressIP metadata: name: egressip1 spec: egressIPs: - 10.0.1.201 namespaceSelector: matchLabels: team: red $ oc new-project test $ oc create -f config_egressip1_ovn_ns_team_red_azure.yaml egressip.k8s.ovn.org/egressip1 created $ oc get egressips.k8s.ovn.org NAME EGRESSIPS ASSIGNED NODE ASSIGNED EGRESSIPS egressip1 10.0.1.201 jechen-0815a-7nnn7-worker-southcentralus-1 10.0.1.201 $ oc get egressip egressip1 -oyaml apiVersion: k8s.ovn.org/v1 kind: EgressIP metadata: creationTimestamp: "2022-08-15T19:17:30Z" generation: 3 name: egressip1 resourceVersion: "64866" uid: 7dbfee19-db99-4e64-950a-f73a9dbfd3e8 spec: egressIPs: - 10.0.1.201 namespaceSelector: matchLabels: team: red status: items: - egressIP: 10.0.1.201 node: jechen-0815a-7nnn7-worker-southcentralus-1 $ oc get -o jsonpath='{.metadata.annotations.control-plane\.alpha\.kubernetes\.io/leader}' -n openshift-ovn-kubernetes cm ovn-kubernetes-master {"holderIdentity":"jechen-0815a-7nnn7-master-0","leaseDurationSeconds":60,"acquireTime":"2022-08-15T17:35:59Z","renewTime":"2022-08-15T19:20:48Z","leaderTransitions":0} $ oc get pod -n openshift-ovn-kubernetes -l app=ovnkube-master --field-selector=spec.nodeName=jechen-0815a-7nnn7-master-0 -o jsonpath={.items[*].metadata.name} ovnkube-master-5dd9t # updated egressip from 10.0.1.201 to 10.0.1.202, then check log on leader node ovnkube-master pod $ p=ovnkube-master-5dd9t; oc logs -n openshift-ovn-kubernetes $p -c ovnkube-master -f | grep 'will try again later: no pending operation found for EgressIP: egressip' ==> no CloudPrivateIPConfig retry message with "will try again later: no pending operation found for EgressIP: egressip" was found in log $ oc get egressip NAME EGRESSIPS ASSIGNED NODE ASSIGNED EGRESSIPS egressip1 10.0.1.202 jechen-0815a-7nnn7-worker-southcentralus-1 10.0.1.202 $ oc get egressip egressip1 -oyaml apiVersion: k8s.ovn.org/v1 kind: EgressIP metadata: creationTimestamp: "2022-08-15T19:17:30Z" generation: 5 name: egressip1 resourceVersion: "65451" uid: 7dbfee19-db99-4e64-950a-f73a9dbfd3e8 spec: egressIPs: - 10.0.1.202 namespaceSelector: matchLabels: team: red status: items: - egressIP: 10.0.1.202 node: jechen-0815a-7nnn7-worker-southcentralus-1 ==> egressip was updated correctly
verified in 4.12.0-0.nightly-2022-08-15-092951 above.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.12.0 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2022:7399