Bug 1952819 - failed to configure pod interface: error while waiting on flows for pod: timed out waiting for OVS flows
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 4.7
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: 4.8.0
Assignee: Federico Paolinelli
QA Contact: Anurag saxena
URL:
Whiteboard:
Duplicates: 1983109
Depends On:
Blocks: 1962608
 
Reported: 2021-04-23 08:58 UTC by mchebbi@redhat.com
Modified: 2022-03-23 05:59 UTC
25 users

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Clones: 1962608
Environment:
Last Closed: 2021-07-27 23:03:27 UTC
Target Upstream Version:


Attachments


Links
System ID Private Priority Status Summary Last Updated
Github eBay go-ovn pull 148 0 None closed Connect - reconnect cache correctness 2021-05-19 13:46:25 UTC
Github openshift ovn-kubernetes pull 528 0 None closed Bug 1952819: handle nbdb deletes on reconnect / delete stale ports using goovn client 2021-05-19 13:46:20 UTC
Github openshift ovn-kubernetes pull 537 0 None closed Bug 1952819: Bump the go-ovn version to f0122836cc35fcadf72de8698b26157c5a42a9f8 2021-05-19 13:46:22 UTC
Red Hat Knowledge Base (Solution) 6217031 0 None None None 2021-07-27 20:42:33 UTC
Red Hat Product Errata RHSA-2021:2438 0 None None None 2021-07-27 23:03:53 UTC

Internal Links: 2005181

Description mchebbi@redhat.com 2021-04-23 08:58:54 UTC
gathered data URL: shorturl.at/clCQS

Hello,

The customer upgraded to 4.7.6, but the monitoring operator is degraded.

[mchebbi@fedora namespaces]$ omg get pod  -n openshift-monitoring -o wide
NAME                                          READY  STATUS   RESTARTS  AGE    IP           NODE
alertmanager-main-0                           0/5    Pending  0         10m                 openshift-dev-hw62p-worker-xbrn9          <=== no ip address
alertmanager-main-1                           5/5    Running  0         11m    10.131.0.17  openshift-dev-hw62p-worker-cch82
alertmanager-main-2                           0/5    Pending  0         10m                 openshift-dev-hw62p-worker-lc5wk          <=== no ip address
cluster-monitoring-operator-68bd597889-k9ftp  2/2    Running  0         1h16m  10.128.0.18  openshift-dev-hw62p-master-1
grafana-7874c696f6-j9z6p                      2/2    Running  0         1h13m  10.131.0.14  openshift-dev-hw62p-worker-cch82
kube-state-metrics-67c786b7f6-n9vrh           3/3    Running  0         1h13m  10.129.2.18  openshift-dev-hw62p-worker-lc5wk
node-exporter-48z57                           2/2    Running  0         1h53m  172.21.8.53  openshift-dev-hw62p-master-2
node-exporter-5kms7                           2/2    Running  0         1h52m  172.21.8.55  openshift-dev-hw62p-master-1
node-exporter-7ljnl                           2/2    Running  0         1h53m  172.21.8.59  openshift-dev-hw62p-master-0
node-exporter-k5647                           2/2    Running  0         1h52m  172.21.8.72  openshift-dev-hw62p-worker-cch82
node-exporter-svjf4                           2/2    Running  0         1h52m  172.21.8.73  openshift-dev-hw62p-worker-lc5wk
node-exporter-x7vf4                           2/2    Running  0         1h51m  172.21.8.80  openshift-dev-hw62p-worker-xbrn9
openshift-state-metrics-56b7d9ff8-8tjnj       3/3    Running  0         1h13m  10.129.2.12  openshift-dev-hw62p-worker-lc5wk
prometheus-adapter-77db96648b-jz6dt           1/1    Running  0         1h13m  10.129.2.19  openshift-dev-hw62p-worker-lc5wk
prometheus-adapter-77db96648b-xdbcr           1/1    Running  0         1h13m  10.129.2.9   openshift-dev-hw62p-worker-lc5wk
prometheus-k8s-0                              0/7    Pending  0         33m                 openshift-dev-hw62p-worker-xbrn9          <=== no ip address
prometheus-k8s-1                              7/7    Running  1         1h22m  10.131.0.10  openshift-dev-hw62p-worker-cch82
prometheus-operator-67bff998f9-5lhtq          2/2    Running  0         1h16m  10.128.0.16  openshift-dev-hw62p-master-1
telemeter-client-78fdcf7976-tn5f4             3/3    Running  0         1h13m  10.129.2.13  openshift-dev-hw62p-worker-lc5wk
thanos-querier-856ffddfcf-9vdjc               5/5    Running  0         1h13m  10.129.2.15  openshift-dev-hw62p-worker-lc5wk
thanos-querier-856ffddfcf-wzzr8               5/5    Running  0         1h13m  10.131.0.11  openshift-dev-hw62p-worker-cch82

$ oc get po -n openshift-monitoring 
NAMESPACE                                          NAME                                                      READY   STATUS              RESTARTS   AGE
openshift-monitoring                               alertmanager-main-0                                       0/5     ContainerCreating   0          16h
openshift-monitoring                               alertmanager-main-1                                       0/5     ContainerCreating   0          16h
openshift-monitoring                               alertmanager-main-2                                       5/5     Running             0          16h
openshift-monitoring                               cluster-monitoring-operator-68bd597889-6h6bh              2/2     Running             0          16h
openshift-monitoring                               grafana-7874c696f6-4gk9v                                  2/2     Running             0          16h
openshift-monitoring                               kube-state-metrics-67c786b7f6-6x9lp                       3/3     Running             0          16h
openshift-monitoring                               node-exporter-48z57                                       2/2     Running             0          21h
openshift-monitoring                               node-exporter-5kms7                                       2/2     Running             0          21h
openshift-monitoring                               node-exporter-7ljnl                                       2/2     Running             0          21h
openshift-monitoring                               node-exporter-k5647                                       2/2     Running             0          21h
openshift-monitoring                               node-exporter-svjf4                                       2/2     Running             0          21h
openshift-monitoring                               node-exporter-x7vf4                                       2/2     Running             0          21h
openshift-monitoring                               openshift-state-metrics-56b7d9ff8-mcmdn                   3/3     Running             0          16h
openshift-monitoring                               prometheus-adapter-6f9548f67f-fbn2v                       1/1     Running             0          3h59m
openshift-monitoring                               prometheus-adapter-6f9548f67f-twg7c                       1/1     Running             0          3h59m
openshift-monitoring                               prometheus-k8s-0                                          0/7     ContainerCreating   0          16h
openshift-monitoring                               prometheus-k8s-1                                          7/7     Running             0          16h
openshift-monitoring                               prometheus-operator-67bff998f9-9d7bx                      2/2     Running             0          16h
openshift-monitoring                               telemeter-client-78fdcf7976-zzkmm                         3/3     Running             0          16h
openshift-monitoring                               thanos-querier-856ffddfcf-c2pwv                           5/5     Running             0          16h
openshift-monitoring                               thanos-querier-856ffddfcf-pnmbp                           5/5     Running             0          16h
=======================================================================================================================================================================================

[root@ocp-dev-console ocp]# oc get events -n openshift-monitoring
NAMESPACE                            LAST SEEN   TYPE      REASON                   OBJECT                                     MESSAGE
openshift-monitoring                 20s         Warning   FailedCreatePodSandBox   pod/alertmanager-main-0                    (combined from similar events): Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_alertmanager-main-0_openshift-monitoring_28ace80c-c675-43f9-a5b5-7cc6b8589eef_0(dd714fe33a86d802c9b16523a6fca8f574f59a9e31bfe70ed6bdbf147dc0a84a): [openshift-monitoring/alertmanager-main-0:ovn-kubernetes]: error adding container to network "ovn-kubernetes": CNI request failed with status 400: '[openshift-monitoring/alertmanager-main-0 dd714fe33a86d802c9b16523a6fca8f574f59a9e31bfe70ed6bdbf147dc0a84a] [openshift-monitoring/alertmanager-main-0 dd714fe33a86d802c9b16523a6fca8f574f59a9e31bfe70ed6bdbf147dc0a84a] failed to configure pod interface: error while waiting on flows for pod: timed out waiting for OVS flows
'
openshift-monitoring                 35s         Warning   FailedCreatePodSandBox   pod/alertmanager-main-1                    (combined from similar events): Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_alertmanager-main-1_openshift-monitoring_6fe22f18-6d30-449b-b6fc-949992ed0688_0(97449af2ab65893a4925852a00b3beac360626dc220cf44789489fc59620b85f): [openshift-monitoring/alertmanager-main-1:ovn-kubernetes]: error adding container to network "ovn-kubernetes": CNI request failed with status 400: '[openshift-monitoring/alertmanager-main-1 97449af2ab65893a4925852a00b3beac360626dc220cf44789489fc59620b85f] [openshift-monitoring/alertmanager-main-1 97449af2ab65893a4925852a00b3beac360626dc220cf44789489fc59620b85f] failed to configure pod interface: error while waiting on flows for pod: timed out waiting for OVS flows
'
openshift-monitoring                 2m24s       Warning   FailedCreatePodSandBox   pod/prometheus-k8s-0                       (combined from similar events): Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_prometheus-k8s-0_openshift-monitoring_b305a9e5-0b53-4c72-9d2e-e040bfe6629d_0(e0321fe6a997fc5129072f5c5c59a271b87a80cdc7b091180da0f6f20ecb1609): [openshift-monitoring/prometheus-k8s-0:ovn-kubernetes]: error adding container to network "ovn-kubernetes": CNI request failed with status 400: '[openshift-monitoring/prometheus-k8s-0 e0321fe6a997fc5129072f5c5c59a271b87a80cdc7b091180da0f6f20ecb1609] [openshift-monitoring/prometheus-k8s-0 e0321fe6a997fc5129072f5c5c59a271b87a80cdc7b091180da0f6f20ecb1609] failed to configure pod interface: error while waiting on flows for pod: timed out waiting for OVS flows
=======================================================================================================================================================================================

[root@ocp-dev-console ocp]# oc get clusterversion 
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.7.6     True        False         20h     Error while reconciling 4.7.6: the cluster operator monitoring has not yet successfully rolled out
=======================================================================================================================================================================================

[root@ocp-dev-console ocp]# oc get co
NAME                                       VERSION   AVAILABLE   PROGRESSING   DEGRADED   SINCE
monitoring                                 4.7.6     False       True          True       16h
=======================================================================================================================================================================================
[mchebbi@fedora logs]$ omg get pods -n openshift-monitoring|grep -i pending
alertmanager-main-0                           0/5    Pending  0         10m
alertmanager-main-2                           0/5    Pending  0         10m
prometheus-k8s-0                              0/7    Pending  0         33m


2021-04-19T12:42:45.003983515Z I0419 12:42:45.003948       1 tasks.go:46] ran task 15 of 15: Updating User Workload Thanos Ruler
2021-04-19T12:46:48.337049930Z I0419 12:46:48.336997       1 operator.go:237] ConfigMap to configure stack does not exist. Reconciling with default config every 15m0s.
2021-04-19T12:47:35.051205028Z I0419 12:47:35.051173       1 tasks.go:46] ran task 7 of 15: Updating Alertmanager
2021-04-19T12:47:37.478192708Z I0419 12:47:37.478159       1 tasks.go:46] ran task 5 of 15: Updating Prometheus-k8s
2021-04-19T12:47:37.478228208Z I0419 12:47:37.478188       1 operator.go:374] Updating ClusterOperator status to failed. Err: running task Updating Alertmanager failed: waiting for Alertmanager object changes failed: waiting for Alertmanager openshift-monitoring/main: expected 3 replicas, got 1 updated replicas
2021-04-19T12:47:37.486270474Z E0419 12:47:37.486240       1 operator.go:306] Syncing "openshift-monitoring/cluster-monitoring-config" failed
2021-04-19T12:47:37.486297985Z E0419 12:47:37.486266       1 operator.go:307] sync "openshift-monitoring/cluster-monitoring-config" failed: running task Updating Alertmanager failed: waiting for Alertmanager object changes failed: waiting for Alertmanager openshift-monitoring/main: expected 3 replicas, got 1 updated replicas
2021-04-19T12:47:37.486297985Z W0419 12:47:37.486292       1 operator.go:427] No Cluster Monitoring ConfigMap was found. Using defaults.
2021-04-19T12:47:37.503450533Z I0419 12:47:37.503400       1 operator.go:366] Updating ClusterOperator status to in progress.
2021-04-19T12:47:48.581634518Z I0419 12:47:48.581600       1 tasks.go:46] ran task 15 of 15: Updating User Workload Thanos Ruler
2021-04-19T12:51:48.146898447Z I0419 12:51:48.146852       1 operator.go:279] Triggering an update due to ConfigMap or Secret: kube-system/extension-apiserver-authentication
2021-04-19T12:51:48.224846771Z I0419 12:51:48.224815       1 operator.go:279] Triggering an update due to ConfigMap or Secret: openshift-monitoring/prometheus-adapter-tls
2021-04-19T12:51:48.224936825Z I0419 12:51:48.224911       1 operator.go:279] Triggering an update due to ConfigMap or Secret: openshift-monitoring/grpc-tls
2021-04-19T12:51:48.226328174Z I0419 12:51:48.226304       1 operator.go:279] Triggering an update due to ConfigMap or Secret: openshift-config-managed/kubelet-serving-ca
2021-04-19T12:51:48.291335021Z I0419 12:51:48.291300       1 operator.go:279] Triggering an update due to ConfigMap or Secret: openshift-monitoring/alertmanager-trusted-ca-bundle
2021-04-19T12:51:48.291335021Z I0419 12:51:48.291325       1 operator.go:279] Triggering an update due to ConfigMap or Secret: openshift-monitoring/telemeter-trusted-ca-bundle
2021-04-19T12:51:48.337058104Z I0419 12:51:48.336998       1 operator.go:237] ConfigMap to configure stack does not exist. Reconciling with default config every 15m0s.


- lastTransitionTime: "2021-04-19T09:15:31Z"
      message: Done applying 4.7.6
      status: "True"
      type: Available
    - lastTransitionTime: "2021-04-19T11:54:47Z"
      message: Cluster operator monitoring is not available
      reason: ClusterOperatorNotAvailable
      status: "True"
      type: Failing
    - lastTransitionTime: "2021-04-19T11:42:23Z"
      message: 'Error while reconciling 4.7.6: the cluster operator monitoring has not yet successfully rolled out'
      reason: ClusterOperatorNotAvailable
      status: "False"
      type: Progressing

===========================================

alertmanager-main-0

- lastProbeTime: null
    lastTransitionTime: '2021-04-19T12:41:42Z'
    message: 'containers with unready status: [alertmanager config-reloader alertmanager-proxy
      kube-rbac-proxy prom-label-proxy]'
    reason: ContainersNotReady
    status: 'False'
    type: Ready
===========================================
alertmanager-main-2

- lastProbeTime: null
    lastTransitionTime: '2021-04-19T12:42:02Z'
    message: 'containers with unready status: [alertmanager config-reloader alertmanager-proxy
      kube-rbac-proxy prom-label-proxy]'
    reason: ContainersNotReady
    status: 'False'

-------------------------

Thanks for your help.

Comment 2 Tim Rozet 2021-04-27 13:20:04 UTC
Looking at the logs:

ovnkube-master-7dl6w/ovnkube-master/ovnkube-master/logs/current.log:2021-04-19T11:35:24.233947454Z I0419 11:35:24.233935       1 pods.go:297] LSP already exists for port: openshift-monitoring_prometheus-k8s-0
ovnkube-master-7dl6w/ovnkube-master/ovnkube-master/logs/current.log:2021-04-19T11:35:24.327140074Z I0419 11:35:24.327114       1 pods.go:261] [openshift-monitoring/prometheus-k8s-0] addLogicalPort took 93.1917ms
ovnkube-master-86vmg/ovnkube-master/ovnkube-master/logs/current.log:2021-04-19T11:40:01.923797846Z I0419 11:40:01.923775       1 pods.go:60] Stale logical port found: openshift-monitoring_prometheus-k8s-0. This logical port will be deleted.
ovnkube-master-86vmg/ovnkube-master/ovnkube-master/logs/current.log:2021-04-19T11:40:01.925549283Z 2021-04-19T11:40:01.925Z|00290|nbctl|INFO|Running command run --if-exists -- lsp-del openshift-monitoring_prometheus-k8s-0
ovnkube-master-86vmg/ovnkube-master/ovnkube-master/logs/current.log:2021-04-19T11:40:29.156372889Z I0419 11:40:29.156364       1 pods.go:297] LSP already exists for port: openshift-monitoring_prometheus-k8s-0
ovnkube-master-86vmg/ovnkube-master/ovnkube-master/logs/current.log:2021-04-19T11:40:29.156415283Z I0419 11:40:29.156403       1 kube.go:63] Setting annotations map[k8s.ovn.org/pod-networks:{"default":{"ip_addresses":["10.129.2.17/23"],"mac_address":"0a:58:0a:81:02:11","gateway_ips":["10.129.2.1"],"routes":[{"dest":"10.132.0.0/14","nextHop":"10.129.2.3"}],"ip_address":"10.129.2.17/23","gateway_ip":"10.129.2.1"}}] on pod openshift-monitoring/prometheus-k8s-0
ovnkube-master-86vmg/ovnkube-master/ovnkube-master/logs/current.log:2021-04-19T11:40:29.208380617Z I0419 11:40:29.208352       1 pods.go:261] [openshift-monitoring/prometheus-k8s-0] addLogicalPort took 51.992078ms
ovnkube-master-86vmg/ovnkube-master/ovnkube-master/logs/current.log:2021-04-19T11:40:29.208508428Z E0419 11:40:29.208494       1 ovn.go:627] error while creating logical port openshift-monitoring_prometheus-k8s-0 error: connection is shut down

One issue I see here is that we sync the pod via an nbctl command, but then try to add the port again using the go-ovn bindings. It's possible we could run into a cache-out-of-sync issue where the go-ovn cache thinks the port is still there, but it has really been deleted via nbctl. We should fix the sync method to use go-ovn as well. I'm not saying that's the root cause of this bug; I just noticed it while looking at the logs.
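The hazard above can be sketched as follows. This is a hypothetical, simplified model, not the real go-ovn API: a caching client answers existence checks from its local mirror of the nbdb, while an nbctl-style delete changes the database directly. If the matching deletion event is missed or not yet processed, the cache answers from stale state.

```go
package main

import "fmt"

// nbCache is a hypothetical stand-in for the go-ovn bindings' local
// mirror of the nbdb; it is only updated when events are processed.
type nbCache struct {
	lsps map[string]bool // logical switch port name -> present
}

// LSPExists answers from the cache, the way a caching client would.
func (c *nbCache) LSPExists(name string) bool { return c.lsps[name] }

func main() {
	const port = "openshift-monitoring_prometheus-k8s-0"

	db := map[string]bool{port: true}                // the nbdb itself
	mirror := &nbCache{lsps: map[string]bool{port: true}} // client's cache

	// `ovn-nbctl lsp-del` removes the row from the database; the cache is
	// only fixed up later, when (and if) the deletion event is processed.
	delete(db, port)

	fmt.Println(db[port])              // the database agrees the port is gone
	fmt.Println(mirror.LSPExists(port)) // the cache still reports it present
}
```

The divergence between the two answers is exactly the "LSP already exists" symptom in the logs above.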

Comment 3 Tim Rozet 2021-04-27 13:40:41 UTC
What looks odd from these logs is the timing between binding and annotations:

namespaces/openshift-ovn-kubernetes/pods/ovnkube-master-86vmg/ovnkube-master/ovnkube-master/logs/current.log:2021-04-19T11:40:29.156415283Z I0419 11:40:29.156403       1 kube.go:63] Setting annotations map[k8s.ovn.org/pod-networks:{"default":{"ip_addresses":["10.129.2.17/23"],"mac_address":"0a:58:0a:81:02:11","gateway_ips":["10.129.2.1"],"routes":[{"dest":"10.132.0.0/14","nextHop":"10.129.2.3"}],"ip_address":"10.129.2.17/23","gateway_ip":"10.129.2.1"}}] on pod openshift-monitoring/prometheus-k8s-0


[trozet@trozet ci.]$ grep -r prometheus-k8s-0 * | grep ovn-controller
namespaces/openshift-ovn-kubernetes/pods/ovnkube-node-whdjz/ovn-controller/ovn-controller/logs/current.log:2021-04-19T11:26:22.817976433Z 2021-04-19T11:26:22Z|00038|binding|INFO|Releasing lport openshift-monitoring_prometheus-k8s-0 from this chassis.
namespaces/openshift-ovn-kubernetes/pods/ovnkube-node-vmzzl/ovn-controller/ovn-controller/logs/current.log:2021-04-19T11:26:36.082279873Z 2021-04-19T11:26:36Z|00028|binding|INFO|Claiming lport openshift-monitoring_prometheus-k8s-0 for this chassis.
namespaces/openshift-ovn-kubernetes/pods/ovnkube-node-vmzzl/ovn-controller/ovn-controller/logs/current.log:2021-04-19T11:26:36.082279873Z 2021-04-19T11:26:36Z|00029|binding|INFO|openshift-monitoring_prometheus-k8s-0: Claiming 0a:58:0a:80:02:09 10.128.2.9
namespaces/openshift-ovn-kubernetes/pods/ovnkube-node-vmzzl/ovn-controller/ovn-controller/logs/current.log:2021-04-19T11:38:58.508607415Z 2021-04-19T11:38:58Z|00049|binding|INFO|Releasing lport openshift-monitoring_prometheus-k8s-0 from this chassis.


We can see that at 11:26:36 ovn-controller bound to 10.128.2.9, but we don't see that new IP annotated until:
namespaces/openshift-ovn-kubernetes/pods/ovnkube-master-86vmg/ovnkube-master/ovnkube-master/logs/current.log:2021-04-19T12:19:31.964007339Z I0419 12:19:31.963998       1 kube.go:63] Setting annotations map[k8s.ovn.org/pod-networks:{"default":{"ip_addresses":["10.128.2.9/23"],"mac_address":"0a:58:0a:80:02:09","gateway_ips":["10.128.2.1"],"routes":[{"dest":"10.132.0.0/14","nextHop":"10.128.2.3"}],"ip_address":"10.128.2.9/23","gateway_ip":"10.128.2.1"}}] on pod openshift-monitoring/prometheus-k8s-0

Comment 5 Federico Paolinelli 2021-04-30 15:41:11 UTC
We think we found what happened here.


When ovnkube-master started, it found that the prometheus pod was not running and deleted its port.

2021-04-19T11:40:01.923797846Z I0419 11:40:01.923775       1 pods.go:60] Stale logical port found: openshift-monitoring_prometheus-k8s-0. This logical port will be deleted.
2021-04-19T11:40:01.925549283Z 2021-04-19T11:40:01.925Z|00290|nbctl|INFO|Running command run --if-exists -- lsp-del openshift-monitoring_prometheus-k8s-0


There was then a disconnection between ovnkube-master and the nbdb:

2021-04-19T11:40:26.193749278Z 2021/04/19 11:40:26 rpc2: client protocol error: read tcp 172.21.8.55:52212->172.21.8.53:9642: read: connection reset by peer
2021-04-19T11:40:26.193812317Z 2021/04/19 11:40:26 ssl:172.21.8.53:9642,ssl:172.21.8.55:9642,ssl:172.21.8.59:9642 disconnected. Reconnecting ... 
2021-04-19T11:40:26.701238456Z 2021/04/19 11:40:26 ssl:172.21.8.53:9642,ssl:172.21.8.55:9642,ssl:172.21.8.59:9642 reconnected.
2021-04-19T11:40:28.754648075Z 2021/04/19 11:40:28 rpc2: client protocol error: read tcp 172.21.8.55:45308->172.21.8.53:9641: read: connection reset by peer
2021-04-19T11:40:28.754903588Z 2021/04/19 11:40:28 ssl:172.21.8.53:9641,ssl:172.21.8.55:9641,ssl:172.21.8.59:9641 disconnected. Reconnecting ... 

which recovered here:

2021-04-19T11:40:29.286480541Z 2021/04/19 11:40:29 ssl:172.21.8.53:9641,ssl:172.21.8.55:9641,ssl:172.21.8.59:9641 reconnected.

In the meantime, the prometheus pod came back (with a fixed name, which is associated with the logical switch port):

2021-04-19T11:40:29.156372889Z I0419 11:40:29.156364       1 pods.go:297] LSP already exists for port: openshift-monitoring_prometheus-k8s-0
2021-04-19T11:40:29.156415283Z I0419 11:40:29.156403       1 kube.go:63] Setting annotations map[k8s.ovn.org/pod-networks:{"default":{"ip_addresses":[
2021-04-19T11:40:29.208380617Z I0419 11:40:29.208352       1 pods.go:261] [openshift-monitoring/prometheus-k8s-0] addLogicalPort took 51.992078ms
2021-04-19T11:40:29.208508428Z E0419 11:40:29.208494       1 ovn.go:627] error while creating logical port openshift-monitoring_prometheus-k8s-0 error: connection is shut down


What's odd is that the master thinks the LSP is there even though the connection is not up. This is because the go-ovn binding library we use holds a cache of the contents of the nbdb.
The cache is refreshed when it receives update (or deletion) events, and is realigned when the master starts or when it reconnects.
We found an issue where deletions that happen while the connection is down are not processed on reconnect, which causes ovnkube-master to think that the port is still there when it's not.
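The failure mode can be sketched with a toy model (hypothetical names, not the go-ovn API): if reconnecting merely resumes the event stream, a delete that happened during the outage leaves a stale cache entry, whereas rebuilding the cache from the server's current contents on reconnect drops it. The linked go-ovn PR addresses reconnect cache correctness; whether it does so exactly this way is not shown here.

```go
package main

import "fmt"

// cache is a toy stand-in for the client-side mirror of an nbdb table.
type cache map[string]bool

// resumeOnly models the buggy reconnect: the old cache is kept as-is,
// so events missed during the disconnection are never applied.
func resumeOnly(c cache, _ map[string]bool) cache { return c }

// fullResync models the fix: rebuild the cache from the server's
// current contents on every reconnect.
func fullResync(_ cache, server map[string]bool) cache {
	fresh := cache{}
	for k := range server {
		fresh[k] = true
	}
	return fresh
}

func main() {
	const lsp = "openshift-monitoring_prometheus-k8s-0"
	server := map[string]bool{lsp: true}
	c := cache{lsp: true}

	delete(server, lsp) // lsp-del happens while the client is disconnected

	fmt.Println(resumeOnly(c, server)[lsp]) // stale: "LSP already exists" for a port that is gone
	fmt.Println(fullResync(c, server)[lsp]) // matches the database again
}
```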






Comment 25 Tim Rozet 2021-07-22 16:09:02 UTC
*** Bug 1983109 has been marked as a duplicate of this bug. ***

Comment 30 errata-xmlrpc 2021-07-27 23:03:27 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:2438

