Bug 1973286
Summary: | Getting timeout error while waiting for OVS port binding when DS pod is recreated | |
---|---|---|---|
Product: | OpenShift Container Platform | Reporter: | Shelly Miron <smiron> |
Component: | Networking | Assignee: | Tim Rozet <trozet> |
Networking sub component: | ovn-kubernetes | QA Contact: | Shelly Miron <smiron> |
Status: | CLOSED ERRATA | Docs Contact: | |
Severity: | medium | ||
Priority: | medium | CC: | abeekhof, anbhat, bbennett, dcbw, dosmith, trozet, vpickard, weliang |
Version: | 4.8 | Flags: | trozet: needinfo-
Target Milestone: | --- | ||
Target Release: | 4.10.0 | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | | Doc Type: | If docs needed, set a value
Doc Text: | | Story Points: | ---
Clone Of: | | Environment: |
Last Closed: | 2022-03-16 11:12:07 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | | Category: | ---
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | |||
Bug Depends On: | |||
Bug Blocks: | 1969958 |
Comment 3
Tim Rozet
2021-06-23 20:56:00 UTC
Thanks Shelly for providing a live setup. I see what's going on now. When we get the delete event we tear down everything related to that node, including the OVN logical switch and the switch ports for all the containers associated with it. However, in this test the node comes back as a node ADD, and the containers still exist on the node. In other words, there are no new pod adds happening here. Because the ports have been deleted, and we do not get any new pod Add/Update events for the pods on this node, ovnkube-master never recreates the ports for these pods.

If we look in nbdb for worker-0-2, we see it barely has any ports for pods:

    switch a6476b33-71b9-4e3c-9fa5-9729b284ba50 (worker-0-2)
        port k8s-worker-0-2
            addresses: ["aa:16:3f:21:e1:69 10.129.2.2"]
        port stor-worker-0-2
            type: router
            addresses: ["0a:58:0a:81:02:01"]
            router-port: rtos-worker-0-2
        port openshift-monitoring_prometheus-adapter-7d9df9b7f6-k7rrw
            addresses: ["0a:58:0a:81:02:04 10.129.2.4"]

This is a sign that ovnkube-master is the root cause, because it should have created ports for the containers that are trying to launch via CNI on the node. If I restart the active ovnkube-master, all the missing ports are added and the pods go to Running state:

    [root@master-0-1 ~]# ovn-nbctl --no-leader-only show worker-0-2
    switch a6476b33-71b9-4e3c-9fa5-9729b284ba50 (worker-0-2)
        port openshift-ingress-canary_ingress-canary-pkm9s
            addresses: ["0a:58:0a:81:02:06 10.129.2.6"]
        port openshift-multus_network-metrics-daemon-258kt
            addresses: ["0a:58:0a:81:02:03 10.129.2.3"]
        port openshift-monitoring_thanos-querier-6794c57947-2pwtv
            addresses: ["0a:58:0a:81:02:20 10.129.2.32"]
        port openshift-monitoring_alertmanager-main-1
            addresses: ["0a:58:0a:81:02:22 10.129.2.34"]
        port openshift-dns_dns-default-cmj9p
            addresses: ["0a:58:0a:81:02:05 10.129.2.5"]
        port openshift-image-registry_image-registry-678cc4657-45jqd
            addresses: ["0a:58:0a:81:02:1f 10.129.2.31"]
        port openshift-operators_poison-pill-ds-xzlps
            addresses: ["0a:58:0a:81:02:23 10.129.2.35"]
        port k8s-worker-0-2
            addresses: ["aa:16:3f:21:e1:69 10.129.2.2"]
        port stor-worker-0-2
            type: router
            addresses: ["0a:58:0a:81:02:01"]
            router-port: rtos-worker-0-2
        port openshift-monitoring_prometheus-k8s-1
            addresses: ["0a:58:0a:81:02:21 10.129.2.33"]
        port openshift-network-diagnostics_network-check-target-8q9q6
            addresses: ["0a:58:0a:81:02:2b 10.129.2.43"]
        port openshift-monitoring_prometheus-adapter-7d9df9b7f6-k7rrw
            addresses: ["0a:58:0a:81:02:04 10.129.2.4"]

This is because an ovnkube-master restart causes the informers to replay all of the pod events, forcing us to create the ports that are missing in nbdb.

tl;dr: it looks like we need to be able to reconcile missing pods during node add to handle this case.

Shelly, is there a way I can reproduce this state of a node being deleted, but its pods still existing? I want to add an upstream test for this as well, but I cannot exhaust the storage or memory of a node as we use kubernetes in docker (KIND) and a very small VM to handle our upstream tests.
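A minimal sketch of that reconciliation idea, assuming a client-go based controller: on a node Add event, list the pods already scheduled to that node and push each one back through pod-add handling so that logical switch ports removed during the earlier node delete are recreated. The names `reconcileNodePods` and `ensurePodLogicalPort` are illustrative placeholders, not the actual ovn-kubernetes code.

```go
// Illustrative sketch only; not the real ovn-kubernetes implementation.
package nodereconcile

import (
	"context"
	"fmt"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// ensurePodLogicalPort is a hypothetical stand-in for the controller's real
// pod-add handler, which creates the pod's logical switch port in NB DB.
func ensurePodLogicalPort(pod *corev1.Pod) error {
	fmt.Printf("re-adding logical switch port for %s/%s\n", pod.Namespace, pod.Name)
	return nil
}

// reconcileNodePods is what a node Add handler could call: every pod already
// scheduled to the (re-)added node is re-run through pod-add handling, so any
// port deleted during a previous node delete is recreated.
func reconcileNodePods(ctx context.Context, client kubernetes.Interface, nodeName string) error {
	pods, err := client.CoreV1().Pods(metav1.NamespaceAll).List(ctx, metav1.ListOptions{
		FieldSelector: "spec.nodeName=" + nodeName,
	})
	if err != nil {
		return fmt.Errorf("listing pods on node %s: %w", nodeName, err)
	}
	for i := range pods.Items {
		pod := &pods.Items[i]
		if pod.Spec.HostNetwork {
			continue // host-network pods have no OVN logical switch port
		}
		if err := ensurePodLogicalPort(pod); err != nil {
			return err
		}
	}
	return nil
}
```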
Tim, thanks for the detailed analysis. Can we have a fix in the 4.8 timeframe? Is there a way to work around this from an operator? For example, can we modify some CR to trigger the replay of all pod events?

If that helps to reproduce, the flow is basically the following (see the client-go sketch at the end of this page):

1. Mark the node as unschedulable (by setting node.Spec.Unschedulable to true).
2. Keep a copy of the node object (can be simulated by oc get node <nodeName> -oyaml > backup.yaml).
3. The node itself runs a software reboot command.
4. Delete the node (oc delete node <nodeName>).
5. Restore the node (can be simulated with oc apply -f backup.yaml).

I don't have an OVN cluster to check whether this reproduces; I will try next week. Note that when we restore the node, we also restore all the OVN annotations. We can avoid that if it would help.

I was able to reproduce using the steps you listed, Nir. It looks like if you delete a node, there is a grace period before its pods are evicted and new pods are created when the node is re-added. If you re-add the node within the grace period, the existing containers stay put. I have a fix up at https://github.com/ovn-org/ovn-kubernetes/pull/2281 and will start a downstream merge for it.

*** Bug 1969958 has been marked as a duplicate of this bug. ***

Surya, I found a bug in the logic to add retry pods during node add: https://github.com/openshift/ovn-kubernetes/pull/926. Without this fix, the pod retry cache could become corrupted and the pod would never be retried correctly.

*** Bug 2047334 has been marked as a duplicate of this bug. ***

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.10.4 bug fix update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2022:0811
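For reference, the reproduction flow in the numbered steps above can also be driven through client-go rather than oc. This is a hedged sketch only, not part of the fix or of any upstream test: it cordons the node, keeps a copy of the node object, deletes the node while it reboots out of band, and re-creates it from the copy.

```go
// Hedged, illustrative reproduction helper; mirrors the oc commands in the
// steps above (cordon, back up node object, delete node, re-create node).
package main

import (
	"context"
	"fmt"
	"os"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	if len(os.Args) != 2 {
		fmt.Fprintln(os.Stderr, "usage: repro <node-name>")
		os.Exit(1)
	}
	nodeName := os.Args[1]

	config, err := clientcmd.BuildConfigFromFlags("", os.Getenv("KUBECONFIG"))
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(config)
	ctx := context.Background()

	// Step 1: mark the node unschedulable (node.Spec.Unschedulable = true).
	node, err := client.CoreV1().Nodes().Get(ctx, nodeName, metav1.GetOptions{})
	if err != nil {
		panic(err)
	}
	node.Spec.Unschedulable = true
	node, err = client.CoreV1().Nodes().Update(ctx, node, metav1.UpdateOptions{})
	if err != nil {
		panic(err)
	}

	// Step 2: keep a copy of the node object (the backup.yaml equivalent),
	// stripped of fields the API server will not accept on re-create.
	backup := node.DeepCopy()
	backup.ResourceVersion = ""
	backup.UID = ""

	// Step 3: the node itself runs a software reboot, out of band.

	// Step 4: delete the node (oc delete node <nodeName>).
	if err := client.CoreV1().Nodes().Delete(ctx, nodeName, metav1.DeleteOptions{}); err != nil {
		panic(err)
	}

	// Step 5: restore the node from the saved copy (oc apply -f backup.yaml),
	// including its annotations, as noted in the steps above.
	restored := &corev1.Node{ObjectMeta: backup.ObjectMeta, Spec: backup.Spec}
	if _, err := client.CoreV1().Nodes().Create(ctx, restored, metav1.CreateOptions{}); err != nil {
		panic(err)
	}
	fmt.Println("node", nodeName, "deleted and re-created")
}
```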