Bug 1973286 - Timeout error while waiting for OVS port binding when a DaemonSet pod is recreated
Summary: Timeout error while waiting for OVS port binding when a DaemonSet pod is recreated
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 4.8
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 4.10.0
Assignee: Tim Rozet
QA Contact: Shelly Miron
URL:
Whiteboard:
Duplicates: 1969958 2047334 (view as bug list)
Depends On:
Blocks: 1969958
 
Reported: 2021-06-17 14:47 UTC by Shelly Miron
Modified: 2022-03-16 11:12 UTC (History)
8 users (show)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-03-16 11:12:07 UTC
Target Upstream Version:
Embargoed:
trozet: needinfo-




Links
System ID Private Priority Status Summary Last Updated
Github openshift ovn-kubernetes pull 600 0 None Merged Bug 1973286: Merge 2021-07-06 2022-01-31 12:32:55 UTC
Github openshift ovn-kubernetes pull 926 0 None Merged Bug 2043961: Fix pod-creation-retry 2022-01-31 12:32:53 UTC
Github ovn-org ovn-kubernetes pull 2281 0 None Merged Adds back logical ports for pods during node add 2022-01-31 12:32:58 UTC
Red Hat Product Errata RHBA-2022:0811 0 None None None 2022-03-16 11:12:32 UTC

Comment 3 Tim Rozet 2021-06-23 20:56:00 UTC
Can you please recreate this setup so I can debug it live? The logs are unfortunately cut off/missing, so it's hard to tell exactly why the containers cannot be created. If that isn't an option, then please try to get a must-gather without deleting/re-adding the node, and also grab the network must-gather.

One thing I noticed: after you deleted worker-0-1 and brought it back, ovnkube-node crashed a number of times, with the previous container log showing:

2021-06-17T12:52:18.503485080Z E0617 12:52:18.503211   92439 healthcheck.go:188] Fatal error: ofport of patch-br-ex_worker-0-1-to-br-int has changed from 11 to 12


I believe this patch port is changing because:
[trozet@fedora 1973286]$ cat master-kvrmk.log  | grep ls-del | grep ext_worker-0-1
2021-06-17T12:36:50.101915693Z 2021-06-17T12:36:50.101Z|152356|nbctl|INFO|Running command run --if-exists -- ls-del ext_worker-0-1
2021-06-17T12:38:23.122604598Z 2021-06-17T12:38:23.122Z|152819|nbctl|INFO|Running command run --if-exists -- ls-del ext_worker-0-1
2021-06-17T12:39:55.130813574Z 2021-06-17T12:39:55.130Z|153402|nbctl|INFO|Running command run --if-exists -- ls-del ext_worker-0-1
2021-06-17T12:41:27.116256832Z 2021-06-17T12:41:27.116Z|153649|nbctl|INFO|Running command run --if-exists -- ls-del ext_worker-0-1
2021-06-17T12:42:59.096477260Z 2021-06-17T12:42:59.096Z|153790|nbctl|INFO|Running command run --if-exists -- ls-del ext_worker-0-1
2021-06-17T12:44:32.173426933Z 2021-06-17T12:44:32.173Z|154291|nbctl|INFO|Running command run --if-exists -- ls-del ext_worker-0-1
2021-06-17T12:46:04.105295242Z 2021-06-17T12:46:04.104Z|155023|nbctl|INFO|Running command run --if-exists -- ls-del ext_worker-0-1
2021-06-17T12:47:37.110030879Z 2021-06-17T12:47:37.109Z|155163|nbctl|INFO|Running command run --if-exists -- ls-del ext_worker-0-1
2021-06-17T12:49:10.112840715Z 2021-06-17T12:49:10.112Z|155303|nbctl|INFO|Running command run --if-exists -- ls-del ext_worker-0-1
2021-06-17T12:50:42.106781719Z 2021-06-17T12:50:42.106Z|155766|nbctl|INFO|Running command run --if-exists -- ls-del ext_worker-0-1
2021-06-17T12:52:15.105149269Z 2021-06-17T12:52:15.104Z|156336|nbctl|INFO|Running command run --if-exists -- ls-del ext_worker-0-1
2021-06-17T12:53:48.101879536Z 2021-06-17T12:53:48.101Z|156560|nbctl|INFO|Running command run --if-exists -- ls-del ext_worker-0-1
2021-06-17T12:55:20.120689498Z 2021-06-17T12:55:20.120Z|156700|nbctl|INFO|Running command run --if-exists -- ls-del ext_worker-0-1
[trozet@fedora 1973286]$ cat master-kvrmk.log  | grep ls-del | grep ext_worker-0-1 | wc -l
13


More logs reveal there are a bunch of adds/deletes sequentially:
2021-06-17T12:52:15.105149269Z 2021-06-17T12:52:15.104Z|156336|nbctl|INFO|Running command run --if-exists -- ls-del ext_worker-0-1
2021-06-17T12:52:16.436966415Z 2021-06-17T12:52:16.436Z|156370|nbctl|INFO|Running command run --may-exist -- ls-add ext_worker-0-1
2021-06-17T12:52:16.448744564Z 2021-06-17T12:52:16.448Z|156371|nbctl|INFO|Running command run -- --may-exist lsp-add ext_worker-0-1 br-ex_worker-0-1 -- lsp-set-addresses br-ex_worker-0-1 unknown -- lsp-set-type br-ex_worker-0-1 localnet -- lsp-set-options br-ex_worker-0-1 network_name=physnet -- set logical_switch_port br-ex_worker-0-1 tag_request=0
2021-06-17T12:52:16.471584179Z 2021-06-17T12:52:16.471Z|156373|nbctl|INFO|Running command run -- --may-exist lsp-add ext_worker-0-1 etor-GR_worker-0-1 -- set logical_switch_port etor-GR_worker-0-1 type=router options:router-port=rtoe-GR_worker-0-1 "addresses=\"52:54:00:c2:0c:15\""
2021-06-17T12:52:16.725633591Z 2021-06-17T12:52:16.725Z|156422|nbctl|INFO|Running command run --may-exist -- ls-add ext_worker-0-1
2021-06-17T12:52:16.729942478Z 2021-06-17T12:52:16.729Z|156424|nbctl|INFO|Running command run -- --may-exist lsp-add ext_worker-0-1 br-ex_worker-0-1 -- lsp-set-addresses br-ex_worker-0-1 unknown -- lsp-set-type br-ex_worker-0-1 localnet -- lsp-set-options br-ex_worker-0-1 network_name=physnet -- set logical_switch_port br-ex_worker-0-1 tag_request=0
2021-06-17T12:52:16.752157853Z 2021-06-17T12:52:16.752Z|156430|nbctl|INFO|Running command run -- --may-exist lsp-add ext_worker-0-1 etor-GR_worker-0-1 -- set logical_switch_port etor-GR_worker-0-1 type=router options:router-port=rtoe-GR_worker-0-1 "addresses=\"52:54:00:c2:0c:15\""
2021-06-17T12:53:48.101879536Z 2021-06-17T12:53:48.101Z|156560|nbctl|INFO|Running command run --if-exists -- ls-del ext_worker-0-1
2021-06-17T12:53:49.504964179Z 2021-06-17T12:53:49.504Z|156594|nbctl|INFO|Running command run --may-exist -- ls-add ext_worker-0-1
2021-06-17T12:53:49.527507951Z 2021-06-17T12:53:49.526Z|156595|nbctl|INFO|Running command run -- --may-exist lsp-add ext_worker-0-1 br-ex_worker-0-1 -- lsp-set-addresses br-ex_worker-0-1 unknown -- lsp-set-type br-ex_worker-0-1 localnet -- lsp-set-options br-ex_worker-0-1 network_name=physnet -- set logical_switch_port br-ex_worker-0-1 tag_request=0
2021-06-17T12:53:49.556131703Z 2021-06-17T12:53:49.555Z|156597|nbctl|INFO|Running command run -- --may-exist lsp-add ext_worker-0-1 etor-GR_worker-0-1 -- set logical_switch_port etor-GR_worker-0-1 type=router options:router-port=rtoe-GR_worker-0-1 "addresses=\"52:54:00:c2:0c:15\""



This happens when a node is deleted, but from your description it seems like you only deleted the node once? So why are we seeing so many add/delete events for this node?
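
As an aside on the healthcheck error above: ovnkube-node records the OpenFlow port number of the node's patch port to br-int and exits fatally when it changes. A quick way to confirm the current value directly on the node (just a sketch; the patch port name is taken from the log above):

# Prints the current OpenFlow port number of the patch port. A value that
# differs from the one ovnkube-node last recorded (11 in the log above)
# explains the fatal healthcheck exit and the container restarts.
ovs-vsctl get Interface patch-br-ex_worker-0-1-to-br-int ofport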

Comment 4 Tim Rozet 2021-06-25 03:13:38 UTC
Thanks, Shelly, for providing a live setup. I see what's going on now. When we get the node delete event we tear down everything related to that node, including the OVN logical switch and the switch ports for all the containers associated with it. However, in this test the node comes back as a node ADD, and the containers still exist on the node; in other words, there are no new pod adds happening here. Because the ports have been deleted, and we do not get any new pod Add/Update events for the pods on this node, ovnkube-master never recreates the ports for these pods. If we look in nbdb for worker-0-2, we see it has barely any ports for pods:

switch a6476b33-71b9-4e3c-9fa5-9729b284ba50 (worker-0-2)
    port k8s-worker-0-2
        addresses: ["aa:16:3f:21:e1:69 10.129.2.2"]
    port stor-worker-0-2
        type: router
        addresses: ["0a:58:0a:81:02:01"]
        router-port: rtos-worker-0-2
    port openshift-monitoring_prometheus-adapter-7d9df9b7f6-k7rrw
        addresses: ["0a:58:0a:81:02:04 10.129.2.4"]


This is a sign that ovnkube-master is the root cause, because it should have created ports for the containers that are trying to launch via CNI on the node. If I restart the active ovnkube-master, all the missing ports are added and the pods go to the Running state:

[root@master-0-1 ~]# ovn-nbctl --no-leader-only show worker-0-2
switch a6476b33-71b9-4e3c-9fa5-9729b284ba50 (worker-0-2)
    port openshift-ingress-canary_ingress-canary-pkm9s
        addresses: ["0a:58:0a:81:02:06 10.129.2.6"]
    port openshift-multus_network-metrics-daemon-258kt
        addresses: ["0a:58:0a:81:02:03 10.129.2.3"]
    port openshift-monitoring_thanos-querier-6794c57947-2pwtv
        addresses: ["0a:58:0a:81:02:20 10.129.2.32"]
    port openshift-monitoring_alertmanager-main-1
        addresses: ["0a:58:0a:81:02:22 10.129.2.34"]
    port openshift-dns_dns-default-cmj9p
        addresses: ["0a:58:0a:81:02:05 10.129.2.5"]
    port openshift-image-registry_image-registry-678cc4657-45jqd
        addresses: ["0a:58:0a:81:02:1f 10.129.2.31"]
    port openshift-operators_poison-pill-ds-xzlps
        addresses: ["0a:58:0a:81:02:23 10.129.2.35"]
    port k8s-worker-0-2
        addresses: ["aa:16:3f:21:e1:69 10.129.2.2"]
    port stor-worker-0-2
        type: router
        addresses: ["0a:58:0a:81:02:01"]
        router-port: rtos-worker-0-2
    port openshift-monitoring_prometheus-k8s-1
        addresses: ["0a:58:0a:81:02:21 10.129.2.33"]
    port openshift-network-diagnostics_network-check-target-8q9q6
        addresses: ["0a:58:0a:81:02:2b 10.129.2.43"]
    port openshift-monitoring_prometheus-adapter-7d9df9b7f6-k7rrw
        addresses: ["0a:58:0a:81:02:04 10.129.2.4"]


This is because an ovnkube-master restart will cause the informers to replay all of the pod events, forcing us to create ports that are missing in nbdb.
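
As a temporary workaround on an affected cluster, forcing that replay just means restarting the ovnkube-master pods, e.g. something like the following (assuming the stock openshift-ovn-kubernetes deployment and its app=ovnkube-master label):

# Deleting the ovnkube-master pods makes the replacement instance/leader
# replay all pod events from its informers and recreate any logical switch
# ports that are missing from nbdb.
oc -n openshift-ovn-kubernetes delete pod -l app=ovnkube-master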

tl;dr: it looks like we need to reconcile missing pods during node add to handle this case. Shelly, is there a way I can reproduce this state of a node being deleted while its pods still exist? I want to add an upstream test for this as well, but I cannot exhaust the storage or memory of a node, since we use Kubernetes in Docker (KIND) and a very small VM for our upstream tests.

Comment 5 Nir 2021-06-25 07:02:05 UTC
Tim, thanks for the detailed analysis.
Can we have a fix in the 4.8 timeframe?
Is there a way to work around this from an operator? E.g., can we modify some CR to trigger the replay of all pod events?

If it helps to reproduce, the flow is basically as follows (a script sketch is included at the end of this comment):
1. marking the node as unschedulable (by setting node.Spec.Unschedulable to true)
2. keeping a copy of the node object (can be simulated by oc get node <nodeName> -o yaml > backup.yaml)
3. the node itself runs a software reboot command
4. deleting the node (oc delete node <nodeName>)
5. restoring the node (can be simulated with oc apply -f backup.yaml)

I don't have an OVN cluster to check whether this reproduces; I will try next week.
Note that when we restore the node, we also restore all of the OVN annotations. We can avoid that if it would help.
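
A rough script for the flow above (only a sketch; the node name, SSH access, and reboot method are assumptions, not part of the report):

NODE=worker-0-2   # the worker under test

# 1. Mark the node unschedulable
oc adm cordon "$NODE"

# 2. Keep a copy of the node object (including its OVN annotations)
oc get node "$NODE" -o yaml > backup.yaml

# 3. Reboot the node itself (assumes SSH access to the RHCOS host)
ssh core@"$NODE" sudo systemctl reboot

# 4. Delete the node while it is down
oc delete node "$NODE"

# 5. Restore the node object once the host is back up
oc apply -f backup.yaml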

Comment 6 Tim Rozet 2021-06-25 21:06:28 UTC
I was able to reproduce using the steps you listed, Nir. It looks like when you delete a node, there is some grace period before its pods are evicted and recreated; if you re-add the node within that grace period, the existing containers stay put. I have a fix here: https://github.com/ovn-org/ovn-kubernetes/pull/2281

Will start a downstream merge for it.

Comment 16 Shelly Miron 2021-11-07 17:52:51 UTC
*** Bug 1969958 has been marked as a duplicate of this bug. ***

Comment 25 Tim Rozet 2022-01-24 14:52:25 UTC
Surya, I found a bug in the logic to add retry pods during node add:
https://github.com/openshift/ovn-kubernetes/pull/926

Without this fix the pod retry cache could become corrupted and the pod would never be retried correctly.

Comment 27 Mohamed Mahmoud 2022-02-01 12:10:00 UTC
*** Bug 2047334 has been marked as a duplicate of this bug. ***

Comment 33 errata-xmlrpc 2022-03-16 11:12:07 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.10.4 bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2022:0811

