Bug 2005598 - [4.9]Failed to configure pod interface: timed out waiting for OVS port binding
Summary: [4.9]Failed to configure pod interface: timed out waiting for OVS port binding
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 4.9
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: urgent
Target Milestone: ---
Target Release: ---
Assignee: Tim Rozet
QA Contact: Anurag saxena
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2021-09-18 14:56 UTC by Qiujie Li
Modified: 2022-11-02 07:34 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-01-10 16:50:03 UTC
Target Upstream Version:
Embargoed:



Comment 2 Tim Rozet 2021-09-20 19:23:44 UTC
From the must-gather it's hard to tell exactly what happened, as the ovnkube-master logs have rotated, so I don't see the add for http-perf-122 -n http-scale-passthrough. On the nodes it looks like this pod has rotated to different nodes. When you repeat the test to trigger the problem, do you wait for ovnkube-master to finish deleting all the pods before you start the next test? For example, instead of just doing oc delete namespace http-scale-passthrough and waiting for it to return, also run oc logs -n openshift-ovn-kubernetes <master> -c ovnkube-master --follow and wait for all the pods to finish getting deleted before starting the next test.

To me this looks like the scale issues identified in https://bugzilla.redhat.com/show_bug.cgi?id=1959352

I can see it taking 5 seconds to annotate the pod in ovnkube-master for some other pods:


2021-09-17T15:53:47.025928199Z I0917 15:53:47.025886       1 pods.go:251] [http-scale-reencrypt/http-perf-303] addLogicalPort took 5.000645366s
2021-09-17T15:53:47.188129212Z I0917 15:53:47.188067       1 pods.go:251] [http-scale-reencrypt/http-perf-358] addLogicalPort took 5.085611168s
2021-09-17T15:53:47.243243616Z I0917 15:53:47.243197       1 pods.go:251] [http-scale-reencrypt/http-perf-281] addLogicalPort took 5.042022155s
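
For anyone triaging a similar must-gather, here is a minimal sketch of how slow port adds like the ones above could be pulled out of an ovnkube-master log. The regular expression is inferred only from the three sample lines quoted in this comment, and the names slow_adds and SlowAdd and the 1-second default threshold are hypothetical, not part of ovn-kubernetes.

```python
import re
from typing import Iterable, Iterator, NamedTuple

# Assumption: the message format matches the three lines quoted above.
# Only durations printed in plain seconds (e.g. "5.000645366s") match;
# values logged in other units such as "ms" are skipped by this sketch.
ADD_PORT_RE = re.compile(
    r"\[(?P<namespace>[^/\]]+)/(?P<pod>[^\]]+)\] "
    r"addLogicalPort took (?P<secs>\d+(?:\.\d+)?)s"
)


class SlowAdd(NamedTuple):
    namespace: str
    pod: str
    seconds: float


def slow_adds(lines: Iterable[str], threshold: float = 1.0) -> Iterator[SlowAdd]:
    """Yield pods whose addLogicalPort duration is at least `threshold` seconds."""
    for line in lines:
        match = ADD_PORT_RE.search(line)
        if match is None:
            continue
        seconds = float(match["secs"])
        if seconds >= threshold:
            yield SlowAdd(match["namespace"], match["pod"], seconds)


# The three log lines quoted in this comment.
SAMPLE = [
    "2021-09-17T15:53:47.025928199Z I0917 15:53:47.025886       1 pods.go:251] "
    "[http-scale-reencrypt/http-perf-303] addLogicalPort took 5.000645366s",
    "2021-09-17T15:53:47.188129212Z I0917 15:53:47.188067       1 pods.go:251] "
    "[http-scale-reencrypt/http-perf-358] addLogicalPort took 5.085611168s",
    "2021-09-17T15:53:47.243243616Z I0917 15:53:47.243197       1 pods.go:251] "
    "[http-scale-reencrypt/http-perf-281] addLogicalPort took 5.042022155s",
]

if __name__ == "__main__":
    for hit in slow_adds(SAMPLE):
        print(f"{hit.namespace}/{hit.pod}: {hit.seconds:.3f}s")
```

The same function could be fed sys.stdin instead of SAMPLE to filter saved ovnkube-master container logs.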


It would also be helpful to get the must-gather while the affected pods still exist, so that we can see when they were scheduled on the node.

Comment 3 zhaozhanqi 2021-09-22 07:48:43 UTC
Thanks, Qiujie, for reporting this issue. This looks like the same bug as https://bugzilla.redhat.com/show_bug.cgi?id=2003558

@Tim Please see https://bugzilla.redhat.com/show_bug.cgi?id=2003558#c17. There is a live cluster kubeconfig and a must-gather there for debugging, thanks.

Comment 4 zhaozhanqi 2021-09-22 07:51:19 UTC
And also this one: https://bugzilla.redhat.com/show_bug.cgi?id=1997205

Comment 19 bowredhat 2022-02-14 22:09:33 UTC
Was there ever a resolution on this? Ran into this issue on a 4.8.24 cluster and then on a cluster after an upgrade from 4.9.17 to 4.9.18.

Comment 20 Anurag saxena 2022-11-01 13:24:29 UTC
@

Comment 22 Qiujie Li 2022-11-02 07:34:20 UTC
@anusaxen Added.

