Bug 1885713 - failed to configure pod interface: timed out waiting for pod flows for pod
Summary: failed to configure pod interface: timed out waiting for pod flows for pod
Status: CLOSED DUPLICATE of bug 1859924
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 4.6
Hardware: x86_64
OS: Linux
Target Milestone: ---
Target Release: 4.7.0
Assignee: Anil Vishnoi
QA Contact: Anurag saxena
Depends On:
Reported: 2020-10-06 19:15 UTC by Sai Sindhur Malleni
Modified: 2021-05-26 05:37 UTC
CC List: 6 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Last Closed: 2020-11-18 19:53:24 UTC
Target Upstream Version:


Description Sai Sindhur Malleni 2020-10-06 19:15:15 UTC
Description of problem:
During an API stress test on a 4.6 cluster on bare metal (3 masters + 110 worker nodes), we create the following in each project:
- 10 DeploymentConfigs
- 10 Services
- 3 Routes
- other control plane resources

We do this across 100 projects serially: all of the control plane objects in one namespace are created before the test moves on to the next namespace.

After a few projects, we see errors like the following in the project events:
0s          Warning   FailedCreatePodSandBox   pod/deploymentconfig9-1-deploy         Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_deploymentconfig9-1-deploy_mastervert084_f81fe01c-bab7-460f-881a-7fd2b6b2055d_0(c8993ebc2d95ae3d4be80f7a3242ff4e63b10205c65c540f1196c87d2b93001b): [mastervert084/deploymentconfig9-1-deploy:ovn-kubernetes]: error adding container to network "ovn-kubernetes": CNI request failed with status 400: '[mastervert084/deploymentconfig9-1-deploy] failed to configure pod interface: timed out waiting for pod flows for pod: deploymentconfig9-1-deploy, error: timed out waiting for the condition

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1. Deploy a large cluster
2. Create multiple DeploymentConfigs, Services, and Routes per project, serially across 100 projects
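The serial creation order described above can be sketched as follows. This is a minimal Python sketch that only builds the ordered plan; it makes no API calls. It models just the DeploymentConfig, Service, and Route counts from this report (the test's other control plane resources are omitted), and the namespace and object names (`project0`, `deploymentconfig0`, ...) are hypothetical.

```python
from dataclasses import dataclass

# Per-project counts from the bug description; the "other control plane
# resources" the stress test also creates are not modeled here.
PER_PROJECT = {"DeploymentConfig": 10, "Service": 10, "Route": 3}
NUM_PROJECTS = 100


@dataclass(frozen=True)
class CreateStep:
    namespace: str
    kind: str
    name: str


def plan_serial_run(num_projects: int = NUM_PROJECTS) -> list:
    """Return creation steps in order: every object in one namespace is
    created before any object in the next namespace."""
    steps = []
    for p in range(num_projects):
        ns = f"project{p}"  # hypothetical namespace naming scheme
        for kind, count in PER_PROJECT.items():
            for i in range(count):
                # hypothetical object naming scheme, e.g. "deploymentconfig9"
                steps.append(CreateStep(ns, kind, f"{kind.lower()}{i}"))
    return steps


if __name__ == "__main__":
    plan = plan_serial_run()
    print(len(plan), "objects; first:", plan[0], "last:", plan[-1])
```

Generating the plan up front keeps the ordering explicit: with 23 modeled objects per project, the first 23 steps all target the first namespace before any step touches the second.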

Actual results:
Some pods fail to launch with the error shown above.

Expected results:
These errors should not occur; all pods should start successfully.

Additional info:

Comment 8 Tim Rozet 2020-11-18 19:53:24 UTC
Looking at the must-gather, we can see that the pod creation on the northbound (NB) side comes in at 15:23:51:
ovnkube-master-gv6kt/ovnkube-master/ovnkube-master/logs/previous.log:2020-10-06T15:23:51.220803029Z I1006 15:23:51.220765       1 kube.go:63] Setting annotations map[k8s.ovn.org/pod-networks:{"default":{"ip_addresses":[""],"mac_address":"0a:58:0a:83:16:30","gateway_ips":[""],"ip_address":"","gateway_ip":""}}] on pod mastervert058/deploymentconfig9-1-deploy

and the CNI request arrives at roughly the same time:
ovnkube-node-4gnnj/ovnkube-node/ovnkube-node/logs/current.log:2020-10-06T15:23:51.566462152Z I1006 15:23:51.566413    6673 cniserver.go:147] Waiting for ADD result for pod mastervert058/deploymentconfig9-1-deploy

Then, in ovn-controller on the node, the port isn't bound until 15:24:05, about 14 seconds after the CNI ADD request:
ovnkube-node-4gnnj/ovn-controller/ovn-controller/logs/current.log:2020-10-06T15:24:05.302862332Z 2020-10-06T15:24:05Z|01231|binding|INFO|Claiming lport mastervert058_deploymentconfig9-1-deploy for this chassis.
ovnkube-node-4gnnj/ovn-controller/ovn-controller/logs/current.log:2020-10-06T15:24:05.302862332Z 2020-10-06T15:24:05Z|01232|binding|INFO|mastervert058_deploymentconfig9-1-deploy: Claiming 0a:58:0a:83:16:30
ovnkube-node-4gnnj/ovn-controller/ovn-controller/logs/current.log:2020-10-06T15:24:30.871492290Z 2020-10-06T15:24:30Z|01248|binding|INFO|Releasing lport mastervert058_deploymentconfig9-1-deploy from this chassis.

and then the CNI request times out waiting for the flows at 15:24:13, about 22 seconds after it arrived:
ovnkube-node-4gnnj/ovnkube-node/ovnkube-node/logs/current.log:2020-10-06T15:24:13.526463348Z I1006 15:24:13.526362    6673 cni.go:157] [mastervert058/deploymentconfig9-1-deploy] CNI request &{ADD mastervert058 deploymentconfig9-1-deploy 0ec8e8b4da2fbc6c482cd98b07c6b2fb92f094b65a707b4b096cc6881273b0e5 /var/run/netns/f595d15a-093a-4240-8539-ecc384c65666 eth0 0xc003244d00}, result "", err failed to configure pod interface: timed out waiting for pod flows for pod: deploymentconfig9-1-deploy, error: timed out waiting for the condition
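The timeline above can be checked by subtracting the must-gather timestamps. This is a small Python sketch, assuming the four quoted log lines are the relevant events; the event labels are descriptive names chosen here, and nanoseconds are truncated to microseconds because Python's `datetime` only stores microseconds.

```python
from datetime import datetime

# Timestamps copied from the must-gather log lines quoted above.
BASE = "CNI ADD received (ovnkube-node)"
EVENTS = {
    "NB annotation set (ovnkube-master)": "2020-10-06T15:23:51.220803029Z",
    BASE: "2020-10-06T15:23:51.566462152Z",
    "lport claimed (ovn-controller)": "2020-10-06T15:24:05.302862332Z",
    "CNI timed out (ovnkube-node)": "2020-10-06T15:24:13.526463348Z",
}


def parse_ts(ts: str) -> datetime:
    # Keep "YYYY-MM-DDTHH:MM:SS.ffffff" (26 chars); %f accepts at most 6 digits.
    return datetime.strptime(ts[:26], "%Y-%m-%dT%H:%M:%S.%f")


def timeline(events: dict, base: str = BASE) -> list:
    """Return (event, seconds relative to the base event) in insertion order."""
    t0 = parse_ts(events[base])
    return [(name, (parse_ts(ts) - t0).total_seconds()) for name, ts in events.items()]


if __name__ == "__main__":
    for name, delta in timeline(EVENTS):
        print(f"{delta:+9.3f}s  {name}")
```

Measured from the CNI ADD request, ovn-controller claims the port about 13.7 s later and the CNI gives up about 22.0 s later, so the pod's flows were still missing roughly 8 s after the port was bound.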

tl;dr: OVN is under too much load here and takes too long to bind the port and install the flows. The fixes for bugs 1855408, 1888829, and 1859924 should improve how OVN handles these requests. We can close this for now and reopen it if we see the issue again.
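The failure mode summarized in the tl;dr is a poll with a deadline: the CNI ADD handler keeps checking for the pod's flows and gives up once the deadline passes, so a slow OVN becomes a failed sandbox. The trailing "timed out waiting for the condition" text matches the `ErrWaitTimeout` message from Kubernetes' `k8s.io/apimachinery/pkg/util/wait` package, which suggests such a helper is involved. The Python sketch below is a generic poll-until-timeout loop, not ovn-kubernetes code; the `FakeClock` helper, the 10 s timeout, and the 25 s flow delay in the demo are hypothetical values, not figures from this bug.

```python
import time
from typing import Callable


class WaitTimeout(Exception):
    """Raised when the condition is still false at the deadline."""


def wait_for_condition(
    condition: Callable[[], bool],
    timeout: float,
    interval: float = 0.2,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Poll `condition` every `interval` seconds until it returns True.

    Raises WaitTimeout once `timeout` seconds have elapsed without success.
    `clock` and `sleep` are injectable so the behavior can be simulated.
    """
    deadline = clock() + timeout
    while True:
        if condition():
            return
        if clock() >= deadline:
            raise WaitTimeout("timed out waiting for the condition")
        sleep(interval)


class FakeClock:
    """Simulated clock: sleep() advances time instantly."""

    def __init__(self) -> None:
        self.now = 0.0

    def time(self) -> float:
        return self.now

    def sleep(self, seconds: float) -> None:
        self.now += seconds


if __name__ == "__main__":
    # Hypothetical numbers: flows appear 25 s after the ADD request, but the
    # waiter allows only 10 s, so the request fails like the CNI ADD above.
    clk = FakeClock()
    try:
        wait_for_condition(lambda: clk.time() >= 25.0, timeout=10.0,
                           clock=clk.time, sleep=clk.sleep)
    except WaitTimeout as err:
        print(f"t={clk.time():.1f}s: {err}")
```

Injecting the clock and sleep functions lets the timeout behavior be exercised deterministically: whether the request succeeds depends only on whether the flows appear before the deadline, which is why reducing OVN's processing latency (rather than the CNI logic) is the fix pointed to above.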

*** This bug has been marked as a duplicate of bug 1859924 ***
