Bug 1828343 - timed out dumping br-int flow entries for sandbox
Summary: timed out dumping br-int flow entries for sandbox
Keywords:
Status: CLOSED DUPLICATE of bug 1828637
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 4.3.z
Hardware: x86_64
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: ---
Target Release: 4.5.0
Assignee: OVN Team
QA Contact: Anurag saxena
URL:
Whiteboard: SDN-CI-IMPACT
Depends On:
Blocks: 1937118
Reported: 2020-04-27 15:08 UTC by Jason Huddleston
Modified: 2023-10-06 19:46 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-05-08 19:12:49 UTC
Target Upstream Version:
Embargoed:


Description Jason Huddleston 2020-04-27 15:08:07 UTC
Description of problem:

Pods are failing to start on two clusters (4c57bf78-3d14-4252-81e2-12295800593f and c70ef99-e32f-45e9-80e5-a90753a445c5) with:

Apr 25 02:14:16 worker-12 hyperkube[27834]: E0425 02:14:16.076827   27834 kuberuntime_sandbox.go:68] CreatePodSandbox for pod "logreceiver-1_tmtrfledvzwczts-y-nk-x-000(c4eb1666-364d-4281-9e3c-85b7d573862f)" failed: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_logreceiver-1_tmtrfledvzwczts-y-nk-x-000_c4eb1666-364d-4281-9e3c-85b7d573862f_0(8828aafb8112ec8ca8a8b17de957ff968ea31496be049b42469bde0eda2c807f): Multus: [tmtrfledvzwczts-y-nk-x-000/logreceiver-1]: error adding container to network "ovn-kubernetes": delegateAdd: error invoking DelegateAdd - "ovn-k8s-cni-overlay": error in getting result from AddNetwork: CNI request failed with status 400: '[tmtrfledvzwczts-y-nk-x-000/logreceiver-1] failed to configure pod interface: timed out dumping br-int flow entries for sandbox: timed out waiting for the condition



Version-Release number of selected component (if applicable):

4.3.8

How reproducible:

Very

Steps to Reproduce:

A few notes:

* This cluster is using OVN for the overlay network.

* If I restart a node, I get a "Missing CNI default network" error:

```console
Apr 27 12:43:46 master-1 hyperkube[4126]: I0427 12:43:46.904572    4126 event.go:255] Event(v1.ObjectReference{Kind:"Pod", Namespace:"openshift-apiserver", Name:"apiserver-l46gf", UID:"21c94177-5d57-445f-9de3-b758ae3295e2", APIVersion:"v1", ResourceVersion:"48151759", FieldPath:""}): type: 'Warning' reason: 'NetworkNotReady' network is not ready: runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: Missing CNI default network
```

* If I create a non-Istio-enabled pod, I get a status 400 error:

```console
51s         Warning   FailedCreatePodSandBox   pod/busybox-app-86f489cc5b-vdv44   (combined from similar events): Failed create pod sandbox: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_busybox-app-86f489cc5b-vdv44_jay-sbx2_f766e036-dead-414c-a8b8-aec5e131673f_0(0693c7cf10ca7b13dc702d037ce2eef8f4291a5096b774d29f89127bf5a4e86a): Multus: [jay-sbx2/busybox-app-86f489cc5b-vdv44]: error adding container to network "ovn-kubernetes": delegateAdd: error invoking DelegateAdd - "ovn-k8s-cni-overlay": error in getting result from AddNetwork: CNI request failed with status 400: '[jay-sbx2/busybox-app-86f489cc5b-vdv44] failed to configure pod interface: timed out dumping br-int flow entries for sandbox: timed out waiting for the condition
```

* If I create an Istio-enabled pod, I get a CNI request failed error:

```console
42s         Warning   FailedCreatePodSandBox   pod/busybox-cdb78779d-qwpk5    (combined from similar events): Failed create pod sandbox: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_busybox-cdb78779d-qwpk5_jay-sbx_fabc6fc4-9dc4-4dbf-be2e-23f48fd6a1d1_0(79df7504d4038ca7e12591ca06b520829d84d8399de09001ec4d1b025286d0f0): Multus: [jay-sbx/busybox-cdb78779d-qwpk5]: error adding container to network "ovn-kubernetes": delegateAdd: error invoking DelegateAdd - "ovn-k8s-cni-overlay": error in getting result from AddNetwork: CNI request failed with status 400: '[jay-sbx/busybox-cdb78779d-qwpk5] failed to configure pod interface: timed out dumping br-int flow entries for sandbox: timed out waiting for the condition
```

Actual results:

Unable to start new pods. If I restart or rebuild a node, that node can no longer start pods.

Expected results:

Pods and nodes should be able to start normally

Additional info:

Comment 1 Jason Huddleston 2020-04-27 15:08:41 UTC
I am also seeing errors in the OVN pods:

oc logs ovnkube-node-bpd5g -c ovs-daemons

```console
2020-04-27T13:31:43.479Z|58251|bridge|INFO|bridge br-int: added interface 7b5956ce1296ef8 on port 5353
2020-04-27T13:31:43.481Z|58252|bridge|WARN|could not open network device d1fd6396cc18a10 (No such device)                                                                                                    
2020-04-27T13:31:43.494Z|58253|bridge|WARN|could not open network device d1fd6396cc18a10 (No such device)                                                                                                    
2020-04-27T13:31:47.887Z|58254|bridge|INFO|bridge br-int: deleted interface 63d2bde7f49557c on port 5352                                                                                                     
2020-04-27T13:31:47.896Z|58255|bridge|WARN|could not open network device d1fd6396cc18a10 (No such device)                                                                                                    
2020-04-27T13:31:47.901Z|58256|bridge|WARN|could not open network device d1fd6396cc18a10 (No such device)                                                                                                    
2020-04-27T13:31:48.072Z|58257|bridge|WARN|could not open network device d1fd6396cc18a10 (No such device)                                                                                                    
2020-04-27T13:31:48.109Z|58258|bridge|WARN|could not open network device d1fd6396cc18a10 (No such device)                                                                                                    
2020-04-27T13:31:49.142Z|58259|bridge|WARN|could not open network device d1fd6396cc18a10 (No such device)                                                                                                    
2020-04-27T13:31:49.222Z|58260|bridge|WARN|could not open network device d1fd6396cc18a10 (No such device)                                                                                                    
2020-04-27T13:31:49.241Z|58261|bridge|INFO|bridge br-int: added interface 820b3f50d5e9a30 on port 5354
2020-04-27T13:31:49.243Z|58262|bridge|WARN|could not open network device d1fd6396cc18a10 (No such device)                                                                                                    
2020-04-27T13:31:49.254Z|58263|bridge|WARN|could not open network device d1fd6396cc18a10 (No such device)                                                                                                    
2020-04-27T13:31:50.245Z|58264|connmgr|INFO|br-int<->unix#1: 16 flow_mods in the 46 s starting 58 s ago (5 adds, 3 deletes, 8 modifications)                                                                 
2020-04-27T13:32:01.843Z|58265|bridge|WARN|could not open network device d1fd6396cc18a10 (No such device)                                               
```
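The repeated "could not open network device" warnings above can be tallied per device to see which names keep recurring. The following is a minimal sketch, assuming the log is piped in on stdin; `count_missing_devices` is a hypothetical helper name and not part of OVS or `oc`.

```shell
# Hypothetical helper (not part of OVS or oc): tally the devices that
# ovs-vswitchd reports it could not open.
# Reads log lines on stdin and prints "<count> <device>", most frequent first.
count_missing_devices() {
  # Keep only the device name from matching WARN lines, then count duplicates.
  sed -n 's/.*could not open network device \([^ ]*\) (No such device).*/\1/p' |
    sort | uniq -c | sort -rn | awk '{ print $1, $2 }'
}
```

For example, `oc logs ovnkube-node-bpd5g -c ovs-daemons | count_missing_devices`. A device that shows up many times may point to a port left behind in the OVS database, but that is an assumption to verify rather than a conclusion.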

oc logs ovnkube-node-bpd5g -c ovnkube-node

```console 
time="2020-04-27T13:32:35Z" level=warning msg="failed to clear stale OVS port \"\" iface-id \"bbtpnj33vzwcnrf-y-or-x-001-test-orch_1c222209-bde5-4d0e-97e1-0e098c6c76c3-ingressgateway-6c9cfbtcmlg\": failed to run 'ovs-vsctl --timeout=30 remove Interface  external-ids iface-id': exit status 1\n  \"ovs-vsctl: no row \\\"\\\" in table Interface\\n\"\n  \"\""                                                       
time="2020-04-27T13:32:57Z" level=info msg="[bbtpnj33vzwcnrf-y-or-x-001-test-orch/1c222209-bde5-4d0e-97e1-0e098c6c76c3-ingressgateway-6c9cfbtcmlg] CNI request &{ADD bbtpnj33vzwcnrf-y-or-x-001-test-orch 1c222209-bde5-4d0e-97e1-0e098c6c76c3-ingressgateway-6c9cfbtcmlg 7481d3fda2660ad310a219d3f5dfdce4769efe6497a8e7f7e581cba1e0907fbe /proc/2548639/ns/net eth0 0xc0003ff6c0}, result \"\", err failed to configure pod interface: timed out dumping br-int flow entries for sandbox: timed out waiting for the condition"
time="2020-04-27T13:32:57Z" level=info msg="Waiting for DEL result for pod bbtpnj33vzwcnrf-y-or-x-001-test-orch/1c222209-bde5-4d0e-97e1-0e098c6c76c3-ingressgateway-6c9cfbtcmlg"                             
time="2020-04-27T13:32:57Z" level=info msg="[bbtpnj33vzwcnrf-y-or-x-001-test-orch/1c222209-bde5-4d0e-97e1-0e098c6c76c3-ingressgateway-6c9cfbtcmlg] dispatching pod network request &{DEL bbtpnj33vzwcnrf-y-or-x-001-test-orch 1c222209-bde5-4d0e-97e1-0e098c6c76c3-ingressgateway-6c9cfbtcmlg 7481d3fda2660ad310a219d3f5dfdce4769efe6497a8e7f7e581cba1e0907fbe /proc/2548639/ns/net eth0 0xc001d3e0d0}"                    
time="2020-04-27T13:32:57Z" level=info msg="[bbtpnj33vzwcnrf-y-or-x-001-test-orch/1c222209-bde5-4d0e-97e1-0e098c6c76c3-ingressgateway-6c9cfbtcmlg] CNI request &{DEL bbtpnj33vzwcnrf-y-or-x-001-test-orch 1c222209-bde5-4d0e-97e1-0e098c6c76c3-ingressgateway-6c9cfbtcmlg 7481d3fda2660ad310a219d3f5dfdce4769efe6497a8e7f7e581cba1e0907fbe /proc/2548639/ns/net eth0 0xc001d3e0d0}, result \"\", err <nil>"                
time="2020-04-27T13:32:57Z" level=info msg="Waiting for DEL result for pod bbtpnj33vzwcnrf-y-or-x-001-test-orch/1c222209-bde5-4d0e-97e1-0e098c6c76c3-ingressgateway-6c9cfbtcmlg"                             
time="2020-04-27T13:32:57Z" level=info msg="[bbtpnj33vzwcnrf-y-or-x-001-test-orch/1c222209-bde5-4d0e-97e1-0e098c6c76c3-ingressgateway-6c9cfbtcmlg] dispatching pod network request &{DEL bbtpnj33vzwcnrf-y-or-x-001-test-orch 1c222209-bde5-4d0e-97e1-0e098c6c76c3-ingressgateway-6c9cfbtcmlg 7481d3fda2660ad310a219d3f5dfdce4769efe6497a8e7f7e581cba1e0907fbe /proc/2548639/ns/net eth0 0xc00015c9c0}"                    
time="2020-04-27T13:32:57Z" level=info msg="[bbtpnj33vzwcnrf-y-or-x-001-test-orch/1c222209-bde5-4d0e-97e1-0e098c6c76c3-ingressgateway-6c9cfbtcmlg] CNI request &{DEL bbtpnj33vzwcnrf-y-or-x-001-test-orch 1c222209-bde5-4d0e-97e1-0e098c6c76c3-ingressgateway-6c9cfbtcmlg 7481d3fda2660ad310a219d3f5dfdce4769efe6497a8e7f7e581cba1e0907fbe /proc/2548639/ns/net eth0 0xc00015c9c0}, result \"\", err <nil>"                
time="2020-04-27T13:32:58Z" level=info msg="Waiting for ADD result for pod bbtpnj33vzwcnrf-y-or-x-001-test-orch/1c222209-bde5-4d0e-97e1-0e098c6c76c3-ingressgateway-6c9cfbtcmlg"                             
time="2020-04-27T13:32:58Z" level=info msg="[bbtpnj33vzwcnrf-y-or-x-001-test-orch/1c222209-bde5-4d0e-97e1-0e098c6c76c3-ingressgateway-6c9cfbtcmlg] dispatching pod network request &{ADD bbtpnj33vzwcnrf-y-or-x-001-test-orch 1c222209-bde5-4d0e-97e1-0e098c6c76c3-ingressgateway-6c9cfbtcmlg 87d1674bae5f2a3cc1a9bb318d42a16a4de6f4c6d81910bcac2369d4c2f71ad9 /proc/2551501/ns/net eth0 0xc0003ded00}"                    
time="2020-04-27T13:32:58Z" level=warning msg="failed to clear stale OVS port \"\" iface-id \"bbtpnj33vzwcnrf-y-or-x-001-test-orch_1c222209-bde5-4d0e-97e1-0e098c6c76c3-ingressgateway-6c9cfbtcmlg\": failed to run 'ovs-vsctl --timeout=30 remove Interface  external-ids iface-id': exit status 1\n  \"ovs-vsctl: no row \\\"\\\" in table Interface\\n\"\n  \"\""                                                       
time="2020-04-27T13:33:20Z" level=info msg="[bbtpnj33vzwcnrf-y-or-x-001-test-orch/1c222209-bde5-4d0e-97e1-0e098c6c76c3-ingressgateway-6c9cfbtcmlg] CNI request &{ADD bbtpnj33vzwcnrf-y-or-x-001-test-orch 1c222209-bde5-4d0e-97e1-0e098c6c76c3-ingressgateway-6c9cfbtcmlg 87d1674bae5f2a3cc1a9bb318d42a16a4de6f4c6d81910bcac2369d4c2f71ad9 /proc/2551501/ns/net eth0 0xc0003ded00}, result \"\", err failed to configure pod interface: timed out dumping br-int flow entries for sandbox: timed out waiting for the condition"
```
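To see how many pods are affected and how often each one retries, the failed CNI ADD requests in the ovnkube-node log can be counted per pod. This is a rough sketch that assumes the log line layout shown above (a `[namespace/pod]` prefix followed by `CNI request &{ADD ...}` and the timeout message); `failed_cni_adds` is a hypothetical name.

```shell
# Hypothetical helper: count CNI ADD requests that failed with the
# "timed out dumping br-int flow entries" error, grouped by namespace/pod.
# Reads ovnkube-node log lines on stdin; prints "<count> <namespace/pod>".
failed_cni_adds() {
  # Capture the bracketed namespace/pod that precedes a failed ADD request.
  sed -n 's/.*\[\([^]]*\)\] CNI request &{ADD .*timed out dumping br-int flow entries.*/\1/p' |
    sort | uniq -c | sort -rn | awk '{ print $1, $2 }'
}
```

For example, `oc logs ovnkube-node-bpd5g -c ovnkube-node | failed_cni_adds`. DEL requests and successful ADDs are ignored because they do not carry the timeout message.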

Comment 2 Ben Bennett 2020-04-27 16:04:04 UTC
Setting the target release to 4.5 so that we can work on the issue.  Once we have a fix we will handle the backports for earlier releases.

Comment 3 Glenn West 2020-04-28 21:50:18 UTC
I tried this on my local bare-metal cluster: I started a hello-openshift pod, SSHed to the node, ran a halt, then powered the node off and back on.

The app pod status is:
Events:
  Type     Reason           Age                    From               Message
  ----     ------           ----                   ----               -------
  Normal   Scheduled        <unknown>              default-scheduler  Successfully assigned my-test-app/hello-openshift-1-v8l2m to worker-2
  Normal   Pulling          50m                    kubelet, worker-2  Pulling image "openshift/hello-openshift@sha256:aaea76ff622d2f8bcb32e538e7b3cd0ef6d291953f3e7c9f556c1ba5baf47e2e"
  Normal   Pulled           50m                    kubelet, worker-2  Successfully pulled image "openshift/hello-openshift@sha256:aaea76ff622d2f8bcb32e538e7b3cd0ef6d291953f3e7c9f556c1ba5baf47e2e"
  Normal   Created          50m                    kubelet, worker-2  Created container hello-openshift
  Normal   Started          50m                    kubelet, worker-2  Started container hello-openshift
  Warning  NetworkNotReady  3m50s (x6 over 3m59s)  kubelet, worker-2  network is not ready: runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: Missing CNI default network
  Normal   SandboxChanged   3m48s                  kubelet, worker-2  Pod sandbox changed, it will be killed and re-created.
  Normal   Pulling          3m46s                  kubelet, worker-2  Pulling image "openshift/hello-openshift@sha256:aaea76ff622d2f8bcb32e538e7b3cd0ef6d291953f3e7c9f556c1ba5baf47e2e"
  Normal   Pulled           3m44s                  kubelet, worker-2  Successfully pulled image "openshift/hello-openshift@sha256:aaea76ff622d2f8bcb32e538e7b3cd0ef6d291953f3e7c9f556c1ba5baf47e2e"
  Normal   Created          3m43s                  kubelet, worker-2  Created container hello-openshift
  Normal   Started          3m43s                  kubelet, worker-2  Started container hello-openshift

Need to explore the case more.

Comment 4 Glenn West 2020-04-28 22:50:54 UTC
In my first reproduction attempt, I hit a condition where OVN went completely offline and did not recover.

I managed to grab the ovsdb-server log:
Starting ovsdb-server.
PMD: net_mlx4: cannot load glue library: libibverbs.so.1: cannot open shared object file: No such file or directory
PMD: net_mlx4: cannot initialize PMD due to missing run-time dependency on rdma-core libraries (libibverbs, libmlx4)
net_mlx5: cannot load glue library: libibverbs.so.1: cannot open shared object file: No such file or directory
net_mlx5: cannot initialize PMD due to missing run-time dependency on rdma-core libraries (libibverbs, libmlx5)
2020-04-27T22:53:13Z|00001|dns_resolve|WARN|Failed to read etc/hosts: syntax error
Configuring Open vSwitch system IDs.
Inserting openvswitch module.
PMD: net_mlx4: cannot load glue library: libibverbs.so.1: cannot open shared object file: No such file or directory
PMD: net_mlx4: cannot initialize PMD due to missing run-time dependency on rdma-core libraries (libibverbs, libmlx4)
net_mlx5: cannot load glue library: libibverbs.so.1: cannot open shared object file: No such file or directory
net_mlx5: cannot initialize PMD due to missing run-time dependency on rdma-core libraries (libibverbs, libmlx5)
2020-04-27T22:53:13Z|00001|dns_resolve|WARN|Failed to read etc/hosts: syntax error
Starting ovs-vswitchd.
Enabling remote OVSDB managers.
iptables binary not installed, not adding a rule for udp to port 6081.
2020-04-27T22:53:13.355Z|00063|bridge|WARN|could not open network device 17758cfa3f003f8 (No such device)
2020-04-27T22:53:13.356Z|00064|bridge|WARN|could not open network device 3f99901f94d8a5a (No such device)
2020-04-27T22:53:13.357Z|00065|bridge|WARN|could not open network device 766f8fc4be5144e (No such device)
2020-04-27T22:53:13.358Z|00066|bridge|INFO|ovs-vswitchd (Open vSwitch) 2.12.0
2020-04-27T22:53:13.360Z|00067|bridge|WARN|could not open network device 49c03f965afc87c (No such device)
2020-04-27T22:53:13.361Z|00068|bridge|WARN|could not open network device da2b4fe0eddd427 (No such device)
2020-04-27T22:53:13.362Z|00069|bridge|WARN|could not open network device 6d8cb5c9fcf99df (No such device)
2020-04-27T22:53:13.362Z|00070|bridge|WARN|could not open network device 17758cfa3f003f8 (No such device)
2020-04-27T22:53:13.363Z|00071|bridge|WARN|could not open network device 3f99901f94d8a5a (No such device)
2020-04-27T22:53:13.364Z|00072|bridge|WARN|could not open network device 766f8fc4be5144e (No such device)
2020-04-27T22:53:13.046Z|00001|vlog|INFO|opened log file /var/log/openvswitch/ovsdb-server.log
2020-04-27T22:53:13.074Z|00002|stream_ssl|ERR|SSL_use_certificate_file: error:02001002:system library:fopen:No such file or directory
2020-04-27T22:53:13.074Z|00003|stream_ssl|ERR|SSL_use_PrivateKey_file: error:20074002:BIO routines:FILE_CTRL:system lib
2020-04-27T22:53:13.075Z|00004|ovsdb_server|INFO|ovsdb-server (Open vSwitch) 2.12.0
2020-04-27T22:53:14.104Z|00073|bridge|INFO|bridge br-int: deleted interface patch-br-int-to-br-local_worker-1 on port 6
2020-04-27T22:53:14.104Z|00074|bridge|INFO|bridge br-local: deleted interface patch-br-local_worker-1-to-br-int on port 2
2020-04-27T22:53:14.109Z|00075|bridge|WARN|could not open network device 49c03f965afc87c (No such device)
2020-04-27T22:53:14.112Z|00076|bridge|WARN|could not open network device da2b4fe0eddd427 (No such device)
2020-04-27T22:53:14.114Z|00077|bridge|WARN|could not open network device 6d8cb5c9fcf99df (No such device)
2020-04-27T22:53:14.115Z|00078|bridge|WARN|could not open network device 17758cfa3f003f8 (No such device)
2020-04-27T22:53:14.118Z|00079|bridge|WARN|could not open network device 3f99901f94d8a5a (No such device)
2020-04-27T22:53:14.120Z|00080|bridge|WARN|could not open network device 766f8fc4be5144e (No such device)
2020-04-27T22:53:14.287Z|00081|bridge|WARN|could not open network device 49c03f965afc87c (No such device)
2020-04-27T22:53:14.287Z|00082|bridge|INFO|bridge br-int: added interface patch-br-int-to-br-local_worker-1 on port 8
2020-04-27T22:53:14.292Z|00083|bridge|WARN|could not open network device da2b4fe0eddd427 (No such device)
2020-04-27T22:53:14.295Z|00084|bridge|WARN|could not open network device 6d8cb5c9fcf99df (No such device)
2020-04-27T22:53:14.298Z|00085|bridge|WARN|could not open network device 17758cfa3f003f8 (No such device)
2020-04-27T22:53:14.300Z|00086|bridge|WARN|could not open network device 3f99901f94d8a5a (No such device)
2020-04-27T22:53:14.301Z|00087|bridge|WARN|could not open network device 766f8fc4be5144e (No such device)
2020-04-27T22:53:14.301Z|00088|bridge|INFO|bridge br-local: added interface patch-br-local_worker-1-to-br-int on port 3
2020-04-27T22:53:14.852Z|00089|bridge|WARN|could not open network device 49c03f965afc87c (No such device)
2020-04-27T22:53:14.853Z|00090|bridge|WARN|could not open network device da2b4fe0eddd427 (No such device)
2020-04-27T22:53:14.853Z|00091|bridge|WARN|could not open network device 6d8cb5c9fcf99df (No such device)
2020-04-27T22:53:14.854Z|00092|bridge|WARN|could not open network device 17758cfa3f003f8 (No such device)
2020-04-27T22:53:14.855Z|00093|bridge|WARN|could not open network device 3f99901f94d8a5a (No such device)
2020-04-27T22:53:14.855Z|00094|bridge|WARN|could not open network device 766f8fc4be5144e (No such device)
2020-04-27T22:53:14.861Z|00095|bridge|WARN|could not open network device 49c03f965afc87c (No such device)
2020-04-27T22:53:14.862Z|00096|bridge|WARN|could not open network device da2b4fe0eddd427 (No such device)
2020-04-27T22:53:14.863Z|00097|bridge|WARN|could not open network device 6d8cb5c9fcf99df (No such device)
2020-04-27T22:53:14.864Z|00098|bridge|WARN|could not open network device 17758cfa3f003f8 (No such device)
2020-04-27T22:53:14.865Z|00099|bridge|WARN|could not open network device 3f99901f94d8a5a (No such device)
2020-04-27T22:53:14.865Z|00100|bridge|WARN|could not open network device 766f8fc4be5144e (No such device)
2020-04-27T22:53:14.878Z|00101|bridge|WARN|could not open network device 49c03f965afc87c (No such device)
2020-04-27T22:53:14.880Z|00102|bridge|WARN|could not open network device da2b4fe0eddd427 (No such device)
2020-04-27T22:53:14.883Z|00103|bridge|WARN|could not open network device 6d8cb5c9fcf99df (No such device)
2020-04-27T22:53:14.885Z|00104|bridge|WARN|could not open network device 17758cfa3f003f8 (No such device)
2020-04-27T22:53:14.887Z|00105|bridge|WARN|could not open network device 3f99901f94d8a5a (No such device)
2020-04-27T22:53:14.888Z|00106|bridge|WARN|could not open network device 766f8fc4be5144e (No such device)
2020-04-27T22:53:14.987Z|00107|bridge|WARN|could not open network device 49c03f965afc87c (No such device)
2020-04-27T22:53:14.989Z|00108|bridge|WARN|could not open network device da2b4fe0eddd427 (No such device)
2020-04-27T22:53:14.989Z|00109|bridge|WARN|could not open network device 6d8cb5c9fcf99df (No such device)
2020-04-27T22:53:14.990Z|00110|bridge|WARN|could not open network device 17758cfa3f003f8 (No such device)
2020-04-27T22:53:14.991Z|00111|bridge|WARN|could not open network device 3f99901f94d8a5a (No such device)
2020-04-27T22:53:14.992Z|00112|bridge|WARN|could not open network device 766f8fc4be5144e (No such device)
2020-04-27T22:53:14.998Z|00113|bridge|WARN|could not open network device 49c03f965afc87c (No such device)
2020-04-27T22:53:14.999Z|00114|bridge|WARN|could not open network device da2b4fe0eddd427 (No such device)
2020-04-27T22:53:15.000Z|00115|bridge|WARN|could not open network device 6d8cb5c9fcf99df (No such device)
2020-04-27T22:53:15.001Z|00116|bridge|WARN|could not open network device 17758cfa3f003f8 (No such device)
2020-04-27T22:53:15.002Z|00117|bridge|WARN|could not open network device 3f99901f94d8a5a (No such device)
2020-04-27T22:53:15.003Z|00118|bridge|WARN|could not open network device 766f8fc4be5144e (No such device)
2020-04-27T22:53:15.068Z|00119|bridge|WARN|could not open network device 49c03f965afc87c (No such device)
2020-04-27T22:53:15.070Z|00120|bridge|WARN|could not open network device da2b4fe0eddd427 (No such device)
2020-04-27T22:53:15.071Z|00121|bridge|WARN|could not open network device 6d8cb5c9fcf99df (No such device)
2020-04-27T22:53:15.072Z|00122|bridge|WARN|could not open network device 17758cfa3f003f8 (No such device)
2020-04-27T22:53:15.073Z|00123|bridge|WARN|could not open network device 3f99901f94d8a5a (No such device)
2020-04-27T22:53:15.073Z|00124|bridge|WARN|could not open network device 766f8fc4be5144e (No such device)
2020-04-27T22:53:15.076Z|00125|bridge|WARN|could not open network device 49c03f965afc87c (No such device)
2020-04-27T22:53:15.077Z|00126|bridge|WARN|could not open network device da2b4fe0eddd427 (No such device)
2020-04-27T22:53:15.077Z|00127|bridge|WARN|could not open network device 6d8cb5c9fcf99df (No such device)
2020-04-27T22:53:15.078Z|00128|bridge|WARN|could not open network device 17758cfa3f003f8 (No such device)
2020-04-27T22:53:15.080Z|00129|bridge|WARN|could not open network device 3f99901f94d8a5a (No such device)
2020-04-27T22:53:15.081Z|00130|bridge|WARN|could not open network device 766f8fc4be5144e (No such device)
2020-04-27T22:53:23.072Z|00005|memory|INFO|8040 kB peak resident set size after 10.0 seconds
2020-04-27T22:53:23.072Z|00006|memory|INFO|cells:1102 monitors:3 sessions:2
2020-04-27T22:53:23.076Z|00131|bridge|WARN|could not open network device 49c03f965afc87c (No such device)
2020-04-27T22:53:23.079Z|00132|bridge|WARN|could not open network device da2b4fe0eddd427 (No such device)
2020-04-27T22:53:23.080Z|00133|bridge|WARN|could not open network device 6d8cb5c9fcf99df (No such device)
2020-04-27T22:53:23.081Z|00134|bridge|WARN|could not open network device 3f99901f94d8a5a (No such device)
2020-04-27T22:53:23.082Z|00135|bridge|WARN|could not open network device 766f8fc4be5144e (No such device)
2020-04-27T22:53:23.219Z|00136|memory|INFO|76680 kB peak resident set size after 10.0 seconds
2020-04-27T22:53:23.219Z|00137|memory|INFO|handlers:2 ofconns:2 ports:11 revalidators:2 rules:3034 udpif keys:14
2020-04-27T22:53:23.222Z|00138|bridge|WARN|could not open network device 49c03f965afc87c (No such device)
2020-04-27T22:53:23.223Z|00139|bridge|WARN|could not open network device da2b4fe0eddd427 (No such device)
2020-04-27T22:53:23.224Z|00140|bridge|WARN|could not open network device 6d8cb5c9fcf99df (No such device)
2020-04-27T22:53:23.225Z|00141|bridge|WARN|could not open network device 3f99901f94d8a5a (No such device)
2020-04-27T22:53:23.225Z|00142|bridge|WARN|could not open network device 766f8fc4be5144e (No such device)
2020-04-27T22:53:23.229Z|00143|bridge|WARN|could not open network device 49c03f965afc87c (No such device)
2020-04-27T22:53:23.230Z|00144|bridge|WARN|could not open network device da2b4fe0eddd427 (No such device)
2020-04-27T22:53:23.232Z|00145|bridge|WARN|could not open network device 6d8cb5c9fcf99df (No such device)
2020-04-27T22:53:23.233Z|00146|bridge|WARN|could not open network device 3f99901f94d8a5a (No such device)
2020-04-27T22:53:23.234Z|00147|bridge|WARN|could not open network device 766f8fc4be5144e (No such device)
2020-04-27T22:53:23.243Z|00148|bridge|WARN|could not open network device 49c03f965afc87c (No such device)
2020-04-27T22:53:23.244Z|00149|bridge|WARN|could not open network device da2b4fe0eddd427 (No such device)
2020-04-27T22:53:23.244Z|00150|bridge|INFO|bridge br-int: added interface 573979f7b63763b on port 9
2020-04-27T22:53:23.245Z|00151|bridge|WARN|could not open network device 6d8cb5c9fcf99df (No such device)
2020-04-27T22:53:23.246Z|00152|bridge|WARN|could not open network device 3f99901f94d8a5a (No such device)
2020-04-27T22:53:23.247Z|00153|bridge|WARN|could not open network device 766f8fc4be5144e (No such device)
2020-04-27T22:53:23.251Z|00154|bridge|WARN|could not open network device 49c03f965afc87c (No such device)
2020-04-27T22:53:23.252Z|00155|bridge|WARN|could not open network device da2b4fe0eddd427 (No such device)
2020-04-27T22:53:23.253Z|00156|bridge|WARN|could not open network device 6d8cb5c9fcf99df (No such device)
2020-04-27T22:53:23.254Z|00157|bridge|WARN|could not open network device 3f99901f94d8a5a (No such device)
2020-04-27T22:53:23.254Z|00158|bridge|WARN|could not open network device 766f8fc4be5144e (No such device)
2020-04-27T22:53:24.114Z|00007|jsonrpc|WARN|unix#35: send error: Broken pipe
2020-04-27T22:53:24.114Z|00008|reconnect|WARN|unix#35: connection dropped (Broken pipe)
2020-04-27T22:53:24.119Z|00009|jsonrpc|WARN|unix#37: send error: Broken pipe
2020-04-27T22:53:24.120Z|00010|reconnect|WARN|unix#37: connection dropped (Broken pipe)
2020-04-27T22:53:24.126Z|00011|jsonrpc|WARN|unix#41: receive error: Connection reset by peer
2020-04-27T22:53:24.126Z|00012|reconnect|WARN|unix#41: connection dropped (Connection reset by peer)
2020-04-27T22:53:24.133Z|00013|jsonrpc|WARN|unix#45: send error: Broken pipe
2020-04-27T22:53:24.133Z|00014|reconnect|WARN|unix#45: connection dropped (Broken pipe)
2020-04-27T22:53:24.149Z|00015|jsonrpc|WARN|unix#50: send error: Broken pipe
2020-04-27T22:53:24.149Z|00016|reconnect|WARN|unix#50: connection dropped (Broken pipe)
2020-04-27T22:53:24.149Z|00017|reconnect|WARN|unix#52: connection dropped (Broken pipe)
2020-04-27T22:53:24.163Z|00018|reconnect|WARN|unix#57: connection dropped (Connection reset by peer)
2020-04-27T22:53:24.105Z|00159|bridge|WARN|could not open network device 49c03f965afc87c (No such device)
2020-04-27T22:53:24.108Z|00160|bridge|WARN|could not open network device 6d8cb5c9fcf99df (No such device)
2020-04-27T22:53:24.111Z|00161|bridge|WARN|could not open network device 3f99901f94d8a5a (No such device)
2020-04-27T22:53:24.113Z|00162|bridge|WARN|could not open network device 766f8fc4be5144e (No such device)
2020-04-27T22:53:24.113Z|00163|connmgr|INFO|br-int<->unix#1: 3058 flow_mods in the 9 s starting 10 s ago (3048 adds, 1 deletes, 9 modifications)
2020-04-27T22:53:24.116Z|00164|bridge|WARN|could not open network device 49c03f965afc87c (No such device)
2020-04-27T22:53:24.118Z|00165|bridge|WARN|could not open network device 6d8cb5c9fcf99df (No such device)
2020-04-27T22:53:24.391Z|00166|bridge|INFO|bridge br-int: added interface 21b2bc1054f784e on port 10
2020-04-27T22:53:24.506Z|00167|bridge|INFO|bridge br-int: added interface e5c4782300b3c7e on port 11
2020-04-27T22:53:24.529Z|00168|bridge|INFO|bridge br-int: added interface f69786ff3e0607d on port 12
2020-04-27T22:53:24.539Z|00169|bridge|INFO|bridge br-int: added interface 4875b3670a0dfa0 on port 13
2020-04-27T22:53:24.696Z|00170|bridge|INFO|bridge br-int: added interface 588327160cf4568 on port 14
2020-04-27T22:54:24.102Z|00171|connmgr|INFO|br-int<->unix#1: 185 flow_mods in the 32 s starting 59 s ago (115 adds, 70 modifications)
2020-04-27T22:54:26.702Z|00001|dpif(handler6)|WARN|Dropped 5 log messages in last 73 seconds (most recently, 73 seconds ago) due to excessive rate
2020-04-27T22:54:26.702Z|00002|dpif(handler6)|WARN|system@ovs-system: execute ct(commit,zone=38,label=0/0x1),11 failed (Invalid argument) on packet udp,vlan_tci=0x0000,dl_src=0a:58:0a:80:00:01,dl_dst=de:2a:61:80:00:05,nw_src=10.128.0.4,nw_dst=10.128.0.4,nw_tos=0,nw_ecn=0,nw_ttl=63,tp_src=51782,tp_dst=5353 udp_csum:1e4e
 with metadata skb_priority(0),skb_mark(0),ct_state(0x21),ct_zone(0x26),ct_tuple4(src=10.128.0.4,dst=10.128.0.4,proto=17,tp_src=51782,tp_dst=5353),in_port(11) mtu 0

Comment 6 Sai Sindhur Malleni 2020-05-06 22:16:59 UTC
Hitting this in OCP 4.4.3 on bare metal as well.

I0506 21:00:06.432246    6636 cni.go:163] [nodevertical0/nodevert-pod-29] CNI request &{ADD nodevertical0 nodevert-pod-29 5598cf7e581622b6314ebefa0d1dbbde6207deb23410a7c65da18860e343ae87 /proc/2217053/ns/net eth0 0xc00034c270}, result "", err failed to configure pod interface: timed out dumping br-int flow entries for sandbox: timed out waiting for the condition

sh-4.2# rpm -qa | grep ovn
ovn2.13-2.13.0-11.el7fdp.x86_64
ovn2.13-host-2.13.0-11.el7fdp.x86_64
ovn2.13-central-2.13.0-11.el7fdp.x86_64
ovn2.13-vtep-2.13.0-11.el7fdp.x86_64

Comment 9 Joe Talerico 2020-05-07 11:22:02 UTC
We are seeing this frequently:

  Warning  FailedCreatePodSandBox  4m5s   kubelet, master-1  Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_local-disks-local-provisioner-5vp8h_local-storage_ba462058-7df7-4e32-b77f-b79d50be91d1_0(17ef5f8ad8beccc8b19fecd401a09ab191255b9a1eac5b537c3822d151f69861): Multus: [local-storage/local-disks-local-provisioner-5vp8h]: error adding container to network "ovn-kubernetes": delegateAdd: error invoking DelegateAdd - "ovn-k8s-cni-overlay": error in getting result from AddNetwork: CNI request failed with status 400: '[local-storage/local-disks-local-provisioner-5vp8h] failed to configure pod interface: timed out dumping br-int flow entries for sandbox: timed out waiting for the condition

To work around this quickly, we delete the ovnkube-node pod on the specific node experiencing the error above:

[kni@e16-h12-b01-fc640 ~]$ oc delete -n openshift-ovn-kubernetes pods/ovnkube-node-f7ks5
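The pod name in the command above differs on every node. The following is a small sketch for looking it up from the node name, assuming the pods live in the `openshift-ovn-kubernetes` namespace and are named `ovnkube-node-*` as in this report; `ovnkube_node_pod_for` is a hypothetical helper name.

```shell
# Hypothetical helper: print the ovnkube-node pod scheduled on a given node.
# Assumes the namespace and pod name prefix seen in this report.
ovnkube_node_pod_for() {
  node=$1
  # List the pods on that node as "pod/<name>", then keep the ovnkube-node one.
  oc get pods -n openshift-ovn-kubernetes \
    --field-selector "spec.nodeName=${node}" -o name |
    grep '^pod/ovnkube-node-'
}
```

The result can then be passed to the delete command, for example `oc delete -n openshift-ovn-kubernetes "$(ovnkube_node_pod_for master-1)"`. This only automates the lookup and does not make the workaround any more reliable.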

However, this sometimes results in a failure as well:

[kni@e16-h12-b01-fc640 ~]$ oc logs pods/local-disks-local-provisioner-trdnl -n local-storage
I0507 10:42:59.367602       1 common.go:320] StorageClass "local-sc" configured with MountDir "/mnt/local-storage/local-sc", HostDir "/mnt/local-storage/local-sc", VolumeMode "Filesystem", FsType "xfs", BlockCleanerCommand ["/scripts/quick_reset.sh"]
I0507 10:42:59.367708       1 main.go:63] Loaded configuration: {StorageClassConfig:map[local-sc:{HostDir:/mnt/local-storage/local-sc MountDir:/mnt/local-storage/local-sc BlockCleanerCommand:[/scripts/quick_reset.sh] VolumeMode:Filesystem FsType:xfs}] NodeLabelsForPV:[] UseAlphaAPI:false UseJobForCleaning:false MinResyncPeriod:{Duration:5m0s} UseNodeNameOnly:false LabelsForPV:map[storage.openshift.com/local-volume-owner-name:local-disks storage.openshift.com/local-volume-owner-namespace:local-storage]}
I0507 10:42:59.367729       1 main.go:64] Ready to run...
W0507 10:42:59.367737       1 main.go:73] MY_NAMESPACE environment variable not set, will be set to default.
W0507 10:42:59.367746       1 main.go:79] JOB_CONTAINER_IMAGE environment variable not set.
I0507 10:42:59.368016       1 common.go:382] Creating client using in-cluster config
I0507 10:43:02.463235       1 main.go:126] Could not get node information (remaining retries: 2): Get https://172.30.0.1:443/api/v1/nodes/master-1: dial tcp 172.30.0.1:443: connect: no route to host
I0507 10:43:05.535192       1 main.go:126] Could not get node information (remaining retries: 1): Get https://172.30.0.1:443/api/v1/nodes/master-1: dial tcp 172.30.0.1:443: connect: no route to host
I0507 10:43:08.607020       1 main.go:126] Could not get node information (remaining retries: 0): Get https://172.30.0.1:443/api/v1/nodes/master-1: dial tcp 172.30.0.1:443: connect: no route to host
F0507 10:43:08.607050       1 main.go:129] Could not get node information: Get https://172.30.0.1:443/api/v1/nodes/master-1: dial tcp 172.30.0.1:443: connect: no route to host

With the right combination of ovnkube-node pod deletions we can eventually get the desired outcome, but this is not a maintainable solution.

Comment 13 Dan Williams 2020-05-07 15:00:34 UTC
Joe,

Comment 14 Dan Williams 2020-05-07 15:00:54 UTC
Joe,

Comment 15 Dan Williams 2020-05-07 15:02:27 UTC
This bug was a placeholder and we tracked the original problem to bug 1828637. Unless somebody disagrees I'll close this bug as a duplicate of 1828637 tomorrow (Friday).

Comment 16 Joe Talerico 2020-05-08 11:07:28 UTC
(In reply to Dan Williams from comment #15)
> This bug was a placeholder and we tracked the original problem to bug
> 1828637. Unless somebody disagrees I'll close this bug as a duplicate of
> 1828637 tomorrow (Friday).

Seems highly likely these are related.

Comment 17 Ben Bennett 2020-05-08 19:12:49 UTC

*** This bug has been marked as a duplicate of bug 1828637 ***

