Description of problem:

Pods are failing to start on two clusters (4c57bf78-3d14-4252-81e2-12295800593f and c70ef99-e32f-45e9-80e5-a90753a445c5) with:

```console
Apr 25 02:14:16 worker-12 hyperkube[27834]: E0425 02:14:16.076827 27834 kuberuntime_sandbox.go:68] CreatePodSandbox for pod "logreceiver-1_tmtrfledvzwczts-y-nk-x-000(c4eb1666-364d-4281-9e3c-85b7d573862f)" failed: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_logreceiver-1_tmtrfledvzwczts-y-nk-x-000_c4eb1666-364d-4281-9e3c-85b7d573862f_0(8828aafb8112ec8ca8a8b17de957ff968ea31496be049b42469bde0eda2c807f): Multus: [tmtrfledvzwczts-y-nk-x-000/logreceiver-1]: error adding container to network "ovn-kubernetes": delegateAdd: error invoking DelegateAdd - "ovn-k8s-cni-overlay": error in getting result from AddNetwork: CNI request failed with status 400: '[tmtrfledvzwczts-y-nk-x-000/logreceiver-1] failed to configure pod interface: timed out dumping br-int flow entries for sandbox: timed out waiting for the condition
```

Version-Release number of selected component (if applicable): 4.3.8

How reproducible: Very

Steps to Reproduce:

A few notes:

* This cluster is using OVN for the overlay network.
* If I restart a node, I get a "Missing CNI default network" error:
```console
Apr 27 12:43:46 master-1 hyperkube[4126]: I0427 12:43:46.904572 4126 event.go:255] Event(v1.ObjectReference{Kind:"Pod", Namespace:"openshift-apiserver", Name:"apiserver-l46gf", UID:"21c94177-5d57-445f-9de3-b758ae3295e2", APIVersion:"v1", ResourceVersion:"48151759", FieldPath:""}): type: 'Warning' reason: 'NetworkNotReady' network is not ready: runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: Missing CNI default network
```
* If I create a non-Istio-enabled pod, I get a status 400 error:
```console
51s Warning FailedCreatePodSandBox pod/busybox-app-86f489cc5b-vdv44 (combined from similar events): Failed create pod sandbox: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_busybox-app-86f489cc5b-vdv44_jay-sbx2_f766e036-dead-414c-a8b8-aec5e131673f_0(0693c7cf10ca7b13dc702d037ce2eef8f4291a5096b774d29f89127bf5a4e86a): Multus: [jay-sbx2/busybox-app-86f489cc5b-vdv44]: error adding container to network "ovn-kubernetes": delegateAdd: error invoking DelegateAdd - "ovn-k8s-cni-overlay": error in getting result from AddNetwork: CNI request failed with status 400: '[jay-sbx2/busybox-app-86f489cc5b-vdv44] failed to configure pod interface: timed out dumping br-int flow entries for sandbox: timed out waiting for the condition
```
* If I create an Istio-enabled pod, I get the same "CNI request failed" error:
```console
42s Warning FailedCreatePodSandBox pod/busybox-cdb78779d-qwpk5 (combined from similar events): Failed create pod sandbox: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_busybox-cdb78779d-qwpk5_jay-sbx_fabc6fc4-9dc4-4dbf-be2e-23f48fd6a1d1_0(79df7504d4038ca7e12591ca06b520829d84d8399de09001ec4d1b025286d0f0): Multus: [jay-sbx/busybox-cdb78779d-qwpk5]: error adding container to network "ovn-kubernetes": delegateAdd: error invoking DelegateAdd - "ovn-k8s-cni-overlay": error in getting result from AddNetwork: CNI request failed with status 400: '[jay-sbx/busybox-cdb78779d-qwpk5] failed to configure pod interface: timed out dumping br-int flow entries for sandbox: timed out waiting for the condition
```

Actual results: New pods cannot start, and if I restart or rebuild a node, that node can no longer start pods.

Expected results: Pods and nodes should be able to start normally.

Additional info:
I am also seeing errors in the OVN pods:

oc logs ovnkube-node-bpd5g -c ovs-daemons
```console
2020-04-27T13:31:43.479Z|58251|bridge|INFO|bridge br-int: added interface 7b5956ce1296ef8 on port 5353
2020-04-27T13:31:43.481Z|58252|bridge|WARN|could not open network device d1fd6396cc18a10 (No such device)
2020-04-27T13:31:43.494Z|58253|bridge|WARN|could not open network device d1fd6396cc18a10 (No such device)
2020-04-27T13:31:47.887Z|58254|bridge|INFO|bridge br-int: deleted interface 63d2bde7f49557c on port 5352
2020-04-27T13:31:47.896Z|58255|bridge|WARN|could not open network device d1fd6396cc18a10 (No such device)
2020-04-27T13:31:47.901Z|58256|bridge|WARN|could not open network device d1fd6396cc18a10 (No such device)
2020-04-27T13:31:48.072Z|58257|bridge|WARN|could not open network device d1fd6396cc18a10 (No such device)
2020-04-27T13:31:48.109Z|58258|bridge|WARN|could not open network device d1fd6396cc18a10 (No such device)
2020-04-27T13:31:49.142Z|58259|bridge|WARN|could not open network device d1fd6396cc18a10 (No such device)
2020-04-27T13:31:49.222Z|58260|bridge|WARN|could not open network device d1fd6396cc18a10 (No such device)
2020-04-27T13:31:49.241Z|58261|bridge|INFO|bridge br-int: added interface 820b3f50d5e9a30 on port 5354
2020-04-27T13:31:49.243Z|58262|bridge|WARN|could not open network device d1fd6396cc18a10 (No such device)
2020-04-27T13:31:49.254Z|58263|bridge|WARN|could not open network device d1fd6396cc18a10 (No such device)
2020-04-27T13:31:50.245Z|58264|connmgr|INFO|br-int<->unix#1: 16 flow_mods in the 46 s starting 58 s ago (5 adds, 3 deletes, 8 modifications)
2020-04-27T13:32:01.843Z|58265|bridge|WARN|could not open network device d1fd6396cc18a10 (No such device)
```

oc logs ovnkube-node-bpd5g -c ovnkube-node
```console
time="2020-04-27T13:32:35Z" level=warning msg="failed to clear stale OVS port \"\" iface-id \"bbtpnj33vzwcnrf-y-or-x-001-test-orch_1c222209-bde5-4d0e-97e1-0e098c6c76c3-ingressgateway-6c9cfbtcmlg\": failed to run 'ovs-vsctl --timeout=30 remove Interface external-ids iface-id': exit status 1\n \"ovs-vsctl: no row \\\"\\\" in table Interface\\n\"\n \"\""
time="2020-04-27T13:32:57Z" level=info msg="[bbtpnj33vzwcnrf-y-or-x-001-test-orch/1c222209-bde5-4d0e-97e1-0e098c6c76c3-ingressgateway-6c9cfbtcmlg] CNI request &{ADD bbtpnj33vzwcnrf-y-or-x-001-test-orch 1c222209-bde5-4d0e-97e1-0e098c6c76c3-ingressgateway-6c9cfbtcmlg 7481d3fda2660ad310a219d3f5dfdce4769efe6497a8e7f7e581cba1e0907fbe /proc/2548639/ns/net eth0 0xc0003ff6c0}, result \"\", err failed to configure pod interface: timed out dumping br-int flow entries for sandbox: timed out waiting for the condition"
time="2020-04-27T13:32:57Z" level=info msg="Waiting for DEL result for pod bbtpnj33vzwcnrf-y-or-x-001-test-orch/1c222209-bde5-4d0e-97e1-0e098c6c76c3-ingressgateway-6c9cfbtcmlg"
time="2020-04-27T13:32:57Z" level=info msg="[bbtpnj33vzwcnrf-y-or-x-001-test-orch/1c222209-bde5-4d0e-97e1-0e098c6c76c3-ingressgateway-6c9cfbtcmlg] dispatching pod network request &{DEL bbtpnj33vzwcnrf-y-or-x-001-test-orch 1c222209-bde5-4d0e-97e1-0e098c6c76c3-ingressgateway-6c9cfbtcmlg 7481d3fda2660ad310a219d3f5dfdce4769efe6497a8e7f7e581cba1e0907fbe /proc/2548639/ns/net eth0 0xc001d3e0d0}"
time="2020-04-27T13:32:57Z" level=info msg="[bbtpnj33vzwcnrf-y-or-x-001-test-orch/1c222209-bde5-4d0e-97e1-0e098c6c76c3-ingressgateway-6c9cfbtcmlg] CNI request &{DEL bbtpnj33vzwcnrf-y-or-x-001-test-orch 1c222209-bde5-4d0e-97e1-0e098c6c76c3-ingressgateway-6c9cfbtcmlg 7481d3fda2660ad310a219d3f5dfdce4769efe6497a8e7f7e581cba1e0907fbe /proc/2548639/ns/net eth0 0xc001d3e0d0}, result \"\", err <nil>"
time="2020-04-27T13:32:57Z" level=info msg="Waiting for DEL result for pod bbtpnj33vzwcnrf-y-or-x-001-test-orch/1c222209-bde5-4d0e-97e1-0e098c6c76c3-ingressgateway-6c9cfbtcmlg"
time="2020-04-27T13:32:57Z" level=info msg="[bbtpnj33vzwcnrf-y-or-x-001-test-orch/1c222209-bde5-4d0e-97e1-0e098c6c76c3-ingressgateway-6c9cfbtcmlg] dispatching pod network request &{DEL bbtpnj33vzwcnrf-y-or-x-001-test-orch 1c222209-bde5-4d0e-97e1-0e098c6c76c3-ingressgateway-6c9cfbtcmlg 7481d3fda2660ad310a219d3f5dfdce4769efe6497a8e7f7e581cba1e0907fbe /proc/2548639/ns/net eth0 0xc00015c9c0}"
time="2020-04-27T13:32:57Z" level=info msg="[bbtpnj33vzwcnrf-y-or-x-001-test-orch/1c222209-bde5-4d0e-97e1-0e098c6c76c3-ingressgateway-6c9cfbtcmlg] CNI request &{DEL bbtpnj33vzwcnrf-y-or-x-001-test-orch 1c222209-bde5-4d0e-97e1-0e098c6c76c3-ingressgateway-6c9cfbtcmlg 7481d3fda2660ad310a219d3f5dfdce4769efe6497a8e7f7e581cba1e0907fbe /proc/2548639/ns/net eth0 0xc00015c9c0}, result \"\", err <nil>"
time="2020-04-27T13:32:58Z" level=info msg="Waiting for ADD result for pod bbtpnj33vzwcnrf-y-or-x-001-test-orch/1c222209-bde5-4d0e-97e1-0e098c6c76c3-ingressgateway-6c9cfbtcmlg"
time="2020-04-27T13:32:58Z" level=info msg="[bbtpnj33vzwcnrf-y-or-x-001-test-orch/1c222209-bde5-4d0e-97e1-0e098c6c76c3-ingressgateway-6c9cfbtcmlg] dispatching pod network request &{ADD bbtpnj33vzwcnrf-y-or-x-001-test-orch 1c222209-bde5-4d0e-97e1-0e098c6c76c3-ingressgateway-6c9cfbtcmlg 87d1674bae5f2a3cc1a9bb318d42a16a4de6f4c6d81910bcac2369d4c2f71ad9 /proc/2551501/ns/net eth0 0xc0003ded00}"
time="2020-04-27T13:32:58Z" level=warning msg="failed to clear stale OVS port \"\" iface-id \"bbtpnj33vzwcnrf-y-or-x-001-test-orch_1c222209-bde5-4d0e-97e1-0e098c6c76c3-ingressgateway-6c9cfbtcmlg\": failed to run 'ovs-vsctl --timeout=30 remove Interface external-ids iface-id': exit status 1\n \"ovs-vsctl: no row \\\"\\\" in table Interface\\n\"\n \"\""
time="2020-04-27T13:33:20Z" level=info msg="[bbtpnj33vzwcnrf-y-or-x-001-test-orch/1c222209-bde5-4d0e-97e1-0e098c6c76c3-ingressgateway-6c9cfbtcmlg] CNI request &{ADD bbtpnj33vzwcnrf-y-or-x-001-test-orch 1c222209-bde5-4d0e-97e1-0e098c6c76c3-ingressgateway-6c9cfbtcmlg 87d1674bae5f2a3cc1a9bb318d42a16a4de6f4c6d81910bcac2369d4c2f71ad9 /proc/2551501/ns/net eth0 0xc0003ded00}, result \"\", err failed to configure pod interface: timed out dumping br-int flow entries for sandbox: timed out waiting for the condition"
```
Setting the target release to 4.5 so that we can work on the issue. Once we have a fix we will handle the backports for earlier releases.
I tried this on my local bare-metal cluster: I started a hello-openshift pod, SSHed to the node, ran halt, powered the node off, and then powered it back on. The app pod status is:

Events:
  Type     Reason           Age                    From               Message
  ----     ------           ----                   ----               -------
  Normal   Scheduled        <unknown>              default-scheduler  Successfully assigned my-test-app/hello-openshift-1-v8l2m to worker-2
  Normal   Pulling          50m                    kubelet, worker-2  Pulling image "openshift/hello-openshift@sha256:aaea76ff622d2f8bcb32e538e7b3cd0ef6d291953f3e7c9f556c1ba5baf47e2e"
  Normal   Pulled           50m                    kubelet, worker-2  Successfully pulled image "openshift/hello-openshift@sha256:aaea76ff622d2f8bcb32e538e7b3cd0ef6d291953f3e7c9f556c1ba5baf47e2e"
  Normal   Created          50m                    kubelet, worker-2  Created container hello-openshift
  Normal   Started          50m                    kubelet, worker-2  Started container hello-openshift
  Warning  NetworkNotReady  3m50s (x6 over 3m59s)  kubelet, worker-2  network is not ready: runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: Missing CNI default network
  Normal   SandboxChanged   3m48s                  kubelet, worker-2  Pod sandbox changed, it will be killed and re-created.
  Normal   Pulling          3m46s                  kubelet, worker-2  Pulling image "openshift/hello-openshift@sha256:aaea76ff622d2f8bcb32e538e7b3cd0ef6d291953f3e7c9f556c1ba5baf47e2e"
  Normal   Pulled           3m44s                  kubelet, worker-2  Successfully pulled image "openshift/hello-openshift@sha256:aaea76ff622d2f8bcb32e538e7b3cd0ef6d291953f3e7c9f556c1ba5baf47e2e"
  Normal   Created          3m43s                  kubelet, worker-2  Created container hello-openshift
  Normal   Started          3m43s                  kubelet, worker-2  Started container hello-openshift

So in this run the pod recovered after the reboot. I need to explore this case further.
In my first attempt to reproduce this, I did hit a condition where OVN went completely offline and did not recover. I managed to grab the ovndb server log:

Starting ovsdb-server.
PMD: net_mlx4: cannot load glue library: libibverbs.so.1: cannot open shared object file: No such file or directory
PMD: net_mlx4: cannot initialize PMD due to missing run-time dependency on rdma-core libraries (libibverbs, libmlx4)
net_mlx5: cannot load glue library: libibverbs.so.1: cannot open shared object file: No such file or directory
net_mlx5: cannot initialize PMD due to missing run-time dependency on rdma-core libraries (libibverbs, libmlx5)
2020-04-27T22:53:13Z|00001|dns_resolve|WARN|Failed to read etc/hosts: syntax error
Configuring Open vSwitch system IDs.
Inserting openvswitch module.
PMD: net_mlx4: cannot load glue library: libibverbs.so.1: cannot open shared object file: No such file or directory
PMD: net_mlx4: cannot initialize PMD due to missing run-time dependency on rdma-core libraries (libibverbs, libmlx4)
net_mlx5: cannot load glue library: libibverbs.so.1: cannot open shared object file: No such file or directory
net_mlx5: cannot initialize PMD due to missing run-time dependency on rdma-core libraries (libibverbs, libmlx5)
2020-04-27T22:53:13Z|00001|dns_resolve|WARN|Failed to read etc/hosts: syntax error
Starting ovs-vswitchd.
Enabling remote OVSDB managers.
iptables binary not installed, not adding a rule for udp to port 6081.
2020-04-27T22:53:13.355Z|00063|bridge|WARN|could not open network device 17758cfa3f003f8 (No such device) 2020-04-27T22:53:13.356Z|00064|bridge|WARN|could not open network device 3f99901f94d8a5a (No such device) 2020-04-27T22:53:13.357Z|00065|bridge|WARN|could not open network device 766f8fc4be5144e (No such device) 2020-04-27T22:53:13.358Z|00066|bridge|INFO|ovs-vswitchd (Open vSwitch) 2.12.0 2020-04-27T22:53:13.360Z|00067|bridge|WARN|could not open network device 49c03f965afc87c (No such device) 2020-04-27T22:53:13.361Z|00068|bridge|WARN|could not open network device da2b4fe0eddd427 (No such device) 2020-04-27T22:53:13.362Z|00069|bridge|WARN|could not open network device 6d8cb5c9fcf99df (No such device) 2020-04-27T22:53:13.362Z|00070|bridge|WARN|could not open network device 17758cfa3f003f8 (No such device) 2020-04-27T22:53:13.363Z|00071|bridge|WARN|could not open network device 3f99901f94d8a5a (No such device) 2020-04-27T22:53:13.364Z|00072|bridge|WARN|could not open network device 766f8fc4be5144e (No such device) 2020-04-27T22:53:13.046Z|00001|vlog|INFO|opened log file /var/log/openvswitch/ovsdb-server.log 2020-04-27T22:53:13.074Z|00002|stream_ssl|ERR|SSL_use_certificate_file: error:02001002:system library:fopen:No such file or directory 2020-04-27T22:53:13.074Z|00003|stream_ssl|ERR|SSL_use_PrivateKey_file: error:20074002:BIO routines:FILE_CTRL:system lib 2020-04-27T22:53:13.075Z|00004|ovsdb_server|INFO|ovsdb-server (Open vSwitch) 2.12.0 2020-04-27T22:53:14.104Z|00073|bridge|INFO|bridge br-int: deleted interface patch-br-int-to-br-local_worker-1 on port 6 2020-04-27T22:53:14.104Z|00074|bridge|INFO|bridge br-local: deleted interface patch-br-local_worker-1-to-br-int on port 2 2020-04-27T22:53:14.109Z|00075|bridge|WARN|could not open network device 49c03f965afc87c (No such device) 2020-04-27T22:53:14.112Z|00076|bridge|WARN|could not open network device da2b4fe0eddd427 (No such device) 2020-04-27T22:53:14.114Z|00077|bridge|WARN|could not open network device 
6d8cb5c9fcf99df (No such device) 2020-04-27T22:53:14.115Z|00078|bridge|WARN|could not open network device 17758cfa3f003f8 (No such device) 2020-04-27T22:53:14.118Z|00079|bridge|WARN|could not open network device 3f99901f94d8a5a (No such device) 2020-04-27T22:53:14.120Z|00080|bridge|WARN|could not open network device 766f8fc4be5144e (No such device) 2020-04-27T22:53:14.287Z|00081|bridge|WARN|could not open network device 49c03f965afc87c (No such device) 2020-04-27T22:53:14.287Z|00082|bridge|INFO|bridge br-int: added interface patch-br-int-to-br-local_worker-1 on port 8 2020-04-27T22:53:14.292Z|00083|bridge|WARN|could not open network device da2b4fe0eddd427 (No such device) 2020-04-27T22:53:14.295Z|00084|bridge|WARN|could not open network device 6d8cb5c9fcf99df (No such device) 2020-04-27T22:53:14.298Z|00085|bridge|WARN|could not open network device 17758cfa3f003f8 (No such device) 2020-04-27T22:53:14.300Z|00086|bridge|WARN|could not open network device 3f99901f94d8a5a (No such device) 2020-04-27T22:53:14.301Z|00087|bridge|WARN|could not open network device 766f8fc4be5144e (No such device) 2020-04-27T22:53:14.301Z|00088|bridge|INFO|bridge br-local: added interface patch-br-local_worker-1-to-br-int on port 3 2020-04-27T22:53:14.852Z|00089|bridge|WARN|could not open network device 49c03f965afc87c (No such device) 2020-04-27T22:53:14.853Z|00090|bridge|WARN|could not open network device da2b4fe0eddd427 (No such device) 2020-04-27T22:53:14.853Z|00091|bridge|WARN|could not open network device 6d8cb5c9fcf99df (No such device) 2020-04-27T22:53:14.854Z|00092|bridge|WARN|could not open network device 17758cfa3f003f8 (No such device) 2020-04-27T22:53:14.855Z|00093|bridge|WARN|could not open network device 3f99901f94d8a5a (No such device) 2020-04-27T22:53:14.855Z|00094|bridge|WARN|could not open network device 766f8fc4be5144e (No such device) 2020-04-27T22:53:14.861Z|00095|bridge|WARN|could not open network device 49c03f965afc87c (No such device) 
2020-04-27T22:53:14.862Z|00096|bridge|WARN|could not open network device da2b4fe0eddd427 (No such device) 2020-04-27T22:53:14.863Z|00097|bridge|WARN|could not open network device 6d8cb5c9fcf99df (No such device) 2020-04-27T22:53:14.864Z|00098|bridge|WARN|could not open network device 17758cfa3f003f8 (No such device) 2020-04-27T22:53:14.865Z|00099|bridge|WARN|could not open network device 3f99901f94d8a5a (No such device) 2020-04-27T22:53:14.865Z|00100|bridge|WARN|could not open network device 766f8fc4be5144e (No such device) 2020-04-27T22:53:14.878Z|00101|bridge|WARN|could not open network device 49c03f965afc87c (No such device) 2020-04-27T22:53:14.880Z|00102|bridge|WARN|could not open network device da2b4fe0eddd427 (No such device) 2020-04-27T22:53:14.883Z|00103|bridge|WARN|could not open network device 6d8cb5c9fcf99df (No such device) 2020-04-27T22:53:14.885Z|00104|bridge|WARN|could not open network device 17758cfa3f003f8 (No such device) 2020-04-27T22:53:14.887Z|00105|bridge|WARN|could not open network device 3f99901f94d8a5a (No such device) 2020-04-27T22:53:14.888Z|00106|bridge|WARN|could not open network device 766f8fc4be5144e (No such device) 2020-04-27T22:53:14.987Z|00107|bridge|WARN|could not open network device 49c03f965afc87c (No such device) 2020-04-27T22:53:14.989Z|00108|bridge|WARN|could not open network device da2b4fe0eddd427 (No such device) 2020-04-27T22:53:14.989Z|00109|bridge|WARN|could not open network device 6d8cb5c9fcf99df (No such device) 2020-04-27T22:53:14.990Z|00110|bridge|WARN|could not open network device 17758cfa3f003f8 (No such device) 2020-04-27T22:53:14.991Z|00111|bridge|WARN|could not open network device 3f99901f94d8a5a (No such device) 2020-04-27T22:53:14.992Z|00112|bridge|WARN|could not open network device 766f8fc4be5144e (No such device) 2020-04-27T22:53:14.998Z|00113|bridge|WARN|could not open network device 49c03f965afc87c (No such device) 2020-04-27T22:53:14.999Z|00114|bridge|WARN|could not open network device da2b4fe0eddd427 
(No such device) 2020-04-27T22:53:15.000Z|00115|bridge|WARN|could not open network device 6d8cb5c9fcf99df (No such device) 2020-04-27T22:53:15.001Z|00116|bridge|WARN|could not open network device 17758cfa3f003f8 (No such device) 2020-04-27T22:53:15.002Z|00117|bridge|WARN|could not open network device 3f99901f94d8a5a (No such device) 2020-04-27T22:53:15.003Z|00118|bridge|WARN|could not open network device 766f8fc4be5144e (No such device) 2020-04-27T22:53:15.068Z|00119|bridge|WARN|could not open network device 49c03f965afc87c (No such device) 2020-04-27T22:53:15.070Z|00120|bridge|WARN|could not open network device da2b4fe0eddd427 (No such device) 2020-04-27T22:53:15.071Z|00121|bridge|WARN|could not open network device 6d8cb5c9fcf99df (No such device) 2020-04-27T22:53:15.072Z|00122|bridge|WARN|could not open network device 17758cfa3f003f8 (No such device) 2020-04-27T22:53:15.073Z|00123|bridge|WARN|could not open network device 3f99901f94d8a5a (No such device) 2020-04-27T22:53:15.073Z|00124|bridge|WARN|could not open network device 766f8fc4be5144e (No such device) 2020-04-27T22:53:15.076Z|00125|bridge|WARN|could not open network device 49c03f965afc87c (No such device) 2020-04-27T22:53:15.077Z|00126|bridge|WARN|could not open network device da2b4fe0eddd427 (No such device) 2020-04-27T22:53:15.077Z|00127|bridge|WARN|could not open network device 6d8cb5c9fcf99df (No such device) 2020-04-27T22:53:15.078Z|00128|bridge|WARN|could not open network device 17758cfa3f003f8 (No such device) 2020-04-27T22:53:15.080Z|00129|bridge|WARN|could not open network device 3f99901f94d8a5a (No such device) 2020-04-27T22:53:15.081Z|00130|bridge|WARN|could not open network device 766f8fc4be5144e (No such device) 2020-04-27T22:53:23.072Z|00005|memory|INFO|8040 kB peak resident set size after 10.0 seconds 2020-04-27T22:53:23.072Z|00006|memory|INFO|cells:1102 monitors:3 sessions:2 2020-04-27T22:53:23.076Z|00131|bridge|WARN|could not open network device 49c03f965afc87c (No such device) 
2020-04-27T22:53:23.079Z|00132|bridge|WARN|could not open network device da2b4fe0eddd427 (No such device) 2020-04-27T22:53:23.080Z|00133|bridge|WARN|could not open network device 6d8cb5c9fcf99df (No such device) 2020-04-27T22:53:23.081Z|00134|bridge|WARN|could not open network device 3f99901f94d8a5a (No such device) 2020-04-27T22:53:23.082Z|00135|bridge|WARN|could not open network device 766f8fc4be5144e (No such device) 2020-04-27T22:53:23.219Z|00136|memory|INFO|76680 kB peak resident set size after 10.0 seconds 2020-04-27T22:53:23.219Z|00137|memory|INFO|handlers:2 ofconns:2 ports:11 revalidators:2 rules:3034 udpif keys:14 2020-04-27T22:53:23.222Z|00138|bridge|WARN|could not open network device 49c03f965afc87c (No such device) 2020-04-27T22:53:23.223Z|00139|bridge|WARN|could not open network device da2b4fe0eddd427 (No such device) 2020-04-27T22:53:23.224Z|00140|bridge|WARN|could not open network device 6d8cb5c9fcf99df (No such device) 2020-04-27T22:53:23.225Z|00141|bridge|WARN|could not open network device 3f99901f94d8a5a (No such device) 2020-04-27T22:53:23.225Z|00142|bridge|WARN|could not open network device 766f8fc4be5144e (No such device) 2020-04-27T22:53:23.229Z|00143|bridge|WARN|could not open network device 49c03f965afc87c (No such device) 2020-04-27T22:53:23.230Z|00144|bridge|WARN|could not open network device da2b4fe0eddd427 (No such device) 2020-04-27T22:53:23.232Z|00145|bridge|WARN|could not open network device 6d8cb5c9fcf99df (No such device) 2020-04-27T22:53:23.233Z|00146|bridge|WARN|could not open network device 3f99901f94d8a5a (No such device) 2020-04-27T22:53:23.234Z|00147|bridge|WARN|could not open network device 766f8fc4be5144e (No such device) 2020-04-27T22:53:23.243Z|00148|bridge|WARN|could not open network device 49c03f965afc87c (No such device) 2020-04-27T22:53:23.244Z|00149|bridge|WARN|could not open network device da2b4fe0eddd427 (No such device) 2020-04-27T22:53:23.244Z|00150|bridge|INFO|bridge br-int: added interface 573979f7b63763b on 
port 9 2020-04-27T22:53:23.245Z|00151|bridge|WARN|could not open network device 6d8cb5c9fcf99df (No such device) 2020-04-27T22:53:23.246Z|00152|bridge|WARN|could not open network device 3f99901f94d8a5a (No such device) 2020-04-27T22:53:23.247Z|00153|bridge|WARN|could not open network device 766f8fc4be5144e (No such device) 2020-04-27T22:53:23.251Z|00154|bridge|WARN|could not open network device 49c03f965afc87c (No such device) 2020-04-27T22:53:23.252Z|00155|bridge|WARN|could not open network device da2b4fe0eddd427 (No such device) 2020-04-27T22:53:23.253Z|00156|bridge|WARN|could not open network device 6d8cb5c9fcf99df (No such device) 2020-04-27T22:53:23.254Z|00157|bridge|WARN|could not open network device 3f99901f94d8a5a (No such device) 2020-04-27T22:53:23.254Z|00158|bridge|WARN|could not open network device 766f8fc4be5144e (No such device) 2020-04-27T22:53:24.114Z|00007|jsonrpc|WARN|unix#35: send error: Broken pipe 2020-04-27T22:53:24.114Z|00008|reconnect|WARN|unix#35: connection dropped (Broken pipe) 2020-04-27T22:53:24.119Z|00009|jsonrpc|WARN|unix#37: send error: Broken pipe 2020-04-27T22:53:24.120Z|00010|reconnect|WARN|unix#37: connection dropped (Broken pipe) 2020-04-27T22:53:24.126Z|00011|jsonrpc|WARN|unix#41: receive error: Connection reset by peer 2020-04-27T22:53:24.126Z|00012|reconnect|WARN|unix#41: connection dropped (Connection reset by peer) 2020-04-27T22:53:24.133Z|00013|jsonrpc|WARN|unix#45: send error: Broken pipe 2020-04-27T22:53:24.133Z|00014|reconnect|WARN|unix#45: connection dropped (Broken pipe) 2020-04-27T22:53:24.149Z|00015|jsonrpc|WARN|unix#50: send error: Broken pipe 2020-04-27T22:53:24.149Z|00016|reconnect|WARN|unix#50: connection dropped (Broken pipe) 2020-04-27T22:53:24.149Z|00017|reconnect|WARN|unix#52: connection dropped (Broken pipe) 2020-04-27T22:53:24.163Z|00018|reconnect|WARN|unix#57: connection dropped (Connection reset by peer) 2020-04-27T22:53:24.105Z|00159|bridge|WARN|could not open network device 49c03f965afc87c (No such 
device) 2020-04-27T22:53:24.108Z|00160|bridge|WARN|could not open network device 6d8cb5c9fcf99df (No such device) 2020-04-27T22:53:24.111Z|00161|bridge|WARN|could not open network device 3f99901f94d8a5a (No such device) 2020-04-27T22:53:24.113Z|00162|bridge|WARN|could not open network device 766f8fc4be5144e (No such device) 2020-04-27T22:53:24.113Z|00163|connmgr|INFO|br-int<->unix#1: 3058 flow_mods in the 9 s starting 10 s ago (3048 adds, 1 deletes, 9 modifications) 2020-04-27T22:53:24.116Z|00164|bridge|WARN|could not open network device 49c03f965afc87c (No such device) 2020-04-27T22:53:24.118Z|00165|bridge|WARN|could not open network device 6d8cb5c9fcf99df (No such device) 2020-04-27T22:53:24.391Z|00166|bridge|INFO|bridge br-int: added interface 21b2bc1054f784e on port 10 2020-04-27T22:53:24.506Z|00167|bridge|INFO|bridge br-int: added interface e5c4782300b3c7e on port 11 2020-04-27T22:53:24.529Z|00168|bridge|INFO|bridge br-int: added interface f69786ff3e0607d on port 12 2020-04-27T22:53:24.539Z|00169|bridge|INFO|bridge br-int: added interface 4875b3670a0dfa0 on port 13 2020-04-27T22:53:24.696Z|00170|bridge|INFO|bridge br-int: added interface 588327160cf4568 on port 14 2020-04-27T22:54:24.102Z|00171|connmgr|INFO|br-int<->unix#1: 185 flow_mods in the 32 s starting 59 s ago (115 adds, 70 modifications) 2020-04-27T22:54:26.702Z|00001|dpif(handler6)|WARN|Dropped 5 log messages in last 73 seconds (most recently, 73 seconds ago) due to excessive rate 2020-04-27T22:54:26.702Z|00002|dpif(handler6)|WARN|system@ovs-system: execute ct(commit,zone=38,label=0/0x1),11 failed (Invalid argument) on packet udp,vlan_tci=0x0000,dl_src=0a:58:0a:80:00:01,dl_dst=de:2a:61:80:00:05,nw_src=10.128.0.4,nw_dst=10.128.0.4,nw_tos=0,nw_ecn=0,nw_ttl=63,tp_src=51782,tp_dst=5353 udp_csum:1e4e with metadata skb_priority(0),skb_mark(0),ct_state(0x21),ct_zone(0x26),ct_tuple4(src=10.128.0.4,dst=10.128.0.4,proto=17,tp_src=51782,tp_dst=5353),in_port(11) mtu 0
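The log above repeats the same few missing devices many times, which makes it hard to see how many distinct stale ports are involved. A minimal sketch, in Python, of one way to tally the warnings per device from a saved copy of the log. The helper name `count_missing_devices` is hypothetical and not part of any OVS or OpenShift tooling; the sample lines are copied from the log above.

```python
import re
from collections import Counter

# Matches ovs-vswitchd bridge warnings of the form seen in the log above:
#   ...|bridge|WARN|could not open network device <name> (No such device)
MISSING_DEVICE = re.compile(
    r"\|bridge\|WARN\|could not open network device (\S+) \(No such device\)"
)


def count_missing_devices(log_text):
    """Return a Counter mapping device name to the number of warnings for it.

    Works whether the log entries are separated by newlines or by spaces,
    because the pattern does not depend on line boundaries.
    """
    return Counter(MISSING_DEVICE.findall(log_text))


# A few entries copied from the log above.
sample = "\n".join([
    "2020-04-27T22:53:13.355Z|00063|bridge|WARN|could not open network device 17758cfa3f003f8 (No such device)",
    "2020-04-27T22:53:13.356Z|00064|bridge|WARN|could not open network device 3f99901f94d8a5a (No such device)",
    "2020-04-27T22:53:13.358Z|00066|bridge|INFO|ovs-vswitchd (Open vSwitch) 2.12.0",
    "2020-04-27T22:53:13.362Z|00070|bridge|WARN|could not open network device 17758cfa3f003f8 (No such device)",
])

counts = count_missing_devices(sample)
for device, warnings in counts.most_common():
    print(device, warnings)
```

Running this over the full ovs-daemons output would show which interface names ovs-vswitchd keeps retrying; whether those correspond to ports left behind by deleted pods is something the tally alone cannot confirm.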
Hitting this in OCP 4.4.3 on bare metal as well:

I0506 21:00:06.432246 6636 cni.go:163] [nodevertical0/nodevert-pod-29] CNI request &{ADD nodevertical0 nodevert-pod-29 5598cf7e581622b6314ebefa0d1dbbde6207deb23410a7c65da18860e343ae87 /proc/2217053/ns/net eth0 0xc00034c270}, result "", err failed to configure pod interface: timed out dumping br-int flow entries for sandbox: timed out waiting for the condition

sh-4.2# rpm -qa | grep ovn
ovn2.13-2.13.0-11.el7fdp.x86_64
ovn2.13-host-2.13.0-11.el7fdp.x86_64
ovn2.13-central-2.13.0-11.el7fdp.x86_64
ovn2.13-vtep-2.13.0-11.el7fdp.x86_64
We are seeing this frequently:

Warning FailedCreatePodSandBox 4m5s kubelet, master-1 Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_local-disks-local-provisioner-5vp8h_local-storage_ba462058-7df7-4e32-b77f-b79d50be91d1_0(17ef5f8ad8beccc8b19fecd401a09ab191255b9a1eac5b537c3822d151f69861): Multus: [local-storage/local-disks-local-provisioner-5vp8h]: error adding container to network "ovn-kubernetes": delegateAdd: error invoking DelegateAdd - "ovn-k8s-cni-overlay": error in getting result from AddNetwork: CNI request failed with status 400: '[local-storage/local-disks-local-provisioner-5vp8h] failed to configure pod interface: timed out dumping br-int flow entries for sandbox: timed out waiting for the condition

As a quick workaround, we delete the ovnkube-node pod on the specific node that reports the error above:

[kni@e16-h12-b01-fc640 ~]$ oc delete -n openshift-ovn-kubernetes pods/ovnkube-node-f7ks5

However, this sometimes fails as well:

[kni@e16-h12-b01-fc640 ~]$ oc logs pods/local-disks-local-provisioner-trdnl -n local-storage
I0507 10:42:59.367602 1 common.go:320] StorageClass "local-sc" configured with MountDir "/mnt/local-storage/local-sc", HostDir "/mnt/local-storage/local-sc", VolumeMode "Filesystem", FsType "xfs", BlockCleanerCommand ["/scripts/quick_reset.sh"]
I0507 10:42:59.367708 1 main.go:63] Loaded configuration: {StorageClassConfig:map[local-sc:{HostDir:/mnt/local-storage/local-sc MountDir:/mnt/local-storage/local-sc BlockCleanerCommand:[/scripts/quick_reset.sh] VolumeMode:Filesystem FsType:xfs}] NodeLabelsForPV:[] UseAlphaAPI:false UseJobForCleaning:false MinResyncPeriod:{Duration:5m0s} UseNodeNameOnly:false LabelsForPV:map[storage.openshift.com/local-volume-owner-name:local-disks storage.openshift.com/local-volume-owner-namespace:local-storage]}
I0507 10:42:59.367729 1 main.go:64] Ready to run...
W0507 10:42:59.367737 1 main.go:73] MY_NAMESPACE environment variable not set, will be set to default.
W0507 10:42:59.367746 1 main.go:79] JOB_CONTAINER_IMAGE environment variable not set.
I0507 10:42:59.368016 1 common.go:382] Creating client using in-cluster config
I0507 10:43:02.463235 1 main.go:126] Could not get node information (remaining retries: 2): Get https://172.30.0.1:443/api/v1/nodes/master-1: dial tcp 172.30.0.1:443: connect: no route to host
I0507 10:43:05.535192 1 main.go:126] Could not get node information (remaining retries: 1): Get https://172.30.0.1:443/api/v1/nodes/master-1: dial tcp 172.30.0.1:443: connect: no route to host
I0507 10:43:08.607020 1 main.go:126] Could not get node information (remaining retries: 0): Get https://172.30.0.1:443/api/v1/nodes/master-1: dial tcp 172.30.0.1:443: connect: no route to host
F0507 10:43:08.607050 1 main.go:129] Could not get node information: Get https://172.30.0.1:443/api/v1/nodes/master-1: dial tcp 172.30.0.1:443: connect: no route to host

With the right combination of ovnkube-node pod deletions we can eventually get the desired outcome, but this is not a maintainable solution.
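Because the workaround is applied per node, it helps to know which nodes are reporting the timeout. A minimal sketch, in Python, that groups FailedCreatePodSandBox events by node, assuming the events are in the `oc describe` layout shown above, where the source column reads `kubelet, <node>`. The helper name `affected_nodes` is hypothetical, and the regular expressions are inferred only from the event text quoted in this report.

```python
import re
from collections import defaultdict

# Error text shared by every failing sandbox event quoted in this report.
SIGNATURE = "timed out dumping br-int flow entries"
# Assumption: the event source column looks like "kubelet, master-1".
NODE = re.compile(r"kubelet, (\S+)")
# The CNI error quotes the pod as '[namespace/pod] failed to configure pod interface'.
POD = re.compile(r"'\[([^/\]]+)/([^\]]+)\] failed to configure pod interface")


def affected_nodes(event_lines):
    """Map node name -> set of (namespace, pod) that hit the br-int timeout."""
    nodes = defaultdict(set)
    for line in event_lines:
        if "FailedCreatePodSandBox" not in line or SIGNATURE not in line:
            continue
        node = NODE.search(line)
        pod = POD.search(line)
        if node and pod:
            nodes[node.group(1)].add((pod.group(1), pod.group(2)))
    return dict(nodes)


sample_events = [
    # The failing event quoted above, split only for line length.
    "Warning FailedCreatePodSandBox 4m5s kubelet, master-1 Failed to create pod sandbox: "
    "rpc error: code = Unknown desc = failed to create pod network sandbox "
    "k8s_local-disks-local-provisioner-5vp8h_local-storage_ba462058-7df7-4e32-b77f-b79d50be91d1_0"
    "(17ef5f8ad8beccc8b19fecd401a09ab191255b9a1eac5b537c3822d151f69861): "
    "Multus: [local-storage/local-disks-local-provisioner-5vp8h]: "
    'error adding container to network "ovn-kubernetes": delegateAdd: '
    'error invoking DelegateAdd - "ovn-k8s-cni-overlay": '
    "error in getting result from AddNetwork: CNI request failed with status 400: "
    "'[local-storage/local-disks-local-provisioner-5vp8h] failed to configure pod interface: "
    "timed out dumping br-int flow entries for sandbox: timed out waiting for the condition",
    # A healthy event from an earlier comment, which should be ignored.
    "Normal Started 3m43s kubelet, worker-2 Started container hello-openshift",
]

result = affected_nodes(sample_events)
for node, pods in sorted(result.items()):
    print(node, sorted(pods))
```

The node names it returns can then be matched against the output of `oc get pods -n openshift-ovn-kubernetes -o wide` to find the ovnkube-node pod to delete. Events in the `oc get events` layout do not carry the node name, so this sketch skips them.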
Joe,
This bug was a placeholder and we tracked the original problem to bug 1828637. Unless somebody disagrees I'll close this bug as a duplicate of 1828637 tomorrow (Friday).
(In reply to Dan Williams from comment #15)
> This bug was a placeholder and we tracked the original problem to bug
> 1828637. Unless somebody disagrees I'll close this bug as a duplicate of
> 1828637 tomorrow (Friday).

Seems highly likely these are related.
*** This bug has been marked as a duplicate of bug 1828637 ***