Description of problem:
On a UPI cluster on AWS with 4 nodes, the sdn pod crashed with the following error:
I0914 11:25:26.790758 1651659 ovs.go:180] Error executing ovs-ofctl: ovs-ofctl: /var/run/openvswitch/br0.mgmt: failed to open socket (Connection refused)

Version-Release number of selected component (if applicable):
4.6.0-0.nightly-2020-09-12-230035

How reproducible:

Steps to Reproduce:
1.
2.
3.

Actual results:
oc logs sdn-gpvvp -n openshift-sdn -c sdn
I0914 11:25:26.665266 1651659 cmd.go:121] Reading proxy configuration from /config/kube-proxy-config.yaml
I0914 11:25:26.666379 1651659 feature_gate.go:243] feature gates: &{map[]}
I0914 11:25:26.666421 1651659 cmd.go:216] Watching config file /config/kube-proxy-config.yaml for changes
I0914 11:25:26.666455 1651659 cmd.go:216] Watching config file /config/..2020_09_14_11_10_59.920273164/kube-proxy-config.yaml for changes
I0914 11:25:26.692552 1651659 node.go:150] Initializing SDN node "ip-10-0-54-32.us-east-2.compute.internal" (10.0.54.32) of type "redhat/openshift-ovs-networkpolicy"
I0914 11:25:26.696419 1651659 cmd.go:159] Starting node networking (v0.0.0-alpha.0-201-g79fd230c)
I0914 11:25:26.696434 1651659 node.go:338] Starting openshift-sdn network plugin
I0914 11:25:26.790758 1651659 ovs.go:180] Error executing ovs-ofctl: ovs-ofctl: /var/run/openvswitch/br0.mgmt: failed to open socket (Connection refused)
I0914 11:25:27.294809 1651659 ovs.go:180] Error executing ovs-ofctl: ovs-ofctl: /var/run/openvswitch/br0.mgmt: failed to open socket (Connection refused)
I0914 11:25:27.924311 1651659 ovs.go:180] Error executing ovs-ofctl: ovs-ofctl: /var/run/openvswitch/br0.mgmt: failed to open socket (Connection refused)
I0914 11:25:28.709851 1651659 ovs.go:180] Error executing ovs-ofctl: ovs-ofctl: /var/run/openvswitch/br0.mgmt: failed to open socket (Connection refused)
I0914 11:25:29.690655 1651659 ovs.go:180] Error executing ovs-ofctl: ovs-ofctl: /var/run/openvswitch/br0.mgmt: failed to open socket (Connection refused)
I0914 11:25:30.915621 1651659 ovs.go:180] Error executing ovs-ofctl: ovs-ofctl: /var/run/openvswitch/br0.mgmt: failed to open socket (Connection refused)
I0914 11:25:32.445503 1651659 ovs.go:180] Error executing ovs-ofctl: ovs-ofctl: /var/run/openvswitch/br0.mgmt: failed to open socket (Connection refused)
I0914 11:25:34.356446 1651659 ovs.go:180] Error executing ovs-ofctl: ovs-ofctl: /var/run/openvswitch/br0.mgmt: failed to open socket (Connection refused)
I0914 11:25:36.744748 1651659 ovs.go:180] Error executing ovs-ofctl: ovs-ofctl: /var/run/openvswitch/br0.mgmt: failed to open socket (Connection refused)
I0914 11:25:39.729168 1651659 ovs.go:180] Error executing ovs-ofctl: ovs-ofctl: /var/run/openvswitch/br0.mgmt: failed to open socket (Connection refused)
I0914 11:25:39.729195 1651659 sdn_controller.go:139] [SDN setup] full SDN setup required (plugin is not setup)
I0914 11:26:09.735482 1651659 ovs.go:180] Error executing ovs-vsctl: 2020-09-14T11:26:09Z|00002|fatal_signal|WARN|terminating with signal 14 (Alarm clock)
I0914 11:26:40.275031 1651659 ovs.go:180] Error executing ovs-vsctl: 2020-09-14T11:26:40Z|00002|fatal_signal|WARN|terminating with signal 14 (Alarm clock)
I0914 11:26:40.783923 1651659 ovs.go:180] Error executing ovs-ofctl: ovs-ofctl: /var/run/openvswitch/br0.mgmt: failed to open socket (Connection refused)
I0914 11:26:41.287895 1651659 ovs.go:180] Error executing ovs-ofctl: ovs-ofctl: /var/run/openvswitch/br0.mgmt: failed to open socket (Connection refused)
I0914 11:26:41.916786 1651659 ovs.go:180] Error executing ovs-ofctl: ovs-ofctl: /var/run/openvswitch/br0.mgmt: failed to open socket (Connection refused)
I0914 11:26:42.702013 1651659 ovs.go:180] Error executing ovs-ofctl: ovs-ofctl: /var/run/openvswitch/br0.mgmt: failed to open socket (Connection refused)
I0914 11:26:43.682345 1651659 ovs.go:180] Error executing ovs-ofctl: ovs-ofctl: /var/run/openvswitch/br0.mgmt: failed to open socket (Connection refused)
I0914 11:26:44.907001 1651659 ovs.go:180] Error executing ovs-ofctl: ovs-ofctl: /var/run/openvswitch/br0.mgmt: failed to open socket (Connection refused)
I0914 11:26:46.437090 1651659 ovs.go:180] Error executing ovs-ofctl: ovs-ofctl: /var/run/openvswitch/br0.mgmt: failed to open socket (Connection refused)
I0914 11:26:48.348352 1651659 ovs.go:180] Error executing ovs-ofctl: ovs-ofctl: /var/run/openvswitch/br0.mgmt: failed to open socket (Connection refused)
I0914 11:26:50.736413 1651659 ovs.go:180] Error executing ovs-ofctl: ovs-ofctl: /var/run/openvswitch/br0.mgmt: failed to open socket (Connection refused)
I0914 11:26:53.720837 1651659 ovs.go:180] Error executing ovs-ofctl: ovs-ofctl: /var/run/openvswitch/br0.mgmt: failed to open socket (Connection refused)
F0914 11:26:53.720882 1651659 cmd.go:111] Failed to start sdn: node SDN setup failed: timed out waiting for the condition

Expected results:

Additional info:
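The ovs-ofctl failures in the log above are not evenly spaced: the gap between attempts appears to grow by roughly 1.25x each time, and each burst of ten attempts spans about 13 seconds. When comparing logs from different nodes it can help to extract those gaps mechanically. The following is only a sketch: the helper name `retry_intervals` is hypothetical, it assumes klog-style line prefixes as seen in this log, and it does not handle a log that crosses midnight.

```python
import re
from datetime import datetime

# klog prefix: severity letter + MMDD, then HH:MM:SS.ffffff (as in the log above).
KLOG_TS = re.compile(r"^[IWEF]\d{4} (\d{2}:\d{2}:\d{2}\.\d{6}) ")


def retry_intervals(lines, needle="Error executing ovs-ofctl"):
    """Return the gaps, in seconds, between consecutive lines containing needle."""
    stamps = []
    for line in lines:
        match = KLOG_TS.match(line)
        if match and needle in line:
            stamps.append(datetime.strptime(match.group(1), "%H:%M:%S.%f"))
    return [(later - earlier).total_seconds()
            for earlier, later in zip(stamps, stamps[1:])]


# First four ovs-ofctl failures, copied from the log above.
sample = [
    "I0914 11:25:26.790758 1651659 ovs.go:180] Error executing ovs-ofctl: ovs-ofctl: /var/run/openvswitch/br0.mgmt: failed to open socket (Connection refused)",
    "I0914 11:25:27.294809 1651659 ovs.go:180] Error executing ovs-ofctl: ovs-ofctl: /var/run/openvswitch/br0.mgmt: failed to open socket (Connection refused)",
    "I0914 11:25:27.924311 1651659 ovs.go:180] Error executing ovs-ofctl: ovs-ofctl: /var/run/openvswitch/br0.mgmt: failed to open socket (Connection refused)",
    "I0914 11:25:28.709851 1651659 ovs.go:180] Error executing ovs-ofctl: ovs-ofctl: /var/run/openvswitch/br0.mgmt: failed to open socket (Connection refused)",
]

gaps = retry_intervals(sample)
for gap in gaps:
    print(f"{gap:.3f}s")
```

Feeding the full output of `oc logs` into the same function (for example via `sys.stdin`) would give the gaps for the whole run; the two ovs-vsctl timeouts are skipped because they do not contain the default search string.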
Using 'cluster-bot' with this PR https://github.com/openshift/cluster-network-operator/pull/785 on AWS, I could not reproduce this issue. I think it does not always happen.
Likely a timing thing. But Juan's PR definitely is going to fix things in this area.
Hit this issue again with 4.6.0-0.nightly-2020-09-15-205317 on OSP. Since this issue happens frequently, I am moving the target release to 4.6. @Aniket WDYT? Thanks.
I think we have an understanding of the issue. This should be fixed by the CNO PR 785 I mentioned above. Until that PR is merged, there is no point verifying this on nightlies. I am going to dup this bug and have these failures tracked in 1874696.

*** This bug has been marked as a duplicate of bug 1874696 ***