Bug 1878707 - [4.6] sdn pod crash with failed to open socket
Keywords:
Status: CLOSED DUPLICATE of bug 1874696
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 4.6
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: ---
Target Release: 4.6.0
Assignee: Aniket Bhat
QA Contact: zhaozhanqi
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2020-09-14 11:43 UTC by zhaozhanqi
Modified: 2020-09-16 13:24 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-09-16 13:24:44 UTC
Target Upstream Version:
Embargoed:



Description zhaozhanqi 2020-09-14 11:43:57 UTC
Description of problem:
UPI install on AWS with 4 nodes. The sdn pod crashed with the following error:

I0914 11:25:26.790758 1651659 ovs.go:180] Error executing ovs-ofctl: ovs-ofctl: /var/run/openvswitch/br0.mgmt: failed to open socket (Connection refused)


Version-Release number of selected component (if applicable):
4.6.0-0.nightly-2020-09-12-230035

How reproducible:


Steps to Reproduce:
1. 
2.
3.

Actual results:
 oc logs sdn-gpvvp -n openshift-sdn -c sdn
I0914 11:25:26.665266 1651659 cmd.go:121] Reading proxy configuration from /config/kube-proxy-config.yaml
I0914 11:25:26.666379 1651659 feature_gate.go:243] feature gates: &{map[]}
I0914 11:25:26.666421 1651659 cmd.go:216] Watching config file /config/kube-proxy-config.yaml for changes
I0914 11:25:26.666455 1651659 cmd.go:216] Watching config file /config/..2020_09_14_11_10_59.920273164/kube-proxy-config.yaml for changes
I0914 11:25:26.692552 1651659 node.go:150] Initializing SDN node "ip-10-0-54-32.us-east-2.compute.internal" (10.0.54.32) of type "redhat/openshift-ovs-networkpolicy"
I0914 11:25:26.696419 1651659 cmd.go:159] Starting node networking (v0.0.0-alpha.0-201-g79fd230c)
I0914 11:25:26.696434 1651659 node.go:338] Starting openshift-sdn network plugin
I0914 11:25:26.790758 1651659 ovs.go:180] Error executing ovs-ofctl: ovs-ofctl: /var/run/openvswitch/br0.mgmt: failed to open socket (Connection refused)
I0914 11:25:27.294809 1651659 ovs.go:180] Error executing ovs-ofctl: ovs-ofctl: /var/run/openvswitch/br0.mgmt: failed to open socket (Connection refused)
I0914 11:25:27.924311 1651659 ovs.go:180] Error executing ovs-ofctl: ovs-ofctl: /var/run/openvswitch/br0.mgmt: failed to open socket (Connection refused)
I0914 11:25:28.709851 1651659 ovs.go:180] Error executing ovs-ofctl: ovs-ofctl: /var/run/openvswitch/br0.mgmt: failed to open socket (Connection refused)
I0914 11:25:29.690655 1651659 ovs.go:180] Error executing ovs-ofctl: ovs-ofctl: /var/run/openvswitch/br0.mgmt: failed to open socket (Connection refused)
I0914 11:25:30.915621 1651659 ovs.go:180] Error executing ovs-ofctl: ovs-ofctl: /var/run/openvswitch/br0.mgmt: failed to open socket (Connection refused)
I0914 11:25:32.445503 1651659 ovs.go:180] Error executing ovs-ofctl: ovs-ofctl: /var/run/openvswitch/br0.mgmt: failed to open socket (Connection refused)
I0914 11:25:34.356446 1651659 ovs.go:180] Error executing ovs-ofctl: ovs-ofctl: /var/run/openvswitch/br0.mgmt: failed to open socket (Connection refused)
I0914 11:25:36.744748 1651659 ovs.go:180] Error executing ovs-ofctl: ovs-ofctl: /var/run/openvswitch/br0.mgmt: failed to open socket (Connection refused)
I0914 11:25:39.729168 1651659 ovs.go:180] Error executing ovs-ofctl: ovs-ofctl: /var/run/openvswitch/br0.mgmt: failed to open socket (Connection refused)
I0914 11:25:39.729195 1651659 sdn_controller.go:139] [SDN setup] full SDN setup required (plugin is not setup)
I0914 11:26:09.735482 1651659 ovs.go:180] Error executing ovs-vsctl: 2020-09-14T11:26:09Z|00002|fatal_signal|WARN|terminating with signal 14 (Alarm clock)
I0914 11:26:40.275031 1651659 ovs.go:180] Error executing ovs-vsctl: 2020-09-14T11:26:40Z|00002|fatal_signal|WARN|terminating with signal 14 (Alarm clock)
I0914 11:26:40.783923 1651659 ovs.go:180] Error executing ovs-ofctl: ovs-ofctl: /var/run/openvswitch/br0.mgmt: failed to open socket (Connection refused)
I0914 11:26:41.287895 1651659 ovs.go:180] Error executing ovs-ofctl: ovs-ofctl: /var/run/openvswitch/br0.mgmt: failed to open socket (Connection refused)
I0914 11:26:41.916786 1651659 ovs.go:180] Error executing ovs-ofctl: ovs-ofctl: /var/run/openvswitch/br0.mgmt: failed to open socket (Connection refused)
I0914 11:26:42.702013 1651659 ovs.go:180] Error executing ovs-ofctl: ovs-ofctl: /var/run/openvswitch/br0.mgmt: failed to open socket (Connection refused)
I0914 11:26:43.682345 1651659 ovs.go:180] Error executing ovs-ofctl: ovs-ofctl: /var/run/openvswitch/br0.mgmt: failed to open socket (Connection refused)
I0914 11:26:44.907001 1651659 ovs.go:180] Error executing ovs-ofctl: ovs-ofctl: /var/run/openvswitch/br0.mgmt: failed to open socket (Connection refused)
I0914 11:26:46.437090 1651659 ovs.go:180] Error executing ovs-ofctl: ovs-ofctl: /var/run/openvswitch/br0.mgmt: failed to open socket (Connection refused)
I0914 11:26:48.348352 1651659 ovs.go:180] Error executing ovs-ofctl: ovs-ofctl: /var/run/openvswitch/br0.mgmt: failed to open socket (Connection refused)
I0914 11:26:50.736413 1651659 ovs.go:180] Error executing ovs-ofctl: ovs-ofctl: /var/run/openvswitch/br0.mgmt: failed to open socket (Connection refused)
I0914 11:26:53.720837 1651659 ovs.go:180] Error executing ovs-ofctl: ovs-ofctl: /var/run/openvswitch/br0.mgmt: failed to open socket (Connection refused)
F0914 11:26:53.720882 1651659 cmd.go:111] Failed to start sdn: node SDN setup failed: timed out waiting for the condition
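
The spacing of the ovs-ofctl errors in the log above looks like a retry loop with exponential backoff. The sketch below only measures that spacing from the first ten timestamps; it is not taken from the openshift-sdn source, and the helper names `gaps` and `ratios` are made up for illustration.

```python
from datetime import datetime

# Timestamps of the first ten ovs-ofctl errors, copied from the log above.
STAMPS = [
    "11:25:26.790758", "11:25:27.294809", "11:25:27.924311",
    "11:25:28.709851", "11:25:29.690655", "11:25:30.915621",
    "11:25:32.445503", "11:25:34.356446", "11:25:36.744748",
    "11:25:39.729168",
]


def gaps(stamps):
    """Return seconds between consecutive HH:MM:SS.ffffff timestamps."""
    times = [datetime.strptime(s, "%H:%M:%S.%f") for s in stamps]
    return [(b - a).total_seconds() for a, b in zip(times, times[1:])]


def ratios(values):
    """Return the growth factor between consecutive values."""
    return [b / a for a, b in zip(values, values[1:])]


delays = gaps(STAMPS)
for delay in delays:
    print(f"waited {delay:.3f}s before the next attempt")
for factor in ratios(delays):
    print(f"growth factor {factor:.3f}")
```

The delays start near 0.5 s and each is about 1.25 times the previous one, for roughly 13 s in total before the plugin logs "full SDN setup required". The same pattern repeats from 11:26:40 until the fatal timeout.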


Expected results:


Additional info:

Comment 5 zhaozhanqi 2020-09-15 08:38:49 UTC
Using 'cluster-bot' with this PR https://github.com/openshift/cluster-network-operator/pull/785 on AWS, I could not reproduce this issue.

I think this issue does not always happen.

Comment 6 Aniket Bhat 2020-09-15 13:05:59 UTC
Likely a timing thing. But Juan's PR definitely is going to fix things in this area.

Comment 8 zhaozhanqi 2020-09-16 05:38:28 UTC
Hit this issue again with 4.6.0-0.nightly-2020-09-15-205317 on OSP.

Since this issue happens frequently, I am moving the target release to 4.6. @Aniket WDYT? Thanks.

Comment 10 Aniket Bhat 2020-09-16 13:24:44 UTC
I think we have an understanding of the issue. This should be fixed by the CNO PR 785 I mentioned above. Till that PR is merged, there is no point verifying this on nightlies.

I am going to dup this bug and have these failures tracked in 1874696.

*** This bug has been marked as a duplicate of bug 1874696 ***

