Bug 1908616 - sdn pod crashed on rhel worker
Summary: sdn pod crashed on rhel worker
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 4.7
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: urgent
Target Milestone: ---
Target Release: 4.7.0
Assignee: Antonio Ojea
QA Contact: zhaozhanqi
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2020-12-17 07:51 UTC by zhaozhanqi
Modified: 2020-12-18 09:44 UTC
CC: 2 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-12-18 09:44:00 UTC
Target Upstream Version:
Embargoed:



Description zhaozhanqi 2020-12-17 07:51:24 UTC
Description of problem:
Set up a cluster with 2 RHEL workers; the sdn pod on the RHEL workers is in a crash state.

Version-Release number of selected component (if applicable):
4.7.0-0.nightly-2020-12-14-165231

How reproducible:


Steps to Reproduce:
1. Set up a cluster and scale up RHEL workers (UPI, bare metal, restricted network, RHEL 7.9 & RHCOS 4.7, real-time kernel)
2. Check the logs of the crashing sdn pod (see the diagnostic sketch after these steps)
3. Check the OVS logs
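
A minimal diagnostic sketch for steps 2-3. The sdn pod name below is the one from this report and the OVS pod name is a placeholder; substitute the names from your cluster:

 # oc get pods -n openshift-sdn -o wide          (locate the sdn/ovs pods on the RHEL worker)
 # oc logs sdn-gj5l8 -n openshift-sdn -c sdn     (step 2: sdn container logs)
 # oc logs <ovs-pod> -n openshift-sdn            (step 3: containerized OVS logs)
 # oc debug node/piqin-1217-fjnpq-rhel-1 -- chroot /host journalctl -u ovs-vswitchd.service   (step 3: host OVS logs)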

Actual results:

 # oc logs sdn-gj5l8 -n openshift-sdn -c sdn
I1217 07:45:07.002845  222904 cmd.go:121] Reading proxy configuration from /config/kube-proxy-config.yaml
I1217 07:45:07.004513  222904 feature_gate.go:243] feature gates: &{map[EndpointSlice:false EndpointSliceProxying:false]}
I1217 07:45:07.004565  222904 feature_gate.go:243] feature gates: &{map[EndpointSlice:false EndpointSliceProxying:false]}
I1217 07:45:07.004599  222904 cmd.go:218] Watching config file /config/kube-proxy-config.yaml for changes
I1217 07:45:07.004625  222904 cmd.go:218] Watching config file /config/..2020_12_17_03_44_20.784771945/kube-proxy-config.yaml for changes
I1217 07:45:07.049525  222904 node.go:152] Initializing SDN node "piqin-1217-fjnpq-rhel-1" (10.0.98.215) of type "redhat/openshift-ovs-networkpolicy"
I1217 07:45:07.049829  222904 cmd.go:159] Starting node networking (v0.0.0-alpha.0-258-ge48f05e4)
I1217 07:45:07.049846  222904 node.go:340] Starting openshift-sdn network plugin
I1217 07:45:07.205071  222904 sdn_controller.go:139] [SDN setup] full SDN setup required (Link not found)
I1217 07:45:37.238388  222904 ovs.go:158] Error executing ovs-vsctl: 2020-12-17T07:45:37Z|00002|fatal_signal|WARN|terminating with signal 14 (Alarm clock)
I1217 07:46:07.763834  222904 ovs.go:158] Error executing ovs-vsctl: 2020-12-17T07:46:07Z|00002|fatal_signal|WARN|terminating with signal 14 (Alarm clock)
I1217 07:46:08.278434  222904 ovs.go:158] Error executing ovs-ofctl: ovs-ofctl: /var/run/openvswitch/br0.mgmt: failed to open socket (Connection refused)
I1217 07:46:08.784885  222904 ovs.go:158] Error executing ovs-ofctl: ovs-ofctl: /var/run/openvswitch/br0.mgmt: failed to open socket (Connection refused)
I1217 07:46:09.416273  222904 ovs.go:158] Error executing ovs-ofctl: ovs-ofctl: /var/run/openvswitch/br0.mgmt: failed to open socket (Connection refused)
I1217 07:46:10.204059  222904 ovs.go:158] Error executing ovs-ofctl: ovs-ofctl: /var/run/openvswitch/br0.mgmt: failed to open socket (Connection refused)
I1217 07:46:11.186794  222904 ovs.go:158] Error executing ovs-ofctl: ovs-ofctl: /var/run/openvswitch/br0.mgmt: failed to open socket (Connection refused)
I1217 07:46:12.414438  222904 ovs.go:158] Error executing ovs-ofctl: ovs-ofctl: /var/run/openvswitch/br0.mgmt: failed to open socket (Connection refused)
I1217 07:46:13.947346  222904 ovs.go:158] Error executing ovs-ofctl: ovs-ofctl: /var/run/openvswitch/br0.mgmt: failed to open socket (Connection refused)
I1217 07:46:15.862405  222904 ovs.go:158] Error executing ovs-ofctl: ovs-ofctl: /var/run/openvswitch/br0.mgmt: failed to open socket (Connection refused)
I1217 07:46:18.254166  222904 ovs.go:158] Error executing ovs-ofctl: ovs-ofctl: /var/run/openvswitch/br0.mgmt: failed to open socket (Connection refused)
I1217 07:46:21.240631  222904 ovs.go:158] Error executing ovs-ofctl: ovs-ofctl: /var/run/openvswitch/br0.mgmt: failed to open socket (Connection refused)
F1217 07:46:21.240681  222904 cmd.go:111] Failed to start sdn: node SDN setup failed: timed out waiting for the condition
goroutine 1 [running]:
k8s.io/klog/v2.stacks(0xc000134001, 0xc0003da480, 0x7a, 0xbf)
	k8s.io/klog/v2.0/klog.go:1026 +0xb9
k8s.io/klog/v2.(*loggingT).output(0x2e264c0, 0xc000000003, 0x0, 0x0, 0xc00020f960, 0x2d6c415, 0x6, 0x6f, 0x0)
	k8s.io/klog/v2.0/klog.go:975 +0x19b
k8s.io/klog/v2.(*loggingT).printf(0x2e264c0, 0xc000000003, 0x0, 0x0, 0x0, 0x0, 0x1eac4c7, 0x17, 0xc000504240, 0x1, ...)
	k8s.io/klog/v2.0/klog.go:750 +0x191
k8s.io/klog/v2.Fatalf(...)
	k8s.io/klog/v2.0/klog.go:1502
github.com/openshift/sdn/pkg/openshift-sdn.(*OpenShiftSDN).Run(0xc00020f8f0, 0xc0008b0780, 0x2129940, 0xc000134010, 0xc00010e120)
	github.com/openshift/sdn/pkg/openshift-sdn/cmd.go:111 +0x6d5
github.com/openshift/sdn/pkg/openshift-sdn.NewOpenShiftSDNCommand.func1.2(0xc000000010, 0x1fb4300)
	github.com/openshift/sdn/pkg/openshift-sdn/cmd.go:63 +0x4e
k8s.io/kubernetes/pkg/util/interrupt.(*Handler).Run(0xc0008ec660, 0xc000907d40, 0x0, 0x0)
	k8s.io/kubernetes.0-rc.0/pkg/util/interrupt/interrupt.go:103 +0xf2
github.com/openshift/sdn/pkg/openshift-sdn.NewOpenShiftSDNCommand.func1(0xc0008b0780, 0xc0000a4980, 0x0, 0x8)
	github.com/openshift/sdn/pkg/openshift-sdn/cmd.go:62 +0x147
github.com/spf13/cobra.(*Command).execute(0xc0008b0780, 0xc000138010, 0x8, 0x8, 0xc0008b0780, 0xc000138010)
	github.com/spf13/cobra.1/command.go:766 +0x2c2
github.com/spf13/cobra.(*Command).ExecuteC(0xc0008b0780, 0xd, 0x2129940, 0xc000134010)
	github.com/spf13/cobra.1/command.go:850 +0x30b
github.com/spf13/cobra.(*Command).Execute(...)
	github.com/spf13/cobra.1/command.go:800
main.main()
	github.com/openshift/sdn/cmd/openshift-sdn/openshift-sdn.go:28 +0x185

goroutine 19 [chan receive]:
k8s.io/klog/v2.(*loggingT).flushDaemon(0x2e264c0)
	k8s.io/klog/v2.0/klog.go:1169 +0x8b
created by k8s.io/klog/v2.init.0
	k8s.io/klog/v2.0/klog.go:417 +0xdf

goroutine 20 [chan receive]:
k8s.io/klog.(*loggingT).flushDaemon(0x2e262e0)
	k8s.io/klog.0/klog.go:1010 +0x8b
created by k8s.io/klog.init.0
	k8s.io/klog.0/klog.go:411 +0xd8

goroutine 7 [select]:
k8s.io/apimachinery/pkg/util/wait.BackoffUntil(0x1fb32a8, 0x2128c60, 0xc000670060, 0xc00010e601, 0xc00010e0c0)
	k8s.io/apimachinery.6/pkg/util/wait/wait.go:167 +0x149
k8s.io/apimachinery/pkg/util/wait.JitterUntil(0x1fb32a8, 0x12a05f200, 0x0, 0x7d1001, 0xc00010e0c0)
	k8s.io/apimachinery.6/pkg/util/wait/wait.go:133 +0x98
k8s.io/apimachinery/pkg/util/wait.Until(...)
	k8s.io/apimachinery.6/pkg/util/wait/wait.go:90
k8s.io/apimachinery/pkg/util/wait.Forever(0x1fb32a8, 0x12a05f200)
	k8s.io/apimachinery.6/pkg/util/wait/wait.go:81 +0x4f
created by k8s.io/component-base/logs.InitLogs
	k8s.io/component-base.6/logs/logs.go:58 +0x8a

goroutine 9 [syscall, 1 minutes]:
os/signal.signal_recv(0x0)
	runtime/sigqueue.go:147 +0x9d
os/signal.loop()
	os/signal/signal_unix.go:23 +0x25
created by os/signal.Notify.func1.1


2. Check that OVS is running as a container rather than as a systemd service on the RHEL worker (a way to verify this is sketched after the log below):

openvswitch is running in container
Starting ovsdb-server.
Configuring Open vSwitch system IDs.
Enabling remote OVSDB managers.
Starting ovs-vswitchd.
Enabling remote OVSDB managers.
2020-12-17 03:44:25 info: Loading previous flows ...
2020-12-17 03:44:25 info: Remove other config ...
2020-12-17 03:44:25 info: Removed other config ...
2020-12-17T03:44:25.746Z|00001|vlog|INFO|opened log file /var/log/openvswitch/ovsdb-server.log
2020-12-17T03:44:25.757Z|00002|ovsdb_server|INFO|ovsdb-server (Open vSwitch) 2.13.2
2020-12-17T03:44:25.917Z|00001|vlog|INFO|opened log file /var/log/openvswitch/ovs-vswitchd.log
2020-12-17T03:44:25.928Z|00002|ovs_numa|INFO|Discovered 4 CPU cores on NUMA node 0
2020-12-17T03:44:25.928Z|00003|ovs_numa|INFO|Discovered 1 NUMA nodes and 4 CPU cores
2020-12-17T03:44:25.928Z|00004|reconnect|INFO|unix:/var/run/openvswitch/db.sock: connecting...
2020-12-17T03:44:25.928Z|00005|reconnect|INFO|unix:/var/run/openvswitch/db.sock: connected
2020-12-17T03:44:25.930Z|00006|dpdk|INFO|DPDK Disabled - Use other_config:dpdk-init to enable
2020-12-17T03:44:25.934Z|00007|dpif_netlink|INFO|The kernel module does not support meters.
2020-12-17T03:44:25.939Z|00008|bridge|INFO|ovs-vswitchd (Open vSwitch) 2.13.2
2020-12-17T03:44:35.764Z|00003|memory|INFO|3492 kB peak resident set size after 10.0 seconds
2020-12-17T03:44:35.764Z|00004|memory|INFO|cells:39 monitors:2 sessions:1
2020-12-17T03:44:39.693Z|00009|memory|INFO|35692 kB peak resident set size after 13.8 seconds
2020-12-17T03:44:39.704Z|00010|ofproto_dpif|INFO|system@ovs-system: Datapath supports recirculation
2020-12-17T03:44:39.704Z|00011|ofproto_dpif|INFO|system@ovs-system: VLAN header stack length probed as 2
2020-12-17T03:44:39.704Z|00012|ofproto_dpif|INFO|system@ovs-system: MPLS label stack length probed as 1
2020-12-17T03:44:39.704Z|00013|ofproto_dpif|INFO|system@ovs-system: Datapath supports truncate action
2020-12-17T03:44:39.704Z|00014|ofproto_dpif|INFO|system@ovs-system: Datapath supports unique flow ids
2020-12-17T03:44:39.704Z|00015|ofproto_dpif|INFO|system@ovs-system: Datapath does not support clone action
2020-12-17T03:44:39.704Z|00016|ofproto_dpif|INFO|system@ovs-system: Max sample nesting level probed as 10
2020-12-17T03:44:39.704Z|00017|ofproto_dpif|INFO|system@ovs-system: Datapath supports eventmask in conntrack action
2020-12-17T03:44:39.704Z|00018|ofproto_dpif|INFO|system@ovs-system: Datapath supports ct_clear action
2020-12-17T03:44:39.704Z|00019|ofproto_dpif|INFO|system@ovs-system: Max dp_hash algorithm probed to be 0
2020-12-17T03:44:39.704Z|00020|ofproto_dpif|INFO|system@ovs-system: Datapath does not support check_pkt_len action
2020-12-17T03:44:39.704Z|00021|ofproto_dpif|INFO|system@ovs-system: Datapath does not support timeout policy in conntrack action
2020-12-17T03:44:39.704Z|00022|ofproto_dpif|INFO|system@ovs-system: Datapath supports ct_state
2020-12-17T03:44:39.704Z|00023|ofproto_dpif|INFO|system@ovs-system: Datapath supports ct_zone
2020-12-17T03:44:39.704Z|00024|ofproto_dpif|INFO|system@ovs-system: Datapath supports ct_mark
2020-12-17T03:44:39.704Z|00025|ofproto_dpif|INFO|system@ovs-system: Datapath supports ct_label
2020-12-17T03:44:39.704Z|00026|ofproto_dpif|INFO|system@ovs-system: Datapath supports ct_state_nat
2020-12-17T03:44:39.704Z|00027|ofproto_dpif|INFO|system@ovs-system: Datapath supports ct_orig_tuple
2020-12-17T03:44:39.704Z|00028|ofproto_dpif|INFO|system@ovs-system: Datapath supports ct_orig_tuple6
2020-12-17T03:44:39.704Z|00029|ofproto_dpif|INFO|system@ovs-system: Datapath does not support IPv6 ND Extensions
2020-12-17T03:44:39.716Z|00030|dpif_netlink|INFO|dpif_netlink_meter_transact OVS_METER_CMD_SET failed
2020-12-17T03:44:39.716Z|00031|dpif_netlink|INFO|dpif_netlink_meter_transact OVS_METER_CMD_SET failed
2020-12-17T03:44:39.716Z|00032|dpif_netlink|INFO|dpif_netlink_meter_transact get failed
2020-12-17T03:44:39.716Z|00033|dpif_netlink|INFO|The kernel module has a broken meter implementation.
2020-12-17T03:44:39.720Z|00034|bridge|INFO|bridge br0: added interface br0 on port 65534
2020-12-17T03:44:39.720Z|00035|bridge|INFO|bridge br0: using datapath ID 000086a3bd75ec44
2020-12-17T03:44:39.721Z|00036|connmgr|INFO|br0: added service controller "punix:/var/run/openvswitch/br0.mgmt"
2020-12-17T03:44:39.776Z|00037|bridge|INFO|bridge br0: added interface vxlan0 on port 1
2020-12-17T03:44:39.810Z|00038|netdev|WARN|failed to set MTU for network device tun0: No such device
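
A hedged way to confirm where OVS is actually running on a node (oc debug and the unit names are standard; adjust the node name):

 # oc debug node/piqin-1217-fjnpq-rhel-1 -- chroot /host systemctl is-active ovs-vswitchd ovsdb-server
 # oc get pods -n openshift-sdn -o wide | grep -i ovs     (lists containerized OVS pods, if any)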


Check the logs of the ovs-vswitchd systemd service on the host:

sh-4.2# journalctl -u ovs-vswitchd.service
-- Logs begin at Wed 2020-12-16 22:43:49 EST, end at Thu 2020-12-17 02:37:04 EST. --
Dec 16 22:43:54 piqin-1217-fjnpq-rhel-1 systemd[1]: Starting Open vSwitch Forwarding Unit...
Dec 16 22:43:54 piqin-1217-fjnpq-rhel-1 ovs-ctl[869]: Inserting openvswitch module [  OK  ]
Dec 16 22:43:54 piqin-1217-fjnpq-rhel-1 ovs-ctl[869]: Starting ovs-vswitchd [  OK  ]
Dec 16 22:43:54 piqin-1217-fjnpq-rhel-1 ovs-ctl[869]: Enabling remote OVSDB managers [  OK  ]
Dec 16 22:43:54 piqin-1217-fjnpq-rhel-1 ovs-vsctl[926]: ovs|00001|vsctl|INFO|Called as ovs-vsctl --no-wait set Open_vSwitch . external-ids:hostname=piqin-1217-fjnpq-rhel-1
Dec 16 22:43:54 piqin-1217-fjnpq-rhel-1 systemd[1]: Started Open vSwitch Forwarding Unit.
Dec 16 22:44:32 piqin-1217-fjnpq-rhel-1 ovs-vswitchd[913]: ovs|00036|stream_unix|ERR|/var/run/openvswitch/br0.mgmt: binding failed: No such file or directory
Dec 16 22:44:32 piqin-1217-fjnpq-rhel-1 ovs-vswitchd[913]: ovs|00038|stream_unix|ERR|/var/run/openvswitch/br0.snoop: binding failed: No such file or directory
Dec 16 22:44:32 piqin-1217-fjnpq-rhel-1 ovs-vswitchd[913]: ovs|00039|connmgr|ERR|failed to listen on punix:/var/run/openvswitch/br0.snoop: No such file or directory
Dec 16 22:44:32 piqin-1217-fjnpq-rhel-1 ovs-vswitchd[913]: ovs|00041|stream_unix|ERR|/var/run/openvswitch/br0.mgmt: binding failed: No such file or directory
Dec 16 22:44:32 piqin-1217-fjnpq-rhel-1 ovs-vswitchd[913]: ovs|00043|stream_unix|ERR|/var/run/openvswitch/br0.snoop: binding failed: No such file or directory
Dec 16 22:44:32 piqin-1217-fjnpq-rhel-1 ovs-vswitchd[913]: ovs|00044|connmgr|ERR|failed to listen on punix:/var/run/openvswitch/br0.snoop: No such file or directory
Dec 16 22:44:45 piqin-1217-fjnpq-rhel-1 ovs-vswitchd[913]: ovs|00050|stream_unix|ERR|/var/run/openvswitch/br0.mgmt: binding failed: No such file or directory
Dec 16 22:44:45 piqin-1217-fjnpq-rhel-1 ovs-vswitchd[913]: ovs|00052|stream_unix|ERR|/var/run/openvswitch/br0.snoop: binding failed: No such file or directory
Dec 16 22:44:45 piqin-1217-fjnpq-rhel-1 ovs-vswitchd[913]: ovs|00053|connmgr|ERR|failed to listen on punix:/var/run/openvswitch/br0.snoop: No such file or directory
Dec 16 22:44:45 piqin-1217-fjnpq-rhel-1 ovs-vswitchd[913]: ovs|00055|stream_unix|ERR|/var/run/openvswitch/br0.mgmt: binding failed: No such file or directory
Dec 16 22:44:45 piqin-1217-fjnpq-rhel-1 ovs-vswitchd[913]: ovs|00057|stream_unix|ERR|/var/run/openvswitch/br0.snoop: binding failed: No such file or directory
Dec 16 22:44:45 piqin-1217-fjnpq-rhel-1 ovs-vswitchd[913]: ovs|00058|connmgr|ERR|failed to listen on punix:/var/run/openvswitch/br0.snoop: No such file or directory
Dec 16 22:44:45 piqin-1217-fjnpq-rhel-1 ovs-vswitchd[913]: ovs|00060|stream_unix|ERR|/var/run/openvswitch/br0.mgmt: binding failed: No such file or directory
Dec 16 22:44:45 piqin-1217-fjnpq-rhel-1 ovs-vswitchd[913]: ovs|00062|stream_unix|ERR|/var/run/openvswitch/br0.snoop: binding failed: No such file or directory
Dec 16 22:44:45 piqin-1217-fjnpq-rhel-1 ovs-vswitchd[913]: ovs|00063|connmgr|ERR|failed to listen on punix:/var/run/openvswitch/br0.snoop: No such file or directory
Dec 16 22:44:45 piqin-1217-fjnpq-rhel-1 ovs-vswitchd[913]: ovs|00066|stream_unix|ERR|/var/run/openvswitch/br0.mgmt: binding failed: Permission denied
Dec 16 22:44:45 piqin-1217-fjnpq-rhel-1 ovs-vswitchd[913]: ovs|00069|stream_unix|ERR|/var/run/openvswitch/br0.snoop: binding failed: Permission denied
Dec 16 22:44:45 piqin-1217-fjnpq-rhel-1 ovs-vswitchd[913]: ovs|00070|connmgr|ERR|failed to listen on punix:/var/run/openvswitch/br0.snoop: Permission denied
Dec 16 22:44:45 piqin-1217-fjnpq-rhel-1 ovs-vswitchd[913]: ovs|00073|stream_unix|ERR|/var/run/openvswitch/br0.mgmt: binding failed: Permission denied
Dec 16 22:44:45 piqin-1217-fjnpq-rhel-1 ovs-vswitchd[913]: ovs|00076|stream_unix|ERR|/var/run/openvswitch/br0.snoop: binding failed: Permission denied
Dec 16 22:44:45 piqin-1217-fjnpq-rhel-1 ovs-vswitchd[913]: ovs|00077|connmgr|ERR|failed to listen on punix:/var/run/openvswitch/br0.snoop: Permission denied
Dec 16 22:44:45 piqin-1217-fjnpq-rhel-1 ovs-vswitchd[913]: ovs|00080|stream_unix|ERR|/var/run/openvswitch/br0.mgmt: binding failed: Permission denied
Dec 16 22:44:45 piqin-1217-fjnpq-rhel-1 ovs-vswitchd[913]: ovs|00083|stream_unix|ERR|/var/run/openvswitch/br0.snoop: binding failed: Permission denied
Dec 16 22:44:45 piqin-1217-fjnpq-rhel-1 ovs-vswitchd[913]: ovs|00084|connmgr|ERR|failed to listen on punix:/var/run/openvswitch/br0.snoop: Permission denied
Dec 16 22:44:45 piqin-1217-fjnpq-rhel-1 ovs-vswitchd[913]: ovs|00087|stream_unix|ERR|/var/run/openvswitch/br0.mgmt: binding failed: Permission denied
Dec 16 22:44:45 piqin-1217-fjnpq-rhel-1 ovs-vswitchd[913]: ovs|00090|stream_unix|ERR|/var/run/openvswitch/br0.snoop: binding failed: Permission denied
Dec 16 22:44:45 piqin-1217-fjnpq-rhel-1 ovs-vswitchd[913]: ovs|00091|connmgr|ERR|failed to listen on punix:/var/run/openvswitch/br0.snoop: Permission denied
Dec 16 22:44:45 piqin-1217-fjnpq-rhel-1 ovs-vswitchd[913]: ovs|00094|stream_unix|ERR|/var/run/openvswitch/br0.mgmt: binding failed: Permission denied
Dec 16 22:44:45 piqin-1217-fjnpq-rhel-1 ovs-vswitchd[913]: ovs|00097|stream_unix|ERR|/var/run/openvswitch/br0.snoop: binding failed: Permission denied
Dec 16 22:44:45 piqin-1217-fjnpq-rhel-1 ovs-vswitchd[913]: ovs|00098|connmgr|ERR|failed to listen on punix:/var/run/openvswitch/br0.snoop: Permission denied
Dec 16 22:44:45 piqin-1217-fjnpq-rhel-1 ovs-vswitchd[913]: ovs|00101|stream_unix|ERR|/var/run/openvswitch/br0.mgmt: binding failed: Permission denied
Dec 16 22:44:45 piqin-1217-fjnpq-rhel-1 ovs-vswitchd[913]: ovs|00104|stream_unix|ERR|/var/run/openvswitch/br0.snoop: binding failed: Permission denied
Dec 16 22:44:45 piqin-1217-fjnpq-rhel-1 ovs-vswitchd[913]: ovs|00105|connmgr|ERR|failed to listen on punix:/var/run/openvswitch/br0.snoop: Permission denied
Dec 16 22:44:45 piqin-1217-fjnpq-rhel-1 ovs-vswitchd[913]: ovs|00108|stream_unix|ERR|/var/run/openvswitch/br0.mgmt: binding failed: Permission denied
Dec 16 22:44:45 piqin-1217-fjnpq-rhel-1 ovs-vswitchd[913]: ovs|00111|stream_unix|ERR|/var/run/openvswitch/br0.snoop: binding failed: Permission denied
Dec 16 22:44:45 piqin-1217-fjnpq-rhel-1 ovs-vswitchd[913]: ovs|00112|connmgr|ERR|failed to listen on punix:/var/run/openvswitch/br0.snoop: Permission denied
Dec 16 22:44:45 piqin-1217-fjnpq-rhel-1 ovs-vswitchd[913]: ovs|00115|stream_unix|ERR|/var/run/openvswitch/br0.mgmt: binding failed: Permission denied
Dec 16 22:44:45 piqin-1217-fjnpq-rhel-1 ovs-vswitchd[913]: ovs|00118|stream_unix|ERR|/var/run/openvswitch/br0.snoop: binding failed: Permission denied
Dec 16 22:44:45 piqin-1217-fjnpq-rhel-1 ovs-vswitchd[913]: ovs|00119|connmgr|ERR|failed to listen on punix:/var/run/openvswitch/br0.snoop: Permission denied
Dec 16 22:44:45 piqin-1217-fjnpq-rhel-1 ovs-vswitchd[913]: ovs|00122|stream_unix|ERR|/var/run/openvswitch/br0.mgmt: binding failed: Permission denied
Dec 16 22:44:45 piqin-1217-fjnpq-rhel-1 ovs-vswitchd[913]: ovs|00125|stream_unix|ERR|/var/run/openvswitch/br0.snoop: binding failed: Permission denied
Dec 16 22:44:45 piqin-1217-fjnpq-rhel-1 ovs-vswitchd[913]: ovs|00126|connmgr|ERR|failed to listen on punix:/var/run/openvswitch/br0.snoop: Permission denied
Dec 16 22:44:45 piqin-1217-fjnpq-rhel-1 ovs-vswitchd[913]: ovs|00129|stream_unix|ERR|/var/run/openvswitch/br0.mgmt: binding failed: Permiss
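
The binding failures above (first "No such file or directory", then "Permission denied" on br0.mgmt/br0.snoop) suggest the host systemd OVS and the containerized OVS were both running and contending for /var/run/openvswitch. A hedged check for duplicate instances on the node:

sh-4.2# systemctl status ovsdb-server ovs-vswitchd                      (host-level OVS units)
sh-4.2# ps -ef | grep -E 'ovs-vswitchd|ovsdb-server' | grep -v grep     (expect exactly one of each)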

Expected results:


Additional info:
The masters and the other (RHCOS) workers are running kubelet 1.19, but the RHEL workers are running 1.20.


 oc get node
NAME                               STATUS   ROLES    AGE     VERSION
piqin-1217-fjnpq-compute-0         Ready    worker   5h29m   v1.19.2+e386040
piqin-1217-fjnpq-compute-1         Ready    worker   5h28m   v1.19.2+e386040
piqin-1217-fjnpq-control-plane-0   Ready    master   5h40m   v1.19.2+e386040
piqin-1217-fjnpq-control-plane-1   Ready    master   5h40m   v1.19.2+e386040
piqin-1217-fjnpq-control-plane-2   Ready    master   5h40m   v1.19.2+e386040
piqin-1217-fjnpq-rhel-0            Ready    worker   3h59m   v1.20.0+87544c5
piqin-1217-fjnpq-rhel-1            Ready    worker   3h59m   v1.20.0+87544c5

Comment 3 zhaozhanqi 2020-12-18 08:23:46 UTC
Yes, Antonio. When the cluster was set up, the masters and RHCOS workers were on kube 1.19, but when the RHEL workers were scaled up they showed 1.20. It's strange, and I have no idea why this happened.
I am also not sure whether the version mismatch caused the sdn pod crash, though I don't think mixed kube versions should be supported in the first place.
I will retry with the next accepted build, which is on 1.20.
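
A hedged way to compare the kubelet actually installed on a RHEL worker against what the API server reports (kubelet --version is standard; the debug invocation assumes cluster access):

 # oc get nodes
 # oc debug node/piqin-1217-fjnpq-rhel-1 -- chroot /host kubelet --version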

Comment 4 zhaozhanqi 2020-12-18 09:44:00 UTC
Set up another cluster with kube 1.20; the issue did not occur. I will close this bug.

