Bug 1878657
Summary: multus-admission-controller pods in CrashLoopBackOff state due to "multus-admission-controller-secret not found" error

Product: OpenShift Container Platform
Component: Multi-Arch
Version: 4.6
Target Release: 4.6.0
Hardware: ppc64le
OS: Linux
Status: CLOSED DUPLICATE
Severity: high
Priority: unspecified
Reporter: Archana Prabhakar <aprabhak>
Assignee: Dennis Gilmore <dgilmore>
QA Contact: Barry Donahue <bdonahue>
CC: danili, lmcfadde, mjtarsel, mkumatag, mtarsel, pdsilva, pradeep, psundara, sanjum, wilder
Type: Bug
Last Closed: 2020-09-18 18:41:39 UTC
Description

Archana Prabhakar 2020-09-14 09:59:30 UTC

This issue is noticed on a PowerVM environment.

Created attachment 1714830 [details]: bootstrap gather logs
Created attachment 1714831 [details]: rendered assets logs
Created attachment 1714832 [details]: resources json
Created attachment 1714834 [details]: master-1 logs
Created attachment 1714835 [details]: master-2 logs
Created attachment 1714836 [details]: master-3 logs
Looks like the cluster-policy-controller on the bootstrap is crashing with this exception:

goroutine 1 [running]:
k8s.io/klog/v2.stacks(0xc00083bc01, 0xc0001da1c0, 0x7b, 0xd1)
        k8s.io/klog/v2.0/klog.go:996 +0xac
k8s.io/klog/v2.(*loggingT).output(0x12fe49e0, 0xc000000003, 0x0, 0x0, 0xc000a137a0, 0x12f03a66, 0x6, 0x37, 0x0)
        k8s.io/klog/v2.0/klog.go:945 +0x1b4
k8s.io/klog/v2.(*loggingT).printDepth(0x12fe49e0, 0xc000000003, 0x0, 0x0, 0x1, 0xc0008dbb30, 0x1, 0x1)
        k8s.io/klog/v2.0/klog.go:718 +0x128
k8s.io/klog/v2.(*loggingT).print(...)
        k8s.io/klog/v2.0/klog.go:703
k8s.io/klog/v2.Fatal(...)
        k8s.io/klog/v2.0/klog.go:1443
github.com/openshift/cluster-policy-controller/pkg/cmd/cluster-policy-controller.NewClusterPolicyControllerCommand.func1(0xc0001c5080, 0xc00087b4a0, 0x0, 0x5)
        github.com/openshift/cluster-policy-controller/pkg/cmd/cluster-policy-controller/cmd.go:55 +0x31c
github.com/spf13/cobra.(*Command).execute(0xc0001c5080, 0xc00087b450, 0x5, 0x5, 0xc0001c5080, 0xc00087b450)
        github.com/spf13/cobra.0/command.go:846 +0x208
github.com/spf13/cobra.(*Command).ExecuteC(0xc0001c4840, 0xc0001c4840, 0x0, 0x0)
        github.com/spf13/cobra.0/command.go:950 +0x294
github.com/spf13/cobra.(*Command).Execute(...)
        github.com/spf13/cobra.0/command.go:887
main.main()
        github.com/openshift/cluster-policy-controller/cmd/cluster-policy-controller/main.go:67 +0x298

goroutine 19 [chan receive]:
k8s.io/klog/v2.(*loggingT).flushDaemon(0x12fe49e0)
        k8s.io/klog/v2.0/klog.go:1131 +0x78
created by k8s.io/klog/v2.init.0
        k8s.io/klog/v2.0/klog.go:416 +0xe0

goroutine 92 [syscall]:
os/signal.signal_recv(0x0)
        runtime/sigqueue.go:147 +0xf8
os/signal.loop()
        os/signal/signal_unix.go:23 +0x24
created by os/signal.Notify.func1
        os/signal/signal.go:127 +0x4c

goroutine 94 [chan receive]:
k8s.io/apiserver/pkg/server.SetupSignalContext.func1(0xc00096bd40)
        k8s.io/apiserver.0-rc.2/pkg/server/signal.go:48 +0x38
created by k8s.io/apiserver/pkg/server.SetupSignalContext
        k8s.io/apiserver.0-rc.2/pkg/server/signal.go:47 +0xf0

goroutine 95 [select]:
k8s.io/apimachinery/pkg/util/wait.BackoffUntil(0x11c185a0, 0x11e8e920, 0xc000802030, 0x1, 0xc00053e060)
        k8s.io/apimachinery.0-rc.2/pkg/util/wait/wait.go:167 +0x120
k8s.io/apimachinery/pkg/util/wait.JitterUntil(0x11c185a0, 0x12a05f200, 0x0, 0x1, 0xc00053e060)
        k8s.io/apimachinery.0-rc.2/pkg/util/wait/wait.go:133 +0x90
k8s.io/apimachinery/pkg/util/wait.Until(...)
        k8s.io/apimachinery.0-rc.2/pkg/util/wait/wait.go:90
k8s.io/apimachinery/pkg/util/wait.Forever(0x11c185a0, 0x12a05f200)
        k8s.io/apimachinery.0-rc.2/pkg/util/wait/wait.go:81 +0x50
created by k8s.io/component-base/logs.InitLogs
        k8s.io/component-base.0-rc.2/logs/logs.go:58 +0x88

Ah, seems like that was only a side effect; the fatal exception was caused by:

16:14:22.617615 1 cmd.go:55] open /etc/kubernetes/config/cluster-policy-config.yaml: no such file or directory

On the bootstrap node:

[root@bootstrap core]# ls -l /etc/kubernetes/bootstrap-configs
total 16
-rw-r--r--. 1 root root 6635 Sep 14 05:54 kube-apiserver-config.yaml
-rw-r--r--. 1 root root 1341 Sep 14 05:54 kube-controller-manager-config.yaml
-rw-r--r--. 1 root root 81 Sep 14 05:54 kube-scheduler-config.yaml
[root@bootstrap core]# pwd
/var/home/core
[root@bootstrap core]# ls -l /etc/kubernetes/
total 16
drwxr-xr-x. 2 root root 117 Sep 14 05:54 bootstrap-configs
drwxr-xr-x. 2 root root 6 Sep 14 05:53 bootstrap-manifests
drwx------. 2 root root 4096 Sep 15 13:59 bootstrap-secrets
drwxr-xr-x. 2 root root 6 Sep 14 05:54 cloud
drwxr-xr-x. 3 root root 19 Sep 1 03:51 cni
-rw-------. 1 root root 5795 Sep 14 05:54 kubeconfig
-rw-r--r--. 1 root root 146 Sep 14 05:53 kubelet-pause-image-override
drwxr-xr-x. 3 root root 20 Sep 14 05:53 kubelet-plugins
drwxr-xr-x. 2 root root 252 Sep 15 13:59 manifests
drwxr-xr-x. 3 root root 25 Sep 14 05:54 static-pod-resources

Ignore the previous analysis: that error occurs, but the installation still proceeds. What I do see on the master nodes' sdn pods, though, is concerning:

I0914 06:12:20.799579 1996 ovs.go:180] Error executing ovs-vsctl: ovs-vsctl: no row "vxlan0" in table Interface
I0914 06:12:20.925805 1996 proxier.go:370] userspace proxy: processing 0 service events
I0914 06:12:20.926133 1996 proxier.go:349] userspace syncProxyRules took 49.44885ms
I0914 06:12:21.081209 1996 proxier.go:370] userspace proxy: processing 0 service events
I0914 06:12:21.082842 1996 proxier.go:349] userspace syncProxyRules took 50.442527ms
I0914 06:12:21.306427 1996 ovs.go:180] Error executing ovs-vsctl: ovs-vsctl: no row "vxlan0" in table Interface
I0914 06:12:21.937474 1996 ovs.go:180] Error executing ovs-vsctl: ovs-vsctl: no row "vxlan0" in table Interface
I0914 06:12:22.724906 1996 ovs.go:180] Error executing ovs-vsctl: ovs-vsctl: no row "vxlan0" in table Interface
I0914 06:12:23.707811 1996 ovs.go:180] Error executing ovs-vsctl: ovs-vsctl: no row "vxlan0" in table Interface
I0914 06:12:24.935827 1996 ovs.go:180] Error executing ovs-vsctl: ovs-vsctl: no row "vxlan0" in table Interface
I0914 06:12:26.469006 1996 ovs.go:180] Error executing ovs-vsctl: ovs-vsctl: no row "vxlan0" in table Interface
I0914 06:12:28.382423 1996 ovs.go:180] Error executing ovs-vsctl: ovs-vsctl: no row "vxlan0" in table Interface
I0914 06:12:30.772801 1996 ovs.go:180] Error executing ovs-vsctl: ovs-vsctl: no row "vxlan0" in table Interface
I0914 06:12:33.759059 1996 ovs.go:180] Error executing ovs-vsctl: ovs-vsctl: no row "vxlan0" in table Interface
F0914 06:12:33.759135 1996 healthcheck.go:99] SDN healthcheck detected unhealthy OVS server, restarting: plugin is not setup

The ovsdb server fails to start:

Failed to connect to bus: No data available
openvswitch is running in container
/etc/openvswitch/conf.db does not exist ... (warning).
Creating empty database /etc/openvswitch/conf.db.
ovsdb-server: /var/run/openvswitch/ovsdb-server.pid: pidfile check failed (No such process), aborting
Starting ovsdb-server ... failed!

Added network dev from the POWER team, Mick, and squad lead Manju.

Noticing "no space" errors on the interfaces. Below is the data from one of the master nodes. The RHCOS46 images had a 20 GB disk size.
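The sdn pod repeats the same ovs-vsctl error many times before the healthcheck gives up. As a side note for anyone scanning logs like the ones above, a small filter can collapse the repeats; this is my own sketch, not something from the bug (the `sdn_error_summary` name is invented here), and it assumes klog-style lines whose prefix ends at the first `] `:

```shell
# Collapse repeated ovs errors in openshift-sdn pod logs.
# The klog prefix (level+date, time, pid, file:line]) ends at the first "] ",
# so stripping it lets identical errors from different timestamps collapse.
sdn_error_summary() {
  grep 'Error executing' | sed 's/^[^]]*] //' | sort | uniq -c | sort -rn
}

# Example on an excerpt of the log above:
sdn_error_summary <<'EOF'
I0914 06:12:20.799579 1996 ovs.go:180] Error executing ovs-vsctl: ovs-vsctl: no row "vxlan0" in table Interface
I0914 06:12:21.306427 1996 ovs.go:180] Error executing ovs-vsctl: ovs-vsctl: no row "vxlan0" in table Interface
I0914 06:12:20.925805 1996 proxier.go:370] userspace proxy: processing 0 service events
EOF
```

Piping `crictl logs` output for the sdn container through the same filter yields one line per distinct error with its repeat count.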
[root@master-2 core]# ovs-vsctl list interface

_uuid : eea736c0-7b57-4e1e-aa18-dfc6f20900db
admin_state : [] bfd : {} bfd_status : {} cfm_fault : [] cfm_fault_status : [] cfm_flap_count : [] cfm_health : [] cfm_mpid : [] cfm_remote_mpids : [] cfm_remote_opstate : [] duplex : []
error : "could not add network device veth657700ea to ofproto (No space left on device)"
external_ids : {ip="10.128.0.9", sandbox="0e899417fbcbd8b49c511fcbe273ccd2ec07520f94298a71024b3ee4b6b79661"}
ifindex : [] ingress_policing_burst: 0 ingress_policing_rate: 0 lacp_current : [] link_resets : 0 link_speed : [] link_state : [] lldp : {} mac : [] mac_in_use : [] mtu : [] mtu_request : []
name : veth657700ea
ofport : -1 ofport_request : [] options : {} other_config : {} statistics : {} status : {} type : ""

_uuid : 01c25001-64ca-44c4-bdaa-fcb63a7d9de1
admin_state : [] bfd : {} bfd_status : {} cfm_fault : [] cfm_fault_status : [] cfm_flap_count : [] cfm_health : [] cfm_mpid : [] cfm_remote_mpids : [] cfm_remote_opstate : [] duplex : []
error : "could not add network device vethf20e8e29 to ofproto (No space left on device)"
external_ids : {ip="10.128.0.15", sandbox="130cab3a6214363bdb4994202336fd267ddd9a4a55d9a4a7fac706806f46f026"}
ifindex : [] ingress_policing_burst: 0 ingress_policing_rate: 0 lacp_current : [] link_resets : 0 link_speed : [] link_state : [] lldp : {} mac : [] mac_in_use : [] mtu : [] mtu_request : []
name : vethf20e8e29
ofport : -1 ofport_request : [] options : {} other_config : {} statistics : {} status : {} type : ""

_uuid : 51dae8f1-ccee-4752-af78-1861dd366e30
admin_state : [] bfd : {} bfd_status : {} cfm_fault : [] cfm_fault_status : [] cfm_flap_count : [] cfm_health : [] cfm_mpid : [] cfm_remote_mpids : [] cfm_remote_opstate : [] duplex : []
error : "could not add network device veth6cec1255 to ofproto (No space left on device)"
external_ids : {ip="10.128.0.6", sandbox=b6bc118838dce0ca5252d8efe90172a354ebeedb4c59158b5da4b03c38f5827a}
ifindex : [] ingress_policing_burst: 0 ingress_policing_rate: 0 lacp_current : [] link_resets : 0 link_speed : [] link_state : [] lldp : {} mac : [] mac_in_use : [] mtu : [] mtu_request : []
name : veth6cec1255
ofport : -1 ofport_request : [] options : {} other_config : {} statistics : {} status : {} type : ""

_uuid : 15a1fb82-79a2-4b10-9192-b4f09cfad6be
admin_state : up bfd : {} bfd_status : {} cfm_fault : [] cfm_fault_status : [] cfm_flap_count : [] cfm_health : [] cfm_mpid : [] cfm_remote_mpids : [] cfm_remote_opstate : [] duplex : []
error : []
external_ids : {}
ifindex : 25 ingress_policing_burst: 0 ingress_policing_rate: 0 lacp_current : [] link_resets : 0 link_speed : [] link_state : up lldp : {} mac : [] mac_in_use : "1a:81:c6:da:2a:09" mtu : [] mtu_request : []
name : vxlan0
ofport : 1 ofport_request : 1 options : {dst_port="4789", key=flow, remote_ip=flow} other_config : {} statistics : {rx_bytes=0, rx_packets=0, tx_bytes=0, tx_packets=0} status : {} type : vxlan

_uuid : 3752b713-ecab-4798-8167-810bdbf8bdfb
admin_state : up bfd : {} bfd_status : {} cfm_fault : [] cfm_fault_status : [] cfm_flap_count : [] cfm_health : [] cfm_mpid : [] cfm_remote_mpids : [] cfm_remote_opstate : [] duplex : []
error : []
external_ids : {}
ifindex : 27 ingress_policing_burst: 0 ingress_policing_rate: 0 lacp_current : [] link_resets : 1 link_speed : [] link_state : up lldp : {} mac : [] mac_in_use : "02:eb:92:6e:4e:87" mtu : 1450 mtu_request : 1450
name : tun0
ofport : 2 ofport_request : 2 options : {} other_config : {} statistics : {collisions=0, rx_bytes=0, rx_crc_err=0, rx_dropped=0, rx_errors=0, rx_frame_err=0, rx_over_err=0, rx_packets=0, tx_bytes=125108, tx_dropped=0, tx_errors=0, tx_packets=2968} status : {driver_name=openvswitch} type : internal

_uuid : cf1d8504-e77a-41c4-8bdd-b265c3a2ba67
admin_state : [] bfd : {} bfd_status : {} cfm_fault : [] cfm_fault_status : [] cfm_flap_count : [] cfm_health : [] cfm_mpid : [] cfm_remote_mpids : [] cfm_remote_opstate : [] duplex : []
error : "could not add network device veth05b2efc4 to ofproto (No space left on device)"
external_ids : {ip="10.128.0.18", sandbox=dce0bbe07172b35471abf89641c28d0ec9dcd4e26fd9f671eaaf03cd6ca62ee0}
ifindex : [] ingress_policing_burst: 0 ingress_policing_rate: 0 lacp_current : [] link_resets : 0 link_speed : [] link_state : [] lldp : {} mac : [] mac_in_use : [] mtu : [] mtu_request : []
name : veth05b2efc4
ofport : -1 ofport_request : [] options : {} other_config : {} statistics : {} status : {} type : ""

_uuid : 1d1ecee5-7e76-4b50-aa67-a637ecdf3438
admin_state : [] bfd : {} bfd_status : {} cfm_fault : [] cfm_fault_status : [] cfm_flap_count : [] cfm_health : [] cfm_mpid : [] cfm_remote_mpids : [] cfm_remote_opstate : [] duplex : []
error : "could not add network device veth5f6cb8b8 to ofproto (No space left on device)"
external_ids : {ip="10.128.0.11", sandbox="7bfaf063fb454194c9d02fd74d3bcbb90c6a68967ae28e6a2457b5929bd5e990"}
ifindex : [] ingress_policing_burst: 0 ingress_policing_rate: 0 lacp_current : [] link_resets : 0 link_speed : [] link_state : [] lldp : {} mac : [] mac_in_use : [] mtu : [] mtu_request : []
name : veth5f6cb8b8
ofport : -1 ofport_request : [] options : {} other_config : {} statistics : {} status : {} type : ""

_uuid : f6bb3176-06fb-492b-9229-ffa0b901c029
admin_state : [] bfd : {} bfd_status : {} cfm_fault : [] cfm_fault_status : [] cfm_flap_count : [] cfm_health : [] cfm_mpid : [] cfm_remote_mpids : [] cfm_remote_opstate : [] duplex : []
error : "could not add network device veth9dd59021 to ofproto (No space left on device)"
external_ids : {ip="10.128.0.10", sandbox="04af944f504e0dbfa1079b438537c655e4c63b7389266b1ea3083a55b31684cf"}
ifindex : [] ingress_policing_burst: 0 ingress_policing_rate: 0 lacp_current : [] link_resets : 0 link_speed : [] link_state : [] lldp : {} mac : [] mac_in_use : [] mtu : [] mtu_request : []
name : veth9dd59021
ofport : -1 ofport_request : [] options : {} other_config : {} statistics : {} status : {} type : ""

_uuid : e6c5d907-3957-4d00-961c-baf4e63e2257
admin_state : [] bfd : {} bfd_status : {} cfm_fault : [] cfm_fault_status : [] cfm_flap_count : [] cfm_health : [] cfm_mpid : [] cfm_remote_mpids : [] cfm_remote_opstate : [] duplex : []
error : "could not add network device veth73694404 to ofproto (No space left on device)"
external_ids : {ip="10.128.0.8", sandbox="197847dbee9666f86dde221c719358228109d39b8444aa3172195eb8ce10dfb1"}
ifindex : [] ingress_policing_burst: 0 ingress_policing_rate: 0 lacp_current : [] link_resets : 0 link_speed : [] link_state : [] lldp : {} mac : [] mac_in_use : [] mtu : [] mtu_request : []
name : veth73694404
ofport : -1 ofport_request : [] options : {} other_config : {} statistics : {} status : {} type : ""

_uuid : 3f5e0158-b9f6-4f2f-8cac-c30c84e96906
admin_state : [] bfd : {} bfd_status : {} cfm_fault : [] cfm_fault_status : [] cfm_flap_count : [] cfm_health : [] cfm_mpid : [] cfm_remote_mpids : [] cfm_remote_opstate : [] duplex : []
error : "could not add network device veth64602b83 to ofproto (No space left on device)"
external_ids : {ip="10.128.0.12", sandbox="47548db0199e0a59ac7fc86397e9db8877b4318a64ef2b142e268e8153c0f2bc"}
ifindex : [] ingress_policing_burst: 0 ingress_policing_rate: 0 lacp_current : [] link_resets : 0 link_speed : [] link_state : [] lldp : {} mac : [] mac_in_use : [] mtu : [] mtu_request : []
name : veth64602b83
ofport : -1 ofport_request : [] options : {} other_config : {} statistics : {} status : {} type : ""

_uuid : cf23fbd8-64ec-4027-8670-4a8bf5dc42bf
admin_state : [] bfd : {} bfd_status : {} cfm_fault : [] cfm_fault_status : [] cfm_flap_count : [] cfm_health : [] cfm_mpid : [] cfm_remote_mpids : [] cfm_remote_opstate : [] duplex : []
error : "could not add network device veth94fb7f81 to ofproto (No space left on device)"
external_ids : {ip="10.128.0.14", sandbox=ab7f309f6c0e65f32a1515685da90189a3d09521036600e1be0b811f5518e999}
ifindex : [] ingress_policing_burst: 0 ingress_policing_rate: 0 lacp_current : [] link_resets : 0 link_speed : [] link_state : [] lldp : {} mac : [] mac_in_use : [] mtu : [] mtu_request : []
name : veth94fb7f81
ofport : -1 ofport_request : [] options : {} other_config : {} statistics : {} status : {} type : ""

_uuid : 528ed08d-970e-4620-9654-b9dc01752df6
admin_state : [] bfd : {} bfd_status : {} cfm_fault : [] cfm_fault_status : [] cfm_flap_count : [] cfm_health : [] cfm_mpid : [] cfm_remote_mpids : [] cfm_remote_opstate : [] duplex : []
error : "could not add network device vethc71f3834 to ofproto (No space left on device)"
external_ids : {ip="10.128.0.13", sandbox="4b5145dcd172516b36a35b377df2f34765841f378b6ce6d58c82fdecb14b9a0a"}
ifindex : []

Since it complained about space, I retried using a bigger image of RHCOS 46.82 with 120 GB and reran the install with the latest build. Again, I see the same failures as discussed in this bug. Even the network interface attach fails with "no space on device" as shown above.

@psundra is there any reason you re-targeted this to 4.7? We need this resolved for 4.6, and it is impacting our ability to complete OCP 4.6 testing in sprint 1.

@lmcfadde that was a mistake. It should be 4.6.

After some debugging and talking to Mick Tarsel, this looks like an ovs issue. He will update with more details.

Created attachment 1715123 [details]: ovs-vswitchd logs on master-2

So with Prashanth's help we narrowed down the error to the openshift-sdn pod on master-2. Something appears to be wrong with openvswitch and OpenFlow. We then looked at the sdn container logs via crictl, which showed a lot of:

I0916 17:17:46.160062 193200 ovs.go:180] Error executing ovs-ofctl: ovs-ofctl: /var/run/openvswitch/br0.mgmt: failed to open socket (Connection refused)

(I can attach these logs if needed.) Looks like there is a problem executing OpenFlow commands from here: https://github.com/openshift/sdn/blob/release-4.6/pkg/network/node/ovs/ovs.go#L155

So I looked at the ovs-vswitchd logs in /var/log/openvswitch on master-2, which are attached to this comment.
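As an aside, the `ovs-vsctl list interface` dump above buries the failing records among dozens of empty columns. A throwaway filter, my own sketch rather than anything from the bug, pulls out just the failing interfaces from the one-column-per-line output that `ovs-vsctl list` normally produces; it relies on the `error` column preceding `name` within each record, as it does above:

```shell
# Print "name: error" for each interface record whose add failed.
# In `ovs-vsctl list interface` output, the error column precedes name.
iface_errors() {
  awk '
    /^error/ && /could not add/ { err = $0; sub(/^error[ ]*: /, "", err) }
    /^name/  { if (err != "") { print $3 ": " err; err = "" } }
  '
}

# Example on a two-record excerpt (one failed veth, one healthy vxlan0):
iface_errors <<'EOF'
error               : "could not add network device veth657700ea to ofproto (No space left on device)"
name                : veth657700ea
error               : []
name                : vxlan0
EOF
```

Piping the live command through it (`ovs-vsctl list interface | iface_errors`) would give one line per broken veth.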
Things start getting funky right after br0 is created (at the top of the log):

2020-09-16T14:09:44.182Z|00029|bridge|INFO|bridge br0: added interface br0 on port 65534
2020-09-16T14:09:44.183Z|00030|bridge|INFO|bridge br0: using datapath ID 0000e64086f8f744
2020-09-16T14:09:44.183Z|00031|connmgr|INFO|br0: added service controller "punix:/var/run/openvswitch/br0.mgmt"
2020-09-16T14:09:44.291Z|00032|bridge|INFO|bridge br0: added interface vxlan0 on port 1
2020-09-16T14:09:44.325Z|00033|netdev|WARN|failed to set MTU for network device tun0: No such device
2020-09-16T14:09:44.331Z|00034|bridge|INFO|bridge br0: added interface tun0 on port 2

And later on:

2020-09-16T14:10:44.543Z|00042|bridge|INFO|bridge br0: deleted interface tun0 on port 2
2020-09-16T14:10:44.548Z|00043|netdev|WARN|failed to set MTU for network device tun0: No such device
2020-09-16T14:10:44.650Z|00044|bridge|INFO|bridge br0: added interface tun0 on port 2
2020-09-16T14:10:44.655Z|00045|netdev_linux|INFO|ioctl(SIOCGIFINDEX) on vxlan_sys_4789 device failed: No such device
2020-09-16T14:10:44.661Z|00046|netdev_linux|INFO|ioctl(SIOCGIFINDEX) on vxlan_sys_4789 device failed: No such device
2020-09-16T14:10:44.764Z|00047|bridge|INFO|bridge br0: deleted interface br0 on port 65534

Right after br0 is deleted, it is created again, and then it goes into this loop of adding and deleting the tun0 interface from br0, all while vxlan_sys_4789 is still not found:

2020-09-16T14:10:44.854Z|00048|bridge|INFO|bridge br0: added interface br0 on port 65534
2020-09-16T14:10:44.855Z|00001|dpif(revalidator6)|WARN|system@ovs-system: failed to flow_get (No such file or directory) recirc_id(0),dp_hash(0),skb_priority(0),in_port(0),skb_mark(0),ct_state(0),ct_zone(0),ct_mark(0),ct_label(0),eth(src=00:00:00:00:00:00,dst=00:00:00:00:00:00), packets:0, bytes:0, used:never
2020-09-16T14:10:44.855Z|00002|ofproto_dpif_upcall(revalidator6)|WARN|Failed to acquire udpif_key corresponding to unexpected flow (No such file or directory): ufid:4cd769d5-92b6-4acd-80b3-5aa16991b6d8
2020-09-16T14:10:44.858Z|00049|netdev_linux|INFO|ioctl(SIOCGIFINDEX) on vxlan_sys_4789 device failed: No such device
2020-09-16T14:10:44.865Z|00050|bridge|INFO|bridge br0: deleted interface tun0 on port 2
2020-09-16T14:10:44.868Z|00051|netdev|WARN|failed to set MTU for network device tun0: No such device
2020-09-16T14:10:44.959Z|00052|netdev_linux|INFO|ioctl(SIOCGIFINDEX) on tun0 device failed: No such device
2020-09-16T14:10:44.960Z|00053|bridge|WARN|could not add network device tun0 to ofproto (No such device)
2020-09-16T14:10:44.962Z|00054|netdev_linux|INFO|ioctl(SIOCGIFINDEX) on vxlan_sys_4789 device failed: No such device
2020-09-16T14:10:45.078Z|00055|netdev|WARN|failed to set MTU for network device tun0: No such device
2020-09-16T14:10:45.079Z|00056|bridge|INFO|bridge br0: added interface tun0 on port 2
2020-09-16T14:10:45.086Z|00057|bridge|INFO|bridge br0: deleted interface tun0 on port 2
2020-09-16T14:10:45.089Z|00058|netdev|WARN|failed to set MTU for network device tun0: No such device
2020-09-16T14:10:45.205Z|00059|bridge|INFO|bridge br0: added interface tun0 on port 2
2020-09-16T14:10:45.208Z|00060|netdev_linux|INFO|ioctl(SIOCGIFINDEX) on vxlan_sys_4789 device failed: No such device
2020-09-16T14:10:45.294Z|00061|bridge|INFO|bridge br0: deleted interface tun0 on port 2
2020-09-16T14:10:45.300Z|00062|netdev|WARN|failed to set MTU for network device tun0: No such device
2020-09-16T14:10:45.300Z|00063|bridge|INFO|bridge br0: added interface tun0 on port 2
2020-09-16T14:10:45.397Z|00064|bridge|INFO|bridge br0: deleted interface tun0 on port 2
2020-09-16T14:10:45.399Z|00065|bridge|INFO|bridge br0: added interface tun0 on port 2
2020-09-16T14:10:47.905Z|00066|timeval|WARN|Unreasonably long 1048ms poll interval (5ms user, 7ms system)
2020-09-16T14:10:47.905Z|00067|timeval|WARN|context switches: 14 voluntary, 7 involuntary

So looking at the master-2 interfaces, things get even weirder. There are no interfaces on br0 at all. The veth interfaces, which are present on the machine, should be there:

[root@master-2 core]# ovs-vsctl show
def875a0-54d7-4075-9468-84ab195df9ad
    Bridge br0
        fail_mode: secure
        Port br0
            Interface br0
                type: internal
    ovs_version: "2.11.4"
[root@master-2 core]# ip link show | grep veth
29: veth6c5907ed@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP mode DEFAULT group default
30: vethfd738fd5@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP mode DEFAULT group default
32: vethd2ab6e37@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP mode DEFAULT group default
33: veth0cac84d7@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP mode DEFAULT group default
34: veth9474b203@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP mode DEFAULT group default
35: vethad383ec8@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP mode DEFAULT group default
36: vethf2b7c335@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP mode DEFAULT group default
37: vetheb944dc6@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP mode DEFAULT group default
38: veth95cbfe1f@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP mode DEFAULT group default
39: veth026eaffb@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP mode DEFAULT group default
40: veth6affbeee@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP mode DEFAULT group default
41: vetha6b49861@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP mode DEFAULT group default
42: vethc168c2be@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP mode DEFAULT group default
43: veth68364631@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP mode DEFAULT group default

And here are the problem interfaces which ovs said "No such device" about:

26: vxlan_sys_4789: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 65000 qdisc noqueue master ovs-system state UNKNOWN group default qlen 1000
    link/ether 1e:d3:73:f0:0a:47 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::1cd3:73ff:fef0:a47/64 scope link
       valid_lft forever preferred_lft forever
28: tun0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UNKNOWN group default qlen 1000
    link/ether b2:d6:15:c7:f1:bd brd ff:ff:ff:ff:ff:ff
    inet 10.128.0.1/23 brd 10.128.1.255 scope global tun0
       valid_lft forever preferred_lft forever
    inet6 fe80::b0d6:15ff:fec7:f1bd/64 scope link
       valid_lft forever preferred_lft forever

Also note that tun0 is in UNKNOWN state.

Dave Wilder on the LTC Networking team at IBM logged in and was not able to add a veth interface to br0. After he restarted the openvswitch service via systemctl, the interfaces were present on br0.

[root@master-2 openvswitch]# ovs-vsctl show
f64925b1-a493-4dee-976a-80ba5cf705a4
    Bridge br0
        fail_mode: secure
        Port vxlan0
            Interface vxlan0
                type: vxlan
                options: {dst_port="4789", key=flow, remote_ip=flow}
        Port br0
            Interface br0
                type: internal
        Port tun0
            Interface tun0
                type: internal
    ovs_version: "2.13.2"

Looks like a problem with openvswitch.

> Dave Wilder on the LTC Networking team at IBM logged in and was not able to
> add a veth interface to br0. After he restarted the openvswitch service via
> systemctl, the interfaces were present on br0.
>
> [root@master-2 openvswitch]# ovs-vsctl show
> f64925b1-a493-4dee-976a-80ba5cf705a4
> Bridge br0
> fail_mode: secure
> Port vxlan0
> Interface vxlan0
> type: vxlan
> options: {dst_port="4789", key=flow, remote_ip=flow}
> Port br0
> Interface br0
> type: internal
> Port tun0
> Interface tun0
> type: internal
> ovs_version: "2.13.2"
>
>
> Looks like a problem with openvswitch.
Some clarification on the test I did:
1)Created a veth pair:
# ip link add vethA type veth peer name vethB
2)Attempted to add vethA to br0
# ovs-vsctl add-port br0 vethA
This hung (I had to Ctrl-C out).
Running ovs-vsctl show did not list vethA.
Restarted ovs:
# systemctl restart openvswitch
[root@master-2 openvswitch]# ovs-vsctl show
f64925b1-a493-4dee-976a-80ba5cf705a4
Bridge br0
fail_mode: secure
Port vxlan0
Interface vxlan0
type: vxlan
options: {dst_port="4789", key=flow, remote_ip=flow}
Port br0
Interface br0
type: internal
Port tun0
Interface tun0
type: internal
ovs_version: "2.13.2"
It appears ovs was not responding until after the restart, but it had still added vethA to the db.
I rebooted the node and observed new interfaces were added to br0.
[root@master-2 core]# ovs-vsctl show
def875a0-54d7-4075-9468-84ab195df9ad
Bridge br0
fail_mode: secure
Port veth5f017582
Interface veth5f017582
Port vxlan0
Interface vxlan0
type: vxlan
options: {dst_port="4789", key=flow, remote_ip=flow}
Port br0
Interface br0
type: internal
Port veth08774018
Interface veth08774018
Port vetheb89cc8b
Interface vetheb89cc8b
Port vethfded5e2e
Interface vethfded5e2e
Port veth888a39e8
Interface veth888a39e8
Port tun0
Interface tun0
type: internal
Port veth8878b0bc
Interface veth8878b0bc
Port veth93cb26e2
Interface veth93cb26e2
Port veth31496ad7
Interface veth31496ad7
Port vethfc335352
Interface vethfc335352
Port veth8d94567f
Interface veth8d94567f
Port vethde565abf
Interface vethde565abf
Port vethf3069cbd
Interface vethf3069cbd
Port vethf36bb791
Interface vethf36bb791
Port veth1d8a33aa
Interface veth1d8a33aa
Port vethb1a5133f
Interface vethb1a5133f
ovs_version: "2.11.4"
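Comparing the broken and healthy states above comes down to which ports each `ovs-vsctl show` dump lists on br0. As a quick way to diff two captures, here is a sketch of mine (the `ports_on_bridge` helper is not from the bug report) that extracts the port names for a given bridge from `ovs-vsctl show` output in the unquoted format printed above:

```shell
# List the ports attached to one bridge in `ovs-vsctl show` output.
ports_on_bridge() {
  awk -v br="$1" '
    $1 == "Bridge" && $2 == br { inbr = 1; next }
    $1 == "Bridge"             { inbr = 0 }
    inbr && $1 == "Port"       { print $2 }
  '
}

# Example on the healthy post-restart dump:
ports_on_bridge br0 <<'EOF'
f64925b1-a493-4dee-976a-80ba5cf705a4
    Bridge br0
        fail_mode: secure
        Port vxlan0
            Interface vxlan0
                type: vxlan
        Port br0
            Interface br0
                type: internal
        Port tun0
            Interface tun0
                type: internal
    ovs_version: "2.13.2"
EOF
```

Running `ovs-vsctl show | ports_on_bridge br0` before and after the restart makes the missing veth ports obvious at a glance.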
This looks very similar to https://bugzilla.redhat.com/show_bug.cgi?id=1874696. Adding ovs-devel to CC as well.

I see the following results installing OCP with the docker.io/prashanths684/openshift-release:4.6-ppc64le-ovs image:

# oc get nodes
NAME       STATUS     ROLES    AGE    VERSION
master-0   NotReady   master   4h3m   v1.19.0+35ab7c5
master-1   NotReady   master   4h4m   v1.19.0+35ab7c5
master-2   NotReady   master   4h3m   v1.19.0+35ab7c5

# oc get co
NAME                                       VERSION   AVAILABLE   PROGRESSING   DEGRADED   SINCE
authentication
cloud-credential                                     True        False         False      4h17m
cluster-autoscaler
config-operator
console
csi-snapshot-controller
dns
etcd
image-registry
ingress
insights
kube-apiserver
kube-controller-manager
kube-scheduler
kube-storage-version-migrator
machine-api
machine-approver
machine-config
marketplace
monitoring
network                                              False       True          True       4h4m
node-tuning
openshift-apiserver
openshift-controller-manager
openshift-samples
operator-lifecycle-manager
operator-lifecycle-manager-catalog
operator-lifecycle-manager-packageserver
service-ca
storage

The only pods in Running/ContainerCreating state are shown below; the rest of the pods are in Pending state.

# oc get pods --all-namespaces | grep "Running\|ContainerCrea"
openshift-multus             multus-c9f7c                        1/1   Running             14   3h56m
openshift-multus             multus-mwv68                        1/1   Running             12   3h57m
openshift-multus             multus-zksjc                        1/1   Running             4    3h56m
openshift-multus             network-metrics-daemon-bb4m2        0/2   ContainerCreating   0    3h56m
openshift-multus             network-metrics-daemon-cdt5b        0/2   ContainerCreating   0    3h57m
openshift-multus             network-metrics-daemon-rc77h        0/2   ContainerCreating   0    3h56m
openshift-network-operator   network-operator-5bfcfc7cb6-fhbm9   1/1   Running             0    154m
openshift-sdn                sdn-controller-c5hmd                1/1   Running             0    3h56m
openshift-sdn                sdn-controller-d6n9w                1/1   Running             0    3h56m
openshift-sdn                sdn-controller-gbqxf                1/1   Running             0    3h56m

# oc describe pod network-metrics-daemon-bb4m2 -n openshift-multus
Name:         network-metrics-daemon-bb4m2
Namespace:    openshift-multus
Priority:     0
Node:         master-0/9.114.98.137
Start Time:   Thu, 17 Sep 2020 03:38:58 -0400
Labels:       app=network-metrics-daemon
              component=network
              controller-revision-hash=77dd98c48b
              openshift.io/component=network
              pod-template-generation=1
              type=infra
Annotations:  <none>
Status:       Pending
IP:
IPs:          <none>
Controlled By:  DaemonSet/network-metrics-daemon
Containers:
  network-metrics-daemon:
    Container ID:
    Image:       quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:83e81b9a1307b2c4137acb9b0fc940e42c245d5c823cbf4860f1ce41deed050d
    Image ID:
    Port:        <none>
    Host Port:   <none>
    Command:
      /usr/bin/network-metrics
    Args:
      --node-name
      $(NODE_NAME)
    State:          Waiting
      Reason:       ContainerCreating
    Ready:          False
    Restart Count:  0
    Requests:
      cpu:     10m
      memory:  100Mi
    Environment:
      NODE_NAME:   (v1:spec.nodeName)
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from metrics-daemon-sa-token-t95pz (ro)
  kube-rbac-proxy:
    Container ID:
    Image:       quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:8fdb7e386b43c9f56189e90912b33b1cba9c24a09bde6a3067ff92ef529f71dd
    Image ID:
    Port:        8443/TCP
    Host Port:   0/TCP
    Args:
      --logtostderr
      --secure-listen-address=:8443
      --tls-cipher-suites=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256,TLS_RSA_WITH_AES_128_CBC_SHA256,TLS_ECDHE_ECDSA_WITH_AES_128_CBC_SHA256,TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA256
      --upstream=http://127.0.0.1:9091/
      --tls-private-key-file=/etc/metrics/tls.key
      --tls-cert-file=/etc/metrics/tls.crt
    State:          Waiting
      Reason:       ContainerCreating
    Ready:          False
    Restart Count:  0
    Requests:
      cpu:     10m
      memory:  20Mi
    Environment:  <none>
    Mounts:
      /etc/metrics from metrics-certs (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from metrics-daemon-sa-token-t95pz (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  metrics-certs:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  metrics-daemon-secret
    Optional:    false
  metrics-daemon-sa-token-t95pz:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  metrics-daemon-sa-token-t95pz
    Optional:    false
QoS Class:       Burstable
Node-Selectors:  kubernetes.io/os=linux
Tolerations:     op=Exists
Events:
  Type     Reason           Age                     From     Message
  ----     ------           ----                    ----     -------
  Warning  NetworkNotReady  159m (x2552 over 4h4m)  kubelet  network is not ready: runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: No CNI configuration file in /etc/kubernetes/cni/net.d/. Has your network provider started?
  Warning  FailedMount      96m (x35 over 151m)     kubelet  MountVolume.SetUp failed for volume "metrics-certs" : secret "metrics-daemon-secret" not found
  Warning  NetworkNotReady  106s (x4502 over 151m)  kubelet  network is not ready: runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: No CNI configuration file in /etc/kubernetes/cni/net.d/. Has your network provider started?
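The Events table above is dominated by thousands of repeats of two warnings. When triaging a pod stuck like this, it can help to tally the warnings by reason; the helper below is my own sketch (the `warning_reasons` name is invented, not an oc feature), assuming the `oc describe pod` Events layout where the second column is the reason:

```shell
# Tally kubelet Warning events by reason from `oc describe pod` output.
warning_reasons() {
  awk '$1 == "Warning" { print $2 }' | sort | uniq -c | sort -rn
}

# Example on an excerpt of the Events table above:
warning_reasons <<'EOF'
  Warning  NetworkNotReady  159m (x2552 over 4h4m)  kubelet  network is not ready
  Warning  FailedMount      96m (x35 over 151m)     kubelet  MountVolume.SetUp failed for volume "metrics-certs"
  Warning  NetworkNotReady  106s (x4502 over 151m)  kubelet  network is not ready
EOF
```

Here that immediately separates the two distinct failure modes: the CNI config never arriving (NetworkNotReady) and the metrics-daemon-secret mount failing (FailedMount).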
# ssh core@master-1 'ip a'
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: env32: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UNKNOWN group default qlen 1000
    link/ether fa:16:3e:f3:6e:61 brd ff:ff:ff:ff:ff:ff
    inet 9.114.98.247/22 brd 9.114.99.255 scope global dynamic noprefixroute env32
       valid_lft 12803sec preferred_lft 12803sec
    inet6 fe80::f816:3eff:fef3:6e61/64 scope link noprefixroute
       valid_lft forever preferred_lft forever

Restarted the nodes to see any change in the results, but that did not help.

After doing some investigation with Mick and talking to the sdn team, they suggested it might be a dup of https://bugzilla.redhat.com/show_bug.cgi?id=1874696.

Today I built a patched ppc64le image with these two PRs:
https://github.com/openshift/machine-config-operator/pull/2102
https://github.com/openshift/cluster-network-operator/pull/785

Archana was able to test the image and the install succeeded on PowerVM. Marking it as a dup of 1874696.

*** This bug has been marked as a duplicate of bug 1874696 ***