Bug 1878657
| Summary: | Multus-admission-controller pods in crashloopbackoff state due to multus-admission-controller-secret not found error. | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Archana Prabhakar <aprabhak> |
| Component: | Multi-Arch | Assignee: | Dennis Gilmore <dgilmore> |
| Status: | CLOSED DUPLICATE | QA Contact: | Barry Donahue <bdonahue> |
| Severity: | high | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 4.6 | CC: | danili, lmcfadde, mjtarsel, mkumatag, mtarsel, pdsilva, pradeep, psundara, sanjum, wilder |
| Target Milestone: | --- | | |
| Target Release: | 4.6.0 | | |
| Hardware: | ppc64le | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2020-09-18 18:41:39 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Description (Archana Prabhakar, 2020-09-14 09:59:30 UTC)
This issue is noticed on the PowerVM environment.

Created attachment 1714830 [details]: bootstrap gather logs
Created attachment 1714831 [details]: Rendered assets logs
Created attachment 1714832 [details]: Resources json
Created attachment 1714834 [details]: master-1 logs
Created attachment 1714835 [details]: master-2 logs
Created attachment 1714836 [details]: master-3 logs
Looks like the cluster-policy-controller on the bootstrap is crashing with this exception:

goroutine 1 [running]:
k8s.io/klog/v2.stacks(0xc00083bc01, 0xc0001da1c0, 0x7b, 0xd1)
	k8s.io/klog/v2.0/klog.go:996 +0xac
k8s.io/klog/v2.(*loggingT).output(0x12fe49e0, 0xc000000003, 0x0, 0x0, 0xc000a137a0, 0x12f03a66, 0x6, 0x37, 0x0)
	k8s.io/klog/v2.0/klog.go:945 +0x1b4
k8s.io/klog/v2.(*loggingT).printDepth(0x12fe49e0, 0xc000000003, 0x0, 0x0, 0x1, 0xc0008dbb30, 0x1, 0x1)
	k8s.io/klog/v2.0/klog.go:718 +0x128
k8s.io/klog/v2.(*loggingT).print(...)
	k8s.io/klog/v2.0/klog.go:703
k8s.io/klog/v2.Fatal(...)
	k8s.io/klog/v2.0/klog.go:1443
github.com/openshift/cluster-policy-controller/pkg/cmd/cluster-policy-controller.NewClusterPolicyControllerCommand.func1(0xc0001c5080, 0xc00087b4a0, 0x0, 0x5)
	github.com/openshift/cluster-policy-controller/pkg/cmd/cluster-policy-controller/cmd.go:55 +0x31c
github.com/spf13/cobra.(*Command).execute(0xc0001c5080, 0xc00087b450, 0x5, 0x5, 0xc0001c5080, 0xc00087b450)
	github.com/spf13/cobra.0/command.go:846 +0x208
github.com/spf13/cobra.(*Command).ExecuteC(0xc0001c4840, 0xc0001c4840, 0x0, 0x0)
	github.com/spf13/cobra.0/command.go:950 +0x294
github.com/spf13/cobra.(*Command).Execute(...)
	github.com/spf13/cobra.0/command.go:887
main.main()
	github.com/openshift/cluster-policy-controller/cmd/cluster-policy-controller/main.go:67 +0x298

goroutine 19 [chan receive]:
k8s.io/klog/v2.(*loggingT).flushDaemon(0x12fe49e0)
	k8s.io/klog/v2.0/klog.go:1131 +0x78
created by k8s.io/klog/v2.init.0
	k8s.io/klog/v2.0/klog.go:416 +0xe0

goroutine 92 [syscall]:
os/signal.signal_recv(0x0)
	runtime/sigqueue.go:147 +0xf8
os/signal.loop()
	os/signal/signal_unix.go:23 +0x24
created by os/signal.Notify.func1
	os/signal/signal.go:127 +0x4c

goroutine 94 [chan receive]:
k8s.io/apiserver/pkg/server.SetupSignalContext.func1(0xc00096bd40)
	k8s.io/apiserver.0-rc.2/pkg/server/signal.go:48 +0x38
created by k8s.io/apiserver/pkg/server.SetupSignalContext
	k8s.io/apiserver.0-rc.2/pkg/server/signal.go:47 +0xf0

goroutine 95 [select]:
k8s.io/apimachinery/pkg/util/wait.BackoffUntil(0x11c185a0, 0x11e8e920, 0xc000802030, 0x1, 0xc00053e060)
	k8s.io/apimachinery.0-rc.2/pkg/util/wait/wait.go:167 +0x120
k8s.io/apimachinery/pkg/util/wait.JitterUntil(0x11c185a0, 0x12a05f200, 0x0, 0x1, 0xc00053e060)
	k8s.io/apimachinery.0-rc.2/pkg/util/wait/wait.go:133 +0x90
k8s.io/apimachinery/pkg/util/wait.Until(...)
	k8s.io/apimachinery.0-rc.2/pkg/util/wait/wait.go:90
k8s.io/apimachinery/pkg/util/wait.Forever(0x11c185a0, 0x12a05f200)
	k8s.io/apimachinery.0-rc.2/pkg/util/wait/wait.go:81 +0x50
created by k8s.io/component-base/logs.InitLogs
	k8s.io/component-base.0-rc.2/logs/logs.go:58 +0x88

Ah... seems like that was the side effect; the fatal exception was because of:

16:14:22.617615 1 cmd.go:55] open /etc/kubernetes/config/cluster-policy-config.yaml: no such file or directory

On the bootstrap node:

[root@bootstrap core]# ls -l /etc/kubernetes/bootstrap-configs
total 16
-rw-r--r--. 1 root root 6635 Sep 14 05:54 kube-apiserver-config.yaml
-rw-r--r--. 1 root root 1341 Sep 14 05:54 kube-controller-manager-config.yaml
-rw-r--r--. 1 root root 81 Sep 14 05:54 kube-scheduler-config.yaml
[root@bootstrap core]# pwd
/var/home/core
[root@bootstrap core]# ls -l /etc/kubernetes/
total 16
drwxr-xr-x. 2 root root 117 Sep 14 05:54 bootstrap-configs
drwxr-xr-x. 2 root root 6 Sep 14 05:53 bootstrap-manifests
drwx------. 2 root root 4096 Sep 15 13:59 bootstrap-secrets
drwxr-xr-x. 2 root root 6 Sep 14 05:54 cloud
drwxr-xr-x. 3 root root 19 Sep 1 03:51 cni
-rw-------. 1 root root 5795 Sep 14 05:54 kubeconfig
-rw-r--r--. 1 root root 146 Sep 14 05:53 kubelet-pause-image-override
drwxr-xr-x. 3 root root 20 Sep 14 05:53 kubelet-plugins
drwxr-xr-x. 2 root root 252 Sep 15 13:59 manifests
drwxr-xr-x. 3 root root 25 Sep 14 05:54 static-pod-resources

Ignore the previous analysis... it looks like that error occurs but the installation still proceeds. What I do see on the master node's sdn pods, though, is concerning:

I0914 06:12:20.799579 1996 ovs.go:180] Error executing ovs-vsctl: ovs-vsctl: no row "vxlan0" in table Interface
I0914 06:12:20.925805 1996 proxier.go:370] userspace proxy: processing 0 service events
I0914 06:12:20.926133 1996 proxier.go:349] userspace syncProxyRules took 49.44885ms
I0914 06:12:21.081209 1996 proxier.go:370] userspace proxy: processing 0 service events
I0914 06:12:21.082842 1996 proxier.go:349] userspace syncProxyRules took 50.442527ms
I0914 06:12:21.306427 1996 ovs.go:180] Error executing ovs-vsctl: ovs-vsctl: no row "vxlan0" in table Interface
I0914 06:12:21.937474 1996 ovs.go:180] Error executing ovs-vsctl: ovs-vsctl: no row "vxlan0" in table Interface
I0914 06:12:22.724906 1996 ovs.go:180] Error executing ovs-vsctl: ovs-vsctl: no row "vxlan0" in table Interface
I0914 06:12:23.707811 1996 ovs.go:180] Error executing ovs-vsctl: ovs-vsctl: no row "vxlan0" in table Interface
I0914 06:12:24.935827 1996 ovs.go:180] Error executing ovs-vsctl: ovs-vsctl: no row "vxlan0" in table Interface
I0914 06:12:26.469006 1996 ovs.go:180] Error executing ovs-vsctl: ovs-vsctl: no row "vxlan0" in table Interface
I0914 06:12:28.382423 1996 ovs.go:180] Error executing ovs-vsctl: ovs-vsctl: no row "vxlan0" in table Interface
I0914 06:12:30.772801 1996 ovs.go:180] Error executing ovs-vsctl: ovs-vsctl: no row "vxlan0" in table Interface
I0914 06:12:33.759059 1996 ovs.go:180] Error executing ovs-vsctl: ovs-vsctl: no row "vxlan0" in table Interface
F0914 06:12:33.759135 1996 healthcheck.go:99] SDN healthcheck detected unhealthy OVS server, restarting: plugin is not setup

The ovsdb server fails to start:

Failed to connect to bus: No data available
openvswitch is running in container
/etc/openvswitch/conf.db does not exist ... (warning).
Creating empty database /etc/openvswitch/conf.db.
ovsdb-server: /var/run/openvswitch/ovsdb-server.pid: pidfile check failed (No such process), aborting
Starting ovsdb-server ... failed!

Added network dev from the POWER team, Mick, and squad lead Manju.

Noticing "No space left on device" errors on the interfaces. Below is the data from one of the master nodes. The RHCOS46 images had a 20 GB disk size.
[root@master-2 core]# ovs-vsctl list interface
_uuid : eea736c0-7b57-4e1e-aa18-dfc6f20900db
admin_state : []
bfd : {}
bfd_status : {}
cfm_fault : []
cfm_fault_status : []
cfm_flap_count : []
cfm_health : []
cfm_mpid : []
cfm_remote_mpids : []
cfm_remote_opstate : []
duplex : []
error : "could not add network device veth657700ea to ofproto (No space left on device)"
external_ids : {ip="10.128.0.9", sandbox="0e899417fbcbd8b49c511fcbe273ccd2ec07520f94298a71024b3ee4b6b79661"}
ifindex : []
ingress_policing_burst: 0
ingress_policing_rate: 0
lacp_current : []
link_resets : 0
link_speed : []
link_state : []
lldp : {}
mac : []
mac_in_use : []
mtu : []
mtu_request : []
name : veth657700ea
ofport : -1
ofport_request : []
options : {}
other_config : {}
statistics : {}
status : {}
type : ""
_uuid : 01c25001-64ca-44c4-bdaa-fcb63a7d9de1
admin_state : []
bfd : {}
bfd_status : {}
cfm_fault : []
cfm_fault_status : []
cfm_flap_count : []
cfm_health : []
cfm_mpid : []
cfm_remote_mpids : []
cfm_remote_opstate : []
duplex : []
error : "could not add network device vethf20e8e29 to ofproto (No space left on device)"
external_ids : {ip="10.128.0.15", sandbox="130cab3a6214363bdb4994202336fd267ddd9a4a55d9a4a7fac706806f46f026"}
ifindex : []
ingress_policing_burst: 0
ingress_policing_rate: 0
lacp_current : []
link_resets : 0
link_speed : []
link_state : []
lldp : {}
mac : []
mac_in_use : []
mtu : []
mtu_request : []
name : vethf20e8e29
ofport : -1
ofport_request : []
options : {}
other_config : {}
statistics : {}
status : {}
type : ""
_uuid : 51dae8f1-ccee-4752-af78-1861dd366e30
admin_state : []
bfd : {}
bfd_status : {}
cfm_fault : []
cfm_fault_status : []
cfm_flap_count : []
cfm_health : []
cfm_mpid : []
cfm_remote_mpids : []
cfm_remote_opstate : []
duplex : []
error : "could not add network device veth6cec1255 to ofproto (No space left on device)"
external_ids : {ip="10.128.0.6", sandbox="b6bc118838dce0ca5252d8efe90172a354ebeedb4c59158b5da4b03c38f5827a"}
ifindex : []
ingress_policing_burst: 0
ingress_policing_rate: 0
lacp_current : []
link_resets : 0
link_speed : []
link_state : []
lldp : {}
mac : []
mac_in_use : []
mtu : []
mtu_request : []
name : veth6cec1255
ofport : -1
ofport_request : []
options : {}
other_config : {}
statistics : {}
status : {}
type : ""
_uuid : 15a1fb82-79a2-4b10-9192-b4f09cfad6be
admin_state : up
bfd : {}
bfd_status : {}
cfm_fault : []
cfm_fault_status : []
cfm_flap_count : []
cfm_health : []
cfm_mpid : []
cfm_remote_mpids : []
cfm_remote_opstate : []
duplex : []
error : []
external_ids : {}
ifindex : 25
ingress_policing_burst: 0
ingress_policing_rate: 0
lacp_current : []
link_resets : 0
link_speed : []
link_state : up
lldp : {}
mac : []
mac_in_use : "1a:81:c6:da:2a:09"
mtu : []
mtu_request : []
name : vxlan0
ofport : 1
ofport_request : 1
options : {dst_port="4789", key=flow, remote_ip=flow}
other_config : {}
statistics : {rx_bytes=0, rx_packets=0, tx_bytes=0, tx_packets=0}
status : {}
type : vxlan
_uuid : 3752b713-ecab-4798-8167-810bdbf8bdfb
admin_state : up
bfd : {}
bfd_status : {}
cfm_fault : []
cfm_fault_status : []
cfm_flap_count : []
cfm_health : []
cfm_mpid : []
cfm_remote_mpids : []
cfm_remote_opstate : []
duplex : []
error : []
external_ids : {}
ifindex : 27
ingress_policing_burst: 0
ingress_policing_rate: 0
lacp_current : []
link_resets : 1
link_speed : []
link_state : up
lldp : {}
mac : []
mac_in_use : "02:eb:92:6e:4e:87"
mtu : 1450
mtu_request : 1450
name : tun0
ofport : 2
ofport_request : 2
options : {}
other_config : {}
statistics : {collisions=0, rx_bytes=0, rx_crc_err=0, rx_dropped=0, rx_errors=0, rx_frame_err=0, rx_over_err=0, rx_packets=0, tx_bytes=125108, tx_dropped=0, tx_errors=0, tx_packets=2968}
status : {driver_name=openvswitch}
type : internal
_uuid : cf1d8504-e77a-41c4-8bdd-b265c3a2ba67
admin_state : []
bfd : {}
bfd_status : {}
cfm_fault : []
cfm_fault_status : []
cfm_flap_count : []
cfm_health : []
cfm_mpid : []
cfm_remote_mpids : []
cfm_remote_opstate : []
duplex : []
error : "could not add network device veth05b2efc4 to ofproto (No space left on device)"
external_ids : {ip="10.128.0.18", sandbox="dce0bbe07172b35471abf89641c28d0ec9dcd4e26fd9f671eaaf03cd6ca62ee0"}
ifindex : []
ingress_policing_burst: 0
ingress_policing_rate: 0
lacp_current : []
link_resets : 0
link_speed : []
link_state : []
lldp : {}
mac : []
mac_in_use : []
mtu : []
mtu_request : []
name : veth05b2efc4
ofport : -1
ofport_request : []
options : {}
other_config : {}
statistics : {}
status : {}
type : ""
_uuid : 1d1ecee5-7e76-4b50-aa67-a637ecdf3438
admin_state : []
bfd : {}
bfd_status : {}
cfm_fault : []
cfm_fault_status : []
cfm_flap_count : []
cfm_health : []
cfm_mpid : []
cfm_remote_mpids : []
cfm_remote_opstate : []
duplex : []
error : "could not add network device veth5f6cb8b8 to ofproto (No space left on device)"
external_ids : {ip="10.128.0.11", sandbox="7bfaf063fb454194c9d02fd74d3bcbb90c6a68967ae28e6a2457b5929bd5e990"}
ifindex : []
ingress_policing_burst: 0
ingress_policing_rate: 0
lacp_current : []
link_resets : 0
link_speed : []
link_state : []
lldp : {}
mac : []
mac_in_use : []
mtu : []
mtu_request : []
name : veth5f6cb8b8
ofport : -1
ofport_request : []
options : {}
other_config : {}
statistics : {}
status : {}
type : ""
_uuid : f6bb3176-06fb-492b-9229-ffa0b901c029
admin_state : []
bfd : {}
bfd_status : {}
cfm_fault : []
cfm_fault_status : []
cfm_flap_count : []
cfm_health : []
cfm_mpid : []
cfm_remote_mpids : []
cfm_remote_opstate : []
duplex : []
error : "could not add network device veth9dd59021 to ofproto (No space left on device)"
external_ids : {ip="10.128.0.10", sandbox="04af944f504e0dbfa1079b438537c655e4c63b7389266b1ea3083a55b31684cf"}
ifindex : []
ingress_policing_burst: 0
ingress_policing_rate: 0
lacp_current : []
link_resets : 0
link_speed : []
link_state : []
lldp : {}
mac : []
mac_in_use : []
mtu : []
mtu_request : []
name : veth9dd59021
ofport : -1
ofport_request : []
options : {}
other_config : {}
statistics : {}
status : {}
type : ""
_uuid : e6c5d907-3957-4d00-961c-baf4e63e2257
admin_state : []
bfd : {}
bfd_status : {}
cfm_fault : []
cfm_fault_status : []
cfm_flap_count : []
cfm_health : []
cfm_mpid : []
cfm_remote_mpids : []
cfm_remote_opstate : []
duplex : []
error : "could not add network device veth73694404 to ofproto (No space left on device)"
external_ids : {ip="10.128.0.8", sandbox="197847dbee9666f86dde221c719358228109d39b8444aa3172195eb8ce10dfb1"}
ifindex : []
ingress_policing_burst: 0
ingress_policing_rate: 0
lacp_current : []
link_resets : 0
link_speed : []
link_state : []
lldp : {}
mac : []
mac_in_use : []
mtu : []
mtu_request : []
name : veth73694404
ofport : -1
ofport_request : []
options : {}
other_config : {}
statistics : {}
status : {}
type : ""
_uuid : 3f5e0158-b9f6-4f2f-8cac-c30c84e96906
admin_state : []
bfd : {}
bfd_status : {}
cfm_fault : []
cfm_fault_status : []
cfm_flap_count : []
cfm_health : []
cfm_mpid : []
cfm_remote_mpids : []
cfm_remote_opstate : []
duplex : []
error : "could not add network device veth64602b83 to ofproto (No space left on device)"
external_ids : {ip="10.128.0.12", sandbox="47548db0199e0a59ac7fc86397e9db8877b4318a64ef2b142e268e8153c0f2bc"}
ifindex : []
ingress_policing_burst: 0
ingress_policing_rate: 0
lacp_current : []
link_resets : 0
link_speed : []
link_state : []
lldp : {}
mac : []
mac_in_use : []
mtu : []
mtu_request : []
name : veth64602b83
ofport : -1
ofport_request : []
options : {}
other_config : {}
statistics : {}
status : {}
type : ""
_uuid : cf23fbd8-64ec-4027-8670-4a8bf5dc42bf
admin_state : []
bfd : {}
bfd_status : {}
cfm_fault : []
cfm_fault_status : []
cfm_flap_count : []
cfm_health : []
cfm_mpid : []
cfm_remote_mpids : []
cfm_remote_opstate : []
duplex : []
error : "could not add network device veth94fb7f81 to ofproto (No space left on device)"
external_ids : {ip="10.128.0.14", sandbox="ab7f309f6c0e65f32a1515685da90189a3d09521036600e1be0b811f5518e999"}
ifindex : []
ingress_policing_burst: 0
ingress_policing_rate: 0
lacp_current : []
link_resets : 0
link_speed : []
link_state : []
lldp : {}
mac : []
mac_in_use : []
mtu : []
mtu_request : []
name : veth94fb7f81
ofport : -1
ofport_request : []
options : {}
other_config : {}
statistics : {}
status : {}
type : ""
_uuid : 528ed08d-970e-4620-9654-b9dc01752df6
admin_state : []
bfd : {}
bfd_status : {}
cfm_fault : []
cfm_fault_status : []
cfm_flap_count : []
cfm_health : []
cfm_mpid : []
cfm_remote_mpids : []
cfm_remote_opstate : []
duplex : []
error : "could not add network device vethc71f3834 to ofproto (No space left on device)"
external_ids : {ip="10.128.0.13", sandbox="4b5145dcd172516b36a35b377df2f34765841f378b6ce6d58c82fdecb14b9a0a"}
ifindex : []
Since it complained about space, I retried using a bigger RHCOS 46.82 image with a 120 GB disk and reran the install with the latest build. Again, I see the same failures as discussed in this bug. Even the network interface attach fails with "No space left on device", as shown above.
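Before moving to a bigger disk, a minimal triage of the "No space left on device" errors can look like the sketch below. None of this was run as part of this report; it assumes root access on the affected node and uses only standard ovs-vsctl / ovs-dpctl / df invocations. The ENOSPC in the ofproto message typically comes from wiring the port into the OVS datapath rather than from the filesystem, which is why both are checked.

# list only the interfaces OVS failed to add to the bridge (ofport == -1), with their error strings
ovs-vsctl --columns=name,ofport,error find interface ofport=-1
# inspect the kernel datapath the ports are being added to
ovs-dpctl show
# rule out plain disk exhaustion as well
df -h / /var /etc/openvswitch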
@psundra is there any reason you re-targeted this to 4.7? We need this resolved for 4.6 and it is impacting our ability to complete OCP 4.6 testing in sprint 1.

@lmcfadde that was a mistake. It should be 4.6. After some debugging and talking to Mick Tarsel, it looks like an ovs issue. He will update with more details.

Created attachment 1715123 [details]: ovs-vswitchd logs on master-2

So with Prashanth's help we narrowed down the error to the openshift-sdn pod on master-2. Something appears to be wrong with openvswitch and openflow. We then looked at the sdn container logs via crictl, which showed a lot of:

I0916 17:17:46.160062 193200 ovs.go:180] Error executing ovs-ofctl: ovs-ofctl: /var/run/openvswitch/br0.mgmt: failed to open socket (Connection refused)

(I can attach these logs if needed.) Looks like there is a problem executing OpenFlow commands from here: https://github.com/openshift/sdn/blob/release-4.6/pkg/network/node/ovs/ovs.go#L155

So I looked at the ovs-vswitchd logs in /var/log/openvswitch on master-2, which are attached to this comment. Things start getting funky right after br0 is created (at the top of the log):

2020-09-16T14:09:44.182Z|00029|bridge|INFO|bridge br0: added interface br0 on port 65534
2020-09-16T14:09:44.183Z|00030|bridge|INFO|bridge br0: using datapath ID 0000e64086f8f744
2020-09-16T14:09:44.183Z|00031|connmgr|INFO|br0: added service controller "punix:/var/run/openvswitch/br0.mgmt"
2020-09-16T14:09:44.291Z|00032|bridge|INFO|bridge br0: added interface vxlan0 on port 1
2020-09-16T14:09:44.325Z|00033|netdev|WARN|failed to set MTU for network device tun0: No such device
2020-09-16T14:09:44.331Z|00034|bridge|INFO|bridge br0: added interface tun0 on port 2

And later on…

2020-09-16T14:10:44.543Z|00042|bridge|INFO|bridge br0: deleted interface tun0 on port 2
2020-09-16T14:10:44.548Z|00043|netdev|WARN|failed to set MTU for network device tun0: No such device
2020-09-16T14:10:44.650Z|00044|bridge|INFO|bridge br0: added interface tun0 on port 2
2020-09-16T14:10:44.655Z|00045|netdev_linux|INFO|ioctl(SIOCGIFINDEX) on vxlan_sys_4789 device failed: No such device
2020-09-16T14:10:44.661Z|00046|netdev_linux|INFO|ioctl(SIOCGIFINDEX) on vxlan_sys_4789 device failed: No such device
2020-09-16T14:10:44.764Z|00047|bridge|INFO|bridge br0: deleted interface br0 on port 65534

Right after br0 is deleted, it’s created again and then it goes into this loop of adding and deleting the tun0 interface from br0, all while vxlan_sys_4789 is still not found.
2020-09-16T14:10:44.854Z|00048|bridge|INFO|bridge br0: added interface br0 on port 65534
2020-09-16T14:10:44.855Z|00001|dpif(revalidator6)|WARN|system@ovs-system: failed to flow_get (No such file or directory) recirc_id(0),dp_hash(0),skb_priority(0),in_port(0),skb_mark(0),ct_state(0),ct_zone(0),ct_mark(0),ct_label(0),eth(src=00:00:00:00:00:00,dst=00:00:00:00:00:00), packets:0, bytes:0, used:never
2020-09-16T14:10:44.855Z|00002|ofproto_dpif_upcall(revalidator6)|WARN|Failed to acquire udpif_key corresponding to unexpected flow (No such file or directory): ufid:4cd769d5-92b6-4acd-80b3-5aa16991b6d8
2020-09-16T14:10:44.858Z|00049|netdev_linux|INFO|ioctl(SIOCGIFINDEX) on vxlan_sys_4789 device failed: No such device
2020-09-16T14:10:44.865Z|00050|bridge|INFO|bridge br0: deleted interface tun0 on port 2
2020-09-16T14:10:44.868Z|00051|netdev|WARN|failed to set MTU for network device tun0: No such device
2020-09-16T14:10:44.959Z|00052|netdev_linux|INFO|ioctl(SIOCGIFINDEX) on tun0 device failed: No such device
2020-09-16T14:10:44.960Z|00053|bridge|WARN|could not add network device tun0 to ofproto (No such device)
2020-09-16T14:10:44.962Z|00054|netdev_linux|INFO|ioctl(SIOCGIFINDEX) on vxlan_sys_4789 device failed: No such device
2020-09-16T14:10:45.078Z|00055|netdev|WARN|failed to set MTU for network device tun0: No such device
2020-09-16T14:10:45.079Z|00056|bridge|INFO|bridge br0: added interface tun0 on port 2
2020-09-16T14:10:45.086Z|00057|bridge|INFO|bridge br0: deleted interface tun0 on port 2
2020-09-16T14:10:45.089Z|00058|netdev|WARN|failed to set MTU for network device tun0: No such device
2020-09-16T14:10:45.205Z|00059|bridge|INFO|bridge br0: added interface tun0 on port 2
2020-09-16T14:10:45.208Z|00060|netdev_linux|INFO|ioctl(SIOCGIFINDEX) on vxlan_sys_4789 device failed: No such device
2020-09-16T14:10:45.294Z|00061|bridge|INFO|bridge br0: deleted interface tun0 on port 2
2020-09-16T14:10:45.300Z|00062|netdev|WARN|failed to set MTU for network device tun0: No such device
2020-09-16T14:10:45.300Z|00063|bridge|INFO|bridge br0: added interface tun0 on port 2
2020-09-16T14:10:45.397Z|00064|bridge|INFO|bridge br0: deleted interface tun0 on port 2
2020-09-16T14:10:45.399Z|00065|bridge|INFO|bridge br0: added interface tun0 on port 2
2020-09-16T14:10:47.905Z|00066|timeval|WARN|Unreasonably long 1048ms poll interval (5ms user, 7ms system)
2020-09-16T14:10:47.905Z|00067|timeval|WARN|context switches: 14 voluntary, 7 involuntary

So looking at the master-2 interfaces, things get more weird. There are no interfaces on br0 at all.
The veth interfaces should be there, which are present on the machine:

[root@master-2 core]# ovs-vsctl show
def875a0-54d7-4075-9468-84ab195df9ad
Bridge br0
fail_mode: secure
Port br0
Interface br0
type: internal
ovs_version: "2.11.4"
[root@master-2 core]# ip link show | grep veth
29: veth6c5907ed@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP mode DEFAULT group default
30: vethfd738fd5@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP mode DEFAULT group default
32: vethd2ab6e37@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP mode DEFAULT group default
33: veth0cac84d7@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP mode DEFAULT group default
34: veth9474b203@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP mode DEFAULT group default
35: vethad383ec8@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP mode DEFAULT group default
36: vethf2b7c335@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP mode DEFAULT group default
37: vetheb944dc6@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP mode DEFAULT group default
38: veth95cbfe1f@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP mode DEFAULT group default
39: veth026eaffb@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP mode DEFAULT group default
40: veth6affbeee@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP mode DEFAULT group default
41: vetha6b49861@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP mode DEFAULT group default
42: vethc168c2be@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP mode DEFAULT group default
43: veth68364631@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP mode DEFAULT group default

And here are the problem interfaces which ovs said "No such device" about:

26: vxlan_sys_4789: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 65000 qdisc noqueue master ovs-system state UNKNOWN group default qlen 1000
link/ether 1e:d3:73:f0:0a:47 brd ff:ff:ff:ff:ff:ff
inet6 fe80::1cd3:73ff:fef0:a47/64 scope link
valid_lft forever preferred_lft forever
28: tun0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UNKNOWN group default qlen 1000
link/ether b2:d6:15:c7:f1:bd brd ff:ff:ff:ff:ff:ff
inet 10.128.0.1/23 brd 10.128.1.255 scope global tun0
valid_lft forever preferred_lft forever
inet6 fe80::b0d6:15ff:fec7:f1bd/64 scope link
valid_lft forever preferred_lft forever

Also note that tun0 is in UNKNOWN state.

Dave Wilder on the LTC Networking team at IBM logged in and was not able to add a veth interface to br0. After he restarted the openvswitch service via systemctl, the interfaces were present on br0.

[root@master-2 openvswitch]# ovs-vsctl show
f64925b1-a493-4dee-976a-80ba5cf705a4
Bridge br0
fail_mode: secure
Port vxlan0
Interface vxlan0
type: vxlan
options: {dst_port="4789", key=flow, remote_ip=flow}
Port br0
Interface br0
type: internal
Port tun0
Interface tun0
type: internal
ovs_version: "2.13.2"

Looks like a problem with openvswitch.

> Dave Wilder on the LTC Networking team at IBM logged in and was not able to
> add a veth interface to br0. After he restarted the openvswitch service via
> systemctl, the interfaces were present on br0.
>
> [root@master-2 openvswitch]# ovs-vsctl show
> f64925b1-a493-4dee-976a-80ba5cf705a4
> Bridge br0
> fail_mode: secure
> Port vxlan0
> Interface vxlan0
> type: vxlan
> options: {dst_port="4789", key=flow, remote_ip=flow}
> Port br0
> Interface br0
> type: internal
> Port tun0
> Interface tun0
> type: internal
> ovs_version: "2.13.2"
>
>
> Looks like a problem with openvswitch.
Some clarification on the test I did:
1)Created a veth pair:
# ip link add vethA type veth peer name vethB
2)Attempted to add vethA to br0
# ovs-vsctl add-port br0 vethA
This hung (I had to ctrl-c out of it).
# ovs-vsctl show did not list vethA.
Restarted ovs:
# systemctl restart openvswitch
[root@master-2 openvswitch]# ovs-vsctl show
f64925b1-a493-4dee-976a-80ba5cf705a4
Bridge br0
fail_mode: secure
Port vxlan0
Interface vxlan0
type: vxlan
options: {dst_port="4789", key=flow, remote_ip=flow}
Port br0
Interface br0
type: internal
Port tun0
Interface tun0
type: internal
ovs_version: "2.13.2"
It appears ovs was not responding until after the restart, but it had still added vethA to the db.
I rebooted the node and observed that new interfaces were added to br0 (a quick way to narrow down this kind of hang is sketched after the listing below).
[root@master-2 core]# ovs-vsctl show
def875a0-54d7-4075-9468-84ab195df9ad
Bridge br0
fail_mode: secure
Port veth5f017582
Interface veth5f017582
Port vxlan0
Interface vxlan0
type: vxlan
options: {dst_port="4789", key=flow, remote_ip=flow}
Port br0
Interface br0
type: internal
Port veth08774018
Interface veth08774018
Port vetheb89cc8b
Interface vetheb89cc8b
Port vethfded5e2e
Interface vethfded5e2e
Port veth888a39e8
Interface veth888a39e8
Port tun0
Interface tun0
type: internal
Port veth8878b0bc
Interface veth8878b0bc
Port veth93cb26e2
Interface veth93cb26e2
Port veth31496ad7
Interface veth31496ad7
Port vethfc335352
Interface vethfc335352
Port veth8d94567f
Interface veth8d94567f
Port vethde565abf
Interface vethde565abf
Port vethf3069cbd
Interface vethf3069cbd
Port vethf36bb791
Interface vethf36bb791
Port veth1d8a33aa
Interface veth1d8a33aa
Port vethb1a5133f
Interface vethb1a5133f
ovs_version: "2.11.4"
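Dave's observation above (ovs-vsctl add-port hung, yet the port showed up in the database once OVS was restarted) is consistent with ovs-vsctl writing the row to ovsdb-server and then blocking while it waits for ovs-vswitchd to apply the change. A rough way to tell which daemon is stuck is sketched below; these commands were not run as part of this report, and the systemd unit names are assumptions based on the usual RHEL openvswitch packaging.

# talks to ovsdb-server only; --no-wait skips waiting for ovs-vswitchd to apply the change
ovs-vsctl --no-wait list-ports br0
# pings ovs-vswitchd over its control socket; a hang or "connection refused" here points at vswitchd
ovs-appctl -t ovs-vswitchd version
# check both daemons (unit names assumed)
systemctl status ovsdb-server ovs-vswitchd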
This looks very similar to https://bugzilla.redhat.com/show_bug.cgi?id=1874696. Adding ovs-devel to CC as well.

I see the following results installing OCP with the docker.io/prashanths684/openshift-release:4.6-ppc64le-ovs image:
# oc get nodes
NAME STATUS ROLES AGE VERSION
master-0 NotReady master 4h3m v1.19.0+35ab7c5
master-1 NotReady master 4h4m v1.19.0+35ab7c5
master-2 NotReady master 4h3m v1.19.0+35ab7c5
# oc get co
NAME VERSION AVAILABLE PROGRESSING DEGRADED SINCE
authentication
cloud-credential True False False 4h17m
cluster-autoscaler
config-operator
console
csi-snapshot-controller
dns
etcd
image-registry
ingress
insights
kube-apiserver
kube-controller-manager
kube-scheduler
kube-storage-version-migrator
machine-api
machine-approver
machine-config
marketplace
monitoring
network False True True 4h4m
node-tuning
openshift-apiserver
openshift-controller-manager
openshift-samples
operator-lifecycle-manager
operator-lifecycle-manager-catalog
operator-lifecycle-manager-packageserver
service-ca
storage
The only pods in Running/ContainerCreating state are listed below; the rest of the pods are in Pending state.
# oc get pods --all-namespaces | grep "Running\|ContainerCrea"
openshift-multus multus-c9f7c 1/1 Running 14 3h56m
openshift-multus multus-mwv68 1/1 Running 12 3h57m
openshift-multus multus-zksjc 1/1 Running 4 3h56m
openshift-multus network-metrics-daemon-bb4m2 0/2 ContainerCreating 0 3h56m
openshift-multus network-metrics-daemon-cdt5b 0/2 ContainerCreating 0 3h57m
openshift-multus network-metrics-daemon-rc77h 0/2 ContainerCreating 0 3h56m
openshift-network-operator network-operator-5bfcfc7cb6-fhbm9 1/1 Running 0 154m
openshift-sdn sdn-controller-c5hmd 1/1 Running 0 3h56m
openshift-sdn sdn-controller-d6n9w 1/1 Running 0 3h56m
openshift-sdn sdn-controller-gbqxf 1/1 Running 0 3h56m
# oc describe pod network-metrics-daemon-bb4m2 -n openshift-multus
Name: network-metrics-daemon-bb4m2
Namespace: openshift-multus
Priority: 0
Node: master-0/9.114.98.137
Start Time: Thu, 17 Sep 2020 03:38:58 -0400
Labels: app=network-metrics-daemon
component=network
controller-revision-hash=77dd98c48b
openshift.io/component=network
pod-template-generation=1
type=infra
Annotations: <none>
Status: Pending
IP:
IPs: <none>
Controlled By: DaemonSet/network-metrics-daemon
Containers:
network-metrics-daemon:
Container ID:
Image: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:83e81b9a1307b2c4137acb9b0fc940e42c245d5c823cbf4860f1ce41deed050d
Image ID:
Port: <none>
Host Port: <none>
Command:
/usr/bin/network-metrics
Args:
--node-name
$(NODE_NAME)
State: Waiting
Reason: ContainerCreating
Ready: False
Restart Count: 0
Requests:
cpu: 10m
memory: 100Mi
Environment:
NODE_NAME: (v1:spec.nodeName)
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from metrics-daemon-sa-token-t95pz (ro)
kube-rbac-proxy:
Container ID:
Image: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:8fdb7e386b43c9f56189e90912b33b1cba9c24a09bde6a3067ff92ef529f71dd
Image ID:
Port: 8443/TCP
Host Port: 0/TCP
Args:
--logtostderr
--secure-listen-address=:8443
--tls-cipher-suites=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256,TLS_RSA_WITH_AES_128_CBC_SHA256,TLS_ECDHE_ECDSA_WITH_AES_128_CBC_SHA256,TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA256
--upstream=http://127.0.0.1:9091/
--tls-private-key-file=/etc/metrics/tls.key
--tls-cert-file=/etc/metrics/tls.crt
State: Waiting
Reason: ContainerCreating
Ready: False
Restart Count: 0
Requests:
cpu: 10m
memory: 20Mi
Environment: <none>
Mounts:
/etc/metrics from metrics-certs (ro)
/var/run/secrets/kubernetes.io/serviceaccount from metrics-daemon-sa-token-t95pz (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
metrics-certs:
Type: Secret (a volume populated by a Secret)
SecretName: metrics-daemon-secret
Optional: false
metrics-daemon-sa-token-t95pz:
Type: Secret (a volume populated by a Secret)
SecretName: metrics-daemon-sa-token-t95pz
Optional: false
QoS Class: Burstable
Node-Selectors: kubernetes.io/os=linux
Tolerations: op=Exists
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning NetworkNotReady 159m (x2552 over 4h4m) kubelet network is not ready: runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: No CNI configuration file in /etc/kubernetes/cni/net.d/. Has your network provider started?
Warning FailedMount 96m (x35 over 151m) kubelet MountVolume.SetUp failed for volume "metrics-certs" : secret "metrics-daemon-secret" not found
Warning NetworkNotReady 106s (x4502 over 151m) kubelet network is not ready: runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: No CNI configuration file in /etc/kubernetes/cni/net.d/. Has your network provider started?
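The FailedMount event above and the bug summary both point at secrets in the openshift-multus namespace that have not been created yet. A quick way to confirm whether they exist and what the network operator (the only clusteroperator reporting Degraded above) is doing is sketched here; names are taken from the output above and the bug title, and these commands were not run as part of this report.

# do the secrets referenced by the mount failure and the bug summary exist yet?
oc -n openshift-multus get secret metrics-daemon-secret multus-admission-controller-secret
# check the network clusteroperator conditions and the operator logs
oc describe co network
oc -n openshift-network-operator logs deployment/network-operator | grep -i secret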
# ssh core@master-1 'ip a'
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: env32: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UNKNOWN group default qlen 1000
link/ether fa:16:3e:f3:6e:61 brd ff:ff:ff:ff:ff:ff
inet 9.114.98.247/22 brd 9.114.99.255 scope global dynamic noprefixroute env32
valid_lft 12803sec preferred_lft 12803sec
inet6 fe80::f816:3eff:fef3:6e61/64 scope link noprefixroute
valid_lft forever preferred_lft forever
Restarted the nodes to see if the results changed, but that did not help.

After doing some investigation with Mick and talking to the sdn team, they suggested it might be a dup of https://bugzilla.redhat.com/show_bug.cgi?id=1874696.

Today I built a patched ppc64le image with these two PRs:
https://github.com/openshift/machine-config-operator/pull/2102
https://github.com/openshift/cluster-network-operator/pull/785

Archana was able to test the image and the install succeeded on PowerVM. Marking it as a dup of 1874696.

*** This bug has been marked as a duplicate of bug 1874696 ***
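When retesting with a scratch payload like the patched image mentioned above, it can help to confirm exactly which release image and commits a cluster is actually running. A general sketch, not part of this report:

# release image/version the cluster was installed from and is currently reconciling to
oc get clusterversion -o wide
# per-repository commits baked into a given release payload
oc adm release info --commits docker.io/prashanths684/openshift-release:4.6-ppc64le-ovs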