Red Hat Bugzilla – Bug 1518684
"ovs-vsctl show" on OCP nodes returns multiple "No such device" messages
Last modified: 2018-10-10 05:29:30 EDT
Description of problem:
On a fully patched OCP 3.6/CNS 3.6 cluster, "ovs-vsctl show" on the nodes returns "No such device" messages.

Version-Release number of selected component (if applicable):
3.6

How reproducible:
100% on this cluster

Steps to Reproduce:
1. On each node: ovs-vsctl show

Actual results:
[...]
        Port "vethbcdb039b"
            Interface "vethbcdb039b"
                error: "could not open network device vethbcdb039b (No such device)"
[...]

Expected results:
Listing of the Open vSwitch database without errors

Additional info:
sosreports will be added in private attachments
sosreports are too large for attachments
Saw the same error in v3.7.9:

[root@host-172-16-120-67 ~]# ovs-vsctl show
8e6c5352-1338-4e22-ad1a-5e3a905b4159
    Bridge "br0"
        fail_mode: secure
        Port "veth6cf0fa55"
            Interface "veth6cf0fa55"
        Port "veth0bf8145d"
            Interface "veth0bf8145d"
        Port "vethe68eec9b"
            Interface "vethe68eec9b"
                error: "could not open network device vethe68eec9b (No such device)"
        Port "veth5dabac94"
            Interface "veth5dabac94"
        Port "br0"
            Interface "br0"
                type: internal
        Port "vxlan0"
            Interface "vxlan0"
                type: vxlan
                options: {key=flow, remote_ip=flow}
        Port "vethd6279c2b"
            Interface "vethd6279c2b"
        Port "tun0"
            Interface "tun0"
                type: internal
        Port "veth98c02cf9"
            Interface "veth98c02cf9"
    ovs_version: "2.7.3"
[root@host-172-16-120-67 ~]# oc version
oc v3.7.9
kubernetes v1.7.6+a08f5eeb62
features: Basic-Auth GSSAPI Kerberos SPNEGO
[root@host-172-16-120-67 ~]#
Weibin: can you attach the result of "ovs-ofctl -O OpenFlow13 show br0" and "ovs-ofctl -O OpenFlow13 dump-flows br0" as well?
Created attachment 1360987 [details] Log from ovs-vsctl and ovs-ofctl commands
OK, so "ovs-ofctl show" shows veths attached to ports 4, 8, 10, 12, and 13, but "ovs-ofctl dump-flows" shows flows for ports 4, 7, 8, 10, 12, and 13. That is, we still have a flow for port 7 despite not having a veth attached to it, presumably corresponding to the missing veth in the "ovs-vsctl" output. So this is some sort of pod cleanup error. Possibly related to bug 1518912.

Weibin: can you put the atomic-openshift-node logs for this node somewhere? As far back as they go on this node. (And let me know what loglevel they're at.)
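The comparison above can be sketched in a few lines of Python: collect the port numbers that have an attached device from "ovs-ofctl show" output, collect the ports referenced by flows in "ovs-ofctl dump-flows" output, and take the difference. This is a hypothetical helper, not part of OpenShift or OVS; the sample strings below are abbreviated stand-ins modeled on the port numbers reported in this bug (MAC addresses and flow fields are made up for illustration).

```python
import re

# Abbreviated sample in the style of "ovs-ofctl -O OpenFlow13 show br0":
# each attached port appears as " N(vethXXXX): addr:..."
show_output = """
 4(veth6cf0fa55): addr:aa:bb:cc:dd:ee:01
 8(veth0bf8145d): addr:aa:bb:cc:dd:ee:02
 10(veth5dabac94): addr:aa:bb:cc:dd:ee:03
 12(vethd6279c2b): addr:aa:bb:cc:dd:ee:04
 13(veth98c02cf9): addr:aa:bb:cc:dd:ee:05
"""

# Abbreviated sample in the style of "ovs-ofctl -O OpenFlow13 dump-flows br0":
# flows reference ports via in_port=N or output:N
flows_output = """
 table=20, priority=100,arp,in_port=4 actions=goto_table:21
 table=20, priority=100,arp,in_port=7 actions=goto_table:21
 table=20, priority=100,arp,in_port=8 actions=goto_table:21
 table=40, priority=100 actions=output:10
 table=40, priority=100 actions=output:12
 table=40, priority=100 actions=output:13
"""

def attached_ports(show_text):
    """Ports that have a device attached, per 'ovs-ofctl show'."""
    return {int(m) for m in re.findall(r"^\s*(\d+)\(", show_text, re.M)}

def flow_ports(flows_text):
    """Ports referenced by flows, per 'ovs-ofctl dump-flows'."""
    return {int(m) for m in re.findall(r"(?:in_port=|output:)(\d+)", flows_text)}

# Ports with leftover flows but no attached veth
stale = flow_ports(flows_output) - attached_ports(show_output)
print(sorted(stale))  # [7]
```

With the sample data this prints [7], matching the leftover flow for port 7 noted above.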
Although there is no evidence either way that this error causes any other issues, a workaround supplied by Dan removes these messages:

1) oadm drain <node_name>
2) Reboot the node
3) oadm uncordon <node_name>

Note that you must have sufficient capacity in your cluster to absorb the containers evacuated from the node.
Created attachment 1361101 [details] node log and OPTIONS=--loglevel=5
Tested and verified on v3.9.0-0.41.0:

[root@host-172-16-120-139 Sanity-Test]# ovs-vsctl show
451601d1-2b65-4e88-8be4-189491cdd333
    Bridge "br0"
        fail_mode: secure
        Port "vethf90cbbbf"
            Interface "vethf90cbbbf"
        Port "veth06984ca2"
            Interface "veth06984ca2"
        Port "vxlan0"
            Interface "vxlan0"
                type: vxlan
                options: {key=flow, remote_ip=flow}
        Port "tun0"
            Interface "tun0"
                type: internal
        Port "vethf35a42c9"
            Interface "vethf35a42c9"
        Port "vethe1ee7155"
            Interface "vethe1ee7155"
        Port "br0"
            Interface "br0"
                type: internal
        Port "veth65346a6c"
            Interface "veth65346a6c"
        Port "veth65a33588"
            Interface "veth65a33588"
        Port "veth573462cb"
            Interface "veth573462cb"
    ovs_version: "2.7.3"
[root@host-172-16-120-139 Sanity-Test]# oc version
oc v3.9.0-0.41.0
kubernetes v1.9.1+a0ce1bc657
features: Basic-Auth GSSAPI Kerberos SPNEGO

Server https://172.16.120.139:8443
openshift v3.9.0-0.41.0
kubernetes v1.9.1+a0ce1bc657
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2018:0489
Still found in OCP 3.9.14.