Bug 1518684 - "ovs-vsctl show" on OCP nodes returns multiple "No such device" messages
Status: CLOSED ERRATA
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 3.6.0
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 3.9.0
Assigned To: Dan Williams
QA Contact: Meng Bo
Keywords: NeedsTestCase
Depends On:
Blocks: 1542093
 
Reported: 2017-11-29 08:06 EST by Thom Carlin
Modified: 2018-10-10 05:29 EDT
CC: 7 users

See Also:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2018-03-28 10:13:03 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments
Log from ovs-vsctl and ovs-ofctl commands (20.16 KB, text/plain), 2017-11-30 11:18 EST, Weibin Liang
node log and OPTIONS=--loglevel=5 (63.92 KB, text/plain), 2017-11-30 13:47 EST, Weibin Liang


External Trackers
Tracker ID Priority Status Summary Last Updated
Origin (Github) 18166 None None None 2018-01-19 09:23 EST
Red Hat Product Errata RHBA-2018:0489 None None None 2018-03-28 10:13 EDT

Description Thom Carlin 2017-11-29 08:06:27 EST
Description of problem:

On a fully patched OCP 3.6/CNS 3.6 cluster, "ovs-vsctl show" on the nodes returns "No such device" messages.

Version-Release number of selected component (if applicable):

3.6

How reproducible:

100% on this cluster

Steps to Reproduce:
1. On each node: ovs-vsctl show
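
To check every node in one pass, something like the following works (a sketch; it assumes passwordless SSH to the nodes and that the names from "oc get nodes" are reachable hostnames):

    for node in $(oc get nodes -o name | cut -d/ -f2); do
        echo "== $node =="
        # grep exits non-zero when no error lines are present
        ssh "$node" ovs-vsctl show | grep 'No such device' || echo "no stale devices"
    done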

Actual results:

[...]
      Port "vethbcdb039b"
            Interface "vethbcdb039b"
                error: "could not open network device vethbcdb039b (No such device)"
[...]
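
Only the stale entries can be pulled out directly by querying the error column of the OVSDB Interface table (a sketch; --columns and --format are standard ovs-vsctl options):

    ovs-vsctl --format=csv --columns=name,error list Interface | grep 'No such device'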



Expected results:

Listing of the Open vSwitch database without errors

Additional info:

sosreports will be added in private attachments
Comment 1 Thom Carlin 2017-11-29 09:09:03 EST
sosreports are too large for attachments
Comment 2 Weibin Liang 2017-11-30 10:41:24 EST
Saw the same error in v3.7.9:

[root@host-172-16-120-67 ~]# ovs-vsctl show
8e6c5352-1338-4e22-ad1a-5e3a905b4159
    Bridge "br0"
        fail_mode: secure
        Port "veth6cf0fa55"
            Interface "veth6cf0fa55"
        Port "veth0bf8145d"
            Interface "veth0bf8145d"
        Port "vethe68eec9b"
            Interface "vethe68eec9b"
                error: "could not open network device vethe68eec9b (No such device)"
        Port "veth5dabac94"
            Interface "veth5dabac94"
        Port "br0"
            Interface "br0"
                type: internal
        Port "vxlan0"
            Interface "vxlan0"
                type: vxlan
                options: {key=flow, remote_ip=flow}
        Port "vethd6279c2b"
            Interface "vethd6279c2b"
        Port "tun0"
            Interface "tun0"
                type: internal
        Port "veth98c02cf9"
            Interface "veth98c02cf9"
    ovs_version: "2.7.3"
[root@host-172-16-120-67 ~]# oc version
oc v3.7.9
kubernetes v1.7.6+a08f5eeb62
features: Basic-Auth GSSAPI Kerberos SPNEGO
[root@host-172-16-120-67 ~]#
Comment 3 Dan Winship 2017-11-30 11:00:13 EST
Weibin: can you attach the result of "ovs-ofctl -O OpenFlow13 show br0" and "ovs-ofctl -O OpenFlow13 dump-flows br0" as well?
Comment 4 Weibin Liang 2017-11-30 11:18 EST
Created attachment 1360987 [details]
Log from ovs-vsctl and ovs-ofctl commands
Comment 5 Dan Winship 2017-11-30 11:50:22 EST
OK, so "ovs-ofctl show" shows veths attached to ports 4, 8, 10, 12, and 13, but "ovs-ofctl dump" shows flows for ports 4, 7, 8, 10, 12, and 13. Meaning, we still have a flow for port 7 despite not having a veth attached to it, presumably corresponding to the missing veth in the "ovs-vsctl" output.

So, this is some sort of pod cleanup error. Possibly related to bug 1518912.

Weibin: can you put the atomic-openshift-node logs for this node somewhere? As far back as they go on this node. (And let me know what loglevel they're at.)
Comment 6 Thom Carlin 2017-11-30 13:29:53 EST
Although there is no evidence either way that this error causes any other issues, 
a workaround supplied by Dan removes these messages:

1) oadm drain <<node_name>>
2) Reboot node
3) oadm uncordon <<node_name>>

Note that you must have sufficient capacity in your cluster to absorb the containers evacuated from the node.
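
Scripted, the workaround amounts to something like this (a sketch; the node name is a placeholder and the reboot step assumes SSH access):

    NODE=node1.example.com       # hypothetical node name
    oadm drain "$NODE"           # evacuate pods; needs spare capacity elsewhere
    ssh "$NODE" sudo reboot      # rebooting clears the stale OVS entries
    # wait for the node to come back Ready, then:
    oadm uncordon "$NODE"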
Comment 7 Weibin Liang 2017-11-30 13:47 EST
Created attachment 1361101 [details]
node log and OPTIONS=--loglevel=5
Comment 12 Weibin Liang 2018-02-09 09:52:27 EST
Tested and verified on v3.9.0-0.41.0

[root@host-172-16-120-139 Sanity-Test]# ovs-vsctl show
451601d1-2b65-4e88-8be4-189491cdd333
    Bridge "br0"
        fail_mode: secure
        Port "vethf90cbbbf"
            Interface "vethf90cbbbf"
        Port "veth06984ca2"
            Interface "veth06984ca2"
        Port "vxlan0"
            Interface "vxlan0"
                type: vxlan
                options: {key=flow, remote_ip=flow}
        Port "tun0"
            Interface "tun0"
                type: internal
        Port "vethf35a42c9"
            Interface "vethf35a42c9"
        Port "vethe1ee7155"
            Interface "vethe1ee7155"
        Port "br0"
            Interface "br0"
                type: internal
        Port "veth65346a6c"
            Interface "veth65346a6c"
        Port "veth65a33588"
            Interface "veth65a33588"
        Port "veth573462cb"
            Interface "veth573462cb"
    ovs_version: "2.7.3"
[root@host-172-16-120-139 Sanity-Test]# 
[root@host-172-16-120-139 Sanity-Test]# 
[root@host-172-16-120-139 Sanity-Test]# oc version
oc v3.9.0-0.41.0
kubernetes v1.9.1+a0ce1bc657
features: Basic-Auth GSSAPI Kerberos SPNEGO

Server https://172.16.120.139:8443
openshift v3.9.0-0.41.0
kubernetes v1.9.1+a0ce1bc657
Comment 15 errata-xmlrpc 2018-03-28 10:13:03 EDT
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:0489
Comment 16 xingweiyang 2018-05-29 21:08:02 EDT
Still found in OCP 3.9.14.
