Bug 1737982

Summary: [OSP 13-15-16] dump_create: failed to get ifindex for tap* : Operation not supported
Product: Red Hat OpenStack Reporter: Artem Hrechanychenko <ahrechan>
Component: openvswitchAssignee: Assaf Muller <amuller>
Status: CLOSED EOL QA Contact: Eran Kuris <ekuris>
Severity: urgent Docs Contact:
Priority: urgent    
Version: 15.0 (Stein)CC: aconole, agurenko, akaris, amasolov, apevec, asimonel, atragler, bcafarel, bfournie, bjacot, bshephar, cfontain, chrisbro, chris.smart, chrisw, coldford, cschimid, david_paterson, dhill, dsedgmen, dvd, ebarrera, fhallal, fiezzi, igortiunov, irichart, j.beisiegel, jhoylaer, jlibosva, juanluis.alarcon, kfida, mgarciac, michele, mvalsecc, nsatsia, rbarrott, rhos-maint, rszmigie, scohen, stchen, tredaelli, tvainio, yocha
Target Milestone: ---Keywords: ZStream
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: openvswitch2.11-2.11.0-49.el8fdp Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
: 1812613 1823804 (view as bug list) Environment:
Last Closed: 2020-09-30 19:44:49 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1823804    

Description Artem Hrechanychenko 2019-08-06 12:18:27 UTC
Description of problem:
during controller replacement 

from /var/log/messages

Aug  6 08:13:50 undercloud-0 ovs-vswitchd[15336]: ovs|154705|netdev_tc_offloads(revalidator57)|ERR|Dropped 1 log messages in last 1 seconds (most recently, 1 seconds ago) due to excessive rate
Aug  6 08:13:50 undercloud-0 ovs-vswitchd[15336]: ovs|154706|netdev_tc_offloads(revalidator57)|ERR|dump_create: failed to get ifindex for tap22cabec5-39: Operation not supported
Aug  6 08:13:51 undercloud-0 ovs-vswitchd[15336]: ovs|154707|netdev_tc_offloads(revalidator57)|ERR|Dropped 1 log messages in last 1 seconds (most recently, 1 seconds ago) due to excessive rate
Aug  6 08:13:51 undercloud-0 ovs-vswitchd[15336]: ovs|154708|netdev_tc_offloads(revalidator57)|ERR|dump_create: failed to get ifindex for tap22cabec5-39: Operation not supported
Aug  6 08:13:52 undercloud-0 ovs-vswitchd[15336]: ovs|154709|netdev_tc_offloads(revalidator57)|ERR|Dropped 1 log messages in last 1 seconds (most recently, 1 seconds ago) due to excessive rate
Aug  6 08:13:52 undercloud-0 ovs-vswitchd[15336]: ovs|154710|netdev_tc_offloads(revalidator57)|ERR|dump_create: failed to get ifindex for tap22cabec5-39: Operation not supported
Aug  6 08:13:53 undercloud-0 ovs-vswitchd[15336]: ovs|154711|netdev_tc_offloads(revalidator57)|ERR|Dropped 1 log messages in last 1 seconds (most recently, 1 seconds ago) due to excessive rate
Aug  6 08:13:53 undercloud-0 ovs-vswitchd[15336]: ovs|154712|netdev_tc_offloads(revalidator57)|ERR|dump_create: failed to get ifindex for tap22cabec5-39: Operation not supported
Aug  6 08:13:54 undercloud-0 ovs-vswitchd[15336]: ovs|154713|netdev_tc_offloads(revalidator57)|ERR|Dropped 1 log messages in last 1 seconds (most recently, 1 seconds ago) due to excessive rate
Aug  6 08:13:54 undercloud-0 ovs-vswitchd[15336]: ovs|154714|netdev_tc_offloads(revalidator57)|ERR|dump_create: failed to get ifindex for tap22cabec5-39: Operation not supported


[stack@undercloud-0 ~]$ sudo ovs-vsctl show
65e5a710-98d0-4f97-805d-48d4612e3d78
    Manager "ptcp:6640:127.0.0.1"
        is_connected: true
    Bridge br-int
        Controller "tcp:127.0.0.1:6633"
            is_connected: true
        fail_mode: secure
        Port br-int
            Interface br-int
                type: internal
        Port int-br-ctlplane
            Interface int-br-ctlplane
                type: patch
                options: {peer=phy-br-ctlplane}
        Port "tap22cabec5-39"
            tag: 1
            Interface "tap22cabec5-39"
                type: internal
    Bridge br-ctlplane
        Controller "tcp:127.0.0.1:6633"
            is_connected: true
        fail_mode: secure
        Port br-ctlplane
            Interface br-ctlplane
                type: internal
        Port "eth0"
            Interface "eth0"
        Port phy-br-ctlplane
            Interface phy-br-ctlplane
                type: patch
                options: {peer=int-br-ctlplane}
    ovs_version: "2.11.0"

Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:

Comment 1 Artem Hrechanychenko 2019-08-06 12:21:21 UTC
[stack@undercloud-0 ~]$ sudo cat /var/log/messages | grep "ERR|Dropped 1 log messages in last 1 seconds" |wc -l
47995

[stack@undercloud-0 ~]$ sudo tail -n 10 /var/log/openvswitch/ovs-vswitchd.log
2019-08-06T12:20:30.186Z|155504|netdev_tc_offloads(revalidator57)|ERR|Dropped 1 log messages in last 1 seconds (most recently, 1 seconds ago) due to excessive rate
2019-08-06T12:20:30.186Z|155505|netdev_tc_offloads(revalidator57)|ERR|dump_create: failed to get ifindex for tap22cabec5-39: Operation not supported
2019-08-06T12:20:31.186Z|155506|netdev_tc_offloads(revalidator57)|ERR|Dropped 1 log messages in last 1 seconds (most recently, 1 seconds ago) due to excessive rate
2019-08-06T12:20:31.186Z|155507|netdev_tc_offloads(revalidator57)|ERR|dump_create: failed to get ifindex for tap22cabec5-39: Operation not supported
2019-08-06T12:20:32.186Z|155508|netdev_tc_offloads(revalidator57)|ERR|Dropped 1 log messages in last 1 seconds (most recently, 1 seconds ago) due to excessive rate
2019-08-06T12:20:32.186Z|155509|netdev_tc_offloads(revalidator57)|ERR|dump_create: failed to get ifindex for tap22cabec5-39: Operation not supported
2019-08-06T12:20:33.186Z|155510|netdev_tc_offloads(revalidator57)|ERR|Dropped 1 log messages in last 1 seconds (most recently, 1 seconds ago) due to excessive rate
2019-08-06T12:20:33.186Z|155511|netdev_tc_offloads(revalidator57)|ERR|dump_create: failed to get ifindex for tap22cabec5-39: Operation not supported
2019-08-06T12:20:34.188Z|155512|netdev_tc_offloads(revalidator57)|ERR|Dropped 1 log messages in last 1 seconds (most recently, 1 seconds ago) due to excessive rate
2019-08-06T12:20:34.188Z|155513|netdev_tc_offloads(revalidator57)|ERR|dump_create: failed to get ifindex for tap22cabec5-39: Operation not supported


[stack@undercloud-0 ~]$ sudo rpm -qa "*switch*"
openvswitch2.11-2.11.0-14.el8fdp.x86_64
network-scripts-openvswitch2.11-2.11.0-14.el8fdp.x86_64
openvswitch-selinux-extra-policy-1.0-18.el8fdp.noarch
puppet-vswitch-10.4.1-0.20190614170518.ee3e6e1.el8ost.noarch
rhosp-openvswitch-2.11-0.3.el8ost.noarch

Comment 7 Jakub Libosvar 2019-08-27 11:58:35 UTC
It looks related to internal ports being used in network namespace. I reproduced this on RHEL8 outside of openstack:

ovs-vsctl add-br mybr
ovs-vsctl add-port mybr myport -- set Interface myport type=internal
ip net add myns
ip l s dev myport netns myns

grep myport /var/log/openvswitch/ovs-vswitchd.log

It doesn't happen for internal ports in the root namespace. I'm resetting assignee for this bug. The impact is an error message in ovs-vswitchd.log every second.

Comment 35 David Paterson 2020-04-08 16:24:36 UTC
Is there a temporary workaround to disable this message in the log until it's formally fixed?  Generates lots of noise.

Comment 36 Aaron Conole 2020-04-08 16:26:50 UTC
Sorry, I responded with a private comment.  I suggest to try to disable the netdev_tc_offload logs using 'ovs-appctl vlog/set netdev_offload_tc:OFF'

It will need to be done for each boot, though.

Comment 37 ITD27M01 2020-04-13 17:04:49 UTC
root.billing.ru ~ 
> ovs-appctl vlog/set netdev_offload_tc:OFF
no destination, level, or module "netdev_offload_tc"
ovs-appctl: ovs-vswitchd: server returned an error
root.billing.ru ~ 
> ovs-appctl vlog/set netdev_tc_offloads:OFF
root.billing.ru ~ 
>

Comment 39 Rodolfo Alonso 2020-05-11 09:33:00 UTC
*** Bug 1833754 has been marked as a duplicate of this bug. ***

Comment 49 Anita Tragler 2020-06-04 18:10:27 UTC
This bug is related to logging and looking up ifindex information. It could generate a lot of logging errors and consume some disk space, log rotate will ensure logs don't get too large. It should not impact data traffic.
The original issue is due to backporting a OVS TC/flwoer hardware offload feature with bad structure and this has been reverted, feature moved to later OVS release.
The fix is planned for OSP 13z12 (June) and OSP 16.1 GA (July)

There is a workaround available see comment #3, to turn off TC offload logging, this is not required  when OVS offload is not being used.

command will need to be run on each start / boot of openvswitch 

  ovs-appctl vlog/set netdev_tc_offloads:OFF

Comment 50 Anita Tragler 2020-06-04 18:11:41 UTC
There is a workaround available see comment #37, to turn off TC offload logging, this is not required  when OVS offload is not being used.

command will need to be run on each start / boot of openvswitch 

  ovs-appctl vlog/set netdev_tc_offloads:OFF

Comment 51 Brendan Shephard 2020-06-04 23:50:00 UTC
(In reply to Anita Tragler from comment #49)
> This bug is related to logging and looking up ifindex information. It could
> generate a lot of logging errors and consume some disk space, log rotate
> will ensure logs don't get too large. It should not impact data traffic.
> The original issue is due to backporting a OVS TC/flwoer hardware offload
> feature with bad structure and this has been reverted, feature moved to
> later OVS release.
> The fix is planned for OSP 13z12 (June) and OSP 16.1 GA (July)
> 
> There is a workaround available see comment #3, to turn off TC offload
> logging, this is not required  when OVS offload is not being used.
> 
> command will need to be run on each start / boot of openvswitch 
> 
>   ovs-appctl vlog/set netdev_tc_offloads:OFF

Hi Anita,

Thanks for your feedback. I believe the issue here is that when you have a large number of ports, the print operation for each and every interface multiple times gets fairly expensive. The issue we are generally seeing with this is that at some stage, Openvswitch is no longer able to keep up with the requests coming from Neutron - particularly in environments with distributed virtual routing enabled. At that stage, it stops updating flows and ports which causes a loss of traffic to the tenant VM's on the hypervisor. 

Disabling the netdev_tc_offloads logging does indeed improve the issue, and in most cases we have verified that the impact of the issue is mitigated by this change. 

But to be clear, this definitely, 100% is causing traffic outages to VM's. And the only way it gets resolved is by ovs-appctl vlog/set netdev_tc_offloads:OFF

So I'm glad to hear that the fix will be released in z12.

Comment 52 Chris Smart 2020-06-05 00:11:44 UTC
I'm using something like this in the meantime.


$ cat << EOF | sudo tee /etc/systemd/system/ovs-disable-tc-logging-hack.service
[Unit]
Description=Hack to work around OVS tc logging

[Service]
Type=oneshot
ExecStart=/bin/ovs-appctl vlog/set netdev_tc_offloads off
User=root
Group=systemd-journal
Requires=openvswitch.service
After=openvswitch.service
EOF

$ cat << EOF | sudo tee /etc/systemd/system/ovs-disable-tc-logging-hack.timer
[Unit]
Description=Run hack to work around OVS tc logging

[Timer]
OnBootSec=5min
OnUnitActiveSec=1h

[Install]
WantedBy=timers.target
EOF

$ sudo systemctl daemon-reload
$ sudo systemctl start ovs-disable-tc-logging-hack.timer
$ sudo systemctl enable ovs-disable-tc-logging-hack.timer

Comment 57 stchen 2020-09-30 19:44:49 UTC
Closing EOL, OSP 15 has been retired as of Sept 19, 2020