Description of problem:

OVS kernel datapath: packets are still forwarded to an internal port that is in DOWN state, and the kernel logs `dropped over-mtu packet` warnings if these packets are larger than the kernel interface MTU. I can easily reproduce this in a lab of mine with OSP 13 and OVS 2.11:

~~~
[root@overcloud-compute-1 ~]# ip link ls dev br-int
24: br-int: <BROADCAST,MULTICAST> mtu 1500 qdisc noqueue state DOWN mode DEFAULT group default qlen 1000
    link/ether 9e:ef:c0:47:ca:47 brd ff:ff:ff:ff:ff:ff
[root@overcloud-compute-1 ~]# ip link add name veth-host type veth peer name veth-guest
[root@overcloud-compute-1 ~]# ip link set dev veth-host up
[root@overcloud-compute-1 ~]# ip link set dev veth-guest up
[root@overcloud-compute-1 ~]# ovs-vsctl add-port br-int veth-host
[root@overcloud-compute-1 ~]# ovs-vsctl show | grep br-int -A10
    Bridge br-int
        Controller "tcp:127.0.0.1:6633"
            is_connected: true
        fail_mode: secure
        Port br-int
            Interface br-int
                type: internal
        Port veth-host
            Interface veth-host
        Port "tpi-00c6e537-cc"
            tag: 1
            Interface "tpi-00c6e537-cc"
                type: patch
                options: {peer="tpt-00c6e537-cc"}
        Port "qvo5d2f3fba-11"
            tag: 2
[root@overcloud-compute-1 ~]# ip link set dev veth-host mtu 8900
[root@overcloud-compute-1 ~]# ip link set dev veth-guest mtu 8900
[root@overcloud-compute-1 ~]# ip a a dev veth-guest 192.168.123.10/24
[root@overcloud-compute-1 ~]# ping 192.168.123.255 -b -M do -s 8000
WARNING: pinging broadcast address
PING 192.168.123.255 (192.168.123.255) 8000(8028) bytes of data.
^C
--- 192.168.123.255 ping statistics ---
2 packets transmitted, 0 received, 100% packet loss, time 1000ms

[1]+  Done                    tcpdump -nne -i br-int -l | grep --color=auto 192.168.123
[root@overcloud-compute-1 ~]# dmesg | tail
(...)
[281752.475238] br-int: dropped over-mtu packet: 8028 > 1500
[281753.474201] br-int: dropped over-mtu packet: 8028 > 1500
[root@overcloud-compute-1 ~]# tcpdump -nne -i br-int
tcpdump: br-int: That device is not up
[root@overcloud-compute-1 ~]#
~~~

Workaround:

I guess the workaround is as easy as that:

~~~
[root@overcloud-compute-1 ~]# ovs-vsctl set interface br-int mtu_request=8900
[root@overcloud-compute-1 ~]# ovs-vsctl list interface br-int | grep mtu
mtu                 : 8900
mtu_request         : 8900
[root@overcloud-compute-1 ~]# ip link ls dev br-int
24: br-int: <BROADCAST,MULTICAST> mtu 8900 qdisc noqueue state DOWN mode DEFAULT group default qlen 1000
    link/ether 9e:ef:c0:47:ca:47 brd ff:ff:ff:ff:ff:ff
~~~

Version-Release number of selected component (if applicable):
~~~
[root@overcloud-compute-1 ~]# rpm -qa | egrep 'kernel|openvswitch'
kernel-3.10.0-1062.9.1.el7.x86_64
erlang-kernel-18.3.4.11-2.el7ost.x86_64
python-openvswitch2.11-2.11.0-26.el7fdp.x86_64
openstack-neutron-openvswitch-12.1.0-2.el7ost.noarch
openvswitch2.11-2.11.0-26.el7fdp.x86_64
openvswitch-selinux-extra-policy-1.0-9.el7fdp.noarch
kernel-tools-3.10.0-1062.9.1.el7.x86_64
kernel-tools-libs-3.10.0-1062.9.1.el7.x86_64
python-rhosp-openvswitch-2.11-0.6.el7ost.noarch
rhosp-openvswitch-ovn-central-2.11-0.6.el7ost.noarch
rhosp-openvswitch-2.11-0.6.el7ost.noarch
rhosp-openvswitch-ovn-host-2.11-0.6.el7ost.noarch
~~~

How reproducible:
Always in the lab setup above.

Steps to Reproduce:
1. Add a veth pair to br-int and set an MTU on the veth devices that is larger than the MTU of the br-int internal port (which stays in DOWN state).
2. Send traffic larger than the br-int MTU through the bridge, e.g. `ping 192.168.123.255 -b -M do -s 8000`.
3. Check `dmesg`.
Actual results:
The kernel logs `dropped over-mtu packet` warnings for the br-int internal port even though the port is DOWN.

Expected results:
No over-mtu warnings for a port that is administratively DOWN.

Additional info:
I understand that everything here happens in the kernel, and that when the interface is down the packet is counted as rx_dropped:

~~~
[root@overcloud-compute-1 ~]# ovs-vsctl get interface br-int statistics
{collisions=0, rx_bytes=2792740, rx_crc_err=0, rx_dropped=10185074, rx_errors=0, rx_frame_err=0, rx_over_err=0, rx_packets=15244, tx_bytes=1064, tx_dropped=0, tx_errors=2, tx_packets=12}
~~~

So I suppose that there's nothing that we can do about this warning message in the logs:

~~~
[281752.475238] br-int: dropped over-mtu packet: 8028 > 1500
~~~

Feel free to close this bug report as won't fix if there's nothing that we can do.
Feel free to close this bug report as won't fix if there's nothing that we can do. But I'd like to find a way to suppress this log message, as it may spam the logs and an administrator may be worried about it.
Hi Andreas,

When a port is added to the bridge, OVS finds the minimum MTU among the ports and uses that as the bridge MTU. So, if in your scenario you had only veth-host attached, the bridge would have its MTU set to 8900. However, if there are other ports with a lower MTU, the bridge MTU is set so that communication to all ports remains possible. Having said that, I think the bridge MTU auto-configuration makes sense and we should not change it.

Now looking at the kernel: the driver is supposed to drop the packet if the device is down, but that check happens a bit after the code checking the MTU:

~~~
void ovs_vport_send(struct vport *vport, struct sk_buff *skb, u8 mac_proto)
{
	if (unlikely(packet_length(skb, vport->dev) > mtu &&
		     !skb_is_gso(skb))) {
		net_warn_ratelimited("%s: dropped over-mtu packet: %d > %d\n",
				     vport->dev->name,
				     packet_length(skb, vport->dev), mtu);
		vport->dev->stats.tx_errors++;
		goto drop;
	}

static netdev_tx_t internal_dev_recv(struct sk_buff *skb)
{
	struct net_device *netdev = skb->dev;

	if (unlikely(!(netdev->flags & IFF_UP))) {
		kfree_skb(skb);
		netdev->stats.rx_dropped++;
		return NETDEV_TX_OK;
	}
~~~

Perhaps we can change ovs_vport_send() to not log that message if the egress device is down. However, before I go down that route, I need to understand the use case. It doesn't seem correct to have different MTUs on the same bridge, because the connection between those ports would be broken. Could you please explain why the ports have different MTUs?

fbl
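For illustration, here is a minimal sketch of the idea described above, assuming the warning in ovs_vport_send() is simply gated on IFF_UP. This is only a sketch of the approach, not the patch that was later merged upstream (see the links in the following comments):

~~~
	/* Sketch only: suppress the over-mtu warning when the egress device is
	 * administratively down, since internal_dev_recv() drops the packet
	 * anyway and accounts it as rx_dropped. The tx_errors increment is
	 * kept unconditional here to preserve the existing accounting.
	 */
	if (unlikely(packet_length(skb, vport->dev) > mtu &&
		     !skb_is_gso(skb))) {
		if (vport->dev->flags & IFF_UP)
			net_warn_ratelimited("%s: dropped over-mtu packet: %d > %d\n",
					     vport->dev->name,
					     packet_length(skb, vport->dev),
					     mtu);
		vport->dev->stats.tx_errors++;
		goto drop;
	}
~~~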
Hi,

It's been a while for this one, but IIRC OpenStack Neutron sets the MTU only on the ports that connect to the instances; it does not raise the MTU of the bridge internal port.

I think the reason I opened this bug is that I would not expect a port in DOWN state to log "dropped over-mtu packet" messages. If I as an admin set a port down, I would not expect log messages telling me that its MTU is not set right, given that I'd expect the port not to receive anything anyway. I hope that makes sense.

Again, feel free to close this as won't fix if you think this is not a valid request. I have not run into a similar complaint since, and it's been a year.

- Andreas
Patch posted upstream: https://lists.openwall.net/netdev/2021/03/16/279
Patch has been accepted in net-next: https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git/commit/?id=ebfbc46b35cb

We sync downstream openvswitch with upstream periodically, so this fix will eventually hit RHEL-8 and newer. Having said that, and based on comment#3, I am closing the ticket.

Thanks Andreas for reporting this bug!

fbl