Bug 1429163 - Automatically instruct MTU to the guest
Summary: Automatically instruct MTU to the guest
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-nova
Version: 14.0 (Rocky)
Hardware: x86_64
OS: Linux
Priority: medium
Severity: medium
Target Milestone: beta
Target Release: 14.0 (Rocky)
Assignee: smooney
QA Contact: OSP DFG:Compute
URL:
Whiteboard:
Depends On: 1366919 1408701 1412234 1450162
Blocks: 1411862
Reported: 2017-03-05 08:01 UTC by Amnon Ilan
Modified: 2023-03-21 18:44 UTC (History)

Fixed In Version: openstack-nova-18.0.0-0.20180822155218.14d9e9f.0rc1
Doc Type: Enhancement
Doc Text:
Clone Of: 1408701
Environment:
Last Closed: 2019-01-11 11:47:00 UTC
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
OpenStack gerrit 553072 0 None MERGED add mtu to libvirt xml for ethernet and bridge types 2020-10-06 14:24:52 UTC
Red Hat Issue Tracker OSP-23311 0 None None None 2023-03-21 18:44:21 UTC
Red Hat Product Errata RHEA-2019:0045 0 None None None 2019-01-11 11:47:25 UTC

Description Amnon Ilan 2017-03-05 08:01:28 UTC
+++ This bug was initially created as a clone of Bug #1408701 +++

+++ This bug was initially created as a clone of Bug #1366919 +++

Description of problem:

Bug #1366919 adds a new QEMU command-line option for communicating the
MTU to the guest.
This RFE bug is for libvirt to configure QEMU with that MTU.
The suggested approach is to query the host switch for its MTU and configure
QEMU accordingly.
The host switch can be:
1. OVS/OVS-DPDK
2. Linux Bridge

In future, support may be added for VPP as well.

--- Additional comment from Laine Stump on 2017-01-03 15:01:13 EST ---

As a "side feature" to this, we should provide a config knob to set the MTU of the bridge in a libvirt virtual network. This seems slightly problematic though since, as someone pointed out in #virt on oftc, the kernel won't allow setting an MTU over 1500 on a bridge device that doesn't have a physical ethernet attached.

Vlad - do you know why that is? (and more importantly, how to fix it?)

--- Additional comment from Aaron Conole on 2017-01-03 15:29:54 EST ---

The bridge will always configure to the lowest set mtu of all the devices added.  This means adding a new virtual interface to the bridge (or changing the mtu of an existing bridge element) will reconfigure the bridge MTU.

Is there a reason to care on an empty bridge?

As far as the fix goes, we can change the code for br_min_mtu to return the max-value.

Does this help as far as information?
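
The rule described in this comment can be illustrated with a toy model. This is only a sketch of the behaviour as stated here (the bridge follows the lowest MTU among its ports), not kernel code; the class name and the 1500 default for an empty bridge are assumptions made for illustration.

```python
from dataclasses import dataclass, field

DEFAULT_MTU = 1500  # assumed MTU of a freshly created, empty bridge


@dataclass
class BridgeModel:
    """Toy model: the bridge MTU follows the smallest MTU among its ports."""

    ports: dict = field(default_factory=dict)  # port name -> MTU

    def attach(self, name: str, mtu: int) -> None:
        """Add a port, or change the MTU of a port that is already attached."""
        self.ports[name] = mtu

    @property
    def mtu(self) -> int:
        # With no ports there is nothing to take the minimum of,
        # so the model falls back to the assumed default.
        return min(self.ports.values(), default=DEFAULT_MTU)


if __name__ == "__main__":
    br = BridgeModel()
    br.attach("tap0", 9000)
    print(br.mtu)
    br.attach("tap1", 1500)
    print(br.mtu)
```

In this model, attaching a single 9000-byte port raises the bridge to 9000, and attaching a second 1500-byte port pulls it back down to 1500.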

--- Additional comment from Laine Stump on 2017-01-03 16:41:34 EST ---

Okay, I looked at this briefly just before leaving for a 12 day vacation, and now realize that I misremembered a bit, and also that I hadn't completed my investigation - my recollection was that the new tap devices being added would match the MTU of the bridge (I was thinking of it this way because libvirt always checks the MTU of the bridge prior to attaching a tap device, then sets the tap device MTU to match the bridge), but as you point out it's the opposite.

Beyond that, I just now tried the thing that I had *intended* to try before I left town (but didn't get around to, and then forgot) - if I 1) create a bridge device, then 2) create a tap device and 3) set the tap MTU to, e.g. 9000, and *then* 4) attach the tap to the bridge, the bridge now has a 9000 MTU. Since libvirt already creates a "dummy" tap device to attach to every bridge it creates (for multiple bookkeeping reasons), all we need to do is set the MTU of that tap device to the MTU we want for the network, and the bridge, as well as all new tap devices will inherit that MTU. So no changes are needed from anyone outside libvirt.

Thanks for pushing me to actually experiment, and sorry for the digression.
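
The four steps of that experiment could be reproduced roughly as follows. The comment does not say which tools were used, so the iproute2 commands here are an assumption, and the device names are hypothetical. The function only builds the command lines; running them needs root and leaves the devices behind.

```python
import shlex
from typing import List


def bridge_mtu_experiment(bridge: str, tap: str, mtu: int) -> List[List[str]]:
    """Return the command lines for: create bridge, create tap, set MTU, attach."""
    return [
        ["ip", "link", "add", "name", bridge, "type", "bridge"],  # 1) create a bridge
        ["ip", "tuntap", "add", "dev", tap, "mode", "tap"],       # 2) create a tap device
        ["ip", "link", "set", "dev", tap, "mtu", str(mtu)],       # 3) set the tap MTU
        ["ip", "link", "set", "dev", tap, "master", bridge],      # 4) attach tap to bridge
        ["ip", "link", "show", "dev", bridge],                    # inspect the bridge MTU
    ]


if __name__ == "__main__":
    # Hypothetical device names; the commands are printed, not executed.
    for argv in bridge_mtu_experiment("br-test", "tap-test", 9000):
        print(shlex.join(argv))
```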

--- Additional comment from Laine Stump on 2017-01-23 10:34:20 EST ---

Is there a reason why this BZ, and its parent qemu BZ are marked private? I don't see any sensitive info anywhere. Marking them private means they can't be referred to in any public commit log or discussion, which leads to information disconnects in the future.

--- Additional comment from Amnon Ilan on 2017-01-26 18:07:19 EST ---

(In reply to Laine Stump from comment #4)
> Is there a reason why this BZ, and its parent qemu BZ are marked private? I
> don't see any sensitive info anywhere. Marking them private means they can't
> be referred to in any public commit log or discussion, which leads to
> information disconnects in the future.

You are right, changed it.

--- Additional comment from Michal Privoznik on 2017-01-30 09:00:04 EST ---

I've just pushed the patches upstream:

commit 572eda12ad7336031fbbdc8c130c8674ba7764e8
Author:     Michal Privoznik <mprivozn>
AuthorDate: Mon Jan 23 14:33:20 2017 +0100
Commit:     Michal Privoznik <mprivozn>
CommitDate: Thu Jan 26 10:00:01 2017 +0100

    qemu: Implement mtu on interface
    
    Not only we should set the MTU on the host end of the device but
    also let qemu know what MTU did we set.
    
    Signed-off-by: Michal Privoznik <mprivozn>

commit b020cf73fed761067c3b6196d993545f072c896f
Author:     Michal Privoznik <mprivozn>
AuthorDate: Mon Jan 23 14:32:13 2017 +0100
Commit:     Michal Privoznik <mprivozn>
CommitDate: Thu Jan 26 09:59:56 2017 +0100

    domain_conf: Introduce <mtu/> to <interface/>
    
    So far we allow to set MTU for libvirt networks. However, not all
    domain interfaces have to be plugged into a libvirt network and
    even if they are, they might want to have a different MTU (e.g.
    for testing purposes).
    
    Signed-off-by: Michal Privoznik <mprivozn>

commit eebec1697e88080f5a1271d104083430e34c457f
Author:     Michal Privoznik <mprivozn>
AuthorDate: Mon Jan 23 12:58:23 2017 +0100
Commit:     Michal Privoznik <mprivozn>
CommitDate: Wed Jan 25 09:18:49 2017 +0100

    virDomainNetDefParseXML: s/ret/rv/
    
    We use @ret to hold the actual return value of the function we
    are currently in. To hold a return value of a function called we
    use different variables: @rv, @rc, etc. Honour this naming
    scheme in virDomainNetDefParseXML too.
    
    Signed-off-by: Michal Privoznik <mprivozn>


v3.0.0-57-g572eda12a

--- Additional comment from Laine Stump on 2017-02-02 10:15:13 EST ---

Michal - Amnon asked the other day if we can learn the correct MTU for a vhost-user interface "from the switch", as it would be much simpler to configure if the user only needed to set the MTU for a switch, and then libvirt would query the switch MTU and set the appropriate host_mtu for all guest vhost-user interfaces using that switch.

I've already added on to my patches (but not yet posted - still cleaning up and testing) to set network MTU so that they will propagate the switch's MTU back to the qemu commandline in the case of Linux host bridge and OVS bridge connections that use tap devices (so we can just query the MTU of the device with virNetDevGetMTU()) but vhost-user connects to a Unix socket, and I know *nothing* about what's beyond that, or how to get to it. Do you know how to learn the MTU of the network in that case?

--- Additional comment from Michal Privoznik on 2017-02-03 03:58:09 EST ---

(In reply to Laine Stump from comment #7)
> Michal - Amnon asked the other day if we can learn the correct MTU for a
> vhost-user interface "from the switch", as it would be much simpler to
> configure if the user only needed to set the MTU for a switch, and then
> libvirt would query the switch MTU and set the appropriate host_mtu for all
> guest vhost-user interfaces using that switch.
> 
> I've already added on to my patches (but not yet posted - still cleaning up
> and testing) to set network MTU so that they will propagate the switch's MTU
> back to the qemu commandline in the case of Linux host bridge and OVS bridge
> connections that use tap devices (so we can just query the MTU of the device
> with virNetDevGetMTU()) but vhost-user connects to a Unix socket, and I know
> *nothing* about what's beyond that, or how to get to it. Do you know how to
> learn the MTU of the network in that case?

Sure:

# ovs-vsctl get Interface ovsbr0 mtu
1500

However, why do we need it on cmd line for classic Linux bridge + tap device case? Libvirt automatically sets tap device's mtu to match the one that the bridge has (unless the tap device has different MTU set - in which case we also put it onto qemu's cmd line).
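
A management application could wrap the `ovs-vsctl` query above along these lines. This is a minimal sketch, not code from libvirt or Nova: the function name and the injectable `run` parameter are hypothetical, and treating `[]` as "no MTU recorded" is an assumption about how `ovs-vsctl` prints an empty column.

```python
import subprocess
from typing import Callable, Optional, Sequence


def _run(argv: Sequence[str]) -> str:
    """Default runner: execute the command and return its stdout as text."""
    return subprocess.check_output(list(argv), text=True)


def ovs_interface_mtu(
    interface: str,
    run: Callable[[Sequence[str]], str] = _run,
) -> Optional[int]:
    """Return the MTU OVS reports for `interface`, or None if none is recorded."""
    out = run(["ovs-vsctl", "get", "Interface", interface, "mtu"]).strip()
    if out in ("", "[]"):  # assumption: an unset column is printed as "[]"
        return None
    return int(out)


if __name__ == "__main__":
    # Uses a stand-in runner so the example works without OVS installed.
    print(ovs_interface_mtu("ovsbr0", run=lambda argv: "1500\n"))
```

Passing the runner as a parameter keeps the parsing testable on machines without Open vSwitch; in real use the default runner would call the binary.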

--- Additional comment from Laine Stump on 2017-02-03 11:09:42 EST ---

(In reply to Michal Privoznik from comment #8)

> 
> However, why do we need it on cmd line for classic Linux bridge + tap device
> case? Libvirt automatically sets tap device's mtu to match the one that the
> bridge has (unless the tap device has different MTU set - in which case we
> also put it onto qemu's cmd line).

I don't see that anywhere in the code. It's true that we set the tap device MTU to match the bridge, but (without the patch that I have locally) this updated MTU isn't propagated into a host_mtu option on the commandline - host_mtu is only added if <mtu size...> is set in the interface config. (Also, for OVS bridges, libvirt is already getting the MTU of the bridge using a normal ioctl, just as it does for Linux host bridges. Does this not work properly? I guess I need to install OVS on my test machine to see.)

As far as using ovs-vsctl to get the MTU of the OVS bridge for vhost-user interfaces - how do we know which bridge's MTU to get? All we know in the case of vhost-user is the name of the unix socket.

--- Additional comment from Laine Stump on 2017-02-13 16:21:06 EST ---

The patches below were pushed upstream and will be in libvirt 3.1.0. They take care of the issue I mentioned in Comment 9, as long as the network uses tap devices to connect to a Linux host bridge or OVS bridge.

For vhost-user, the discussion on the ML indicates that we'll just have to rely on Nova (or whatever higher level management application knows the details of what's at the other end of the Unix socket used by vhost-users) adding an <mtu size='blah'/> to the domain interface config.


commit dd8ac030fbd28bba81c24cdf1311e47a350a7683
Author: Laine Stump <laine>
Date:   Sun Jan 22 20:41:03 2017 -0500

    util: add MTU arg to virNetDevTapCreateInBridgePort()
    
commit 68a42bf6f701515df472f0dd039a1d7429ea62a8
Author: Laine Stump <laine>
Date:   Sun Jan 22 21:23:48 2017 -0500

    conf: support configuring mtu size in a virtual network
    
commit c0f706865e4ecac0ddaafdb4b9fc2db8d9612481
Author: Laine Stump <laine>
Date:   Sun Jan 22 21:33:07 2017 -0500

    network: honor mtu setting when creating network
    


commit 2841e6756d5807a4119e004bc5fb8e7d70806458
Author: Laine Stump <laine>
Date:   Fri Feb 3 11:55:20 2017 -0500

    qemu: propagate bridge MTU into qemu "host_mtu" option
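
The division of responsibility described in this comment can be summarised in a short sketch. It is a simplified reading of the thread, not libvirt's implementation: the function name is hypothetical, and the assumption that an explicit `<mtu size=.../>` always takes precedence over the bridge MTU is mine.

```python
from typing import Optional

# libvirt interface types that are backed by a tap device attached to a bridge
TAP_BACKED_TYPES = ("bridge", "network")


def guest_mtu(
    iface_type: str,
    configured_mtu: Optional[int],
    bridge_mtu: Optional[int],
) -> Optional[int]:
    """Pick the MTU to advertise to the guest, or None if nothing is known."""
    if configured_mtu is not None:
        # Assumption: an explicit <mtu size=.../> in the interface config wins.
        return configured_mtu
    if iface_type in TAP_BACKED_TYPES:
        # Tap-based connection: the bridge MTU can be queried and passed on.
        return bridge_mtu
    # vhost-user and similar: only a Unix socket is visible, so the
    # management application has to supply the value explicitly.
    return None


if __name__ == "__main__":
    print(guest_mtu("bridge", None, 9000))
    print(guest_mtu("vhostuser", None, None))
    print(guest_mtu("vhostuser", 9000, None))
```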

--- Additional comment from Amnon Ilan on 2017-02-14 12:36:59 EST ---

(In reply to Laine Stump from comment #10)
> The patches below were pushed upstream and will be in libvirt 3.1.0. They
> take care of the issue I mentioned in Comment 9, as long as the network uses
> tap devices to connect to a Linux host bridge or OVS bridge.
> 
> For vhost-user, the discussion on the ML indicates that we'll just have to
> rely on Nova (or whatever higher level management application knows the
> details of what's at the other end of the Unix socket used by vhost-users)
> adding an <mtu size='blah'/> to the domain interface config.
> 

Why is OVS bridge different from OVS-DPDK (vhost-user)? Isn't there 
a common DB to query?

--- Additional comment from Laine Stump on 2017-02-14 13:28:04 EST ---

From libvirt's point of view a vhost-user interface doesn't connect "to OVS", it connects to a Unix socket. It has no idea that it is OVS at the other end. And from a discussion thread on libvir-list, I think consensus is that we don't want libvirt to know anything about what's at the other end of the socket. e.g.:

  https://www.redhat.com/archives/libvir-list/2017-February/msg00066.html

Also, a tap-based connection doesn't do any OVS-specific query to determine the MTU of the bridge - it just does a normal ioctl(SIOCGIFMTU), just as it does for a Linux host bridge.
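
For reference, the same SIOCGIFMTU query can be made from Python on Linux. This is a sketch rather than libvirt's C code; the helper names are hypothetical, and the 40-byte buffer layout assumes `struct ifreq` on a 64-bit Linux build.

```python
import fcntl
import socket
import struct

SIOCGIFMTU = 0x8921  # Linux value from <linux/sockios.h>
# ifr_name[16], int ifr_mtu, then padding up to an assumed 40-byte struct ifreq
IFREQ_FORMAT = "16si20x"


def pack_ifreq(ifname: str) -> bytes:
    """Build the request buffer for the given interface name."""
    name = ifname.encode()
    if len(name) >= 16:
        raise ValueError("interface name too long")
    return struct.pack(IFREQ_FORMAT, name, 0)


def unpack_mtu(ifreq: bytes) -> int:
    """Extract ifr_mtu from a buffer filled in by the kernel."""
    return struct.unpack(IFREQ_FORMAT, ifreq)[1]


def get_mtu(ifname: str) -> int:
    """Ask the kernel for the MTU of a network device such as a bridge or tap."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        return unpack_mtu(fcntl.ioctl(sock.fileno(), SIOCGIFMTU, pack_ifreq(ifname)))


if __name__ == "__main__":
    print(get_mtu("lo"))
```

The call works the same way for a Linux bridge and for the kernel device that represents an OVS bridge, which is why no OVS-specific query is needed in the tap case.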

--- Additional comment from errata-xmlrpc on 2017-03-03 09:23:10 EST ---

Bug report changed to ON_QA status by Errata System.
A QE request has been submitted for advisory RHEA-2016:25896-01
https://errata.devel.redhat.com/advisory/25896

Comment 1 Amnon Ilan 2017-03-05 08:04:59 UTC
Description of problem:

Bug #1366919 added a new QEMU command-line option for communicating the
MTU to the guest.
This RFE bug is for libvirt to configure QEMU with that MTU.
The suggested approach is to query the host switch for its MTU and configure
QEMU accordingly.
The host switch can be:
1. OVS/OVS-DPDK
2. Linux Bridge

In future, support may be added for VPP as well.

Comment 2 Amnon Ilan 2017-03-05 08:09:58 UTC
Sorry for all the comments; I was just trying to write a proper description.

Bug #1366919 added a new QEMU command-line option for communicating the
MTU to the guest.
Bug #1408701 added libvirt support for it.

During the discussions on the libvirt BZ, we found out that libvirt cannot
know/discover the actual value of MTU to set, and this should be done by a higher
level: Nova+Neutron (feel free to open another BZ for Neutron for that).

Comment 3 Stephen Finucane 2017-03-10 13:14:38 UTC
(In reply to Amnon Ilan from comment #2)
> During the discussions on the libvirt BZ, we found out that libvirt cannot 
> know/discover the actual value of MTU to set, and this should be done by
> higher 
> level: Nova+Neutron (feel free to open another BZ for Neutron for that).

I'm going to need a little more information on this. As things stand, we already set the MTU for interfaces and bridges using os-vif. Support was recently added for vhost-user interfaces also [1]. Is this somehow different?

[1] https://github.com/openstack/os-vif/commit/9a14c18c2163f8f90c797150d12e11b2aad8c1ee

Comment 5 Daniel Berrangé 2017-03-13 16:14:27 UTC
@stephen: all that stuff you describe is setting the MTU on the host side of the network connection.  The change made in libvirt is to provide a way to communicate this MTU to the guest OS. This is done by adding <mtu size="NNNN"/> in the guest XML under the <interface> element.

  http://libvirt.org/formatdomain.html#mtu

This allows the guest OS to configure the same sized MTU as used on the host.
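
As an illustration of what a management layer would need to do, the sketch below adds an `<mtu size="..."/>` child to an `<interface>` element. It is not Nova's code: the function name is hypothetical, and the vhost-user socket path in the usage example is made up.

```python
import xml.etree.ElementTree as ET


def set_interface_mtu(interface: ET.Element, size: int) -> ET.Element:
    """Add or update the <mtu size="..."/> child of a libvirt <interface>."""
    if size <= 0:
        raise ValueError("MTU must be a positive integer")
    mtu = interface.find("mtu")
    if mtu is None:
        mtu = ET.SubElement(interface, "mtu")
    mtu.set("size", str(size))
    return interface


if __name__ == "__main__":
    # Hypothetical vhost-user interface definition, used only as input data.
    iface = ET.fromstring(
        "<interface type='vhostuser'>"
        "<source type='unix' path='/tmp/vhost-user-0.sock' mode='server'/>"
        "<model type='virtio'/>"
        "</interface>"
    )
    print(ET.tostring(set_interface_mtu(iface, 9000), encoding="unicode"))
```

Updating an existing `<mtu>` element instead of appending a second one keeps the call safe to repeat on the same interface.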

Comment 6 Stephen Finucane 2018-04-06 13:43:28 UTC
(In reply to Daniel Berrange from comment #5)
> @stephen: all that stuff you describe is setting the MTU on the host side of
> the network connection.  The change made in libvirt is to provide a way to
> communicate this MTU to the guest OS. This is done by adding <mtu size="NNNN"/>
> in the guest XML under the <interface> element. 
> 
>   http://libvirt.org/formatdomain.html#mtu
> 
> This allows the guest OS to configure the same sized MTU as used on the host.

OK, looks like we're going to start doing this as part of the resolution to #1553559. This seems like a fair request.

Comment 13 errata-xmlrpc 2019-01-11 11:47:00 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2019:0045

