Bug 1905929 - Libvirt shouldn't set the MTU of an unmanaged tap/macvtap device, it should just pass the mtu to the guest
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux Advanced Virtualization
Classification: Red Hat
Component: libvirt
Version: 8.3
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: urgent
Target Milestone: rc
Target Release: ---
Assignee: Laine Stump
QA Contact: yalzhang@redhat.com
URL:
Whiteboard: Feature_Enhancement
Depends On:
Blocks: 1904132 1920437 1924681 1947824
Reported: 2020-12-09 10:56 UTC by Alona Kaplan
Modified: 2021-09-28 06:36 UTC

Fixed In Version: libvirt-7.0.0-2.el8
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Clones: 1920437 1924681 1947824
Environment:
Last Closed: 2021-05-25 06:45:17 UTC
Type: Feature Request
Target Upstream Version:
Embargoed:
pm-rhel: mirror+


Links
System: Github
ID: kubevirt kubevirt issues 4629
Private: 0
Priority: None
Status: closed
Summary: Migrations fail after update to v0.35.0 to v0.36.0-rc.0
Last Updated: 2021-02-15 15:20:32 UTC

Description Alona Kaplan 2020-12-09 10:56:10 UTC
Description of problem:

RFE https://bugzilla.redhat.com/1723367 introduced the ability to connect <interface> to a pre-created host tap/macvtap device.

However, when an <interface> that includes an <mtu> element is passed to libvirt, libvirt both sets that MTU on the host device and passes it to the guest.

Since the device is pre-created, it already contains the correct MTU. Libvirt shouldn't touch it.


Version-Release number of selected component (if applicable):


How reproducible:
100%

Steps to Reproduce:
1. Pre-create a tap device with MTU 2000.
2. Start a VM with the device.
   The interface passed to libvirt should be "unmanaged" (managed='no') and contain an <mtu> element.

For example:
    <interface type='ethernet'>
      <mac address='02:00:00:d0:03:54'/>
      <target dev='tap0' managed='no'/>
      <model type='virtio'/>
      <mtu size='1440'/>
      <alias name='ua-default'/>
      <rom enabled='no'/>
      <address type='pci' domain='0x0000' bus='0x01' slot='0x00' function='0x0'/>
    </interface>


Actual results:
Libvirt sets the MTU on the tap device and passes it to the guest.
(The MTU of the tap was changed to 1440, the MTU on the guest nic is 1440).

Expected results:
Libvirt shouldn't set the MTU on the tap/macvtap device. It should just pass the MTU to the guest.
(The MTU of the tap device should be untouched and stay 2000, the MTU on the guest nic should be 1440).

Additional info:
This bug is a blocker for CNV, since we run the VM in a pod without the net_admin capability.
We have a pre-created tap device that already has the correct MTU, but when we pass the device to libvirt we get the following error:

virError(Code=38, Domain=0, Message='Cannot set interface MTU on 'tap0': Operation not permitted')

We cannot omit the MTU option from the xml since we need libvirt to pass the MTU to the guest.
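
The requested behaviour can be written down as a small decision function. The following is only a sketch in Python, not libvirt's code (libvirt is written in C). The helper name `mtu_actions` and its return shape are made up for illustration, and it assumes that a missing `managed` attribute means the device is managed by libvirt.

```python
import xml.etree.ElementTree as ET


def mtu_actions(interface_xml: str) -> dict:
    """Return which MTU actions the requested behaviour implies.

    Hypothetical helper for illustration; not part of libvirt.
    """
    iface = ET.fromstring(interface_xml)
    mtu = iface.find("mtu")
    target = iface.find("target")
    # Assumption: the device counts as managed unless managed='no' is given.
    managed = target is None or target.get("managed", "yes") != "no"
    size = int(mtu.get("size")) if mtu is not None else None
    return {
        # The MTU from the XML is always what the guest should be told.
        "guest_mtu": size,
        # The host device is only touched when libvirt manages it.
        "set_host_mtu": size is not None and managed,
    }


EXAMPLE = """
<interface type='ethernet'>
  <target dev='tap0' managed='no'/>
  <model type='virtio'/>
  <mtu size='1440'/>
</interface>
"""

if __name__ == "__main__":
    print(mtu_actions(EXAMPLE))
```

For the unmanaged example the sketch reports that the guest should still be told the MTU, while the host tap device is left alone, so no privileged operation is needed.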

Comment 1 Laine Stump 2020-12-11 19:44:04 UTC
This sounds like a reasonable request. I'll look into making a patch for it.

(I thought I recalled that if the tap device was already set to the correct MTU, then this error wouldn't occur. Is there a specific reason why you want the device to have a larger MTU than the NIC in the guest? I'm not sure about this specific situation, but mismatched MTUs can sometimes lead to "bad things" happening (in particular, one end thinks the MTU is larger, so it tries to send larger packets w/o fragmenting, but those packets are dropped at the other end because they're too big))

Comment 2 Alona Kaplan 2020-12-11 20:04:09 UTC
We have the same MTU on the nic and the tap device. The example in the description was just to illustrate the issue (that libvirt shouldn't touch the MTU of the unmanaged device).

In CNV we run the vm (and libvirt) inside a pod with no net_admin capability.
We have a pre-created tap device that already has the correct MTU (the same MTU we pass to libvirt in the interface XML). But when we pass the device to libvirt we get the following error:
virError(Code=38, Domain=0, Message='Cannot set interface MTU on 'tap0': Operation not permitted')

We cannot omit the MTU option from the xml since we need libvirt to pass the MTU to the guest.
So in case of unmanaged tap/macvtap device we expect libvirt not to touch the MTU of the tap device but just pass the MTU to the guest.

Comment 3 Miguel Duarte Barroso 2021-01-12 09:14:05 UTC
Any news about this? Can I help in any way?

I fully understand the holidays were in between, but we (CNV) are very keen on a fix for this.

Comment 4 Laine Stump 2021-01-12 22:40:49 UTC
Yeah, sorry. This was on my list to get to after the holidays, but I hadn't gotten there yet. It's actually a very simple fix - I just sent a patch upstream for it:

https://www.redhat.com/archives/libvir-list/2021-January/msg00634.html

Comment 5 Alona Kaplan 2021-01-14 08:40:38 UTC
Thanks a lot Laine.

Laine, 
Is passing the MTU from libvirt to the *guest* OS (as you wrote in your patch, to tell the emulated device what MTU to set on the other end of the tap) supported for all the interface drivers (e.g. virtio, e1000) and all OSes? (For example, I see it doesn't work for a cirros guest OS but works for fedora.)

Comment 6 Laine Stump 2021-01-15 03:06:19 UTC
(In reply to Alona Kaplan from comment #5)
> Is passing the MTU from libvirt to the *guest* OS (as you wrote in your
> patch, to tell the emulated device what MTU to set on the other end of the
> tap) supported for all the interface drivers (e.g. virtio, e1000) and all
> OSes? (For example, I see it doesn't work for a cirros guest OS but works
> for fedora.)

It only works for virtio-net (no e1000 or anything else), and only on guest OSes with a virtio-net driver new enough to understand the data passed into the guest via a PCI config register in the emulated device. AFAIK, this is only supported for Linux in the last year or two, and not by any other OS.
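
One possible way to check from inside a Linux guest whether the MTU hint was negotiated is to look at the virtio feature bits. This is a sketch under assumptions, not something stated in this bug: the virtio specification assigns feature bit 3 to VIRTIO_NET_F_MTU, and Linux exposes a virtio device's feature bits in a sysfs `features` attribute as a string of '0'/'1' characters with bit 0 first. The sysfs path used in `read_features` is an assumed location and may differ on a given guest.

```python
VIRTIO_NET_F_MTU = 3  # feature bit number from the virtio specification


def has_mtu_feature(features: str) -> bool:
    """True if bit 3 is set in a sysfs-style virtio feature string.

    The string is expected to consist of '0'/'1' characters, bit 0 first.
    """
    bits = features.strip()
    return len(bits) > VIRTIO_NET_F_MTU and bits[VIRTIO_NET_F_MTU] == "1"


def read_features(ifname: str) -> str:
    # Assumed sysfs location for a virtio-net interface; adjust if it differs.
    with open(f"/sys/class/net/{ifname}/device/features") as f:
        return f.read()
```

In a guest one would call `has_mtu_feature(read_features("enp1s0"))`; a guest driver that does not know the feature would be expected to leave the bit unset and ignore the MTU from the XML.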

Comment 7 Laine Stump 2021-01-15 19:46:15 UTC
This is now upstream, but unfortunately not in the 7.0.0 release:

commit 3bb87556b8ab010e5b808ac6775af7c10ea3d05d
Author: Laine Stump <laine>
Date:   Tue Jan 12 14:10:05 2021 -0500

    qemu: don't set interface MTU when managed='no'

Comment 17 yalzhang@redhat.com 2021-01-25 09:47:03 UTC
Reproduced the issue on libvirt-6.10.0-1.module+el8.4.0+8898+a84e86e1.x86_64:

1. Create a tap device and set its MTU to 2000:
# ip tuntap add mode tap name mytap0
# ip link set dev mytap0 mtu 2000
# ip l show mytap0
33: mytap0: <BROADCAST,MULTICAST> mtu 2000 qdisc noqueue state DOWN mode DEFAULT group default qlen 1000
    link/ether 42:d4:c3:77:3b:09 brd ff:ff:ff:ff:ff:ff

2. Start a vm with the tap device above:
# virsh dumpxml rhel | grep /interface -B6
    <interface type='ethernet'>
      <mac address='52:54:00:93:0f:bc'/>
      <target dev='mytap0' managed='no'/>
      <model type='virtio'/>
      <mtu size='1400'/>
      <address type='pci' domain='0x0000' bus='0x01' slot='0x00' function='0x0'/>
    </interface>
 
# virsh start rhel 
Domain rhel started

3. Check the MTU of the tap device on the host and of the interface in the guest; both changed to 1400:
# ip l show mytap0
33: mytap0: <BROADCAST,MULTICAST> mtu 1400 qdisc noqueue state DOWN mode DEFAULT group default qlen 1000
    link/ether 42:d4:c3:77:3b:09 brd ff:ff:ff:ff:ff:ff

[root@new_guest ~]# ip l show | grep -v lo
2: enp1s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1400 qdisc fq_codel state UP mode DEFAULT group default qlen 1000
    link/ether 52:54:00:93:0f:bc brd ff:ff:ff:ff:ff:ff


For an unprivileged user this cannot happen, because the VM fails to start when the MTU is set:
# ip tuntap add mode tap user test group test name mytap0
# ip link set dev mytap0 mtu 2000
# ip l show mytap0
$ virsh dumpxml vm1 | grep /interface -B6
    <interface type='ethernet'>
      <mac address='52:54:00:93:0f:bc'/>
      <target dev='mytap0' managed='no'/>
      <model type='virtio'/>
      <mtu size='1440'/>
      <address type='pci' domain='0x0000' bus='0x01' slot='0x00' function='0x0'/>
    </interface>

$ virsh start vm1
error: Failed to start domain vm1
error: Cannot set interface MTU on 'mytap0': Operation not permitted
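
The before/after comparison in the reproduction above can be automated by pulling the MTU out of the `ip link show` output. This is a minimal sketch; the helper name `mtu_from_ip_link` is made up for illustration, and the two sample lines are copied from the transcript in this comment.

```python
import re
from typing import Optional

# Matches the "mtu <number>" field printed by `ip link show`.
_MTU_FIELD = re.compile(r"\bmtu (\d+)\b")


def mtu_from_ip_link(output: str) -> Optional[int]:
    """Extract the MTU from `ip link show <dev>` output, or None if absent."""
    match = _MTU_FIELD.search(output)
    return int(match.group(1)) if match else None


# Lines copied from the transcript above (before and after `virsh start`).
BEFORE = "33: mytap0: <BROADCAST,MULTICAST> mtu 2000 qdisc noqueue state DOWN mode DEFAULT group default qlen 1000"
AFTER = "33: mytap0: <BROADCAST,MULTICAST> mtu 1400 qdisc noqueue state DOWN mode DEFAULT group default qlen 1000"

if __name__ == "__main__":
    changed = mtu_from_ip_link(BEFORE) != mtu_from_ip_link(AFTER)
    print("host MTU changed by libvirt:", changed)
```

On the affected version the two values differ, which is the behaviour this bug asks to remove for unmanaged devices.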

Comment 18 yalzhang@redhat.com 2021-01-25 09:56:34 UTC
Tested on libvirt-7.0.0-2.module+el8.4.0+9520+ef609c5f.x86_64

1. Prepare the tap device with MTU 2000:
# ip l  show mytap0
1732: mytap0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 2000 qdisc fq_codel master br0 state UP mode DEFAULT group default qlen 1000
    link/ether ae:14:31:88:b3:ab brd ff:ff:ff:ff:ff:ff

2. Start the VM with the MTU set to 1400:
# virsh dumpxml test | grep /interface -B6
    <interface type='network'>
      <mac address='52:54:00:58:ff:ab'/>
      <source network='net'/>
      <model type='virtio'/>
      <mtu size='1400'/>
      <address type='pci' domain='0x0000' bus='0x01' slot='0x00' function='0x0'/>
    </interface>

# virsh start test 
Domain 'test' started

3. Check the MTU of the tap device; libvirt did not change it and it remains 2000:
# ip l show mytap0
1732: mytap0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 2000 qdisc fq_codel master br0 state UP mode DEFAULT group default qlen 1000
    link/ether ae:14:31:88:b3:ab brd ff:ff:ff:ff:ff:ff

4. Check the MTU in the VM; it is 1400, as set in the XML:
# ip l  | grep -v lo
2: enp1s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1400 qdisc fq_codel state UP mode DEFAULT group default qlen 1000
    link/ether 52:54:00:58:ff:ab brd ff:ff:ff:ff:ff:ff


Scenario 2: set the MTU in the interface larger than that of the tap device
# ip l show mytap0
1732: mytap0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 2000 qdisc fq_codel master br0 state DOWN mode DEFAULT group default qlen 1000
    link/ether ae:14:31:88:b3:ab brd ff:ff:ff:ff:ff:ff

# virsh dumpxml test | grep /interface -B6
    <interface type='network'>
      <mac address='52:54:00:58:ff:ab'/>
      <source network='net'/>
      <model type='virtio'/>
      <mtu size='3000'/>
      <address type='pci' domain='0x0000' bus='0x01' slot='0x00' function='0x0'/>
    </interface>

Start the VM and check the tap device again:
# ip l show mytap0
1732: mytap0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 2000 qdisc fq_codel master br0 state DOWN mode DEFAULT group default qlen 1000
    link/ether ae:14:31:88:b3:ab brd ff:ff:ff:ff:ff:ff

Check in the guest:
# ip l  show enp1s0
2: enp1s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 3000 qdisc fq_codel state UP mode DEFAULT group default qlen 1000
    link/ether 52:54:00:58:ff:ab brd ff:ff:ff:ff:ff:ff


The result is as expected; setting the bug to verified.
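
The pass criteria used in this verification can be stated as two checks per run: the host tap MTU is unchanged, and the guest MTU equals the value from the XML. The following sketch encodes that; the `Observation` type and `check` function are hypothetical names for illustration, and the numbers are the ones reported in the two runs above.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Observation:
    name: str
    host_mtu_before: int  # tap MTU before starting the VM
    xml_mtu: int          # <mtu size=...> in the interface XML
    host_mtu_after: int   # tap MTU after starting the VM
    guest_mtu: int        # MTU seen on the guest NIC


def check(obs: Observation) -> List[str]:
    """Return deviations from the expected behaviour (empty list if none)."""
    problems = []
    if obs.host_mtu_after != obs.host_mtu_before:
        problems.append(
            f"{obs.name}: host MTU changed from {obs.host_mtu_before} to {obs.host_mtu_after}"
        )
    if obs.guest_mtu != obs.xml_mtu:
        problems.append(
            f"{obs.name}: guest MTU is {obs.guest_mtu}, expected {obs.xml_mtu}"
        )
    return problems


# Values taken from the two runs reported above.
SCENARIOS = [
    Observation("scenario 1", 2000, 1400, 2000, 1400),
    Observation("scenario 2", 2000, 3000, 2000, 3000),
]

if __name__ == "__main__":
    for scenario in SCENARIOS:
        print(scenario.name, check(scenario) or "ok")
```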

Comment 19 Miguel Duarte Barroso 2021-01-26 12:25:48 UTC
Can we have this fix back-ported to RHEL-AV-8.3.0 (and the bug cloned)? CNV's next release will unfortunately use that instead - correct me if I'm wrong @phoracek - and we'd like to have the fix asap.

Comment 20 Petr Horáček 2021-01-26 13:08:35 UTC
Yes, the next release of OpenShift Virtualization will be released on 8.3.

We'd appreciate having the fix in 8.3, so we could verify whether it really covers our use-case completely. If we find out there are more issues blocking us on the libvirt side, we'd have enough time to open more RFEs.

Comment 21 Miguel Duarte Barroso 2021-01-26 13:48:01 UTC
(In reply to Petr Horáček from comment #20)
> Yes, the next release of OpenShift Virtualization will be released on 8.3.
> 
> We'd appreciate having the fix in 8.3, so we could verify whether it really
> covers our use-case completely. If we find out there are more issues
> blocking us on the libvirt side, we'd have enough time to open more RFEs.

@laine is this doable?

Comment 22 Laine Stump 2021-01-27 15:21:56 UTC
Backporting the patch is trivial, it will take 5 minutes, and the chance of regression is essentially 0. I can never remember whether we're supposed to add the "ZStream" keyword, or set the zstream=? flag in order to request z-stream; I will ask on IRC and make the necessary change.

When you say "8.3", do you mean 8.3.1.z, or 8.3.0.z? AFAIU if you need 8.3.0.z then 8.3.1.z will need to be done first.

Comment 23 Michal Privoznik 2021-01-27 16:34:18 UTC
(In reply to Laine Stump from comment #22)
> Backporting the patch is trivial, it will take 5 minutes, and the chance of
> regression is essentially 0. I can never remember whether we're supposed to
> add the "ZStream" keyword, or set the zstream=? flag in order to request
> z-stream; I will ask on IRC and make the necessary change.
> 
> When you say "8.3", do you mean 8.3.1.z, or 8.3.0.z? AFAIU if you need
> 8.3.0.z then 8.3.1.z will need to be done first.

Yes, they need 8.3.0.z, but as you point out, we need to backport into 8.3.1.z too.

Comment 37 errata-xmlrpc 2021-05-25 06:45:17 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (virt:av bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:2098

