Bug 1449346 - Addition of host_mtu=XXXX changes PCI ioport size & addresses for virtio-net device
Summary: Addition of host_mtu=XXXX changes PCI ioport size & addresses for virtio-net ...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: libvirt
Version: 7.4
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: rc
Assignee: Laine Stump
QA Contact: yalzhang@redhat.com
URL:
Whiteboard:
Depends On:
Blocks: 1376765 1399515 1436046
 
Reported: 2017-05-09 17:26 UTC by Daniel Berrangé
Modified: 2017-08-02 01:32 UTC
12 users

Fixed In Version: libvirt-3.2.0-6.el7
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-08-02 00:08:25 UTC




Links
System ID Priority Status Summary Last Updated
Red Hat Product Errata RHEA-2017:1846 normal SHIPPED_LIVE libvirt bug fix and enhancement update 2017-08-01 18:02:50 UTC

Description Daniel Berrangé 2017-05-09 17:26:44 UTC
Description of problem:

Take a guest running QEMU < 2.9.0 with a virtio-net device. Running lspci -v in the guest will show that it has an ioport size of 32

00:03.0 Ethernet controller: Red Hat, Inc Virtio network device
        Subsystem: Red Hat, Inc Device 0001
        Physical Slot: 3
        Flags: bus master, fast devsel, latency 0, IRQ 11
        I/O ports at c060 [size=32]

                           ^^^^^^^^

        Memory at fc056000 (32-bit, non-prefetchable) [size=4K]
        Expansion ROM at fc000000 [disabled] [size=256K]
        Capabilities: <access denied>
        Kernel driver in use: virtio-pci
        Kernel modules: virtio_pci


Now, upgrade to QEMU == 2.9.0, *keeping* the machine type unchanged. Boot the guest and run lspci -v again. The virtio-net device ioport size has changed to 64


00:03.0 Ethernet controller: Red Hat, Inc Virtio network device
        Subsystem: Red Hat, Inc Device 0001
        Physical Slot: 3
        Flags: bus master, fast devsel, latency 0, IRQ 11
        I/O ports at c000 [size=64]

                           ^^^^^^^^

        Memory at fc056000 (32-bit, non-prefetchable) [size=4K]
        Expansion ROM at fc000000 [disabled] [size=256K]
        Capabilities: <access denied>
        Kernel driver in use: virtio-pci
        Kernel modules: virtio_pci


Not only that, but the I/O addresses of every device change, as a fallout from the size change.

git bisect shows that the trigger is the new host_mtu attribute supported by QEMU:

commit a93e599d4a04c3cf7edcf5a24f3397e27431c027
Author: Maxime Coquelin <maxime.coquelin@redhat.com>
Date:   Sat Dec 10 16:30:38 2016 +0100

    virtio-net: Add MTU feature support
    
    This patch allows advising guest with host MTU's by setting
    host_mtu parameter.
    
    If VIRTIO_NET_F_MTU has been successfully negotiated, MTU
    value is passed to the backend.
    
    Cc: Michael S. Tsirkin <mst@redhat.com>
    Cc: Aaron Conole <aconole@redhat.com>
    Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
    Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
    Signed-off-by: Michael S. Tsirkin <mst@redhat.com>


Enabling the host_mtu attribute causes the ioport size to change for virtio-net.

libvirt detects this new host_mtu feature in QEMU and unconditionally enables it for all guests, due to


commit 2841e6756d5807a4119e004bc5fb8e7d70806458
Author: Laine Stump <laine@laine.org>
Date:   Fri Feb 3 11:55:20 2017 -0500

    qemu: propagate bridge MTU into qemu "host_mtu" option
    
    libvirt was able to set the host_mtu option when an MTU was explicitly
    given in the interface config (with <mtu size='n'/>), set the MTU of a
    libvirt network in the network config (with the same named
    subelement), and would automatically set the MTU of any tap device to
    the MTU of the network.
    
    This patch ties that all together (for networks based on tap devices
    and either Linux host bridges or OVS bridges) by learning the MTU of
    the network (i.e. the bridge) during qemuInterfaceBridgeConnect(), and
    returning that value so that it can then be passed to
    qemuBuildNicDevStr(); qemuBuildNicDevStr() then sets host_mtu in the
    interface's commandline options.
    
    The result is that a higher MTU for all guests connecting to a
    particular network will be plumbed top to bottom by simply changing
    the MTU of the network (in libvirt's config for libvirt-managed
    networks, or directly on the bridge device for simple host bridges or
    OVS bridges managed outside of libvirt).
    
    One question I have about this - it occurred to me that in the case of
    migrating a guest from a host with an older libvirt to one with a
    newer libvirt, the guest may have *not* had the host_mtu option on the
    older machine, but *will* have it on the newer machine. I'm curious if
    this could lead to incompatibilities between source and destination (I
    guess it all depends on whether or not the setting of host_mtu has a
    practical effect on a guest that is already running - Maxime?)
    
    Likewise, we could run into problems when migrating from a newer
    libvirt to older libvirt - The guest would have been told of the
    higher MTU on the newer libvirt, then migrated to a host that didn't
    understand <mtu size='blah'/>. (If this really is a problem, it would
    be a problem with or without the current patch).



This change in guest ABI will break migration compatibility between old & new QEMU, despite the machine type being unchanged.


Version-Release number of selected component (if applicable):
libvirt 3.2.0
qemu 2.9.0

How reproducible:
Always

Steps to Reproduce:
1. Install libvirt 3.2.0
2. Provision a guest with QEMU 2.8.0 (or older) and a virtio-net device present
3. Run 'lspci -v' in the guest
4. Upgrade QEMU to 2.9.0
5. Run 'lspci -v' in the guest again
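The comparison in steps 3 and 5 can be scripted; a minimal sketch of extracting the size field from lspci output (using the line from this report as canned input, so it runs anywhere):

```shell
# Extract the ioport size from an lspci -v line. The sample line below is
# the one shown in this report for the 00:03.0 virtio-net device.
line='        I/O ports at c060 [size=32]'
size=$(printf '%s\n' "$line" | sed -n 's/.*\[size=\([0-9]*\)\].*/\1/p')
echo "$size"    # prints 32; after the QEMU upgrade the same check yields 64
```

Running the same extraction against live output (`lspci -v -s 00:03.0`) before and after the upgrade makes the 32 -> 64 change easy to spot from a script.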

Actual results:
The virtio-net ioport size changes from 32 to 64

Expected results:
No guest ABI change

Additional info:

Comment 2 Dr. David Alan Gilbert 2017-05-09 17:39:59 UTC
Michael:
  How is virtio feature negotiation supposed to work and keep compatibility?
  It looks to me like:
    a) The ioport BAR is dependent on the config size
    b) The config size is dependent on the MAX of the enabled features (virtio_net_config_size)
    c) To me it looks like either 'max_virtqueue_pairs' or 'mtu' takes it over the 32-byte limit
    d) Aren't those features dependent on the capabilities of the backend?
    e) So a change in backend can change a feature availability and change the guest visible ioport BAR size on an existing machine type?

Dave
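For intuition on (a)-(c): the legacy ioport BAR covers the virtio common header plus the device config, rounded up to a power of two. A rough numeric sketch (the header and offset values here are assumptions for illustration, not lifted from the QEMU source):

```shell
# Round up to the next power of two, as BAR sizes must be.
pow2ceil() { n=1; while [ "$n" -lt "$1" ]; do n=$((n * 2)); done; echo "$n"; }

hdr=24          # assumed legacy virtio-pci header size (with MSI-X enabled)
cfg_base=8      # mac(6) + status(2): where virtio_net_config ends without MQ/MTU
cfg_mtu=12      # ... + max_virtqueue_pairs(2) + mtu(2) once VIRTIO_NET_F_MTU is set

echo "BAR without host_mtu: $(pow2ceil $((hdr + cfg_base)))"   # 32
echo "BAR with host_mtu:    $(pow2ceil $((hdr + cfg_mtu)))"    # 64
```

Under these assumed numbers, turning on the MTU feature pushes the total past 32 bytes, so the BAR doubles to the next power of two, matching the 32 -> 64 change seen in the guest.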

Comment 3 Daniel Berrangé 2017-05-10 13:10:56 UTC
I'm tending to think that libvirt is at fault here, even if the ioport size did not change. 

Setting host_mtu=NNN is exposing a new feature to the guest OS, regardless of the size change, and as such should always have been opt-in. As it is today, by using host_mtu=NNN, libvirt will potentially break backwards migration: i.e. running a VM with a 2.8 machine type on a QEMU 2.9 host and then migrating to a QEMU 2.8 host, we're going to silently lose the host_mtu setting, which is a guest ABI change.

So IMHO we need to revert this host_mtu addition in libvirt & its stable branches, and require an XML triggered opt-in for reporting it.

Comment 4 Laine Stump 2017-05-10 16:52:25 UTC
My understanding of host_mtu (which is obviously flawed since simply adding it to the commandline causes the ioport size to change, and I definitely hadn't previously heard *that*) was that the value is only read by the guest driver when it's initialized (ie at system boot time), so it wasn't supposed to cause any operational change during a migration. Of course thinking back, even *that* line of reasoning is flawed, since it depends on the behavior of the current virtio-net driver, which isn't guaranteed.

So just how far do we need to go in making this opt-in? I seriously dislike the idea of requiring the exact MTU explicitly in every individual domain config - this would be very cumbersome if someone wanted to change the MTU for the network. Maybe we could make it opt-in with "<mtu auto='on'/>", which would still enable a change in the MTU of the network/bridge to take effect on guests as they are started.

On the other hand, if the MTU of source and destination networks are different, then the value of host_mtu provided to the guest, although informational-only, would still change during migration. BUT, on the *other* other hand, if we hard-code MTU in each guest config, then as we migrate either the MTU of the bridge on the destination will change (bridges always adopt the smallest MTU of all connected devices), or the MTU of the guest's tap device will change (and be inconsistent with the MTU set in the guest itself).

So how about this - a domain can be configured with one of these:

     <mtu auto='on'/>
     <mtu size='n'/>

If either is specified and host_mtu is supported, it would be set in the qemu commandline. If auto='on' is specified, then the domain state XML would also report the actual MTU in use:

     <mtu auto='on' size='9000'/>

and this is also what would be sent with the migration, thus assuring that the MTU supplied to the guest on the destination (and set for the new tap device on the destination) is the same regardless of the MTU of the network. In the meantime,  since only auto='on' is in the persistent config, any time the guest is cold-started, it will set the optimum MTU based on the network on the current host (so a change in MTU of a network will properly propagate to *newly started* guests, but those already running will be unaffected).

Comment 6 Laine Stump 2017-05-22 07:33:25 UTC
I posted a patch to revert the original offending patch upstream (I'll get to reimplementing it as an opt-in feature later):

https://www.redhat.com/archives/libvir-list/2017-May/msg00786.html

Comment 7 Laine Stump 2017-05-22 20:27:26 UTC
Pushed upstream:

commit 77780a29edace958a1f931d3281b962be4f5290e
Author: Laine Stump <laine@laine.org>
Date:   Thu May 18 14:16:27 2017 -0400

    Revert "qemu: propagate bridge MTU into qemu "host_mtu" option"
    
    This reverts commit 2841e675.

Comment 10 Kashyap Chamarthy 2017-05-26 10:59:19 UTC
Just noting it here for posterity, and in case someone hits it.  *IF* you're not using the libvirt build libvirt-3.2.0-6.el7 on the destination RHEL 7.4 host, live migration will fail with the following error messages:

-----------------------------------------------------------------------
2017-05-26 10:24:45.631+0000: 8790: error : virNetClientProgramDispatchError:177 : internal error: qemu unexpectedly closed the monitor: 2017-05-26T10:24:45.180184Z qemu-kvm: -chardev pty,id
=charserial1: char device redirected to /dev/pts/2 (label charserial1)
2017-05-26T10:24:45.219273Z qemu-kvm: warning: Unknown firmware file in legacy mode: etc/msr_feature_control
2017-05-26T10:24:45.430363Z qemu-kvm: get_pci_config_device: Bad config data: i=0x10 read: 61 device: 1 cmask: ff wmask: c0 w1cmask:0
2017-05-26T10:24:45.430399Z qemu-kvm: Failed to load PCIDevice:config
2017-05-26T10:24:45.430403Z qemu-kvm: Failed to load virtio-net:virtio
2017-05-26T10:24:45.430409Z qemu-kvm: error while loading state for instance 0x0 of device '0000:00:03.0/virtio-net'
2017-05-26T10:24:45.430588Z qemu-kvm: load of migration failed: Invalid argument
-----------------------------------------------------------------------

I can confirm from testing that once you use the build libvirt-3.2.0-6.el7, the above error goes away, and migration succeeds.

Comment 11 Dan Zheng 2017-05-27 02:53:44 UTC
I confirm this works with libvirt-3.2.0-6.el7 on PPC now. BTW, this problem happens on PPC with libvirt-3.2.0-4.el7


Source host: [7.3]
qemu-kvm-rhev-2.6.0-28.el7_3.10.ppc64le
libvirt-2.0.0-10.virtcov.el7_3.9.ppc64le
kernel-3.10.0-514.21.1.el7.ppc64le
RHEL-7.3-20161019.0


Target host:[7.4]
libvirt-3.2.0-6.el7.ppc64le
qemu-kvm-rhev-2.9.0-6.el7.ppc64le
kernel-3.10.0-700.el7.ppc64le
RHEL-7.4-20170504.0


#  virsh  migrate avocado-vt-vm1 --live --verbose  qemu+ssh://<target_host>/system
Migration: [100 %]

Comment 12 yalzhang@redhat.com 2017-06-02 08:54:33 UTC
Reproduced the bug, then updated libvirt and checked that the PCI ioport size & addresses for the virtio-net device remain unchanged.

1. Reproduce it on 
libvirt-3.2.0-5.el7.x86_64
qemu-kvm-rhev-2.8.0-3.el7.x86_64 

# lspci -v -s 00:03.0
00:03.0 Ethernet controller: Red Hat, Inc Virtio network device
	Subsystem: Red Hat, Inc Device 0001
	Physical Slot: 3
	Flags: bus master, fast devsel, latency 0, IRQ 11
	I/O ports at c0a0 [size=32]
	Memory at fc056000 (32-bit, non-prefetchable) [size=4K]
	Memory at febf0000 (64-bit, prefetchable) [size=16K]
	Expansion ROM at fc000000 [disabled] [size=256K]
	Capabilities: [98] MSI-X: Enable+ Count=3 Masked-
	Capabilities: [84] Vendor Specific Information: VirtIO: <unknown>
	Capabilities: [70] Vendor Specific Information: VirtIO: Notify
	Capabilities: [60] Vendor Specific Information: VirtIO: DeviceCfg
	Capabilities: [50] Vendor Specific Information: VirtIO: ISR
	Capabilities: [40] Vendor Specific Information: VirtIO: CommonCfg
	Kernel driver in use: virtio-pci
	Kernel modules: virtio_pci

2. update to qemu-kvm-rhev-2.9.0-7.el7.x86_64

on host:
# ps aux | grep qemu-kvm
qemu     18317 21.9  5.3 1934444 397524 ?      Sl   16:21   0:20 /usr/libexec/qemu-kvm -name guest=rhel7,....
-netdev tap,fd=26,id=hostnet0,vhost=on,vhostfd=28 -device virtio-net-pci,**host_mtu=1500**,netdev=hostnet0,id=net0,mac=52:54:00:64:1e:24,bus=pci.0,addr=0x3......

<=== "host_mtu=1500" added automatically even though there is no mtu setting in the guest's xml

on guest:
# lspci -v -s 00:03.0
00:03.0 Ethernet controller: Red Hat, Inc Virtio network device
	Subsystem: Red Hat, Inc Device 0001
	Physical Slot: 3
	Flags: bus master, fast devsel, latency 0, IRQ 11
	I/O ports at c000 [size=64]    <===
	Memory at fc056000 (32-bit, non-prefetchable) [size=4K]
	Memory at febf0000 (64-bit, prefetchable) [size=16K]
	Expansion ROM at fc000000 [disabled] [size=256K]
	Capabilities: [98] MSI-X: Enable+ Count=3 Masked-
	Capabilities: [84] Vendor Specific Information: VirtIO: <unknown>
	Capabilities: [70] Vendor Specific Information: VirtIO: Notify
	Capabilities: [60] Vendor Specific Information: VirtIO: DeviceCfg
	Capabilities: [50] Vendor Specific Information: VirtIO: ISR
	Capabilities: [40] Vendor Specific Information: VirtIO: CommonCfg
	Kernel driver in use: virtio-pci
	Kernel modules: virtio_pci

3. update the libvirt to latest libvirt-3.2.0-7.el7

# rpm -q libvirt qemu-kvm-rhev
libvirt-3.2.0-7.el7.x86_64
qemu-kvm-rhev-2.9.0-7.el7.x86_64

# virsh destroy rhel7
Domain rhel7 destroyed

# systemctl restart libvirtd.service

# virsh start rhel7
Domain rhel7 started

# ps aux | grep qemu-kvm   
....-netdev tap,fd=26,id=hostnet0,vhost=on,vhostfd=28 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:64:1e:24,bus=pci.0,addr=0x3....
<==== no "host_mtu"

on guest:
# lspci -v -s 00:03.0
00:03.0 Ethernet controller: Red Hat, Inc Virtio network device
	Subsystem: Red Hat, Inc Device 0001
	Physical Slot: 3
	Flags: bus master, fast devsel, latency 0, IRQ 11
	I/O ports at c0a0 [size=32]  <==== fixed
	Memory at fc056000 (32-bit, non-prefetchable) [size=4K]
	Memory at febf0000 (64-bit, prefetchable) [size=16K]
	Expansion ROM at fc000000 [disabled] [size=256K]
	Capabilities: [98] MSI-X: Enable+ Count=3 Masked-
	Capabilities: [84] Vendor Specific Information: VirtIO: <unknown>
	Capabilities: [70] Vendor Specific Information: VirtIO: Notify
	Capabilities: [60] Vendor Specific Information: VirtIO: DeviceCfg
	Capabilities: [50] Vendor Specific Information: VirtIO: ISR
	Capabilities: [40] Vendor Specific Information: VirtIO: CommonCfg
	Kernel driver in use: virtio-pci
	Kernel modules: virtio_pci
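The presence check used in steps 2 and 3 above can be wrapped in a tiny helper (a sketch; has_host_mtu is a hypothetical name, and the command lines below are abbreviated from those shown above):

```shell
# Report whether a QEMU command-line fragment carries the host_mtu option.
has_host_mtu() {
    case "$1" in
        *host_mtu=*) echo present ;;
        *)           echo absent ;;
    esac
}

has_host_mtu 'virtio-net-pci,host_mtu=1500,netdev=hostnet0'   # prints present
has_host_mtu 'virtio-net-pci,netdev=hostnet0'                 # prints absent
```

In practice the fragment would come from something like `ps -o args= -C qemu-kvm`, as in the steps above.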

Comment 13 yalzhang@redhat.com 2017-06-04 02:44:08 UTC
Scenario 2:
Set <mtu size='9000'/> in the guest's xml with the latest qemu-kvm-rhev and libvirt; the I/O port size is 64. Then migrate it to a rhel7.3.z host. The host_mtu setting in the qemu command line and in the guest's xml will be ignored. For the running guest, the I/O port size and address will stay the same, as these are initialized during boot.

src host: rhel7.4
libvirt-3.2.0-7.el7.x86_64
qemu-kvm-rhev-2.9.0-7.el7.x86_64

dst host: rhel7.3.z
libvirt-2.0.0-10.el7_3.9.x86_64
qemu-kvm-rhev-2.6.0-28.el7_3.9.x86_64

1. on src host:
# virsh dumpxml rhel7 | grep /interface -B3
      <mtu size='9000'/>
....
    </interface>

# ps aux | grep qemu-kvm
.... -device virtio-net-pci,host_mtu=9000,netdev=hostnet0,id=net0,mac=52:54:00:64:1e:24,bus=pci.0....

on guest:
# lspci -v -s 00:03.0
00:03.0 Ethernet controller: Red Hat, Inc Virtio network device
	Subsystem: Red Hat, Inc Device 0001
	Physical Slot: 3
	Flags: bus master, fast devsel, latency 0, IRQ 11
	I/O ports at c000 [size=64]  

                      ^^^^^^^^^^ 

	Memory at fc056000 (32-bit, non-prefetchable) [size=4K]
	Memory at febf0000 (64-bit, prefetchable) [size=16K]
	Expansion ROM at fc000000 [disabled] [size=256K]
	Capabilities: [98] MSI-X: Enable+ Count=3 Masked-
	Capabilities: [84] Vendor Specific Information: VirtIO: <unknown>
	Capabilities: [70] Vendor Specific Information: VirtIO: Notify
	Capabilities: [60] Vendor Specific Information: VirtIO: DeviceCfg
	Capabilities: [50] Vendor Specific Information: VirtIO: ISR
	Capabilities: [40] Vendor Specific Information: VirtIO: CommonCfg
	Kernel driver in use: virtio-pci
	Kernel modules: virtio_pci

2. migrate to rhel7.3.z
# virsh migrate rhel7 --live --persistent qemu+ssh://server/system --verbose
root@server's password: 
Migration: [100 %]

3. After live migration, on the rhel7.3.z target host, check the qemu command line: the host_mtu option is gone, and the <mtu size='9000'/> element is dropped from the guest's xml.

# ps aux | grep qemu-kvm
....-device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:64:1e:24,bus=pci.0,addr=0x3...

# virsh dumpxml rhel7 | grep mtu
#

4. For the running guest on the 7.3.z host, the address and I/O port size do not change

 # lspci -v -s 00:03.0
00:03.0 Ethernet controller: Red Hat, Inc Virtio network device
	Subsystem: Red Hat, Inc Device 0001
	Physical Slot: 3
	Flags: bus master, fast devsel, latency 0, IRQ 11
	I/O ports at c000 [size=64] 

                      ^^^^^^^^^ 

	Memory at fc056000 (32-bit, non-prefetchable) [size=4K]
	Memory at febf0000 (64-bit, prefetchable) [size=16K]
	Expansion ROM at fc000000 [disabled] [size=256K]
	Capabilities: [98] MSI-X: Enable+ Count=3 Masked-
	Capabilities: [84] Vendor Specific Information: VirtIO: <unknown>
	Capabilities: [70] Vendor Specific Information: VirtIO: Notify
	Capabilities: [60] Vendor Specific Information: VirtIO: DeviceCfg
	Capabilities: [50] Vendor Specific Information: VirtIO: ISR
	Capabilities: [40] Vendor Specific Information: VirtIO: CommonCfg
	Kernel driver in use: virtio-pci
	Kernel modules: virtio_pci

5. Destroy and start the guest on the target host; the address and I/O port size change back.

# lspci -v -s 00:03.0
00:03.0 Ethernet controller: Red Hat, Inc Virtio network device
	Subsystem: Red Hat, Inc Device 0001
	Physical Slot: 3
	Flags: bus master, fast devsel, latency 0, IRQ 11
	I/O ports at c060 [size=32]

                          ^^^^

	Memory at fc056000 (32-bit, non-prefetchable) [size=4K]
	Memory at febf0000 (64-bit, prefetchable) [size=16K]
	Expansion ROM at fc000000 [disabled] [size=256K]
	Capabilities: [98] MSI-X: Enable+ Count=3 Masked-
	Capabilities: [84] Vendor Specific Information: VirtIO: <unknown>
	Capabilities: [70] Vendor Specific Information: VirtIO: Notify
	Capabilities: [60] Vendor Specific Information: VirtIO: DeviceCfg
	Capabilities: [50] Vendor Specific Information: VirtIO: ISR
	Capabilities: [40] Vendor Specific Information: VirtIO: CommonCfg
	Kernel driver in use: virtio-pci
	Kernel modules: virtio_pci

Comment 14 yalzhang@redhat.com 2017-06-04 03:10:42 UTC
Scenario 3:
Set the mtu on a network or bridge; the tap device will inherit the mtu setting, but the guest will not get it automatically.

1. set <mtu size='9000'/> in network default

2. start a guest with 1 interface connected to virbr0; the tap device will get mtu=9000, yet the guest will not get host_mtu=9000 automatically on the qemu command line.

# brctl show
bridge name	bridge id		STP enabled	interfaces
virbr0		8000.5254009cc162	yes		virbr0-nic
							vnet0
# ifconfig vnet0
vnet0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 9000
...

# ps aux | grep rhel7
...
-device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:64:1e:24,bus=pci.0,addr=0x3


Hi Laine, could you please help to check if the 3 scenarios in #c12, #c13 and #c14 are expected?

For #c6, will the "reimplementing it as an opt-in feature later" be addressed in rhel7.4?

Comment 15 Laine Stump 2017-06-07 16:28:41 UTC
Yes, the 3 test scenarios produce expected results. (Note that scenario 2 shows that migration would fail if you specified an mtu for the device and then migrated to a host whose libvirt or qemu didn't support host_mtu. That's why it needs to be "opt-in" - the admin shouldn't set it until all the nodes in the cluster have the proper libvirt+qemu versions.)

As for "reimplementing it as an opt-in feature", it *is* now opt-in, with the additional restriction that you must specify the exact mtu for each device. In a future release (not 7.4) you will be able to just add:

     <mtu auto='on'/>

(or something similar), and libvirt will detect the MTU from the network (when possible) and propagate it to host_mtu on the guest.
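The "detect the MTU from the network" step could plausibly read the bridge MTU from sysfs on the host. A sketch under that assumption (virbr0 is the usual libvirt default bridge; lo is used below only because it exists on every Linux machine):

```shell
# Read a device's MTU from sysfs -- the value that would be propagated
# into host_mtu for guests attached to that bridge (e.g. virbr0).
dev_mtu() { cat "/sys/class/net/$1/mtu"; }

dev_mtu lo    # loopback, present everywhere; a bridge name like virbr0 works the same
```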

Comment 16 yalzhang@redhat.com 2017-06-08 01:27:00 UTC
Thank you Laine, set this bug to be verified.

Comment 18 errata-xmlrpc 2017-08-02 00:08:25 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2017:1846


