Bug 1449346
| Summary: | Addition of host_mtu=XXXX changes PCI ioport size & addresses for virtio-net device | ||
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 7 | Reporter: | Daniel Berrangé <berrange> |
| Component: | libvirt | Assignee: | Laine Stump <laine> |
| Status: | CLOSED ERRATA | QA Contact: | yalzhang <yalzhang> |
| Severity: | unspecified | Docs Contact: | |
| Priority: | unspecified | ||
| Version: | 7.4 | CC: | dgilbert, dyuan, dzheng, gsun, hhan, kchamart, laine, maxime.coquelin, mst, rbalakri, xuzhang, zpeng |
| Target Milestone: | rc | Keywords: | Regression |
| Target Release: | --- | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | libvirt-3.2.0-6.el7 | Doc Type: | If docs needed, set a value |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2017-08-02 00:08:25 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
| Bug Depends On: | |||
| Bug Blocks: | 1376765, 1399515, 1436046 | ||
Michael:
How is virtio feature negotiation supposed to work and keep compatibility?
It looks to me like:
a) The ioport BAR is dependent on the config size
b) The config size is dependent on the MAX of the enabled features (virtio_net_config_size)
c) To me it looks like either 'max_virtqueue_pairs' or 'mtu' takes it over the 32-byte limit
d) Aren't those features dependent on the capabilities of the backend?
e) So a change in backend can change a feature availability and change the guest visible ioport BAR size on an existing machine type?
Dave
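To make points (a) to (c) concrete, here is a minimal C sketch of how the legacy virtio-pci I/O BAR size could follow from the offered feature bits. It assumes the legacy layout as found in QEMU 2.9: a 24-byte common header when MSI-X is enabled (20 without), the device config sized to the end of the furthest field whose feature bit is offered, and the total rounded up to a power of two. The type and function names (`net_config_sketch`, `config_size`, `legacy_io_bar_size`) are hypothetical, not QEMU's.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Leading fields of the virtio-net device config (virtio spec layout). */
struct net_config_sketch {
    uint8_t  mac[6];              /* VIRTIO_NET_F_MAC    */
    uint16_t status;              /* VIRTIO_NET_F_STATUS */
    uint16_t max_virtqueue_pairs; /* VIRTIO_NET_F_MQ     */
    uint16_t mtu;                 /* VIRTIO_NET_F_MTU    */
};

#define END_OF(type, field) (offsetof(type, field) + sizeof(((type *)0)->field))

/* Feature bit numbers from the virtio-net spec. */
enum { F_MTU = 3, F_MAC = 5, F_STATUS = 16, F_MQ = 22 };

struct feature_end { unsigned bit; size_t end; };

static const struct feature_end feature_ends[] = {
    { F_MAC,    END_OF(struct net_config_sketch, mac) },                 /* 6  */
    { F_STATUS, END_OF(struct net_config_sketch, status) },              /* 8  */
    { F_MQ,     END_OF(struct net_config_sketch, max_virtqueue_pairs) }, /* 10 */
    { F_MTU,    END_OF(struct net_config_sketch, mtu) },                 /* 12 */
};

/* Config size = end offset of the furthest field whose feature is offered. */
static size_t config_size(uint64_t host_features)
{
    size_t size = 0;
    for (size_t i = 0; i < sizeof feature_ends / sizeof feature_ends[0]; i++)
        if (((host_features >> feature_ends[i].bit) & 1u) && feature_ends[i].end > size)
            size = feature_ends[i].end;
    return size;
}

/* Smallest power of two >= n. */
static uint32_t pow2ceil32(uint32_t n)
{
    assert(n > 0);
    uint32_t p = 1;
    while (p < n)
        p <<= 1;
    return p;
}

/* Assumed legacy BAR size: common header + device config, rounded up. */
static uint32_t legacy_io_bar_size(uint64_t host_features, int msix)
{
    uint32_t header = msix ? 24 : 20;
    return pow2ceil32(header + (uint32_t)config_size(host_features));
}
```

With only MAC and STATUS offered, 24 + 8 = 32 bytes fits exactly; offering MTU (config end 12) or MQ (config end 10) gives 36 or 34 bytes, which rounds up to 64. That is why simply adding host_mtu doubles the guest-visible BAR under this model.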
I'm tending to think that libvirt is at fault here, even if the ioport size did not change. Setting host_mtu=NNN exposes a new feature to the guest OS, regardless of the size change, and as such should always have been opt-in. As it is today, by using host_mtu=NNN, libvirt can break backwards migration: i.e. if we run a VM with the 2.8 machine type on a QEMU 2.9 host and then migrate it to a QEMU 2.8 host, we're going to silently lose the host_mtu setting, which is a guest ABI change. So IMHO we need to revert this host_mtu addition in libvirt and its stable branches, and require an XML-triggered opt-in for reporting it.
My understanding of host_mtu (which is obviously flawed, since simply adding it to the command line causes the ioport size to change, and I definitely hadn't previously heard *that*) was that the value is only read by the guest driver when it's initialized (i.e. at system boot time), so it wasn't supposed to cause any operational change during a migration. Of course, thinking back, even *that* line of reasoning is flawed, since it depends on the behavior of the current virtio-net driver, which isn't guaranteed.
So just how far do we need to go in making this opt-in? I seriously dislike the idea of requiring the exact MTU explicitly in every individual domain config; this would be very cumbersome if someone wanted to change the MTU for the network. Maybe we could make it opt-in with <mtu auto='on'/>, which would still allow a change in the MTU of the network/bridge to take effect on guests as they are started.
On the other hand, if the MTUs of the source and destination networks differ, then the value of host_mtu provided to the guest, although informational only, would still change during migration. BUT, on the *other* other hand, if we hard-code the MTU in each guest config, then as we migrate, either the MTU of the bridge on the destination will change (bridges always adopt the smallest MTU of all connected devices), or the MTU of the guest's tap device will change (and be inconsistent with the MTU set in the guest itself).
So how about this - a domain can be configured with one of these:
<mtu auto='on'/>
<mtu size='n'/>
If either is specified and host_mtu is supported, it would be set in the qemu commandline. If auto='on' is specified, then the domain state XML would also report the actual MTU in use:
<mtu auto='on' size='9000'/>
and this is also what would be sent with the migration, thus assuring that the MTU supplied to the guest on the destination (and set for the new tap device on the destination) is the same regardless of the MTU of the network. In the meantime, since only auto='on' is in the persistent config, any time the guest is cold-started, it will set the optimum MTU based on the network on the current host (so a change in MTU of a network will properly propagate to *newly started* guests, but those already running will be unaffected).
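A minimal sketch of the decision logic in this proposal, written in C with hypothetical types and names (`struct mtu_config`, `choose_host_mtu`); it is not libvirt code. One assumption goes beyond the proposal: if both attributes are present, the explicit size wins. That assumption is what makes the migration property fall out, since the live XML carries auto='on' plus the resolved size.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the proposed <mtu> element (not libvirt's types). */
struct mtu_config {
    bool auto_on;  /* <mtu auto='on'/>           */
    unsigned size; /* <mtu size='n'/>; 0 = unset */
};

/*
 * Returns the host_mtu value for the QEMU command line (0 = omit it) and
 * fills *live with what the domain state XML (and migration data) carries.
 * Assumption: an explicit size takes precedence over auto='on'.
 */
static unsigned choose_host_mtu(struct mtu_config persistent,
                                bool qemu_has_host_mtu,
                                unsigned network_mtu,
                                struct mtu_config *live)
{
    *live = persistent;
    if (!qemu_has_host_mtu)
        return 0;                 /* QEMU can't take host_mtu at all */
    if (persistent.size)
        return persistent.size;   /* explicit (or already resolved) size */
    if (persistent.auto_on && network_mtu) {
        live->size = network_mtu; /* frozen into live state for migration */
        return network_mtu;
    }
    return 0;                     /* not opted in: no host_mtu */
}
```

Under this model, a guest cold-started with auto='on' on a 9000-byte network gets live state auto='on' size='9000'; if that live state is what arrives on a destination whose network uses 1500, the guest still gets host_mtu=9000, while the next cold start re-reads the network.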
I posted a patch to revert the original offending patch upstream (I'll get to reimplementing it as an opt-in feature later): https://www.redhat.com/archives/libvir-list/2017-May/msg00786.html
Pushed upstream:
commit 77780a29edace958a1f931d3281b962be4f5290e
Author: Laine Stump <laine>
Date: Thu May 18 14:16:27 2017 -0400
Revert "qemu: propagate bridge MTU into qemu "host_mtu" option"
This reverts commit 2841e675.
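Why does a BAR size change make migration fail outright, rather than just moving addresses? A sketch, assuming QEMU's PCI config-space load compares each incoming byte with the destination device's byte, ignoring bits that are guest-writable (wmask) or write-1-to-clear (w1cmask), and failing if any remaining bit under cmask differs. For the low byte of an I/O BAR, the writable bits are the address bits at or above log2(size). Take the failure "Bad config data: i=0x10 read: 61 device: 1 cmask: ff wmask: c0" seen when migrating into a 7.4 host: 0x61 is the low byte of c060 plus the I/O-space bit, and bit 5 (0x20) is writable on a 32-byte BAR but hard-wired to zero on a 64-byte one. Function names here are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed per-byte check: bits compared (cmask) that are neither
 * guest-writable (wmask) nor write-1-to-clear (w1cmask) must match. */
static bool config_byte_ok(uint8_t incoming, uint8_t device,
                           uint8_t cmask, uint8_t wmask, uint8_t w1cmask)
{
    return ((incoming ^ device) & cmask & ~wmask & ~w1cmask) == 0;
}

/* Writable mask of the low address byte of an I/O BAR of `size` bytes. */
static uint8_t io_bar_low_wmask(uint32_t size)
{
    assert(size >= 4 && (size & (size - 1)) == 0); /* power of two */
    return (uint8_t)~(size - 1);
}
```

The same check lets a BAR at c000 (low byte 0x01) load into a 32-byte BAR, because no address bit below 0x20 is set; the failure only appears when the source address uses a bit the destination's larger BAR hard-wires to zero.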
Just noting it here for posterity, and in case someone hits it: *IF* you're not using at least libvirt-3.2.0-6.el7 on the destination RHEL 7.4 host, live migration will fail with the following error messages:
-----------------------------------------------------------------------
2017-05-26 10:24:45.631+0000: 8790: error : virNetClientProgramDispatchError:177 : internal error: qemu unexpectedly closed the monitor: 2017-05-26T10:24:45.180184Z qemu-kvm: -chardev pty,id=charserial1: char device redirected to /dev/pts/2 (label charserial1)
2017-05-26T10:24:45.219273Z qemu-kvm: warning: Unknown firmware file in legacy mode: etc/msr_feature_control
2017-05-26T10:24:45.430363Z qemu-kvm: get_pci_config_device: Bad config data: i=0x10 read: 61 device: 1 cmask: ff wmask: c0 w1cmask:0
2017-05-26T10:24:45.430399Z qemu-kvm: Failed to load PCIDevice:config
2017-05-26T10:24:45.430403Z qemu-kvm: Failed to load virtio-net:virtio
2017-05-26T10:24:45.430409Z qemu-kvm: error while loading state for instance 0x0 of device '0000:00:03.0/virtio-net'
2017-05-26T10:24:45.430588Z qemu-kvm: load of migration failed: Invalid argument
-----------------------------------------------------------------------
I can confirm from testing that once you use libvirt-3.2.0-6.el7, the above error goes away and migration succeeds.
I confirm this works with libvirt-3.2.0-6.el7 on PPC now. BTW, this problem happens on PPC with libvirt-3.2.0-4.el7.
Source host [7.3]: qemu-kvm-rhev-2.6.0-28.el7_3.10.ppc64le libvirt-2.0.0-10.virtcov.el7_3.9.ppc64le kernel-3.10.0-514.21.1.el7.ppc64le RHEL-7.3-20161019.0
Target host [7.4]: libvirt-3.2.0-6.el7.ppc64le qemu-kvm-rhev-2.9.0-6.el7.ppc64le kernel-3.10.0-700.el7.ppc64le RHEL-7.4-20170504.0
# virsh migrate avocado-vt-vm1 --live --verbose qemu+ssh://<target_host>/system
Migration: [100 %]
Scenario 1: Reproduce the bug, then update libvirt and check that the PCI ioport size and addresses of the virtio-net device no longer change.
1. Reproduce it on libvirt-3.2.0-5.el7.x86_64 and qemu-kvm-rhev-2.8.0-3.el7.x86_64:
# lspci -v -s 00:03.0
00:03.0 Ethernet controller: Red Hat, Inc Virtio network device
Subsystem: Red Hat, Inc Device 0001
Physical Slot: 3
Flags: bus master, fast devsel, latency 0, IRQ 11
I/O ports at c0a0 [size=32]
Memory at fc056000 (32-bit, non-prefetchable) [size=4K]
Memory at febf0000 (64-bit, prefetchable) [size=16K]
Expansion ROM at fc000000 [disabled] [size=256K]
Capabilities: [98] MSI-X: Enable+ Count=3 Masked-
Capabilities: [84] Vendor Specific Information: VirtIO: <unknown>
Capabilities: [70] Vendor Specific Information: VirtIO: Notify
Capabilities: [60] Vendor Specific Information: VirtIO: DeviceCfg
Capabilities: [50] Vendor Specific Information: VirtIO: ISR
Capabilities: [40] Vendor Specific Information: VirtIO: CommonCfg
Kernel driver in use: virtio-pci
Kernel modules: virtio_pci
2. Update to qemu-kvm-rhev-2.9.0-7.el7.x86_64.
On host:
# ps aux | grep qemu-kvm
qemu 18317 21.9 5.3 1934444 397524 ? Sl 16:21 0:20 /usr/libexec/qemu-kvm -name guest=rhel7,.... -netdev tap,fd=26,id=hostnet0,vhost=on,vhostfd=28 -device virtio-net-pci,**host_mtu=1500**,netdev=hostnet0,id=net0,mac=52:54:00:64:1e:24,bus=pci.0,addr=0x3......
<=== "host_mtu=1500" is added automatically even though there is no mtu setting in the guest's XML
On guest:
# lspci -v -s 00:03.0
00:03.0 Ethernet controller: Red Hat, Inc Virtio network device
Subsystem: Red Hat, Inc Device 0001
Physical Slot: 3
Flags: bus master, fast devsel, latency 0, IRQ 11
I/O ports at c000 [size=64] <===
Memory at fc056000 (32-bit, non-prefetchable) [size=4K]
Memory at febf0000 (64-bit, prefetchable) [size=16K]
Expansion ROM at fc000000 [disabled] [size=256K]
Capabilities: [98] MSI-X: Enable+ Count=3 Masked-
Capabilities: [84] Vendor Specific Information: VirtIO: <unknown>
Capabilities: [70] Vendor Specific Information: VirtIO: Notify
Capabilities: [60] Vendor Specific Information: VirtIO: DeviceCfg
Capabilities: [50] Vendor Specific Information: VirtIO: ISR
Capabilities: [40] Vendor Specific Information: VirtIO: CommonCfg
Kernel driver in use: virtio-pci
Kernel modules: virtio_pci
3. Update libvirt to the latest libvirt-3.2.0-7.el7:
# rpm -q libvirt qemu-kvm-rhev
libvirt-3.2.0-7.el7.x86_64
qemu-kvm-rhev-2.9.0-7.el7.x86_64
# virsh destroy rhel7
Domain rhel7 destroyed
# systemctl restart libvirtd.service
# virsh start rhel7
Domain rhel7 started
# ps aux | grep qemu-kvm
....-netdev tap,fd=26,id=hostnet0,vhost=on,vhostfd=28 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:64:1e:24,bus=pci.0,addr=0x3....
<==== no "host_mtu"
On guest:
# lspci -v -s 00:03.0
00:03.0 Ethernet controller: Red Hat, Inc Virtio network device
Subsystem: Red Hat, Inc Device 0001
Physical Slot: 3
Flags: bus master, fast devsel, latency 0, IRQ 11
I/O ports at c0a0 [size=32] <==== fixed
Memory at fc056000 (32-bit, non-prefetchable) [size=4K]
Memory at febf0000 (64-bit, prefetchable) [size=16K]
Expansion ROM at fc000000 [disabled] [size=256K]
Capabilities: [98] MSI-X: Enable+ Count=3 Masked-
Capabilities: [84] Vendor Specific Information: VirtIO: <unknown>
Capabilities: [70] Vendor Specific Information: VirtIO: Notify
Capabilities: [60] Vendor Specific Information: VirtIO: DeviceCfg
Capabilities: [50] Vendor Specific Information: VirtIO: ISR
Capabilities: [40] Vendor Specific Information: VirtIO: CommonCfg
Kernel driver in use: virtio-pci
Kernel modules: virtio_pci
Scenario 2: Set <mtu size='9000'/> in the guest's XML with the latest qemu-kvm-rhev and libvirt; the I/O port size is 64. Then migrate the guest to a RHEL 7.3.z host. The host_mtu setting in the qemu command line and in the guest's XML will be ignored. For the running guest, the I/O port size and address stay the same, since they are initialized during boot.
src host: rhel7.4
libvirt-3.2.0-7.el7.x86_64
qemu-kvm-rhev-2.9.0-7.el7.x86_64
dst host: rhel7.3.z
libvirt-2.0.0-10.el7_3.9.x86_64
qemu-kvm-rhev-2.6.0-28.el7_3.9.x86_64
1. on src host:
# virsh dumpxml rhel7 | grep /interface -B3
<mtu size='9000'/>
....
</interface>
# ps aux | grep qemu-kvm
.... -device virtio-net-pci,host_mtu=9000,netdev=hostnet0,id=net0,mac=52:54:00:64:1e:24,bus=pci.0....
on guest:
# lspci -v -s 00:03.0
00:03.0 Ethernet controller: Red Hat, Inc Virtio network device
Subsystem: Red Hat, Inc Device 0001
Physical Slot: 3
Flags: bus master, fast devsel, latency 0, IRQ 11
I/O ports at c000 [size=64]
^^^^^^^^^^
Memory at fc056000 (32-bit, non-prefetchable) [size=4K]
Memory at febf0000 (64-bit, prefetchable) [size=16K]
Expansion ROM at fc000000 [disabled] [size=256K]
Capabilities: [98] MSI-X: Enable+ Count=3 Masked-
Capabilities: [84] Vendor Specific Information: VirtIO: <unknown>
Capabilities: [70] Vendor Specific Information: VirtIO: Notify
Capabilities: [60] Vendor Specific Information: VirtIO: DeviceCfg
Capabilities: [50] Vendor Specific Information: VirtIO: ISR
Capabilities: [40] Vendor Specific Information: VirtIO: CommonCfg
Kernel driver in use: virtio-pci
Kernel modules: virtio_pci
2. migrate to rhel7.3.z
# virsh migrate rhel7 --live --persistent qemu+ssh://server/system --verbose
root@server's password:
Migration: [100 %]
3. After live migration, check the qemu command line on the rhel7.3.z target host: the host_mtu option is ignored, and <mtu size='9000'/> is dropped from the guest's XML.
# ps aux | grep qemu-kvm
....-device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:64:1e:24,bus=pci.0,addr=0x3...
# virsh dumpxml rhel7 | grep mtu
#
4. For the running guest on the 7.3.z host, the address and I/O port size do not change:
# lspci -v -s 00:03.0
00:03.0 Ethernet controller: Red Hat, Inc Virtio network device
Subsystem: Red Hat, Inc Device 0001
Physical Slot: 3
Flags: bus master, fast devsel, latency 0, IRQ 11
I/O ports at c000 [size=64]
^^^^^^^^^
Memory at fc056000 (32-bit, non-prefetchable) [size=4K]
Memory at febf0000 (64-bit, prefetchable) [size=16K]
Expansion ROM at fc000000 [disabled] [size=256K]
Capabilities: [98] MSI-X: Enable+ Count=3 Masked-
Capabilities: [84] Vendor Specific Information: VirtIO: <unknown>
Capabilities: [70] Vendor Specific Information: VirtIO: Notify
Capabilities: [60] Vendor Specific Information: VirtIO: DeviceCfg
Capabilities: [50] Vendor Specific Information: VirtIO: ISR
Capabilities: [40] Vendor Specific Information: VirtIO: CommonCfg
Kernel driver in use: virtio-pci
Kernel modules: virtio_pci
5. Destroy and start the guest on the target host; the address and I/O port size change back:
# lspci -v -s 00:03.0
00:03.0 Ethernet controller: Red Hat, Inc Virtio network device
Subsystem: Red Hat, Inc Device 0001
Physical Slot: 3
Flags: bus master, fast devsel, latency 0, IRQ 11
I/O ports at c060 [size=32]
^^^^
Memory at fc056000 (32-bit, non-prefetchable) [size=4K]
Memory at febf0000 (64-bit, prefetchable) [size=16K]
Expansion ROM at fc000000 [disabled] [size=256K]
Capabilities: [98] MSI-X: Enable+ Count=3 Masked-
Capabilities: [84] Vendor Specific Information: VirtIO: <unknown>
Capabilities: [70] Vendor Specific Information: VirtIO: Notify
Capabilities: [60] Vendor Specific Information: VirtIO: DeviceCfg
Capabilities: [50] Vendor Specific Information: VirtIO: ISR
Capabilities: [40] Vendor Specific Information: VirtIO: CommonCfg
Kernel driver in use: virtio-pci
Kernel modules: virtio_pci
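Each of these checks comes down to comparing the "I/O ports at ADDR [size=N]" line of `lspci -v` in the guest before and after a change. If you want to automate that comparison, here is a minimal C sketch of a parser for that one line; the helper name `parse_io_ports_line` is hypothetical and it only handles the format shown in the outputs above.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Extracts the address (hex) and size (bytes) from an lspci -v line such as
 * "I/O ports at c0a0 [size=32]". Trailing text (e.g. "<===") is ignored.
 * Returns 0 on success, -1 if the line is not an I/O ports line.
 */
static int parse_io_ports_line(const char *line, unsigned *addr, unsigned *size)
{
    const char *p = strstr(line, "I/O ports at ");
    if (p == NULL)
        return -1;
    if (sscanf(p, "I/O ports at %x [size=%u]", addr, size) != 2)
        return -1;
    return 0;
}
```

Feeding it the before and after lines of a scenario gives two (address, size) pairs; the guest ABI is unchanged only if both pairs match.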
Scenario 3: Set the MTU in the network or bridge. The target device will inherit the MTU setting, but the guest will not get the MTU setting automatically.
1. Set <mtu size='9000'/> in the network "default".
2. Start a guest with 1 interface connected to virbr0. The target device will get mtu=9000, yet the guest will not get host_mtu=9000 automatically in the qemu command line.
# brctl show
bridge name bridge id STP enabled interfaces
virbr0 8000.5254009cc162 yes virbr0-nic
                                vnet0
# ifconfig vnet0
vnet0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 9000
...
# ps aux | grep rhel7
... -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:64:1e:24,bus=pci.0,addr=0x3
Hi Laine, could you please help check whether the 3 scenarios in #c12, #c13 and #c14 are expected? Regarding #c6, will the "reimplementing it as an opt-in feature later" be addressed in RHEL 7.4?
Yes, the 3 test scenarios produce expected results. (Note that scenario 2 shows that migration would fail if you specified an MTU for the device and then migrated to a host whose libvirt or qemu didn't support host_mtu. That's why it needs to be "opt-in": the admin shouldn't set it until all the nodes in the cluster have the proper libvirt+qemu versions.)
As for "reimplementing it as an opt-in feature", it *is* now opt-in, with the additional restriction that you must specify the exact MTU for each device. In a future release (not 7.4) you will be able to just add:
<mtu auto='on'/>
(or something similar), and libvirt will detect the MTU from the network (when possible) and propagate it to host_mtu on the guest.
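On a Linux host, one way to "detect the MTU from the network" for a bridge-backed interface is to read the bridge's MTU from sysfs at /sys/class/net/<bridge>/mtu. The C sketch below does only that; it is an illustration under that assumption, not libvirt's implementation (which may use a different mechanism, such as an ioctl), and both helper names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/* Reads a positive MTU from a sysfs-style file containing one integer.
 * Returns the MTU, or -1 if the file is missing or malformed. */
static long read_mtu_file(const char *path)
{
    long mtu = -1;
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;
    if (fscanf(f, "%ld", &mtu) != 1 || mtu <= 0)
        mtu = -1;
    fclose(f);
    return mtu;
}

/* Hypothetical helper: MTU of a host bridge (or any netdev) by name,
 * e.g. bridge_mtu("virbr0"). Returns -1 if it can't be determined. */
static long bridge_mtu(const char *ifname)
{
    char path[256];
    int n = snprintf(path, sizeof path, "/sys/class/net/%s/mtu", ifname);
    if (n < 0 || (size_t)n >= sizeof path)
        return -1;
    return read_mtu_file(path);
}
```

Returning -1 rather than a default keeps "when possible" explicit: a caller can skip host_mtu entirely when the MTU can't be read.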
Thank you, Laine. Setting this bug to VERIFIED.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2017:1846
Description of problem:
Take a guest running QEMU < 2.9.0 with a virtio-net device. Running lspci -v in the guest shows an ioport size of 32:
00:03.0 Ethernet controller: Red Hat, Inc Virtio network device
Subsystem: Red Hat, Inc Device 0001
Physical Slot: 3
Flags: bus master, fast devsel, latency 0, IRQ 11
I/O ports at c060 [size=32]
^^^^^^^^
Memory at fc056000 (32-bit, non-prefetchable) [size=4K]
Expansion ROM at fc000000 [disabled] [size=256K]
Capabilities: <access denied>
Kernel driver in use: virtio-pci
Kernel modules: virtio_pci
Now upgrade to QEMU 2.9.0, *keeping* the machine type unchanged. Boot the guest and run lspci -v again. The virtio-net device ioport size has changed to 64:
00:03.0 Ethernet controller: Red Hat, Inc Virtio network device
Subsystem: Red Hat, Inc Device 0001
Physical Slot: 3
Flags: bus master, fast devsel, latency 0, IRQ 11
I/O ports at c000 [size=64]
^^^^^^^^
Memory at fc056000 (32-bit, non-prefetchable) [size=4K]
Expansion ROM at fc000000 [disabled] [size=256K]
Capabilities: <access denied>
Kernel driver in use: virtio-pci
Kernel modules: virtio_pci
Not only that, but the I/O addresses of every device change as a fallout from the size change.
git bisect shows that the trigger is the new host_mtu attribute supported by QEMU:
commit a93e599d4a04c3cf7edcf5a24f3397e27431c027
Author: Maxime Coquelin <maxime.coquelin>
Date: Sat Dec 10 16:30:38 2016 +0100
virtio-net: Add MTU feature support
This patch allows advising guest with host MTU's by setting host_mtu parameter. If VIRTIO_NET_F_MTU has been successfully negotiated, MTU value is passed to the backend.
Cc: Michael S. Tsirkin <mst>
Cc: Aaron Conole <aconole
Signed-off-by: Maxime Coquelin <maxime.coquelin>
Reviewed-by: Michael S. Tsirkin <mst>
Signed-off-by: Michael S. Tsirkin <mst>
Enabling the host_mtu attribute causes the ioport size to change for virtio-net.
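Why would one BAR growing from 32 to 64 bytes move other devices' I/O addresses too? A simplified C model, assuming firmware that packs I/O BARs largest-first from a base address, each naturally aligned (SeaBIOS allocates roughly along these lines, but treat the exact ordering as an assumption). The device list in the usage below is hypothetical, not taken from the guest in this report.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

struct io_bar {
    int slot;      /* PCI slot, used as a stable tie-breaker */
    uint32_t size; /* power of two, >= 4 */
    uint32_t addr; /* filled in by assign_io_bars() */
};

/* Larger BARs first; equal sizes keep slot order. */
static int by_size_desc(const void *a, const void *b)
{
    const struct io_bar *x = a, *y = b;
    if (x->size != y->size)
        return x->size > y->size ? -1 : 1;
    return (x->slot > y->slot) - (x->slot < y->slot);
}

/* Simplified model: sorts bars in place, then packs them from `base`,
 * aligning each to its own size. */
static void assign_io_bars(struct io_bar *bars, size_t n, uint32_t base)
{
    uint32_t next = base;
    qsort(bars, n, sizeof *bars, by_size_desc);
    for (size_t i = 0; i < n; i++) {
        uint32_t align = bars[i].size;
        next = (next + align - 1) & ~(align - 1);
        bars[i].addr = next;
        next += bars[i].size;
    }
}

/* Looks up the assigned address for a slot (0 if absent). */
static uint32_t addr_of(const struct io_bar *bars, size_t n, int slot)
{
    for (size_t i = 0; i < n; i++)
        if (bars[i].slot == slot)
            return bars[i].addr;
    return 0;
}
```

With hypothetical BARs of 16 (slot 1), 32 (slot 3, virtio-net) and 64 bytes (slot 5) from 0xc000, slot 3 lands at c040; growing slot 3 to 64 bytes moves it to c000 and pushes slot 5 to c040 and slot 1 from c060 to c080. Real firmware may order things differently, but any size-sorted packing has this cascading effect.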
libvirt detects this new host_mtu feature in QEMU and unconditionally enables it for all guests, due to:
commit 2841e6756d5807a4119e004bc5fb8e7d70806458
Author: Laine Stump <laine>
Date: Fri Feb 3 11:55:20 2017 -0500
qemu: propagate bridge MTU into qemu "host_mtu" option
libvirt was able to set the host_mtu option when an MTU was explicitly given in the interface config (with <mtu size='n'/>), set the MTU of a libvirt network in the network config (with the same named subelement), and would automatically set the MTU of any tap device to the MTU of the network.
This patch ties that all together (for networks based on tap devices and either Linux host bridges or OVS bridges) by learning the MTU of the network (i.e. the bridge) during qemuInterfaceBridgeConnect(), and returning that value so that it can then be passed to qemuBuildNicDevStr(); qemuBuildNicDevStr() then sets host_mtu in the interface's commandline options.
The result is that a higher MTU for all guests connecting to a particular network will be plumbed top to bottom by simply changing the MTU of the network (in libvirt's config for libvirt-managed networks, or directly on the bridge device for simple host bridges or OVS bridges managed outside of libvirt).
One question I have about this - it occurred to me that in the case of migrating a guest from a host with an older libvirt to one with a newer libvirt, the guest may have *not* had the host_mtu option on the older machine, but *will* have it on the newer machine. I'm curious if this could lead to incompatibilities between source and destination (I guess it all depends on whether or not the setting of host_mtu has a practical effect on a guest that is already running - Maxime?)
Likewise, we could run into problems when migrating from a newer libvirt to older libvirt - The guest would have been told of the higher MTU on the newer libvirt, then migrated to a host that didn't understand <mtu size='blah'/>.
(If this really is a problem, it would be a problem with or without the current patch).
This change in guest ABI will break migration compatibility between old and new QEMU, despite the machine type being unchanged.
Version-Release number of selected component (if applicable):
libvirt 3.2.0
qemu 2.9.0
How reproducible:
Always
Steps to Reproduce:
1. Install libvirt 3.2.0
2. Provision a guest with QEMU 2.8.0 (or older) and a virtio-net device present
3. Run 'lspci -v' in the guest
4. Upgrade QEMU to 2.9.0
5. Run 'lspci -v' in the guest again
Actual results:
The virtio-net ioport size changes from 32 to 64
Expected results:
No guest ABI change
Additional info: