Bug 1623157
| Summary: | Domain ABI stability check must forbid host MTU changes on NICs | |||
|---|---|---|---|---|
| Product: | Red Hat Enterprise Linux 7 | Reporter: | Daniel Berrangé <berrange> | |
| Component: | libvirt | Assignee: | Michal Privoznik <mprivozn> | |
| Status: | CLOSED ERRATA | QA Contact: | Luyao Huang <lhuang> | |
| Severity: | high | Docs Contact: | ||
| Priority: | high | |||
| Version: | 7.6 | CC: | dyuan, fjin, lmen, mtessun, stephenfin, xuzhang, yalzhang | |
| Target Milestone: | rc | |||
| Target Release: | --- | |||
| Hardware: | Unspecified | |||
| OS: | Unspecified | |||
| Whiteboard: | ||||
| Fixed In Version: | libvirt-4.5.0-9.el7 | Doc Type: | If docs needed, set a value | |
| Doc Text: | Story Points: | --- | ||
| Clone Of: | ||||
| : | 1623158 | Environment: | ||
| Last Closed: | 2018-10-30 09:58:28 UTC | Type: | Bug | |
| Regression: | --- | Mount Type: | --- | |
| Documentation: | --- | CRM: | ||
| Verified Versions: | Category: | --- | ||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
| Cloudforms Team: | --- | Target Upstream Version: | ||
| Embargoed: | ||||
| Bug Depends On: | ||||
| Bug Blocks: | 1623158 | |||
Patch posted upstream: https://www.redhat.com/archives/libvir-list/2018-August/msg01873.html

Verified this bug with libvirt-4.5.0-9.el7.x86_64:
1. Prepare two hosts for migration.
2. Start a guest that has a vNIC with an MTU set:
# virsh dumpxml vm1
...
    <interface type='network'>
      <mac address='52:54:00:18:0b:27'/>
      <source network='default' bridge='virbr0'/>
      <target dev='vnet0'/>
      <model type='rtl8139'/>
      <mtu size='5000'/>
      <alias name='net0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/>
    </interface>
...
3. Dump the guest's migratable XML and change the vNIC's MTU value from 5000 to 5500:
# virsh dumpxml vm1 --migratable > /tmp/mig.xml
# sed "s/<mtu size='5000'/<mtu size='5500'/g" /tmp/mig.xml > /tmp/mig-new.xml
4. Migrate the guest to the target host with the updated XML; the migration is now rejected:
# virsh migrate vm1 qemu+ssh://target/system --live --unsafe --xml /tmp/mig-new.xml
error: unsupported configuration: Target network card MTU 5500 does not match source 5000
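
The comparison exercised by the steps above can be sketched outside libvirt. The following is only an illustration in Python, not libvirt's implementation (libvirt is written in C): the function names `nic_mtus` and `check_mtu_abi` are hypothetical, NICs are paired purely by their order in the XML, an unset MTU is reported as 0 by assumption, and the NIC-count message is invented for the sketch. Only the MTU mismatch message is modelled on the error shown in step 4.

```python
import xml.etree.ElementTree as ET


def nic_mtus(domain_xml):
    """Return the <mtu size='...'/> of each <interface>, or None when unset."""
    root = ET.fromstring(domain_xml)
    mtus = []
    for iface in root.findall("./devices/interface"):
        mtu = iface.find("mtu")
        mtus.append(int(mtu.get("size")) if mtu is not None else None)
    return mtus


def check_mtu_abi(src_xml, dst_xml):
    """Raise ValueError if any NIC's MTU differs between source and target."""
    src, dst = nic_mtus(src_xml), nic_mtus(dst_xml)
    if len(src) != len(dst):
        # Hypothetical message; not taken from libvirt.
        raise ValueError(
            "Target NIC count %d does not match source %d" % (len(dst), len(src)))
    for s, d in zip(src, dst):
        if s != d:
            # Assumption: an unset MTU is printed as 0.
            raise ValueError(
                "Target network card MTU %d does not match source %d"
                % (d or 0, s or 0))


if __name__ == "__main__":
    src = ("<domain><devices><interface type='network'>"
           "<mtu size='5000'/></interface></devices></domain>")
    dst = src.replace("size='5000'", "size='5500'")
    try:
        check_mtu_abi(src, dst)
    except ValueError as err:
        print("error:", err)
```

Running such a check against /tmp/mig.xml and /tmp/mig-new.xml before calling virsh migrate would flag the edited MTU without starting a migration.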
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2018:3113
Description of problem:

When the host MTU option is set for a QEMU NIC, it enables a new optional virtio-net feature. This in turn makes the PCI config space larger, which is a guest ABI visible change.

This has already required two changes in libvirt to protect users:

commit 77780a29edace958a1f931d3281b962be4f5290e
Author: Laine Stump <laine>
Date: Thu May 18 14:16:27 2017 -0400

Revert "qemu: propagate bridge MTU into qemu "host_mtu" option"

This reverts commit 2841e675.

It turns out that adding the host_mtu field to the PCI capabilities in the guest bumps the length of PCI capabilities beyond the 32 byte boundary, so the virtio-net device gets 64 bytes of ioport space instead of 32, which offsets the address of all the other following devices. Migration doesn't work very well when the location and length of PCI capabilities of devices is changed between source and destination.

This means that we need to make sure that the absence/presence of host_mtu on the qemu commandline always matches between source and destination, which means that we need to make setting of host_mtu an opt-in thing (it can't happen automatically when the bridge being used has a non-default MTU, which is what commit 2841e675 implemented).

I do want to re-implement this feature with an <mtu auto='on'/> setting, but probably won't backport that to any stable branches, so I'm first reverting the original commit, and that revert can be pushed to the few releases that have been made since the original (3.1.0 - 3.3.0)

Resolves: https://bugzilla.redhat.com/1449346

commit 5f44d7e357f61f7be636a0e2e6d35453cbc3b589
Author: Michal Privoznik <mprivozn>
Date: Thu Jun 8 13:45:31 2017 +0200

qemuDomainChangeNet: Forbid changing MTU

https://bugzilla.redhat.com/show_bug.cgi?id=1447618

Currently, any attempt to change MTU on an interface that is plugged to a running domain is silently ignored. We should either do what's asked or error out.
Well, we can update the host side of the interface, but we cannot change 'host_mtu' attribute for the virtio-net device. Therefore we have to error out.

Signed-off-by: Michal Privoznik <mprivozn>
Reviewed-by: Laine Stump <laine>

This failed to protect the live migration scenario. The user can provide new XML during migration which sets the host MTU where it was not set on the source. QEMU on the target host will then see a corrupt migration data stream due to the changed PCI config space.

This has been hit during development in OpenStack when they're trying to change network backends across migration.

source node

[root@devstack2 devstack]# cat /var/log/libvirt/qemu/instance-00000008.log
2018-08-21 15:05:11.938+0000: starting up libvirt version: 3.9.0, package: 14.el7_5.6 (CentOS BuildSystem <http://bugs.centos.org>, 2018-06-27-14:13:57, x86-01.bsys.centos.org), qemu version: 2.10.0(qemu-kvm-ev-2.10.0-21.el7_5.4.1), hostname: devstack2
LC_ALL=C PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin QEMU_AUDIO_DRV=none /usr/libexec/qemu-kvm -name guest=instance-00000008,debug-threads=on -S -object secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-2-instance-00000008/master-key.aes -machine pc-i440fx-rhel7.5.0,accel=kvm,usb=off,dump-guest-core=off -cpu host -m 256 -realtime mlock=off -smp 2,sockets=2,cores=1,threads=1 -object memory-backend-file,id=ram-node0,prealloc=yes,mem-path=/dev/hugepages/libvirt/qemu/2-instance-00000008,share=yes,size=268435456,host-nodes=0,policy=bind -numa node,nodeid=0,cpus=0-1,memdev=ram-node0 -uuid fead1ca6-beab-4c47-a73e-a3ab7f7c4de2 -smbios 'type=1,manufacturer=OpenStack Foundation,product=OpenStack Nova,version=18.0.0,serial=0ea36e80-498c-4cc8-9359-61947acad230,uuid=fead1ca6-beab-4c47-a73e-a3ab7f7c4de2,family=Virtual Machine' -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/domain-2-instance-00000008/monitor.sock,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc,driftfix=slew -global kvm-pit.lost_tick_policy=delay -no-hpet -no-shutdown -boot strict=on -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -drive file=/dev/sda,format=raw,if=none,id=drive-virtio-disk0,serial=54fe3713-14a0-46a8-a2d1-7dbc16ae7942,cache=none,aio=native -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 -netdev tap,fd=27,id=hostnet0,vhost=on,vhostfd=30 -device virtio-net-pci,host_mtu=1450,netdev=hostnet0,id=net0,mac=fa:16:3e:00:90:8d,bus=pci.0,addr=0x3 -add-fd set=2,fd=32 -chardev pty,id=charserial0,logfile=/dev/fdset/2,logappend=on -device isa-serial,chardev=charserial0,id=serial0 -vnc 0.0.0.0:0 -device cirrus-vga,id=video0,bus=pci.0,addr=0x2 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x5 -msg timestamp=on
2018-08-21 15:05:11.938+0000: Domain id=2 is tainted: host-cpu
2018-08-21T15:05:12.057301Z qemu-kvm: -chardev pty,id=charserial0,logfile=/dev/fdset/2,logappend=on: char device redirected to /dev/pts/1 (label charserial0)
2018-08-21T15:05:12.076514Z qemu-kvm: -drive file=/dev/sda,format=raw,if=none,id=drive-virtio-disk0,serial=54fe3713-14a0-46a8-a2d1-7dbc16ae7942,cache=none,aio=native: 'serial' is deprecated, please use the corresponding option of '-device' instead
2018-08-21 15:19:16.891+0000: initiating migration

dest

[root@devstack5 devstack]# cat /var/log/libvirt/qemu/instance-00000008.log
2018-08-21 15:19:15.062+0000: starting up libvirt version: 3.9.0, package: 14.el7_5.6 (CentOS BuildSystem <http://bugs.centos.org>, 2018-06-27-14:13:57, x86-01.bsys.centos.org), qemu version: 2.10.0(qemu-kvm-ev-2.10.0-21.el7_5.4.1), hostname: devstack5
LC_ALL=C PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin QEMU_AUDIO_DRV=none /usr/libexec/qemu-kvm -name guest=instance-00000008,debug-threads=on -S -object secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-8-instance-00000008/master-key.aes -machine pc-i440fx-rhel7.5.0,accel=kvm,usb=off,dump-guest-core=off -cpu host -m 256 -realtime mlock=off -smp 2,sockets=2,cores=1,threads=1 -object memory-backend-file,id=ram-node0,prealloc=yes,mem-path=/dev/hugepages/libvirt/qemu/8-instance-00000008,share=yes,size=268435456,host-nodes=0,policy=bind -numa node,nodeid=0,cpus=0-1,memdev=ram-node0 -uuid fead1ca6-beab-4c47-a73e-a3ab7f7c4de2 -smbios 'type=1,manufacturer=OpenStack Foundation,product=OpenStack Nova,version=18.0.0,serial=0ea36e80-498c-4cc8-9359-61947acad230,uuid=fead1ca6-beab-4c47-a73e-a3ab7f7c4de2,family=Virtual Machine' -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/domain-8-instance-00000008/monitor.sock,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc,driftfix=slew -global kvm-pit.lost_tick_policy=delay -no-hpet -no-shutdown -boot strict=on -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -drive file=/dev/sda,format=raw,if=none,id=drive-virtio-disk0,serial=54fe3713-14a0-46a8-a2d1-7dbc16ae7942,cache=none,aio=native -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 -chardev socket,id=charnet0,path=/var/run/openvswitch/vhuef02ea3f-9a,server -netdev vhost-user,chardev=charnet0,id=hostnet0 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=fa:16:3e:00:90:8d,bus=pci.0,addr=0x3 -add-fd set=0,fd=31 -chardev pty,id=charserial0,logfile=/dev/fdset/0,logappend=on -device isa-serial,chardev=charserial0,id=serial0 -vnc 0.0.0.0:0 -device cirrus-vga,id=video0,bus=pci.0,addr=0x2 -incoming defer -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x5 -msg timestamp=on
2018-08-21 15:19:15.062+0000: Domain id=8 is tainted: host-cpu
2018-08-21T15:19:15.187710Z qemu-kvm: -chardev socket,id=charnet0,path=/var/run/openvswitch/vhuef02ea3f-9a,server: info: QEMU waiting for connection on: disconnected:unix:/var/run/openvswitch/vhuef02ea3f-9a,server
2018-08-21T15:19:15.759589Z qemu-kvm: -chardev pty,id=charserial0,logfile=/dev/fdset/0,logappend=on: char device redirected to /dev/pts/2 (label charserial0)
2018-08-21T15:19:16.797822Z qemu-kvm: -drive file=/dev/sda,format=raw,if=none,id=drive-virtio-disk0,serial=54fe3713-14a0-46a8-a2d1-7dbc16ae7942,cache=none,aio=native: 'serial' is deprecated, please use the corresponding option of '-device' instead
2018-08-21T15:19:16.997543Z qemu-kvm: Features 0x300fffe3 unsupported. Allowed features: 0x150bf81a6
2018-08-21T15:19:16.997589Z qemu-kvm: Failed to load virtio-net:virtio
2018-08-21T15:19:16.997604Z qemu-kvm: error while loading state for instance 0x0 of device '0000:00:03.0/virtio-net'
2018-08-21T15:19:16.998185Z qemu-kvm: load of migration failed: Operation not permitted
2018-08-21 15:19:17.238+0000: shutting down, reason=failed

What's missing is a check in the domain ABI stability APIs for NIC host MTU.

Version-Release number of selected component (if applicable):
libvirt-4.5.0-4.el7

How reproducible:
Always

Steps to Reproduce:
1. Create a guest without host MTU set
2. Migrate to a new host, passing updated XML with host MTU set

Actual results:
Dest QEMU crashes due to corrupt data stream

Expected results:
Migration is forbidden by source libvirt

Additional info:
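
The source and destination logs above differ in whether host_mtu appears on the virtio-net-pci device, which is easy to miss in command lines of that length. Below is a minimal Python sketch for spotting the difference when triaging such logs. It is a diagnostic aid only, not part of libvirt or QEMU: the helper names `virtio_net_host_mtu` and `host_mtu_mismatches` are hypothetical, and the parsing assumes plain comma-separated key=value device properties (it does not handle QEMU's doubled-comma escaping).

```python
import shlex


def virtio_net_host_mtu(qemu_cmdline):
    """Map each virtio-net-pci device id to its host_mtu (None if absent)."""
    args = shlex.split(qemu_cmdline)
    result = {}
    # Walk adjacent argument pairs so each "-device" is seen with its value.
    for flag, value in zip(args, args[1:]):
        if flag != "-device":
            continue
        model, _, rest = value.partition(",")
        if model != "virtio-net-pci":
            continue
        props = dict(p.split("=", 1) for p in rest.split(",") if "=" in p)
        result[props.get("id")] = props.get("host_mtu")
    return result


def host_mtu_mismatches(src_cmdline, dst_cmdline):
    """Return {device id: (source host_mtu, dest host_mtu)} where they differ."""
    src = virtio_net_host_mtu(src_cmdline)
    dst = virtio_net_host_mtu(dst_cmdline)
    return {dev: (src.get(dev), dst.get(dev))
            for dev in src.keys() | dst.keys()
            if src.get(dev) != dst.get(dev)}


if __name__ == "__main__":
    # NIC-related fragments copied from the two command lines in this report.
    source = ("-netdev tap,fd=27,id=hostnet0,vhost=on,vhostfd=30 "
              "-device virtio-net-pci,host_mtu=1450,netdev=hostnet0,id=net0,"
              "mac=fa:16:3e:00:90:8d,bus=pci.0,addr=0x3")
    dest = ("-netdev vhost-user,chardev=charnet0,id=hostnet0 "
            "-device virtio-net-pci,netdev=hostnet0,id=net0,"
            "mac=fa:16:3e:00:90:8d,bus=pci.0,addr=0x3")
    print(host_mtu_mismatches(source, dest))  # {'net0': ('1450', None)}
```

For the fragments from this report, the sketch reports net0 with host_mtu=1450 on the source and no host_mtu on the destination, which is the mismatch behind the failed virtio-net state load.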