Bug 2125111
| Summary: | Live block migration fails: QEMU compiled without old-style (blk/-b, inc/-i) block migration | ||
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 9 | Reporter: | Han Han <hhan> |
| Component: | libvirt | Assignee: | Peter Krempa <pkrempa> |
| libvirt sub component: | Storage | QA Contact: | Fangge Jin <fjin> |
| Status: | CLOSED ERRATA | Docs Contact: | |
| Severity: | high | ||
| Priority: | unspecified | CC: | aliang, coli, fjin, jdenemar, kwolf, lcheng, lmen, meili, pkrempa, vgoyal, virt-maint, yalzhang |
| Version: | 9.2 | Keywords: | AutomationTriaged, Regression, TestBlocker, Triaged |
| Target Milestone: | rc | ||
| Target Release: | --- | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | libvirt-8.8.0-1.el9 | Doc Type: | If docs needed, set a value |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2023-05-09 07:27:05 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | 8.8.0 |
| Embargoed: | |||
I'll investigate. It could be a side effect of my removal of the pre-blockdev code.
Tested at the QEMU layer; it works OK.
Test Env:
kernel version:5.14.0-160.el9.x86_64
qemu-kvm version:qemu-kvm-7.1.0-1.el9
firmware: seabios
Test Steps:
1. Start src guest with qemu cmdline:
/usr/libexec/qemu-kvm \
-name 'avocado-vt-vm1' \
-sandbox on \
-machine q35,memory-backend=mem-machine_mem \
-device pcie-root-port,id=pcie-root-port-0,multifunction=on,bus=pcie.0,addr=0x1,chassis=1 \
-device pcie-pci-bridge,id=pcie-pci-bridge-0,addr=0x0,bus=pcie-root-port-0 \
-nodefaults \
-device VGA,bus=pcie.0,addr=0x2 \
-m 30720 \
-object memory-backend-ram,size=30720M,id=mem-machine_mem \
-smp 10,maxcpus=10,cores=5,threads=1,dies=1,sockets=2 \
-cpu 'Cascadelake-Server',ss=on,vmx=on,pdcm=on,hypervisor=on,tsc-adjust=on,umip=on,pku=on,md-clear=on,stibp=on,arch-capabilities=on,xsaves=on,ibpb=on,ibrs=on,amd-stibp=on,amd-ssbd=on,rdctl-no=on,ibrs-all=on,skip-l1dfl-vmentry=on,mds-no=on,pschange-mc-no=on,tsx-ctrl=on,hle=off,rtm=off,kvm_pv_unhalt=on \
-chardev socket,wait=off,id=qmp_id_qmpmonitor1,server=on,path=/var/tmp/monitor-qmpmonitor1-20220906-035913-1TcuA2kK \
-mon chardev=qmp_id_qmpmonitor1,mode=control \
-chardev socket,wait=off,id=qmp_id_catch_monitor,server=on,path=/var/tmp/monitor-catch_monitor-20220906-035913-1TcuA2kK \
-mon chardev=qmp_id_catch_monitor,mode=control \
-device pvpanic,ioport=0x505,id=id64qZoW \
-chardev socket,wait=off,id=chardev_serial0,server=on,path=/var/tmp/serial-serial0-20220906-035913-1TcuA2kK \
-device isa-serial,id=serial0,chardev=chardev_serial0 \
-chardev socket,id=seabioslog_id_20220906-035913-1TcuA2kK,path=/var/tmp/seabios-20220906-035913-1TcuA2kK,server=on,wait=off \
-device isa-debugcon,chardev=seabioslog_id_20220906-035913-1TcuA2kK,iobase=0x402 \
-device pcie-root-port,id=pcie-root-port-1,port=0x1,addr=0x1.0x1,bus=pcie.0,chassis=2 \
-device qemu-xhci,id=usb1,bus=pcie-root-port-1,addr=0x0 \
-device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1 \
-blockdev node-name=file_image1,driver=file,auto-read-only=on,discard=unmap,aio=threads,filename=/home/kvm_autotest_root/images/rhel920-64-virtio-scsi.qcow2,cache.direct=on,cache.no-flush=off \
-object iothread,id=iothread0 \
-blockdev node-name=drive_image1,driver=qcow2,read-only=off,cache.direct=on,cache.no-flush=off,file=file_image1 \
-device pcie-root-port,id=pcie-root-port-2,port=0x2,addr=0x1.0x2,bus=pcie.0,chassis=3 \
-device virtio-blk-pci,id=image1,drive=drive_image1,bootindex=0,write-cache=on,iothread=iothread0,bus=pcie-root-port-2,addr=0x0 \
-device pcie-root-port,id=pcie-root-port-5,port=0x5,addr=0x1.0x5,bus=pcie.0,chassis=6 \
-device virtio-net-pci,mac=9a:d9:55:69:81:3b,id=idf65zAl,netdev=idWTkxxw,bus=pcie-root-port-5,addr=0x0 \
-netdev tap,id=idWTkxxw,vhost=on \
-vnc :0 \
-rtc base=utc,clock=host,driftfix=slew \
-boot menu=off,order=cdn,once=c,strict=off \
-enable-kvm \
-device pcie-root-port,id=pcie_extra_root_port_0,multifunction=on,bus=pcie.0,addr=0x3,chassis=7 \
-monitor stdio
2. Start dst with qemu cmdline:
/usr/libexec/qemu-kvm \
-name 'avocado-vt-vm1' \
-sandbox on \
-machine q35,memory-backend=mem-machine_mem \
-device pcie-root-port,id=pcie-root-port-0,multifunction=on,bus=pcie.0,addr=0x1,chassis=1 \
-device pcie-pci-bridge,id=pcie-pci-bridge-0,addr=0x0,bus=pcie-root-port-0 \
-nodefaults \
-device VGA,bus=pcie.0,addr=0x2 \
-m 30720 \
-object memory-backend-ram,size=30720M,id=mem-machine_mem \
-smp 10,maxcpus=10,cores=5,threads=1,dies=1,sockets=2 \
-cpu 'Cascadelake-Server',ss=on,vmx=on,pdcm=on,hypervisor=on,tsc-adjust=on,umip=on,pku=on,md-clear=on,stibp=on,arch-capabilities=on,xsaves=on,ibpb=on,ibrs=on,amd-stibp=on,amd-ssbd=on,rdctl-no=on,ibrs-all=on,skip-l1dfl-vmentry=on,mds-no=on,pschange-mc-no=on,tsx-ctrl=on,hle=off,rtm=off,kvm_pv_unhalt=on \
-chardev socket,wait=off,id=qmp_id_qmpmonitor1,server=on,path=/var/tmp/monitor-qmpmonitor1-20220906-035913-1TcuA2kK \
-mon chardev=qmp_id_qmpmonitor1,mode=control \
-chardev socket,wait=off,id=qmp_id_catch_monitor,server=on,path=/var/tmp/monitor-catch_monitor-20220906-035913-1TcuA2kK \
-mon chardev=qmp_id_catch_monitor,mode=control \
-device pvpanic,ioport=0x505,id=id64qZoW \
-chardev socket,wait=off,id=chardev_serial0,server=on,path=/var/tmp/serial-serial0-20220906-035913-1TcuA2kK \
-device isa-serial,id=serial0,chardev=chardev_serial0 \
-chardev socket,id=seabioslog_id_20220906-035913-1TcuA2kK,path=/var/tmp/seabios-20220906-035913-1TcuA2kK,server=on,wait=off \
-device isa-debugcon,chardev=seabioslog_id_20220906-035913-1TcuA2kK,iobase=0x402 \
-device pcie-root-port,id=pcie-root-port-1,port=0x1,addr=0x1.0x1,bus=pcie.0,chassis=2 \
-device qemu-xhci,id=usb1,bus=pcie-root-port-1,addr=0x0 \
-device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1 \
-blockdev node-name=file_image1,driver=file,auto-read-only=on,discard=unmap,aio=threads,filename=/home/mirror.qcow2,cache.direct=on,cache.no-flush=off \
-object iothread,id=iothread0 \
-blockdev node-name=drive_image1,driver=qcow2,read-only=off,cache.direct=on,cache.no-flush=off,file=file_image1 \
-device pcie-root-port,id=pcie-root-port-2,port=0x2,addr=0x1.0x2,bus=pcie.0,chassis=3 \
-device virtio-blk-pci,id=image1,drive=drive_image1,bootindex=0,write-cache=on,iothread=iothread0,bus=pcie-root-port-2,addr=0x0 \
-device pcie-root-port,id=pcie-root-port-5,port=0x5,addr=0x1.0x5,bus=pcie.0,chassis=6 \
-device virtio-net-pci,mac=9a:d9:55:69:81:3b,id=idf65zAl,netdev=idWTkxxw,bus=pcie-root-port-5,addr=0x0 \
-netdev tap,id=idWTkxxw,vhost=on \
-vnc :0 \
-rtc base=utc,clock=host,driftfix=slew \
-boot menu=off,order=cdn,once=c,strict=off \
-enable-kvm \
-device pcie-root-port,id=pcie_extra_root_port_0,multifunction=on,bus=pcie.0,addr=0x3,chassis=7 \
-monitor stdio \
-incoming defer
3. In dst host, start nbd server and expose image
{ "execute": "nbd-server-start", "arguments": { "addr": { "type": "inet", "data": { "host": "10.73.114.141", "port": "3333" } } } }
{"execute":"block-export-add","arguments":{"id": "export0", "node-name": "drive_image1", "type": "nbd", "writable": true}}
{"return": {}}
4. In src host, add target image and do mirror from src to dst
{"execute":"blockdev-add","arguments":{"driver":"nbd","node-name":"mirror","server":{"type":"inet","host":"10.73.114.141","port":"3333"},"export":"drive_image1"}}
{ "execute": "blockdev-mirror", "arguments": { "device": "drive_image1","target": "mirror", "sync": "top","job-id":"j1" } }
{"timestamp": {"seconds": 1662633312, "microseconds": 512543}, "event": "JOB_STATUS_CHANGE", "data": {"status": "created", "id": "j1"}}
{"timestamp": {"seconds": 1662633312, "microseconds": 512601}, "event": "JOB_STATUS_CHANGE", "data": {"status": "running", "id": "j1"}}
{"return": {}}
{"timestamp": {"seconds": 1662633375, "microseconds": 366495}, "event": "JOB_STATUS_CHANGE", "data": {"status": "ready", "id": "j1"}}
{"timestamp": {"seconds": 1662633375, "microseconds": 366586}, "event": "BLOCK_JOB_READY", "data": {"device": "j1", "len": 5397676032, "offset": 5397676032, "speed": 0, "type": "mirror"}}
5. After the mirror job reaches ready status, set migration capabilities on both src and dst.
src: {"execute":"migrate-set-capabilities","arguments":{"capabilities":[{"capability":"events","state":true},{"capability":"pause-before-switchover","state":true}]}}
dst: {"execute":"migrate-set-capabilities","arguments":{"capabilities":[{"capability":"events","state":true},{"capability":"late-block-activate","state":true}]}}
{"return": {}}
{"execute": "migrate-incoming","arguments": {"uri": "tcp:[::]:5000"}}
6. Migrate from src to dst
{"execute": "migrate","arguments":{"uri": "tcp:10.73.114.141:5000"}}
{"timestamp": {"seconds": 1662633564, "microseconds": 471985}, "event": "MIGRATION", "data": {"status": "setup"}}
{"return": {}}
{"timestamp": {"seconds": 1662633564, "microseconds": 480129}, "event": "MIGRATION_PASS", "data": {"pass": 1}}
{"timestamp": {"seconds": 1662633564, "microseconds": 480206}, "event": "MIGRATION", "data": {"status": "active"}}
{"timestamp": {"seconds": 1662633585, "microseconds": 418560}, "event": "MIGRATION_PASS", "data": {"pass": 2}}
{"timestamp": {"seconds": 1662633585, "microseconds": 818248}, "event": "MIGRATION_PASS", "data": {"pass": 3}}
{"timestamp": {"seconds": 1662633586, "microseconds": 45897}, "event": "MIGRATION_PASS", "data": {"pass": 4}}
{"timestamp": {"seconds": 1662633586, "microseconds": 98249}, "event": "MIGRATION_PASS", "data": {"pass": 5}}
{"timestamp": {"seconds": 1662633586, "microseconds": 102244}, "event": "JOB_STATUS_CHANGE", "data": {"status": "standby", "id": "j1"}}
{"timestamp": {"seconds": 1662633586, "microseconds": 102377}, "event": "JOB_STATUS_CHANGE", "data": {"status": "ready", "id": "j1"}}
{"timestamp": {"seconds": 1662633586, "microseconds": 122347}, "event": "STOP"}
{"timestamp": {"seconds": 1662633586, "microseconds": 122369}, "event": "JOB_STATUS_CHANGE", "data": {"status": "standby", "id": "j1"}}
{"timestamp": {"seconds": 1662633586, "microseconds": 122402}, "event": "JOB_STATUS_CHANGE", "data": {"status": "ready", "id": "j1"}}
{"timestamp": {"seconds": 1662633586, "microseconds": 125442}, "event": "MIGRATION", "data": {"status": "pre-switchover"}}
7. Cancel mirror job
{"execute":"block-job-cancel","arguments":{"device":"j1"}}
{"return": {}}
{"timestamp": {"seconds": 1662633756, "microseconds": 388842}, "event": "JOB_STATUS_CHANGE", "data": {"status": "waiting", "id": "j1"}}
{"timestamp": {"seconds": 1662633756, "microseconds": 388880}, "event": "JOB_STATUS_CHANGE", "data": {"status": "pending", "id": "j1"}}
{"timestamp": {"seconds": 1662633756, "microseconds": 388955}, "event": "BLOCK_JOB_COMPLETED", "data": {"device": "j1", "len": 5399183360, "offset": 5399183360, "speed": 0, "type": "mirror"}}
{"timestamp": {"seconds": 1662633756, "microseconds": 388980}, "event": "JOB_STATUS_CHANGE", "data": {"status": "concluded", "id": "j1"}}
{"timestamp": {"seconds": 1662633756, "microseconds": 389002}, "event": "JOB_STATUS_CHANGE", "data": {"status": "null", "id": "j1"}}
8. Continue migrate
{"execute":"migrate-continue","arguments":{"state":"pre-switchover"}}
{"return": {}}
{"timestamp": {"seconds": 1662633781, "microseconds": 376912}, "event": "MIGRATION", "data": {"status": "device"}}
{"timestamp": {"seconds": 1662633781, "microseconds": 378394}, "event": "MIGRATION_PASS", "data": {"pass": 6}}
{"timestamp": {"seconds": 1662633781, "microseconds": 538015}, "event": "MIGRATION", "data": {"status": "completed"}}
9. Check dst vm status
(qemu) info status
VM status: running
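For reference, the QMP sequence from steps 3 to 8 can be summarized as a small Python sketch. It only builds the command dictionaries in the order used above; it does not open a QMP socket, and the function name and parameters are hypothetical. The point it illustrates is that the 'migrate' command in the NBD-based flow carries no "blk" or "inc" argument.

```python
import json


def nbd_migration_commands(dst_host, nbd_port, mig_port,
                           node="drive_image1", job_id="j1"):
    """Return (side, command) pairs in the order of steps 3-8 above.

    Hypothetical helper: a real client must also wait for BLOCK_JOB_READY
    before 'migrate' and for the 'pre-switchover' MIGRATION event before
    'block-job-cancel'.
    """
    def caps(*names):
        # Build a migrate-set-capabilities command enabling the given names.
        return {"execute": "migrate-set-capabilities",
                "arguments": {"capabilities": [
                    {"capability": n, "state": True} for n in names]}}

    return [
        # Step 3: export the destination image over NBD.
        ("dst", {"execute": "nbd-server-start",
                 "arguments": {"addr": {"type": "inet", "data": {
                     "host": dst_host, "port": str(nbd_port)}}}}),
        ("dst", {"execute": "block-export-add",
                 "arguments": {"id": "export0", "node-name": node,
                               "type": "nbd", "writable": True}}),
        # Step 4: attach the export on the source and mirror into it.
        ("src", {"execute": "blockdev-add",
                 "arguments": {"driver": "nbd", "node-name": "mirror",
                               "server": {"type": "inet", "host": dst_host,
                                          "port": str(nbd_port)},
                               "export": node}}),
        ("src", {"execute": "blockdev-mirror",
                 "arguments": {"device": node, "target": "mirror",
                               "sync": "top", "job-id": job_id}}),
        # Step 5: capabilities and the incoming listener.
        ("src", caps("events", "pause-before-switchover")),
        ("dst", caps("events", "late-block-activate")),
        ("dst", {"execute": "migrate-incoming",
                 "arguments": {"uri": "tcp:[::]:%d" % mig_port}}),
        # Step 6: plain migration, no 'blk' or 'inc' arguments.
        ("src", {"execute": "migrate",
                 "arguments": {"uri": "tcp:%s:%d" % (dst_host, mig_port)}}),
        # Steps 7 and 8: stop the mirror, then let the switchover proceed.
        ("src", {"execute": "block-job-cancel",
                 "arguments": {"device": job_id}}),
        ("src", {"execute": "migrate-continue",
                 "arguments": {"state": "pre-switchover"}}),
    ]


if __name__ == "__main__":
    for side, cmd in nbd_migration_commands("10.73.114.141", 3333, 5000):
        print(side, json.dumps(cmd))
```

Running the script prints one JSON command per line, prefixed with the side (src or dst) it should be sent to.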
Yes, I've already found the bug in libvirt. A refactor of the migration setup code moved some code around, and a block of code enabling NBD accidentally became conditional on whether the user passed an explicit list of disks to migrate (the '--migrate-disks' argument for virsh migrate). If the list of disks is empty, NBD is not enabled and the code attempts the old-style fallback. The rest of the libvirt code should work properly, so if --migrate-disks is passed the migration should pass. I'll be posting patches soon to fix the case when --migrate-disks is not passed.
Fixed upstream:
commit 83ffeae75a0dfe8077d5e98f8a48615c62db0284
Author: Peter Krempa <pkrempa>
Date: Thu Sep 8 11:55:08 2022 +0200
qemu: migration: Fix setup of non-shared storage migration in qemuMigrationSrcBeginPhase
In commit 6111b2352242e9 removing pre-blockdev code paths I've
improperly refactored the setup of non-shared storage migration.
Specifically the code checking that there are disks and setting up the
NBD data in the migration cookie was originally outside of the loop
checking the user provided list of specific disks to migrate, but became
part of the block as it was not un-indented when a higher level block
was being removed.
The above caused that if non-shared storage migration is requested, but
the user doesn't provide the list of disks to migrate (thus implying to
migrate every appropriate disk) the code doesn't actually setup the
migration and then later on falls back to the old-style migration which
no longer works with blockdev.
Move the check that there's anything to migrate out of the
'nmigrate_disks' block.
Fixes: 6111b2352242e93c6d2c29f9549d596ed1056ce5
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=2125111
Resolves: https://gitlab.com/libvirt/libvirt/-/issues/373
Signed-off-by: Peter Krempa <pkrempa>
Reviewed-by: Ján Tomko <jtomko>
v8.7.0-93-g83ffeae75a
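The indentation mistake described in the commit message can be modeled with a minimal Python sketch. This is not the libvirt C code: the function names, parameters, and the boolean return value are hypothetical and stand in for "NBD data was set up in the migration cookie".

```python
def nbd_enabled_buggy(storage_migration, migrate_disks, vm_disks):
    """Model of the regressed flow: NBD setup sits inside the
    'explicit disk list' block, so an empty list skips it."""
    nbd = False
    if storage_migration:
        if migrate_disks:
            for name in migrate_disks:
                if name not in vm_disks:
                    raise ValueError("disk %r not found" % name)
            # Bug: this check was left one level too deep.
            if vm_disks:
                nbd = True
    return nbd


def nbd_enabled_fixed(storage_migration, migrate_disks, vm_disks):
    """Model of the fixed flow: the 'anything to migrate' check is
    outside the explicit-list block."""
    nbd = False
    if storage_migration:
        if migrate_disks:
            for name in migrate_disks:
                if name not in vm_disks:
                    raise ValueError("disk %r not found" % name)
        if vm_disks:
            nbd = True
    return nbd
```

With an empty disk list (plain --copy-storage-all), the buggy model returns False, which corresponds to the fallback to old-style migration, while the fixed model returns True. With an explicit list both models agree, which matches the observation that passing --migrate-disks avoids the failure.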
Pre-verified on libvirt-8.8.0-1.el9.x86_64
Verified with libvirt-8.9.0-2.el9.x86_64
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (libvirt bug fix and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHBA-2023:2171
Description of problem:
As subject
Version-Release number of selected component (if applicable):
libvirt-8.7.0-1.el9.x86_64
qemu-kvm-7.1.0-1.el9.x86_64
How reproducible:
100%
Steps to Reproduce:
1. Start a VM with a local disk
2. Migrate the VM to another host with --copy-storage-all
➜ ~ virsh migrate cephfs qemu+ssh://root@hhan-rhel9--1/system --live --p2p --copy-storage-all
error: internal error: unable to execute QEMU command 'migrate': QEMU compiled without old-style (blk/-b, inc/-i) block migration
The error log from QMP:
3.685 > 0x7f6e1c082460 {"execute":"query-migrate","id":"libvirt-428"}
3.686 < 0x7f6e1c082460 {"return": {}, "id": "libvirt-428"}
3.985 > 0x7f6e1c082460 {"execute":"query-migrate","id":"libvirt-429"}
3.986 < 0x7f6e1c082460 {"return": {}, "id": "libvirt-429"}
5.052 > 0x7f6e1c082460 {"execute":"query-migrate-parameters","id":"libvirt-430"}
5.053 < 0x7f6e1c082460 {"return": {"cpu-throttle-tailslow": false, "xbzrle-cache-size": 67108864, "cpu-throttle-initial": 20, "announce-max": 550, "decompress-threads": 2, "compress-threads": 8, "compress-level": 1, "multifd-channels": 2, "multifd-zstd-level": 1, "announce-initial": 50, "block-incremental": false, "compress-wait-thread": true, "downtime-limit": 300, "tls-authz": "", "multifd-compression": "none", "announce-rounds": 5, "announce-step": 100, "tls-creds": "", "multifd-zlib-level": 1, "max-cpu-throt
5.053 > 0x7f6e1c082460 {"execute":"migrate-set-capabilities","arguments":{"capabilities":[{"capability":"xbzrle","state":false},{"capability":"auto-converge","state":false},{"capability":"rdma-pin-all","state":false},{"capability":"postcopy-ram","state":false},{"capability":"compress","state":false},{"capability":"pause-before-switchover","state":true},{"capability":"late-block-activate","state":false},{"capability":"multifd","state":false},{"capability":"dirty-bitmaps","state":false},{"capability":"return-path"
5.059 < 0x7f6e1c082460 {"return": {}, "id": "libvirt-431"}
5.059 > 0x7f6e1c082460 {"execute":"migrate-set-parameters","arguments":{"tls-creds":"","tls-hostname":"","max-bandwidth":9223372036853727232},"id":"libvirt-432"}
5.061 < 0x7f6e1c082460 {"return": {}, "id": "libvirt-432"}
5.063 > 0x7f6e1c082460 {"execute":"getfd","arguments":{"fdname":"migrate"},"id":"libvirt-433"} (fd=28)
5.064 < 0x7f6e1c082460 {"return": {}, "id": "libvirt-433"}
5.064 > 0x7f6e1c082460 {"execute":"migrate","arguments":{"detach":true,"blk":true,"inc":false,"resume":false,"uri":"fd:migrate"},"id":"libvirt-434"}
5.066 < 0x7f6e1c082460 {"id": "libvirt-434", "error": {"class": "GenericError", "desc": "QEMU compiled without old-style (blk/-b, inc/-i) block migration"}}
5.066 > 0x7f6e1c082460 {"execute":"closefd","arguments":{"fdname":"migrate"},"id":"libvirt-435"}
Actual results:
As above
Expected results:
No error
Additional info:
The error is from https://gitlab.com/qemu-project/qemu/-/blob/master/migration/migration.c#L1237
It works on libvirt-8.5.0-6.el9.x86_64 qemu-kvm-7.0.0-12.el9.x86_64
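When triaging similar reports, a QMP log like the one above can be scanned for 'migrate' commands that request old-style block migration. The following is a minimal Python sketch assuming the "timestamp direction pointer JSON" line layout shown in this report; the function name is hypothetical, and truncated lines are skipped rather than repaired.

```python
import json

_decoder = json.JSONDecoder()


def old_style_migrate_commands(log_lines):
    """Return parsed 'migrate' commands that set 'blk' or 'inc' to true."""
    hits = []
    for line in log_lines:
        start = line.find("{")
        if start < 0:
            continue
        try:
            # raw_decode tolerates trailing text such as " (fd=28)".
            msg, _end = _decoder.raw_decode(line, start)
        except json.JSONDecodeError:
            # Truncated log line; leave it alone.
            continue
        if not isinstance(msg, dict) or msg.get("execute") != "migrate":
            continue
        args = msg.get("arguments", {})
        if args.get("blk") or args.get("inc"):
            hits.append(msg)
    return hits
```

Applied to the log in this report, the sketch would flag the command with id "libvirt-434", which is the one QEMU rejects.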