Bug 2125111
Summary: | Live block migration fails: QEMU compiled without old-style (blk/-b, inc/-i) block migration | |
---|---|---|---
Product: | Red Hat Enterprise Linux 9 | Reporter: | Han Han <hhan>
Component: | libvirt | Assignee: | Peter Krempa <pkrempa>
libvirt sub component: | Storage | QA Contact: | Fangge Jin <fjin>
Status: | CLOSED ERRATA | Severity: | high
Priority: | unspecified | CC: | aliang, coli, fjin, jdenemar, kwolf, lcheng, lmen, meili, pkrempa, vgoyal, virt-maint, yalzhang
Version: | 9.2 | Keywords: | AutomationTriaged, Regression, TestBlocker, Triaged
Target Milestone: | rc | Target Release: | ---
Hardware: | Unspecified | OS: | Unspecified
Fixed In Version: | libvirt-8.8.0-1.el9 | Target Upstream Version: | 8.8.0
Last Closed: | 2023-05-09 07:27:05 UTC | Type: | Bug
Description
Han Han
2022-09-08 03:21:35 UTC
I'll investigate. It could be a side effect of my removal of the pre-blockdev code.

Tested at the qemu layer, it works OK.

Test env:
kernel version: 5.14.0-160.el9.x86_64
qemu-kvm version: qemu-kvm-7.1.0-1.el9
firmware: seabios

Test steps:

1. Start the src guest with the qemu command line:

```
/usr/libexec/qemu-kvm \
    -name 'avocado-vt-vm1' \
    -sandbox on \
    -machine q35,memory-backend=mem-machine_mem \
    -device pcie-root-port,id=pcie-root-port-0,multifunction=on,bus=pcie.0,addr=0x1,chassis=1 \
    -device pcie-pci-bridge,id=pcie-pci-bridge-0,addr=0x0,bus=pcie-root-port-0 \
    -nodefaults \
    -device VGA,bus=pcie.0,addr=0x2 \
    -m 30720 \
    -object memory-backend-ram,size=30720M,id=mem-machine_mem \
    -smp 10,maxcpus=10,cores=5,threads=1,dies=1,sockets=2 \
    -cpu 'Cascadelake-Server',ss=on,vmx=on,pdcm=on,hypervisor=on,tsc-adjust=on,umip=on,pku=on,md-clear=on,stibp=on,arch-capabilities=on,xsaves=on,ibpb=on,ibrs=on,amd-stibp=on,amd-ssbd=on,rdctl-no=on,ibrs-all=on,skip-l1dfl-vmentry=on,mds-no=on,pschange-mc-no=on,tsx-ctrl=on,hle=off,rtm=off,kvm_pv_unhalt=on \
    -chardev socket,wait=off,id=qmp_id_qmpmonitor1,server=on,path=/var/tmp/monitor-qmpmonitor1-20220906-035913-1TcuA2kK \
    -mon chardev=qmp_id_qmpmonitor1,mode=control \
    -chardev socket,wait=off,id=qmp_id_catch_monitor,server=on,path=/var/tmp/monitor-catch_monitor-20220906-035913-1TcuA2kK \
    -mon chardev=qmp_id_catch_monitor,mode=control \
    -device pvpanic,ioport=0x505,id=id64qZoW \
    -chardev socket,wait=off,id=chardev_serial0,server=on,path=/var/tmp/serial-serial0-20220906-035913-1TcuA2kK \
    -device isa-serial,id=serial0,chardev=chardev_serial0 \
    -chardev socket,id=seabioslog_id_20220906-035913-1TcuA2kK,path=/var/tmp/seabios-20220906-035913-1TcuA2kK,server=on,wait=off \
    -device isa-debugcon,chardev=seabioslog_id_20220906-035913-1TcuA2kK,iobase=0x402 \
    -device pcie-root-port,id=pcie-root-port-1,port=0x1,addr=0x1.0x1,bus=pcie.0,chassis=2 \
    -device qemu-xhci,id=usb1,bus=pcie-root-port-1,addr=0x0 \
    -device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1 \
    -blockdev node-name=file_image1,driver=file,auto-read-only=on,discard=unmap,aio=threads,filename=/home/kvm_autotest_root/images/rhel920-64-virtio-scsi.qcow2,cache.direct=on,cache.no-flush=off \
    -object iothread,id=iothread0 \
    -blockdev node-name=drive_image1,driver=qcow2,read-only=off,cache.direct=on,cache.no-flush=off,file=file_image1 \
    -device pcie-root-port,id=pcie-root-port-2,port=0x2,addr=0x1.0x2,bus=pcie.0,chassis=3 \
    -device virtio-blk-pci,id=image1,drive=drive_image1,bootindex=0,write-cache=on,iothread=iothread0,bus=pcie-root-port-2,addr=0x0 \
    -device pcie-root-port,id=pcie-root-port-5,port=0x5,addr=0x1.0x5,bus=pcie.0,chassis=6 \
    -device virtio-net-pci,mac=9a:d9:55:69:81:3b,id=idf65zAl,netdev=idWTkxxw,bus=pcie-root-port-5,addr=0x0 \
    -netdev tap,id=idWTkxxw,vhost=on \
    -vnc :0 \
    -rtc base=utc,clock=host,driftfix=slew \
    -boot menu=off,order=cdn,once=c,strict=off \
    -enable-kvm \
    -device pcie-root-port,id=pcie_extra_root_port_0,multifunction=on,bus=pcie.0,addr=0x3,chassis=7 \
    -monitor stdio
```
2. Start the dst guest with the same command line as the src guest, with two differences: the file_image1 blockdev points at the migration target image (filename=/home/mirror.qcow2 instead of the src image), and the process starts with incoming migration deferred (-incoming defer appended).

3. On the dst host, start the NBD server and expose the image:

```
{"execute": "nbd-server-start", "arguments": {"addr": {"type": "inet", "data": {"host": "10.73.114.141", "port": "3333"}}}}

{"execute": "block-export-add", "arguments": {"id": "export0", "node-name": "drive_image1", "type": "nbd", "writable": true}}
{"return": {}}
```
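The QMP exchanges in these steps were typed by hand over the monitor socket. As a reference, here is a minimal Python sketch of the step-3 exchange, assuming the monitor socket path from the command lines above; the helper names (qmp_open, qmp_cmd) are illustrative, not part of any QEMU tooling. QMP requires reading the server greeting and negotiating qmp_capabilities before any other command is accepted:

```python
import json
import socket

def qmp_open(path):
    """Connect to a QMP monitor socket and negotiate capabilities."""
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    sock.connect(path)
    reader = sock.makefile("r")
    json.loads(reader.readline())                  # server greeting
    sock.sendall(b'{"execute": "qmp_capabilities"}\n')
    json.loads(reader.readline())                  # {"return": {}}
    return sock, reader

def qmp_cmd(sock, reader, execute, arguments=None):
    """Send one command and return its reply, skipping interleaved events."""
    msg = {"execute": execute}
    if arguments is not None:
        msg["arguments"] = arguments
    sock.sendall(json.dumps(msg).encode() + b"\n")
    while True:
        reply = json.loads(reader.readline())
        if "event" not in reply:                   # async events carry "event"
            return reply

# Run on the destination host: start the NBD server and export the image node.
dst, dst_r = qmp_open("/var/tmp/monitor-qmpmonitor1-20220906-035913-1TcuA2kK")
qmp_cmd(dst, dst_r, "nbd-server-start",
        {"addr": {"type": "inet",
                  "data": {"host": "10.73.114.141", "port": "3333"}}})
qmp_cmd(dst, dst_r, "block-export-add",
        {"id": "export0", "node-name": "drive_image1",
         "type": "nbd", "writable": True})
```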
4. On the src host, add the target image and mirror from src to dst:

```
{"execute": "blockdev-add", "arguments": {"driver": "nbd", "node-name": "mirror", "server": {"type": "inet", "host": "10.73.114.141", "port": "3333"}, "export": "drive_image1"}}
{"execute": "blockdev-mirror", "arguments": {"device": "drive_image1", "target": "mirror", "sync": "top", "job-id": "j1"}}
{"timestamp": {"seconds": 1662633312, "microseconds": 512543}, "event": "JOB_STATUS_CHANGE", "data": {"status": "created", "id": "j1"}}
{"timestamp": {"seconds": 1662633312, "microseconds": 512601}, "event": "JOB_STATUS_CHANGE", "data": {"status": "running", "id": "j1"}}
{"return": {}}
{"timestamp": {"seconds": 1662633375, "microseconds": 366495}, "event": "JOB_STATUS_CHANGE", "data": {"status": "ready", "id": "j1"}}
{"timestamp": {"seconds": 1662633375, "microseconds": 366586}, "event": "BLOCK_JOB_READY", "data": {"device": "j1", "len": 5397676032, "offset": 5397676032, "speed": 0, "type": "mirror"}}
```

5. After the mirror job reaches the ready status, set migration capabilities on both src and dst.

src:

```
{"execute": "migrate-set-capabilities", "arguments": {"capabilities": [{"capability": "events", "state": true}, {"capability": "pause-before-switchover", "state": true}]}}
```

dst:

```
{"execute": "migrate-set-capabilities", "arguments": {"capabilities": [{"capability": "events", "state": true}, {"capability": "late-block-activate", "state": true}]}}
{"return": {}}
{"execute": "migrate-incoming", "arguments": {"uri": "tcp:[::]:5000"}}
```

6. Migrate from src to dst (see the sketch after this step):

```
{"execute": "migrate", "arguments": {"uri": "tcp:10.73.114.141:5000"}}
{"timestamp": {"seconds": 1662633564, "microseconds": 471985}, "event": "MIGRATION", "data": {"status": "setup"}}
{"return": {}}
{"timestamp": {"seconds": 1662633564, "microseconds": 480129}, "event": "MIGRATION_PASS", "data": {"pass": 1}}
{"timestamp": {"seconds": 1662633564, "microseconds": 480206}, "event": "MIGRATION", "data": {"status": "active"}}
{"timestamp": {"seconds": 1662633585, "microseconds": 418560}, "event": "MIGRATION_PASS", "data": {"pass": 2}}
{"timestamp": {"seconds": 1662633585, "microseconds": 818248}, "event": "MIGRATION_PASS", "data": {"pass": 3}}
{"timestamp": {"seconds": 1662633586, "microseconds": 45897}, "event": "MIGRATION_PASS", "data": {"pass": 4}}
{"timestamp": {"seconds": 1662633586, "microseconds": 98249}, "event": "MIGRATION_PASS", "data": {"pass": 5}}
{"timestamp": {"seconds": 1662633586, "microseconds": 102244}, "event": "JOB_STATUS_CHANGE", "data": {"status": "standby", "id": "j1"}}
{"timestamp": {"seconds": 1662633586, "microseconds": 102377}, "event": "JOB_STATUS_CHANGE", "data": {"status": "ready", "id": "j1"}}
{"timestamp": {"seconds": 1662633586, "microseconds": 122347}, "event": "STOP"}
{"timestamp": {"seconds": 1662633586, "microseconds": 122369}, "event": "JOB_STATUS_CHANGE", "data": {"status": "standby", "id": "j1"}}
{"timestamp": {"seconds": 1662633586, "microseconds": 122402}, "event": "JOB_STATUS_CHANGE", "data": {"status": "ready", "id": "j1"}}
{"timestamp": {"seconds": 1662633586, "microseconds": 125442}, "event": "MIGRATION", "data": {"status": "pre-switchover"}}
```
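Steps 4-6 can be condensed the same way. A sketch reusing the qmp_open/qmp_cmd helpers above (qmp_wait_event is likewise illustrative), assuming the destination-side setup of step 5 (late-block-activate, migrate-incoming) has already been done; it attaches the NBD target, mirrors the active layer onto it, and starts the RAM migration only once the mirror has converged:

```python
def qmp_wait_event(reader, name):
    """Block until the named asynchronous event arrives."""
    while True:
        msg = json.loads(reader.readline())
        if msg.get("event") == name:
            return msg

# Run on the source host's monitor socket.
src, src_r = qmp_open("/var/tmp/monitor-qmpmonitor1-20220906-035913-1TcuA2kK")
qmp_cmd(src, src_r, "blockdev-add",
        {"driver": "nbd", "node-name": "mirror",
         "server": {"type": "inet", "host": "10.73.114.141", "port": "3333"},
         "export": "drive_image1"})
qmp_cmd(src, src_r, "blockdev-mirror",
        {"device": "drive_image1", "target": "mirror",
         "sync": "top", "job-id": "j1"})
qmp_wait_event(src_r, "BLOCK_JOB_READY")           # mirror has converged

# pause-before-switchover lets the mirror be cancelled (step 7) before the
# device state moves, so the destination takes over a fully synced disk.
qmp_cmd(src, src_r, "migrate-set-capabilities",
        {"capabilities": [{"capability": "events", "state": True},
                          {"capability": "pause-before-switchover", "state": True}]})
qmp_cmd(src, src_r, "migrate", {"uri": "tcp:10.73.114.141:5000"})
```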
7. Cancel the mirror job:

```
{"execute": "block-job-cancel", "arguments": {"device": "j1"}}
{"return": {}}
{"timestamp": {"seconds": 1662633756, "microseconds": 388842}, "event": "JOB_STATUS_CHANGE", "data": {"status": "waiting", "id": "j1"}}
{"timestamp": {"seconds": 1662633756, "microseconds": 388880}, "event": "JOB_STATUS_CHANGE", "data": {"status": "pending", "id": "j1"}}
{"timestamp": {"seconds": 1662633756, "microseconds": 388955}, "event": "BLOCK_JOB_COMPLETED", "data": {"device": "j1", "len": 5399183360, "offset": 5399183360, "speed": 0, "type": "mirror"}}
{"timestamp": {"seconds": 1662633756, "microseconds": 388980}, "event": "JOB_STATUS_CHANGE", "data": {"status": "concluded", "id": "j1"}}
{"timestamp": {"seconds": 1662633756, "microseconds": 389002}, "event": "JOB_STATUS_CHANGE", "data": {"status": "null", "id": "j1"}}
```

8. Continue the migration:

```
{"execute": "migrate-continue", "arguments": {"state": "pre-switchover"}}
{"return": {}}
{"timestamp": {"seconds": 1662633781, "microseconds": 376912}, "event": "MIGRATION", "data": {"status": "device"}}
{"timestamp": {"seconds": 1662633781, "microseconds": 378394}, "event": "MIGRATION_PASS", "data": {"pass": 6}}
{"timestamp": {"seconds": 1662633781, "microseconds": 538015}, "event": "MIGRATION", "data": {"status": "completed"}}
```

9. Check the dst VM status:

```
(qemu) info status
VM status: running
```

Yes, I've already found the bug in libvirt. A refactor of the migration setup code moved some code around, and a block that enables NBD accidentally became conditional on whether the user passed an explicit list of disks to migrate (the '--migrate-disks' argument of virsh migrate). If the list of disks is empty, NBD is not enabled, and the code then attempts the old-style fallback. The rest of the libvirt code works properly, so the migration should succeed when --migrate-disks is passed; a sketch of the broken control flow follows below. I'll be posting patches soon to fix the case when --migrate-disks is not passed.

Fixed upstream:

```
commit 83ffeae75a0dfe8077d5e98f8a48615c62db0284
Author: Peter Krempa <pkrempa>
Date:   Thu Sep 8 11:55:08 2022 +0200

    qemu: migration: Fix setup of non-shared storage migration in qemuMigrationSrcBeginPhase

    In commit 6111b2352242e9 removing pre-blockdev code paths I've
    improperly refactored the setup of non-shared storage migration.
    Specifically the code checking that there are disks and setting up the
    NBD data in the migration cookie was originally outside of the loop
    checking the user provided list of specific disks to migrate, but
    became part of the block as it was not un-indented when a higher level
    block was being removed.

    The above caused that if non-shared storage migration is requested,
    but the user doesn't provide the list of disks to migrate (thus
    implying to migrate every appropriate disk) the code doesn't actually
    setup the migration and then later on falls back to the old-style
    migration which no longer works with blockdev.

    Move the check that there's anything to migrate out of the
    'nmigrate_disks' block.

    Fixes: 6111b2352242e93c6d2c29f9549d596ed1056ce5
    Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=2125111
    Resolves: https://gitlab.com/libvirt/libvirt/-/issues/373
    Signed-off-by: Peter Krempa <pkrempa>
    Reviewed-by: Ján Tomko <jtomko>
```

v8.7.0-93-g83ffeae75a
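The regression is a pure scoping mistake: validation of the user-supplied disk list and the NBD cookie setup ended up inside the same conditional block. A schematic Python analogue of the broken and fixed control flow (libvirt itself is C, and all names here are illustrative, not libvirt's):

```python
def begin_phase_broken(copy_storage, migrate_disks, domain_disks):
    """Shape after the bad refactor: NBD setup nested under the list check."""
    cookie_flags = set()
    if copy_storage:
        if migrate_disks:                  # user passed --migrate-disks
            for disk in migrate_disks:     # validate the explicit list
                assert disk in domain_disks
            cookie_flags.add("nbd")        # BUG: skipped for the implicit case
    return cookie_flags

def begin_phase_fixed(copy_storage, migrate_disks, domain_disks):
    """Shape after commit 83ffeae75a: only the validation stays conditional."""
    cookie_flags = set()
    if copy_storage:
        if migrate_disks:
            for disk in migrate_disks:
                assert disk in domain_disks
        cookie_flags.add("nbd")            # enabled for "migrate all disks" too
    return cookie_flags

# Without an explicit disk list the broken variant never requests NBD, so
# libvirt later falls back to old-style (-b/-i) block migration, which
# blockdev-era QEMU refuses with the error in this bug's summary.
assert begin_phase_broken(True, [], ["vda"]) == set()
assert begin_phase_fixed(True, [], ["vda"]) == {"nbd"}
```

Consistent with the analysis above, passing an explicit disk list on an unfixed build (roughly `virsh migrate --live --copy-storage-all --migrate-disks <disk-list> ...`) takes the working path and sidesteps the fallback.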
Pre-verified on libvirt-8.8.0-1.el9.x86_64. Verified with libvirt-8.9.0-2.el9.x86_64.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (libvirt bug fix and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2023:2171