Created attachment 1507874 [details]
gdb_debug_info-11222018

Description of problem:
When migrating a guest with a dirty bitmap based on shared storage, it fails with the following error:

src guest hmp:
# ./vm00.sh
QEMU 3.0.92 monitor - type 'help' for more information
(qemu) qemu-system-ppc64: Can't migrate a bitmap that is in use by another operation: 'bitmap0'

dst guest hmp:
# ./vm00-mig.sh
QEMU 3.0.92 monitor - type 'help' for more information
(qemu) qemu-system-ppc64: Unable to read node name string
qemu-system-ppc64: error while loading state for instance 0x0 of device 'dirty-bitmap'
qemu-system-ppc64: load of migration failed: Invalid argument

Version-Release number of selected component (if applicable):
Host kernel: 4.18.0-32.el8.ppc64le (src) 4.18.0-40.el8.ppc64le (dst)
Qemu: v3.1.0-rc2-dirty

How reproducible:
100%

Steps to Reproduce:
1. On both src and dst hosts, boot up guests with only a system disk; the src one is pre-installed, while the dst one is on a newly created image:
-blockdev node-name=disk0,file.driver=file,driver=qcow2,file.filename=/home/rhel80-ppc64le-upstream.qcow2 \
-device scsi-hd,drive=disk0,id=image0,bootindex=0 \
2. On both hosts, set migration capabilities in the qmp connections:
# nc -U /var/tmp/avocado1
{"QMP": {"version": {"qemu": {"micro": 92, "minor": 0, "major": 3}, "package": "v3.1.0-rc2-dirty"}, "capabilities": []}}
{"execute":"qmp_capabilities"}
{"return": {}}
{"execute":"migrate-set-capabilities","arguments":{"capabilities":[{"capability":"events","state":true},{"capability":"dirty-bitmaps","state":true},{"capability":"pause-before-switchover","state":true}]}}
{"return": {}}
3. On dst host, start the nbd server and add the export:
{ "execute": "nbd-server-start", "arguments": { "addr": { "type": "inet","data": { "host": "10.0.1.44", "port": "3333" } } } }
{"return": {}}
{ "execute": "nbd-server-add", "arguments":{ "device": "disk0", "writable": true } }
4. 
On src host, add dirty bitmap bitmap0:
{ "execute": "block-dirty-bitmap-add", "arguments": {"node": "disk0", "name": "bitmap0"}}
{"return": {}}
5. On src host, do blockdev-mirror:
{"execute":"blockdev-add","arguments":{"driver":"nbd","node-name":"mirror","server":{"type":"inet","host":"10.0.1.44","port":"3333"},"export":"disk0"}}
{"return": {}}
{"execute": "blockdev-mirror", "arguments": { "device": "disk0","target": "mirror", "sync": "full", "job-id":"j1"}}
{"timestamp": {"seconds": 1542856270, "microseconds": 804881}, "event": "JOB_STATUS_CHANGE", "data": {"status": "created", "id": "j1"}}
{"timestamp": {"seconds": 1542856270, "microseconds": 804944}, "event": "JOB_STATUS_CHANGE", "data": {"status": "running", "id": "j1"}}
{"return": {}}
{"timestamp": {"seconds": 1542856314, "microseconds": 839507}, "event": "JOB_STATUS_CHANGE", "data": {"status": "ready", "id": "j1"}}
{"timestamp": {"seconds": 1542856314, "microseconds": 839567}, "event": "BLOCK_JOB_READY", "data": {"device": "j1", "len": 21474836480, "offset": 21474836480, "speed": 0, "type": "mirror"}}
6. 
On src host, when the mirror job reaches ready status, begin the migration:
{"execute": "migrate","arguments":{"uri": "tcp:10.0.1.44:5200"}}
{"timestamp": {"seconds": 1542856407, "microseconds": 47629}, "event": "MIGRATION", "data": {"status": "setup"}}
{"return": {}}
{"timestamp": {"seconds": 1542856407, "microseconds": 51688}, "event": "MIGRATION_PASS", "data": {"pass": 1}}
{"timestamp": {"seconds": 1542856407, "microseconds": 51772}, "event": "MIGRATION", "data": {"status": "active"}}
{"timestamp": {"seconds": 1542856407, "microseconds": 51805}, "event": "MIGRATION", "data": {"status": "failed"}}
{"timestamp": {"seconds": 1542856419, "microseconds": 571231}, "event": "BLOCK_JOB_ERROR", "data": {"device": "j1", "operation": "write", "action": "report"}}
{"timestamp": {"seconds": 1542856419, "microseconds": 571517}, "event": "JOB_STATUS_CHANGE", "data": {"status": "aborting", "id": "j1"}}
{"timestamp": {"seconds": 1542856419, "microseconds": 571607}, "event": "BLOCK_JOB_COMPLETED", "data": {"device": "j1", "len": 21475229696, "offset": 21475098624, "speed": 0, "type": "mirror", "error": "Input/output error"}}
{"timestamp": {"seconds": 1542856419, "microseconds": 571647}, "event": "JOB_STATUS_CHANGE", "data": {"status": "concluded", "id": "j1"}}
{"timestamp": {"seconds": 1542856419, "microseconds": 571679}, "event": "JOB_STATUS_CHANGE", "data": {"status": "null", "id": "j1"}}

Actual results:
In step 6, the migration fails. From the hmp output of the src guest, the bitmap appears to be in use by another operation.

Expected results:
In step 6, the migration succeeds. 
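Not part of the original report: the failure in step 6 can be detected programmatically by watching MIGRATION events on the QMP stream. A minimal sketch (the helper name final_migration_status is hypothetical; the event lines are copied from the failing run above):

```python
import json

def final_migration_status(qmp_lines):
    """Scan a stream of QMP event lines (as emitted on the monitor
    socket) and return the last reported migration status."""
    status = None
    for line in qmp_lines:
        line = line.strip()
        if not line:
            continue
        msg = json.loads(line)
        if msg.get("event") == "MIGRATION":
            status = msg["data"]["status"]
    return status

# Events copied from the failing run in step 6:
events = [
    '{"timestamp": {"seconds": 1542856407, "microseconds": 47629}, "event": "MIGRATION", "data": {"status": "setup"}}',
    '{"timestamp": {"seconds": 1542856407, "microseconds": 51772}, "event": "MIGRATION", "data": {"status": "active"}}',
    '{"timestamp": {"seconds": 1542856407, "microseconds": 51805}, "event": "MIGRATION", "data": {"status": "failed"}}',
]
print(final_migration_status(events))  # -> failed
```

A healthy run would instead end on "completed"; an automated test can assert on that terminal status rather than grepping stderr.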
Additional info:
If we then quit the guest on the src side, it core dumps as follows; please refer to attachment gdb_debug_info-11222018 for details:

# ./vm00.sh
QEMU 3.0.92 monitor - type 'help' for more information
(qemu) qemu-system-ppc64: Can't migrate a bitmap that is in use by another operation: 'bitmap0'
(qemu) q
qemu-system-ppc64: block.c:3526: bdrv_close_all: Assertion `QTAILQ_EMPTY(&all_bdrv_states)' failed.
./vm00.sh: line 23: 104628 Aborted (core dumped) /home/qemu/ppc64-softmmu/qemu-system-ppc64 -name 'avocado-vt-vm1' -machine pseries -object secret,id=sec0,data=redhat -chardev socket,id=qmp_id_qmpmonitor1,path=/var/tmp/avocado1,server,nowait -mon chardev=qmp_id_qmpmonitor1,mode=control -device nec-usb-xhci,id=usb1,bus=pci.0,addr=0x4 -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x5 -chardev socket,path=/tmp/qga.sock,server,nowait,id=qga0 -device virtserialport,bus=virtio-serial0.0,chardev=qga0,id=qemu-ga0,name=org.qemu.guest_agent.0 -device virtio-scsi-pci,id=virtio_scsi_pci0,bus=pci.0,addr=0x6 -blockdev node-name=disk0,file.driver=file,driver=qcow2,file.filename=/home/rhel80-ppc64le-upstream.qcow2 -device scsi-hd,drive=disk0,id=image0,bootindex=0 -device virtio-net-pci,mac=9a:78:79:7a:7b:6a,id=id8e5D72,vectors=4,netdev=idrYUYaH,bus=pci.0,addr=0x3 -netdev tap,id=idrYUYaH,vhost=on -m 1024 -smp 2,maxcpus=2,cores=2,threads=1,sockets=1 -device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1 -vnc :20 -rtc base=localtime,clock=host,driftfix=slew -boot menu=off,strict=off,order=cdn,once=c -enable-kvm -monitor stdio
[root@ibm-p8-rhevm-13 home]#
Also reproduced the bug on rhel7.6z:
Host kernel: 3.10.0-957.1.2.el7.x86_64
Qemu: qemu-kvm-rhev-2.12.0-18.el7_6.2.x86_64

It turns out the trigger is that I used '-blockdev node-name=disk0...' to start the guests; if I switch to '-drive id=disk0', with all the following steps kept the same, the problem does not occur.
*** Bug 1652873 has been marked as a duplicate of this bug. ***
QEMU 3.1.0 had this stanza:

```
if (bdrv_dirty_bitmap_user_locked(bitmap)) {
    error_report("Can't migrate a bitmap that is in use by another operation: '%s'",
                 bdrv_dirty_bitmap_name(bitmap));
    goto fail;
}
```

What's user_locked? Anything that is "frozen" or "qmp_locked".

A. Frozen bitmaps are any with a successor. Those are created by:
   i. block-backup, not used here, and
   ii. migration *load*; dirty_bitmap_load_start
B. qmp_locked is anything modified by bdrv_dirty_bitmap_set_qmp_locked(..., true)
   i. the nbd export will lock a bitmap, but only if it was told to with 3.1's qmp_x_nbd_server_add_bitmap command.
   ii. migration will lock bitmaps in init_dirty_bitmap_migration, in the discovery loop.

I think it's highly likely that the old, flawed discovery loop for bitmaps used in 3.1.0 is trying to migrate the same bitmap twice. There are two key changes since then:

1: v4.0.0 changed the bitmap permission system, so the error message you see (on the source console) after this version might change.
2: v4.1.0, with commit 592203e7cfb, changed the bitmap discovery method. It was the root cause of bug https://bugzilla.redhat.com/show_bug.cgi?id=1652490, which was cloned from this bug. :)

This ought to be fixed in any 4.1-based package, and should simply be re-tested.
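To make the "migrate the same bitmap twice" theory concrete, here is a toy model (all names hypothetical, not QEMU source) of a discovery loop that locks every bitmap it can reach: if the reorganized graph lets it reach the same node twice, the second visit trips the user_locked check with exactly the error seen on the source console. The v4.1-style fix is sketched as deduplicating nodes before locking.

```python
# Toy model of the flawed 3.1.0-era bitmap discovery loop.
class Bitmap:
    def __init__(self, name):
        self.name = name
        self.qmp_locked = False

def discover_and_lock(reachable_nodes):
    """Lock every bitmap on every node handed to us, raising the
    'in use' error if a bitmap is already locked (the user_locked check)."""
    for node in reachable_nodes:
        for bm in node["bitmaps"]:
            if bm.qmp_locked:
                raise RuntimeError(
                    "Can't migrate a bitmap that is in use by another "
                    "operation: '%s'" % bm.name)
            bm.qmp_locked = True

def discover_and_lock_fixed(reachable_nodes):
    """v4.1-style discovery: visit each distinct node only once."""
    seen = set()
    unique = [n for n in reachable_nodes
              if id(n) not in seen and not seen.add(id(n))]
    discover_and_lock(unique)

node = {"bitmaps": [Bitmap("bitmap0")]}
# With a mirror job running, a naive walk can reach the node twice:
try:
    discover_and_lock([node, node])
except RuntimeError as e:
    print(e)  # -> Can't migrate a bitmap that is in use ... 'bitmap0'

node2 = {"bitmaps": [Bitmap("bitmap0")]}
discover_and_lock_fixed([node2, node2])  # visits the node once; no error
```

This is only an illustration of the failure mode; the real loop and the real fix live in migration/block-dirty-bitmap.c.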
Blocked by https://bugzilla.redhat.com/show_bug.cgi?id=1748253.
Was iothread in use on this test?
(In reply to Eric Blake from comment #14)
> Was iothread in use on this test?

Hi, Eric,

iothread was not used in this test scenario.

BR,
aliang
Tested on qemu-kvm-4.1.0-8.module+el8.1.0+4199+446e40fc.x86_64 with -drive/-device. The result is as below: migration with a bitmap failed, with this info on dst:

(qemu) qemu-kvm: Cannot find device=#block369 nor node_name=#block369
qemu-kvm: error while loading state for instance 0x0 of device 'dirty-bitmap'
qemu-kvm: load of migration failed: Invalid argument

But the src vm quit successfully without a coredump.

*****Details************
Test steps:
1. In src, start guest with qemu cmds:
/usr/libexec/qemu-kvm \
 -name 'avocado-vt-vm1' \
 -machine q35 \
 -nodefaults \
 -device VGA,bus=pcie.0,addr=0x1 \
 -chardev socket,id=qmp_id_qmpmonitor1,path=/var/tmp/monitor-qmpmonitor1-20190820-032540-OesJUJdj,server,nowait \
 -mon chardev=qmp_id_qmpmonitor1,mode=control \
 -chardev socket,id=qmp_id_catch_monitor,path=/var/tmp/monitor-catch_monitor-20190820-032540-OesJUJdj,server,nowait \
 -mon chardev=qmp_id_catch_monitor,mode=control \
 -device pvpanic,ioport=0x505,id=idbJPqrG \
 -chardev socket,id=chardev_serial0,server,path=/var/tmp/serial-serial0-20190820-032540-OesJUJdj,nowait \
 -device isa-serial,id=serial0,chardev=chardev_serial0 \
 -chardev socket,id=seabioslog_id_20190820-032540-OesJUJdj,path=/var/tmp/seabios-20190820-032540-OesJUJdj,server,nowait \
 -device isa-debugcon,chardev=seabioslog_id_20190820-032540-OesJUJdj,iobase=0x402 \
 -device pcie-root-port,id=pcie.0-root-port-2,slot=2,chassis=2,addr=0x2,bus=pcie.0 \
 -device qemu-xhci,id=usb1,bus=pcie.0-root-port-2,addr=0x0 \
 -device pcie-root-port,id=pcie.0-root-port-3,slot=3,chassis=3,addr=0x3,bus=pcie.0 \
 -drive id=drive_image1,if=none,snapshot=off,cache=none,format=qcow2,file=/home/kvm_autotest_root/images/rhel810-64-virtio.qcow2 \
 -device virtio-scsi-pci,id=scsi0,bus=pcie.0-root-port-3,addr=0x0 \
 -device scsi-hd,id=image1,drive=drive_image1,bootindex=0,bus=scsi0.0 \
 -device pcie-root-port,id=pcie.0-root-port-6,slot=6,chassis=6,addr=0x6,bus=pcie.0 \
 -drive id=drive_data1,if=none,snapshot=off,cache=none,format=qcow2,file=/home/data.qcow2,werror=stop,rerror=stop \
 -device virtio-scsi-pci,id=scsi1,bus=pcie.0-root-port-6,addr=0x0 \
 -device scsi-hd,id=data1,drive=drive_data1,bus=scsi1.0 \
 -device pcie-root-port,id=pcie.0-root-port-7,slot=7,chassis=7,addr=0x7,bus=pcie.0 \
 -device pcie-root-port,id=pcie.0-root-port-4,slot=4,chassis=4,addr=0x4,bus=pcie.0 \
 -device virtio-net-pci,mac=9a:19:6a:3c:a6:a5,id=idq14C2Q,netdev=idHzG7Zk,bus=pcie.0-root-port-4,addr=0x0 \
 -netdev tap,id=idHzG7Zk,vhost=on \
 -m 2048 \
 -smp 4,maxcpus=4,cores=2,threads=1,sockets=2 \
 -cpu 'Skylake-Client',+kvm_pv_unhalt \
 -device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1 \
 -vnc :0 \
 -rtc base=utc,clock=host,driftfix=slew \
 -boot order=cdn,once=c,menu=off,strict=off \
 -enable-kvm \
2. In dst, create an empty disk and start guest with qemu cmds:
#qemu-img create -f qcow2 data1.qcow2 2G
/usr/libexec/qemu-kvm \
 -name 'avocado-vt-vm1' \
 -machine q35 \
 -nodefaults \
 -device VGA,bus=pcie.0,addr=0x1 \
 -chardev socket,id=qmp_id_qmpmonitor1,path=/var/tmp/monitor-qmpmonitor1-20190820-032540-OesJUJdjk,server,nowait \
 -mon chardev=qmp_id_qmpmonitor1,mode=control \
 -chardev socket,id=qmp_id_catch_monitor,path=/var/tmp/monitor-catch_monitor-20190820-032540-OesJUJdj,server,nowait \
 -mon chardev=qmp_id_catch_monitor,mode=control \
 -device pvpanic,ioport=0x505,id=idbJPqrG \
 -chardev socket,id=chardev_serial0,server,path=/var/tmp/serial-serial0-20190820-032540-OesJUJdj,nowait \
 -device isa-serial,id=serial0,chardev=chardev_serial0 \
 -chardev socket,id=seabioslog_id_20190820-032540-OesJUJdj,path=/var/tmp/seabios-20190820-032540-OesJUJdj,server,nowait \
 -device isa-debugcon,chardev=seabioslog_id_20190820-032540-OesJUJdj,iobase=0x402 \
 -device pcie-root-port,id=pcie.0-root-port-2,slot=2,chassis=2,addr=0x2,bus=pcie.0 \
 -device qemu-xhci,id=usb1,bus=pcie.0-root-port-2,addr=0x0 \
 -device pcie-root-port,id=pcie.0-root-port-3,slot=3,chassis=3,addr=0x3,bus=pcie.0 \
 -drive id=drive_image1,if=none,snapshot=off,cache=none,format=qcow2,file=/home/kvm_autotest_root/images/rhel810-64-virtio.qcow2 \
 -device virtio-scsi-pci,id=scsi0,bus=pcie.0-root-port-3,addr=0x0 \
 -device scsi-hd,id=image1,drive=drive_image1,bootindex=0,bus=scsi0.0 \
 -device pcie-root-port,id=pcie.0-root-port-6,slot=6,chassis=6,addr=0x6,bus=pcie.0 \
 -drive id=drive_data1,if=none,snapshot=off,cache=none,format=qcow2,file=/home/data1.qcow2,werror=stop,rerror=stop \
 -device virtio-scsi-pci,id=scsi1,bus=pcie.0-root-port-6,addr=0x0 \
 -device scsi-hd,id=data1,drive=drive_data1,bus=scsi1.0 \
 -device pcie-root-port,id=pcie.0-root-port-7,slot=7,chassis=7,addr=0x7,bus=pcie.0 \
 -device pcie-root-port,id=pcie.0-root-port-4,slot=4,chassis=4,addr=0x4,bus=pcie.0 \
 -device virtio-net-pci,mac=9a:19:6a:3c:a6:a5,id=idq14C2Q,netdev=idHzG7Zk,bus=pcie.0-root-port-4,addr=0x0 \
 -netdev tap,id=idHzG7Zk,vhost=on \
 -m 2048 \
 -smp 4,maxcpus=4,cores=2,threads=1,sockets=2 \
 -cpu 'Skylake-Client',+kvm_pv_unhalt \
 -device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1 \
 -vnc :1 \
 -rtc base=utc,clock=host,driftfix=slew \
 -boot order=cdn,once=c,menu=off,strict=off \
 -enable-kvm \
 -device pcie-root-port,id=pcie_extra_root_port_0,slot=5,chassis=5,addr=0x5,bus=pcie.0 \
 -monitor stdio \
 -device virtio-serial-pci,id=virtio-serial0,bus=pcie_extra_root_port_0,addr=0x0 \
 -chardev socket,path=/tmp/qga.sock,server,nowait,id=qga0 \
 -device virtserialport,bus=virtio-serial0.0,chardev=qga0,id=qemu-ga0,name=org.qemu.guest_agent.0 \
 -qmp tcp:0:3001,server,nowait \
 -incoming tcp:0:5000 \
3. In dst, expose the data disk:
{ "execute": "nbd-server-start", "arguments": { "addr": { "type": "inet","data": { "host": "10.73.224.68", "port": "3333" } } } }
{"return": {}}
{ "execute": "nbd-server-add", "arguments": { "device": "drive_data1","writable": true}}
4. In src, add a bitmap to the data disk:
{ "execute": "block-dirty-bitmap-add", "arguments": {"node": "drive_data1", "name":"bitmap0"}}
5. dd a file in the src guest:
(guest)# dd if=/dev/urandom of=test bs=1M count=1000
6. Set migration capabilities in both src and dst, and mirror from src to dst:
{"execute":"migrate-set-capabilities","arguments":{"capabilities":[{"capability":"events","state":true},{"capability":"dirty-bitmaps","state":true}]}}
{ "execute": "drive-mirror", "arguments": { "device": "drive_data1","target": "nbd://10.73.224.68:3333/drive_data1", "sync": "full","format": "raw", "mode": "existing"}}
{"timestamp": {"seconds": 1568273966, "microseconds": 432368}, "event": "JOB_STATUS_CHANGE", "data": {"status": "created", "id": "drive_data1"}}
{"timestamp": {"seconds": 1568273966, "microseconds": 433464}, "event": "JOB_STATUS_CHANGE", "data": {"status": "running", "id": "drive_data1"}}
{"return": {}}
{"timestamp": {"seconds": 1568273985, "microseconds": 485472}, "event": "JOB_STATUS_CHANGE", "data": {"status": "ready", "id": "drive_data1"}}
{"timestamp": {"seconds": 1568273985, "microseconds": 485524}, "event": "BLOCK_JOB_READY", "data": {"device": "drive_data1", "len": 2147483648, "offset": 2147483648, "speed": 0, "type": "mirror"}}
7. Set migration capability with pause-before-switchover true:
{"execute":"migrate-set-capabilities","arguments":{"capabilities":[{"capability":"pause-before-switchover","state":true}]}}
8. Migrate from src to dst:
{"execute": "migrate","arguments":{"uri": "tcp:10.73.224.68:5000"}}
{"timestamp": {"seconds": 1568274021, "microseconds": 606826}, "event": "MIGRATION", "data": {"status": "setup"}}
{"return": {}}
{"timestamp": {"seconds": 1568274021, "microseconds": 614289}, "event": "MIGRATION_PASS", "data": {"pass": 1}}
{"timestamp": {"seconds": 1568274021, "microseconds": 614352}, "event": "MIGRATION", "data": {"status": "active"}}
{"timestamp": {"seconds": 1568274021, "microseconds": 614972}, "event": "MIGRATION", "data": {"status": "failed"}}
9. Check migration status in guest:
(qemu) qemu-kvm: Cannot find device=#block369 nor node_name=#block369
qemu-kvm: error while loading state for instance 0x0 of device 'dirty-bitmap'
qemu-kvm: load of migration failed: Invalid argument
10. Quit vm in src:
(qemu)quit

In step 9, bitmap migration failed. After step 10, the src vm quit successfully.
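Not part of the report: a hypothetical sketch of the destination-side lookup behind "Cannot find device=#block369 nor node_name=#block369". The incoming bitmap stream names its node by an autogenerated name allocated by the *source* process; the destination allocates its own numbering, so the lookup can never succeed (the function and variable names below are illustrative only):

```python
# Hypothetical model of resolving a block reference on the destination:
# try the device (block-backend) name first, then the node name.
def find_block_node(devices, node_names, ref):
    """Resolve a block reference, raising the error seen on dst if
    neither namespace knows it."""
    if ref in devices:
        return devices[ref]
    if ref in node_names:
        return node_names[ref]
    raise KeyError("Cannot find device=%s nor node_name=%s" % (ref, ref))

# Destination state: same drive name, but its own autogenerated node names.
dst_devices = {"drive_data1": "qcow2 node"}
dst_node_names = {"#block122": "qcow2 node"}  # numbering differs per process

try:
    find_block_node(dst_devices, dst_node_names, "#block369")  # source's name
except KeyError as e:
    print(e)

# Resolving by the stable device name would have worked:
print(find_block_node(dst_devices, dst_node_names, "drive_data1"))
```

This is why the later analysis concludes that migration must use the stable device name, never an autogenerated node name.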
Based on the above, removing the "depends on" since bz1748253 is related to an IOThread issue. Adjusting the ITR to 8.1.1, as this is a backup/bitmap type issue which won't be used by libvirt until at least 8.1.1.
Tested on qemu-kvm-4.1.0-13.module+el8.1.0+4313+ef76ec61.x86_64 (the qemu-kvm version in which bz1748253 has been fixed); this bug still exists.
aliang, does this problem happen with -blockdev anymore? Is this now related to just -drive/-device? --js
The problem is that the drive-mirror graph manipulation removes our ability to see the block-backend the bitmap was originally associated with.

When you use -drive and -device to create a qcow2 graph, you wind up with a structure like this:

[blockbackend "drive_data1"]
    |  ^
    v  |
[bdrvchild role=child_root]
    |  ^
    v  |
[node "#block191"]
    |  ^
    v  |
[bdrvchild role=child_file]
    |  ^
    v  |
[node "#block056"]

The top node represents the qcow2 file, and the bottom node represents the posix file. When we run bitmap-add against "drive_data1", the bitmap gets stored on #block191. When we migrate, we find node "#block191" because it has a bitmap attached. Usually, we use a function named bdrv_get_device_or_node_name on this node to get the name of the block-backend for migration purposes ("drive_data1"), but when we are running under a drive-mirror, the graph has been reorganized and we lose the ability to find this name. The function falls back to the local node name, which does not exist on the destination.

Problem #1: We should never use autogenerated names during migration, because they might accidentally attach to the wrong node if there's a very unlucky collision!

Now, what does the graph look like when there's a migration like the one requested?

[blockbackend "drive_data1"]
    |  ^
    v  |
[bdrvchild role=child_root name="root"]
    |
    v
[node "#block449"] ------> [bdrvchild role=child_job name="main node"] --> job object
    |  ^
    v  |
[bdrvchild role=child_backing name="backing"]
    |
    v
[node "#block191"] ------> [bdrvchild role=child_job name="source"] -----> job object
    |  ^
    v  |
[bdrvchild role=child_file name="file"]
    |  ^
    v  |
[node "#block056"]

Problem #2: drive-mirror has inserted a new filter node #block449 between the original root and the block-backend, so that name is no longer available from that position in the graph.

Problem #3: #block191 no longer has a parent link to the original block-backend, but instead has a parent link to the job.
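The degradation described above can be modeled in a few lines (toy classes, hypothetical names, not QEMU source): a name lookup that only inspects direct parents finds "drive_data1" before the mirror starts, but after drive-mirror re-parents #block191 under the job, it can only fall back to the autogenerated node name.

```python
# Toy model of why bdrv_get_device_or_node_name degrades after
# drive-mirror rewires the graph.
class Node:
    def __init__(self, node_name, parents=()):
        self.node_name = node_name       # autogenerated "#blockNNN" style
        self.parents = list(parents)     # (role, parent-name) pairs

def get_device_or_node_name(node):
    """Return the attached block-backend's name if one is a direct
    parent; otherwise fall back to the autogenerated node name."""
    for role, parent in node.parents:
        if role == "child_root":         # direct block-backend link
            return parent
    return node.node_name

# Before the mirror: #block191 hangs directly under the backend.
qcow2 = Node("#block191", parents=[("child_root", "drive_data1")])
print(get_device_or_node_name(qcow2))    # -> drive_data1

# After drive-mirror: the parents are the job and the filter node,
# so the backend name is unreachable from this position in the graph.
qcow2.parents = [("child_job", "j1"), ("child_backing", "#block449")]
print(get_device_or_node_name(qcow2))    # -> #block191
```

The second result is exactly the autogenerated name that the destination cannot resolve.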
I'm sorry in advance, but I made a graph because it helped me understand the situation a little better: https://i.imgur.com/HJtCjQK.png This graph illustrates a single qcow2 file attached to a block-backend named "drive_image1" being mirrored to an anonymously named target. The bitmaps would be attached to #block191 in this case, and you can see it's a bit of a far trek up the graph to find "drive_image1" from here.
Aliang, my current expectation is this:

- Using blockdev with a mirror migration should work correctly if the same node names are used on the destination.
- Using drive with a mirror migration will not work correctly.
- Using drive or blockdev without a mirror should work correctly.

If true, that means this BZ should be retitled from "blockdev" to "non-blockdev" -- this is technically a new bug from what this BZ started as.

Upstream post: https://lists.gnu.org/archive/html/qemu-devel/2019-09/msg07241.html
Upstream discussion: https://lists.gnu.org/archive/html/qemu-devel/2019-10/msg00002.html
Tested with -blockdev on qemu-kvm-4.1.0-13.module+el8.1.0+4313+ef76ec61.x86_64; the issue is not hit any more.

Test steps:
1. In src, start guest with qemu cmds:
/usr/libexec/qemu-kvm \
 -name 'avocado-vt-vm1' \
 -machine q35 \
 -nodefaults \
 -device VGA,bus=pcie.0,addr=0x1 \
 -chardev socket,id=qmp_id_qmpmonitor1,path=/var/tmp/monitor-qmpmonitor1-20190820-032540-OesJUJdj,server,nowait \
 -mon chardev=qmp_id_qmpmonitor1,mode=control \
 -chardev socket,id=qmp_id_catch_monitor,path=/var/tmp/monitor-catch_monitor-20190820-032540-OesJUJdj,server,nowait \
 -mon chardev=qmp_id_catch_monitor,mode=control \
 -device pvpanic,ioport=0x505,id=idbJPqrG \
 -chardev socket,id=chardev_serial0,server,path=/var/tmp/serial-serial0-20190820-032540-OesJUJdj,nowait \
 -device isa-serial,id=serial0,chardev=chardev_serial0 \
 -chardev socket,id=seabioslog_id_20190820-032540-OesJUJdj,path=/var/tmp/seabios-20190820-032540-OesJUJdj,server,nowait \
 -device isa-debugcon,chardev=seabioslog_id_20190820-032540-OesJUJdj,iobase=0x402 \
 -device pcie-root-port,id=pcie.0-root-port-2,slot=2,chassis=2,addr=0x2,bus=pcie.0 \
 -device qemu-xhci,id=usb1,bus=pcie.0-root-port-2,addr=0x0 \
 -device pcie-root-port,id=pcie.0-root-port-3,slot=3,chassis=3,addr=0x3,bus=pcie.0 \
 -blockdev driver=file,filename=/home/kvm_autotest_root/images/rhel810-64-virtio.qcow2,node-name=file_node \
 -blockdev driver=qcow2,node-name=drive_image1,file=file_node \
 -device virtio-scsi-pci,id=scsi0,bus=pcie.0-root-port-3,addr=0x0 \
 -device scsi-hd,id=image1,drive=drive_image1,bootindex=0,bus=scsi0.0 \
 -device pcie-root-port,id=pcie.0-root-port-6,slot=6,chassis=6,addr=0x6,bus=pcie.0 \
 -blockdev driver=file,filename=/home/data.qcow2,node-name=data_node \
 -blockdev driver=qcow2,node-name=drive_data1,file=data_node \
 -device virtio-scsi-pci,id=scsi1,bus=pcie.0-root-port-6,addr=0x0 \
 -device scsi-hd,id=data1,drive=drive_data1,bus=scsi1.0 \
 -device pcie-root-port,id=pcie.0-root-port-7,slot=7,chassis=7,addr=0x7,bus=pcie.0 \
 -device pcie-root-port,id=pcie.0-root-port-4,slot=4,chassis=4,addr=0x4,bus=pcie.0 \
 -device virtio-net-pci,mac=9a:19:6a:3c:a6:a5,id=idq14C2Q,netdev=idHzG7Zk,bus=pcie.0-root-port-4,addr=0x0 \
 -netdev tap,id=idHzG7Zk,vhost=on \
 -m 2048 \
 -smp 4,maxcpus=4,cores=2,threads=1,sockets=2 \
 -cpu 'Skylake-Client',+kvm_pv_unhalt \
 -device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1 \
 -vnc :0 \
 -rtc base=utc,clock=host,driftfix=slew \
 -boot order=cdn,once=c,menu=off,strict=off \
 -enable-kvm \
 -monitor stdio \
2. In dst, create an empty disk and start guest with qemu cmds:
#qemu-img create -f qcow2 data1.qcow2 2G
/usr/libexec/qemu-kvm \
 -name 'avocado-vt-vm1' \
 -machine q35 \
 -nodefaults \
 -device VGA,bus=pcie.0,addr=0x1 \
 -chardev socket,id=qmp_id_qmpmonitor1,path=/var/tmp/monitor-qmpmonitor1-20190820-032540-OesJUJdjk,server,nowait \
 -mon chardev=qmp_id_qmpmonitor1,mode=control \
 -chardev socket,id=qmp_id_catch_monitor,path=/var/tmp/monitor-catch_monitor-20190820-032540-OesJUJdj,server,nowait \
 -mon chardev=qmp_id_catch_monitor,mode=control \
 -device pvpanic,ioport=0x505,id=idbJPqrG \
 -chardev socket,id=chardev_serial0,server,path=/var/tmp/serial-serial0-20190820-032540-OesJUJdj,nowait \
 -device isa-serial,id=serial0,chardev=chardev_serial0 \
 -chardev socket,id=seabioslog_id_20190820-032540-OesJUJdj,path=/var/tmp/seabios-20190820-032540-OesJUJdj,server,nowait \
 -device isa-debugcon,chardev=seabioslog_id_20190820-032540-OesJUJdj,iobase=0x402 \
 -device pcie-root-port,id=pcie.0-root-port-2,slot=2,chassis=2,addr=0x2,bus=pcie.0 \
 -device qemu-xhci,id=usb1,bus=pcie.0-root-port-2,addr=0x0 \
 -device pcie-root-port,id=pcie.0-root-port-3,slot=3,chassis=3,addr=0x3,bus=pcie.0 \
 -blockdev driver=file,filename=/home/kvm_autotest_root/images/rhel810-64-virtio.qcow2,node-name=file_node \
 -blockdev driver=qcow2,node-name=drive_image1,file=file_node \
 -device virtio-scsi-pci,id=scsi0,bus=pcie.0-root-port-3,addr=0x0 \
 -device scsi-hd,id=image1,drive=drive_image1,bootindex=0,bus=scsi0.0 \
 -device pcie-root-port,id=pcie.0-root-port-6,slot=6,chassis=6,addr=0x6,bus=pcie.0 \
 -blockdev driver=file,filename=/home/data1.qcow2,node-name=data_node \
 -blockdev driver=qcow2,node-name=drive_data1,file=data_node \
 -device virtio-scsi-pci,id=scsi1,bus=pcie.0-root-port-6,addr=0x0 \
 -device scsi-hd,id=data1,drive=drive_data1,bus=scsi1.0 \
 -device pcie-root-port,id=pcie.0-root-port-7,slot=7,chassis=7,addr=0x7,bus=pcie.0 \
 -device pcie-root-port,id=pcie.0-root-port-4,slot=4,chassis=4,addr=0x4,bus=pcie.0 \
 -device virtio-net-pci,mac=9a:19:6a:3c:a6:a5,id=idq14C2Q,netdev=idHzG7Zk,bus=pcie.0-root-port-4,addr=0x0 \
 -netdev tap,id=idHzG7Zk,vhost=on \
 -m 2048 \
 -smp 4,maxcpus=4,cores=2,threads=1,sockets=2 \
 -cpu 'Skylake-Client',+kvm_pv_unhalt \
 -device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1 \
 -vnc :1 \
 -rtc base=utc,clock=host,driftfix=slew \
 -boot order=cdn,once=c,menu=off,strict=off \
 -enable-kvm \
 -device pcie-root-port,id=pcie_extra_root_port_0,slot=5,chassis=5,addr=0x5,bus=pcie.0 \
 -monitor stdio \
 -device virtio-serial-pci,id=virtio-serial0,bus=pcie_extra_root_port_0,addr=0x0 \
 -chardev socket,path=/tmp/qga.sock,server,nowait,id=qga0 \
 -device virtserialport,bus=virtio-serial0.0,chardev=qga0,id=qemu-ga0,name=org.qemu.guest_agent.0 \
 -qmp tcp:0:3001,server,nowait \
 -incoming tcp:0:5000 \
3. In dst, expose the data disk:
{ "execute": "nbd-server-start", "arguments": { "addr": { "type": "inet","data": { "host": "10.73.224.68", "port": "3333" } } } }
{"return": {}}
{ "execute": "nbd-server-add", "arguments": { "device": "drive_data1","writable": true}}
4. In src, add a bitmap to the data disk:
{ "execute": "block-dirty-bitmap-add", "arguments": {"node": "drive_data1", "name":"bitmap0"}}
5. dd a file in the src guest:
(guest)# dd if=/dev/urandom of=test bs=1M count=1000
6. Disable the bitmap and check its sha256:
{ "execute": "block-dirty-bitmap-disable", "arguments": {"node": "drive_data1","name":"bitmap0"}}
{"return": {}}
{"execute": "x-debug-block-dirty-bitmap-sha256","arguments": {"node": "drive_data1","name":"bitmap0"}}
{"return": {"sha256": "e16364d58befa6b394f1400074058fb5558a245328b4c4c651f71b82023b429a"}}
7. Set migration capabilities in both src and dst, and mirror from src to dst:
{"execute":"migrate-set-capabilities","arguments":{"capabilities":[{"capability":"events","state":true},{"capability":"dirty-bitmaps","state":true}]}}
{"execute":"blockdev-add","arguments":{"driver":"nbd","node-name":"mirror","server":{"type":"inet","host":"10.73.224.68","port":"3333"},"export":"drive_data1"}}
{"return": {}}
{"execute": "blockdev-mirror", "arguments": { "device": "drive_data1","target": "mirror", "sync":"full", "job-id":"j1"}}
{"timestamp": {"seconds": 1570500270, "microseconds": 287015}, "event": "JOB_STATUS_CHANGE", "data": {"status": "created", "id": "j1"}}
{"timestamp": {"seconds": 1570500270, "microseconds": 287051}, "event": "JOB_STATUS_CHANGE", "data": {"status": "running", "id": "j1"}}
{"return": {}}
{"timestamp": {"seconds": 1570500281, "microseconds": 881920}, "event": "JOB_STATUS_CHANGE", "data": {"status": "ready", "id": "j1"}}
{"timestamp": {"seconds": 1570500281, "microseconds": 881957}, "event": "BLOCK_JOB_READY", "data": {"device": "j1", "len": 2147483648, "offset": 2147483648, "speed": 0, "type": "mirror"}}
8. Set migration capability with pause-before-switchover true and migrate from src to dst:
{"execute":"migrate-set-capabilities","arguments":{"capabilities":[{"capability":"pause-before-switchover","state":true}]}}
{"execute": "migrate","arguments":{"uri": "tcp:10.73.224.68:5000"}}
{"timestamp": {"seconds": 1570500292, "microseconds": 798923}, "event": "MIGRATION", "data": {"status": "setup"}}
{"return": {}}
{"timestamp": {"seconds": 1570500292, "microseconds": 806497}, "event": "MIGRATION_PASS", "data": {"pass": 1}}
{"timestamp": {"seconds": 1570500292, "microseconds": 806554}, "event": "MIGRATION", "data": {"status": "active"}}
{"timestamp": {"seconds": 1570500349, "microseconds": 185840}, "event": "MIGRATION_PASS", "data": {"pass": 2}}
{"timestamp": {"seconds": 1570500354, "microseconds": 400633}, "event": "MIGRATION_PASS", "data": {"pass": 3}}
{"timestamp": {"seconds": 1570500354, "microseconds": 701521}, "event": "MIGRATION_PASS", "data": {"pass": 4}}
{"timestamp": {"seconds": 1570500354, "microseconds": 714837}, "event": "STOP"}
{"timestamp": {"seconds": 1570500354, "microseconds": 716302}, "event": "MIGRATION", "data": {"status": "pre-switchover"}}
9. Cancel the block job and continue migration:
{"execute":"block-job-cancel","arguments":{"device":"j1"}}
{"return": {}}
{"timestamp": {"seconds": 1570500361, "microseconds": 752025}, "event": "JOB_STATUS_CHANGE", "data": {"status": "waiting", "id": "j1"}}
{"timestamp": {"seconds": 1570500361, "microseconds": 752058}, "event": "JOB_STATUS_CHANGE", "data": {"status": "pending", "id": "j1"}}
{"timestamp": {"seconds": 1570500361, "microseconds": 752095}, "event": "BLOCK_JOB_COMPLETED", "data": {"device": "j1", "len": 2147483648, "offset": 2147483648, "speed": 0, "type": "mirror"}}
{"timestamp": {"seconds": 1570500361, "microseconds": 752167}, "event": "JOB_STATUS_CHANGE", "data": {"status": "concluded", "id": "j1"}}
{"timestamp": {"seconds": 1570500361, "microseconds": 752182}, "event": "JOB_STATUS_CHANGE", "data": {"status": "null", "id": "j1"}}
{"execute":"migrate-continue","arguments":{"state":"pre-switchover"}}
{"return": {}}
{"timestamp": {"seconds": 1570500371, "microseconds": 511780}, "event": "MIGRATION", "data": {"status": "device"}}
{"timestamp": {"seconds": 1570500371, "microseconds": 512582}, "event": "MIGRATION_PASS", "data": {"pass": 5}}
{"timestamp": {"seconds": 1570500371, "microseconds": 515433}, "event": "MIGRATION", "data": {"status": "completed"}}
10. Check vm status in both src and dst:
(src qemu)info status
VM status: paused (postmigrate)
(dst qemu)info status
VM status: running
11. Check bitmap sha256 in dst:
{"execute": "x-debug-block-dirty-bitmap-sha256","arguments": {"node": "drive_data1","name":"bitmap0"}}
{"return": {"sha256": "e16364d58befa6b394f1400074058fb5558a245328b4c4c651f71b82023b429a"}}

According to John's suggestion in comment 22, closing this bug as current release; a new bug will be filed to track the "[-drive] migrate bitmap on non-shared storage failed" issue. Thanks again to John for the analysis and suggestions.

BR,
aliang
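Not part of the report: the verification in the last step boils down to comparing the sha256 digests returned by x-debug-block-dirty-bitmap-sha256 on both sides. A minimal sketch of that comparison (the helper name bitmaps_match is hypothetical; the reply strings are the ones from the test above):

```python
import json

def bitmaps_match(src_reply, dst_reply):
    """Compare the sha256 fields of two x-debug-block-dirty-bitmap-sha256
    QMP replies; equal digests mean the bitmap content survived migration."""
    return (json.loads(src_reply)["return"]["sha256"]
            == json.loads(dst_reply)["return"]["sha256"])

src = '{"return": {"sha256": "e16364d58befa6b394f1400074058fb5558a245328b4c4c651f71b82023b429a"}}'
dst = '{"return": {"sha256": "e16364d58befa6b394f1400074058fb5558a245328b4c4c651f71b82023b429a"}}'
print(bitmaps_match(src, dst))  # -> True
```

Disabling the bitmap before taking the source-side digest (as done in step 6) matters: it prevents further guest writes from changing the bitmap between the two measurements.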
Using -blockdev avoids the problem, but upstream now has a patch for older setups where libvirt is still using -drive: https://lists.gnu.org/archive/html/qemu-devel/2020-05/msg07419.html

Basically, qemu 5.1 will now look through filter nodes (like the mirror job's) to migrate by the original device name, rather than the broken approach of attempting to migrate by a generated node name that is not likely to exist on the other side.
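The "look through filter nodes" idea can be sketched in miniature (toy classes and names, not the actual patch): walk upward from the bitmap's node, treating filter nodes as transparent, and only fall back to the autogenerated node name if no block-backend is reachable.

```python
# Toy sketch of the qemu 5.1 approach: search upward through filter
# nodes (like the mirror filter) for the stable device name.
class Node:
    def __init__(self, name, parents=(), is_filter=False):
        self.name = name               # autogenerated "#blockNNN" style
        self.parents = list(parents)   # block-backends modeled as plain strings
        self.is_filter = is_filter

def migration_name(node):
    """Prefer a reachable block-backend name, looking through filters;
    fall back to the node's own (autogenerated) name only as a last resort."""
    for parent in node.parents:
        if isinstance(parent, str):    # a block-backend: stable, migratable
            return parent
        if parent.is_filter:           # transparent: keep walking upward
            found = migration_name(parent)
            if not found.startswith("#"):
                return found
    return node.name

# Graph from the earlier analysis: backend -> mirror filter -> qcow2 node.
flt = Node("#block449", parents=["drive_data1"], is_filter=True)
qcow2 = Node("#block191", parents=[flt])
print(migration_name(qcow2))  # -> drive_data1
```

With this lookup, the bitmap stream carries "drive_data1", which resolves on the destination regardless of how its autogenerated node names were numbered.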