Bug 1652424 - [blockdev] Migrate with dirty bitmap on no shared storage failed
Summary: [blockdev] Migrate with dirty bitmap on no shared storage failed
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Enterprise Linux Advanced Virtualization
Classification: Red Hat
Component: qemu-kvm
Version: ---
Hardware: Unspecified
OS: Unspecified
medium
high
Target Milestone: rc
: 8.0
Assignee: Eric Blake
QA Contact: aihua liang
URL:
Whiteboard:
: 1652873 (view as bug list)
Depends On:
Blocks: 1652490 1655541
 
Reported: 2018-11-22 03:52 UTC by Gu Nini
Modified: 2020-05-27 14:25 UTC (History)
14 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
: 1652490 1655541 (view as bug list)
Environment:
Last Closed: 2019-10-08 02:25:45 UTC
Type: Bug
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)
gdb_debug_info-11222018 (14.67 KB, text/plain)
2018-11-22 03:52 UTC, Gu Nini

Description Gu Nini 2018-11-22 03:52:30 UTC
Created attachment 1507874 [details]
gdb_debug_info-11222018

Description of problem:
When migrating a guest with a dirty bitmap over non-shared storage, the migration fails with the following error:

src guest hmp:
# ./vm00.sh
QEMU 3.0.92 monitor - type 'help' for more information
(qemu) qemu-system-ppc64: Can't migrate a bitmap that is in use by another operation: 'bitmap0'

dst guest hmp:
# ./vm00-mig.sh 
QEMU 3.0.92 monitor - type 'help' for more information
(qemu) qemu-system-ppc64: Unable to read node name string
qemu-system-ppc64: error while loading state for instance 0x0 of device 'dirty-bitmap'
qemu-system-ppc64: load of migration failed: Invalid argument


Version-Release number of selected component (if applicable):
Host kernel: 4.18.0-32.el8.ppc64le (src)   4.18.0-40.el8.ppc64le (dst)
Qemu: v3.1.0-rc2-dirty


How reproducible:
100%

Steps to Reproduce:
1. On both src and dst hosts, boot up guests with only a system disk; the src one uses a pre-installed image, while the dst one uses a newly created image:

    -blockdev node-name=disk0,file.driver=file,driver=qcow2,file.filename=/home/rhel80-ppc64le-upstream.qcow2 \
    -device scsi-hd,drive=disk0,id=image0,bootindex=0 \

2. On both hosts, set migration capabilities in the qmp connections:

# nc -U /var/tmp/avocado1
{"QMP": {"version": {"qemu": {"micro": 92, "minor": 0, "major": 3}, "package": "v3.1.0-rc2-dirty"}, "capabilities": []}}
{"execute":"qmp_capabilities"}
{"return": {}}
{"execute":"migrate-set-capabilities","arguments":{"capabilities":[{"capability":"events","state":true},{"capability":"dirty-bitmaps","state":true},{"capability":"pause-before-switchover","state":true}]}}
{"return": {}}

3. On dst host, start the NBD server and add the export:

{ "execute": "nbd-server-start", "arguments": { "addr": { "type": "inet","data": { "host": "10.0.1.44", "port": "3333" } } } }
{"return": {}}
{ "execute": "nbd-server-add", "arguments":{ "device": "disk0", "writable": true } }

4. On src host, add dirty bitmap bitmap0:

{ "execute": "block-dirty-bitmap-add", "arguments": {"node": "disk0", "name": "bitmap0"}} 
{"return": {}}

5. On src host, do block-mirror:

{"execute":"blockdev-add","arguments":{"driver":"nbd","node-name":"mirror","server":{"type":"inet","host":"10.0.1.44","port":"3333"},"export":"disk0"}}
{"return": {}}
{"execute": "blockdev-mirror", "arguments": { "device": "disk0","target": "mirror", "sync": "full", "job-id":"j1"}}
{"timestamp": {"seconds": 1542856270, "microseconds": 804881}, "event": "JOB_STATUS_CHANGE", "data": {"status": "created", "id": "j1"}}
{"timestamp": {"seconds": 1542856270, "microseconds": 804944}, "event": "JOB_STATUS_CHANGE", "data": {"status": "running", "id": "j1"}}
{"return": {}}
{"timestamp": {"seconds": 1542856314, "microseconds": 839507}, "event": "JOB_STATUS_CHANGE", "data": {"status": "ready", "id": "j1"}}
{"timestamp": {"seconds": 1542856314, "microseconds": 839567}, "event": "BLOCK_JOB_READY", "data": {"device": "j1", "len": 21474836480, "offset": 21474836480, "speed": 0, "type": "mirror"}}
{"execute": "migrate","arguments":{"uri": "tcp:10.0.1.44:5200"}}

6. On src host, when the mirror job reaches ready status, start the migration:

{"execute": "migrate","arguments":{"uri": "tcp:10.0.1.44:5200"}}
{"timestamp": {"seconds": 1542856407, "microseconds": 47629}, "event": "MIGRATION", "data": {"status": "setup"}}
{"return": {}}
{"timestamp": {"seconds": 1542856407, "microseconds": 51688}, "event": "MIGRATION_PASS", "data": {"pass": 1}}
{"timestamp": {"seconds": 1542856407, "microseconds": 51772}, "event": "MIGRATION", "data": {"status": "active"}}
{"timestamp": {"seconds": 1542856407, "microseconds": 51805}, "event": "MIGRATION", "data": {"status": "failed"}}
{"timestamp": {"seconds": 1542856419, "microseconds": 571231}, "event": "BLOCK_JOB_ERROR", "data": {"device": "j1", "operation": "write", "action": "report"}}
{"timestamp": {"seconds": 1542856419, "microseconds": 571517}, "event": "JOB_STATUS_CHANGE", "data": {"status": "aborting", "id": "j1"}}
{"timestamp": {"seconds": 1542856419, "microseconds": 571607}, "event": "BLOCK_JOB_COMPLETED", "data": {"device": "j1", "len": 21475229696, "offset": 21475098624, "speed": 0, "type": "mirror", "error": "Input/output error"}}
{"timestamp": {"seconds": 1542856419, "microseconds": 571647}, "event": "JOB_STATUS_CHANGE", "data": {"status": "concluded", "id": "j1"}}
{"timestamp": {"seconds": 1542856419, "microseconds": 571679}, "event": "JOB_STATUS_CHANGE", "data": {"status": "null", "id": "j1"}}


Actual results:
In step 6, the migration fails. From the HMP output of the src guest, it seems the bitmap is in use by another operation.

Expected results:
In step 6, the migration succeeds.


Additional info:
If the guest on the src side is then quit, it core dumps as follows; please refer to attachment gdb_debug_info-11222018 for details:

# ./vm00.sh
QEMU 3.0.92 monitor - type 'help' for more information
(qemu) qemu-system-ppc64: Can't migrate a bitmap that is in use by another operation: 'bitmap0'
(qemu) q
qemu-system-ppc64: block.c:3526: bdrv_close_all: Assertion `QTAILQ_EMPTY(&all_bdrv_states)' failed.
./vm00.sh: line 23: 104628 Aborted                 (core dumped) /home/qemu/ppc64-softmmu/qemu-system-ppc64 -name 'avocado-vt-vm1' -machine pseries -object secret,id=sec0,data=redhat -chardev socket,id=qmp_id_qmpmonitor1,path=/var/tmp/avocado1,server,nowait -mon chardev=qmp_id_qmpmonitor1,mode=control -device nec-usb-xhci,id=usb1,bus=pci.0,addr=0x4 -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x5 
-chardev socket,path=/tmp/qga.sock,server,nowait,id=qga0 -device virtserialport,bus=virtio-serial0.0,chardev=qga0,id=qemu-ga0,name=org.qemu.guest_agent.0 -device virtio-scsi-pci,id=virtio_scsi_pci0,bus=pci.0,addr=0x6 -blockdev node-name=disk0,file.driver=file,driver=qcow2,file.filename=/home/rhel80-ppc64le-upstream.qcow2 -device scsi-hd,drive=disk0,id=image0,bootindex=0 -device virtio-net-pci,mac=9a:78:79:7a:7b
:6a,id=id8e5D72,vectors=4,netdev=idrYUYaH,bus=pci.0,addr=0x3 -netdev tap,id=idrYUYaH,vhost=on -m 1024 -smp 2,maxcpus=2,cores=2,threads=1,sockets=1 -device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1 -vnc :20 -rtc base=localtime,clock=host,driftfix=slew -boot menu=off,strict=off,order=cdn,once=c -enable-kvm -monitor stdio
[root@ibm-p8-rhevm-13 home]# [root@ibm-p8-rhevm-13 home]#

Comment 1 Gu Nini 2018-11-22 07:39:13 UTC
Also reproduced the bug on rhel7.6z:
Host kernel: 3.10.0-957.1.2.el7.x86_64
Qemu: qemu-kvm-rhev-2.12.0-18.el7_6.2.x86_64


It turns out the key factor is that I used '-blockdev node-name=disk0...' to start the guests. If I instead use '-drive id=disk0', with all the following steps identical, the problem does not occur.

Comment 2 Gu Nini 2018-11-26 01:39:33 UTC
*** Bug 1652873 has been marked as a duplicate of this bug. ***

Comment 12 John Snow 2019-08-29 20:28:03 UTC
QEMU 3.1.0 had this stanza:

```
if (bdrv_dirty_bitmap_user_locked(bitmap)) {
    error_report("Can't migrate a bitmap that is in use by another operation: '%s'",
                 bdrv_dirty_bitmap_name(bitmap));
    goto fail;
}
```

What's user_locked? Anything that's "frozen" or "qmp_locked" (see the sketch after the list below).

A. Frozen bitmaps are any with a successor. Those are created by:
  i. block-backup, not used here, and
  ii. migration *load*; dirty_bitmap_load_start

B. qmp_locked is anything modified by bdrv_dirty_bitmap_set_qmp_locked(..., true)
  i. nbd export will lock a bitmap; but only if it was told to with 3.1's qmp_x_nbd_server_add_bitmap command.
  ii. migration will lock bitmaps in init_dirty_bitmap_migration in the discovery loop.
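
For reference, a minimal sketch of what "user locked" amounts to in 3.1, reconstructed from the description above rather than copied from the source (so treat the exact helper names as an assumption):

```
/* Sketch only: a bitmap is "user locked" if it is frozen (has a
 * successor, e.g. backup or migration load) or qmp_locked (claimed by
 * an NBD export or by the migration discovery loop itself). */
bool bdrv_dirty_bitmap_user_locked(BdrvDirtyBitmap *bitmap)
{
    return bdrv_dirty_bitmap_frozen(bitmap) ||
           bdrv_dirty_bitmap_qmp_locked(bitmap);
}
```

If the discovery loop locks a bitmap and then encounters the same bitmap again, this check trips, which would match the error on the source console.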

I think it's highly likely that the old, flawed discovery loop for bitmaps used in 3.1.0 is trying to migrate the same bitmap twice.

There are two key changes:

1: v4.0.0 changed the bitmap permission system, so the error message you see (on the source console) after this version might change.
2: v4.1.0, with commit 592203e7cfb, changed the bitmap discovery method; the old method was the root cause of bug https://bugzilla.redhat.com/show_bug.cgi?id=1652490, which was cloned from this bug. :)


This ought to be fixed in any 4.1-based package, and should simply be re-tested.

Comment 13 aihua liang 2019-09-03 10:13:36 UTC
Blocked by https://bugzilla.redhat.com/show_bug.cgi?id=1748253.

Comment 14 Eric Blake 2019-09-11 18:46:34 UTC
Was iothread in use on this test?

Comment 15 aihua liang 2019-09-12 07:42:08 UTC
(In reply to Eric Blake from comment #14)
> Was iothread in use on this test?

Hi, Eric
    
   iothread was not used in this test scenario.

BR,
aliang

Comment 16 aihua liang 2019-09-12 07:58:12 UTC
Tested on qemu-kvm-4.1.0-8.module+el8.1.0+4199+446e40fc.x86_64 with -drive/-device.

Result is as below:
  Migration with a bitmap failed with the following info on dst:
      (qemu) qemu-kvm: Cannot find device=#block369 nor node_name=#block369
qemu-kvm: error while loading state for instance 0x0 of device 'dirty-bitmap'
qemu-kvm: load of migration failed: Invalid argument

  But the src VM quit successfully without a core dump.

*****Details************
Test steps:
 1.In src, start guest with qemu cmds:
     /usr/libexec/qemu-kvm \
    -name 'avocado-vt-vm1' \
    -machine q35  \
    -nodefaults \
    -device VGA,bus=pcie.0,addr=0x1  \
    -chardev socket,id=qmp_id_qmpmonitor1,path=/var/tmp/monitor-qmpmonitor1-20190820-032540-OesJUJdj,server,nowait \
    -mon chardev=qmp_id_qmpmonitor1,mode=control  \
    -chardev socket,id=qmp_id_catch_monitor,path=/var/tmp/monitor-catch_monitor-20190820-032540-OesJUJdj,server,nowait \
    -mon chardev=qmp_id_catch_monitor,mode=control \
    -device pvpanic,ioport=0x505,id=idbJPqrG \
    -chardev socket,id=chardev_serial0,server,path=/var/tmp/serial-serial0-20190820-032540-OesJUJdj,nowait \
    -device isa-serial,id=serial0,chardev=chardev_serial0  \
    -chardev socket,id=seabioslog_id_20190820-032540-OesJUJdj,path=/var/tmp/seabios-20190820-032540-OesJUJdj,server,nowait \
    -device isa-debugcon,chardev=seabioslog_id_20190820-032540-OesJUJdj,iobase=0x402 \
    -device pcie-root-port,id=pcie.0-root-port-2,slot=2,chassis=2,addr=0x2,bus=pcie.0 \
    -device qemu-xhci,id=usb1,bus=pcie.0-root-port-2,addr=0x0 \
    -device pcie-root-port,id=pcie.0-root-port-3,slot=3,chassis=3,addr=0x3,bus=pcie.0 \
    -drive id=drive_image1,if=none,snapshot=off,cache=none,format=qcow2,file=/home/kvm_autotest_root/images/rhel810-64-virtio.qcow2 \
    -device virtio-scsi-pci,id=scsi0,bus=pcie.0-root-port-3,addr=0x0 \
    -device scsi-hd,id=image1,drive=drive_image1,bootindex=0,bus=scsi0.0 \
    -device pcie-root-port,id=pcie.0-root-port-6,slot=6,chassis=6,addr=0x6,bus=pcie.0 \
    -drive id=drive_data1,if=none,snapshot=off,cache=none,format=qcow2,file=/home/data.qcow2,werror=stop,rerror=stop \
    -device virtio-scsi-pci,id=scsi1,bus=pcie.0-root-port-6,addr=0x0 \
    -device scsi-hd,id=data1,drive=drive_data1,bus=scsi1.0 \
    -device pcie-root-port,id=pcie.0-root-port-7,slot=7,chassis=7,addr=0x7,bus=pcie.0 \
    -device pcie-root-port,id=pcie.0-root-port-4,slot=4,chassis=4,addr=0x4,bus=pcie.0 \
    -device virtio-net-pci,mac=9a:19:6a:3c:a6:a5,id=idq14C2Q,netdev=idHzG7Zk,bus=pcie.0-root-port-4,addr=0x0  \
    -netdev tap,id=idHzG7Zk,vhost=on \
    -m 2048  \
    -smp 4,maxcpus=4,cores=2,threads=1,sockets=2  \
    -cpu 'Skylake-Client',+kvm_pv_unhalt \
    -device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1  \
    -vnc :0  \
    -rtc base=utc,clock=host,driftfix=slew  \
    -boot order=cdn,once=c,menu=off,strict=off \
    -enable-kvm \

 2. In dst, create an empty disk and start guest with qemu cmds:
     #qemu-img create -f qcow2 data1.qcow2 2G
     /usr/libexec/qemu-kvm \
    -name 'avocado-vt-vm1' \
    -machine q35  \
    -nodefaults \
    -device VGA,bus=pcie.0,addr=0x1  \
    -chardev socket,id=qmp_id_qmpmonitor1,path=/var/tmp/monitor-qmpmonitor1-20190820-032540-OesJUJdjk,server,nowait \
    -mon chardev=qmp_id_qmpmonitor1,mode=control  \
    -chardev socket,id=qmp_id_catch_monitor,path=/var/tmp/monitor-catch_monitor-20190820-032540-OesJUJdj,server,nowait \
    -mon chardev=qmp_id_catch_monitor,mode=control \
    -device pvpanic,ioport=0x505,id=idbJPqrG \
    -chardev socket,id=chardev_serial0,server,path=/var/tmp/serial-serial0-20190820-032540-OesJUJdj,nowait \
    -device isa-serial,id=serial0,chardev=chardev_serial0  \
    -chardev socket,id=seabioslog_id_20190820-032540-OesJUJdj,path=/var/tmp/seabios-20190820-032540-OesJUJdj,server,nowait \
    -device isa-debugcon,chardev=seabioslog_id_20190820-032540-OesJUJdj,iobase=0x402 \
    -device pcie-root-port,id=pcie.0-root-port-2,slot=2,chassis=2,addr=0x2,bus=pcie.0 \
    -device qemu-xhci,id=usb1,bus=pcie.0-root-port-2,addr=0x0 \
    -device pcie-root-port,id=pcie.0-root-port-3,slot=3,chassis=3,addr=0x3,bus=pcie.0 \
    -drive id=drive_image1,if=none,snapshot=off,cache=none,format=qcow2,file=/home/kvm_autotest_root/images/rhel810-64-virtio.qcow2 \
    -device virtio-scsi-pci,id=scsi0,bus=pcie.0-root-port-3,addr=0x0 \
    -device scsi-hd,id=image1,drive=drive_image1,bootindex=0,bus=scsi0.0 \
    -device pcie-root-port,id=pcie.0-root-port-6,slot=6,chassis=6,addr=0x6,bus=pcie.0 \
    -drive id=drive_data1,if=none,snapshot=off,cache=none,format=qcow2,file=/home/data1.qcow2,werror=stop,rerror=stop \
    -device virtio-scsi-pci,id=scsi1,bus=pcie.0-root-port-6,addr=0x0 \
    -device scsi-hd,id=data1,drive=drive_data1,bus=scsi1.0 \
    -device pcie-root-port,id=pcie.0-root-port-7,slot=7,chassis=7,addr=0x7,bus=pcie.0 \
    -device pcie-root-port,id=pcie.0-root-port-4,slot=4,chassis=4,addr=0x4,bus=pcie.0 \
    -device virtio-net-pci,mac=9a:19:6a:3c:a6:a5,id=idq14C2Q,netdev=idHzG7Zk,bus=pcie.0-root-port-4,addr=0x0  \
    -netdev tap,id=idHzG7Zk,vhost=on \
    -m 2048  \
    -smp 4,maxcpus=4,cores=2,threads=1,sockets=2  \
    -cpu 'Skylake-Client',+kvm_pv_unhalt \
    -device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1  \
    -vnc :1  \
    -rtc base=utc,clock=host,driftfix=slew  \
    -boot order=cdn,once=c,menu=off,strict=off \
    -enable-kvm \
    -device pcie-root-port,id=pcie_extra_root_port_0,slot=5,chassis=5,addr=0x5,bus=pcie.0 \
    -monitor stdio \
    -device virtio-serial-pci,id=virtio-serial0,bus=pcie_extra_root_port_0,addr=0x0 \
    -chardev socket,path=/tmp/qga.sock,server,nowait,id=qga0 \
    -device virtserialport,bus=virtio-serial0.0,chardev=qga0,id=qemu-ga0,name=org.qemu.guest_agent.0 \
    -qmp tcp:0:3001,server,nowait \
    -incoming tcp:0:5000 \

 3. In dst, expose data disk
     { "execute": "nbd-server-start", "arguments": { "addr": { "type": "inet","data": { "host": "10.73.224.68", "port": "3333" } } } }
{"return": {}}
{ "execute": "nbd-server-add", "arguments": { "device": "drive_data1","writable": true}}

 4. In src, add bitmap to data disk.
    { "execute": "block-dirty-bitmap-add", "arguments": {"node": "drive_data1", "name":"bitmap0"}}

 5. dd a file in src guest.
    (guest)# dd if=/dev/urandom of=test bs=1M count=1000

 6. Set migration capability in both src and dst, and mirror from src to dst.
    {"execute":"migrate-set-capabilities","arguments":{"capabilities":[{"capability":"events","state":true},{"capability":"dirty-bitmaps","state":true}]}}
    { "execute": "drive-mirror", "arguments": { "device": "drive_data1","target": "nbd://10.73.224.68:3333/drive_data1", "sync": "full","format": "raw", "mode": "existing"}}
    {"timestamp": {"seconds": 1568273966, "microseconds": 432368}, "event": "JOB_STATUS_CHANGE", "data": {"status": "created", "id": "drive_data1"}}
{"timestamp": {"seconds": 1568273966, "microseconds": 433464}, "event": "JOB_STATUS_CHANGE", "data": {"status": "running", "id": "drive_data1"}}
{"return": {}}
{"timestamp": {"seconds": 1568273985, "microseconds": 485472}, "event": "JOB_STATUS_CHANGE", "data": {"status": "ready", "id": "drive_data1"}}
{"timestamp": {"seconds": 1568273985, "microseconds": 485524}, "event": "BLOCK_JOB_READY", "data": {"device": "drive_data1", "len": 2147483648, "offset": 2147483648, "speed": 0, "type": "mirror"}}

 7. Set migration capability pause-before-switchover to true.
    {"execute":"migrate-set-capabilities","arguments":{"capabilities":[{"capability":"pause-before-switchover","state":true}]}}

 8. Migrate from src to dst.
    {"execute": "migrate","arguments":{"uri": "tcp:10.73.224.68:5000"}}
{"timestamp": {"seconds": 1568274021, "microseconds": 606826}, "event": "MIGRATION", "data": {"status": "setup"}}
{"return": {}}
{"timestamp": {"seconds": 1568274021, "microseconds": 614289}, "event": "MIGRATION_PASS", "data": {"pass": 1}}
{"timestamp": {"seconds": 1568274021, "microseconds": 614352}, "event": "MIGRATION", "data": {"status": "active"}}
{"timestamp": {"seconds": 1568274021, "microseconds": 614972}, "event": "MIGRATION", "data": {"status": "failed"}}

 9. Check migration status on dst
     (qemu) qemu-kvm: Cannot find device=#block369 nor node_name=#block369
qemu-kvm: error while loading state for instance 0x0 of device 'dirty-bitmap'
qemu-kvm: load of migration failed: Invalid argument

 10. Quit vm in src
     (qemu)quit
 
 In step 9, bitmap migration failed.
 After step 10, the src VM quit successfully.

Comment 17 John Ferlan 2019-09-12 10:41:51 UTC
Based on the above, removing the "Depends On" since bz1748253 is related to an IOThread issue.

Adjusting the ITR to 8.1.1 as this is a backup/bitmap type issue which won't be used by libvirt until at least 8.1.1

Comment 18 aihua liang 2019-09-29 06:36:49 UTC
Tested on qemu-kvm-4.1.0-13.module+el8.1.0+4313+ef76ec61.x86_64 (a qemu-kvm version in which bz1748253 has been fixed); this bug still exists.

Comment 19 John Snow 2019-09-30 17:25:23 UTC
aliang, does this problem happen with -blockdev anymore? Is this now related to just -drive/-device?

--js

Comment 20 John Snow 2019-09-30 23:32:59 UTC
The problem is that the drive-mirror graph manipulation removes our ability to see the block-backend the bitmap was originally associated with.

When you use -drive and -device to create a qcow2 graph, you wind up with a structure like this:

[blockbackend "drive_data1"]
    | ^
    v |
  [bdrvchild role=child_root]
      | ^
      v |
    [node "#block191"]
        | ^
        v |
      [bdrvchild role=child_file]
          | ^
          v |
        [node "#block056"]


The top node represents the qcow2 file, and the bottom node represents the posix file.
When we run bitmap-add against "drive_data1", it gets stored on #block191.

When we migrate, we find node "#block191" because it has a bitmap attached. Usually, we use a function named bdrv_get_device_or_node_name on this node to get the name of the block-backend for migration purposes ("drive_data1"), but when we are running under a drive-mirror, the graph has been reorganized and we lose the ability to find this name. The function falls back to the node local name which does not exist on the destination.
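
In other words, the name selection boils down to something like the following sketch (not a verbatim copy of block.c, and the exact helper name for the parent lookup is an assumption); the fallback branch is what produces the autogenerated "#blockNNN" name:

```
/* Sketch of the lookup described above: prefer the name of the parent
 * device (block-backend); fall back to the node name only when no
 * device name is reachable.  Under drive-mirror the parent lookup comes
 * back empty, so migration sends an autogenerated "#blockNNN" name that
 * the destination does not have. */
const char *bdrv_get_device_or_node_name(const BlockDriverState *bs)
{
    const char *parent_name = bdrv_get_parent_name(bs);

    return (parent_name && parent_name[0]) ? parent_name : bs->node_name;
}
```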

Problem #1: We should never use autogenerated names during migration, because they might accidentally attach to the wrong node if there's a very unlucky collision!


Now, what does the graph look like when there's a migration like the one requested?

[blockbackend "drive_data1"]
    | ^
    v |
  [bdrvchild role=child_root name="root"]
      |
      v            ,------.
    [node "#block449"]     `-----------------------> [bdrvchild role=child_job name="main node"] --> job object
        | ^
        v |
      [bdrvchild role=child_backing name="backing"]
          |
          v            ,----.
        [node "#block191"]   `---------------------> [bdrvchild role=child_job name="source"] -----> job object
            | ^
            v |
          [bdrvchild role=child_file name="file"]
              | ^
              v |
            [node "#block056"]

Problem #2: drive-mirror has inserted a new filter node #block449 between the original root and the block-backend, so that name is no longer available from that position in the graph.

Problem #3: #block191 no longer has a parent link to the original block-backend; instead it has a parent link to the job.

Comment 21 John Snow 2019-10-01 03:59:00 UTC
I'm sorry in advance, but I made a graph because it helped me understand the situation a little better: https://i.imgur.com/HJtCjQK.png

This graph illustrates a single qcow2 file attached to a block-backend named "drive_image1" being mirrored to an anonymously named target.
The bitmaps would be attached to #block191 in this case, and you can see it's a bit of a far trek up the graph to find "drive_image1" from here.

Comment 22 John Snow 2019-10-03 23:19:33 UTC
Aliang, my current expectation is this:

- Using blockdev with a mirror migration should work correctly if the same node names are used on the destination.
- Using drive with a mirror migration will not work correctly.
- Using drive or blockdev without a mirror should work correctly.

If true, that means this BZ should be retitled from "blockdev" to "non-blockdev" -- this is technically a new bug from what this BZ started as.

Upstream post: https://lists.gnu.org/archive/html/qemu-devel/2019-09/msg07241.html
Upstream discussion: https://lists.gnu.org/archive/html/qemu-devel/2019-10/msg00002.html

Comment 23 aihua liang 2019-10-08 02:25:45 UTC
Tested with -blockdev on qemu-kvm-4.1.0-13.module+el8.1.0+4313+ef76ec61.x86_64; the issue is no longer hit.
  
Test steps:
 1.In src, start guest with qemu cmds:
     /usr/libexec/qemu-kvm \
    -name 'avocado-vt-vm1' \
    -machine q35  \
    -nodefaults \
    -device VGA,bus=pcie.0,addr=0x1  \
    -chardev socket,id=qmp_id_qmpmonitor1,path=/var/tmp/monitor-qmpmonitor1-20190820-032540-OesJUJdj,server,nowait \
    -mon chardev=qmp_id_qmpmonitor1,mode=control  \
    -chardev socket,id=qmp_id_catch_monitor,path=/var/tmp/monitor-catch_monitor-20190820-032540-OesJUJdj,server,nowait \
    -mon chardev=qmp_id_catch_monitor,mode=control \
    -device pvpanic,ioport=0x505,id=idbJPqrG \
    -chardev socket,id=chardev_serial0,server,path=/var/tmp/serial-serial0-20190820-032540-OesJUJdj,nowait \
    -device isa-serial,id=serial0,chardev=chardev_serial0  \
    -chardev socket,id=seabioslog_id_20190820-032540-OesJUJdj,path=/var/tmp/seabios-20190820-032540-OesJUJdj,server,nowait \
    -device isa-debugcon,chardev=seabioslog_id_20190820-032540-OesJUJdj,iobase=0x402 \
    -device pcie-root-port,id=pcie.0-root-port-2,slot=2,chassis=2,addr=0x2,bus=pcie.0 \
    -device qemu-xhci,id=usb1,bus=pcie.0-root-port-2,addr=0x0 \
    -device pcie-root-port,id=pcie.0-root-port-3,slot=3,chassis=3,addr=0x3,bus=pcie.0 \
    -blockdev driver=file,filename=/home/kvm_autotest_root/images/rhel810-64-virtio.qcow2,node-name=file_node \
    -blockdev driver=qcow2,node-name=drive_image1,file=file_node \
    -device virtio-scsi-pci,id=scsi0,bus=pcie.0-root-port-3,addr=0x0 \
    -device scsi-hd,id=image1,drive=drive_image1,bootindex=0,bus=scsi0.0 \
    -device pcie-root-port,id=pcie.0-root-port-6,slot=6,chassis=6,addr=0x6,bus=pcie.0 \
    -blockdev driver=file,filename=/home/data.qcow2,node-name=data_node \
    -blockdev driver=qcow2,node-name=drive_data1,file=data_node \
    -device virtio-scsi-pci,id=scsi1,bus=pcie.0-root-port-6,addr=0x0 \
    -device scsi-hd,id=data1,drive=drive_data1,bus=scsi1.0 \
    -device pcie-root-port,id=pcie.0-root-port-7,slot=7,chassis=7,addr=0x7,bus=pcie.0 \
    -device pcie-root-port,id=pcie.0-root-port-4,slot=4,chassis=4,addr=0x4,bus=pcie.0 \
    -device virtio-net-pci,mac=9a:19:6a:3c:a6:a5,id=idq14C2Q,netdev=idHzG7Zk,bus=pcie.0-root-port-4,addr=0x0  \
    -netdev tap,id=idHzG7Zk,vhost=on \
    -m 2048  \
    -smp 4,maxcpus=4,cores=2,threads=1,sockets=2  \
    -cpu 'Skylake-Client',+kvm_pv_unhalt \
    -device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1  \
    -vnc :0  \
    -rtc base=utc,clock=host,driftfix=slew  \
    -boot order=cdn,once=c,menu=off,strict=off \
    -enable-kvm \
    -monitor stdio \

 2. In dst, create an empty disk and start guest with qemu cmds:
     #qemu-img create -f qcow2 data1.qcow2 2G
     /usr/libexec/qemu-kvm \
    -name 'avocado-vt-vm1' \
    -machine q35  \
    -nodefaults \
    -device VGA,bus=pcie.0,addr=0x1  \
    -chardev socket,id=qmp_id_qmpmonitor1,path=/var/tmp/monitor-qmpmonitor1-20190820-032540-OesJUJdjk,server,nowait \
    -mon chardev=qmp_id_qmpmonitor1,mode=control  \
    -chardev socket,id=qmp_id_catch_monitor,path=/var/tmp/monitor-catch_monitor-20190820-032540-OesJUJdj,server,nowait \
    -mon chardev=qmp_id_catch_monitor,mode=control \
    -device pvpanic,ioport=0x505,id=idbJPqrG \
    -chardev socket,id=chardev_serial0,server,path=/var/tmp/serial-serial0-20190820-032540-OesJUJdj,nowait \
    -device isa-serial,id=serial0,chardev=chardev_serial0  \
    -chardev socket,id=seabioslog_id_20190820-032540-OesJUJdj,path=/var/tmp/seabios-20190820-032540-OesJUJdj,server,nowait \
    -device isa-debugcon,chardev=seabioslog_id_20190820-032540-OesJUJdj,iobase=0x402 \
    -device pcie-root-port,id=pcie.0-root-port-2,slot=2,chassis=2,addr=0x2,bus=pcie.0 \
    -device qemu-xhci,id=usb1,bus=pcie.0-root-port-2,addr=0x0 \
    -device pcie-root-port,id=pcie.0-root-port-3,slot=3,chassis=3,addr=0x3,bus=pcie.0 \
    -blockdev driver=file,filename=/home/kvm_autotest_root/images/rhel810-64-virtio.qcow2,node-name=file_node \
    -blockdev driver=qcow2,node-name=drive_image1,file=file_node \
    -device virtio-scsi-pci,id=scsi0,bus=pcie.0-root-port-3,addr=0x0 \
    -device scsi-hd,id=image1,drive=drive_image1,bootindex=0,bus=scsi0.0 \
    -device pcie-root-port,id=pcie.0-root-port-6,slot=6,chassis=6,addr=0x6,bus=pcie.0 \
    -blockdev driver=file,filename=/home/data1.qcow2,node-name=data_node \
    -blockdev driver=qcow2,node-name=drive_data1,file=data_node \
    -device virtio-scsi-pci,id=scsi1,bus=pcie.0-root-port-6,addr=0x0 \
    -device scsi-hd,id=data1,drive=drive_data1,bus=scsi1.0 \
    -device pcie-root-port,id=pcie.0-root-port-7,slot=7,chassis=7,addr=0x7,bus=pcie.0 \
    -device pcie-root-port,id=pcie.0-root-port-4,slot=4,chassis=4,addr=0x4,bus=pcie.0 \
    -device virtio-net-pci,mac=9a:19:6a:3c:a6:a5,id=idq14C2Q,netdev=idHzG7Zk,bus=pcie.0-root-port-4,addr=0x0  \
    -netdev tap,id=idHzG7Zk,vhost=on \
    -m 2048  \
    -smp 4,maxcpus=4,cores=2,threads=1,sockets=2  \
    -cpu 'Skylake-Client',+kvm_pv_unhalt \
    -device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1  \
    -vnc :1  \
    -rtc base=utc,clock=host,driftfix=slew  \
    -boot order=cdn,once=c,menu=off,strict=off \
    -enable-kvm \
    -device pcie-root-port,id=pcie_extra_root_port_0,slot=5,chassis=5,addr=0x5,bus=pcie.0 \
    -monitor stdio \
    -device virtio-serial-pci,id=virtio-serial0,bus=pcie_extra_root_port_0,addr=0x0 \
    -chardev socket,path=/tmp/qga.sock,server,nowait,id=qga0 \
    -device virtserialport,bus=virtio-serial0.0,chardev=qga0,id=qemu-ga0,name=org.qemu.guest_agent.0 \
    -qmp tcp:0:3001,server,nowait \
    -incoming tcp:0:5000 \

 3. In dst, expose data disk
     { "execute": "nbd-server-start", "arguments": { "addr": { "type": "inet","data": { "host": "10.73.224.68", "port": "3333" } } } }
{"return": {}}
{ "execute": "nbd-server-add", "arguments": { "device": "drive_data1","writable": true}}

 4. In src, add bitmap to data disk.
    { "execute": "block-dirty-bitmap-add", "arguments": {"node": "drive_data1", "name":"bitmap0"}}

 5. dd a file in src guest.
    (guest)# dd if=/dev/urandom of=test bs=1M count=1000

 6. Disable the bitmap and check its sha256
    { "execute": "block-dirty-bitmap-disable", "arguments": {"node": "drive_data1","name":"bitmap0"}}
{"return": {}}
{"execute": "x-debug-block-dirty-bitmap-sha256","arguments": {"node": "drive_data1","name":"bitmap0"}}
{"return": {"sha256": "e16364d58befa6b394f1400074058fb5558a245328b4c4c651f71b82023b429a"}}

 7. Set migration capability in both src and dst, and mirror from src to dst.
    {"execute":"migrate-set-capabilities","arguments":{"capabilities":[{"capability":"events","state":true},{"capability":"dirty-bitmaps","state":true}]}}
    {"execute":"blockdev-add","arguments":{"driver":"nbd","node-name":"mirror","server":{"type":"inet","host":"10.73.224.68","port":"3333"},"export":"drive_data1"}}
{"return": {}}
{"execute": "blockdev-mirror", "arguments": { "device": "drive_data1","target": "mirror", "sync":"full", "job-id":"j1"}}
{"timestamp": {"seconds": 1570500270, "microseconds": 287015}, "event": "JOB_STATUS_CHANGE", "data": {"status": "created", "id": "j1"}}
{"timestamp": {"seconds": 1570500270, "microseconds": 287051}, "event": "JOB_STATUS_CHANGE", "data": {"status": "running", "id": "j1"}}
{"return": {}}
{"timestamp": {"seconds": 1570500281, "microseconds": 881920}, "event": "JOB_STATUS_CHANGE", "data": {"status": "ready", "id": "j1"}}
{"timestamp": {"seconds": 1570500281, "microseconds": 881957}, "event": "BLOCK_JOB_READY", "data": {"device": "j1", "len": 2147483648, "offset": 2147483648, "speed": 0, "type": "mirror"}}

 8. Set migration capability pause-before-switchover to true and migrate from src to dst.
    {"execute":"migrate-set-capabilities","arguments":{"capabilities":[{"capability":"pause-before-switchover","state":true}]}}
    {"execute": "migrate","arguments":{"uri": "tcp:10.73.224.68:5000"}}
{"timestamp": {"seconds": 1570500292, "microseconds": 798923}, "event": "MIGRATION", "data": {"status": "setup"}}
{"return": {}}
{"timestamp": {"seconds": 1570500292, "microseconds": 806497}, "event": "MIGRATION_PASS", "data": {"pass": 1}}
{"timestamp": {"seconds": 1570500292, "microseconds": 806554}, "event": "MIGRATION", "data": {"status": "active"}}
{"timestamp": {"seconds": 1570500349, "microseconds": 185840}, "event": "MIGRATION_PASS", "data": {"pass": 2}}
{"timestamp": {"seconds": 1570500354, "microseconds": 400633}, "event": "MIGRATION_PASS", "data": {"pass": 3}}
{"timestamp": {"seconds": 1570500354, "microseconds": 701521}, "event": "MIGRATION_PASS", "data": {"pass": 4}}
{"timestamp": {"seconds": 1570500354, "microseconds": 714837}, "event": "STOP"}
{"timestamp": {"seconds": 1570500354, "microseconds": 716302}, "event": "MIGRATION", "data": {"status": "pre-switchover"}}

 9. Cancel block job and continue migration.
     {"execute":"block-job-cancel","arguments":{"device":"j1"}}
{"return": {}}
{"timestamp": {"seconds": 1570500361, "microseconds": 752025}, "event": "JOB_STATUS_CHANGE", "data": {"status": "waiting", "id": "j1"}}
{"timestamp": {"seconds": 1570500361, "microseconds": 752058}, "event": "JOB_STATUS_CHANGE", "data": {"status": "pending", "id": "j1"}}
{"timestamp": {"seconds": 1570500361, "microseconds": 752095}, "event": "BLOCK_JOB_COMPLETED", "data": {"device": "j1", "len": 2147483648, "offset": 2147483648, "speed": 0, "type": "mirror"}}
{"timestamp": {"seconds": 1570500361, "microseconds": 752167}, "event": "JOB_STATUS_CHANGE", "data": {"status": "concluded", "id": "j1"}}
{"timestamp": {"seconds": 1570500361, "microseconds": 752182}, "event": "JOB_STATUS_CHANGE", "data": {"status": "null", "id": "j1"}}
{"execute":"migrate-continue","arguments":{"state":"pre-switchover"}}
{"return": {}}
{"timestamp": {"seconds": 1570500371, "microseconds": 511780}, "event": "MIGRATION", "data": {"status": "device"}}
{"timestamp": {"seconds": 1570500371, "microseconds": 512582}, "event": "MIGRATION_PASS", "data": {"pass": 5}}
{"timestamp": {"seconds": 1570500371, "microseconds": 515433}, "event": "MIGRATION", "data": {"status": "completed"}}

10. Check vm status in both src and dst.
    (src qemu)info status
     VM status: paused (postmigrate)
 
    (dst qemu)info status
     VM status: running

11. Check bitmap sha256 in dst.
     {"execute": "x-debug-block-dirty-bitmap-sha256","arguments": {"node": "drive_data1","name":"bitmap0"}}
{"return": {"sha256": "e16364d58befa6b394f1400074058fb5558a245328b4c4c651f71b82023b429a"}}



According to John's suggestion in comment 22, closing this bug as CURRENTRELEASE; a new bug will be filed to track the "[-drive] migrate bitmap on non-shared storage failed" issue.

Thanks again to John for the analysis and suggestions.

BR,
aliang

Comment 24 Eric Blake 2020-05-27 14:24:11 UTC
Using -blockdev avoids the problem, but upstream now has a patch for older setups where libvirt is still using -drive:
https://lists.gnu.org/archive/html/qemu-devel/2020-05/msg07419.html
Basically, qemu 5.1 will now look through filter nodes (like the mirror job) to migrate by the original device name, rather than the broken approach of attempting to migrate by a generated node name that is not likely to exist on the other side.
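
Conceptually, the fix means bitmap migration climbs past filter nodes before choosing a name. A rough illustration of the idea (this is not the upstream patch, and the filter-walking helper is hypothetical, not an actual QEMU API):

```
/* Illustration only: skip upward past implicit filter nodes (such as
 * the one inserted by an active mirror job) before asking for a name,
 * so the bitmap is announced under its stable device name instead of
 * an autogenerated node name. */
static const char *bitmap_migration_alias(BlockDriverState *bs)
{
    BlockDriverState *top = bs;

    while (implicit_filter_parent(top)) {   /* hypothetical helper */
        top = implicit_filter_parent(top);
    }
    return bdrv_get_device_or_node_name(top);
}
```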

