Bug 1652490

Summary: [RHEL.7] Migrating bitmaps on non-root nodes fails
Product: Red Hat Enterprise Linux 7
Component: qemu-kvm-rhev
Version: 7.7
Reporter: Gu Nini <ngu>
Assignee: John Snow <jsnow>
QA Contact: aihua liang <aliang>
Status: CLOSED DEFERRED
Severity: high
Priority: high
Docs Contact:
CC: aliang, chayang, coli, eblake, jsnow, juzhang, qzhang, rbalakri, virt-maint
Target Milestone: rc
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: 1652424
Environment:
Last Closed: 2019-06-05 21:22:45 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1652424, 1655541
Bug Blocks:
Attachments:
  gdb_debug_info_rhel7.6-11222018 (flags: none)

Description Gu Nini 2018-11-22 09:04:58 UTC
Created attachment 1507893 [details]
gdb_debug_info_rhel7.6-11222018

+++ This bug was initially created as a clone of Bug #1652424 +++

Hit the bug on RHEL 7.7 (qemu-kvm-rhev-2.12.0-19.el7.x86_64), so cloning it as a separate bug.

Src guest HMP:
 # ./vm01.sh
QEMU 2.12.0 monitor - type 'help' for more information
(qemu) qemu-kvm: Can't migrate locked dirty bitmap: 'bitmap0'
(qemu) 
(qemu) q
qemu-kvm: block.c:3415: bdrv_close_all: Assertion `((&all_bdrv_states)->tqh_first == ((void *)0))' failed.
./vm01.sh: line 24: 13416 Aborted                 (core dumped) /usr/libexec/qemu-kvm -name 'avocado-vt-vm1' -machine pc -object secret,id=sec0,data=redhat -chardev socket,id=qmp_id_qmpmonitor1,path=/var/tmp/avocado1,server,nowait -mon chardev=qmp_id_qmpmonitor1,mode=control -device nec-usb-xhci,id=usb1,bus=pci.0,addr=0x4 -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x5 -chardev socket,path=/tmp/qga.sock,server,nowait,id=qga0 -device virtserialport,bus=virtio-serial0.0,chardev=qga0,id=qemu-ga0,name=org.qemu.guest_agent.0 -blockdev node-name=disk0,file.driver=file,driver=qcow2,file.filename=/home/rhel76.qcow2 -device virtio-scsi-pci,id=virtio_scsi_pci0,bus=pci.0,addr=0x6 -device scsi-hd,drive=disk0,id=image0,bootindex=0 -device virtio-net-pci,mac=9a:78:79:7a:7b:6a,id=id8e5D72,vectors=4,netdev=idrYUYaH,bus=pci.0,addr=0x3 -netdev tap,id=idrYUYaH,vhost=on -m 1024 -smp 2,maxcpus=2,cores=2,threads=1,sockets=1 -device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1 -vnc :20 -rtc base=localtime,clock=host,driftfix=slew -boot menu=off,strict=off,order=cdn,once=c -enable-kvm -monitor stdio
[root@hp-z800-03 home]#

Dst guest HMP:
# ./vm01.sh
QEMU 2.12.0 monitor - type 'help' for more information
(qemu) qemu-kvm: Unable to read node name string
qemu-kvm: error while loading state for instance 0x0 of device 'dirty-bitmap'
qemu-kvm: load of migration failed: Invalid argument



Description of problem:
When migrating a guest with a dirty bitmap based on shared storage, the migration fails with the following error:

src guest HMP:
# ./vm00.sh
QEMU 3.0.92 monitor - type 'help' for more information
(qemu) qemu-system-ppc64: Can't migrate a bitmap that is in use by another operation: 'bitmap0'

dst guest HMP:
# ./vm00-mig.sh 
QEMU 3.0.92 monitor - type 'help' for more information
(qemu) qemu-system-ppc64: Unable to read node name string
qemu-system-ppc64: error while loading state for instance 0x0 of device 'dirty-bitmap'
qemu-system-ppc64: load of migration failed: Invalid argument


Version-Release number of selected component (if applicable):
Host kernel: 4.18.0-32.el8.ppc64le (src)   4.18.0-40.el8.ppc64le (dst)
Qemu: v3.1.0-rc2-dirty


How reproducible:
100%

Steps to Reproduce:
1. On both the src and dst hosts, boot guests with only a system disk; the src guest uses a pre-installed image, while the dst guest uses a newly created image:

    -blockdev node-name=disk0,file.driver=file,driver=qcow2,file.filename=/home/rhel80-ppc64le-upstream.qcow2 \
    -device scsi-hd,drive=disk0,id=image0,bootindex=0 \

2. On both hosts, set migration capabilities over the QMP connections:

# nc -U /var/tmp/avocado1
{"QMP": {"version": {"qemu": {"micro": 92, "minor": 0, "major": 3}, "package": "v3.1.0-rc2-dirty"}, "capabilities": []}}
{"execute":"qmp_capabilities"}
{"return": {}}
{"execute":"migrate-set-capabilities","arguments":{"capabilities":[{"capability":"events","state":true},{"capability":"dirty-bitmaps","state":true},{"capability":"pause-before-switchover","state":true}]}}
{"return": {}}

3. On the dst host, start the NBD server and add the export:

{ "execute": "nbd-server-start", "arguments": { "addr": { "type": "inet","data": { "host": "10.0.1.44", "port": "3333" } } } }
{"return": {}}
{ "execute": "nbd-server-add", "arguments":{ "device": "disk0", "writable": true } }

4. On the src host, add dirty bitmap bitmap0:

{ "execute": "block-dirty-bitmap-add", "arguments": {"node": "disk0", "name": "bitmap0"}} 
{"return": {}}

5. On the src host, start the block mirror:

{"execute":"blockdev-add","arguments":{"driver":"nbd","node-name":"mirror","server":{"type":"inet","host":"10.0.1.44","port":"3333"},"export":"disk0"}}
{"return": {}}
{"execute": "blockdev-mirror", "arguments": { "device": "disk0","target": "mirror", "sync": "full", "job-id":"j1"}}
{"timestamp": {"seconds": 1542856270, "microseconds": 804881}, "event": "JOB_STATUS_CHANGE", "data": {"status": "created", "id": "j1"}}
{"timestamp": {"seconds": 1542856270, "microseconds": 804944}, "event": "JOB_STATUS_CHANGE", "data": {"status": "running", "id": "j1"}}
{"return": {}}
{"timestamp": {"seconds": 1542856314, "microseconds": 839507}, "event": "JOB_STATUS_CHANGE", "data": {"status": "ready", "id": "j1"}}
{"timestamp": {"seconds": 1542856314, "microseconds": 839567}, "event": "BLOCK_JOB_READY", "data": {"device": "j1", "len": 21474836480, "offset": 21474836480, "speed": 0, "type": "mirror"}}
{"execute": "migrate","arguments":{"uri": "tcp:10.0.1.44:5200"}}

6. On the src host, once the mirror job reaches the ready status, start the migration (a way to confirm readiness is sketched after this step's output):

{"execute": "migrate","arguments":{"uri": "tcp:10.0.1.44:5200"}}
{"timestamp": {"seconds": 1542856407, "microseconds": 47629}, "event": "MIGRATION", "data": {"status": "setup"}}
{"return": {}}
{"timestamp": {"seconds": 1542856407, "microseconds": 51688}, "event": "MIGRATION_PASS", "data": {"pass": 1}}
{"timestamp": {"seconds": 1542856407, "microseconds": 51772}, "event": "MIGRATION", "data": {"status": "active"}}
{"timestamp": {"seconds": 1542856407, "microseconds": 51805}, "event": "MIGRATION", "data": {"status": "failed"}}
{"timestamp": {"seconds": 1542856419, "microseconds": 571231}, "event": "BLOCK_JOB_ERROR", "data": {"device": "j1", "operation": "write", "action": "report"}}
{"timestamp": {"seconds": 1542856419, "microseconds": 571517}, "event": "JOB_STATUS_CHANGE", "data": {"status": "aborting", "id": "j1"}}
{"timestamp": {"seconds": 1542856419, "microseconds": 571607}, "event": "BLOCK_JOB_COMPLETED", "data": {"device": "j1", "len": 21475229696, "offset": 21475098624, "speed": 0, "type": "mirror", "error": "Input/output error"}}
{"timestamp": {"seconds": 1542856419, "microseconds": 571647}, "event": "JOB_STATUS_CHANGE", "data": {"status": "concluded", "id": "j1"}}
{"timestamp": {"seconds": 1542856419, "microseconds": 571679}, "event": "JOB_STATUS_CHANGE", "data": {"status": "null", "id": "j1"}}


Actual results:
In step 6, the migration fails. From the HMP output of the src guest, it appears the bitmap is treated as in use.

Expected results:
In step 6, the migration succeeds.


Additional info:
If we then quit the guest on the src side, QEMU core dumps as follows; please refer to attachment gdb_debug_info-11222018 for details:

# ./vm00.sh
QEMU 3.0.92 monitor - type 'help' for more information
(qemu) qemu-system-ppc64: Can't migrate a bitmap that is in use by another operation: 'bitmap0'
(qemu) q
qemu-system-ppc64: block.c:3526: bdrv_close_all: Assertion `QTAILQ_EMPTY(&all_bdrv_states)' failed.
./vm00.sh: line 23: 104628 Aborted                 (core dumped) /home/qemu/ppc64-softmmu/qemu-system-ppc64 -name 'avocado-vt-vm1' -machine pseries -object secret,id=sec0,data=redhat -chardev socket,id=qmp_id_qmpmonitor1,path=/var/tmp/avocado1,server,nowait -mon chardev=qmp_id_qmpmonitor1,mode=control -device nec-usb-xhci,id=usb1,bus=pci.0,addr=0x4 -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x5 -chardev socket,path=/tmp/qga.sock,server,nowait,id=qga0 -device virtserialport,bus=virtio-serial0.0,chardev=qga0,id=qemu-ga0,name=org.qemu.guest_agent.0 -device virtio-scsi-pci,id=virtio_scsi_pci0,bus=pci.0,addr=0x6 -blockdev node-name=disk0,file.driver=file,driver=qcow2,file.filename=/home/rhel80-ppc64le-upstream.qcow2 -device scsi-hd,drive=disk0,id=image0,bootindex=0 -device virtio-net-pci,mac=9a:78:79:7a:7b:6a,id=id8e5D72,vectors=4,netdev=idrYUYaH,bus=pci.0,addr=0x3 -netdev tap,id=idrYUYaH,vhost=on -m 1024 -smp 2,maxcpus=2,cores=2,threads=1,sockets=1 -device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1 -vnc :20 -rtc base=localtime,clock=host,driftfix=slew -boot menu=off,strict=off,order=cdn,once=c -enable-kvm -monitor stdio
[root@ibm-p8-rhevm-13 home]#

--- Additional comment from Gu Nini on 2018-11-22 02:39:13 EST ---

Also reproduced the bug on rhel7.6z:
Host kernel: 3.10.0-957.1.2.el7.x86_64
Qemu: qemu-kvm-rhev-2.12.0-18.el7_6.2.x86_64


It also turns out that the trigger is starting the guests with '-blockdev node-name=disk0...': if I use '-drive id=disk0' instead, with all other steps identical, the bug does not occur (a way to compare the two configurations is sketched below).
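
A hedged way to compare the two configurations is to inspect the block graph over QMP; query-block and query-named-block-nodes are standard commands (query-block replies carry a "dirty-bitmaps" array per device, and query-named-block-nodes shows which node names exist):

{"execute":"query-block"}
{"execute":"query-named-block-nodes"}

With '-blockdev', 'disk0' is an explicitly named format node; with '-drive id=disk0', 'disk0' names the drive (device) and the underlying node names are auto-generated.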

Comment 3 aihua liang 2019-02-15 08:42:50 UTC
Hi, John
  Tested on qemu-kvm-rhev-2.12.0-23.el7.x86_64 and hit this issue as well.

Test steps:
  1. In src, start QEMU with the following command:
     /usr/libexec/qemu-kvm \
    -S \
    -name 'avocado-vt-vm1' \
    -machine pc  \
    -nodefaults \
    -device VGA,bus=pci.0,addr=0x2  \
    -chardev socket,id=qmp_id_qmpmonitor1,path=/var/tmp/monitor-qmpmonitor1-20190123-032240-rOoB4cgD,server,nowait \
    -mon chardev=qmp_id_qmpmonitor1,mode=control  \
    -chardev socket,id=qmp_id_catch_monitor,path=/var/tmp/monitor-catch_monitor-20190123-032240-rOoB4cgD,server,nowait \
    -mon chardev=qmp_id_catch_monitor,mode=control \
    -device pvpanic,ioport=0x505,id=id8Ec4Bn  \
    -chardev socket,id=serial_id_serial0,path=/var/tmp/serial-serial0-20190123-032240-rOoB4cgD,server,nowait \
    -device isa-serial,chardev=serial_id_serial0  \
    -chardev socket,id=seabioslog_id_20190123-032240-rOoB4cgD,path=/var/tmp/seabios-20190123-032240-rOoB4cgD,server,nowait \
    -device isa-debugcon,chardev=seabioslog_id_20190123-032240-rOoB4cgD,iobase=0x402 \
    -device ich9-usb-ehci1,id=usb1,addr=0x1d.7,multifunction=on,bus=pci.0 \
    -device ich9-usb-uhci1,id=usb1.0,multifunction=on,masterbus=usb1.0,addr=0x1d.0,firstport=0,bus=pci.0 \
    -device ich9-usb-uhci2,id=usb1.1,multifunction=on,masterbus=usb1.0,addr=0x1d.2,firstport=2,bus=pci.0 \
    -device ich9-usb-uhci3,id=usb1.2,multifunction=on,masterbus=usb1.0,addr=0x1d.4,firstport=4,bus=pci.0 \
    -device virtio-scsi-pci,id=virtio_scsi_pci0,bus=pci.0,addr=0x3 \
    -blockdev driver=file,node-name=file_base,filename=/home/kvm_autotest_root/images/rhel80-64-virtio-scsi.qcow2,auto-read-only=on \
    -blockdev driver=qcow2,file=file_base,node-name=drive_image1,auto-read-only=on \
    -device scsi-hd,id=image1,drive=drive_image1 \
    -device virtio-net-pci,mac=9a:39:3a:3b:3c:3d,id=id1JNQsL,vectors=4,netdev=idVpZZ6A,bus=pci.0,addr=0x4  \
    -netdev tap,id=idVpZZ6A,vhost=on \
    -m 4096  \
    -smp 2,maxcpus=2,cores=1,threads=1,sockets=2  \
    -cpu 'Penryn',+kvm_pv_unhalt \
    -device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1  \
    -vnc :0  \
    -rtc base=utc,clock=host,driftfix=slew  \
    -boot order=cdn,once=c,menu=off,strict=off \
    -enable-kvm \
    -monitor stdio \
    -qmp tcp:0:3000,server,nowait \

 2. In src, set migrate capabilities:
     {"execute":"migrate-set-capabilities","arguments":{"capabilities":[{"capability":"events","state":true},{"capability":"dirty-bitmaps","state":true}]}}

 3. In dst, create the target image and start QEMU with it:
     #qemu-img create -f qcow2 rhel80-64-virtio-scsi.qcow2 20G
     /usr/libexec/qemu-kvm \
    -S \
    -name 'avocado-vt-vm1' \
    -machine pc  \
    -nodefaults \
    -device VGA,bus=pci.0,addr=0x2  \
    -chardev socket,id=qmp_id_qmpmonitor1,path=/var/tmp/monitor-qmpmonitor1-20190123-032240-rOoB4cgD,server,nowait \
    -mon chardev=qmp_id_qmpmonitor1,mode=control  \
    -chardev socket,id=qmp_id_catch_monitor,path=/var/tmp/monitor-catch_monitor-20190123-032240-rOoB4cgD,server,nowait \
    -mon chardev=qmp_id_catch_monitor,mode=control \
    -device pvpanic,ioport=0x505,id=id8Ec4Bn  \
    -chardev socket,id=serial_id_serial0,path=/var/tmp/serial-serial0-20190123-032240-rOoB4cgD,server,nowait \
    -device isa-serial,chardev=serial_id_serial0  \
    -chardev socket,id=seabioslog_id_20190123-032240-rOoB4cgD,path=/var/tmp/seabios-20190123-032240-rOoB4cgD,server,nowait \
    -device isa-debugcon,chardev=seabioslog_id_20190123-032240-rOoB4cgD,iobase=0x402 \
    -device ich9-usb-ehci1,id=usb1,addr=0x1d.7,multifunction=on,bus=pci.0 \
    -device ich9-usb-uhci1,id=usb1.0,multifunction=on,masterbus=usb1.0,addr=0x1d.0,firstport=0,bus=pci.0 \
    -device ich9-usb-uhci2,id=usb1.1,multifunction=on,masterbus=usb1.0,addr=0x1d.2,firstport=2,bus=pci.0 \
    -device ich9-usb-uhci3,id=usb1.2,multifunction=on,masterbus=usb1.0,addr=0x1d.4,firstport=4,bus=pci.0 \
    -device virtio-scsi-pci,id=virtio_scsi_pci0,bus=pci.0,addr=0x3 \
    -blockdev driver=file,node-name=file_base,filename=/home/kvm_autotest_root/images/rhel80-64-virtio-scsi.qcow2,auto-read-only=on \
    -blockdev driver=qcow2,file=file_base,node-name=drive_image1,auto-read-only=on \
    -device scsi-hd,id=image1,drive=drive_image1 \
    -device virtio-net-pci,mac=9a:39:3a:3b:3c:3d,id=id1JNQsL,vectors=4,netdev=idVpZZ6A,bus=pci.0,addr=0x4  \
    -netdev tap,id=idVpZZ6A,vhost=on \
    -m 4096  \
    -smp 2,maxcpus=2,cores=1,threads=1,sockets=2  \
    -cpu 'Penryn',+kvm_pv_unhalt \
    -device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1  \
    -vnc :0  \
    -rtc base=utc,clock=host,driftfix=slew  \
    -boot order=cdn,once=c,menu=off,strict=off \
    -enable-kvm \
    -monitor stdio \
    -qmp tcp:0:3000,server,nowait \
    -incoming tcp:0:5000

4. In dst, set the migration capabilities and expose the target image:
    {"execute":"migrate-set-capabilities","arguments":{"capabilities":[{"capability":"events","state":true},{"capability":"dirty-bitmaps","state":true}]}}
    {"execute":"nbd-server-start","arguments":{"addr":{"type":"inet","data":{"host":"10.66.144.33","port":"10809"}}}}
{"return": {}}
{ "execute": "nbd-server-add", "arguments":{ "device": "drive_image1", "writable": true } }
{"return": {}}

5. In src, add the target image and start the block mirror:
    {"execute":"blockdev-add","arguments":{"driver":"nbd","node-name":"mirror","server":{"type":"inet","host":"10.66.144.33","port":"10809"},"export":"drive_image1"}}
{"return": {}}
{"execute": "blockdev-mirror", "arguments": { "device": "drive_image1","target": "mirror", "sync": "full", "job-id":"j1"}}
{"timestamp": {"seconds": 1550218788, "microseconds": 368165}, "event": "JOB_STATUS_CHANGE", "data": {"status": "created", "id": "j1"}}
{"timestamp": {"seconds": 1550218788, "microseconds": 368271}, "event": "JOB_STATUS_CHANGE", "data": {"status": "running", "id": "j1"}}
{"return": {}}
{"timestamp": {"seconds": 1550218856, "microseconds": 90211}, "event": "JOB_STATUS_CHANGE", "data": {"status": "ready", "id": "j1"}}
{"timestamp": {"seconds": 1550218856, "microseconds": 90299}, "event": "BLOCK_JOB_READY", "data": {"device": "j1", "len": 21474836480, "offset": 21474836480, "speed": 0, "type": "mirror"}}

6. In src, after the mirror job reaches the ready status, set the migration capability "pause-before-switchover" (the expected continuation is sketched after this step).
   {"execute":"migrate-set-capabilities","arguments":{"capabilities":[{"capability":"pause-before-switchover","state":true}]}}

7. Migrate from src to dst.
   {"execute": "migrate","arguments":{"uri": "tcp:10.66.144.33:5000"}}
{"timestamp": {"seconds": 1550218970, "microseconds": 456013}, "event": "MIGRATION", "data": {"status": "setup"}}
{"return": {}}
{"timestamp": {"seconds": 1550218970, "microseconds": 508258}, "event": "MIGRATION_PASS", "data": {"pass": 1}}
{"timestamp": {"seconds": 1550218970, "microseconds": 508414}, "event": "MIGRATION", "data": {"status": "active"}}
{"timestamp": {"seconds": 1550218970, "microseconds": 508460}, "event": "MIGRATION", "data": {"status": "failed"}}

  (qemu) qemu-kvm: Can't migrate a bitmap that is in use by another operation: 'bitmap0'

8. In src, check the migration info:
{"execute":"query-migrate"}
{"return": {"status": "failed"}}

9. In dst, check whether QEMU is still running.
  QEMU quit automatically with:
  (qemu) qemu-kvm: Unable to read node name string
qemu-kvm: error while loading state for instance 0x0 of device 'dirty-bitmap'
qemu-kvm: load of migration failed: Invalid argument

10. Quit the VM in src:
  (qemu) quit
  QEMU core dumps with:
    qemu-kvm: block.c:3449: bdrv_close_all: Assertion `((&all_bdrv_states)->tqh_first == ((void *)0))' failed.
bug.txt: line 36:  6667 Aborted                 (core dumped) /usr/libexec/qemu-kvm -S -name 'avocado-vt-vm1' -machine pc -nodefaults -device VGA,bus=pci.0,addr=0x2 -chardev socket,id=qmp_id_qmpmonitor1,path=/var/tmp/monitor-qmpmonitor1-20190123-032240-rOoB4cgD,server,nowait -mon chardev=qmp_id_qmpmonitor1,mode=control ....

Comment 7 aihua liang 2019-05-08 14:19:59 UTC
Tested on qemu-kvm-rhev-2.12.0-27.el7.x86_64 with -drive and did not hit this issue; migration with a bitmap over non-shared storage works normally.
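
For completeness, after a successful migration the bitmap's arrival can be checked on the dst side; query-block is a standard QMP command, and the reply entry for the disk should list bitmap0 in its "dirty-bitmaps" array:

{"execute":"query-block"}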