Bug 1728148 - Qemu hang during migration after block mirror finished with virtio_blk and dataplane enabled
Summary: Qemu hang during migration after block mirror finished with virtio_blk and dataplane enabled
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux Advanced Virtualization
Classification: Red Hat
Component: qemu-kvm
Version: 8.0
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: rc
Target Release: 8.0
Assignee: Hanna Czenczek
QA Contact: aihua liang
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2019-07-09 07:30 UTC by aihua liang
Modified: 2019-11-06 07:17 UTC
CC: 10 users

Fixed In Version: qemu-kvm-4.1.0-1.module+el8.1.0+3966+4a23dca1
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-11-06 07:17:18 UTC
Type: Bug
Target Upstream Version:
Embargoed:




Links:
  Red Hat Product Errata RHBA-2019:3723 (last updated 2019-11-06 07:17:38 UTC)

Description aihua liang 2019-07-09 07:30:34 UTC
Description of problem:
 QEMU hangs during migration after the block mirror has finished, with virtio_blk and dataplane enabled.

Version-Release number of selected component (if applicable):
 qemu-kvm version: qemu-kvm-3.1.0-28.module+el8.0.1+3556+b59953c6.x86_64

How reproducible:
 100%

Steps to Reproduce:
1. Create an empty image and start the dst guest with a virtio_blk device and dataplane enabled (see the qemu-img sketch after the command line below).
   /usr/libexec/qemu-kvm \
    -name 'avocado-vt-vm1' \
    -machine pc  \
    -nodefaults \
    -device VGA,bus=pci.0,addr=0x2  \
    -chardev socket,id=qmp_id_qmpmonitor1,path=/var/tmp/monitor-qmpmonitor1-20190602-221944-MrlxVzib,server,nowait \
    -mon chardev=qmp_id_qmpmonitor1,mode=control  \
    -chardev socket,id=qmp_id_catch_monitor,path=/var/tmp/monitor-catch_monitor-20190602-221944-MrlxVzia,server,nowait \
    -mon chardev=qmp_id_catch_monitor,mode=control \
    -device pvpanic,ioport=0x505,id=idn20piu  \
    -chardev socket,id=serial_id_serial0,path=/var/tmp/serial-serial0-20190602-221944-MrlxVzia,server,nowait \
    -device isa-serial,chardev=serial_id_serial0  \
    -chardev socket,id=seabioslog_id_20190602-221944-MrlxVzia,path=/var/tmp/seabios-20190602-221944-MrlxVzia,server,nowait \
    -device isa-debugcon,chardev=seabioslog_id_20190602-221944-MrlxVzia,iobase=0x402 \
    -device qemu-xhci,id=usb1,bus=pci.0,addr=0x3 \
    -object iothread,id=iothread0 \
    -drive id=drive_image1,if=none,snapshot=off,aio=threads,cache=none,file=/home/kvm_autotest_root/images/mirror.qcow2 \
    -device virtio-blk-pci,id=image1,drive=drive_image1,bus=pci.0,addr=0x4,iothread=iothread0 \
    -device virtio-net-pci,mac=9a:33:34:35:36:37,id=idj01pFr,vectors=4,netdev=idMgbx8B,bus=pci.0,addr=0x5  \
    -netdev tap,id=idMgbx8B,vhost=on \
    -m 4096  \
    -smp 4,maxcpus=4,cores=2,threads=1,sockets=2  \
    -cpu 'Skylake-Client',+kvm_pv_unhalt \
    -device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1  \
    -vnc :1  \
    -rtc base=utc,clock=host,driftfix=slew  \
    -boot order=cdn,once=c,menu=off,strict=off \
    -enable-kvm \
    -monitor stdio \
    -qmp tcp:0:3001,server,nowait \
    -incoming tcp:0:5000 
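
The empty target image referenced above can be created with qemu-img before starting the destination guest. A minimal sketch, assuming a 20G qcow2 target (the size is an assumption; it must be at least the virtual size of the source image):

    # Size chosen for illustration; match or exceed the virtual size of rhel801-64-virtio.qcow2
    qemu-img create -f qcow2 /home/kvm_autotest_root/images/mirror.qcow2 20G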

2. Start the src guest with the following qemu command (a sketch of connecting to the QMP sockets follows below):
   /usr/libexec/qemu-kvm \
    -name 'avocado-vt-vm1' \
    -machine pc  \
    -nodefaults \
    -device VGA,bus=pci.0,addr=0x2  \
    -chardev socket,id=qmp_id_qmpmonitor1,path=/var/tmp/monitor-qmpmonitor1-20190602-221944-MrlxVzia,server,nowait \
    -mon chardev=qmp_id_qmpmonitor1,mode=control  \
    -chardev socket,id=qmp_id_catch_monitor,path=/var/tmp/monitor-catch_monitor-20190602-221944-MrlxVzia,server,nowait \
    -mon chardev=qmp_id_catch_monitor,mode=control \
    -device pvpanic,ioport=0x505,id=idn20piu  \
    -chardev socket,id=serial_id_serial0,path=/var/tmp/serial-serial0-20190602-221944-MrlxVzia,server,nowait \
    -device isa-serial,chardev=serial_id_serial0  \
    -chardev socket,id=seabioslog_id_20190602-221944-MrlxVzia,path=/var/tmp/seabios-20190602-221944-MrlxVzia,server,nowait \
    -device isa-debugcon,chardev=seabioslog_id_20190602-221944-MrlxVzia,iobase=0x402 \
    -device qemu-xhci,id=usb1,bus=pci.0,addr=0x3 \
    -object iothread,id=iothread0 \
    -drive id=drive_image1,if=none,snapshot=off,aio=threads,cache=none,file=/home/kvm_autotest_root/images/rhel801-64-virtio.qcow2 \
    -device virtio-blk-pci,id=image1,drive=drive_image1,bus=pci.0,addr=0x4,iothread=iothread0 \
    -device virtio-net-pci,mac=9a:33:34:35:36:37,id=idj01pFr,vectors=4,netdev=idMgbx8B,bus=pci.0,addr=0x5  \
    -netdev tap,id=idMgbx8B,vhost=on \
    -m 4096  \
    -smp 4,maxcpus=4,cores=2,threads=1,sockets=2  \
    -cpu 'Skylake-Client',+kvm_pv_unhalt \
    -device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1  \
    -vnc :0  \
    -rtc base=utc,clock=host,driftfix=slew  \
    -boot order=cdn,once=c,menu=off,strict=off \
    -enable-kvm \
    -monitor stdio \
    -qmp tcp:0:3000,server,nowait \
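
The QMP commands in the following steps are issued over the TCP QMP sockets configured above (port 3000 on the source, port 3001 on the destination). A minimal sketch of opening such a session, assuming nc is available on the host; qemu first prints a greeting, after which qmp_capabilities must be issued to leave capabilities-negotiation mode:

    nc localhost 3000
    {"execute": "qmp_capabilities"}
    {"return": {}}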

3. Mirror the image from src to dst:
   { "execute": "drive-mirror", "arguments": { "device": "drive_image1","target":"/home/kvm_autotest_root/images/mirror.qcow2","sync":"full","mode":"existing"}}

4. Set migration capabilities
   {"execute":"migrate-set-capabilities","arguments":{"capabilities":[{"capability":"pause-before-switchover","state":true}]}}

5. Migrate from src to dst
   { "execute": "migrate", "arguments": { "uri": "tcp:localhost:5000"}}

Actual results:
After step 5, the guest hangs.
gdb -p 11891
(gdb) bt
#0  0x00007f127949689d in __lll_lock_wait () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:103
#1  0x00007f127948fb59 in __GI___pthread_mutex_lock
    (mutex=mutex@entry=0x55f0a105c500 <qemu_global_mutex>) at ../nptl/pthread_mutex_lock.c:80
#2  0x000055f0a08291cd in qemu_mutex_lock_impl
    (mutex=0x55f0a105c500 <qemu_global_mutex>, file=0x55f0a09b22f4 "util/main-loop.c", line=236)
    at util/qemu-thread-posix.c:66
#3  0x000055f0a051b47e in qemu_mutex_lock_iothread_impl
    (file=file@entry=0x55f0a09b22f4 "util/main-loop.c", line=line@entry=236)
    at /usr/src/debug/qemu-kvm-3.1.0-28.module+el8.0.1+3556+b59953c6.x86_64/cpus.c:1849
#4  0x000055f0a082605d in os_host_main_loop_wait (timeout=<optimized out>) at util/main-loop.c:236
#5  0x000055f0a082605d in main_loop_wait (nonblocking=<optimized out>) at util/main-loop.c:497
#6  0x000055f0a0615529 in main_loop () at vl.c:1910
#7  0x000055f0a04d592f in main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>)
    at vl.c:4681
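
The backtrace above covers only the main thread, which is blocked trying to take qemu_global_mutex (the BQL). To see which thread is holding the lock (likely an IOThread, given that dataplane is enabled), a per-thread backtrace can be collected; a sketch, assuming the same PID:

    gdb -p 11891
    (gdb) set pagination off
    (gdb) thread apply all bt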

Expected results:
   Storage VM migration completes successfully with virtio_blk and dataplane enabled.

Additional info:
  The same bug exists on RHEL 7: bz#1665256

Comment 1 aihua liang 2019-07-09 07:44:49 UTC
Tested on qemu-kvm-4.0.0-4.module+el8.1.0+3523+b348b848.x86_64; the issue is not reproduced there.

Comment 3 Hanna Czenczek 2019-07-27 15:46:11 UTC
Hi,

It looks like this was fixed by 5e771752a1ffba3a99d7d75b6d492b4a86b59e1b.  Now I don’t know what we should do about this BZ.  Said commit is included in 4.0.0 (as you say in comment 1), so it will be fixed in 8.1.0.  Should we fix it in z-stream for 8.0.1?  (It looks like it would be a trivial backport.)

Max

Comment 4 Ademar Reis 2019-08-13 16:21:48 UTC
(In reply to Max Reitz from comment #3)
> Hi,
> 
> It looks like this was fixed by 5e771752a1ffba3a99d7d75b6d492b4a86b59e1b. 
> Now I don’t know what we should do about this BZ.  Said commit is included
> in 4.0.0 (as you say in comment 1), so it will be fixed in 8.1.0.  Should we
> fix it in z-stream for 8.0.1?  (It looks like it would be a trivial
> backport.)

It will be fixed in 8.1 Advanced Virt (RHEL-AV, which rebases qemu and libvirt), but this BZ was opened for RHEL, which is still stuck with qemu-2.12 and libvirt-4.5.

But I see the test was done with qemu from RHEL-AV, so the product is wrong.

Aihua: please retest with qemu from RHEL (not RHEL-AV).

Comment 6 aihua liang 2019-08-14 02:23:18 UTC
(In reply to Ademar Reis from comment #4)
> (In reply to Max Reitz from comment #3)
> > Hi,
> > 
> > It looks like this was fixed by 5e771752a1ffba3a99d7d75b6d492b4a86b59e1b. 
> > Now I don’t know what we should do about this BZ.  Said commit is included
> > in 4.0.0 (as you say in comment 1), so it will be fixed in 8.1.0.  Should we
> > fix it in z-stream for 8.0.1?  (It looks like it would be a trivial
> > backport.)
> 
> It will be fixed in 8.1 Advanced Virt (RHEL-AV, which rebases qemu and
> libvirt) and this BZ was open for RHEL, which is still stuck with qemu-2.12
> and libvirt-4.5.
> 
> But I see the test was done with qemu from RHEL-AV, so the product is wrong.
> 
> Aihua: please retest with qemu from RHEL (not RHEL-AV).

Hi, Ademar

   Some corrections:

       The feature is blacklisted on RHEL, so I only ran the test on RHEL-AV.

       I found the bug on 8.0.1-av, and it works OK on 8.1.0-av, as I noted in comment 1.

   I have changed the product to "Red Hat Enterprise Linux Advanced Virtualization".
   Sorry for the confusion.

aliang

Comment 8 aihua liang 2019-08-16 08:27:57 UTC
Verified on qemu-kvm-4.1.0-1.module+el8.1.0+3966+4a23dca1.x86_64; the problem has been resolved, so the bug's status is set to "Verified".

Test Steps:
  1. Create an empty image and start the dst guest with a virtio_blk device and dataplane enabled.
   /usr/libexec/qemu-kvm \
    -name 'avocado-vt-vm1' \
    -machine pc  \
    -nodefaults \
    -device VGA,bus=pci.0,addr=0x2  \
    -chardev socket,id=qmp_id_qmpmonitor1,path=/var/tmp/monitor-qmpmonitor1-20190602-221944-MrlxVzib,server,nowait \
    -mon chardev=qmp_id_qmpmonitor1,mode=control  \
    -chardev socket,id=qmp_id_catch_monitor,path=/var/tmp/monitor-catch_monitor-20190602-221944-MrlxVzia,server,nowait \
    -mon chardev=qmp_id_catch_monitor,mode=control \
    -device pvpanic,ioport=0x505,id=idn20piu  \
    -chardev socket,id=serial_id_serial0,path=/var/tmp/serial-serial0-20190602-221944-MrlxVzia,server,nowait \
    -device isa-serial,chardev=serial_id_serial0  \
    -chardev socket,id=seabioslog_id_20190602-221944-MrlxVzia,path=/var/tmp/seabios-20190602-221944-MrlxVzia,server,nowait \
    -device isa-debugcon,chardev=seabioslog_id_20190602-221944-MrlxVzia,iobase=0x402 \
    -device qemu-xhci,id=usb1,bus=pci.0,addr=0x3 \
    -object iothread,id=iothread0 \
    -drive id=drive_image1,if=none,snapshot=off,aio=threads,cache=none,file=/home/kvm_autotest_root/images/mirror.qcow2 \
    -device virtio-blk-pci,id=image1,drive=drive_image1,bus=pci.0,addr=0x4,iothread=iothread0 \
    -device virtio-net-pci,mac=9a:33:34:35:36:37,id=idj01pFr,vectors=4,netdev=idMgbx8B,bus=pci.0,addr=0x5  \
    -netdev tap,id=idMgbx8B,vhost=on \
    -m 4096  \
    -smp 4,maxcpus=4,cores=2,threads=1,sockets=2  \
    -cpu 'Skylake-Client',+kvm_pv_unhalt \
    -device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1  \
    -vnc :1  \
    -rtc base=utc,clock=host,driftfix=slew  \
    -boot order=cdn,once=c,menu=off,strict=off \
    -enable-kvm \
    -monitor stdio \
    -qmp tcp:0:3001,server,nowait \
    -incoming tcp:0:5000 

2. Start the src guest with the following qemu command:
   /usr/libexec/qemu-kvm \
    -name 'avocado-vt-vm1' \
    -machine pc  \
    -nodefaults \
    -device VGA,bus=pci.0,addr=0x2  \
    -chardev socket,id=qmp_id_qmpmonitor1,path=/var/tmp/monitor-qmpmonitor1-20190602-221944-MrlxVzia,server,nowait \
    -mon chardev=qmp_id_qmpmonitor1,mode=control  \
    -chardev socket,id=qmp_id_catch_monitor,path=/var/tmp/monitor-catch_monitor-20190602-221944-MrlxVzia,server,nowait \
    -mon chardev=qmp_id_catch_monitor,mode=control \
    -device pvpanic,ioport=0x505,id=idn20piu  \
    -chardev socket,id=serial_id_serial0,path=/var/tmp/serial-serial0-20190602-221944-MrlxVzia,server,nowait \
    -device isa-serial,chardev=serial_id_serial0  \
    -chardev socket,id=seabioslog_id_20190602-221944-MrlxVzia,path=/var/tmp/seabios-20190602-221944-MrlxVzia,server,nowait \
    -device isa-debugcon,chardev=seabioslog_id_20190602-221944-MrlxVzia,iobase=0x402 \
    -device qemu-xhci,id=usb1,bus=pci.0,addr=0x3 \
    -object iothread,id=iothread0 \
    -drive id=drive_image1,if=none,snapshot=off,aio=threads,cache=none,file=/home/kvm_autotest_root/images/rhel801-64-virtio.qcow2 \
    -device virtio-blk-pci,id=image1,drive=drive_image1,bus=pci.0,addr=0x4,iothread=iothread0 \
    -device virtio-net-pci,mac=9a:33:34:35:36:37,id=idj01pFr,vectors=4,netdev=idMgbx8B,bus=pci.0,addr=0x5  \
    -netdev tap,id=idMgbx8B,vhost=on \
    -m 4096  \
    -smp 4,maxcpus=4,cores=2,threads=1,sockets=2  \
    -cpu 'Skylake-Client',+kvm_pv_unhalt \
    -device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1  \
    -vnc :0  \
    -rtc base=utc,clock=host,driftfix=slew  \
    -boot order=cdn,once=c,menu=off,strict=off \
    -enable-kvm \
    -monitor stdio \
    -qmp tcp:0:3000,server,nowait \

3. Mirror the image from src to dst:
   { "execute": "drive-mirror", "arguments": { "device": "drive_image1","target":"/home/kvm_autotest_root/images/mirror.qcow2","sync":"full","mode":"existing"}}

4. Set migration capabilities
   {"execute":"migrate-set-capabilities","arguments":{"capabilities":[{"capability":"pause-before-switchover","state":true}]}}

5. Migrate from src to dst
   { "execute": "migrate", "arguments": { "uri": "tcp:localhost:5000"}}
   {"timestamp": {"seconds": 1565943473, "microseconds": 568457}, "event": "STOP"}

6. Cancel block job:
   {"execute":"block-job-cancel","arguments":{"device":"drive_image1"}}
{"return": {}}
{"timestamp": {"seconds": 1565943683, "microseconds": 388539}, "event": "JOB_STATUS_CHANGE", "data": {"status": "waiting", "id": "drive_image1"}}
{"timestamp": {"seconds": 1565943683, "microseconds": 388572}, "event": "JOB_STATUS_CHANGE", "data": {"status": "pending", "id": "drive_image1"}}
{"timestamp": {"seconds": 1565943683, "microseconds": 388872}, "event": "BLOCK_JOB_COMPLETED", "data": {"device": "drive_image1", "len": 4905172992, "offset": 4905172992, "speed": 0, "type": "mirror"}}
{"timestamp": {"seconds": 1565943683, "microseconds": 388901}, "event": "JOB_STATUS_CHANGE", "data": {"status": "concluded", "id": "drive_image1"}}
{"timestamp": {"seconds": 1565943683, "microseconds": 388936}, "event": "JOB_STATUS_CHANGE", "data": {"status": "null", "id": "drive_image1"}}

7. Continue the migration:
   {"execute":"migrate-continue","arguments":{"state":"pre-switchover"}}

8. In src, check vm status
  (qemu) info status
VM status: paused (postmigrate)

9. In dst, check vm status
  (qemu) info status 
VM status: running

10. In dst, reset vm
  (qemu) system_reset

After step 10, the VM restarts successfully.

Comment 11 errata-xmlrpc 2019-11-06 07:17:18 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:3723

