Bug 1633536

Summary: QEMU core dump when doing migration after hot plugging a backend image with 'blockdev-add' (without the frontend)
Product: Red Hat Enterprise Linux 7
Reporter: Gu Nini <ngu>
Component: qemu-kvm-rhev
Assignee: Kevin Wolf <kwolf>
Status: CLOSED ERRATA
QA Contact: Virtualization Bugs <virt-bugs>
Severity: high
Docs Contact:
Priority: high
Version: 7.6
CC: chayang, coli, juzhang, mrezanin, mtessun, qzhang, virt-maint, xianwang, xuwei, yuhuang
Target Milestone: rc
Keywords: ZStream
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: qemu-kvm-rhev-2.12.0-20.el7
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
: 1654963 (view as bug list)
Environment:
Last Closed: 2019-08-22 09:19:58 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1649160, 1651787, 1652906, 1654963

Description Gu Nini 2018-09-27 09:10:40 UTC
Description of problem:
After hot plugging a backend image with 'blockdev-add', migrate the guest from src to dst on shared storage (I have tried NFS and RBD); QEMU core dumps on the src side when the migration is nearly finished:

# Core dump info on the src side:
(qemu) qemu-kvm: block.c:855: bdrv_child_cb_inactivate: Assertion `bs->open_flags & 0x0800' failed.
./vm-mig1.sh: line 28: 30951 Aborted                 (core dumped) /usr/libexec/qemu-kvm

# Error info on the dst side:
(qemu) qemu-kvm: Failed to load virtio_pci/modern_queue_state:avail
qemu-kvm: Failed to load virtio_pci/modern_state:vqs
qemu-kvm: Failed to load virtio/extra_state:extra_state
qemu-kvm: Failed to load virtio-net:virtio
qemu-kvm: load of migration failed: Input/output error
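
For reference, the src-side assertion comes from the callback that bdrv_inactivate_recurse() invokes on a node's parents during migration completion. Below is a minimal sketch of that check, based on the QEMU 2.12 sources (block.c and include/block/block.h); the 0x0800 in the message is assumed to correspond to BDRV_O_INACTIVE:

/* Sketch based on QEMU 2.12 block.c; comments added here for illustration. */
static void bdrv_child_cb_inactivate(BdrvChild *child)
{
    BlockDriverState *bs = child->opaque;  /* the parent node of this child edge */

    /* Fails if the parent node has not yet been marked inactive when one of
     * its children is being inactivated (BDRV_O_INACTIVE assumed == 0x0800). */
    assert(bs->open_flags & BDRV_O_INACTIVE);
}

The backtrace suggests that one of the hot-plugged, monitor-owned nodes is inactivated before its parent node ('fbk' on top of 'fullbackup' below), so the parent does not yet carry BDRV_O_INACTIVE and the assertion fires.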


Version-Release number of selected component (if applicable):
qemu-kvm-rhev-2.12.0-18.el7.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Boot up both the src and dst guests; the images are on shared storage:
/usr/libexec/qemu-kvm \
    -name 'avocado-vt-vm1'  \
    -sandbox off  \
    -machine pc  \
    -nodefaults  \
    -vga std  \
    -chardev socket,id=qmp_id_qmpmonitor1,path=/var/tmp/avocado1,server,nowait \
    -mon chardev=qmp_id_qmpmonitor1,mode=control  \
    -device nec-usb-xhci,id=usb1,bus=pci.0,addr=0x4 \
    -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x5 \
    -chardev socket,path=/tmp/qga.sock,server,nowait,id=qga0 \
    -device virtserialport,bus=virtio-serial0.0,chardev=qga0,id=qemu-ga0,name=org.qemu.guest_agent.0 \
    -device virtio-scsi-pci,id=virtio_scsi_pci0,bus=pci.0,addr=0x6 \
    -blockdev node-name=disk0,file.driver=file,driver=qcow2,file.filename=/home/kvm_autotest_root/images/rhel75-64-virtio-scsi.qcow2 \
    -device scsi-hd,drive=disk0,id=image1 \
    -blockdev node-name=disk1,file.driver=file,driver=qcow2,file.filename=/home/kvm_autotest_root/images/hd1 \
    -device scsi-hd,drive=disk1,id=image11 \
    -device virtio-net-pci,mac=9a:78:79:7a:7b:7d,id=id8e5D72,vectors=4,netdev=idrYUYaH,bus=pci.0,addr=0x3 \
    -netdev tap,id=idrYUYaH,vhost=on \
    -m 4096  \
    -smp 4,maxcpus=4,cores=2,threads=1,sockets=2  \
    -device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1  \
    -cpu Penryn \
    -vnc :20  \
    -rtc base=localtime,clock=host,driftfix=slew  \
    -boot menu=off,strict=off,order=cdn,once=c \
    -enable-kvm \
    -monitor stdio

    **-incoming tcp:0:5200** #Only for dst guest

2. On the src side, create an image with qemu-img:
# qemu-img create -f qcow2 /home/kvm_autotest_root/images/fullbackup.qcow2 2G

3. On both the src and dst sides, hot plug the image with blockdev-add in QMP (an optional check of the new nodes is sketched after step 4):
{"execute":"blockdev-add","arguments":{"driver":"file","node-name":"fullbackup","filename":"/home/kvm_autotest_root/images/fullbackup.qcow2"}}
{"return": {}}
{ 'execute':'blockdev-add','arguments':{'driver':'qcow2','node-name':'fbk','file':'fullbackup'}}
{"return": {}}

4. On the src side, do the migration and check its status in QMP:
{"execute":"migrate","arguments":{"uri":"tcp:10.66.8.198:5200"}}
{"execute":"query-migrate"}


Actual results:
In step 4, QEMU core dumps when the migration is nearly finished; the detailed debug info is as follows:
(gdb) bt full
#0  0x00007f00f62fc207 in raise () at /lib64/libc.so.6
#1  0x00007f00f62fd8f8 in abort () at /lib64/libc.so.6
#2  0x00007f00f62f5026 in __assert_fail_base () at /lib64/libc.so.6
#3  0x00007f00f62f50d2 in  () at /lib64/libc.so.6
#4  0x00005598f83b10fe in bdrv_child_cb_inactivate (child=<optimized out>) at block.c:855
        child = <optimized out>
        bs = <optimized out>
#5  0x00005598f83b2e82 in bdrv_inactivate_recurse (bs=0x5598fa902800, setting_flag=setting_flag@entry=true) at block.c:4460
        perm = <optimized out>
        shared_perm = <optimized out>
        child = <optimized out>
        parent = 0x5598face2500
        ret = <optimized out>
#6  0x00005598f83b51ed in bdrv_inactivate_all () at block.c:4512
        bs = <optimized out>
        it = {phase = BDRV_NEXT_MONITOR_OWNED, blk = 0x0, bs = 0x5598fa902800}
        ret = <optimized out>
        pass = 1
        aio_ctxs = <optimized out>
        ctx = <optimized out>
#7  0x00005598f83564c2 in qemu_savevm_state_complete_precopy (f=0x5598fb512000, iterable_only=<optimized out>, inactivate_disks=<optimized out>) at migration/savevm.c:1198
        vmdesc = 0x5598fc98e9b0
        vmdesc_len = <optimized out>
        se = 0x0
        ret = <optimized out>
        in_postcopy = false
        __func__ = "qemu_savevm_state_complete_precopy"
#8  0x00005598f835235e in migration_thread (opaque=0x5598fa5ec500) at migration/migration.c:2144
        s = 0x5598fa5ec500
        setup_start = <optimized out>
#9  0x00007f00f669add5 in start_thread () at /lib64/libpthread.so.0
#10 0x00007f00f63c3ead in clone () at /lib64/libc.so.6
(gdb) 


Expected results:
No core dump when doing migration after hot plugging the backend image.

Additional info:
1. Migrating the guest after hot plugging both the backend image and the frontend device (with device_add) does not hit the bug; see the QMP sketch after this list.
2. Migrating the guest after hot plugging a backend image with '__com.redhat_drive_add' (also without the frontend) does not hit the bug.
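
Regarding item 1, a minimal QMP sketch of attaching a frontend to the hot-plugged node before migrating; it assumes the virtio_scsi_pci0 controller from the command line in the description, and the device id 'image2' and bus name are illustrative only:

{"execute":"device_add","arguments":{"driver":"scsi-hd","drive":"fbk","id":"image2","bus":"virtio_scsi_pci0.0"}}

Passing the node-name 'fbk' as the drive property attaches the device directly to the blockdev node, which corresponds to the working case in item 1.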

Comment 2 Gu Nini 2018-09-27 10:16:21 UTC
Tried on RHEL 7.5.z with qemu-kvm-rhev-2.10.0-21.el7_5.7.x86_64; it has the same issue, so the bug is not a regression.

(qemu) qemu-kvm: block.c:816: bdrv_child_cb_inactivate: Assertion `bs->open_flags & 0x0800' failed.
./vm-mig1.sh: line 28: 18378 Aborted                 (core dumped) /usr/libexec/qemu-kvm

Comment 6 Miroslav Rezanina 2018-12-06 12:40:22 UTC
Fix included in qemu-kvm-rhev-2.12.0-20.el7

Comment 8 Kevin Wolf 2018-12-07 17:20:56 UTC
*** Bug 1655972 has been marked as a duplicate of this bug. ***

Comment 9 lchai 2018-12-10 03:29:30 UTC
Host:
kernel-3.10.0-957.1.2.el7.x86_64
qemu-kvm-rhev-2.12.0-18.el7_6.2.x86_64

Guest:
kernel-3.10.0-957.el7.x86_64

This issue was reproduced with the above test environment.

(gdb) bt
#0  0x00007ff47f77b207 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:55
#1  0x00007ff47f77c8f8 in __GI_abort () at abort.c:90
#2  0x00007ff47f774026 in __assert_fail_base (fmt=0x7ff47f8ceea0 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=assertion@entry=0x55a658647e18 "bs->open_flags & 0x0800", file=file@entry=0x55a6585ad7cc "block.c", line=line@entry=855, function=function@entry=0x55a658649b60 <__PRETTY_FUNCTION__.30767> "bdrv_child_cb_inactivate") at assert.c:92
#3  0x00007ff47f7740d2 in __GI___assert_fail (assertion=assertion@entry=0x55a658647e18 "bs->open_flags & 0x0800", file=file@entry=0x55a6585ad7cc "block.c", line=line@entry=855, function=function@entry=0x55a658649b60 <__PRETTY_FUNCTION__.30767> "bdrv_child_cb_inactivate") at assert.c:101
#4  0x000055a65840020e in bdrv_child_cb_inactivate (child=<optimized out>) at block.c:855
#5  0x000055a658401f92 in bdrv_inactivate_recurse (bs=0x55a65a410800, setting_flag=setting_flag@entry=true) at block.c:4460
#6  0x000055a6584042fd in bdrv_inactivate_all () at block.c:4512
#7  0x000055a6583a55d2 in qemu_savevm_state_complete_precopy (f=0x55a65b2d4000, iterable_only=<optimized out>, inactivate_disks=<optimized out>) at migration/savevm.c:1198
#8  0x000055a6583a146e in migration_thread (opaque=0x55a65a358280) at migration/migration.c:2144
#9  0x00007ff47fb19dd5 in start_thread (arg=0x7ff358cb3700) at pthread_create.c:307
#10 0x00007ff47f842ead in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

Comment 10 lchai 2018-12-10 06:04:20 UTC
Host:
kernel-3.10.0-957.1.2.el7.x86_64
qemu-kvm-rhev-2.12.0-20.el7.x86_64

Guest:
kernel-3.10.0-957.el7.x86_64

With qemu-kvm-rhev-2.12.0-20.el7.x86_64, this issue is fixed.

After hot plugging a backend image with 'blockdev-add' and attaching no device to it, the migration now completes successfully.

Steps:
1) Boot up both the src and dst guests:
/usr/libexec/qemu-kvm \
       	-S \
       	-name 'vm-test-2' \
	-boot menu=on \
       	-sandbox off \
       	-machine pc \
       	-nodefaults \
       	-device qxl-vga,bus=pci.0,addr=0x2 \
	-drive id=drive_win,if=none,snapshot=off,aio=threads,cache=none,format=qcow2,file=/home/kvmtest_lchai/win.qcow2 \
	-device virtio-blk-pci,id=sys_disk,drive=drive_win,bus=pci.0,addr=0x4 \
	-device virtio-net-pci,mac=30:9c:23:c7:45:78,id=iddd,vectors=4,netdev=idttt \
	-netdev tap,id=idttt,vhost=on \
	-m 4G \
	-smp 12,maxcpus=12,cores=6,threads=1,sockets=2 \
	-cpu 'Penryn' \
	-rtc base=utc,clock=host,driftfix=slew \
	-enable-kvm \
	-monitor stdio \
	-vnc :1 \
        -qmp tcp:127.0.0.1:4444,server,nowait

For the dst guest, additionally use:
        -incoming tcp:0:5200 \ 
	-vnc :2 \
        -qmp tcp:127.0.0.1:4445,server,nowait

2) On both the src and dst sides, hot plug the image with blockdev-add in QMP:
# qemu-img create -f qcow2 fullbackup.qcow2 10G
 {"execute":"blockdev-add","arguments":{"driver":"file","node-name":"fullbackup","filename":"/home/kvmtest_lchai/fullbackup.qcow2"}}
 { 'execute':'blockdev-add','arguments':{'driver':'qcow2','node-name':'fbk','file':'fullbackup'}}

3) On the src side, do the migration and check its status in QMP:
 {"execute":"migrate","arguments":{"uri":"tcp:0:5200"}}
=> The migration operation succeeded.

Comment 13 errata-xmlrpc 2019-08-22 09:19:58 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2019:2553