Bug 1449031

Summary: qemu core dump when hot-unplug/hot-plug scsi controller in turns
Product: Red Hat Enterprise Linux 7 Reporter: lijin <lijin>
Component: qemu-kvm-rhev Assignee: Fam Zheng <famz>
Status: CLOSED ERRATA QA Contact: Xueqiang Wei <xuwei>
Severity: high Docs Contact:
Priority: high    
Version: 7.4 CC: aliang, chayang, coli, drjones, famz, hhan, jinchen, jinzhao, juzhang, knoel, lprosek, michen, pbonzini, virt-maint
Target Milestone: rc Keywords: Regression
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: qemu-kvm-rhev-2.9.0-10.el7 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
: 1458782 (view as bug list) Environment:
Last Closed: 2017-08-02 04:38:29 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1456511    

Description lijin 2017-05-09 06:04:35 UTC
Description of problem:


Version-Release number of selected component (if applicable):
qemu-kvm-rhev-2.9.0-2.el7.x86_64 and qemu-kvm-rhev-2.9.0-3.el7.x86_64
kernel-3.10.0-661.el7.x86_64
seabios-1.10.2-2.el7.x86_64
virtio-win-prewhql-136

How reproducible:
100%

Steps to Reproduce:
1. Boot a win7-64 guest with a scsi disk and a blk disk:
/usr/libexec/qemu-kvm \
  -M pc \
  -cpu host \
  -enable-kvm \
  -m 2G \
  -smp 4 \
  -nodefconfig \
  -rtc base=localtime,driftfix=slew \
  -device virtio-scsi-pci,id=scsi0,disable-modern=true \
  -drive file=win7-64-iso.raw,if=none,serial=virtioblk1,format=raw,cache=none,werror=stop,rerror=stop,id=drive-virtio-disk0,aio=native \
  -device scsi-hd,bus=scsi0.0,drive=drive-virtio-disk0,id=virtio-disk0 \
  -device piix3-usb-uhci,id=usb \
  -device usb-tablet,id=tablet0 \
  -vnc 0.0.0.0:0 \
  -k en-us \
  -vga std \
  -qmp tcp:0:4444,server,nowait \
  -boot menu=on \
  -monitor stdio \
  -device virtio-scsi-pci,id=scsi1 \
  -drive file=disk2.raw,if=none,format=raw,cache=none,werror=stop,rerror=stop,id=drive-virtio-disk2,aio=native \
  -device scsi-hd,bus=scsi1.0,drive=drive-virtio-disk2,id=virtio-disk2 \
  -object iothread,id=thread0 \
  -drive file=disk3.raw,if=none,id=drive-virtio-disk00,format=raw,cache=none \
  -device virtio-blk-pci,iothread=thread0,scsi=off,bus=pci.0,drive=drive-virtio-disk00,id=virtio-disk00

2. Hot-unplug the scsi controller and the blk disk:
(qemu) device_del scsi1
(qemu) device_del virtio-disk00

3. Hot-plug the scsi controller, scsi disk and blk disk back:
(qemu) device_add virtio-scsi-pci,id=scsi1
(qemu) __com.redhat_drive_add file=disk2.raw,format=raw,id=scsi-drive
(qemu) __com.redhat_drive_add file=disk3.raw,format=raw,id=blk-drive
(qemu) device_add scsi-hd,bus=scsi1.0,drive=scsi-drive,id=scsi-disk
(qemu) device_add virtio-blk-pci,drive=blk-drive,id=blk-disk,iothread=thread0


Actual results:
After step 3, qemu core dumps:
(gdb) bt
#0  memory_listener_register (listener=listener@entry=0x562080ba6260, as=as@entry=0x562080ba6210)
    at /usr/src/debug/qemu-2.9.0/memory.c:2381
#1  0x000056207c335447 in address_space_init_dispatch (as=as@entry=0x562080ba6210)
    at /usr/src/debug/qemu-2.9.0/exec.c:2561
#2  0x000056207c384c67 in address_space_init (as=0x562080ba6210, root=0x562080ba6320, 
    name=0x562080ba60b8 "") at /usr/src/debug/qemu-2.9.0/memory.c:2425
#3  0x000056207c4f602f in do_pci_register_device (errp=0x7ffcb8b41fc0, devfn=<optimized out>, 
    name=0x56207eb3bca0 "virtio-blk-pci", bus=0x56207f166000, pci_dev=0x562080ba6000) at hw/pci/pci.c:1006
#4  pci_qdev_realize (qdev=0x562080ba6000, errp=0x7ffcb8b41fc0) at hw/pci/pci.c:1994
#5  0x000056207c49a4f1 in device_set_realized (obj=<optimized out>, value=<optimized out>, 
    errp=0x7ffcb8b420f8) at hw/core/qdev.c:939
#6  0x000056207c580aae in property_set_bool (obj=0x562080ba6000, v=<optimized out>, name=<optimized out>, 
    opaque=0x562080a57d60, errp=0x7ffcb8b420f8) at qom/object.c:1860
#7  0x000056207c58476f in object_property_set_qobject (obj=0x562080ba6000, value=<optimized out>, 
    name=0x56207c6a9d2b "realized", errp=0x7ffcb8b420f8) at qom/qom-qobject.c:27
#8  0x000056207c5825e0 in object_property_set_bool (obj=0x562080ba6000, value=<optimized out>, 
    name=0x56207c6a9d2b "realized", errp=0x7ffcb8b420f8) at qom/object.c:1163
#9  0x000056207c445ad3 in qdev_device_add (opts=opts@entry=0x56207eb3f950, errp=errp@entry=0x7ffcb8b421d0)
    at qdev-monitor.c:623
#10 0x000056207c446063 in qmp_device_add (qdict=<optimized out>, ret_data=ret_data@entry=0x0, 
    errp=errp@entry=0x7ffcb8b42200) at qdev-monitor.c:800
#11 0x000056207c45ecea in hmp_device_add (mon=<optimized out>, qdict=<optimized out>) at hmp.c:1720
#12 0x000056207c37270e in handle_hmp_command (mon=mon@entry=0x56207eb14000, 
    cmdline=0x56207ec5c00b "virtio-blk-pci,drive=blk-drive,id=blk-disk,iothread=thread0")
    at /usr/src/debug/qemu-2.9.0/monitor.c:3111
#13 0x000056207c373d97 in monitor_command_cb (opaque=0x56207eb14000, cmdline=<optimized out>, 
    readline_opaque=<optimized out>) at /usr/src/debug/qemu-2.9.0/monitor.c:3909
#14 0x000056207c64f118 in readline_handle_byte (rs=0x56207ec5c000, ch=<optimized out>)
    at util/readline.c:393
#15 0x000056207c372917 in monitor_read (opaque=<optimized out>, buf=<optimized out>, size=<optimized out>)
    at /usr/src/debug/qemu-2.9.0/monitor.c:3892
#16 0x000056207c5eaa9f in fd_chr_read (chan=0x56207eb34220, cond=<optimized out>, opaque=0x56207ebdcbb0)
    at chardev/char-fd.c:66
#17 0x00007f008cce94c9 in g_main_context_dispatch () from /lib64/libglib-2.0.so.0
#18 0x000056207c63ce6c in glib_pollfds_poll () at util/main-loop.c:213
#19 os_host_main_loop_wait (timeout=<optimized out>) at util/main-loop.c:261
#20 main_loop_wait (nonblocking=nonblocking@entry=0) at util/main-loop.c:517
#21 0x000056207c32e01c in main_loop () at vl.c:1898
#22 main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at vl.c:4720

Expected results:
Disks can be hot-plugged successfully, with no core dump.

Additional info:
Cannot reproduce with the released RHEL 7.3 version, qemu-kvm-rhev-10:2.6.0-27.el7,
so the Regression keyword is set.

Comment 3 lijin 2017-05-09 07:42:45 UTC
Hit a similar issue when hot-plugging/hot-unplugging the scsi controller in a loop:

steps:
1. Boot the guest with a scsi disk:
/usr/libexec/qemu-kvm \
  -M pc \
  -cpu host \
  -enable-kvm \
  -m 2G \
  -smp 4 \
  -nodefconfig \
  -rtc base=localtime,driftfix=slew \
  -device virtio-scsi-pci,id=scsi0,disable-modern=true \
  -drive file=win7-64-iso.raw,if=none,serial=virtioblk1,format=raw,cache=none,werror=stop,rerror=stop,id=drive-virtio-disk0,aio=native \
  -device scsi-hd,bus=scsi0.0,drive=drive-virtio-disk0,id=virtio-disk0 \
  -device piix3-usb-uhci,id=usb \
  -device usb-tablet,id=tablet0 \
  -vnc 0.0.0.0:0 \
  -k en-us \
  -vga std \
  -qmp tcp:0:4444,server,nowait \
  -boot menu=on \
  -monitor stdio \
  -device virtio-scsi-pci,id=scsi1 \
  -drive file=disk1.raw,if=none,format=raw,cache=none,werror=stop,rerror=stop,id=drive-virtio-disk2,aio=native \
  -device scsi-hd,bus=scsi1.0,drive=drive-virtio-disk2,id=virtio-disk2

2. After the guest boots up, hot-unplug and hot-plug the scsi controller/disk in a loop:
#!/bin/bash
# Simple script that hot-unplugs and hot-plugs a virtio-scsi device in a loop.
i=0
exec 3<>/dev/tcp/localhost/4444   # adjust this to match the QMP port
# Leave capabilities negotiation mode so that commands are accepted.
echo -e "{ 'execute': 'qmp_capabilities' }" >&3
read -r response <&3
echo "$response"
while [ "$i" -lt 100 ]
do
    # Unplug the scsi controller.
    echo -e "{ 'execute': 'device_del', 'arguments': {'id': 'scsi1' }}" >&3
    sleep 5
    read -r response <&3
    echo "$i: $response"
    sleep 5
    # Add the drive backend for the disk.
    echo -e "{'execute':'__com.redhat_drive_add', 'arguments': {'file':'disk1.raw','format':'raw','id':'drive-scsi-disk1'}}" >&3
    sleep 5
    read -r response <&3
    echo "$i: $response"
    sleep 5
    # Plug the scsi controller back.
    echo -e "{'execute':'device_add','arguments':{'driver':'virtio-scsi-pci','id':'scsi1'}}" >&3
    read -r response <&3
    echo "$i: $response"
    sleep 5
    # Plug the scsi disk back.
    echo -e "{'execute':'device_add','arguments':{'driver':'scsi-hd','drive':'drive-scsi-disk1','id':'scsi-disk1'}}" >&3
    read -r response <&3
    echo "$i: $response"
    i=$((i + 1))
done

result:
After 1 or 2 rounds of plugging, qemu core dumps.
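
The loop above paces itself with fixed sleeps and reads one line of output per command. As a rough sketch only, the unplug step could instead wait for the QMP DEVICE_DELETED event before re-adding the device. The helper names qmp_line and unplug are made up for this illustration, and the reader/writer arguments are assumed to be text streams wrapped around the QMP socket after the greeting and qmp_capabilities exchange have been handled:

```python
import json


def qmp_line(execute, **arguments):
    """Serialize one QMP command as a single line of JSON (hypothetical helper)."""
    msg = {"execute": execute}
    if arguments:
        msg["arguments"] = arguments
    return json.dumps(msg) + "\n"


def unplug(reader, writer, device_id):
    """Send device_del, then read until DEVICE_DELETED names device_id.

    reader and writer are assumed to be line-oriented text streams on the
    QMP connection. Replies and unrelated events are skipped.
    """
    writer.write(qmp_line("device_del", id=device_id))
    writer.flush()
    for line in reader:
        if not line.strip():
            continue
        msg = json.loads(line)
        if "error" in msg:
            raise RuntimeError(msg["error"])
        if (msg.get("event") == "DEVICE_DELETED"
                and msg.get("data", {}).get("device") == device_id):
            return msg
    raise EOFError("QMP stream closed before DEVICE_DELETED was seen")
```

With a real guest the streams could come from sock.makefile("r") and sock.makefile("w") on a socket connected to the QMP port. Whether this changes how quickly the crash reproduces has not been checked here.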

Comment 4 Fam Zheng 2017-05-10 04:12:59 UTC
Git bisection points to this commit:

commit c53598ed18e40a9609573b21f2a361221ca0f806
Author: Alexey Kardashevskiy <aik>
Date:   Mon Mar 27 15:40:30 2017 +1100

    pci: Add missing drop of bus master AS reference
    
    The recent introduction of a bus master container added
    memory_region_add_subregion() into the PCI device registering path but
    missed memory_region_del_subregion() in the unregistering path leaving
    a reference to the root memory region of the new container.
    
    This adds missing memory_region_del_subregion().
    
    Fixes: 3716d5902d743 ("pci: introduce a bus master container")
    Signed-off-by: Alexey Kardashevskiy <aik>
    Reviewed-by: Michael S. Tsirkin <mst>
    Signed-off-by: Michael S. Tsirkin <mst>
    Reviewed-by: Paolo Bonzini <pbonzini>

Comment 5 Markus Armbruster 2017-05-10 07:07:38 UTC
*** Bug 1447548 has been marked as a duplicate of this bug. ***

Comment 6 Fam Zheng 2017-05-16 12:30:28 UTC
I've posted a fix for upstream:

https://lists.gnu.org/archive/html/qemu-devel/2017-05/msg03675.html

Comment 7 Ladi Prosek 2017-05-30 07:47:15 UTC
virtio-serial has the same problem.

  qbus_set_hotplug_handler(BUS(&vser->bus), DEVICE(vser), errp);

where vser controls the lifetime of the bus, and the bus keeps a reference back on the object.

I'll post a fix upstream shortly. Li Jin, do you want to open another BZ for virtio-serial or would you prefer to make this one more generic?

In any case, I believe that these fixes should be treated as blockers and backported to RHEL 7.4.

Comment 8 Fam Zheng 2017-05-30 08:34:32 UTC
FWIW, I revised the fix in comment 6 into:

    virtio-scsi: Unset hotplug handler when unrealize

which is now in Paolo's pull request for 2.10:

https://lists.gnu.org/archive/html/qemu-devel/2017-05/msg06081.html

He also hinted that the handler could be cleaned up in bus_unparent(), but I didn't come up with an actual patch because I am not familiar enough with qdev/QOM reference counting:

https://lists.gnu.org/archive/html/qemu-devel/2017-05/msg04036.html
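
For readers unfamiliar with the problem, here is a toy model (not QEMU code) of the reference cycle described in comment 7 and of what "unset hotplug handler when unrealize" achieves. The Obj and Controller classes and the hot_unplug helper are invented for illustration, and real QOM lifetimes involve more references than the two modeled here:

```python
class Obj:
    """Minimal manually ref-counted object, a stand-in for a QOM object."""

    def __init__(self, name):
        self.name = name
        self.refs = 1            # reference held by whoever created it
        self.finalized = False

    def ref(self):
        self.refs += 1

    def unref(self):
        self.refs -= 1
        if self.refs == 0:
            self.finalized = True


class Controller(Obj):
    """Device that owns a bus and registers itself as the bus's hotplug handler."""

    def __init__(self, name):
        super().__init__(name)
        self.bus_handler = None  # what the embedded bus points back at

    def realize(self):
        # The bus takes a reference on its hotplug handler, i.e. on the device.
        self.bus_handler = self
        self.ref()

    def unrealize(self, unset_handler):
        # Modeled fix: drop the bus's back-reference during unrealize.
        if unset_handler and self.bus_handler is not None:
            self.bus_handler.unref()
            self.bus_handler = None


def hot_unplug(dev, unset_handler):
    """Unrealize dev, drop the creator's reference, report whether it was freed."""
    dev.unrealize(unset_handler)
    dev.unref()
    return dev.finalized
```

In this model hot_unplug(dev, False) leaves the device alive with one reference still held through its own bus, while hot_unplug(dev, True) lets the count reach zero. How the leaked device leads to the crash in memory_listener_register is not modeled.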

Comment 9 Ladi Prosek 2017-05-30 09:02:35 UTC
Thanks! Virtio-serial version of Fam's patch has been posted:
https://lists.gnu.org/archive/html/qemu-devel/2017-05/msg06571.html

Comment 14 Han Han 2017-06-13 09:26:11 UTC
*** Bug 1454801 has been marked as a duplicate of this bug. ***

Comment 15 Miroslav Rezanina 2017-06-13 16:34:32 UTC
Fix included in qemu-kvm-rhev-2.9.0-10.el7

Comment 16 jingzhao 2017-06-14 03:06:53 UTC
*** Bug 1454537 has been marked as a duplicate of this bug. ***

Comment 17 Xueqiang Wei 2017-06-14 03:29:29 UTC
Reproduced on qemu-kvm-rhev-2.9.0-8.el7.

After step 3, qemu core dumps:

(gdb) bt
#0  0x00005586f7b16671 in memory_listener_register (listener=listener@entry=0x5586fb974260, as=as@entry=0x5586fb974210)
    at /usr/src/debug/qemu-2.9.0/memory.c:2381
#1  0x00005586f7ac6d37 in address_space_init_dispatch (as=as@entry=0x5586fb974210) at /usr/src/debug/qemu-2.9.0/exec.c:2561
#2  0x00005586f7b16897 in address_space_init (as=0x5586fb974210, root=0x5586fb974320, name=0x5586fb9740b8 "") at /usr/src/debug/qemu-2.9.0/memory.c:2425
#3  0x00005586f7c8535f in pci_qdev_realize (errp=0x7ffc51bd0250, devfn=<optimized out>, name=0x5586f990fc90 "virtio-blk-pci", bus=0x5586f9f34000, pci_dev=0x5586fb974000) at hw/pci/pci.c:1006
#4  0x00005586f7c8535f in pci_qdev_realize (qdev=0x5586fb974000, errp=0x7ffc51bd0250) at hw/pci/pci.c:1994
#5  0x00005586f7c29811 in device_set_realized (obj=<optimized out>, value=<optimized out>, errp=0x7ffc51bd0388) at hw/core/qdev.c:939
#6  0x00005586f7d0fcbe in property_set_bool (obj=0x5586fb974000, v=<optimized out>, name=<optimized out>, opaque=0x5586fb820f30, errp=0x7ffc51bd0388)
    at qom/object.c:1860
#7  0x00005586f7d1397f in object_property_set_qobject (obj=0x5586fb974000, value=<optimized out>, name=0x5586f7e399eb "realized", errp=0x7ffc51bd0388)
    at qom/qom-qobject.c:27
#8  0x00005586f7d117f0 in object_property_set_bool (obj=0x5586fb974000, value=<optimized out>, name=0x5586f7e399eb "realized", errp=0x7ffc51bd0388)
    at qom/object.c:1163
#9  0x00005586f7bd81a3 in qdev_device_add (opts=opts@entry=0x5586f9913d10, errp=errp@entry=0x7ffc51bd0460) at qdev-monitor.c:623
#10 0x00005586f7bd8733 in qmp_device_add (qdict=<optimized out>, ret_data=ret_data@entry=0x0, errp=errp@entry=0x7ffc51bd0490) at qdev-monitor.c:800
#11 0x00005586f7bf139a in hmp_device_add (mon=<optimized out>, qdict=<optimized out>) at hmp.c:1720
#12 0x00005586f7b0400e in handle_hmp_command (mon=mon@entry=0x5586f98ec000, cmdline=0x5586f9a2c00b "virtio-blk-pci,drive=blk-drive,id=blk-disk,iothread=thread0") at /usr/src/debug/qemu-2.9.0/monitor.c:3119
#13 0x00005586f7b05697 in monitor_command_cb (opaque=0x5586f98ec000, cmdline=<optimized out>, readline_opaque=<optimized out>)
    at /usr/src/debug/qemu-2.9.0/monitor.c:3917
#14 0x00005586f7ddf918 in readline_handle_byte (rs=0x5586f9a2c000, ch=<optimized out>) at util/readline.c:393
#15 0x00005586f7b04217 in monitor_read (opaque=<optimized out>, buf=<optimized out>, size=<optimized out>) at /usr/src/debug/qemu-2.9.0/monitor.c:3900
#16 0x00005586f7d7ae9f in fd_chr_read (chan=0x5586f9908220, cond=<optimized out>, opaque=0x5586f99acbb0) at chardev/char-fd.c:66
#17 0x00007f18f6e8d4c9 in g_main_context_dispatch () at /lib64/libglib-2.0.so.0
#18 0x00005586f7dcd66c in main_loop_wait () at util/main-loop.c:213
#19 0x00005586f7dcd66c in main_loop_wait (timeout=<optimized out>)
    at util/main-loop.c:261
#20 0x00005586f7dcd66c in main_loop_wait (nonblocking=nonblocking@entry=0)
    at util/main-loop.c:517
#21 0x00005586f7abf8fc in main () at vl.c:1898
#22 0x00005586f7abf8fc in main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at vl.c:4722


Tested on qemu-kvm-rhev-2.9.0-10.el7 and did not hit this issue, so marking it as verified.

Details after step 3:
(qemu) device_add virtio-scsi-pci,id=scsi1
(qemu) __com.redhat_drive_add file=disk2.raw,format=raw,id=scsi-drive
(qemu) __com.redhat_drive_add file=disk3.raw,format=raw,id=blk-drive
(qemu) device_add scsi-hd,bus=scsi1.0,drive=scsi-drive,id=scsi-disk
(qemu) device_add virtio-blk-pci,drive=blk-drive,id=blk-disk,iothread=thread0
(qemu) info block
drive-virtio-disk0 (#block187): /mnt/t/rhel69-64-virtio.raw (raw)
    Cache mode:       writeback, direct
floppy0: [not inserted]
    Removable device: not locked, tray closed
sd0: [not inserted]
    Removable device: not locked, tray closed
scsi-drive (#block731): disk2.raw (raw)
    Cache mode:       writeback
blk-drive (#block962): disk3.raw (raw)
    Cache mode:       writeback
(qemu) system_reset 
(qemu) system_powerdown

Comment 18 jingzhao 2017-06-14 07:47:47 UTC
*** Bug 1449143 has been marked as a duplicate of this bug. ***

Comment 20 errata-xmlrpc 2017-08-02 04:38:29 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2017:2392