Bug 1516663 - Qemu-kvm core dumped when unplugging the data disk drive while writing to it
Summary: Qemu-kvm core dumped when unplugging the data disk drive while writing to it
Keywords:
Status: CLOSED DUPLICATE of bug 1486594
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: qemu-kvm-rhev
Version: 7.5
Hardware: All
OS: Linux
Priority: medium
Severity: high
Target Milestone: rc
Target Release: ---
Assignee: Eric Blake
QA Contact: Xueqiang Wei
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2017-11-23 08:40 UTC by yilzhang
Modified: 2018-01-30 05:56 UTC (History)
CC List: 12 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-01-30 05:56:40 UTC
Target Upstream Version:
Embargoed:



Description yilzhang 2017-11-23 08:40:09 UTC
Description of problem:
Create a disk image on an iSCSI backend and a snapshot that uses it as its backing file, then boot a guest using both as two separate data disks. Write to both data disks and, while the writes are in progress, hot-unplug the snapshot image: qemu-kvm crashes and dumps core.


Version-Release number of selected component (if applicable):
Host:         IBM virt8 (POK name: c155f2-u3, IP:10.0.1.4)
Host kernel:   4.14.0-2.el7a.ppc64le
Guest kernel:  
qemu-kvm-rhev: qemu-kvm-rhev-2.10.0-6.el7
SLOF: SLOF-20170724-2.git89f519f.el7.noarch

How reproducible: 100%


Steps to Reproduce:
1. Create a base image using an iSCSI backend
[Host]#  qemu-img create -f qcow2 iscsi://10.0.0.7/iqn.2017-08.com.yilzhang:t1/0  2G
2. Create a snapshot using the above image as its backing file
[Host]# qemu-img create -f qcow2 /home/yilzhang/nfs/sn1   -b iscsi://10.0.0.7/iqn.2017-08.com.yilzhang:t1/0

3. Boot a guest with the above two images as two separate data disks. Because iSCSI is not protected by image locking, the guest boots successfully here.
/usr/libexec/qemu-kvm \
 -smp 8,sockets=2,cores=4,threads=1 -m 8192 \
 -serial unix:/tmp/df-serial.log,server,nowait \
 -nodefaults \
 -rtc base=localtime,clock=host \
 -boot menu=on \
 -monitor stdio \
 -qmp tcp:0:991,server,nowait \
 -device virtio-vga -vnc :91 \
 -device pci-bridge,id=bridge1,chassis_nr=1,bus=pci.0 \
 -device pci-bridge,id=bridge2,chassis_nr=2,bus=bridge1,addr=0x3 \
\
-device virtio-scsi-pci,bus=bridge1,addr=0x1f,id=scsi0 \
-drive file=rhel.qcow2,media=disk,if=none,cache=none,id=drive_sysdisk,aio=native,format=qcow2,werror=stop,rerror=stop \
-device scsi-hd,drive=drive_sysdisk,bus=scsi0.0,id=sysdisk,bootindex=0 \
\
-device nec-usb-xhci,id=xhci0,bus=bridge1,addr=0x1d \
-device usb-mouse,bus=xhci0.0 -device usb-kbd \
-netdev tap,id=net0,script=/etc/qemu-ifup,downscript=/etc/qemu-ifdown,vhost=on \
-device virtio-net-pci,netdev=net0,id=nic0,mac=52:54:00:c3:e7:8f,bus=bridge2,addr=0x1e \
\
\
-device virtio-scsi-pci,bus=bridge2,addr=0x1f,id=scsi1 \
-drive file=/home/yilzhang/nfs/sn1,if=none,cache=none,id=drive_ddisk_2,format=qcow2,werror=stop,rerror=stop \
-device scsi-hd,drive=drive_ddisk_2,bus=scsi1.0,id=ddisk_2 \
-drive file=iscsi://10.0.0.7/iqn.2017-08.com.yilzhang:t1/0,if=none,cache=none,id=drive_ddisk_1,format=qcow2,werror=stop,rerror=stop \
-device scsi-hd,drive=drive_ddisk_1,bus=scsi1.0,id=ddisk_1 \

4. Log in to the guest and check that there are two data disks of the same size
[root@virt8-yilzhang-Guest ~]# lsblk -p
NAME                      MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
/dev/sdb                    8:16   0   2G  0 disk 
/dev/sdc                    8:32   0   2G  0 disk 

5. Write to these two data disks simultaneously using "dd"
[Guest]# dd if=/dev/urandom of=/dev/sdb bs=1M count=2000 oflag=sync  &
[Guest]# dd if=/dev/random of=/dev/sdc bs=1M count=2000 oflag=sync  &

6. Hot-unplug the snapshot image while "dd" is still running
(qemu) drive_del  drive_ddisk_2


Actual results:
qemu-kvm crashed and dumped core
(qemu)  drive_del  drive_ddisk_2
iscsi-backingfile.sh: line 28: 81281 Segmentation fault      (core dumped) /usr/libexec/qemu-kvm -smp 8,sockets=2,cores=4,threads=1 -m 8192 -serial unix:/tmp/df-serial.log,server,nowait -nodefaults -rtc base=localtime,clock=host -boot menu=on -monitor stdio -qmp tcp:0:991,server,nowait -device virtio-vga -vnc :91 -device pci-bridge,id=bridge1,chassis_nr=1,bus=pci.0 -device pci-bridge,id=bridge2,chassis_nr=2,bus=bridge1,addr=0x3 -device virtio-scsi-pci,bus=bridge1,addr=0x1f,id=scsi0 -drive file=rhel.qcow2,media=disk,if=none,cache=none,id=drive_sysdisk,aio=native,format=qcow2,werror=stop,rerror=stop -device scsi-hd,drive=drive_sysdisk,bus=scsi0.0,id=sysdisk,bootindex=0 -device nec-usb-xhci,id=xhci0,bus=bridge1,addr=0x1d -device usb-mouse,bus=xhci0.0 -device usb-kbd -netdev tap,id=net0,script=/etc/qemu-ifup,downscript=/etc/qemu-ifdown,vhost=on -device virtio-net-pci,netdev=net0,id=nic0,mac=52:54:00:c3:e7:8f,bus=bridge2,addr=0x1e -device virtio-scsi-pci,bus=bridge2,addr=0x1f,id=scsi1 -drive file=/home/yilzhang/nfs/sn1,if=none,cache=none,id=drive_ddisk_2,format=qcow2,werror=stop,rerror=stop -device scsi-hd,drive=drive_ddisk_2,bus=scsi1.0,id=ddisk_2 -drive file=iscsi://10.0.0.7/iqn.2017-08.com.yilzhang:t1/0,if=none,cache=none,id=drive_ddisk_1,format=qcow2,werror=stop,rerror=stop -device scsi-hd,drive=drive_ddisk_1,bus=scsi1.0,id=ddisk_1


Expected results:
No crash; drive_del should succeed, and after that "dd" should fail with an Input/output error


Additional info:
[New LWP 81509]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `/usr/libexec/qemu-kvm -smp 8,sockets=2,cores=4,threads=1 -m 8192 -serial unix:/'.
Program terminated with signal 11, Segmentation fault.
#0  qemu_mutex_lock (mutex=0x60) at util/qemu-thread-posix.c:64
64	    assert(mutex->initialized);
Missing separate debuginfos, use: debuginfo-install bzip2-libs-1.0.6-13.el7.ppc64le cyrus-sasl-gssapi-2.1.26-21.el7.ppc64le cyrus-sasl-lib-2.1.26-21.el7.ppc64le elfutils-libelf-0.168-8.el7.ppc64le elfutils-libs-0.168-8.el7.ppc64le glib2-2.50.3-3.el7.ppc64le glibc-2.17-196.el7.ppc64le gmp-6.0.0-15.el7.ppc64le gnutls-3.3.26-9.el7.ppc64le gperftools-libs-2.4-8.el7.ppc64le keyutils-libs-1.5.8-3.el7.ppc64le krb5-libs-1.15.1-8.el7.ppc64le libaio-0.3.109-13.el7.ppc64le libattr-2.4.46-12.el7.ppc64le libcap-2.22-9.el7.ppc64le libcom_err-1.42.9-10.el7.ppc64le libcurl-7.29.0-42.el7.ppc64le libdb-5.3.21-20.el7.ppc64le libfdt-1.4.3-1.el7.ppc64le libffi-3.0.13-18.el7.ppc64le libgcc-4.8.5-16.el7_4.1.ppc64le libgcrypt-1.5.3-14.el7.ppc64le libgpg-error-1.12-3.el7.ppc64le libibverbs-14-3.el7a.ppc64le libidn-1.28-4.el7.ppc64le libiscsi-1.9.0-7.el7.ppc64le libnl3-3.2.28-4.el7.ppc64le libpng-1.5.13-7.el7_2.ppc64le librdmacm-14-3.el7a.ppc64le libseccomp-2.3.1-3.el7.ppc64le libselinux-2.5-11.el7.ppc64le libssh2-1.4.3-10.el7_2.1.ppc64le libstdc++-4.8.5-16.el7_4.1.ppc64le libtasn1-4.10-1.el7.ppc64le libusbx-1.0.20-1.el7.ppc64le lzo-2.06-8.el7.ppc64le nettle-2.7.1-8.el7.ppc64le nspr-4.13.1-1.0.el7_3.ppc64le nss-3.28.4-15.el7_4.ppc64le nss-softokn-freebl-3.28.3-8.el7_4.ppc64le nss-util-3.28.4-3.el7.ppc64le numactl-libs-2.0.9-6.el7_2.ppc64le openldap-2.4.44-5.el7.ppc64le openssl-libs-1.0.2k-8.el7.ppc64le p11-kit-0.23.5-3.el7.ppc64le pcre-8.32-17.el7.ppc64le pixman-0.34.0-1.el7.ppc64le snappy-1.1.0-3.el7.ppc64le systemd-libs-219-42.el7_4.4.ppc64le xz-libs-5.2.2-1.el7.ppc64le zlib-1.2.7-17.el7.ppc64le
(gdb) bt
#0  qemu_mutex_lock (mutex=0x60) at util/qemu-thread-posix.c:64
#1  0x0000000119af7ccc in aio_context_acquire (ctx=<error reading variable: value has been optimized out>) at util/async.c:489
#2  0x0000000119949340 in scsi_dma_complete (opaque=0x13d86e400, ret=<optimized out>) at hw/scsi/scsi-disk.c:295
#3  0x00000001198d16a4 in dma_complete (ret=<optimized out>, dbs=0x13c2d27f0) at dma-helpers.c:116
#4  dma_blk_cb (opaque=0x13c2d27f0, ret=<optimized out>) at dma-helpers.c:138
#5  0x0000000119a40950 in blk_aio_complete (acb=0x13c1d19a0) at block/block-backend.c:1213
#6  0x0000000119b17578 in coroutine_trampoline (i0=<optimized out>, i1=<optimized out>) at util/coroutine-ucontext.c:79
#7  0x00007fff7f3c2b9c in makecontext () from /lib64/libc.so.6
#8  0x0000000000000000 in ?? ()
(gdb) bt full
#0  qemu_mutex_lock (mutex=0x60) at util/qemu-thread-posix.c:64
        err = <optimized out>
        __PRETTY_FUNCTION__ = "qemu_mutex_lock"
        __func__ = "qemu_mutex_lock"
#1  0x0000000119af7ccc in aio_context_acquire (ctx=<error reading variable: value has been optimized out>) at util/async.c:489
No locals.
#2  0x0000000119949340 in scsi_dma_complete (opaque=0x13d86e400, ret=<optimized out>) at hw/scsi/scsi-disk.c:295
        r = 0x13d86e400
        s = 0x13f452080
#3  0x00000001198d16a4 in dma_complete (ret=<optimized out>, dbs=0x13c2d27f0) at dma-helpers.c:116
No locals.
#4  dma_blk_cb (opaque=0x13c2d27f0, ret=<optimized out>) at dma-helpers.c:138
        dbs = 0x13c2d27f0
#5  0x0000000119a40950 in blk_aio_complete (acb=0x13c1d19a0) at block/block-backend.c:1213
No locals.
#6  0x0000000119b17578 in coroutine_trampoline (i0=<optimized out>, i1=<optimized out>) at util/coroutine-ucontext.c:79
        arg = {p = 0x13eea4800, i = {1055541248, 1}}
        self = 0x13eea4800
        co = 0x13eea4800
#7  0x00007fff7f3c2b9c in makecontext () from /lib64/libc.so.6
No symbol table info available.
#8  0x0000000000000000 in ?? ()
No symbol table info available.
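
A note on reading frame #0: mutex=0x60 is far too small to be a valid heap address. One common reading is that the context pointer handed to aio_context_acquire() was NULL and that 0x60 is simply the offset of the lock member inside that structure. The report itself does not confirm this. The sketch below only illustrates the arithmetic; FakeContext and its field sizes are hypothetical and are not taken from the QEMU sources.

```python
import ctypes


class FakeContext(ctypes.Structure):
    """Hypothetical stand-in for a context structure (NOT QEMU's real layout)."""
    _fields_ = [
        ("source", ctypes.c_ubyte * 0x60),  # assumed: 0x60 bytes of leading data
        ("lock", ctypes.c_ubyte * 8),       # assumed: the mutex follows directly
    ]


def member_address(base, struct, field):
    """Address of struct.field when the structure itself lives at `base`."""
    return base + getattr(struct, field).offset


if __name__ == "__main__":
    # With a NULL (0) base pointer, the member address equals the field offset.
    print(hex(member_address(0, FakeContext, "lock")))
```

Under these assumed sizes, a NULL base yields exactly the small address seen in the backtrace, which is why a near-zero pointer argument is usually treated as a hint of a NULL parent structure.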

Comment 2 yilzhang 2017-11-23 09:19:58 UTC
Power8 RHEL7.5 and x86 also have this bug

Power8 RHEL7.5:
Host kernel:   3.10.0-768.el7.ppc64le
Guest kernel:  3.10.0-693.el7.ppc64le
qemu-kvm-rhev: qemu-kvm-rhev-2.10.0-6.el7

x86:
Host kernel:   3.10.0-771.el7.x86_64
Guest kernel:  3.10.0-771.el7.x86_64
qemu-kvm-rhev: qemu-kvm-rhev-2.10.0-6.el7

Comment 6 yilzhang 2017-11-24 04:58:55 UTC
Note: The "Steps to Reproduce" in this bug's Description are an invalid use case (due to BZ 1517042); please refer to Comment 4 for a valid reproducer instead.

Comment 7 Xueqiang Wei 2017-11-24 08:49:13 UTC
I think unplugging the data disk drive while writing to it is a negative test.


Hot-unplugging the data disk with the "device_del" command does not hit this issue.
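
For anyone wanting to script the device_del variant, here is a minimal sketch of the QMP messages involved. It assumes the QMP socket from the Description's command line (-qmp tcp:0:991,server,nowait) is reachable on 127.0.0.1 and that the device to unplug is ddisk_2. The helper names are illustrative only, and the replies are returned raw rather than parsed.

```python
import json
import socket


def qmp_message(command, **arguments):
    """Encode one QMP command as a single JSON line."""
    msg = {"execute": command}
    if arguments:
        msg["arguments"] = arguments
    return (json.dumps(msg) + "\r\n").encode("ascii")


def device_del_sequence(device_id):
    """Messages needed to unplug one device: capability handshake, then device_del."""
    return [
        qmp_message("qmp_capabilities"),
        qmp_message("device_del", id=device_id),
    ]


def send_sequence(messages, host="127.0.0.1", port=991):
    """Send messages to a QMP socket; return the raw reply lines (not parsed)."""
    replies = []
    with socket.create_connection((host, port), timeout=5) as sock:
        reader = sock.makefile("rb")
        replies.append(reader.readline())  # greeting sent by the QMP server
        for message in messages:
            sock.sendall(message)
            replies.append(reader.readline())
    return replies


if __name__ == "__main__":
    # Only print the messages; call send_sequence() against a running guest.
    for line in device_del_sequence("ddisk_2"):
        print(line.decode("ascii").strip())
```

Note that device_del takes the device id (ddisk_2), whereas the drive_del used in the Description takes the drive id (drive_ddisk_2).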

Comment 8 CongLi 2018-01-30 05:56:40 UTC

*** This bug has been marked as a duplicate of bug 1486594 ***

