Bug 1478227 - [NBD] qemu-kvm hit Segmentation fault if guest is writing to the NBD data disk and meanwhile unexport this data disk
Status: NEW
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: qemu-kvm-rhev
Version: 7.5
Hardware: All
OS: Linux
Priority: medium
Severity: high
Target Milestone: rc
Target Release: ---
Assigned To: Eric Blake
QA Contact: Longxiang Lyu
Depends On:
Blocks:
Reported: 2017-08-03 23:09 EDT by yilzhang
Modified: 2018-04-05 16:51 EDT
Type: Bug
Description yilzhang 2017-08-03 23:09:52 EDT
Description of problem:
While the guest is writing to an NBD data disk, unexport that disk on the NBD server side. Shortly afterwards, the qemu-kvm process aborts with a segmentation fault.

Version-Release number of selected component (if applicable):
host: 4.11.0-19.el7a.ppc64le
      qemu-kvm-2.9.0-19.el7a.ppc64le
      SLOF-20170303-4.git66d250e.el7.noarch
guest kernel: 4.11.0-14.el7a.ppc64le

How reproducible: 100%


Steps to Reproduce:
1. Create disk image on NBD server
# qemu-img create -f qcow2  -o preallocation=full   nbd_dataimage_0.qcow2  4G
2. Export image file on NBD server side
# qemu-nbd -f raw  /home/yilzhang/nbd_dataimage_0.qcow2  -p 9001 -t &
3. Boot up guest on NBD client, using the above NBD disk image as one data disk:
/usr/libexec/qemu-kvm \
-name yilzhang_virt8_guest \
 -smp 8,sockets=2,cores=4,threads=1 -m 8192 \
-serial unix:/tmp/nbd-serial.log,server,nowait \
-nodefaults \
 -rtc base=localtime,clock=host \
 -boot menu=on \
 -monitor stdio \
 -vnc :88 \
 -qmp tcp:0:9990,server,nowait \
\
-device pci-bridge,id=bridge1,chassis_nr=1,bus=pci.0 \
 -device virtio-scsi-pci,bus=bridge1,addr=0x1,id=scsi0 \
-drive file=/home/yilzhang/rhel7.4-alt.qcow2,if=none,cache=none,id=drive_sysdisk,snapshot=off,aio=native,format=qcow2,werror=stop,rerror=stop \
-device scsi-hd,drive=drive_sysdisk,bus=scsi0.0,id=sysdisk,bootindex=0 \
\
-drive file=nbd://10.0.1.20:9001,if=none,cache=none,id=drive_datadisk1,aio=native,format=qcow2,werror=stop,rerror=stop \
-device virtio-blk-pci,drive=drive_datadisk1,bus=bridge1,addr=0x2,id=datadisk1 \
 -netdev tap,id=net0,script=/etc/qemu-ifup,downscript=/etc/qemu-ifdown,vhost=on \
 -device virtio-net-pci,netdev=net0,id=nic0,mac=52:54:00:c3:e7:84 \

4. Connect to QMP on the host:
# telnet localhost 9990
{"execute": "qmp_capabilities"}
5. Log in to the guest and write data to the data disk exported from the NBD server:
   [guest]# dd if=/dev/zero  of=/dev/vda  bs=1M count=2000 oflag=sync
6. While "dd" is still running, unexport the NBD disk image by killing qemu-nbd:
[NBD server]# kill -9 6746
[5]+  Killed                  qemu-nbd -f raw /home/yilzhang/nbd_dataimage_0.qcow2 -p 9001 -t
7. QMP emits "BLOCK_IO_ERROR" event:
{"timestamp": {"seconds": 1494090476, "microseconds": 553531}, "event": "BLOCK_IO_ERROR", "data": {"device": "drive_datadisk1", "nospace": false, "__com.redhat_reason": "eio", "node-name": "#block349", "reason": "Input/output error", "operation": "write", "action": "stop"}}
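The QMP exchange in steps 4 and 7 can be scripted instead of typed into telnet. The following is a minimal sketch and not part of the original report: the helper names `qmp_command` and `is_stop_on_io_error` are hypothetical, and the code only builds and inspects QMP messages. It assumes the caller opens the socket on port 9990 and reads one JSON object per line.

```python
import json


def qmp_command(name, arguments=None):
    """Serialize a QMP command as a single line of JSON (hypothetical helper)."""
    msg = {"execute": name}
    if arguments:
        msg["arguments"] = arguments
    return json.dumps(msg) + "\n"


def is_stop_on_io_error(line, device):
    """Return True if `line` is a BLOCK_IO_ERROR event for `device` with action "stop"."""
    try:
        msg = json.loads(line)
    except ValueError:
        return False  # not a complete JSON message
    if not isinstance(msg, dict) or msg.get("event") != "BLOCK_IO_ERROR":
        return False
    data = msg.get("data", {})
    return data.get("device") == device and data.get("action") == "stop"


# Event copied verbatim from step 7 of this report.
SAMPLE_EVENT = (
    '{"timestamp": {"seconds": 1494090476, "microseconds": 553531}, '
    '"event": "BLOCK_IO_ERROR", "data": {"device": "drive_datadisk1", '
    '"nospace": false, "__com.redhat_reason": "eio", "node-name": "#block349", '
    '"reason": "Input/output error", "operation": "write", "action": "stop"}}'
)

if __name__ == "__main__":
    print(qmp_command("qmp_capabilities"), end="")
    print(is_stop_on_io_error(SAMPLE_EVENT, "drive_datadisk1"))
```

A reproducer script could wait for this event before checking whether the qemu-kvm process is still alive, since the crash in this report follows the event.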


Actual results:
After a short while, qemu-kvm aborts with a segmentation fault.

Expected results:
qemu-kvm should not abort abnormally


Additional info:
1. Power8 with qemu-kvm-rhev-2.9.0-14.el7.ppc64le and the x86 platform also have this issue.
2. gdb  /usr/libexec/qemu-kvm  core.9638
warning: exec file is newer than core file.
[New LWP 9638]
[New LWP 9680]
[New LWP 9682]
[New LWP 9681]
[New LWP 9683]
[New LWP 9685]
[New LWP 9684]
[New LWP 9689]
[New LWP 9639]
[New LWP 9687]
[New LWP 9686]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `/usr/libexec/qemu-kvm -name yilzhang_virt8_guest -smp 8,sockets=2,cores=4,threa'.
Program terminated with signal 11, Segmentation fault.
#0  0x000000002cd67bb4 in aio_co_schedule ()
Missing separate debuginfos, use: debuginfo-install qemu-kvm-2.9.0-19.el7a.ppc64le
(gdb) bt
#0  0x000000002cd67bb4 in aio_co_schedule ()
#1  0x000000002ccd1c9c in nbd_client_attach_aio_context ()
#2  0x000000002cccfce8 in nbd_attach_aio_context ()
#3  0x000000002cc72ac0 in bdrv_attach_aio_context ()
#4  0x000000002cc72a8c in bdrv_attach_aio_context ()
#5  0x000000002cc72c38 in bdrv_set_aio_context ()
#6  0x000000002ccbaab4 in blk_set_aio_context ()
#7  0x000000002c9ff9d0 in virtio_blk_data_plane_stop ()
#8  0x000000002cbf7020 in virtio_bus_stop_ioeventfd ()
#9  0x000000002cbf2598 in virtio_pci_vmstate_change ()
#10 0x000000002ca2e91c in virtio_vmstate_change ()
#11 0x000000002cb5a6b4 in vm_state_notify ()
#12 0x000000002c9bd9c0 in vm_stop ()
#13 0x000000002c95af10 in main ()
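Because this backtrace has no debug symbols, the call path is the main information it carries. As a reading aid, the sketch below (not part of the original report; `call_chain` is a hypothetical helper) extracts the function names and lists them from the outermost caller down to the crashing frame.

```python
import re

# Matches "#N  0xADDR in func (" as well as "#N func (" (frames without an address).
FRAME_RE = re.compile(r"^#(\d+)\s+(?:0x[0-9a-f]+ in )?(\w+) \(")

# Backtrace copied from the gdb output above.
BACKTRACE = """\
#0  0x000000002cd67bb4 in aio_co_schedule ()
#1  0x000000002ccd1c9c in nbd_client_attach_aio_context ()
#2  0x000000002cccfce8 in nbd_attach_aio_context ()
#3  0x000000002cc72ac0 in bdrv_attach_aio_context ()
#4  0x000000002cc72a8c in bdrv_attach_aio_context ()
#5  0x000000002cc72c38 in bdrv_set_aio_context ()
#6  0x000000002ccbaab4 in blk_set_aio_context ()
#7  0x000000002c9ff9d0 in virtio_blk_data_plane_stop ()
#8  0x000000002cbf7020 in virtio_bus_stop_ioeventfd ()
#9  0x000000002cbf2598 in virtio_pci_vmstate_change ()
#10 0x000000002ca2e91c in virtio_vmstate_change ()
#11 0x000000002cb5a6b4 in vm_state_notify ()
#12 0x000000002c9bd9c0 in vm_stop ()
#13 0x000000002c95af10 in main ()
"""


def call_chain(backtrace):
    """Return function names ordered from the outermost caller to the crashing frame."""
    frames = []
    for line in backtrace.splitlines():
        match = FRAME_RE.match(line.strip())
        if match:
            frames.append((int(match.group(1)), match.group(2)))
    # gdb numbers the crashing frame 0, so sort by frame number in descending order.
    return [name for _, name in sorted(frames, reverse=True)]


if __name__ == "__main__":
    print(" -> ".join(call_chain(BACKTRACE)))
```

Read in this order, the chain shows the crash happening while the VM is being stopped: vm_stop leads to virtio_blk_data_plane_stop, which moves the block backend to another AioContext and ends in the NBD client's attach callback.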
Comment 2 yilzhang 2017-08-04 03:53:10 EDT
Power8 with qemu-kvm-rhev-2.9.0-14.el7.ppc64le and x86 with qemu-kvm-rhev-2.9.0-19.el7a.x86_64 also have this issue.
Comment 5 Longxiang Lyu 2017-10-12 04:37:22 EDT
Reproduced in qemu-img-rhev-2.10.0-1.el7.x86_64.

1. verify version info
# rpm -qa | grep ^qemu
qemu-kvm-tools-rhev-2.10.0-1.el7.x86_64
qemu-kvm-common-rhev-2.10.0-1.el7.x86_64
qemu-kvm-rhev-2.10.0-1.el7.x86_64
qemu-kvm-rhev-debuginfo-2.10.0-1.el7.x86_64
qemu-img-rhev-2.10.0-1.el7.x86_64
# uname -r
3.10.0-730.el7.x86_64

2. prepare data disk image and export as NBD server.
# qemu-img create -f qcow2 -o preallocation=full add.qcow2 5G
Formatting 'add.qcow2', fmt=qcow2 size=5368709120 encryption=off cluster_size=65536 preallocation=full lazy_refcounts=off refcount_bits=16

# qemu-nbd -f raw add.qcow2  -p 9001 -t

3. boot up guest
#!/bin/bash
/usr/libexec/qemu-kvm \
-name guest=test-virt \
-machine pc-i440fx-rhel7.4.0,accel=kvm,usb=off,vmport=off,dump-guest-core=off \
-cpu SandyBridge \
-m 2G \
-smp 4,sockets=4,cores=1,threads=1 \
-boot strict=on \
-drive file=/home/test/nbd01/test.qcow2,if=none,format=qcow2,id=img0 \
-device virtio-blk-pci,bus=pci.0,drive=img0,id=virtio-disk0,bootindex=1 \
-drive file=nbd://10.66.11.1:9001,if=none,format=qcow2,cache=none,werror=stop,rerror=stop,aio=native,id=img1 \
-device virtio-blk-pci,bus=pci.0,drive=img1,id=virtio-disk1 \
-netdev tap,id=hostnet0,vhost=on \
-device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:12:b3:20:61,bus=pci.0 \
-device qxl-vga \
-usbdevice tablet \
-vnc :2 \
-device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x8 \
-monitor stdio \
-qmp tcp:0:5555,server,nowait \

4. connect to qmp server
# telnet 127.0.0.1 5555
{"execute": "qmp_capabilities"}

5. dd in the guest to the nbd data disk.
# dd if=/dev/urandom of=/dev/vdb bs=1M count=1024

6. kill NBD server before dd ends.
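For anyone scripting these steps, the two NBD-specific command-line pieces (the qemu-nbd export and the guest's -drive option) can be generated rather than hand-edited. This is a rough sketch only: the helper names `qemu_nbd_argv` and `nbd_drive_option` are hypothetical, and the option values are copied from the commands in this comment.

```python
def qemu_nbd_argv(image, port):
    """Build the qemu-nbd command from step 2: raw export, persistent (-t) server."""
    return ["qemu-nbd", "-f", "raw", image, "-p", str(port), "-t"]


def nbd_drive_option(host, port, drive_id, image_format="qcow2"):
    """Build the value passed to -drive for the NBD data disk (hypothetical helper)."""
    opts = [
        ("file", f"nbd://{host}:{port}"),
        ("if", "none"),
        ("format", image_format),
        ("cache", "none"),
        ("werror", "stop"),  # pause the guest on write errors
        ("rerror", "stop"),  # pause the guest on read errors
        ("aio", "native"),
        ("id", drive_id),
    ]
    return ",".join(f"{key}={value}" for key, value in opts)


if __name__ == "__main__":
    print(" ".join(qemu_nbd_argv("add.qcow2", 9001)))
    print("-drive " + nbd_drive_option("10.66.11.1", 9001, "img1"))
```

The server exports the file with `-f raw` while the client opens it with `format=qcow2`, so the qcow2 layer is interpreted on the client side. The sketch keeps that split as it appears in the report.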

Result:
1. qemu aborted with segmentation fault.
2. qmp outputs:
{"timestamp": {"seconds": 1507796895, "microseconds": 776193}, "event": "BLOCK_IO_ERROR", "data": {"device": "img1", "nospace": false, "__com.redhat_reason": "eio", "node-name": "#block386", "reason": "Input/output error", "operation": "read", "action": "stop"}}

3. gdb bt output of core file
# gdb core.1788 
GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-100.el7
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
[New LWP 1788]
[New LWP 1805]
[New LWP 1802]
[New LWP 1808]
[New LWP 1999]
[New LWP 1803]
[New LWP 1789]
[New LWP 1807]
[New LWP 1804]
Reading symbols from /usr/libexec/qemu-kvm...Reading symbols from /usr/lib/debug/usr/libexec/qemu-kvm.debug...done.
done.
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `/usr/libexec/qemu-kvm -name guest=test-virt -machine pc-i440fx-rhel7.4.0,accel='.
Program terminated with signal 11, Segmentation fault.
#0  0x000055ae1278a86a in aio_co_schedule (ctx=0x55ae15243980, co=0x0) at util/async.c:441
441	    QSLIST_INSERT_HEAD_ATOMIC(&ctx->scheduled_coroutines,
Missing separate debuginfos, use: debuginfo-install bzip2-libs-1.0.6-13.el7.x86_64 celt051-0.5.1.3-8.el7.x86_64 cyrus-sasl-gssapi-2.1.26-21.el7.x86_64 cyrus-sasl-lib-2.1.26-21.el7.x86_64 cyrus-sasl-md5-2.1.26-21.el7.x86_64 cyrus-sasl-plain-2.1.26-21.el7.x86_64 cyrus-sasl-scram-2.1.26-21.el7.x86_64 elfutils-libelf-0.168-8.el7.x86_64 elfutils-libs-0.168-8.el7.x86_64 glib2-2.50.3-3.el7.x86_64 glibc-2.17-196.el7.x86_64 glusterfs-api-3.8.4-45.el7rhgs.x86_64 glusterfs-libs-3.8.4-45.el7rhgs.x86_64 gmp-6.0.0-15.el7.x86_64 gnutls-3.3.26-9.el7.x86_64 gperftools-libs-2.4-8.el7.x86_64 keyutils-libs-1.5.8-3.el7.x86_64 krb5-libs-1.15.1-8.el7.x86_64 libacl-2.2.51-12.el7.x86_64 libaio-0.3.109-13.el7.x86_64 libattr-2.4.46-12.el7.x86_64 libblkid-2.23.2-43.el7.x86_64 libcacard-2.5.2-2.el7.x86_64 libcap-2.22-9.el7.x86_64 libcom_err-1.42.9-10.el7.x86_64 libcurl-7.29.0-42.el7.x86_64 libdb-5.3.21-20.el7.x86_64 libffi-3.0.13-18.el7.x86_64 libgcc-4.8.5-16.el7.x86_64 libgcrypt-1.5.3-14.el7.x86_64 libgpg-error-1.12-3.el7.x86_64 libibverbs-13-7.el7.x86_64 libidn-1.28-4.el7.x86_64 libiscsi-1.9.0-7.el7.x86_64 libjpeg-turbo-1.2.90-5.el7.x86_64 libmount-2.23.2-43.el7.x86_64 libnl3-3.2.28-4.el7.x86_64 libpng-1.5.13-7.el7_2.x86_64 librados2-12.2.0-2.el7cp.x86_64 librbd1-12.2.0-2.el7cp.x86_64 librdmacm-13-7.el7.x86_64 libseccomp-2.3.1-3.el7.x86_64 libselinux-2.5-11.el7.x86_64 libssh2-1.4.3-10.el7_2.1.x86_64 libstdc++-4.8.5-16.el7.x86_64 libtasn1-4.10-1.el7.x86_64 libunwind-1.2-2.el7.x86_64 libusbx-1.0.20-1.el7.x86_64 libuuid-2.23.2-43.el7.x86_64 lttng-ust-2.4.1-4.el7.x86_64 lzo-2.06-8.el7.x86_64 nettle-2.7.1-8.el7.x86_64 nspr-4.13.1-1.0.el7_3.x86_64 nss-3.28.4-8.el7.x86_64 nss-softokn-freebl-3.28.3-6.el7.x86_64 nss-util-3.28.4-3.el7.x86_64 numactl-libs-2.0.9-6.el7_2.x86_64 openldap-2.4.44-5.el7.x86_64 openssl-libs-1.0.2k-8.el7.x86_64 p11-kit-0.23.5-3.el7.x86_64 pcre-8.32-17.el7.x86_64 pixman-0.34.0-1.el7.x86_64 snappy-1.1.0-3.el7.x86_64 spice-server-0.12.8-2.el7.1.x86_64 
systemd-libs-219-42.el7.x86_64 usbredir-0.7.1-2.el7.x86_64 userspace-rcu-0.7.16-1.el7.x86_64 xz-libs-5.2.2-1.el7.x86_64 zlib-1.2.7-17.el7.x86_64
(gdb) bt
#0  0x000055ae1278a86a in aio_co_schedule (ctx=0x55ae15243980, co=0x0) at util/async.c:441
#1  0x000055ae126ccdad in bdrv_attach_aio_context (bs=0x55ae15732000, 
    new_context=new_context@entry=0x55ae15243980) at block.c:4547
#2  0x000055ae126ccd8b in bdrv_attach_aio_context (bs=bs@entry=0x55ae153b4800, 
    new_context=new_context@entry=0x55ae15243980) at block.c:4544
#3  0x000055ae126cce89 in bdrv_set_aio_context (bs=bs@entry=0x55ae153b4800, 
    new_context=new_context@entry=0x55ae15243980) at block.c:4580
#4  0x000055ae1270a2ec in blk_set_aio_context (blk=0x55ae1523c780, new_context=0x55ae15243980)
    at block/block-backend.c:1769
#5  0x000055ae124d9f17 in virtio_blk_data_plane_stop (vdev=<optimized out>)
    at /usr/src/debug/qemu-2.10.0/hw/block/dataplane/virtio-blk.c:262
#6  0x000055ae1266cc75 in virtio_bus_stop_ioeventfd (bus=0x55ae1769c3a8) at hw/virtio/virtio-bus.c:246
#7  0x000055ae124fecf4 in virtio_vmstate_change (opaque=0x55ae1769c420, running=<optimized out>, 
    state=<optimized out>) at /usr/src/debug/qemu-2.10.0/hw/virtio/virtio.c:2230
#8  0x000055ae1258a952 in vm_state_notify (running=running@entry=0, 
    state=state@entry=RUN_STATE_IO_ERROR) at vl.c:1603
#9  0x000055ae124ad4ba in do_vm_stop (state=RUN_STATE_IO_ERROR) at /usr/src/debug/qemu-2.10.0/cpus.c:941
#10 vm_stop (state=RUN_STATE_IO_ERROR) at /usr/src/debug/qemu-2.10.0/cpus.c:1807
#11 0x000055ae12472924 in main_loop_should_exit () at vl.c:1903
#12 main_loop () at vl.c:1921
#13 main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at vl.c:4804
(gdb) 
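With debuginfo installed, frame #0 shows that aio_co_schedule was called with co=0x0, that is, a NULL coroutine pointer. The sketch below (not part of the original report; `null_pointer_args` is a hypothetical helper) shows one way to flag such arguments automatically in a gdb backtrace, including frames that gdb wraps across two lines.

```python
import re

# First frames of the backtrace above, including gdb's wrapped continuation lines.
BACKTRACE = """\
#0  0x000055ae1278a86a in aio_co_schedule (ctx=0x55ae15243980, co=0x0) at util/async.c:441
#1  0x000055ae126ccdad in bdrv_attach_aio_context (bs=0x55ae15732000,
    new_context=new_context@entry=0x55ae15243980) at block.c:4547
#2  0x000055ae126ccd8b in bdrv_attach_aio_context (bs=bs@entry=0x55ae153b4800,
    new_context=new_context@entry=0x55ae15243980) at block.c:4544
"""


def null_pointer_args(backtrace):
    """Return (function, argument) pairs whose value is exactly 0x0."""
    frames = []
    for line in backtrace.splitlines():
        if line.startswith("#"):
            frames.append(line.strip())
        elif frames:
            # Continuation of the previous frame (gdb wraps long argument lists).
            frames[-1] += " " + line.strip()
    hits = []
    for frame in frames:
        func = re.match(r"#\d+\s+(?:0x[0-9a-f]+ in )?(\w+) \(", frame)
        if not func:
            continue
        # "=0x0" not followed by another hex digit, so real addresses are skipped.
        for arg in re.findall(r"(\w+)=0x0(?![0-9a-f])", frame):
            hits.append((func.group(1), arg))
    return hits


if __name__ == "__main__":
    for function, argument in null_pointer_args(BACKTRACE):
        print(f"{function}: {argument} is NULL")
```

This only highlights what the backtrace already shows. It does not explain why the coroutine pointer is NULL at the point where the AioContext is attached.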

Core dump file link:
http://fileshare.englab.nay.redhat.com/pub/section2/coredump/bug_1478227/core.1788
Comment 6 Yongxue Hong 2017-12-08 02:31:34 EST
It is also reproduced with 4.14.0-15.el7a.ppc64le on P9.
