Bug 1493470

Summary: qemu-kvm core dumped if running stress-ng test inside guest and manually quit qemu after guest crashed
Product: Red Hat Enterprise Linux 7
Component: qemu-kvm-rhev
Version: 7.4-Alt
Hardware: ppc64le
OS: Linux
Status: CLOSED CURRENTRELEASE
Severity: high
Priority: high
Target Milestone: rc
Reporter: yilzhang
Assignee: David Gibson <dgibson>
QA Contact: yilzhang
CC: knoel, michen, qzhang, rbalakri, virt-maint, yilzhang
Last Closed: 2017-11-15 05:19:16 UTC
Type: Bug

Description yilzhang 2017-09-20 09:31:24 UTC
Description of problem:
Boot a guest with -smp 64, then run a stress-ng test inside the guest. After a while the guest crashes and cannot boot up normally; quitting qemu-kvm by typing "q" in the HMP then makes qemu-kvm dump core.


Version-Release number of selected component (if applicable):
Host kernel: 4.11.0-33.el7a.ppc64le
qemu-kvm:    qemu-kvm-2.9.0-22.el7a
Guest kernel: 4.11.0-32.el7a.ppc64le

Host is IBM power9 with DD2 processors:
[root@virt4 ~]# cat /proc/cpuinfo | tail
processor	: 159
cpu		: POWER9 (raw), altivec supported
clock		: 3200.000000MHz
revision	: 2.0 (pvr 004e 1200)

timebase	: 512000000
platform	: PowerNV
model		: 8375-42A
machine		: PowerNV 8375-42A
firmware	: OPAL
[root@virt4 ~]# free -h
              total        used        free      shared  buff/cache   available
Mem:            30G        5.7G         21G        115M        3.3G         21G
Swap:           15G        412M         15G


How reproducible: 1/6


Steps to Reproduce:
1. Boot up a guest with -smp 64
Qemu cli:
/usr/libexec/qemu-kvm \
-name yilzhang-VM \
-smp 64,sockets=16,cores=4,threads=1 -m 23G \
-machine pseries,accel=kvm \
-nodefaults \
-rtc base=localtime,clock=host \
-chardev socket,id=chardev0,path=/tmp/serial-path,server,nowait \
-device spapr-vty,chardev=chardev0 \
\
-boot menu=on \
-monitor stdio \
-qmp tcp:0:7777,server,nowait \
-vnc :10 \
\
-drive file=rhel7.4-alt-20170906.2.qcow2,media=disk,if=none,cache=none,id=drive_sysdisk,aio=native,format=qcow2,werror=stop,rerror=stop \
-device virtio-blk-pci,drive=drive_sysdisk,bus=pci.0,id=sysdisk,bootindex=0 \
\
-netdev tap,id=net0,script=/etc/qemu-ifup,downscript=/etc/qemu-ifdown,vhost=on \
-device virtio-net-pci,netdev=net0,id=nic0,mac=52:54:00:c3:e7:8a,bus=pci.0,addr=0x1e
2. After the guest is up, run "stress-ng -a 0" inside the guest
3. After a short while, the guest crashes. Please refer to https://bugzilla.redhat.com/show_bug.cgi?id=1490282#c22
4. Type "q" in the HMP to quit the qemu-kvm process



Actual results:
qemu-kvm core dumped
(qemu) info status
VM status: running
(qemu) q
qemu-kvm: Guest says index 49152 is available
numa-mrema.sh: line 20: 14049 Segmentation fault      (core dumped) /usr/libexec/qemu-kvm -name yilzhang-VM -smp 64,sockets=16,cores=4,threads=1 -m 23G -machine pseries,accel=kvm -nodefaults -rtc base=localtime,clock=host -chardev socket,id=chardev0,path=/tmp/serial-path,server,nowait -device spapr-vty,chardev=chardev0 -boot menu=on -monitor stdio -qmp tcp:0:7777,server,nowait -vnc :10 -drive file=rhel7.4-alt-20170906.2.qcow2,media=disk,if=none,cache=none,id=drive_sysdisk,aio=native,format=qcow2,werror=stop,rerror=stop -device virtio-blk-pci,drive=drive_sysdisk,bus=pci.0,id=sysdisk,bootindex=0 -netdev tap,id=net0,script=/etc/qemu-ifup,downscript=/etc/qemu-ifdown,vhost=on -device virtio-net-pci,netdev=net0,id=nic0,mac=52:54:00:c3:e7:8a,bus=pci.0,addr=0x1e
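
For reference, the "Guest says index %u is available" message comes from virtqueue_get_head() (frame #12 in the backtrace below): qemu read an avail-ring entry that is out of range for the vring and marked the device broken. Roughly paraphrasing the qemu-2.9 code cited in the backtrace (hw/virtio/virtio.c:545); details approximate:

    static int virtqueue_get_head(VirtQueue *vq, unsigned int idx,
                                  unsigned int *head)
    {
        /* Read the descriptor index the guest published in the avail ring. */
        *head = vring_avail_ring(vq, idx % vq->vring.num);

        /* An index >= the ring size (49152 here) means the ring is corrupt;
         * virtio_error() marks the device broken and kicks off the vhost
         * teardown path seen in the backtrace. */
        if (*head >= vq->vring.num) {
            virtio_error(vq->vdev, "Guest says index %u is available", *head);
            return 0;
        }
        return 1;
    }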

Expected results:
qemu-kvm quits without a segmentation fault


Additional info:
gdb /usr/libexec/qemu-kvm  core.30132 
GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-100.el7
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "ppc64le-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /usr/libexec/qemu-kvm...Reading symbols from /usr/lib/debug/usr/libexec/qemu-kvm.debug...done.
done.
[New LWP 30132]
[New LWP 30214]
[New LWP 30180]
[New LWP 30170]
[New LWP 30163]
[New LWP 30167]
[New LWP 30188]
[New LWP 31062]
[New LWP 30164]
[New LWP 30204]
[New LWP 30182]
[New LWP 30185]
[New LWP 30172]
[New LWP 30215]
[New LWP 30158]
[New LWP 30211]
[New LWP 30181]
[New LWP 30205]
[New LWP 30150]
[New LWP 30195]
[New LWP 30157]
[New LWP 30161]
[New LWP 30154]
[New LWP 30198]
[New LWP 30184]
[New LWP 30186]
[New LWP 30156]
[New LWP 30210]
[New LWP 30207]
[New LWP 30200]
[New LWP 30177]
[New LWP 30202]
[New LWP 30174]
[New LWP 30223]
[New LWP 30179]
[New LWP 30159]
[New LWP 30196]
[New LWP 30192]
[New LWP 30166]
[New LWP 30162]
[New LWP 30160]
[New LWP 30183]
[New LWP 30178]
[New LWP 30176]
[New LWP 30155]
[New LWP 30212]
[New LWP 30149]
[New LWP 30191]
[New LWP 30190]
[New LWP 30187]
[New LWP 30199]
[New LWP 30173]
[New LWP 30213]
[New LWP 30189]
[New LWP 30165]
[New LWP 30169]
[New LWP 30193]
[New LWP 30171]
[New LWP 30197]
[New LWP 30151]
[New LWP 30206]
[New LWP 30152]
[New LWP 30153]
[New LWP 30168]
[New LWP 30201]
[New LWP 30194]
[New LWP 30133]
[New LWP 30175]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `/usr/libexec/qemu-kvm -name yilzhang-VM -smp 64,sockets=16,cores=4,threads=1 -m'.
Program terminated with signal 11, Segmentation fault.
#0  virtio_has_feature (fbit=33, features=<error reading variable: Cannot access memory at address 0x88>) at /usr/src/debug/qemu-2.9.0/include/hw/virtio/virtio.h:310
310	    return !!(features & (1ULL << fbit));
Missing separate debuginfos, use: debuginfo-install bzip2-libs-1.0.6-13.el7.ppc64le cyrus-sasl-gssapi-2.1.26-21.el7.ppc64le cyrus-sasl-lib-2.1.26-21.el7.ppc64le elfutils-libelf-0.168-8.el7.ppc64le elfutils-libs-0.168-8.el7.ppc64le glib2-2.50.3-3.el7.ppc64le glibc-2.17-196.el7.ppc64le gmp-6.0.0-15.el7.ppc64le gnutls-3.3.26-9.el7.ppc64le gperftools-libs-2.4-8.el7.ppc64le keyutils-libs-1.5.8-3.el7.ppc64le krb5-libs-1.15.1-8.el7.ppc64le libaio-0.3.109-13.el7.ppc64le libattr-2.4.46-12.el7.ppc64le libcap-2.22-9.el7.ppc64le libcom_err-1.42.9-10.el7.ppc64le libcurl-7.29.0-42.el7.ppc64le libdb-5.3.21-20.el7.ppc64le libfdt-1.4.3-1.el7.ppc64le libffi-3.0.13-18.el7.ppc64le libgcc-4.8.5-16.el7.ppc64le libgcrypt-1.5.3-14.el7.ppc64le libgpg-error-1.12-3.el7.ppc64le libibverbs-14-2.el7a.ppc64le libidn-1.28-4.el7.ppc64le libiscsi-1.9.0-7.el7.ppc64le libnl3-3.2.28-4.el7.ppc64le libpng-1.5.13-7.el7_2.ppc64le librdmacm-14-2.el7a.ppc64le libseccomp-2.3.1-3.el7.ppc64le libselinux-2.5-11.el7.ppc64le libssh2-1.4.3-10.el7_2.1.ppc64le libstdc++-4.8.5-16.el7.ppc64le libtasn1-4.10-1.el7.ppc64le libusbx-1.0.20-1.el7.ppc64le lzo-2.06-8.el7.ppc64le nettle-2.7.1-8.el7.ppc64le nspr-4.13.1-1.0.el7_3.ppc64le nss-3.28.4-12.el7_4.ppc64le nss-softokn-freebl-3.28.3-8.el7_4.ppc64le nss-util-3.28.4-3.el7.ppc64le numactl-libs-2.0.9-6.el7_2.ppc64le openldap-2.4.44-5.el7.ppc64le openssl-libs-1.0.2k-8.el7.ppc64le p11-kit-0.23.5-3.el7.ppc64le pcre-8.32-17.el7.ppc64le pixman-0.34.0-1.el7.ppc64le snappy-1.1.0-3.el7.ppc64le systemd-libs-219-42.el7_4.1.ppc64le xz-libs-5.2.2-1.el7.ppc64le zlib-1.2.7-17.el7.ppc64le
(gdb) bt
#0  virtio_has_feature (fbit=33, features=<error reading variable: Cannot access memory at address 0x88>) at /usr/src/debug/qemu-2.9.0/include/hw/virtio/virtio.h:310
#1  virtio_host_has_feature (vdev=0x0, fbit=33) at /usr/src/debug/qemu-2.9.0/include/hw/virtio/virtio.h:322
#2  vhost_dev_has_iommu (dev=<optimized out>) at /usr/src/debug/qemu-2.9.0/hw/virtio/vhost.c:429
#3  vhost_memory_unmap (access_len=2052, is_write=1, len=2052, buffer=0x3ff9ea371280, dev=0x44920340) at /usr/src/debug/qemu-2.9.0/hw/virtio/vhost.c:446
#4  vhost_virtqueue_stop (dev=0x44920340, vdev=0x46784510, vq=0x44920568, idx=0) at /usr/src/debug/qemu-2.9.0/hw/virtio/vhost.c:1155
#5  0x0000000028788034 in vhost_dev_stop (hdev=0x44920340, vdev=0x46784510) at /usr/src/debug/qemu-2.9.0/hw/virtio/vhost.c:1590
#6  0x0000000028767244 in vhost_net_stop_one (net=0x44920340, dev=0x46784510) at /usr/src/debug/qemu-2.9.0/hw/net/vhost_net.c:292
#7  0x00000000287679b4 in vhost_net_stop (dev=0x46784510, ncs=<optimized out>, total_queues=<optimized out>) at /usr/src/debug/qemu-2.9.0/hw/net/vhost_net.c:372
#8  0x0000000028766124 in virtio_net_vhost_status (status=79 'O', n=0x46784510) at /usr/src/debug/qemu-2.9.0/hw/net/virtio-net.c:176
#9  virtio_net_set_status (vdev=0x46784510, status=<optimized out>) at /usr/src/debug/qemu-2.9.0/hw/net/virtio-net.c:250
#10 0x000000002877db10 in virtio_set_status (vdev=0x46784510, val=<optimized out>) at /usr/src/debug/qemu-2.9.0/hw/virtio/virtio.c:1147
#11 0x000000002877fe7c in virtio_error (vdev=0x46784510, fmt=0x28af5d28 "Guest says index %u is available") at /usr/src/debug/qemu-2.9.0/hw/virtio/virtio.c:2475
#12 0x0000000028780de0 in virtqueue_get_head (vq=0x46820080, idx=<optimized out>, head=0x3fffdbd3d728) at /usr/src/debug/qemu-2.9.0/hw/virtio/virtio.c:545
#13 0x0000000028780f0c in virtqueue_drop_all (vq=0x46820080) at /usr/src/debug/qemu-2.9.0/hw/virtio/virtio.c:985
#14 0x000000002876264c in virtio_net_drop_tx_queue_data (vdev=0x46784510, vq=<optimized out>) at /usr/src/debug/qemu-2.9.0/hw/net/virtio-net.c:236
#15 0x000000002877b43c in virtio_queue_notify_vq (vq=0x46820080) at /usr/src/debug/qemu-2.9.0/hw/virtio/virtio.c:1526
#16 virtio_queue_host_notifier_read (n=0x468200e8) at /usr/src/debug/qemu-2.9.0/hw/virtio/virtio.c:2449
#17 0x00000000289461dc in virtio_bus_set_host_notifier (bus=<optimized out>, n=<optimized out>, assign=<optimized out>) at hw/virtio/virtio-bus.c:297
#18 0x00000000287871c4 in vhost_dev_disable_notifiers (hdev=0x44920340, vdev=0x46784510) at /usr/src/debug/qemu-2.9.0/hw/virtio/vhost.c:1427
#19 0x0000000028767254 in vhost_net_stop_one (net=0x44920340, dev=0x46784510) at /usr/src/debug/qemu-2.9.0/hw/net/vhost_net.c:293
#20 0x00000000287679b4 in vhost_net_stop (dev=0x46784510, ncs=<optimized out>, total_queues=<optimized out>) at /usr/src/debug/qemu-2.9.0/hw/net/vhost_net.c:372
#21 0x0000000028766124 in virtio_net_vhost_status (status=15 '\017', n=0x46784510) at /usr/src/debug/qemu-2.9.0/hw/net/virtio-net.c:176
#22 virtio_net_set_status (vdev=0x46784510, status=<optimized out>) at /usr/src/debug/qemu-2.9.0/hw/net/virtio-net.c:250
#23 0x0000000028961658 in qemu_del_net_client (nc=0x449f0000) at net/net.c:390
#24 0x0000000028963044 in net_cleanup () at net/net.c:1456
#25 0x00000000286aa108 in main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at vl.c:4742
(gdb) bt full
#0  virtio_has_feature (fbit=33, features=<error reading variable: Cannot access memory at address 0x88>) at /usr/src/debug/qemu-2.9.0/include/hw/virtio/virtio.h:310
No locals.
#1  virtio_host_has_feature (vdev=0x0, fbit=33) at /usr/src/debug/qemu-2.9.0/include/hw/virtio/virtio.h:322
No locals.
#2  vhost_dev_has_iommu (dev=<optimized out>) at /usr/src/debug/qemu-2.9.0/hw/virtio/vhost.c:429
        vdev = 0x0
#3  vhost_memory_unmap (access_len=2052, is_write=1, len=2052, buffer=0x3ff9ea371280, dev=0x44920340) at /usr/src/debug/qemu-2.9.0/hw/virtio/vhost.c:446
No locals.
#4  vhost_virtqueue_stop (dev=0x44920340, vdev=0x46784510, vq=0x44920568, idx=0) at /usr/src/debug/qemu-2.9.0/hw/virtio/vhost.c:1155
        vhost_vq_index = 0
        state = {index = 0, num = 0}
        r = <optimized out>
#5  0x0000000028788034 in vhost_dev_stop (hdev=0x44920340, vdev=0x46784510) at /usr/src/debug/qemu-2.9.0/hw/virtio/vhost.c:1590
        i = 0
        __PRETTY_FUNCTION__ = "vhost_dev_stop"
#6  0x0000000028767244 in vhost_net_stop_one (net=0x44920340, dev=0x46784510) at /usr/src/debug/qemu-2.9.0/hw/net/vhost_net.c:292
        file = {index = 2, fd = -1}
#7  0x00000000287679b4 in vhost_net_stop (dev=0x46784510, ncs=<optimized out>, total_queues=<optimized out>) at /usr/src/debug/qemu-2.9.0/hw/net/vhost_net.c:372
        qbus = 0x46784498
        __func__ = "vhost_net_stop"
        vbus = <optimized out>
        k = 0x448303c0
        i = <optimized out>
        r = <optimized out>
        __PRETTY_FUNCTION__ = "vhost_net_stop"
#8  0x0000000028766124 in virtio_net_vhost_status (status=79 'O', n=0x46784510) at /usr/src/debug/qemu-2.9.0/hw/net/virtio-net.c:176
        vdev = 0x46784510
        nc = 0x44840360
        queues = 1
#9  virtio_net_set_status (vdev=0x46784510, status=<optimized out>) at /usr/src/debug/qemu-2.9.0/hw/net/virtio-net.c:250
        n = 0x46784510
        __func__ = "virtio_net_set_status"
        q = <optimized out>
        i = <optimized out>
        queue_status = <optimized out>
#10 0x000000002877db10 in virtio_set_status (vdev=0x46784510, val=<optimized out>) at /usr/src/debug/qemu-2.9.0/hw/virtio/virtio.c:1147
        k = 0x449a06e0
        __func__ = "virtio_set_status"
#11 0x000000002877fe7c in virtio_error (vdev=0x46784510, fmt=0x28af5d28 "Guest says index %u is available") at /usr/src/debug/qemu-2.9.0/hw/virtio/virtio.c:2475
        ap = 0x3fffdbd3d6b0 "\360\021"
#12 0x0000000028780de0 in virtqueue_get_head (vq=0x46820080, idx=<optimized out>, head=0x3fffdbd3d728) at /usr/src/debug/qemu-2.9.0/hw/virtio/virtio.c:545
No locals.
#13 0x0000000028780f0c in virtqueue_drop_all (vq=0x46820080) at /usr/src/debug/qemu-2.9.0/hw/virtio/virtio.c:985
        dropped = 150
        elem = {index = 4592, out_num = 0, in_num = 0, in_addr = 0x0, out_addr = 0x0, in_sg = 0x0, out_sg = 0x0}
        vdev = <optimized out>
        fEventIdx = <optimized out>
#14 0x000000002876264c in virtio_net_drop_tx_queue_data (vdev=0x46784510, vq=<optimized out>) at /usr/src/debug/qemu-2.9.0/hw/net/virtio-net.c:236
        dropped = <optimized out>
#15 0x000000002877b43c in virtio_queue_notify_vq (vq=0x46820080) at /usr/src/debug/qemu-2.9.0/hw/virtio/virtio.c:1526
        vdev = <optimized out>
---Type <return> to continue, or q <return> to quit---
#16 virtio_queue_host_notifier_read (n=0x468200e8) at /usr/src/debug/qemu-2.9.0/hw/virtio/virtio.c:2449
        vq = 0x46820080
#17 0x00000000289461dc in virtio_bus_set_host_notifier (bus=<optimized out>, n=<optimized out>, assign=<optimized out>) at hw/virtio/virtio-bus.c:297
        vdev = <optimized out>
        k = 0x448303c0
        __func__ = "virtio_bus_set_host_notifier"
        proxy = <optimized out>
        vq = <optimized out>
        notifier = 0x468200e8
        r = 0
#18 0x00000000287871c4 in vhost_dev_disable_notifiers (hdev=0x44920340, vdev=0x46784510) at /usr/src/debug/qemu-2.9.0/hw/virtio/vhost.c:1427
        qbus = <optimized out>
        __func__ = "vhost_dev_disable_notifiers"
        i = 1
        r = <optimized out>
        __PRETTY_FUNCTION__ = "vhost_dev_disable_notifiers"
#19 0x0000000028767254 in vhost_net_stop_one (net=0x44920340, dev=0x46784510) at /usr/src/debug/qemu-2.9.0/hw/net/vhost_net.c:293
        file = {index = 2, fd = -1}
#20 0x00000000287679b4 in vhost_net_stop (dev=0x46784510, ncs=<optimized out>, total_queues=<optimized out>) at /usr/src/debug/qemu-2.9.0/hw/net/vhost_net.c:372
        qbus = 0x46784498
        __func__ = "vhost_net_stop"
        vbus = <optimized out>
        k = 0x448303c0
        i = <optimized out>
        r = <optimized out>
        __PRETTY_FUNCTION__ = "vhost_net_stop"
#21 0x0000000028766124 in virtio_net_vhost_status (status=15 '\017', n=0x46784510) at /usr/src/debug/qemu-2.9.0/hw/net/virtio-net.c:176
        vdev = 0x46784510
        nc = 0x44840360
        queues = 1
#22 virtio_net_set_status (vdev=0x46784510, status=<optimized out>) at /usr/src/debug/qemu-2.9.0/hw/net/virtio-net.c:250
        n = 0x46784510
        __func__ = "virtio_net_set_status"
        q = <optimized out>
        i = <optimized out>
        queue_status = <optimized out>
#23 0x0000000028961658 in qemu_del_net_client (nc=0x449f0000) at net/net.c:390
        nic = <optimized out>
        ncs = {0x449f0000, 0x46d1d528, 0x46d1d500, 0x3fffdbd3db50, 0x3fff90aa9900, 0x28ad3fe0 <qemu_co_queue_run_restart+112>, 0x28ca7b00 <__compound_literal.4+2128>, 0x2, 0x46d1d500, 0x3fffdbd3db90, 
          0x3fffdbd3db60, 0x28ad3bb0 <qemu_aio_coroutine_enter+112>, 0x46d1f300, 0x46b00300, 0x46dfc359a1e0bf00, 0x46d1f328, 0x46d1f300, 0x3fffdbd3dbc0, 0x3fff90aa9900, 
          0x28ad3fe0 <qemu_co_queue_run_restart+112>, 0x28ca7b00 <__compound_literal.4+2128>, 0x2, 0x46d1f300, 0x3fffdbd3dc00, 0x3fffdbd3e890, 0x28ad3bb0 <qemu_aio_coroutine_enter+112>, 0x4485ed00, 0x46d1d500, 
          0x46dfc359a1e0bf00, 0x4485ed28, 0x4485ed00, 0x3fffdbd3dc30, 0x3fff90aa9900, 0x28ad3fe0 <qemu_co_queue_run_restart+112>, 0x28ca7b00 <__compound_literal.4+2128>, 0x2, 0x4485ed00, 0x3fffdbd3dc70, 
          0x3fff900d1ec8, 0x28ad3bb0 <qemu_aio_coroutine_enter+112>, 0x44851200, 0x46d1f300, 0x46dfc359a1e0bf00, 0x44851228, 0x44851200, 0x3fffdbd3dca0, 0x3fff90aa9900, 
          0x28ad3fe0 <qemu_co_queue_run_restart+112>, 0x28ca7b00 <__compound_literal.4+2128>, 0x2, 0x44851200, 0x3fffdbd3dce0, 0x3fffdbd3e390, 0x28ad3bb0 <qemu_aio_coroutine_enter+112>, 0x46d11b00, 0x4485ed00, 
          0x46dfc359a1e0bf00, 0x46d11b28, 0x46d11b00, 0x3fffdbd3dd10, 0x3fff90aa9900, 0x28ad3fe0 <qemu_co_queue_run_restart+112>, 0x28ca7b00 <__compound_literal.4+2128>, 0x2, 0x46d11b00, 0x3fffdbd3dd50, 
          0x3fffdbd3dd20, 0x28ad3bb0 <qemu_aio_coroutine_enter+112>, 0x4485e700, 0x44851200, 0x46dfc359a1e0bf00, 0x4485e728, 0x4485e700, 0x3fffdbd3dd80, 0x3fff90aa9900, 
          0x28ad3fe0 <qemu_co_queue_run_restart+112>, 0x28ca7b00 <__compound_literal.4+2128>, 0x2, 0x4485e700, 0x3fffdbd3ddc0, 0x27, 0x28ad3bb0 <qemu_aio_coroutine_enter+112>, 0x4485f600, 0x46d11b00, 
          0x46dfc359a1e0bf00, 0x4485f628, 0x4485f600, 0x3fffdbd3ddf0, 0x3fff90aa9900, 0x28ad3fe0 <qemu_co_queue_run_restart+112>, 0x28ca7b00 <__compound_literal.4+2128>, 0x2, 0x4485f600, 0x3fffdbd3de30, 
          0x3fff90aa9900, 0x28ad3bb0 <qemu_aio_coroutine_enter+112>, 0x4485de00, 0x4485e700, 0x46dfc359a1e0bf00, 0x4485de28, 0x4485de00, 0x3fffdbd3de60, 0x3fff90aa9900, 
          0x28ad3fe0 <qemu_co_queue_run_restart+112>, 0x28ca7b00 <__compound_literal.4+2128>, 0x2, 0x4485de00, 0x3fffdbd3dea0, 0x4485de00, 0x28ad3bb0 <qemu_aio_coroutine_enter+112>, 0x46d10600, 0x4485f600, 
          0x46dfc359a1e0bf00, 0x46d10628, 0x46d10600, 0x3fffdbd3ded0, 0x3fff90aa9900, 0x28ad3fe0 <qemu_co_queue_run_restart+112>, 0x28ca7b00 <__compound_literal.4+2128>, 0x2, 0x46d10600, 0x3fffdbd3df10, 
---Type <return> to continue, or q <return> to quit---
          0x3fffdbd3f780, 0x28ad3bb0 <qemu_aio_coroutine_enter+112>, 0x46d1d800, 0x4485de00, 0x46dfc359a1e0bf00, 0x46d1d828, 0x46d1d800, 0x3fffdbd3df40, 0x3fff90aa9900, 
          0x28ad3fe0 <qemu_co_queue_run_restart+112>, 0x28ca7b00 <__compound_literal.4+2128>, 0x1, 0x46d1d800, 0x3fffdbd3df80, 0x28ca7b00 <__compound_literal.4+2128>, 0x28ad3bb0 <qemu_aio_coroutine_enter+112>, 
          0x46d1b100, 0x28ad3d80 <qemu_coroutine_enter+32>, 0x46dfc359a1e0bf00, 0x46d1b128, 0x46d1b100, 0x3fffdbd3dfb0, 0x3fff90aa9900, 0x3fffdbd3dfe0, 0x28ca7b00 <__compound_literal.4+2128>, 0x3fffdbd3dfe0, 
          0x46d1b100, 0x3fffdbd3dff0, 0x0, 0x28b05b49, 0x3fffdbd3e860, 0x3fffdbd3e660, 0x3fffdbd3dfe0, 0x3fffdbd3e640, 0x4485e400, 0x3fff90917950 <vfprintf@@GLIBC_2.17+304>, 0x3fff90aa9900, 
          0x28ad3fe0 <qemu_co_queue_run_restart+112>, 0x28ca7b00 <__compound_literal.4+2128>, 0x1, 0x4485e400, 0x3fffdbd3e060, 0x11110000, 0x28ad3bb0 <qemu_aio_coroutine_enter+112>, 0x46d14b00, 
          0x28ad3d80 <qemu_coroutine_enter+32>, 0x46dfc359a1e0bf00, 0x46d14b28, 0x46d14b00, 0x3fffdbd3e090, 0x3fff90aa9900, 0x28ad3fe0 <qemu_co_queue_run_restart+112>, 0x28ca7b00 <__compound_literal.4+2128>, 
          0x1, 0x46d14b00, 0x3fffdbd3e0d0, 0x448b2400, 0x28ad3bb0 <qemu_aio_coroutine_enter+112>, 0x46d18d00, 0x28ad3d80 <qemu_coroutine_enter+32>, 0x46dfc359a1e0bf00, 0x46d18d28, 0x46d18d00, 0x3fffdbd3e100, 
          0x3fff90aa9900, 0x28ad3fe0 <qemu_co_queue_run_restart+112>, 0x28ca7b00 <__compound_literal.4+2128>, 0x3fffdbd3e130, 0x46d18d00, 0x3fffdbd3e140, 0x0, 0x3fffdbd3e160, 0x44831680, 0x3fffdbd3e160, 
          0x3fffdbd3e130, 0x3fffdbd3e160, 0x0, 0x28aef4e8...}
        queues = <optimized out>
        i = <optimized out>
        nf = <optimized out>
        next = <optimized out>
        __PRETTY_FUNCTION__ = "qemu_del_net_client"
#24 0x0000000028963044 in net_cleanup () at net/net.c:1456
        nc = <optimized out>
#25 0x00000000286aa108 in main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at vl.c:4742
        i = <optimized out>
        snapshot = <optimized out>
        linux_boot = <optimized out>
        initrd_filename = <optimized out>
        kernel_filename = <optimized out>
        kernel_cmdline = <optimized out>
        boot_order = <optimized out>
        boot_once = 0x0
        cyls = 0
        heads = 0
        secs = 0
        translation = -1073741824
        opts = <optimized out>
        machine_opts = <optimized out>
        hda_opts = <optimized out>
        icount_opts = <optimized out>
        accel_opts = <optimized out>
        olist = <optimized out>
        optind = 32
        optarg = 0x3fffdbd4f60b "virtio-net-pci,netdev=net0,id=nic0,mac=52:54:00:c3:e7:8a,bus=pci.0,addr=0x1e"
        loadvm = <optimized out>
        machine_class = 0x0
        cpu_model = <optimized out>
        vga_model = <optimized out>
        qtest_chrdev = <optimized out>
        qtest_log = <optimized out>
        pid_file = <optimized out>
        incoming = 0x0
        defconfig = <optimized out>
        userconfig = <optimized out>
        nographic = <optimized out>
        display_type = <optimized out>
        display_remote = <optimized out>
        log_mask = <optimized out>
---Type <return> to continue, or q <return> to quit---
        log_file = <optimized out>
        trace_file = 0x28affab8 ""
        maxram_size = <optimized out>
        ram_slots = <optimized out>
        vmstate_dump_file = 0x0
        main_loop_err = 0x0
        err = 0x0
        list_data_dirs = <optimized out>
        bdo_queue = {sqh_first = 0x0, sqh_last = 0x3fffdbd3fcd8}
        __func__ = "main"
        __FUNCTION__ = "main"
(gdb)
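
Reading the backtrace: vhost_net_stop() is entered twice. The outer invocation (frames #19-#24, triggered by net_cleanup() on quit) has already run vhost_dev_stop() (vhost_net.c:292) and moved on to vhost_dev_disable_notifiers() (frame #19, vhost_net.c:293); draining a pending host notifier there hits the corrupt ring index, and virtio_error() drives a second, nested vhost_net_stop() (frames #7-#16). The nested stop then runs vhost_dev_stop() again on the same vhost_dev, whose vdev pointer the outer stop has presumably already cleared -- frame #2 shows vdev = 0x0 -- so virtio_host_has_feature(vdev=0x0, fbit=33) reads the feature word through a NULL pointer, matching "Cannot access memory at address 0x88". A minimal sketch of a defensive guard at the crash site (illustration only; not necessarily the fix that shipped):

    static bool vhost_dev_has_iommu(struct vhost_dev *dev)
    {
        VirtIODevice *vdev = dev->vdev;

        /* Hypothetical guard: during a nested vhost_net_stop() the device
         * has already been torn down and dev->vdev is NULL (frame #2), so
         * there is no feature word to consult.  fbit 33 in the backtrace
         * is VIRTIO_F_IOMMU_PLATFORM. */
        if (!vdev) {
            return false;
        }
        return virtio_host_has_feature(vdev, VIRTIO_F_IOMMU_PLATFORM);
    }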

Comment 2 Karen Noel 2017-09-21 17:06:05 UTC
Move to qemu-kvm-rhev. The fix will apply to both RHEL KVM and qemu-kvm-rhev for RHV and RHOSP; both packages use the same code base.

Comment 3 yilzhang 2017-10-27 09:34:08 UTC
Retested this case against qemu-kvm-rhev on both Power8 and Power9; the result is:
In step 3, the guest does not crash, but instead hangs with many call traces printed on its console.

Version of components on Power8:
Host kernel:   3.10.0-747.el7.ppc64le
qemu-kvm-rhev: qemu-kvm-rhev-2.10.0-3.el7
Guest kernel:  3.10.0-747.el7.ppc64le
Host processor count: 185
MemTotal:       1067454016 kB

Version of components on Power9:
Host kernel:   4.11.0-42.el7a.ppc64le
qemu-kvm-rhev: qemu-kvm-rhev-2.10.0-3.el7
Guest kernel:  4.11.0-42.el7a.ppc64le
Host processor count: 176
MemTotal:       32232768 kB


Part of the call traces on Power8:
[ 2259.069411] SLUB: Unable to allocate memory on node -1 (gfp=0x20)
[ 2259.069548]   cache: nf_conntrack_c00000000120ab80, object size: 312, buffer size: 320, default order: 0, min order: 0
[ 2259.069764]   node 0: slabs: 734, objs: 149736, free: 0
[ 2259.292171] stress-ng-icmp-: page allocation failure: order:0, mode:0x200020
[ 2259.302360] CPU: 55 PID: 18479 Comm: stress-ng-icmp- Not tainted 3.10.0-747.el7.ppc64le #1
[ 2259.302534] Call Trace:
[ 2259.302605] [c0000001ba1e70e0] [c00000000001b340] show_stack+0x80/0x330 (unreliable)
[ 2259.302817] [c0000001ba1e7190] [c0000000009f5cf4] dump_stack+0x30/0x44
[ 2259.302995] [c0000001ba1e71b0] [c000000000264cdc] warn_alloc_failed+0x10c/0x160
[ 2259.303203] [c0000001ba1e7260] [c00000000026ba78] __alloc_pages_nodemask+0xb68/0xc70
[ 2259.303410] [c0000001ba1e7450] [c0000000002e2a60] alloc_pages_current+0x1f0/0x430
[ 2259.303627] [c0000001ba1e74d0] [c0000000002f1a8c] new_slab+0x67c/0x690
[ 2259.303820] [c0000001ba1e7530] [c0000000002f4da8] ___slab_alloc+0x538/0x680
[ 2259.304000] [c0000001ba1e7650] [c0000000009f1554] __slab_alloc+0x2c/0x70
[ 2259.304184] [c0000001ba1e7680] [c0000000002f4ff4] kmem_cache_alloc+0x104/0x2e0
[ 2259.304418] [c0000001ba1e76d0] [d000000006b43f40] init_conntrack+0x1e0/0x960 [nf_conntrack]
[ 2259.304633] [c0000001ba1e77b0] [d000000006b44e2c] nf_conntrack_in+0x76c/0x820 [nf_conntrack]
[ 2259.304844] [c0000001ba1e78b0] [d000000006ce0948] ipv4_conntrack_local+0x58/0x80 [nf_conntrack_ipv4]
[ 2259.305081] [c0000001ba1e78d0] [c0000000008916f8] nf_hook_slow+0xc8/0x1f0
[ 2259.305264] [c0000001ba1e7930] [c0000000008e3e00] raw_sendmsg+0x9e0/0xaa0
[ 2259.305447] [c0000001ba1e7af0] [c0000000008fcb7c] inet_sendmsg+0x7c/0x180
[ 2259.305626] [c0000001ba1e7b30] [c0000000008044dc] sock_sendmsg+0xec/0x140
[ 2259.305806] [c0000001ba1e7ca0] [c00000000080a7bc] SyS_sendto+0x15c/0x240
[ 2259.305986] [c0000001ba1e7dd0] [c00000000080be08] SyS_socketcall+0x2d8/0x430
[ 2259.306165] [c0000001ba1e7e30] [c00000000000a184] system_call+0x38/0xb4
[ 2259.306341] Mem-Info:
[ 2259.306429] active_anon:289035 inactive_anon:22264 isolated_anon:0
[ 2259.306429]  active_file:1801 inactive_file:2683 isolated_file:32
[ 2259.306429]  unevictable:606 dirty:3308 writeback:33 unstable:0
[ 2259.306429]  slab_reclaimable:2619 slab_unreclaimable:27867
[ 2259.306429]  mapped:4117 shmem:14694 pagetables:2707 bounce:0
[ 2259.306429]  free:107 free_pcp:0 free_cma:0
[ 2259.307154] Node 0 DMA free:6848kB min:19136kB low:23872kB high:28672kB active_anon:18498240kB inactive_anon:1424896kB active_file:115264kB inactive_file:171712kB unevictable:38784kB isolated(anon):0kB isolated(file):2048kB present:24117248kB managed:23012416kB mlocked:38784kB dirty:211712kB writeback:2112kB mapped:263488kB shmem:940416kB slab_reclaimable:167616kB slab_unreclaimable:1783488kB kernel_stack:197936kB pagetables:173248kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:5835004 all_unreclaimable? no
[ 2259.308226] lowmem_reserve[]: 0 0 0
[ 2259.308407] Node 0 DMA: 108*64kB (M) 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB 0*8192kB 0*16384kB = 6912kB
[ 2259.308835] 19298 total pagecache pages
[ 2259.308927] 16 pages in swap cache
[ 2259.309016] Swap cache stats: add 36879, delete 36851, find 1387/1665
[ 2259.309165] Free swap  = 0kB
[ 2259.309254] Total swap = 2112832kB


Part of the call traces on Power9:
[ 5606.871593] Free swap  = 0kB
[ 5606.871616] Total swap = 2143552kB
[ 5606.871653] 376832 pages RAM
[ 5606.871668] 0 pages HighMem/MovableOnly
[ 5606.871711] 17475 pages reserved
[ 5606.871747] 0 pages cma reserved
[ 5606.871770] 2 pages hwpoisoned
[ 5612.874790] stress-ng-resou invoked oom-killer: gfp_mask=0x14200ca(GFP_HIGHUSER_MOVABLE), nodemask=(null),  order=0, oom_score_adj=1000
[ 5612.874797] stress-ng-resou cpuset=/ mems_allowed=0
[ 5612.874806] CPU: 30 PID: 14304 Comm: stress-ng-resou Not tainted 4.11.0-42.el7a.ppc64le #1
[ 5612.874807] Call Trace:
[ 5612.874815] [c0000001a25e7510] [c000000000c09008] dump_stack+0xb0/0xf0 (unreliable)
[ 5612.874819] [c0000001a25e7550] [c000000000c00768] dump_header+0xd4/0x284
[ 5612.874824] [c0000001a25e7630] [c00000000030618c] oom_kill_process+0x49c/0x7b0
[ 5612.874828] [c0000001a25e76f0] [c0000000003070f4] out_of_memory+0x8e4/0x930
[ 5612.874831] [c0000001a25e7790] [c00000000030fe74] __alloc_pages_nodemask+0xf54/0x1000
[ 5612.874835] [c0000001a25e7980] [c0000000003ba804] alloc_pages_vma+0x584/0x6e0
[ 5612.874839] [c0000001a25e7a40] [c00000000039b008] __read_swap_cache_async+0x1f8/0x2f0
[ 5612.874843] [c0000001a25e7ac0] [c00000000039b6fc] swapin_readahead+0x31c/0x5a0
[ 5612.874847] [c0000001a25e7bb0] [c00000000036a268] do_swap_page+0x608/0xad0
[ 5612.874850] [c0000001a25e7c30] [c00000000036f708] __handle_mm_fault+0x9c8/0x1100
[ 5612.874853] [c0000001a25e7d30] [c00000000036ff68] handle_mm_fault+0x128/0x210
[ 5612.874857] [c0000001a25e7d70] [c000000000072874] do_page_fault+0x5b4/0x850
[ 5612.874861] [c0000001a25e7e30] [c00000000000a3dc] handle_page_fault+0x18/0x38
[ 5612.874863] Mem-Info:
[ 5612.874873] active_anon:246805 inactive_anon:21374 isolated_anon:0
 active_file:499 inactive_file:338 isolated_file:32
 unevictable:4692 dirty:480 writeback:0 unstable:0
 slab_reclaimable:11635 slab_unreclaimable:47150
 mapped:10126 shmem:55060 pagetables:5404 bounce:0
 free:3023 free_pcp:74 free_cma:0

Comment 4 David Gibson 2017-11-13 06:13:27 UTC
stress-ng is deliberately designed to allocate memory until it no longer can.  The call traces are memory allocation failures caused by stress-ng, as expected.

Unless something else is going wrong, this doesn't appear to be a bug.
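
As an aside, comparable memory pressure can be generated in a more bounded way by limiting the stressor set and runtime instead of running every stressor until allocation fails. An illustrative invocation (flags per stress-ng's documented options; the parameters are arbitrary):

    # Only VM stressors, capped near 80% of RAM, stopping after 60 seconds,
    # rather than "stress-ng -a 0", which runs all stressors with no timeout.
    stress-ng --vm 4 --vm-bytes 80% --timeout 60s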

Comment 5 David Gibson 2017-11-15 05:19:16 UTC
Spoke with yilzhang.  I suspect the problem in comment 3 is just allocation failures from the stress test, with the guest appearing unresponsive because it is heavily loaded by the stress.

In any case, the original qemu SEGV is no longer reproducible, so if a problem remains it can be filed as a separate bug.