Bug 1432382

Summary: Hot-unplug "device_del dimm1" induce qemu-kvm coredump (hotplug at guest boot up stage)
Product: Red Hat Enterprise Linux 7
Reporter: Min Deng <mdeng>
Component: qemu-kvm-rhev
Assignee: Laurent Vivier <lvivier>
Status: CLOSED ERRATA
QA Contact: Min Deng <mdeng>
Severity: high
Docs Contact:
Priority: high
Version: 7.4
CC: dgibson, hannsj_uhl, knoel, lvivier, mdeng, michen, mrezanin, qzhang, virt-maint, yuhuang, zhengtli
Target Milestone: rc
Target Release: ---
Hardware: ppc64le
OS: Unspecified
Whiteboard:
Fixed In Version: qemu-kvm-rhev-2.9.0-1.el7
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2017-08-02 03:39:56 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1448344

Description Min Deng 2017-03-15 09:35:34 UTC
Description of problem:
Hot-unplugging a DIMM with "device_del dimm1" makes qemu-kvm dump core when the DIMM was hotplugged while the guest was still booting.
Version-Release number of selected component (if applicable):
ppc64le
kernel-3.10.0-600.el7.ppc64le
qemu-kvm-rhev-2.8.0-6.el7.ppc64le
SLOF-20160223-6.gitdbbfda4.el7.noarch
How reproducible:
2/3
Steps to Reproduce:
1. Boot the guest with the following command line:
  /usr/libexec/qemu-kvm -name virt-tests-vm1 -sandbox off -machine pseries-rhel7.4.0 -nodefaults -vga std -chardev socket,id=hmp_id_humanmonitor1,path=/tmp/monitor-humanmonitor1-20151207-185515-CKlGrjUv,server,nowait -mon chardev=hmp_id_humanmonitor1,mode=readline -chardev socket,id=qmp_id_qmp1,path=/tmp/monitor-qmp1-20151207-185515-CKlGrjUv,server,nowait -mon chardev=qmp_id_qmp1,mode=control -chardev socket,id=hmp_id_catch_monitor,path=/tmp/monitor-catch_monitor-20151207-185515-CKlGrjUv,server,nowait -mon chardev=hmp_id_catch_monitor,mode=readline -chardev socket,id=serial_id_serial0,path=/tmp/serial-serial0-20151207-185515-CKlGrjUv,server,nowait -device virtio-scsi-pci,id=virtio_scsi_pci0,bus=pci.0,addr=03,disable-legacy=off,disable-modern=on -drive id=drive_image1,if=none,cache=none,snapshot=off,aio=native,format=qcow2,file=rhel74-ppc64le-virtio-scsi-latest.qcow2 -device scsi-hd,id=image1,drive=drive_image1 -numa node -qmp tcp:0:4444,server,nowait -vnc :1 -rtc base=utc,clock=host,driftfix=slew -boot order=cdn,once=c,menu=off,strict=off -enable-kvm -monitor stdio -device pci-ohci,id=usb1 -device usb-kbd,id=input0 -device usb-mouse,id=input1 -device usb-tablet,id=input2 -netdev tap,script=/etc/qemu-ifup,downscript=/etc/qemu-down,id=hostnet1,vhost=on -device virtio-net-pci,netdev=hostnet1,id=net1,mac=00:52:11:36:3f:00  -m 4G,slots=4,maxmem=8G -numa node
2. Hotplug memory while the guest is still booting. *This timing is required.*
  (qemu) object_add memory-backend-ram,id=mem1,size=1G
  (qemu) device_add pc-dimm,id=dimm1,memdev=mem1

3. Then try to unplug it:
  (qemu) device_del dimm1
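
For scripted reproduction, steps 2 and 3 can also be driven through the QMP socket that the command line in step 1 opens with "-qmp tcp:0:4444,server,nowait". The following is only a sketch and not part of the original report: the helper names (hotplug_commands, unplug_commands, send) are hypothetical, the nested "props" form of object-add is the one used by QMP in the QEMU 2.8/2.9 era (newer releases take the properties at the top level), and the reply handling simply skips asynchronous events.

```python
import json
import socket

GIB = 1024 ** 3


def hotplug_commands(dimm_id="dimm1", mem_id="mem1", size=GIB):
    """QMP equivalents of the two monitor commands in step 2."""
    return [
        {"execute": "object-add",
         "arguments": {"qom-type": "memory-backend-ram", "id": mem_id,
                       "props": {"size": size}}},
        {"execute": "device_add",
         "arguments": {"driver": "pc-dimm", "id": dimm_id, "memdev": mem_id}},
    ]


def unplug_commands(dimm_id="dimm1"):
    """QMP equivalent of the monitor command in step 3."""
    return [{"execute": "device_del", "arguments": {"id": dimm_id}}]


def send(commands, host="127.0.0.1", port=4444):
    """Open a QMP session, negotiate capabilities, and run the commands."""
    replies = []
    with socket.create_connection((host, port)) as sock:
        chan = sock.makefile("rw", encoding="utf-8")
        chan.readline()  # discard the QMP greeting banner
        for cmd in [{"execute": "qmp_capabilities"}] + list(commands):
            chan.write(json.dumps(cmd) + "\n")
            chan.flush()
            while True:
                line = chan.readline()
                if not line:
                    raise ConnectionError("QMP connection closed")
                reply = json.loads(line)
                # Anything without "return"/"error" is an async event.
                if "return" in reply or "error" in reply:
                    replies.append(reply)
                    break
    return replies
```

To follow the timing in the steps above, call send(hotplug_commands()) while the guest is still booting and send(unplug_commands()) once it is up.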

Actual results:
(qemu) device_del dimm1
(qemu) qemu-kvm: used ring relocated for ring 2
qemu-kvm: /builddir/build/BUILD/qemu-2.8.0/hw/virtio/vhost.c:622: vhost_commit: Assertion `r >= 0' failed.

Program received signal SIGABRT, Aborted.
[Switching to Thread 0x3fffb3bbeab0 (LWP 48326)]
0x00003fffb6f3eb98 in raise () from /lib64/libc.so.6
Missing separate debuginfos, use: debuginfo-install alsa-lib-1.1.3-3.el7.ppc64le bzip2-libs-1.0.6-13.el7.ppc64le cyrus-sasl-lib-2.1.26-21.el7.ppc64le cyrus-sasl-md5-2.1.26-21.el7.ppc64le cyrus-sasl-plain-2.1.26-21.el7.ppc64le dbus-libs-1.6.12-17.el7.ppc64le elfutils-libelf-0.168-5.el7.ppc64le elfutils-libs-0.168-5.el7.ppc64le flac-libs-1.3.0-5.el7_1.ppc64le glib2-2.46.2-4.el7.ppc64le glibc-2.17-171.el7.ppc64le gmp-6.0.0-12.el7_1.ppc64le gnutls-3.3.26-6.el7.ppc64le gperftools-libs-2.4-8.el7.ppc64le gsm-1.0.13-11.el7.ppc64le keyutils-libs-1.5.8-3.el7.ppc64le krb5-libs-1.15-2.el7.ppc64le libICE-1.0.9-5.el7.ppc64le libSM-1.2.2-2.el7.ppc64le libX11-1.6.4-4.el7.ppc64le libXau-1.0.8-2.1.el7.ppc64le libXext-1.3.3-3.el7.ppc64le libXi-1.7.9-1.el7.ppc64le libXtst-1.2.3-1.el7.ppc64le libaio-0.3.109-13.el7.ppc64le libasyncns-0.8-7.el7.ppc64le libattr-2.4.46-12.el7.ppc64le libcap-2.22-9.el7.ppc64le libcom_err-1.42.9-9.el7.ppc64le libcurl-7.29.0-39.el7.ppc64le libdb-5.3.21-19.el7.ppc64le libfdt-1.4.0-2.el7.ppc64le libffi-3.0.13-18.el7.ppc64le libgcc-4.8.5-11.el7.ppc64le libgcrypt-1.5.3-14.el7.ppc64le libgpg-error-1.12-3.el7.ppc64le libibverbs-12-2.el7.ppc64le libidn-1.28-4.el7.ppc64le libiscsi-1.9.0-7.el7.ppc64le libnl3-3.2.28-3.el7_3.ppc64le libogg-1.3.0-7.el7.ppc64le libpng-1.5.13-7.el7_2.ppc64le librdmacm-12-2.el7.ppc64le libseccomp-2.3.1-2.el7.ppc64le libselinux-2.5-9.el7.ppc64le libsndfile-1.0.25-10.el7.ppc64le libssh2-1.4.3-10.el7_2.1.ppc64le libstdc++-4.8.5-11.el7.ppc64le libtasn1-4.10-1.el7.ppc64le libusbx-1.0.20-1.el7.ppc64le libuuid-2.23.2-33.el7.ppc64le libvorbis-1.3.3-8.el7.ppc64le libxcb-1.12-1.el7.ppc64le lzo-2.06-8.el7.ppc64le nettle-2.7.1-8.el7.ppc64le nspr-4.13.1-1.0.el7.ppc64le nss-3.28.3-2.el7.ppc64le nss-softokn-freebl-3.28.3-2.el7.ppc64le nss-util-3.28.3-2.el7.ppc64le numactl-libs-2.0.9-6.el7_2.ppc64le openldap-2.4.44-1.el7.ppc64le openssl-libs-1.0.2k-3.el7.ppc64le p11-kit-0.23.5-1.el7.ppc64le pcre-8.32-17.el7.ppc64le pixman-0.34.0-1.el7.ppc64le 
pulseaudio-libs-10.0-2.el7.ppc64le snappy-1.1.0-3.el7.ppc64le systemd-libs-219-32.el7.ppc64le tcp_wrappers-libs-7.6-77.el7.ppc64le xz-libs-5.2.2-1.el7.ppc64le zlib-1.2.7-17.el7.ppc64le
(gdb) bt
#0  0x00003fffb6f3eb98 in raise () from /lib64/libc.so.6
#1  0x00003fffb6f40d1c in abort () from /lib64/libc.so.6
#2  0x00003fffb6f34924 in __assert_fail_base () from /lib64/libc.so.6
#3  0x00003fffb6f34a14 in __assert_fail () from /lib64/libc.so.6
#4  0x0000000059ba1a58 in vhost_commit (listener=0x5adf0000) at /usr/src/debug/qemu-2.8.0/hw/virtio/vhost.c:622
#5  0x0000000059b4a658 in memory_region_transaction_commit () at /usr/src/debug/qemu-2.8.0/memory.c:929
#6  0x0000000059d19c4c in pc_dimm_memory_unplug (dev=0x5af60f30, hpms=0x5ae501e0, mr=0x5adb5b20) at hw/mem/pc-dimm.c:125
#7  0x0000000059bac870 in spapr_memory_unplug (errp=0x5a513e00 <error_abort>, dev=0x5af60f30, hotplug_dev=0x5ae50000) at /usr/src/debug/qemu-2.8.0/hw/ppc/spapr.c:2421
#8  spapr_machine_device_unplug (hotplug_dev=0x5ae50000, dev=0x5af60f30, errp=0x5a513e00 <error_abort>) at /usr/src/debug/qemu-2.8.0/hw/ppc/spapr.c:2523
#9  0x0000000059d08880 in hotplug_handler_unplug (plug_handler=0x5ae50000, plugged_dev=0x5af60f30, errp=0x5a513e00 <error_abort>) at hw/core/hotplug.c:56
#10 0x0000000059bac668 in spapr_lmb_release (dev=0x5af60f30, opaque=<optimized out>) at /usr/src/debug/qemu-2.8.0/hw/ppc/spapr.c:2381
#11 0x0000000059bc2d80 in detach (drc=0x5ae10600, d=<optimized out>, detach_cb=0x59bac610 <spapr_lmb_release>, detach_cb_opaque=0x5ad5fa88, errp=<optimized out>)
    at /usr/src/debug/qemu-2.8.0/hw/ppc/spapr_drc.c:442
#12 0x0000000059bc33b0 in set_allocation_state (drc=0x5ae10600, state=<optimized out>) at /usr/src/debug/qemu-2.8.0/hw/ppc/spapr_drc.c:145
#13 0x0000000059bba274 in rtas_set_indicator (cpu=<optimized out>, spapr=0x5ae50000, token=<optimized out>, nargs=<optimized out>, args=<optimized out>, nret=<optimized out>, 
    rets=<optimized out>) at /usr/src/debug/qemu-2.8.0/hw/ppc/spapr_rtas.c:459
#14 0x0000000059bbb3dc in spapr_rtas_call (cpu=<optimized out>, spapr=<optimized out>, token=<optimized out>, nargs=<optimized out>, args=<optimized out>, nret=<optimized out>, 
    rets=<optimized out>) at /usr/src/debug/qemu-2.8.0/hw/ppc/spapr_rtas.c:665
#15 0x0000000059bb6164 in h_rtas (cpu=0x5b480000, spapr=0x5ae50000, opcode=<optimized out>, args=<optimized out>) at /usr/src/debug/qemu-2.8.0/hw/ppc/spapr_hcall.c:666
#16 0x0000000059bb8738 in spapr_hypercall (cpu=0x5b480000, opcode=61440, args=0x3fffb33a0030) at /usr/src/debug/qemu-2.8.0/hw/ppc/spapr_hcall.c:1081
#17 0x0000000059c672b4 in kvm_arch_handle_exit (cs=0x5b480000, run=0x3fffb33a0000) at /usr/src/debug/qemu-2.8.0/target-ppc/kvm.c:1757
#18 0x0000000059b45458 in kvm_cpu_exec (cpu=0x5b480000) at /usr/src/debug/qemu-2.8.0/kvm-all.c:2038
#19 0x0000000059b2baf0 in qemu_kvm_cpu_thread_fn (arg=<optimized out>) at /usr/src/debug/qemu-2.8.0/cpus.c:998
#20 0x00003fffb70e8728 in start_thread () from /lib64/libpthread.so.0
#21 0x00003fffb701de50 in clone () from /lib64/libc.so.6


Expected results:
The hot-unplug operation completes successfully.

Additional info:

Comment 2 Min Deng 2017-03-15 09:38:34 UTC
QE will update this bug with the x86 test results as soon as they are available.

Comment 3 Min Deng 2017-03-15 09:55:02 UTC
Both the host and the guest run kernel-3.10.0-600.el7.ppc64le.

Comment 4 Yumei Huang 2017-03-16 03:09:09 UTC
Tested on an x86 host; could not reproduce.

qemu-kvm-rhev-2.8.0-6.el7
kernel-3.10.0-610.el7.x86_64

Comment 5 David Gibson 2017-03-16 04:32:13 UTC
This is a real bug.  However, I know this code has been changed in qemu-2.9, so I'm not sure there's much point debugging this in detail until we have the qemu-2.9 rebase.

Comment 6 David Gibson 2017-03-22 04:08:04 UTC
Are you able to try this with the preliminary qemu-2.9 based packages?

Comment 7 Min Deng 2017-03-23 05:39:37 UTC
Hi David,
   QE tried the preliminary qemu-2.9 build. Unfortunately, QE can still *reproduce* the issue. Thanks.
   Build info:
   kernel-3.10.0-600.el7.ppc64le (guest)
   qemu-kvm-rhev-2.9.0-0.el7.mrezanin201703210848.ppc64le
   SLOF-20160223-6.gitdbbfda4.el7.noarch
   For the detailed steps, please refer to comment 0.
Error messages:
(qemu) object_add memory-backend-ram,id=mem1,size=1G
(qemu) device_add pc-dimm,id=dimm1,memdev=mem1
(qemu) device_del dimm1
(qemu) qemu-kvm: used ring relocated for ring 2
qemu-kvm: /builddir/build/BUILD/qemu-2.9.0/hw/virtio/vhost.c:651: vhost_commit: Assertion `r >= 0' failed.

Program received signal SIGABRT, Aborted.
[Switching to Thread 0x3fffb3baeaa0 (LWP 11470)]
0x00003fffb6f2edc8 in raise () from /lib64/libc.so.6
(gdb) bt
#0  0x00003fffb6f2edc8 in raise () from /lib64/libc.so.6
#1  0x00003fffb6f30f4c in abort () from /lib64/libc.so.6
#2  0x00003fffb6f24b44 in __assert_fail_base () from /lib64/libc.so.6
#3  0x00003fffb6f24c34 in __assert_fail () from /lib64/libc.so.6
#4  0x0000000046f93838 in vhost_commit (listener=0x48340288) at /usr/src/debug/qemu-2.9.0/hw/virtio/vhost.c:651
#5  0x0000000046f3a628 in memory_region_transaction_commit () at /usr/src/debug/qemu-2.9.0/memory.c:931
#6  0x00000000471149dc in pc_dimm_memory_unplug (dev=0x482f1290, hpms=0x48350530, mr=0x481e43c0) at hw/mem/pc-dimm.c:125
#7  0x0000000046f9fc60 in spapr_memory_unplug (errp=0x479482a8 <error_abort>, dev=0x482f1290, hotplug_dev=0x48350340) at /usr/src/debug/qemu-2.9.0/hw/ppc/spapr.c:2606
#8  spapr_machine_device_unplug (hotplug_dev=0x48350340, dev=0x482f1290, errp=0x479482a8 <error_abort>) at /usr/src/debug/qemu-2.9.0/hw/ppc/spapr.c:2867
#9  0x0000000047102280 in hotplug_handler_unplug (plug_handler=0x48350340, plugged_dev=0x482f1290, errp=0x479482a8 <error_abort>) at hw/core/hotplug.c:56
#10 0x0000000046f9fa48 in spapr_lmb_release (dev=0x482f1290, opaque=<optimized out>) at /usr/src/debug/qemu-2.9.0/hw/ppc/spapr.c:2566
#11 0x0000000046fb6ca0 in detach (drc=0x48240840, d=<optimized out>, detach_cb=0x46f9f9f0 <spapr_lmb_release>, detach_cb_opaque=0x48c82610, errp=<optimized out>)
    at /usr/src/debug/qemu-2.9.0/hw/ppc/spapr_drc.c:447
#12 0x0000000046fb72d0 in set_allocation_state (drc=0x48240840, state=<optimized out>) at /usr/src/debug/qemu-2.9.0/hw/ppc/spapr_drc.c:145
#13 0x0000000046fadf54 in rtas_set_indicator (cpu=<optimized out>, spapr=0x48350340, token=<optimized out>, nargs=<optimized out>, args=<optimized out>, nret=<optimized out>, 
    rets=<optimized out>) at /usr/src/debug/qemu-2.9.0/hw/ppc/spapr_rtas.c:460
#14 0x0000000046faf0bc in spapr_rtas_call (cpu=<optimized out>, spapr=<optimized out>, token=<optimized out>, nargs=<optimized out>, args=<optimized out>, nret=<optimized out>, 
    rets=<optimized out>) at /usr/src/debug/qemu-2.9.0/hw/ppc/spapr_rtas.c:666
#15 0x0000000046fa9c24 in h_rtas (cpu=0x488c0000, spapr=0x48350340, opcode=<optimized out>, args=<optimized out>) at /usr/src/debug/qemu-2.9.0/hw/ppc/spapr_hcall.c:663
#16 0x0000000046fac3f8 in spapr_hypercall (cpu=0x488c0000, opcode=61440, args=0x3fffb3390030) at /usr/src/debug/qemu-2.9.0/hw/ppc/spapr_hcall.c:1055
#17 0x0000000047065ab4 in kvm_arch_handle_exit (cs=0x488c0000, run=0x3fffb3390000) at /usr/src/debug/qemu-2.9.0/target/ppc/kvm.c:1688
#18 0x0000000046f353d8 in kvm_cpu_exec (cpu=0x488c0000) at /usr/src/debug/qemu-2.9.0/kvm-all.c:2113
#19 0x0000000046f1a980 in qemu_kvm_cpu_thread_fn (arg=0x488c0000) at /usr/src/debug/qemu-2.9.0/cpus.c:1087
#20 0x00003fffb70e8728 in start_thread () from /lib64/libpthread.so.0
#21 0x00003fffb70113d0 in clone () from /lib64/libc.so.6

Thanks 
Min

Comment 9 Laurent Vivier 2017-03-24 09:55:38 UTC
I think this is crashing because the memory we hot-unplug is in use by vhost.

But I'm not able to reproduce.

My understanding is that you hotplug the memory before the kernel has finished booting and hot-unplug it once it has booted.

Could you:
- try adding the memory on the command line instead of hotplugging it
  manually: "... -object memory-backend-ram,id=mem1,size=1G \
                 -device pc-dimm,id=dimm1,memdev=mem1 ..."
- try with the latest built kernel in the guest (I'm testing with -612)

Thanks

Comment 10 Laurent Vivier 2017-03-24 10:00:44 UTC
OK... I'm able to reproduce if I connect via ssh to the guest before unplugging the memory

Comment 11 Laurent Vivier 2017-03-24 10:16:55 UTC
I see a strange message in the guest kernel log while I'm unplugging the memory:
(qemu) device_del dimm1

[   39.422692] pseries-hotplug-mem: Attempting to hot-add 4 LMB(s)

2017-03-24T10:12:26.396259Z qemu-system-ppc64: used ring relocated for ring 2
qemu-system-ppc64: /home/lvivier/Projects/qemu/hw/virtio/vhost.c:651: vhost_commit: Assertion `r >= 0' failed.
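
The "4 LMB(s)" in the guest message is consistent with the size of the DIMM: assuming the 256 MiB logical memory block (LMB) granularity used by QEMU's pseries machine, the 1G backend created with object_add spans four LMBs. A quick check, with illustrative names that do not come from QEMU or the kernel:

```python
MIB = 1024 ** 2
LMB_SIZE = 256 * MIB  # assumed pseries LMB granularity


def lmb_count(dimm_bytes, lmb_size=LMB_SIZE):
    """Number of LMBs covered by a DIMM of the given size."""
    if dimm_bytes % lmb_size:
        raise ValueError("DIMM size must be a multiple of the LMB size")
    return dimm_bytes // lmb_size


print(lmb_count(1024 * MIB))  # the 1G DIMM from the report -> 4
```

So the message refers to the whole of dimm1, but as a hot-add rather than the hot-remove that device_del requested.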

Comment 12 Laurent Vivier 2017-03-24 15:08:37 UTC
This problem can be reproduced with upstream qemu.

Comment 13 Laurent Vivier 2017-03-27 11:59:54 UTC
The crash happens because the kernel is answering the hotplug event when we have already started to hot-unplug the memory, so there is an inconsistency between the internal state of QEMU and the information sent by the kernel.
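
One way to picture this inconsistency is the toy model below. It is a sketch under stated assumptions, not QEMU or kernel code: it assumes that hotplug events are queued in FIFO order and that the guest only fetches them once it is able to handle them. The class and method names are hypothetical.

```python
from collections import deque


class ToyHotplugModel:
    """Hypothetical model of host-side state versus guest-side event handling."""

    def __init__(self):
        self.pending = deque()    # events waiting for the guest, oldest first
        self.host_expects = None  # operation the host believes is in progress

    def device_add(self):
        self.pending.append("hot-add")
        self.host_expects = "hot-add"

    def device_del(self):
        self.pending.append("hot-remove")
        self.host_expects = "hot-remove"

    def guest_handles_next(self):
        """Guest answers the oldest event; report whether the host agrees."""
        event = self.pending.popleft()
        return event, event == self.host_expects


# Hotplug during boot: the event stays queued because the guest is not ready.
racy = ToyHotplugModel()
racy.device_add()
racy.device_del()                 # issued before the hot-add was consumed
print(racy.guest_handles_next())  # ('hot-add', False): guest and host disagree

# Hotplug handled before the unplug request: both sides stay in step.
ok = ToyHotplugModel()
ok.device_add()
print(ok.guest_handles_next())    # ('hot-add', True)
ok.device_del()
print(ok.guest_handles_next())    # ('hot-remove', True)
```

In the first case the guest's answer concerns a hot-add while the host is already tracking a hot-remove, which is the kind of mismatch described above.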

Comment 14 Laurent Vivier 2017-03-27 13:04:30 UTC
Some details to reproduce the problem:

- start QEMU with:

    -S -serial mon:stdio \
    -netdev tap,script=/etc/qemu-ifup,\
    downscript=/etc/qemu-down,id=hostnet0,vhost=on \
    -device virtio-net-pci,netdev=hostnet0

- switch to the monitor and execute:

  (qemu) object_add memory-backend-ram,id=mem1,size=1G
  (qemu) device_add pc-dimm,id=dimm1,memdev=mem1
  (qemu) continue

- once the OS is started, start an ssh connection to the guest

- switch to the monitor and execute:

  (qemu) device_del dimm1

Comment 15 Laurent Vivier 2017-03-30 14:43:34 UTC
Merged upstream: commit fe6824d ("spapr: fix memory hot-unplugging")

Comment 16 David Gibson 2017-03-31 03:50:38 UTC
Laurent,

Last I heard, we may not be getting rebases to the later 2.9 rcs, so can you please post the relevant patch downstream as well?

Comment 18 Min Deng 2017-04-25 07:16:14 UTC
Verified the bug on the following builds:
kernel-3.10.0-655.el7.ppc64le (guest and host)
qemu-kvm-rhev-2.9.0-1.el7.ppc64le
SLOF-20170303-1.git66d250e.el7.noarch

Detailed steps:
Please refer to comment 0 and comment 14.
Expected results:
The original issue is fixed.
Actual results:
The issue is fixed.

Moving the bug to VERIFIED status. Thanks for everyone's effort.

Comment 20 errata-xmlrpc 2017-08-02 03:39:56 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2017:2392