Bug 1432382 - Hot-unplug "device_del dimm1" induce qemu-kvm coredump (hotplug at guest boot up stage)
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: qemu-kvm-rhev
Version: 7.4
Hardware: ppc64le
OS: Unspecified
Priority: high
Severity: high
Target Milestone: rc
Target Release: ---
Assignee: Laurent Vivier
QA Contact: Min Deng
URL:
Whiteboard:
Depends On:
Blocks: 1448344
 
Reported: 2017-03-15 09:35 UTC by Min Deng
Modified: 2017-08-02 03:39 UTC (History)

Fixed In Version: qemu-kvm-rhev-2.9.0-1.el7
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-08-02 03:39:56 UTC
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHSA-2017:2392 0 normal SHIPPED_LIVE Important: qemu-kvm-rhev security, bug fix, and enhancement update 2017-08-01 20:04:36 UTC

Description Min Deng 2017-03-15 09:35:34 UTC
Description of problem:
Hot-unplugging a DIMM with "device_del dimm1" causes qemu-kvm to dump core when the DIMM was hotplugged while the guest was booting.
Version-Release number of selected component (if applicable):
ppc64le
kernel-3.10.0-600.el7.ppc64le
qemu-kvm-rhev-2.8.0-6.el7.ppc64le
SLOF-20160223-6.gitdbbfda4.el7.noarch
How reproducible:
2/3
Steps to Reproduce:
1. Boot the guest with the following command line:
  /usr/libexec/qemu-kvm -name virt-tests-vm1 -sandbox off -machine pseries-rhel7.4.0 -nodefaults -vga std -chardev socket,id=hmp_id_humanmonitor1,path=/tmp/monitor-humanmonitor1-20151207-185515-CKlGrjUv,server,nowait -mon chardev=hmp_id_humanmonitor1,mode=readline -chardev socket,id=qmp_id_qmp1,path=/tmp/monitor-qmp1-20151207-185515-CKlGrjUv,server,nowait -mon chardev=qmp_id_qmp1,mode=control -chardev socket,id=hmp_id_catch_monitor,path=/tmp/monitor-catch_monitor-20151207-185515-CKlGrjUv,server,nowait -mon chardev=hmp_id_catch_monitor,mode=readline -chardev socket,id=serial_id_serial0,path=/tmp/serial-serial0-20151207-185515-CKlGrjUv,server,nowait -device virtio-scsi-pci,id=virtio_scsi_pci0,bus=pci.0,addr=03,disable-legacy=off,disable-modern=on -drive id=drive_image1,if=none,cache=none,snapshot=off,aio=native,format=qcow2,file=rhel74-ppc64le-virtio-scsi-latest.qcow2 -device scsi-hd,id=image1,drive=drive_image1 -numa node -qmp tcp:0:4444,server,nowait -vnc :1 -rtc base=utc,clock=host,driftfix=slew -boot order=cdn,once=c,menu=off,strict=off -enable-kvm -monitor stdio -device pci-ohci,id=usb1 -device usb-kbd,id=input0 -device usb-mouse,id=input1 -device usb-tablet,id=input2 -netdev tap,script=/etc/qemu-ifup,downscript=/etc/qemu-down,id=hostnet1,vhost=on -device virtio-net-pci,netdev=hostnet1,id=net1,mac=00:52:11:36:3f:00  -m 4G,slots=4,maxmem=8G -numa node
2. Hotplug memory into the guest while it is still booting. *This is required.*
  (qemu) object_add memory-backend-ram,id=mem1,size=1G
  (qemu) device_add pc-dimm,id=dimm1,memdev=mem1

3. Then try to unplug it:
  (qemu) device_del dimm1
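
The monitor commands in steps 2 and 3 can also be driven from a script. The command line in step 1 exposes a QMP server on TCP port 4444 (`-qmp tcp:0:4444,server,nowait`), and QMP's `human-monitor-command` takes an HMP line in its `command-line` argument. The following is a minimal sketch in Python that only builds the JSON requests and does not talk to a live QEMU; the helper names `hmp_via_qmp` and `repro_messages` are made up for illustration.

```python
import json

# HMP lines taken verbatim from steps 2 and 3 above.
HOTPLUG_LINES = [
    "object_add memory-backend-ram,id=mem1,size=1G",
    "device_add pc-dimm,id=dimm1,memdev=mem1",
]
UNPLUG_LINE = "device_del dimm1"


def hmp_via_qmp(line):
    """Wrap one HMP command line in a QMP human-monitor-command request."""
    return {
        "execute": "human-monitor-command",
        "arguments": {"command-line": line},
    }


def repro_messages():
    """Return the QMP requests for the reproducer, in the order to send them."""
    # A QMP session has to leave capabilities negotiation mode first.
    messages = [{"execute": "qmp_capabilities"}]
    # Step 2: hotplug, to be sent while the guest is still booting.
    messages += [hmp_via_qmp(line) for line in HOTPLUG_LINES]
    # Step 3: unplug, to be sent once the guest is up.
    messages.append(hmp_via_qmp(UNPLUG_LINE))
    return messages


if __name__ == "__main__":
    for msg in repro_messages():
        print(json.dumps(msg))
```

Each printed line is one JSON object that a client could write to the QMP socket. The timing between the hotplug and unplug requests is not modelled here and would have to be handled by the caller.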

Actual results:
(qemu) device_del dimm1
(qemu) qemu-kvm: used ring relocated for ring 2
qemu-kvm: /builddir/build/BUILD/qemu-2.8.0/hw/virtio/vhost.c:622: vhost_commit: Assertion `r >= 0' failed.

Program received signal SIGABRT, Aborted.
[Switching to Thread 0x3fffb3bbeab0 (LWP 48326)]
0x00003fffb6f3eb98 in raise () from /lib64/libc.so.6
Missing separate debuginfos, use: debuginfo-install alsa-lib-1.1.3-3.el7.ppc64le bzip2-libs-1.0.6-13.el7.ppc64le cyrus-sasl-lib-2.1.26-21.el7.ppc64le cyrus-sasl-md5-2.1.26-21.el7.ppc64le cyrus-sasl-plain-2.1.26-21.el7.ppc64le dbus-libs-1.6.12-17.el7.ppc64le elfutils-libelf-0.168-5.el7.ppc64le elfutils-libs-0.168-5.el7.ppc64le flac-libs-1.3.0-5.el7_1.ppc64le glib2-2.46.2-4.el7.ppc64le glibc-2.17-171.el7.ppc64le gmp-6.0.0-12.el7_1.ppc64le gnutls-3.3.26-6.el7.ppc64le gperftools-libs-2.4-8.el7.ppc64le gsm-1.0.13-11.el7.ppc64le keyutils-libs-1.5.8-3.el7.ppc64le krb5-libs-1.15-2.el7.ppc64le libICE-1.0.9-5.el7.ppc64le libSM-1.2.2-2.el7.ppc64le libX11-1.6.4-4.el7.ppc64le libXau-1.0.8-2.1.el7.ppc64le libXext-1.3.3-3.el7.ppc64le libXi-1.7.9-1.el7.ppc64le libXtst-1.2.3-1.el7.ppc64le libaio-0.3.109-13.el7.ppc64le libasyncns-0.8-7.el7.ppc64le libattr-2.4.46-12.el7.ppc64le libcap-2.22-9.el7.ppc64le libcom_err-1.42.9-9.el7.ppc64le libcurl-7.29.0-39.el7.ppc64le libdb-5.3.21-19.el7.ppc64le libfdt-1.4.0-2.el7.ppc64le libffi-3.0.13-18.el7.ppc64le libgcc-4.8.5-11.el7.ppc64le libgcrypt-1.5.3-14.el7.ppc64le libgpg-error-1.12-3.el7.ppc64le libibverbs-12-2.el7.ppc64le libidn-1.28-4.el7.ppc64le libiscsi-1.9.0-7.el7.ppc64le libnl3-3.2.28-3.el7_3.ppc64le libogg-1.3.0-7.el7.ppc64le libpng-1.5.13-7.el7_2.ppc64le librdmacm-12-2.el7.ppc64le libseccomp-2.3.1-2.el7.ppc64le libselinux-2.5-9.el7.ppc64le libsndfile-1.0.25-10.el7.ppc64le libssh2-1.4.3-10.el7_2.1.ppc64le libstdc++-4.8.5-11.el7.ppc64le libtasn1-4.10-1.el7.ppc64le libusbx-1.0.20-1.el7.ppc64le libuuid-2.23.2-33.el7.ppc64le libvorbis-1.3.3-8.el7.ppc64le libxcb-1.12-1.el7.ppc64le lzo-2.06-8.el7.ppc64le nettle-2.7.1-8.el7.ppc64le nspr-4.13.1-1.0.el7.ppc64le nss-3.28.3-2.el7.ppc64le nss-softokn-freebl-3.28.3-2.el7.ppc64le nss-util-3.28.3-2.el7.ppc64le numactl-libs-2.0.9-6.el7_2.ppc64le openldap-2.4.44-1.el7.ppc64le openssl-libs-1.0.2k-3.el7.ppc64le p11-kit-0.23.5-1.el7.ppc64le pcre-8.32-17.el7.ppc64le pixman-0.34.0-1.el7.ppc64le 
pulseaudio-libs-10.0-2.el7.ppc64le snappy-1.1.0-3.el7.ppc64le systemd-libs-219-32.el7.ppc64le tcp_wrappers-libs-7.6-77.el7.ppc64le xz-libs-5.2.2-1.el7.ppc64le zlib-1.2.7-17.el7.ppc64le
(gdb) bt
#0  0x00003fffb6f3eb98 in raise () from /lib64/libc.so.6
#1  0x00003fffb6f40d1c in abort () from /lib64/libc.so.6
#2  0x00003fffb6f34924 in __assert_fail_base () from /lib64/libc.so.6
#3  0x00003fffb6f34a14 in __assert_fail () from /lib64/libc.so.6
#4  0x0000000059ba1a58 in vhost_commit (listener=0x5adf0000) at /usr/src/debug/qemu-2.8.0/hw/virtio/vhost.c:622
#5  0x0000000059b4a658 in memory_region_transaction_commit () at /usr/src/debug/qemu-2.8.0/memory.c:929
#6  0x0000000059d19c4c in pc_dimm_memory_unplug (dev=0x5af60f30, hpms=0x5ae501e0, mr=0x5adb5b20) at hw/mem/pc-dimm.c:125
#7  0x0000000059bac870 in spapr_memory_unplug (errp=0x5a513e00 <error_abort>, dev=0x5af60f30, hotplug_dev=0x5ae50000) at /usr/src/debug/qemu-2.8.0/hw/ppc/spapr.c:2421
#8  spapr_machine_device_unplug (hotplug_dev=0x5ae50000, dev=0x5af60f30, errp=0x5a513e00 <error_abort>) at /usr/src/debug/qemu-2.8.0/hw/ppc/spapr.c:2523
#9  0x0000000059d08880 in hotplug_handler_unplug (plug_handler=0x5ae50000, plugged_dev=0x5af60f30, errp=0x5a513e00 <error_abort>) at hw/core/hotplug.c:56
#10 0x0000000059bac668 in spapr_lmb_release (dev=0x5af60f30, opaque=<optimized out>) at /usr/src/debug/qemu-2.8.0/hw/ppc/spapr.c:2381
#11 0x0000000059bc2d80 in detach (drc=0x5ae10600, d=<optimized out>, detach_cb=0x59bac610 <spapr_lmb_release>, detach_cb_opaque=0x5ad5fa88, errp=<optimized out>)
    at /usr/src/debug/qemu-2.8.0/hw/ppc/spapr_drc.c:442
#12 0x0000000059bc33b0 in set_allocation_state (drc=0x5ae10600, state=<optimized out>) at /usr/src/debug/qemu-2.8.0/hw/ppc/spapr_drc.c:145
#13 0x0000000059bba274 in rtas_set_indicator (cpu=<optimized out>, spapr=0x5ae50000, token=<optimized out>, nargs=<optimized out>, args=<optimized out>, nret=<optimized out>, 
    rets=<optimized out>) at /usr/src/debug/qemu-2.8.0/hw/ppc/spapr_rtas.c:459
#14 0x0000000059bbb3dc in spapr_rtas_call (cpu=<optimized out>, spapr=<optimized out>, token=<optimized out>, nargs=<optimized out>, args=<optimized out>, nret=<optimized out>, 
    rets=<optimized out>) at /usr/src/debug/qemu-2.8.0/hw/ppc/spapr_rtas.c:665
#15 0x0000000059bb6164 in h_rtas (cpu=0x5b480000, spapr=0x5ae50000, opcode=<optimized out>, args=<optimized out>) at /usr/src/debug/qemu-2.8.0/hw/ppc/spapr_hcall.c:666
#16 0x0000000059bb8738 in spapr_hypercall (cpu=0x5b480000, opcode=61440, args=0x3fffb33a0030) at /usr/src/debug/qemu-2.8.0/hw/ppc/spapr_hcall.c:1081
#17 0x0000000059c672b4 in kvm_arch_handle_exit (cs=0x5b480000, run=0x3fffb33a0000) at /usr/src/debug/qemu-2.8.0/target-ppc/kvm.c:1757
#18 0x0000000059b45458 in kvm_cpu_exec (cpu=0x5b480000) at /usr/src/debug/qemu-2.8.0/kvm-all.c:2038
#19 0x0000000059b2baf0 in qemu_kvm_cpu_thread_fn (arg=<optimized out>) at /usr/src/debug/qemu-2.8.0/cpus.c:998
#20 0x00003fffb70e8728 in start_thread () from /lib64/libpthread.so.0
#21 0x00003fffb701de50 in clone () from /lib64/libc.so.6


Expected results:
The hot-unplug completes successfully and qemu-kvm does not crash.

Additional info:

Comment 2 Min Deng 2017-03-15 09:38:34 UTC
QE will update this bug with the x86 test results as soon as they are available.

Comment 3 Min Deng 2017-03-15 09:55:02 UTC
Both the host and the guest run kernel-3.10.0-600.el7.ppc64le.

Comment 4 Yumei Huang 2017-03-16 03:09:09 UTC
Tested on an x86 host and could not reproduce the issue.

qemu-kvm-rhev-2.8.0-6.el7
kernel-3.10.0-610.el7.x86_64

Comment 5 David Gibson 2017-03-16 04:32:13 UTC
This is a real bug.  However, I know this code has been changed in qemu-2.9, so I'm not sure there's much point debugging this in detail until we have the qemu-2.9 rebase.

Comment 6 David Gibson 2017-03-22 04:08:04 UTC
Are you able to try this with the preliminary qemu-2.9 based packages?

Comment 7 Min Deng 2017-03-23 05:39:37 UTC
Hi David,
   QE tried the bug on the preliminary qemu-2.9 build. Unfortunately, QE can still *reproduce* the issue.
   Build info:
   kernel-3.10.0-600.el7.ppc64le (guest)
   qemu-kvm-rhev-2.9.0-0.el7.mrezanin201703210848.ppc64le
   SLOF-20160223-6.gitdbbfda4.el7.noarch
   For detailed steps, please refer to comment 0.
Error messages:
(qemu) object_add memory-backend-ram,id=mem1,size=1G
(qemu) device_add pc-dimm,id=dimm1,memdev=mem1
(qemu) device_del dimm1
(qemu) qemu-kvm: used ring relocated for ring 2
qemu-kvm: /builddir/build/BUILD/qemu-2.9.0/hw/virtio/vhost.c:651: vhost_commit: Assertion `r >= 0' failed.

Program received signal SIGABRT, Aborted.
[Switching to Thread 0x3fffb3baeaa0 (LWP 11470)]
0x00003fffb6f2edc8 in raise () from /lib64/libc.so.6
(gdb) bt
#0  0x00003fffb6f2edc8 in raise () from /lib64/libc.so.6
#1  0x00003fffb6f30f4c in abort () from /lib64/libc.so.6
#2  0x00003fffb6f24b44 in __assert_fail_base () from /lib64/libc.so.6
#3  0x00003fffb6f24c34 in __assert_fail () from /lib64/libc.so.6
#4  0x0000000046f93838 in vhost_commit (listener=0x48340288) at /usr/src/debug/qemu-2.9.0/hw/virtio/vhost.c:651
#5  0x0000000046f3a628 in memory_region_transaction_commit () at /usr/src/debug/qemu-2.9.0/memory.c:931
#6  0x00000000471149dc in pc_dimm_memory_unplug (dev=0x482f1290, hpms=0x48350530, mr=0x481e43c0) at hw/mem/pc-dimm.c:125
#7  0x0000000046f9fc60 in spapr_memory_unplug (errp=0x479482a8 <error_abort>, dev=0x482f1290, hotplug_dev=0x48350340) at /usr/src/debug/qemu-2.9.0/hw/ppc/spapr.c:2606
#8  spapr_machine_device_unplug (hotplug_dev=0x48350340, dev=0x482f1290, errp=0x479482a8 <error_abort>) at /usr/src/debug/qemu-2.9.0/hw/ppc/spapr.c:2867
#9  0x0000000047102280 in hotplug_handler_unplug (plug_handler=0x48350340, plugged_dev=0x482f1290, errp=0x479482a8 <error_abort>) at hw/core/hotplug.c:56
#10 0x0000000046f9fa48 in spapr_lmb_release (dev=0x482f1290, opaque=<optimized out>) at /usr/src/debug/qemu-2.9.0/hw/ppc/spapr.c:2566
#11 0x0000000046fb6ca0 in detach (drc=0x48240840, d=<optimized out>, detach_cb=0x46f9f9f0 <spapr_lmb_release>, detach_cb_opaque=0x48c82610, errp=<optimized out>)
    at /usr/src/debug/qemu-2.9.0/hw/ppc/spapr_drc.c:447
#12 0x0000000046fb72d0 in set_allocation_state (drc=0x48240840, state=<optimized out>) at /usr/src/debug/qemu-2.9.0/hw/ppc/spapr_drc.c:145
#13 0x0000000046fadf54 in rtas_set_indicator (cpu=<optimized out>, spapr=0x48350340, token=<optimized out>, nargs=<optimized out>, args=<optimized out>, nret=<optimized out>, 
    rets=<optimized out>) at /usr/src/debug/qemu-2.9.0/hw/ppc/spapr_rtas.c:460
#14 0x0000000046faf0bc in spapr_rtas_call (cpu=<optimized out>, spapr=<optimized out>, token=<optimized out>, nargs=<optimized out>, args=<optimized out>, nret=<optimized out>, 
    rets=<optimized out>) at /usr/src/debug/qemu-2.9.0/hw/ppc/spapr_rtas.c:666
#15 0x0000000046fa9c24 in h_rtas (cpu=0x488c0000, spapr=0x48350340, opcode=<optimized out>, args=<optimized out>) at /usr/src/debug/qemu-2.9.0/hw/ppc/spapr_hcall.c:663
#16 0x0000000046fac3f8 in spapr_hypercall (cpu=0x488c0000, opcode=61440, args=0x3fffb3390030) at /usr/src/debug/qemu-2.9.0/hw/ppc/spapr_hcall.c:1055
#17 0x0000000047065ab4 in kvm_arch_handle_exit (cs=0x488c0000, run=0x3fffb3390000) at /usr/src/debug/qemu-2.9.0/target/ppc/kvm.c:1688
#18 0x0000000046f353d8 in kvm_cpu_exec (cpu=0x488c0000) at /usr/src/debug/qemu-2.9.0/kvm-all.c:2113
#19 0x0000000046f1a980 in qemu_kvm_cpu_thread_fn (arg=0x488c0000) at /usr/src/debug/qemu-2.9.0/cpus.c:1087
#20 0x00003fffb70e8728 in start_thread () from /lib64/libpthread.so.0
#21 0x00003fffb70113d0 in clone () from /lib64/libc.so.6

Thanks 
Min

Comment 9 Laurent Vivier 2017-03-24 09:55:38 UTC
I think this is crashing because the memory we hot-unplug is in use by vhost.

But I'm not able to reproduce.

As I understand it, you hotplug the memory before the kernel has finished booting and hot-unplug it once it has booted.

Could you:
- try adding the memory on the command line instead of hotplugging it
  manually: "... -object memory-backend-ram,id=mem1,size=1G \
                 -device pc-dimm,id=dimm1,memdev=mem1 ..."
- try with the latest built kernel in the guest (I'm testing with -612)

Thanks

Comment 10 Laurent Vivier 2017-03-24 10:00:44 UTC
OK... I'm able to reproduce if I connect via ssh to the guest before unplugging the memory

Comment 11 Laurent Vivier 2017-03-24 10:16:55 UTC
I see a strange message in the guest kernel log while I'm unplugging the memory:
(qemu) device_del dimm1

[   39.422692] pseries-hotplug-mem: Attempting to hot-add 4 LMB(s)

2017-03-24T10:12:26.396259Z qemu-system-ppc64: used ring relocated for ring 2
qemu-system-ppc64: /home/lvivier/Projects/qemu/hw/virtio/vhost.c:651: vhost_commit: Assertion `r >= 0' failed.
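
The "4 LMB(s)" in the guest message is consistent with the 1G DIMM from the reproducer, assuming a pseries logical memory block (LMB) size of 256 MiB. That value is not stated anywhere in this bug; it is assumed here to match QEMU's `SPAPR_MEMORY_BLOCK_SIZE`. A quick check of the arithmetic, with `lmb_count` as a hypothetical helper:

```python
# Assumption: the pseries LMB size is 256 MiB (not stated in this bug).
LMB_SIZE = 256 * 1024 * 1024
# size=1G from the object_add command in the reproducer.
DIMM_SIZE = 1 * 1024 ** 3


def lmb_count(dimm_size, lmb_size=LMB_SIZE):
    """Return how many LMBs cover a DIMM; the size must be LMB-aligned."""
    if dimm_size % lmb_size:
        raise ValueError("DIMM size is not a multiple of the LMB size")
    return dimm_size // lmb_size


if __name__ == "__main__":
    print(lmb_count(DIMM_SIZE))  # 4
```

Under that assumption, the guest appears to be hot-adding exactly the LMBs of dimm1 at the moment the monitor asks for them to be removed.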

Comment 12 Laurent Vivier 2017-03-24 15:08:37 UTC
This problem can be reproduced with upstream qemu.

Comment 13 Laurent Vivier 2017-03-27 11:59:54 UTC
The crash occurs because the guest kernel is still responding to the hotplug event after we have started to hot-unplug the memory, so QEMU's internal state is inconsistent with the information sent by the kernel.

Comment 14 Laurent Vivier 2017-03-27 13:04:30 UTC
Some details to reproduce the problem:

- start QEMU with:

    -S -serial mon:stdio \
    -netdev tap,script=/etc/qemu-ifup,downscript=/etc/qemu-down,id=hostnet0,vhost=on \
    -device virtio-net-pci,netdev=hostnet0

- switch to the monitor and execute:

  (qemu) object_add memory-backend-ram,id=mem1,size=1G
  (qemu) device_add pc-dimm,id=dimm1,memdev=mem1
  (qemu) continue

- once the OS is started, start an ssh connection to the guest

- switch to the monitor and execute:

  (qemu) device_del dimm1
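
The option fragments above can be assembled into an argument list without relying on shell line continuation, which makes it explicit that the whole `-netdev` value is a single argument. A sketch, assuming the binary path from comment 0 and that the remaining options (machine type, memory slots, disk) are supplied by the caller through the hypothetical `extra_args` parameter:

```python
QEMU_BIN = "/usr/libexec/qemu-kvm"  # path taken from comment 0


def repro_argv(extra_args=()):
    """Build an argv list containing the options listed in this comment."""
    # Join the -netdev properties so they form one argument with no spaces.
    netdev = ",".join([
        "tap",
        "script=/etc/qemu-ifup",
        "downscript=/etc/qemu-down",
        "id=hostnet0",
        "vhost=on",
    ])
    return [
        QEMU_BIN,
        *extra_args,
        "-S",                    # start with the CPUs stopped
        "-serial", "mon:stdio",  # serial console and monitor share stdio
        "-netdev", netdev,
        "-device", "virtio-net-pci,netdev=hostnet0",
    ]


if __name__ == "__main__":
    # The memory options below come from the command line in comment 0.
    print(" ".join(repro_argv(["-m", "4G,slots=4,maxmem=8G"])))
```

The resulting list could be handed to a process launcher as-is. Because the guest starts stopped with `-S`, the hotplug commands can be issued before `continue`.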

Comment 15 Laurent Vivier 2017-03-30 14:43:34 UTC
Merged upstream: commit fe6824d ("spapr: fix memory hot-unplugging")

Comment 16 David Gibson 2017-03-31 03:50:38 UTC
Laurent,

Last I heard, we may not be getting rebases to the later 2.9 rcs, so can you please post the relevant patch downstream as well?

Comment 18 Min Deng 2017-04-25 07:16:14 UTC
Verified the bug on the following builds:
kernel-3.10.0-655.el7.ppc64le (guest and host)
qemu-kvm-rhev-2.9.0-1.el7.ppc64le
SLOF-20170303-1.git66d250e.el7.noarch

Detailed steps:
Please refer to comment 0 and comment 14.
Expected results:
The original issue no longer occurs.
Actual results:
The issue is fixed.

Moving the bug to VERIFIED. Thanks, everyone, for your effort.

Comment 20 errata-xmlrpc 2017-08-02 03:39:56 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2017:2392

