Bug 1432382
Summary: | Hot-unplug "device_del dimm1" induce qemu-kvm coredump (hotplug at guest boot up stage) | |
---|---|---|---
Product: | Red Hat Enterprise Linux 7 | Reporter: | Min Deng <mdeng>
Component: | qemu-kvm-rhev | Assignee: | Laurent Vivier <lvivier>
Status: | CLOSED ERRATA | QA Contact: | Min Deng <mdeng>
Severity: | high | Docs Contact: |
Priority: | high | |
Version: | 7.4 | CC: | dgibson, hannsj_uhl, knoel, lvivier, mdeng, michen, mrezanin, qzhang, virt-maint, yuhuang, zhengtli
Target Milestone: | rc | |
Target Release: | --- | |
Hardware: | ppc64le | |
OS: | Unspecified | |
Whiteboard: | | |
Fixed In Version: | qemu-kvm-rhev-2.9.0-1.el7 | Doc Type: | If docs needed, set a value
Doc Text: | | Story Points: | ---
Clone Of: | | Environment: |
Last Closed: | 2017-08-02 03:39:56 UTC | Type: | Bug
Regression: | --- | Mount Type: | ---
Documentation: | --- | CRM: |
Verified Versions: | | Category: | ---
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: |
Cloudforms Team: | --- | Target Upstream Version: |
Embargoed: | | |
Bug Depends On: | | |
Bug Blocks: | 1448344 | |
Description
Min Deng
2017-03-15 09:35:34 UTC
For the x86 test, QE will update the bug as soon as the result comes out.
Both the host and guest kernels are kernel-3.10.0-600.el7.ppc64le.

Tested on an x86 host and couldn't reproduce.
qemu-kvm-rhev-2.8.0-6.el7
kernel-3.10.0-610.el7.x86_64

This is a real bug. However, I know this code has been changed in qemu-2.9, so I'm not sure there's much point debugging this in detail until we have the qemu-2.9 rebase.

Are you able to try this with the preliminary qemu-2.9 based packages?

Hi David,
QE tried the bug on the preliminary qemu-2.9 packages. Unfortunately, QE can still *reproduce* the issue. Thanks.

Build info:
kernel-3.10.0-600.el7.ppc64le (guest)
qemu-kvm-rhev-2.9.0-0.el7.mrezanin201703210848.ppc64le
SLOF-20160223-6.gitdbbfda4.el7.noarch

For the detailed steps, please refer to comment 0.

Error messages:
(qemu) object_add memory-backend-ram,id=mem1,size=1G
(qemu) device_add pc-dimm,id=dimm1,memdev=mem1
(qemu) device_del dimm1
(qemu) qemu-kvm: used ring relocated for ring 2
qemu-kvm: /builddir/build/BUILD/qemu-2.9.0/hw/virtio/vhost.c:651: vhost_commit: Assertion `r >= 0' failed.

Program received signal SIGABRT, Aborted.
[Switching to Thread 0x3fffb3baeaa0 (LWP 11470)]
0x00003fffb6f2edc8 in raise () from /lib64/libc.so.6
(gdb) bt
#0  0x00003fffb6f2edc8 in raise () from /lib64/libc.so.6
#1  0x00003fffb6f30f4c in abort () from /lib64/libc.so.6
#2  0x00003fffb6f24b44 in __assert_fail_base () from /lib64/libc.so.6
#3  0x00003fffb6f24c34 in __assert_fail () from /lib64/libc.so.6
#4  0x0000000046f93838 in vhost_commit (listener=0x48340288) at /usr/src/debug/qemu-2.9.0/hw/virtio/vhost.c:651
#5  0x0000000046f3a628 in memory_region_transaction_commit () at /usr/src/debug/qemu-2.9.0/memory.c:931
#6  0x00000000471149dc in pc_dimm_memory_unplug (dev=0x482f1290, hpms=0x48350530, mr=0x481e43c0) at hw/mem/pc-dimm.c:125
#7  0x0000000046f9fc60 in spapr_memory_unplug (errp=0x479482a8 <error_abort>, dev=0x482f1290, hotplug_dev=0x48350340) at /usr/src/debug/qemu-2.9.0/hw/ppc/spapr.c:2606
#8  spapr_machine_device_unplug (hotplug_dev=0x48350340, dev=0x482f1290, errp=0x479482a8 <error_abort>) at /usr/src/debug/qemu-2.9.0/hw/ppc/spapr.c:2867
#9  0x0000000047102280 in hotplug_handler_unplug (plug_handler=0x48350340, plugged_dev=0x482f1290, errp=0x479482a8 <error_abort>) at hw/core/hotplug.c:56
#10 0x0000000046f9fa48 in spapr_lmb_release (dev=0x482f1290, opaque=<optimized out>) at /usr/src/debug/qemu-2.9.0/hw/ppc/spapr.c:2566
#11 0x0000000046fb6ca0 in detach (drc=0x48240840, d=<optimized out>, detach_cb=0x46f9f9f0 <spapr_lmb_release>, detach_cb_opaque=0x48c82610, errp=<optimized out>) at /usr/src/debug/qemu-2.9.0/hw/ppc/spapr_drc.c:447
#12 0x0000000046fb72d0 in set_allocation_state (drc=0x48240840, state=<optimized out>) at /usr/src/debug/qemu-2.9.0/hw/ppc/spapr_drc.c:145
#13 0x0000000046fadf54 in rtas_set_indicator (cpu=<optimized out>, spapr=0x48350340, token=<optimized out>, nargs=<optimized out>, args=<optimized out>, nret=<optimized out>, rets=<optimized out>) at /usr/src/debug/qemu-2.9.0/hw/ppc/spapr_rtas.c:460
#14 0x0000000046faf0bc in spapr_rtas_call (cpu=<optimized out>, spapr=<optimized out>,
    token=<optimized out>, nargs=<optimized out>, args=<optimized out>, nret=<optimized out>, rets=<optimized out>) at /usr/src/debug/qemu-2.9.0/hw/ppc/spapr_rtas.c:666
#15 0x0000000046fa9c24 in h_rtas (cpu=0x488c0000, spapr=0x48350340, opcode=<optimized out>, args=<optimized out>) at /usr/src/debug/qemu-2.9.0/hw/ppc/spapr_hcall.c:663
#16 0x0000000046fac3f8 in spapr_hypercall (cpu=0x488c0000, opcode=61440, args=0x3fffb3390030) at /usr/src/debug/qemu-2.9.0/hw/ppc/spapr_hcall.c:1055
#17 0x0000000047065ab4 in kvm_arch_handle_exit (cs=0x488c0000, run=0x3fffb3390000) at /usr/src/debug/qemu-2.9.0/target/ppc/kvm.c:1688
#18 0x0000000046f353d8 in kvm_cpu_exec (cpu=0x488c0000) at /usr/src/debug/qemu-2.9.0/kvm-all.c:2113
#19 0x0000000046f1a980 in qemu_kvm_cpu_thread_fn (arg=0x488c0000) at /usr/src/debug/qemu-2.9.0/cpus.c:1087
#20 0x00003fffb70e8728 in start_thread () from /lib64/libpthread.so.0
#21 0x00003fffb70113d0 in clone () from /lib64/libc.so.6

Thanks
Min

I think this is crashing because the memory we hot-unplug is in use by vhost, but I'm not able to reproduce it.

As I understand it, you hotplug the memory before the kernel has finished booting and hot-unplug it once it has booted.

Could you:
- try adding the memory on the command line instead of hotplugging it manually:
  "... -object memory-backend-ram,id=mem1,size=1G \
       -device pc-dimm,id=dimm1,memdev=mem1 ..."
- try with the latest kernel build in the guest (I'm testing with -612)

Thanks

OK... I'm able to reproduce it if I connect to the guest via ssh before unplugging the memory.

I see a strange message in the guest kernel log while I'm unplugging the memory:

(qemu) device_del dimm1
[ 39.422692] pseries-hotplug-mem: Attempting to hot-add 4 LMB(s)
2017-03-24T10:12:26.396259Z qemu-system-ppc64: used ring relocated for ring 2
qemu-system-ppc64: /home/lvivier/Projects/qemu/hw/virtio/vhost.c:651: vhost_commit: Assertion `r >= 0' failed.

This problem can be reproduced with upstream qemu.
The crash occurs because the kernel is still answering the hotplug event when we have already started to hot-unplug the memory, so there is an inconsistency between the internal state of QEMU and the information sent by the kernel.

Some details to reproduce the problem:

- start QEMU with:

  -S -serial mon:stdio \
  -netdev tap,script=/etc/qemu-ifup,downscript=/etc/qemu-down,id=hostnet0,vhost=on \
  -device virtio-net-pci,netdev=hostnet0

- switch to the monitor and execute:

  (qemu) object_add memory-backend-ram,id=mem1,size=1G
  (qemu) device_add pc-dimm,id=dimm1,memdev=mem1
  (qemu) continue

- once the OS is started, start an ssh connection to the guest

- switch to the monitor and execute:

  (qemu) device_del dimm1

Merged upstream: commit fe6824d ("spapr: fix memory hot-unplugging")

Laurent,
Last I heard we may not be getting rebases to the later 2.9 rcs, so can you post the relevant patch downstream as well, please?

Verified the bug on the following builds:
kernel-3.10.0-655.el7.ppc64le (guest and host)
qemu-kvm-rhev-2.9.0-1.el7.ppc64le
SLOF-20170303-1.git66d250e.el7.noarch

For the detailed steps, please refer to comment 0 and comment 14.

Expected results: the original issue is fixed.
Actual results: the issue is fixed.

Moving the bug to verified status. Thanks for everyone's effort.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2017:2392
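
For anyone who wants to script the reproduction steps given above (start QEMU paused with a vhost-backed virtio-net device, hotplug the DIMM before the guest boots, unplug it after an ssh login), here is a minimal Python sketch. It only assembles the command line and the ordered monitor commands; it does not launch QEMU or talk to the monitor. The helper and constant names are hypothetical, and `base_args` stands for the rest of the command line (machine, memory, disk), which this report does not spell out. Only the option strings and monitor commands are taken from the report.

```python
import shlex

# Binary name as it appears in the upstream log in this report; the
# downstream package installs a different binary (qemu-kvm).
QEMU_BINARY = "qemu-system-ppc64"


def build_qemu_argv(base_args=()):
    """Return the reproducer argv.

    base_args is a placeholder (assumption) for the machine, memory and
    disk options, which are not listed in this report.
    """
    netdev = ",".join([
        "tap",
        "script=/etc/qemu-ifup",
        "downscript=/etc/qemu-down",
        "id=hostnet0",
        "vhost=on",
    ])
    return [
        QEMU_BINARY,
        *base_args,
        "-S",                    # do not start the CPUs at startup
        "-serial", "mon:stdio",  # monitor and serial console share stdio
        "-netdev", netdev,
        "-device", "virtio-net-pci,netdev=hostnet0",
    ]


# Monitor commands issued while the guest is still paused by -S,
# copied as written in the reproduction steps.
HOTPLUG_BEFORE_BOOT = [
    "object_add memory-backend-ram,id=mem1,size=1G",
    "device_add pc-dimm,id=dimm1,memdev=mem1",
    "continue",
]

# Issued only after the guest has booted and an ssh session is open.
UNPLUG_AFTER_BOOT = ["device_del dimm1"]


if __name__ == "__main__":
    print(shlex.join(build_qemu_argv()))
    for command in HOTPLUG_BEFORE_BOOT + UNPLUG_AFTER_BOOT:
        print("(qemu)", command)
```

Keeping the two command lists separate makes the ordering explicit: the crash in this report was only seen when the DIMM was plugged before boot and removed after the ssh connection was established.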