Bug 1528173 - Hot-unplug memory during booting early stage induced qemu-kvm coredump
Summary: Hot-unplug memory during booting early stage induced qemu-kvm coredump
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: qemu-kvm-rhev
Version: 7.5
Hardware: ppc64le
OS: Linux
Priority: high
Severity: high
Target Milestone: rc
Target Release: 7.5
Assignee: Serhii Popovych
QA Contact: Min Deng
URL:
Whiteboard:
Depends On: 1533857
Blocks: 1399177
 
Reported: 2017-12-21 07:54 UTC by Min Deng
Modified: 2018-04-11 00:57 UTC
13 users

Fixed In Version: qemu-kvm-rhev-2.10.0-17.el7
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-04-11 00:55:27 UTC
Target Upstream Version:
Embargoed:


Attachments
comparinglog1533857 (23.22 KB, text/plain)
2018-01-17 09:13 UTC, Min Deng


Links
IBM Linux Technology Center 162983 (last updated 2019-07-04 10:54:11 UTC)
Red Hat Product Errata RHSA-2018:1104 (last updated 2018-04-11 00:57:52 UTC)

Description Min Deng 2017-12-21 07:54:27 UTC
Description of problem:
Hot-unplugging memory from the guest during the early boot stage induces a qemu-kvm core dump.

Version-Release number of selected component (if applicable):
kernel-3.10.0-823.el7.ppc64le (host and guest)
qemu-kvm-rhev-2.10.0-13.el7.ppc64le

How reproducible:
1/3
Steps to Reproduce:
1. Boot up a guest with the following command line:
  /usr/libexec/qemu-kvm -name guest=nrs,debug-threads=on \
    -machine pseries,accel=kvm,usb=off,dump-guest-core=off \
    -m size=8388608k,slots=256,maxmem=419430400k -realtime mlock=off \
    -smp 4,sockets=4,cores=1,threads=1 \
    -numa node,nodeid=0,cpus=0-1,mem=4096 -numa node,nodeid=1,cpus=2-3,mem=4096 \
    -uuid d7987973-2467-43ff-b8d2-acefc6ac59e5 -display none -no-user-config -nodefaults \
    -chardev socket,id=charmonitor,path=/tmp/qmp,server,nowait \
    -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -boot strict=on \
    -device qemu-xhci,id=usb,bus=pci.0,addr=0x3 \
    -device virtio-scsi-pci,id=scsi0,bus=pci.0,addr=0x2 \
    -drive file=rhel75-ppc64le-virtio-scsi.qcow2,format=qcow2,if=none,id=drive-scsi0-0-0-0 \
    -device scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0-0-0-0,id=scsi0-0-0-0,bootindex=1 \
    -netdev tap,script=/etc/qemu-ifup,downscript=/etc/qemu-ifdown,id=hostnet0,vhost=on \
    -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:89:8a:8b,bus=pci.0,addr=0x1 \
    -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x4 -msg timestamp=on \
    -monitor stdio -chardev socket,id=serial_id_serial0,path=/tmp/S,server,nowait \
    -device spapr-vty,reg=0x30000000,chardev=serial_id_serial0 \
    -monitor unix:/tmp/monitor3,server,nowait

2. During the early boot stage, run the following monitor commands. *It wasn't always reproducible; about 30% of attempts.*

(qemu) object_add memory-backend-ram,id=mem1,size=10G
(qemu) device_add pc-dimm,id=dimm1,memdev=mem1
(qemu)  device_del dimm1
(qemu) ...
(qemu)  device_del dimm1



Actual results:

(qemu)  device_del dimm1
**
ERROR:/builddir/build/BUILD/qemu-2.10.0/hw/ppc/spapr_drc.c:417:spapr_drc_detach: assertion failed: (drc->dev)

Program received signal SIGABRT, Aborted.
0x00003fffb700fa70 in raise () from /lib64/libc.so.6
(gdb) 
(gdb) bt
#0  0x00003fffb700fa70 in raise () from /lib64/libc.so.6
#1  0x00003fffb7011dec in abort () from /lib64/libc.so.6
#2  0x00003fffb758b964 in g_assertion_message () from /lib64/libglib-2.0.so.0
#3  0x00003fffb758ba6c in g_assertion_message_expr () from /lib64/libglib-2.0.so.0
#4  0x0000000020b44a7c in spapr_drc_detach (drc=0x21dd0b40) at /usr/src/debug/qemu-2.10.0/hw/ppc/spapr_drc.c:417
#5  0x0000000020b2c1d8 in spapr_memory_unplug_request (errp=0x3fffffffcfc0, dev=<optimized out>, hotplug_dev=0x21fa0000) at /usr/src/debug/qemu-2.10.0/hw/ppc/spapr.c:3115
#6  spapr_machine_device_unplug_request (hotplug_dev=0x21fa0000, dev=<optimized out>, errp=0x3fffffffcfc0) at /usr/src/debug/qemu-2.10.0/hw/ppc/spapr.c:3388
#7  0x0000000020c91770 in hotplug_handler_unplug_request (plug_handler=0x21fa0000, plugged_dev=0x21e72250, errp=0x3fffffffcfc0) at hw/core/hotplug.c:45
#8  0x0000000020c414b4 in qdev_unplug (dev=0x21e72250, errp=0x3fffffffcfc0) at qdev-monitor.c:877
#9  0x0000000020c65fec in hmp_device_del (mon=<optimized out>, qdict=<optimized out>) at hmp.c:1960
#10 0x0000000020aaf758 in handle_hmp_command (mon=0x21e92f40, cmdline=0x223a000c "dimm1") at /usr/src/debug/qemu-2.10.0/monitor.c:3151
#11 0x0000000020ab18ec in monitor_command_cb (opaque=0x21e92f40, cmdline=<optimized out>, readline_opaque=<optimized out>) at /usr/src/debug/qemu-2.10.0/monitor.c:3954
#12 0x0000000020e8ad4c in readline_handle_byte (rs=0x223a0000, ch=<optimized out>) at util/readline.c:393
#13 0x0000000020aafa28 in monitor_read (opaque=<optimized out>, buf=<optimized out>, size=<optimized out>) at /usr/src/debug/qemu-2.10.0/monitor.c:3937
#14 0x0000000020ded29c in qemu_chr_be_write_impl (len=<optimized out>, buf=<optimized out>, s=<optimized out>) at chardev/char.c:167
#15 qemu_chr_be_write (s=<optimized out>, buf=<optimized out>, len=<optimized out>) at chardev/char.c:179
#16 0x0000000020def538 in fd_chr_read (chan=0x21f50600, cond=<optimized out>, opaque=0x21da6ea0) at chardev/char-fd.c:66
#17 0x0000000020e0c8b4 in qio_channel_fd_source_dispatch (source=<optimized out>, callback=<optimized out>, user_data=<optimized out>) at io/channel-watch.c:84
#18 0x00003fffb7554230 in g_main_context_dispatch () from /lib64/libglib-2.0.so.0
#19 0x0000000020e70aa4 in glib_pollfds_poll () at util/main-loop.c:213
#20 os_host_main_loop_wait (timeout=<optimized out>) at util/main-loop.c:261
#21 main_loop_wait (nonblocking=<optimized out>) at util/main-loop.c:515
#22 0x0000000020a48868 in main_loop () at vl.c:1917
#23 main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at vl.c:4805
(gdb) 
From the console:
Red Hat Enterprise Linux Server 7.5 Beta (Maipo)
Kernel 3.10.0-823.el7.ppc64le on an ppc64le

dhcp47-214 login: [    7.294099] pseries-hotplug-mem: Attempting to hot-add 40 LMB(s) at index 80000020
[    7.294345] lpar: Attempting to resize HPT to shift 24
[    7.481301] lpar: Hash collision while resizing HPT
[    7.481423] Unable to resize hash page table to target order 24: -28
[    7.486935] lpar: Attempting to resize HPT to shift 24
[    7.642486] lpar: Hash collision while resizing HPT
[    7.642559] Unable to resize hash page table to target order 24: -28
[    7.648089] lpar: Attempting to resize HPT to shift 24
[    7.799294] lpar: Hash collision while resizing HPT
[    7.799364] Unable to resize hash page table to target order 24: -28
[    7.804580] lpar: Attempting to resize HPT to shift 24
[    7.948760] lpar: Hash collision while resizing HPT
[    7.952859] Unable to resize hash page table to target order 24: -28
[    7.958323] lpar: Attempting to resize HPT to shift 24
[    8.111660] lpar: Hash collision while resizing HPT
[    8.111734] Unable to resize hash page table to target order 24: -28
[    8.117389] lpar: Attempting to resize HPT to shift 24
[    8.268721] lpar: Hash collision while resizing HPT
[    8.273091] Unable to resize hash page table to target order 24: -28
[    8.278931] lpar: Attempting to resize HPT to shift 24
[    8.429547] lpar: Hash collision while resizing HPT
[    8.429628] Unable to resize hash page table to target order 24: -28
[    8.435540] lpar: Attempting to resize HPT to shift 24
[    8.581015] lpar: Hash collision while resizing HPT
[    8.581087] Unable to resize hash page table to target order 24: -28
[    8.586548] lpar: Attempting to resize HPT to shift 24
[    8.724024] lpar: Hash collision while resizing HPT
[    8.724091] Unable to resize hash page table to target order 24: -28
[    8.729687] lpar: Attempting to resize HPT to shift 24
[    8.874897] lpar: Hash collision while resizing HPT
[    8.874963] Unable to resize hash page table to target order 24: -28
[    8.880284] lpar: Attempting to resize HPT to shift 24
[    9.030424] lpar: Hash collision while resizing HPT
[    9.030495] Unable to resize hash page table to target order 24: -28
[    9.036217] lpar: Attempting to resize HPT to shift 24
[    9.176211] lpar: Hash collision while resizing HPT
[    9.176277] Unable to resize hash page table to target order 24: -28
[    9.181778] lpar: Attempting to resize HPT to shift 24
[    9.327187] lpar: Hash collision while resizing HPT
[    9.327266] Unable to resize hash page table to target order 24: -28
[    9.332950] lpar: Attempting to resize HPT to shift 24
[    9.478299] lpar: Hash collision while resizing HPT
[    9.479110] Unable to resize hash page table to target order 24: -28
[    9.484917] lpar: Attempting to resize HPT to shift 24
[    9.633463] lpar: Hash collision while resizing HPT
[    9.633536] Unable to resize hash page table to target order 24: -28
[    9.639132] lpar: Attempting to resize HPT to shift 24
[    9.794400] lpar: Hash collision while resizing HPT
[    9.794468] Unable to resize hash page table to target order 24: -28
[    9.800051] lpar: Attempting to resize HPT to shift 24
[    9.955559] lpar: Hash collision while resizing HPT
[    9.955678] Unable to resize hash page table to target order 24: -28
[    9.961733] lpar: Attempting to resize HPT to shift 24
[   10.127090] lpar: Hash collision while resizing HPT
[   10.127165] Unable to resize hash page table to target order 24: -28
[   10.131866] lpar: Attempting to resize HPT to shift 24
[   10.272069] lpar: Hash collision while resizing HPT
[   10.272138] Unable to resize hash page table to target order 24: -28
[   10.277027] pseries-hotplug-mem: Memory indexed-count-add failed, removing any added LMBs
[   10.283714] Offlined Pages 4096
[   10.286505] lpar: Attempting to resize HPT to shift 24
[   10.430391] lpar: Hash collision while resizing HPT
[   10.430466] Unable to resize hash page table to target order 24: -28
[   10.431455] Offlined Pages 4096
[   10.450579] lpar: Attempting to resize HPT to shift 24
[   10.598721] lpar: Hash collision while resizing HPT
[   10.598813] Unable to resize hash page table to target order 24: -28
[   10.603601] Offlined Pages 4096
[   10.606790] lpar: Attempting to resize HPT to shift 24
[   10.760104] lpar: Hash collision while resizing HPT
[   10.760177] Unable to resize hash page table to target order 24: -28
[   10.771796] Offlined Pages 4096
[   10.774503] lpar: Attempting to resize HPT to shift 24
[   10.921789] lpar: Hash collision while resizing HPT
[   10.921858] Unable to resize hash page table to target order 24: -28
[   10.928810] Offlined Pages 4096
[   10.940207] lpar: Attempting to resize HPT to shift 24
[   11.088548] lpar: Hash collision while resizing HPT
[   11.088617] Unable to resize hash page table to target order 24: -28
[   11.092844] Offlined Pages 4096
[   11.096240] lpar: Attempting to resize HPT to shift 24
[   11.246749] lpar: Hash collision while resizing HPT
[   11.246833] Unable to resize hash page table to target order 24: -28
[   11.261022] Offlined Pages 4096
[   11.281193] lpar: Attempting to resize HPT to shift 24
[   11.427787] lpar: Hash collision while resizing HPT
[   11.427873] Unable to resize hash page table to target order 24: -28
[   11.432105] Offlined Pages 4096
[   11.447543] lpar: Attempting to resize HPT to shift 24
[   11.607265] lpar: Hash collision while resizing HPT
[   11.607373] Unable to resize hash page table to target order 24: -28
[   11.612329] Offlined Pages 4096
[   11.623614] lpar: Attempting to resize HPT to shift 24


Expected results:
There is no core dump.

Additional info:

Comment 2 David Gibson 2018-01-09 05:07:11 UTC
Serhii,

Can you take a look at this one, please? I think it is probably a race between multiple unplug attempts on the qemu side, if delays on the guest side mean the early attempts don't complete quickly.

It might even be an equivalent bug on the qemu side to one of the ones you tracked down on the host kernel side.

Comment 3 Serhii Popovych 2018-01-10 13:42:28 UTC
We definitely need commit 2a129767ebb1 ("hw/ppc/spapr.c: abort unplug_request if previous unplug isn't done") from master.

Applied it and tested using the steps from comment 0; the issue is no longer reproducible.

Unfortunately I have some issues with the Brew job submission; investigating.

Also, while analyzing the case I have an additional question: is the following race still possible even after picking up the upstream commit?

hw/ppc/spapr.c:
---------------

    spapr_memory_unplug_request()
    {
        ....

        /* (1) False for the 1st attempt. True for subsequent attempts. */
        if (spapr_pending_dimm_unplugs_find(spapr, dimm))
            return;

        /* (2) Guard against multiple unplug requests: register
         * the pending unplug so that (1) returns early on any
         * subsequent attempt.
         */
        spapr_pending_dimm_unplugs_add();

        for (i = 0; i < nr_lmbs; i++) {
            /* (3) Lookup for DRC */
            drc = spapr_drc_by_id();

            spapr_drc_detach(drc);
        }
    }

    spapr_lmb_release()
    {
        /* Wait for all LMBs */
        if (--ds->nr_lmbs) {
            return;
        }

        ....

        /* (4) This re-opens (1) while @drc is still
         * visible to spapr_drc_by_id() in (3), due
         * to (5).
         *
         * This allows re-entry into spapr_drc_detach(drc)
         * in (3) while both @dev and @drc may already be
         * released.
         */
        spapr_pending_dimm_unplugs_remove();
    }

hw/ppc/spapr_drc.c:
-------------------

    spapr_drc_lmb_class_init()
    {
       ....
       drck->release = spapr_lmb_release;
    }

    spapr_drc_detach()
    {
       ....
       drck->release(drc->dev); /* spapr_lmb_release() */
       ....
       /* (5) This should be done before calling
        * spapr_lmb_release() to hide @drc from
        * spapr_drc_by_id() in (3).
        */
       object_property_del(OBJECT(drc), "device", &error_abort);
       drc->dev = NULL;
    }
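
For reference, a minimal sketch of the kind of guard the upstream commit (2a129767ebb1) adds at point (1): instead of returning silently while a previous unplug of the same DIMM is still in flight, the new request is rejected with an error, so (3) can no longer re-detach DRCs that are already being released. This is only an illustration reusing the helper names from the analysis above; the exact error message and placement may differ from the actual commit:

hw/ppc/spapr.c (sketch):
------------------------

    spapr_memory_unplug_request()
    {
        ....

        /* (1') Abort the request if an unplug of this DIMM is
         * already pending, instead of silently returning.
         */
        if (spapr_pending_dimm_unplugs_find(spapr, dimm)) {
            error_setg(errp,
                       "Memory unplug already in progress for device %s",
                       dev->id);
            return;
        }

        /* (2) Unchanged: register the pending unplug. */
        spapr_pending_dimm_unplugs_add();
        ....
    }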

Comment 4 Serhii Popovych 2018-01-10 13:50:02 UTC
Also, while working on the issue I consistently see the following traces from the guest kernel (3.10.0-826.el7.ppc64):

[    0.040090] IOMMU table initialized, virtual merging enabled
[    0.040141] iommu: Adding device 0000:00:00.0 to group 0
[    0.040210] iommu: Adding device 0000:00:01.0 to group 0
[    0.040273] iommu: Adding device 0000:00:02.0 to group 0
[    0.040337] iommu: Adding device 0000:00:03.0 to group 0
[    0.040571] Unable to handle kernel paging request for data at address 0x00000100
[    0.040619] Faulting instruction address: 0xc0000000001159c8
[    0.040660] Oops: Kernel access of bad area, sig: 11 [#1]
[    0.040692] SMP NR_CPUS=2048 NUMA pSeries
[    0.040725] Modules linked in:
[    0.040760] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.10.0-826.el7.ppc64le #1
[    0.040807] task: c0000001fd7c0000 ti: c0000001fffe8000 task.ti: c0000001fd800000
[    0.040855] NIP: c0000000001159c8 LR: c0000000001160e4 CTR: 0000000000000000
[    0.040903] REGS: c0000001fffeb950 TRAP: 0300   Not tainted  (3.10.0-826.el7.ppc64le)
[    0.040950] MSR: 8000000100009033 <SF,EE,ME,IR,DR,RI,LE>  CR: 28000042  XER: 00000000
[    0.041063] CFAR: 000000000000aad0 DAR: 0000000000000100 DSISR: 40000000 SOFTE: 0
GPR00: c0000000001160e4 c0000001fffebbd0 c000000001273f00 0000000000000800
GPR04: 0000000000000000 c0000000fca72e00 0000000000000000 0000000fffffffe1
GPR08: 0000000fffffffe0 0000000000000000 0000000fffffffe0 0000000100001001
GPR12: 0000000000000000 c000000007b80000 c00000000000cb08 0000000000000000
GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
GPR20: 0000000000000000 0000000000000001 0000000000000002 0000000000000015
GPR24: c0000001fd8585a0 0000000000000800 c0000001fd858400 0000000000000000
GPR28: 0000000000000800 c0000000015f0f98 0000000000000000 c0000000fca72e00
[    0.041702] NIP [c0000000001159c8] __queue_work+0x68/0x6d0
[    0.041736] LR [c0000000001160e4] queue_work_on+0xb4/0xc0
[    0.041768] Call Trace:
[    0.041786] [c0000001fffebca0] [c0000000001160e4] queue_work_on+0xb4/0xc0
[    0.041835] [c0000001fffebcd0] [c0000000000a5e90] queue_hotplug_event+0xd0/0x150
[    0.041891] [c0000001fffebd20] [c0000000000a3620] ras_hotplug_interrupt+0x130/0x150
[    0.041947] [c0000001fffebdb0] [c0000000001e26d0] __handle_irq_event_percpu+0x90/0x2e0
[    0.042003] [c0000001fffebe80] [c0000000001e2a18] handle_irq_event+0x68/0xf0
[    0.042052] [c0000001fffebec0] [c0000000001e8040] handle_fasteoi_irq+0xd0/0x230
[    0.042108] [c0000001fffebef0] [c0000000001e18d0] generic_handle_irq+0x50/0x80
[    0.042164] [c0000001fffebf20] [c000000000014a88] __do_irq+0x88/0x190
[    0.042213] [c0000001fffebf90] [c000000000028e00] call_do_irq+0x14/0x24
[    0.042261] [c0000001fd8035a0] [c000000000014c24] do_IRQ+0x94/0x110
[    0.042310] [c0000001fd8035f0] [c000000000002794] hardware_interrupt_common+0x114/0x180
[    0.042367] --- Exception: 501 at arch_local_irq_restore+0xf0/0x140
[    0.042367]     LR = arch_local_irq_restore+0xf0/0x140
[    0.042439] [c0000001fd8038e0] [c0000000001e78f0] irq_startup+0x60/0x100 (unreliable)
[    0.042496] [c0000001fd803900] [c000000000a0ccac] _raw_spin_unlock_irqrestore+0x3c/0x80
[    0.042553] [c0000001fd803920] [c0000000001e5558] __setup_irq+0x488/0x6b0
[    0.042601] [c0000001fd803ac0] [c0000000001e5950] request_threaded_irq+0xf0/0x1f0
[    0.042657] [c0000001fd803b20] [c0000000000a33e0] request_event_sources_irqs+0x110/0x220
[    0.042713] [c0000001fd803c30] [c000000000d3aff0] init_ras_IRQ+0x98/0xfc
[    0.042762] [c0000001fd803c60] [c00000000000c28c] do_one_initcall+0x12c/0x2c0
[    0.042811] [c0000001fd803cf0] [c000000000d240d4] kernel_init_freeable+0x244/0x324
[    0.042868] [c0000001fd803db0] [c00000000000cb2c] kernel_init+0x2c/0x220
[    0.042917] [c0000001fd803e30] [c00000000000a4b8] ret_from_kernel_thread+0x5c/0xa4
[    0.042972] Instruction dump:
[    0.042997] faa1ffa8 fac1ffb0 fae1ffb8 f8010010 fb01ffc0 fb41ffd0 fba1ffe8 fbc1fff0
[    0.043078] f821ff31 892d02a2 2fa90000 40de04c0 <813b0100> 792887e3 40c205fc 7be79364
[    0.043161] ---[ end trace ed727bb1f359e245 ]---
[    0.044422]
[    2.044473] Kernel panic - not syncing: Fatal exception in interrupt

If I hot-add memory before the kernel loads (e.g. during grub2), then the machine restarts (not the qemu process) instead of the kernel crashing.

Maybe this happens because on my development P8 machine I have a ppc64 (BE) host environment.

Comment 5 David Gibson 2018-01-11 02:38:11 UTC
Serhii,

1) Please backport 2a129767ebb1 for this BZ ASAP.

2) I'm still analyzing the race you suggest in comment 3 - let's handle that as a different BZ

3) The kernel crash in comment 4 looks like a different bug, let's handle that separately

4) "If I hot add memory before kernel loads (e.g. during the grub2 time) then machine restarts (not qemu process) instead of kernel crash."

This is expected - hotplugs before CAS (guest/qemu negotiation phase) cause a reboot (and the guest will pick up the new device on the reboot).  It's not ideal but avoiding it would introduce a whole heap of other complexities.

5) "May be this happens because on development P8 machine I have ppc64 (BE) host environment."

That sounds unlikely to me, but let's continue to investigate after we've fixed the known qemu bug.

Comment 6 David Gibson 2018-01-11 02:38:57 UTC
Min, is this a regression from RHEL7.4?

Comment 7 Min Deng 2018-01-11 06:47:31 UTC
(In reply to David Gibson from comment #6)
> Min, is this a regression from RHEL7.4?
David,
   In my opinion, it isn't a regression, since similar issues could also be found with the following builds.
   kernel-3.10.0-693.el7.ppc64le
   qemu-kvm-rhev-2.9.0-16.el7_4.1.ppc64le
QE got two types of errors with those builds after several tries.
   1.  qemu-kvm quit directly
   2. (qemu)object_add memory-backend-ram,id=mem1,size=10G
(qemu) device_add pc-dimm,id=dimm1,memdev=mem1
(qemu) device_del dimm1
Memory hot unplug not supported for this guest
(qemu) device_del dimm1
(qemu) device_del dimm1

Message from syslogd@localhost at Jan 11 01:32:36 ...
 kernel:NMI watchdog: BUG: soft lockup - CPU#56 stuck for 22s! [CPU 0/KVM:53310]

Message from syslogd@localhost at Jan 11 01:32:36 ...
 kernel:NMI watchdog: BUG: soft lockup - CPU#72 stuck for 22s! [CPU 1/KVM:53311]

Message from syslogd@localhost at Jan 11 01:32:36 ...
 kernel:NMI watchdog: BUG: soft lockup - CPU#160 stuck for 22s! [CPU 3/KVM:53313]

Comment 11 Miroslav Rezanina 2018-01-16 13:45:38 UTC
Fix included in qemu-kvm-rhev-2.10.0-17.el7

Comment 13 Min Deng 2018-01-17 09:12:31 UTC
Reproduced on build
kernel-3.10.0-823.el7.ppc64le (host and guest)
qemu-kvm-rhev-2.10.0-13.el7.ppc64le

QE re-tested the bug on the following build
kernel-3.10.0-827.el7.ppc64le - host
kernel-3.10.0-828.el7.ppc64le - guest
qemu-kvm-rhev-2.10.0-17.el7.ppc64le

For the steps, please refer to comment 0.

Actual results:
The original issue could not be reproduced, *but* QE reproduced bug 1533857 (see comment 4) and bug 1528178, which QE reported before.

David,
  What do you think, is it proper for us to mark it as verified? At the same time, QE uploaded the entire log for comparison with bug 1533857. If there are any issues, please let me know.

Thanks a lot
Min

Comment 14 Min Deng 2018-01-17 09:13:59 UTC
Created attachment 1382313 [details]
comparinglog1533857

If necessary, please compare it with bug 1533857. Thanks a lot.

Comment 15 Min Deng 2018-01-31 05:47:52 UTC
Reproduced on build
kernel-3.10.0-823.el7.ppc64le (host and guest)
qemu-kvm-rhev-2.10.0-13.el7.ppc64le

Verified the bug on the following build
kernel-3.10.0-842.el7.ppc64le (both host and guest)
qemu-kvm-rhev-2.10.0-18.el7.ppc64le

For the steps, please refer to comment 0.

The bug has been fixed. Thanks a lot.

Comment 17 errata-xmlrpc 2018-04-11 00:55:27 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2018:1104

