Bug 1437337 - Hotplug cpu cores with invalid nr_threads causes qemu-kvm coredump
Summary: Hotplug cpu cores with invalid nr_threads causes qemu-kvm coredump
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: qemu-kvm-rhev
Version: 7.4
Hardware: ppc64le
OS: Unspecified
Priority: high
Severity: high
Target Milestone: rc
Target Release: ---
Assignee: David Gibson
QA Contact: Min Deng
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2017-03-30 06:45 UTC by Min Deng
Modified: 2017-08-02 04:35 UTC (History)

Fixed In Version: qemu-kvm-rhev-2.9.0-1.el7
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-08-02 04:35:59 UTC
Target Upstream Version:
Embargoed:




Links
  System: Red Hat Product Errata
  ID: RHSA-2017:2392
  Private: 0
  Priority: normal
  Status: SHIPPED_LIVE
  Summary: Important: qemu-kvm-rhev security, bug fix, and enhancement update
  Last Updated: 2017-08-01 20:04:36 UTC

Description Min Deng 2017-03-30 06:45:24 UTC
Description of problem:
Hotplug cpu cores with invalid nr_threads causes qemu-kvm coredump

Version-Release number of selected component (if applicable):
kernel-3.10.0-628.el7.ppc64le
qemu-kvm-rhev-2.9.0-0.el7.patchwork201703291116.ppc64le
SLOF-20170303-1.git66d250e.el7.noarch

How reproducible:
2/2

Steps to Reproduce:
1. Boot up the guest with:
  /usr/libexec/qemu-kvm -name virt-tests-vm1 -sandbox off -machine pseries-rhel7.4.0 -nodefaults -vga std -chardev socket,id=hmp_id_humanmonitor1,path=/tmp/monitor-humanmonitor1-20151207-185515-CKlGrjUv,server,nowait -mon chardev=hmp_id_humanmonitor1,mode=readline -chardev socket,id=qmp_id_qmp1,path=/tmp/monitor-qmp1-20151207-185515-CKlGrjUv,server,nowait -mon chardev=qmp_id_qmp1,mode=control -chardev socket,id=hmp_id_catch_monitor,path=/tmp/monitor-catch_monitor-20151207-185515-CKlGrjUv,server,nowait -mon chardev=hmp_id_catch_monitor,mode=readline -chardev socket,id=serial_id_serial0,path=/tmp/serial-serial0-20151207-185515-CKlGrjUv,server,nowait -device virtio-scsi-pci,id=virtio_scsi_pci0,bus=pci.0,addr=03,disable-legacy=off,disable-modern=off -drive id=drive_image1,if=none,cache=none,snapshot=off,aio=native,format=qcow2,file=rhel74-ppc64le-virtio-scsi-latest.qcow2 -device scsi-hd,id=image1,drive=drive_image1 -numa node -qmp tcp:0:4444,server,nowait -vnc :1 -rtc base=utc,clock=host,driftfix=slew -boot order=cdn,once=c,menu=off,strict=off -enable-kvm -monitor stdio -device nec-usb-xhci,id=usb1 -device usb-kbd,id=input0 -device usb-mouse,id=input1 -device usb-tablet,id=input2 -netdev tap,script=/etc/qemu-ifup,downscript=/etc/qemu-down,id=hostnet1,vhost=on -device virtio-net-pci,netdev=hostnet1,id=net1,mac=00:52:11:36:3f:00 -m 4G,slots=4,maxmem=8G -smp 2,maxcpus=4,cores=2,threads=2,sockets=1
2. Hotplug a core with an invalid nr_threads value, using the QMP session in step 3.

3. telnet xx.xx.xx.xx port
  {"execute":"qmp_capabilities"}
  {"execute": "device_add", "arguments": {"driver": "host-spapr-cpu-core", "core-id": 2, "nr-threads": 3, "id": "core1"}}
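
The telnet session in step 3 could also be scripted. The following is a minimal sketch, not part of the original report: the helper names `device_add_core` and `qmp_session` are made up for illustration, and it assumes QMP is reachable over TCP (the command line above uses `-qmp tcp:0:4444,server,nowait`) and that each QMP message arrives as one JSON object per line.

```python
import json
import socket


def device_add_core(core_id, nr_threads, dev_id, driver="host-spapr-cpu-core"):
    """Build the QMP device_add command used in step 3 (hypothetical helper)."""
    return {
        "execute": "device_add",
        "arguments": {
            "driver": driver,
            "core-id": core_id,
            "nr-threads": nr_threads,
            "id": dev_id,
        },
    }


def _read_msg(stream):
    """Read one JSON message; an empty read means the peer closed the socket."""
    line = stream.readline()
    if not line:
        raise ConnectionError("QMP connection closed (QEMU may have exited)")
    return json.loads(line)


def qmp_session(host, port, commands, timeout=5.0):
    """Negotiate capabilities, send each command, and return the replies.

    Asynchronous events (messages with an "event" key) are skipped.
    """
    replies = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        stream = sock.makefile("rw", encoding="utf-8")
        _read_msg(stream)  # greeting banner sent by the QMP server
        for cmd in [{"execute": "qmp_capabilities"}] + list(commands):
            stream.write(json.dumps(cmd) + "\n")
            stream.flush()
            while True:
                msg = _read_msg(stream)
                if "event" not in msg:
                    replies.append(msg)
                    break
    return replies


# Example (needs a running guest; host and port are assumptions):
# qmp_session("127.0.0.1", 4444, [device_add_core(2, 3, "core1")])
```

On an affected build the second command would be expected to end in a `ConnectionError` rather than a reply, since qemu-kvm aborts while handling it.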


Actual results:
After hotplugging
(qemu) [New Thread 0x3ffeabd2eaa0 (LWP 44083)]
[Thread 0x3ffeabd2eaa0 (LWP 44083) exited]
[New Thread 0x3ffeabd2eaa0 (LWP 44116)]
[Thread 0x3ffeabd2eaa0 (LWP 44116) exited]
qemu-kvm: /builddir/build/BUILD/qemu-2.9.0/numa.c:580: numa_get_node_for_cpu: Assertion `idx < max_cpus' failed.

Program received signal SIGABRT, Aborted.
0x00003fffb6f2edc8 in raise () from /lib64/libc.so.6
Missing separate debuginfos, use: debuginfo-install alsa-lib-1.1.3-3.el7.ppc64le bzip2-libs-1.0.6-13.el7.ppc64le cyrus-sasl-lib-2.1.26-21.el7.ppc64le cyrus-sasl-plain-2.1.26-21.el7.ppc64le dbus-libs-1.6.12-17.el7.ppc64le elfutils-libelf-0.168-5.el7.ppc64le elfutils-libs-0.168-5.el7.ppc64le flac-libs-1.3.0-5.el7_1.ppc64le glib2-2.50.3-2.el7.ppc64le glibc-2.17-189.el7.ppc64le gmp-6.0.0-15.el7.ppc64le gnutls-3.3.26-6.el7.ppc64le gperftools-libs-2.4-8.el7.ppc64le gsm-1.0.13-11.el7.ppc64le keyutils-libs-1.5.8-3.el7.ppc64le krb5-libs-1.15.1-5.el7.ppc64le libICE-1.0.9-5.el7.ppc64le libSM-1.2.2-2.el7.ppc64le libX11-1.6.4-4.el7.ppc64le libXau-1.0.8-2.1.el7.ppc64le libXext-1.3.3-3.el7.ppc64le libXi-1.7.9-1.el7.ppc64le libXtst-1.2.3-1.el7.ppc64le libaio-0.3.109-13.el7.ppc64le libasyncns-0.8-7.el7.ppc64le libattr-2.4.46-12.el7.ppc64le libcap-2.22-9.el7.ppc64le libcom_err-1.42.9-9.el7.ppc64le libcurl-7.29.0-39.el7.ppc64le libdb-5.3.21-20.el7.ppc64le libfdt-1.4.3-1.el7.ppc64le libffi-3.0.13-18.el7.ppc64le libgcc-4.8.5-14.el7.ppc64le libgcrypt-1.5.3-14.el7.ppc64le libgpg-error-1.12-3.el7.ppc64le libibverbs-13-1.el7.ppc64le libidn-1.28-4.el7.ppc64le libiscsi-1.9.0-7.el7.ppc64le libnl3-3.2.28-3.el7_3.ppc64le libogg-1.3.0-7.el7.ppc64le libpng-1.5.13-7.el7_2.ppc64le librdmacm-13-1.el7.ppc64le libseccomp-2.3.1-3.el7.ppc64le libselinux-2.5-11.el7.ppc64le libsndfile-1.0.25-10.el7.ppc64le libssh2-1.4.3-10.el7_2.1.ppc64le libstdc++-4.8.5-14.el7.ppc64le libtasn1-4.10-1.el7.ppc64le libusbx-1.0.20-1.el7.ppc64le libuuid-2.23.2-36.el7.ppc64le libvorbis-1.3.3-8.el7.ppc64le libxcb-1.12-1.el7.ppc64le lzo-2.06-8.el7.ppc64le nettle-2.7.1-8.el7.ppc64le nspr-4.13.1-1.0.el7.ppc64le nss-3.28.3-4.el7.ppc64le nss-softokn-freebl-3.28.3-2.el7.ppc64le nss-util-3.28.3-3.el7.ppc64le numactl-libs-2.0.9-6.el7_2.ppc64le openldap-2.4.44-3.el7.ppc64le openssl-libs-1.0.2k-4.el7.ppc64le p11-kit-0.23.5-1.el7.ppc64le pcre-8.32-17.el7.ppc64le pixman-0.34.0-1.el7.ppc64le pulseaudio-libs-10.0-3.el7.ppc64le 
snappy-1.1.0-3.el7.ppc64le systemd-libs-219-32.el7.ppc64le tcp_wrappers-libs-7.6-77.el7.ppc64le xz-libs-5.2.2-1.el7.ppc64le zlib-1.2.7-17.el7.ppc64le
(gdb) bt
#0  0x00003fffb6f2edc8 in raise () from /lib64/libc.so.6
#1  0x00003fffb6f30f4c in abort () from /lib64/libc.so.6
#2  0x00003fffb6f24b44 in __assert_fail_base () from /lib64/libc.so.6
#3  0x00003fffb6f24c34 in __assert_fail () from /lib64/libc.so.6
#4  0x000000005b51dc38 in numa_get_node_for_cpu (idx=<optimized out>) at /usr/src/debug/qemu-2.9.0/numa.c:580
#5  0x000000005b5a8e68 in spapr_cpu_core_realize (dev=<optimized out>, errp=0x3fffffffc9e0) at /usr/src/debug/qemu-2.9.0/hw/ppc/spapr_cpu_core.c:183
#6  0x000000005b6ede90 in device_set_realized (obj=<optimized out>, value=<optimized out>, errp=0x3fffffffcc00) at hw/core/qdev.c:939
#7  0x000000005b7b6f00 in property_set_bool (obj=0x5c82c580, v=<optimized out>, name=<optimized out>, opaque=0x5ddd0b50, errp=0x3fffffffcc00) at qom/object.c:1860
#8  0x000000005b7b9888 in object_property_set (obj=0x5c82c580, v=0x5c9d09c0, name=0x5b8f3710 "realized", errp=0x3fffffffcc00) at qom/object.c:1094
#9  0x000000005b7bcb0c in object_property_set_qobject (obj=0x5c82c580, value=<optimized out>, name=<optimized out>, errp=<optimized out>) at qom/qom-qobject.c:27
#10 0x000000005b7b9b4c in object_property_set_bool (obj=0x5c82c580, value=<optimized out>, name=<optimized out>, errp=<optimized out>) at qom/object.c:1163
#11 0x000000005b69eacc in qdev_device_add (opts=0x5c7e21c0, errp=0x3fffffffcd40) at qdev-monitor.c:623
#12 0x000000005b69f550 in qmp_device_add (qdict=<optimized out>, ret_data=<optimized out>, errp=0x3fffffffcdd8) at qdev-monitor.c:800
#13 0x000000005b8ab9d4 in do_qmp_dispatch (errp=0x3fffffffcdd0, request=<optimized out>, cmds=0x5bb1d910 <qmp_commands>) at qapi/qmp-dispatch.c:104
#14 qmp_dispatch (cmds=0x5bb1d910 <qmp_commands>, request=<optimized out>) at qapi/qmp-dispatch.c:131
#15 0x000000005b50f934 in handle_qmp_command (parser=<optimized out>, tokens=<optimized out>) at /usr/src/debug/qemu-2.9.0/monitor.c:3729
#16 0x000000005b8b3be0 in json_message_process_token (lexer=0x5c843888, input=0x5c8c0320, type=<optimized out>, x=<optimized out>, y=<optimized out>)
    at qobject/json-streamer.c:105
#17 0x000000005b8dc8f8 in json_lexer_feed_char (lexer=0x5c843888, ch=<optimized out>, flush=false) at qobject/json-lexer.c:319
#18 0x000000005b8dca34 in json_lexer_feed (lexer=0x5c843888, buffer=<optimized out>, size=<optimized out>) at qobject/json-lexer.c:369
#19 0x000000005b8b3d3c in json_message_parser_feed (parser=<error reading variable: value has been optimized out>, buffer=<optimized out>, size=<optimized out>)
    at qobject/json-streamer.c:124
#20 0x000000005b50d9f4 in monitor_qmp_read (opaque=<optimized out>, buf=<optimized out>, size=<optimized out>) at /usr/src/debug/qemu-2.9.0/monitor.c:3772
#21 0x000000005b83de1c in qemu_chr_be_write_impl (len=<optimized out>, buf=<optimized out>, s=<optimized out>) at chardev/char.c:284
#22 qemu_chr_be_write (s=<optimized out>, buf=<optimized out>, len=<optimized out>) at chardev/char.c:296
#23 0x000000005b847868 in tcp_chr_read (chan=<optimized out>, cond=<optimized out>, opaque=<optimized out>) at chardev/char-socket.c:411
#24 0x000000005b85be44 in qio_channel_fd_source_dispatch (source=<optimized out>, callback=<optimized out>, user_data=<optimized out>) at io/channel-watch.c:84
#25 0x00003fffb7473ab0 in g_main_context_dispatch () from /lib64/libglib-2.0.so.0
#26 0x000000005b8bc224 in glib_pollfds_poll () at util/main-loop.c:213
#27 os_host_main_loop_wait (timeout=<optimized out>) at util/main-loop.c:258
#28 main_loop_wait (nonblocking=<optimized out>) at util/main-loop.c:506
#29 0x000000005b4abee8 in main_loop () at vl.c:1898
#30 main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at vl.c:4720
(gdb) 

Expected results:
Because nr_threads does not equal the value of "2" set on the command line, the hotplug should fail. However, it should fail without any negative side effect such as a coredump.

Additional info:

The guest was also started with "-numa node".
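
One possible reading of the failed assertion `idx < max_cpus`, sketched below as plain arithmetic. It assumes that each thread of a hotplugged core gets the CPU index core-id + thread offset; the report does not state this, but it is consistent with the query-cpus output in comment 3, where core-id 2 thread[0] appears as CPU 2. The function names are invented for this sketch.

```python
def thread_cpu_indexes(core_id, nr_threads):
    """CPU indexes of a core's threads, ASSUMING index = core-id + thread offset."""
    return [core_id + i for i in range(nr_threads)]


def out_of_range_indexes(core_id, nr_threads, max_cpus):
    """Indexes that would violate the `idx < max_cpus` assertion from the log."""
    return [i for i in thread_cpu_indexes(core_id, nr_threads) if i >= max_cpus]


# Values from the report: maxcpus=4, device_add with core-id=2 and nr-threads=3.
print(out_of_range_indexes(core_id=2, nr_threads=3, max_cpus=4))  # [4]
# With the configured threads=2 nothing is out of range.
print(out_of_range_indexes(core_id=2, nr_threads=2, max_cpus=4))  # []
```

Under that assumption, only the third thread (index 4) falls outside the range allowed by maxcpus=4, which would explain why the abort needs nr-threads larger than the configured value.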

Comment 2 Min Deng 2017-03-30 06:50:02 UTC
This is a ppc64le-specific issue; it does not reproduce on x86.

Comment 3 Min Deng 2017-03-30 07:06:35 UTC
Additional information about this bug:
Tried it with nr_threads = 1, and the hotplug succeeded. Is this the expected behavior?

Steps:
1. {"execute":"qmp_capabilities"}
{"return": {}}

2. {"execute": "query-hotpluggable-cpus"}
{"return": [{"props": {"core-id": 2}, "vcpus-count": 2, "type": "host-spapr-cpu-core"}, {"props": {"core-id": 0}, "vcpus-count": 2, "qom-path": "/machine/unattached/device[0]", "type": "host-spapr-cpu-core"}]}

3. {"execute": "device_add", "arguments": {"driver": "host-spapr-cpu-core", "core-id": 2, "nr-threads": 1, "id": "core1"}}
{"return": {}}

4. {"execute": "query-hotpluggable-cpus"}
{"return": [{"props": {"core-id": 2}, "vcpus-count": 2, "qom-path": "/machine/peripheral/core1", "type": "host-spapr-cpu-core"}, {"props": {"core-id": 0}, "vcpus-count": 2, "qom-path": "/machine/unattached/device[0]", "type": "host-spapr-cpu-core"}]}

5. {"execute": "query-cpus"}
{"return": [{"arch": "ppc", "current": true, "CPU": 0, "nip": -4611686018426750380, "qom_path": "/machine/unattached/device[0]/thread[0]", "halted": false, "thread_id": 47258}, {"arch": "ppc", "current": false, "CPU": 1, "nip": -4611686018426750380, "qom_path": "/machine/unattached/device[0]/thread[1]", "halted": false, "thread_id": 47259}, {"arch": "ppc", "current": false, "CPU": 2, "nip": -4611686018426750380, "qom_path": "/machine/peripheral/core1/thread[0]", "halted": false, "thread_id": 47391}]}
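
The inconsistency in the replies above (core1 advertises "vcpus-count": 2, yet query-cpus lists only thread[0] under it) can be checked mechanically. A small sketch follows; the data is copied from steps 4 and 5 with unused fields dropped, and `short_cores` is a name invented here.

```python
import re
from collections import Counter

# Replies from steps 4 and 5, reduced to the fields this check uses.
hotpluggable = [
    {"props": {"core-id": 2}, "vcpus-count": 2,
     "qom-path": "/machine/peripheral/core1"},
    {"props": {"core-id": 0}, "vcpus-count": 2,
     "qom-path": "/machine/unattached/device[0]"},
]
cpus = [
    {"CPU": 0, "qom_path": "/machine/unattached/device[0]/thread[0]"},
    {"CPU": 1, "qom_path": "/machine/unattached/device[0]/thread[1]"},
    {"CPU": 2, "qom_path": "/machine/peripheral/core1/thread[0]"},
]


def short_cores(hotpluggable, cpus):
    """Map core qom-path -> (threads present, vcpus-count) where they differ."""
    # Strip the trailing "/thread[N]" to get the owning core's path.
    present = Counter(re.sub(r"/thread\[\d+\]$", "", c["qom_path"]) for c in cpus)
    return {
        core["qom-path"]: (present[core["qom-path"]], core["vcpus-count"])
        for core in hotpluggable
        if "qom-path" in core
        and present[core["qom-path"]] != core["vcpus-count"]
    }


print(short_cores(hotpluggable, cpus))
# {'/machine/peripheral/core1': (1, 2)}
```

Only plugged cores carry a "qom-path", so unplugged slots are ignored by the check.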

Comment 4 David Gibson 2017-03-31 04:52:09 UTC
Problem also exists upstream.

Upstream patch sent for review.

Comment 5 David Gibson 2017-04-03 04:31:35 UTC
Karen,

I'm about to send a patch upstream, and it's pretty straightforward.  Can you give this a devel_ack please?

Comment 6 David Gibson 2017-04-04 01:26:06 UTC
Fix is merged upstream for 2.9, so we should get it in the rebase.

Comment 7 Min Deng 2017-04-26 09:19:47 UTC
The bug can be reproduced on the previous build.
QE verified the fix on the following builds:
kernel-3.10.0-657.el7.ppc64le
qemu-kvm-rhev-2.9.0-1.el7.ppc64le
SLOF-20170303-1.git66d250e.el7.noarch

Steps:
1. Boot up the guest with a similar command line:
  ..."-m 4G,slots=4,maxmem=8G -smp 2,maxcpus=4,cores=2,threads=2,sockets=1"
2. Run the following device_add commands; the only valid nr-threads value is 2 (based on comment 0 and comment 3):
2.1 {"execute": "device_add", "arguments": {"driver": "host-spapr-cpu-core", "core-id": 2, "nr-threads": 3, "id": "core1"}}

{"error": {"class": "GenericError", "desc": "invalid nr-threads 3, must be 2"}}

2.2 {"execute": "device_add", "arguments": {"driver": "host-spapr-cpu-core", "core-id": 2, "nr-threads": 1, "id": "core1"}}

{"error": {"class": "GenericError", "desc": "invalid nr-threads 1, must be 2"}}

Expected results:
A core with an invalid nr-threads value should not be added.

Actual results:
A core with an invalid nr-threads value can no longer be added.

Based on the above test results, the bug has been fixed. Thanks for everyone's help. Moving it to status VERIFIED.
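
For reference, the behavior verified above amounts to an equality check between the requested nr-threads and the threads value from -smp. The sketch below is a Python model of that observed behavior only, not the QEMU patch itself; `check_nr_threads` is a hypothetical name, and the error text mirrors the "desc" strings in the QMP replies.

```python
def check_nr_threads(nr_threads, smp_threads):
    """Return an error string for a mismatched nr-threads, or None if valid."""
    if nr_threads != smp_threads:
        return f"invalid nr-threads {nr_threads}, must be {smp_threads}"
    return None


# The guest was booted with threads=2, so 3 and 1 are rejected and 2 passes.
for requested in (3, 1, 2):
    print(requested, check_nr_threads(requested, smp_threads=2))
```

Rejecting the request before the core is realized is what keeps both the oversized case (nr-threads 3) and the undersized case (nr-threads 1) from reaching the code path shown in the backtrace.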

Comment 9 errata-xmlrpc 2017-08-02 04:35:59 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2017:2392

