Bug 1437337

Summary: Hotplug cpu cores with invalid nr_threads causes qemu-kvm coredump
Product: Red Hat Enterprise Linux 7 Reporter: Min Deng <mdeng>
Component: qemu-kvm-rhev Assignee: David Gibson <dgibson>
Status: CLOSED ERRATA QA Contact: Min Deng <mdeng>
Severity: high Docs Contact:
Priority: high    
Version: 7.4 CC: dgibson, knoel, michen, mrezanin, qzhang, virt-maint, zhengtli
Target Milestone: rc   
Target Release: ---   
Hardware: ppc64le   
OS: Unspecified   
Whiteboard:
Fixed In Version: qemu-kvm-rhev-2.9.0-1.el7 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2017-08-02 04:35:59 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Min Deng 2017-03-30 06:45:24 UTC
Description of problem:
Hotplug cpu cores with invalid nr_threads causes qemu-kvm coredump

Version-Release number of selected component (if applicable):
kernel-3.10.0-628.el7.ppc64le
qemu-kvm-rhev-2.9.0-0.el7.patchwork201703291116.ppc64le
SLOF-20170303-1.git66d250e.el7.noarch

How reproducible:
2/2

Steps to Reproduce:
1. Boot up the guest with:
  /usr/libexec/qemu-kvm -name virt-tests-vm1 -sandbox off -machine pseries-rhel7.4.0 -nodefaults -vga std -chardev socket,id=hmp_id_humanmonitor1,path=/tmp/monitor-humanmonitor1-20151207-185515-CKlGrjUv,server,nowait -mon chardev=hmp_id_humanmonitor1,mode=readline -chardev socket,id=qmp_id_qmp1,path=/tmp/monitor-qmp1-20151207-185515-CKlGrjUv,server,nowait -mon chardev=qmp_id_qmp1,mode=control -chardev socket,id=hmp_id_catch_monitor,path=/tmp/monitor-catch_monitor-20151207-185515-CKlGrjUv,server,nowait -mon chardev=hmp_id_catch_monitor,mode=readline -chardev socket,id=serial_id_serial0,path=/tmp/serial-serial0-20151207-185515-CKlGrjUv,server,nowait -device virtio-scsi-pci,id=virtio_scsi_pci0,bus=pci.0,addr=03,disable-legacy=off,disable-modern=off -drive id=drive_image1,if=none,cache=none,snapshot=off,aio=native,format=qcow2,file=rhel74-ppc64le-virtio-scsi-latest.qcow2 -device scsi-hd,id=image1,drive=drive_image1 -numa node -qmp tcp:0:4444,server,nowait -vnc :1 -rtc base=utc,clock=host,driftfix=slew -boot order=cdn,once=c,menu=off,strict=off -enable-kvm -monitor stdio -device nec-usb-xhci,id=usb1 -device usb-kbd,id=input0 -device usb-mouse,id=input1 -device usb-tablet,id=input2 -netdev tap,script=/etc/qemu-ifup,downscript=/etc/qemu-down,id=hostnet1,vhost=on -device virtio-net-pci,netdev=hostnet1,id=net1,mac=00:52:11:36:3f:00 -m 4G,slots=4,maxmem=8G -smp 2,maxcpus=4,cores=2,threads=2,sockets=1
2. Connect to the QMP monitor:
  telnet xx.xx.xx.xx port
3. Hotplug a core with an invalid nr-threads value:
  {"execute":"qmp_capabilities"}
  {"execute": "device_add", "arguments": {"driver": "host-spapr-cpu-core", "core-id": 2, "nr-threads": 3, "id": "core1"}}
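
The same QMP exchange can be scripted instead of typed into telnet. The following is a minimal sketch, not part of the original reproduction: the helper names build_device_add and qmp_session are hypothetical, and it assumes the QMP server sends one JSON message per line and may interleave asynchronous events with command replies.

```python
import json
import socket


def build_device_add(core_id, nr_threads, dev_id, driver="host-spapr-cpu-core"):
    """Return the device_add command used in the reproduction as a dict."""
    return {
        "execute": "device_add",
        "arguments": {
            "driver": driver,
            "core-id": core_id,
            "nr-threads": nr_threads,
            "id": dev_id,
        },
    }


def qmp_session(host, port, commands):
    """Send commands over a QMP TCP socket; return one reply per command.

    qmp_capabilities is sent first, so its reply is the first list entry.
    """
    replies = []
    with socket.create_connection((host, port)) as sock:
        stream = sock.makefile("rw", encoding="utf-8", newline="\n")
        json.loads(stream.readline())  # greeting banner sent on connect
        for cmd in [{"execute": "qmp_capabilities"}] + list(commands):
            stream.write(json.dumps(cmd) + "\n")
            stream.flush()
            while True:
                line = stream.readline()
                if not line:
                    # The peer closed the socket, e.g. because qemu-kvm aborted.
                    raise ConnectionError("QMP connection closed")
                msg = json.loads(line)
                if "return" in msg or "error" in msg:
                    replies.append(msg)
                    break  # anything else is treated as an event and skipped
    return replies


# Example call (host is a placeholder; 4444 comes from "-qmp tcp:0:4444"):
# replies = qmp_session("<host>", 4444, [build_device_add(2, 3, "core1")])
```

On an affected build, the call in the final comment would be expected to end in ConnectionError rather than a reply, because the process aborts before answering.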


Actual results:
After hotplugging
(qemu) [New Thread 0x3ffeabd2eaa0 (LWP 44083)]
[Thread 0x3ffeabd2eaa0 (LWP 44083) exited]
[New Thread 0x3ffeabd2eaa0 (LWP 44116)]
[Thread 0x3ffeabd2eaa0 (LWP 44116) exited]
qemu-kvm: /builddir/build/BUILD/qemu-2.9.0/numa.c:580: numa_get_node_for_cpu: Assertion `idx < max_cpus' failed.

Program received signal SIGABRT, Aborted.
0x00003fffb6f2edc8 in raise () from /lib64/libc.so.6
Missing separate debuginfos, use: debuginfo-install alsa-lib-1.1.3-3.el7.ppc64le bzip2-libs-1.0.6-13.el7.ppc64le cyrus-sasl-lib-2.1.26-21.el7.ppc64le cyrus-sasl-plain-2.1.26-21.el7.ppc64le dbus-libs-1.6.12-17.el7.ppc64le elfutils-libelf-0.168-5.el7.ppc64le elfutils-libs-0.168-5.el7.ppc64le flac-libs-1.3.0-5.el7_1.ppc64le glib2-2.50.3-2.el7.ppc64le glibc-2.17-189.el7.ppc64le gmp-6.0.0-15.el7.ppc64le gnutls-3.3.26-6.el7.ppc64le gperftools-libs-2.4-8.el7.ppc64le gsm-1.0.13-11.el7.ppc64le keyutils-libs-1.5.8-3.el7.ppc64le krb5-libs-1.15.1-5.el7.ppc64le libICE-1.0.9-5.el7.ppc64le libSM-1.2.2-2.el7.ppc64le libX11-1.6.4-4.el7.ppc64le libXau-1.0.8-2.1.el7.ppc64le libXext-1.3.3-3.el7.ppc64le libXi-1.7.9-1.el7.ppc64le libXtst-1.2.3-1.el7.ppc64le libaio-0.3.109-13.el7.ppc64le libasyncns-0.8-7.el7.ppc64le libattr-2.4.46-12.el7.ppc64le libcap-2.22-9.el7.ppc64le libcom_err-1.42.9-9.el7.ppc64le libcurl-7.29.0-39.el7.ppc64le libdb-5.3.21-20.el7.ppc64le libfdt-1.4.3-1.el7.ppc64le libffi-3.0.13-18.el7.ppc64le libgcc-4.8.5-14.el7.ppc64le libgcrypt-1.5.3-14.el7.ppc64le libgpg-error-1.12-3.el7.ppc64le libibverbs-13-1.el7.ppc64le libidn-1.28-4.el7.ppc64le libiscsi-1.9.0-7.el7.ppc64le libnl3-3.2.28-3.el7_3.ppc64le libogg-1.3.0-7.el7.ppc64le libpng-1.5.13-7.el7_2.ppc64le librdmacm-13-1.el7.ppc64le libseccomp-2.3.1-3.el7.ppc64le libselinux-2.5-11.el7.ppc64le libsndfile-1.0.25-10.el7.ppc64le libssh2-1.4.3-10.el7_2.1.ppc64le libstdc++-4.8.5-14.el7.ppc64le libtasn1-4.10-1.el7.ppc64le libusbx-1.0.20-1.el7.ppc64le libuuid-2.23.2-36.el7.ppc64le libvorbis-1.3.3-8.el7.ppc64le libxcb-1.12-1.el7.ppc64le lzo-2.06-8.el7.ppc64le nettle-2.7.1-8.el7.ppc64le nspr-4.13.1-1.0.el7.ppc64le nss-3.28.3-4.el7.ppc64le nss-softokn-freebl-3.28.3-2.el7.ppc64le nss-util-3.28.3-3.el7.ppc64le numactl-libs-2.0.9-6.el7_2.ppc64le openldap-2.4.44-3.el7.ppc64le openssl-libs-1.0.2k-4.el7.ppc64le p11-kit-0.23.5-1.el7.ppc64le pcre-8.32-17.el7.ppc64le pixman-0.34.0-1.el7.ppc64le pulseaudio-libs-10.0-3.el7.ppc64le 
snappy-1.1.0-3.el7.ppc64le systemd-libs-219-32.el7.ppc64le tcp_wrappers-libs-7.6-77.el7.ppc64le xz-libs-5.2.2-1.el7.ppc64le zlib-1.2.7-17.el7.ppc64le
(gdb) bt
#0  0x00003fffb6f2edc8 in raise () from /lib64/libc.so.6
#1  0x00003fffb6f30f4c in abort () from /lib64/libc.so.6
#2  0x00003fffb6f24b44 in __assert_fail_base () from /lib64/libc.so.6
#3  0x00003fffb6f24c34 in __assert_fail () from /lib64/libc.so.6
#4  0x000000005b51dc38 in numa_get_node_for_cpu (idx=<optimized out>) at /usr/src/debug/qemu-2.9.0/numa.c:580
#5  0x000000005b5a8e68 in spapr_cpu_core_realize (dev=<optimized out>, errp=0x3fffffffc9e0) at /usr/src/debug/qemu-2.9.0/hw/ppc/spapr_cpu_core.c:183
#6  0x000000005b6ede90 in device_set_realized (obj=<optimized out>, value=<optimized out>, errp=0x3fffffffcc00) at hw/core/qdev.c:939
#7  0x000000005b7b6f00 in property_set_bool (obj=0x5c82c580, v=<optimized out>, name=<optimized out>, opaque=0x5ddd0b50, errp=0x3fffffffcc00) at qom/object.c:1860
#8  0x000000005b7b9888 in object_property_set (obj=0x5c82c580, v=0x5c9d09c0, name=0x5b8f3710 "realized", errp=0x3fffffffcc00) at qom/object.c:1094
#9  0x000000005b7bcb0c in object_property_set_qobject (obj=0x5c82c580, value=<optimized out>, name=<optimized out>, errp=<optimized out>) at qom/qom-qobject.c:27
#10 0x000000005b7b9b4c in object_property_set_bool (obj=0x5c82c580, value=<optimized out>, name=<optimized out>, errp=<optimized out>) at qom/object.c:1163
#11 0x000000005b69eacc in qdev_device_add (opts=0x5c7e21c0, errp=0x3fffffffcd40) at qdev-monitor.c:623
#12 0x000000005b69f550 in qmp_device_add (qdict=<optimized out>, ret_data=<optimized out>, errp=0x3fffffffcdd8) at qdev-monitor.c:800
#13 0x000000005b8ab9d4 in do_qmp_dispatch (errp=0x3fffffffcdd0, request=<optimized out>, cmds=0x5bb1d910 <qmp_commands>) at qapi/qmp-dispatch.c:104
#14 qmp_dispatch (cmds=0x5bb1d910 <qmp_commands>, request=<optimized out>) at qapi/qmp-dispatch.c:131
#15 0x000000005b50f934 in handle_qmp_command (parser=<optimized out>, tokens=<optimized out>) at /usr/src/debug/qemu-2.9.0/monitor.c:3729
#16 0x000000005b8b3be0 in json_message_process_token (lexer=0x5c843888, input=0x5c8c0320, type=<optimized out>, x=<optimized out>, y=<optimized out>)
    at qobject/json-streamer.c:105
#17 0x000000005b8dc8f8 in json_lexer_feed_char (lexer=0x5c843888, ch=<optimized out>, flush=false) at qobject/json-lexer.c:319
#18 0x000000005b8dca34 in json_lexer_feed (lexer=0x5c843888, buffer=<optimized out>, size=<optimized out>) at qobject/json-lexer.c:369
#19 0x000000005b8b3d3c in json_message_parser_feed (parser=<error reading variable: value has been optimized out>, buffer=<optimized out>, size=<optimized out>)
    at qobject/json-streamer.c:124
#20 0x000000005b50d9f4 in monitor_qmp_read (opaque=<optimized out>, buf=<optimized out>, size=<optimized out>) at /usr/src/debug/qemu-2.9.0/monitor.c:3772
#21 0x000000005b83de1c in qemu_chr_be_write_impl (len=<optimized out>, buf=<optimized out>, s=<optimized out>) at chardev/char.c:284
#22 qemu_chr_be_write (s=<optimized out>, buf=<optimized out>, len=<optimized out>) at chardev/char.c:296
#23 0x000000005b847868 in tcp_chr_read (chan=<optimized out>, cond=<optimized out>, opaque=<optimized out>) at chardev/char-socket.c:411
#24 0x000000005b85be44 in qio_channel_fd_source_dispatch (source=<optimized out>, callback=<optimized out>, user_data=<optimized out>) at io/channel-watch.c:84
#25 0x00003fffb7473ab0 in g_main_context_dispatch () from /lib64/libglib-2.0.so.0
#26 0x000000005b8bc224 in glib_pollfds_poll () at util/main-loop.c:213
#27 os_host_main_loop_wait (timeout=<optimized out>) at util/main-loop.c:258
#28 main_loop_wait (nonblocking=<optimized out>) at util/main-loop.c:506
#29 0x000000005b4abee8 in main_loop () at vl.c:1898
#30 main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at vl.c:4720
(gdb) 
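
A rough model of why the assertion `idx < max_cpus` at numa.c:580 can fire for this request. It assumes, and this report does not show it, that spapr_cpu_core_realize gives thread i of a core the index core-id + i before calling numa_get_node_for_cpu. The function names below are made up for illustration.

```python
def thread_indexes(core_id, nr_threads):
    """Assumed numbering: thread i of a core gets index core_id + i."""
    return [core_id + i for i in range(nr_threads)]


def first_out_of_range(core_id, nr_threads, max_cpus):
    """Return the first thread index that violates idx < max_cpus, or None."""
    for idx in thread_indexes(core_id, nr_threads):
        if idx >= max_cpus:
            return idx
    return None


# Values from the reproduction: maxcpus=4, core-id=2, nr-threads=3.
print(first_out_of_range(core_id=2, nr_threads=3, max_cpus=4))
# With nr-threads equal to threads=2, every index stays below maxcpus.
print(first_out_of_range(core_id=2, nr_threads=2, max_cpus=4))
```

Under that assumption, the third thread of the requested core gets index 4, which is not below maxcpus=4, so the assertion trips on that thread.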

Expected results:
Because nr-threads does not match the threads=2 value set on the command line, the hotplug should fail. However, the failure should have no other negative effect, such as a core dump.

Additional info:

The guest was also started with "-numa node".

Comment 2 Min Deng 2017-03-30 06:50:02 UTC
This is a ppc64le-specific issue; it does not reproduce on x86.

Comment 3 Min Deng 2017-03-30 07:06:35 UTC
Additional information about this bug:
With nr-threads = 1, the hotplug succeeds. Is this the expected behavior?

Steps,
1.{"execute":"qmp_capabilities"}
{"return": {}}

2.{"execute": "query-hotpluggable-cpus"}
{"return": [{"props": {"core-id": 2}, "vcpus-count": 2, "type": "host-spapr-cpu-core"}, {"props": {"core-id": 0}, "vcpus-count": 2, "qom-path": "/machine/unattached/device[0]", "type": "host-spapr-cpu-core"}]}

3.{"execute": "device_add", "arguments": {"driver": "host-spapr-cpu-core", "core-id": 2, "nr-threads": 1, "id": "core1"}}
{"return": {}}

4.{"execute": "query-hotpluggable-cpus"}
{"return": [{"props": {"core-id": 2}, "vcpus-count": 2, "qom-path": "/machine/peripheral/core1", "type": "host-spapr-cpu-core"}, {"props": {"core-id": 0}, "vcpus-count": 2, "qom-path": "/machine/unattached/device[0]", "type": "host-spapr-cpu-core"}]}

5.{"execute": "query-cpus"}
{"return": [{"arch": "ppc", "current": true, "CPU": 0, "nip": -4611686018426750380, "qom_path": "/machine/unattached/device[0]/thread[0]", "halted": false, "thread_id": 47258}, {"arch": "ppc", "current": false, "CPU": 1, "nip": -4611686018426750380, "qom_path": "/machine/unattached/device[0]/thread[1]", "halted": false, "thread_id": 47259}, {"arch": "ppc", "current": false, "CPU": 2, "nip": -4611686018426750380, "qom_path": "/machine/peripheral/core1/thread[0]", "halted": false, "thread_id": 47391}]}
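
The mismatch in these replies can be checked mechanically: query-hotpluggable-cpus reports "vcpus-count": 2 for core-id 2, while query-cpus lists only one thread under /machine/peripheral/core1. A small sketch that counts threads per core from the qom_path values above; the helper name threads_per_core is hypothetical, and the other reply fields are omitted for brevity.

```python
from collections import Counter


def threads_per_core(query_cpus_return):
    """Count vCPU threads per core by stripping the /thread[N] suffix."""
    counts = Counter()
    for cpu in query_cpus_return:
        core_path, _, _ = cpu["qom_path"].rpartition("/thread[")
        counts[core_path] += 1
    return dict(counts)


# qom_path values copied from the query-cpus reply in step 5.
reply = [
    {"CPU": 0, "qom_path": "/machine/unattached/device[0]/thread[0]"},
    {"CPU": 1, "qom_path": "/machine/unattached/device[0]/thread[1]"},
    {"CPU": 2, "qom_path": "/machine/peripheral/core1/thread[0]"},
]

for core, count in threads_per_core(reply).items():
    print(core, count)
```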

Comment 4 David Gibson 2017-03-31 04:52:09 UTC
Problem also exists upstream.

Upstream patch sent for review.

Comment 5 David Gibson 2017-04-03 04:31:35 UTC
Karen,

I'm about to send a patch upstream, and it's pretty straightforward.  Can you give this a devel_ack please?

Comment 6 David Gibson 2017-04-04 01:26:06 UTC
Fix is merged upstream for 2.9, so we should get it in the rebase.

Comment 7 Min Deng 2017-04-26 09:19:47 UTC
The bug can be reproduced on the previous build.
QE verified the fix on the following builds:
kernel-3.10.0-657.el7.ppc64le
qemu-kvm-rhev-2.9.0-1.el7.ppc64le
SLOF-20170303-1.git66d250e.el7.noarch

Steps,
1. Boot up the guest with a similar command line:
  ..."-m 4G,slots=4,maxmem=8G -smp 2,maxcpus=4,cores=2,threads=2,sockets=1"
2. Ran the following steps, where the only valid nr-threads value is 2 (based on comment 0 and comment 3):
2.1{"execute": "device_add", "arguments": {"driver": "host-spapr-cpu-core", "core-id": 2, "nr-threads": 3, "id": "core1"}}

{"error": {"class": "GenericError", "desc": "invalid nr-threads 3, must be 2"}}

2.2{"execute": "device_add", "arguments": {"driver": "host-spapr-cpu-core", "core-id": 2, "nr-threads": 1, "id": "core1"}}

{"error": {"class": "GenericError", "desc": "invalid nr-threads 1, must be 2"}}
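
The two error replies above are consistent with a check that rejects any nr-threads value other than the threads= value given to -smp. A minimal sketch of such a check follows; only the error text is taken from the replies, while the function name and structure are assumptions and not the upstream patch.

```python
def check_nr_threads(nr_threads, smp_threads):
    """Return an error description for a mismatched nr-threads, else None."""
    if nr_threads != smp_threads:
        return "invalid nr-threads %d, must be %d" % (nr_threads, smp_threads)
    return None


# The guest was booted with threads=2; try the values from steps 2.1 and 2.2.
for requested in (3, 1, 2):
    print(requested, check_nr_threads(requested, smp_threads=2))
```

Rejecting the request up front means both the too-large case from comment 0 and the too-small case from comment 3 fail the same way.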

Expected results,
A core with an invalid nr-threads value should not be added.

Actual results,
A core with an invalid nr-threads value can no longer be added.


Based on the above test results, the bug has been fixed. Thanks for everyone's help. Moving the bug to VERIFIED.

Comment 9 errata-xmlrpc 2017-08-02 04:35:59 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2017:2392