Bug 1614610
Summary: | Guest quit with error when hotunplug cpu | ||||||
---|---|---|---|---|---|---|---|
Product: | Red Hat Enterprise Linux 7 | Reporter: | Xujun Ma <xuma> | ||||
Component: | qemu-kvm-rhev | Assignee: | Laurent Vivier <lvivier> | ||||
Status: | CLOSED ERRATA | QA Contact: | Xujun Ma <xuma> | ||||
Severity: | high | Docs Contact: | |||||
Priority: | high | ||||||
Version: | 7.6 | CC: | dgibson, ehabkost, jinzhao, lvivier, mdeng, micai, mrezanin, pbonzini, qzhang, virt-maint, xianwang, xuma, yhong, yihyu | ||||
Target Milestone: | rc | ||||||
Target Release: | --- | ||||||
Hardware: | ppc64le | ||||||
OS: | Unspecified | ||||||
Whiteboard: | |||||||
Fixed In Version: | qemu-kvm-rhev-2.12.0-22.el7 | Doc Type: | If docs needed, set a value | ||||
Doc Text: | Story Points: | --- | |||||
Clone Of: | |||||||
: | 1665844 | Environment: | |||||
Last Closed: | 2019-08-22 09:18:48 UTC | Type: | Bug | ||||
Regression: | --- | Mount Type: | --- | ||||
Documentation: | --- | CRM: | |||||
Verified Versions: | Category: | --- | |||||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||
Cloudforms Team: | --- | Target Upstream Version: | |||||
Embargoed: | |||||||
Bug Depends On: | |||||||
Bug Blocks: | 1649160, 1665844, 1668205 | ||||||
Attachments: |
|
Description
Xujun Ma
2018-08-10 03:39:34 UTC
Set needinfo to Xujun for x86 test results.

(In reply to Qunfang Zhang from comment #2)
> Set needinfo to Xujun for x86 test results.

x86 has the same problem with guest kernel 3.10.0-931.el7.x86_64.

(In reply to Xujun Ma from comment #0)
> Description of problem:
> Guest quit with error when hotunplug cpu.
>
> Version-Release number of selected component (if applicable):
> qemu-kvm-rhev-2.12.0-9.el7.ppc64le
> guest:3.10.0-931.el7.ppc64
>
> How reproducible:
> 1/5
>
> Steps to Reproduce:
> 1.Boot up guest with command:
> MALLOC_PERTURB_=1 /usr/libexec/qemu-kvm \
> -name 'avocado-vt-vm1' \
> -sandbox off \
> -machine pseries \
> -nodefaults \
> -device VGA,bus=pci.0,addr=0x2 \
> -chardev socket,id=qmp_id_qmpmonitor1,path=/var/tmp/avocado_iz0Gna/monitor-qmpmonitor1-20180809-215932-vNtDCgIK,server,nowait \
> -mon chardev=qmp_id_qmpmonitor1,mode=control \
> -chardev socket,id=qmp_id_catch_monitor,path=/var/tmp/avocado_iz0Gna/monitor-catch_monitor-20180809-215932-vNtDCgIK,server,nowait \
> -mon chardev=qmp_id_catch_monitor,mode=control \
> -chardev socket,id=serial_id_serial0,path=/var/tmp/avocado_iz0Gna/serial-serial0-20180809-215932-vNtDCgIK,server,nowait \
> -device spapr-vty,reg=0x30000000,chardev=serial_id_serial0 \
> -device nec-usb-xhci,id=usb1,bus=pci.0,addr=0x3 \
> -drive id=drive_image1,if=none,snapshot=off,aio=threads,cache=none,format=qcow2,file=/home/kar/vt_test_images/rhel76-ppc64-virtio.qcow2 \
> -device virtio-blk-pci,id=image1,drive=drive_image1,bootindex=0,bus=pci.0,addr=0x4 \
> -device virtio-net-pci,mac=9a:2b:2c:2d:2e:2f,id=idDVCWwS,vectors=4,netdev=id6XSmsF,bus=pci.0,addr=0x5 \
> -netdev tap,id=id6XSmsF,vhost=on,vhostfd=11,fd=17 \
> -m 8192 \
> -smp 8,maxcpus=64,cores=1,threads=8,sockets=1 \
> -device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1 \
> -vnc :0 \
> -rtc base=utc,clock=host \
> -boot menu=off,strict=off,order=cdn,once=c \
> -enable-kvm
>
> 2.Hotplug cpus to max 64
> 3.Offline all cpus till latest cpu
> chcpu -d 0-63
> 4.Hotunplug all cpus

Please provide the exact steps here, and any scripts you use to do it.

> Actual results:
> Guest quit with error as following when hotunplug cpu.
> [qemu output] qemu:qemu_cpu_kick_thread: No such process
> [qemu output] (Process terminated with status 1)

I'd suspect an attempt to unplug cpu0, which is not supported on x86 and probably not supported on ppc as well.

Anyway, let's see how cpus are unplugged and go from there.

> Expected results:
> Guest no crash when hotunplug cpu.
>
> Additional info:

In discussion with Igor, it looks like these are similar bugs in both the POWER and x86 code, rather than a generic bug. Therefore moving back to ppc64; Igor will clone for the x86 version of the bug.

Also moving to RHEL7.7, since it's not urgent enough for 7.6.

(In reply to Igor Mammedov from comment #4)
> (In reply to Xujun Ma from comment #0)
> > ...
> > 2.Hotplug cpus to max 64
> > 3.Offline all cpus till latest cpu
> > chcpu -d 0-63
> > 4.Hotunplug all cpus
> Please provide the exact steps here, and any scripts you use to do it.

Sorry for replying so late.
Send command: {'execute': 'device_del', 'arguments': {'id': 'core8'}, 'id': 'PQiEahED'}
....
Send command: {'execute': 'device_del', 'arguments': {'id': 'core56'}, 'id': 'PQiEahED'}

> > Actual results:
> > Guest quit with error as following when hotunplug cpu.
> > [qemu output] qemu:qemu_cpu_kick_thread: No such process
> > [qemu output] (Process terminated with status 1)
> I'd suspect an attempt to unplug cpu0, which is not supported on x86 and
> probably not supported on ppc as well.

That's right, actually I mean hot unplug all the cpus that were hotplugged.

> Anyway, let's see how cpus are unplugged and go from there.

Will upload a testing log.

> > Expected results:
> > Guest no crash when hotunplug cpu.
> >
> > Additional info:

Created attachment 1482250
log
rhel8 qemu-kvm-2.12.0-26.el8+1648+9c120fe6.ppc64le has the same issue.

I'm not able to reproduce it with host kernel 3.10.0-944.
Could you try to reproduce the problem with this kernel on the host?

I'm not able to reproduce the problem with the exact same versions of QEMU and kernel (host and guest).

1- starting with:

-smp 8,maxcpus=64,cores=1,threads=8,sockets=1

2- hotplugging CPUs with:

device_add driver=host-spapr-cpu-core core-id=8 id=core-8
device_add driver=host-spapr-cpu-core core-id=16 id=core-16
device_add driver=host-spapr-cpu-core core-id=24 id=core-24
device_add driver=host-spapr-cpu-core core-id=32 id=core-32
device_add driver=host-spapr-cpu-core core-id=40 id=core-40
device_add driver=host-spapr-cpu-core core-id=48 id=core-48
device_add driver=host-spapr-cpu-core core-id=56 id=core-56
device_add driver=host-spapr-cpu-core core-id=64 id=core-64

3- disabling CPUs with:

chcpu -d 8-63

4- unplugging CPUs with:

device_del id=core-8
device_del id=core-16
device_del id=core-24
device_del id=core-32
device_del id=core-40
device_del id=core-48
device_del id=core-56
device_del id=core-64

(In reply to Laurent Vivier from comment #9)
> I'm not able to reproduce it with host kernel 3.10.0-944.
> Could you try to reproduce the problem with this kernel on the host?

It can't be reproduced every time; the issue happens with 1/5 probability.

Could you re-test with kernel-3.10.0-967 on the host?

test env:
host:kernel-3.10.0-967.el7.ppc64le
qemu-kvm-rhev-2.12.0-18.el7.ppc64le
guest:kernel-3.10.0-967.el7.ppc64le
Ran this case 100 times and hit this problem 6 times, so the bug hasn't been fixed.

Xujun,
could you provide the script to reproduce the problem?
I tried to reproduce the commands I can see in attachment 1482250, but I'm not able to reproduce the problem in 1000 attempts.
The following script has been running for 2 days (5000 loops) and hasn't triggered any problem:

IP=root.122.80
QMP=$HOME/qemu/scripts/qmp/qmp-shell

function query_cpu {
    echo "query-cpus" | sudo $QMP /tmp/qmp0
}

function plug {
    for i in $(seq 8 8 $1); do
        echo "query-cpus"
        echo "device_add driver=host-spapr-cpu-core core-id=$i id=core-$i";
    done | sudo $QMP /tmp/qmp0
}

function unplug {
    for i in $(seq 8 8 $1); do
        echo "query-cpus"
        echo "device_del id=core-$i";
    done | sudo $QMP /tmp/qmp0
}

function remote_getconf {
    ssh $IP getconf $1
}

function get_nb_plugged_cpus {
    ssh $IP lscpu | sed -n "/^CPU(s)/s/^CPU(s):[^[:digit:]]*\(.*\)/\1/p"
}

ssh -f $IP "stress-ng --cpu 64 --io 4 --vm 2 --vm-bytes 256M -l 100"

l=0
while plug 63; do
    while [ "$(remote_getconf _NPROCESSORS_ONLN)" != 64 ] ; do
        :
    done
    ssh $IP lscpu
    ssh $IP chcpu -d 1-63
    ssh $IP lscpu
    unplug 63
    while [ "$(get_nb_plugged_cpus)" != 8 ] ; do
        :
    done
    ssh $IP lscpu
    ssh $IP chcpu -e 0-7
    ssh $IP lscpu
    l=$((l+1))
    echo "LOOP $l DONE"
done

(In reply to Xujun Ma from comment #0)
> Description of problem:
...
> 4.Hotunplug all cpus
>
>
> Actual results:
> Guest quit with error as following when hotunplug cpu.
> [qemu output] qemu:qemu_cpu_kick_thread: No such process
> [qemu output] (Process terminated with status 1)

The only explanation I can find for this error is a race condition between qemu_kvm_cpu_thread_fn(), which releases the thread, and qemu_cpu_kick_thread(), which could be using the same thread.

Paolo, any idea?

(In reply to Laurent Vivier from comment #16)
> (In reply to Xujun Ma from comment #0)
> > Description of problem:
> ...
> > 4.Hotunplug all cpus
> >
> >
> > Actual results:
> > Guest quit with error as following when hotunplug cpu.
> > [qemu output] qemu:qemu_cpu_kick_thread: No such process
> > [qemu output] (Process terminated with status 1)
>
> The only explanation I can find for this error is a race condition between
> qemu_kvm_cpu_thread_fn(), which releases the thread, and qemu_cpu_kick_thread(),
> which could be using the same thread.
I think in this case we could ignore the error by doing:

--- a/cpus.c
+++ b/cpus.c
@@ -1700,7 +1700,7 @@ static void qemu_cpu_kick_thread(CPUState *cpu)
     }
     cpu->thread_kicked = true;
     err = pthread_kill(cpu->thread->thread, SIG_IPI);
-    if (err) {
+    if (err && err != ESRCH) {
         fprintf(stderr, "qemu:%s: %s", __func__, strerror(err));
         exit(1);
     }

Hi Laurent,
I'm very sorry, I was late fetching the scratch build and it's closed. Could you provide a new one?

Tested 100 times and did not hit this problem with build qemu-kvm-rhev-2.12.0-20.el7.BZ1614610.ppc64le.

Patch sent upstream:

cpus: ignore ESRCH in qemu_cpu_kick_thread()
https://patchwork.ozlabs.org/patch/1020005/

Fix included in qemu-kvm-rhev-2.12.0-22.el7

Verified this issue with qemu-kvm-rhev-2.12.0-25.el7.ppc64le.
Tested this scenario 100 times and didn't hit this problem again.
Based on the test results, the bug has been fixed, so setting status to VERIFIED.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2019:2553 |
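For readers unfamiliar with the failure mode: "No such process" in the QEMU output is the error text for ESRCH, which signal delivery reports when the target is already gone. The bash sketch below illustrates the same idea with ordinary processes standing in for vCPU threads. It is an analogy only, not QEMU code: `kick_process` is a hypothetical helper introduced here for illustration, and matching on the English error text is an assumption made to keep the sketch short.

```shell
#!/usr/bin/env bash
# Analogy only: ordinary processes stand in for QEMU vCPU threads.
# kick_process is a hypothetical helper, not part of QEMU.

# Send SIGUSR1 to PID. Return 0 if the signal was delivered or the
# target no longer exists (ESRCH); return 1 for any other failure.
kick_process() {
    local pid=$1 err
    # LC_ALL=C keeps the error text in English so it can be matched below.
    if err=$(LC_ALL=C; kill -s USR1 "$pid" 2>&1); then
        return 0                          # signal delivered
    fi
    case $err in
        *"No such process"*) return 0 ;;  # target already gone: ignore
    esac
    echo "kick failed: $err" >&2          # any other error is still reported
    return 1
}

# Demo: the target exits and is reaped before the kick is sent.
true &
target=$!
wait "$target"
if kick_process "$target"; then
    echo "kick after exit was ignored"
fi
```

The sketch follows the same reasoning as the one-line change to `qemu_cpu_kick_thread()`: if the target has already exited there is nothing left to wake up, so the failed kick is harmless, while any other failure is still treated as an error.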