Bug 1427801
Summary: CPU Hotunplug throws "error: operation failed: vcpu unplug request timed out" and leaves hotplug operation non functional
Product: [Community] Virtualization Tools
Component: libvirt
Status: CLOSED NEXTRELEASE
Severity: medium
Priority: unspecified
Version: unspecified
Hardware: ppc64le
OS: Linux
Reporter: Satheesh Rajendran <sathnaga>
Assignee: Peter Krempa <pkrempa>
CC: hannsj_uhl, libvirt-maint, pkrempa, rbalakri, sathnaga
Type: Bug
Last Closed: 2017-03-10 08:13:21 UTC
Bug Blocks: 1428893
Description
Satheesh Rajendran
2017-03-01 09:22:35 UTC
It is always reproducible:
# for i in {1..10};do virsh destroy virt-tests-vm1;virsh start virt-tests-vm1;sleep 120;virsh setvcpus virt-tests-vm1 16 --live;sleep 120;virsh vcpucount virt-tests-vm1 --guest;virsh setvcpus virt-tests-vm1 8 --live;sleep 120;virsh vcpucount virt-tests-vm1 --guest;done
Each of the 10 iterations prints the same output:
Domain virt-tests-vm1 destroyed
Domain virt-tests-vm1 started
16
error: operation failed: vcpu unplug request timed out
8
Please attach the debug log of libvirtd while this bug reproduces.
Created attachment 1259113
libvirtdlog
Attached libvirtd debug log for the below commands
# virsh setvcpus virt-tests-vm1 16 --live
# virsh vcpucount virt-tests-vm1 --guest
# virsh setvcpus virt-tests-vm1 8 --live
The logs you've provided are from the libvirt client library, not libvirtd - the daemon. I specifically asked for the daemon logs since the client logs are pretty useless. Please attach the proper log from the daemon.
Created attachment 1259169
libvirt daemon log
The logs were captured while the command below was run:
#virsh setvcpus virt-tests-vm1 8 --live
error: operation failed: vcpu unplug request timed out
That's unfortunately not enough. The code behaves exactly as it should during the first failure: the guest OS did not allow the vCPUs to be unplugged, and thus libvirt reported "vcpu unplug request timed out". This is expected. The issue of hotplug breaking completely after that is not captured in this log. Please reproduce it fully and post logs covering the whole sequence.
Created attachment 1259407
libvirt daemon log
Attached libvirtd daemon log for the following series of events
# virsh vcpucount virt-tests-vm1 --guest
8
# virsh setvcpus virt-tests-vm1 16 --live
# virsh vcpucount virt-tests-vm1 --guest
16
# virsh setvcpus virt-tests-vm1 8 --live
error: operation failed: vcpu unplug request timed out
# virsh vcpucount virt-tests-vm1 --guest
8
# virsh setvcpus virt-tests-vm1 16 --live
# virsh vcpucount virt-tests-vm1 --guest
8
# virsh vcpucount virt-tests-vm1 --guest
8
# virsh setvcpus virt-tests-vm1 8 --live
error: internal error: unable to execute QEMU command 'device_del': Device 'vcpu8' not found
# virsh vcpucount virt-tests-vm1 --guest
8
# virsh setvcpus virt-tests-vm1 16 --live
# echo $?
0
# virsh vcpucount virt-tests-vm1 --guest
8
# virsh setvcpus virt-tests-vm1 8 --live
error: internal error: unable to execute QEMU command 'device_del': Device 'vcpu8' not found
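The sequence above can be scripted so that the silent failure (setvcpus exits 0 but the guest count does not change) is flagged automatically. This is only a minimal sketch: it assumes `virsh` is on PATH, and `hotplug_cycle`, `find_silent_failures`, and the `run` and `settle_seconds` parameters are hypothetical names introduced here, not part of libvirt. A mismatch seen immediately after setvcpus may only mean the guest has not finished onlining the vCPUs yet, which is why the original loop slept for 120 seconds between steps.

```python
import subprocess
import time


def virsh(*args):
    """Run one virsh command; return (exit_code, combined stdout+stderr)."""
    proc = subprocess.run(["virsh", *args], capture_output=True, text=True)
    return proc.returncode, (proc.stdout + proc.stderr).strip()


def hotplug_cycle(domain, high, low, run=virsh, settle_seconds=0):
    """Plug up to `high` vCPUs, then unplug down to `low`, recording each step.

    `run` is injectable (an assumption of this sketch) so the logic can be
    exercised without a hypervisor.
    """
    steps = []
    for target in (high, low):
        code, message = run("setvcpus", domain, str(target), "--live")
        time.sleep(settle_seconds)  # give the guest time to react
        _, count = run("vcpucount", domain, "--guest")
        steps.append({"target": target, "exit": code,
                      "message": message, "guest_count": count})
    return steps


def find_silent_failures(steps):
    """Steps where setvcpus reported success but the guest count differs."""
    return [s for s in steps
            if s["exit"] == 0 and s["guest_count"] != str(s["target"])]
```

For example, `find_silent_failures(hotplug_cycle("virt-tests-vm1", 16, 8, settle_seconds=120))` would return the plug step from the log above, where setvcpus exited with 0 while the guest still reported 8 vCPUs.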
Fixed upstream:
commit 8af68ea47830b8d32907dc50c6ca4869d14bb862
Author: Peter Krempa <pkrempa>
Date: Fri Mar 3 16:04:57 2017 +0100
    qemu: hotplug: Reset device removal waiting code after vCPU unplug
    If the delivery of the DEVICE_DELETED event for the vCPU being deleted would time out, the code would not call 'qemuDomainResetDeviceRemoval'. Since the waiting thread did not unregister itself prior to stopping the waiting the monitor code would try to wake it up instead of dispatching it to the event worker. As a result the unplug process would not be completed and the definition would not be updated.
I am able to reproduce the issue with this patched libvirt as well. Am I missing something else?
libvirt was compiled at the commit below:
commit a6d681485ff85e27859583a5c20e1630c5cf8352
Author: John Ferlan <jferlan>
Date: Tue Mar 7 16:10:38 2017 -0500
Using an F25 guest with an upstream kernel.
# uname -a
Linux atest-guest 4.11.0-rc3 #1 SMP Mon Mar 20 08:59:24 EDT 2017 ppc64le ppc64le ppc64le GNU/Linux
# virsh setvcpus vm1 255 --live;echo $?;virsh vcpucount vm1 --guest
0
154
# virsh vcpucount vm1 --guest
255
# virsh setvcpus vm1 1 --live;echo $?;virsh vcpucount vm1 --guest
error: operation failed: vcpu unplug request timed out
1
error: Guest agent is not responding: Guest agent not available for now
# virsh vcpucount vm1 --guest
254
Error log from the libvirtd log:
2017-03-21 06:06:44.451+0000: 163682: error : qemuDomainHotplugDelVcpu:5383 : operation failed: vcpu unplug request timed out
2017-03-21 06:06:49.481+0000: 163684: error : qemuAgentSend:915 : Guest agent is not responding: Guest agent not available for now
2017-03-21 06:08:14.691+0000: 163686: error : qemuDomainHotplugDelVcpu:5383 : operation failed: vcpu unplug request timed out
Let me know in case you need a log with debug enabled.
The operation may still time out. My patches fixed a different problem: when the operation actually finished in the guest after the timeout had been reported, and the hypervisor was then able to detach the vCPUs, the libvirt data structures were not updated, and thus any further CPU unplug attempt failed. They are now updated properly in that case. Since the guest may stall for an indeterminate amount of time before it actually unplugs the vCPUs, the timeout code still needs to stay in place.
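The mechanism described in the commit message and in the comment above can be illustrated with a toy model. This is a sketch under stated assumptions, not libvirt code: the class `RemovalTracker`, its methods, and the route labels are hypothetical names chosen for illustration, and the real logic lives in libvirt's C sources around `qemuDomainResetDeviceRemoval`. The model only shows why a waiter that stays registered after a timeout causes a late DEVICE_DELETED event to be routed to nobody, leaving the definition stale.

```python
class RemovalTracker:
    """Toy model of a 'wait for DEVICE_DELETED' handshake (not libvirt code)."""

    def __init__(self, vcpus):
        self.definition = set(vcpus)  # devices the management layer believes exist
        self.waiting_for = None       # alias a thread is registered as waiting on
        self.signalled = False

    def mark_for_removal(self, alias):
        self.waiting_for = alias
        self.signalled = False

    def reset_removal(self):
        # Unregister the waiter so later events go to the event worker.
        self.waiting_for = None

    def on_device_deleted(self, alias):
        """Event handler: wake a registered waiter, else finish in the worker."""
        if self.waiting_for == alias:
            self.signalled = True     # the waiter is expected to finish the removal
            return "woke-waiter"
        self.definition.discard(alias)  # event-worker path updates the definition
        return "event-worker"

    def unplug(self, alias, event_arrives_in_time, reset_on_timeout):
        self.mark_for_removal(alias)
        if event_arrives_in_time:
            self.on_device_deleted(alias)
            self.definition.discard(alias)  # waiter completes the removal itself
            self.reset_removal()
            return "ok"
        if reset_on_timeout:          # behaviour after the fix
            self.reset_removal()
        return "timed out"


def late_event_outcome(reset_on_timeout):
    """Time out first, then let the guest finish the unplug late."""
    tracker = RemovalTracker({"vcpu8"})
    status = tracker.unplug("vcpu8", event_arrives_in_time=False,
                            reset_on_timeout=reset_on_timeout)
    route = tracker.on_device_deleted("vcpu8")
    return status, route, "vcpu8" in tracker.definition
```

In this model `late_event_outcome(False)` leaves `vcpu8` in the definition because the late event only "wakes" a thread that has already given up, while `late_event_outcome(True)` routes the event to the worker path and removes it. In both cases the unplug call itself still reports a timeout, which matches the point that the fix does not remove the timeout.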