Bug 1462092
Summary: The first time to hot-unplug vcpu failed after restart libvirtd during hotplug vcpu
Product: Red Hat Enterprise Linux 7
Reporter: Jingjing Shao <jishao>
Component: libvirt
Assignee: Peter Krempa <pkrempa>
Status: CLOSED ERRATA
QA Contact: Jingjing Shao <jishao>
Severity: unspecified
Priority: unspecified
Version: 7.4
CC: dyuan, jishao, lhuang, pkrempa, rbalakri, xuzhang, yalzhang
Target Milestone: rc
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Fixed In Version: libvirt-3.8.0-1.el7
Doc Type: If docs needed, set a value
Story Points: ---
Last Closed: 2018-04-10 10:48:37 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
Category: ---
oVirt Team: ---
Cloudforms Team: ---

Description
Jingjing Shao
2017-06-16 07:33:41 UTC
Created attachment 1288272 [details]
libvirtd.log
So the problem is that if you restart libvirtd while it is still finishing a long-running job, systemd may decide to forcefully kill it:

    Sep 25 22:16:51 andariel systemd[1]: libvirtd.service: State 'stop-sigterm' timed out. Killing.
    Sep 25 22:16:51 andariel systemd[1]: libvirtd.service: Killing process 24770 (libvirtd) with signal SIGKILL.
    Sep 25 22:16:51 andariel audit[1]: SERVICE_STOP pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg
    Sep 25 22:16:51 andariel systemd[1]: libvirtd.service: Main process exited, code=killed, status=9/KILL
    Sep 25 22:16:51 andariel systemd[1]: Stopped Virtualization daemon.
    Sep 25 22:16:51 andariel systemd[1]: libvirtd.service: Unit entered failed state.
    Sep 25 22:16:51 andariel systemd[1]: libvirtd.service: Failed with result 'timeout'.
    Sep 25 22:16:51 andariel systemd[1]: Starting Virtualization daemon...

The issue then is that libvirt may not finish creating cgroups for the new vcpu, so the later unplug fails because the cgroup did not exist yet.

Upstream will no longer report an error if the cgroup does not exist on cpu unplug:

    commit cf30a8cabd5943992e30c45efdd5fd7b82dd53cc
    Author: Peter Krempa <pkrempa>
    Date:   Mon Sep 25 22:34:44 2017 +0200

        qemu: hotplug: Ignore cgroup errors when hot-unplugging vcpus

        When the vcpu is successfully removed libvirt would remove the cgroup.
        In cases when removal of the cgroup fails libvirt would report an error.

        This does not make much sense, since the vcpu was removed and we can't
        really do anything with the cgroup. This patch silences the errors from
        cgroup removal.

Hi Peter,
I tested with the versions below, but the guest shuts down after the libvirtd restart.

    3.10.0-752.el7.x86_64
    qemu-kvm-rhev-2.10.0-3.el7.x86_64
    libvirt-3.8.0-1.el7.x86_64

1. # virsh vcpucount V
   maximum      config       200
   maximum      live         200
   current      config       3
   current      live         3

2. On one terminal:
   # virsh setvcpus V 200

3. On a second terminal:
   # service libvirtd restart

4. On the first terminal:
   # virsh setvcpus V 200
   error: Disconnected from qemu:///system due to keepalive timeout
   error: internal error: connection closed due to keepalive timeout

5. # service libvirtd restart
   Redirecting to /bin/systemctl restart libvirtd.service

6. # virsh list --all
    Id    Name                           State
   ----------------------------------------------------
    -     V                              shut off

What's the reason? Please post debug logs and VM log.

Hi Peter,
I can reproduce the result from comment 7 only about 50% of the time, and I did not notice anything special in the steps when it reproduced. I have attached the libvirtd log and the guest log; could you help check them? Thank you in advance.

Created attachment 1345809 [details]
libvirtd_part1.log
Created attachment 1345810 [details]
libvirtd_part2.log
Created attachment 1345811 [details]
guest.log
Tested with libvirt-3.9.0-3.virtcov.el7.x86_64 several times. I cannot reproduce the result in comment 6 and get the expected result below, so I am changing the status to verified.

1. # virsh vcpucount rhel
   maximum      config       200
   maximum      live         200
   current      config       3
   current      live         3

2. On one terminal:
   # virsh setvcpus rhel 200

3. On a second terminal:
   # service libvirtd restart

4. On the first terminal:
   # virsh setvcpus rhel 200
   error: Disconnected from qemu:///system due to keepalive timeout
   error: internal error: connection closed due to keepalive timeout

5. # service libvirtd restart
   Redirecting to /bin/systemctl restart libvirtd.service

   libvirtd starts successfully.

6. # virsh vcpucount rhel
   maximum      config       200
   maximum      live         200
   current      config       3
   current      live         39

7. # virsh setvcpus rhel 20

8. # virsh vcpucount rhel
   maximum      config       200
   maximum      live         200
   current      config       3
   current      live         20

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2018:0704
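The verification steps in this report compare the "current live" row of `virsh vcpucount` before and after the interrupted hotplug. For anyone scripting that check, a minimal sketch follows. The helper name `current_live_vcpus` is hypothetical (it is not part of virsh), and it assumes the four-row output layout shown in this report.

```shell
#!/bin/sh
# Hypothetical helper, not part of libvirt: read `virsh vcpucount <domain>`
# output on stdin and print the value of the "current live" row.
# Assumes the layout shown in this report: "<scope> <kind> <count>" per line.
current_live_vcpus() {
    awk '$1 == "current" && $2 == "live" { print $3 }'
}

# Example use against a running domain (the domain name "rhel" is taken
# from the steps in this report):
#   virsh vcpucount rhel | current_live_vcpus
```

Reading from stdin keeps the helper independent of a running libvirtd, so it can be tried against saved command output first.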