Bug 1462092

Summary: The first attempt to hot-unplug vcpus fails after libvirtd is restarted during vcpu hotplug
Product: Red Hat Enterprise Linux 7
Reporter: Jingjing Shao <jishao>
Component: libvirt
Assignee: Peter Krempa <pkrempa>
Status: CLOSED ERRATA
QA Contact: Jingjing Shao <jishao>
Severity: unspecified
Priority: unspecified
Version: 7.4
CC: dyuan, jishao, lhuang, pkrempa, rbalakri, xuzhang, yalzhang
Target Milestone: rc
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Fixed In Version: libvirt-3.8.0-1.el7
Last Closed: 2018-04-10 10:48:37 UTC
Type: Bug
Attachments:
Description          Flags
libvirtd.log         none
libvirtd_part1.log   none
libvirtd_part2.log   none
guest.log            none

Description Jingjing Shao 2017-06-16 07:33:41 UTC
Description of problem:
The first attempt to hot-unplug vcpus fails after libvirtd is restarted during a vcpu hotplug.

Version-Release number of selected component (if applicable):
libvirt-3.2.0-10.el7.x86_64

How reproducible:
100%

Steps to Reproduce:

1. # virsh vcpucount r7.2
maximum      config       200
maximum      live         200
current      config         3
current      live           3

2. On one terminal,
#  virsh setvcpus r7.2  200

3. On the second terminal,
#  service libvirtd restart

4. On the first terminal,
#  virsh setvcpus r7.2  200
error: Disconnected from qemu:///system due to keepalive timeout
error: internal error: connection closed due to keepalive timeout

5. #  service libvirtd restart
Redirecting to /bin/systemctl restart libvirtd.service

libvirtd starts successfully.

6. # virsh vcpucount r7.2
maximum      config       200
maximum      live         200
current      config         3
current      live          39

7. # virsh setvcpus r7.2  20
error: Failed to create controller cpu for group: No such file or directory

8. # virsh setvcpus r7.2  20
#

Actual results:
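As shown in step 7, the first hot-unplug attempt fails with a cgroup error. The second attempt (step 8) returns without an error.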

Expected results:
The first hot-unplug attempt should succeed.

Additional info:
The libvirtd.log is attached below.

Comment 2 Jingjing Shao 2017-06-16 07:49:54 UTC
Created attachment 1288272
libvirtd.log

Comment 3 Peter Krempa 2017-09-25 20:34:08 UTC
The problem is that if you restart libvirtd while it is still finishing a long-running job, systemd may decide to forcefully kill it:

Sep 25 22:16:51 andariel systemd[1]: libvirtd.service: State 'stop-sigterm' timed out. Killing.
Sep 25 22:16:51 andariel systemd[1]: libvirtd.service: Killing process 24770 (libvirtd) with signal SIGKILL.
Sep 25 22:16:51 andariel audit[1]: SERVICE_STOP pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg
Sep 25 22:16:51 andariel systemd[1]: libvirtd.service: Main process exited, code=killed, status=9/KILL
Sep 25 22:16:51 andariel systemd[1]: Stopped Virtualization daemon.
Sep 25 22:16:51 andariel systemd[1]: libvirtd.service: Unit entered failed state.
Sep 25 22:16:51 andariel systemd[1]: libvirtd.service: Failed with result 'timeout'.
Sep 25 22:16:51 andariel systemd[1]: Starting Virtualization daemon...

As a result, libvirt may not finish creating the cgroups for the new vcpus, so a later unplug fails because the cgroup does not exist.
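
To check whether a domain is in the state described above, one could compare the live vcpu count with the per-vcpu cgroup directories that actually exist. The following is only a minimal sketch: it assumes a cgroup v1 layout in which each vcpu has a `vcpuN` directory under the domain's cpu controller directory, and that vcpu IDs are contiguous from 0. The function name `missing_vcpu_cgroups` and the directory argument are hypothetical, not part of libvirt.

```python
import os
import re


def missing_vcpu_cgroups(domain_cgroup_dir, live_vcpu_count):
    """Return the vcpu IDs below live_vcpu_count that have no vcpuN directory.

    Assumption: domain_cgroup_dir is the domain's cpu controller directory
    and per-vcpu cgroups are named vcpu0, vcpu1, ...
    """
    pattern = re.compile(r"^vcpu(\d+)$")
    present = set()
    for entry in os.listdir(domain_cgroup_dir):
        match = pattern.match(entry)
        if match and os.path.isdir(os.path.join(domain_cgroup_dir, entry)):
            present.add(int(match.group(1)))
    # Any ID in 0..live_vcpu_count-1 without a directory was never set up.
    return [i for i in range(live_vcpu_count) if i not in present]
```

A non-empty result would suggest that the daemon was killed before it finished setting up cgroups for the hotplugged vcpus.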

Comment 4 Peter Krempa 2017-09-27 06:56:08 UTC
Upstream no longer reports an error if the cgroup does not exist on vcpu unplug:

commit cf30a8cabd5943992e30c45efdd5fd7b82dd53cc
Author: Peter Krempa <pkrempa>
Date:   Mon Sep 25 22:34:44 2017 +0200

    qemu: hotplug: Ignore cgroup errors when hot-unplugging vcpus
    
    When the vcpu is successfully removed libvirt would remove the cgroup.
    In cases when removal of the cgroup fails libvirt would report an error.
    
    This does not make much sense, since the vcpu was removed and we can't
    really do anything with the cgroup. This patch silences the errors from
    cgroup removal.
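
The idea in the commit message, treating cgroup cleanup as best-effort once the vcpu itself is gone, can be illustrated with a short sketch. This is not libvirt's code (the actual fix is in libvirt's C sources); the function name and logger name are hypothetical, and the cgroup is modelled as a plain directory.

```python
import logging
import os

log = logging.getLogger("vcpu-unplug-sketch")  # hypothetical logger name


def remove_vcpu_cgroup_best_effort(cgroup_dir):
    """Try to remove a vcpu cgroup directory without ever raising.

    Returns True if the directory was removed, False otherwise. A failure
    (for example, the directory never existed) is only logged, because the
    vcpu has already been removed and nothing more can be done.
    """
    try:
        os.rmdir(cgroup_dir)
        return True
    except OSError as exc:
        log.debug("ignoring cgroup removal failure for %s: %s", cgroup_dir, exc)
        return False
```

With this pattern, the caller reports the unplug as successful whether or not the cleanup step found anything to remove.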

Comment 6 Jingjing Shao 2017-10-30 14:29:56 UTC
Hi Peter,

I tested with the packages below, but the guest shuts down after the libvirtd restart.

3.10.0-752.el7.x86_64
qemu-kvm-rhev-2.10.0-3.el7.x86_64
libvirt-3.8.0-1.el7.x86_64


1. # virsh vcpucount V
maximum      config       200
maximum      live         200
current      config         3
current      live           3

2. On one terminal,
#  virsh setvcpus V 200

3. On the second terminal,
#  service libvirtd restart

4. On the first terminal,
#  virsh setvcpus V  200
error: Disconnected from qemu:///system due to keepalive timeout
error: internal error: connection closed due to keepalive timeout

5. #  service libvirtd restart
Redirecting to /bin/systemctl restart libvirtd.service

6. # virsh list  --all
 Id    Name                           State
----------------------------------------------------
 -     V                              shut off

Comment 7 Peter Krempa 2017-10-30 15:30:46 UTC
What's the reason? Please post debug logs and VM log.

Comment 8 Jingjing Shao 2017-10-31 08:32:12 UTC
Hi Peter,

I can reproduce the result asked about in comment 7 only about 50% of the time, and I did not notice anything special in the steps when it reproduced.

So I have attached the libvirtd log and the guest log. Could you help check them? Thank you in advance.

Comment 9 Jingjing Shao 2017-10-31 08:33:34 UTC
Created attachment 1345809
libvirtd_part1.log

Comment 10 Jingjing Shao 2017-10-31 08:34:53 UTC
Created attachment 1345810
libvirtd_part2.log

Comment 11 Jingjing Shao 2017-10-31 08:35:32 UTC
Created attachment 1345811
guest.log

Comment 12 Jingjing Shao 2017-11-29 11:29:28 UTC
Tested with libvirt-3.9.0-3.virtcov.el7.x86_64 several times. I cannot reproduce the result in comment 6 and get the expected result below, so I am changing the status to verified.


1. # virsh vcpucount rhel
maximum      config       200
maximum      live         200
current      config         3
current      live           3

2. On one terminal,
#  virsh setvcpus rhel  200

3. On the second terminal,
#  service libvirtd restart

4. On the first terminal,
#  virsh setvcpus rhel  200
error: Disconnected from qemu:///system due to keepalive timeout
error: internal error: connection closed due to keepalive timeout

5. #  service libvirtd restart
Redirecting to /bin/systemctl restart libvirtd.service

libvirtd starts successfully.

6. # virsh vcpucount rhel
maximum      config       200
maximum      live         200
current      config         3
current      live          39

7. # virsh setvcpus rhel  20

8. # virsh vcpucount rhel
maximum      config       200
maximum      live         200
current      config         3
current      live          20
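
The final check in step 8 can also be scripted. The sketch below parses `virsh vcpucount` output in the four-line form shown in this report and extracts the current live count. The helper name `parse_vcpucount` is hypothetical, and the sample text is copied from step 8 rather than captured from a running host.

```python
def parse_vcpucount(output):
    """Parse `virsh vcpucount` text into {(kind, scope): count}.

    Lines that do not have the form "<kind> <scope> <number>" are skipped.
    """
    counts = {}
    for line in output.splitlines():
        fields = line.split()
        if len(fields) == 3 and fields[2].isdigit():
            counts[(fields[0], fields[1])] = int(fields[2])
    return counts


# Sample output taken from step 8 of this comment.
SAMPLE = """\
maximum      config       200
maximum      live         200
current      config         3
current      live          20
"""

counts = parse_vcpucount(SAMPLE)
print(counts[("current", "live")])  # prints 20
```

Comparing this value with the count passed to `virsh setvcpus` gives a simple pass/fail condition for the verification steps.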

Comment 16 errata-xmlrpc 2018-04-10 10:48:37 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2018:0704