Bug 1681180

Summary: LXC container is actually stopped, but virsh list still shows status "running"
Product: [Community] Virtualization Tools
Component: libvirt
Reporter: Maxim <kolomaxes>
Assignee: Michal Privoznik <mprivozn>
Status: CLOSED NEXTRELEASE
Severity: medium
Priority: unspecified
Version: unspecified
CC: libvirt-maint, mprivozn, tburke
Hardware: x86_64
OS: Linux
Fixed In Version: libvirt-5.1.0
Last Closed: 2019-02-26 07:27:42 UTC
Type: Bug

Description Maxim 2019-02-25 16:10:40 UTC
Description of problem:
"virsh shutdown" stops all processes inside the container, but "virsh list" reports that the container is still running. After libvirt is restarted, the container gets the correct status "shut off".

Version-Release number of selected component (if applicable):
5.0.0-1.el7.x86_64

How reproducible:
See the steps below.

Steps to Reproduce:
1. Create an LXC container LXC_NAME.
2. Run "virsh start LXC_NAME" and check the "virsh list" output.
3. Run "virsh shutdown LXC_NAME" and check the "virsh list" output.

Actual results:
container LXC_NAME has status "running"

Expected results:
container LXC_NAME must have status "shut off"

Additional info:
libvirt 5 can work with cgroup controllers v1 and v2. On CentOS 7.6,
libvirt correctly detects that only the v1 controller is available.
When the container is stopped, virLXCProcessStop() runs the following code:
    if (priv->cgroup) {
        rc = virCgroupKillPainfully(priv->cgroup);
        if (rc < 0)
            return -1;
        if (rc > 0) {
            virReportError(VIR_ERR_INTERNAL_ERROR, "%s",
                           _("Some processes refused to die"));
            return -1;
        }
    }

Because virCgroupKillPainfully() returns -1, the function returns early and the "cleanup" label is never reached:
 cleanup:
    virLXCProcessCleanup(driver, vm, reason);
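The control flow described above can be illustrated with a minimal, self-contained sketch. It is not libvirt code: struct fake_vm, fake_kill_painfully(), fake_cleanup() and stop_with_early_return() are hypothetical stand-ins, and the return-value convention (negative for an error, positive when some processes survive) is taken only from the snippet quoted above.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the domain object; not a libvirt type. */
struct fake_vm {
    bool running;
};

/* Stand-in for virCgroupKillPainfully(): simply returns the simulated
 * result (<0 error, 0 everything killed, >0 some processes survived). */
static int fake_kill_painfully(int simulated_rc)
{
    return simulated_rc;
}

/* Stand-in for virLXCProcessCleanup(): marks the domain as shut off. */
static void fake_cleanup(struct fake_vm *vm)
{
    vm->running = false;
}

/* Mirrors the shape of the quoted code: both error branches return
 * before the cleanup step, so the recorded state is left untouched. */
static int stop_with_early_return(struct fake_vm *vm, int simulated_rc)
{
    int rc = fake_kill_painfully(simulated_rc);

    if (rc < 0)
        return -1;          /* cleanup below is skipped */
    if (rc > 0)
        return -1;          /* cleanup below is skipped */

    fake_cleanup(vm);       /* only reached when the kill step succeeds */
    return 0;
}
```

Calling stop_with_early_return() with a simulated result of -1 leaves vm.running set to true, which corresponds to "virsh list" still showing "running"; with a simulated result of 0 the cleanup step runs and the flag is cleared.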

virCgroupKillPainfully() calls virCgroupKillRecursive(), which returns -1 because it checks only that backends[i] is non-NULL before calling the method of that backend. Even though cgroups v2 is not detected, its entry in the backends array (backends[1]) is still present, so backends[i]->killRecursive(group, signum, pids) is called for it and returns -1.
I prepared a patch by analogy with other places in the code that use the cgroup backend functions:

diff -urN libvirt-5.0.0/src/util/vircgroup.c libvirt-5.0.0.patch/src/util/vircgroup.c
--- libvirt-5.0.0/src/util/vircgroup.c  2019-01-10 23:35:29.005474054 +0300
+++ libvirt-5.0.0.patch/src/util/vircgroup.c    2019-02-25 18:07:56.394307749 +0300
@@ -2622,7 +2622,7 @@
     }
 
     for (i = 0; i < VIR_CGROUP_BACKEND_TYPE_LAST; i++) {
-        if (backends[i]) {
+        if (backends[i] && backends[i]->available()) {
             rc = backends[i]->killRecursive(group, signum, pids);
             if (rc < 0) {
                 ret = -1;

Tested on CentOS 7.6: the container now shuts down correctly and virsh list reports "shut off".
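The effect of the extra available() check can be sketched in isolation. Everything below is a simplified assumption for illustration: struct fake_backend, the two fake backends, their ordering in the array, and kill_all() are hypothetical and much smaller than the real libvirt backend interface. Only the loop shape and the "backends[i] && backends[i]->available()" condition come from the diff above.

```c
#include <stddef.h>

/* Hypothetical, reduced backend interface (the real one has many more
 * callbacks and takes the group, signal and PID table as arguments). */
struct fake_backend {
    int (*available)(void);       /* nonzero if usable on this host */
    int (*kill_recursive)(void);  /* <0 on error, >=0 otherwise */
};

enum { FAKE_BACKEND_LAST = 2 };

/* v1 is mounted on the simulated host and its kill step succeeds. */
static int v1_available(void) { return 1; }
static int v1_kill(void)      { return 0; }

/* v2 is registered but not mounted, so its kill step fails. */
static int v2_available(void) { return 0; }
static int v2_kill(void)      { return -1; }

static struct fake_backend fake_v1 = { v1_available, v1_kill };
static struct fake_backend fake_v2 = { v2_available, v2_kill };

/* Both backends are always present in the table, detected or not.
 * The order here is an arbitrary choice for the sketch. */
static struct fake_backend *fake_backends[FAKE_BACKEND_LAST] = {
    &fake_v2, &fake_v1
};

/* check_available == 0 models the original condition (pointer only),
 * check_available != 0 models the patched condition. */
static int kill_all(int check_available)
{
    int ret = 0;

    for (size_t i = 0; i < FAKE_BACKEND_LAST; i++) {
        struct fake_backend *b = fake_backends[i];

        if (!b)
            continue;
        if (check_available && !b->available())
            continue;           /* skip backends not usable on this host */

        int rc = b->kill_recursive();
        if (rc < 0)
            return -1;          /* first failure aborts the whole loop */
        if (rc > 0)
            ret = rc;
    }
    return ret;
}
```

In this sketch kill_all(0) fails because the unavailable backend is asked to kill the group, while kill_all(1) skips it and succeeds through the backend that is actually present.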

Comment 1 Michal Privoznik 2019-02-26 07:27:42 UTC
Yep, I merged a patch similar to this not long ago:

commit 401030499bfb03b182da14f7e00f4a82beab9a8e
Author:     Michal Privoznik <mprivozn>
AuthorDate: Thu Jan 24 17:20:58 2019 +0100
Commit:     Michal Privoznik <mprivozn>
CommitDate: Thu Feb 7 11:16:29 2019 +0100

    vircgroup: Try harder to kill cgroup
    
    Prior to rewrite of cgroup code we only had one backend to try.
    After the rewrite the virCgroupBackendGetAll() returns both
    backends (for v1 and v2). However, not both have to really be
    present on the system which results in killRecursive callback
    failing which in turn might mean we won't try the other backend.
    
    At the same time, this function reports no error as it should.
    
    Signed-off-by: Michal Privoznik <mprivozn>
    Reviewed-by: Erik Skultety <eskultet>

v5.0.0-234-g401030499b