1681180 – lxc container really stopped, but virsh list still show status "running"


Status:	CLOSED NEXTRELEASE
Alias:	None
Product:	Virtualization Tools
Classification:	Community
Component:	libvirt
Version:	unspecified
Hardware:	x86_64
OS:	Linux
Priority:	unspecified
Severity:	medium
Target Milestone:	---
Assignee:	Michal Privoznik

Reported:	2019-02-25 16:10 UTC by Maxim
Modified:	2019-02-26 07:27 UTC
CC List:	3 users
Fixed In Version:	libvirt-5.1.0
Last Closed:	2019-02-26 07:27:42 UTC

Description Maxim 2019-02-25 16:10:40 UTC

Description of problem:
"virsh shutdown" stops all processes inside the container, but "virsh list" reports that the container is still running. After libvirt is restarted, the container gets the correct status, "shut off".

Version-Release number of selected component (if applicable):
5.0.0-1.el7.x86_64

How reproducible:
See the steps below.

Steps to Reproduce:
1. Create an LXC container LXC_NAME.
2. Run "virsh start LXC_NAME" and check the "virsh list" output.
3. Run "virsh shutdown LXC_NAME" and check the "virsh list" output.

Actual results:
Container LXC_NAME has status "running".

Expected results:
Container LXC_NAME must have status "shut off".

Additional info:
libvirt 5 can work with cgroups controllers v1 and v2. On CentOS 7.6,
libvirt correctly detects that only the v1 controller is available.
When the container is stopped, virLXCProcessStop() runs the following code:
    if (priv->cgroup) {
        rc = virCgroupKillPainfully(priv->cgroup);
        if (rc < 0)
            return -1;
        if (rc > 0) {
            virReportError(VIR_ERR_INTERNAL_ERROR, "%s",
                           _("Some processes refused to die"));
            return -1;
        }
    }

Because virCgroupKillPainfully() returns -1, the "cleanup" label is never reached:
 cleanup:
    virLXCProcessCleanup(driver, vm, reason);
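
The control flow described above can be modeled in isolation. The following is a minimal sketch, not libvirt code: struct demo_vm, demo_cleanup() and demo_stop_early_return() are hypothetical names, and the kill result is passed in as a plain integer instead of coming from virCgroupKillPainfully(). It only illustrates how an early return skips the cleanup step and leaves the recorded state stale.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the domain object; not a libvirt type. */
struct demo_vm {
    bool running;   /* the state a listing command would report */
};

/* Stand-in for the cleanup step: marks the domain as shut off. */
void demo_cleanup(struct demo_vm *vm)
{
    vm->running = false;
}

/*
 * Models the reported flow. kill_rc plays the role of the value
 * returned by the cgroup kill call: any non-zero value makes the
 * function return before the cleanup step runs.
 */
int demo_stop_early_return(struct demo_vm *vm, int kill_rc)
{
    assert(vm != NULL);

    if (kill_rc != 0)
        return -1;      /* cleanup is skipped, state stays "running" */

    demo_cleanup(vm);
    return 0;
}
```

In this sketch, calling demo_stop_early_return(&vm, -1) on a domain whose running field is true returns -1 and leaves running unchanged, which corresponds to the stale "running" status that persists until libvirt is restarted.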

virCgroupKillPainfully() calls virCgroupKillRecursive(), which returns -1 because it checks only that backends[i] is non-NULL before calling the method of that backend. But even though cgroups v2 is not detected, the array member backends[1] is still present. We then call backends[i]->killRecursive(group, signum, pids), and it returns -1.
I prepared a patch by analogy with other places in the code where cgroup backend functions are used:

diff -urN libvirt-5.0.0/src/util/vircgroup.c libvirt-5.0.0.patch/src/util/vircgroup.c
--- libvirt-5.0.0/src/util/vircgroup.c  2019-01-10 23:35:29.005474054 +0300
+++ libvirt-5.0.0.patch/src/util/vircgroup.c    2019-02-25 18:07:56.394307749 +0300
@@ -2622,7 +2622,7 @@
     }
 
     for (i = 0; i < VIR_CGROUP_BACKEND_TYPE_LAST; i++) {
-        if (backends[i]) {
+        if (backends[i] && backends[i]->available()) {
             rc = backends[i]->killRecursive(group, signum, pids);
             if (rc < 0) {
                 ret = -1;

Tested on CentOS 7.6: the container now shuts down correctly, and virsh list reports "shut off".
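
The effect of the added available() check can be shown with a small standalone sketch. Everything here is an assumption made for illustration: struct demo_backend, the demo_v1/demo_v2 stubs and demo_kill_all() are hypothetical and much simpler than libvirt's real backend interface. The v2 stub is registered but reports itself unavailable, and its kill callback fails, as described for the CentOS 7.6 host.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define DEMO_BACKEND_LAST 2   /* hypothetical: slot 0 = v1, slot 1 = v2 */

/* Hypothetical, trimmed-down backend descriptor; not libvirt's struct. */
struct demo_backend {
    bool (*available)(void);
    int (*kill_recursive)(void);   /* <0 error, 0 nothing killed, >0 killed */
};

bool demo_v1_available(void) { return true; }
bool demo_v2_available(void) { return false; }   /* v2 absent on the host */
int demo_v1_kill(void) { return 0; }
int demo_v2_kill(void) { return -1; }            /* fails without v2 */

const struct demo_backend demo_v1 = { demo_v1_available, demo_v1_kill };
const struct demo_backend demo_v2 = { demo_v2_available, demo_v2_kill };

/*
 * Walks all registered backends. When check_available is false the loop
 * only tests the pointer, as in the unpatched code; when true it also
 * skips backends that report themselves unavailable, as in the patch.
 */
int demo_kill_all(const struct demo_backend *const *backends,
                  bool check_available)
{
    int ret = 0;

    assert(backends != NULL);

    for (size_t i = 0; i < DEMO_BACKEND_LAST; i++) {
        if (!backends[i])
            continue;
        if (check_available && !backends[i]->available())
            continue;

        int rc = backends[i]->kill_recursive();
        if (rc < 0)
            return -1;      /* one failing backend fails the whole call */
        if (rc > 0)
            ret = rc;
    }
    return ret;
}
```

With an array holding &demo_v1 and &demo_v2, demo_kill_all(backends, false) returns -1 because the unavailable v2 stub is still invoked, while demo_kill_all(backends, true) skips it and returns 0.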

Comment 1 Michal Privoznik 2019-02-26 07:27:42 UTC

Yep, I merged a patch similar to this not that long ago:

commit 401030499bfb03b182da14f7e00f4a82beab9a8e
Author:     Michal Privoznik <mprivozn>
AuthorDate: Thu Jan 24 17:20:58 2019 +0100
Commit:     Michal Privoznik <mprivozn>
CommitDate: Thu Feb 7 11:16:29 2019 +0100

    vircgroup: Try harder to kill cgroup
    
    Prior to rewrite of cgroup code we only had one backend to try.
    After the rewrite the virCgroupBackendGetAll() returns both
    backends (for v1 and v2). However, not both have to really be
    present on the system which results in killRecursive callback
    failing which in turn might mean we won't try the other backend.
    
    At the same time, this function reports no error as it should.
    
    Signed-off-by: Michal Privoznik <mprivozn>
    Reviewed-by: Erik Skultety <eskultet>

v5.0.0-234-g401030499b
