Bug 592170

Summary:	[LXC] The related processes still exist after the OS container has been shut down.
Product:	Red Hat Enterprise Linux 6	Reporter:	dyuan
Component:	libvirt	Assignee:	Jiri Denemark <jdenemar>
Status:	CLOSED ERRATA	QA Contact:	Virtualization Bugs <virt-bugs>
Severity:	medium	Docs Contact:
Priority:	low
Version:	6.0	CC:	ajia, berrange, dallan, jdenemar, llim, moli, mzhan, xen-maint, yoyzhang
Target Milestone:	rc
Target Release:	---
Hardware:	All
OS:	Linux
Whiteboard:
Fixed In Version:	libvirt-0.9.1-1.el6	Doc Type:	Bug Fix
Last Closed:	2011-12-06 10:44:07 UTC	Type:	---
Bug Blocks:	621776, 693512

Description dyuan 2010-05-14 05:43:02 UTC

Description of problem:
The container's processes still exist after the OS container has been shut down.
They only disappear after the host is rebooted.

Version-Release number of selected component (if applicable):
libvirt-0.8.1-3.el6.i686
kernel-2.6.32-25.el6.i686

How reproducible:
always

Steps to Reproduce:
1.  create a root filesystem with febootstrap

# febootstrap --group-install="base" rawhide /tmp/rawhide

2.  define an OS container with the following XML:

virsh # dumpxml fedora-rawhide
<domain type='lxc' id='30013'>
  <name>fedora-rawhide</name>
  <uuid>6222c8db-8764-9c54-8fed-2646b8c4ef78</uuid>
  <memory>32768</memory>
  <currentMemory>32768</currentMemory>
  <vcpu>1</vcpu>
  <os>
    <type arch='x86_64'>exe</type>
    <init>/sbin/init</init>
  </os>
  <clock offset='utc'/>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>destroy</on_crash>
  <devices>
    <emulator>/usr/libexec/libvirt_lxc</emulator>
    <filesystem type='mount'>
      <source dir='/tmp/rawhide'/>
      <target dir='/'/>
    </filesystem>
    <interface type='network'>
      <mac address='52:54:00:73:6b:43'/>
      <source network='default'/>
      <target dev='veth1'/>
    </interface>
    <console type='pty'>
      <target port='0'/>
    </console>
  </devices>
</domain> 

3. start container

virsh # start fedora-rawhide
Domain fedora-rawhide started

4. check cgroup filesystem and processes
# cat /cgroup/cpu/libvirt/lxc/fedora-rawhide/tasks 
2883
2889
2991
4222
4223
4224
4225
4251
4297
4342
4351
4360
4371

# ps aux|grep 2883
root      2883  0.0  0.0   4092   944 ?        Ss   13:22   0:00 /usr/libexec/libvirt_lxc --name fedora-rawhide --console 17 --background --veth veth0
# ps aux|grep 2889
root      2889  0.0  0.0   2768  1136 pts/0    Ss+  13:22   0:00 /sbin/init
# ps aux|grep 2991
root      2991  0.0  0.0   2428   556 ?        S<s  13:22   0:00 /sbin/udevd -d

5. shutdown container

virsh # shutdown fedora-rawhide
Domain fedora-rawhide is being shutdown

virsh # list --all
 Id Name                 State
----------------------------------
  - fedora-rawhide       shut off

6. check cgroup filesystem and processes
# cat /cgroup/cpu/libvirt/lxc/fedora-rawhide/tasks 
2889
2991
4222
4223
4224
4225
4251
4297
4342
4351
4360
4371

# ps aux|grep 2889
root      2889  0.0  0.0   2824  1408 ?        Ss   13:22   0:00 /sbin/init
# ps aux|grep 2991
root      2991  0.0  0.0   2428   556 ?        S<s  13:22   0:00 /sbin/udevd -d

7. start again
# virsh start fedora-rawhide
Domain fedora-rawhide started

# cat /cgroup/cpu/libvirt/lxc/fedora-rawhide/tasks 
2889
2991
4222
4223
4224
4225
4251
4297
4342
4351
4360
4371
4478
4499
4505
4527
4601
4942

# ps aux|grep 2889
root      2889  0.0  0.0   2824  1408 ?        Ss   13:22   0:00 /sbin/init
# ps aux|grep 2991
root      2991  0.0  0.0   2428   556 ?        S<s  13:22   0:00 /sbin/udevd -d
# ps aux|grep 4478
root      4478  0.0  0.0   4092   940 ?        Ss   13:29   0:00 /usr/libexec/libvirt_lxc --name fedora-rawhide --console 17 --background --veth veth0
# ps aux|grep 4499
root      4499  0.0  0.0   2768  1136 pts/0    Ss+  13:29   0:00 /sbin/init
  
Actual results:
The container's processes still exist after the OS container has been shut down.

After the host is rebooted, the 'fedora-rawhide' entry no longer exists in the cgroup filesystem and the processes no longer appear in the ps output.

Expected results:
The container's processes and its cgroup directory entries should be gone once the OS container has been shut down, without needing to reboot the host.
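
The manual check in steps 4 and 6 (cat the tasks file, then grep ps for each PID) could be scripted roughly as follows. This is only a sketch: the function names and the proc_root parameter are made up for illustration, and it assumes a cgroup v1 layout like the one shown above, with one PID per line in the tasks file.

```python
import os

def read_cgroup_pids(tasks_path):
    """Return the PIDs listed in a cgroup 'tasks' file, or [] if the file is gone."""
    try:
        with open(tasks_path) as f:
            return [int(line) for line in f if line.strip()]
    except FileNotFoundError:
        return []  # the cgroup directory has already been removed

def leftover_processes(tasks_path, proc_root="/proc"):
    """Return (pid, cmdline) for each listed PID that still has a /proc entry.

    proc_root is a hypothetical knob so the function can be pointed at a
    fake tree; on a real host it stays at /proc.
    """
    leftovers = []
    for pid in read_cgroup_pids(tasks_path):
        cmdline_path = os.path.join(proc_root, str(pid), "cmdline")
        try:
            with open(cmdline_path, "rb") as f:
                raw = f.read()
        except (FileNotFoundError, ProcessLookupError):
            continue  # the process exited in the meantime
        # /proc/<pid>/cmdline separates arguments with NUL bytes
        cmdline = raw.replace(b"\0", b" ").decode(errors="replace").strip()
        leftovers.append((pid, cmdline))
    return leftovers
```

Calling leftover_processes("/cgroup/cpu/libvirt/lxc/fedora-rawhide/tasks") after 'virsh shutdown' should return an empty list once the bug is fixed; with the packages listed above it would be expected to still report /sbin/init and /sbin/udevd.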

Additional info:

Comment 2 RHEL Program Management 2010-05-14 07:25:45 UTC

This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux major release.  Product Management has requested further
review of this request by Red Hat Engineering, for potential inclusion in a Red
Hat Enterprise Linux Major release.  This request is not yet committed for
inclusion.

Comment 4 Jiri Denemark 2010-07-23 08:34:43 UTC

Should be fixed by 7af5f4689f63bc6ffec441178166c562fee28bc6 upstream commit.

Comment 5 Daniel Berrangé 2010-11-04 17:56:41 UTC

7af5f4689f63bc6ffec441178166c562fee28bc6 only fixes one scenario. It deals with cleanup when libvirt_lxc shuts down cleanly. If you run 'virsh destroy' though, libvirt_lxc just gets SIGKILL, so can't then kill off the container init process.

We need to explicitly kill every single PID in the $CGROUP/tasks file really.
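
As a rough illustration of "kill every single PID in the tasks file" (not the libvirt code, which is C): read the file, signal each PID, and tolerate processes that exit in between. The function name and the injectable kill argument are assumptions made for this sketch.

```python
import os
import signal

def kill_cgroup_tasks(tasks_path, sig=signal.SIGKILL, kill=os.kill):
    """Send `sig` to every PID in a cgroup 'tasks' file.

    Returns the PIDs that were actually signalled. `kill` defaults to
    os.kill and is only a parameter so the logic can be exercised
    without signalling real processes.
    """
    try:
        with open(tasks_path) as f:
            pids = [int(line) for line in f if line.strip()]
    except FileNotFoundError:
        return []  # cgroup already removed, nothing to do
    signalled = []
    for pid in pids:
        try:
            kill(pid, sig)
        except ProcessLookupError:
            continue  # exited between reading the file and signalling
        signalled.append(pid)
    return signalled
```

A single pass like this can miss processes forked while the file is being read, which is presumably why the eventual fix repeats the kill several times.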

Comment 7 Daniel Berrangé 2011-02-23 12:05:05 UTC

The following two changes are required to fully solve this:

http://www.redhat.com/archives/libvir-list/2011-February/msg01005.html
http://www.redhat.com/archives/libvir-list/2011-February/msg01006.html

Comment 8 Jiri Denemark 2011-04-21 09:40:56 UTC

Fixed upstream by v0.8.8-76-g33191b4 and v0.8.8-179-g4e3117a:

commit 33191b419c8c8b17af7c6100997e64ed18bd5f62
Author: Daniel P. Berrange <berrange>
Date:   Tue Feb 22 17:33:59 2011 +0000

    Add APIs for killing off processes inside a cgroup
    
    The virCgroupKill method kills all PIDs found in a cgroup
    
    The virCgroupKillRecursively method does this recursively
    for child cgroups.
    
    The virCgroupKillPainfully method does a recursive kill
    several times in a row until everything has really died
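
For readers who do not want to dig through the C source, the idea behind the recursive and repeated kill can be sketched in Python as below. This is an illustration of the technique only, not a port of virCgroupKillRecursively or virCgroupKillPainfully: the function names, the SIGTERM-then-SIGKILL escalation, the attempt count and the delay are all assumptions chosen for the example.

```python
import os
import signal
import time

def kill_recursively(cgroup_dir, sig, kill=os.kill):
    """Signal every PID in cgroup_dir and its child cgroups.

    Returns the number of PIDs that were signalled. `kill` is injectable
    for testing and defaults to os.kill.
    """
    count = 0
    for dirpath, _dirs, files in os.walk(cgroup_dir):
        if "tasks" not in files:
            continue
        try:
            with open(os.path.join(dirpath, "tasks")) as f:
                pids = [int(line) for line in f if line.strip()]
        except FileNotFoundError:
            continue  # child cgroup vanished while walking
        for pid in pids:
            try:
                kill(pid, sig)
                count += 1
            except ProcessLookupError:
                pass  # already gone
    return count

def kill_painfully(cgroup_dir, attempts=15, delay=0.2, kill=os.kill):
    """Repeat the recursive kill until a pass finds nothing left to signal.

    Assumed policy for this sketch: SIGTERM on the first pass, SIGKILL
    afterwards. Returns True if the cgroup ended up empty.
    """
    for attempt in range(attempts):
        sig = signal.SIGTERM if attempt == 0 else signal.SIGKILL
        if kill_recursively(cgroup_dir, sig, kill) == 0:
            return True
        time.sleep(delay)
    return False
```

Repeating the pass matters because a process may fork between the moment the tasks file is read and the moment the signal is delivered; the child then only shows up on the next pass.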

commit 4e3117ae50efc0fcbd5ce485cd610dfab7f5c625
Author: Daniel P. Berrange <berrange>
Date:   Tue Feb 22 17:35:06 2011 +0000

    Make LXC container startup/shutdown/I/O more robust
    
    The current LXC I/O controller looks for HUP to detect
    when a guest has quit. This isn't reliable as during
    initial bootup it is possible that 'init' will close
    the console and let mingetty re-open it. The shutdown
    of containers was also flakey because it only killed
    the libvirt I/O controller and expected container
    processes to gracefully follow.
    
    Change the I/O controller such that when it see HUP
    or an I/O error, it uses kill($PID, 0) to see if the
    process has really quit.
    
    Change the container shutdown sequence to use the
    virCgroupKillPainfully function to ensure every
    really goes away
    
    This change makes the use of the 'cpu', 'devices'
    and 'memory' cgroups controllers compulsory with
    LXC
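
The kill($PID, 0) check mentioned in the second commit relies on the POSIX null signal: nothing is delivered, but the call still reports whether the target exists. A minimal Python equivalent, assuming a POSIX host (the function name is made up for this note):

```python
import os

def pid_alive(pid):
    """Return True if a process with this PID currently exists (POSIX only)."""
    try:
        os.kill(pid, 0)  # signal 0: existence and permission check only
    except ProcessLookupError:
        return False     # ESRCH: no such process
    except PermissionError:
        return True      # EPERM: it exists, but we may not signal it
    return True
```

Note that a zombie which has not been reaped yet still counts as existing for this check.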

Comment 10 zhanghaiyan 2011-05-26 02:02:01 UTC

Verified that this bug is fixed with libvirt-0.9.1-1.el6.x86_64
1.  create a root filesystem with febootstrap
# febootstrap --group-install="base" rawhide /tmp/rawhide

2.  define an OS container with the following XML:
virsh # dumpxml fedora-rawhide
<domain type='lxc' id='30013'>
  <name>fedora-rawhide</name>
  <uuid>6222c8db-8764-9c54-8fed-2646b8c4ef78</uuid>
  <memory>32768</memory>
  <currentMemory>32768</currentMemory>
  <vcpu>1</vcpu>
  <os>
    <type arch='x86_64'>exe</type>
    <init>/sbin/init</init>
  </os>
  <clock offset='utc'/>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>destroy</on_crash>
  <devices>
    <emulator>/usr/libexec/libvirt_lxc</emulator>
    <filesystem type='mount'>
      <source dir='/tmp/rawhide'/>
      <target dir='/'/>
    </filesystem>
    <interface type='network'>
      <mac address='52:54:00:73:6b:43'/>
      <source network='default'/>
      <target dev='veth1'/>
    </interface>
    <console type='pty'>
      <target port='0'/>
    </console>
  </devices>
</domain> 

3. start container
virsh # start fedora-rawhide
Domain fedora-rawhide started

4. check cgroup filesystem and processes
# cat /cgroup/cpu/libvirt/lxc/fedora-rawhide/tasks 
27936
27961
27991
27996

# ps axu|grep 27936
root     27936  0.0  0.0  38388  1268 ?        Ss   21:58   0:00 /usr/libexec/libvirt_lxc --name fedora-rawhide --console 22 --background --veth veth3
# ps axu|grep 27961
root     27961  0.0  0.0  34124  3580 pts/0    Ss+  21:58   0:00 /sbin/init
# ps axu|grep 27991
root     27991  0.0  0.0  14844   992 ?        Ss   21:58   0:00 /sbin/udevd

5. destroy container
virsh # destroy fedora-rawhide
Domain fedora-rawhide destroyed

virsh # list --all
 Id Name                 State
----------------------------------
  - fedora-rawhide       shut off

6. # cat /cgroup/cpu/libvirt/lxc/fedora-rawhide/tasks 
cat: /cgroup/cpu/libvirt/lxc/fedora-rawhide/tasks: No such file or directory
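
If this verification were automated, the "No such file or directory" result in step 6 could be checked with a small polling helper such as the one below. It is a sketch only: the function name, timeout and interval are arbitrary, and it assumes the cgroup directory may take a moment to disappear after 'virsh destroy' returns.

```python
import os
import time

def wait_for_cgroup_removal(cgroup_dir, timeout=5.0, interval=0.1):
    """Poll until cgroup_dir no longer exists; return False on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        if not os.path.exists(cgroup_dir):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

For the container above, the directory to pass would be /cgroup/cpu/libvirt/lxc/fedora-rawhide.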

Comment 12 Min Zhan 2011-07-18 07:19:17 UTC

Move to Verified according to Comment #10

Comment 13 errata-xmlrpc 2011-12-06 10:44:07 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2011-1513.html