Bug 689030

Summary: throw a better error if blkio cgroup controller is too old
Product: Red Hat Enterprise Linux 6
Component: libvirt
Version: 6.1
Status: CLOSED ERRATA
Severity: medium
Priority: unspecified
Keywords: Regression
Hardware: Unspecified
OS: Unspecified
Reporter: Dave Allan <dallan>
Assignee: Eric Blake <eblake>
QA Contact: Virtualization Bugs <virt-bugs>
CC: ajia, eblake, jeder, jlmagee, jyang, libvirt-maint, syeghiay, yoyzhang
Target Milestone: rc
Fixed In Version: libvirt-0.8.7-17.el6
Doc Type: Bug Fix
Last Closed: 2011-05-19 13:29:21 UTC
Bug Depends On: 632492

Description Dave Allan 2011-03-18 21:00:37 UTC
If the blkio cgroup controller is too old to support nested cgroups, the error message provided when starting a guest is incorrect.

# virsh -c lxc:/// start LXCtest
error: Failed to start domain LXCtest
error: internal error The 'cpuacct', 'devices' & 'memory' cgroups controllers must be mounted

However, those controllers are mounted; the problem is that the blkio controller is also mounted, but it's too old to support nested cgroups.  Although the documentation clearly states that the blkio controller must be unmounted in that case, the error should still give some guidance about what actually went wrong.
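
One quick way to check whether the running kernel's blkio controller supports nested cgroups (a sketch, assuming blkio is mounted at /cgroup/blkio as shown later in this report):

# mkdir /cgroup/blkio/probe
# mkdir /cgroup/blkio/probe/nested
# rmdir /cgroup/blkio/probe/nested /cgroup/blkio/probe   # clean up whatever was created

If the second mkdir fails (e.g. "Operation not permitted"), the kernel predates hierarchical blk-cgroup support and libvirt cannot create its nested blkio groups.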

Comment 2 Eric Blake 2011-03-18 22:40:46 UTC
This proposed upstream patch may be the solution; I'm checking now:

https://www.redhat.com/archives/libvir-list/2011-March/msg00213.html

This patch enables as many cgroup controllers as possible by skipping
creation of the blkio controller when running on old kernels that
don't support multi-level directories for the blkio controller.
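
If the patch behaves as described, a quick sanity check (a sketch, reusing the container name from this report) would be to start the guest and confirm that its groups appear under the other controllers but not under blkio:

# virsh -c lxc:/// start LXCtest
# lscgroup | grep LXCtest

On an old kernel with the fix applied, cpu, cpuacct, devices, memory, etc. should each list a group for the container, while blkio should list none.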

Comment 4 Dave Allan 2011-03-19 01:31:21 UTC
On my system, that patch is helpful, but not a complete fix.  With it I am able to start the container, but I cannot get a console:

[root@dhcp74-119 libvirt]# virsh -c lxc:/// console LXCtest
error: Unable to get domain status
error: internal error Unable to get cgroup for LXCtest

Disabling the blkio controller allows console to succeed.

Comment 5 Eric Blake 2011-03-19 01:37:08 UTC
Back to ASSIGNED until the rest of the problem is solved.

Comment 6 Dave Allan 2011-03-31 01:52:05 UTC
I'm not sure what to make of this: with the current git head, I am able to get a console with the blkio controller mounted.  AFAIK, there haven't been any code changes in this area, and the system hasn't been updated, so I can't explain why it's now working.

Comment 7 Dave Allan 2011-03-31 16:02:26 UTC
Just for the record, when I disabled the blkio cgroups controller I did it by commenting it out in /etc/cgconfig.conf:

mount {
	cpuset	= /cgroup/cpuset;
	cpu	= /cgroup/cpu;
	cpuacct	= /cgroup/cpuacct;
	memory	= /cgroup/memory;
	devices	= /cgroup/devices;
	freezer	= /cgroup/freezer;
	net_cls	= /cgroup/net_cls;
	ns	= /cgroup/ns;
#	blkio	= /cgroup/blkio;
}
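
To make the change take effect, the cgconfig service has to be restarted so the controllers are remounted without blkio (a minimal sketch, assuming the stock RHEL 6 cgconfig init script; an already-mounted blkio hierarchy may also need to be unmounted by hand):

# service cgconfig restart
# umount /cgroup/blkio      # only if it is still mounted after the restart
# grep blkio /proc/mounts   # should print nothing once blkio is gone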

Comment 8 Osier Yang 2011-04-01 04:16:27 UTC
It works fine for me, meaning I could connect to the lxc guest via the console successfully with the blkio controller mounted. It also looks to me like "lxcDomainOpenConsole" has no relationship with cgroups, so it's strange that connecting to the console could produce that error.

Comment 9 Dave Allan 2011-04-01 14:00:34 UTC
Yeah, and it's working fine for me now, too.  I wonder if that error was improperly reported.

Comment 14 Alex Jia 2011-04-15 09:26:24 UTC
Hi Dave,
I saw that Vivek Goyal applied the patch 'blk-cgroup: Allow creation of hierarchical cgroups' to the kernel on Mon Nov 15 2010, so I downloaded kernel packages from before that patch, such as 2.6.32-81.el6.x86_64 (28 Oct 2010). However, I still can't reproduce the bug on RHEL 6.0 (2.6.32-81.el6.x86_64) with libvirt-0.8.7-14.el6.x86_64. I'm not sure whether my kernel is newer than yours or I'm missing something.


Thanks,
Alex

Comment 15 Alex Jia 2011-04-15 09:30:17 UTC
BTW, I haven't installed a real guest; I just defined one from XML and then started it. I think that should be enough for this bug; if not, please correct me. Thanks.

# virsh -c lxc:/// define lxc.xml
Domain vm1 defined from lxc.xml

# virsh -c lxc:/// start vm1
error: Failed to start domain vm1
error: Unknown failure

# virsh -c lxc:/// dumpxml vm1
<domain type='lxc'>
  <name>vm1</name>
  <uuid>62fd647f-bc09-cfbf-80b5-73a243afdee6</uuid>
  <memory>500000</memory>
  <currentMemory>500000</currentMemory>
  <vcpu>1</vcpu>
  <os>
    <type arch='x86_64'>exe</type>
    <init>/bin/sh</init>
  </os>
  <clock offset='utc'/>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>destroy</on_crash>
  <devices>
    <emulator>/usr/libexec/libvirt_lxc</emulator>
    <interface type='network'>
      <mac address='52:54:00:cb:7f:b9'/>
      <source network='default'/>
    </interface>
    <console type='pty'>
      <target type='serial' port='0'/>
    </console>
  </devices>
</domain>
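
To get more detail than "Unknown failure", client-side debug logging can be turned on (a sketch; LIBVIRT_DEBUG sends the client library's debug output to stderr, and the log file name here is arbitrary). The real error may also show up in libvirtd's own log, depending on the log_level/log_outputs settings in /etc/libvirt/libvirtd.conf:

# LIBVIRT_DEBUG=1 virsh -c lxc:/// start vm1 2> /tmp/virsh-lxc-start.log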

Comment 16 Eric Blake 2011-04-15 16:13:31 UTC
Have you tried testing with qemu guests instead of lxc?  Also, are you sure that the blkio cgroup is mounted (that is, is the cgconfig service running)?

Comment 17 Alex Jia 2011-04-18 03:10:36 UTC
(In reply to comment #16)
> Have you tried testing with qemu guests instead of lxc?  Also, are you sure
> that the blkio cgroup is mounted (that is, is the cgconfig service running)?

Hi Eric,
I hadn't tried testing with a qemu guest before, and I can confirm the cgroup filesystem was mounted during the previous testing.

The following is the result of testing with a qemu guest:
# lscgroup | grep blkio
blkio:/
blkio:/libvirt

Note: libvirt hasn't created 'qemu' or 'lxc' hypervisor directories under /cgroup/blkio/libvirt; please see the additional information below.

# virsh uri
qemu:///system

# virsh define /tmp/vr-rhel6-x86_64-kvm.xml
Domain vr-rhel6-x86_64-kvm defined from /tmp/vr-rhel6-x86_64-kvm.xml

# virsh start vr-rhel6-x86_64-kvm
Domain vr-rhel6-x86_64-kvm started

Note: there is no error information here.



Additional information:

# service cgconfig status
Running

# lscgroup
cpuset:/
cpuset:/libvirt
cpuset:/libvirt/lxc
cpuset:/libvirt/qemu
cpu:/
cpu:/libvirt
cpu:/libvirt/lxc
cpu:/libvirt/qemu
cpuacct:/
cpuacct:/libvirt
cpuacct:/libvirt/lxc
cpuacct:/libvirt/qemu
memory:/
memory:/libvirt
memory:/libvirt/lxc
memory:/libvirt/qemu
devices:/
devices:/libvirt
devices:/libvirt/lxc
devices:/libvirt/qemu
freezer:/
freezer:/libvirt
freezer:/libvirt/lxc
freezer:/libvirt/qemu
net_cls:/
blkio:/
blkio:/libvirt

# grep 'cgroup' /proc/mounts 
cgroup /cgroup/cpuset cgroup rw,relatime,cpuset 0 0
cgroup /cgroup/cpu cgroup rw,relatime,cpu 0 0
cgroup /cgroup/cpuacct cgroup rw,relatime,cpuacct 0 0
cgroup /cgroup/memory cgroup rw,relatime,memory 0 0
cgroup /cgroup/devices cgroup rw,relatime,devices 0 0
cgroup /cgroup/freezer cgroup rw,relatime,freezer 0 0
cgroup /cgroup/net_cls cgroup rw,relatime,net_cls 0 0
cgroup /cgroup/blkio cgroup rw,relatime,blkio 0 0

# uname -r
2.6.32-81.el6.x86_64
# rpm -q libcgroup
libcgroup-0.37-1.el6.x86_64
# rpm -q libvirt
libvirt-0.8.7-14.el6.x86_64

Comment 18 Alex Jia 2011-04-26 03:13:40 UTC
Hi Dave and Eric,

I tried to reproduce the bug with both lxc and qemu guests on RHEL 6.0.z (2.6.32-71.7.1.el6.x86_64) with the RHEL 6.1 libvirt (libvirt-0.8.7-16.el6.x86_64); however, I don't hit any cgroup issues:

# service cgconfig status
Running

# lscgroup | grep blkio
blkio:/
blkio:/libvirt

For the lxc guest, it seems the recent libvirt version has a bug in the lxc hypervisor; the guest can't be started successfully:
# virsh -c lxc:/// start vm1
error: Failed to start domain vm1
error: Unknown failure

As Dave said, this error is perhaps hit prior to the cgroups check.

For the qemu guest, I can successfully start the guest without any cgroup error.


I have tried different kernel and libvirt versions, and also tried guests on different hypervisors, but I still can't trigger the expected cgroup error. So I don't know how to continue testing the bug; could you give me some advice?

Thanks,
Alex

Comment 19 Dave Allan 2011-04-26 16:05:57 UTC
Alex, I don't know how to reproduce this BZ on 6.1.  It was determined to be present in 6.1, AFAIK, by code inspection.  I was only able to reproduce it on Fedora 14.  I wish I could give you more guidance, but I've made all the suggestions that I can.  Eric, do you have any further ideas?

Comment 20 Alex Jia 2011-04-28 10:13:05 UTC
I have tried to reproduce the bug with both lxc and qemu guests on RHEL 6.0.z (2.6.32-71.7.1.el6.x86_64) with the RHEL 6.1 libvirt (libvirt-0.8.7-16.el6.x86_64); however, I don't hit any cgroup issues. In other words, I can't reproduce it.

In addition, I have also verified the bug on RHEL 6.1 snapshot 5 (2.6.32-131.0.5.el6.x86_64) with libvirt-0.8.7-18.el6.x86_64; the test result is the same as the previous test for the lxc and qemu guests, please see Comment 18.

So I am changing the bug to SanityOnly VERIFIED.


Alex

Comment 21 John L Magee 2011-05-01 18:58:58 UTC
FWIW, a similar issue exists in Fedora 14: the cgroup hierarchy for individual QEMU VMs is not created. Early F14 worked fine; somewhere along the way, virt-preview updates introduced the problem. I just commented out the blkio controller as suggested above and now get the individual VMs in the other cgroup hierarchies.

[root@kvmhost01 ~]# rpm -q libvirt
libvirt-0.8.8-4.fc14.x86_64
[root@kvmhost01 ~]# rpm -q libcgroup
libcgroup-0.36.2-6.fc14.x86_64
[root@kvmhost01 ~]# uname -r
2.6.35.12-90.fc14.x86_64
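
A quick way to confirm that the per-VM groups are being created again after commenting out blkio (a sketch; 'guestname' is a placeholder for an actual running domain):

# lscgroup | grep guestname

Each mounted controller should now list a .../libvirt/qemu/guestname group for the running VM.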

Comment 24 errata-xmlrpc 2011-05-19 13:29:21 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2011-0596.html