Description of problem: The vdsm logs show: libvirtError: Requested operation is not valid: cgroup CPU controller is not mounted Thread-251::DEBUG::2015-02-19 12:44:02,465::libvirtconnection::143::root::(wrapper) Unknown libvirterror: ecode: 55 edom: 10 level: 2 message: Requested operation is not valid: cgroup CPUACCT controller is not mounted Thread-251::ERROR::2015-02-19 12:44:02,465::sampling::475::vm.Vm::(collect) vmId=`fdc09d48-d800-4138-95e0-db18596882e9`::Stats function failed: <AdvancedStatsFunction _sampleCpu at 0x2a00490> Traceback (most recent call last): File "/usr/share/vdsm/virt/sampling.py", line 471, in collect File "/usr/share/vdsm/virt/sampling.py", line 346, in __call__ File "/usr/share/vdsm/virt/vm.py", line 303, in _sampleCpu File "/usr/share/vdsm/virt/vm.py", line 689, in f File "/usr/lib/python2.6/site-packages/vdsm/libvirtconnection.py", line 111, in wrapper File "/usr/lib64/python2.6/site-packages/libvirt.py", line 1911, in getCPUStats libvirtError: Requested operation is not valid: cgroup CPUACCT controller is not mounted Thread-251::DEBUG::2015-02-19 12:44:02,481::libvirtconnection::143::root::(wrapper) Unknown libvirterror: ecode: 55 edom: 10 level: 2 message: Requested operation is not valid: cgroup CPU controller is not mounted Thread-251::ERROR::2015-02-19 12:44:02,481::sampling::475::vm.Vm::(collect) vmId=`fdc09d48-d800-4138-95e0-db18596882e9`::Stats function failed: <AdvancedStatsFunction _sampleCpuTune at 0x2a006c0> Traceback (most recent call last): File "/usr/share/vdsm/virt/sampling.py", line 471, in collect File "/usr/share/vdsm/virt/sampling.py", line 346, in __call__ File "/usr/share/vdsm/virt/vm.py", line 349, in _sampleCpuTune File "/usr/share/vdsm/virt/vm.py", line 689, in f File "/usr/lib/python2.6/site-packages/vdsm/libvirtconnection.py", line 111, in wrapper File "/usr/lib64/python2.6/site-packages/libvirt.py", line 1991, in schedulerParameters Contrary to bug 1198187 in this case, no functional impact can be observed. Version-Release number of selected component (if applicable): How reproducible: Steps to Reproduce: 1. 2. 3. Actual results: Expected results: Additional info:
Besides showing the messages from the description in the logs, for this bug we see that the SELinux context of the libvirtd instances is correct. (In bug 1198187 the selinux context of the libvirtd process was incorrect)
From previous sos reports we know that libvirtd is running in the right context. Now we need to understand if libvirtd is seeing the cgroup mounts. And it looks like sos is not collecting the informations of all processes. So to understand what mounts libvirtd sees, could you please manually: 1. Identify the pid of libvirtd on a host: pidof libvirtd 2. Collect the proc dir: tar cf logs.tar /proc/$(pidof libvirtd) 3. Collect ps -eZ from the same host Please make sure that the tarball then contains the 'mounts' file for the libvirtd process. The goal is still to find out if libvirtd is (a) seeing cgroups or (b) selinux is preventing it.
Thanks for the files. Interestingly none of the mount related files contains data: They are empty. Nijin or Alexandros, can you manually check if the /proc/<libvirt pid>/mounts file is empty on the hosts?
Thanks Alexandros, I should have used --no-same-owner ... But looking at the file I do not see suspicious output. The cgroup controller is there. However, in RHEV 3.5.4 we will fix bug 1236489 which is also related to libvirtd, this might have an impact on this bug.
Please clarify the root cause, I'm not sure I understand it from #4. Does this mean something forget to enable cgroups (cgconfig daemon)? Hm, I see libvirt rpm has following stuff to enable cgroups: # Starting with Fedora 16/RHEL-7, systemd automounts all cgroups, # and cgconfig is no longer a necessary service. if [ "$1" -eq "1" ]; then /sbin/chkconfig cgconfig on fi But it seems to me that cgconfig is not enabled by default: # cat /etc/redhat-release Red Hat Enterprise Virtualization Hypervisor release 6.7 (20150707.0.el6ev) # sed -n -e '/^# chkconfig/p' -e '/BEGIN INIT INFO/,/END INIT INFO/p' /etc/rc.d/init.d/cgconfig # chkconfig: - 5 95 ### BEGIN INIT INFO # Provides: cgconfig # Required-Start: # Required-Stop: # Should-Start: ypbind # Should-Stop: ypbind # Short-Description: Create and setup control group filesystem(s) # Description: Create and setup control group filesystem(s) ### END INIT INFO Or how should be this BZ read?
Jiri, thanks for looking into this. Right now it is not clear what the root cause for this bug is, that is why it's moving so slow. cgconfig is enabled on RHEV-H 6.7, yuo can check that by running $ service cgconfig status $ chkconfig --list cgconfig But maybe you find some additional hints.
not sure this may help, but I've seen this happen firsthand on CentOS/RHEL7 hosts (7.0/7.1). I don't have a clear reproducer, it seems to happen more frequently after libvirtd is restarted a few times. From what I can tell, the amount of 'few' seems pretty random. Unfortunately I can't be more precise because I don't know how to narrow it down further :\
Moving this to vdsm, because this can also seen on plain centos/rhel according to comment 16.
Can you have someone take a look?
This is barely related to vdsm, only occurs in el6 which uses cggroup config, and indeed related to Bug 1198187 which summaries it as """Cause: A race between services during boot""" - sound like the same case from the comments above. """Consequence: libvirt was to early, which prevented VMs from starting up correctly""" - vdsm tries to collect sampleCpu info from libvirt and get this exception """Fix: libvirt is now started at the right time""" snip of comment #72 ---- (In reply to Jiri Denemark from comment #71) > The libvirtd service should not ever be started before cgconfig (ideally, it > should just be started at the end of init sequence as would the S97libvirtd > siggest). I've never reproduced this either, but it looks like there is a > race between the standard init sequence and the hacks which start libvirtd > in parallel to it. Except for asking the customer who hit the bug to try > with the new image, the only way (which I can think of) to verify this is to > make the standard init sequence slower... something like "sleep 60" at the > beginning of start section in cgconfig init script. I'm not sure if this is > possible on RHEV-H, though. Well, it is possible, but: I'm asked because a recent change moved the hooks which start vdsm (which starts libvirt) to S98ovirt-post, with libvirtd as a Required-Start, which should resolve/remove any race conditions there. We certainly can patch init scripts, but hopefully we wouldn't need to. ----- """Result: VMs are brought up correctly""" - what happened to that? beside that - getting back to vdsm, I guess you still want to keep the backtrace and live with the log flooding when libvirt starts to shout,right? and as a workaround - doesn't initctl restart to libvirtd solve it?
According to comment 16 this was also reproduced on stock RHEL. For RHEV-H we already took some measures to prevent that it libvirtd is stated before cgconfig.
As I stated in comment #22 the only question regarding vdsm is if to remove the error report in this race case - I don't see any reason to remove this exception report and therefore this bug is duplication of Bug 1198187 *** This bug has been marked as a duplicate of bug 1198187 ***