1235718 – vdsm log is flooded with "cgroup CPUACCT controller is not mounted" errors

Bug 1235718 - vdsm log is flooded with "cgroup CPUACCT controller is not mounted" errors

Summary: vdsm log is flooded with "cgroup CPUACCT controller is not mounted" errors

Keywords:
Status:	CLOSED DUPLICATE of bug 1198187
Alias:	None
Product:	Red Hat Enterprise Virtualization Manager
Classification:	Red Hat
Component:	vdsm
Sub Component:
Version:	3.5.1
Hardware:	Unspecified
OS:	Unspecified
Priority:	high
Severity:	high
Target Milestone:	---
Target Release:	3.5.5
Assignee:	Yaniv Bronhaim
QA Contact:	Pavol Brilla
Docs Contact:
URL:
Whiteboard:	infra
Depends On:
Blocks:
TreeView+	depends on / blocked

Reported:	2015-06-25 14:58 UTC by Fabian Deutsch
Modified:	2019-10-10 09:53 UTC (History)
CC List:	16 users (show)
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2015-09-08 09:11:30 UTC
oVirt Team:	Infra
Target Upstream Version:
Embargoed:

Attachments	(Terms of Use)

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Bugzilla	1236489	0	high	CLOSED	Keep the upstart libvirtd file to enable relaunching libvirt in case it goes down	2021-02-22 00:41:40 UTC

Internal Links: 1236489

Description Fabian Deutsch 2015-06-25 14:58:39 UTC

Description of problem:
The vdsm logs show:

libvirtError: Requested operation is not valid: cgroup CPU controller is not mounted
Thread-251::DEBUG::2015-02-19 12:44:02,465::libvirtconnection::143::root::(wrapper) Unknown libvirterror: ecode: 55 edom: 10 level: 2 message: Requested operation is not valid: cgroup CPUACCT controller is not mounted
Thread-251::ERROR::2015-02-19 12:44:02,465::sampling::475::vm.Vm::(collect) vmId=`fdc09d48-d800-4138-95e0-db18596882e9`::Stats function failed: <AdvancedStatsFunction _sampleCpu at 0x2a00490>
Traceback (most recent call last):
  File "/usr/share/vdsm/virt/sampling.py", line 471, in collect
  File "/usr/share/vdsm/virt/sampling.py", line 346, in __call__
  File "/usr/share/vdsm/virt/vm.py", line 303, in _sampleCpu
  File "/usr/share/vdsm/virt/vm.py", line 689, in f
  File "/usr/lib/python2.6/site-packages/vdsm/libvirtconnection.py", line 111, in wrapper
  File "/usr/lib64/python2.6/site-packages/libvirt.py", line 1911, in getCPUStats
libvirtError: Requested operation is not valid: cgroup CPUACCT controller is not mounted
Thread-251::DEBUG::2015-02-19 12:44:02,481::libvirtconnection::143::root::(wrapper) Unknown libvirterror: ecode: 55 edom: 10 level: 2 message: Requested operation is not valid: cgroup CPU controller is not mounted
Thread-251::ERROR::2015-02-19 12:44:02,481::sampling::475::vm.Vm::(collect) vmId=`fdc09d48-d800-4138-95e0-db18596882e9`::Stats function failed: <AdvancedStatsFunction _sampleCpuTune at 0x2a006c0>
Traceback (most recent call last):
  File "/usr/share/vdsm/virt/sampling.py", line 471, in collect
  File "/usr/share/vdsm/virt/sampling.py", line 346, in __call__
  File "/usr/share/vdsm/virt/vm.py", line 349, in _sampleCpuTune
  File "/usr/share/vdsm/virt/vm.py", line 689, in f
  File "/usr/lib/python2.6/site-packages/vdsm/libvirtconnection.py", line 111, in wrapper
  File "/usr/lib64/python2.6/site-packages/libvirt.py", line 1991, in schedulerParameters


Contrary to bug 1198187 in this case, no functional impact can be observed.


Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:

Comment 1 Fabian Deutsch 2015-06-25 15:02:09 UTC

Besides showing the messages from the description in the logs, for this bug we see that the SELinux context of the libvirtd instances is correct.

(In bug 1198187 the selinux context of the libvirtd process was incorrect)

Comment 4 Fabian Deutsch 2015-06-25 15:07:13 UTC

From previous sos reports we know that libvirtd is running in the right context.

Now we need to understand if libvirtd is seeing the cgroup mounts.
And it looks like sos is not collecting the informations of all processes.

So to understand what mounts libvirtd sees, could you please manually:

1. Identify the pid of libvirtd on a host: pidof libvirtd
2. Collect the proc dir: tar cf logs.tar /proc/$(pidof libvirtd)
3. Collect ps -eZ from the same host

Please make sure that the tarball then contains the 'mounts' file for the libvirtd process.

The goal is still to find out if libvirtd is (a) seeing cgroups or (b) selinux is preventing it.

Comment 9 Fabian Deutsch 2015-07-03 09:20:38 UTC

Thanks for the files.

Interestingly none of the mount related files contains data: They are empty.

Nijin or Alexandros, can you manually check if the /proc/<libvirt pid>/mounts file is empty on the hosts?

Comment 11 Fabian Deutsch 2015-07-06 13:29:30 UTC

Thanks Alexandros, I should have used --no-same-owner ... 

But looking at the file I do not see suspicious output.

The cgroup controller is there.

However, in RHEV 3.5.4 we will fix bug 1236489 which is also related to libvirtd, this might have an impact on this bug.

Comment 14 Jiri Belka 2015-07-09 14:25:09 UTC

Please clarify the root cause, I'm not sure I understand it from #4.

Does this mean something forget to enable cgroups (cgconfig daemon)?

Hm, I see libvirt rpm has following stuff to enable cgroups:

# Starting with Fedora 16/RHEL-7, systemd automounts all cgroups,
# and cgconfig is no longer a necessary service.
if [ "$1" -eq "1" ]; then
/sbin/chkconfig cgconfig on
fi

But it seems to me that cgconfig is not enabled by default:

# cat /etc/redhat-release 
Red Hat Enterprise Virtualization Hypervisor release 6.7 (20150707.0.el6ev)

# sed -n -e '/^# chkconfig/p' -e '/BEGIN INIT INFO/,/END INIT INFO/p' /etc/rc.d/init.d/cgconfig 
# chkconfig: - 5 95
### BEGIN INIT INFO
# Provides:             cgconfig
# Required-Start:
# Required-Stop:
# Should-Start:         ypbind
# Should-Stop:          ypbind
# Short-Description:    Create and setup control group filesystem(s)
# Description:          Create and setup control group filesystem(s)
### END INIT INFO

Or how should be this BZ read?

Comment 15 Fabian Deutsch 2015-07-09 15:00:52 UTC

Jiri, thanks for looking into this.

Right now it is not clear what the root cause for this bug is, that is why it's moving so slow.

cgconfig is enabled on RHEV-H 6.7, yuo can check that by running 
$ service cgconfig status
$ chkconfig --list cgconfig

But maybe you find some additional hints.

Comment 16 Francesco Romani 2015-07-14 14:08:06 UTC

not sure this may help, but I've seen this happen firsthand on CentOS/RHEL7 hosts (7.0/7.1).

I don't have a clear reproducer, it seems to happen more frequently after libvirtd is restarted a few times. From what I can tell, the amount of 'few' seems pretty random.

Unfortunately I can't be more precise because I don't know how to narrow it down further :\

Comment 17 Fabian Deutsch 2015-07-15 15:20:26 UTC

Moving this to vdsm, because this can also seen on plain centos/rhel according to comment 16.

Comment 18 Yaniv Lavi 2015-08-02 14:57:15 UTC

Can you have someone take a look?

Comment 22 Yaniv Bronhaim 2015-08-09 08:38:57 UTC

This is barely related to vdsm, only occurs in el6 which uses cggroup config, and indeed related to Bug 1198187 which summaries it as 
"""Cause: A race between services during boot""" - sound like the same case from the comments above.
"""Consequence: libvirt was to early, which prevented VMs from starting up correctly"""
 - vdsm tries to collect sampleCpu info from libvirt and get this exception
"""Fix: libvirt is now started at the right time"""

snip of comment #72 
----
(In reply to Jiri Denemark from comment #71)
> The libvirtd service should not ever be started before cgconfig (ideally, it
> should just be started at the end of init sequence as would the S97libvirtd
> siggest). I've never reproduced this either, but it looks like there is a
> race between the standard init sequence and the hacks which start libvirtd
> in parallel to it. Except for asking the customer who hit the bug to try
> with the new image, the only way (which I can think of) to verify this is to
> make the standard init sequence slower... something like "sleep 60" at the
> beginning of start section in cgconfig init script. I'm not sure if this is
> possible on RHEV-H, though.

Well, it is possible, but:

I'm asked because a recent change moved the hooks which start vdsm (which starts libvirt) to S98ovirt-post, with libvirtd as a Required-Start, which should resolve/remove any race conditions there.

We certainly can patch init scripts, but hopefully we wouldn't need to.
-----

"""Result: VMs are brought up correctly""" - what happened to that?

beside that - getting back to vdsm, I guess you still want to keep the backtrace and live with the log flooding when libvirt starts to shout,right?
and as a workaround - doesn't initctl restart to libvirtd solve it?

Comment 23 Fabian Deutsch 2015-08-17 06:05:34 UTC

According to comment 16 this was also reproduced on stock RHEL.

For RHEV-H we already took some measures to prevent that it libvirtd is stated before cgconfig.

Comment 25 Yaniv Bronhaim 2015-09-08 09:11:30 UTC

As I stated in comment #22 the only question regarding vdsm is if to remove the error report in this race case - I don't see any reason to remove this exception report and therefore this bug is duplication of Bug 1198187

*** This bug has been marked as a duplicate of bug 1198187 ***

Note You need to log in before you can comment on or make changes to this bug.