Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1149601

Summary: Unexpected exception in vdsm log, when setting up hosted-engine on EL7
Product: Red Hat Enterprise Virtualization Manager Reporter: Artyom <alukiano>
Component: vdsmAssignee: Nobody <nobody>
Status: CLOSED WORKSFORME QA Contact: Jiri Belka <jbelka>
Severity: high Docs Contact:
Priority: unspecified    
Version: 3.5.0CC: alukiano, bazulay, danken, dfediuck, ecohen, fromani, gklein, lpeer, lsurette, michal.skrivanek, mkalinin, msivak, nbarcet, oourfali, pkrempa, raul.laansoo, rgolan, sbonazzo, ybronhei, yeylon
Target Milestone: ovirt-3.5.6Keywords: Triaged
Target Release: 3.5.6   
Hardware: x86_64   
OS: Linux   
Whiteboard: sla
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2015-10-19 10:48:12 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: SLA RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
vdsm log
none
host's logs
none
hosted-engine-setup and vdsm log
none
vdsm.log none

Description Artyom 2014-10-06 08:00:53 UTC
Created attachment 944164 [details]
vdsm log

Description of problem:
Unexpected exception in vdsm log, when setting up hosted-engine on EL7, setup of hosted engine success and work.

Version-Release number of selected component (if applicable):
vdsm-4.16.5-2.el7.x86_64
ovirt-hosted-engine-setup-1.2.0-1.el7.noarch

How reproducible:
Always

Steps to Reproduce:
1. Run hosted-engine --deploy and finish setup
2.
3.

Actual results:
Unexpected exception in vdsm log:
Thread-20::ERROR::2014-10-05 17:23:10,961::sampling::475::vm.Vm::(collect) vmId=`f4675766-a472-4c49-bde7-ad68db30e77c`::Stats function failed: <AdvancedStatsFunction _sampleCpu at 0x2adf1b0>
Traceback (most recent call last):
  File "/usr/share/vdsm/virt/sampling.py", line 471, in collect
    statsFunction()
  File "/usr/share/vdsm/virt/sampling.py", line 346, in __call__
    retValue = self._function(*args, **kwargs)
  File "/usr/share/vdsm/virt/vm.py", line 313, in _sampleCpu
    cpuStats = self._vm._dom.getCPUStats(True, 0)
  File "/usr/share/vdsm/virt/vm.py", line 670, in f
    ret = attr(*args, **kwargs)
  File "/usr/lib64/python2.7/site-packages/vdsm/libvirtconnection.py", line 111, in wrapper
    ret = f(*args, **kwargs)
  File "/usr/lib64/python2.7/site-packages/libvirt.py", line 2016, in getCPUStats
    if ret is None: raise libvirtError ('virDomainGetCPUStats() failed', dom=self)
libvirtError: Requested operation is not valid: cgroup CPUACCT controller is not mounted
Thread-20::DEBUG::2014-10-05 17:23:10,964::libvirtconnection::143::root::(wrapper) Unknown libvirterror: ecode: 55 edom: 10 level: 2 message: Requested operation is not valid: cgroup CPU controller is not mounted

Expected results:
No any exceptions in vdsm log.

Additional info:

Comment 1 Sandro Bonazzola 2014-11-20 13:25:01 UTC
I'm not able to reproduce this, please reopen if you can provide more info on how to reproduce it with latest build.

Comment 2 Jiri Belka 2014-11-21 12:22:19 UTC
i still see it in vdsm.log.

qemu-kvm-rhev-1.5.3-60.el7_0.10.x86_64
libvirt-daemon-1.1.1-29.el7_0.3.x86_64
vdsm-4.16.7.4-1.el7ev.x86_6

- steps:
  * install rhel7
  * install ovirt-hosted-engine-setup
  * hosted-engine --deploy

i didn't use dns but modified /etc/hosts.

- some fake ip for hosted engine during initial hosted-engine --deploy
- when guest os is installed, query its ip and modify host's /etc/hosts to match

Thread-19::ERROR::2014-11-21 13:07:45,339::sampling::475::vm.Vm::(collect) vmId=`bef5cfb5-b8ac-4ec8-bdda-bedb757ea695`::Stats function failed: <AdvancedStatsFunction _sampleCpu at 0x304c840>
Traceback (most recent call last):
  File "/usr/share/vdsm/virt/sampling.py", line 471, in collect
    statsFunction()
  File "/usr/share/vdsm/virt/sampling.py", line 346, in __call__
    retValue = self._function(*args, **kwargs)
  File "/usr/share/vdsm/virt/vm.py", line 313, in _sampleCpu
    cpuStats = self._vm._dom.getCPUStats(True, 0)
  File "/usr/share/vdsm/virt/vm.py", line 668, in f
    ret = attr(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/libvirtconnection.py", line 111, in wrapper
    ret = f(*args, **kwargs)
  File "/usr/lib64/python2.7/site-packages/libvirt.py", line 2016, in getCPUStats
    if ret is None: raise libvirtError ('virDomainGetCPUStats() failed', dom=self)
libvirtError: Requested operation is not valid: cgroup CPUACCT controller is not mounted
Thread-19::DEBUG::2014-11-21 13:07:45,342::libvirtconnection::143::root::(wrapper) Unknown libvirterror: ecode: 55 edom: 10 level: 2 message: Requested operation is not valid: cgroup CPU controller is not mounted
Thread-19::ERROR::2014-11-21 13:07:45,342::sampling::475::vm.Vm::(collect) vmId=`bef5cfb5-b8ac-4ec8-bdda-bedb757ea695`::Stats function failed: <AdvancedStatsFunction _sampleCpuTune at 0x3047650>
Traceback (most recent call last):
  File "/usr/share/vdsm/virt/sampling.py", line 471, in collect
    statsFunction()
  File "/usr/share/vdsm/virt/sampling.py", line 346, in __call__
    retValue = self._function(*args, **kwargs)
  File "/usr/share/vdsm/virt/vm.py", line 359, in _sampleCpuTune
    infos = self._vm._dom.schedulerParameters()
  File "/usr/share/vdsm/virt/vm.py", line 668, in f
    ret = attr(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/libvirtconnection.py", line 111, in wrapper
    ret = f(*args, **kwargs)
  File "/usr/lib64/python2.7/site-packages/libvirt.py", line 2134, in schedulerParameters
    if ret is None: raise libvirtError ('virDomainGetSchedulerParameters() failed', dom=self)
libvirtError: Requested operation is not valid: cgroup CPU controller is not mounted

Comment 3 Jiri Belka 2014-11-21 12:23:09 UTC
Created attachment 959670 [details]
host's logs

Comment 4 Sandro Bonazzola 2014-11-28 15:33:14 UTC
So, just to recap, hosted-engine --deploy finished with success, I can see in the logs:

"Hosted Engine successfully set up"

The engine VM is up and running and everything seems to work but VDSM is reporting

Thread-19::ERROR::2014-11-21 13:07:30,330::sampling::475::vm.Vm::(collect) vmId=`bef5cfb5-b8ac-4ec8-bdda-bedb757ea695`::Stats function failed: <AdvancedStatsFunction _sampleCpu at 0x304c840>
Traceback (most recent call last):
  File "/usr/share/vdsm/virt/sampling.py", line 471, in collect
    statsFunction()
  File "/usr/share/vdsm/virt/sampling.py", line 346, in __call__
    retValue = self._function(*args, **kwargs)
  File "/usr/share/vdsm/virt/vm.py", line 313, in _sampleCpu
    cpuStats = self._vm._dom.getCPUStats(True, 0)
  File "/usr/share/vdsm/virt/vm.py", line 668, in f
    ret = attr(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/libvirtconnection.py", line 111, in wrapper
    ret = f(*args, **kwargs)
  File "/usr/lib64/python2.7/site-packages/libvirt.py", line 2016, in getCPUStats
    if ret is None: raise libvirtError ('virDomainGetCPUStats() failed', dom=self)
libvirtError: Requested operation is not valid: cgroup CPUACCT controller is not mounted
Thread-19::DEBUG::2014-11-21 13:07:30,335::libvirtconnection::143::root::(wrapper) Unknown libvirterror: ecode: 55 edom: 10 level: 2 message: Requested operation is not valid: cgroup CPU controller is not mounted

Moving to VDSM infra for further investigation since it may be just an uncaught exception.

Comment 5 Yaniv Bronhaim 2014-12-01 08:57:34 UTC
The issue relates to specific libvirt request for cpu states. I think virt more familiar with the process and can give better inputs about this area

Comment 6 Francesco Romani 2014-12-01 09:04:21 UTC
I saw the same message casually on one Centos 7 host of mine. 
I haven't pinpointed here the root cause but looks like we faced it in the not so distant past

https://bugzilla.redhat.com/show_bug.cgi?id=982306

relevant libvirt documentation:

https://libvirt.org/cgroups.html

it seems it is a mismatch on the tree hierarchy systemd configures and libvirt expects.

Comment 7 Michal Skrivanek 2014-12-01 12:26:44 UTC
CPUStats and Hosted Engine are SLA, moving for further investigation.
Yaniv, it may be also a configuration issue so it may end up back on infra...

Comment 8 Yaniv Bronhaim 2014-12-04 08:00:57 UTC
What configuration issue can it be? I can't find something related to this exception and never encountered it.. Let's get more input and see what can lead to this. I follow it  

Martin, any idea?

Comment 9 Doron Fediuck 2014-12-07 14:41:31 UTC
(In reply to Yaniv Bronhaim from comment #8)
> What configuration issue can it be? I can't find something related to this
> exception and never encountered it.. Let's get more input and see what can
> lead to this. I follow it  
> 
> Martin, any idea?

The above seems quite clear-

libvirtError: Requested operation is not valid: cgroup CPUACCT controller is not mounted

We need to verify cgroups are mounted, and it seems they are not.
This is what causing the exception. In the past host deploy verified it.
Yaniv, can you please check what changed?

Comment 10 Oved Ourfali 2014-12-10 08:15:40 UTC
Martin - Please check whether cgroups is really available or not (or perhaps ask Artyom to reproduce first).
As far as I know cgroups are part of the OS, so perhaps it is a libvirt issue.
Anyway, it is not a host-deploy issue.

Comment 11 Martin Sivák 2014-12-10 09:48:21 UTC
Libvirt is responsible for using the proper cgroups structure and libvirt daemon should depend on cgroups mounting in systemd. I will ask libvirt people for clarification, but I think there is nothing we can do here.

Comment 12 Peter Krempa 2015-01-08 09:20:55 UTC
It also might be related to cgroups disappearing in some cases due to a bug in systemd: https://bugzilla.redhat.com/show_bug.cgi?id=1179715

Comment 14 Michal Skrivanek 2015-03-04 12:45:04 UTC
so I guess this is worth a retest with https://rhn.redhat.com/errata/RHBA-2015-0037.html

Comment 15 Artyom 2015-03-05 13:03:02 UTC
Created attachment 998344 [details]
hosted-engine-setup and vdsm log

I did deploy on rhel7, with:
systemd-python-208-11.el7_0.6.x86_64
systemd-libs-208-11.el7_0.6.x86_64
systemd-208-11.el7_0.6.x86_64
systemd-sysv-208-11.el7_0.6.x86_64

and do not see above error in vdsm log, how I understand it's not happen all time, so this retest not mean much.

Comment 16 Michal Skrivanek 2015-03-05 13:10:07 UTC
what's different from comment #2?

Comment 17 Artyom 2015-03-05 13:43:46 UTC
I see a little difference in versions:
vdsm-4.16.8.1-7.el7ev.x86_64
qemu-kvm-rhev-1.5.3-60.el7_0.12.x86_64
libvirt-daemon-1.1.1-29.el7_0.7.x86_64

It can be different cpu or something else, or bug occurred but not always happen.
Maybe Jiri can provide more information(if he can reproduce it and provide logs with exception)

Comment 18 Raul Laansoo 2015-03-23 20:15:37 UTC
I also see this error with http://resources.ovirt.org/pub/ovirt-3.5/iso/ovirt-node-iso-3.5-0.201502231653.el6.iso.

Comment 19 Raul Laansoo 2015-03-23 20:27:39 UTC
Created attachment 1005577 [details]
vdsm.log

ovirt-node-iso-3.5-0.201502231653.el6.iso

Comment 20 Eyal Edri 2015-04-28 11:22:10 UTC
moving to 3.5.4 due to capacity planning for 3.5.3.
if you believe this should remain in 3.5.3, please sync with pm/dev/qe and a full triple ack for it. also - ensure priority is set accordingly.

Comment 21 Roy Golan 2015-06-10 09:37:13 UTC
Artyom can you or Jiri try to reproduce? if its still there will handle it for 3.6, otherwise close it.

Comment 22 Artyom 2015-06-10 12:33:03 UTC
I checked deployment on clean host and no exception connect to libvirt error.
So hard for me to say if bug still actually or not, I that Raul Laansoo was the last who saw bug, so maybe he can help.