Bug 1213431 - vdsm log is flooded with cgroup CPUACCT controller is not mounted errors, migration of VMs not possible
Summary: vdsm log is flooded with cgroup CPUACCT controller is not mounted errors, migration of VMs not possible
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: ovirt-node
Version: 3.5.1
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: urgent
Target Milestone: ---
Target Release: 3.5.1
Assignee: Fabian Deutsch
QA Contact: Virtualization Bugs
URL:
Whiteboard: node
Depends On: 1198187
Blocks:
 
Reported: 2015-04-20 14:27 UTC by Bronce McClain
Modified: 2019-10-10 09:45 UTC
CC: 36 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Previously, a race between services during boot caused libvirt to start too early, which prevented virtual machines from starting up correctly. With this update, libvirt is started at the right time, so virtual machines start correctly.
Clone Of: 1198187
Environment:
Last Closed: 2015-06-03 12:28:20 UTC
oVirt Team: Node
Target Upstream Version:
Embargoed:
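
The race described in the Doc Text is, in effect, a missing boot-ordering constraint between the cgroup setup and libvirtd. As an illustration only, on the RHEL 7.1 based image such a constraint could take the shape of a systemd drop-in like the following (hypothetical; the actual fix shipped in ovirt-node and may be implemented differently):

$ mkdir -p /etc/systemd/system/libvirtd.service.d
$ cat /etc/systemd/system/libvirtd.service.d/50-cgroup-ordering.conf
[Unit]
# Hypothetical drop-in: hold libvirtd back until the cgroup hierarchy is mounted.
After=sys-fs-cgroup.mount
$ systemctl daemon-reload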


Attachments


Links
System ID Private Priority Status Summary Last Updated
Red Hat Knowledge Base (Solution) 1365943 0 None None None Never
Red Hat Knowledge Base (Solution) 1409243 0 None None None Never
Red Hat Product Errata RHBA-2015:0903 0 normal SHIPPED_LIVE ovirt-node bug fix and enhancement update for RHEV 3.5.1 with RHEL 7.1 support 2015-04-28 22:53:11 UTC

Comment 1 Bronce McClain 2015-04-20 14:28:52 UTC
Please see the comments in https://bugzilla.redhat.com/show_bug.cgi?id=1198187; they had to be scrubbed from this bug in order to clone it.

Comment 7 Ying Cui 2015-04-27 05:26:50 UTC
Sanity tests pass on the following versions:

rhev-hypervisor6-6.6-20150421.0.el6ev (ovirt-node-3.2.2-3.el6, libvirt-0.10.2-46.el6_6.4.x86_64, vdsm-4.16.13.1-1.el6ev.x86_64)
rhev-hypervisor7-7.1-20150420.0.el7ev (ovirt-node-3.2.2-3.el7, libvirt-1.2.8-16.el7_1.2.x86_64, vdsm-4.16.13.1-1.el7ev.x86_64)
RHEVM 3.5.1-0.4.el6ev

1. Restarted RHEV-H 6.6 for 3.5.1 20 times to check whether the race condition happens (a check of the kind sketched below this list): no such cgroup error in vdsm.log, RHEV-H 6.6 for 3.5.1 comes up on RHEV-M, and VMs migrate successfully.
2. Upgraded RHEV-H 6.6 for 3.4.z to the RHEV-H 6.6 for 3.5.1 build via RHEV-M, then checked vdsm.log and the messages log: no such cgroup error in vdsm.log.
3. Upgraded RHEV-H 6.6 for 3.5.0 to the RHEV-H 6.6 for 3.5.1 build via RHEV-M, then checked vdsm.log and the messages log: no such cgroup error in vdsm.log.
4. Restarted RHEV-H 7.1 20 times to check whether the race condition happens: no such cgroup error in vdsm.log, RHEV-H 7.1 comes up on RHEV-M, and VMs migrate successfully.
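
The repeated reboot check in steps 1 and 4 can be approximated with a loop of this shape (an illustrative sketch run from a test controller; the host name is a placeholder, not the actual QA script):

$ for i in $(seq 1 20); do
>     ssh root@rhevh-under-test reboot
>     sleep 300    # give the host time to come back up
>     ssh root@rhevh-under-test \
>         'grep -c "CPUACCT controller is not mounted" /var/log/vdsm/vdsm.log'
> done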

Comment 8 Ying Cui 2015-04-27 06:30:22 UTC
We have to get the RHEV-H 3.5.1 errata released on time, so I am changing the bug status to VERIFIED based on comment 7, bug 1198187 comment 79, and bug 1198187 comment 74.
If any customer feedback on the comment 4 request arrives later, let's also paste the result in a comment before RHEV 3.5.1 ships live. Thanks.

Comment 10 errata-xmlrpc 2015-04-28 18:54:11 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2015-0903.html

Comment 12 Ying Cui 2015-05-12 13:04:46 UTC
Hi Alexandros,
   Per bug 1198187 comment 79 and bug 1198187 comment 74, some customers are not affected, and so far no team has reproduced this issue in-house. Could you please help confirm the following:

1. Can the customer reproduce this issue every time on the same machine? If so, could you provide exact, detailed reproduction steps and the RHEV-M version info?

2. In case 01442491, Inaki commented "this isn't impacting any vm at the moment". Does that mean VMs are currently running fine on this host and VM migration works well? He also wrote "I did however have some issues before when starting a pool of 30 machines"; what were those earlier issues?

3. A fresh sosreport of RHEV-H 6.6 (20150421.0.el6ev) is a must for this bug investigation (see the note below).
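
For reference, the report is generated on the host itself with the sos tool; the non-interactive invocation is roughly (flag names vary between sos versions):

$ sosreport --batch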

Thanks.

Comment 13 Alexandros Gkesos 2015-05-12 14:24:15 UTC
Hello Ying,

1. This message is coming "non-stop":

$ grep CPUACCT vdsm.log | wc -l
5262

rhevm-3.5.0-0.29.el6ev.noarch
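
To check whether the flood is still ongoing rather than historical, the timestamps of the most recent occurrences can be inspected as well, for example:

$ grep CPUACCT vdsm.log | tail -3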

2. That problem seems to be fixed by upgrading to the current RHEV-H 6.6 version (on which he is still seeing the CPUACCT messages, however).

3. It's around 80 MB. Can you download it from case #01442491?

Thank you Ying

Comment 14 Alexandros Gkesos 2015-05-12 14:25:38 UTC
I forgot to mention that I asked the customer for a migration test and a reboot, so we can see whether migration is affected and when the messages start.

Comment 15 Marina Kalinin 2015-05-12 20:49:37 UTC
So, Alex, to understand what is going on after upgrading to the latest version of RHEV-H available today, i.e. 20150421.0.el6:
- Apparently no impact on RHEV behavior: VMs can be started and live-migrated without issues.
- Problem: vdsm.log is flooded with the CPUACCT error.

Is this correct?

Comment 16 Alexandros Gkesos 2015-05-13 07:52:15 UTC
Marina,

"Can start": yes. "Can migrate" is yet to be confirmed.

I checked the DB, and the last migrations were from the old HV to the new HV (during the HV upgrade), so I suppose there is no problem with migrations, but let's confirm it with the customer first.
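
The DB check amounts to looking at the most recent migration events on the RHEV-M machine, along these lines (an illustrative query; the audit_log layout may differ between engine versions):

$ su - postgres
$ psql engine -c "SELECT log_time, message FROM audit_log
                  WHERE message ILIKE '%migrat%'
                  ORDER BY log_time DESC LIMIT 20;"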

Comment 17 Fabian Deutsch 2015-05-19 08:51:26 UTC
Alexandros, is there any update on this bug?

1. Has the user migrated to 3.5.1-1 (or -2)?
2. After the user has moved to 3.5.1 (or later), is there still a flood of messages in the log file, and if so, which messages?

Comment 18 Alexandros Gkesos 2015-05-19 09:11:05 UTC
Hello Fabian,

There was no update yet. I just asked the customer to upgrade to the latest RHEV-H and update us with the results/logs.

Comment 20 Fabian Deutsch 2015-05-26 12:19:41 UTC
Alexandros, is there any update? Otherwise I'd close this bug soon.

Comment 22 Alexandros Gkesos 2015-05-26 12:34:00 UTC
Created attachment 1029921 [details]
messages from RHEV-H 6.6 (20150512.0.el6ev)

Comment 25 Fabian Deutsch 2015-05-27 14:26:10 UTC
Alexandros, can you please provide the output of ps -eZfl?
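
If the full listing is too long to paste, the lines for the daemons in question can be filtered first, e.g.:

$ ps -eZfl | grep -E 'libvirtd|vdsm'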

