Please see comments in https://bugzilla.redhat.com/show_bug.cgi?id=1198187 as they had to be scrubbed from this in order to clone.
Sanity test passed on the following versions:
rhev-hypervisor6-6.6-20150421.0.el6ev (ovirt-node-3.2.2-3.el6, libvirt-0.10.2-46.el6_6.4.x86_64, vdsm-4.16.13.1-1.el6ev.x86_64)
rhev-hypervisor7-7.1-20150420.0.el7ev (ovirt-node-3.2.2-3.el7, libvirt-1.2.8-16.el7_1.2.x86_64, vdsm-4.16.13.1-1.el7ev.x86_64)
RHEVM 3.5.1-0.4.el6ev
1. Restarted RHEV-H 6.6 for 3.5.1 20 times to check whether the race condition occurs: no such cgroup error in vdsm.log, the host comes up on RHEV-M, and VMs migrate successfully.
2. Upgraded RHEV-H 6.6 for 3.4.z to the RHEV-H 6.6 for 3.5.1 build via RHEV-M, then checked vdsm.log and the messages log: no such cgroup error in vdsm.log.
3. Upgraded RHEV-H 6.6 for 3.5.0 to the RHEV-H 6.6 for 3.5.1 build via RHEV-M, then checked vdsm.log and the messages log: no such cgroup error in vdsm.log.
4. Restarted RHEV-H 7.1 20 times to check whether the race condition occurs: no such cgroup error in vdsm.log, the host comes up on RHEV-M, and VMs migrate successfully.
The log check used in each step is sketched below.
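A minimal sketch of that log check, assuming the default vdsm log location; the exact grep patterns are assumptions and should be adjusted to match the actual cgroup/CPUACCT error string seen on the host:
$ grep -ic 'cgroup' /var/log/vdsm/vdsm.log
$ grep -ic 'cpuacct' /var/log/vdsm/vdsm.log
$ grep -ic 'cgroup' /var/log/messages
A count of 0 after each reboot or upgrade indicates the error did not reoccur.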
We have to make the RHEV-H 3.5.1 errata release on time, so I am changing the bug status to VERIFIED per comment 7, bug 1198187 comment 79, and bug 1198187 comment 74. If any feedback on the comment 4 request comes in from the customer later, let's also paste the result in a comment before RHEV 3.5.1 ships live. Thanks.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHBA-2015-0903.html
Hi Alexandros, I saw bug 1198187 comment 79 and bug 1198187 comment 74; some customers are not affected. As for this bug, so far no team has reproduced the issue in-house. Could you please help confirm the following:
1. Can the customer reproduce this issue every time on the same machine? If so, could they provide exact, detailed reproduction steps and the RHEV-M version info?
2. In case 01442491, Inaki commented "this isn't impacting any vm at the moment". Does that mean VMs are currently running fine on this host and VM migration works? He also wrote "I did however have some issues before when starting a pool of 30 machines"; what were those earlier issues?
3. A fresh sosreport of RHEV-H 6.6 (20150421.0.el6ev) is a must for this bug investigation (see the collection sketch below).
Thanks.
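A minimal sketch for collecting that sosreport on the host, assuming the standard sos tooling is available on RHEV-H (option names can vary by sos version):
$ sosreport --batch
The resulting tarball (usually written under /tmp or /var/tmp) can then be attached to the support case.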
Hello Ying,
1. This message is coming in "non-stop":
$ grep CPUACCT vdsm.log | wc -l
5262
rhevm-3.5.0-0.29.el6ev.noarch
2. That problem seems to be fixed by upgrading to the current RHEV-H 6.6 version (on which he is still seeing the CPUACCT messages).
3. It's around 80 MB. Can you download it from case #01442491?
Thank you, Ying
I forgot to mention that I asked for a migration test/reboot so we can see whether it's affected and when the messages start.
So, Alex, to make sure I understand what is going on after upgrading to the latest version of RHEV-H available today, i.e. 20150421.0.el6:
- Apparently no impact on RHEV behavior: VMs can be started and live-migrated without issues.
- Problem: vdsm.log is flooded with the CPUACCT error.
Is this correct?
Marina, "Can start", yes. "Can't migrate" is yet to be confirmed. I checked the DB and the last migrations were from old HV to the new HV (during the HV upgrade), so i suppose there is no problem with migrations, but let's confirm it first with the customer
Alexandros, is there any update on this bug?
1. Has the user migrated to 3.5.1-1 (or -2)?
2. After the user has moved to 3.5.1 (or later), is there still a flood of messages in the logfile, and if so, which messages?
Hello Fabian, there is no update yet. I have just asked the customer to upgrade to the latest RHEV-H and update us with the results/logs.
Alexandros, is there any update? Otherwise I'd close this bug soon.
Created attachment 1029921 [details] messages from RHEV-H 6.6 (20150512.0.el6ev)
Alexandros, can you please provide the output of ps -eZfl?
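For reference, a hedged sketch of gathering that output and narrowing it to the likely relevant daemons; the output file name and grep pattern are illustrative only, and the full, unfiltered output is what is being requested:
$ ps -eZfl > ps-eZfl.txt
$ grep -E 'vdsm|libvirt' ps-eZfl.txt
The -Z flag adds the SELinux security context column, which helps check whether the daemons are running in the expected domains.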