Red Hat Bugzilla – Bug 999615
get_ksmd_cpu_usage returns incorrect results
Last modified: 2013-09-23 08:14:48 EDT
Description of problem:
When using mom on an oVirt-managed host, and KSM runs because of mom policy, vdsClient getVdsStats returns an ever-growing value for ksmCpu, which is supposed to be a CPU percentage. Eventually this breaks ovirt-engine-dwhd because the value becomes too large for an int.
Version-Release number of selected component (if applicable):
ovirt engine 3.2.2 - 1.1.fc18
vdsm 4.10.3 - 17.fc18
mom 0.3.0 - 1.fc18
Steps to Reproduce:
Run an oVirt-managed host under memory pressure so that KSM starts. Run vdsClient -s 0 getVdsStats repeatedly and watch the ksmCpu value increase over time.
ksmCpu should be a percentage of CPU, but it grows without bound.
It looks like a bug in mom's HostKSM.py: last_jiff, which is used to calculate the difference in jiffies between samples, is never updated.
A change to set last_jiff to cur_jiff in get_ksmd_cpu_usage fixes it for me:
[root@vm7 Collectors]# diff -C 4 HostKSM.py~ HostKSM.py
*** HostKSM.py~ 2012-10-05 13:37:16.000000000 -0400
--- HostKSM.py 2013-08-21 13:09:49.782064019 -0400
*** 71,78 ****
--- 71,79 ----
# wrap-around into account.
interval_jiffs = (cur_jiff - self.last_jiff) % 2**32
total_jiffs = os.sysconf('SC_CLK_TCK') * self.interval
# Calculate percentage of total jiffies during this interval.
+ self.last_jiff = cur_jiff
return 100 * interval_jiffs / total_jiffs
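To illustrate why the missing assignment makes the value grow, here is a minimal standalone sketch of the same arithmetic. It is not the mom source: the KsmCpuTracker class, the update_last flag, the fixed CLK_TCK of 100, the initial last_jiff of 0, and the sample jiffy counts are all assumptions made for the demonstration. It also uses Python 3 float division, whereas the original code would have run under Python 2.

```python
CLK_TCK = 100  # assumed jiffies per second; the real code calls os.sysconf('SC_CLK_TCK')


class KsmCpuTracker:
    """Hypothetical stand-in for the collector's ksmd CPU bookkeeping."""

    def __init__(self, interval, update_last=True):
        self.interval = interval        # seconds between samples
        self.update_last = update_last  # False reproduces the reported bug
        self.last_jiff = 0              # assumed starting point

    def usage(self, cur_jiff):
        # Jiffies consumed since the previous sample, allowing for wrap-around.
        interval_jiffs = (cur_jiff - self.last_jiff) % 2**32
        total_jiffs = CLK_TCK * self.interval
        if self.update_last:
            self.last_jiff = cur_jiff   # the one-line fix from the diff
        return 100 * interval_jiffs / total_jiffs


# Cumulative ksmd jiffies: a steady 50 jiffies per 10-second interval.
samples = [50, 100, 150, 200]

buggy = KsmCpuTracker(interval=10, update_last=False)
fixed = KsmCpuTracker(interval=10, update_last=True)

buggy_readings = [buggy.usage(j) for j in samples]
fixed_readings = [fixed.usage(j) for j in samples]

print("buggy:", buggy_readings)  # keeps growing with every sample
print("fixed:", fixed_readings)  # stays at the per-interval percentage
```

Without the update, each reading measures the jiffies accumulated since the collector started rather than since the previous sample, so a constant load is reported as a steadily increasing percentage.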
Thanks for the detailed report. I agree with your assessment. Please see http://gerrit.ovirt.org/#/c/18420/ for the suggested fix.
I have built new packages with this fix incorporated. Can someone confirm that it fixes the problem in the original environment?
Fixed in mom-0.3.2. See BZ#999973 in 3.3.
Can I set it to VERIFIED, or should I test in 3.2 too?
Moving to VERIFIED, as this was fixed and tested on mom-0.3.2.
Bulk closing, assuming verified bugs are in 3.3.