Duplicating for RHEV 3.3

+++ This bug was initially created as a clone of Bug #999615 +++

Description of problem:
When mom runs on an oVirt-managed host and starts ksm because of mom policy, vdsClient getVdsStats returns an ever-growing value for ksmCpu. The value is supposed to be a CPU percentage, and it eventually breaks ovirt-engine-dwhd because it becomes too large for an int.

Version-Release number of selected component (if applicable):
ovirt engine 3.2.2-1.1.fc18
vdsm 4.10.3-17.fc18
mom 0.3.0-1.fc18

How reproducible:
Run an oVirt-managed host under enough memory pressure to start ksm. Run vdsClient -s 0 getVdsStats repeatedly and watch the ksmCpu value increase over time.

Steps to Reproduce:
1.
2.
3.

Actual results:
ksmCpu should be a percentage of CPU, but it grows without bound.

Expected results:

Additional info:
It looks like a bug in mom's HostKSM.py, where last_jiff, which is used to calculate the difference in jiffies, is never reset. A change to set last_jiff to cur_jiff in get_ksmd_cpu_usage fixes it for me.

[root@vm7 Collectors]# diff -C 4 HostKSM.py~ HostKSM.py
*** HostKSM.py~ 2012-10-05 13:37:16.000000000 -0400
--- HostKSM.py 2013-08-21 13:09:49.782064019 -0400
***************
*** 71,78 ****
--- 71,79 ----
          # wrap-around into account.
          interval_jiffs = (cur_jiff - self.last_jiff) % 2**32
          total_jiffs = os.sysconf('SC_CLK_TCK') * self.interval
          # Calculate percentage of total jiffies during this interval.
+         self.last_jiff = cur_jiff
          return 100 * interval_jiffs / total_jiffs
  
      def get_shareable_mem(self):
          """

--- Additional comment from Adam Litke on 2013-08-22 16:11:32 IDT ---

Thanks for the detailed report. I agree with your assessment. Please see http://gerrit.ovirt.org/#/c/18420/ for the suggested fix.
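For anyone who wants to see the effect without a loaded host, here is a minimal, self-contained Python sketch of the calculation described in this report. It is not mom's actual code: the class KsmdCpuSampler, the reset_baseline flag, and the run helper are hypothetical names made up for illustration, the jiffy readings are invented, and returning 0 for the first sample is an assumption. It uses // where the original Python 2 code uses / on integers.

```python
class KsmdCpuSampler:
    """Hypothetical stand-in for the ksmd CPU calculation discussed above."""

    def __init__(self, interval, ticks_per_second=100, reset_baseline=True):
        self.interval = interval                  # seconds between samples
        self.ticks_per_second = ticks_per_second  # stands in for os.sysconf('SC_CLK_TCK')
        self.reset_baseline = reset_baseline      # False reproduces the reported bug
        self.last_jiff = None

    def sample(self, cur_jiff):
        """Return ksmd CPU usage as a percentage of one sampling interval."""
        if self.last_jiff is None:
            # Assumption: the first sample has no baseline, so report 0.
            self.last_jiff = cur_jiff
            return 0
        # Modulo 2**32 takes counter wrap-around into account.
        interval_jiffs = (cur_jiff - self.last_jiff) % 2**32
        total_jiffs = self.ticks_per_second * self.interval
        if self.reset_baseline:
            # The one-line fix: move the baseline forward after every sample.
            self.last_jiff = cur_jiff
        return 100 * interval_jiffs // total_jiffs


def run(readings, reset_baseline):
    """Feed a series of cumulative jiffy readings through a sampler."""
    sampler = KsmdCpuSampler(interval=10, reset_baseline=reset_baseline)
    return [sampler.sample(jiff) for jiff in readings]


# Invented readings: ksmd consumes 50 jiffies in every 10-second interval,
# which is 5% of the 1000 jiffies available per interval.
readings = [1000, 1050, 1100, 1150, 1200]

print(run(readings, reset_baseline=False))  # [0, 5, 10, 15, 20]
print(run(readings, reset_baseline=True))   # [0, 5, 5, 5, 5]
```

Without the reset, every sample is measured against the very first reading, so the reported percentage tracks total ksmd CPU time since the collector started rather than usage in the last interval. That matches the steadily increasing ksmCpu seen in getVdsStats.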
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHEA-2014-0064.html