Bug 999973 - get_ksmd_cpu_usage returns incorrect results
Summary: get_ksmd_cpu_usage returns incorrect results
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: mom
Version: 3.2.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Target Release: 3.3.0
Assignee: Martin Sivák
QA Contact: Lukas Svaty
Docs Contact: Cheryn Tan
URL:
Whiteboard: sla
Depends On: 999615
Blocks:
Reported: 2013-08-22 13:29 UTC by Doron Fediuck
Modified: 2016-02-10 20:15 UTC (History)

Fixed In Version: mom-0.3.2-6.el6ev
Doc Type: Bug Fix
Doc Text:
Previously, the HostKSM collector was meant to calculate the number of jiffies used by ksmd since the last collection period. However, the stored baseline was never updated after a collection, so the reported value grew indefinitely and could eventually cause ovirt-engine-dwhd to fail. The collector now updates the baseline on every collection, so get_ksmd_cpu_usage returns correct results.
Clone Of: 999615
Environment:
Last Closed: 2014-01-21 15:06:18 UTC
oVirt Team: SLA
Target Upstream Version:
Embargoed:


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Red Hat Product Errata | RHEA-2014:0064 | 0 | normal | SHIPPED_LIVE | New package: Memory Overcommitment Manager | 2014-01-21 19:53:42 UTC

Description Doron Fediuck 2013-08-22 13:29:47 UTC
Duplicating for RHEV 3.3

+++ This bug was initially created as a clone of Bug #999615 +++

Description of problem:
When using mom on an oVirt-managed host, and mom policy causes KSM to run, vdsClient getVdsStats returns an ever-growing value for ksmCpu. This value is supposed to be a CPU percentage, and it eventually breaks ovirt-engine-dwhd because it becomes larger than an int can hold.

Version-Release number of selected component (if applicable):
ovirt engine 3.2.2 - 1.1.fc18
vdsm 4.10.3 - 17.fc18
mom  0.3.0 - 1.fc18

How reproducible:

Run an oVirt-managed host under memory pressure so that KSM starts. Run vdsClient -s 0 getVdsStats repeatedly and watch the ksmCpu value increase over time.


Steps to Reproduce:
1.
2.
3.

Actual results:
ksmCpu should be a percentage of CPU, but it grows without bound.

Expected results:


Additional info:

It looks like a bug in mom's HostKSM.py: last_jiff, which is used to calculate the difference in jiffies, is never reset.
A change to set last_jiff to cur_jiff in get_ksmd_cpu_usage fixes it for me:

[root@vm7 Collectors]# diff -C 4 HostKSM.py~ HostKSM.py
*** HostKSM.py~ 2012-10-05 13:37:16.000000000 -0400
--- HostKSM.py  2013-08-21 13:09:49.782064019 -0400
***************
*** 71,78 ****
--- 71,79 ----
          # wrap-around into account.
          interval_jiffs = (cur_jiff - self.last_jiff) % 2**32
          total_jiffs = os.sysconf('SC_CLK_TCK') * self.interval
          # Calculate percentage of total jiffies during this interval.
+         self.last_jiff = cur_jiff
          return 100 * interval_jiffs / total_jiffs

      def get_shareable_mem(self):
          """

--- Additional comment from Adam Litke on 2013-08-22 16:11:32 IDT ---

Thanks for the detailed report.  I agree with your assessment.  Please see http://gerrit.ovirt.org/#/c/18420/ for the suggested fix.
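
For readers who want to see the effect of the one-line change in isolation, here is a minimal sketch of the interval calculation from the diff above. It is not the actual mom code: the class name KsmdCpuTracker, the clk_tck and start_jiff parameters, and passing cur_jiff in as an argument are all assumptions made so the example runs without a live ksmd process. Only the arithmetic and the last_jiff update mirror the reported patch.

```python
import os


class KsmdCpuTracker:
    """Hypothetical stand-in for the jiffy bookkeeping in HostKSM.py."""

    def __init__(self, interval, clk_tck=None, start_jiff=0):
        # interval: seconds between collections (assumed, as in the diff).
        self.interval = interval
        # clk_tck: jiffies per second; falls back to the system value.
        if clk_tck is None:
            clk_tck = os.sysconf('SC_CLK_TCK')
        self.clk_tck = clk_tck
        self.last_jiff = start_jiff

    def usage(self, cur_jiff):
        # Jiffies consumed since the previous call, allowing for
        # 32-bit wrap-around of the counter.
        interval_jiffs = (cur_jiff - self.last_jiff) % 2**32
        total_jiffs = self.clk_tck * self.interval
        # The fix: remember the current reading so the next call
        # measures only the next interval, not everything since start.
        self.last_jiff = cur_jiff
        return 100 * interval_jiffs / total_jiffs


# Assumed numbers: 100 jiffies per second, 10 second interval,
# ksmd consuming 50 jiffies per interval.
tracker = KsmdCpuTracker(interval=10, clk_tck=100)
readings = [tracker.usage(j) for j in (50, 100, 150)]
```

With the last_jiff update in place, each of the three readings is the same percentage. If that line is removed, the same inputs yield a value that grows with every call, which matches the ever-growing ksmCpu described in the report.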

Comment 4 errata-xmlrpc 2014-01-21 15:06:18 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHEA-2014-0064.html

