Bug 999615 - get_ksmd_cpu_usage returns incorrect results
Summary: get_ksmd_cpu_usage returns incorrect results
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: oVirt
Classification: Retired
Component: mom
Version: 3.2
Hardware: Unspecified
OS: Unspecified
unspecified
unspecified
Target Milestone: ---
: ---
Assignee: Adam Litke
QA Contact: Lukas Svaty
URL:
Whiteboard: sla
Depends On:
Blocks: 999973
Reported: 2013-08-21 17:35 UTC by John Taylor
Modified: 2013-09-23 12:14 UTC
3 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
: 999973
Environment:
Last Closed: 2013-09-23 12:14:48 UTC
oVirt Team: ---
Embargoed:



Links
System        ID     Private  Priority  Status  Summary  Last Updated
oVirt gerrit  18420  0        None      None    None     Never

Description John Taylor 2013-08-21 17:35:52 UTC
Description of problem:
When mom runs on an oVirt-managed host and starts KSM because of the mom policy, vdsClient getVdsStats returns an ever-growing value for ksmCpu, which is supposed to be a CPU percentage. This eventually breaks ovirt-engine-dwhd, because the value grows larger than an int can hold.

Version-Release number of selected component (if applicable):
ovirt engine 3.2.2 - 1.1.fc18
vdsm 4.10.3 - 17.fc18
mom  0.3.0 - 1.fc18

How reproducible:

Run an oVirt-managed host under enough memory pressure to start KSM, then run vdsClient -s 0 getVdsStats repeatedly and watch the ksmCpu value increase over time.


Steps to Reproduce:

Actual results:
ksmCpu should be a CPU percentage, but it grows without bound.
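A minimal simulation of this symptom, assuming the cause described under "Additional info" below (the baseline jiffy count is never advanced). The constants are hypothetical: a clock tick rate of 100, a 5-second sampling interval, and ksmd using a steady 10% of one CPU (50 jiffies per interval). Because each sample divides all jiffies accumulated since start-up by a single interval's worth of jiffies, the reported value climbs linearly even though the real usage is constant.

```python
# Hypothetical constants for illustration only.
CLK_TCK = 100                 # assumed clock ticks per second
INTERVAL = 5                  # assumed sampling interval in seconds
KSMD_JIFFS_PER_INTERVAL = 50  # assumed: ksmd uses 10% of 500 jiffies


def buggy_samples(n_samples):
    """Return the ksmCpu values reported when last_jiff is never updated."""
    last_jiff = 0                     # baseline captured once, at start-up
    total_jiffs = CLK_TCK * INTERVAL  # jiffies available in one interval
    results = []
    for i in range(1, n_samples + 1):
        cur_jiff = KSMD_JIFFS_PER_INTERVAL * i  # cumulative ksmd jiffies
        interval_jiffs = (cur_jiff - last_jiff) % 2**32
        # Bug: last_jiff is not set to cur_jiff here, so the difference
        # covers all time since start-up rather than the last interval.
        # Floor division mirrors Python 2's integer '/'.
        results.append(100 * interval_jiffs // total_jiffs)
    return results


print(buggy_samples(5))  # [10, 20, 30, 40, 50]
```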

Expected results:


Additional info:

This looks like a bug in mom's HostKSM.py: last_jiff, which is used to calculate the difference in jiffies, is never updated.
Setting self.last_jiff to cur_jiff in get_ksmd_cpu_usage fixes it for me:

[root@vm7 Collectors]# diff -C 4 HostKSM.py~ HostKSM.py
*** HostKSM.py~ 2012-10-05 13:37:16.000000000 -0400
--- HostKSM.py  2013-08-21 13:09:49.782064019 -0400
***************
*** 71,78 ****
--- 71,79 ----
          # wrap-around into account.
          interval_jiffs = (cur_jiff - self.last_jiff) % 2**32
          total_jiffs = os.sysconf('SC_CLK_TCK') * self.interval
          # Calculate percentage of total jiffies during this interval.
+         self.last_jiff = cur_jiff
          return 100 * interval_jiffs / total_jiffs

      def get_shareable_mem(self):
          """

Comment 1 Adam Litke 2013-08-22 13:11:32 UTC
Thanks for the detailed report.  I agree with your assessment.  Please see http://gerrit.ovirt.org/#/c/18420/ for the suggested fix.

Comment 2 Adam Litke 2013-08-23 13:29:50 UTC
I have built new packages with this fix incorporated.  Can someone confirm that it fixes the problem in the original environment?

https://koji.fedoraproject.org/koji/packageinfo?packageID=12742

mom-0.3.2-5

Comment 3 Lukas Svaty 2013-08-30 18:32:23 UTC
Fixed in mom-0.3.2; see BZ#999973 for 3.3.
Can I set this to VERIFIED, or should it be tested on 3.2 too?

Comment 4 Lukas Svaty 2013-09-05 10:42:44 UTC
Moving to VERIFIED, as this was fixed and tested on mom-0.3.2.

Comment 5 Itamar Heim 2013-09-23 12:14:48 UTC
bulk closing, assuming verified bugs are in 3.3.

