Bug 999615 - get_ksmd_cpu_usage returns incorrect results
Status: CLOSED CURRENTRELEASE
Product: oVirt
Classification: Community
Component: mom
Version: 3.2
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Assigned To: Adam Litke
QA Contact: Lukas Svaty
Whiteboard: sla
Depends On:
Blocks: 999973
Reported: 2013-08-21 13:35 EDT by John Taylor
Modified: 2013-09-23 08:14 EDT (History)

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
: 999973
Environment:
Last Closed: 2013-09-23 08:14:48 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


External Trackers
Tracker       ID     Priority  Status  Summary  Last Updated
oVirt gerrit  18420  None      None    None     Never

Description John Taylor 2013-08-21 13:35:52 EDT
Description of problem:
When mom runs on an oVirt-managed host and starts ksm because of the mom policy, vdsClient getVdsStats returns an ever-growing value for ksmCpu. That value is supposed to be a CPU percentage, and it eventually breaks ovirt-engine-dwhd once it becomes greater than an int.

Version-Release number of selected component (if applicable):
ovirt engine 3.2.2 - 1.1.fc18
vdsm 4.10.3 - 17.fc18
mom  0.3.0 - 1.fc18

How reproducible:

Run an oVirt-managed host under memory pressure so that ksm starts. Then run vdsClient -s 0 getVdsStats repeatedly and watch the ksmCpu value increase over time.
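
The check above can be scripted. The following is only a sketch: it assumes getVdsStats prints one "key = value" pair per line and that ksmCpu is printed as an integer. The helper names (parse_ksm_cpu, keeps_growing, sample_ksm_cpu) are made up for illustration and are not part of vdsm or mom.

```python
import re
import subprocess
import time


def parse_ksm_cpu(stats_output):
    """Extract ksmCpu from getVdsStats text; assumes 'ksmCpu = <int>' lines."""
    match = re.search(r"^\s*ksmCpu\s*=\s*(-?\d+)\s*$", stats_output, re.MULTILINE)
    return int(match.group(1)) if match else None


def keeps_growing(samples):
    """True if the samples never decrease and end higher than they started."""
    pairs = zip(samples, samples[1:])
    return len(samples) > 1 and samples[-1] > samples[0] and all(b >= a for a, b in pairs)


def sample_ksm_cpu(count=5, delay=15):
    """Poll the host (needs vdsClient installed) and collect ksmCpu readings."""
    samples = []
    for _ in range(count):
        out = subprocess.check_output(["vdsClient", "-s", "0", "getVdsStats"])
        value = parse_ksm_cpu(out.decode())
        if value is not None:
            samples.append(value)
        time.sleep(delay)
    return samples
```

On an affected host, keeps_growing(sample_ksm_cpu()) would be expected to return True while ksm is active. A real percentage can also rise for a few samples, so this is a hint rather than proof.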


Steps to Reproduce:
1.
2.
3.

Actual results:
ksmCpu should be a percentage of CPU, but it grows without bound.

Expected results:


Additional info:

It looks like a bug in mom's HostKSM.py: last_jiff, which is used to calculate the difference in jiffies, is never updated after the first reading.
Setting last_jiff to cur_jiff in get_ksmd_cpu_usage fixes it for me.
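
The effect can be reproduced in isolation. This is a minimal sketch, not mom's actual collector: KsmdCpuSampler and the update_baseline flag are hypothetical, and CLK_TCK is hard-coded to an assumed 100 where the real code calls os.sysconf('SC_CLK_TCK').

```python
CLK_TCK = 100  # assumed ticks per second; the real code queries os.sysconf


class KsmdCpuSampler:
    """Hypothetical stand-in for the collector's jiffy bookkeeping."""

    def __init__(self, interval, update_baseline):
        self.interval = interval                # seconds between samples
        self.update_baseline = update_baseline  # False mimics the reported bug
        self.last_jiff = 0

    def usage(self, cur_jiff):
        interval_jiffs = (cur_jiff - self.last_jiff) % 2**32
        total_jiffs = CLK_TCK * self.interval
        if self.update_baseline:
            self.last_jiff = cur_jiff
        return 100 * interval_jiffs / total_jiffs


# ksmd's cumulative jiffy counter, gaining 50 jiffies per 10-second interval.
readings = [50, 100, 150, 200]
buggy = KsmdCpuSampler(10, update_baseline=False)
fixed = KsmdCpuSampler(10, update_baseline=True)
buggy_values = [buggy.usage(j) for j in readings]
fixed_values = [fixed.usage(j) for j in readings]
print(buggy_values)  # [5.0, 10.0, 15.0, 20.0]
print(fixed_values)  # [5.0, 5.0, 5.0, 5.0]
```

With a stale baseline, every call measures CPU time since the first sample but divides by a single interval, so the result climbs even though ksmd's load is constant.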

[root@vm7 Collectors]# diff -C 4 HostKSM.py~ HostKSM.py
*** HostKSM.py~ 2012-10-05 13:37:16.000000000 -0400
--- HostKSM.py  2013-08-21 13:09:49.782064019 -0400
***************
*** 71,78 ****
--- 71,79 ----
          # wrap-around into account.
          interval_jiffs = (cur_jiff - self.last_jiff) % 2**32
          total_jiffs = os.sysconf('SC_CLK_TCK') * self.interval
          # Calculate percentage of total jiffies during this interval.
+         self.last_jiff = cur_jiff
          return 100 * interval_jiffs / total_jiffs

      def get_shareable_mem(self):
          """
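
The "% 2**32" in the patched function is what handles counter wrap-around. A short sketch of that arithmetic, assuming (as the collector code does) that the jiffy counter wraps at 2**32; jiffies_elapsed is a hypothetical helper name:

```python
COUNTER_MODULUS = 2**32  # wrap point assumed by the collector code


def jiffies_elapsed(last_jiff, cur_jiff):
    """Jiffies between two readings, tolerating a single counter wrap."""
    # Python's % returns a non-negative result for a positive modulus,
    # so a negative raw difference maps back to the true elapsed count.
    return (cur_jiff - last_jiff) % COUNTER_MODULUS


print(jiffies_elapsed(1000, 1500))       # 500
print(jiffies_elapsed(2**32 - 10, 5))    # 15
```

This only gives the right answer when last_jiff is the previous reading, which is why the baseline has to be updated on every call.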
Comment 1 Adam Litke 2013-08-22 09:11:32 EDT
Thanks for the detailed report.  I agree with your assessment.  Please see http://gerrit.ovirt.org/#/c/18420/ for the suggested fix.
Comment 2 Adam Litke 2013-08-23 09:29:50 EDT
I have built new packages with this fix incorporated.  Can someone confirm that it fixes the problem in the original environment?

https://koji.fedoraproject.org/koji/packageinfo?packageID=12742

mom-0.3.2-5
Comment 3 Lukas Svaty 2013-08-30 14:32:23 EDT
This is fixed in mom-0.3.2; see BZ#999973 for 3.3.
Can I set it to VERIFIED, or should I test in 3.2 too?
Comment 4 Lukas Svaty 2013-09-05 06:42:44 EDT
Moving to VERIFIED, as this was fixed and tested on mom-0.3.2.
Comment 5 Itamar Heim 2013-09-23 08:14:48 EDT
Bulk closing, assuming verified bugs are in 3.3.
