Bug 1256949 - KSM sleep_millisecs below 10ms for systems above 16GB of RAM
Summary: KSM sleep_millisecs below 10ms for systems above 16GB of RAM
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: vdsm
Version: 3.5.3
Hardware: All
OS: Linux
Priority: high
Severity: high
Target Milestone: ovirt-3.6.1
Target Release: 3.6.1
Assignee: Martin Sivák
QA Contact: Shira Maximov
URL:
Whiteboard:
Depends On:
Blocks: 1261507
 
Reported: 2015-08-25 20:23 UTC by Amador Pahim
Modified: 2019-09-12 08:52 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Previously, a Memory Overcommitment Manager (MOM) policy rule computed KSM's sleep_millisecs value with a division in which the amount of host memory was the divisor. As a result, the sleep_millisecs value dropped below 10ms on hosts with more than 16GiB of RAM. That value was invalid and too aggressive, causing a very high CPU load on the host. In this release, the sleep_millisecs value is bounded so that it never drops below 10ms, which reduces the CPU load on affected machines.
Clone Of:
: 1261507 (view as bug list)
Environment:
Last Closed: 2016-03-09 19:44:08 UTC
oVirt Team: SLA
Target Upstream Version:
Embargoed:


Links
System                  ID              Private  Priority  Status        Summary                                    Last Updated
Red Hat Product Errata  RHBA-2016:0362  0        normal    SHIPPED_LIVE  vdsm 3.6.0 bug fix and enhancement update  2016-03-09 23:49:32 UTC
oVirt gerrit            45332           0        master    MERGED        Limit sleep_millisecs to minimum 10        Never
oVirt gerrit            46006           0        ovirt-3.5 MERGED        Limit sleep_millisecs to minimum 10        Never
oVirt gerrit            47835           0        ovirt-3.6 MERGED        Limit sleep_millisecs to minimum 10        Never

Description Amador Pahim 2015-08-25 20:23:45 UTC
Description of problem:

According to the former KSM controller, ksmtuned, 10 should be the minimum value for sleep_millisecs:

  sleep=$[KSM_SLEEP_MSEC * 16 * 1024 * 1024 / total]
  [ $sleep -le 10 ] && sleep=10      # 10 is the minimum
  debug sleep $sleep

But according to the 03-ksm.policy file, sleep_millisecs can go below that value:

          (Host.Control "ksm_sleep_millisecs"
              (/ (* ksm_sleep_ms_baseline 16777216) Host.mem_available))

As a result, a system with, say, 256GB of RAM ends up with sleep_millisecs set to 0:

2015-08-25 16:57:53,118 - mom.Controllers.KSM - INFO - Updating KSM configuration: pages_to_scan:600 run:1 sleep_millisecs:0
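
The arithmetic behind that log line can be sketched in Python. This is an illustration only, not MOM's code: it assumes ksm_sleep_ms_baseline is 10 and that Host.mem_available is expressed in KiB (so that 16777216 corresponds to 16GiB), and both function names are hypothetical. Integer division is used so that the result matches the 0 seen in the log.

```python
# Illustrative sketch only; not the actual MOM implementation.
# Assumptions: a baseline of 10 ms and host memory given in KiB.
KSM_SLEEP_MS_BASELINE = 10      # assumed value of ksm_sleep_ms_baseline
SIXTEEN_GIB_IN_KIB = 16777216   # constant from the policy expression
MIN_SLEEP_MS = 10               # lower bound applied by ksmtuned


def unbounded_sleep_ms(mem_kib):
    """Mirror the policy expression: baseline * 16777216 / host memory."""
    return KSM_SLEEP_MS_BASELINE * SIXTEEN_GIB_IN_KIB // mem_kib


def bounded_sleep_ms(mem_kib):
    """Same expression, but never below the 10 ms minimum."""
    return max(MIN_SLEEP_MS, unbounded_sleep_ms(mem_kib))


if __name__ == "__main__":
    for gib in (8, 16, 32, 256):
        mem_kib = gib * 1024 * 1024
        print(gib, unbounded_sleep_ms(mem_kib), bounded_sleep_ms(mem_kib))
```

Under these assumptions, the unbounded value is exactly 10 at 16GiB, falls below 10 for any larger host, and truncates to 0 at 256GiB, while the bounded variant stays at 10.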


Version-Release number of selected component (if applicable):

vdsm-4.16.20-1.el7ev.x86_64

How reproducible:

100%

Steps to Reproduce:
1. Stress VM memory on a host with 256GB of RAM or more.

Actual results:

sleep_millisecs is set to 0, causing the ksmd process to run without interruption and consume a very high amount of CPU time:

    PID USER      PR  NI  VIRT  RES  SHR %CPU S %MEM    TIME+  COMMAND
    262 root      25   5     0    0    0 73.4 R  0.0  5126049h ksmd
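
To confirm which values the kernel is currently using, the KSM tunables can be read back from sysfs. A minimal Python sketch, assuming the standard /sys/kernel/mm/ksm directory; the helper name read_ksm_settings is hypothetical and not part of vdsm or MOM:

```python
import os

KSM_SYSFS_DIR = "/sys/kernel/mm/ksm"  # standard KSM sysfs location


def read_ksm_settings(base_dir=KSM_SYSFS_DIR,
                      names=("run", "pages_to_scan", "sleep_millisecs")):
    """Return the requested KSM tunables as a dict of ints."""
    settings = {}
    for name in names:
        with open(os.path.join(base_dir, name)) as f:
            settings[name] = int(f.read().strip())
    return settings


if __name__ == "__main__":
    # Only attempt the read on hosts that expose the KSM sysfs directory.
    if os.path.isdir(KSM_SYSFS_DIR):
        print(read_ksm_settings())
```

On an affected host this would be expected to show sleep_millisecs at 0 while run is 1, matching the MOM log entry.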


Expected results:

Limit sleep_millisecs to minimum 10.

Comment 2 Martin Sivák 2015-08-27 14:11:59 UTC
Thanks Amador.

Comment 14 Julio Entrena Perez 2015-09-08 13:51:55 UTC
Is there a way to reload the modified file /etc/vdsm/mom.d/03-ksm.policy without restarting vdsmd?

Comment 15 Martin Sivák 2015-09-09 07:51:49 UTC
Hi Julio,

On 3.6 you just need to restart the vdsm-mom service. There is no simple way on 3.5 unless you have enabled the RPC port, which is disabled by default (because it is unprotected).

Comment 18 Shira Maximov 2015-10-26 15:52:52 UTC
Tried to verify on: vdsm-4.17.10-5.el7ev.noarch

Couldn't verify because the patch is missing from that version.

Comment 19 Shira Maximov 2015-10-26 15:53:31 UTC
Tried to verify on: vdsm-4.17.10-5.el7ev.noarch

Couldn't verify because the patch is missing from that version.

Comment 20 Doron Fediuck 2015-11-02 13:25:48 UTC
Martin,
can you please ensure it's available for 3.6.1?

Comment 21 Martin Sivák 2015-11-03 15:53:03 UTC
Already merged to the right VDSM branch.

Comment 24 Shira Maximov 2015-11-05 11:49:10 UTC
Moving back to MODIFIED because the patch hasn't made it into the latest version yet.

Details of the latest version I tried to verify on:
Red Hat Enterprise Virtualization Manager Version: 3.6.0.3-0.1.el6
vdsm-4.17.10.1-0.el7ev.noarch
mom-0.5.1-1.el7ev.noarch

Comment 26 Shira Maximov 2015-11-24 09:42:11 UTC
Moving back to MODIFIED because the patch hasn't made it into the latest version yet.

Details of the latest version I tried to verify on:
Red Hat Enterprise Virtualization Manager Version: 3.6.0.3-0.1.el6
vdsm-4.17.10.1-0.el7ev.noarch
mom-0.5.1-1.el7ev.noarch

Comment 28 Shira Maximov 2015-12-03 11:41:12 UTC
Verified on:
http://bob.eng.lab.tlv.redhat.com/builds/3.6/3.6.1-2/el7/x86_64/


Verification steps:
1. Create a virtual host with 256GB of (virtual) memory
   (if you have a host with 256GB, you can skip this step):

    - Set up a nested environment so that you can
      create a VM that will act as a host in your setup.
    - In the nested environment, create a new cluster policy
      in which the memory filter is disabled
      (top right corner -> configure -> cluster policy).
    - Go to the clusters tab -> choose your cluster ->
      in the cluster policy -> choose the cluster policy that you created.
    - Disable the memory overcommit check on the nested host:
      add the following line to the file
      /etc/sysctl.conf : vm.overcommit_memory = 1
    - Power off the VM (the nested host).
    - Edit the VM (your nested host) -> set the memory to 262144.
    - Start the VM.

2. Trigger KSM.
3. Check that the ksmd process is not consuming a high amount of CPU:
   run `ps -fade | grep ksmd` and check that the CPU usage is not unusual.
4. In /var/log/vdsm/mom.log, once KSM has been triggered, you should
   see that run != 0 and that sleep_millisecs is never below 10.

The log results in /var/log/vdsm/mom.log should look like this:
2015-11-05 15:46:26,923 - mom.Controllers.KSM - INFO - Updating KSM configuration: pages_to_scan:64 run:1 sleep_millisecs:10
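
The log check in step 4 can also be done mechanically. The Python sketch below scans log lines in the format of the sample entry and returns every KSM update where run != 0 and sleep_millisecs is below 10. The regular expression is derived only from the sample lines in this report, and find_low_sleep is a hypothetical helper name, so treat it as a starting point rather than a supported tool.

```python
import re
import sys

# Pattern based solely on the sample mom.log lines in this report.
KSM_UPDATE_RE = re.compile(
    r"Updating KSM configuration: "
    r"pages_to_scan:(?P<pages>\d+) run:(?P<run>\d+) "
    r"sleep_millisecs:(?P<sleep>\d+)"
)


def find_low_sleep(lines, minimum=10):
    """Return lines where KSM is running with sleep_millisecs < minimum."""
    offenders = []
    for line in lines:
        match = KSM_UPDATE_RE.search(line)
        if not match:
            continue  # not a KSM configuration update
        running = int(match.group("run")) != 0
        if running and int(match.group("sleep")) < minimum:
            offenders.append(line.rstrip("\n"))
    return offenders


if __name__ == "__main__":
    # Usage: pass the log path, e.g. /var/log/vdsm/mom.log
    if len(sys.argv) > 1:
        with open(sys.argv[1]) as log:
            for bad in find_low_sleep(log):
                print(bad)
```

An empty result on a fixed build, after KSM has been triggered, corresponds to the expected outcome of step 4.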

Comment 30 errata-xmlrpc 2016-03-09 19:44:08 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-0362.html

