Bug 1256949 - KSM sleep_millisecs below 10ms for systems above 16GB of RAM
Status: CLOSED ERRATA
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: vdsm
3.5.3
All Linux
high Severity high
: ovirt-3.6.1
: 3.6.1
Assigned To: Martin Sivák
Shira Maximov
: EasyFix, Patch, ZStream
Depends On:
Blocks: 1261507
Reported: 2015-08-25 16:23 EDT by Amador Pahim
Modified: 2016-03-28 08:20 EDT (History)
17 users

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Previously, a Memory Overcommitment Manager (MOM) policy rule computed KSM's sleep_millisecs value using a division with the amount of host memory in the divisor. As a result, the sleep_millisecs value dropped below 10ms on hosts with more than 16GiB of RAM. That value was invalid and too aggressive, causing a huge CPU load on the host. In this release, the sleep_millisecs value is bounded so that it never drops below 10ms, improving the CPU load on affected machines.
Story Points: ---
Clone Of:
: 1261507
Environment:
Last Closed: 2016-03-09 14:44:08 EST
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: SLA
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments


External Trackers
Tracker ID Priority Status Summary Last Updated
oVirt gerrit 45332 master MERGED Limit sleep_millisecs to minimum 10 Never
oVirt gerrit 46006 ovirt-3.5 MERGED Limit sleep_millisecs to minimum 10 Never
oVirt gerrit 47835 ovirt-3.6 MERGED Limit sleep_millisecs to minimum 10 Never

Description Amador Pahim 2015-08-25 16:23:45 EDT
Description of problem:

According to the former KSM controller, ksmtuned, 10 should be the minimum value for sleep_millisecs:

  sleep=$[KSM_SLEEP_MSEC * 16 * 1024 * 1024 / total]
  [ $sleep -le 10 ] && sleep=10      <---- 10 is the minimum
  debug sleep $sleep

But according to the 03-ksm.policy file, sleep_millisecs can go below that value:

  (Host.Control "ksm_sleep_millisecs"
      (/ (* ksm_sleep_ms_baseline 16777216) Host.mem_available))

This causes a system with, say, 256GB of RAM to have sleep_millisecs set to 0:

2015-08-25 16:57:53,118 - mom.Controllers.KSM - INFO - Updating KSM configuration: pages_to_scan:600 run:1 sleep_millisecs:0
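The arithmetic is easy to reproduce outside MOM. A minimal sketch in plain shell (values in kB, as ksmtuned reads them from /proc/meminfo; the 10 ms baseline is assumed from ksmtuned's KSM_SLEEP_MSEC default):

```shell
#!/bin/sh
# Reproduce the sleep_millisecs arithmetic for two host sizes.
# total is MemTotal in kB, as ksmtuned reads it from /proc/meminfo.
KSM_SLEEP_MSEC=10

for total in $((16 * 1024 * 1024)) $((256 * 1024 * 1024)); do
    # Unbounded value, as computed by the 03-ksm.policy rule above
    raw=$(( KSM_SLEEP_MSEC * 16 * 1024 * 1024 / total ))
    bounded=$raw
    # The minimum that ksmtuned enforces
    [ "$bounded" -le 10 ] && bounded=10
    echo "total=${total}kB raw=${raw} bounded=${bounded}"
done
```

A 16GiB host lands exactly on the 10 ms baseline (raw=10); anything larger drives the unbounded value below 10, and with integer division a 256GiB host reaches raw=0, which the bound corrects back to 10.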


Version-Release number of selected component (if applicable):

vdsm-4.16.20-1.el7ev.x86_64

How reproducible:

100%

Steps to Reproduce:
1. Stress VMs memory in a host with 256GB of RAM or more.

Actual results:

sleep_millisecs set to 0, causing ksmd process to run without interruption, consuming a very high amount of CPU time:

    PID USER      PR  NI  VIRT  RES  SHR %CPU S %MEM    TIME+  COMMAND
    262 root      25   5     0    0    0 73.4 R  0.0  5126049h ksmd


Expected results:

Limit sleep_millisecs to minimum 10.
Comment 2 Martin Sivák 2015-08-27 10:11:59 EDT
Thanks Amador.
Comment 14 Julio Entrena Perez 2015-09-08 09:51:55 EDT
Is there a way to reload modified file /etc/vdsm/mom.d/03-ksm.policy without restarting vdsmd?
Comment 15 Martin Sivák 2015-09-09 03:51:49 EDT
Hi Julio,

on 3.6 you just need to restart the vdsm-mom service. There is no simple way on 3.5 unless you have enabled the RPC port, which is disabled by default (because it is unprotected).
Comment 18 Shira Maximov 2015-10-26 11:52:52 EDT
Tried to verify on: vdsm-4.17.10-5.el7ev.noarch

Couldn't verify because the patch is missing from that version.
Comment 20 Doron Fediuck 2015-11-02 08:25:48 EST
Martin,
can you please ensure it's available for 3.6.1?
Comment 21 Martin Sivák 2015-11-03 10:53:03 EST
Already merged to the right VDSM branch.
Comment 24 Shira Maximov 2015-11-05 06:49:10 EST
Moving back to MODIFIED because the patch hasn't made it into the latest version yet.

Details of the latest version I tried to verify on:
Red Hat Enterprise Virtualization Manager Version: 3.6.0.3-0.1.el6
vdsm-4.17.10.1-0.el7ev.noarch
mom-0.5.1-1.el7ev.noarch
Comment 26 Shira Maximov 2015-11-24 04:42:11 EST
Moving back to MODIFIED because the patch hasn't made it into the latest version yet.

Details of the latest version I tried to verify on:
Red Hat Enterprise Virtualization Manager Version: 3.6.0.3-0.1.el6
vdsm-4.17.10.1-0.el7ev.noarch
mom-0.5.1-1.el7ev.noarch
Comment 28 Shira Maximov 2015-12-03 06:41:12 EST
verified on : 
http://bob.eng.lab.tlv.redhat.com/builds/3.6/3.6.1-2/el7/x86_64/


verification steps:
1. Create a virtual host with (virtually) 256GB of RAM
   (if you have a host with 256GB you can skip this step):
   - Use a nested environment so you can create a VM that will act as a host in your setup.
   - In the nested environment, create a new cluster policy in which the memory filter is disabled
     (top right corner -> Configure -> Cluster Policy).
   - Go to the Clusters tab -> choose your cluster -> in the cluster policy, select the policy you created.
   - Disable the memory overcommit check on the nested host: add the following line to
     /etc/sysctl.conf: vm.overcommit_memory = 1
   - Power off the VM (the nested host).
   - Edit the VM (your nested host) -> set the memory to 262144.
   - Start the VM.
2. Trigger KSM.
3. Check that the ksmd process is not consuming a high amount of CPU:
   run `ps -fade | grep ksmd` and confirm that the CPU usage is not unusual.
4. In /var/log/vdsm/mom.log, when KSM is triggered you should see that run != 0
   and sleep_millisecs is always at least 10.

Log results should look like this:
/var/log/vdsm/mom.log: 2015-11-05 15:46:26,923 - mom.Controllers.KSM - INFO - Updating KSM configuration: pages_to_scan:64 run:1 sleep_millisecs:10
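Step 4 can be machine-checked with a small sketch (the awk filter is my own; only the log line format comes from this report). It fails if any KSM update in the log sets sleep_millisecs below 10; a sample line stands in here for the real /var/log/vdsm/mom.log:

```shell
#!/bin/sh
# Fail if any "Updating KSM configuration" line sets sleep_millisecs below 10.
log=$(mktemp)
cat > "$log" <<'EOF'
2015-11-05 15:46:26,923 - mom.Controllers.KSM - INFO - Updating KSM configuration: pages_to_scan:64 run:1 sleep_millisecs:10
EOF

# Split each line on "sleep_millisecs:" and compare the numeric tail against 10.
if awk -F'sleep_millisecs:' \
      '/Updating KSM configuration/ && $2 + 0 < 10 { bad = 1 } END { exit bad }' "$log"; then
    result="OK: sleep_millisecs never below 10"
else
    result="BAD: sleep_millisecs dropped below 10"
fi
echo "$result"
rm -f "$log"
```

Pointing `log` at the real mom.log on a verified host should print the OK line; on an unfixed host under memory pressure it would flag the sleep_millisecs:0 updates shown in the description.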
Comment 30 errata-xmlrpc 2016-03-09 14:44:08 EST
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-0362.html
