Bug 854027 - 3.1 - vdsm should start ksmtuned upon startup (if ksm/memory sharing is enabled on the cluster)
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: vdsm
Version: 6.3
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: beta
Target Release: 6.4
Assigned To: Laszlo Hornyak
QA Contact: Haim
Whiteboard: sla
Keywords: Regression, ZStream
Depends On:
Blocks: 782183
Reported: 2012-09-03 11:45 EDT by David Jaša
Modified: 2014-01-12 19:54 EST

See Also:
Fixed In Version: vdsm-4.9.6-39.0
Doc Type: Bug Fix
Doc Text:
Previously, VDSM did not start the ksm and ksmtuned services when it started. This adversely impacted the Red Hat Enterprise Virtualization memory over-commitment features. Now, VDSM automatically starts ksm and ksmtuned.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2012-12-04 14:09:28 EST
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Description David Jaša 2012-09-03 11:45:42 EDT
Description of problem:
vdsm should start ksmtuned upon startup (if ksm/memory sharing is enabled on the cluster)

Version-Release number of selected component (if applicable):
current: vdsm-4.9-113.1.el6.x86_64 / RHEV 3.0.5

history including qemu-kvm:
# grep 'vdsm-[0-9]\|qemu-kvm-0\|qemu-kvm-rhev-0' /var/log/yum.log 
Feb 07 22:22:18 Installed: 2:qemu-kvm-0.12.1.2-2.209.el6_2.4.x86_64
Feb 07 22:22:20 Installed: vdsm-4.9-112.4.el6_2.x86_64
Feb 16 17:58:23 Updated: vdsm-4.9-112.6.el6_2.x86_64
Mar 09 15:50:19 Updated: 2:qemu-kvm-0.12.1.2-2.241.el6.bz801063.1.x86_64
Mar 09 16:38:47 Installed: 2:qemu-kvm-0.12.1.2-2.209.el6_2.4.x86_64
May 06 00:08:33 Updated: vdsm-4.9-112.12.el6_2.x86_64
May 10 14:26:08 Updated: 2:qemu-kvm-0.12.1.2-2.209.el6_2.5.x86_64
Jul 17 11:47:44 Installed: 2:qemu-kvm-rhev-0.12.1.2-2.295.el6.x86_64
Jul 17 11:47:50 Updated: vdsm-4.9-113.1.el6.x86_64

How reproducible:
always


Steps to Reproduce:
1. have a host in a cluster with 150 % or 200 % memory overcommit
2. stop & disable ksmtuned service (this matches current status on my machine):
service ksmtuned stop
chkconfig --del ksmtuned
3. (re)start vdsm service
  
Actual results:
vdsm doesn't start ksmtuned, thus effectively disabling KSM/Memory Sharing

Expected results:
vdsm configures and starts ksmtuned if needed, as it does with libvirtd or iscsid


Additional info:
On my production hosts, ksm was never enabled at any point during the machines' lifetime, which makes the bug more severe than the absence of customer complaints might suggest. The "stop & disable" steps are only there to reproduce the bug.

See also engine-side bug 854018.
Comment 3 Barak 2012-09-05 11:46:56 EDT
What was the memory state when you restarted VDSM? Overcommitted?

VDSM has an internal thread that handles ksm and starts and stops the service.

ksmtuned does not need to be turned on when memory usage is not above the threshold (I think 80% of physical memory).
Comment 4 David Jaša 2012-09-05 11:55:14 EDT
ksm* services were not started even when the host was under memory crunch. The vdsm log contains lines like this from such moments:

Thread-2303351::DEBUG::2012-08-31 10:48:17,157::utils::579::Storage.Misc.excCmd::(execCmd) '/usr/bin/sudo -n /sbin/service ksmtuned retune' (cwd None)
Thread-2303351::DEBUG::2012-08-31 10:48:17,196::utils::579::Storage.Misc.excCmd::(execCmd) FAILED: <err> = ''; <rc> = 1

So I'd say that vdsm assumes that ksm is running, but it actually is not.
Comment 5 Dan Kenigsberg 2012-09-06 03:16:22 EDT
(In reply to comment #4)
> 
> So I'd say that vdsm assumes that ksm is running, but it actually is not.

I think you are correct: we have ksm.start() function, but I do not see anyone calling it!
Comment 6 Michal Skrivanek 2012-09-06 07:54:15 EDT
Well, also: is the ksm service running?

It seems to me the check in vdsm/ksm.py is not sufficient/correct.
It checks /sys/kernel/mm/ksm/run and, if it's 0, starts both ksm and ksmtuned.
But that file reports ksm's status only, i.e. you get 1 (running) when ksm is started and ksmtuned is stopped.
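
A minimal sketch of checking the two states separately, as suggested above. This is not the vdsm code: the function names and the `KSM_RUN_PATH` constant are hypothetical, and it assumes that `/sys/kernel/mm/ksm/run` contains 1 while KSM is running and that `service ksmtuned status` exits with 0 only when the daemon is running.

```python
import subprocess

# Assumed location of the kernel's KSM switch (taken from the comment above).
KSM_RUN_PATH = '/sys/kernel/mm/ksm/run'


def ksm_is_running(run_path=KSM_RUN_PATH):
    """Return True if the kernel reports KSM as running (file contains 1)."""
    try:
        with open(run_path) as f:
            return f.read().strip() == '1'
    except IOError:
        # File missing or unreadable: treat KSM as not running.
        return False


def ksmtuned_is_running(status_cmd=('/sbin/service', 'ksmtuned', 'status')):
    """Return True if the status command exits with 0.

    This is queried independently of ksm_is_running(), because the sysfs
    file says nothing about the ksmtuned daemon.
    """
    try:
        return subprocess.call(list(status_cmd)) == 0
    except OSError:
        # Command not found or not executable: treat as not running.
        return False
```

Keeping the two checks apart means a host with ksm started and ksmtuned stopped is no longer reported as fully set up.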
Comment 7 David Jaša 2012-09-06 13:25:51 EDT
Two more related bugs: bug 855018 (so that the engine distinguishes between the ksm disabled and enabled-but-inactive states) and bug 855103, which requests a pointer to the KSM chapter from the Cluster configuration chapter (which covers cluster-level memory overcommit).
Comment 8 Simon Grinberg 2012-09-27 13:09:44 EDT
(In reply to comment #5)
> (In reply to comment #4)
> > 
> > So I'd say that vdsm assumes that ksm is running, but it actually is not.
> 
> I think you are correct: we have ksm.start() function, but I do not see
> anyone calling it!

Dan, if this is the case, this is a regression and should be handled at high priority. Please mark it for RHEV 3.1.
Comment 10 Yaniv Kaul 2012-09-27 16:58:18 EDT
(In reply to comment #5)
> (In reply to comment #4)
> > 
> > So I'd say that vdsm assumes that ksm is running, but it actually is not.
> 
> I think you are correct: we have ksm.start() function, but I do not see
> anyone calling it!

Perhaps I'm misreading it, but I see in ksm.py:

if config.getboolean('ksm', 'ksm_monitor_thread'):
    pids = utils.execCmd([constants.EXT_PGREP, '-xf', 'ksmd'],
                         raw=False, sudo=False)[1]
    if pids:
        self._pid = pids[0].strip()
        self.start()                 <---------------
    else:
        self._cif.log.error('failed to find ksmd thread')

I'm more concerned how the service got disabled in the first place, and why no one complained thus far. I fully understand it CAN be disabled, but so can many other services which will harm our functionality.
Comment 11 Doron Fediuck 2012-09-30 03:59:46 EDT
$SUBJECT is confusing, so to make it clear:
VDSM has no knowledge of the cluster, RHEV-M, or anything beyond the host-level scope.

So the fix should simply be to run ksmd and ksmtuned if they are not already running.
And I share Kaul's concern about why ksm wasn't running by default. If ksm is not running by default, it's a ksm bug.
Comment 12 Laszlo Hornyak 2012-10-03 11:49:09 EDT
The start() function in ksm.py is not a member of KsmMonitorThread. It is not called by anyone, and therefore the ksm monitor thread is not started.
Comment 13 Laszlo Hornyak 2012-10-03 11:52:56 EDT
Sorry, let me correct myself: the thread will be started, but the functions that check whether ksm is running and the functions that actually start ksm and ksmtuned are not invoked.

Also, ksmtuned and ksmd are handled as one thing: if ksmd is up, ksmtuned is not checked, although it seems they can run independently of each other.
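
To illustrate the point about treating the two daemons independently, here is a small sketch of the decision step. It is only an illustration under assumptions, not the merged fix: `services_to_start` is a hypothetical helper, and the service names `ksm` and `ksmtuned` are assumed to match the init scripts on the host.

```python
def services_to_start(ksm_running, ksmtuned_running):
    """Return the names of the services that still need to be started.

    Each service is evaluated on its own, so a running ksm does not
    hide a stopped ksmtuned (and vice versa).
    """
    needed = []
    if not ksm_running:
        needed.append('ksm')
    if not ksmtuned_running:
        needed.append('ksmtuned')
    return needed
```

For the situation described in this bug (ksm up, ksmtuned stopped), `services_to_start(True, False)` returns `['ksmtuned']`.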
Comment 14 Laszlo Hornyak 2012-10-04 07:38:41 EDT
http://gerrit.ovirt.org/#/c/8357/
Comment 15 Laszlo Hornyak 2012-10-18 04:05:19 EDT
merged upstream 10e3fdb756cf4b9d641df4f95cc82133ec8ce14c
Comment 21 Leonid Natapov 2012-11-18 09:53:48 EST
vdsm-4.9.6-43.0.el6_3.x86_64.
----------------------------

[root@purple-vds2 init.d]# service ksmtuned status
ksmtuned (pid  27087) is running...
[root@purple-vds2 init.d]# service ksmtuned stop
Stopping ksmtuned:                                         [  OK  ]
[root@purple-vds2 init.d]# service ksmtuned status
ksmtuned is stopped
[root@purple-vds2 init.d]# chkconfig --del ksmtuned
[root@purple-vds2 init.d]# service vdsmd restart
Shutting down vdsm daemon:
vdsm watchdog stop                                         [  OK  ]
vdsm stop                                                  [  OK  ]
vdsm: libvirt already configured for vdsm                  [  OK  ]
Starting iscsid:
Starting up vdsm daemon:
vdsm start                                                 [  OK  ]
[root@purple-vds2 init.d]# service ksmtuned status
ksmtuned (pid  28981) is running...
[root@purple-vds2 init.d]#
Comment 23 errata-xmlrpc 2012-12-04 14:09:28 EST
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHSA-2012-1508.html
