Bug 1945132 - VDSM always reports ksmState as True, regardless of the cluster ksm configuration
Summary: VDSM always reports ksmState as True, regardless of the cluster ksm configuration
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: mom
Classification: oVirt
Component: General
Version: 0.6.0
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: medium
Target Milestone: ovirt-4.4.6
Target Release: 0.6.1
Assignee: Tomáš Golembiovský
QA Contact: Polina
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2021-03-31 12:13 UTC by Polina
Modified: 2021-05-05 05:36 UTC (History)
7 users

Fixed In Version: mom-0.6.1-1.el8ev
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-05-05 05:36:02 UTC
oVirt Team: Virt
Embargoed:
pm-rhel: ovirt-4.4+
mjurasek: blocker+


Attachments (Terms of Use)
engine & vdsm logs (12.04 MB, application/gzip)
2021-03-31 12:13 UTC, Polina
no flags Details


Links
System ID Private Priority Status Summary Last Updated
oVirt gerrit 114433 0 master MERGED ensure KSM controller has a collector 2021-04-26 14:53:38 UTC
oVirt gerrit 114434 0 master MERGED ksm: use data from HostKSM collector 2021-04-26 14:54:05 UTC

Description Polina 2021-03-31 12:13:27 UTC
Created attachment 1768059 [details]
engine & vdsm logs

Description of problem: ksmState is always reported as true, regardless of whether KSM is enabled for the cluster or not.


Version-Release number of selected component (if applicable): ovirt-engine-4.4.6.2-433.g5a9bbba.6.el8ev.noarch


How reproducible: 100%


Steps to Reproduce:
1. On the host, run:
vdsm-client Host getStats | grep -i ksm
    "ksmCpu": 0,
    "ksmMergeAcrossNodes": true,
    "ksmPages": 100,
    "ksmState": true,

Actual results: ksmState on the host is always reported as true.

Expected results:

ksmState must be true when the cluster is configured with 'ksm_enabled': True and several VMs are started on the host.
When the VMs are migrated off the host, ksmState must change to false.
Currently it is always true.

Additional info:

Comment 1 Michal Skrivanek 2021-03-31 13:17:56 UTC
Polina, can you please check whether KSM is really always running, or whether it is just a wrong report?
Perhaps just check:

/sys/kernel/mm/ksm/run: 
- write 0 to disable ksm, read 0 while ksm is disabled.
- write 1 to run ksm, read 1 while ksm is running.
- write 2 to disable ksm and unmerge all its pages.
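
For reference, a minimal sketch (plain Python, not VDSM or MOM code) that reads the switch directly and interprets it according to the semantics quoted above:

KSM_RUN = "/sys/kernel/mm/ksm/run"

def read_ksm_run(path=KSM_RUN):
    # Read the raw value of the kernel KSM switch (0, 1 or 2).
    with open(path) as f:
        return int(f.read().strip())

if __name__ == "__main__":
    value = read_ksm_run()
    states = {0: "disabled", 1: "running", 2: "disabled (pages unmerged)"}
    print("run=%d (%s)" % (value, states.get(value, "unknown")))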

Comment 2 Polina 2021-03-31 14:22:19 UTC
/sys/kernel/mm/ksm/run always remains 1

Comment 3 Polina 2021-03-31 14:26:05 UTC
If I write echo 0 > /sys/kernel/mm/ksm/run,
the report is false:
vdsm-client Host getStats | grep -i ksm
    "ksmCpu": 0,
    "ksmMergeAcrossNodes": true,
    "ksmPages": 100,
    "ksmState": false,

echo 2 > /sys/kernel/mm/ksm/run
vdsm-client Host getStats | grep -i ksm
    "ksmCpu": 0,
    "ksmMergeAcrossNodes": true,
    "ksmPages": 100,
    "ksmState": true,

Comment 4 Arik 2021-03-31 21:49:54 UTC
The output in comment 3 is a good sign.
Note that we've changed the default for new clusters to ksm=true [1].
Can you please check what is reported when changing the cluster's setting to ksm=false and syncing mom policy?

[1] https://gerrit.ovirt.org/c/ovirt-engine/+/111422

Comment 5 Michal Skrivanek 2021-04-01 03:53:57 UTC
Isn’t it mentioned in the original description? “regardless of whether ksm is enabled for the cluster or not”

Comment 6 Arik 2021-04-01 08:58:46 UTC
(In reply to Michal Skrivanek from comment #5)
> Isn’t it mentioned in original description? “regardless of whether ksm is
> enabled for the cluster or not”

That's what I want to check. Previously, if you created a cluster without specifying its KSM state, KSM would end up disabled, and that's not the case anymore. So I want to know what happens when you disable it explicitly and then sync the MOM policy on the host (or reactivate the host), to make sure we don't fall for that.

Comment 7 Polina 2021-04-01 09:09:48 UTC
Just to check that I'm doing this correctly:

Edit Cluster => Optimization => check 'Enable KSM' and 'Share memory pages across all available memory (best KSM effectiveness)'

Then, in the Hosts tab of the details view, click Sync MoM Policy.

Then check on the host with 'vdsm-client Host getStats | grep -i ksm'.

Nothing changed.

Comment 12 Polina 2021-04-05 07:47:33 UTC
I've updated the version. The problem shows up only in the nightly build (4.4.6).

While deploying rhv-4.4.5-11, we have the cluster configured with KSM enabled, which is new and completely fine, and VDSM reports "ksmState": false. As VMs progressively start, the KSM state reported by VDSM changes to true and the test passes as expected.
In the nightly build the situation differs: VDSM reports "ksmState": true from the start, which explains why we see the failure only in the nightly build.

Comment 13 Arik 2021-04-18 19:15:41 UTC
Polina and I investigated it in her environment with both RHEL 8.3 and RHEL 8.4 and found that:
1. It's indeed about the initial configuration of KSM.
2. MOM assumes things about the initial configuration of KSM, specifically that run=0 [1]. It then only updates the configuration at /sys/kernel/mm/ksm/ upon changes, and since /sys/kernel/mm/ksm/run should, from MOM's point of view, remain 0 until there are enough running VMs on the host, it doesn't change /sys/kernel/mm/ksm/run.
3. After restarting the host, /sys/kernel/mm/ksm/run is set to 1 on both RHEL 8.3 and RHEL 8.4.
4. After host-deploy, /sys/kernel/mm/ksm/run is set to 0 on RHEL 8.3 and to 1 on RHEL 8.4.
5. The only component we see that changes /sys/kernel/mm/ksm/run to 0 is ksmtuned, which should be disabled. When it is started, it changes /sys/kernel/mm/ksm/run to 0.

So we didn't figure out the reason for /sys/kernel/mm/ksm/run being 0 after host-deploy on RHEL 8.3 and 1 on RHEL 8.4.
We suspect it has something to do with ksmtuned, but we're not sure.

Anyway, it seems wrong that MOM has the initial configuration of KSM hard-coded without inspecting the actual KSM configuration or changing it when it starts. I also wonder what happens when MOM makes changes to the configuration in /sys/kernel/mm/ksm/ and then restarts - I guess it will assume the initial configuration in [1] is set again.

We suspected that this might be affected by the fact that we now restart the host by default after host-deploy, but that's unrelated: /sys/kernel/mm/ksm/run is set to 1 also when deploying the host without restarting it.

Martin, I didn't find any recent change related to KSM in host-deploy - is there something that I might be missing?
Tomas, do you think it's worth digging into what changed with RHEL 8.4 hosts, or would it be better to change MOM to apply, on startup, the initial settings it assumes KSM to have?

[1] https://github.com/oVirt/mom/blob/master/mom/Controllers/KSM.py#L32-L33
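
One possible direction, roughly in the spirit of the linked gerrit patches (the KSM controller getting its data from a collector): read the current values from /sys/kernel/mm/ksm/ at startup instead of assuming run=0. A minimal sketch, not MOM's actual HostKSM collector; the set of tunables listed is illustrative:

import os

KSM_SYSFS = "/sys/kernel/mm/ksm"

def collect_ksm_settings():
    # Read the host's current KSM tunables rather than assuming defaults
    # such as run=0 (illustrative sketch only).
    settings = {}
    for name in ("run", "pages_to_scan", "sleep_millisecs", "merge_across_nodes"):
        path = os.path.join(KSM_SYSFS, name)
        try:
            with open(path) as f:
                settings[name] = int(f.read().strip())
        except OSError:
            # Tunable not present on this kernel; skip rather than guess.
            pass
    return settings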

Comment 14 Martin Perina 2021-04-19 06:21:50 UTC
(In reply to Arik from comment #13)
> Polina and I investigated it in her environment with both RHEL 8.3 and RHEL
> 8.4 and found that:
> 1. It's indeed about the initial configuration of KSM.
> 2. MOM assumes things about the initial configuration of KSM and
> specifically that run=0 [1]. Then it only updates the configuration at
> /sys/kernel/mm/ksm/ upon changes and since /sys/kernel/mm/ksm/run should,
> from MOM's point of view, remains 0 until we have enough running VMs on the
> host, it doesn't change /sys/kernel/mm/ksm/run.
> 3. After restarting the host /sys/kernel/mm/ksm/run is set to 1 on both RHEL
> 8.3 and RHEL 8.4.
> 4. After host-deploy /sys/kernel/mm/ksm/run is set to 0 on RHEL 8.3 and to 1
> on RHEL 8.4.
> 5. The only component we see that changes /sys/kernel/mm/ksm/run to 0 is
> ksmtuned that should be disabled. When it's started, it changes
> /sys/kernel/mm/ksm/run to 0.
> 
> So we didn't figure out the reason for /sys/kernel/mm/ksm/run to be 0 after
> host-deploy on RHEL 8.3 and to be 1 on RHEL 8.4.
> We suspect it has something to do with ksmtuned but we're not sure.
> 
> Anyway, it seems wrong that MOM have the initial configuration of KSM
> hard-coded without inspecting the configuration of KSM or changing them when
> it starts. I also wonder what happen when KSM makes changes to the
> configuration in /sys/kernel/mm/ksm/ and then restarts - I guess it will
> assume the initial configuration in [1] is set again.
> 
> We suspected that this might be affected by the fact we now restart the host
> by-default after host-deploy but that's unrelated, /sys/kernel/mm/ksm/run is
> set to 1 also when deploying the host without restarting it.
> 
> Martin, I didn't find any recent change related to KSM in host-deploy - is
> there something that I might be missing?

We are not touching the KSM configuration in host-deploy; AFAIK VDSM somehow configures KSM through MOM:

https://github.com/oVirt/vdsm/blob/master/static/etc/vdsm/mom.d/00-defines.policy#L8
https://github.com/oVirt/vdsm/blob/master/lib/vdsm/supervdsm_api/ksm.py
https://github.com/oVirt/vdsm/blob/master/static/etc/vdsm/mom.d/
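
For context, the supervdsm helper linked above is the privileged side that actually writes the KSM tunables to sysfs. A hedged sketch of that general shape (function name and parameters are illustrative; see lib/vdsm/supervdsm_api/ksm.py for the real API):

import os

KSM_SYSFS = "/sys/kernel/mm/ksm"

def ksm_tune(tuning_params):
    # Write KSM tunables, e.g. {"run": 1, "pages_to_scan": 64}, to sysfs.
    # Illustrative only; the real helper runs with root privileges under supervdsmd.
    for key, value in tuning_params.items():
        path = os.path.join(KSM_SYSFS, key)
        with open(path, "w") as f:
            f.write(str(int(value)))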

Comment 15 Tomáš Golembiovský 2021-04-19 11:46:16 UTC
(In reply to Arik from comment #13)

> Tomas, do you think it worth digging in into what changed with RHEL 8.4
> hosts

Definitely yes. I have managed to track the problem to the qemu-kvm RPM. After installing qemu-kvm from 8.4-AV onto an 8.4 machine, KSM is always enabled after boot. It would be great if the platform team could let us know what is going on here. The specific version I could reproduce it with was qemu-kvm-4.2.0-48.module+el8.4.0+10368+630e803b.x86_64.


>  or maybe it would be better to change MOM to apply the initial
> settings it assumes KSM to have on startup?

While this is a good idea in general, it may not be enough. We need to figure out what is changing the KSM settings and, notably, whether that is just a one-time change at boot or there is something actively watching the KSM configuration and changing it under our hands.
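
A simple way to tell those two cases apart is to watch the value over time. A throwaway debugging sketch (not part of MOM or VDSM), assuming nothing beyond the sysfs path discussed above:

import time

KSM_RUN = "/sys/kernel/mm/ksm/run"

def watch_ksm_run(interval=5.0):
    # Poll /sys/kernel/mm/ksm/run and print every change with a timestamp,
    # to distinguish a one-time change at boot from something that keeps
    # rewriting the value behind our backs.
    last = None
    while True:
        with open(KSM_RUN) as f:
            current = f.read().strip()
        if current != last:
            print("%s run=%s" % (time.strftime("%H:%M:%S"), current))
            last = current
        time.sleep(interval)

if __name__ == "__main__":
    watch_ksm_run()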

Comment 16 Arik 2021-04-19 13:24:09 UTC
(In reply to Martin Perina from comment #14)
> We are not touching KSM configuration in host deploy, AFAIK VDSM is somehow
> configure KSM through MOM:
> 
> https://github.com/oVirt/vdsm/blob/master/static/etc/vdsm/mom.d/00-defines.
> policy#L8
> https://github.com/oVirt/vdsm/blob/master/lib/vdsm/supervdsm_api/ksm.py
> https://github.com/oVirt/vdsm/blob/master/static/etc/vdsm/mom.d/

Yeah, VDSM configures MOM to enable/disable KSM through supervdsmd when a certain threshold is met,
but we now know for sure that this bug is "just" about the initial settings: MOM doesn't attempt to activate KSM, and yet KSM is activated after boot/host-deploy.

Comment 17 Arik 2021-04-19 13:51:39 UTC
(In reply to Tomáš Golembiovský from comment #15)
> (In reply to Arik from comment #13)
> 
> > Tomas, do you think it worth digging in into what changed with RHEL 8.4
> > hosts
> 
> Definitely yes. I have managed to track the problem to qemu-kvm RPM. After
> installing qemu-kvm from 8.4-AV onto 8.4 machine KSM is always enabled after
> the boot. It would be great if platform could let us know what is going on
> here. The specific version I could reproduce it with was
> qemu-kvm-4.2.0-48.module+el8.4.0+10368+630e803b.x86_64.

Doesn't the fact that it (i.e., KSM being enabled/activated after reboot) happened also on RHEL 8.3 rule out the possibility that it's related to a change in qemu-kvm from 8.4-AV? I suspect that it has always been that way and it was just not covered by our tests.

> 
> 
> >  or maybe it would be better to change MOM to apply the initial
> > settings it assumes KSM to have on startup?
> 
> While this is a good idea in general it may not be enough. We need to figure
> out what is changing KSM settings and notably if that is just a one-time
> change at boot or there is something actively watching the KSM configuration
> and working under our hands.

Yeah, I cannot rule out that something touches the KSM configuration "behind the scenes". When we set /sys/kernel/mm/ksm/run to 0 it remained that way (at least for quite some time), though.
Polina, does the test pass after setting /sys/kernel/mm/ksm/run to 0 manually?

Comment 18 Polina 2021-04-19 14:36:19 UTC
Yes, if I manually change the initial state to 0, the tests pass.

Comment 19 Tomáš Golembiovský 2021-04-20 08:59:11 UTC
(In reply to Arik from comment #17)
> (In reply to Tomáš Golembiovský from comment #15)
> > (In reply to Arik from comment #13)
> > 
> > > Tomas, do you think it worth digging in into what changed with RHEL 8.4
> > > hosts
> > 
> > Definitely yes. I have managed to track the problem to qemu-kvm RPM. After
> > installing qemu-kvm from 8.4-AV onto 8.4 machine KSM is always enabled after
> > the boot. It would be great if platform could let us know what is going on
> > here. The specific version I could reproduce it with was
> > qemu-kvm-4.2.0-48.module+el8.4.0+10368+630e803b.x86_64.
> 
> isn't the fact that it (i.e., KSM being enabled/activated after reboot)
> happened also on RHEL 8.3 rules out the possibility that it's related to a
> change in qemu-kvm from 8.4-AV? I suspect that it has always been that way
> and it was just not covered by our tests

Actually, you are probably right. qemu-kvm 4.2.0 is not from AV-8.4.0 but from base RHEL, and I see the same behavior on RHEL 8.3 with qemu-kvm from AV-8.3.0.

> 
> > 
> > 
> > >  or maybe it would be better to change MOM to apply the initial
> > > settings it assumes KSM to have on startup?
> > 
> > While this is a good idea in general it may not be enough. We need to figure
> > out what is changing KSM settings and notably if that is just a one-time
> > change at boot or there is something actively watching the KSM configuration
> > and working under our hands.
> 
> yeah, I cannot rule out that something touches the KSM configuration "behind
> the scenes". when we set /sys/kernel/mm/ksm/run to 0, it remained that way
> (at least for quite some time) though
> Polina, does the test passes after setting /sys/kernel/mm/ksm/run to 0
> manually?

Comment 20 Polina 2021-05-04 08:03:44 UTC
Verified on mom-0.6.1-1.el8ev.noarch and ovirt-engine-4.4.6.6-0.10.el8ev.noarch.

Comment 21 Sandro Bonazzola 2021-05-05 05:36:02 UTC
This bugzilla is included in oVirt 4.4.6 release, published on May 4th 2021.

Since the problem described in this bug report should be resolved in oVirt 4.4.6 release, it has been closed with a resolution of CURRENT RELEASE.

If the solution does not work for you, please open a new bug report.

