Created attachment 1768059 [details]
engine & vdsm logs

Description of problem:
ksmState is always reported as true, regardless of whether KSM is enabled for the cluster or not.

Version-Release number of selected component (if applicable):
ovirt-engine-4.4.6.2-433.g5a9bbba.6.el8ev.noarch

How reproducible:
100%

Steps to Reproduce:
1. On the host, run:
   vdsm-client Host getStats | grep -i ksm
     "ksmCpu": 0,
     "ksmMergeAcrossNodes": true,
     "ksmPages": 100,
     "ksmState": true,

Actual results:
ksmState on the host is always reported as true.

Expected results:
ksmState should be true when the cluster is configured with 'ksm_enabled': True and several VMs are started on the host. When the VMs are migrated off the host, ksmState should change back to false. Currently it is always true.

Additional info:
Polina, can you please check whether KSM is really always running, or whether it is just a wrong report?

Perhaps just check /sys/kernel/mm/ksm/run:
- write 0 to disable ksm; read 0 while ksm is disabled.
- write 1 to run ksm; read 1 while ksm is running.
- write 2 to disable ksm and unmerge all its pages.
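For reference, the three control values can be exercised with a small sketch like the one below. This is illustrative only, not part of vdsm or MOM; the path is parameterized so it can be pointed at a scratch file, since writing the real /sys/kernel/mm/ksm/run requires root:

```python
# Illustrative sketch of the /sys/kernel/mm/ksm/run control values:
#   0 - KSM disabled, 1 - KSM running, 2 - disable KSM and unmerge all pages.
# The path parameter lets this be tried against a scratch file; the real
# sysfs node is only writable as root.

KSM_RUN = "/sys/kernel/mm/ksm/run"

def read_ksm_run(path=KSM_RUN):
    """Return the current KSM run mode as an integer."""
    with open(path) as f:
        return int(f.read().strip())

def write_ksm_run(value, path=KSM_RUN):
    """Set the KSM run mode (0, 1, or 2)."""
    if value not in (0, 1, 2):
        raise ValueError("KSM run mode must be 0, 1, or 2")
    with open(path, "w") as f:
        f.write(str(value))
```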
/sys/kernel/mm/ksm/run always remains 1
If I write echo 0 > /sys/kernel/mm/ksm/run, the report is false:

  vdsm-client Host getStats | grep -i ksm
    "ksmCpu": 0,
    "ksmMergeAcrossNodes": true,
    "ksmPages": 100,
    "ksmState": false,

After echo 2 > /sys/kernel/mm/ksm/run:

  vdsm-client Host getStats | grep -i ksm
    "ksmCpu": 0,
    "ksmMergeAcrossNodes": true,
    "ksmPages": 100,
    "ksmState": true,
The output in comment 3 is a good sign. Note that we've changed the default for new clusters to ksm=true [1]. Can you please check what is reported when changing the cluster's setting to ksm=false and syncing the MOM policy?

[1] https://gerrit.ovirt.org/c/ovirt-engine/+/111422
Isn't it mentioned in the original description? "regardless of whether ksm is enabled for the cluster or not"
(In reply to Michal Skrivanek from comment #5)
> Isn’t it mentioned in original description? “regardless of whether ksm is
> enabled for the cluster or not”

That's what I want to check - previously, if you created a cluster without specifying its KSM state, it would be disabled, and that's no longer the case. So I want to know what happens when you disable it explicitly and then sync the MOM policy on the host (or reactivate the host), to make sure we don't fall for that.
Just to check that I'm doing it correctly:
1. Edit Cluster => Optimization => check 'Enable KSM' and 'Share memory pages across all available memory (best KSM effectiveness)'.
2. Click the Hosts tab in the details view and click Sync MoM Policy.
3. On the host, check: vdsm-client Host getStats | grep -i ksm

Nothing changed.
I've updated the version. The problem shows only in the nightly build (4.4.6).

When deploying rhv-4.4.5-11, we have a cluster configured with KSM enabled (which is new and completely OK) and vdsm reports "ksmState": false. As VMs progressively start, the ksm state reported by vdsm changes to true and the test passes as expected.

In the nightly build, vdsm reports "ksmState": true from the start, which explains why we see the failure only in the nightly build.
Polina and I investigated it in her environment with both RHEL 8.3 and RHEL 8.4 and found that:
1. It's indeed about the initial configuration of KSM.
2. MOM makes assumptions about the initial configuration of KSM, specifically that run=0 [1]. It then only updates the configuration at /sys/kernel/mm/ksm/ upon changes, and since /sys/kernel/mm/ksm/run should, from MOM's point of view, remain 0 until we have enough running VMs on the host, it doesn't change /sys/kernel/mm/ksm/run.
3. After restarting the host, /sys/kernel/mm/ksm/run is set to 1 on both RHEL 8.3 and RHEL 8.4.
4. After host-deploy, /sys/kernel/mm/ksm/run is set to 0 on RHEL 8.3 and to 1 on RHEL 8.4.
5. The only component we see that changes /sys/kernel/mm/ksm/run to 0 is ksmtuned, which should be disabled. When it is started, it changes /sys/kernel/mm/ksm/run to 0.

So we didn't figure out why /sys/kernel/mm/ksm/run is 0 after host-deploy on RHEL 8.3 and 1 on RHEL 8.4. We suspect it has something to do with ksmtuned, but we're not sure.

Anyway, it seems wrong that MOM has the initial configuration of KSM hard-coded without inspecting the actual KSM configuration, or applying its assumed settings, when it starts. I also wonder what happens when MOM makes changes to the configuration in /sys/kernel/mm/ksm/ and then restarts - I guess it will assume the initial configuration in [1] is set again.

We suspected this might be affected by the fact that we now restart the host by default after host-deploy, but that's unrelated: /sys/kernel/mm/ksm/run is set to 1 also when deploying the host without restarting it.

Martin, I didn't find any recent change related to KSM in host-deploy - is there something I might be missing?

Tomas, do you think it's worth digging into what changed with RHEL 8.4 hosts, or would it be better to change MOM to apply, on startup, the initial settings it assumes KSM to have?

[1] https://github.com/oVirt/mom/blob/master/mom/Controllers/KSM.py#L32-L33
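The fix idea in point 2 could look roughly like the sketch below. This is hypothetical illustration, not MOM's actual code: instead of hard-coding the assumed initial state, a controller would read the live values from sysfs on startup, so its view matches whatever the host booted with. The key names mirror the real /sys/kernel/mm/ksm/ tunables; the directory is parameterized for testing:

```python
# Hypothetical sketch (not taken from MOM): read the actual KSM state from
# sysfs at startup instead of assuming run=0, so the controller's initial
# view matches reality even if something enabled KSM at boot.
import os

KSM_SYSFS = "/sys/kernel/mm/ksm"
KSM_KEYS = ("run", "pages_to_scan", "sleep_millisecs", "merge_across_nodes")

def read_ksm_state(sysfs_dir=KSM_SYSFS, keys=KSM_KEYS):
    """Return a dict of the current KSM tunables, skipping missing files."""
    state = {}
    for key in keys:
        path = os.path.join(sysfs_dir, key)
        if os.path.exists(path):
            with open(path) as f:
                state[key] = int(f.read().strip())
    return state
```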
(In reply to Arik from comment #13)
> Martin, I didn't find any recent change related to KSM in host-deploy - is
> there something that I might be missing?

We are not touching the KSM configuration in host-deploy. AFAIK, VDSM somehow configures KSM through MOM:

https://github.com/oVirt/vdsm/blob/master/static/etc/vdsm/mom.d/00-defines.policy#L8
https://github.com/oVirt/vdsm/blob/master/lib/vdsm/supervdsm_api/ksm.py
https://github.com/oVirt/vdsm/blob/master/static/etc/vdsm/mom.d/
(In reply to Arik from comment #13)
> Tomas, do you think it worth digging in into what changed with RHEL 8.4
> hosts

Definitely yes. I have managed to track the problem down to the qemu-kvm RPM. After installing qemu-kvm from 8.4-AV onto an 8.4 machine, KSM is always enabled after boot. It would be great if platform could let us know what is going on here. The specific version I could reproduce it with was qemu-kvm-4.2.0-48.module+el8.4.0+10368+630e803b.x86_64.

> or maybe it would be better to change MOM to apply the initial
> settings it assumes KSM to have on startup?

While this is a good idea in general, it may not be enough. We need to figure out what is changing the KSM settings, and notably whether it is just a one-time change at boot or there is something actively watching the KSM configuration and working under our hands.
(In reply to Martin Perina from comment #14)
> We are not touching KSM configuration in host deploy, AFAIK VDSM is somehow
> configure KSM through MOM:
>
> https://github.com/oVirt/vdsm/blob/master/static/etc/vdsm/mom.d/00-defines.policy#L8
> https://github.com/oVirt/vdsm/blob/master/lib/vdsm/supervdsm_api/ksm.py
> https://github.com/oVirt/vdsm/blob/master/static/etc/vdsm/mom.d/

Yeah, VDSM configures MOM to enable/disable KSM through supervdsmd when a certain threshold is met, but we now know for sure that this bug is "just" about the initial settings - MOM doesn't attempt to activate KSM, and yet KSM is activated after boot/host-deploy.
(In reply to Tomáš Golembiovský from comment #15)
> Definitely yes. I have managed to track the problem to qemu-kvm RPM. After
> installing qemu-kvm from 8.4-AV onto 8.4 machine KSM is always enabled after
> the boot. It would be great if platform could let us know what is going on
> here. The specific version I could reproduce it with was
> qemu-kvm-4.2.0-48.module+el8.4.0+10368+630e803b.x86_64.

Doesn't the fact that it (i.e., KSM being enabled/activated after reboot) happened also on RHEL 8.3 rule out the possibility that it's related to a change in qemu-kvm from 8.4-AV? I suspect it has always been that way and was just not covered by our tests.

> While this is a good idea in general it may not be enough. We need to figure
> out what is changing KSM settings and notably if that is just a one-time
> change at boot or there is something actively watching the KSM configuration
> and working under our hands.

Yeah, I cannot rule out that something touches the KSM configuration "behind the scenes", though when we set /sys/kernel/mm/ksm/run to 0, it remained that way (at least for quite some time).
Polina, does the test pass after setting /sys/kernel/mm/ksm/run to 0 manually?
Yes, if the initial state is changed manually to 0, the tests pass.
(In reply to Arik from comment #17)
> isn't the fact that it (i.e., KSM being enabled/activated after reboot)
> happened also on RHEL 8.3 rules out the possibility that it's related to a
> change in qemu-kvm from 8.4-AV? I suspect that it has always been that way
> and it was just not covered by our tests

Actually, you are probably right. qemu-kvm 4.2.0 is not from AV-8.4.0 but from base RHEL, and I see the same behavior on RHEL 8.3 with qemu-kvm from AV-8.3.0.
Verified on mom-0.6.1-1.el8ev.noarch, ovirt-engine-4.4.6.6-0.10.el8ev.noarch.
This bugzilla is included in oVirt 4.4.6 release, published on May 4th 2021. Since the problem described in this bug report should be resolved in oVirt 4.4.6 release, it has been closed with a resolution of CURRENT RELEASE. If the solution does not work for you, please open a new bug report.