Description of problem:

Hypervisors seem to be reporting an incredibly high shared memory usage when memory page sharing isn't enabled:

(case 1)
# vdsClient -s 0 getVdsStats | grep mem
	memAvailable = 22201
	memCommitted = 57335
	memFree = 33075
	memShared = 2496742
	memUsed = '55'

(case 2)
Shared Memory: 1432%
Shared Memory: 1985%

This is being reported on a specific RHEV-H version, and does not seem to change whether the customer is running 3.2 or 3.3.

Version-Release number of selected component (if applicable):
rhev-hypervisor6-6.5-20140213.0
RHEV 3.2 and 3.3

How reproducible:
Have not reproduced

Steps to Reproduce:
1.
2.
3.

Actual results:
# vdsClient -s 0 getVdsStats | grep mem
	memAvailable = 22201
	memCommitted = 57335
	memFree = 33075
	memShared = 2496742
	memUsed = '55'

Expected results:
A much lower shared memory value, if any, reported by vdsClient

Additional info:
Hey Wallace,

could you please provide the output of

$ tail /sys/kernel/mm/ksm/*
(In reply to Fabian Deutsch from comment #1)
> Hey Wallace,
>
> could you please provide the output of
>
> $ tail /sys/kernel/mm/ksm/*

Fabian,

(0)[root@ivl00036 ~]# tail /sys/kernel/mm/ksm/*
==> /sys/kernel/mm/ksm/full_scans <==
1439

==> /sys/kernel/mm/ksm/merge_across_nodes <==
1

==> /sys/kernel/mm/ksm/pages_shared <==
1149486

==> /sys/kernel/mm/ksm/pages_sharing <==
3501628

==> /sys/kernel/mm/ksm/pages_to_scan <==
64

==> /sys/kernel/mm/ksm/pages_unshared <==
8613623

==> /sys/kernel/mm/ksm/pages_volatile <==
1335514

==> /sys/kernel/mm/ksm/run <==
1

==> /sys/kernel/mm/ksm/sleep_millisecs <==
2
Thanks Wallace.

The pages_shared value seems to be similar to the memShared value in the description.

Antoni, do you do this calculation, by any chance?
Okay, not Antoni, but Martin, maybe you can help here?
(In reply to Fabian Deutsch from comment #3)
> Thanks Wallace.
>
> The pages_shared value seems to be similar to the memShared value in the
> description.
>
> Antoni,
>
> do you do this calculation by a chance?

Fabian,

I'm not sure exactly what you're asking here - there are two cases attached to this bug with different (high) values. The 'vdsClient' output corresponds to my last update, whereas the percentages in the case description are from a second case; the output for that one can be found in my most recent attachment.
Created attachment 871979 [details] ksm output for second case
Fabian,

One of my customers came back to me with the following:

After adding a new hypervisor to our cluster, up to date with the latest packages, and migrating VMs onto it until just under 70% of the hypervisor memory was in use, KSM doesn't kick in. When adding an additional VM pushes memory use over 70%, KSM kicks in, causing the new host to almost instantly report a shared memory value of 3095%.

(0)[root@ivl00034 ~]# tail /sys/kernel/mm/ksm/*
==> /sys/kernel/mm/ksm/full_scans <==
3

==> /sys/kernel/mm/ksm/merge_across_nodes <==
1

==> /sys/kernel/mm/ksm/pages_shared <==
636451

==> /sys/kernel/mm/ksm/pages_sharing <==
2243364

==> /sys/kernel/mm/ksm/pages_to_scan <==
64

==> /sys/kernel/mm/ksm/pages_unshared <==
6919617

==> /sys/kernel/mm/ksm/pages_volatile <==
843487

==> /sys/kernel/mm/ksm/run <==
1

==> /sys/kernel/mm/ksm/sleep_millisecs <==
2
Hey Wallace, thanks for the update. I am not into this, but to me it looks like some calculations might be done wrong. Moving this to vdsm.
Dan, who could take a look at this?
Vdsm is expected to report

/sys/kernel/mm/ksm/pages_sharing

converted to MiB. Is there any reason to suspect that there's a miscalculation here? If so, please present both numbers!

If Vdsm translates the kernel numbers correctly, it means that it's either a ksm bug, or that there is actually very good sharing (it can happen with very similar, dormant guests).
In this case, KSM is doing exactly what it is intended to do. The current Cluster -> Optimization -> Memory Optimization value does not modify the KSM behavior.

We have added a feature in 3.4 to disable KSM at the Cluster level in BZ 1026980. I have also filed a new bug, BZ 1090576, to help clean up the terminology used in the Cluster -> Optimization settings window so that it more accurately reflects the underlying technologies used.

Please move the cases to 1026980 and close this BZ as NOTABUG.
(In reply to Dan Kenigsberg from comment #10)
> Vdsm is expected to report
>
> /sys/kernel/mm/ksm/pages_sharing
>
> converted to MiB. Is there any reason to suspect that there's a
> miscalculation here? If so, please present both numbers!
>
> If Vdsm translates the kernel numbers correctly it means that it's either a
> ksm bug, or that there are actually very good sharing (it can happen, with
> very similar, dormant guests).

Attaching SFDC#01082951

So it looks like VDSM isn't doing this, and appears to be returning the raw value of /sys/kernel/mm/ksm/pages_sharing (maybe a type issue somewhere?). For example, on an internal env I see the following:

Shared Memory: 2660% in the webadmin

# rpm -qa | grep vdsm
vdsm-xmlrpc-4.13.2-0.9.el6ev.noarch
vdsm-python-4.13.2-0.9.el6ev.x86_64
vdsm-cli-4.13.2-0.9.el6ev.noarch
vdsm-4.13.2-0.9.el6ev.x86_64

# vdsClient -s 0 getVdsCaps | grep mem
	memSize = '48223'

# vdsClient -s 0 getVdsStats | grep mem
	memAvailable = 15694
	memCommitted = 30866
	memFree = 18207
	memShared = 1282883
	memUsed = '63'

# cat /sys/kernel/mm/ksm/pages_sharing
1282883

# getconf PAGE_SIZE
4096

memShared should actually be:

1282883 * 4096 / (1024*1024) = 5011 MiB

That gives us a percentage of:

( 100 / 48223 ) * 5011 = 10%

Scott, I'm going to remove the needinfo as this appears to be a valid bug against VDSM at this time.
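The arithmetic above can be sketched in a few lines of Python. The helper names are illustrative only (not vdsm's actual code); the 4096-byte page size matches the `getconf PAGE_SIZE` output quoted above:

```python
# Translate the raw KSM pages_sharing counter into MiB, and then into
# a percentage of total host memory (memSize), reproducing the numbers
# quoted above. Function names here are hypothetical.

PAGE_SIZE = 4096  # bytes, from `getconf PAGE_SIZE`


def pages_to_mib(pages, page_size=PAGE_SIZE):
    """Convert a count of pages into mebibytes."""
    return pages * page_size // (1024 * 1024)


def shared_percent(shared_mib, mem_size_mib):
    """Express shared memory as a percentage of total host memory."""
    return 100.0 * shared_mib / mem_size_mib


shared_mib = pages_to_mib(1282883)            # pages_sharing from sysfs
percent = shared_percent(shared_mib, 48223)   # memSize from getVdsCaps
print(shared_mib, round(percent))             # 5011 MiB, about 10%
```

Reporting the raw counter (1282883) where a MiB value is expected is exactly what produces the multi-thousand-percent figures seen in the webadmin.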
The bug is in momIF's getKsmStats(), which has

    ret['memShared'] = stats['ksm_pages_sharing']

with no conversion to MiB.
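A minimal sketch of the missing conversion, assuming a dict-shaped stats input like the line quoted above (the surrounding structure of vdsm's real getKsmStats() is not reproduced here):

```python
# Hypothetical reduction of momIF's getKsmStats() to the one line that
# matters: memShared must be reported in MiB, not as a raw page count.

PAGESIZE = 4096  # bytes; real code would query the page size at runtime


def getKsmStats(stats):
    """Return a stats dict with memShared expressed in MiB."""
    ret = {}
    # Buggy behaviour: ret['memShared'] = stats['ksm_pages_sharing']
    # Fixed: convert pages -> bytes -> MiB before reporting.
    ret['memShared'] = stats['ksm_pages_sharing'] * PAGESIZE // (1024 ** 2)
    return ret
```

With the counter from the internal environment in comment 12, getKsmStats({'ksm_pages_sharing': 1282883}) yields memShared = 5011 rather than 1282883.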
Tested on hosted engine 3.5 with two RHEL 6.5 hosts; failed to reproduce the bug.

Components used during verification on the engine:
Linux version 2.6.32-431.23.3.el6.x86_64 (mockbuild.eng.bos.redhat.com) (gcc version 4.4.7 20120313 (Red Hat 4.4.7-4) (GCC)) #1 SMP Wed Jul 16 06:12:23 EDT 2014
ovirt-engine-setup-3.5.0-0.0.master.20140804172041.git23b558e.el6.noarch

Components used during verification on the hosts:
Linux version 2.6.32-431.23.3.el6.x86_64 (mockbuild.eng.bos.redhat.com) (gcc version 4.4.7 20120313 (Red Hat 4.4.7-4) (GCC)) #1 SMP Wed Jul 16 06:12:23 EDT 2014
libvirt-0.10.2-29.el6_5.10.x86_64
sanlock-2.8-1.el6.x86_64
vdsm-4.16.1-6.gita4a4614.el6.x86_64
qemu-kvm-rhev-0.12.1.2-2.415.el6_5.14.x86_64
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHBA-2015-0159.html