Bug 1072030
| Field | Value |
|---|---|
| Summary | High shared memory being reported on hypervisor |
| Product | Red Hat Enterprise Virtualization Manager |
| Reporter | wdaniel |
| Component | vdsm |
| Assignee | Martin Sivák <msivak> |
| Status | CLOSED ERRATA |
| QA Contact | Nikolai Sednev <nsednev> |
| Severity | high |
| Docs Contact | |
| Priority | high |
| Version | 3.3.0 |
| CC | asegundo, bazulay, benglish, cpelland, danken, dfediuck, fdeutsch, iheim, lpeer, lyarwood, michal.skrivanek, msivak, scohen, sherold, s.kieske, wdaniel, yeylon |
| Target Milestone | --- |
| Keywords | Triaged, ZStream |
| Target Release | 3.5.0 |
| Hardware | x86_64 |
| OS | Linux |
| Whiteboard | sla |
| Fixed In Version | vt1.3, 4.16.0-1.el6_5 |
| Doc Type | Bug Fix |
| Doc Text | Previously, missing unit conversion caused the reported shared memory amount to be much higher than expected. Proper unit conversion has now been added, resulting in accurate shared memory amount reporting. |
| Story Points | --- |
| Clone Of | |
| | 1102650 1102651 |
| Environment | |
| Last Closed | 2015-02-11 21:10:19 UTC |
| Type | Bug |
| Regression | --- |
| Mount Type | --- |
| Documentation | --- |
| CRM | |
| Verified Versions | |
| Category | --- |
| oVirt Team | SLA |
| RHEL 7.3 requirements from Atomic Host | |
| Cloudforms Team | --- |
| Target Upstream Version | |
| Embargoed | |
| Bug Depends On | |
| Bug Blocks | 1102650, 1102651, 1142923, 1156165 |
| Attachments | |
Description
wdaniel
2014-03-03 17:39:00 UTC
Hey Wallace,

could you please provide the output of

$ tail /sys/kernel/mm/ksm/*

(In reply to Fabian Deutsch from comment #1)
> Hey Wallace,
>
> could you please provide the output of
>
> $ tail /sys/kernel/mm/ksm/*

Fabian,

(0)[root@ivl00036 ~]# tail /sys/kernel/mm/ksm/*
==> /sys/kernel/mm/ksm/full_scans <==
1439
==> /sys/kernel/mm/ksm/merge_across_nodes <==
1
==> /sys/kernel/mm/ksm/pages_shared <==
1149486
==> /sys/kernel/mm/ksm/pages_sharing <==
3501628
==> /sys/kernel/mm/ksm/pages_to_scan <==
64
==> /sys/kernel/mm/ksm/pages_unshared <==
8613623
==> /sys/kernel/mm/ksm/pages_volatile <==
1335514
==> /sys/kernel/mm/ksm/run <==
1
==> /sys/kernel/mm/ksm/sleep_millisecs <==
2

Thanks Wallace.

The pages_shared value seems to be similar to the memShared value in the description.

Antoni,

do you do this calculation by any chance?

Okay, not Antoni, but Martin, maybe you can help here?

(In reply to Fabian Deutsch from comment #3)
> Thanks Wallace.
>
> The pages_shared value seems to be similar to the memShared value in the
> description.
>
> Antoni,
>
> do you do this calculation by any chance?

Fabian,

I'm not sure exactly what you're asking here - there are two cases attached to this bug with different (high) values. The 'vdsClient' output was supposed to correlate to my last update, whereas the percentages in the case description are for a second case, and the output from them can be found in my most recent attachment.

Created attachment 871979
ksm output for second case
Fabian,

One of my customers came back to me with the following:

After adding a new hypervisor to our cluster, up to date with the latest packages, and migrating vms on it just under 70% of the hypervisor memory use ksm doesn't kick in. When adding an additional vm causing memory to go over 70% ksm kicks in. Causing the new host to almost instantly get a shared memory value of 3095%.

(0)[root@ivl00034 ~]# tail /sys/kernel/mm/ksm/*
==> /sys/kernel/mm/ksm/full_scans <==
3
==> /sys/kernel/mm/ksm/merge_across_nodes <==
1
==> /sys/kernel/mm/ksm/pages_shared <==
636451
==> /sys/kernel/mm/ksm/pages_sharing <==
2243364
==> /sys/kernel/mm/ksm/pages_to_scan <==
64
==> /sys/kernel/mm/ksm/pages_unshared <==
6919617
==> /sys/kernel/mm/ksm/pages_volatile <==
843487
==> /sys/kernel/mm/ksm/run <==
1
==> /sys/kernel/mm/ksm/sleep_millisecs <==
2

Hey Wallace,

thanks for the update. I am not into this, but to me it looks like some calculations might be done wrong. Moving this to vdsm.

Dan, who could take a look at this?

Vdsm is expected to report

/sys/kernel/mm/ksm/pages_sharing

converted to MiB. Is there any reason to suspect that there's a miscalculation here? If so, please present both numbers!

If Vdsm translates the kernel numbers correctly it means that it's either a ksm bug, or that there is actually very good sharing (it can happen, with very similar, dormant guests).

In this case, KSM is doing exactly what it is intended to do. The current Cluster -> Optimization -> Memory Optimization value does not modify the KSM behavior. We have added a feature to 3.4 to disable KSM at a Cluster level in BZ 1026980. I have also filed a new bug - BZ 1090576 to help clean up the terminology used in the Cluster -> Optimization settings window to more accurately reflect underlying technologies used.

Please move the cases to 1026980 and close this BZ as NOTABUG

(In reply to Dan Kenigsberg from comment #10)
> Vdsm is expected to report
>
> /sys/kernel/mm/ksm/pages_sharing
>
> converted to MiB. Is there any reason to suspect that there's a
> miscalculation here? If so, please present both numbers!
>
> If Vdsm translates the kernel numbers correctly it means that it's either a
> ksm bug, or that there is actually very good sharing (it can happen, with
> very similar, dormant guests).

Attaching SFDC#01082951

So it looks like VDSM isn't doing this and appears to be returning the raw value of /sys/kernel/mm/ksm/pages_sharing (maybe a type issue somewhere?).

For example on an internal env I see the following:

Shared Memory: 2660% in the webadmin

# rpm -qa | grep vdsm
vdsm-xmlrpc-4.13.2-0.9.el6ev.noarch
vdsm-python-4.13.2-0.9.el6ev.x86_64
vdsm-cli-4.13.2-0.9.el6ev.noarch
vdsm-4.13.2-0.9.el6ev.x86_64

# vdsClient -s 0 getVdsCaps | grep mem
memSize = '48223'

# vdsClient -s 0 getVdsStats | grep mem
memAvailable = 15694
memCommitted = 30866
memFree = 18207
memShared = 1282883
memUsed = '63'

# cat /sys/kernel/mm/ksm/pages_sharing
1282883

# getconf PAGE_SIZE
4096

memShared should actually be:

1282883 * 4096 / (1024*1024) = 5011

That gives us a percentage of:

( 100 / 48223 ) * 5011 = 10%

Scott, I'm going to remove the needinfo as this appears to be a valid bug against VDSM at this time.

The bug is in momIF's getKsmStats(), which has

ret['memShared'] = stats['ksm_pages_sharing']

with no conversion to MiB.

Tested on hosted engine 3.5 and two hosts with RHEL6.5, failed to reproduce the bug.
Components used during verification on engine:
Linux version 2.6.32-431.23.3.el6.x86_64 (mockbuild.eng.bos.redhat.com) (gcc version 4.4.7 20120313 (Red Hat 4.4.7-4) (GCC) ) #1 SMP Wed Jul 16 06:12:23 EDT 2014
ovirt-engine-setup-3.5.0-0.0.master.20140804172041.git23b558e.el6.noarch

Components used during verification on hosts:
Linux version 2.6.32-431.23.3.el6.x86_64 (mockbuild.eng.bos.redhat.com) (gcc version 4.4.7 20120313 (Red Hat 4.4.7-4) (GCC) ) #1 SMP Wed Jul 16 06:12:23 EDT 2014
libvirt-0.10.2-29.el6_5.10.x86_64
sanlock-2.8-1.el6.x86_64
vdsm-4.16.1-6.gita4a4614.el6.x86_64
qemu-kvm-rhev-0.12.1.2-2.415.el6_5.14.x86_64

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2015-0159.html
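
For readers who want to check the arithmetic from the comments, here is a minimal sketch of the pages-to-MiB conversion that the thread says was missing. It is not the actual VDSM or MOM patch: the function names `ksm_pages_to_mib` and `shared_percent` are hypothetical, the 4096-byte page size is taken from the `getconf PAGE_SIZE` output in the report, and integer truncation is assumed because the thread shows whole numbers.

```python
# Illustrative sketch only; not the actual VDSM/MOM fix.
# Function names are hypothetical and chosen for this example.

PAGE_SIZE_BYTES = 4096          # `getconf PAGE_SIZE` value quoted in the report
BYTES_PER_MIB = 1024 * 1024


def ksm_pages_to_mib(pages, page_size=PAGE_SIZE_BYTES):
    """Convert a KSM page count to whole MiB (truncating)."""
    return pages * page_size // BYTES_PER_MIB


def shared_percent(shared_mib, mem_size_mib):
    """Shared memory as a whole-number percentage of host memory."""
    return shared_mib * 100 // mem_size_mib


# Values quoted in the report.
pages_sharing = 1282883   # /sys/kernel/mm/ksm/pages_sharing
mem_size_mib = 48223      # memSize from getVdsCaps

# Treating the raw page count as MiB reproduces the inflated figure.
raw_percent = shared_percent(pages_sharing, mem_size_mib)

# Converting pages to MiB first gives the expected figure.
fixed_mib = ksm_pages_to_mib(pages_sharing)
fixed_percent = shared_percent(fixed_mib, mem_size_mib)

print(raw_percent, fixed_mib, fixed_percent)
```

With the numbers from the report, the unconverted value works out to the 2660% seen in the webadmin, while the converted value is 5011 MiB, or about 10% of host memory.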