Bug 1072030

Summary: High shared memory being reported on hypervisor
Product: Red Hat Enterprise Virtualization Manager
Reporter: wdaniel
Component: vdsm
Assignee: Martin Sivák <msivak>
Status: CLOSED ERRATA
QA Contact: Nikolai Sednev <nsednev>
Severity: high
Docs Contact:
Priority: high
Version: 3.3.0
CC: asegundo, bazulay, benglish, cpelland, danken, dfediuck, fdeutsch, iheim, lpeer, lyarwood, michal.skrivanek, msivak, scohen, sherold, s.kieske, wdaniel, yeylon
Target Milestone: ---
Keywords: Triaged, ZStream
Target Release: 3.5.0
Hardware: x86_64
OS: Linux
Whiteboard: sla
Fixed In Version: vt1.3, 4.16.0-1.el6_5
Doc Type: Bug Fix
Doc Text:
Previously, missing unit conversion caused the reported shared memory amount to be much higher than expected. Proper unit conversion has now been added, resulting in accurate shared memory amount reporting.
Story Points: ---
Clone Of:
Clones: 1102650, 1102651 (view as bug list)
Environment:
Last Closed: 2015-02-11 21:10:19 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: SLA
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1102650, 1102651, 1142923, 1156165    
Attachments:
  ksm output for second case (flags: none)

Description wdaniel 2014-03-03 17:39:00 UTC
Description of problem:

Hypervisors seem to be reporting an incredibly high shared memory usage when memory page sharing isn't enabled:

(case 1)
# vdsClient -s 0 getVdsStats | grep mem
	memAvailable = 22201
	memCommitted = 57335
	memFree = 33075
	memShared = 2496742
	memUsed = '55'

(case 2)

Shared Memory: 1432%
Shared Memory: 1985%

This is being reported on a specific RHEV-H version, and does not seem to depend on whether the customer is running RHEV 3.2 or 3.3.

Version-Release number of selected component (if applicable):

rhev-hypervisor6-6.5-20140213.0
RHEV 3.2 and 3.3

How reproducible:

Have not reproduced.

Steps to Reproduce:
1.
2.
3.

Actual results:

# vdsClient -s 0 getVdsStats | grep mem
	memAvailable = 22201
	memCommitted = 57335
	memFree = 33075
	memShared = 2496742
	memUsed = '55'

Expected results:

A much lower shared memory value (if any) reported by vdsClient.

Additional info:

Comment 1 Fabian Deutsch 2014-03-03 17:45:15 UTC
Hey Wallace,

could you please provide the output of

$ tail /sys/kernel/mm/ksm/*

Comment 2 wdaniel 2014-03-05 14:46:25 UTC
(In reply to Fabian Deutsch from comment #1)
> Hey Wallace,
> 
> could you please provide the output of
> 
> $ tail /sys/kernel/mm/ksm/*

Fabian,

(0)[root@ivl00036 ~]# tail /sys/kernel/mm/ksm/*
==> /sys/kernel/mm/ksm/full_scans <==
1439

==> /sys/kernel/mm/ksm/merge_across_nodes <==
1

==> /sys/kernel/mm/ksm/pages_shared <==
1149486

==> /sys/kernel/mm/ksm/pages_sharing <==
3501628

==> /sys/kernel/mm/ksm/pages_to_scan <==
64

==> /sys/kernel/mm/ksm/pages_unshared <==
8613623

==> /sys/kernel/mm/ksm/pages_volatile <==
1335514

==> /sys/kernel/mm/ksm/run <==
1

==> /sys/kernel/mm/ksm/sleep_millisecs <==
2

Comment 3 Fabian Deutsch 2014-03-05 16:48:42 UTC
Thanks Wallace.

The pages_shared value seems to be similar to the memShared value in the description.

Antoni,

do you do this calculation, by any chance?

Comment 4 Fabian Deutsch 2014-03-06 11:53:44 UTC
Okay, not Antoni, but Martin, maybe you can help here?

Comment 5 wdaniel 2014-03-07 18:06:12 UTC
(In reply to Fabian Deutsch from comment #3)
> Thanks Wallace.
> 
> The pages_shared value seems to be similar to the memShared value in the
> description.
> 
> Antoni,
> 
> do you do this calculation, by any chance?

Fabian,

I'm not sure exactly what you're asking here - there are two cases attached to this bug with different (high) values. The 'vdsClient' output corresponds to my last update, whereas the percentages in the case description are from a second case, whose output can be found in my most recent attachment.

Comment 6 wdaniel 2014-03-07 18:07:14 UTC
Created attachment 871979 [details]
ksm output for second case

Comment 7 wdaniel 2014-03-13 18:07:15 UTC
Fabian,

One of my customers came back to me with the following:

After adding a new hypervisor to our cluster, up to date with the latest packages, and migrating VMs onto it until just under 70% of the hypervisor memory is in use, KSM doesn't kick in. When adding an additional VM pushes memory use over 70%, KSM kicks in, causing the new host to almost instantly show a shared memory value of 3095%.

(0)[root@ivl00034 ~]# tail /sys/kernel/mm/ksm/*
==> /sys/kernel/mm/ksm/full_scans <==
3

==> /sys/kernel/mm/ksm/merge_across_nodes <==
1

==> /sys/kernel/mm/ksm/pages_shared <==
636451

==> /sys/kernel/mm/ksm/pages_sharing <==
2243364

==> /sys/kernel/mm/ksm/pages_to_scan <==
64

==> /sys/kernel/mm/ksm/pages_unshared <==
6919617

==> /sys/kernel/mm/ksm/pages_volatile <==
843487

==> /sys/kernel/mm/ksm/run <==
1

==> /sys/kernel/mm/ksm/sleep_millisecs <==
2

Comment 8 Fabian Deutsch 2014-03-19 11:06:04 UTC
Hey Wallace,

thanks for the update. I am not familiar with this code, but it looks to me like some calculation might be done incorrectly. Moving this to vdsm.

Comment 9 Fabian Deutsch 2014-03-31 15:33:51 UTC
Dan,

who could take a look at this?

Comment 10 Dan Kenigsberg 2014-04-01 11:14:32 UTC
Vdsm is expected to report

  /sys/kernel/mm/ksm/pages_sharing

converted to MiB. Is there any reason to suspect that there's a miscalculation here? If so, please present both numbers!

If Vdsm translates the kernel numbers correctly, then it's either a ksm bug, or there is actually very good sharing (which can happen with very similar, dormant guests).
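
For reference, the pages-to-MiB conversion described here looks like this (a minimal sketch in Python, assuming the standard 4 KiB x86_64 page size; ksm_pages_to_mib is a hypothetical helper for illustration, not vdsm's actual code):

    def ksm_pages_to_mib(pages, page_size=4096):
        # 1 MiB = 1024 * 1024 bytes; integer division matches the
        # whole-MiB figures vdsm reports
        return pages * page_size // (1024 * 1024)

    # With pages_sharing from comment 2: 3501628 pages -> ~13678 MiB,
    # far smaller than the raw page count that was being reported
    print(ksm_pages_to_mib(3501628))  # 13678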

Comment 11 Scott Herold 2014-04-23 17:09:35 UTC
In this case, KSM is doing exactly what it is intended to do.  The current Cluster -> Optimization -> Memory Optimization value does not modify the KSM behavior.  We have added a feature to 3.4 to disable KSM at the Cluster level in BZ 1026980.  I have also filed a new bug, BZ 1090576, to help clean up the terminology used in the Cluster -> Optimization settings window so it more accurately reflects the underlying technologies used.

Please move the cases to BZ 1026980 and close this BZ as NOTABUG.

Comment 12 Lee Yarwood 2014-05-01 10:28:26 UTC
(In reply to Dan Kenigsberg from comment #10)
> Vdsm is expected to report
> 
>   /sys/kernel/mm/ksm/pages_sharing
> 
> converted to MiB. Is there any reason to suspect that there's a
> miscalculation here? If so, please present both numbers!
> 
> If Vdsm translates the kernel numbers correctly it means that it's either a
> ksm bug, or that there are actually very good sharing (it can happen, with
> very similar, dormant guests).

Attaching SFDC#01082951

So it looks like VDSM isn't doing this and appears to be returning the raw value of /sys/kernel/mm/ksm/pages_sharing (maybe a type issue somewhere?).

For example on an internal env I see the following :

Shared Memory: 2660% in the webadmin

# rpm -qa | grep vdsm
vdsm-xmlrpc-4.13.2-0.9.el6ev.noarch
vdsm-python-4.13.2-0.9.el6ev.x86_64
vdsm-cli-4.13.2-0.9.el6ev.noarch
vdsm-4.13.2-0.9.el6ev.x86_64

# vdsClient -s 0 getVdsCaps | grep mem
	memSize = '48223'

# vdsClient -s 0 getVdsStats | grep mem
	memAvailable = 15694
	memCommitted = 30866
	memFree = 18207
	memShared = 1282883
	memUsed = '63'

# cat /sys/kernel/mm/ksm/pages_sharing
1282883

# getconf PAGE_SIZE
4096

memShared should actually be:

1282883 * 4096 / (1024*1024) = 5011

That gives us a percentage of:

( 100 / 48223 ) * 5011 = 10%
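
The same arithmetic as a quick Python check (plain arithmetic on the values above; not vdsm code):

    pages_sharing = 1282883   # /sys/kernel/mm/ksm/pages_sharing
    page_size = 4096          # getconf PAGE_SIZE
    mem_size_mib = 48223      # memSize from getVdsCaps

    mem_shared_mib = pages_sharing * page_size // (1024 * 1024)
    print(mem_shared_mib)                                # 5011
    print(round(100.0 * mem_shared_mib / mem_size_mib))  # 10 (percent)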

Scott, I'm going to remove the needinfo as this appears to be a valid bug against VDSM at this time.

Comment 13 Dan Kenigsberg 2014-05-01 11:05:34 UTC
The bug is in momIF's getKsmStats(), which has

        ret['memShared'] = stats['ksm_pages_sharing']

with no conversion to MiB.
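
A minimal sketch of the implied fix (hypothetical; the actual patch that shipped in 4.16.0-1.el6_5 may differ, e.g. it might read the page size at runtime rather than hard-coding it):

        # convert the raw KSM page count to MiB before reporting it
        PAGE_SIZE = 4096  # assumed x86_64 page size, in bytes
        ret['memShared'] = (stats['ksm_pages_sharing'] * PAGE_SIZE
                            // (1024 * 1024))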

Comment 18 Nikolai Sednev 2014-08-12 10:59:43 UTC
Tested on hosted engine 3.5 and two RHEL 6.5 hosts; failed to reproduce the bug.
Components used during verification on engine:
Linux version 2.6.32-431.23.3.el6.x86_64 (mockbuild.eng.bos.redhat.com) (gcc version 4.4.7 20120313 (Red Hat 4.4.7-4) (GCC) ) #1 SMP Wed Jul 16 06:12:23 EDT 2014
ovirt-engine-setup-3.5.0-0.0.master.20140804172041.git23b558e.el6.noarch

Components used during verification on hosts:
Linux version 2.6.32-431.23.3.el6.x86_64 (mockbuild.eng.bos.redhat.com) (gcc version 4.4.7 20120313 (Red Hat 4.4.7-4) (GCC) ) #1 SMP Wed Jul 16 06:12:23 EDT 2014
libvirt-0.10.2-29.el6_5.10.x86_64
sanlock-2.8-1.el6.x86_64
vdsm-4.16.1-6.gita4a4614.el6.x86_64
qemu-kvm-rhev-0.12.1.2-2.415.el6_5.14.x86_64

Comment 20 errata-xmlrpc 2015-02-11 21:10:19 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2015-0159.html