Bug 1072030 - High shared memory being reported on hypervisor
Summary: High shared memory being reported on hypervisor
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: vdsm
Version: 3.3.0
Hardware: x86_64
OS: Linux
Priority: high
Severity: high
Target Milestone: ---
Target Release: 3.5.0
Assignee: Martin Sivák
QA Contact: Nikolai Sednev
URL:
Whiteboard: sla
Depends On:
Blocks: 1102650 1102651 rhev3.5beta 1156165
 
Reported: 2014-03-03 17:39 UTC by wdaniel
Modified: 2019-05-20 11:08 UTC
CC: 17 users

Fixed In Version: vt1.3, 4.16.0-1.el6_5
Doc Type: Bug Fix
Doc Text:
Previously, missing unit conversion caused the reported shared memory amount to be much higher than expected. Proper unit conversion has now been added, resulting in accurate shared memory amount reporting.
Clone Of:
Clones: 1102650 1102651
Environment:
Last Closed: 2015-02-11 21:10:19 UTC
oVirt Team: SLA
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)
ksm output for second case (6.46 KB, text/x-vhdl)
2014-03-07 18:07 UTC, wdaniel


Links
Red Hat Product Errata RHBA-2015:0159 (SHIPPED_LIVE): vdsm 3.5.0 - bug fix and enhancement update; last updated 2015-02-12 01:35:58 UTC
oVirt gerrit 28115

Description wdaniel 2014-03-03 17:39:00 UTC
Description of problem:

Hypervisors seem to be reporting an incredibly high shared memory usage when memory page sharing isn't enabled:

(case 1)
# vdsClient -s 0 getVdsStats | grep mem
	memAvailable = 22201
	memCommitted = 57335
	memFree = 33075
	memShared = 2496742
	memUsed = '55'

(case 2)

Shared Memory: 1432%
Shared Memory: 1985%

This is being reported on a specific RHEV-H version, and does not seem to depend on whether the customer is running RHEV 3.2 or 3.3.

Version-Release number of selected component (if applicable):

rhev-hypervisor6-6.5-20140213.0
RHEV 3.2 and 3.3

How reproducible:

Have not reproduced.

Steps to Reproduce:
1.
2.
3.

Actual results:

# vdsClient -s 0 getVdsStats | grep mem
	memAvailable = 22201
	memCommitted = 57335
	memFree = 33075
	memShared = 2496742
	memUsed = '55'

Expected results:

A much lower shared memory value (if any) reported by vdsClient.

Additional info:

Comment 1 Fabian Deutsch 2014-03-03 17:45:15 UTC
Hey Wallace,

could you please provide the output of

$ tail /sys/kernel/mm/ksm/*

Comment 2 wdaniel 2014-03-05 14:46:25 UTC
(In reply to Fabian Deutsch from comment #1)
> Hey Wallace,
> 
> could you please provide the output of
> 
> $ tail /sys/kernel/mm/ksm/*

Fabian,

(0)[root@ivl00036 ~]# tail /sys/kernel/mm/ksm/*
==> /sys/kernel/mm/ksm/full_scans <==
1439

==> /sys/kernel/mm/ksm/merge_across_nodes <==
1

==> /sys/kernel/mm/ksm/pages_shared <==
1149486

==> /sys/kernel/mm/ksm/pages_sharing <==
3501628

==> /sys/kernel/mm/ksm/pages_to_scan <==
64

==> /sys/kernel/mm/ksm/pages_unshared <==
8613623

==> /sys/kernel/mm/ksm/pages_volatile <==
1335514

==> /sys/kernel/mm/ksm/run <==
1

==> /sys/kernel/mm/ksm/sleep_millisecs <==
2

Comment 3 Fabian Deutsch 2014-03-05 16:48:42 UTC
Thanks Wallace.

The pages_shared value seems to be similar to the memShared value in the description.

Antoni,

do you do this calculation, by any chance?

Comment 4 Fabian Deutsch 2014-03-06 11:53:44 UTC
Okay, not Antoni, but Martin, maybe you can help here?

Comment 5 wdaniel 2014-03-07 18:06:12 UTC
(In reply to Fabian Deutsch from comment #3)
> Thanks Wallace.
> 
> The pages_shared value seems to be similar to the memShared value in the
> description.
> 
> Antoni,
> 
> do you do this calculation by a chance?

Fabian,

I'm not sure exactly what you're asking here - there are two cases attached to this bug with different (high) values. The 'vdsClient' output corresponds to my last update, whereas the percentages in the case description are from a second case; the output for that one can be found in my most recent attachment.

Comment 6 wdaniel 2014-03-07 18:07:14 UTC
Created attachment 871979 [details]
ksm output for second case

Comment 7 wdaniel 2014-03-13 18:07:15 UTC
Fabian,

One of my customers came back to me with the following:

After adding a new hypervisor to our cluster, up to date with the latest packages, and migrating VMs onto it until just under 70% of the hypervisor memory is in use, KSM doesn't kick in. When adding an additional VM pushes memory over 70%, KSM kicks in, causing the new host to almost instantly get a shared memory value of 3095%.

(0)[root@ivl00034 ~]# tail /sys/kernel/mm/ksm/*
==> /sys/kernel/mm/ksm/full_scans <==
3

==> /sys/kernel/mm/ksm/merge_across_nodes <==
1

==> /sys/kernel/mm/ksm/pages_shared <==
636451

==> /sys/kernel/mm/ksm/pages_sharing <==
2243364

==> /sys/kernel/mm/ksm/pages_to_scan <==
64

==> /sys/kernel/mm/ksm/pages_unshared <==
6919617

==> /sys/kernel/mm/ksm/pages_volatile <==
843487

==> /sys/kernel/mm/ksm/run <==
1

==> /sys/kernel/mm/ksm/sleep_millisecs <==
2

Comment 8 Fabian Deutsch 2014-03-19 11:06:04 UTC
Hey Wallace,

thanks for the update. I'm not familiar with this code, but to me it looks like some calculation might be done wrong. Moving this to vdsm.

Comment 9 Fabian Deutsch 2014-03-31 15:33:51 UTC
Dan,

who could take a look at this?

Comment 10 Dan Kenigsberg 2014-04-01 11:14:32 UTC
Vdsm is expected to report

  /sys/kernel/mm/ksm/pages_sharing

converted to MiB. Is there any reason to suspect that there's a miscalculation here? If so, please present both numbers!

If Vdsm translates the kernel numbers correctly, it means that it's either a ksm bug, or that there is actually very good sharing (which can happen with very similar, dormant guests).

Comment 11 Scott Herold 2014-04-23 17:09:35 UTC
In this case, KSM is doing exactly what it is intended to do.  The current Cluster -> Optimization -> Memory Optimization value does not modify the KSM behavior.  We have added a feature to 3.4 to disable KSM at a Cluster level in BZ 1026980.  I have also filed a new bug - BZ 1090576 to help clean up the terminology used in the Cluster -> Optimization settings window to more accurately reflect underlying technologies used.

Please move the cases to 1026980 and close this BZ as NOTABUG

Comment 12 Lee Yarwood 2014-05-01 10:28:26 UTC
(In reply to Dan Kenigsberg from comment #10)
> Vdsm is expected to report
> 
>   /sys/kernel/mm/ksm/pages_sharing
> 
> converted to MiB. Is there any reason to suspect that there's a
> miscalculation here? If so, please present both numbers!
> 
> If Vdsm translates the kernel numbers correctly it means that it's either a
> ksm bug, or that there are actually very good sharing (it can happen, with
> very similar, dormant guests).

Attaching SFDC#01082951

So it looks like VDSM isn't doing this and appears to be returning the raw value of /sys/kernel/mm/ksm/pages_sharing (maybe a type issue somewhere?).

For example on an internal env I see the following :

Shared Memory: 2660% in the webadmin

# rpm -qa | grep vdsm
vdsm-xmlrpc-4.13.2-0.9.el6ev.noarch
vdsm-python-4.13.2-0.9.el6ev.x86_64
vdsm-cli-4.13.2-0.9.el6ev.noarch
vdsm-4.13.2-0.9.el6ev.x86_64

# vdsClient -s 0 getVdsCaps | grep mem
	memSize = '48223'

# vdsClient -s 0 getVdsStats | grep mem
	memAvailable = 15694
	memCommitted = 30866
	memFree = 18207
	memShared = 1282883
	memUsed = '63'

# cat /sys/kernel/mm/ksm/pages_sharing
1282883

# getconf PAGE_SIZE
4096

memShared should actually be:

1282883 * 4096 / (1024*1024) = 5011

That gives us a percentage of:

( 100 / 48223 ) * 5011 = 10%

Scott, I'm going to remove the needinfo as this appears to be a valid bug against VDSM at this time.

Comment 13 Dan Kenigsberg 2014-05-01 11:05:34 UTC
The bug is in momIF's getKsmStats(), which has

        ret['memShared'] = stats['ksm_pages_sharing']

with no conversion to MiB.

Comment 18 Nikolai Sednev 2014-08-12 10:59:43 UTC
Tested on hosted engine 3.5 with two RHEL 6.5 hosts; failed to reproduce the bug.
Components used during verification on engine:
Linux version 2.6.32-431.23.3.el6.x86_64 (mockbuild.eng.bos.redhat.com) (gcc version 4.4.7 20120313 (Red Hat 4.4.7-4) (GCC) ) #1 SMP Wed Jul 16 06:12:23 EDT 2014
ovirt-engine-setup-3.5.0-0.0.master.20140804172041.git23b558e.el6.noarch

Components used during verification on hosts:
Linux version 2.6.32-431.23.3.el6.x86_64 (mockbuild.eng.bos.redhat.com) (gcc version 4.4.7 20120313 (Red Hat 4.4.7-4) (GCC) ) #1 SMP Wed Jul 16 06:12:23 EDT 2014
libvirt-0.10.2-29.el6_5.10.x86_64
sanlock-2.8-1.el6.x86_64
vdsm-4.16.1-6.gita4a4614.el6.x86_64
qemu-kvm-rhev-0.12.1.2-2.415.el6_5.14.x86_64

Comment 20 errata-xmlrpc 2015-02-11 21:10:19 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2015-0159.html

