Bug 1093786

Summary: Negative values for "Shared Memory"
Product: Red Hat Enterprise Virtualization Manager
Reporter: Amador Pahim <asegundo>
Component: ovirt-engine
Assignee: Doron Fediuck <dfediuck>
Status: CLOSED ERRATA
QA Contact: Nikolai Sednev <nsednev>
Severity: medium
Docs Contact:
Priority: medium
Version: 3.4.0
CC: asegundo, dfediuck, gchaplik, gklein, iheim, jentrena, lpeer, mavital, mtessun, nsednev, rbalakri, Rhev-m-bugs, sherold, yeylon
Target Milestone: ---
Keywords: Triaged, ZStream
Target Release: 3.5.0
Flags: mavital: needinfo+, nsednev: needinfo-
Hardware: All
OS: Linux
Whiteboard: sla
Fixed In Version: vt12
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Clones: 1166010
Environment:
Last Closed: 2015-02-11 18:01:18 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: SLA
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1072347, 1166010
Attachments:
Description Flags
shared memory positive value screenshot none

Description Amador Pahim 2014-05-02 16:23:34 UTC
Description of problem:

There is an integer overflow in getMemSharedPercent(). Integer limit is 2,147,483,647. In backend/manager/modules/vdsbroker/src/main/java/org/ovirt/engine/core/vdsbroker/vdsbroker/VdsProperties.java, we have:

...
    public static final String mem_shared = "memShared";
...
    public Integer getMemSharedPercent() {
        Long shared = mVdsStatistics.getmem_shared();
        Integer physical = mVdsDynamic.getphysical_mem_mb();

        if (shared == null || physical == null || physical == 0) {
            return 0;
        }

        return ((int) (shared * 100) / physical);
    }
...


Since "shared" is multiplied by 100, the largest "memShared" value that does not cause the overflow is 21,474,836. With memShared reported in MB, 21,474,836 represents ~21TB of shared memory. Not sure when we will have a system with 21TB of shared memory, but let's avoid this issue now.
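
As a minimal sketch of where the 21,474,836 figure comes from: it is Integer.MAX_VALUE divided by 100, and any larger value no longer fits in an int once multiplied by 100. The class and method names below are hypothetical and are not part of ovirt-engine.

```java
// Hypothetical helper, for illustration only; not part of ovirt-engine.
class OverflowThreshold {
    // Largest "shared" value for which shared * 100 still fits in an int.
    static long maxSafeShared() {
        return Integer.MAX_VALUE / 100; // 2,147,483,647 / 100, integer division
    }

    // True when narrowing (shared * 100) from long to int changes the value.
    static boolean wrapsWhenNarrowed(long shared) {
        long product = shared * 100;
        return (int) product != product;
    }

    public static void main(String[] args) {
        long limit = maxSafeShared();
        System.out.println("largest safe memShared: " + limit);
        System.out.println("limit wraps:     " + wrapsWhenNarrowed(limit));
        System.out.println("limit + 1 wraps: " + wrapsWhenNarrowed(limit + 1));
    }
}
```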


Version-Release number of selected component (if applicable):
rhevm-3.4.0-0.15.beta3.el6ev.noarch.rpm


Additional info:

Here are two examples:

- No overflow:

Integer physical = 1033939;
Integer shared = 21474836;
((int) (shared * 100) / physical)
Result: 2076

- Overflow taking place:

Integer physical = 1033939;
Integer shared = 21474837;
((int) (shared * 100) / physical)
Result: -2076
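
The two cases above can be reproduced with a small standalone program. This is only a sketch: the class and method names are hypothetical, and "shared" is declared as long to mirror the Long used in getMemSharedPercent() rather than the Integer in the snippets above. The results are the same either way, because the product is reduced to 32 bits before the division in both variants.

```java
// Hypothetical standalone reproduction; not part of ovirt-engine.
class SharedPercentRepro {
    // Same expression as in getMemSharedPercent(). The cast binds tighter
    // than the division, so the product is narrowed to int BEFORE dividing.
    static int buggyPercent(long shared, int physical) {
        return (int) (shared * 100) / physical;
    }

    public static void main(String[] args) {
        int physical = 1033939;
        // Product 2,147,483,600 still fits in an int.
        System.out.println(buggyPercent(21474836L, physical));
        // Product 2,147,483,700 exceeds Integer.MAX_VALUE and wraps negative.
        System.out.println(buggyPercent(21474837L, physical));
    }
}
```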


Actual results:
RHEV Manager reporting negative and inconsistent values for Shared Memory.

Comment 1 Nikolai Sednev 2014-07-22 13:28:21 UTC
(In reply to Amador Pahim from comment #0)

Amador Pahim,
Please provide exact reproduction steps, expected results and current results. We'll have to reproduce this, so we need more details.
Please also state how reproducible it is: 100% or rare.

Comment 2 Amador Pahim 2014-07-22 14:27:01 UTC
(In reply to Nikolai Sednev from comment #1)

This is a very rare condition and it is hard to reproduce, since you need a system with 21TB of shared memory to trigger it. I'm not sure we have to deal with it, since it should affect only really big servers, with total memory very close to the RHEL theoretical limit (64TB) and far above the tested limit (3TB). See https://access.redhat.com/articles/rhel-limits

Anyway, if this bug is relevant and you have such a system, just start as many VMs as needed to reach 21TB of shared memory. Otherwise, the issue can be reproduced by hacking VDSM to report that amount of shared memory.

Here is the VDSM hack diff to trigger the issue:

diff --git a/vdsm/momIF.py b/vdsm/momIF.py
index a2088ef..af0038d 100644
--- a/vdsm/momIF.py
+++ b/vdsm/momIF.py
@@ -61,8 +61,9 @@ class MomThread(threading.Thread):
         ret = {}
         ret['ksmState'] = bool(stats['ksm_run'])
         ret['ksmPages'] = stats['ksm_pages_to_scan']
-        ret['memShared'] = stats['ksm_pages_sharing'] * PAGE_SIZE_BYTES
-        ret['memShared'] /= Mbytes
+        #ret['memShared'] = stats['ksm_pages_sharing'] * PAGE_SIZE_BYTES
+        #ret['memShared'] /= Mbytes
+        ret['memShared'] = 30000000
         ret['ksmCpu'] = stats['ksmd_cpu_usage']
         return ret

Using this hack in vdsm, the current result is:
Shared Memory: -164523%

The expected result is a positive number that makes sense considering the total amount of RAM.
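
The arithmetic behind the hacked value can be sketched as follows. The report does not state the host's physical memory; the 7871 MB used here is an assumption inferred from the -164523% result, and the class and method names are hypothetical. The second method shows one possible way to get a positive result (dividing as long before narrowing to int); it is not necessarily the fix that was shipped.

```java
// Hypothetical illustration; not part of ovirt-engine.
class HackValueCheck {
    // Expression from getMemSharedPercent(): narrows the product, then divides.
    static int buggyPercent(long shared, int physical) {
        return (int) (shared * 100) / physical;
    }

    // One possible alternative: divide in long arithmetic, then narrow.
    static int dividedBeforeNarrowing(long shared, int physical) {
        return (int) ((shared * 100) / physical);
    }

    public static void main(String[] args) {
        long shared = 30000000L; // value hard-coded by the VDSM hack
        int physical = 7871;     // ASSUMPTION: inferred, not given in the report
        // 3,000,000,000 does not fit in an int, so the narrowed product is negative.
        System.out.println("buggy:   " + buggyPercent(shared, physical) + "%");
        System.out.println("divided: " + dividedBeforeNarrowing(shared, physical) + "%");
    }
}
```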

Comment 4 Eyal Edri 2014-08-04 11:09:16 UTC
These bugs are candidates for z-stream, but are not ready yet.
They were not included in the 3.4.2 bug tracker [1] for critical bugs by GSS,
and are out of scope for the 3.4.2 build.
Moving to 3.4.3.


[1] https://bugzilla.redhat.com/show_bug.cgi?id=1123858

Comment 5 Eyal Edri 2014-10-01 08:20:04 UTC
This bug wasn't included in the RHEV 3.4.3 tracker bug, missed the build date, and wasn't cloned to 3.4.z.
Hence moving to 3.4.4.

Comment 6 Eyal Edri 2014-11-13 13:37:07 UTC
This bug is proposed for cloning to 3.4.z, but missed the 3.4.4 builds.
Moving to 3.4.5 - please clone once ready.

Comment 9 Gil Klein 2015-01-13 08:49:54 UTC
Meital, please verify based on the vdsm hook patch suggested in comment #2

Comment 10 Nikolai Sednev 2015-01-15 18:24:32 UTC
Works for me using these components on hosts:
libvirt-client-1.1.1-29.el7_0.4.x86_64
vdsm-4.16.8.1-5.el7ev.x86_64
ovirt-hosted-engine-setup-1.2.1-8.el7ev.noarch
qemu-kvm-rhev-1.5.3-60.el7_0.11.x86_64
mom-0.4.1-4.el7ev.noarch
sanlock-3.1.0-2.el7.x86_64
Linux version 3.10.0-123.19.1.el7.x86_64 (mockbuild.eng.bos.redhat.com) (gcc version 4.8.2 20140120 (Red Hat 4.8.2-16) (GCC) ) #1 SMP Mon Dec 15 14:04:04 EST 2014

And these components on HE:
rhevm-3.5.0-0.29.el6ev.noarch
rhevm-guest-agent-common-1.0.10-2.el6ev.noarch
Linux version 2.6.32-504.3.3.el6.x86_64 (mockbuild.eng.bos.redhat.com) (gcc version 4.4.7 20120313 (Red Hat 4.4.7-9) (GCC) ) #1 SMP Fri Dec 12 16:05:43 EST 2014

Following the steps described in comment #2, I see in the engine: "Shared Memory: 516528%".

Please check the attached screenshot.

Please backport to https://bugzilla.redhat.com/show_bug.cgi?id=1166010

Comment 11 Nikolai Sednev 2015-01-15 18:25:42 UTC
Created attachment 980580 [details]
shared memory positive value screenshot

Comment 14 errata-xmlrpc 2015-02-11 18:01:18 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2015-0158.html