Bug 1093786 - Negative values for "Shared Memory"
Summary: Negative values for "Shared Memory"
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: ovirt-engine
Version: 3.4.0
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 3.5.0
Assignee: Doron Fediuck
QA Contact: Nikolai Sednev
URL:
Whiteboard: sla
Depends On:
Blocks: 1072347 1166010
 
Reported: 2014-05-02 16:23 UTC by Amador Pahim
Modified: 2019-05-20 11:13 UTC
CC List: 14 users

Fixed In Version: vt12
Doc Type: Bug Fix
Doc Text:
Clone Of:
Clones: 1166010
Environment:
Last Closed: 2015-02-11 18:01:18 UTC
oVirt Team: SLA
Target Upstream Version:
Embargoed:
mavital: needinfo+
nsednev: needinfo-


Attachments
shared memory positive value screenshot (100.59 KB, image/png)
2015-01-15 18:25 UTC, Nikolai Sednev
no flags


Links
System ID Private Priority Status Summary Last Updated
Red Hat Knowledge Base (Solution) 1124773 0 None None None Never
Red Hat Product Errata RHSA-2015:0158 0 normal SHIPPED_LIVE Important: Red Hat Enterprise Virtualization Manager 3.5.0 2015-02-11 22:38:50 UTC
oVirt gerrit 30614 0 master MERGED engine: fixed possible int overflow in getMemSharedPercent Never
oVirt gerrit 30644 0 ovirt-engine-3.5 MERGED engine: fixed possible int overflow in getMemSharedPercent Never

Description Amador Pahim 2014-05-02 16:23:34 UTC
Description of problem:

There is an integer overflow in getMemSharedPercent(). The Java int limit is 2,147,483,647. In backend/manager/modules/vdsbroker/src/main/java/org/ovirt/engine/core/vdsbroker/vdsbroker/VdsProperties.java, we have:

...
 public static final String mem_shared = "memShared";
...
 public Integer getMemSharedPercent() {
     Long shared = mVdsStatistics.getmem_shared();
     Integer physical = mVdsDynamic.getphysical_mem_mb();

     if (shared == null || physical == null || physical == 0) {
         return 0;
     }

     // The (int) cast is applied to (shared * 100) before the division,
     // so any shared value above 21,474,836 overflows the int range.
     return ((int) (shared * 100) / physical);
 }
...


Since "shared" is multiplied by 100, the current limit for "memShared" before causing the overflow is 21,474,836. 21,474,836 will represent ~21TB of shared memory. Not sure when we will have a system with 21TB of shared memory, but let's avoid this issue now.


Version-Release number of selected component (if applicable):
rhevm-3.4.0-0.15.beta3.el6ev.noarch.rpm


Additional info:

Here are two examples:

- No overflow:

Integer physical = 1033939;
Integer shared = 21474836;
((int) (shared * 100) / physical)
Result: 2076

- Overflow taking place:

Integer physical = 1033939;
Integer shared = 21474837;
((int) (shared * 100) / physical)
Result: -2076
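
Both cases can be checked with a self-contained snippet (a hypothetical standalone test, not part of the engine code base):

    public class SharedMemOverflowDemo {
        public static void main(String[] args) {
            int physical = 1033939;

            // 21,474,836 * 100 = 2,147,483,600 still fits in an int.
            System.out.println((int) (21474836L * 100) / physical);  // prints 2076

            // 21,474,837 * 100 = 2,147,483,700 exceeds Integer.MAX_VALUE
            // (2,147,483,647) and wraps around when narrowed to int.
            System.out.println((int) (21474837L * 100) / physical);  // prints -2076
        }
    }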


Actual results:
RHEV Manager reporting negative and inconsistent values for Shared Memory.

Comment 1 Nikolai Sednev 2014-07-22 13:28:21 UTC
(In reply to Amador Pahim from comment #0)

Amador Pahim,
Please provide exact reproduction steps, expected results, and current results; we'll have to reproduce this, so we need more details.
Also note how reproducible this is: 100% or rare.

Comment 2 Amador Pahim 2014-07-22 14:27:01 UTC
(In reply to Nikolai Sednev from comment #1)
> Amador Pahim,
> Please provide exact reproduction steps, expected results, and current
> results; we'll have to reproduce this, so we need more details.
> Also note how reproducible this is: 100% or rare.

This is a very rare condition and reproduction is hard, since you would need a system with 21 TB of shared memory to trigger it. I'm not sure we have to deal with it, since it should only affect really big servers, with total memory very close to the RHEL theoretical limit (64 TB) and far above the tested limit (3 TB). See https://access.redhat.com/articles/rhel-limits

Anyway, if this bug is relevant and you have such a system, just start as many VMs as needed to reach 21 TB of shared memory. Otherwise, the issue can be reproduced by hacking VDSM to report that amount of shared memory.

Here is the VDSM hack diff to trigger the issue:

diff --git a/vdsm/momIF.py b/vdsm/momIF.py
index a2088ef..af0038d 100644
--- a/vdsm/momIF.py
+++ b/vdsm/momIF.py
@@ -61,8 +61,9 @@ class MomThread(threading.Thread):
         ret = {}
         ret['ksmState'] = bool(stats['ksm_run'])
         ret['ksmPages'] = stats['ksm_pages_to_scan']
-        ret['memShared'] = stats['ksm_pages_sharing'] * PAGE_SIZE_BYTES
-        ret['memShared'] /= Mbytes
+        #ret['memShared'] = stats['ksm_pages_sharing'] * PAGE_SIZE_BYTES
+        #ret['memShared'] /= Mbytes
+        ret['memShared'] = 30000000
         ret['ksmCpu'] = stats['ksmd_cpu_usage']
         return ret

With this hack in VDSM, the current result is:
Shared Memory: -164523%

The expected result is a positive number that makes sense considering the total amount of RAM.
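
For reference, the reported -164523% follows directly from the 32-bit wraparound. A self-contained check (the host's physical memory of 7,871 MB is an assumption inferred from the reported percentage, not a value stated in this bug):

    public class WrapCheck {
        public static void main(String[] args) {
            // 30,000,000 * 100 = 3,000,000,000, which exceeds
            // Integer.MAX_VALUE (2,147,483,647) and wraps when cast to int.
            int wrapped = (int) (30000000L * 100);  // -1,294,967,296

            // Dividing by an assumed 7,871 MB of physical memory
            // reproduces the percentage shown in the UI.
            System.out.println(wrapped / 7871);     // prints -164523
        }
    }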

Comment 4 Eyal Edri 2014-08-04 11:09:16 UTC
These bugs are candidates for z-stream, but are not ready yet.
They were not included in the 3.4.2 bug tracker [1] for critical bugs by GSS,
and are out of scope for the 3.4.2 build.
Moving to 3.4.3.


[1] https://bugzilla.redhat.com/show_bug.cgi?id=1123858

Comment 5 Eyal Edri 2014-10-01 08:20:04 UTC
This bug wasn't included in the RHEV 3.4.3 tracker bug, missed the build date, and wasn't cloned to 3.4.z.
Hence, moving to 3.4.4.

Comment 6 Eyal Edri 2014-11-13 13:37:07 UTC
This bug is proposed for cloning to 3.4.z, but missed the 3.4.4 builds.
Moving to 3.4.5 - please clone once ready.

Comment 9 Gil Klein 2015-01-13 08:49:54 UTC
Meital, please verify based on the VDSM hack patch suggested in comment #2

Comment 10 Nikolai Sednev 2015-01-15 18:24:32 UTC
Works for me using these components on hosts:
libvirt-client-1.1.1-29.el7_0.4.x86_64
vdsm-4.16.8.1-5.el7ev.x86_64
ovirt-hosted-engine-setup-1.2.1-8.el7ev.noarch
qemu-kvm-rhev-1.5.3-60.el7_0.11.x86_64
mom-0.4.1-4.el7ev.noarch
sanlock-3.1.0-2.el7.x86_64
Linux version 3.10.0-123.19.1.el7.x86_64 (mockbuild.eng.bos.redhat.com) (gcc version 4.8.2 20140120 (Red Hat 4.8.2-16) (GCC) ) #1 SMP Mon Dec 15 14:04:04 EST 2014

And these components on HE:
rhevm-3.5.0-0.29.el6ev.noarch
rhevm-guest-agent-common-1.0.10-2.el6ev.noarch
Linux version 2.6.32-504.3.3.el6.x86_64 (mockbuild.eng.bos.redhat.com) (gcc version 4.4.7 20120313 (Red Hat 4.4.7-9) (GCC) ) #1 SMP Fri Dec 12 16:05:43 EST 2014

Following the steps described in comment #2, I now see "Shared Memory: 516528%" in the engine.
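
(Sanity check: assuming the fix keeps the multiplication and division in long arithmetic, 30,000,000 * 100 / physical no longer wraps; a reading of 516528% corresponds to a host with about 5,808 MB of physical memory, a figure inferred from the reported number itself.)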

Please check the attached screenshot.

Please backport to https://bugzilla.redhat.com/show_bug.cgi?id=1166010

Comment 11 Nikolai Sednev 2015-01-15 18:25:42 UTC
Created attachment 980580 [details]
shared memory positive value screenshot

Comment 14 errata-xmlrpc 2015-02-11 18:01:18 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2015-0158.html

