Description of problem:

There is an integer overflow in getMemSharedPercent(). The Integer limit is 2,147,483,647. In backend/manager/modules/vdsbroker/src/main/java/org/ovirt/engine/core/vdsbroker/vdsbroker/VdsProperties.java we have (around line 148):

    public static final String mem_shared = "memShared";

and (around lines 687-696):

    public Integer getMemSharedPercent() {
        Long shared = mVdsStatistics.getmem_shared();
        Integer physical = mVdsDynamic.getphysical_mem_mb();

        if (shared == null || physical == null || physical == 0) {
            return 0;
        }

        return ((int) (shared * 100) / physical);
    }

Since "shared" is multiplied by 100, the largest "memShared" value that does not overflow is 21,474,836. As memShared is reported in MB, that represents ~21 TB of shared memory. Not sure when we will have a system with 21 TB of shared memory, but let's avoid this issue now.

Version-Release number of selected component (if applicable):
rhevm-3.4.0-0.15.beta3.el6ev.noarch.rpm

Additional info:

Here are two examples.

- No overflow:

    Integer physical = 1033939;
    Integer shared = 21474836;
    ((int) (shared * 100) / physical)

  Result: 2076

- Overflow taking place:

    Integer physical = 1033939;
    Integer shared = 21474837;
    ((int) (shared * 100) / physical)

  Result: -2076

Actual results:
RHEV Manager reports negative and inconsistent values for Shared Memory.
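For the record, in the engine method the truncation actually happens in the narrowing cast rather than in the multiplication: "shared" is a Long, so "shared * 100" is evaluated in long arithmetic, but the (int) cast is applied to the product before the division and chops it to 32 bits. (In the examples above "shared" is declared as Integer, so there the multiplication itself overflows in int; the symptom is the same.) A minimal sketch of one possible fix, keeping the getters and null checks exactly as above, is to divide first and narrow last:

    public Integer getMemSharedPercent() {
        Long shared = mVdsStatistics.getmem_shared();
        Integer physical = mVdsDynamic.getphysical_mem_mb();

        if (shared == null || physical == null || physical == 0) {
            return 0;
        }

        // shared * 100 stays in long range; dividing by physical brings the
        // value back down to a small percentage before the narrowing cast,
        // so the cast can no longer overflow.
        return (int) (shared * 100 / physical);
    }

Returning a long or clamping the value would also work, but moving the cast outside the division is the smallest change.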
(In reply to Amador Pahim from comment #0)
> There is an integer overflow in getMemSharedPercent(). [...]

Amador Pahim,
Please provide exact reproduction steps, expected results and current results; we'll have to reproduce this, so we need more details. Also add how reproducible it is: 100% or rare.
(In reply to Nikolai Sednev from comment #1)
> Please provide exact reproduction steps, expected results and current
> results; we'll have to reproduce this, so we need more details. Also add
> how reproducible it is: 100% or rare.

This is a very rare condition and reproduction is hard, since you need a system with ~21 TB of shared memory to trigger it. I'm not sure we have to deal with it, since it should only affect really big servers whose total memory is very close to the RHEL theoretical limit (64 TB) and far above the tested limit (3 TB). See https://access.redhat.com/articles/rhel-limits

Anyway, if this bug is relevant and you have such a system, just start as many VMs as needed to reach 21 TB of shared memory. Otherwise, the issue can be reproduced by hacking VDSM into reporting that amount of shared memory. Here is the vdsm hack diff to trigger the issue:

diff --git a/vdsm/momIF.py b/vdsm/momIF.py
index a2088ef..af0038d 100644
--- a/vdsm/momIF.py
+++ b/vdsm/momIF.py
@@ -61,8 +61,9 @@ class MomThread(threading.Thread):
         ret = {}
         ret['ksmState'] = bool(stats['ksm_run'])
         ret['ksmPages'] = stats['ksm_pages_to_scan']
-        ret['memShared'] = stats['ksm_pages_sharing'] * PAGE_SIZE_BYTES
-        ret['memShared'] /= Mbytes
+        #ret['memShared'] = stats['ksm_pages_sharing'] * PAGE_SIZE_BYTES
+        #ret['memShared'] /= Mbytes
+        ret['memShared'] = 30000000
         ret['ksmCpu'] = stats['ksmd_cpu_usage']
         return ret

Using this hack in vdsm, the current result is:

    Shared Memory: -164523%

The expected result is a positive number that makes sense considering the total amount of RAM.
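For anyone without a test host, the arithmetic behind the negative percentage can be reproduced with a small self-contained Java demo of the same expression. The physical memory size below is a guess: 7871 MB happens to reproduce the -164523% shown above, but the actual host size is not stated in this report.

public class MemSharedOverflowDemo {
    public static void main(String[] args) {
        Long shared = 30000000L; // MB, as injected by the momIF.py hack above
        Integer physical = 7871; // MB, hypothetical host RAM size

        // Current engine expression: the cast narrows shared * 100
        // (3,000,000,000, above Integer.MAX_VALUE) to int before the
        // division, flipping the sign.
        int broken = (int) (shared * 100) / physical;
        System.out.println("Shared Memory: " + broken + "%"); // -164523%

        // Dividing first in long arithmetic keeps the intermediate value
        // in range, so the final narrowing cast is harmless.
        int fixed = (int) (shared * 100 / physical);
        System.out.println("Shared Memory: " + fixed + "%"); // 381145%
    }
}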
these bugs are candidates for z-stream, but not ready yet. they were not included in the 3.4.2 tracker bug [1] for critical bugs by gss, and are out of scope for the 3.4.2 build. moving to 3.4.3. [1] https://bugzilla.redhat.com/show_bug.cgi?id=1123858
this bug wasn't included in the rhev 3.4.3 tracker bug and missed the 3.4.3 build date; it also wasn't cloned to 3.4.z. hence moving to 3.4.4.
this bug is proposed for cloning to 3.4.z, but missed the 3.4.4 builds. moving to 3.4.5 - please clone once ready.
Meital, please verify based on the vdsm hack patch suggested in comment #2.
Works for me using these components on the hosts:

libvirt-client-1.1.1-29.el7_0.4.x86_64
vdsm-4.16.8.1-5.el7ev.x86_64
ovirt-hosted-engine-setup-1.2.1-8.el7ev.noarch
qemu-kvm-rhev-1.5.3-60.el7_0.11.x86_64
mom-0.4.1-4.el7ev.noarch
sanlock-3.1.0-2.el7.x86_64
Linux version 3.10.0-123.19.1.el7.x86_64 (mockbuild.eng.bos.redhat.com) (gcc version 4.8.2 20140120 (Red Hat 4.8.2-16) (GCC) ) #1 SMP Mon Dec 15 14:04:04 EST 2014

And these components on the hosted engine:

rhevm-3.5.0-0.29.el6ev.noarch
rhevm-guest-agent-common-1.0.10-2.el6ev.noarch
Linux version 2.6.32-504.3.3.el6.x86_64 (mockbuild.eng.bos.redhat.com) (gcc version 4.4.7 20120313 (Red Hat 4.4.7-9) (GCC) ) #1 SMP Fri Dec 12 16:05:43 EST 2014

Following the steps described in comment #2, the engine now shows "Shared Memory: 516528%". Please check the attached screenshot.

Please backport to https://bugzilla.redhat.com/show_bug.cgi?id=1166010
Created attachment 980580 [details] shared memory positive value screenshot
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHSA-2015-0158.html