Bug 1570564 - Tendrl-ansible precheck fails with minimum memory requirement criteria on Tendrl Server
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: web-admin-tendrl-ansible
Version: rhgs-3.4
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Target Release: RHGS 3.4.0
Assignee: Timothy Asir
QA Contact: Daniel Horák
URL:
Whiteboard:
Depends On:
Blocks: 1503137
 
Reported: 2018-04-23 09:05 UTC by Shekhar Berry
Modified: 2018-09-04 07:05 UTC
8 users

Fixed In Version: tendrl-ansible-1.6.3-7.el7rhgs
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-09-04 07:04:50 UTC
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHSA-2018:2616 0 None None None 2018-09-04 07:05:55 UTC

Description Shekhar Berry 2018-04-23 09:05:26 UTC
Description of problem:

Hi,

I am setting up a Tendrl environment to analyze resource consumption under different configurations.
My Tendrl server is hosted on a VM where I am starting with 4 GB of memory and 4 CPUs initially.

Before proceeding with the installation I ran the ansible prechecks.yml playbook to ensure all prerequisites are met.
It failed with an error message saying the minimum memory on the tendrl server should be at least 30000 MB (~30 GB).

See below:

TASK [Assert that hw requirements are met] ************************************************************************************************************************************************************************
fatal: [dhcp159-16.sbu.lab.eng.bos.redhat.com]: FAILED! => {
    "assertion": "ansible_memtotal_mb >= 30000", 
    "changed": false, 
    "evaluated_to": false
}
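
For context, a rough shell equivalent of the failing assertion (the 30000 MB threshold comes from the output above; `ansible_memtotal_mb` is derived from the MemTotal field of /proc/meminfo, expressed in MiB) might look like this:

```shell
# Rough equivalent of the prechecks assertion, run on the target host.
# ansible_memtotal_mb ~ MemTotal (kB) from /proc/meminfo, integer-divided by 1024.
memtotal_mb=$(awk '/^MemTotal:/ {print int($2 / 1024)}' /proc/meminfo)
if [ "$memtotal_mb" -ge 30000 ]; then
    echo "hw requirements met (memtotal_mb=${memtotal_mb})"
else
    echo "FAILED: memtotal_mb=${memtotal_mb} is below 30000"
fi
```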


Version-Release number of selected component (if applicable):

rpm -qa | grep tendrl
tendrl-ansible-1.6.3-2.el7rhgs.noarch


How reproducible:

Always

Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:

Comment 2 Martin Bukatovic 2018-04-25 11:23:27 UTC
This needs to be fixed based on the results of perf. testing, and documentation
should be aligned.

Comment 4 Nishanth Thomas 2018-04-25 13:16:00 UTC
30000 is something we decided for the last release (3.3.1) and is not valid anymore. 16000 is what we recommend for now. This will be calibrated further based on the results of the performance testing.

Comment 6 Daniel Horák 2018-05-17 08:35:46 UTC
@Nishanth, the current memory requirement (ansible_memtotal_mb >= 16000)
in the prechecks.yml playbook (around line 42 for tendrl-ansible-1.6.3-3) means
that a computer with an advertised 16 GB of RAM doesn't meet this requirement.

For example, my laptop has "16 GB of RAM", but the reported values look
like this:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
$ free 
              total      used    free    shared  buff/cache   available
Mem:       16068228  11286132  855368    801388     3926728     6623432

$ free -m
              total      used    free    shared  buff/cache   available
Mem:          15691     11017     838       783        3835        6471

$ free -g
              total      used    free    shared  buff/cache   available
Mem:             15        10       0         0           3           6
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

And the corresponding ansible-reported value is:
  "ansible_memtotal_mb": 15691

My question is: should a commonly advertised 16 GB of RAM be enough for the
RHGS WA server?
If yes, we should tweak the value in the prechecks.yml check to fit the real
reported value (I would say that using 15000 there should be safe).
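
For illustration, the gap between the marketed size and the reported fact is just the kB-to-MiB integer conversion; plugging in the total from the `free` output above:

```shell
# MemTotal from `free` above, in kB, converted the way ansible reports it (MiB):
kb=16068228
mb=$((kb / 1024))
echo "ansible_memtotal_mb would be: ${mb}"   # 15691, well below 16000
```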

Comment 7 Nishanth Thomas 2018-05-23 13:46:45 UTC
@Daniel, I don't think this is really required. Even if 16 GB is advertised, the actual value will be less than that. But the issue is that you cannot really say it will be 15691 on all machines; it will differ. If you really want it to work on machines with an advertised 16 GB, then we might need to set it a bit lower, maybe to 15000.

Comment 8 Daniel Horák 2018-05-23 14:09:51 UTC
If we want to support the RHGS WA server on a machine with 16 GB of RAM, we should fix the value in the prechecks.yml playbook so it fits the commonly marketed 16 GB of RAM.

As I suggested in Comment 6 and Nishanth verified in Comment 7, a reasonable value seems to be 15000.

If you decide not to change this value, please move this BZ back to ON_QA and we will accordingly change our documentation with the proper requirements, which will be higher than 16 GB.

Comment 9 Timothy Asir 2018-06-13 09:53:12 UTC
A patch has been sent to code.engineering which sets the value to 15000.

Comment 13 Martin Bukatovic 2018-08-14 19:06:06 UTC
Checking with tendrl-ansible-1.6.3-6.el7rhgs.noarch

Since the current values in the prechecks playbook don't match the document
linked in comment 12:

> ansible_memtotal_mb >= 15000
> ansible_processor_vcpus >= 4

I'm moving this BZ back to the assigned state to make it clear that changes to
the prechecks playbook based on the updated hw requirements are needed.

Comment 15 Martin Bukatovic 2018-08-14 19:14:08 UTC
Since the new hw requirements talk about 3 basic sizes of cluster, each with
slightly different minimal requirements (only memory and cpu are in the scope of
this bz), we have 2 options:

 * make 3 variants of the check, one per configuration, and make sure
   the correct check is enforced based on the size of the cluster (we can
   check the size of gluster_servers)

 * enforce only the very minimal requirements based on the smallest setup

Which would make more sense?

Comment 17 Martin Bukatovic 2018-08-16 15:33:41 UTC
Update based on the triage meeting on 2018-08-16: we are going with the 2nd
option (enforce only the very minimal requirements based on the smallest setup).

Advice on memtotal value
========================

The prechecks playbook checks the size of total memory like this:

> ansible_memtotal_mb >= 15000

This `ansible_memtotal_mb` fact is based on the MemTotal field from
/proc/meminfo on the remote host, which is defined as (see man proc):

> Total usable RAM (i.e. physical RAM minus a few reserved bits and the kernel
> binary code).

So if we just check the exact number for 4 GiB:

> ansible_memtotal_mb >= 4096

that will fail for a machine with exactly 4 GiB of RAM, because MemTotal on such
a machine will be a bit smaller.

For this reason, we need to do the check with a smaller number, taking the
reserved kernel bits into account.

Here is a short list of observations (all done on RHEL 7, x86_64 machines):

 * ram: 2 GiB = 2048 MiB, MemTotal: 1838 MiB, 210 MiB is reserved
 * ram: 4 GiB = 4096 MiB, MemTotal: 3789 MiB, 307 MiB is reserved
 * ram: 8 GiB = 8192 MiB, MemTotal: 7822 MiB, 370 MiB is reserved

So based on all this, I would suggest checking against 3700 MiB (to be extra
safe with the limit):

> ansible_memtotal_mb >= 3700
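
As a sanity check on those numbers (a quick sketch; the RAM/MemTotal pairs are the observations listed above):

```shell
# Reserved memory implied by each observation: advertised RAM minus MemTotal (MiB).
reserved_2g=$((2048 - 1838))   # 210 MiB
reserved_4g=$((4096 - 3789))   # 307 MiB
reserved_8g=$((8192 - 7822))   # 370 MiB
echo "reserved: ${reserved_2g} ${reserved_4g} ${reserved_8g} MiB"

# Headroom the proposed 3700 limit leaves on a machine with exactly 4 GiB of RAM:
echo "headroom: $((3789 - 3700)) MiB"
```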

Comment 18 Daniel Horák 2018-08-17 11:41:11 UTC
Tested and verified on a VM with 4 GB RAM:

# rpm -q tendrl-ansible
  tendrl-ansible-1.6.3-7.el7rhgs.noarch

# hwinfo
  <<truncated>>
  Memory Device: #4352
    Location: "DIMM 0"
    Memory Array: #4096
    Error Info: #0
    Form Factor: 0x09 (DIMM)
    Type: 0x07 (RAM)
    Data Width: 64 bits
    Size: 4 GB
  <<truncated>>

# free
            total      used      free  shared  buff/cache   available
  Mem:    3880732    590632   1730080   17316     1560020     2939996
  Swap:   1999868         0   1999868

# ansible-playbook -i inventory prechecks.yml
    <<truncated>>
  TASK [Assert that hw requirements are met] ********************************
  ok: [tendrl-server.example.com] => {
      "changed": false, 
      "msg": "All assertions passed"
  }
    <<truncated>>
  PLAY RECAP ****************************************************************
  tendrl-server.examplem : ok=8    changed=0    unreachable=0    failed=0   
  tendrl-server.examplem : ok=8    changed=0    unreachable=0    failed=0   
  tendrl-server.examplem : ok=8    changed=0    unreachable=0    failed=0   
  tendrl-server.examplem : ok=8    changed=0    unreachable=0    failed=0   
  tendrl-server.examplem : ok=8    changed=0    unreachable=0    failed=0   
  tendrl-server.examplem : ok=8    changed=0    unreachable=0    failed=0   
  tendrl-server.example.com : ok=12   changed=0    unreachable=0    failed=0   
  localhost                  : ok=3    changed=0    unreachable=0    failed=0   

>> VERIFIED

Comment 19 Daniel Horák 2018-08-17 11:43:50 UTC
>  PLAY RECAP ****************************************************************
>  tendrl-server.examplem : ok=8    changed=0    unreachable=0    failed=0   
>  tendrl-server.examplem : ok=8    changed=0    unreachable=0    failed=0   
>  tendrl-server.examplem : ok=8    changed=0    unreachable=0    failed=0   
>  tendrl-server.examplem : ok=8    changed=0    unreachable=0    failed=0   
>  tendrl-server.examplem : ok=8    changed=0    unreachable=0    failed=0   
>  tendrl-server.examplem : ok=8    changed=0    unreachable=0    failed=0   
>  tendrl-server.example.com : ok=12   changed=0    unreachable=0    failed=0   
>  localhost                  : ok=3    changed=0    unreachable=0    failed=0   

This part was slightly malformed and should look like:

  PLAY RECAP **************************************************************
  gl1.usmqe.example.com : ok=8    changed=0    unreachable=0    failed=0   
  gl2.usmqe.example.com : ok=8    changed=0    unreachable=0    failed=0   
  gl3.usmqe.example.com : ok=8    changed=0    unreachable=0    failed=0   
  gl4.usmqe.example.com : ok=8    changed=0    unreachable=0    failed=0   
  gl5.usmqe.example.com : ok=8    changed=0    unreachable=0    failed=0   
  gl6.usmqe.example.com : ok=8    changed=0    unreachable=0    failed=0   
  tendrl-server.usmqe.example.com : ok=12   changed=0    unreachable=0    failed=0   
  localhost             : ok=3    changed=0    unreachable=0    failed=0

Comment 21 errata-xmlrpc 2018-09-04 07:04:50 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2018:2616

