Description of problem:
When deploying metrics, many resources are left at default values that are not customized. This becomes a problem when deploying on vhosts, because the defaults might not reflect the host's resource allocation and settings, so the installation can fail in many ways, some of them hard to debug. Some default values also make the installation slower and more likely to fail.
The bad boys are:
- CPU allocation:
Both metrics VMs are set to 4 cores by default, through the variable *cores* in */usr/share/ansible/roles/oVirt.metrics/roles/oVirt.origin-on-ovirt/templates/vars.yaml.template*
This variable by itself doesn't specify which combination of vcores, threads and vsockets will form the 4 final cores. Depending on how these are pre-allocated in the (v)host, this may cause a kernel panic or other issues.
The variables *cpu_sockets* and *cpu_threads* should be set and made available for customization (or maybe they could be discovered by the installation).
- Memory allocation:
Also in */usr/share/ansible/roles/oVirt.metrics/roles/oVirt.origin-on-ovirt/templates/vars.yaml.template* the memory is set to 8 GiB; however, since guaranteed memory is not set, it falls back to its default value of 1 GiB, which can cause a lot of trouble and slowness.
The variable *memory_guaranteed* for both VMs should be at least 2 GiB (maybe 4 GiB for better performance).
- Hosts allocation on vhost:
This one is the most problematic for vhosts. When a single vhost has to handle both the metrics-store-installer and master0 VMs (as happens in the middle of the deployment), the vhost can get far too slow, disrupting and eventually failing the installation, or even panicking the kernel. This happens even if the vhost has plenty of resources available. Having two vhosts (one for each VM) solved the issue.
The workaround I used to achieve this was to first put each vhost in a different cluster and manually set the clusters through the *cluster* variable in */usr/share/ansible/roles/oVirt.metrics/roles/oVirt.origin-on-ovirt/templates/vars.yaml.template*
Then, match the cluster set there with the cluster condition on line 23 in */usr/share/ansible/roles/oVirt.metrics/roles/oVirt.origin-on-ovirt/tasks/create_openshift_bastion_vm.yml*
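To make the three points above concrete, here is a rough sketch of what customized values for one metrics VM might look like. The key names are the ones mentioned in this report; the flat layout, the cluster name and every value are illustrative assumptions, not the actual contents of vars.yaml.template, and the topology must be chosen to match what the (v)host really exposes.

```yaml
# Hypothetical per-VM overrides; layout and values are assumptions for illustration.
cluster: metrics_cluster_a   # hypothetical name; one vhost per cluster, as in the workaround
cores: 4                     # default mentioned in this report
cpu_sockets: 1               # assumed value; pick to match the vhost topology
cpu_threads: 1               # assumed value; pick to match the vhost topology
memory: 8GiB                 # default mentioned in this report
memory_guaranteed: 4GiB      # report suggests at least 2 GiB, maybe 4 GiB
```

The second VM would get the same kind of block with its own *cluster* value, so that each VM lands on a different vhost.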
Version-Release number of selected component (if applicable):
Steps to Reproduce:
1. deploy metrics based on the tutorial
Shirly, this bug is in MODIFIED and targeted to 4.4.1, but the code for it is included in the 4.3.9-1 package.
Should this bug move to 4.3.9-1 and to ON_QA status?
Moved to 4.3.11 only because we are removing the metrics store deployment in 4.4 with bug #1827177
Deployed metrics successfully on a virtual host, fully customizing the CPU threads, cores, RAM guarantee, etc.
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.
For information on the advisory (Important: RHV Manager (ovirt-engine) 4.4 security, bug fix, and enhancement update), and where to find the updated
files, follow the link below.
If the solution does not work for you, open a new bug report.