Description of problem:
Version-Release number of selected component (if applicable): ovirt-hosted-engine-setup-18.104.22.168 (hosts are running RHVH-4.1-20170209.0)
How reproducible: Uncertain; this was encountered once, approximately 5 days after the RHV environment setup process was first completed.
Steps to Reproduce:
1. Install RHV on a 3-node cluster using the steps outlined in https://access.redhat.com/articles/2578391 as a guide.
2. Run ovirt-hosted-engine-setup to set up a hosted engine on one of the nodes. Do not use an answer file; instead, enter the options when requested. Use defaults for all of the options that have defaults, and enter information for all of the options that do not have a default. (In this state, the "OVEHOSTED_VM/applianceMem" should have a value of "4096", indicating 4 GiB of memory for the virtual machine.)
3. Start using the cluster. (We created 15 VMs with 2 GiB of memory and 2 CPU cores, an IDE disk device with 180 GiB virtual size, and RHEL 7 installed as the operating system.)
Actual results: The java process that was running the ovirt-engine server instance was terminated by oom-killer.
Expected results: The java process that runs the ovirt-engine server instance is left alone.
After reporting this to our contact at Red Hat, it was recommended that we increase the memory on the hosted engine VM to 8 GiB. 2 days and 20 hours after this change was made, running "free -m" on the system shows 2670 MiB used, 3440 MiB free, 1710 MiB buff/cache.
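As a rough cross-check of the "free -m" figures quoted above, the three columns can be summed and compared with the 8 GiB configured for the VM. This is only a sketch; it assumes the procps version in use reports used as total minus free minus buff/cache, so the three columns add up to the reported total.

```python
# Figures reported by "free -m" on the hosted engine VM after the change to 8 GiB.
used_mib = 2670
free_mib = 3440
buff_cache_mib = 1710


def accounted_total(used, free, buff_cache):
    """Sum the three columns; under the stated assumption this equals 'total'."""
    return used + free + buff_cache


total_mib = accounted_total(used_mib, free_mib, buff_cache_mib)
configured_mib = 8 * 1024  # 8 GiB configured for the VM

print(total_mib)                   # 7820
print(configured_mib - total_mib)  # 372 MiB not reported as usable memory
```

The remaining gap is most likely memory the kernel reserves for itself before it reports a total, but that is an inference, not something measured on this system.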
Created attachment 1258190 [details]
/var/log/messages output from the time of oom-killer
Created attachment 1258191 [details]
Output from "ps aux" (sorted by RSS descending) from the hosted-engine VM 2 days after reboot
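For anyone who wants to reproduce the RSS ordering from a plain "ps aux" capture, a minimal sketch follows. It assumes the standard procps column layout (USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND, with RSS in KiB). The sample text is made up for illustration and is not taken from the attachment.

```python
def top_rss(ps_aux_text, limit=5):
    """Return (rss_kib, pid, command) tuples from `ps aux` text, largest RSS first."""
    rows = []
    lines = ps_aux_text.strip().splitlines()
    for line in lines[1:]:  # skip the header row
        fields = line.split(None, 10)  # the 11th field is the full command line
        if len(fields) < 11:
            continue  # ignore truncated or malformed rows
        rows.append((int(fields[5]), int(fields[1]), fields[10]))
    rows.sort(reverse=True)
    return rows[:limit]


# Hypothetical sample data, NOT copied from the attached output.
SAMPLE_PS_AUX = """\
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 1 0.0 0.1 50000 4000 ? Ss Feb22 0:05 /usr/lib/systemd/systemd
ovirt 1001 1.0 20.0 3000000 1500000 ? Sl Feb22 10:00 java -server ovirt-engine
postgres 1002 0.5 5.0 400000 200000 ? Ss Feb22 1:00 postgres: engine
"""

for rss_kib, pid, command in top_rss(SAMPLE_PS_AUX, limit=2):
    print(f"{rss_kib / 1024:.0f} MiB  pid={pid}  {command}")
```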
This doesn't look like a hosted-engine-specific issue. If the engine were running on a 4 GB bare-metal server, it would have been killed as well.
Moving to ovirt-engine and infra team for further investigation.
Adding some info from our mail thread:
On 02/24/2017 03:32 PM, Bryan Gurney wrote:
> Yes, the hosted engine VM has 4 GB of RAM configured; that was the default
> during the ovirt-hosted-engine-setup process. Should the default for the
> hosted engine's VM be set higher than 4 GB?
Yes, I think you need to set it to a higher value. The engine is
configured by default to use at most 1 GiB of heap. But then it also
consumes off-heap space (stacks, native buffers, etc), so it can consume
up to 2 GiB of RSS (assuming there are no off-heap leaks). Then you have
also the DWH server, and the database. For production environments with
all the components in one machine we usually recommend 16 GiB. That is
way too much, in my opinion. I think that something like 8 GiB can be
healthy for you. Note that as this is a VM, the memory won't be wasted,
the hypervisor will dedicate the unused memory to other VMs.
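The estimate in the reply above can be turned into a small calculation. This sketch uses only the figure given there (up to about 2 GiB of RSS for the engine) and shows what is left for the DWH server, the database, and the operating system at each VM size mentioned in this thread. The function name is hypothetical, and the actual usage of the other components is not known from this report.

```python
ENGINE_MAX_RSS_GIB = 2.0  # upper estimate for the engine JVM, from the reply above


def remaining_for_other_components(vm_memory_gib, engine_rss_gib=ENGINE_MAX_RSS_GIB):
    """Memory left for DWH, the database, and the OS once the engine hits its estimated peak."""
    return vm_memory_gib - engine_rss_gib


for size_gib in (4, 8, 16):
    left = remaining_for_other_components(size_gib)
    print(f"{size_gib} GiB VM: {left} GiB left for everything else")
```

With the 4 GiB default, everything other than the engine has to fit in roughly 2 GiB, which is consistent with the suggestion to move to 8 GiB.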
Let's increase appliance default memory to 8Gb
(In reply to Sandro Bonazzola from comment #5)
> Let's increase appliance default memory to 8Gb
Please don't do that yet; we need to understand the root cause of the issue. According to QA, increasing the default memory for the appliance will have a significant impact on QA automation.
Roy, could you please investigate this issue?
Could you please share with us complete engine logs before the OOM kill, so we can investigate engine utilization?
We want to align the spec with the minimal requirement for bare metal engine and allow user to choose less. In the Grafton use case they would want to decide on the amount allocated. Sahina, please track this bug and decide whether to pass a lower memory parameter to HE setup with answer file.
(In reply to Yaniv Dary from comment #8)
> We want to align the spec with the minimal requirement for bare metal engine
> and allow user to choose less.
Note that according to  minimum for bare metal is 4GB, recommended is 16GB.
In a phone discussion, it was suggested that we go with the recommended values.
OK to set the appliance to the recommended values?
- 16 GB RAM
- 4 cores
- 50 GB disk
> In the Grafton use case they would want to
> decide on the amount allocated. Sahina, please track this bug and decide
> whether to pass a lower memory parameter to HE setup with answer file.
Created attachment 1258682 [details]
/var/log/messages-20170226 file (2017-02-20 14:29:28 EST to 2017-02-26 03:31:01 EST)
I've attached the file from /var/log/messages-20170226 on the hosted-engine VM, which covers the time of the oom-killer event (Feb 22 11:38:48 EST).
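A minimal sketch for locating oom-killer events in a messages file like the one attached. It assumes the usual syslog prefix and the kernel's "Out of memory: Kill process PID (NAME) score N" wording; both are assumptions about the log format. The sample lines are invented for illustration and are not taken from the attachment.

```python
import re

# Assumed format: "<Mon DD HH:MM:SS> <host> kernel: ... Out of memory: Kill process <pid> (<name>) score <n>"
OOM_KILL_RE = re.compile(
    r"^(?P<when>\w{3}\s+\d+ \d{2}:\d{2}:\d{2}) \S+ kernel: .*"
    r"Out of memory: Kill process (?P<pid>\d+) \((?P<name>.+?)\) score (?P<score>\d+)"
)


def find_oom_kills(lines):
    """Return one dict per oom-killer decision found in the given log lines."""
    events = []
    for line in lines:
        match = OOM_KILL_RE.match(line)
        if match:
            events.append({
                "when": match.group("when"),
                "pid": int(match.group("pid")),
                "name": match.group("name"),
                "score": int(match.group("score")),
            })
    return events


# Hypothetical sample lines, NOT copied from the attached log.
SAMPLE_LINES = [
    "Jan  1 00:00:00 example-host kernel: Out of memory: Kill process 1234 (java) score 456 or sacrifice child",
    "Jan  1 00:00:01 example-host systemd: Started Session 1 of user root.",
]

for event in find_oom_kills(SAMPLE_LINES):
    print(event)
```

To scan a real file, pass an open file object, for example `find_oom_kills(open("/var/log/messages-20170226"))`.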
Bryan, could you please provide also engine.log and server.log from the time as I mentioned in Comment 7? We are not able to see engine internal processes from /var/log/messages ...
Created attachment 1258690 [details]
/var/log/ovirt-engine/server.log from the hosted-engine VM
Created attachment 1258691 [details]
/var/log/ovirt-engine/engine.log-20170223.gz from the hosted-engine VM
I've attached the server.log and the engine.log file that covers the time of the oom-killer event. (The engine.log-20170223.gz file covers from 2017-02-22 03:16:06,312-05 to 2017-02-23 02:40:22,316-05.)
What will happen on upgrade?
Do we need docs bug or release note on it?
I am not that familiar with the upgrade process yet, but I can check.
(In reply to Yaniv Kaul from comment #18)
> What will happen on upgrade?
> Do we need docs bug or release note on it?
We'll need a release note, probably.
Even if the appliance is 'yum updated', this just ships a new OVA. I'm not sure whether ovirt-hosted-engine-ha|agent|setup automatically pick this up and use the new OVA (and associated VM definition), but I doubt it...
Required parameters being received during deployment from an appliance, unless we will have it, we can't proceed with the verification.
Returning back to assigned.
Target release should be placed once a package build is known to fix an issue. Since this bug is not modified, the target version has been reset. Please use target milestone to plan a fix for an oVirt release.
(In reply to Nikolai Sednev from comment #23)
> Required parameters being received during deployment from an appliance,
> unless we will have it, we can't proceed with the verification.
> Returning back to assigned.
A new build was delivered today. You should have been on the smoke test...
(In reply to Ryan Barry from comment #21)
> Even if the appliance is 'yum updated', this just ships a new OVA. I'm not
> sure whether ovirt-hosted-engine-ha|agent|setup automatically pick this up
> and use the new OVA (and associated VM definition), but I doubt it...
No, it will just update the OVA source, but nothing will automatically happen on the running engine VM.
The user has to manually edit the VM definition in the engine, allocating more RAM, and restart the VM for the change to take effect.
rhvm-appliance-4.1.20170403.0-1.el7.noarch now brings these new default values:
Please specify the memory size of the VM in MB (Defaults to appliance OVF value): :
The following CPU types are supported by this host:
- model_Westmere: Intel Westmere Family
- model_Nehalem: Intel Nehalem Family
- model_Penryn: Intel Penryn Family
- model_Conroe: Intel Conroe Family
Please specify the CPU type to be used by the VM [model_Westmere]:
Please specify the number of virtual CPUs for the VM (Defaults to appliance OVF value): :
Moving to verified.
Components on hosts:
Linux version 3.10.0-514.16.1.el7.x86_64 (firstname.lastname@example.org) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-11) (GCC) ) #1 SMP Fri Mar 10 13:12:32 EST 2017
Linux 3.10.0-514.16.1.el7.x86_64 #1 SMP Fri Mar 10 13:12:32 EST 2017 x86_64 x86_64 x86_64 GNU/Linux
Red Hat Enterprise Linux Server release 7.3 (Maipo)