Bug 1329119 - Engine fails to start if machine's memory is reduced by 75%
Summary: Engine fails to start if machine's memory is reduced by 75%
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: ovirt-engine
Classification: oVirt
Component: Services
Version: 3.6.4.1
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ovirt-4.1.2
Target Release: 4.1.2
Assignee: Yedidyah Bar David
QA Contact: Lucie Leistnerova
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2016-04-21 09:10 UTC by KooV
Modified: 2017-05-23 08:21 UTC (History)
CC List: 11 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: On new setups, engine-setup configures the engine's JVM heap size to 25% of available memory. Neither upgrades nor backup/restore change this value. When the JVM failed because there was not enough memory, the logs did not say so.
Consequence: If the engine was set up on one machine and later run with much less memory (e.g. because the machine's memory was reduced, or because it was backed up and restored to a machine with much less memory), it failed to start, and the logs did not explain the failure, which made it hard to understand and debug.
Fix: The code that starts the engine now logs details about JVM failures, so they are more visible (including, but not limited to, the case of not enough memory). It also limits the JVM's heap size to 90% of available memory and logs a warning when it does so. This limiting can be disabled by adding a configuration file containing 'ENFORCE_ENGINE_HEAP_PARAMS=true'.
Result: The engine no longer fails to start due to not enough memory. If it does fail, the logs should make it easier to understand why.
Clone Of:
Environment:
Last Closed: 2017-05-23 08:21:30 UTC
oVirt Team: Integration
Embargoed:
rule-engine: ovirt-4.1+
ylavi: blocker-
rule-engine: exception+
ylavi: planning_ack+
sbonazzo: devel_ack+
pstehlik: testing_ack+


Links
System | ID | Private | Priority | Status | Summary | Last Updated
oVirt gerrit | 73686 | 0 | master | MERGED | packaging: services: Detect JBoss version with log level info | 2020-04-06 08:15:57 UTC
oVirt gerrit | 73687 | 0 | master | MERGED | packaging: pythonlib: Add mem.py | 2020-04-06 08:15:57 UTC
oVirt gerrit | 73688 | 0 | master | MERGED | packaging: services: Limit engine heap size | 2020-04-06 08:15:57 UTC
oVirt gerrit | 73735 | 0 | ovirt-engine-4.1 | MERGED | packaging: services: Detect JBoss version with log level info | 2020-04-06 08:15:56 UTC
oVirt gerrit | 73736 | 0 | ovirt-engine-4.1 | MERGED | packaging: pythonlib: Add mem.py | 2020-04-06 08:15:56 UTC
oVirt gerrit | 75054 | 0 | ovirt-engine-4.1 | MERGED | packaging: services: Limit engine heap size | 2020-04-06 08:15:56 UTC
oVirt gerrit | 75080 | 0 | master | MERGED | packaging: pythonlib: mem.py: Add M unit | 2020-04-06 08:15:56 UTC
oVirt gerrit | 75085 | 0 | ovirt-engine-4.1 | MERGED | packaging: pythonlib: mem.py: Add M unit | 2020-04-06 08:15:56 UTC

Description KooV 2016-04-21 09:10:07 UTC
How reproducible:


Steps to Reproduce:
1. Back up the engine on SERVER-A (RAM: 256GB).
2. Restore to SERVER-B (RAM: 8GB).
3. Restoring the files and the DB succeeds, but ovirt-engine does not start when engine-setup runs.
4. The restore process fails with this error in the log (/var/log/messages):
ovirt-engine: ERROR run:532 Error: Cannot detect JBoss version

-------------------------------
This failure is caused by the difference in memory size (256GB -> 8GB).

I solved this problem by editing /etc/ovirt-engine/engine.conf.d/10-setup-java.conf
from
ENGINE_HEAP_MIN="64565M"
ENGINE_HEAP_MAX="64565M"

to
ENGINE_HEAP_MIN="1964M"
ENGINE_HEAP_MAX="1964M"

I think the engine-setup process needs to recalculate the heap memory size
and automatically update 10-setup-java.conf.

I'm sorry, I am not familiar with English.
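
The recalculation suggested here can be sketched as follows. This is a minimal illustration, not engine-setup's actual code: the function names are hypothetical, the 25% figure is the one given in the Doc Text above, and using MemTotal from /proc/meminfo as the measure of available memory is an assumption.

```python
import re


def mem_total_mb(meminfo_text):
    """Return MemTotal in whole MB from /proc/meminfo-style text (hypothetical helper)."""
    match = re.search(r"^MemTotal:\s+(\d+)\s+kB", meminfo_text, re.MULTILINE)
    if match is None:
        raise ValueError("MemTotal not found")
    return int(match.group(1)) // 1024


def default_heap_mb(total_mb, percent=25):
    """Heap size as a percentage of total memory, in whole MB."""
    return total_mb * percent // 100


def heap_conf_lines(heap_mb):
    """Render both settings in the 10-setup-java.conf style shown above."""
    return [
        'ENGINE_HEAP_MIN="%dM"' % heap_mb,
        'ENGINE_HEAP_MAX="%dM"' % heap_mb,
    ]


# Sample input (assumed values, not taken from the reporter's machine).
sample = "MemTotal:        8044544 kB\nMemFree:          123456 kB\n"
print("\n".join(heap_conf_lines(default_heap_mb(mem_total_mb(sample)))))
```

With the sample input, MemTotal is 7856 MB and the computed heap is 1964M, the same figure as the manual edit above.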

Comment 1 Yedidyah Bar David 2016-04-21 09:25:56 UTC
This is a duplicate of bug 1299526, which was closed. Instead we have bug 1320579 to document this.

We didn't want to change this automatically, because we risk overwriting manual changes the admin made to this conf.

The main problem is that the same issue affects people who just change the amount of memory allocated to the engine VM (or even physical machine), which requires neither engine-backup nor engine-setup, just a reboot.

Perhaps the correct solution should be for the engine itself (python wrapper) to check MIN and MAX against available and reduce (say, to 90% of available or something like that) accordingly. To do this properly, we should also make the engine notify the admin about this, so that the admin can make a substantiated decision.

Sandro - what do you say?

Comment 2 Yaniv Lavi 2016-04-27 07:50:01 UTC
We should add text at the end of the restore like:
"Please review /etc/ovirt-engine/engine.conf.d/10-setup-java.conf memory allocation settings, if the manager machine you restored to has less memory than the original manager"

Comment 3 Yedidyah Bar David 2016-05-01 07:27:56 UTC
(In reply to Yaniv Dary from comment #2)
> We should add text at the end of the restore like:
> "Please review /etc/ovirt-engine/engine.conf.d/10-setup-java.conf memory
> allocation settings, if the manager machine you restored to has less memory
> than the original manager"

As I wrote in comment 1, this does not help the flow of a user reducing the amount of memory allocated to an existing engine machine, without any backup/restore.

Comment 4 Sandro Bonazzola 2016-05-02 09:57:42 UTC
Moving from 4.0 alpha to 4.0 beta since 4.0 alpha has been already released and bug is not ON_QA.

Comment 5 Sandro Bonazzola 2016-05-04 07:15:42 UTC
(In reply to Yedidyah Bar David from comment #3)
> (In reply to Yaniv Dary from comment #2)
> > We should add text at the end of the restore like:
> > "Please review /etc/ovirt-engine/engine.conf.d/10-setup-java.conf memory
> > allocation settings, if the manager machine you restored to has less memory
> > than the original manager"
> 
> As I wrote in comment 1, this does not help the flow of a user reducing the
> amount of memory allocated to an existing engine machine, without any
> backup/restore.

If we want the engine to issue an alert in the web admin, I think this should be re-targeted to 4.1 since it won't be just a text bug.
I suggest to add the warning text here and then open an RFE on engine to detect when memory is running out and issue a warning in the web ui.

Comment 6 Yedidyah Bar David 2016-05-04 07:18:28 UTC
(In reply to Sandro Bonazzola from comment #5)
> 
> If we want the engine to issue an alert in the web admin, I think this
> should be re-targeted to 4.1 since it won't be just a text bug.
> I suggest to add the warning text here and then open an RFE on engine to
> detect when memory is running out and issue a warning in the web ui.

Well, we can write it to the log for now and open an RFE for ui.

More important, IMO, is to limit them to what is currently available on engine start, so that the engine manages to start. People do not always look at log files.

Comment 7 Pavel Stehlik 2016-05-04 07:35:03 UTC
When a storage admin adds 10 new LUNs to a storage server, or an admin adds a new host to a physical rack, do we adjust RHEV automatically? Do we inform the RHEV admin about it? So why should we babysit this specific action?
 I am for:
1) proposal c#2
2) DOC text

c#6 - I would also be partially happy with the current situation (engine does not start), as long as there is a clear message in the log about why it does not start. If the admin doesn't care about log files, it's his problem.
 The issue is the misleading message about the JBoss version. A clear message would be enough, e.g.: "...service can't start - not enough mem, check /etc/ovirt-engine/engine.conf.d/10-setup-java.conf; ENGINE_HEAP_MIN/ENGINE_HEAP_MAX.."

Comment 8 Yaniv Lavi 2016-05-04 07:51:23 UTC
I agree that we should not babysit the admin. If he changes the machine memory allocation, he should make sure it works. Since doing p2v is a more complex flow that is more prone to error, we want to add a text for this. This is the scope and the only required change.

Comment 9 Yedidyah Bar David 2016-05-04 08:37:50 UTC
Mind you, this is a regression. Not sure why it's marked an RFE. A regression on the engine (or engine-setup if you want), unrelated to restore.

It was introduced by bug 1185411 (3.6) and bug 1188971 (3.5.1 clone).

In <= 3.5.0, people could do:

Install and setup an engine in a 128GB ram VM

power off

edit the VM to have 16GB ram (which is our recommended value)

power on

and all will continue working.

In >= 3.5.1 this breaks.

Comment 10 Sandro Bonazzola 2016-05-04 08:50:39 UTC
(In reply to Yedidyah Bar David from comment #9)

Given the regression, even if just changing the text covers some of the possible cases, I think the proposed change to ovirt-engine.py to handle the case would be nice to have.

Comment 11 Yaniv Lavi 2016-05-04 12:30:55 UTC
(In reply to Sandro Bonazzola from comment #10)
> (In reply to Yedidyah Bar David from comment #9)
> 
> given the regression, even if just changing the text cover some of the
> possible cases, I think that the proposed change to ovirt-engine.py to
> handle the case will be nice to have.

I don't think we should change memory setting after initial install. It's up to the admin to do that.

Comment 12 Yedidyah Bar David 2016-05-04 12:40:15 UTC
(In reply to Yaniv Dary from comment #11)
> 
> I don't think we should change memory setting after initial install. It's up
> to the admin to do that.

To clarify, my suggestion wasn't to change the setting (in the config file) but to override it at runtime. That is, when engine starts, if heap min/max are too large, simply use less than that, log, and as an RFE notify admin in the ui.

Comment 13 Yaniv Lavi 2016-05-04 13:19:00 UTC
(In reply to Yedidyah Bar David from comment #12)
> (In reply to Yaniv Dary from comment #11)
> > 
> > I don't think we should change memory setting after initial install. It's up
> > to the admin to do that.
> 
> To clarify, my suggestion wasn't to change the setting (in the config file)
> but to override it at runtime. That is, when engine starts, if heap min/max
> are too large, simply use less than that, log, and as an RFE notify admin in
> the ui.

I have no problem with it failing and having the boot log say that it takes too much memory, so that the admin will need to change this manually.

Comment 14 Yedidyah Bar David 2016-05-04 13:31:15 UTC
Changing some metadata for visibility.

Comment 15 Red Hat Bugzilla Rules Engine 2016-05-05 00:19:52 UTC
This bug report has Keywords: Regression or TestBlocker.
Since no regressions or test blockers are allowed between releases, it is also being identified as a blocker for this release. Please resolve ASAP.

Comment 16 Yaniv Lavi 2016-05-05 13:44:35 UTC
After discussing with Sandro, we will do two fixes:
1. At restore time, tell the admin to check this setting if the amount allocated is larger than 4 GB.
2. With new installation capping the allocation to 4 GB.
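
Item 1 could look roughly like the sketch below. It is only an illustration of the proposal: the helper names and the message wording are hypothetical, and it assumes the settings appear as NAME="<number><K|M|G>" lines, as in the 10-setup-java.conf excerpt in the description.

```python
import re

# Multipliers to convert a value with a K/M/G suffix into MB.
_UNIT_TO_MB = {"K": 1.0 / 1024, "M": 1, "G": 1024}
_HEAP_RE = re.compile(r'^(ENGINE_HEAP_(?:MIN|MAX))="(\d+)([KMG])"$')


def heap_settings_mb(conf_text):
    """Parse ENGINE_HEAP_MIN/MAX from conf text into a {name: MB} dict."""
    settings = {}
    for line in conf_text.splitlines():
        match = _HEAP_RE.match(line.strip())
        if match:
            name, value, unit = match.groups()
            settings[name] = int(int(value) * _UNIT_TO_MB[unit])
    return settings


def restore_warnings(conf_text, threshold_mb=4096):
    """Return one message per heap setting above the threshold (4 GB by default)."""
    return [
        "Please review %s (%d MB) in 10-setup-java.conf if this machine "
        "has less memory than the original one" % (name, mb)
        for name, mb in sorted(heap_settings_mb(conf_text).items())
        if mb > threshold_mb
    ]


conf = 'ENGINE_HEAP_MIN="64565M"\nENGINE_HEAP_MAX="64565M"\n'
for message in restore_warnings(conf):
    print(message)
```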

Comment 17 Yedidyah Bar David 2016-05-08 05:58:42 UTC
(In reply to Yaniv Dary from comment #16)
> After discussing with Sandro, we will do two fixes:
> 1. At restore time, tell the admin to check this setting if the amount
> allocated is larger than 4 GB.

OK

> 2. With new installation capping the allocation to 4 GB.

What about upgrade? Notify? Cap?

What about changing memory of machine without backup/restore?

Comment 18 Yaniv Lavi 2016-05-08 06:11:21 UTC
(In reply to Yedidyah Bar David from comment #17)
> (In reply to Yaniv Dary from comment #16)
> > After discussing with Sandro, we will do two fixes:
> > 1. At restore time, tell the admin to check this setting if the amount
> > allocated is larger than 4 GB.
> 
> OK
> 
> > 2. With new installation capping the allocation to 4 GB.
> 
> What about upgrade? Notify? Cap?

No changes in upgrade.

> 
> What about changing memory of machine without backup/restore?

We will not address this use case. The admin will need to fix the values manually.

Comment 19 Yedidyah Bar David 2016-05-08 07:56:01 UTC
(In reply to Yaniv Dary from comment #18)
> (In reply to Yedidyah Bar David from comment #17)
> > What about changing memory of machine without backup/restore?
> 
> We will not address this use case. The admin will need to fix the values
> manually.

Not even better logging?

Comment 20 Yaniv Lavi 2016-05-08 10:37:23 UTC
(In reply to Yedidyah Bar David from comment #19)
> (In reply to Yaniv Dary from comment #18)
> > (In reply to Yedidyah Bar David from comment #17)
> > > What about changing memory of machine without backup/restore?
> > 
> > We will not address this use case. The admin will need to fix the values
> > manually.
> 
> Not even better logging?

If the JBoss log is not good enough with not enough memory, please open a bug on JBoss.

Comment 21 Yedidyah Bar David 2016-05-08 12:56:11 UTC
(In reply to Yaniv Dary from comment #20)
> 
> If the JBoss log is not good enough with not enough memory, please open a
> bug on JBoss.

JBoss can't do anything about this, java fails too early for handling this in JBoss.

ovirt-engine.py runs first 'java ovirt-engine-version -v'.

If I set e.g.:
ENGINE_HEAP_MIN="65536M"
ENGINE_HEAP_MAX="65536M"

This will fail, with this in the log:

-- Unit ovirt-engine.service has begun starting up.
May 08 13:26:14 didi-f19-engine.eng.lab.tlv.redhat.com ovirt-engine.py[4924]: 2016-05-08 13:26:14,504 ovirt-engine: ERROR run:532 Error: Cannot detect JBoss version
May 08 13:26:14 didi-f19-engine.eng.lab.tlv.redhat.com systemd[1]: ovirt-engine.service: Main process exited, code=exited, status=1/FAILURE

If I ask for debug logging, by doing:

echo 'OVIRT_SERVICE_DEBUG=1' > /etc/sysconfig/ovirt-engine

I get:

May 08 15:45:37 didi-f19-engine.eng.lab.tlv.redhat.com ovirt-engine.py[6558]: 2016-05-08 15:45:37,074 ovirt-engine: DEBUG _detectJBossVersion:235 Return code: 1,  | stdout: '[u'#', u'# There is insufficient memory for the Java Runtime Environment to continue.', u'# Native memory allocation (mmap) failed to map 22906142720 bytes for committing reserved memory.', u'# An error report file with more information is saved as:', u'# /tmp/hs_err_pid6580.log'],  | stderr: '[u"OpenJDK 64-Bit Server VM warning: INFO: os::commit_memory(0x00007f087eb00000, 22906142720, 0) failed; error='Cannot allocate memory' (errno=12)"]'
May 08 15:45:37 didi-f19-engine.eng.lab.tlv.redhat.com ovirt-engine.py[6558]: 2016-05-08 15:45:37,076 ovirt-engine: ERROR run:532 Error: Cannot detect JBoss version
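
To illustrate how the misleading error could be replaced by a clearer one, here is a minimal sketch that inspects the output of the failed java call. The function name and the message text are hypothetical and this is not the code in ovirt-engine.py; the two marker strings are taken from the debug output above.

```python
def explain_jvm_failure(returncode, stdout_lines, stderr_lines):
    """Build a log message for a failed java call (hypothetical helper).

    Returns None when the call succeeded.
    """
    if returncode == 0:
        return None
    output = list(stdout_lines) + list(stderr_lines)
    # Strings seen in the JVM output when it cannot allocate its heap.
    markers = ("insufficient memory", "Cannot allocate memory")
    if any(marker in line for line in output for marker in markers):
        return (
            "Java failed to start: not enough memory. Check "
            "ENGINE_HEAP_MIN/ENGINE_HEAP_MAX in "
            "/etc/ovirt-engine/engine.conf.d/10-setup-java.conf"
        )
    # Any other failure: report the raw output instead of hiding it.
    return "Java failed with return code %d: %s" % (returncode, " | ".join(output))


print(explain_jvm_failure(
    1,
    ["# There is insufficient memory for the Java Runtime Environment to continue."],
    [],
))
```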

Comment 22 Yaniv Lavi 2016-05-23 13:18:47 UTC
oVirt 4.0 beta has been released, moving to RC milestone.

Comment 23 Yaniv Lavi 2016-05-23 13:25:44 UTC
oVirt 4.0 beta has been released, moving to RC milestone.

Comment 24 Yaniv Lavi 2016-10-30 12:26:46 UTC
Please address comment 16 and we should also add a meaningful message about lack of memory in engine start, if the memory size is reduced.

Comment 29 Yedidyah Bar David 2017-03-07 16:01:08 UTC
Now pushed 3 patches [1][2][3].

Patch [1] can be merged as-is, and is enough for more easily finding the root cause of the failure in the logs.

Patch [3] requires [2]. If the configured heap min/max are more than available memory, they are set to available memory, and a warning is written to the log.

Sandro - do we want anything else? IMO that's enough. Obviously we can add more code checking/warning/fixing this in engine-backup and/or engine-setup, IMO it's not worth it.

Roy - added you as reviewer for [3]. If you also want to limit it to 4GB, that's ok - but I think this should be discussed separately, in another bug, which will also change the calculation in engine-setup for new setups (and where we might discuss changing this on upgrade).

[1] https://gerrit.ovirt.org/73686 packaging: services: Detect JBoss version with log level info

[2] https://gerrit.ovirt.org/73687 packaging: pythonlib: Add mem.py

[3] https://gerrit.ovirt.org/73688 packaging: services: Limit engine heap size
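
The runtime limit can be sketched as below. This is an illustration under stated assumptions, not the merged patch: the function name and parameters are hypothetical, the 90% figure and the ENFORCE_ENGINE_HEAP_PARAMS override come from the Doc Text of this bug, and the warning wording is modeled on the message quoted in comment 33.

```python
def effective_heap_mb(configured_mb, available_mb, percent=90, enforce=False):
    """Return (heap_mb, warning) for the heap size to use at engine start.

    The configured value is kept if it fits within `percent` of available
    memory, or if `enforce` is set (standing in for
    ENFORCE_ENGINE_HEAP_PARAMS=true). Otherwise the value is capped and a
    warning string is returned for the log. The config file is not touched.
    """
    limit_mb = available_mb * percent // 100
    if enforce or configured_mb <= limit_mb:
        return configured_mb, None
    warning = (
        "ENGINE_HEAP_MAX is [%dM], total available memory is %d MB. "
        "Setting to %d MB." % (configured_mb, available_mb, limit_mb)
    )
    return limit_mb, warning


heap_mb, warning = effective_heap_mb(65536, 8000)
if warning:
    print(warning)
print(heap_mb)
```

Returning the warning instead of logging it directly keeps the sketch independent of any particular logging setup.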

Comment 30 Sandro Bonazzola 2017-03-07 16:09:21 UTC
(In reply to Yedidyah Bar David from comment #29)
> Sandro - do we want anything else? IMO that's enough. Obviously we can add
> more code checking/warning/fixing this in engine-backup and/or engine-setup,
> IMO it's not worth it.

I agree it should be enough

Comment 31 Yedidyah Bar David 2017-04-03 12:15:08 UTC
Not sure how the bug moved to MODIFIED on Mar 31 when https://gerrit.ovirt.org/73688 was still not merged (was merged today, 4 days later).

Comment 32 Yedidyah Bar David 2017-04-03 14:25:45 UTC
Sorry, there is a bug. Forgot to add M. :-(
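
This comment does not say where the unit was missing. As a general illustration of why the suffix matters: the JVM's -Xms and -Xmx options treat a bare number as bytes, so a value meant as megabytes needs an explicit M. The helper below is hypothetical and only shows the formatting.

```python
def jvm_heap_args(min_mb, max_mb):
    """Return -Xms/-Xmx options with an explicit M (megabytes) suffix.

    Without the suffix, the JVM would read the numbers as bytes.
    """
    return ["-Xms%dM" % min_mb, "-Xmx%dM" % max_mb]


print(" ".join(jvm_heap_args(3410, 3410)))
```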

Comment 33 Lucie Leistnerova 2017-05-12 14:32:24 UTC
Restoring an engine from 16GB to 4GB with an 8GB heap size shows a warning in the setup log:
2017-05-12 15:56:57,730+0200 ovirt-engine: WARNING daemonSetup:326 ENGINE_HEAP_MAX is [8192M], total available memory is 3789 MB. Setting to 3410 MB.
Engine starts successfully and runs with MaxHeapSize = 3575644160 (3410.0MB).
10-setup-java.conf still contains ENGINE_HEAP_MAX="8192M", but that doesn't affect the run of the engine, and engine-setup will change it again.


verified in ovirt-engine-4.1.2.1-0.1.el7.noarch
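
The figures in the warning and in MaxHeapSize are consistent with the 90% limit from the Doc Text, as this short check shows. It is plain arithmetic and does not call any engine code.

```python
available_mb = 3789                         # "total available memory" from the warning
capped_mb = available_mb * 90 // 100        # 90% limit, rounded down to whole MB
max_heap_bytes = capped_mb * 1024 * 1024    # MB to bytes, as reported in MaxHeapSize

print(capped_mb)       # 3410
print(max_heap_bytes)  # 3575644160
```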

Comment 34 Yedidyah Bar David 2017-05-14 10:39:56 UTC
(In reply to Lucie Leistnerova from comment #33)
> Restoring engine 16GB -> 4GB with 8GB heap size shows warning in setup log
> 2017-05-12 15:56:57,730+0200 ovirt-engine: WARNING daemonSetup:326
> ENGINE_HEAP_MAX is [8192M], total available memory is 3789 MB. Setting to
> 3410 MB.

Is this engine-setup logs, or engine log?

> Engine starts successfully and runs with MaxHeapSize = 3575644160 (3410.0MB)
> In 10-setup-java.conf stayed ENGINE_HEAP_MAX="8192M", but it doesn't affect
> the run of the engine and engine-setup will change it again.

I do not think engine-setup changes it. Please clarify.

IMO you should get the above warning in the engine log on every restart of the engine service, until you manually change these values in 10-setup-java.conf.

> 
> 
> verified in ovirt-engine-4.1.2.1-0.1.el7.noarch

Comment 35 Lucie Leistnerova 2017-05-15 10:19:30 UTC
(In reply to Yedidyah Bar David from comment #34)
> (In reply to Lucie Leistnerova from comment #33)
> > Restoring engine 16GB -> 4GB with 8GB heap size shows warning in setup log
> > 2017-05-12 15:56:57,730+0200 ovirt-engine: WARNING daemonSetup:326
> > ENGINE_HEAP_MAX is [8192M], total available memory is 3789 MB. Setting to
> > 3410 MB.
> 
> Is this engine-setup logs, or engine log?
> 

In the engine-setup log. From comment 29 I understood that the warning is written to this log. There is no warning in the engine log, but it can be seen with systemctl status after every restart.

> > Engine starts successfully and runs with MaxHeapSize = 3575644160 (3410.0MB)
> > In 10-setup-java.conf stayed ENGINE_HEAP_MAX="8192M", but it doesn't affect
> > the run of the engine and engine-setup will change it again.
> 
> I do not think engine-setup changes it. Please clarify.

I'm sorry, I didn't write it clearly. engine-setup changes the heap size for the running engine, not the configuration file.

> 
> IMO you should get the above warning in the engine log on every restart of
> the engine service, until you manually change these values in
> 10-setup-java.conf.
> 
> > 
> > 
> > verified in ovirt-engine-4.1.2.1-0.1.el7.noarch

And info about detecting JBoss is also alright and shows more information.

