Bug 1185411

Summary: ENGINE_HEAP_MAX default value as 1G must be changed
Product: Red Hat Enterprise Virtualization Manager
Reporter: Marina Kalinin <mkalinin>
Component: ovirt-engine
Assignee: Yedidyah Bar David <didi>
Status: CLOSED ERRATA
QA Contact: Jiri Belka <jbelka>
Severity: high
Docs Contact:
Priority: high
Version: 3.5.0
CC: bazulay, didi, gklein, iheim, juan.hernandez, lsurette, mkalinin, mtessun, rbalakri, Rhev-m-bugs, sbonazzo, sherold, sradco, yeylon, ykaul, ylavi, yobshans
Target Milestone: ovirt-3.6.0-rc
Keywords: ZStream
Target Release: 3.6.0
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Previously, the Manager's JVM heap size was configured to be 1GB by default. As a result, larger setups, such as those with hundreds of virtual machines or more, made the Manager run out of heap memory, and required manual configuration to increase the heap size. Now, engine-setup automatically configures the heap size to be the maximum of 1GB and 1/4 of available memory. Larger setups only need to use a machine with enough memory, as is already recommended, and heap size configuration will be done automatically, thus preventing such failures and not requiring manual configuration.
Story Points: ---
Clone Of:
: 1188971 1188972
Environment:
Last Closed: 2016-03-09 20:54:57 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Integration
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 902971, 1188971, 1188972

Description Marina Kalinin 2015-01-23 16:45:00 UTC
Right now, the ENGINE_HEAP_MAX default value is 1G.
This makes no sense at all given that the minimal requirement for a RHEV-M machine is at least 16G.

More and more customers are experiencing out-of-memory errors and have to increase this value manually, as per this solution:
https://access.redhat.com/articles/1256093 .

Why not avoid this problem in the first place and change the default value to something more reasonable?

I have 2 suggestions:
1) Change it to be 4G by default.
2) Make it one of the installer questions. Suggest a value based on available RAM/2, but let the user decide.

Thank you!

P.S. ENGINE_HEAP_MIN would need to be modified as well, probably.

Comment 1 Sandro Bonazzola 2015-01-26 14:47:35 UTC
Looking at the article, I think we should align to the article values by default.

ENGINE_HEAP_MIN=2g
ENGINE_HEAP_MAX=4g

Juan any reason for not doing that?

Comment 2 Juan Hernández 2015-01-26 14:51:47 UTC
I don't have any objection to making that change.

Comment 4 Yedidyah Bar David 2015-01-27 09:05:29 UTC
Shirly, what should the min/max be for dwhd and for the jboss running Reports?

Comment 5 Yaniv Lavi 2015-01-28 07:16:41 UTC
(In reply to Yedidyah Bar David from comment #4)
> Shirly, what should the min/max be for dwhd and for the jboss running
> Reports?

4Gb for JBoss running reports.
Link: http://community.jaspersoft.com/wiki/hardware-requirements-jasperreports-server#Hardwarerequirements-JVMHeapMemory

The default 1G for DWH should be enough; we could increase it to 2G to be safe.

Comment 6 Yedidyah Bar David 2015-02-01 20:54:01 UTC
Some comments:

1. Currently (that is, before [2] was merged), the engine defaults to 1G min/max heap size, reports does the same, and dwh does not have defaults - thus, relies on the JRE defaults, which seem to me to be more-or-less: max heap size = 0.25 of RAM. [2] changes the engine defaults to be min 2G, max 4G.

2. Currently it's possible, although perhaps slow, to run all of them together on a 1G machine, e.g. for testing. OTOH, only dwhd automatically enjoys the larger heap configured by default if running on a larger machine. So a machine with e.g. 16G (our recommended size) will have: engine 1G, reports 1G, dwhd 4G. Rest will be used for caching, and if postgresql is local, for its cache too (which helps too of course).

3. I intend to have engine-setup configure the default for each of them to be the maximum of 1G and 0.25 of RAM. This way, a machine with 1-4G will behave more-or-less as now, and one with 16G will allow each to have 4G.

4. I suggest not querying the user about these, as imo we already have enough questions and the default should be ok, but keeping these values in the answer file as well, so that they can easily be changed.

5. We should probably not override existing configs on upgrades, so as to not change configs done by following [1], but do change if we do not find these configured.

6. This means that we'll also ignore [2] in practice. We can add another answer file key to not configure this, thus not override [2], not sure we want.

7. I am only somewhat concerned about the case of separate machines for engine/dwh/reports - I thought about doing something complex for this, not sure it's worth it.

Comments are welcome.

[1] https://access.redhat.com/solutions/448153
[2] http://gerrit.ovirt.org/37293
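
Points 3 and 5 above could be sketched roughly as follows. This is only an illustration, not the code in the patches: the function name choose_heap_mb and its parameters are hypothetical, and point 3 is read as "the larger of 1G and a quarter of RAM", which is what its two examples (1-4G behaves as now, 16G gives 4G) imply.

```python
from typing import Optional

MIN_HEAP_MB = 1024  # the 1G floor from point 3


def choose_heap_mb(total_memory_mb: int, existing_heap_mb: Optional[int] = None) -> int:
    """Return the heap size (in MB) that setup would write.

    Hypothetical helper: an already configured value is kept (point 5),
    otherwise the larger of 1G and a quarter of RAM is used (point 3).
    """
    if existing_heap_mb is not None:
        # Upgrade case: do not override a value the admin set manually.
        return existing_heap_mb
    return max(MIN_HEAP_MB, total_memory_mb // 4)


# (RAM in MB, previously configured heap in MB or None)
for ram_mb, existing in [(2048, None), (16384, None), (16384, 2048)]:
    print(ram_mb, existing, choose_heap_mb(ram_mb, existing))
```

Keeping the "existing value wins" check first means a heap that was raised by hand following [1] survives an upgrade unchanged.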

Comment 7 Yedidyah Bar David 2015-02-01 22:01:27 UTC
Marina - does this sound reasonable? Do you anticipate cases that will want to allocate less than e.g. 8GB for the engine machine but still want a 2GB heap or more?

Comment 8 Yedidyah Bar David 2015-02-02 16:03:50 UTC
Gil, same question to you. Please try to stress-test an engine/dwh/reports setup with default values and see that you get errors, and then another test with the patches applied (I'll soon push patches for dwh and reports). I suggest two instances, one with 8GB and one with 16.

Comment 9 Marina Kalinin 2015-02-02 16:15:44 UTC
Didi, thanks for the irc chat today.
I didn't take dwh and reports into consideration when I reported the bug.
The suggestion is to allocate max(1, 0.25 * ram) for each process: engine, dwh, reports. This will leave the OS with 0.25 of RAM available for postgres and other processes. Will this be enough? I do not know. Scale tests by QE would be a great idea.

I will also check with the last customer (internal RH) that had to increase that value and update with their setup. And see if they would be willing to run some tests for us.
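
To make the arithmetic concrete, here is a small sketch of the memory budget under that suggestion. It assumes all three processes run on the same machine and that each gets max(1G, 0.25 * RAM); the names heap_gb and left_for_os_gb are made up for the example, and only heap is counted, not the rest of each JVM's footprint.

```python
PROCESSES = ("engine", "dwh", "reports")


def heap_gb(ram_gb: float) -> float:
    """Per-process heap under the suggestion: max(1, 0.25 * ram)."""
    return max(1.0, 0.25 * ram_gb)


def left_for_os_gb(ram_gb: float) -> float:
    """RAM left for the OS, postgres and caches after the three heaps.

    A negative result means the heaps alone exceed physical RAM.
    """
    return ram_gb - len(PROCESSES) * heap_gb(ram_gb)


for ram in (2, 4, 8, 16):
    print(f"{ram:>2}G RAM: heap {heap_gb(ram):.1f}G each, {left_for_os_gb(ram):.1f}G left")
```

From 4G upward this leaves a quarter of RAM for everything else; below 4G the 1G floor per process takes a larger share, and on a 2G machine the three heaps together already exceed physical memory.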

Comment 10 Gil Klein 2015-02-02 16:26:01 UTC
Yuri, please start working on a test to tackle this request.

Please add your result to the BZ

Comment 11 Yedidyah Bar David 2015-02-03 13:52:02 UTC
Following a private talk with Yaniv, I am not pushing any change for dwh currently. He said that in 3.5 they made some improvements that should make it not need much ram, and anyway it currently uses the jvm's default which is more-or-less what we intended to try anyway.

Comment 13 Yedidyah Bar David 2015-02-08 14:09:47 UTC
(In reply to Yuri Obshansky from bug 1190466 comment #1)
> RHEV-M's JVM Heap Size value is very depended on setup (how many
> Hosts/Storages and VMs). I didn't change default value of 1G during my
> performance tests and I didn't encounter with Out Of Memory exception in
> spite of using non-powerful server for engine - 8 CPUs and 16 G RAM.

I wouldn't call this "non-powerful". It's probably considerably less powerful than a modern physical server an organization might purchase today for such a use, but I am not at all certain about a virtual one allocated for the same use.

We say that 4GB is the minimum and 16GB is recommended. I'd expect most customers to go somewhere in between. It would also be nice if, as a result of this discussion/bug/fix, we were able to provide some more detailed recommendations, including expected memory use per system size (measured in number of hosts/VMs/other objects/etc) and how the number of CPU cores affects responsiveness (in whatever metric).

> My
> setup was 1 DC/Cluster/1-2 Hosts/1-2 Storages/ 100-200 VMs. 

I'd expect large customers to be much larger than that. I am pretty certain that we have customers with a single engine managing 10 times that.

> My opinion is:
> - we don't need change default value of 1G. It will be required only on
> specific configuration which could be done by customer as well.

The whole point of this bug is to prevent the customer from needing to suffer downtime, then have to wait for support etc., until their system is properly configured.

> - we need perform tuning tests and publish tuning tips.

Indeed.

> There is impossible
> to test all configurations thus we require an input from our support with
> customers common used RHEV-M configuration and setup.

I agree it's impossible, but perhaps we can do some more.

Can we, for a start, simulate, say, 20 hosts with 1000 VMs?

Comment 15 Yedidyah Bar David 2015-02-08 14:24:52 UTC
(In reply to Yedidyah Bar David from comment #13)
> (In reply to Yuri Obshansky from bug 1190466 comment #1)
> > - we need perform tuning tests and publish tuning tips.

Just to clarify:

The patches are ready. What we need now is numbers.

Once we have the numbers, which we need anyway if we want to publish such tips, making the code roughly follow these tips is relatively little work.

Comment 16 Sandro Bonazzola 2015-02-20 11:08:26 UTC
Automated message: can you please update doctext or set it as not required?

Comment 17 Yedidyah Bar David 2015-02-22 11:09:50 UTC
doc text copied from 3.5 bug 1188971

Comment 18 Jiri Belka 2015-04-17 15:44:50 UTC
ok, ovirt-engine-setup-base-3.6.0-0.0.master.20150412172306.git55ba764.el6.noarch

25% of 4839 is 1209, thus ok.

[root@jb-ovirt36 ~]# grep totalMemory /var/log/ovirt-engine/setup/ovirt-engine-setup-20150417115209-i59eqx.log                                                                                                      
2015-04-17 11:52:09 DEBUG otopi.context context.dumpEnvironment:509 ENV OVESETUP_CONFIG/totalMemoryMB=NoneType:'None'
2015-04-17 11:52:12 DEBUG otopi.context context.dumpEnvironment:509 ENV OVESETUP_CONFIG/totalMemoryMB=NoneType:'None'
2015-04-17 11:52:13 DEBUG otopi.context context.dumpEnvironment:509 ENV OVESETUP_CONFIG/totalMemoryMB=int:'4839'
2015-04-17 11:58:23 DEBUG otopi.context context.dumpEnvironment:509 ENV OVESETUP_CONFIG/totalMemoryMB=int:'4839'
2015-04-17 12:00:16 DEBUG otopi.context context.dumpEnvironment:509 ENV OVESETUP_CONFIG/totalMemoryMB=int:'4839'

[root@jb-ovirt36 ~]# grep HEAP /etc/ovirt-engine/engine.conf.d/10-setup-java.conf 
ENGINE_HEAP_MIN="1209M"
ENGINE_HEAP_MAX="1209M"
[root@jb-ovirt36 ~]# ps aux | grep java
ovirt    22710  0.9 15.9 3009980 788728 ?      Sl   12:00   3:08 ovirt-engine -server -XX:+TieredCompilation -Xms1209M -Xmx1209M -XX:PermSize=256m -XX:MaxPermSize=256m -Djava.awt.headless=true -Dsun.rmi.dgc.client.gcInterval=3600000 -Dsun.rmi.dgc.server.gcInterval=3600000 -Djsse.enableSNIExtension=false -Djava.security.krb5.conf=/etc/ovirt-engine/krb5.conf -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/var/log/ovirt-engine/dump -Djava.util.logging.manager=org.jboss.logmanager -Dlogging.configuration=file:///var/lib/ovirt-engine/jboss_runtime/config/ovirt-engine-logging.properties -Dorg.jboss.resolver.warning=true -Djboss.modules.system.pkgs=org.jboss.byteman -Djboss.modules.write-indexes=false -Djboss.server.default.config=ovirt-engine -Djboss.home.dir=/usr/share/ovirt-engine-jboss-as -Djboss.server.base.dir=/usr/share/ovirt-engine -Djboss.server.data.dir=/var/lib/ovirt-engine -Djboss.server.log.dir=/var/log/ovirt-engine -Djboss.server.config.dir=/var/lib/ovirt-engine/jboss_runtime/config -Djboss.server.temp.dir=/var/lib/ovirt-engine/jboss_runtime/tmp -Djboss.controller.temp.dir=/var/lib/ovirt-engine/jboss_runtime/tmp -jar /usr/share/ovirt-engine-jboss-as/jboss-modules.jar -mp /var/lib/ovirt-engine/jboss_runtime/modules/00-modules-common:/var/lib/ovirt-engine/jboss_runtime/modules/01-ovirt-engine-jboss-as-modules -jaxpmodule javax.xml.jaxp-provider org.jboss.as.standalone -c ovirt-engine.xml
ovirt    22896  0.5  2.4 2682008 122296 ?      Sl   12:00   1:50 ovirt-engine-dwhd -Dorg.ovirt.engine.dwh.settings=/tmp/tmp84E7Du/settings.properties -classpath /usr/share/ovirt-engine-dwh/lib/*::/usr/share/java/dom4j.jar:/usr/share/java/commons-collections.jar:/usr/share/java/postgresql-jdbc.jar ovirt_engine_dwh.historyetl_3_6.HistoryETL --context=Default
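
The same check can be scripted. The sketch below parses the two HEAP lines shown above and compares them with a quarter of totalMemoryMB from the setup log. The helper name parse_heap_mb is hypothetical, and rounding down is an assumption that happens to match 4839 -> 1209.

```python
import re

# Content of 10-setup-java.conf as shown by the grep above.
CONF_TEXT = '''ENGINE_HEAP_MIN="1209M"
ENGINE_HEAP_MAX="1209M"
'''
TOTAL_MEMORY_MB = 4839  # OVESETUP_CONFIG/totalMemoryMB from the setup log


def parse_heap_mb(conf_text: str) -> dict:
    """Extract ENGINE_HEAP_* values that are given in megabytes ("<n>M")."""
    pattern = re.compile(r'^(ENGINE_HEAP_(?:MIN|MAX))="(\d+)M"$', re.MULTILINE)
    return {key: int(value) for key, value in pattern.findall(conf_text)}


expected_mb = max(1024, TOTAL_MEMORY_MB // 4)  # assumed rounding: floor
heaps = parse_heap_mb(CONF_TEXT)
print(heaps, expected_mb)
```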

Comment 19 Jiri Belka 2015-04-17 15:46:29 UTC
fyi it's wrong doctext:

Fix: 

engine-setup now automatically configures the heap size to be the maximum of 1GB
                                                                  ^^^ minimum!
and 1/4 of available memory."

Comment 20 Yedidyah Bar David 2015-04-19 05:25:34 UTC
(In reply to Jiri Belka from comment #19)
> fyi it's wrong doctext:
> 
> Fix: 
> 
> engine-setup now automatically configures the heap size to be the maximum of
> 1GB
>                                                                   ^^^
> minimum!
> and 1/4 of available memory."

Why minimum?

If you have a 8GB machine, you want 2GB, not 1GB - the maximum of 1 and 2.

BTW, the doc people dropped the word 'maximum' altogether from the 3.5 bug 1188971 doctext, perhaps that's best...

Comment 23 errata-xmlrpc 2016-03-09 20:54:57 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHEA-2016-0376.html