Red Hat Bugzilla – Bug 1026428
RFE: Ship with RHQ server tuned for higher capacity
Last modified: 2014-09-05 11:40:24 EDT
+++ This bug was initially created as a clone of Bug #1025844 +++
Description of problem:
The RHQ server, if used with over a hundred (or a thousand) agents, cannot handle enough load out of the box to cleanly do an upgrade.
This load may not be typical for customers, but adjusting the defaults seems unlikely to do much harm even on smaller installations.
Although I cannot identify what is most important, the following need to be tuned:
1) Increase the default memory size of the storage node. I would say that for about 1000 nodes, around 5GB of heap memory for Cassandra is good, though I think the installer should simply pick a good number based on the local free memory size.
01:44:14,249 ERROR [org.jboss.as.ejb3.invocation] (http-/0.0.0.0:7080-64) JBAS014134: EJB Invocation failed on component ResourceManagerBean for method public abstract void org.rhq.enterprise.server.resource.ResourceManagerLocal.addResourceError(org.rhq.core.domain.resource.ResourceError): javax.ejb.EJBException: JBAS014516: Failed to acquire a permit within 5 MINUTES
at org.jboss.as.ejb3.pool.strictmax.StrictMaxPool.get(StrictMaxPool.java:109) [jboss-as-ejb3-7.2.0.Alpha1-redhat-4.jar:7.2.0.Alpha1-redhat-4]
at org.jboss.as.ejb3.component.pool.PooledInstanceInterceptor.processInvocation(PooledInstanceInterceptor.java:47) [jboss-as-ejb3-7.2.0.Alpha1-redhat-4.jar:7.2.0.Alpha1-redhat-4]
2) Increase the size of the EJB pool. During the 4.5.1 -> 4.9 upgrade, the number of inventory requests went up substantially in a short time. This caused many timeouts of the "Failed to acquire a permit" kind shown in the log above.
<strict-max-pool name="slsb-strict-max-pool" max-pool-size="2000" instance-acquisition-timeout="1" instance-acquisition-timeout-unit="MINUTES"/>
3) Increase the out-of-box communication limits:
Version-Release number of selected component (if applicable): 4.9 (from 4.5.1)
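Item 1 above suggests that the installer pick the storage node heap size from the local free memory. A minimal sketch of such a heuristic follows. The function names, the 50% fraction and the 512 MB floor are assumptions made for illustration, not anything RHQ ships. The 5120 MB cap comes from the 5GB figure suggested above, and reading `/proc/meminfo` assumes a Linux host.

```python
import re


def parse_meminfo_kb(text, field="MemFree"):
    """Return the value (in kB) of one field from /proc/meminfo-style text."""
    match = re.search(rf"^{re.escape(field)}:\s+(\d+)\s+kB", text, re.MULTILINE)
    if match is None:
        raise ValueError(f"{field} not found")
    return int(match.group(1))


def suggest_heap_mb(free_mb, fraction=0.5, floor_mb=512, cap_mb=5120):
    """Suggest a heap size in MB: a fraction of free memory, clamped to a range.

    fraction and floor_mb are illustrative assumptions; cap_mb reflects the
    ~5GB suggested in this report for about 1000 nodes. Note that the floor
    can exceed the free memory on a very small host.
    """
    return max(floor_mb, min(cap_mb, int(free_mb * fraction)))


if __name__ == "__main__":
    # Linux only: read the currently free memory and print a suggestion.
    with open("/proc/meminfo") as f:
        free_mb = parse_meminfo_kb(f.read()) // 1024
    print(f"suggested heap: {suggest_heap_mb(free_mb)} MB")
```

Clamping keeps small installations from being starved while preventing a large host from handing Cassandra more heap than this report found useful.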
--- Additional comment from Mike Foley on 2013-11-01 14:00:14 EDT ---
Minimally, this should be considered as documentation for JON 3.2.
This is covered in a section for tuning the server for a large number of agents:
A soft limit of 100+ agents being "a large number" is mentioned in the inventory baselines: