+++ This bug was initially created as a clone of Bug #1025844 +++

Description of problem:

The RHQ server, if used with over a hundred (or a thousand) agents, cannot handle, out of the box, enough load to cleanly do an upgrade. This may not be customer-typical, but adjusting the defaults also doesn't seem likely to do much harm even on smaller installations.

Although I cannot identify what is most important, the following need to be tuned:

1) Increase the default size of the storage node memory usage. I would say that for about 1000 nodes, around 5GB of heap memory for Cassandra is good, though I think the installer should simply pick a good number based on the local free memory size.

Example error:

01:44:14,249 ERROR [org.jboss.as.ejb3.invocation] (http-/0.0.0.0:7080-64) JBAS014134: EJB Invocation failed on component ResourceManagerBean for method public abstract void org.rhq.enterprise.server.resource.ResourceManagerLocal.addResourceError(org.rhq.core.domain.resource.ResourceError): javax.ejb.EJBException: JBAS014516: Failed to acquire a permit within 5 MINUTES
    at org.jboss.as.ejb3.pool.strictmax.StrictMaxPool.get(StrictMaxPool.java:109) [jboss-as-ejb3-7.2.0.Alpha1-redhat-4.jar:7.2.0.Alpha1-redhat-4]
    at org.jboss.as.ejb3.component.pool.PooledInstanceInterceptor.processInvocation(PooledInstanceInterceptor.java:47) [jboss-as-ejb3-7.2.0.Alpha1-redhat-4.jar:7.2.0.Alpha1-redhat-4]

2) Increase the size of the EJB pool. What happened with the 4.5.1 -> 4.9 upgrade was that the number of inventory requests went up substantially in a short time. This caused many, many timeouts.

<strict-max-pool name="slsb-strict-max-pool" max-pool-size="2000" instance-acquisition-timeout="1" instance-acquisition-timeout-unit="MINUTES"/>

3) Increase the out-of-box communication limits:

rhq.server.startup.web.max-connections=1000
rhq.server.agent-downloads-limit=45
rhq.server.client-downloads-limit=5
rhq.communications.global-concurrency-limit=200
rhq.server.concurrency-limit.inventory-report=25
rhq.server.concurrency-limit.availability-report=25
rhq.server.concurrency-limit.inventory-sync=25
rhq.server.concurrency-limit.content-report=25
rhq.server.concurrency-limit.content-download=25
rhq.server.concurrency-limit.measurement-report=25
rhq.server.concurrency-limit.measurement-schedule-request=25
rhq.server.concurrency-limit.configuration-update=25

Version-Release number of selected component (if applicable):
4.9 (from 4.5.1)

--- Additional comment from Mike Foley on 2013-11-01 14:00:14 EDT ---

Minimally, this should be considered as documentation for JON 3.2.

This is covered in a section on tuning the server for a large number of agents:
https://access.redhat.com/site/documentation/en-US/Red_Hat_JBoss_Operations_Network/3.2/html/Admin_and_Config/perf-concurrency.html

A soft limit of 100+ agents counting as "a large number" is mentioned in the inventory baselines section:
https://access.redhat.com/site/documentation/en-US/Red_Hat_JBoss_Operations_Network/3.2/html/Admin_and_Config/performance.html#inventory-baselines
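
For reference, here is a rough sketch of what item 1's suggestion ("the installer should simply pick a good number based on the local free memory size") might look like. This is not the installer's actual logic. The function name suggest_heap_mb, the linear scaling, the 50% free-memory cap, and the 512 MB / 8192 MB bounds are all assumptions made for illustration; only the "about 1000 nodes, around 5GB" data point comes from the report.

```python
def suggest_heap_mb(free_mb, agent_count, fraction=0.5, min_mb=512, max_mb=8192):
    """Suggest a storage node heap size in MB (hypothetical heuristic).

    Only the anchor point "about 1000 agents -> about 5 GB" is taken from
    the bug report. The linear scaling, `fraction`, `min_mb`, and `max_mb`
    are assumed values, not RHQ or Cassandra defaults.
    """
    # Scale linearly so that 1000 agents maps to 5 GB (5120 MB).
    wanted_mb = agent_count * 5 * 1024 // 1000

    # Do not ask for more than an assumed fraction of free memory.
    ceiling_mb = int(free_mb * fraction)

    # Take the smallest of the three limits, but never go below the floor.
    return max(min_mb, min(wanted_mb, ceiling_mb, max_mb))


if __name__ == "__main__":
    # Free memory and agent count would come from the host and the inventory;
    # the values here are placeholders.
    print(suggest_heap_mb(free_mb=16384, agent_count=1000))
```

With 16 GB free and 1000 agents this yields the 5 GB figure from the description. On a host with less free memory the cap takes over, which is the behavior the report seems to be asking for.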