Bug 1026428

Summary: RFE: Ship with RHQ server tuned for higher capacity
Product: [JBoss] JBoss Operations Network
Component: Documentation
Version: JON 3.2
Status: CLOSED CURRENTRELEASE
Severity: high
Priority: unspecified
Target Milestone: GA
Target Release: JON 3.2.0
Hardware: Unspecified
OS: Unspecified
Keywords: Documentation
Reporter: Mike Foley <mfoley>
Assignee: Deon Ballard <dlackey>
QA Contact: Mike Foley <mfoley>
CC: dlackey, genman, hrupp
Doc Type: Bug Fix
Type: Bug
Clone Of: 1025844
Last Closed: 2014-09-05 15:40:24 UTC
Bug Depends On: 1025844
Bug Blocks: 1012435

Description Mike Foley 2013-11-04 15:52:04 UTC
+++ This bug was initially created as a clone of Bug #1025844 +++

Description of problem:

The RHQ server, when used with more than a hundred (or a thousand) agents, cannot handle enough load out of the box to complete an upgrade cleanly.

This scale may not be typical for customers, but adjusting the defaults seems unlikely to do much harm even on smaller installations.

Although I cannot identify which is most important, the following settings need to be tuned:

1) Increase the default memory allocation for the storage node. For about 1000 nodes, around 5 GB of heap for Cassandra works well, though I think the installer should simply pick a good value based on the free memory available locally.
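
As an illustration of "pick a good number based on the local free memory size", here is a minimal sketch of one possible sizing rule. It is not taken from the RHQ installer. The function name and all three tuning parameters are hypothetical: roughly 5 MB of heap per agent (derived from the 5 GB per 1000 nodes figure above), an assumed 512 MB floor, and an assumed cap of half the free memory.

```python
def suggest_heap_mb(agent_count, free_mb, mb_per_agent=5, floor_mb=512, max_fraction=0.5):
    """Suggest a storage node heap size in MB (hypothetical rule, not RHQ's)."""
    # Scale with the number of agents, but never go below the assumed floor.
    wanted = max(floor_mb, agent_count * mb_per_agent)
    # Never claim more than the assumed fraction of locally free memory.
    ceiling = int(free_mb * max_fraction)
    return min(wanted, ceiling)


if __name__ == "__main__":
    # 1000 agents on a host with 16 GB free, then on a host with only 4 GB free.
    print(suggest_heap_mb(1000, 16384))
    print(suggest_heap_mb(1000, 4096))
```

With this rule the free-memory cap wins on small hosts, so a constrained machine gets a smaller heap than the agent count alone would suggest.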

Example error (EJB pool permit timeout):

01:44:14,249 ERROR [org.jboss.as.ejb3.invocation] (http-/0.0.0.0:7080-64) JBAS014134: EJB Invocation failed on component ResourceManagerBean for method public abstract void org.rhq.enterprise.server.resource.ResourceManagerLocal.addResourceError(org.rhq.core.domain.resource.ResourceError): javax.ejb.EJBException: JBAS014516: Failed to acquire a permit within 5 MINUTES
        at org.jboss.as.ejb3.pool.strictmax.StrictMaxPool.get(StrictMaxPool.java:109) [jboss-as-ejb3-7.2.0.Alpha1-redhat-4.jar:7.2.0.Alpha1-redhat-4]
        at org.jboss.as.ejb3.component.pool.PooledInstanceInterceptor.processInvocation(PooledInstanceInterceptor.java:47) [jboss-as-ejb3-7.2.0.Alpha1-redhat-4.jar:7.2.0.Alpha1-redhat-4]


2) Increase the size of the EJB pool. During the 4.5.1 -> 4.9 upgrade, the number of inventory requests rose substantially in a short time, which caused many timeouts.
                    <strict-max-pool name="slsb-strict-max-pool" max-pool-size="2000" instance-acquisition-timeout="1" instance-acquisition-timeout-unit="MINUTES"/>
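
To show why a pool that is too small produces "Failed to acquire a permit" errors, here is a minimal sketch of a permit-based bounded pool. It is not the JBoss StrictMaxPool implementation. The class name `BoundedPool` and its parameters are hypothetical stand-ins for `max-pool-size` and `instance-acquisition-timeout`.

```python
import threading


class BoundedPool:
    """Hypothetical permit pool: at most max_size callers hold a permit at once."""

    def __init__(self, max_size, timeout_s):
        self._permits = threading.BoundedSemaphore(max_size)
        self._timeout_s = timeout_s

    def acquire(self):
        # Wait up to timeout_s for a free permit, then give up with an error,
        # which is the situation the log above reports.
        if not self._permits.acquire(timeout=self._timeout_s):
            raise TimeoutError(
                f"Failed to acquire a permit within {self._timeout_s} seconds"
            )

    def release(self):
        self._permits.release()


if __name__ == "__main__":
    pool = BoundedPool(max_size=2, timeout_s=0.1)
    pool.acquire()
    pool.acquire()
    try:
        pool.acquire()  # a third concurrent caller has to wait, then times out
    except TimeoutError as err:
        print(err)
```

Raising the pool size lets more requests proceed at once, while a shorter timeout makes callers fail faster instead of queuing for minutes.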

3) Increase the out-of-box communication limits:

rhq.server.startup.web.max-connections=1000
rhq.server.agent-downloads-limit=45
rhq.server.client-downloads-limit=5
rhq.communications.global-concurrency-limit=200
rhq.server.concurrency-limit.inventory-report=25
rhq.server.concurrency-limit.availability-report=25
rhq.server.concurrency-limit.inventory-sync=25
rhq.server.concurrency-limit.content-report=25
rhq.server.concurrency-limit.content-download=25
rhq.server.concurrency-limit.measurement-report=25
rhq.server.concurrency-limit.measurement-schedule-request=25
rhq.server.concurrency-limit.configuration-update=25
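
The following is a minimal sketch for sanity-checking a set of limits like the one above before applying it. The helper names are hypothetical. The check assumes that a per-message-type limit larger than `rhq.communications.global-concurrency-limit` can never be reached, which is my reading of the property names and not something this report confirms.

```python
GLOBAL_KEY = "rhq.communications.global-concurrency-limit"
PER_TYPE_PREFIX = "rhq.server.concurrency-limit."


def parse_limits(text):
    """Parse key=value lines into a dict of integer limits."""
    limits = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines, comments, and anything that is not key=value.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        limits[key.strip()] = int(value.strip())
    return limits


def over_global(limits):
    """Return the per-type limit keys whose value exceeds the global limit."""
    cap = limits[GLOBAL_KEY]
    return sorted(
        key for key, value in limits.items()
        if key.startswith(PER_TYPE_PREFIX) and value > cap
    )


if __name__ == "__main__":
    sample = """
    rhq.communications.global-concurrency-limit=200
    rhq.server.concurrency-limit.inventory-report=25
    rhq.server.concurrency-limit.measurement-report=25
    """
    print(over_global(parse_limits(sample)))
```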


Version-Release number of selected component (if applicable): 4.9 (from 4.5.1)

--- Additional comment from Mike Foley on 2013-11-01 14:00:14 EDT ---

At a minimum, this should be considered for the JON 3.2 documentation.

Comment 1 Deon Ballard 2014-01-24 17:42:23 UTC
This is covered in a section for tuning the server for a large number of agents:
https://access.redhat.com/site/documentation/en-US/Red_Hat_JBoss_Operations_Network/3.2/html/Admin_and_Config/perf-concurrency.html

A soft limit of 100+ agents as "a large number" is mentioned in the inventory baselines section:
https://access.redhat.com/site/documentation/en-US/Red_Hat_JBoss_Operations_Network/3.2/html/Admin_and_Config/performance.html#inventory-baselines