The following message appears in the Agent log if an Apache Server avail check fails due to the SNMP "ping" timing out:

  2012-04-11 17:44:56,271 DEBUG [ResourceContainer.invoker.daemon-102] (rhq.plugins.www.snmp.SNMPSession_v2c)- Error while pinging SNMP 1 agent at 127.0.0.1/1610/public. SNMP GETNEXT request for iso(1) failed - org.rhq.plugins.www.snmp.SNMPException: Request for [iso] timed out.

Looking at the code, the SNMP timeout that's used is 50ms (with 1 retry) and is not configurable, so I'm not surprised it's timing out every so often... 50ms seems way too low to me.

I think we should do the following:

1) increase the default value for the timeout to 4s so it's just under the default avail facet timeout
2) change the default retries from 1 to 0
3) make the timeout and the retries configurable via conn props
Done in master:

http://git.fedorahosted.org/git/?p=rhq/rhq.git;a=commitdiff;h=bf20e58

Here are the two new props that have been added to the Apache Server type's plugin config:

  <c:simple-property name="snmpRequestTimeout"
                     displayName="SNMP Request Timeout"
                     type="long"
                     default="2000"
                     required="false"
                     description="the timeout, in milliseconds, for requests to the Apache SNMP agent; defaults to 2000">
    <c:constraint>
      <c:integer-constraint minimum="100"/>
    </c:constraint>
  </c:simple-property>

  <c:simple-property name="snmpRequestRetries"
                     displayName="SNMP Request Retries"
                     type="integer"
                     default="1"
                     required="false"
                     description="the number of times a request that has timed out should be retried; defaults to 1">
    <c:constraint>
      <c:integer-constraint minimum="0"/>
    </c:constraint>
  </c:simple-property>
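For anyone tuning these two props, here is a rough sketch of how they combine into a worst-case wait per SNMP request. It is not the plugin's code: the class SnmpRequestSettings, its methods, and the plain Map standing in for the plugin config are all hypothetical. It also assumes each attempt waits the full timeout with no backoff, so the worst case is timeout * (retries + 1); the real SNMP session may behave differently.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of the RHQ Apache plugin.
public class SnmpRequestSettings {
    // Defaults and minimums copied from the two property definitions above.
    public static final long DEFAULT_TIMEOUT_MS = 2000L;
    public static final int DEFAULT_RETRIES = 1;
    public static final long MIN_TIMEOUT_MS = 100L;
    public static final int MIN_RETRIES = 0;

    public final long timeoutMillis;
    public final int retries;

    private SnmpRequestSettings(long timeoutMillis, int retries) {
        this.timeoutMillis = timeoutMillis;
        this.retries = retries;
    }

    // Reads the two props from a plain map (a stand-in for the plugin config),
    // falls back to the defaults, and enforces the declared minimums.
    public static SnmpRequestSettings fromProperties(Map<String, String> props) {
        String rawTimeout = props.get("snmpRequestTimeout");
        String rawRetries = props.get("snmpRequestRetries");
        long timeout = (rawTimeout == null) ? DEFAULT_TIMEOUT_MS : Long.parseLong(rawTimeout.trim());
        int retries = (rawRetries == null) ? DEFAULT_RETRIES : Integer.parseInt(rawRetries.trim());
        if (timeout < MIN_TIMEOUT_MS) {
            throw new IllegalArgumentException("snmpRequestTimeout must be >= " + MIN_TIMEOUT_MS);
        }
        if (retries < MIN_RETRIES) {
            throw new IllegalArgumentException("snmpRequestRetries must be >= " + MIN_RETRIES);
        }
        return new SnmpRequestSettings(timeout, retries);
    }

    // Assumption: every attempt waits the full timeout, and there is one
    // initial attempt plus 'retries' further attempts.
    public static long worstCaseWaitMillis(long timeoutMillis, int retries) {
        return timeoutMillis * (retries + 1L);
    }

    public static void main(String[] args) {
        Map<String, String> props = new HashMap<>();

        // Nothing set: the new defaults (2000 ms, 1 retry) apply.
        SnmpRequestSettings defaults = fromProperties(props);
        System.out.println(worstCaseWaitMillis(defaults.timeoutMillis, defaults.retries)); // 4000

        // The old hard-coded values (50 ms, 1 retry), for comparison.
        System.out.println(worstCaseWaitMillis(50L, 1)); // 100

        // The values originally proposed in this report (4 s, 0 retries).
        props.put("snmpRequestTimeout", "4000");
        props.put("snmpRequestRetries", "0");
        SnmpRequestSettings proposed = fromProperties(props);
        System.out.println(worstCaseWaitMillis(proposed.timeoutMillis, proposed.retries)); // 4000
    }
}
```

Under that assumption, the committed defaults (2000 ms with 1 retry) and the originally proposed values (4 s with 0 retries) both give a worst case of about 4 s per request, versus about 100 ms with the old hard-coded 50 ms timeout.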
Bulk closing of items that are on_qa and in old RHQ releases, which have been out for a long time and where the issue has not been re-opened since.