Description of problem:
After installing the Apache SNMP plugins for JON, the metrics for wwwSummaryInRequests ("Total Number of Requests" and "Total Number of Requests per Minute") and wwwSummaryOutResponses ("Total Number of Responses" and "Total Number of Responses per Minute") are collected only sporadically. The JON Agent logs messages similar to:

"SNMPException: Error occurred while retrieving column wwwSummaryInRequests(1.3.6.1.2.1.65.1.2.1.1.1): Request timed out."

The metrics appear to be collected only during a specific window, from about 03:00 at night until 09:00. Running snmpwalk (snmpwalk -c public -v 2c localhost:1610 1.3.6.1.2.1.65.1.2.1.1.1) at the same time as the agent was trying to collect the metrics and logging "Request timed out" messages shows that this command did not time out. At the same time, all other metrics are correctly returned and displayed in the JBoss ON UI.

Version-Release number of selected component (if applicable):
JBoss ON 3.3.7
EWS 2.1.0 - Apache/2.2.26

How reproducible:
Sometimes

Steps to Reproduce:
1. Enable the "Total Number of Requests" and "Total Number of Requests per Minute" metrics;
2. Navigate to JBoss ON UI -> Apache metric page and confirm that values are not shown;
3. The agent.log file shows: "SNMPException: Error occurred while retrieving column wwwSummaryInRequests(1.3.6.1.2.1.65.1.2.1.1.1): Request timed out."

Actual results:
Enabled metrics are not shown in the JBoss ON UI and an error message is logged in the agent.log file.

Expected results:
Enabled metrics are properly shown and the agent.log file does not contain error messages.

Additional info:
Enabling SNMP4j DEBUG logging in the agent.log file revealed that every time "Total Number of Requests" is collected, the following error is logged:

*******************************
ERROR [DefaultUDPTransportMapping_0.0.0.0/0] (org.snmp4j.MessageDispatcherImpl)- java.io.IOException: Only 32bit unsigned integers are supported at position 148
*******************************

The tcpdump captured while the agent was collecting the metrics showed the following. We send a getBulkRequest:

*******************************
getBulkRequest 1.3.6.1.2.1.65.1.2.1.1.1
*******************************

and as a result we get a get-response that contains:

*******************************
data: get-response (2)
    get-response
        request-id: 526425080
        error-status: noError (0)
        error-index: 0
        variable-bindings: 10 items
            1.3.6.1.2.1.65.1.2.1.1.1.1: 0
                Object Name: 1.3.6.1.2.1.65.1.2.1.1.1.1 (iso.3.6.1.2.1.65.1.2.1.1.1.1)
                Value (Counter32): 0
            1.3.6.1.2.1.65.1.2.1.1.1.2: 2077559
                Object Name: 1.3.6.1.2.1.65.1.2.1.1.1.2 (iso.3.6.1.2.1.65.1.2.1.1.1.2)
                Value (Counter32): 2077559
            1.3.6.1.2.1.65.1.2.1.1.4.1: 0
                Object Name: 1.3.6.1.2.1.65.1.2.1.1.4.1 (iso.3.6.1.2.1.65.1.2.1.1.4.1)
                Value (Counter32): 0
            1.3.6.1.2.1.65.1.2.1.1.4.2: 2077559
                Object Name: 1.3.6.1.2.1.65.1.2.1.1.4.2 (iso.3.6.1.2.1.65.1.2.1.1.4.2)
                Value (Counter32): 2077559
            ...
            1.3.6.1.2.1.65.1.2.1.1.8.2: 43843199952
                Object Name: 1.3.6.1.2.1.65.1.2.1.1.8.2 (iso.3.6.1.2.1.65.1.2.1.1.8.2)
                Value (Counter32): 43843199952
            ...
*******************************

In the above case 43843199952 > 2^32, so the value does not fit in a Counter32 and causes the error to be thrown. It also seems that instead of returning only 1.3.6.1.2.1.65.1.2.1.1.1 and its subtree, the response covers 1.3.6.1.2.1.65.1.2.1.1 and returns up to 10 items from its other columns as well.
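The two observations above (a value outside the Counter32 range, and varbinds outside the requested column) can be checked mechanically. The following is only a minimal sketch in Python, not part of JON or SNMP4J; the helper names (in_subtree, check) are hypothetical, and the list holds only the varbinds that are visible in the capture, with the elided ones left out.

```python
# Illustrative check only; names are hypothetical, not JON/SNMP4J APIs.
COUNTER32_MAX = 2**32 - 1

# Column requested by the agent in the getBulkRequest.
REQUESTED = "1.3.6.1.2.1.65.1.2.1.1.1"

# (OID, value) pairs copied from the visible part of the get-response.
varbinds = [
    ("1.3.6.1.2.1.65.1.2.1.1.1.1", 0),
    ("1.3.6.1.2.1.65.1.2.1.1.1.2", 2077559),
    ("1.3.6.1.2.1.65.1.2.1.1.4.1", 0),
    ("1.3.6.1.2.1.65.1.2.1.1.4.2", 2077559),
    ("1.3.6.1.2.1.65.1.2.1.1.8.2", 43843199952),
]


def in_subtree(oid, root):
    """True if oid is root itself or lies below it (dot-boundary match)."""
    return oid == root or oid.startswith(root + ".")


def check(pairs, root):
    """Return (oid, inside_requested_subtree, fits_counter32) per varbind."""
    return [
        (oid, in_subtree(oid, root), 0 <= value <= COUNTER32_MAX)
        for oid, value in pairs
    ]


report = check(varbinds, REQUESTED)
for oid, inside, fits in report:
    print(oid, "inside requested column:", inside, "fits Counter32:", fits)
```

With the values from the capture, only the two ...1.1.1.x entries lie inside the requested column, and the ...1.1.8.2 entry is the only one that does not fit in 32 bits.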
The issue is in mod-snmp. Specifically, wwwProtocolRecord->wwwSummaryInLowBytes is not only defined as an unsigned long, but it also has the value from wwwBytesIn added to it without regard for it being the lower 32 bits of wwwSummaryInBytes. Furthermore, wwwSummaryInBytes is defined as NULL and is therefore not supported at all. It appears that the LowBytes values for In and Out are incorrect and of the wrong type. It is also still not clear why the request for 1.3.6.1.2.1.65.1.2.1.1.1 results in mod-snmp returning 1.3.6.1.2.1.65.1.2.1.1 instead.
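To illustrate what "the lower 32 bits of wwwSummaryInBytes" means in practice, here is a minimal sketch in Python. It is not the mod-snmp code (which is C) and the ByteCounter class is hypothetical; it only shows a byte total being accumulated in a 64-bit counter, with the value exposed as a 32-bit LowBytes counter obtained by masking so that it wraps instead of exceeding the 32-bit range.

```python
# Hypothetical model of the counters; not taken from mod-snmp.
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF


class ByteCounter:
    """Accumulates a byte total and exposes it as 32-bit halves."""

    def __init__(self):
        self.total = 0  # plays the role of the full 64-bit byte counter

    def add(self, nbytes):
        # Wrap at 64 bits, as a 64-bit counter would.
        self.total = (self.total + nbytes) & MASK64

    @property
    def low(self):
        # Value that is safe to report as a Counter32 "LowBytes" object.
        return self.total & MASK32

    @property
    def high(self):
        # Upper 32 bits of the same total.
        return (self.total >> 32) & MASK32


counter = ByteCounter()
counter.add(43843199952)  # the value seen in the capture
print(counter.high, counter.low)
```

For the value from the capture, 43843199952 = 10 * 2^32 + 893526992, so a correctly masked LowBytes counter would report 893526992.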
In the docs for the Covalent SNMP daemon, there are mentions of an 'override' directive, which gives the option to override the value of an OID or make it unavailable. I tried the following:

override 1.3.6.1.2.1.65.1.2.1.1.8.2 integer 0
override 1.3.6.1.2.1.65.1.2.1.1.8.2 null
override -rw 1.3.6.1.2.1.65.1.2.1.1.8.2 null

I was hoping this would either make the metric return zero or disable it. However, it appears to make no difference, although I am not sure I have the syntax correct. If it worked, it would at least be a workaround.