Bug 1397106 - JON Apache SNMP plugin: Timeout collecting wwwSummaryInRequests/wwwSummaryOutResponses
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: JBoss Enterprise Web Server 2
Classification: JBoss
Component: JON Plugin
Version: 2.1.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: mgottbur
QA Contact: Michal Karm Babacek
URL:
Whiteboard:
Depends On:
Blocks: 1397107
Reported: 2016-11-21 15:58 UTC by bkramer
Modified: 2020-01-17 16:13 UTC

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-06-13 12:20:04 UTC
Type: Bug


Description bkramer 2016-11-21 15:58:33 UTC
Description of problem:
After installing Apache SNMP plugins for JON, metrics for wwwSummaryInRequests ("Total Number of Requests" and "Total Number of Requests per Minute") and wwwSummaryOutResponses ("Total Number of Responses" and "Total Number of Responses per Minute") are only being collected sporadically. The JON Agent logs messages similar to "SNMPException: Error occurred while retrieving column wwwSummaryInRequests(1.3.6.1.2.1.65.1.2.1.1.1): Request timed out." 

The metrics appear to be collected only during a specific window, from about 03:00 at night until 09:00.


Running snmpwalk (snmpwalk -c public -v 2c localhost:1610 1.3.6.1.2.1.65.1.2.1.1.1) at the same time as the agent was trying to collect the metrics and logging "Request timed out" messages showed that this command did not time out.

At the same time, all other metrics are correctly returned and displayed in the JBoss ON UI.



Version-Release number of selected component (if applicable):
JBoss ON 3.3.7
EWS 2.1.0 - Apache/2.2.26


How reproducible:
Sometimes

Steps to Reproduce:
1. Enable "Total Number of Requests" and "Total Number of Requests per Minute" metrics;
2. Navigate to JBoss ON UI -> Apache metric page and confirm that values are not shown;
3. The agent.log file shows: "SNMPException: Error occurred while retrieving column wwwSummaryInRequests(1.3.6.1.2.1.65.1.2.1.1.1): Request timed out."

Actual results:
The enabled metrics are not shown in the JBoss ON UI and an error message is logged in the agent.log file.

Expected results:
The enabled metrics are properly shown and the agent.log file does not contain error messages.

Additional info:

Comment 1 bkramer 2016-12-02 14:28:53 UTC
Enabling SNMP4J DEBUG logging in the agent.log file revealed that every time "Total Number of Requests" is collected, the following error is logged:

*******************************
ERROR [DefaultUDPTransportMapping_0.0.0.0/0] (org.snmp4j.MessageDispatcherImpl)- java.io.IOException: Only 32bit unsigned integers are supported at position 148
*******************************

A tcpdump captured while the agent was collecting the metrics showed the following:

We send a getBulkRequest:

*******************************
getBulkRequest 1.3.6.1.2.1.65.1.2.1.1.1
*******************************

and as a result we get a get-response that contains:

*******************************
data: get-response (2)
  get-response
    request-id: 526425080
    error-status: noError (0)
    error-index: 0
    variable-bindings: 10 items
      1.3.6.1.2.1.65.1.2.1.1.1.1: 0
        Object Name: 1.3.6.1.2.1.65.1.2.1.1.1.1 (iso.3.6.1.2.1.65.1.2.1.1.1.1)
        Value (Counter32): 0
      1.3.6.1.2.1.65.1.2.1.1.1.2: 2077559
        Object Name: 1.3.6.1.2.1.65.1.2.1.1.1.2 (iso.3.6.1.2.1.65.1.2.1.1.1.2)
        Value (Counter32): 2077559
      1.3.6.1.2.1.65.1.2.1.1.4.1: 0
        Object Name: 1.3.6.1.2.1.65.1.2.1.1.4.1 (iso.3.6.1.2.1.65.1.2.1.1.4.1)
        Value (Counter32): 0
      1.3.6.1.2.1.65.1.2.1.1.4.2: 2077559
        Object Name: 1.3.6.1.2.1.65.1.2.1.1.4.2 (iso.3.6.1.2.1.65.1.2.1.1.4.2)
        Value (Counter32): 2077559
      ...
      1.3.6.1.2.1.65.1.2.1.1.8.2: 43843199952
        Object Name: 1.3.6.1.2.1.65.1.2.1.1.8.2 (iso.3.6.1.2.1.65.1.2.1.1.8.2)
        Value (Counter32): 43843199952
      ...
*******************************

In the above case, 43843199952 > 2^32, so the value does not fit in a Counter32, and this causes the error to be thrown.
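
The arithmetic can be checked with a short sketch. It assumes only the standard Counter32 range (0 to 2^32 - 1); the helper names are hypothetical and this is not SNMP4J's decoding code.

```python
# Sketch: check whether values from the capture fit in a Counter32.
# Helper names are hypothetical; this is not SNMP4J code.
from typing import Tuple

COUNTER32_MAX = 2**32 - 1  # largest value a Counter32 can hold


def fits_counter32(value: int) -> bool:
    """Return True if value is within the Counter32 range."""
    return 0 <= value <= COUNTER32_MAX


def split_64bit(value: int) -> Tuple[int, int]:
    """Split a non-negative value into (high, low) 32-bit halves."""
    return value >> 32, value & 0xFFFFFFFF


observed = 43843199952  # value reported for ...1.1.8.2 in the capture

print(fits_counter32(2077559))   # True
print(fits_counter32(observed))  # False
print(observed.bit_length())     # 36
print(split_64bit(observed))     # (10, 893526992)
```

The observed value needs 36 bits, so only its lower half (893526992) could legitimately be carried in a Counter32.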

It seems that instead of returning only 1.3.6.1.2.1.65.1.2.1.1.1 and its subtree, we return 1.3.6.1.2.1.65.1.2.1.1 and up to 10 subtree items.

Comment 2 Larry O'Leary 2016-12-02 23:49:29 UTC
The issue is in mod-snmp. Specifically, wwwProtocolRecord->wwwSummaryInLowBytes is not only defined as an unsigned long, but the value of wwwBytesIn is also added to it without regard for it being the lower 32 bits of wwwSummaryInBytes. Furthermore, wwwSummaryInBytes is defined as NULL and is therefore not even supported. It appears that the LowBytes objects for In and Out provide the incorrect value and are of the wrong type.
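
As a rough illustration of the intended relationship, the sketch below keeps a full running byte total and exposes only its lower 32 bits as the LowBytes value. The class and attribute names are hypothetical and do not mirror the mod-snmp source; the only assumption is that a LowBytes object is meant to carry the low-order 32 bits of the corresponding byte total.

```python
# Sketch with hypothetical names; not the mod-snmp implementation.

MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF


class ByteSummary:
    """Tracks a byte total and exposes it as high/low 32-bit halves."""

    def __init__(self) -> None:
        self.total = 0  # full running total

    def add(self, nbytes: int) -> None:
        # Wrap at 64 bits, as a 64-bit counter would.
        self.total = (self.total + nbytes) & MASK64

    @property
    def low_bytes(self) -> int:
        # Lower 32 bits only: always within the Counter32 range.
        return self.total & MASK32

    @property
    def high_bytes(self) -> int:
        # Upper 32 bits of the total.
        return self.total >> 32


summary = ByteSummary()
summary.add(43843199952)  # illustrative value taken from the capture
print(summary.low_bytes, summary.high_bytes)
```

Adding to the total without the mask, as described above, lets the reported value grow past what a Counter32 can represent.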

It is also still not clear why the request for 1.3.6.1.2.1.65.1.2.1.1.1 results in mod-snmp returning 1.3.6.1.2.1.65.1.2.1.1 instead.
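
One possible explanation, stated here as an assumption rather than a confirmed diagnosis, is that this is ordinary GetBulk behaviour: per RFC 3416, a GetBulkRequest returns up to max-repetitions lexicographic successors of the requested OID and does not stop at the end of the requested column, so the manager has to discard out-of-subtree bindings itself. The sketch below simulates that with a small in-memory table built only from OIDs visible in the capture; the function names and the max-repetitions value of 10 are illustrative assumptions.

```python
# Sketch of GetBulk semantics over a sorted OID list (hypothetical helpers).
from typing import List, Tuple

Oid = Tuple[int, ...]


def parse(oid: str) -> Oid:
    """Convert a dotted OID string into a tuple of integers."""
    return tuple(int(part) for part in oid.split("."))


def get_bulk(table: List[Oid], start: Oid, max_repetitions: int) -> List[Oid]:
    """Return up to max_repetitions lexicographic successors of start."""
    successors = [oid for oid in sorted(table) if oid > start]
    return successors[:max_repetitions]


def in_subtree(oid: Oid, root: Oid) -> bool:
    """Return True if oid lies under root."""
    return oid[:len(root)] == root


# Subset of instances seen in the capture; elided rows are not filled in.
table = [parse(o) for o in (
    "1.3.6.1.2.1.65.1.2.1.1.1.1",
    "1.3.6.1.2.1.65.1.2.1.1.1.2",
    "1.3.6.1.2.1.65.1.2.1.1.4.1",
    "1.3.6.1.2.1.65.1.2.1.1.4.2",
    "1.3.6.1.2.1.65.1.2.1.1.8.2",
)]

column = parse("1.3.6.1.2.1.65.1.2.1.1.1")
response = get_bulk(table, column, max_repetitions=10)
wanted = [oid for oid in response if in_subtree(oid, column)]

print(len(response))  # 5: the response runs past the requested column
print(len(wanted))    # 2: only these belong to the requested column
```

If this is what is happening, the extra bindings would be expected, and the failure would come only from the oversized value in one of them making the whole response undecodable.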

Comment 3 Stian Lund 2016-12-06 09:05:16 UTC
In the docs for the Covalent SNMP daemon, there are mentions of an 'override' directive, giving the option to override the value of an OID or make it unavailable.

I tried the following:
override 1.3.6.1.2.1.65.1.2.1.1.8.2 integer 0 
override 1.3.6.1.2.1.65.1.2.1.1.8.2 null 
override -rw 1.3.6.1.2.1.65.1.2.1.1.8.2 null

I was hoping this would either make the metric return zero or disable it. However, it appears to make no difference, although I am not sure I have the syntax correct.

It would at least be a workaround.

