Bug 447947 - SNMPd does not respond on cluster service IP
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 4
Classification: Red Hat
Component: net-snmp
Version: 4.6
Hardware: i386 Linux
Priority: low
Severity: high
Target Milestone: rc
Target Release: ---
Assigned To: Jan Safranek
Reported: 2008-05-22 11:49 EDT by Jesse Gonzalez
Modified: 2010-10-22 21:17 EDT (History)
Doc Type: Bug Fix
Doc Text:
* The method previously used by snmpd to process UDP did not work well in clustered environments. Queries against an IP configured as a resource of a cluster service would time out and fail unless first performed against a non-cluster resource IP. Net-snmp for Red Hat Enterprise Linux 4 now includes improved UDP handling. This allows snmpd queries to work reliably in a clustered environment.
Last Closed: 2009-05-18 16:19:17 EDT
Attachments
backported patch from RHEL-5 (14.54 KB, patch)
2008-05-28 12:37 EDT, Jan Safranek
Description Jesse Gonzalez 2008-05-22 11:49:22 EDT
Description of problem:
When performing an SNMP query against an IP that is configured as a resource of
a cluster service, snmpd does not respond on the cluster IP until you first
perform an SNMP query against a non-cluster resource IP of the device.


Version-Release number of selected component (if applicable):
net-snmp-5.1.2-11.el4_6.11.2


How reproducible:
Configure a cluster with an IP resource as part of a cluster service. Perform an
SNMP query against the IP configured in the cluster from a remote machine. The
SNMP query will time out.

Next, perform the SNMP query against an IP assigned to the device, and the SNMP
query will succeed.

Finally, perform the SNMP query against the cluster IP, and the SNMP query will
succeed.

Expected results:
The first SNMP query against the cluster IP succeeds, without a prior query
against a non-cluster IP.

Additional info:
With the cluster-snmp package, SNMP queries are performed against cluster
resource IPs to check status and service changes.

Tracing the network communication with tcpdump showed the following behavior.
172.16.172.8 is the cluster resource IP, and 172.16.172.5 is the IP of bond0.
In the captures below, requests sent to 172.16.172.8 are answered from
172.16.172.5.

Initial SNMP request against the cluster resource IP:

[root@somehost ~]# tcpdump -n port 161
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on bond0, link-type EN10MB (Ethernet), capture size 96 bytes
08:43:06.827695 IP XX.XX.XX.XX.32830 > 172.16.172.8.snmp:  C=community
GetRequest(30)  .1.3.6.1.4.1.2312.8.2.2.0[|snmp]
08:43:06.834306 IP 172.16.172.5.snmp > XX.XX.XX.XX.32830:  C=community
GetResponse(28)  .1.3.6.1.4.1.2312.8.2.2=[|snmp]
08:43:07.764710 IP XX.XX.XX.XX.32830 > 172.16.172.8.snmp:  C=community
GetRequest(30)  .1.3.6.1.4.1.2312.8.2.2.0[|snmp]
08:43:07.771230 IP 172.16.172.5.snmp > XX.XX.XX.XX.32830:  C=community
GetResponse(28)  .1.3.6.1.4.1.2312.8.2.2=[|snmp]
08:43:08.743697 IP XX.XX.XX.XX.32830 > 172.16.172.8.snmp:  C=community
GetRequest(30)  .1.3.6.1.4.1.2312.8.2.2.0[|snmp]
08:43:08.750027 IP 172.16.172.5.snmp > XX.XX.XX.XX.32830:  C=community
GetResponse(28)  .1.3.6.1.4.1.2312.8.2.2=[|snmp]
08:43:09.812005 IP XX.XX.XX.XX.32830 > 172.16.172.8.snmp:  C=community
GetRequest(30)  .1.3.6.1.4.1.2312.8.2.2.0[|snmp]
08:43:09.818415 IP 172.16.172.5.snmp > XX.XX.XX.XX.32830:  C=community
GetResponse(28)  .1.3.6.1.4.1.2312.8.2.2=[|snmp]
08:43:10.756348 IP XX.XX.XX.XX.32830 > 172.16.172.8.snmp:  C=community
GetRequest(30)  .1.3.6.1.4.1.2312.8.2.2.0[|snmp]
08:43:10.762577 IP 172.16.172.5.snmp > XX.XX.XX.XX.32830:  C=community
GetResponse(28)  .1.3.6.1.4.1.2312.8.2.2=[|snmp]
08:43:11.817856 IP XX.XX.XX.XX.32830 > 172.16.172.8.snmp:  C=community
GetRequest(30)  .1.3.6.1.4.1.2312.8.2.2.0[|snmp]
08:43:11.823829 IP 172.16.172.5.snmp > XX.XX.XX.XX.32830:  C=community
GetResponse(28)  .1.3.6.1.4.1.2312.8.2.2=[|snmp]


SNMP request against bond0 IP address of device:

08:43:57.935595 IP XX.XX.XX.XX.32832 > 172.16.172.5.snmp:  C=community
GetRequest(30)  .1.3.6.1.4.1.2312.8.2.2.0[|snmp]
08:43:57.942354 IP 172.16.172.5.snmp > XX.XX.XX.XX.32832:  C=community
GetResponse(28)  .1.3.6.1.4.1.2312.8.2.2=[|snmp]

Second SNMP request against the cluster resource IP:

08:44:16.359526 IP XX.XX.XX.XX.32832 > 172.16.172.8.snmp:  C=community
GetRequest(30)  .1.3.6.1.4.1.2312.8.2.2.0[|snmp]
08:44:16.365848 IP 172.16.172.5.snmp > XX.XX.XX.XX.32832:  C=community
GetResponse(28)  .1.3.6.1.4.1.2312.8.2.2=[|snmp]

[root@somehost ~]# ip addr list bond0
2: bond0: <BROADCAST,MULTICAST,MASTER,UP> mtu 1500 qdisc noqueue 
    link/ether 00:19:b9:cf:89:17 brd ff:ff:ff:ff:ff:ff
    inet 172.16.172.5/24 brd 172.16.172.255 scope global bond0
    inet 172.16.172.8/32 scope global bond0
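
The mismatch in the captures above (request sent to 172.16.172.8, reply sourced from 172.16.172.5) can be spotted mechanically. The following is only a rough sketch, assuming tcpdump text output wrapped the way it is in this report; the function names and the `SAMPLE` excerpt are illustrative and not part of any net-snmp or tcpdump tooling.

```python
import re

# A record starts with a timestamp line such as:
# 08:43:06.827695 IP XX.XX.XX.XX.32830 > 172.16.172.8.snmp:  C=community
HEADER = re.compile(r"^\d{2}:\d{2}:\d{2}\.\d+ IP (\S+) > (\S+):")


def split_endpoint(endpoint):
    """Split 'host.port' (tcpdump style) into (host, port)."""
    host, port = endpoint.rsplit(".", 1)
    return host, port


def find_mismatched_replies(capture_text):
    """Return (requested_ip, replying_ip) pairs where the two differ."""
    # Re-join wrapped lines so each record holds header and PDU type.
    records = []
    for line in capture_text.splitlines():
        if HEADER.match(line):
            records.append(line)
        elif records:
            records[-1] += " " + line.strip()

    pending = {}  # (client host, client port) -> IP the request was sent to
    mismatches = []
    for record in records:
        src, dst = HEADER.match(record).groups()
        src_host, src_port = split_endpoint(src)
        dst_host, dst_port = split_endpoint(dst)
        if "GetRequest" in record:
            pending[(src_host, src_port)] = dst_host
        elif "GetResponse" in record:
            asked = pending.get((dst_host, dst_port))
            if asked is not None and asked != src_host:
                mismatches.append((asked, src_host))
    return mismatches


# Excerpt copied from the captures in this report.
SAMPLE = """\
08:43:06.827695 IP XX.XX.XX.XX.32830 > 172.16.172.8.snmp:  C=community
GetRequest(30)  .1.3.6.1.4.1.2312.8.2.2.0[|snmp]
08:43:06.834306 IP 172.16.172.5.snmp > XX.XX.XX.XX.32830:  C=community
GetResponse(28)  .1.3.6.1.4.1.2312.8.2.2=[|snmp]
08:43:57.935595 IP XX.XX.XX.XX.32832 > 172.16.172.5.snmp:  C=community
GetRequest(30)  .1.3.6.1.4.1.2312.8.2.2.0[|snmp]
08:43:57.942354 IP 172.16.172.5.snmp > XX.XX.XX.XX.32832:  C=community
GetResponse(28)  .1.3.6.1.4.1.2312.8.2.2=[|snmp]
"""

if __name__ == "__main__":
    for asked, answered in find_mismatched_replies(SAMPLE):
        print(f"request to {asked} was answered from {answered}")
```

A client that checks the source address of the reply would discard such a response, which is consistent with the timeouts described above.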
Comment 1 Jesse Gonzalez 2008-05-22 12:00:52 EDT
Eventually the SNMP query will begin to fail after 40+ minutes.
Comment 2 Jesse Gonzalez 2008-05-22 22:39:40 EDT
The issue has been corrected in net-snmp release 5.3.2

The following patch is available:

https://sourceforge.net/tracker/index.php?func=detail&aid=1553447&group_id=12694&atid=312694
Comment 3 Jan Safranek 2008-05-28 12:37:43 EDT
Created attachment 306945
backported patch from RHEL-5
Comment 4 Jan Safranek 2008-05-28 13:08:50 EDT
Could you please test the experimental build at
http://people.redhat.com/jsafrane/bugs/447947/ and report the results? The parts
affected by the patch are slightly different in the old net-snmp-5.1.2, which
is distributed in Red Hat Enterprise Linux 4. Although I did a review and some
testing, I'd like to be sure that it works as expected.
Thanks in advance.
Comment 5 RHEL Product and Program Management 2008-09-05 13:06:17 EDT
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.
Comment 6 Jan Safranek 2008-09-18 07:41:53 EDT
Jesse, have you tried to test the build mentioned in #4? I want to be sure I have a real fix before I release it in the next RHEL4 update.
Comment 7 Jesse Gonzalez 2008-09-22 16:35:39 EDT
Sorry Jan, I have not had a chance to test that fix. Instead, I am using TCP as the protocol to make the connection:

snmpget -v2c -c public tcp:XX.XX.XX.XX OID

I have encountered problems with cluster-snmp (no bugzilla) and the latest net-snmp packages (462016), as submitted by my colleague Mr. Savage.

It will take me *quite* some time to test your release posted in #4. Have you tested your build against a RHEL cluster?
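
A plausible reason the TCP workaround avoids the problem: an accepted TCP connection is bound to the address the client connected to, so replies cannot leave from a different local IP. The sketch below illustrates that property with plain sockets on loopback; it does not use SNMP at all, and the function name is made up for this example.

```python
import socket


def local_address_seen_by_server(target_ip):
    """Connect to a wildcard-bound listener via target_ip and report
    (server-side local address, address the client sees as its peer)."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as listener:
        listener.settimeout(2)
        listener.bind(("0.0.0.0", 0))  # wildcard bind, as a daemon would do
        listener.listen(1)
        port = listener.getsockname()[1]
        with socket.create_connection((target_ip, port), timeout=2) as client:
            conn, _peer = listener.accept()
            with conn:
                return conn.getsockname()[0], client.getpeername()[0]


if __name__ == "__main__":
    # Both values equal the address that was dialed.
    print(local_address_seen_by_server("127.0.0.1"))
```

On Linux, where the whole 127.0.0.0/8 range is local, calling the function with "127.0.0.2" should show the same behavior for a secondary address.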
Comment 8 Jan Safranek 2008-09-24 09:45:12 EDT
I didn't test it on a cluster - I do not have one on my table*. I tried to simulate the problem on a (virtual) machine with two interfaces facing the same network, which was enough to reproduce the bug (net-snmp receiving a request on 192.168.0.X and sending the response from 192.168.0.Y) and test the fix.

The patch above touches the very heart of UDP processing, so I'm trying to test it as much as possible, and any additional feedback would help.


*: Of course, if this is going to be fixed in an update, our QA should try it on a real cluster.
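
For readers unfamiliar with this class of bug: a UDP socket bound to the wildcard address lets the kernel choose the source IP of each reply, which is how a request received on one address can be answered from another. One common Linux technique for avoiding this is the IP_PKTINFO socket option, which reports the destination address of each incoming datagram and allows it to be set as the source of the reply. The sketch below shows that technique only; this report does not say whether the backported patch works exactly this way, and the helper names are made up. It is Linux-specific.

```python
import socket
import struct

# Older Python versions may not export the constant; 8 is the Linux value.
IP_PKTINFO = getattr(socket, "IP_PKTINFO", 8)

# struct in_pktinfo: int ipi_ifindex; in_addr ipi_spec_dst; in_addr ipi_addr
PKTINFO_FMT = "@i4s4s"
PKTINFO_LEN = struct.calcsize(PKTINFO_FMT)


def make_server_socket(port=0):
    """UDP socket on the wildcard address with IP_PKTINFO reporting enabled."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.IPPROTO_IP, IP_PKTINFO, 1)
    sock.bind(("0.0.0.0", port))
    return sock


def recv_with_dst(sock, bufsize=4096):
    """Receive one datagram; return (data, peer, local IP it was sent to)."""
    data, ancdata, _flags, peer = sock.recvmsg(
        bufsize, socket.CMSG_SPACE(PKTINFO_LEN))
    dst = None
    for level, ctype, cdata in ancdata:
        if level == socket.IPPROTO_IP and ctype == IP_PKTINFO:
            _ifindex, _spec_dst, addr = struct.unpack(
                PKTINFO_FMT, cdata[:PKTINFO_LEN])
            dst = socket.inet_ntoa(addr)
    return data, peer, dst


def send_from(sock, data, peer, src_ip):
    """Send a datagram to peer, asking the kernel to use src_ip as source."""
    pktinfo = struct.pack(
        PKTINFO_FMT, 0, socket.inet_aton(src_ip), b"\0\0\0\0")
    sock.sendmsg([data], [(socket.IPPROTO_IP, IP_PKTINFO, pktinfo)], 0, peer)


if __name__ == "__main__":
    server = make_server_socket()
    print("listening on UDP port", server.getsockname()[1])
    payload, peer, dst = recv_with_dst(server)
    send_from(server, payload, peer, dst)  # echo back from the same address
```

With this pattern, a request sent to a secondary address such as the cluster IP is answered from that same address rather than from the primary address of the interface.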
Comment 9 Jesse Gonzalez 2008-09-24 09:50:24 EDT
I'll try to test your build over the weekend.
Comment 11 Jesse Gonzalez 2008-10-07 14:26:10 EDT
The hotfix we received corrected the issue. Cluster VIPs continuously respond as expected.
Comment 12 Jan Safranek 2008-10-08 05:38:17 EDT
Great, thanks for the feedback.
Comment 17 Ruediger Landmann 2009-01-22 01:22:00 EST
Release note added. If any revisions are required, please set the 
"requires_release_notes" flag to "?" and edit the "Release Notes" field accordingly.
All revisions will be proofread by the Engineering Content Services team.

New Contents:
* The method previously used by snmpd to process UDP did not work well in clustered environments. Queries against an IP configured as a resource of a cluster service would time out and fail unless first performed against a non-cluster resource IP. Net-snmp for Red Hat Enterprise Linux 4 now includes improved UDP handling. This allows snmpd queries to work reliably in a clustered environment.
Comment 20 errata-xmlrpc 2009-05-18 16:19:17 EDT
An advisory has been issued which should resolve the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2009-0984.html