Bug 453600 - cluster-snmp deadlocks snmpd
Summary: cluster-snmp deadlocks snmpd
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: clustermon
Version: 5.2
Hardware: All
OS: Linux
Priority: medium
Severity: high
Target Milestone: rc
Target Release: ---
Assignee: Ryan McCabe
QA Contact: Cluster QE
URL:
Whiteboard:
Depends On: 441947
Blocks:
Reported: 2008-07-01 14:41 UTC by Bryn M. Reeves
Modified: 2018-10-20 03:10 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Clones: 484880
Environment:
Last Closed: 2009-01-20 20:51:36 UTC
Target Upstream Version:
Embargoed:


Attachments
Set sock.nonblocking(true) in ClusterMonitor::get_cluster() (605 bytes, patch)
2008-07-01 14:41 UTC, Bryn M. Reeves


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2009:0086 0 normal SHIPPED_LIVE clustermon bug fix update 2009-01-20 16:04:23 UTC

Description Bryn M. Reeves 2008-07-01 14:41:13 UTC
Description of problem:
The snmpd plugin for Cluster Suite uses the
ClusterMonitoring::ClusterMonitor::get_cluster() method to retrieve cluster
information.

This in turn calls ClientSocket::recv() -> read_restart().

The read_restart function is designed to fill a buffer with all data currently
buffered on the socket and to return when the underlying read() returns with EAGAIN.

This will only work if the socket has O_NONBLOCK set. Using this method on a
blocking socket will cause the thread calling get_cluster() to block
indefinitely waiting for additional data to arrive on the socket.
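
The behaviour described above can be illustrated with a minimal sketch. This is not the clustermon source: set_nonblocking() and drain() are hypothetical stand-ins for sock.nonblocking(true) and read_restart(), and a socketpair() replaces the real connection. It only shows why a loop that reads until EAGAIN needs O_NONBLOCK set on the descriptor.

```cpp
#include <cassert>
#include <cerrno>
#include <fcntl.h>
#include <string>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical helper (not the clustermon API): set O_NONBLOCK on fd.
static bool set_nonblocking(int fd) {
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags == -1) return false;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) != -1;
}

// Hypothetical stand-in for a read_restart()-style loop: collect all data
// currently buffered on fd and return once read() fails with EAGAIN.
// On a blocking descriptor the final read() would never return EAGAIN;
// it would sleep until more data arrives, which is the reported hang.
static std::string drain(int fd) {
    std::string out;
    char buf[4096];
    for (;;) {
        ssize_t n = read(fd, buf, sizeof buf);
        if (n > 0) {
            out.append(buf, static_cast<size_t>(n));
            continue;
        }
        if (n == 0) break;             // peer closed the connection
        if (errno == EINTR) continue;  // interrupted by a signal, retry
        break;                         // EAGAIN/EWOULDBLOCK or a real error
    }
    return out;
}

// Sends a short message over a local socket pair and drains it on the
// other end after switching that end to non-blocking mode.
// Returns 0 if the drained data matches what was sent.
int demo() {
    int sv[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0) return -1;

    const std::string msg = "cluster-status";
    bool ok = write(sv[0], msg.data(), msg.size()) ==
              static_cast<ssize_t>(msg.size());
    ok = ok && set_nonblocking(sv[1]);  // without this, drain() would block

    std::string got = ok ? drain(sv[1]) : std::string();
    close(sv[0]);
    close(sv[1]);
    return (ok && got == msg) ? 0 : 1;
}
```

With the set_nonblocking() call removed, drain() reads the 14 buffered bytes and then blocks in the next read(), because the writing end is still open and no EAGAIN is ever reported.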

Version-Release number of selected component (if applicable):
0.10.0-5.el5 contains the defect but it is masked by bug 441947; rebuilding the
package to avoid the dlopen problem or using a later package (e.g. 0.12.0-7.el5)
allows the bug to be triggered.

How reproducible:
100%

Steps to Reproduce:
1. Configure a cluster with snmpd enabled on the nodes
2. Enable cluster-snmp
3. Try to access a REDHAT-CLUSTER-MIB MIB, e.g. REDHAT-CLUSTER-MIB::rhcMIBVersion.0
  
Actual results:
$ cat /etc/snmp/snmpd.conf
dlmod RedHatCluster     /usr/lib/cluster-snmp/libClusterMonitorSnmp.so
rocommunity public 127.0.0.1
$ snmpwalk -v2c -c public localhost
[tons of output, works fine but doesn't show REDHAT-CLUSTER-MIB::RedHatCluster]
$ snmpwalk -v2c -c public localhost REDHAT-CLUSTER-MIB::RedHatCluster
REDHAT-CLUSTER-MIB::rhcMIBVersion.0 = INTEGER: 1
Timeout: No Response from localhost
$ snmpwalk -v2c -c public localhost
Timeout: No Response from localhost

After this snmpd can only be interrupted by SIGKILL.

Expected results:
The MIB is output correctly and snmpd does not hang.

Additional info:
Analysis & proposed patch from Adrien Kunysz

Comment 1 Bryn M. Reeves 2008-07-01 14:41:13 UTC
Created attachment 310677 [details]
Set sock.nonblocking(true) in ClusterMonitor::get_cluster()

Comment 2 RHEL Program Management 2008-07-01 15:27:11 UTC
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.

Comment 3 Ryan McCabe 2008-07-03 14:56:29 UTC
Thanks for the patch. Applied to the current CVS trees.

Comment 5 Brian Brock 2008-12-17 00:03:22 UTC
verified, snmp-walk'ed without error or hang

Comment 7 errata-xmlrpc 2009-01-20 20:51:36 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2009-0086.html

