Bug 845243

Summary:	[RFE] cluster-snmp: make unmet requirements for proper function (e.g., modclusterd not running) more explicit
Product:	Red Hat Enterprise Linux 6	Reporter:	Vadim Grinco <vgrinco>
Component:	clustermon	Assignee:	Jan Pokorný [poki] <jpokorny>
Status:	CLOSED DEFERRED	QA Contact:	cluster-qe <cluster-qe>
Severity:	low	Docs Contact:
Priority:	low
Version:	6.3	CC:	cfeist, cluster-maint, fdinitto, jparsons, jpokorny, rmccabe, rsteiger
Target Milestone:	rc	Keywords:	FutureFeature, Reopened
Target Release:	---
Hardware:	Unspecified
OS:	Unspecified
Whiteboard:
Fixed In Version:		Doc Type:	Enhancement
Doc Text:		Story Points:	---
Clone Of:		Environment:
Last Closed:	2016-08-24 21:18:36 UTC	Type:	Bug

Description Vadim Grinco 2012-08-02 12:47:52 UTC

Description of problem:
I can't use libClusterMonitorSnmp.so module. Here's what snmpd reports when started with -H flag:
duplicate registration (rhcMIBVersion, rhcMIBVersion)
duplicate registration (rhcClusterFailedServicesNum, rhcClusterFailedServicesNum)
duplicate registration (rhcClusterFailedServicesNames, rhcClusterFailedServicesNames)
duplicate registration (rhcClusterStatusDesc, rhcClusterStatusDesc)
duplicate registration (rhcClusterVotes, rhcClusterVotes)
duplicate registration (rhcClusterQuorate, rhcClusterQuorate)
duplicate registration (rhcClusterStoppedServicesNum, rhcClusterStoppedServicesNum)
duplicate registration (rhcClusterStoppedServicesNames, rhcClusterStoppedServicesNames)
duplicate registration (rhcClusterAvailNodesNum, rhcClusterAvailNodesNum)
duplicate registration (rhcClusterAvailNodesNames, rhcClusterAvailNodesNames)
duplicate registration (rhcClusterServicesNum, rhcClusterServicesNum)
duplicate registration (rhcClusterServicesNames, rhcClusterServicesNames)
duplicate registration (rhcClusterName, rhcClusterName)
duplicate registration (rhcClusterConfigVersion, rhcClusterConfigVersion)
duplicate registration (rhcClusterStatusCode, rhcClusterStatusCode)
duplicate registration (rhcClusterUnavailNodesNum, rhcClusterUnavailNodesNum)
duplicate registration (rhcClusterUnavailNodesNames, rhcClusterUnavailNodesNames)
duplicate registration (rhcClusterNodesNum, rhcClusterNodesNum)
duplicate registration (rhcClusterNodesNames, rhcClusterNodesNames)
duplicate registration (rhcClusterRunningServicesNum, rhcClusterRunningServicesNum)
duplicate registration (rhcClusterRunningServicesNames, rhcClusterRunningServicesNames)
duplicate registration (rhcClusterVotesNeededForQuorum, rhcClusterVotesNeededForQuorum)
duplicate registration (rhcNodesTable, rhcNodesTable)
duplicate registration (rhcServicesTable, rhcServicesTable)

Version-Release number of selected component (if applicable):
cluster-snmp-0.16.2-18.el6.x86_64
net-snmp-5.5-41.el6.x86_64

How reproducible:
Always, tried on many hosts.

Steps to Reproduce:
1. yum install cluster-snmp on a working cluster
2. add "dlmod RedHatCluster	/usr/lib64/cluster-snmp/libClusterMonitorSnmp.so" to the top of snmpd.conf
3. snmpd -f -c /etc/snmp/snmpd.conf -H
  
Actual results:
# snmpwalk -v 2c -c engsys localhost 1.3.6.1.4.1.2312.8
SNMPv2-SMI::enterprises.2312.8.1.1.0 = INTEGER: 2


Expected results:
Be able to monitor all cluster components.

Additional info:
cluster.conf available on demand.

Comment 1 Jan Pokorný [poki] 2012-08-02 13:54:58 UTC

Admittedly, the "Red Hat Cluster Suite" product in Bugzilla is tempting,
but it is no longer in use (it no longer has a standalone position).

As per the packages, flipping to RHEL 6 -- clustermon.

Purely preemptively, could you attach that cluster.conf?

Comment 6 Jan Pokorný [poki] 2012-08-13 14:11:11 UTC

Vadim,

the issue you encountered is caused by the way snmpd
handles configuration files -- it does *not* try to eliminate
duplicate configuration files found during its initialization.


You can use "-DALL" to find that in the "snmpd -f -c /etc/snmp/snmpd.conf -H"
case, you actually let snmpd process that config file twice:

1. first using built-in configuration paths
   (/etc/snmp, /usr/share/snmp, /usr/lib64/snmp, ...)

2. reading optional config file (via "-c"): "/etc/snmp/snmpd.conf"

That is where the duplication arises.


As per snmpd(8), what you need here is "-C" option:

> Do not read any configuration files except the ones optionally
> specified by the -c option.

Voila, "snmpd -f -C -c /etc/snmp/snmpd.conf -H" works correctly (for me).
Alternatively, you can simply drop "-c /etc/snmp/snmpd.conf", as that file
is read implicitly.
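
To illustrate the double read described above, here is a minimal sketch of a
check that tells whether a file passed via "-c" already lives in one of the
built-in configuration directories. The function name conf_read_twice is
hypothetical, and the directory list passed to it is only the abbreviated one
from this comment; the real search path is longer and can be seen in the
"-DALL" output.

```shell
# Sketch: succeed if the config file given via "-c" would also be found
# by snmpd's implicit search, i.e. would be read twice without "-C".
# Usage: conf_read_twice FILE DIR...
# Assumption: this sketch only considers the file name snmpd.conf and
# only the directories passed in by the caller.
conf_read_twice() {
    conf=$1
    shift
    [ "$(basename "$conf")" = "snmpd.conf" ] || return 1
    dir=$(dirname "$conf")
    for builtin in "$@"; do
        # A match means the same file is reachable implicitly as well.
        [ "$dir" = "$builtin" ] && return 0
    done
    return 1
}
```

For example, "conf_read_twice /etc/snmp/snmpd.conf /etc/snmp /usr/share/snmp
/usr/lib64/snmp" succeeds, which is the situation from this report: either
add "-C" or drop the "-c" argument.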


Closing as NOTABUG.
You may want to open a bug with net-snmp to prevent reusing the same
configuration file, but I don't think it can be clearly qualified
as a bug.

Comment 7 Vadim Grinco 2012-08-13 14:14:39 UTC

Jan, you're right about the -C option, and I am sorry for the confusion. However, although the config file is parsed correctly and everything seems fine, try
`snmpwalk -v 2c -c engsys localhost 1.3.6.1.4.1.2312.8'

You won't get any details about the cluster other than:
SNMPv2-SMI::enterprises.2312.8.1.1.0 = INTEGER: 2

Comment 8 Jan Pokorný [poki] 2012-08-14 19:33:35 UTC

Ah, I see, you expected the whole tree of cluster-related values.


First, make sure you have (in addition to the mentioned dlmod directive)
also this line in /etc/snmp/snmpd.conf (or whatever config you use):

        view    systemview    included   REDHAT-CLUSTER-MIB::redhatCluster

With the cluster-snmp package installed, you can also refer to the
/usr/share/doc/cluster-snmp-*/README.snmp file (please note
the case-sensitivity; the correct casing in RHEL 6, with the exception
of the dlmod directive, is "redhatCluster" [*]).


The second thing to check (in case of enforcing SELinux) is the correct
context of /etc/cluster/cluster.conf, as otherwise modclusterd
(backing cluster-snmp) is in trouble.  If the context does not look
like "unconfined_u:object_r:cluster_conf_t:s", apply restorecon.
Similarly, an input rule accepting TCP port 16851 is needed
(amongst other ports needed for proper function of the cluster)
if you have a firewall enabled on the cluster nodes.
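
The SELinux check above could be scripted roughly as follows. This is only a
sketch: the function name has_cluster_conf_type is hypothetical, and it
compares just the type field of the context, ignoring user, role and level.

```shell
# Sketch: succeed if an SELinux context string carries the
# cluster_conf_t type.  Context fields are user:role:type:level,
# so the type is the third colon-separated field.
has_cluster_conf_type() {
    ctx_type=$(printf '%s' "$1" | cut -d: -f3)
    [ "$ctx_type" = "cluster_conf_t" ]
}
```

One possible use, assuming GNU stat (whose %C format prints the SELinux
context): 'has_cluster_conf_type "$(stat -c %C /etc/cluster/cluster.conf)"
|| restorecon -v /etc/cluster/cluster.conf'.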

You can check that everything is OK in the backend by:

# echo \
  '<request API_version="1.0"><function_call name="status"/></request>' \
  | /usr/libexec/modcluster

This should return an XML chunk containing

        <var mutable="false" name="success" type="boolean" value="true"/>
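
If you want to script this backend check, a minimal sketch follows. The
function name modcluster_reply_ok is hypothetical, and it is a plain text
match that assumes the attribute order shown in the line above, not real XML
parsing.

```shell
# Sketch: read a modcluster reply on stdin and succeed only if it
# contains the "success" variable with value "true".
# Assumption: attributes appear in the same order as in the sample line.
modcluster_reply_ok() {
    grep -q 'name="success" type="boolean" value="true"'
}
```

It can be appended to the pipeline above, e.g.
"... | /usr/libexec/modcluster | modcluster_reply_ok && echo backend OK".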


Now, everything should work, e.g.:

$ snmpwalk -v 2c -c public rhel63-64kvm-2 1.3.6.1.4.1.2312.8
SNMPv2-SMI::enterprises.2312.8.1.1.0 = INTEGER: 2
SNMPv2-SMI::enterprises.2312.8.2.1.0 = STRING: "rhel63-64kvm"
SNMPv2-SMI::enterprises.2312.8.2.2.0 = INTEGER: 1
SNMPv2-SMI::enterprises.2312.8.2.3.0 = STRING:
                                       "All services and nodes functional"
[...]

With a more convenient representation (the first lines apply only to an
external machine):

$ mkdir -p ~/.snmp/mibs
$ pushd ~/.snmp/mibs
$ GITBASE="http://git.fedorahosted.org/cgit/conga.git/plain"
$ curl "${GITBASE}/ricci/modules/cluster/clumon/REDHAT{,-CLUSTER}-MIB" -O
$ popd
$ snmpwalk -v 2c -c public rhel63-64kvm-2 REDHAT-CLUSTER-MIB::redhatCluster
REDHAT-CLUSTER-MIB::rhcMIBVersion.0 = INTEGER: 2
REDHAT-CLUSTER-MIB::rhcClusterName.0 = STRING: "rhel63-64kvm"
REDHAT-CLUSTER-MIB::rhcClusterStatusCode.0 = INTEGER: 1
REDHAT-CLUSTER-MIB::rhcClusterStatusDesc.0 = STRING:
                                        "All services and nodes functional"
[...]


Please let me know if this bug can be closed.


[*] there is currently one occurrence with bad casing, which I fixed
    upstream:
http://git.fedorahosted.org/cgit/conga.git/commit/?id=af60fef8f05e50c5f31f0e8abddcab9e17b1bed3&ss=1#n45

Comment 9 Vadim Grinco 2012-08-15 10:37:57 UTC

Hi Jan,

Seems it was a net-snmp problem. With exactly same version of cluster-snmp and same config file but net-snmp upgraded to net-snmp-5.5-41.el6_3.1.x86_64 it started working just fine.

Thank you.

Comment 10 Vadim Grinco 2012-08-15 10:56:39 UTC

Actually I was wrong, sorry for the confusion. Here's what I did:

# snmpwalk -v 2c -c engsys localhost 1.3.6.1.4.1.2312.8
SNMPv2-SMI::enterprises.2312.8.1.1.0 = INTEGER: 2
# echo   '<request API_version="1.0"><function_call name="status"/></request>'   | /usr/libexec/modcluster
<snip>
there was some result
</snip>

# snmpwalk -v 2c -c engsys localhost 1.3.6.1.4.1.2312.8
<snip>
could see all the values
</snip>

# getenforce 
Disabled

So it's apparently something with modclusterd or the system setup.

Comment 11 Vadim Grinco 2012-08-15 10:59:07 UTC

Ok, modclusterd was not running, and modcluster started it when I ran it manually. 

So it's definitely not a bug, but should probably be mentioned in the docs:

# grep -i modcluster /usr/share/doc/cluster-snmp-0.16.2/README* | wc -l
0

Comment 12 Jan Pokorný [poki] 2012-08-16 13:27:24 UTC

You are right, a running modclusterd is implicitly (and, from the user's
point of view, maybe even secretly) presumed.

While having the modcluster helper start modclusterd is acceptable from
a higher-level view (it is triggered indirectly, but in a secured/authorized
fashion), the inherent insecurity of SNMP is a major blocker to doing the
same automagically upon an SNMP GET.

Rather, the intended fix (reopening the bug) is:

a. mention this fact in README.snmpd as suggested

b. make this obstacle noticeable in snmpd logs [*]

[*] currently with stopped modclusterd:
http://git.fedorahosted.org/cgit/conga.git/commit/?h=bz845243&id=48ced0c

(client-side)
$ snmpwalk -v 2c -c public rhel63-64kvm-2 REDHAT-CLUSTER-MIB::redhatCluster
REDHAT-CLUSTER-MIB::rhcMIBVersion.0 = INTEGER: 2
Error in packet.
Reason: (genError) A general failure occured

(server-side)
# tail -n1 /var/log/messages
Aug 16 15:10:06 rhel63-64kvm-2 snmpd[3795]: cluster-snmp: request cannot be resolved, is modclusterd running?
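
Until such a fix is available, the symptom can also be spotted from the
client side. The following is a rough sketch, not part of the patch: the
function name classify_walk is hypothetical, and the rule (at most one
returned value means the backend is not answering) is an assumption based
only on the outputs shown in this report.

```shell
# Sketch: read snmpwalk output for the redhatCluster subtree on stdin.
# Prints "ok" if more than one value came back, "backend-down" if only
# rhcMIBVersion (or nothing) was returned.
# Assumption: every returned value appears on a line containing " = ".
classify_walk() {
    # grep -c exits non-zero on zero matches; "|| true" keeps the count.
    count=$(grep -c ' = ' || true)
    if [ "$count" -gt 1 ]; then
        echo ok
    else
        echo backend-down
    fi
}
```

For example: "snmpwalk -v 2c -c public rhel63-64kvm-2 1.3.6.1.4.1.2312.8 2>&1
| classify_walk". On "backend-down", check on the node whether modclusterd
is running.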

Comment 14 Jan Pokorný [poki] 2014-06-23 15:07:57 UTC

Pushed to 6.7 release consideration.

Comment 17 Jan Pokorný [poki] 2016-08-24 21:18:36 UTC

Known issue, can/should be reflected in KCS.