Bug 474297

Summary: [RFE] (with patch) cluster network connectivity checker
Product: Red Hat Enterprise Linux 5 Reporter: David Ash <dash>
Component: cman Assignee: Christine Caulfield <ccaulfie>
Status: CLOSED WONTFIX QA Contact: Cluster QE <mspqa-list>
Severity: medium Docs Contact:
Priority: low    
Version: 5.4 CC: cluster-maint, iannis
Target Milestone: rc   
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2011-01-25 19:33:24 UTC Type: ---
Attachments:
Description Flags
cluster network connectivity checker python script none

Description David Ash 2008-12-03 07:33:18 UTC
Created attachment 325493
cluster network connectivity checker python script

Description of problem:

This is an RFE for a simple script that performs a basic sanity check of network connectivity once all application cluster services are up and running on every node in the cluster.

I wrote an example script in Python (I originally wrote it in bash, but the XML parsing was too ugly).


Version-Release number of selected component (if applicable):

Written against and tested on various RHEL 5 machines running Cluster Suite.


How reproducible:

After starting all appropriate cluster services, run clunetstat.py.


Steps to Reproduce: (on all nodes)
1. # service cman start
2. # service rgmanager start
3. # clunetstat.py
  
Actual results:


Expected results:


Additional info:

This Python script depends on netcat (nc). I don't think we should add netcat as a dependency of the rpm, as I don't like adding dependencies that aren't strictly necessary; I would rather leave it as a loose dependency. The script simply exits with an error if netcat isn't installed, which I see as good enough.
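
The "loose dependency" behaviour described above could look roughly like the following. This is only a minimal sketch, not the attached clunetstat.py: the helper names find_tool and require_tool and the search_path parameter are hypothetical, and it relies on shutil.which, which needs Python 3.3 or later.

```python
import shutil
import sys


def find_tool(name, search_path=None):
    """Return the full path of an executable, or None if it is not found.

    search_path is a PATH-style string; None means use the PATH
    from the environment.
    """
    return shutil.which(name, path=search_path)


def require_tool(name, search_path=None):
    """Return the tool's path, or exit with status 1 if it is missing."""
    path = find_tool(name, search_path)
    if path is None:
        sys.stderr.write("error: '%s' is required but was not found in PATH\n" % name)
        sys.exit(1)
    return path
```

A script would call require_tool("nc") once at startup, before doing any checks, so that a missing netcat is reported clearly rather than failing halfway through.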

Comment 1 Christine Caulfield 2008-12-10 13:45:36 UTC
I'm not sure how much use this is as it stands. It checks that everything is running - but if it's running then you already know that. It also assumes you are running all the subsystems, which isn't always true.

A far more useful variant would check the network ports BEFORE the cluster software is started. That could be used to diagnose potential problems, which I think are the more common case.
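
One way such a pre-start check could be sketched is below. It is an illustration under assumptions, not part of the attached script: probe_tcp_port and its result labels are hypothetical names, and it covers TCP only. Before the cluster software is running nothing should be listening, so "closed" (connection refused) suggests the node is reachable on that port, while "filtered" (no answer within the timeout) hints at a firewall dropping packets. A firewall that rejects rather than drops would also show up as "closed", so the result is a hint, not proof.

```python
import socket


def probe_tcp_port(host, port, timeout=2.0):
    """Classify a TCP port as 'open', 'closed', 'filtered' or 'error'."""
    try:
        conn = socket.create_connection((host, port), timeout=timeout)
    except ConnectionRefusedError:
        # The host answered with a reset: reachable, nothing listening.
        return "closed"
    except socket.timeout:
        # No answer at all within the timeout: possibly dropped by a firewall.
        return "filtered"
    except OSError:
        # Name resolution failure, no route to host, and similar errors.
        return "error"
    conn.close()
    return "open"
```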

Also, I'm less than happy about random applications poking into sockets (e.g. the DLM's) that they have no business poking around in. If you want to check that a port is open, there are neater and less intrusive ways of doing it (see nmap, for instance).

On top of that, a lot of things are hard-coded in there that can actually be changed, e.g. the port number that OpenAIS runs on, the port number that the DLM runs on (the DLM can also use SCTP, though we don't support it), and the location of cluster.conf.
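
A sketch of how those values could be made overridable rather than hard-coded follows. Everything here is an assumption for illustration: the option names are hypothetical, and the defaults (/etc/cluster/cluster.conf, 5405 for OpenAIS, 21064 for the DLM) are assumed typical values that a real tool should confirm against the running configuration. The node list is read from the clusternode elements of whichever cluster.conf is given.

```python
import argparse
import xml.etree.ElementTree as ET


def parse_args(argv=None):
    """Parse command-line overrides; all defaults are assumptions."""
    parser = argparse.ArgumentParser(description="cluster connectivity check (sketch)")
    parser.add_argument("--conf", default="/etc/cluster/cluster.conf",
                        help="location of cluster.conf (assumed default)")
    parser.add_argument("--openais-port", type=int, default=5405,
                        help="OpenAIS port (assumed default)")
    parser.add_argument("--dlm-port", type=int, default=21064,
                        help="DLM port (assumed default)")
    return parser.parse_args(argv)


def cluster_node_names(conf_path):
    """Return the name attribute of every clusternode element in the file."""
    root = ET.parse(conf_path).getroot()
    return [node.get("name") for node in root.iter("clusternode")
            if node.get("name")]
```

For example, parse_args(["--conf", "/tmp/test.conf", "--dlm-port", "30000"]) would point the check at a test configuration and a non-default DLM port without editing the script.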

Comment 2 David Ash 2008-12-10 21:57:40 UTC
I thought it might pick up possible issues like split brain.  I might rewrite this with nmap then.