Bug 171622

Summary: DLM won't start if a node changes its nodeid
Product: [Retired] Red Hat Cluster Suite
Reporter: Christine Caulfield <ccaulfie>
Component: dlm
Assignee: Christine Caulfield <ccaulfie>
Status: CLOSED DUPLICATE
QA Contact: Cluster QE <mspqa-list>
Severity: low
Priority: medium
Version: 4
CC: cluster-maint
Hardware: All   
OS: Linux   
Doc Type: Bug Fix
Last Closed: 2005-10-24 15:29:49 UTC

Description Christine Caulfield 2005-10-24 14:59:08 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-GB; rv:1.7.12) Gecko/20050922 Fedora/1.0.7-1.1.fc4 Firefox/1.0.7

Description of problem:
The DLM doesn't clear its nodeid when it is shut down.

If the cluster is shut down and restarted, there is a strong possibility that nodes will have different nodeids than before. In this case the cluster will be running, but all attempts to use the DLM will fail with one of the following pairs of messages:

dlm: Can't bind to port 21064
dlm: cannot start lowcomms -98
or
dlm: cannot initialise comms layer
dlm: cannot start lowcomms -107

Although this sounds quite nasty, I doubt it will happen very often. It requires that all nodes in a cluster be removed from that cluster but *not* shut down, and that the DLM module is *not* removed in between.

Version-Release number of selected component (if applicable):


How reproducible:
Always

Steps to Reproduce:
1. Join nodes to a cluster
2. Run "cman_tool leave" on all nodes
3. Run "cman_tool join" in a different order than before, or change nodeids using
   "cman_tool join -N<n>"
4. Attempt to use the DLM (e.g. start clvmd)

Of course this only happens if static nodeids are not specified in CCS.
  

Actual Results:  If the new cluster uses the same set of nodeids as the old cluster, you will see:

dlm: Can't bind to port 21064
dlm: cannot start lowcomms -98

If the node's old nodeid is not part of the new cluster, then you will see:

dlm: cannot initialise comms layer
dlm: cannot start lowcomms -107
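
For reference, the trailing numbers in the "cannot start lowcomms" lines appear to be negated Linux errno values. A minimal Python sketch for decoding them, assuming the usual Linux numbering (98 = EADDRINUSE, 107 = ENOTCONN); the helper name decode_lowcomms_error is made up for illustration and is not part of any cluster tool:

```python
import re

# Assumption: generic Linux errno numbering; only the two values
# seen in this report are listed.
LINUX_ERRNO = {98: "EADDRINUSE", 107: "ENOTCONN"}

def decode_lowcomms_error(line):
    """Return the symbolic errno for a 'cannot start lowcomms -N' line.

    Returns None if the line is not such a message, and "unknown"
    if the number is not in the small table above.
    """
    match = re.search(r"cannot start lowcomms -(\d+)", line)
    if match is None:
        return None
    return LINUX_ERRNO.get(int(match.group(1)), "unknown")

for msg in ("dlm: cannot start lowcomms -98",
            "dlm: cannot start lowcomms -107"):
    print(msg, "->", decode_lowcomms_error(msg))
```

Read this way, -98 matches the "Can't bind to port 21064" line (address already in use), and -107 means a transport endpoint is not connected.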


Expected Results:  flawless operation :)


Additional info:

A fix for this is in the STABLE branch of CVS.

The workaround is to always remove the DLM module when removing a node from the cluster, if the node itself is not going to be shut down.
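
To check whether the workaround is needed on a node, it may help to see if the module is still loaded after leaving the cluster. A rough Python sketch, assuming the module is listed under the name "dlm" in /proc/modules (an assumption, not confirmed in this report); module_loaded is a hypothetical helper:

```python
def module_loaded(proc_modules_text, name="dlm"):
    """Return True if `name` is listed as a module in /proc/modules-style text.

    The first whitespace-separated field of each line is the module name;
    the match is exact, so "lock_dlm" does not count as "dlm".
    """
    for line in proc_modules_text.splitlines():
        fields = line.split()
        if fields and fields[0] == name:
            return True
    return False
```

On a real node this could be called as module_loaded(open("/proc/modules").read()). If it returns True after "cman_tool leave", the module would still need to be removed before rejoining.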

Comment 1 Corey Marthaler 2005-10-24 15:09:33 UTC
This looks similar to 171211.

Comment 2 Christine Caulfield 2005-10-24 15:29:49 UTC
Damn, I don't receive dlm bugs, so I didn't know this had been submitted :(

*** This bug has been marked as a duplicate of 171211 ***