Bug 171622 - DLM won't start if a node changes its nodeid
Status: CLOSED DUPLICATE of bug 171211
Product: Red Hat Cluster Suite
Classification: Red Hat
Component: dlm
Version: 4
Hardware: All Linux
Priority: medium
Severity: low
Assigned To: Christine Caulfield
QA Contact: Cluster QE
Reported: 2005-10-24 10:59 EDT by Christine Caulfield
Modified: 2009-04-16 16:00 EDT

Doc Type: Bug Fix
Last Closed: 2005-10-24 11:29:49 EDT
Description Christine Caulfield 2005-10-24 10:59:08 EDT
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-GB; rv:1.7.12) Gecko/20050922 Fedora/1.0.7-1.1.fc4 Firefox/1.0.7

Description of problem:
The DLM doesn't clear its nodeid when it is shut down.

If the cluster is shut down and restarted, there is a strong possibility that nodes will have different nodeids than before. In this case the cluster will be running, but all attempts to use the DLM will result in one of the following pairs of messages:

dlm: Can't bind to port 21064
dlm: cannot start lowcomms -98
or
dlm: cannot initialise comms layer
dlm: cannot start lowcomms -107

Although this sounds quite nasty, I doubt it will happen very often. It requires that all nodes in a cluster be removed from that cluster but *not* shut down, and that the DLM module is *not* removed.

Version-Release number of selected component (if applicable):


How reproducible:
Always

Steps to Reproduce:
1. Join nodes to a cluster
2. cman_tool leave on all nodes
3. Run "cman_tool join" in a different order than before, or change nodeids using
   "cman_tool join -N<n>"
4. Attempt to use the DLM (e.g. start clvmd)

Of course this only happens if static nodeids are not specified in CCS.
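
The steps above can be written out as a per-node command plan. This is a minimal Python sketch that only builds the command lines and does not execute anything. The helper name repro_plan is hypothetical, and running clvmd with no arguments is an assumption standing in for "attempt to use the DLM".

```python
def repro_plan(new_nodeid=None):
    """Return the commands for steps 2-4 on one node, as argv lists.

    Nothing is executed here; the commands are meant to be run by hand
    on a test cluster.
    """
    join = ["cman_tool", "join"]
    if new_nodeid is not None:
        # Step 3 variant: force a different nodeid with -N<n>.
        join.append(f"-N{new_nodeid}")
    return [
        ["cman_tool", "leave"],  # step 2: leave, node stays up, DLM module stays loaded
        join,                    # step 3: rejoin, possibly with a different nodeid
        ["clvmd"],               # step 4: any DLM user; clvmd is the example given
    ]
```

Calling repro_plan() without an argument corresponds to rejoining in a different order and letting the nodeid be assigned automatically.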
  

Actual Results:  If the new cluster has the same nodeids as the old cluster, you will see:

dlm: Can't bind to port 21064
dlm: cannot start lowcomms -98

If the node's old ID is not part of the new cluster then you will see:

dlm: cannot initialise comms layer
dlm: cannot start lowcomms -107
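
The numbers at the end of the "cannot start lowcomms" lines look like negated Linux errno values, which is the usual kernel convention; the report itself does not say so, so treat that as an assumption. Under it, -98 is EADDRINUSE (consistent with the failed bind to port 21064) and -107 is ENOTCONN. A minimal Python sketch for decoding such a line, with a hypothetical helper name and a table limited to the two values seen here:

```python
import re

# Linux errno values for the two codes that appear in this report.
LINUX_ERRNO = {
    98: "EADDRINUSE (Address already in use)",
    107: "ENOTCONN (Transport endpoint is not connected)",
}

_LOWCOMMS = re.compile(r"dlm: cannot start lowcomms (-?\d+)")


def decode_lowcomms_error(line):
    """Describe the errno in a 'cannot start lowcomms' log line.

    Returns None if the line is not such a message. Assumes the number
    is a negated Linux errno.
    """
    match = _LOWCOMMS.search(line)
    if match is None:
        return None
    code = abs(int(match.group(1)))
    return LINUX_ERRNO.get(code, f"unknown errno {code}")
```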


Expected Results:  flawless operation :)


Additional info:

A fix for this is in the STABLE branch of CVS.

The workaround is to always remove the DLM module when removing a node from the cluster without shutting that node down.
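
The workaround can be sketched as a check followed by a command plan. This Python sketch assumes the kernel module is named "dlm" and that "modprobe -r" is used to unload it; neither is stated in the report, and the unload will fail if something still holds the module. The helper names are hypothetical, and nothing is executed.

```python
def module_loaded(name, proc_modules_text):
    """Return True if `name` is listed as a loaded module.

    `proc_modules_text` is the content of /proc/modules, where the
    first whitespace-separated field of each line is the module name.
    """
    for line in proc_modules_text.splitlines():
        fields = line.split()
        if fields and fields[0] == name:
            return True
    return False


def leave_plan(proc_modules_text, module="dlm"):
    """Build the commands for leaving the cluster without a reboot."""
    plan = [["cman_tool", "leave"]]
    if module_loaded(module, proc_modules_text):
        # Unload the module so it does not keep the old nodeid.
        plan.append(["modprobe", "-r", module])
    return plan
```

On a live node the input would come from reading /proc/modules before leaving the cluster.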
Comment 1 Corey Marthaler 2005-10-24 11:09:33 EDT
This looks similar to 171211.
Comment 2 Christine Caulfield 2005-10-24 11:29:49 EDT
Damn, I don't receive dlm bugs so I didn't know this had been submitted :(

*** This bug has been marked as a duplicate of 171211 ***
