Bug 218719 - dlm: connect from non cluster node
Status: CLOSED NOTABUG
Product: Red Hat Cluster Suite
Classification: Red Hat
Component: dlm
Version: 4
Hardware: All   OS: Linux
Priority: medium   Severity: medium
Assigned To: Christine Caulfield
QA Contact: Cluster QE
Reported: 2006-12-06 18:37 EST by Jonathan Earl Brassow
Modified: 2009-04-16 16:01 EDT

Doc Type: Bug Fix
Last Closed: 2006-12-07 09:53:13 EST

Attachments: None
Description Jonathan Earl Brassow 2006-12-06 18:37:49 EST
I run across this every couple of times I attempt to start 'clvmd' on all
machines at once.

I have 3 x86 machines.  When they come up, I issue a 'clvmd' command on all of
them at once.  One succeeds; the other two hang.  (In fact, I issue the command
by hand rather than running the init scripts because I repeatedly hit this bug
in the past when using the init scripts.)

On neo-04, the command hangs and reports the following to the console:
dlm: connect from non cluster node

Other state includes:
[root@neo-04 ~]# cat /proc/cluster/services
Service          Name                              GID LID State     Code
Fence Domain:    "default"                           5   2 run       -
[4 6 5]

DLM Lock Space:  "clvmd"                            10   3 join      S-6,20,2
[4 5]

[root@neo-04 ~]# cat /proc/cluster/nodes
Node  Votes Exp Sts  Name
   4    1    3   M   neo-04
   5    1    3   M   neo-05
   6    1    3   M   neo-06


On neo-05, the command succeeds and reports the following on the console:
dlm: connect from non cluster node

Other state includes:
[root@neo-05 ~]# cat /proc/cluster/services
Service          Name                              GID LID State     Code
Fence Domain:    "default"                           5   2 run       -
[4 5 6]

DLM Lock Space:  "clvmd"                            10   3 update    U-4,1,4
[5 4]

[root@neo-05 ~]# cat /proc/cluster/nodes
Node  Votes Exp Sts  Name
   4    1    3   M   neo-04
   5    1    3   M   neo-05
   6    1    3   M   neo-06


On neo-06, the command hangs.
[root@neo-06 ~]# cat /proc/cluster/services
Service          Name                              GID LID State     Code
Fence Domain:    "default"                           5   2 run       -
[4 6 5]

DLM Lock Space:  "clvmd"                             0   3 join      S-1,80,3
[]

[root@neo-06 ~]# cat /proc/cluster/nodes
Node  Votes Exp Sts  Name
   4    1    3   M   neo-04
   5    1    3   M   neo-05
   6    1    3   M   neo-06
Comment 1 Christine Caulfield 2006-12-07 03:59:53 EST
The usual cause of this is odd IP settings. Two interfaces on the same
Ethernet segment will often trigger it: Linux sends messages out of the
"wrong" interface, and the DLM sees a source address that it doesn't
recognize as belonging to any cluster node.

Check the ifconfig & route output on the nodes to make sure they are sane.
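
To see which source address the kernel is actually picking, a diagnostic
along these lines can help. This is a hypothetical userspace sketch, not
from the bug or the DLM source; the peer address and port are whatever
your cluster uses, and getsockname() reports the local address the kernel
chose for the connection:

/*
 * Hypothetical diagnostic (not part of this bug or the DLM source):
 * connect to a peer and print which source address the kernel chose.
 * If it is not the cluster-facing address, the remote DLM will log
 * "dlm: connect from non cluster node".
 */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>

int main(int argc, char **argv)
{
    if (argc != 3) {
        fprintf(stderr, "usage: %s <peer-ip> <port>\n", argv[0]);
        return 1;
    }

    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0) {
        perror("socket");
        return 1;
    }

    struct sockaddr_in peer;
    memset(&peer, 0, sizeof(peer));
    peer.sin_family = AF_INET;
    peer.sin_port = htons((unsigned short)atoi(argv[2]));
    if (inet_pton(AF_INET, argv[1], &peer.sin_addr) != 1) {
        fprintf(stderr, "bad address: %s\n", argv[1]);
        return 1;
    }

    if (connect(fd, (struct sockaddr *)&peer, sizeof(peer)) < 0) {
        perror("connect");
        return 1;
    }

    /* Ask the kernel which local address it routed this socket out of. */
    struct sockaddr_in local;
    socklen_t len = sizeof(local);
    if (getsockname(fd, (struct sockaddr *)&local, &len) < 0) {
        perror("getsockname");
        return 1;
    }

    char buf[INET_ADDRSTRLEN];
    printf("source address chosen by kernel: %s\n",
           inet_ntop(AF_INET, &local.sin_addr, buf, sizeof(buf)));
    close(fd);
    return 0;
}

Run from each node against the others: if the printed address differs
from the address the cluster knows for that node, that would explain the
rejection.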
Comment 2 Jonathan Earl Brassow 2006-12-07 09:53:13 EST
You're right, they didn't make any sense at all.  I guess I was just trusting
DHCP...  Closing bug.  I'll reopen if I come across it again.
Comment 3 Lon Hohberger 2007-10-26 11:04:06 EDT
FWIW, bug 338511 is related to this bug; in fact, it's the same bug.  There's
a patch attached to 338511 against dlm-kernel which calls sock->ops->bind()
in lowcomms.c to set the source IP prior to making a connection.

There's probably a reason this wasn't done.
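
For illustration, a userspace analogue of that approach (the actual patch
operates on a kernel struct socket via sock->ops->bind() inside
lowcomms.c; connect_from() and its parameters below are invented for this
sketch): bind the socket to the cluster-facing local address with an
ephemeral port before connecting, so the peer always sees a source IP it
knows.

/*
 * Userspace analogue of the fix, for illustration only: pin the source
 * address by binding to the cluster-facing local IP with an ephemeral
 * port before connect(), so routing cannot substitute the address of
 * another interface.
 */
#include <string.h>
#include <unistd.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>

static int connect_from(const char *local_ip, const struct sockaddr_in *peer)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    struct sockaddr_in local;
    memset(&local, 0, sizeof(local));
    local.sin_family = AF_INET;
    local.sin_port = 0;                  /* let the kernel pick the port */
    if (inet_pton(AF_INET, local_ip, &local.sin_addr) != 1) {
        close(fd);
        return -1;
    }

    /* Binding before connect fixes the source IP; this is the step the
     * patch adds on the kernel side via sock->ops->bind(). */
    if (bind(fd, (struct sockaddr *)&local, sizeof(local)) < 0) {
        close(fd);
        return -1;
    }

    if (connect(fd, (const struct sockaddr *)peer, sizeof(*peer)) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}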
