Bug 218719 - dlm: connect from non cluster node
Summary: dlm: connect from non cluster node
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat Cluster Suite
Classification: Retired
Component: dlm
Version: 4
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Christine Caulfield
QA Contact: Cluster QE
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2006-12-06 23:37 UTC by Jonathan Earl Brassow
Modified: 2009-04-16 20:01 UTC
CC List: 3 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2006-12-07 14:53:13 UTC
Embargoed:



Description Jonathan Earl Brassow 2006-12-06 23:37:49 UTC
I run across this every couple of times I attempt to start 'clvmd' on all
machines at once.

I have 3 x86 machines.  When they come up, I issue the 'clvmd' command on all of
them at once.  One succeeds, and the other two hang.  (In fact, I issue the
command by hand rather than running the init scripts, because I repeatedly hit
this bug in the past when using the init scripts.)

On neo-04, the command hangs and reports the following to the console:
dlm: connect from non cluster node

Other state includes:
[root@neo-04 ~]# cat /proc/cluster/services
Service          Name                              GID LID State     Code
Fence Domain:    "default"                           5   2 run       -
[4 6 5]

DLM Lock Space:  "clvmd"                            10   3 join      S-6,20,2
[4 5]

[root@neo-04 ~]# cat /proc/cluster/nodes
Node  Votes Exp Sts  Name
   4    1    3   M   neo-04
   5    1    3   M   neo-05
   6    1    3   M   neo-06


On neo-05, the command succeeds and reports the following on the console:
dlm: connect from non cluster node

Other state includes:
[root@neo-05 ~]# cat /proc/cluster/services
Service          Name                              GID LID State     Code
Fence Domain:    "default"                           5   2 run       -
[4 5 6]

DLM Lock Space:  "clvmd"                            10   3 update    U-4,1,4
[5 4]

[root@neo-05 ~]# cat /proc/cluster/nodes
Node  Votes Exp Sts  Name
   4    1    3   M   neo-04
   5    1    3   M   neo-05
   6    1    3   M   neo-06


On neo-06, the command hangs.
[root@neo-06 ~]# cat /proc/cluster/services
Service          Name                              GID LID State     Code
Fence Domain:    "default"                           5   2 run       -
[4 6 5]

DLM Lock Space:  "clvmd"                             0   3 join      S-1,80,3
[]

[root@neo-06 ~]# cat /proc/cluster/nodes
Node  Votes Exp Sts  Name
   4    1    3   M   neo-04
   5    1    3   M   neo-05
   6    1    3   M   neo-06

Comment 1 Christine Caulfield 2006-12-07 08:59:53 UTC
The usual cause of this is very odd IP settings.  Often two interfaces on the
same ethernet can cause it, because Linux sends messages out of the "wrong" one
and the DLM sees a source address that it doesn't know about.

Check the ifconfig & route output on the nodes to make sure they are sane.
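
A quick way to check which source address the kernel will actually pick for a
given peer is to connect() an unbound UDP socket to that peer and read the
chosen address back with getsockname(); connecting a UDP socket only performs
the route lookup, it doesn't send anything.  A minimal sketch of that check
(the peer address and port below are placeholders, not taken from this
report):

/* Print the source address the kernel would use to reach a peer.
 * The peer IP and port are hypothetical; substitute a real cluster node. */
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>

int main(void)
{
    struct sockaddr_in peer, local;
    socklen_t len = sizeof(local);
    char buf[INET_ADDRSTRLEN];
    int fd = socket(AF_INET, SOCK_DGRAM, 0);

    memset(&peer, 0, sizeof(peer));
    peer.sin_family = AF_INET;
    peer.sin_port = htons(21064);                       /* illustrative port */
    inet_pton(AF_INET, "192.168.1.5", &peer.sin_addr);  /* hypothetical peer */

    /* connect() on a UDP socket does the route lookup without sending
     * data, so getsockname() now reveals the source address chosen. */
    connect(fd, (struct sockaddr *)&peer, sizeof(peer));
    getsockname(fd, (struct sockaddr *)&local, &len);

    printf("kernel would send from %s\n",
           inet_ntop(AF_INET, &local.sin_addr, buf, sizeof(buf)));
    close(fd);
    return 0;
}

If that prints an address the other nodes don't recognize as belonging to a
cluster member, the DLM will reject the connection exactly as reported above.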

Comment 2 Jonathan Earl Brassow 2006-12-07 14:53:13 UTC
You're right, they didn't make any sense at all.  I guess I was just trusting
DHCP...  Closing bug.  I'll reopen if I come across it again.

Comment 3 Lon Hohberger 2007-10-26 15:04:06 UTC
FWIW 338511 is related to this bug; in fact, it's the same bug.  There's a patch
attached to 338511 against dlm-kernel which calls sock->ops->bind() to set the
source IP prior to making a connection in lowcomms.c

There's probably a reason this wasn't done.
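
For illustration, here is a minimal userspace sketch of that bind-before-connect
idea (this is not the actual dlm-kernel patch, which operates on kernel sockets
in lowcomms.c; the addresses and port below are placeholders):

/* Bind the socket to this node's cluster address before connecting, so
 * the peer always sees the expected source IP regardless of routing.
 * All addresses here are hypothetical. */
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>

int main(void)
{
    struct sockaddr_in src, peer;
    int fd = socket(AF_INET, SOCK_STREAM, 0);

    /* Pin the source address; port 0 lets the kernel pick one. */
    memset(&src, 0, sizeof(src));
    src.sin_family = AF_INET;
    src.sin_port = htons(0);
    inet_pton(AF_INET, "192.168.1.4", &src.sin_addr);   /* hypothetical local IP */
    if (bind(fd, (struct sockaddr *)&src, sizeof(src)) < 0)
        perror("bind");

    memset(&peer, 0, sizeof(peer));
    peer.sin_family = AF_INET;
    peer.sin_port = htons(21064);                       /* illustrative port */
    inet_pton(AF_INET, "192.168.1.5", &peer.sin_addr);  /* hypothetical peer */
    if (connect(fd, (struct sockaddr *)&peer, sizeof(peer)) < 0)
        perror("connect");

    close(fd);
    return 0;
}

In kernel code the same steps go through the struct socket's ops table, which
is why the patch mentioned above calls sock->ops->bind() before connecting.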

