218719 – dlm: connect from non cluster node

Keywords:
Status:	CLOSED NOTABUG
Alias:	None
Product:	Red Hat Cluster Suite
Classification:	Retired
Component:	dlm
Sub Component:
Version:	4
Hardware:	All
OS:	Linux
Priority:	medium
Severity:	medium
Target Milestone:	---
Assignee:	Christine Caulfield
QA Contact:	Cluster QE
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2006-12-06 23:37 UTC by Jonathan Earl Brassow
Modified:	2009-04-16 20:01 UTC
Fixed In Version:
Clone Of:
Environment:
Last Closed:	2006-12-07 14:53:13 UTC
Embargoed:

Description Jonathan Earl Brassow 2006-12-06 23:37:49 UTC

I run into this every couple of times I attempt to start 'clvmd' on all
machines at once.

I have 3 x86 machines.  When they come up, I issue a 'clvmd' command on all of
them at once.  One succeeds, the other two hang.  (In fact, I issue the command
directly rather than running the init scripts because I repeatedly hit this bug
in the past when using the init scripts.)

On neo-04, the command hangs and reports the following to the console:
dlm: connect from non cluster node

Other state includes:
[root@neo-04 ~]# cat /proc/cluster/services
Service          Name                              GID LID State     Code
Fence Domain:    "default"                           5   2 run       -
[4 6 5]

DLM Lock Space:  "clvmd"                            10   3 join      S-6,20,2
[4 5]

[root@neo-04 ~]# cat /proc/cluster/nodes
Node  Votes Exp Sts  Name
   4    1    3   M   neo-04
   5    1    3   M   neo-05
   6    1    3   M   neo-06


On neo-05, the command succeeds and reports the following on the console:
dlm: connect from non cluster node

Other state includes:
[root@neo-05 ~]# cat /proc/cluster/services
Service          Name                              GID LID State     Code
Fence Domain:    "default"                           5   2 run       -
[4 5 6]

DLM Lock Space:  "clvmd"                            10   3 update    U-4,1,4
[5 4]

[root@neo-05 ~]# cat /proc/cluster/nodes
Node  Votes Exp Sts  Name
   4    1    3   M   neo-04
   5    1    3   M   neo-05
   6    1    3   M   neo-06


On neo-06, the command hangs.
[root@neo-06 ~]# cat /proc/cluster/services
Service          Name                              GID LID State     Code
Fence Domain:    "default"                           5   2 run       -
[4 6 5]

DLM Lock Space:  "clvmd"                             0   3 join      S-1,80,3
[]

[root@neo-06 ~]# cat /proc/cluster/nodes
Node  Votes Exp Sts  Name
   4    1    3   M   neo-04
   5    1    3   M   neo-05
   6    1    3   M   neo-06

Comment 1 Christine Caulfield 2006-12-07 08:59:53 UTC

The usual cause of this is very odd IP settings. Often two interfaces on the
same ethernet can cause it because Linux sends messages out of the "wrong" one
and the DLM sees the source address as one that it doesn't know about.

Check the ifconfig & route output on the nodes to make sure they are sane.
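
As a rough user-space cross-check of the above, the sketch below asks the kernel which source address it would pick for traffic to a given peer. It is only a diagnostic aid, not how the DLM itself validates connections, and it assumes the kernel's route lookup for this UDP socket matches what it would choose for the DLM's own connections. The function names and the port number are made up for illustration.

```python
import socket


def source_address_for(peer_ip: str, port: int = 9) -> str:
    """Return the local IP the kernel would use as source when talking to peer_ip.

    Connecting a UDP socket sends no packets; it only makes the kernel do a
    route lookup and choose a source address for that destination.
    The port (9 here) is arbitrary and hypothetical.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.connect((peer_ip, port))
        return s.getsockname()[0]


def check_source(peer_ip: str, expected_local_ip: str) -> bool:
    """True if traffic to peer_ip would leave with the expected source IP.

    expected_local_ip is the address the cluster knows this node by
    (an assumption supplied by the caller, not read from cluster config).
    """
    return source_address_for(peer_ip) == expected_local_ip
```

On each node, calling check_source() with a peer's cluster address and the node's own cluster address should return True; a False result would point at the kind of "wrong" interface selection described above.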

Comment 2 Jonathan Earl Brassow 2006-12-07 14:53:13 UTC

You're right, they didn't make any sense at all.  I guess I was just trusting
DHCP...  Closing bug.  I'll reopen if I come across it again.

Comment 3 Lon Hohberger 2007-10-26 15:04:06 UTC

FWIW, bug 338511 is related to this bug; in fact, it's the same bug.  There's a
patch attached to 338511 against dlm-kernel which calls sock->ops->bind() to set
the source IP prior to making a connection in lowcomms.c.

There's probably a reason this wasn't done.
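
For readers unfamiliar with the technique, here is a minimal user-space analogue of "bind before connect". It is not the patch from 338511 and not the kernel code in lowcomms.c, just a sketch of the same idea with ordinary sockets; connect_from and its parameters are hypothetical names.

```python
import socket


def connect_from(source_ip: str, peer: tuple[str, int],
                 timeout: float = 5.0) -> socket.socket:
    """Open a TCP connection to peer with the source address pinned to source_ip.

    Binding to (source_ip, 0) before connect() fixes the source address and
    lets the kernel choose an ephemeral port, so the peer sees source_ip
    regardless of which interface the routing table would otherwise favour.
    """
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        s.settimeout(timeout)
        s.bind((source_ip, 0))   # pin the source address, any port
        s.connect(peer)
    except OSError:
        s.close()
        raise
    return s
```

The caller passes the address the cluster knows the node by as source_ip and is responsible for closing the returned socket.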
