144423 – altname interfaces not working correctly

Summary: altname interfaces not working correctly

Keywords:
Status:	CLOSED DUPLICATE of bug 108832
Alias:	None
Product:	Red Hat Cluster Suite
Classification:	Retired
Component:	dlm
Sub Component:
Version:	4
Hardware:	All
OS:	Linux
Priority:	medium
Severity:	medium
Target Milestone:	---
Assignee:	Christine Caulfield
QA Contact:	Cluster QE
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2005-01-06 21:46 UTC by Derek Anderson
Modified:	2009-04-16 19:59 UTC
CC List:	1 user
Fixed In Version:
Clone Of:
Environment:
Last Closed:	2005-09-16 12:34:22 UTC
Embargoed:

Description Derek Anderson 2005-01-06 21:46:44 UTC

Description of problem:
Defined altnames for each of the three cluster nodes as follows:
<clusternode name="link-10" votes="1">
  <altname name="link-10-pvt"/>
...
<clusternode name="link-11" votes="1">
  <altname name="link-11-pvt"/>
...
<clusternode name="link-12" votes="1">
  <altname name="link-12-pvt"/>
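
To double-check which alternate names a configuration like the one
above declares, a minimal Python sketch can list them per node.  It
assumes the elided fragments sit inside a complete, well-formed
cluster.conf; the sample document, its <cluster>/<clusternodes>
wrapper elements, and the altnames_by_node helper are illustrative
assumptions, not taken from the report.

```python
import xml.etree.ElementTree as ET

# Hypothetical, complete stand-in for the elided fragments in the report.
# The wrapper elements and the cluster name are assumptions.
SAMPLE_CONF = """
<cluster name="example">
  <clusternodes>
    <clusternode name="link-10" votes="1">
      <altname name="link-10-pvt"/>
    </clusternode>
    <clusternode name="link-11" votes="1">
      <altname name="link-11-pvt"/>
    </clusternode>
    <clusternode name="link-12" votes="1">
      <altname name="link-12-pvt"/>
    </clusternode>
  </clusternodes>
</cluster>
"""


def altnames_by_node(conf_xml):
    """Map each clusternode name to the list of its altname names."""
    root = ET.fromstring(conf_xml.strip())
    result = {}
    for node in root.iter("clusternode"):
        # Only direct <altname> children of this node are collected.
        result[node.get("name")] = [alt.get("name") for alt in node.findall("altname")]
    return result


if __name__ == "__main__":
    for node, alts in altnames_by_node(SAMPLE_CONF).items():
        print(node, "->", ", ".join(alts) or "(no altname)")
```

This only confirms what the configuration file says; it does not show
which interface cman is actually using at runtime.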

Set up the cluster on each with:
- modprobe dlm
- ccsd
- cman_tool join [1]
- fence_tool join
- clvmd
- vgscan; vgchange -ay
- mount -t gfs /dev/gfs/gfs0 /mnt/gfs0

[1] Two of the nodes logged "CMAN: Now using interface 2", the other
logged no message regarding which interface it was using.  Is there a
way I can query cman about this?  I ran 'tcpdump -i eth1' on all the
nodes and they were all exchanging 28-byte messages on UDP port 6089
every 5 seconds.

Next, on link-12, I issued 'ifdown eth1' to take down the altname
interface.
Then, on link-11, I issued 'umount /mnt/gfs1'.
The result at this point is a hang.

If on link-12 'ifup eth1' is issued it will eventually recover (umount
completes).

My expectation (and this is a guess) is that the primary interface
should be used until it fails, and only then should the altname
interface be used.  Under that expectation, changing the state of the
alternate interface should have no effect on cluster operation.  I
originally intended to test failover of the primary interface, but
this should be a simpler case, so we'll start here.
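
The expectation above can be written down as a small selection rule.
This is only a sketch of the policy I am guessing at, not of what cman
or the DLM actually does; the Interface type, the select_interface
function, and the use of eth0 as the primary interface name are all
hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Interface:
    name: str
    up: bool


def select_interface(primary: Interface,
                     alternates: List[Interface]) -> Optional[Interface]:
    """Return the interface expected to carry cluster traffic, or None.

    The primary is used whenever it is up.  An alternate is considered
    only after the primary has failed, in the order listed.
    """
    if primary.up:
        return primary
    for alt in alternates:
        if alt.up:
            return alt
    return None


if __name__ == "__main__":
    # The case from this report: the alternate goes down, the primary stays up.
    eth0 = Interface("eth0", up=True)   # assumed name for the primary interface
    eth1 = Interface("eth1", up=False)  # the altname interface after 'ifdown eth1'
    chosen = select_interface(eth0, [eth1])
    print(chosen.name if chosen else "no usable interface")
```

Under this rule, taking eth1 down while the primary is still up leaves
the selection unchanged, which is why I expected no effect on cluster
operation.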

Version-Release number of selected component (if applicable):
6.1 RPMS from Wed 15 Dec 2004 01:13:08 PM CST

How reproducible:
Yes.

Steps to Reproduce:
1.  Start cluster and mount a GFS
2.  Down the alternate eth interface of one node
3.  Attempt to umount from another node
  
Actual results:
The umount does not complete until the downed interface is brought back up.

Expected results:
Usual cluster operation regardless of alternate interface status.

Additional info:

Comment 1 Christine Caulfield 2005-01-07 14:53:00 UTC

yeah,

multi-interface operation is currently a non-feature of the DLM.

Comment 2 Christine Caulfield 2005-01-11 16:40:44 UTC

My plan for fixing this is to use SCTP as the transport for the DLM.
It's a non-trivial change, but better than trying to kludge something
that would work with TCP.

In the meantime I suppose we should just use the bonding driver.

Comment 3 Christine Caulfield 2005-02-23 16:57:51 UTC

If anyone has some time to burn, there's a drop-in replacement
lowcomms.c on homer (~patrick/public/lowcomms.c) that uses SCTP as its
transport protocol and should support multipath.

There are a couple of things I want to sort out before releasing it on
an unsuspecting world and it probably needs a lot of QA as it's a
near-total rewrite of the DLM comms layer.

Oh, and don't forget to "modprobe sctp" before using it; module
auto-loading does not seem to work for it.
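
A quick way to see whether SCTP is usable on a node is to try opening
an SCTP socket from userspace.  The sketch below is a rough probe under
that assumption: the sctp_available helper is hypothetical, it says
nothing about the in-kernel lowcomms code, and on some systems opening
the socket may itself trigger module loading.

```python
import socket


def sctp_available():
    """Return True if a one-to-one style SCTP socket can be created."""
    # 132 is the IANA-assigned protocol number for SCTP; it is used as a
    # fallback when this Python build does not define IPPROTO_SCTP.
    proto = getattr(socket, "IPPROTO_SCTP", 132)
    try:
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM, proto)
    except OSError:
        # Typically "protocol not supported" when SCTP support is missing.
        return False
    sock.close()
    return True


if __name__ == "__main__":
    if sctp_available():
        print("SCTP sockets can be created")
    else:
        print("SCTP not available; try 'modprobe sctp' first")
```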

Comment 4 Christine Caulfield 2005-09-16 12:34:22 UTC


*** This bug has been marked as a duplicate of 108832 ***
