Bug 144423

Summary: altname interfaces not working correctly
Product: [Retired] Red Hat Cluster Suite
Reporter: Derek Anderson <danderso>
Component: dlm
Assignee: Christine Caulfield <ccaulfie>
Status: CLOSED DUPLICATE
QA Contact: Cluster QE <mspqa-list>
Severity: medium
Priority: medium
Version: 4
CC: cluster-maint
Target Milestone: ---
Target Release: ---
Hardware: All
OS: Linux
Doc Type: Bug Fix
Last Closed: 2005-09-16 12:34:22 UTC

Description Derek Anderson 2005-01-06 21:46:44 UTC
Description of problem:
Defined altnames for each of the three cluster nodes as follows:
<clusternode name="link-10" votes="1">
  <altname name="link-10-pvt"/>
...
<clusternode name="link-11" votes="1">
  <altname name="link-11-pvt"/>
...
<clusternode name="link-12" votes="1">
  <altname name="link-12-pvt"/>

Set up the cluster on each with:
- modprobe dlm
- ccsd
- cman_tool join [1]
- fence_tool join
- clvmd
- vgscan; vgchange -ay
- mount -t gfs /dev/gfs/gfs0 /mnt/gfs0
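
For reference, the same sequence as a commented script (device and
mount point names taken from above):

#!/bin/sh
modprobe dlm                      # load the kernel DLM module
ccsd                              # start the cluster config daemon
cman_tool join                    # join the cluster (see note [1])
fence_tool join                   # join the fence domain
clvmd                             # start the cluster LVM daemon
vgscan                            # rescan for volume groups
vgchange -ay                      # activate them
mount -t gfs /dev/gfs/gfs0 /mnt/gfs0   # mount the GFS filesystem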

[1] Two of the nodes logged "CMAN: Now using interface 2"; the other
logged no message about which interface it was using.  Is there a way
I can query cman about this?  I ran 'tcpdump -i eth1' on all the
nodes and they were all exchanging 28-byte messages on UDP port 6089
every 5 seconds.
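
Possibly something like the following would show it (a guess based on
what cman normally exposes on RHEL 4; not verified here):

cman_tool status          # should report cluster state and node addresses
cat /proc/cluster/status  # the kernel cman's view of the same
cat /proc/cluster/nodes   # per-node membership info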

Now on link-12 issued 'ifdown eth1', taking down the altname interface.
Now on link-11 issued 'umount /mnt/gfs0'.
The result at this point is a hang.

If 'ifup eth1' is then issued on link-12, the cluster eventually
recovers and the umount completes.

My expectation (and this is a guess) is that the primary interface
should be used until it fails, and only then should the altname
interface take over.  Under that model, changing the state of the
alternate interface should have no effect on cluster operation.  I
originally intended to test failover of the primary interface, but
this should be a simpler case, so we'll start here.

Version-Release number of selected component (if applicable):
6.1 RPMs from Wed 15 Dec 2004 01:13:08 PM CST

How reproducible:
Yes.

Steps to Reproduce:
1.  Start the cluster and mount a GFS filesystem.
2.  Bring down the alternate Ethernet interface of one node.
3.  Attempt to umount from another node.
  
Actual results:
The umount never completes until the downed interface is brought back up.

Expected results:
Normal cluster operation regardless of the alternate interface's status.

Additional info:

Comment 1 Christine Caulfield 2005-01-07 14:53:00 UTC
Yeah,

multi-interface operation is currently a non-feature of the DLM.

Comment 2 Christine Caulfield 2005-01-11 16:40:44 UTC
My plan for fixing this is to use SCTP as the transport for the DLM;
it's a non-trivial change, but better than trying to kludge something
that would work with TCP.

In the meantime I suppose we should just use the bonding driver.
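
For example, a minimal active-backup (mode=1) bonding setup on RHEL 4
would look something like this (interface names and addresses are
illustrative only):

# /etc/modprobe.conf
alias bond0 bonding
options bond0 mode=1 miimon=100

# /etc/sysconfig/network-scripts/ifcfg-bond0
DEVICE=bond0
IPADDR=192.168.1.10
NETMASK=255.255.255.0
ONBOOT=yes
BOOTPROTO=none

# /etc/sysconfig/network-scripts/ifcfg-eth0 (and similarly ifcfg-eth1)
DEVICE=eth0
MASTER=bond0
SLAVE=yes
ONBOOT=yes
BOOTPROTO=none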

Comment 3 Christine Caulfield 2005-02-23 16:57:51 UTC
If anyone has some time to burn, there's a drop-in replacement
lowcomms.c on homer (~patrick/public/lowcomms.c) that uses SCTP as its
transport protocol and should support multipath.

There are a couple of things I want to sort out before releasing it on
an unsuspecting world, and it probably needs a lot of QA, as it's a
near-total rewrite of the DLM comms layer.

Oh, and don't forget to "modprobe sctp" before using it; module
auto-loading seems not to work for it.
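
i.e. something like this before loading the replacement lowcomms (the
rc.local line is just one crude way to load it at boot until
auto-loading works):

modprobe sctp                           # load SCTP support by hand
lsmod | grep sctp                       # confirm the module is loaded
echo "modprobe sctp" >> /etc/rc.local   # optional: load it at boot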

Comment 4 Christine Caulfield 2005-09-16 12:34:22 UTC

*** This bug has been marked as a duplicate of 108832 ***