Description of problem:

Defined altnames for each of the three cluster nodes as follows:

    <clusternode name="link-10" votes="1">
        <altname name="link-10-pvt"/>
    ...
    <clusternode name="link-11" votes="1">
        <altname name="link-11-pvt"/>
    ...
    <clusternode name="link-12" votes="1">
        <altname name="link-12-pvt"/>

Set up the cluster on each with:

- modprobe dlm
- ccsd
- cman_tool join [1]
- fence_tool join
- clvmd
- vgscan; vgchange -ay
- mount -t gfs /dev/gfs/gfs0 /mnt/gfs0

[1] Two of the nodes logged "CMAN: Now using interface 2", the other logged no message regarding which interface it was using. Is there a way I can query cman about this?

I ran 'tcpdump -i eth1' on all the nodes and they were all exchanging 28 byte messages on UDP port 6089 every 5 seconds.

Now on link-12 issued 'ifdown eth1', the altname interface.

Now on link-11 issued 'umount /mnt/gfs1'.

Result at this point is a hang. If on link-12 'ifup eth1' is issued it will eventually recover (umount completes).

My expectation (and this is a guess) is that the primary interface should be used until it fails, then if it does the altname interface is used. With this expectation changing the state of the alternate interface should have no effect on cluster operation. I originally intended to test failover of the primary interface, but this should be a simpler case so we'll start here.

Version-Release number of selected component (if applicable):

6.1 RPMS from Wed 15 Dec 2004 01:13:08 PM CST

How reproducible:

Yes.

Steps to Reproduce:

1. Start cluster and mount a GFS
2. Down the alternate eth interface of one node
3. Attempt to umount from another node

Actual results:

Umount never completes until the downed interface is brought back up.

Expected results:

Usual cluster operation regardless of alternate interface status.

Additional info:
yeah, multi-interface operation is currently a non-feature of the DLM.
My plan for fixing this is to use SCTP as the transport for the DLM. It's a non-trivial change, but better than trying to kludge something that would work with TCP. In the meantime I suppose we should just use the bonding driver.
If anyone has some time to burn, there's a drop-in replacement lowcomms.c on homer (~patrick/public/lowcomms.c) that uses SCTP as its transport protocol and should support multipath. There are a couple of things I want to sort out before releasing it on an unsuspecting world, and it probably needs a lot of QA as it's a near-total rewrite of the DLM comms layer. Oh, and don't forget to "modprobe sctp" before using it; module auto-loading does not seem to work for it.
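For anyone scripting the cluster start-up, the "modprobe sctp" reminder above could be wrapped in a small guard along these lines. This is only a sketch: the function names module_loaded and ensure_sctp are made up for illustration, and it assumes sctp is built as a loadable module (a built-in sctp would not show up in /proc/modules, so the guard would call modprobe anyway).

```shell
#!/bin/sh
# Hypothetical helpers, not part of cman/dlm or any shipped init script.

# module_loaded NAME [LISTING]
# Succeeds (exit 0) if NAME appears in the first column of a
# /proc/modules-style listing. LISTING defaults to /proc/modules.
module_loaded() {
    awk -v m="$1" '$1 == m { found = 1 } END { exit !found }' "${2:-/proc/modules}"
}

# ensure_sctp
# Loads the sctp module only if it is not already listed, since
# auto-loading reportedly does not work for it.
ensure_sctp() {
    if module_loaded sctp; then
        echo "sctp already loaded"
    else
        modprobe sctp
    fi
}
```

Calling ensure_sctp just before "modprobe dlm" in the set-up sequence from the original report would be the natural place for it, assuming the SCTP-based lowcomms.c has been built into the dlm module.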
*** This bug has been marked as a duplicate of 108832 ***