Description of problem: If Cluster Manager is configured to use a private LAN for heartbeating and a public LAN for user services, a disconnected link is not detected properly and no failover is performed.
Created attachment 109478: Patch fixing behavior. This patch includes a backport from the LCP (http://sources.redhat.com/cluster) resource group manager to detect the Ethernet link.
This patch *does not work* if the interface is a bonded interface; more work is necessary in order to do this properly. The simplest solution is to check the link of each slave bonded to a master interface and return a failure if all links are down. This should not be difficult.
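The per-slave check described above could look roughly like the following. This is a minimal sketch, not code from the patch: the function name bond_link_up and the SYSFS_NET variable are hypothetical, and it assumes a kernel that exposes bonding state through sysfs (a bonding/slaves file for the master and a carrier file for each slave), which may not hold for the kernels this bug was filed against.

```shell
#!/bin/bash
# Hypothetical helper, not taken from the patch.
# Assumption: bonding state is readable through sysfs. SYSFS_NET can be
# overridden so the function can be exercised against a fake tree.
SYSFS_NET="${SYSFS_NET:-/sys/class/net}"

# bond_link_up MASTER
# Returns 0 if at least one slave of MASTER reports carrier,
#         1 if every slave is down,
#         2 if the slave list cannot be read or is empty.
bond_link_up() {
    local master="$1" slaves slave
    local slaves_file="$SYSFS_NET/$master/bonding/slaves"

    [ -r "$slaves_file" ] || return 2
    slaves=$(cat "$slaves_file")
    [ -n "$slaves" ] || return 2

    for slave in $slaves; do
        # carrier contains "1" when a link is detected; an unreadable
        # file is treated the same as a link that is down.
        if [ "$(cat "$SYSFS_NET/$slave/carrier" 2>/dev/null)" = "1" ]; then
            return 0
        fi
    done
    return 1
}
```

A caller would treat a return value of 1 as "all links down" and fail the service, for example: bond_link_up bond0 || echo "Network link not detected on bond0".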
Patch which fixes this and other bugzillas:
http://people.redhat.com/lhh/clumanager-1.2.23-0.4lhh.patch
Packages which fix this and several other current outstanding bugzillas:
http://people.redhat.com/lhh/clumanager-1.2.23-0.4lhh.i386.rpm
http://people.redhat.com/lhh/clumanager-1.2.23-0.4lhh.src.rpm
Note: The above are not Red Hat errata; they are test patches/packages.
The patch calls the 'ip' command, which is also defined in svclib_ip, rather than calling /sbin/ip. This conflict causes the following log messages:
Jan 11 13:58:34 node1 clusvcmgrd: [10366]: <info> service info: Starting IP address 10.1.1.1
Jan 11 13:58:34 node1 clusvcmgrd: [10366]: <err> service error: Usage: ip [start, stop, status] serviceID
Jan 11 13:58:34 node1 clusvcmgrd: [10366]: <err> service error: Error determining status of bond0
Jan 11 13:58:34 node1 clusvcmgrd: [10366]: <err> service error: Error finding slaves of bond0
Jan 11 13:58:34 node1 clusvcmgrd: [10366]: <err> service error: Network link not detected on bond0
Jan 11 13:58:34 node1 clusvcmgrd: [10366]: <err> service error: Cannot start IP address 10.1.1.1; retrying ...
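The name clash can be reproduced in isolation. The sketch below is illustrative only: the ip() function is a stand-in whose body is invented (only its usage string comes from the log output), and real_ip is a hypothetical wrapper name. It shows that once a shell function named ip exists, a bare "ip" call runs the function, while the bash builtin "command" (or an absolute path such as /sbin/ip) reaches the external binary.

```shell
#!/bin/bash
# Stand-in for a service-management function named "ip". The body is
# illustrative; only the usage string is taken from the logged error.
ip() {
    echo "Usage: ip [start, stop, status] serviceID" >&2
    return 1
}

# Hypothetical wrapper: "command" skips shell function lookup, so this
# runs whichever ip executable is found on PATH instead of the function.
real_ip() {
    command ip "$@"
}
```

With both definitions loaded, "type -t ip" reports "function", a bare "ip link show" prints the usage error, and "real_ip link show" runs the external program. Calling the binary by absolute path avoids the lookup entirely and does not depend on PATH.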
The 1.2.24-0.1 test release fixes the above conflict and a variable name conflict which caused IP addresses to be assigned to slave interfaces instead of bond0:0, bond0:1, etc. This was an artifact of the backport.
Patch:
http://people.redhat.com/lhh/clumanager-1.2.24.patch
Packages:
http://people.redhat.com/lhh/clumanager-1.2.24-0.1.i386.rpm
http://people.redhat.com/lhh/clumanager-1.2.24-0.1.src.rpm
Note -- this will have to be a configuration option so as not to break people depending on the old behavior!
GUI support is in.
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHBA-2005-047.html