Bug 144488

Summary: Disconnection of user-link not detected
Product: [Retired] Red Hat Cluster Suite
Reporter: Lon Hohberger <lhh>
Component: redhat-config-cluster
Assignee: Lon Hohberger <lhh>
Status: CLOSED ERRATA
QA Contact: Cluster QE <mspqa-list>
Severity: medium
Docs Contact:
Priority: medium
Version: 3
CC: cluster-maint, rkenna
Target Milestone: ---   
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2005-05-25 16:41:02 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
Embargoed:
Attachments:
Description Flags
Patch fixing behavior. none

Description Lon Hohberger 2005-01-07 17:10:18 UTC
Description of problem:

If Cluster Manager is configured to use a private LAN for heartbeating
and a public LAN for user services, a disconnected link on the public
LAN is not detected and no failover is performed.

Comment 1 Lon Hohberger 2005-01-07 17:12:04 UTC
Created attachment 109478
Patch fixing behavior.

This patch includes a backport from the LCP (http://sources.redhat.com/cluster)
resource group manager to detect the ethernet link.

Comment 2 Lon Hohberger 2005-01-07 17:13:24 UTC
This patch *does not work* if the interface is a bonded interface;
more work is necessary in order to do this properly.

The simplest solution is to check the link of each slave bonded to a
master interface and return a failure if all links are down.  This
should not be difficult.
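
The check described above can be sketched as follows. This is only an illustration in Python, not the code from the patch: it assumes a kernel that exposes the bonding driver's sysfs files (/sys/class/net/<master>/bonding/slaves and /sys/class/net/<device>/carrier), and the function names and the `root` parameter are hypothetical.

```python
from pathlib import Path

# Assumption: sysfs layout of current Linux kernels, not necessarily
# what the clumanager scripts inspect.
SYSFS_NET = Path("/sys/class/net")


def bond_slaves(master, root=SYSFS_NET):
    """Return the slave interface names of a bonding master, or [] if none."""
    slaves_file = Path(root) / master / "bonding" / "slaves"
    try:
        return slaves_file.read_text().split()
    except OSError:
        # Not a bonding master, or the file is unreadable.
        return []


def link_up(device, root=SYSFS_NET):
    """Return True if the device reports carrier."""
    try:
        return (Path(root) / device / "carrier").read_text().strip() == "1"
    except OSError:
        # Reading 'carrier' fails when the interface is administratively down.
        return False


def bond_has_link(master, root=SYSFS_NET):
    """Fail only if every slave link is down; plain interfaces are checked directly."""
    slaves = bond_slaves(master, root)
    if not slaves:
        return link_up(master, root)
    return any(link_up(slave, root) for slave in slaves)
```

Calling bond_has_link("bond0") returns False only when none of the slaves of bond0 has a link, which matches the "return a failure if all links are down" rule.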

Comment 3 Lon Hohberger 2005-01-07 21:02:01 UTC
Patch which fixes this and other bugzillas:

http://people.redhat.com/lhh/clumanager-1.2.23-0.4lhh.patch

Packages which fix this and several other current outstanding bugzillas:

http://people.redhat.com/lhh/clumanager-1.2.23-0.4lhh.i386.rpm
http://people.redhat.com/lhh/clumanager-1.2.23-0.4lhh.src.rpm


Comment 4 Lon Hohberger 2005-01-07 21:14:03 UTC
Note: The above are not Red Hat errata; they are test patches/packages.

Comment 5 Lon Hohberger 2005-01-11 18:35:24 UTC
The patch calls 'ip', which resolves to the function of the same name
defined in svclib_ip rather than to /sbin/ip.

This conflict causes the log messages below:

Jan 11 13:58:34 node1 clusvcmgrd: [10366]: <info> service info: Starting IP address 10.1.1.1
Jan 11 13:58:34 node1 clusvcmgrd: [10366]: <err> service error: Usage: ip [start, stop, status] serviceID
Jan 11 13:58:34 node1 clusvcmgrd: [10366]: <err> service error: Error determining status of bond0
Jan 11 13:58:34 node1 clusvcmgrd: [10366]: <err> service error: Error finding slaves of bond0
Jan 11 13:58:34 node1 clusvcmgrd: [10366]: <err> service error: Network link not detected on bond0
Jan 11 13:58:34 node1 clusvcmgrd: [10366]: <err> service error: Cannot start IP address 10.1.1.1; retrying ...
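
The principle behind the fix is to name the binary by its absolute path so that nothing else called 'ip' can be picked up. The cluster scripts themselves are shell; the Python sketch below only illustrates that principle, and its function names are hypothetical.

```python
import os
import subprocess

# Absolute path of the iproute2 binary named in this comment.
IP_BINARY = "/sbin/ip"


def ip_link_argv(device, ip_binary=IP_BINARY):
    """Build the argument vector for 'ip link show dev <device>'.

    A bare name such as 'ip' is rejected so the command can never be
    resolved to something other than the intended binary.
    """
    if not os.path.isabs(ip_binary):
        raise ValueError("ip binary must be given as an absolute path")
    return [ip_binary, "link", "show", "dev", device]


def show_link(device):
    """Run the command and return the CompletedProcess (no shell involved)."""
    return subprocess.run(
        ip_link_argv(device), capture_output=True, text=True, check=False
    )
```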



Comment 6 Lon Hohberger 2005-01-13 15:23:57 UTC
The 1.2.24-0.1 test package fixes the above conflict, as well as a
variable name conflict that caused IP addresses to be assigned to slave
interfaces instead of bond0:0, bond0:1, etc. This was an artifact of
the backport.

Patch:

http://people.redhat.com/lhh/clumanager-1.2.24.patch

Packages:

http://people.redhat.com/lhh/clumanager-1.2.24-0.1.i386.rpm
http://people.redhat.com/lhh/clumanager-1.2.24-0.1.src.rpm

Comment 7 Lon Hohberger 2005-02-11 04:47:01 UTC
Note -- this will have to be a configuration option so as not to break
people depending on the old behavior!

Comment 10 Lon Hohberger 2005-02-23 15:07:52 UTC
GUI support in.

Comment 14 Jay Turner 2005-05-25 16:41:02 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2005-047.html