Bug 451654

Summary: Cluster Service Cannot Start With Xen Bridge Enabled On Bonded Interface
Product: Red Hat Enterprise Linux 5
Reporter: Shane Bradley <sbradley>
Component: rgmanager
Assignee: Lon Hohberger <lhh>
Status: CLOSED ERRATA
QA Contact: Cluster QE <mspqa-list>
Severity: medium
Priority: medium
Version: 5.1
CC: cluster-maint, edamato, tao
Target Milestone: rc   
Target Release: ---   
Hardware: All   
OS: Linux   
Doc Type: Bug Fix
Last Closed: 2009-01-20 20:56:27 UTC
Bug Blocks: 462680    
Attachments:
Description                                 Flags
Patch to allow vips on bonded interface     none
Patch to allow vips on bonded interface     none

Description Shane Bradley 2008-06-16 14:06:03 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9b5) Gecko/2008043010 Fedora/3.0-0.60.beta5.fc9 Firefox/3.0b5

Description of problem:

rgmanager IP resources do not work on Xen dom0 systems whose bonded interface is bridged following these instructions:

  http://kbase.redhat.com/faq/FAQ_103_11147.shtm

using netdev=bondX.  For example, on a Xen system that has bond0 bridged for its guests, the interface that is actually the master of the slaves is pbond0.  The /usr/share/cluster/ip.sh script fails to find the slaves because it looks for them on bond0.

The problem is in the findSlaves() function (line 462), which finds the slaves using:

   /sbin/ip link list | grep "master $mastif"

where $mastif = bond0.  However, on a Xen system, the slave interfaces list pbond0 as their master:

   [root@johnny5 ~]# /sbin/ip link list | grep "master bond0"
   [root@johnny5 ~]# /sbin/ip link list | grep "master pbond0"
   2: eth0: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master pbond0 qlen 1000
   3: eth1: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master pbond0 qlen 1000

This results in an error when the first status check is run on the IP address:

   Jun  9 12:31:07 helios clurgmgrd[30493]: <notice> Starting stopped service service:Eprisa
   Jun  9 12:31:07 helios clurgmgrd: [30493]: <err> Error determining status of bond0
   Jun  9 12:31:07 helios clurgmgrd: [30493]: <err> Error finding slaves of bond0
   Jun  9 12:31:07 helios clurgmgrd[30493]: <notice> start on ip "192.168.69.76" returned 1 (generic error)
   Jun  9 12:31:07 helios clurgmgrd[30493]: <warning> #68: Failed to start service:Eprisa; return value: 1
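
The slave lookup described above could be made tolerant of the Xen rename along the following lines. This is only a minimal sketch, not the code from the attached patch: the function name find_slaves_sketch is hypothetical, and the rule that the physical bond is named "p" plus the configured name is an assumption taken from the bond0/pbond0 example in this report. It reads `ip link list` output on stdin so it can be tried against saved output.

```shell
#!/bin/bash
# Hypothetical helper (not part of ip.sh, not the attached patch).
# Reads `ip link list` output on stdin and prints the names of the
# interfaces whose master is either "$1" or "p$1".
# Assumption: Xen bridging renames the physical bond to p<name>,
# as seen with bond0 -> pbond0 in this report.
find_slaves_sketch() {
    local mastif="$1"
    awk -v m="$mastif" -v pm="p$mastif" '
        {
            # Look for the word "master" followed by an exact match of
            # either candidate name, so "bond0" does not match "bond00".
            for (i = 1; i < NF; i++) {
                if ($i == "master" && ($(i + 1) == m || $(i + 1) == pm)) {
                    name = $2
                    sub(/:$/, "", name)   # "eth0:" -> "eth0"
                    print name
                }
            }
        }'
}
```

It would be called as `/sbin/ip link list | find_slaves_sketch bond0`. Matching whole fields rather than grepping for a substring keeps the check from picking up unrelated interfaces whose master merely starts with the same name.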

Version-Release number of selected component (if applicable):
rgmanager-2.0.38-2

How reproducible:
Always


Steps to Reproduce:
1) Create a cluster of dom0s
2) Create a bonded interface
3) Bridge the bonded interface using the kbase article above
4) Create a service with an IP address on the subnet of the bonded interface
5) Start the service


Actual Results:
The service starts, but the first status check fails with:

      <err> Error determining status of bond0
      <err> Error finding slaves of bond0


Expected Results:
Service should start and status checks should succeed

Additional info:

Comment 1 Shane Bradley 2008-06-16 14:08:03 UTC
Created attachment 309504 [details]
Patch to allow vips on bonded interface

On Xen systems, the interface that is actually listed as the master of the
slaves is pbond0.  However, the ip script checks for slaves on bondX and fails.


This patch fixes that issue.

Comment 2 Lon Hohberger 2008-06-16 17:50:58 UTC
Hi Shane,

Has this been tested on bonded interfaces w/o Xen as well?  If not, it's
something we need to test.  If so, I can apply immediately.

Comment 4 John Ruemker 2008-06-16 18:41:10 UTC
Created attachment 309523 [details]
Patch to allow vips on bonded interface

Comment 6 RHEL Program Management 2008-06-19 17:31:32 UTC
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.

Comment 9 Lon Hohberger 2008-06-20 15:17:53 UTC
Patch is in tree.

Comment 11 Fedora Update System 2008-07-30 20:03:39 UTC
gfs2-utils-2.03.05-1.fc9, rgmanager-2.03.05-1.fc9, cman-2.03.05-1.fc9 have been pushed to the Fedora 9 stable repository.  If problems still persist, please make note of it in this bug report.

Comment 15 errata-xmlrpc 2009-01-20 20:56:27 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2009-0101.html