Bug 518037 - IP resources don't work with xen-bridged bonded vlan interfaces
Summary: IP resources don't work with xen-bridged bonded vlan interfaces
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: rgmanager
Version: 5.3
Hardware: All
OS: Linux
Target Milestone: rc
Assignee: Lon Hohberger
QA Contact: Cluster QE
Depends On:
Blocks: 499522 528130
Reported: 2009-08-18 14:57 UTC by Flavio Leitner
Modified: 2018-10-20 03:59 UTC
CC List: 12 users

Fixed In Version: rgmanager-2.0.52-1.9
Doc Type: Bug Fix
Doc Text:
Clone Of:
Last Closed: 2010-03-30 08:46:56 UTC

Attachments
suggested patch to fix ip.sh script (2.32 KB, patch)
2009-08-18 15:02 UTC, Flavio Leitner

Errata: Red Hat Product Errata RHBA-2010:0280 (priority: normal; status: SHIPPED_LIVE), "rgmanager bug fix and enhancement update", last updated 2010-03-29 13:59:11 UTC

Description Flavio Leitner 2009-08-18 14:57:09 UTC
Description of problem:

The server has eth1 and eth2 bonded in LACP (802.3ad) active-active mode.
The LACP trunk carries two VLANs:

bond0.2175 - the public network
bond0.3910 - the private cluster heartbeat network
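For reference, a setup like the one described above is typically expressed on RHEL 5 with initscripts ifcfg files along these lines. This is an illustrative sketch, not the reporter's actual configuration: the option values are assumptions, and BONDING_OPTS needs an initscripts release that supports it (older setups pass bonding options through /etc/modprobe.conf instead). Mode 802.3ad is the bonding driver's LACP mode.

```shell
# /etc/sysconfig/network-scripts/ifcfg-bond0 (illustrative)
DEVICE=bond0
ONBOOT=yes
BOOTPROTO=none
# mode=802.3ad selects LACP; miimon=100 is an assumed link-monitor interval
BONDING_OPTS="mode=802.3ad miimon=100"

# /etc/sysconfig/network-scripts/ifcfg-eth1 (ifcfg-eth2 is the same with DEVICE=eth2)
DEVICE=eth1
MASTER=bond0
SLAVE=yes
ONBOOT=yes
BOOTPROTO=none

# /etc/sysconfig/network-scripts/ifcfg-bond0.2175 (bond0.3910 is analogous)
DEVICE=bond0.2175
VLAN=yes
ONBOOT=yes
BOOTPROTO=none
```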

In addition to configuring the virtual machines in Cluster Suite, 
I would also like to configure a simple HA service consisting of a 
single IP address.

I configured this service (lxlhrt85) in the normal way; however, when
I attempt to start it, it fails with the following messages:

Error determining status of bond0.2175
Error finding slaves of bond0.2175

Looking at the /usr/share/cluster/ip.sh script, it is clear that IP 
resources will never work with bridged bonded interfaces.

At line 446, ip.sh tries to determine whether bond0.2175 is a bonding
MASTER by running "ip link list dev bond0.2175"; however, this check
can never succeed if the interface is bridged.
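The flag check described above can be sketched as follows. This is an illustration only, assuming ip.sh looks for MASTER in the flag list that `ip link list` prints; the function name flags_include_master is hypothetical and not taken from ip.sh. It reads the command output on stdin so the parsing can be exercised without real interfaces.

```shell
#!/bin/bash
# Hypothetical helper, not taken from ip.sh: succeed if the first line of
# `ip link list dev IFACE` output (read on stdin) carries the MASTER flag.
flags_include_master()
{
    local line flags

    # Accept a final line even if it lacks a trailing newline.
    read -r line || [ -n "$line" ] || return 1

    # The first line looks like:
    #   6: bond0: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 1500 ...
    # Keep only the comma-separated flag list between '<' and '>'.
    flags=${line#*<}
    flags=${flags%%>*}

    # Match whole flags so that e.g. SLAVE or a longer name cannot match.
    case ",$flags," in
    *,MASTER,*) return 0 ;;
    *) return 1 ;;
    esac
}
```

For example, `ip link list dev bond0 | flags_include_master` succeeds for a bonding master, while a VLAN device such as bond0.2175 normally lacks the MASTER flag, so the same check fails for it.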

The ip.sh script should check the pbond0.2175 interface instead when
bond0.2175 has been added to a bridge.
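A minimal sketch of that suggestion, assuming the Xen network-bridge naming convention mentioned in the report (the original device renamed to p<name>). The function physical_dev and the SYSFS_NET override are hypothetical; the existence of /sys/class/net/<name> is used as the test for whether an interface exists.

```shell
#!/bin/bash
# Hypothetical helper (not part of ip.sh): map an interface name to the
# device that actually carries the link. Assumes the Xen network-bridge
# convention described in this report, where the original device has
# been renamed to "p<name>" (e.g. bond0.2175 -> pbond0.2175).
# SYSFS_NET can be overridden for testing; it defaults to /sys/class/net.
physical_dev()
{
    local dev=$1
    local sysfs=${SYSFS_NET:-/sys/class/net}

    if [ -e "$sysfs/p$dev" ]; then
        # A renamed physical device exists: query that one instead.
        echo "p$dev"
    else
        # Not bridged by Xen: the name already refers to the real device.
        echo "$dev"
    fi
}
```

With this, a status check would run against the name returned by `physical_dev bond0.2175` rather than against bond0.2175 directly.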

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1. Start a service with an IP resource on a VLAN interface on top of a bonding device.

Comment 3 Flavio Leitner 2009-08-18 17:01:00 UTC
Updating again with only public information; sorry for the noise.

Jul 15 19:52:25 x3655 clurgmgrd[4849]: <notice> Starting stopped service
Jul 15 19:52:25 x3655 clurgmgrd: [4849]: <err> Error determining status of bond0.350
Jul 15 19:52:25 x3655 clurgmgrd: [4849]: <err> Error finding slaves of
Jul 15 19:52:25 x3655 clurgmgrd[4849]: <notice> start on ip "" returned 1 (generic error)

Although bond0.350 sits on top of bonding, it is actually a VLAN interface
and has no slaves, so I wrote a patch that fixes the link status check in ip.sh.
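Since a VLAN interface has no slaves, a script needs to recognize it before applying bonding-specific checks. One way to do that, sketched here under the assumption that the kernel's 8021q table at /proc/net/vlan/config is available, is to look the device up there and recover the device it sits on. The function vlan_parent and the VLAN_CONFIG override are hypothetical, and this is not necessarily what the attached patch does.

```shell
#!/bin/bash
# Hypothetical helper (not part of ip.sh): if $1 is an 802.1Q VLAN
# interface, print the device it sits on (e.g. bond0.350 -> bond0) and
# return 0; otherwise return 1. It reads the kernel's VLAN table, whose
# data lines look like "bond0.350      | 350  | bond0".
# VLAN_CONFIG can be overridden for testing.
vlan_parent()
{
    local dev=$1
    local config=${VLAN_CONFIG:-/proc/net/vlan/config}
    local parent

    [ -r "$config" ] || return 1

    parent=$(awk -F'|' -v dev="$dev" '
        {
            # Trim blanks around the name and parent-device columns.
            name = $1;   gsub(/^[ \t]+|[ \t]+$/, "", name)
            parent = $3; gsub(/^[ \t]+|[ \t]+$/, "", parent)
            # Header lines have fewer than three columns and are skipped.
            if (NF == 3 && name == dev) { print parent; exit }
        }' "$config")

    [ -n "$parent" ] || return 1
    echo "$parent"
}
```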

This works only with kernel 2.6.18-143.el5 or newer, because of the
change listed below, which adds .get_link to ethtool_ops. With it, a
VLAN on top of a bonding device, or a plain bonding device, can report
its link status like any other physical NIC.

- [net] bonding: update to upstream version 3.4.0 (Andy Gospodarek) [462632]
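Because the fix relies on the running kernel, a runtime guard is one option alongside a package dependency. The sketch below (the function name kernel_reports_bond_link is hypothetical) parses a RHEL 5 release string such as the one printed by `uname -r` and compares the build number against 143; it only understands 2.6.18-based releases and returns 2 for anything else.

```shell
#!/bin/bash
# Hypothetical helper (not part of rgmanager): return 0 if a RHEL 5 kernel
# release string is 2.6.18-143 or newer, the build this report names as
# adding .get_link to the bonding driver's ethtool_ops; return 1 if older.
# Releases not based on 2.6.18 are not understood and return 2.
kernel_reports_bond_link()
{
    local release=${1:-$(uname -r)}
    local build

    case $release in
    2.6.18-*) ;;
    *) return 2 ;;
    esac

    # "2.6.18-164.el5xen" -> "164.el5xen" -> "164"
    build=${release#2.6.18-}
    build=${build%%[!0-9]*}
    [ -n "$build" ] || return 2

    [ "$build" -ge 143 ]
}
```

For example, 2.6.18-128.el5 (the RHEL 5.3 kernel) fails the check, while 2.6.18-164.el5 (RHEL 5.4) passes.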

The attached patch has received positive testing feedback.

Comment 4 Flavio Leitner 2009-09-18 16:31:07 UTC

Have you reviewed this? 
Would it be possible to request a z-stream for this?


Comment 9 Lon Hohberger 2009-09-28 21:59:21 UTC
Corey and I sanity-checked this patch and verified that it does not introduce any regressions.

Comment 10 Lon Hohberger 2009-09-28 22:00:34 UTC
Do I need to require kernel >= 2.6.18-143.el5?

Comment 11 Andy Gospodarek 2009-09-30 14:42:37 UTC
The kernel version for 5.3 was 2.6.18-128 and for 5.4 was 2.6.18-164.

2.6.18-143 was never really released in the wild. I'm not sure whether that dictates what kind of Requires line you want in the RPM, but you should certainly have one so that it gets installed when the kernel is updated.

What will happen when the user boots the older kernel with this new package? How badly will it fail? (If no worse than what it is doing now, then I don't see a problem.)

Comment 13 Flavio Leitner 2009-10-07 21:23:12 UTC
> What will happen when the user boots the older kernel with this new package?
> How badly will it fail?  (If no worse than what it is doing now, then I 
> don't see a problem.)

The script was failing with VLAN devices on top of bonding because it
was checking for slaves, which is incorrect for a VLAN interface. Plain
bonding interfaces were working, though. Therefore, if the kernel
Requires is not met and the user boots an old kernel with the new
script, a VLAN device on top of bonding will still be broken, just in a
different way. In addition, a plain bonding device, which I assume was
working before, will break too: the ioctl() is not supported on those
kernels, so the variable $linkstate will be empty and the script will
always report the link as UP at line 480, see below:

475 ethernet_link_up()
476 {
477         declare linkstate=$(ethtool $1 | grep "Link detected:" |\
478                             awk '{print $3}')
480         [ -n "$linkstate" ] || return 0
482         case $linkstate in
483         yes)
484                 return 0
485                 ;;
486         *)
487                 return 1
488                 ;;
489         esac
491         return 1
492 }

Other kinds of network interfaces should keep working as before.
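The fallback in ethernet_link_up above folds "no Link detected: line" into "link up", which is what makes old kernels report every bonding device as UP. A hypothetical variant that keeps the unknown case separate is sketched below; the name link_state_from_ethtool and the return code 2 are assumptions, not part of ip.sh, and it reads ethtool output on stdin so it can be checked without hardware.

```shell
#!/bin/bash
# Hypothetical variant of ethernet_link_up (not the shipped ip.sh code).
# Reads `ethtool IFACE` output on stdin and keeps three outcomes apart:
#   0 - "Link detected: yes"
#   1 - "Link detected:" with any other value (e.g. "no")
#   2 - no "Link detected:" line at all, e.g. no get_link support
link_state_from_ethtool()
{
    local linkstate

    # Third field of the first "Link detected:" line, if any.
    linkstate=$(awk '/Link detected:/ { print $3; exit }')

    [ -n "$linkstate" ] || return 2

    case $linkstate in
    yes) return 0 ;;
    *) return 1 ;;
    esac
}
```

A caller could run `ethtool bond0.350 | link_state_from_ethtool` and decide explicitly what to do on return code 2, for instance log a warning instead of silently treating the link as up.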


Comment 21 Chris Ward 2010-02-11 10:08:03 UTC
~~ Attention Customers and Partners - RHEL 5.5 Beta is now available on RHN ~~

RHEL 5.5 Beta has been released! There should be a fix present in this 
release that addresses your request. Please test and report back results 
here, by March 3rd 2010 (2010-03-03) or sooner.

Upon successful verification of this request, post your results and update 
the Verified field in Bugzilla with the appropriate value.

If you encounter any issues while testing, please describe them and set 
this bug into NEED_INFO. If you encounter new defects or have additional 
patch(es) to request for inclusion, please clone this bug once for each request
and escalate through your support representative.

Comment 25 errata-xmlrpc 2010-03-30 08:46:56 UTC
An advisory has been issued which should help resolve the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please see advisory RHBA-2010:0280. You may reopen this bug report
if the solution does not work for you.

