Description of problem:
The server has eth1 & eth2 bonded in LACP active-active mode. The LACP trunk carries two VLANs:

bond0.2175 - the public network
bond0.3910 - the private cluster heartbeat network

In addition to configuring the virtual machines in Cluster Suite, I would also like to configure a simple HA service consisting of a single IP address. I configured this service (lxlhrt85) in the normal way; however, when I attempt to start it it fails with the following messages:

Error determining status of bond0.2175
Error finding slaves of bond0.2175

Looking at the /usr/share/cluster/ip.sh script, it is clear that IP resources will never work with bridged bonded interfaces. In ip.sh (LINE 446), it tries to determine if bond0.2175 is a MASTER using the "ip link list dev bond0.2175" command; however, this will never work if the interface is bridged. The ip.sh script should be checking the pbond0.2175 interface if bond0.2175 has been added to a bridge.

Version-Release number of selected component (if applicable):
rgmanager-2.0.46-1.el5_3.3-x86_64

How reproducible:
Always

Steps to Reproduce:
1. Start the service using vlan on top of bonding device.
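The MASTER test described in the report can be illustrated with a small sketch. This is not the code from ip.sh; the function name `link_line_is_master` is hypothetical, and the function takes one line of `ip -o link show` output as text so it can be tried without the real interfaces. A VLAN device stacked on a bond does not itself carry the MASTER flag, which is consistent with the errors shown above.

```shell
# Return 0 if a single line of "ip -o link show" output carries the
# MASTER flag, 1 otherwise.
# $1: the output line, passed as text (hypothetical helper, not from ip.sh).
link_line_is_master() {
    # Extract the comma-separated flag list between "<" and ">".
    flags=$(printf '%s\n' "$1" | sed -n 's/^[^<]*<\([^>]*\)>.*/\1/p')
    # Wrap in commas so MASTER only matches as a whole flag.
    case ",$flags," in
        *,MASTER,*) return 0 ;;
        *)          return 1 ;;
    esac
}
```

Possible usage, assuming the interface exists: `link_line_is_master "$(ip -o link show dev bond0)" && echo "bond0 is a master"`.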
Updating again with only public information, sorry for the noise.

Jul 15 19:52:25 x3655 clurgmgrd[4849]: <notice> Starting stopped service service:iptest
Jul 15 19:52:25 x3655 clurgmgrd: [4849]: <err> Error determining status of bond0.350
Jul 15 19:52:25 x3655 clurgmgrd: [4849]: <err> Error finding slaves of bond0.350
Jul 15 19:52:25 x3655 clurgmgrd[4849]: <notice> start on ip "10.10.1.77" returned 1 (generic error)

Although bond0.350 sits on top of a bonding device, it is actually a VLAN interface and has no slaves, so I wrote a patch that fixes the link status check in ip.sh.

This works only with kernel 2.6.18-143.el5 or newer, because the change listed below adds .get_link to ethtool_ops, so a VLAN on top of a bonding device, or a pure bonding device, can report link status like any other real NIC interface.

2.6.18-143.el5
- [net] bonding: update to upstream version 3.4.0 (Andy Gospodarek) [462632]

The attached patch has received positive testing feedback.
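To make the distinction above concrete (a VLAN on top of a bond is not itself a bonding master), here is a minimal sketch that classifies an interface by looking at the filesystem. It is not the attached patch. It assumes the bonding driver exposes a `bonding` directory under `/sys/class/net/<iface>/` and that the 8021q module lists VLAN devices under `/proc/net/vlan/`. The function name `iface_kind` and its second argument (a root prefix used only for testing against a fake tree) are hypothetical.

```shell
# Print "bond", "vlan", or "other" for interface $1.
# $2: optional root prefix (hypothetical, default empty) so the check
#     can be run against a fake /sys and /proc tree.
iface_kind() {
    root=${2:-}
    if [ -d "$root/sys/class/net/$1/bonding" ]; then
        # Bonding masters have a "bonding" directory in sysfs.
        echo bond
    elif [ -e "$root/proc/net/vlan/$1" ]; then
        # VLAN devices have an entry under /proc/net/vlan.
        echo vlan
    else
        echo other
    fi
}
```

A caller could then look for slaves only when `iface_kind` prints `bond`, and fall back to a plain link check for `vlan` and `other`.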
Lon, Have you reviewed this? Would it be possible to request a z-stream for this? thanks, Flavio
Corey and I sanity-checked this patch and verified that it does not introduce any regressions.
Do I need to require kernel >= 2.6.18-143.el5?
The kernel version for 5.3 was 2.6.18-128 and for 5.4 was 2.6.18-164. 2.6.18-143 wasn't really ever released in the wild. I'm not sure if that will dictate what kind of requires line you want in the rpm, but you should certainly have one so it will get installed when the kernel is updated. What will happen when the user boots the older kernel with this new package? How badly will it fail? (If no worse than what it is doing now, then I don't see a problem.)
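Independently of any rpm requires line, the running kernel could also be checked at run time. The following is only a sketch under stated assumptions: the function name `kernel_at_least` is hypothetical, and it compares just the first four numeric fields of a release string such as 2.6.18-143.el5, ignoring anything after them.

```shell
# Return 0 if kernel release $1 is at least $2, comparing the first four
# numeric fields (for example 2, 6, 18 and 143 in "2.6.18-143.el5").
# Hypothetical helper; later fields such as "el5" are ignored.
kernel_at_least() {
    awk -v have="$1" -v want="$2" 'BEGIN {
        split(have, h, /[.-]/)
        split(want, w, /[.-]/)
        for (i = 1; i <= 4; i++) {
            a = h[i] + 0
            b = w[i] + 0
            if (a > b) exit 0   # strictly newer in this field
            if (a < b) exit 1   # strictly older in this field
        }
        exit 0                  # all four fields equal
    }'
}
```

Possible usage: `kernel_at_least "$(uname -r)" 2.6.18-143.el5 || echo "kernel too old for bonding link status via ethtool"`.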
> What will happen when the user boots the older kernel with this new package?
> How badly will it fail? (If no worse than what it is doing now, then I
> don't see a problem.)

The script was failing with VLAN devices on top of bonding because it was trying to check for slaves, which is incorrect. Simple bonding interfaces were working, though.

Therefore, if the kernel requirement isn't met and the user boots an old kernel with the new script, a VLAN device on top of bonding will still be broken, though in a different way. However, a simple bonding device, which I assume was working before, will be broken too: the ioctl() doesn't exist on the old kernel, so the variable $linkstate will be empty and the script will always return link UP at line 480, see below:

475 ethernet_link_up()
476 {
477     declare linkstate=$(ethtool $1 | grep "Link detected:" |\
478         awk '{print $3}')
479
480     [ -n "$linkstate" ] || return 0
481
482     case $linkstate in
483     yes)
484         return 0
485         ;;
486     *)
487         return 1
488         ;;
489     esac
490
491     return 1
492 }

Other kinds of network interfaces should keep working as before.

Flavio
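The empty $linkstate case in ethernet_link_up() quoted above can be made visible by separating "no link information" from "link up". The sketch below is not part of ip.sh or of the patch. The function name `parse_link_state` is hypothetical, and it reads `ethtool <iface>` output on stdin so the parsing can be tried without real hardware.

```shell
# Classify link state from "ethtool <iface>" output read on stdin.
# Prints "up", "down", or "unknown" when no "Link detected:" line is
# present (for example when the driver lacks the get_link operation).
# Hypothetical helper, not taken from ip.sh.
parse_link_state() {
    awk '/Link detected:/ { state = $3 }
         END {
             if (state == "yes")   print "up"
             else if (state == "") print "unknown"
             else                  print "down"
         }'
}
```

Possible usage: `ethtool bond0 2>/dev/null | parse_link_state`. A caller can then decide explicitly whether "unknown" should be treated as up, which is what line 480 does implicitly, or logged as a warning.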
http://git.fedorahosted.org/git/?p=cluster.git;a=commit;h=e5c8d340f5a1ab76eb7940b11d4b594ec743ef6b
~~ Attention Customers and Partners - RHEL 5.5 Beta is now available on RHN ~~ RHEL 5.5 Beta has been released! There should be a fix present in this release that addresses your request. Please test and report back results here, by March 3rd 2010 (2010-03-03) or sooner. Upon successful verification of this request, post your results and update the Verified field in Bugzilla with the appropriate value. If you encounter any issues while testing, please describe them and set this bug into NEED_INFO. If you encounter new defects or have additional patch(es) to request for inclusion, please clone this bug per each request and escalate through your support representative.
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHBA-2010-0280.html