Bug 567604
| Summary: | [Regression] bonding: 802.3ad problems with link detection | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 5 | Reporter: | Simon Fayer <simon.fayer05> |
| Component: | kernel | Assignee: | Andy Gospodarek <agospoda> |
| Status: | CLOSED ERRATA | QA Contact: | Network QE <network-qe> |
| Severity: | medium | Docs Contact: | |
| Priority: | low | | |
| Version: | 5.4 | CC: | andriusb, bandan.das, cevich, chas.horvath, dan.duval, dwu, dyasny, jcm, jolsa, jparadis, jwest, jwilson, k.georgiou, kzhang, liko, lzheng, mvattakk, peterm, robert.evans, sardella, sassmann, smarkovi, stanislav.polasek, syeghiay, tao, tgraf |
| Target Milestone: | rc | Keywords: | Regression |
| Target Release: | --- | | |
| Hardware: | All | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2011-01-13 21:08:22 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Attachments: | | | |
Description
Simon Fayer
2010-02-23 11:46:53 UTC
Which hardware controls which devices? Do eth0 and eth1 use the e1000e driver or the igb driver? I'd like to look at those drivers as well as the bonding code.

eth0 & eth1 are using the igb driver, eth2 & eth3 are using the e1000e driver.

Thanks, we will take a look at these two drivers and the possible differences in their return codes.

Without the bond_3ad_adapter_speed_changed function I don't see how it can be fixed. For example, in the case of two ports and one driver, what happens if one port is disconnected during boot and is connected later on?

I did some triage on this and it looks like this is our problem. We took an update to version 3.4.0 in April 2009. This change included the following upstream commit:

commit f0c76d61779b153dbfb955db3f144c62d02173c2
Author: Jay Vosburgh <fubar.com>
Date: Wed Jul 2 18:21:58 2008 -0700

bonding: refactor mii monitor

Time went by and it seems a bug was discovered with that commit, so the code to check speed and duplex and update it was added back here:

commit 17d04500e2528217de5fe967599f98ee84348a9c
Author: Jay Vosburgh <fubar.com>
Date: Wed Mar 18 18:38:25 2009 -0700

bonding: Fix updating of speed/duplex changes

This patch corrects an omission from the following commit:

commit f0c76d61779b153dbfb955db3f144c62d02173c2
Author: Jay Vosburgh <fubar.com>
Date: Wed Jul 2 18:21:58 2008 -0700

bonding: refactor mii monitor

The un-refactored code checked the link speed and duplex of every slave on every pass; the refactored code did not do so. The 802.3ad and balance-alb/tlb modes utilize the speed and duplex information, and require it to be kept up to date. This patch adds a notifier check to perform the appropriate updating when the slave device speed changes.

Created attachment 398903
rhel5-bonding-cleanup.patch

I suspect this patch will resolve the issue. Any testing that can be done on it would be greatly appreciated.
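
For readers following the commit message quoted above, here is a toy Python model of the omission it describes. It is only an illustrative sketch, not the kernel code: the `Slave` class, the three functions, and the scenario (speed changing while the link stays up) are all hypothetical names and assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class Slave:
    """Hypothetical stand-in for a bonded slave interface (not a kernel struct)."""
    name: str
    link_up: bool
    hw_speed: int               # speed the NIC currently reports, in Mb/s (0 = unknown)
    cached_speed: int = 0       # speed the bond last recorded for this slave
    last_link_up: bool = False  # link state seen on the previous monitor pass


def monitor_every_pass(slave: Slave) -> None:
    """Simplified pre-refactor behaviour: refresh the cached speed on every pass."""
    slave.cached_speed = slave.hw_speed if slave.link_up else 0
    slave.last_link_up = slave.link_up


def monitor_on_link_change(slave: Slave) -> None:
    """Simplified refactored behaviour: refresh only when the link state flips."""
    if slave.link_up != slave.last_link_up:
        slave.cached_speed = slave.hw_speed if slave.link_up else 0
        slave.last_link_up = slave.link_up


def on_speed_change_event(slave: Slave) -> None:
    """Simplified fix: a notifier-style callback refreshes the cached speed."""
    if slave.link_up:
        slave.cached_speed = slave.hw_speed


def simulate(monitor, notify: bool) -> int:
    """Run two monitor passes with a speed change in between; return cached speed."""
    s = Slave("eth0", link_up=True, hw_speed=0)
    monitor(s)                # first pass: link is up, speed still unknown
    s.hw_speed = 1000         # speed changes while the link stays up
    if notify:
        on_speed_change_event(s)
    monitor(s)                # second pass: link state has not changed
    return s.cached_speed


if __name__ == "__main__":
    print("every pass:       ", simulate(monitor_every_pass, notify=False))
    print("on link change:   ", simulate(monitor_on_link_change, notify=False))
    print("with notifier fix:", simulate(monitor_on_link_change, notify=True))
```

In this model the link-change-only monitor keeps a stale speed of 0 unless the notifier callback runs, which is the kind of stale speed/duplex information the commit message says 802.3ad and balance-alb/tlb cannot tolerate.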
That patch does seem to correctly fix the problem (tested with kernel-2.6.18-164.11.1.el5).

Awesome! Thanks for testing that, Simon.

New test kernels available here:
http://people.redhat.com/agospoda/#rhel5
Any feedback you can provide is greatly appreciated.

Your latest test kernel (2.6.18-194.el5.gtest.86) does seem to resolve the issue correctly.

Thanks, again!

This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.

in kernel-2.6.18-199.el5
You can download this test kernel from http://people.redhat.com/jwilson/el5
Please update the appropriate value in the Verified field (cf_verified) to indicate this fix has been successfully verified. Include a comment with verification details.

Kernel-2.6.18-199.el5 does seem to fix this problem (verified against the original set-up described in the opening comment of this ticket).

*** Bug 602071 has been marked as a duplicate of this bug. ***

Hi Andy, the customer from it955673 agreed to help test this with our rhel5.5 kernel. Do you have a place the customer could download our test kernel from? Thanks,

wmg, a patch to resolve this issue can be found in my test kernels here:
(listed in comment #9)
http://people.redhat.com/agospoda/#rhel5
and in the latest development kernels here:
(listed in comment #17)
http://people.redhat.com/jwilson/el5/
Please check the comments for links when a bug is in the MODIFIED state as a link is often listed for kernel bugs.

(In reply to comment #28)
> wmg, a patch to resolve this issue can be found in my test kernels here:
>
> (listed in comment #9)
> http://people.redhat.com/agospoda/#rhel5
>
> and in the latest development kernels here:
>
> (listed in comment #17)
> http://people.redhat.com/jwilson/el5/
>
> Please check the comments for links when a bug is in the MODIFIED state as a
> link is often listed for kernel bugs.

Andy, the customer confirmed this works fine on the 206.el5 kernel. Thanks,

Thanks for the feedback, wmg!

Stratus has encountered this problem also.

An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.
http://rhn.redhat.com/errata/RHSA-2011-0017.html