Description of problem:
Bonding does not work over NICs driven by e1000e: if you break and restore the physical links of the bonding slaves one by one, the network stops working.

Version-Release number of selected component (if applicable):
Initially tested on the 2.6.18-92.1.1.el5 x86_64 kernel, but we have also checked the 2.6.18-126 RHEL test kernel with the same result. Moreover, the 2.6.29-rc1 kernel does not work either (MII reported the correct status, but ping does not work).

How reproducible:
bond1 has eth2 and eth3 as members in active-backup bonding mode.
1. Initially, eth2 is the active slave.
2. When we disable eth2's uplink switch, the bond fails over to eth3 correctly and keeps working.
3. When we re-enable eth2 and then disable eth3, bond1 fails back to eth2 and shows eth2 as the active NIC. However, ping stops working.

Actual results:
ping stops working

Expected results:
ping continues to work fine

Additional info:
Both eth2 and eth3 are:
Ethernet controller: Intel Corporation 82571EB Quad Port Gigabit Mezzanine Adapter (rev 06)
0200: 8086:10da (rev 06)
Subsystem: 103c:1717

[root@host ~]# cat /proc/net/bonding/bond1
Ethernet Channel Bonding Driver: v3.2.4 (January 28, 2008)

Bonding Mode: fault-tolerance (active-backup)
Primary Slave: None
Currently Active Slave: eth2
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 0
Down Delay (ms): 0

Slave Interface: eth2
MII Status: up
Link Failure Count: 0
Permanent HW addr: 00:17:a4:77:0c:64

Slave Interface: eth3
MII Status: up
Link Failure Count: 0
Permanent HW addr: 00:17:a4:77:0c:66

[root@host ~]# cat /etc/modprobe.conf
alias eth0 bnx2
alias eth1 bnx2
alias eth2 e1000
alias eth3 e1000
alias eth4 e1000
alias eth5 e1000
alias eth6 e1000
alias eth7 e1000
alias scsi_hostadapter cciss
alias scsi_hostadapter1 usb-storage
# Added For Virtuozzo Trunks
# bond0 is for Management Team
alias bond0 bonding
options bond0 miimon=100 mode=1 max_bonds=3
# bond1 is for DATA Trunk Team
alias bond1 bonding
options bond1 miimon=100 mode=1
# bond2 is for NFS Trunk Team
alias bond2 bonding
options bond2 miimon=100 mode=1
# Disable IPV6
alias net-pf-10 off
alias ipv6 off
options ip_conntrack ip_conntrack_disable_ve0=1

### Timeline below: ###

[root@host ~]# date
Tue Dec 23 09:10:30 PST 2008

### SHUT OFF Virtual Connect Switch for the eth2 uplink

[root@host ~]# date
Tue Dec 23 09:10:42 PST 2008

[root@host ~]# cat /proc/net/bonding/bond1
Ethernet Channel Bonding Driver: v3.2.4 (January 28, 2008)

Bonding Mode: fault-tolerance (active-backup)
Primary Slave: None
Currently Active Slave: eth3
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 0
Down Delay (ms): 0

Slave Interface: eth2
MII Status: up
Link Failure Count: 1
Permanent HW addr: 00:17:a4:77:0c:64

Slave Interface: eth3
MII Status: up
Link Failure Count: 0
Permanent HW addr: 00:17:a4:77:0c:66

[root@host network-scripts]# ping 10.58.64.1
PING 10.58.64.1 (10.58.64.1) 56(84) bytes of data.
64 bytes from 10.58.64.1: icmp_seq=1 ttl=255 time=1.58 ms
64 bytes from 10.58.64.1: icmp_seq=2 ttl=255 time=0.380 ms

--- 10.58.64.1 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 999ms
rtt min/avg/max/mdev = 0.380/0.980/1.581/0.601 ms

[root@host network-scripts]# date
Tue Dec 23 09:15:30 PST 2008

### Turned ON Virtual Connect Switch for the eth2 uplink

[root@host network-scripts]# date
Tue Dec 23 09:15:41 PST 2008

[root@host network-scripts]# cat /proc/net/bonding/bond1
Ethernet Channel Bonding Driver: v3.2.4 (January 28, 2008)

Bonding Mode: fault-tolerance (active-backup)
Primary Slave: None
Currently Active Slave: eth3
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 0
Down Delay (ms): 0

Slave Interface: eth2
MII Status: up
Link Failure Count: 1
Permanent HW addr: 00:17:a4:77:0c:64

Slave Interface: eth3
MII Status: up
Link Failure Count: 0
Permanent HW addr: 00:17:a4:77:0c:66

[root@host network-scripts]# ping 10.58.64.1
PING 10.58.64.1 (10.58.64.1) 56(84) bytes of data.
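As a side note, the state transitions recorded in the timeline above can be checked programmatically. The following is a minimal sketch (a hypothetical helper, not part of any shipped tool) that parses /proc/net/bonding output in the v3.2.4 format captured in this report and reports the active slave plus per-slave link-failure counts:

```python
# Sketch: parse /proc/net/bonding text (bonding driver v3.2.4 format,
# as shown in this bug report). Field names are taken verbatim from
# the dumps above; other driver versions may differ.

def parse_bond_status(text):
    """Return (active_slave, {slave: {"mii": str, "failures": int}})."""
    active = None
    slaves = {}
    current = None  # None while still in the bond-level header block
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("Currently Active Slave:"):
            active = line.split(":", 1)[1].strip()
        elif line.startswith("Slave Interface:"):
            current = line.split(":", 1)[1].strip()
            slaves[current] = {"mii": None, "failures": 0}
        elif current and line.startswith("MII Status:"):
            slaves[current]["mii"] = line.split(":", 1)[1].strip()
        elif current and line.startswith("Link Failure Count:"):
            slaves[current]["failures"] = int(line.split(":", 1)[1])
    return active, slaves

# Abbreviated sample matching the dump taken after the eth2 uplink
# was shut off (active slave eth3, one failure recorded on eth2).
sample = """\
Currently Active Slave: eth3
MII Status: up
Slave Interface: eth2
MII Status: up
Link Failure Count: 1
Slave Interface: eth3
MII Status: up
Link Failure Count: 0
"""
active, slaves = parse_bond_status(sample)
print(active, slaves["eth2"]["failures"])  # eth3 1
```

In practice one would feed it `open("/proc/net/bonding/bond1").read()` instead of the inline sample.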
64 bytes from 10.58.64.1: icmp_seq=1 ttl=255 time=0.443 ms
64 bytes from 10.58.64.1: icmp_seq=2 ttl=255 time=0.436 ms

--- 10.58.64.1 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1001ms
rtt min/avg/max/mdev = 0.436/0.439/0.443/0.021 ms

[root@host network-scripts]# date
Tue Dec 23 09:17:06 PST 2008

### SHUT OFF Virtual Connect Switch for the eth3 uplink

[root@host network-scripts]# date
Tue Dec 23 09:17:16 PST 2008

[root@host network-scripts]# cat /proc/net/bonding/bond1
Ethernet Channel Bonding Driver: v3.2.4 (January 28, 2008)

Bonding Mode: fault-tolerance (active-backup)
Primary Slave: None
Currently Active Slave: eth2
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 0
Down Delay (ms): 0

Slave Interface: eth2
MII Status: up
Link Failure Count: 1
Permanent HW addr: 00:17:a4:77:0c:64

Slave Interface: eth3
MII Status: up
Link Failure Count: 1
Permanent HW addr: 00:17:a4:77:0c:66

[root@host network-scripts]# ping 10.58.64.1
PING 10.58.64.1 (10.58.64.1) 56(84) bytes of data.
From 10.58.64.251 icmp_seq=9 Destination Host Unreachable
From 10.58.64.251 icmp_seq=10 Destination Host Unreachable
From 10.58.64.251 icmp_seq=11 Destination Host Unreachable
From 10.58.64.251 icmp_seq=13 Destination Host Unreachable
From 10.58.64.251 icmp_seq=14 Destination Host Unreachable
From 10.58.64.251 icmp_seq=15 Destination Host Unreachable

[root@host log]# date
Tue Dec 23 09:19:11 PST 2008

#############################################################
#############################################################
### /var/log/messages content during the testing period

Dec 23 09:10:36 host kernel: 0000:00:05.0: eth2: Link is Down
Dec 23 09:10:36 host kernel: bonding: bond1: link status definitely down for interface eth2, disabling it
Dec 23 09:10:36 host kernel: bonding: bond1: making interface eth3 the new active one.
Dec 23 09:10:36 host kernel: device eth2 left promiscuous mode
Dec 23 09:10:36 host kernel: printk: 3 messages suppressed.
Dec 23 09:10:36 host kernel: audit(1230052236.788:28): dev=eth2 prom=0 old_prom=256 auid=4294967295 ses=4294967295
Dec 23 09:10:36 host kernel: device eth3 entered promiscuous mode
Dec 23 09:10:36 host kernel: audit(1230052236.788:29): dev=eth3 prom=256 old_prom=0 auid=4294967295 ses=4294967295
Dec 23 09:10:39 host kernel: 0000:00:05.0: eth2: Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX
Dec 23 09:10:39 host kernel: bonding: bond1: link status definitely up for interface eth2.
Dec 23 09:17:12 host kernel: 0000:00:05.0: eth3: Link is Down
Dec 23 09:17:13 host kernel: bonding: bond1: link status definitely down for interface eth3, disabling it
Dec 23 09:17:13 host kernel: bonding: bond1: making interface eth2 the new active one.
Dec 23 09:17:13 host kernel: device eth3 left promiscuous mode
Dec 23 09:17:13 host kernel: audit(1230052633.024:30): dev=eth3 prom=0 old_prom=256 auid=4294967295 ses=4294967295
Dec 23 09:17:13 host kernel: device eth2 entered promiscuous mode
Dec 23 09:17:13 host kernel: audit(1230052633.032:31): dev=eth2 prom=256 old_prom=0 auid=4294967295 ses=4294967295
Dec 23 09:17:15 host kernel: 0000:00:05.0: eth3: Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX
Dec 23 09:17:15 host kernel: bonding: bond1: link status definitely up for interface eth3.
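For readers comparing runs, the failover sequence buried in these messages can be extracted mechanically. Below is a small sketch (an illustrative helper, not an existing tool) whose regexes assume the exact RHEL 5 bonding message wording shown above; other kernel versions may phrase these lines differently:

```python
import re

# Sketch: reconstruct the bonding failover timeline from kernel log
# lines in the format shown above. The patterns match the two message
# forms present in this report: "link status definitely up/down for
# interface ethN" and "making interface ethN the new active one".
ACTIVE_RE = re.compile(r"bonding: (\w+): making interface (\w+) the new active one")
LINK_RE = re.compile(r"bonding: (\w+): link status definitely (up|down) for interface (\w+)")

def failover_events(lines):
    """Yield (bond, event, interface) tuples in log order."""
    for line in lines:
        m = ACTIVE_RE.search(line)
        if m:
            yield (m.group(1), "active", m.group(2))
            continue
        m = LINK_RE.search(line)
        if m:
            yield (m.group(1), m.group(2), m.group(3))

log = [
    "Dec 23 09:10:36 host kernel: bonding: bond1: link status definitely down for interface eth2, disabling it",
    "Dec 23 09:10:36 host kernel: bonding: bond1: making interface eth3 the new active one.",
    "Dec 23 09:10:39 host kernel: bonding: bond1: link status definitely up for interface eth2.",
]
print(list(failover_events(log)))
# [('bond1', 'down', 'eth2'), ('bond1', 'active', 'eth3'), ('bond1', 'up', 'eth2')]
```

Applied to the full log above, this makes it easy to see that the driver reported every link transition correctly even though traffic stopped flowing after the fail-back to eth2.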
I've got an update here: the 2.6.29-rc4 kernel plus the 3 patches provided by David Graham _do_ work.

For details see: http://bugzilla.kernel.org/show_bug.cgi?id=12570

The patches can be found here: http://archive.netbsd.se/?ml=linux-netdev&a=2009-02&m=9743331
Created attachment 335763 [details]
e1000e-fixup-link-problems-with-bonding.patch

Sounds great. Thanks for posting that update here as well. :-)

The patches that will be needed from upstream to complete this request are:

commit 5df3f0eaf8b236cc785e2733a3df1e5c84e4aad8
Author: dave graham <david.graham>
Date:   Tue Feb 10 12:51:41 2009 +0000

    e1000e: Disable dynamic clock gating for 82571 per si errata.

commit 573cca8c6fdbf6bd2dae8f9e9b66931990849c83
Author: dave graham <david.graham>
Date:   Tue Feb 10 12:52:05 2009 +0000

    e1000e: remove RXSEQ link monitoring for serdes

commit c9523379d6000f379a84b6b970efb8782c128071
Author: dave graham <david.graham>
Date:   Tue Feb 10 12:52:28 2009 +0000

    e1000e: Serdes - attempt autoneg when link restored.

Attached is a patch that should address these changes. Feel free to try the patch on the latest RHEL5.3 source, but I will also add these changes to my test kernels and post here when those are available for download.
Hi, I have the same problem with Red Hat 4 and the e1000 card module.

My configuration:
OS version: Red Hat Enterprise Linux AS release 4 (Nahant)
kernel: 2.6.9-34.EL

lspci | grep -i ethernet
06:05.0 Ethernet controller: Intel Corp. 82546GB Gigabit Ethernet Controller (rev 03)
06:05.1 Ethernet controller: Intel Corp. 82546GB Gigabit Ethernet Controller (rev 03)

Does the patch apply to my configuration? Thank you.
My test kernels have been updated to include a patch for this bugzilla.

http://people.redhat.com/agospoda/#rhel5

Please test them and report back your results. Without immediate feedback there is a good chance this or any other fix for this driver will not be included in the upcoming update.
This sounds like a duplicate of bug 492270. Please reopen if the test kernels in comment #4 or the patches in bug 492270 do not resolve this.

*** This bug has been marked as a duplicate of bug 492270 ***