Bug 483034

Summary: Bonding does not work over e1000e.
Product: Red Hat Enterprise Linux 5
Version: 5.2
Component: kernel
Hardware: All
OS: Linux
Status: CLOSED DUPLICATE
Severity: medium
Priority: low
Reporter: Konstantin Khorenko <khorenko>
Assignee: Andy Gospodarek <agospoda>
QA Contact: Red Hat Kernel QE team <kernel-qe>
CC: david.graham, jay.vosburgh, jesse.brandeburg, jlbenavidesm, mstanichenko, peterm, sean
Target Milestone: rc
Doc Type: Bug Fix
Last Closed: 2009-04-24 00:44:31 UTC

Attachments:
e1000e-fixup-link-problems-with-bonding.patch

Description Konstantin Khorenko 2009-01-29 13:08:47 UTC
Description of problem:
Bonding does not work over NICs supported by e1000e: if you break and restore the physical links of the bonding slaves one by one, the network stops working.

Version-Release number of selected component (if applicable):
Initially this was reproduced on the 2.6.18-92.1.1.el5 x86_64 kernel; we also checked the 2.6.18-126 RHEL test kernel with the same result. Moreover, the 2.6.29-rc1 kernel does not work either (MII reports the correct status, but ping does not work).

How reproducible:
bond1 has eth2 and eth3 as members in active-backup bonding mode.
Initially, eth2 is active. When we disable eth2's uplink on the switch, the bond fails over to eth3 correctly and keeps working.
When we re-enable eth2 and then disable eth3, bond1 fails back to eth2 and shows eth2 as the active NIC. However, ping stops working.
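
For reference, the same failover sequence can be approximated from the host side with something like the following. This is only a rough sketch: the original test toggled the uplink ports on the Virtual Connect switch, which is closer to a real carrier loss than an administrative link-down on the host.

ping 10.58.64.1 &                    # keep traffic flowing during the failovers
cat /proc/net/bonding/bond1          # eth2 should be the active slave
ip link set eth2 down                # force failover to eth3
cat /proc/net/bonding/bond1          # eth3 is now active; ping keeps working
ip link set eth2 up; sleep 5
ip link set eth3 down                # fail back to eth2
cat /proc/net/bonding/bond1          # eth2 is active again, but ping stalls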

Actual results:
ping stops working

Expected results:
ping continues to work fine

Additional info:
Both eth2 and eth3 are:
Ethernet controller: Intel Corporation 82571EB Quad Port Gigabit Mezzanine Adapter (rev 06)

0200: 8086:10da (rev 06)
Subsystem: 103c:1717

[root@host ~]# cat /proc/net/bonding/bond1
Ethernet Channel Bonding Driver: v3.2.4 (January 28, 2008)

Bonding Mode: fault-tolerance (active-backup)
Primary Slave: None
Currently Active Slave: eth2
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 0
Down Delay (ms): 0

Slave Interface: eth2
MII Status: up
Link Failure Count: 0
Permanent HW addr: 00:17:a4:77:0c:64

Slave Interface: eth3
MII Status: up
Link Failure Count: 0
Permanent HW addr: 00:17:a4:77:0c:66

[root@host ~]# cat /etc/modprobe.conf
alias eth0 bnx2
alias eth1 bnx2
alias eth2 e1000
alias eth3 e1000
alias eth4 e1000
alias eth5 e1000
alias eth6 e1000
alias eth7 e1000
alias scsi_hostadapter cciss
alias scsi_hostadapter1 usb-storage
# Added For Virtuozzo Trunks
# bond0 is for Management Team
alias bond0 bonding
options bond0 miimon=100 mode=1 max_bonds=3

# bond1 is for DATA Trunk Team
alias bond1 bonding
options bond1 miimon=100 mode=1

# bond2 is for NFS Trunk Team
alias bond2 bonding
options bond2 miimon=100 mode=1


# Disable IPV6
alias net-pf-10 off
alias ipv6 off

options ip_conntrack ip_conntrack_disable_ve0=1
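
(For completeness: on RHEL 5 this modprobe.conf is normally paired with ifcfg files under /etc/sysconfig/network-scripts. The ones for bond1 and its slaves were not attached to this report; the sketch below is only illustrative, with the address taken from the ping output later in the bug and the netmask assumed.)

# /etc/sysconfig/network-scripts/ifcfg-bond1 (illustrative)
DEVICE=bond1
BOOTPROTO=static
IPADDR=10.58.64.251
NETMASK=255.255.255.0
ONBOOT=yes

# /etc/sysconfig/network-scripts/ifcfg-eth2 (ifcfg-eth3 is analogous)
DEVICE=eth2
BOOTPROTO=none
ONBOOT=yes
MASTER=bond1
SLAVE=yes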


###Timeline below:###

[root@host ~]# date
Tue Dec 23 09:10:30 PST 2008

### SHUT OFF Virtual Connect Switch for the eth2 uplink

[root@host ~]# date
Tue Dec 23 09:10:42 PST 2008

[root@host ~]# cat /proc/net/bonding/bond1
Ethernet Channel Bonding Driver: v3.2.4 (January 28, 2008)

Bonding Mode: fault-tolerance (active-backup)
Primary Slave: None
Currently Active Slave: eth3
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 0
Down Delay (ms): 0

Slave Interface: eth2
MII Status: up
Link Failure Count: 1
Permanent HW addr: 00:17:a4:77:0c:64

Slave Interface: eth3
MII Status: up
Link Failure Count: 0
Permanent HW addr: 00:17:a4:77:0c:66


[root@host network-scripts]# ping 10.58.64.1
PING 10.58.64.1 (10.58.64.1) 56(84) bytes of data.
64 bytes from 10.58.64.1: icmp_seq=1 ttl=255 time=1.58 ms
64 bytes from 10.58.64.1: icmp_seq=2 ttl=255 time=0.380 ms

--- 10.58.64.1 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 999ms
rtt min/avg/max/mdev = 0.380/0.980/1.581/0.601 ms


[root@host network-scripts]# date
Tue Dec 23 09:15:30 PST 2008
### Turned ON Virtual Connect Switch for the eth2 uplink
[root@host network-scripts]# date
Tue Dec 23 09:15:41 PST 2008

[root@host network-scripts]# cat /proc/net/bonding/bond1
Ethernet Channel Bonding Driver: v3.2.4 (January 28, 2008)

Bonding Mode: fault-tolerance (active-backup)
Primary Slave: None
Currently Active Slave: eth3
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 0
Down Delay (ms): 0

Slave Interface: eth2
MII Status: up
Link Failure Count: 1
Permanent HW addr: 00:17:a4:77:0c:64

Slave Interface: eth3
MII Status: up
Link Failure Count: 0
Permanent HW addr: 00:17:a4:77:0c:66

[root@host network-scripts]# ping 10.58.64.1
PING 10.58.64.1 (10.58.64.1) 56(84) bytes of data.
64 bytes from 10.58.64.1: icmp_seq=1 ttl=255 time=0.443 ms
64 bytes from 10.58.64.1: icmp_seq=2 ttl=255 time=0.436 ms

--- 10.58.64.1 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1001ms
rtt min/avg/max/mdev = 0.436/0.439/0.443/0.021 ms


[root@host network-scripts]# date
Tue Dec 23 09:17:06 PST 2008

### SHUT OFF Virtual Connect Switch for the eth3 uplink

[root@host network-scripts]# date
Tue Dec 23 09:17:16 PST 2008

[root@host network-scripts]# cat /proc/net/bonding/bond1
Ethernet Channel Bonding Driver: v3.2.4 (January 28, 2008)

Bonding Mode: fault-tolerance (active-backup)
Primary Slave: None
Currently Active Slave: eth2
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 0
Down Delay (ms): 0

Slave Interface: eth2
MII Status: up
Link Failure Count: 1
Permanent HW addr: 00:17:a4:77:0c:64

Slave Interface: eth3
MII Status: up
Link Failure Count: 1
Permanent HW addr: 00:17:a4:77:0c:66

[root@host network-scripts]# ping 10.58.64.1
PING 10.58.64.1 (10.58.64.1) 56(84) bytes of data.
From 10.58.64.251 icmp_seq=9 Destination Host Unreachable
From 10.58.64.251 icmp_seq=10 Destination Host Unreachable
From 10.58.64.251 icmp_seq=11 Destination Host Unreachable
From 10.58.64.251 icmp_seq=13 Destination Host Unreachable
From 10.58.64.251 icmp_seq=14 Destination Host Unreachable
From 10.58.64.251 icmp_seq=15 Destination Host Unreachable

[root@host log]# date
Tue Dec 23 09:19:11 PST 2008

#############################################################
#############################################################
###/var/log/messages content during the testing period
Dec 23 09:10:36 host kernel: 0000:00:05.0: eth2: Link is Down
Dec 23 09:10:36 host kernel: bonding: bond1: link status definitely down for interface eth2, disabling it
Dec 23 09:10:36 host kernel: bonding: bond1: making interface eth3 the new active one.
Dec 23 09:10:36 host kernel: device eth2 left promiscuous mode
Dec 23 09:10:36 host kernel: printk: 3 messages suppressed.
Dec 23 09:10:36 host kernel: audit(1230052236.788:28): dev=eth2 prom=0 old_prom=256 auid=4294967295 ses=4294967295
Dec 23 09:10:36 host kernel: device eth3 entered promiscuous mode
Dec 23 09:10:36 host kernel: audit(1230052236.788:29): dev=eth3 prom=256 old_prom=0 auid=4294967295 ses=4294967295
Dec 23 09:10:39 host kernel: 0000:00:05.0: eth2: Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX
Dec 23 09:10:39 host kernel: bonding: bond1: link status definitely up for interface eth2.
Dec 23 09:17:12 host kernel: 0000:00:05.0: eth3: Link is Down
Dec 23 09:17:13 host kernel: bonding: bond1: link status definitely down for interface eth3, disabling it
Dec 23 09:17:13 host kernel: bonding: bond1: making interface eth2 the new active one.
Dec 23 09:17:13 host kernel: device eth3 left promiscuous mode
Dec 23 09:17:13 host kernel: audit(1230052633.024:30): dev=eth3 prom=0 old_prom=256 auid=4294967295 ses=4294967295
Dec 23 09:17:13 host kernel: device eth2 entered promiscuous mode
Dec 23 09:17:13 host kernel: audit(1230052633.032:31): dev=eth2 prom=256 old_prom=0 auid=4294967295 ses=4294967295
Dec 23 09:17:15 host kernel: 0000:00:05.0: eth3: Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX
Dec 23 09:17:15 host kernel: bonding: bond1: link status definitely up for interface eth3.

Comment 1 Konstantin Khorenko 2009-03-17 16:04:36 UTC
I've got an update here: the 2.6.29-rc4 kernel with the 3 patches provided by David Graham _does_ work.

For details see: http://bugzilla.kernel.org/show_bug.cgi?id=12570
Patches can be found here:
http://archive.netbsd.se/?ml=linux-netdev&a=2009-02&m=9743331

Comment 2 Andy Gospodarek 2009-03-18 20:50:04 UTC
Created attachment 335763 [details]
e1000e-fixup-link-problems-with-bonding.patch

Sounds great.  Thanks for posting that update here as well.  :-)

The patches that will be needed from upstream to complete this request are:


commit 5df3f0eaf8b236cc785e2733a3df1e5c84e4aad8
Author: dave graham <david.graham>
Date:   Tue Feb 10 12:51:41 2009 +0000

    e1000e: Disable dynamic clock gating for 82571 per si errata.

commit 573cca8c6fdbf6bd2dae8f9e9b66931990849c83
Author: dave graham <david.graham>
Date:   Tue Feb 10 12:52:05 2009 +0000

    e1000e: remove RXSEQ link monitoring for serdes

commit c9523379d6000f379a84b6b970efb8782c128071
Author: dave graham <david.graham>
Date:   Tue Feb 10 12:52:28 2009 +0000

    e1000e: Serdes - attempt autoneg when link restored.

Attached is a patch that should address these changes.  Feel free to try the patch on the latest RHEL5.3 source, but I will also add these changes to my test kernels and post here when those are available for download.
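
If you want to try the attached patch before the test kernels are up, a minimal sketch (not the official build procedure; the source-tree path is illustrative, and the tree is assumed to be configured to match the running kernel) would be:

cd /path/to/rhel-5.3-kernel-source
patch -p1 < e1000e-fixup-link-problems-with-bonding.patch
make M=drivers/net/e1000e modules    # rebuild only the e1000e module
rmmod e1000e && insmod drivers/net/e1000e/e1000e.ko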

Comment 3 jlb 2009-04-13 21:52:15 UTC
Hi, I have the same problem with Red Hat 4 and the e1000 card module.

My configuration:
OS version: Red Hat Enterprise Linux AS release 4 (Nahant)
Kernel: 2.6.9-34.EL

lspci | grep -i ethernet
06:05.0 Ethernet controller: Intel Corp. 82546GB Gigabit Ethernet Controller (rev 03) 
06:05.1 Ethernet controller: Intel Corp. 82546GB Gigabit Ethernet Controller (rev 03) 

Does the patch apply to my configuration?
Thank you.

Comment 4 Andy Gospodarek 2009-04-23 02:40:06 UTC
My test kernels have been updated to include a patch for this bugzilla.

http://people.redhat.com/agospoda/#rhel5

Please test them and report back your results.  Without immediate
feedback there is a good chance this or any other fix for this driver
will not be included in the upcoming update.
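
Roughly, picking up a test kernel looks like the following; <test-kernel> is a placeholder for whichever build on that page matches your release and arch, so treat this as a sketch only:

# download <test-kernel>.rpm for your arch from the page above, then:
rpm -ivh <test-kernel>.rpm           # installs alongside the current kernel
# reboot into the test kernel, repeat the eth2/eth3 failover test, and verify:
cat /proc/net/bonding/bond1
ping 10.58.64.1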

Comment 5 Andy Gospodarek 2009-04-24 00:44:31 UTC
This sounds like a duplicate of bug 492270.  Please reopen if the test kernels in comment #4 or the patches in bug 492270 do not resolve this.

*** This bug has been marked as a duplicate of bug 492270 ***