Bug 627926 - [RHEL6.0] e1000e devices fail to initialize interrupts properly
Summary: [RHEL6.0] e1000e devices fail to initialize interrupts properly
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: kernel
Version: 6.0
Hardware: All
OS: Linux
Priority: low
Severity: medium
Target Milestone: rc
Target Release: ---
Assignee: Dean Nelson
QA Contact: Network QE
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2010-08-27 12:35 UTC by Dean Nelson
Modified: 2011-05-19 12:38 UTC
CC List: 11 users

Fixed In Version: kernel-2.6.32-83.el6
Doc Type: Bug Fix
Doc Text:
Clone Of: 496127
Environment:
Last Closed: 2011-05-19 12:38:01 UTC
Target Upstream Version:
Embargoed:


Attachments


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHSA-2011:0542 0 normal SHIPPED_LIVE Important: Red Hat Enterprise Linux 6.1 kernel security, bug fix and enhancement update 2011-05-19 11:58:07 UTC

Description Dean Nelson 2010-08-27 12:35:03 UTC
+++ This bug was initially created as a clone of Bug #496127 +++

Description of problem:

I am trying to configure bonding in 802.3ad mode on a RHEL5 system.
The network card is an Intel Corporation 82571EB Gigabit Ethernet Controller (rev 06).

My problem is that the channel never comes up.

I have tried two kernels: 2.6.18-92.el5PAE and 2.6.18-128.1.1.el5PAE.

You will see below that the MII Status for the physical interfaces is DOWN.
I think that is the problem.

With the latest kernel, mii-tool is not working at all and reports a down state.

ethtool is working fine.


Version-Release number of selected component (if applicable):

RHEL5 U3
Kernel 2.6.18-92.el5PAE
Kernel 2.6.18-128.1.1.el5PAE


Additional info:

configuration files
-------------------------

[root@hostname ~]# cat /etc/modprobe.conf 
alias eth0 e1000e
alias eth1 e1000e
alias eth2 tg3
alias eth3 tg3
alias bond0 bonding
options bond0 max_bonds=2


[root@hostname ~]# cat /etc/sysconfig/network-scripts/ifcfg-eth0
# Intel Corporation 82571EB Gigabit Ethernet Controller
DEVICE=eth0
HWADDR=00:15:17:4B:2E:1C
ONBOOT=yes
BOOTPROTO=none
USERCTL=no
MASTER=bond0
SLAVE=yes


[root@hostname ~]# cat /etc/sysconfig/network-scripts/ifcfg-bond0 
DEVICE=bond0
BOOTPROTO=none
ONBOOT=yes
USERCTL=no
TYPE=Ethernet
IPADDR=10.10.10.10
NETMASK=255.255.255.0
NETWORK=10.10.10.0
BROADCAST=10.10.10.255
BONDING_OPTS="mode=4 miimon=100"
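
For reference, a minimal sketch of the older modprobe.conf form of the same bonding parameters (the BONDING_OPTS line above is what this report actually uses; the options line below is only illustrative):

alias bond0 bonding
options bond0 mode=4 miimon=100 max_bonds=2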



Results for 2.6.18-92.el5PAE
----------------------------

[root@hostname ~]# modinfo e1000e | head -n 2
filename:       /lib/modules/2.6.18-92.el5/kernel/drivers/net/e1000e/e1000e.ko
version:        0.2.0


[root@hostname ~]# mii-tool eth0
eth0: negotiated 100baseTx-FD, link ok


[root@hostname ~]# cat /proc/net/bonding/bond0 
Ethernet Channel Bonding Driver: v3.2.4 (January 28, 2008)

Bonding Mode: IEEE 802.3ad Dynamic link aggregation
Transmit Hash Policy: layer2 (0)
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 0
Down Delay (ms): 0

802.3ad info
LACP rate: slow
Active Aggregator Info:
	Aggregator ID: 1
	Number of ports: 1
	Actor Key: 0
	Partner Key: 1
	Partner Mac Address: 00:00:00:00:00:00

Slave Interface: eth0
MII Status: down
Link Failure Count: 0
Permanent HW addr: 00:15:17:4b:2e:1c
Aggregator ID: 1


/var/log/messages output when I load the bonding module

Apr 16 19:56:30 hostname kernel: Ethernet Channel Bonding Driver: v3.2.4 (January 28, 2008)
Apr 16 19:56:30 hostname kernel: bonding: Warning: either miimon or arp_interval and arp_ip_target module parameters must be specified, otherwise bonding will not detect link failures! see bonding.txt for details.
Apr 16 19:56:30 hostname kernel: bonding: bond0: setting mode to 802.3ad (4).
Apr 16 19:56:30 hostname kernel: bonding: bond0: Setting MII monitoring interval to 100.
Apr 16 19:56:30 hostname kernel: bonding: bond0: Adding slave eth0.
Apr 16 19:56:30 hostname kernel: bonding: bond0: enslaving eth0 as a backup interface with a down link.
Apr 16 19:56:38 hostname kernel: bonding: bond0: Warning: No 802.3ad response from the link partner for any adapters in the bond

-------------------------------------------------------------------------------------



Results for 2.6.18-128.1.1.el5PAE
----------------------------

[root@hostname ~]# modinfo e1000e | head -n 2
filename:       /lib/modules/2.6.18-128.1.1.el5PAE/kernel/drivers/net/e1000e/e1000e.ko
version:        0.3.3.3-k4


[root@hostname ~]# mii-tool eth0
SIOCGMIIREG on eth0 failed: Input/output error
eth0: 10 Mbit, half duplex, no link


[root@hostname ~]# ethtool eth0 | grep Link
	Link detected: yes


[root@hostname ~]# cat /proc/net/bonding/bond0 
Ethernet Channel Bonding Driver: v3.2.4 (January 28, 2008)

Bonding Mode: IEEE 802.3ad Dynamic link aggregation
Transmit Hash Policy: layer2 (0)
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 0
Down Delay (ms): 0

802.3ad info
LACP rate: slow
Active Aggregator Info:
	Aggregator ID: 1
	Number of ports: 1
	Actor Key: 0
	Partner Key: 1
	Partner Mac Address: 00:00:00:00:00:00

Slave Interface: eth0
MII Status: down
Link Failure Count: 0
Permanent HW addr: 00:15:17:4b:2e:1c
Aggregator ID: 1


/var/log/messages output when I load the bonding module

Apr 16 20:13:49 hostname kernel: Ethernet Channel Bonding Driver: v3.2.4 (January 28, 2008)
Apr 16 20:13:49 hostname kernel: bonding: Warning: either miimon or arp_interval and arp_ip_target module parameters must be specified, otherwise bonding will not detect link failures! see bonding.txt for details.
Apr 16 20:13:49 hostname kernel: bonding: bond0: setting mode to 802.3ad (4).
Apr 16 20:13:49 hostname kernel: bonding: bond0: Setting MII monitoring interval to 100.
Apr 16 20:13:49 hostname kernel: bonding: bond0: Adding slave eth0.
Apr 16 20:13:49 hostname kernel: eth0: MSI interrupt test failed, using legacy interrupt.
Apr 16 20:13:49 hostname kernel: bonding: bond0: enslaving eth0 as a backup interface with a down link.
Apr 16 20:13:57 hostname kernel: bonding: bond0: Warning: No 802.3ad response from the link partner for any adapters in the bond

-------------------------------------------------------------------------------------

--- Additional comment from jpirko on 2009-04-23 04:31:42 EDT ---

Hi Marco.

Can you please try this on the latest upstream kernel downloaded from http://www.kernel.org/? It would be helpful to see whether the issue appears there too.

Thanks.

--- Additional comment from marco on 2009-04-23 07:16:02 EDT ---

Hi Jiri,

I installed kernel 2.6.29 and the channel comes up.

The e1000e version is 0.3.3.3-k6 (So only k4 changed to k6)
The bonding version is 3.5.0

[root@hostname ~]# uname -r
2.6.29-ms1

[root@hostname ~]# modinfo bonding | head -n 4
filename:       /lib/modules/2.6.29-ms1/kernel/drivers/net/bonding/bonding.ko
author:         Thomas Davis, tadavis and many others
description:    Ethernet Channel Bonding Driver, v3.5.0
version:        3.5.0

[root@hostname ~]# modinfo e1000e | head -n 2
filename:       /lib/modules/2.6.29-ms1/kernel/drivers/net/e1000e/e1000e.ko
version:        0.3.3.3-k6

[root@hostname ~]# cat /proc/net/bonding/bond0 
Ethernet Channel Bonding Driver: v3.5.0 (November 4, 2008)

Bonding Mode: IEEE 802.3ad Dynamic link aggregation
Transmit Hash Policy: layer2 (0)
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 0
Down Delay (ms): 0

802.3ad info
LACP rate: slow
Aggregator selection policy (ad_select): stable
Active Aggregator Info:
	Aggregator ID: 3
	Number of ports: 2
	Actor Key: 17
	Partner Key: 8
	Partner Mac Address: 00:11:5d:15:95:80

Slave Interface: eth0
MII Status: up
Link Failure Count: 0
Permanent HW addr: 00:15:17:4b:2e:1c
Aggregator ID: 3

Slave Interface: eth1
MII Status: up
Link Failure Count: 0
Permanent HW addr: 00:15:17:4b:2e:1d
Aggregator ID: 3

---------------------------

With the official Red Hat kernel I also tried a newer e1000e driver from Intel, version 0.5.18.3.
It was also not working with this newer driver.

So maybe the problem is the bonding driver, at least in combination with some network cards?
Other cards are working fine.


Thanks
Marco

--- Additional comment from jpirko on 2009-04-23 07:30:11 EDT ---

Marco.

Thanks for the fast feedback. Can you please check whether this is bonding-mode dependent, i.e. whether this issue also appears, for example, in mode 1?

Thanks.

--- Additional comment from marco on 2009-04-23 09:26:37 EDT ---

Jiri,

I think it is mode-independent. I tested it again with mode 1.
If I set it to mode 1 (active-backup), the MII Status is also always "down" for the physical interfaces.

It looks to me like the link status of the NIC cannot be determined properly with the official kernel.

I know that "mii-tool" is not meant for the newer network cards, but it at least shows a different behavior between the official RHEL5 kernel and the latest kernel.
ethtool always shows the correct information.

With the official kernel, "mii-tool eth0" always shows "no link".
With the latest kernel, it shows "link ok" once the bond interface has been brought up.
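
For reference, the two commands being compared here, as used elsewhere in this report (eth0 is the interface in question):

[root@hostname ~]# mii-tool eth0              # reads legacy MII PHY registers via the SIOCGMIIREG ioctl
[root@hostname ~]# ethtool eth0 | grep Link   # asks the driver's ethtool interface for "Link detected"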

With the official kernel I also noticed the following messages in /var/log/messages when I bring up the bond interface.
Apr 23 14:29:14 hostname kernel: eth0: MSI interrupt test failed, using legacy interrupt.
Apr 23 14:29:14 hostname kernel: eth1: MSI interrupt test failed, using legacy interrupt.


Here are now some results.

----------- official kernel ------------------
This was with mode 1. The NICs definitely have a link.

[root@hostname ~]# uname -r
2.6.18-128.el5PAE

[root@hostname test]# cat /proc/net/bonding/bond0 
Ethernet Channel Bonding Driver: v3.2.4 (January 28, 2008)

Bonding Mode: fault-tolerance (active-backup)
Primary Slave: None
Currently Active Slave: None
MII Status: down
MII Polling Interval (ms): 100
Up Delay (ms): 0
Down Delay (ms): 0

Slave Interface: eth0
MII Status: down
Link Failure Count: 0
Permanent HW addr: 00:15:17:4b:2e:1c

Slave Interface: eth1
MII Status: down
Link Failure Count: 0
Permanent HW addr: 00:15:17:4b:2e:1d


[root@hostname test]# mii-tool eth0
SIOCGMIIREG on eth0 failed: Input/output error
eth0: 10 Mbit, half duplex, no link

[root@hostname test]# mii-tool eth1
SIOCGMIIREG on eth1 failed: Input/output error
eth1: 10 Mbit, half duplex, no link
-----------------------------------------------------

-------------- latest kernel ------------------
[root@hostname ~]# uname -r
2.6.29-ms1


[root@hostname ~]# ifup bond0

[root@hostname ~]# cat /proc/net/bonding/bond0 
Ethernet Channel Bonding Driver: v3.5.0 (November 4, 2008)

Bonding Mode: IEEE 802.3ad Dynamic link aggregation
Transmit Hash Policy: layer2 (0)
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 0
Down Delay (ms): 0

802.3ad info
LACP rate: slow
Aggregator selection policy (ad_select): stable
Active Aggregator Info:
	Aggregator ID: 1
	Number of ports: 2
	Actor Key: 17
	Partner Key: 8
	Partner Mac Address: 00:11:5d:15:95:80

Slave Interface: eth0
MII Status: up
Link Failure Count: 0
Permanent HW addr: 00:15:17:4b:2e:1c
Aggregator ID: 1

Slave Interface: eth1
MII Status: up
Link Failure Count: 0
Permanent HW addr: 00:15:17:4b:2e:1d
Aggregator ID: 1

[root@hostname ~]# mii-tool eth0
SIOCGMIIREG on eth0 failed: Input/output error
eth0: negotiated 100baseTx-FD, link ok

[root@hostname ~]# mii-tool eth1
SIOCGMIIREG on eth1 failed: Input/output error
eth1: negotiated 100baseTx-FD, link ok

[root@hostname ~]# ifdown bond0

[root@hostname ~]# cat /proc/net/bonding/bond0 
Ethernet Channel Bonding Driver: v3.5.0 (November 4, 2008)

Bonding Mode: IEEE 802.3ad Dynamic link aggregation
Transmit Hash Policy: layer2 (0)
MII Status: down
MII Polling Interval (ms): 100
Up Delay (ms): 0
Down Delay (ms): 0

802.3ad info
LACP rate: slow
Aggregator selection policy (ad_select): stable
bond bond0 has no active aggregator


[root@hostname ~]# mii-tool eth0
SIOCGMIIREG on eth0 failed: Input/output error
eth0: negotiated 100baseTx-FD, link ok

[root@hostname ~]# mii-tool eth1
SIOCGMIIREG on eth1 failed: Input/output error
eth1: negotiated 100baseTx-FD, link ok

-----------------------------------------------

I also tried the latest kernels that I found here: http://people.redhat.com/dzickus/el5/140.el5/i686/
But it's the same issue.

Do you have any other, newer Red Hat test kernels that I could try?


Marco

--- Additional comment from jpirko on 2009-04-23 12:28:32 EDT ---

Hi Marco.

Ok, this is actually what I thought. Can you please run mii-tool ethX when the NICs are not enslaved to the bonding interface, to be sure this issue has nothing to do with bonding?

Thanks a lot.

--- Additional comment from marco on 2009-04-23 12:44:36 EDT ---

With the official RHEL kernel it always shows this. You can bring the bond interface up/down as many times as you want; it's always "no link".

[root@hostname test]# mii-tool eth0
SIOCGMIIREG on eth0 failed: Input/output error
eth0: 10 Mbit, half duplex, no link


With the latest kernel it shows "no link" after startup of the server.
After bringing up bond0 it says "link ok".

It then stays "link ok" indefinitely; it doesn't matter how often I bring bond0 down/up.

[root@hostname ~]# mii-tool eth1
SIOCGMIIREG on eth1 failed: Input/output error
eth1: negotiated 100baseTx-FD, link ok


By the way, the line "SIOCGMIIREG on eth1 failed: Input/output error" does not occur with an older kernel, for example the -92 release.

--- Additional comment from jpirko on 2009-04-23 14:03:55 EDT ---

Marco.

Actually I wanted you to test the behaviour without bonding involved: only a plain eth device and mii-tool. I expect the same results; I just wanted to be sure, to rule out a bonding driver bug.

Thanks

--- Additional comment from marco on 2009-04-23 19:18:34 EDT ---

Sorry, then I misunderstood you, Jiri.

I reconfigured eth0 as a standalone card with an IP; the results are below.


[root@hostname ~]# ifconfig eth0
eth0      Link encap:Ethernet  HWaddr 00:15:17:4B:2E:1C  
          BROADCAST MULTICAST  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:0 (0.0 b)  TX bytes:0 (0.0 b)
          Memory:ea120000-ea140000 

[root@hostname ~]# mii-tool eth0
SIOCGMIIREG on eth0 failed: Input/output error
eth0: 10 Mbit, half duplex, no link

[root@hostname ~]# ifup eth0

[root@hostname ~]# ifconfig eth0
eth0      Link encap:Ethernet  HWaddr 00:15:17:4B:2E:1C  
          inet addr:10.10.10.10  Bcast:10.10.10.255  Mask:255.255.255.0
          UP BROADCAST MULTICAST  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:0 (0.0 b)  TX bytes:0 (0.0 b)
          Memory:ea120000-ea140000 

[root@hostname ~]# mii-tool eth0
SIOCGMIIREG on eth0 failed: Input/output error
eth0: 10 Mbit, half duplex, no link

[root@hostname ~]# ifdown eth0

[root@hostname ~]# mii-tool eth0
SIOCGMIIREG on eth0 failed: Input/output error
eth0: 10 Mbit, half duplex, no link

--- Additional comment from jpirko on 2009-04-24 00:40:58 EDT ---

Ok Marco, thanks for the confirmation. This is solely an e1000e issue. I'll look into this.

--- Additional comment from marco on 2009-04-24 04:44:20 EDT ---

Jiri,

do you think it's solely an e1000e issue? I also tried the e1000e driver directly from Intel, version 0.5.18.3.

This newer version has the same behavior; the channel does not come up at all.

Could it also be something else, since it's working in the latest vanilla kernel with the same e1000e version (0.3.3.3)?



Marco

--- Additional comment from marco on 2009-04-24 10:51:51 EDT ---

One more update.

Yesterday, when I configured a standalone device, I only looked at the link status with mii-tool or ethtool. I had not checked whether I have real network connectivity. I tested this today.

So with the Red Hat kernel, using either the e1000e module that comes with the kernel or the latest Intel version, I have no network connectivity.

The link is there, but I cannot ping my default gateway. The switch also does not see my MAC address.


Marco

--- Additional comment from marco on 2009-05-02 11:31:15 EDT ---

Jiri,

I found bug https://bugzilla.redhat.com/show_bug.cgi?id=477774.
I think that's the same problem I noticed on my machine. I also have an IBM x3850 (M1).

After I added pci=nomsi to my kernel line, the NICs with the e1000e driver were working fine. My bonding device also worked fine.
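
For reference, a minimal sketch of what such a kernel line looks like in /boot/grub/grub.conf on RHEL5 (the kernel version, root device, and paths below are illustrative, not taken from this report):

title Red Hat Enterprise Linux Server (2.6.18-128.1.1.el5PAE)
        root (hd0,0)
        kernel /vmlinuz-2.6.18-128.1.1.el5PAE ro root=LABEL=/ pci=nomsi
        initrd /initrd-2.6.18-128.1.1.el5PAE.img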

So I guess my bug report is a duplicate of 477774?


Marco

--- Additional comment from jpirko on 2009-05-03 03:01:40 EDT ---

Hi Marco.

It looks like a similar issue. I suggest we wait for the solution, and if it works for you, we will set this as a duplicate.

Thanks

--- Additional comment from marco on 2009-05-03 03:19:18 EDT ---

Jiri,

The box where I have this issue is still available for testing for about 2 more weeks.
After that it needs to go into production.

So if I can test something, let me know. I'm not sure how fast Red Hat will have a solution.


Marco

--- Additional comment from agospoda on 2009-05-08 16:15:22 EDT ---

It sounds like this issue might be quite similar to bug 477774, since you are using the same system.  I would encourage you to try the latest test kernels located here:

http://people.redhat.com/dzickus/el5/

It looks like the last kernel you tried was 2.6.18-140, but there was a change in 2.6.18-141 that may resolve this issue.  This may be a different issue, but certainly seems like it could be a duplicate of bug 492270, so some testing would be helpful.

--- Additional comment from marco on 2009-05-12 06:38:59 EDT ---

I installed kernel-2.6.18-144.el5.x86_64 and started the system without the pci=nomsi option.

The Intel NICs with the e1000e driver still do not work. Once I bring up eth0, for example, I still get the following message.

------
May 12 12:15:52 hostname: eth0: MSI interrupt test failed, using legacy interrupt.
------

It works fine with the kernel option pci=nomsi.

--- Additional comment from agospoda on 2009-05-12 11:20:15 EDT ---

Good to know that -144 doesn't work (sorry it doesn't!).

Though you get the message indicating that the system will switch to legacy interrupts for the 82571EB, did you check /proc/interrupts to make sure it actually did? It would be helpful if you could paste the contents of /proc/interrupts.
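
For reference, a minimal sketch of one way to check that (interface names as in this report; the exact /proc/interrupts layout varies by system):

[root@hostname ~]# grep -E 'eth0|eth1' /proc/interrupts
# A PCI-MSI entry means the device is using MSI; an IO-APIC entry means it
# fell back to a legacy (INTx) interrupt.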

Looking at the driver, there seems to be a case where the test will fail but the driver will still try to use MSI.  This patch seems like one way to resolve this (if my presumption is correct).

--- a/drivers/net/e1000e/netdev.c
+++ b/drivers/net/e1000e/netdev.c
@@ -1440,7 +1440,7 @@ void e1000e_set_interrupt_capability(struct e1000_adapter *adapter)
                }
                /* Fall through */
        case E1000E_INT_MODE_LEGACY:
-               /* Don't do anything; this is the system default */
+               adapter->flags &= ~FLAG_MSI_ENABLED & ~FLAG_HAS_MSIX;
                break;
        }

--- Additional comment from marco on 2009-05-12 11:52:38 EDT ---

I am attaching 3 files:

1 with pci=nomsi
2 without the kernel option: one taken before eth0 and eth1 are up, and one after eth0/eth1 are up.

--- Additional comment from marco on 2009-05-12 11:54:00 EDT ---



--- Additional comment from marco on 2009-05-12 11:55:16 EDT ---



--- Additional comment from marco on 2009-05-12 11:55:46 EDT ---



--- Additional comment from agospoda on 2009-05-12 13:34:21 EDT ---

I did some more snooping around and it looks like the driver code as it exists right now is fine.  MSI should be getting properly disabled in the driver.

IIRC there was a PCI quirk added to address this sort of thing, which probably helps to explain why the problem does not exist with 2.6.29 but does with RHEL5's native driver and with Intel's latest driver from SourceForge.  I'll see what I can dig up.

--- Additional comment from f_a_f12001 on 2010-06-14 03:30:01 EDT ---

Dear guys,
         I have the same error on a Fedora 9 HP workstation with the same NIC driver. This host has strange networking behavior: every month or so it stops sending or receiving on eth0, although the whole system otherwise works normally, and networking comes back to normal after I reboot the machine. "dmesg | grep eth0" gave me these messages:
SIOCGMIIREG on eth0 failed: Input/output error                        
0000:00:19.0: eth0: (PCI Express:2.5GB/s:Width x1) 00:0f:fe:4d:35:0e
0000:00:19.0: eth0: Intel(R) PRO/1000 Network Connection
0000:00:19.0: eth0: MAC: 5, PHY: 6, PBA No: 1002ff-0ff
ADDRCONF(NETDEV_UP): eth0: link is not ready
0000:00:19.0: eth0: Link is Up 100 Mbps Full Duplex, Flow Control: None
0000:00:19.0: eth0: 10/100 speed: disabling TSO
ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
eth0: no IPv6 routers present
0000:00:19.0: eth0: Link is Down
0000:00:19.0: eth0: Link is Up 100 Mbps Full Duplex, Flow Control: None
0000:00:19.0: eth0: 10/100 speed: disabling TSO
0000:00:19.0: eth0: Link is Down
0000:00:19.0: eth0: Link is Up 100 Mbps Full Duplex, Flow Control: None

--- Additional comment from agospoda on 2010-06-29 16:40:01 EDT ---

Marco, I think there is a good chance you are hitting a problem that Dean Nelson just fixed with the following patch:

http://patchwork.ozlabs.org/patch/56224/

He discovered that some systems that failed to initialize MSI would set registers incorrectly when trying to enable legacy interrupts.

I will ask Dean to take a look at this and hopefully he can share the bugzilla # for the RHEL5 bug that plans to include the patch linked above.

--- Additional comment from agospoda on 2010-06-29 17:04:39 EDT ---

f_a_f12001, that seems like an odd problem.  You should check the logs in your switch at the same time to make sure the switch sees the link going down too.  If it does, this doesn't appear to be the system losing contact with the NIC hardware, but a serious problem with the device or switch.

Unfortunately this bug is to address some RHEL5 issues, not Fedora 9 bugs.  I cannot really help you much as F9 is not maintained anymore.

If you still have this problem when running F13 (or a kernel for F13) you can open a new bug to address the problem.

--- Additional comment from dnelson on 2010-06-29 18:39:18 EDT ---

(In reply to comment #27)
> Marco, I think there is a good chance you are hitting a problem that Dean
> Nelson just fixed with the following patch:
> 
> http://patchwork.ozlabs.org/patch/56224/
> 
> He discovered that some systems that failed to initialize MSI would set
> registers incorrectly when trying to enable legacy interrupts.
> 
> I will ask Dean to take a look at this and hopefully he can share the bugzilla
> # for the RHEL5 bug that plans to include the patch linked above.    

Bug 477774 is the RHEL5 bug I'm working on. And from comment #12 I see Marco is already familiar with it.

And I'd agree that both BZs look to be dealing with the same problem. I'll put together a RHEL5.6 system with the patch that fixes the problem, and then, Marco, you can confirm whether they are.

Dean

--- Additional comment from dnelson on 2010-06-29 23:50:22 EDT ---

As promised in comment #29, I've updated my test kernel rpms to include a patch that in theory fixes the problem reported in this BZ. The patch and rpms can be found under the RHEL5 Test Packages at:

http://people.redhat.com/dnelson/#rhel5

Please test, and if you do, please report back whether the problem has been resolved or not.

Thanks,
Dean

--- Additional comment from marco on 2010-06-30 03:34:44 EDT ---

Since my servers are already in production I can't easily test it.
But I will talk with my application owners to see if we can schedule some downtime or a maintenance window.

If that is possible, I will try that kernel out and let you know whether it works.

Thanks
Marco

--- Additional comment from agospoda on 2010-06-30 09:30:02 EDT ---

Marco, based on feedback in bug 477774, I feel pretty confident that the patch from Dean will fix your problem.

--- Additional comment from dnelson on 2010-06-30 10:59:17 EDT ---



--- Additional comment from marco on 2010-06-30 11:33:04 EDT ---

Andy,

thanks, I saw that comment too.
I still have no final answer from my application owners, but most likely they won't want to test it on the production box.

Once the patch is in an updated RHEL5 kernel, I will schedule an update.


Thanks again Dean and Andy.

--- Additional comment from dnelson on 2010-06-30 13:38:14 EDT ---



--- Additional comment from jarod on 2010-07-12 11:43:21 EDT ---

in kernel-2.6.18-206.el5
You can download this test kernel from http://people.redhat.com/jwilson/el5

Detailed testing feedback is always welcomed.

Comment 2 RHEL Program Management 2010-10-05 01:51:02 UTC
This request was evaluated by Red Hat Product Management for inclusion
in a Red Hat Enterprise Linux maintenance release. Product Management has 
requested further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed 
products. This request is not yet committed for inclusion in an Update release.

Comment 3 Aristeu Rozanski 2010-11-17 19:44:13 UTC
Patch(es) available on kernel-2.6.32-83.el6

Comment 7 errata-xmlrpc 2011-05-19 12:38:01 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2011-0542.html

