216799 – sky2 transmitter lockup (Marvell network interface freeze)

Summary: sky2 transmitter lockup (Marvell network interface freeze)

Keywords:
Status:	CLOSED INSUFFICIENT_DATA
Alias:	None
Product:	Red Hat Enterprise Linux 4
Classification:	Red Hat
Component:	kernel
Sub Component:
Version:	4.4
Hardware:	All
OS:	Linux
Priority:	medium
Severity:	urgent
Target Milestone:	---
Target Release:	---
Assignee:	Neil Horman
QA Contact:	Brian Brock
Docs Contact:
URL:
Whiteboard:
Duplicates (1):	216801
Depends On:
Blocks:	430698 461304

Reported:	2006-11-22 01:04 UTC by Lodewijk Smit
Modified:	2009-05-13 13:59 UTC
CC List:	10 users
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2009-05-13 13:59:49 UTC
Target Upstream Version:
Embargoed:

Attachments
upstream-fix (6.06 KB, patch) 2006-12-07 18:48 UTC, Andy Gospodarek
correct patch (2.86 KB, patch) 2006-12-07 18:57 UTC, Andy Gospodarek
new sky2 backport (164.53 KB, patch) 2008-04-16 18:23 UTC, Neil Horman

Description Lodewijk Smit 2006-11-22 01:04:32 UTC

Description of problem:

sky2 transmitter lockup
All networking stops every few days; reboot is necessary.

The problem and its solution are described in:
http://bugzilla.kernel.org/show_bug.cgi?id=6839

Version-Release number of selected component (if applicable):

We have problems with Marvell 88E8053 gigabit ethernet interface.
This interface is used on all recent ASUS motherboards (e.g. P5W DH Deluxe)
As a consequence, RHEL networking is unreliable on all these systems.

How reproducible:

Difficult, but solution is known :-)

Patch for solving this bug is available in kernel 2.6.18.1  
Can this patch be made available for RHEL 4.4 (backport to kernel 2.6.9)?

Comment 1 Rex Dieter 2006-11-28 12:57:11 UTC

Confirmed (seeing the same problem).

See also:
http://marc.theaimsgroup.com/?l=linux-netdev&m=116227589707824&w=2

Comment 2 Jason Baron 2006-11-30 18:19:45 UTC

*** Bug 216801 has been marked as a duplicate of this bug. ***

Comment 4 Andy Gospodarek 2006-12-07 18:48:52 UTC

Created attachment 143079 [details]
upstream-fix

Upstream commit that resolves this is: 470ea7eba4aaa517533f9b02ac9a104e77264548

Comment 5 Andy Gospodarek 2006-12-07 18:56:40 UTC

Comment on attachment 143079 [details]
upstream-fix

Wrong file....

Comment 6 Andy Gospodarek 2006-12-07 18:57:47 UTC

Created attachment 143081 [details]
correct patch

correct upstream fix

Comment 7 Lodewijk Smit 2006-12-07 22:14:48 UTC

The comment in the patch (id=143081) states:
"Only the Yukon-FE chip is Marvell 88E803X (10/100 only) are affected."

We have trouble with a different chipset: Marvell 88E8053 (gigabit)

So I am wondering whether the comment is wrong (and has to be updated), or 
whether this patch does not solve this particular problem.

Comment 8 Andy Gospodarek 2006-12-07 22:31:08 UTC

I wondered that too, but there doesn't seem to be anything specific to that
hardware in the patch, so I'm adding it to some new test kernels.  You should have
something to test on a box in a few hours.

Comment 9 Andy Gospodarek 2006-12-08 16:17:05 UTC

Test kernels with the attached patch are available here:

http://people.redhat.com/agospoda/#rhel4

Any feedback would be greatly appreciated.

Comment 10 Lodewijk Smit 2006-12-12 21:20:31 UTC

Do you have a RHEL4 smp i686 kernel for me?
I prepared a test environment, but there is no 32-bit test kernel (only 
64-bit).

Currently running: 2.6.9-42.0.3.ELsmp #1 SMP Fri Oct 6 06:21:39 CDT 2006 i686 
i686 i386 GNU/Linux

Comment 11 Andy Gospodarek 2006-12-12 21:29:35 UTC

Sorry about that.  My gtest.5 builds did not include i686 for some reason.  Here
is a link to an older kernel that should work for you.  Please let me know if
you need any other versions:

http://people.redhat.com/agospoda/bz/216799/

Comment 12 Andy Gospodarek 2006-12-15 19:11:18 UTC

Updated kernels (including the 32-bit builds!) are available here:

http://people.redhat.com/agospoda/#rhel4

Comment 13 Lodewijk Smit 2006-12-19 13:10:54 UTC

Installed the 2.6.9-42.29.EL.gtest.4smp kernel on our server about one week 
ago. So far, no problems have occurred.

Last night, I did a stress test between two servers. Only one of these servers 
has the Marvell network interface. I copied 1 terabyte of data from server 1 to 
server 2 while simultaneously copying 1 terabyte of data from server 2 to server 
1. So, heavy load in both directions for a long period. No problems occurred 
during this test. 

It is no proof but I am pretty confident that the patch solves the problem. 
Thanks for your effort to provide the patch.

Comment 15 RHEL Program Management 2007-01-09 22:25:24 UTC

This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.

Comment 16 Frank hoang 2007-05-08 16:49:00 UTC

Having lots of issues with the sky2 timeout with heavy traffic.
Using Intel® Server Board SE7520BB2

lspci | grep Marvell
04:00.0 Ethernet controller: Marvell Technology Group Ltd. 88E8050 PCI-E ASF
Gigabit Ethernet Controller (rev 18)

I was running the latest kernel, 2.6.9-42.ELsmp #1 SMP, when this was encountered.
Traffic of over 10Mbps would cause a timeout in about 15-30 minutes:
kernel: NETDEV WATCHDOG: eth1: transmit timed out
kernel: sky2 eth1: tx timeout 
kernel: sky2 status report lost? 
Server[4640]: Failed to open log file, log aborted.

I tested the 2.6.9-55.EL.gtest.19smp kernel and the server seemed to be solid, but 
only for 4-5 hours before I got similar error messages again:

kernel: NETDEV WATCHDOG: eth1: transmit timed out
kernel: sky2 eth1: tx timeout
kernel: sky2 hardware hung? flushing

Comment 17 Andy Gospodarek 2007-05-08 17:55:37 UTC

Did the hardware recover on its own, or did you need to reboot and/or
unload/reload the module to make the sky2 device operational again?

Comment 18 Frank hoang 2007-05-08 18:19:50 UTC

There are two ways to fix the problem:
1. Unload/reload the module:
#rmmod sky2 && modprobe sky2
This gets the network working right away.

2. Rebooting the server also fixes the issue.

The hardware was not able to recover on its own after being left on for 12hrs.

Frank
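
The manual workaround in comment 18 could be automated roughly as follows. This is only a sketch under stated assumptions: the names `TX_TIMEOUT`, `RELOAD_COMMANDS` and `timed_out_interfaces` are hypothetical, and the message pattern is taken from the log excerpts quoted in this bug rather than from any documented format. Reloading the module takes down every sky2 interface on the box, so this hides the lockup rather than fixing it.

```python
import re

# Matches the driver message quoted in this bug, for example
# "kernel: sky2 eth1: tx timeout". The pattern is an assumption based on
# those excerpts only.
TX_TIMEOUT = re.compile(r"sky2 (\w+): tx timeout")

# The manual workaround from comment 18, written as argument lists that a
# caller could pass to subprocess.run() as root.
RELOAD_COMMANDS = [["rmmod", "sky2"], ["modprobe", "sky2"]]


def timed_out_interfaces(log_lines):
    """Return the set of interface names that logged a sky2 tx timeout."""
    found = set()
    for line in log_lines:
        match = TX_TIMEOUT.search(line)
        if match:
            found.add(match.group(1))
    return found
```

A caller would feed it the new lines from /var/log/messages and run `RELOAD_COMMANDS` in order only when the returned set is non-empty.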

Comment 19 Andy Gospodarek 2007-05-24 19:09:06 UTC

Someone managed to backport an upstream sky2 driver to RHEL4.  You can find the
srpm here:

http://people.redhat.com/nhorman/rpms/kernel-2.6.9-55.3.EL.bz228733.src.rpm

Comment 22 Neil Horman 2007-06-14 17:48:22 UTC

If someone can confirm that the kernel Andy referenced in comment #19 fixes this
issue, I can propose it for 4.6.

Comment 23 Greg Bailey 2007-06-14 21:22:52 UTC

I built and installed kernel-smp-2.6.9-55.3.EL.bz228733.i686.rpm on 3 different
servers (identical hardware).

I get the following in /var/log/messages at boot time (on each server):

Jun 14 13:07:34 el4-node1 kernel: sky2: probe of 0000:02:00.0 failed with error
-125173760

Jun 14 13:09:34 el4-node2 kernel: sky2: probe of 0000:02:00.0 failed with error
-125173760

Jun 14 13:07:21 el4-node3 kernel: sky2: probe of 0000:02:00.0 failed with error
-125173760

And the network device doesn't exist--eth0 becomes the e1000 NIC that is normally
eth1.

Comment 24 Neil Horman 2007-06-15 13:12:40 UTC

my bad.  Looks like the probe routine changed function signatures upstream, and
as a result was returning void where an integer was expected.  This link:
http://people.redhat.com/nhorman/rpms/kernel-2.6.9-55.3.EL.bz228733.2.src.rpm
Is a new srpm that holds the fix for that.  Let me know how it goes.  Thanks!

Comment 25 Greg Bailey 2007-06-15 18:28:37 UTC

I built and installed kernel-smp-2.6.9-55.3.EL.bz228733.2.i686.rpm on the same 3
servers as comment #23.

I get a kernel panic when eth0 is initialized:

Unable to handle kernel NULL pointer dereference at virtual address 0000017c
 printing eip:
c0282286
*pde = 35939001
Oops: 0000 [#1]
SMP
Modules linked in: netconsole netdump parport_pc lp parport autofs4 i2c_dev
i2c_core sunrpc dm_mirror dm_mod button battery ac ftdi_sio usbserial uhci_hcd
ehci_hcd hw_random sky2 e1000 ext3 jbd ata_piix libata sd_mod scsi_mod
CPU:    2
EIP:    0060:[<c0282286>]    Not tainted VLI
EFLAGS: 00010292   (2.6.9-55.3.EL.bz228733.2smp)
EIP is at netif_receive_skb+0x19/0x2ec
eax: f4ee2e80   ebx: f5a58800   ecx: 00000608   edx: 00000000
esi: f4ee2e80   edi: 0000003c   ebp: f5a58a40   esp: c03d3f64
ds: 007b   es: 007b   ss: 0068
Process swapper (pid: 0, threadinfo=c03d3000 task=f7e40b30)
Stack: f4ee2e80 00000001 f4ee2e80 f5a58800 f4ee2e80 0000003c f5a58a40 f88abb28
       00000282 003c0300 f55a8010 00000003 00000000 00000040 f7f3dd80 00000000
       00000000 40000000 f5a58800 00000040 f7f3dd80 f88ac29e c03d3fd4 00000000
Call Trace:
 [<f88abb28>] sky2_status_intr+0x212/0x455 [sky2]
 [<f88ac29e>] sky2_poll+0x5c/0xbf [sky2]
 [<c0282704>] net_rx_action+0xae/0x160
 [<c0126a14>] __do_softirq+0x4c/0xb1
 [<c010819f>] do_softirq+0x4f/0x56
 =======================
 [<c0107ab4>] do_IRQ+0x1a2/0x1ae
 [<c02d6da8>] common_interrupt+0x18/0x20
 [<c01040e8>] mwait_idle+0x33/0x42
 [<c01040a0>] cpu_idle+0x26/0x3b
Code: 00 00 89 72 28 e8 6b 48 ea ff 53 9d eb 80 5b 5e 5f c3 55 57 56 53 83 ec 0c
89 44 24 08 c7 44 24 04 01 00 00 00 89 04 24 8b 50 18 <83> ba 7c 01 00 00 00 74
6f 31 c0 f6 42 58 20 74 14 0f b7 82 ae

Comment 26 Neil Horman 2007-06-18 15:40:58 UTC

Hmm, that's odd.  I've located a sky2 card down here, and I've been pummeling it
for about an hour now with icmp traffic in and out with no problems.  Looking at
the backtrace above, I put your oops in this section of code:
/* Update receiver after 16 frames */
if (++buf_write[le->link] == RX_BUF_WRITE) {
        sky2_put_idx(hw, rxqaddr[le->link],
                     sky2->rx_put);
        buf_write[le->link] = 0;
}
Looking at it, both buf_write and rxqaddr are statically defined and should
never be NULL, and the sky2, le and hw pointers all get dereferenced previously
in the function, indicating that if there were going to be a NULL pointer
exception, it should have happened earlier in the function.  About the only
cause for this oops that I could see would be if le->link were greater than 2
and we overran one of buf_write or rxqaddr (both of which are statically defined
arrays).  Since my card seems to be working with that kernel just fine, do you
think you can add some debug code to sky2_status_intr to see what exactly is
NULL when we oops?  Thanks!

Comment 27 Neil Horman 2007-08-29 14:46:28 UTC

Note there is a new kernel on my people page that fixes the above problem.  Please
test it out and report results.

Comment 29 Pete Philips 2007-10-31 15:46:54 UTC

I too am having this problem. I am running a Pentium 4 dual core SMP system with
kernel 2.6.9-55.0.9 SMP. My Marvell chipset is as follows:

01:00.0 Ethernet controller: Marvell Technology Group Ltd. 88E8053 PCI-E Gigabit
Ethernet Controller (rev 15)
02:00.0 Ethernet controller: Marvell Technology Group Ltd. 88E8053 PCI-E Gigabit
Ethernet Controller (rev 15)
03:00.0 Ethernet controller: Marvell Technology Group Ltd. 88E8053 PCI-E Gigabit
Ethernet Controller (rev 15)
04:00.0 Ethernet controller: Marvell Technology Group Ltd. 88E8053 PCI-E Gigabit
Ethernet Controller (rev 15)

I have tried both the patch suggested by Andy Gospodarek which I built myself
into a new module and the ready built kernel by Neil Horman at:

http://people.redhat.com/nhorman/rpms/kernel-smp-2.6.9-55.3.EL.bz228733.i686.rpm

I still get one interface or other locking up under heavy load after between 0.5
and 3 hours. Running Neil's kernel I get:

Oct 30 16:40:25 app201 kernel: NETDEV WATCHDOG: eth0: transmit timed out
Oct 30 16:40:25 app201 kernel: sky2 eth0: tx timeout
Oct 30 16:40:25 app201 kernel: sky2 hardware hung? flushing
Oct 30 16:41:58 app201 kernel: sky2 eth0: disabling interface

Running my own kernel with Andy's patch I get much the same:

Oct 31 14:48:37 app201 kernel: NETDEV WATCHDOG: eth0: transmit timed out
Oct 31 14:48:37 app201 kernel: sky2 eth0: tx timeout
Oct 31 14:48:37 app201 kernel: sky2 hardware hung? flushing
Oct 31 14:53:47 app201 kernel: NETDEV WATCHDOG: eth0: transmit timed out
Oct 31 14:53:47 app201 kernel: sky2 eth0: tx timeout
Oct 31 14:53:47 app201 kernel: sky2 status report lost?
Oct 31 14:54:17 app201 kernel: NETDEV WATCHDOG: eth0: transmit timed out
Oct 31 14:54:17 app201 kernel: sky2 eth0: tx timeout
Oct 31 14:54:17 app201 kernel: sky2 hardware hung? flushing
Oct 31 14:59:37 app201 kernel: NETDEV WATCHDOG: eth0: transmit timed out
Oct 31 14:59:37 app201 kernel: sky2 eth0: tx timeout
Oct 31 14:59:37 app201 kernel: sky2 status report lost?
Oct 31 14:59:53 app201 su(pam_unix)[2713]: session closed for user root
Oct 31 15:00:07 app201 kernel: NETDEV WATCHDOG: eth0: transmit timed out
Oct 31 15:00:07 app201 kernel: sky2 eth0: tx timeout
Oct 31 15:00:07 app201 kernel: sky2 hardware hung? flushing
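
To compare kernels it can help to measure how far apart the watchdog timeouts are. The sketch below is an illustration only: `seconds_between_timeouts` is a hypothetical helper, it assumes the classic syslog prefix "Mon DD HH:MM:SS" seen in the excerpt above, and because that prefix carries no year the `year` argument (2007 here, from the date of this comment) has to be supplied by the caller.

```python
from datetime import datetime


def seconds_between_timeouts(log_lines, year=2007):
    """Return the gaps, in seconds, between consecutive watchdog timeouts."""
    stamps = []
    for line in log_lines:
        if "transmit timed out" in line:
            # The first 15 characters hold the syslog timestamp; prepend the
            # year so the date is unambiguous.
            stamps.append(
                datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S"))
    return [int((later - earlier).total_seconds())
            for earlier, later in zip(stamps, stamps[1:])]
```

Applied to the Oct 31 excerpt above, the gaps come out as 310, 30, 320 and 30 seconds.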

The machine I am using has 8 NIC sockets. 4 are unused. The 4 that are used use
the sky2 driver. eth0 and eth1 are used to form a bridge, br0. The error is
triggered when two machines are sending and receiving large amounts of data
over the bridge.

If you need any further details of my setup, please just let me know.

Thanks,


Pete Philips.

Comment 30 Andy Gospodarek 2007-10-31 16:00:23 UTC

Thanks for the feedback, Pete.  It sounds like you are getting an error similar to what others are seeing upstream.  I'll talk with Neil and see if we can get something going soon to resolve this.

Comment 31 Pete Philips 2007-11-01 15:51:11 UTC

Out of interest, has anyone reported this problem with RHEL5?

Comment 32 Pete Philips 2007-11-05 09:42:21 UTC

I can confirm, after conducting a test over the weekend, that RHEL5 with the
kernel-2.6.18-8.el5 kernel is also affected by this bug. The only problem is
there was no log output to confirm this. I set up a machine with two Marvell
NICs in a bridge configuration. I then set up two other machines (client/server)
to continually pass large files between them, over the bridge. I left this going
over the weekend.

On Monday morning the bridge was no longer active and the client/server pair
could no longer communicate. This was reset using the standard rmmod sky2 ;
modprobe sky2 reset procedure. After that the bridge came back.

Since there was no log output it cannot be conclusively said to be the same
problem but it certainly looks like it.

Comment 33 Pete Philips 2007-11-20 09:44:35 UTC

Further experimentation reveals that this problem only manifests itself if the
server in question is running a bridge and the two ports associated with the
bridge use different speeds (10/100/Gbit) or different duplex settings. If both
interfaces are set identically then the problem does not appear.
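
A quick way to check whether a bridge is in the configuration described in comment 33 is to compare what `ethtool <iface>` reports for each port. The following is a minimal sketch, assuming the usual "Speed:" and "Duplex:" lines in that output; the function names are hypothetical and the sample strings in the usage note are illustrative, not captured from the affected machine.

```python
import re


def link_settings(ethtool_output):
    """Extract (speed, duplex) from the text printed by `ethtool <iface>`."""
    speed = re.search(r"^\s*Speed:\s*(\S+)", ethtool_output, re.MULTILINE)
    duplex = re.search(r"^\s*Duplex:\s*(\S+)", ethtool_output, re.MULTILINE)
    return (speed.group(1) if speed else None,
            duplex.group(1) if duplex else None)


def ports_mismatched(first_output, second_output):
    """True when two bridge ports report a different speed or duplex."""
    return link_settings(first_output) != link_settings(second_output)
```

For example, a port reporting "Speed: 1000Mb/s" and "Duplex: Full" compared with one reporting "Speed: 100Mb/s" and "Duplex: Full" would be flagged as mismatched. The two texts could be collected with `subprocess.run(["ethtool", "eth0"], capture_output=True, text=True).stdout`.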

Comment 34 Pete Philips 2007-11-20 12:46:45 UTC

I can confirm that this bug is also present in RHEL 5.1 (kernel-2.6.18-53.el5).
After only 1/2 hour of putting data through my bridge I get this message:

Nov 20 12:27:40 secerno kernel: sky2 eth0: tx timeout
Nov 20 12:27:40 secerno kernel: sky2 eth0: disabling interface
Nov 20 12:27:40 secerno kernel: sky2 eth0: enabling interface
Nov 20 12:27:40 secerno kernel: sky2 eth0: ram buffer 48K
Nov 20 12:27:43 secerno kernel: sky2 eth0: Link is up at 1000 Mbps, full duplex,
flow control rx

Although eth0 is disabled / enabled as stated in the log, the bridge is no
longer functional after this event.


Pete.

Comment 35 Andy Gospodarek 2007-11-20 18:07:34 UTC

Interesting find that this problem manifests itself when the links are set to different speed/duplex.  Have you happened to notice if the sky2-based hardware is the slower or faster link, or does it even matter?

Comment 36 Pete Philips 2007-11-21 09:27:12 UTC

The machine I am using has four Marvell 88E8053 NICs so both sides of the bridge
are exactly the same NIC.

Comment 37 Andy Gospodarek 2008-02-26 20:11:51 UTC

Pete, is this hardware still problematic for you?

So I'm back to looking at this and as it stands right now there appears to be an
issue with sky2 that becomes apparent when you have a sky2 interface as part of
a bridging interface.  The reason that is significant is that the bridging
devices can often push more traffic through the box than when using the device
as an endpoint.  I realize that this is a painful issue, so I would like to see
if we can get it resolved.

Now that we have a decent idea of why this might happen I'm going to see if I
can get my hands on some sky2 hardware (I think we have some in the office) and
look at reproducing the issue so I can understand better why it's happening.  I
have a feeling this might still be a problem upstream, but once we get a good
handle on how to reproduce it, then we can start to figure out if it's still
upstream or not.

Comment 38 Pete Philips 2008-03-03 09:41:18 UTC

Hi Andy,

Yes I can confirm that this is still a problem. I can also confirm that this
problem is also present in the latest RHEL5.1 kernel. As you suggest, the
crucial element in reproducing the behaviour seems to be the use of a bridge.

Pete.

Comment 41 Neil Horman 2008-04-16 18:23:39 UTC

Created attachment 302651 [details]
new sky2 backport

I've not tested it yet (no sky2 hardware in hand at the moment), but I've done
this backport of the latest sky2 driver that a co-worker has been using on a
2.6.25 kernel, and he has been unable to reproduce any lockups or crashes with
it.  If you could give it a spin, I'd appreciate it.  Thanks!

Comment 42 Andy Gospodarek 2008-05-17 03:03:16 UTC

My test kernels have been updated to include a patch for this bugzilla.

http://people.redhat.com/agospoda/#rhel4

Please test them and report back your results.

Comment 43 Tom G. Christensen 2008-05-18 10:00:56 UTC

With the backported sky2 driver in the test kernel (smp-2.6.9-70.EL.gtest.44) I
get no network connectivity.
The driver loads and finds the nic but there's just no connection.
This is on a Gigabyte GA-965P-DS3 with onboard Marvell 88E8053 chip.

With the stock RHEL4 kernel (smp-2.6.9-67.0.15.EL) I see tx timeouts that the
driver cannot recover from; reloading the driver fixes it.
In fact I'm getting tx timeouts right now and it's entirely reproducible. Just
start a couple of torrents (in this case Fedora 9 DVD images) and wait. Within
30 minutes tx timeouts will kill the network. This is with very moderate load,
just around 600kb/s down and 40kb/s up.

sky2 eth0: tx timeout
sky2 eth0: transmit ring 269 .. 228 report=269 done=269
sky2 hardware hung? flushing
sky2 eth0: tx timeout
sky2 eth0: transmit ring 268 .. 227 report=269 done=269
sky2 eth0: status report lost?
... repeat last three lines ad nauseam
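
The "transmit ring" lines above seem to determine which message follows them. The sketch below encodes one reading of that relationship; it is an assumption, not taken from this bug. It treats the first number as the driver's consumer index and compares it with the report and done indices. The two samples in the log are consistent with it, but the branch order and the "status burst pending" label should be checked against the driver source actually in use. `classify_tx_timeout` is a hypothetical name.

```python
import re

RING = re.compile(
    r"transmit ring (\d+) \.\. (\d+) report=(\d+) done=(\d+)")


def classify_tx_timeout(line):
    """Guess which follow-up message a 'transmit ring' line leads to."""
    match = RING.search(line)
    if match is None:
        return None
    # Assumption: the first number is the consumer index and the second the
    # producer index; only the consumer index is used in the comparison.
    consumer, _producer, report, done = map(int, match.groups())
    if report != done:
        return "status burst pending"
    if report != consumer:
        return "status report lost"
    return "hardware hung"
```

With the first ring line above (269 .. 228, report=269, done=269) this returns "hardware hung", and with the second (268 .. 227) it returns "status report lost", matching the messages that follow them in the log.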

Comment 45 Pete Philips 2008-08-26 11:42:36 UTC

Andy / Neil,

Sorry but I am unable to perform further testing of this issue with RHEL4 as my organisation have since moved over to RHEL5. I can however confirm that it remains a problem in RHEL5.

Do I need to report it separately as a RHEL5 bug or does this ticket cover it?

Thanks,


Pete.
pete.philips

Comment 46 Andy Gospodarek 2008-08-26 11:55:42 UTC

No problem, Pete.  I actually read yesterday that a firmware update will likely fix the tx timeout problems that have been plaguing sky2 for a while.  Here is a copy of the email that was sent out to address it:

"Subject: [sky2, solved] transmit timeouts and firmware update...

I (and a lot of other users) have been experiencing the frequent sky2
transmit timeout problem [1] (on 88E8053/Yukon2 EC gig hardware); this
is a result of the embedded NIC controller locking up, and I've found
that updating the firmware addresses this issue. I'm still seeing a
previous and different issue [2] from time to time though (silicon
bug?).

Marvell shipping broken firmware is completely unpublicised or
acknowledged, however updated firmware is available through your
motherboard vendor, so all hope it not lost after all...

My 8053/EC is using firmware 2.2 (previously 1.9) - you can check in
DOS with 'yukondg.exe' from
http://www.marvell.com/drivers/files/yukondg_v6.53.4.3.zip .

Thanks,
  Daniel

--- [1]

NETDEV WATCHDOG: eth0 (sky2): transmit timed out
sky2 eth0: tx timeout
sky2 eth0: transmit ring 20 .. 491 report=20 done=20
...

--- [2]

sky2 eth0: hung mac 1:119 fifo 7 (90:163)
sky2 eth0: receiver hang detected
sky2 eth0: disabling interface
sky2 eth0: enabling interface
--"

Comment 47 Pete Philips 2008-08-27 10:26:48 UTC

Andy,

Thanks for the information. The ZIP file only has a DOS utility which is always a little tricky ;-)

Do you know how to determine the firmware revision from Linux? If I use lspci I get:

01:00.0 Ethernet controller: Marvell Technology Group Ltd. 88E8053 PCI-E Gigabit Ethernet Controller (rev 15)
02:00.0 Ethernet controller: Marvell Technology Group Ltd. 88E8053 PCI-E Gigabit Ethernet Controller (rev 15)
03:00.0 Ethernet controller: Marvell Technology Group Ltd. 88E8053 PCI-E Gigabit Ethernet Controller (rev 15)
04:00.0 Ethernet controller: Marvell Technology Group Ltd. 88E8053 PCI-E Gigabit Ethernet Controller (rev 15)

I wonder if "rev 15" corresponds to firmware 1.5?

Alternatively if I use "ethtool -i eth4" I get

driver: sky2
version: 1.14
firmware-version: N/A
bus-info: 0000:01:00.0

The "N/A" doesn't sound promising.
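
For anyone collecting this across several machines, the `ethtool -i` output shown above can be parsed mechanically. This is a small sketch with hypothetical helper names; it only reads the "key: value" lines as printed and treats "N/A" as "no version reported", which is what the driver returns in this case.

```python
def parse_ethtool_info(text):
    """Turn the 'key: value' lines printed by `ethtool -i` into a dict."""
    info = {}
    for line in text.splitlines():
        # Split on the first colon only, so values such as the PCI address
        # "0000:01:00.0" stay intact.
        key, sep, value = line.partition(":")
        if sep:
            info[key.strip()] = value.strip()
    return info


def firmware_version(text):
    """Return the reported firmware version, or None if none is reported."""
    version = parse_ethtool_info(text).get("firmware-version", "")
    return None if version in ("", "N/A") else version
```

On the output above, `firmware_version` returns None, so the DOS utility mentioned in comment 46 remains the only route named in this bug for reading the firmware revision.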

Regards,


Pete.

Comment 48 Tom G. Christensen 2008-08-27 10:47:55 UTC

Just a note of warning.
In the original thread on linux-net which #46 quoted from there are links to firmware downloads from Gigabyte and Jetway.
That firmware is for the *88E8056 only* and it *will* trash your 88E8053 nic without so much as a warning.
I did just that on my GA-965P-DS3 (rev 1.0) yesterday and am currently waiting on a reply from Gigabyte techsupport.
It's no big loss really since it was basically useless anyway but still unfortunate.

Comment 49 Pete Philips 2008-08-27 10:58:13 UTC

Thanks for the warning, Tom!

Comment 50 Andy Gospodarek 2008-08-27 13:54:04 UTC

Pete, the lspci information is not going to contain the firmware version -- that should specifically be hardware related.  Unfortunately ethtool doesn't seem to show the correct version for your sky2 either.

Hopefully anyone with this problem will be able to get firmware/BIOS updates that will help out.  Both Neil and I tried quite a bit to reproduce this problem with our pci-e sky2 cards and I'm starting to understand why we could not when it seemed pretty easy for most who had on-board sky2 cards.

If you do have success with a firmware/BIOS update for your on-board sky2 cards, please post the model of your motherboard and firmware version that fixed it if you don't mind.  It would be great for us to collect a list so we can help others that have problems.

Thanks!

Comment 51 RHEL Program Management 2008-09-03 13:11:45 UTC

Updating PM score.

Comment 54 RHEL Program Management 2009-03-12 18:53:33 UTC

Since RHEL 4.8 External Beta has begun, and this bugzilla remains 
unresolved, it has been rejected as it is not proposed as exception or 
blocker.

Comment 55 Neil Horman 2009-03-23 10:33:33 UTC

any update to this, Lodewijk?

Comment 56 Lodewijk Smit 2009-03-23 11:39:34 UTC

(In reply to comment #55)
> any update to this, Lodewijk?  

This problem caused serious unpredictable unstable behaviour in production servers (stopping network traffic after a few days) in November 2006. I bought new Intel network cards almost immediately, because I really needed stable servers on a short term. Looking back, that was not a bad choice, as this issue still exists in March 2009. So, no comments from my side except that I am disappointed in Marvell. People should avoid these Marvell network interfaces as they are not well supported for Linux.

Comment 57 Neil Horman 2009-03-23 14:28:01 UTC

I assume by that you mean that using Andy's test kernels, the problem still exists, correct?

Comment 58 Neil Horman 2009-04-17 10:54:17 UTC

ping, any update here?

Comment 59 Neil Horman 2009-05-13 13:59:49 UTC

closing due to inactivity.  No update in 2 months.
