Bug 186397
Summary: | Problem with the sky2.ko network driver (Marvell GigE card driver) | ||
---|---|---|---|
Product: | Red Hat Enterprise Linux 4 | Reporter: | Ariel Biener <ariel> |
Component: | kernel | Assignee: | John W. Linville <linville> |
Status: | CLOSED CANTFIX | QA Contact: | Brian Brock <bbrock> |
Severity: | medium | Docs Contact: | |
Priority: | medium | ||
Version: | 4.0 | CC: | adodson, frank.hoang, hoover, jbaron, k.georgiou, rhbugs |
Target Milestone: | --- | ||
Target Release: | --- | ||
Hardware: | i686 | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | | Doc Type: | Bug Fix |
Doc Text: | | Story Points: | --- |
Clone Of: | | Environment: | |
Last Closed: | 2006-10-12 15:08:48 UTC | Type: | --- |
Description
Ariel Biener
2006-03-23 11:40:35 UTC
Test kernels w/ a very recent version of sky2 are available here:

http://people.redhat.com/linville/kernels/rhel4/

Please give them a try and post the results here...thanks!

Hi John,

While I cannot run a test kernel on a production machine, I can enclose the answer from the `sky2' developer/maintainer (I contacted him directly as well). He suggested using the latest 1.1 sky2 version, and sent me both sky2.h and sky2.c. What version are you using in the 2.6.9-34.6smp kernel? See Stephen Hemminger's answer below.

--Ariel

    Date: Thu, 23 Mar 2006 09:24:04 -0800
    From: Stephen Hemminger <shemminger>
    To: Ariel Biener <ariel.ac.il>
    Subject: Re: Marvell Yukon 2 Gigabit Ethernet driver, version = 0.13 DD105319C547861EB68046F
    Message-ID: <20060323092404.47a65145>
    In-Reply-To: <200603231340.23302.ariel.ac.il>
    References: <200603231340.23302.ariel.ac.il>
    X-Mailer: Sylpheed-Claws 2.0.0 (GTK+ 2.8.6; i486-pc-linux-gnu)
    Mime-Version: 1.0

    I would recommend you take the latest 1.1 version and try it. There are lots of race issues resolved after I finally got the hardware documentation. Nothing is perfect, but this version is way more stable.

The stock RHEL4 kernel is currently at 0.13. The test kernels at the location in comment 1 are using 1.1. I would suggest you at least try the test kernels. If the driver you have now is locking up, how much worse could it be? :-)

Well, actually, I switched back to using the sk98lin driver from Intel, which works fine; however, a stock driver is of course a lot more preferable. Any idea when you expect the next kernel upgrade release?

Regardless, since my whole bug report was meant to help myself and others who may encounter this, I will install a test kernel for a week, and report back whether this fixes it or not.

--Ariel

Cool...let me know how it goes! BTW, next kernel update won't be until May/June IIRC...

I am also testing this, as I ran into this problem with the NICs on the PenguinComputing Xeon blades.
I'll let you know how things look once the machines have been run under load for a while.

I'm afraid there are still issues. I got the following on one of the nodes. I haven't put any load on them yet, so I may have more problems when they are loaded. Once it got into this state, it was off the network until I rebooted.

    Apr 27 09:29:35 compute-105-12 kernel: sky2 v1.1 addr 0xf6a00000 irq 169 Yukon-XL (0xb3) rev 1
    Apr 27 09:29:35 compute-105-12 kernel: divert: allocating divert_blk for eth0
    Apr 27 09:29:35 compute-105-12 kernel: sky2 eth0: addr 00:a0:d1:e4:66:0d
    Apr 27 09:29:35 compute-105-12 kernel: divert: allocating divert_blk for eth1
    Apr 27 09:29:35 compute-105-12 kernel: sky2 eth1: addr 00:a0:d1:e4:66:0e
    Apr 27 09:29:35 compute-105-12 kernel: sky2 eth0: enabling interface
    Apr 27 09:29:35 compute-105-12 kernel: sky2 eth0: Link is up at 1000 Mbps, full duplex, flow control none
    Apr 28 18:32:13 compute-105-12 kernel: sky2 eth0: tx timeout
    Apr 28 18:32:13 compute-105-12 kernel: sky2 eth0: transmit ring 489 .. 449 report=491 done=491
    Apr 28 18:32:13 compute-105-12 kernel: sky2 status report lost?
    Apr 28 18:32:23 compute-105-12 kernel: NETDEV WATCHDOG: eth0: transmit timed out
    Apr 28 18:32:23 compute-105-12 kernel: sky2 eth0: tx timeout
    Apr 28 18:32:23 compute-105-12 kernel: sky2 eth0: transmit ring 491 .. 451 report=491 done=491
    Apr 28 18:32:23 compute-105-12 kernel: sky2 hardware hung? flushing
    Apr 28 18:40:28 compute-105-12 kernel: NETDEV WATCHDOG: eth0: transmit timed out
    Apr 28 18:40:28 compute-105-12 kernel: sky2 eth0: tx timeout
    Apr 28 18:40:28 compute-105-12 kernel: sky2 eth0: transmit ring 451 .. 410 report=491 done=491
    Apr 28 18:40:28 compute-105-12 kernel: sky2 status report lost?
    Apr 28 18:41:08 compute-105-12 kernel: NETDEV WATCHDOG: eth0: transmit timed out
    Apr 28 18:41:08 compute-105-12 kernel: sky2 eth0: tx timeout

Should I file this upstream in kernel bugzilla also, or do you want to handle that?

There are some further upstream changes.
Let me get a test kernel together with them to see if it covers this problem. If not, we can go upstream with the problem.

OK - but...I either need the SRPM for it so I can rebuild, or else I need the patch from bug 173843 in it also. As you can see in the text, I took your SRPM and added that patch to it.

I can't promise to include that patch, but I do always publish SRPMs.

Are you going to be using sky2 1.3-rc1?

The build was already in progress when Stephen posted that. I have 1.2 in the test kernels here:

http://people.redhat.com/linville/kernels/rhel4/

Please give those a try and post the results here...thanks!

Test kernels w/ sky2 1.3 now available at the same location...

I still see the same problem with kernel 2.6.9-39.EL.jwltest.143smp, which includes version 1.3 of the driver.

Hi,

I also still see the problem, on RHEL4.3-WS. The interface gets stuck every few days, and a reboot is required since the module is stuck.

    Linux fireball.tau.ac.il 2.6.9-34.0.2.ELsmp #1 SMP Fri Jun 30 10:33:58 EDT 2006 i686 i686 i386 GNU/Linux

--Ariel

Test kernels w/ sky2 1.5 are available here:

http://people.redhat.com/linville/kernels/rhel4/

Please give them a try and post the results here...thanks!

I have the same problem with the 1.5 driver in kernel 2.6.9-42.EL.jwltest.156smp.

And, I experience the same problem with the non-SMP kernel in kernel-2.6.9-42.2.EL.jwltest.160.i686.rpm.

One of the patches that went into sky2 1.6 has this comment:

    [PATCH] sky2: status interrupt handling improvement

    More changes to prevent losing status and causing hangs. The hardware is smarter than I gave it credit for. Clearing the status IRQ causes the status state machine to toggle an IRQ if needed and post any more transmits.

Test kernels w/ this and other patches to bring sky2 up-to-date w/ 1.6 are available at the same location as in comment 17. I hate to keep spinning you off to random new versions, but given the 'causing hangs' fix comment above, would you mind giving this new kernel a try?
Thanks!

Closed due to lack of response. Please reopen when the requested information becomes available...thanks!

Sorry for reopening this, I'm having issues w/ the sky2 driver. I am seeing lots of sky2 timeouts under heavy traffic.

Using Intel® Server Board SE7520BB2

    # lspci | grep Marvell
    04:00.0 Ethernet controller: Marvell Technology Group Ltd. 88E8050 PCI-E ASF Gigabit Ethernet Controller (rev 18)

    # ethtool -i eth1
    driver: sky2
    version: 1.1

I was running the latest kernel, 2.6.9-42.0.10.ELsmp #1 SMP, when this was encountered. Traffic of over 10 Mbps would cause a timeout in about 15-30 minutes.

    kernel: NETDEV WATCHDOG: eth1: transmit timed out
    kernel: sky2 eth1: tx timeout
    kernel: sky2 status report lost?
    Server[4640]: Failed to open log file, log aborted.

The fix for it is to reboot the server or run

    # rmmod sky2 && modprobe sky2

to reload the module.

I tested the 2.6.9-55.EL.gtest.19smp kernel from http://people.redhat.com/agospoda/#rhel4 and the server seemed to be solid, but only for a few hours before getting slightly similar error messages again.

    kernel: NETDEV WATCHDOG: eth1: transmit timed out
    kernel: sky2 eth1: tx timeout
    kernel: sky2 hardware hung? flushing

The current version is 1.6:

    # ethtool -i eth1
    driver: sky2
    version: 1.6
    firmware-version: N/A
    bus-info: 0000:04:00.0
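For anyone comparing test kernels, it may help to count the timeouts per interface and pull out the ring state the driver prints with them. The following is only a rough sketch in Python, not part of any Red Hat or sky2 tooling: the function name `summarize_sky2_log` and the regular expressions are assumptions based solely on the message formats quoted in this report, and other driver versions may word these messages differently.

```python
import re
from collections import Counter

# Both patterns are assumptions derived from the log excerpts in this report.
TIMEOUT_RE = re.compile(r"sky2 (?P<iface>\w+): tx timeout")
RING_RE = re.compile(
    r"sky2 (?P<iface>\w+): transmit ring (?P<start>\d+) \.\. (?P<end>\d+) "
    r"report=(?P<report>\d+) done=(?P<done>\d+)"
)


def summarize_sky2_log(lines):
    """Return (timeouts per interface, list of transmit ring snapshots)."""
    timeouts = Counter()
    rings = []
    for line in lines:
        match = TIMEOUT_RE.search(line)
        if match:
            timeouts[match.group("iface")] += 1
            continue
        match = RING_RE.search(line)
        if match:
            rings.append({
                "iface": match.group("iface"),
                "start": int(match.group("start")),
                "end": int(match.group("end")),
                "report": int(match.group("report")),
                "done": int(match.group("done")),
            })
    return timeouts, rings


# Sample input copied from the log excerpt earlier in this report.
SAMPLE = [
    "Apr 28 18:32:13 compute-105-12 kernel: sky2 eth0: tx timeout",
    "Apr 28 18:32:13 compute-105-12 kernel: sky2 eth0: transmit ring 489 .. 449 report=491 done=491",
    "Apr 28 18:32:13 compute-105-12 kernel: sky2 status report lost?",
    "Apr 28 18:32:23 compute-105-12 kernel: NETDEV WATCHDOG: eth0: transmit timed out",
    "Apr 28 18:32:23 compute-105-12 kernel: sky2 eth0: tx timeout",
]

timeouts, rings = summarize_sky2_log(SAMPLE)
for iface, count in sorted(timeouts.items()):
    print(f"{iface}: {count} tx timeout(s)")
for ring in rings:
    print(ring)
```

To run it against a real log, pass an open file object (for example the system log, whose path varies by setup) instead of `SAMPLE`. Comparing the counts between driver versions gives a quick, if crude, measure of whether a test kernel improved anything.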