Bug 228733 - sky2 module (ver 1.6) kernel panic
Status: CLOSED INSUFFICIENT_DATA
Product: Red Hat Enterprise Linux 4
Classification: Red Hat
Component: kernel
Version: 4.4
Hardware: i686  OS: Linux
Priority: medium  Severity: high
Assigned To: Neil Horman
QA Contact: Brian Brock
URL: http://bugzilla.kernel.org/show_bug.c...
Blocks: 430698 461304
Reported: 2007-02-14 13:22 EST by Greg Bailey
Modified: 2009-03-23 09:37 EDT
CC List: 7 users

Doc Type: Bug Fix
Last Closed: 2009-03-23 07:02:19 EDT


Attachments
Console output of panic on pv-hab-number120 (1.03 KB, text/plain)
2007-02-14 13:22 EST, Greg Bailey
Console output of panic on pv-hab-number100 (1.18 KB, text/plain)
2007-02-14 13:24 EST, Greg Bailey
Console output of panic on it7000cp (1.17 KB, text/plain)
2007-03-01 12:46 EST, Greg Bailey
sky2 1.13 build failure output (3.86 KB, text/plain)
2007-03-06 18:36 EST, Greg Bailey
test patch in kernel kernel-smp-2.6.9-49.EL.bz228733.i686.rpm (3.28 KB, patch)
2007-03-08 20:03 EST, Neil Horman
Output of dmesg with tx timeout messages (118.24 KB, text/plain)
2007-03-26 19:36 EDT, Greg Bailey
Kernel Panic from 2.6.9-55.3.EL.bz228733.2smp (1.75 KB, text/plain)
2007-08-07 13:17 EDT, Greg Bailey
patch to debug sky2 oops (863 bytes, patch)
2007-08-07 14:39 EDT, Neil Horman
patch to fix null dev pointer (1.65 KB, patch)
2007-08-07 16:03 EDT, Neil Horman
sky2.c version related to comment 47 (98.89 KB, text/plain)
2007-08-08 14:02 EDT, Greg Bailey
correct patch for sky2 from cvs (100.77 KB, patch)
2007-08-14 08:47 EDT, Neil Horman
new sky2 backport (164.53 KB, patch)
2008-04-16 14:24 EDT, Neil Horman

Description Greg Bailey 2007-02-14 13:22:54 EST
Description of problem:

We are running a "jbaron test kernel" 2.6.9-42.28.ELsmp (i686) because the
latest errata kernel (2.6.9-42.0.8.ELsmp) has an old sky2 driver which panics
the server when networking is restarted.  (Ref: support ticket #1095940).

We've had 2 occurrences of kernel panics related to the "sky2" module.  In both
instances, the kernel was 2.6.9-42.28.ELsmp (sky 1.6).

The hardware is:
02:00.0 Ethernet controller: Marvell Technology Group Ltd. 88E8050 PCI-E ASF
Gigabit Ethernet Controller (rev 17)

In both cases, the console output showed:

 [<f88d8b63>] sky2_status_intr+0x2d5/0x39f [sky2]
 [<f88d90a2>] sky2_poll+0xd8/0x13a [sky2]
 [<c028168b>] net_rx_action+0xae/0x160
 [<c0126a08>] __do_softirq+0x4c/0xb1
 [<c010819f>] do_softirq+0x4f/0x56

I'm not sure how to configure netdump as it would rely on the network interface
being available.  In /var/log/messages, I see the following line before the panic:

Feb 13 18:58:26 pv-hab-number100 kernel: sky2 eth0: rx error, status 0x7ffc0001
length 112

Version-Release number of selected component (if applicable):

kernel 2.6.9-42.28.ELsmp (i686)

How reproducible:

Not very.  Appears to happen at random.

Steps to Reproduce:
1.  Not known
2.
3.
  
Actual results:
Panic

Expected results:
No panic

Additional info:
Comment 1 Greg Bailey 2007-02-14 13:22:54 EST
Created attachment 148077
Console output of panic on pv-hab-number120
Comment 2 Greg Bailey 2007-02-14 13:24:06 EST
Created attachment 148078
Console output of panic on pv-hab-number100
Comment 3 Greg Bailey 2007-03-01 12:44:56 EST
This just happened again on a server called "it7000cp".  I will attach the
console output as a separate attachment.  /var/log/messages had the following
right before the panic:

Mar  1 09:06:25 it7000cp kernel: NETDEV WATCHDOG: eth0: transmit timed out
Mar  1 09:06:25 it7000cp kernel: sky2 eth0: tx timeout
Mar  1 09:06:25 it7000cp kernel: sky2 hardware hung? flushing
Mar  1 09:11:05 it7000cp kernel: NETDEV WATCHDOG: eth0: transmit timed out
Mar  1 09:11:05 it7000cp kernel: sky2 eth0: tx timeout
Mar  1 09:11:05 it7000cp kernel: sky2 status report lost?
Mar  1 09:11:45 it7000cp kernel: NETDEV WATCHDOG: eth0: transmit timed out
Mar  1 09:11:45 it7000cp kernel: sky2 eth0: tx timeout
Mar  1 09:11:45 it7000cp kernel: sky2 hardware hung? flushing
Mar  1 09:16:01 it7000cp su(pam_unix)[26196]: session opened for user ssadmin by
(uid=0)
Mar  1 09:16:02 it7000cp su(pam_unix)[26196]: session closed for user ssadmin
Mar  1 09:20:25 it7000cp kernel: NETDEV WATCHDOG: eth0: transmit timed out
Mar  1 09:20:25 it7000cp kernel: sky2 eth0: tx timeout
Mar  1 09:20:25 it7000cp kernel: sky2 status report lost?
Mar  1 09:21:20 it7000cp kernel: NETDEV WATCHDOG: eth0: transmit timed out
Mar  1 09:21:20 it7000cp kernel: sky2 eth0: tx timeout
Mar  1 09:21:20 it7000cp kernel: sky2 hardware hung? flushing
Mar  1 09:31:01 it7000cp su(pam_unix)[28173]: session opened for user ssadmin by
(uid=0)
Mar  1 09:31:02 it7000cp su(pam_unix)[28173]: session closed for user ssadmin
Mar  1 09:46:01 it7000cp su(pam_unix)[29903]: session opened for user ssadmin by
(uid=0)
Mar  1 09:46:02 it7000cp su(pam_unix)[29903]: session closed for user ssadmin
Mar  1 09:50:50 it7000cp kernel: sky2 eth0: rx error, status 0x7ffc0001 length 96
Mar  1 09:50:50 it7000cp kernel: KERNEL: assertion
(!atomic_read(&sk->sk_wmem_alloc)) failed at net/unix/af_unix.c (333)
Mar  1 09:50:50 it7000cp kernel: KERNEL: assertion (sk_unhashed(sk)) failed at
net/unix/af_unix.c (334)
Mar  1 09:50:50 it7000cp kernel: KERNEL: assertion (!sk->sk_socket) failed at
net/unix/af_unix.c (335)
Mar  1 09:50:50 it7000cp kernel: Attempt to release alive unix socket: d0fd0100
Mar  1 09:50:51 it7000cp kernel: KERNEL: assertion
(!atomic_read(&sk->sk_wmem_alloc)) failed at net/unix/af_unix.c (333)
Mar  1 09:50:51 it7000cp kernel: KERNEL: assertion (sk_unhashed(sk)) failed at
net/unix/af_unix.c (334)
Mar  1 09:50:51 it7000cp kernel: KERNEL: assertion (!sk->sk_socket) failed at
net/unix/af_unix.c (335)
Mar  1 09:50:51 it7000cp kernel: Attempt to release alive unix socket: d0fd0100
Mar  1 09:50:51 it7000cp kernel: KERNEL: assertion
(!atomic_read(&sk->sk_wmem_alloc)) failed at net/unix/af_unix.c (333)
Mar  1 09:50:51 it7000cp kernel: KERNEL: assertion (sk_unhashed(sk)) failed at
net/unix/af_unix.c (334)
Mar  1 09:50:51 it7000cp kernel: KERNEL: assertion (!sk->sk_socket) failed at
net/unix/af_unix.c (335)
Mar  1 09:50:51 it7000cp kernel: Attempt to release alive unix socket: d0fd0100
Mar  1 09:50:51 it7000cp kernel: KERNEL: assertion
(!atomic_read(&sk->sk_wmem_alloc)) failed at net/unix/af_unix.c (333)
Mar  1 09:50:51 it7000cp kernel: KERNEL: assertion (sk_unhashed(sk)) failed at
net/unix/af_unix.c (334)
Mar  1 09:50:51 it7000cp kernel: KERNEL: assertion (!sk->sk_socket) failed at
net/unix/af_unix.c (335)
Mar  1 09:50:51 it7000cp kernel: Attempt to release alive unix socket: d0fd0100
Mar  1 09:50:52 it7000cp kernel: KERNEL: assertion
(!atomic_read(&sk->sk_wmem_alloc)) failed at net/unix/af_unix.c (333)
Mar  1 09:50:52 it7000cp kernel: KERNEL: assertion (sk_unhashed(sk)) failed at
net/unix/af_unix.c (334)
Mar  1 09:50:52 it7000cp kernel: KERNEL: assertion (!sk->sk_socket) failed at
net/unix/af_unix.c (335)
Mar  1 09:50:52 it7000cp kernel: Attempt to release alive unix socket: d0fd0100
Mar  1 09:50:52 it7000cp kernel: KERNEL: assertion
(!atomic_read(&sk->sk_wmem_alloc)) failed at net/unix/af_unix.c (333)
Mar  1 09:50:52 it7000cp kernel: KERNEL: assertion (sk_unhashed(sk)) failed at
net/unix/af_unix.c (334)
Mar  1 09:50:52 it7000cp kernel: KERNEL: assertion (!sk->sk_socket) failed at
net/unix/af_unix.c (335)
Mar  1 09:50:52 it7000cp kernel: Attempt to release alive unix socket: d0fd0100
Mar  1 09:50:52 it7000cp kernel: KERNEL: assertion
(!atomic_read(&sk->sk_wmem_alloc)) failed at net/unix/af_unix.c (333)
Mar  1 09:50:52 it7000cp kernel: KERNEL: assertion (sk_unhashed(sk)) failed at
net/unix/af_unix.c (334)
Mar  1 09:50:52 it7000cp kernel: KERNEL: assertion (!sk->sk_socket) failed at
net/unix/af_unix.c (335)
Mar  1 09:50:52 it7000cp kernel: Attempt to release alive unix socket: d0fd0100
Mar  1 09:50:54 it7000cp kernel: Warning: kfree_skb passed an skb still on a
list (from c027bf26).
Mar  1 09:50:54 it7000cp kernel: ------------[ cut here ]------------
Mar  1 09:50:54 it7000cp kernel: kernel BUG at net/core/skbuff.c:293!
Mar  1 09:50:54 it7000cp kernel: invalid operand: 0000 [#1]
Comment 4 Greg Bailey 2007-03-01 12:46:09 EST
Created attachment 149028
Console output of panic on it7000cp
Comment 5 Neil Horman 2007-03-01 12:57:37 EST
can you please try booting your systems with pci=nomsi on the kernel commandline
and see if the problem still occurs?
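A kernel parameter only helps if it actually reaches the running kernel, so on a large fleet it is worth confirming that pci=nomsi shows up in /proc/cmdline after reboot. The sketch below is a minimal, hypothetical helper (the function names are illustrative, not part of any tool mentioned in this bug) that looks for the parameter as a whole space-separated word:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Return true if `token` appears as a whole, space-separated word in
 * `cmdline`.  A bare strstr() would also match e.g. "pci=nomsix", so
 * both ends of each match are checked. */
bool cmdline_has_token(const char *cmdline, const char *token)
{
    size_t len = strlen(token);
    const char *p = cmdline;

    assert(len > 0);
    while ((p = strstr(p, token)) != NULL) {
        bool starts_word = (p == cmdline) || (p[-1] == ' ');
        char after = p[len];
        bool ends_word = (after == '\0') || (after == ' ') || (after == '\n');
        if (starts_word && ends_word)
            return true;
        p += 1;                /* keep scanning past this partial match */
    }
    return false;
}

/* Check the running kernel's command line, which Linux exposes as a
 * single line in /proc/cmdline. */
bool running_with_token(const char *token)
{
    char buf[4096];
    FILE *f = fopen("/proc/cmdline", "r");
    bool found = false;

    if (f == NULL)
        return false;
    if (fgets(buf, sizeof(buf), f) != NULL)
        found = cmdline_has_token(buf, token);
    fclose(f);
    return found;
}
```

Calling running_with_token("pci=nomsi") returns true only when that exact word is present; it would not notice the option if it were combined with other PCI options in a single comma-separated pci= argument.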
Comment 6 Chuck Ebbert 2007-03-06 10:53:05 EST
A bunch of sky2 fixes just went into 2.6.19.6
Comment 7 Greg Bailey 2007-03-06 11:38:06 EST
I tried adding this parameter on one of the servers, and things appear OK with
the Marvell NIC.

The problem is, I have >100 servers with this kernel build, and so far these
panics have only happened on 1 server at a time, and never on the same server
twice (so far).  They're also (fortunately or unfortunately) pretty rare, and I
don't have a way to trigger the panic except to wait for the phone call... :-(
Comment 8 Neil Horman 2007-03-06 12:48:35 EST
Ok, so we're not going to get valid results out of pci=nomsi in your
environment, then.  Can you roll out a test kernel to your systems?  I think
this commit:

819067916d785cac0369b8d6e187b4a83fd17785

from Linus' tree likely addresses the problem you're seeing.  I'll build a RHEL4
kernel for you to test with.
Comment 9 Neil Horman 2007-03-06 14:34:23 EST
fwiw, I'm not sure it's worth taking the spot patch for this.  I'm currently
working on just taking that patch, but if you have the time, it would probably
be worth your while to compile the latest RHEL4 kernel, and just substitute the
latest upstream sky2 driver for testing purposes.  Is it possible for you to
give that a try, or do you need to wait for me to get this backported?
Comment 10 Greg Bailey 2007-03-06 15:14:54 EST
Which latest RHEL4 kernel should I use?  The 2.6.9-42.0.10.EL one for U4 or the
2.6.9-48.EL (or jbaron's 2.6.9-49.EL) for U5?  Which sky2 version should I
attempt to merge--the 1.13 in Linus' tree?  Any special considerations
shoehorning the latest 2.6.20 sky2 driver into 2.6.9?  I'll have a go at it
based on your version information...
Comment 11 Neil Horman 2007-03-06 15:40:22 EST
I would suggest just using the latest RHEL4 kernel available from RHN.  There's
not much point in using anything else, since that's what any fix will be applied
to anyway.  The only reason not to use the latest RHEL4 kernel is if your
environment has a need to not move forward in kernel versions.  If so, follow
your internal guidelines and use whatever is mandated (since the upstream sky2
driver should largely be a drop-in replacement for any RHEL4 kernel).
As for which sky2 driver, just grab the latest from Linus' git tree, since the
requisite changes referenced above are all in there.

Or just let me know that it's more hassle than it's worth for you, and I'll let
you know when I have a kernel built here :)
Comment 12 Greg Bailey 2007-03-06 18:34:48 EST
OK, I grabbed sky2.c and sky2.h from 2.6.21-rc2 as that seemed to have the
latest versions.  I added the "#include sky2_compat.h" line to sky2.c (I assume
that's required).

I get build failures when attempting to compile sky2.ko.  I've attached the
build failure as an attachment.
Comment 13 Greg Bailey 2007-03-06 18:36:29 EST
Created attachment 149410
sky2 1.13 build failure output
Comment 14 Neil Horman 2007-03-08 09:58:56 EST
After looking at it a little more closely, and discussing it with some others
around here, the consensus is that a backport of the specific patch that I think
is required for this fix would be preferable to a complete update, given the
fact that RHEL4 is getting on in years.  I've uploaded a test kernel for you to:

http://people.redhat.com/nhorman

Please give it a try and let me know the results.  Thanks!
Comment 15 Greg Bailey 2007-03-08 15:24:52 EST
I have upgraded to kernel-smp-2.6.9-49.EL.bz228733 on a few servers and have not
encountered any regressions so far...

Can you attach the patch file you used for this?  Is it the same as the above
referenced commit from Linus' tree or did you have to modify it?  (Or I'll pick
it out of the kernel .src.rpm if that's available somewhere.)

Neil, can you comment on whether you think bugzilla #216799 might be related to
this issue?  I've also encountered the same symptoms as that one and need to
know if I should pull in another patch, or if you think the patch you supplied
also addresses #216799.  They both seem to talk about transmit timeout stuff. 
Thanks!
Comment 16 Neil Horman 2007-03-08 20:03:02 EST
Created attachment 149663
test patch in kernel kernel-smp-2.6.9-49.EL.bz228733.i686.rpm

Here's the patch.  It's the same patch I referenced previously, plus another
patch that hit the same code, which I included to make the application easier.
It basically schedules interface restarts to occur in process context to avoid
the tx hang that was occurring.
Comment 17 Neil Horman 2007-03-08 20:06:23 EST
Please let me know when you are confident that this has fixed your problem.  Thanks!
Comment 18 Greg Bailey 2007-03-23 16:13:11 EDT
No panics thus far...

Neil, do you have the .src.rpm for this kernel?  I no longer have access to the
2.6.9-49.EL source from which it was derived as it looks like jbaron is up to
2.6.9-51.EL now...  are the old ones archived away anywhere?
Comment 19 Neil Horman 2007-03-26 07:13:29 EDT
I don't have the srpm anymore (expunged by the build system here), but it's all
tagged in CVS so I can rebuild it quickly and post it for you.  I assume that
since you are asking for the srpm, you are reasonably confident that this has
fixed your problem?
Comment 20 RHEL Product and Program Management 2007-03-26 07:24:44 EDT
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.
Comment 21 Neil Horman 2007-03-26 09:49:39 EDT
Ok, srpm uploaded to 
http://people.redhat.com/nhorman
the patch attached above is available in the srpm as linux-kernel-test.patch.

Let me know if you are comfortable saying this patch fixes your problems, and
I'll propose it for 4.6 inclusion.
Comment 22 Greg Bailey 2007-03-26 19:33:37 EDT
I received the following over and over in dmesg output and lost network
connectivity:

NETDEV WATCHDOG: eth0: transmit timed out
sky2 eth0: tx timeout
sky2 eth0: transmit ring 10 .. 482 report=11 done=11
sky2 status report lost?

This sounds like the same problem (or similar, at least) as bugzilla #216799
(see comment 15).  Does your bz228733 kernel include that patch also?

I was able to restart the interface with ifdown, rmmod, modprobe, ifup sequence.
 I will attach the output of "dmesg" as an attachment.
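Assuming the "transmit ring 10 .. 482" line prints the driver's consumer and producer indices for the TX ring, and that report/done are the positions the hardware says it has completed, the gap between them shows how much work is stuck. A minimal sketch of the usual wraparound-safe count, using a hypothetical power-of-two ring size (not taken from the driver):

```c
#include <assert.h>

/* Hypothetical ring size for illustration; a power of two lets the
 * wraparound be handled with a mask instead of a modulo. */
#define MODEL_TX_RING_SIZE 512u

/* Descriptors queued by the driver (prod) but not yet reclaimed (cons).
 * Unsigned subtraction wraps, and the mask folds it back into range. */
unsigned tx_pending(unsigned cons, unsigned prod)
{
    assert((MODEL_TX_RING_SIZE & (MODEL_TX_RING_SIZE - 1)) == 0);
    return (prod - cons) & (MODEL_TX_RING_SIZE - 1);
}
```

Under these assumptions, tx_pending(10, 482) gives 472 descriptors queued while the hardware reports having finished only up to index 11, which looks like a stalled transmitter rather than a slow one.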
Comment 23 Greg Bailey 2007-03-26 19:36:08 EDT
Created attachment 150978
Output of dmesg with tx timeout messages

This dmesg output is from a 2.6.9-49.EL.bz228733 kernel.
Comment 24 Neil Horman 2007-03-27 07:39:24 EDT
Yes, the errors you describe seem to relate to bz 216799, and no, my kernel does
not contain that patch.  If you'd like to incorporate it to the provided src
rpm, feel free.  It should apply fairly cleanly.  Please let me know when you
are comfortable with this fix.  Thanks!
Comment 25 Greg Bailey 2007-04-20 14:53:37 EDT
I have built a kernel "2.6.9-51.1.INTL" which includes the patch referenced
above in comment #16, and the patch from bugzilla #216799.

This kernel has been installed on a few dozen servers.  Although I have yet to
see a kernel panic or hung network interface (which statistically would have
probably happened by now), I'm still seeing various combinations of the
following in dmesg output:

icmp v4 hw csum failure
udp v4 hw csum failure.
hw tcp v4 csum failed
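These messages mean the NIC supplied a checksum for a received packet and the stack's own check of it failed; as far as we understand the 2.6.9 code, the stack then falls back to verifying the packet in software instead of trusting the hardware value. Both sides compute the RFC 1071 Internet checksum, sketched minimally below for reference (this is the textbook algorithm, not code from sky2 or the kernel):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* RFC 1071 Internet checksum: one's-complement sum of 16-bit
 * big-endian words, folded to 16 bits and inverted. */
uint16_t inet_checksum(const uint8_t *data, size_t len)
{
    uint32_t sum = 0;
    size_t i;

    assert(data != NULL || len == 0);
    for (i = 0; i + 1 < len; i += 2)
        sum += ((uint32_t)data[i] << 8) | data[i + 1];
    if (len & 1)                      /* odd trailing byte, padded with zero */
        sum += (uint32_t)data[len - 1] << 8;
    while (sum >> 16)                 /* fold carries back into the low 16 bits */
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)~sum;
}
```

A correctly checksummed buffer yields zero when the checksum field is included in the sum, which is the property the receive path tests.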

The interface appears to recover from the transmit timeouts (I see the messages
saying that it is disabled and then enabled):

Apr 20 05:49:08 stun1 kernel: printk: 2 messages suppressed.
Apr 20 05:49:38 stun1 kernel: printk: 1 messages suppressed.
Apr 20 05:50:28 stun1 last message repeated 3 times
Apr 20 05:51:29 stun1 last message repeated 3 times
Apr 20 08:22:52 stun1 login(pam_unix)[3772]: bad username [  ]
Apr 20 08:22:52 stun1 login[3772]: FAILED LOGIN 1 FROM (null) FOR   ,
Authentication failure
Apr 20 08:22:52 stun1 login(pam_unix)[3772]: bad username []
Apr 20 08:22:52 stun1 login[3772]: FAILED LOGIN 2 FROM (null) FOR ,
Authentication failure
Apr 20 08:22:52 stun1 login(pam_unix)[3772]: bad username []
Apr 20 08:22:52 stun1 login[3772]: FAILED LOGIN 3 FROM (null) FOR ,
Authentication failure
Apr 20 08:22:53 stun1 login(pam_unix)[3772]: bad username []
Apr 20 08:22:53 stun1 login[3772]: FAILED LOGIN SESSION FROM (null) FOR ,
Authentication failure
Apr 20 08:28:44 stun1 kernel: NETDEV WATCHDOG: eth0: transmit timed out
Apr 20 08:28:44 stun1 kernel: sky2 eth0: tx timeout
Apr 20 08:28:44 stun1 kernel: sky2 eth0: disabling interface
Apr 20 08:28:44 stun1 kernel: sky2 eth0: enabling interface
Apr 20 08:28:46 stun1 kernel: sky2 eth0: Link is up at 100 Mbps, full duplex,
flow control none
Apr 20 08:29:00 stun1 kernel: printk: 12 messages suppressed.

The failed login messages seem suspicious in that there's missing information;
I'm not sure if the timeout happened during an attempted login from the Internet.

Do I need:

http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=874183072de73a36a958585e3186639fd2634701

Comment 26 Neil Horman 2007-04-20 15:20:16 EDT
It certainly seems like you do.  Are you comfortable building that into the
current kernel you are working with?  If not, let me know, and I can build it
for you.  Also, just FYI, one of the other engineers here mentioned to me that a
wholesale upgrade of sky2 in 4.6 might be prudent as he has a number of sky2
fixes outstanding, so 4.6 might see all these patches and more integrated as well.

Let me know if the above git commit fixes the remainder of your problems.  Thanks!
Comment 27 Greg Bailey 2007-04-20 15:46:47 EDT
I can probably pull that patch in and make a new kernel; the problem is the
turnaround time in getting the new kernel tested before it goes on our
production hardware.

Regarding the wholesale upgrade of sky2, that would be my preference as well
(see comments #12 and #13 above), but I'd need help getting the latest sky2
driver (just updated yesterday!) to build in the 2.6.9 tree.

How much would the latest sky2.h and sky2.c from Linus' tree need to be modified
to build in the RHEL4 kernel?
Comment 28 John W. Linville 2007-04-20 16:05:38 EDT
FWIW, wholesale update of sky2 is considered for 5.1, not 4.6.

YMMV, but the backport of current sky2 sources to rhel4 might be a bit painful 
(but certainly not impossible).
Comment 29 Neil Horman 2007-04-20 16:19:21 EDT
My bad, I thought they were slated for 4.6.  Well, that being what it is, the
wholesale backport was a bit hairy, IIRC: for 4.6, the sky2 driver relied on
some kernel infrastructure that didn't exist in 2.6.9, which we would also need
to backport.  Anywho, let's move forward with the cherry-picked patch for
now.  Let me know how the testing goes, while I take another crack at the
wholesale backport.  If I get it together, I'll post it here for you to try.
Comment 30 Greg Bailey 2007-05-07 16:29:39 EDT
The combination of the #216799 and #228733 patches is going well.  No kernel
panics to report so far.  Please propose these for Update 6.

I've yet to include the patch referenced in Comment #25, as it is of lower
priority.  I'd still be interested in testing a current version of the sky2
driver backported for 2.6.9 however if such a thing materialized...
Comment 31 Neil Horman 2007-05-08 07:01:00 EDT
Unfortunately, I've not attempted a wholesale backport yet, sorry; it's been
shoved down my todo list.  As the git patch you referenced seems to be working
for you, however, I'll coordinate with our other networking engineers and make
sure this gets in for you.  Thanks!
Comment 32 Andy Gospodarek 2007-05-08 14:59:56 EDT
If the patches for bug 216799 and the ones in comment #25 and comment #16
resolve your issues, then we can look at adding those rather than doing a
full backport of the latest driver for the next update.

Comment 33 Neil Horman 2007-05-24 14:45:32 EDT
Ok, I managed to backport the latest sky2 driver to RHEL4 (minus some
infrastructure that doesn't fit in RHEL4 at this point). The src rpm is on my
people page:
http://people.redhat.com/nhorman
Please build it and try it out.  Thanks!
Comment 34 Neil Horman 2007-08-06 12:46:37 EDT
ping, any update here?
Comment 35 Greg Bailey 2007-08-06 13:26:55 EDT
We've been running on a kernel with the patch from Comment #16 and bug 216799
for quite a while without any panics.  It would be great to see at least those
fixes in Update 6.

I have not yet built a kernel with the latest backported sky2 driver from
Comment #33 because I'm not enough of a kernel hacker to put the right debug
statements in it to figure out why it's crashing (see comments on bug 216799),
but I'm a very willing guinea pig for supplied kernels!  :-)
Comment 36 Neil Horman 2007-08-06 15:10:59 EDT
I'd rather not take just that fix if we get the whole driver in.  Besides, it's
rather late at this point for anything to get into 4.6.  Let's try to get this
together for 4.7.

As for the debugging, you can just put some printks in sky2_status_intr to
print out the value of le->link.  If you don't feel comfortable with that, let me
know and I'll get to it as soon as I can.

Also, if your hardware supports MSI interrupts, it might be worth a shot to
disable them from the kernel command line.
Comment 37 Greg Bailey 2007-08-07 12:30:56 EDT
What's the kernel command line option to disable MSI interrupts?  I couldn't
find anything relevant in
/usr/share/doc/kernel-doc-2.6.9/Documentation/kernel-parameters.txt
Comment 38 Neil Horman 2007-08-07 12:48:33 EDT
should be:
nomsi
Comment 39 Greg Bailey 2007-08-07 13:16:46 EDT
I've disabled msi with the "pci=nomsi" kernel command line argument and still
get a panic when the interface is brought up.

I've added printk statements but they don't appear to show up on the console. 
Can you attach a modified sky2.c file I can use with proper debugging statements
in it?

Also, doesn't the "EIP is at netif_receive_skb" in the stack trace (I'll attach
the most recent crash) mean that the invalid access occurred in net/core/dev.c ?
Comment 40 Greg Bailey 2007-08-07 13:17:58 EDT
Created attachment 160831
Kernel Panic from 2.6.9-55.3.EL.bz228733.2smp
Comment 41 Neil Horman 2007-08-07 14:29:23 EDT
yeah, I hadn't looked at the EIP much, since what I think is happening is that
sky2 is passing a NULL pointer to netif_receive_skb because le->link is out of
bounds.  I'm attaching a patch that should help you tell.  If you have multiple
NICs on board, it may spew a number of messages and slow your system down a
bit, so be warned.
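The hypothesis here is that an out-of-range port index in a status element is used to index hw->dev[] and yields a NULL device. A defensive lookup, sketched with hypothetical stand-in structures (the two-port limit is an assumption about dual-port Yukon-2 boards), would refuse both out-of-range indices and unset slots:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the driver structures; only the fields
 * needed for the check are modelled. */
struct model_netdev { const char *name; };

#define MODEL_MAX_PORTS 2   /* assumed upper bound on ports per chip */

struct model_hw {
    unsigned ports;                               /* ports actually present */
    struct model_netdev *dev[MODEL_MAX_PORTS];    /* may hold NULL entries */
};

/* Map the port index from a status element to its net device, returning
 * NULL instead of indexing past the array or handing back an unset slot. */
struct model_netdev *port_to_dev(const struct model_hw *hw, unsigned link)
{
    assert(hw != NULL);
    if (link >= hw->ports || link >= MODEL_MAX_PORTS)
        return NULL;
    return hw->dev[link];
}
```

On a NULL result the caller would count and drop the status element instead of passing a packet up the stack.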
Comment 42 Neil Horman 2007-08-07 14:39:48 EDT
Created attachment 160839
patch to debug sky2 oops
Comment 43 Greg Bailey 2007-08-07 15:28:25 EDT
With the patch from Comment #42 I get a few NETIF_RECEIVE_SKB debugs while rc
scripts are executed, and then the panic when I "ifup eth0":

... system rc scripts ...

NETIF_RECEIVE_SKB: SKB = f68b7280
NETIF_RECEIVE_SKB: DEV = c03462c0
NETIF_RECEIVE_SKB: SKB = f6dfc180
NETIF_RECEIVE_SKB: DEV = c03462c0
NETIF_RECEIVE_SKB: DEV = c03462c0
NETIF_RECEIVE_SKB: SKB = f704b480
NETIF_RECEIVE_SKB: DEV = c03462c0
NETIF_RECEIVE_SKB: SKB = f6dfc180
NETIF_RECEIVE_SKB: DEV = c03462c0

... system rc scripts ...

[root@geb-test0 ~]# ifup eth0
ip_tables: (C) 2000-2002 Netfilter core team
SKY2 DEBUG: le->link = 0
SKY2 DEBUG: le->link = 0
NETIF_RECEIVE_SKB: SKB = f5e16c80
NETIF_RECEIVE_SKB: DEV = 00000000
Unable to handle kernel NULL pointer dereference at virtual address 0000017c
 printing eip:
c02822aa
*pde = 3729e001
Oops: 0000 [#1]
SMP
Modules linked in: parport_pc lp parport autofs4 i2c_dev i2c_core sunrpc
dm_mirror dm_mod button battery ac ftdi_sio usbserial uhci_hcd ehci_hcd
hw_random e1000 sky2 ext3 jbd ata_piix libata sd_mod scsi_mod
CPU:    0
EIP:    0060:[<c02822aa>]    Not tainted VLI
EFLAGS: 00010282   (2.6.9-55.3.EL.bz228733.3smp)
EIP is at netif_receive_skb+0x3d/0x310
eax: f5e16c80   ebx: f7297c00   ecx: c03d1f58   edx: 00000000
esi: f5e16c80   edi: 0000003c   ebp: f7297e40   esp: c03d1f64
ds: 007b   es: 007b   ss: 0068
Process swapper (pid: 0, threadinfo=c03d1000 task=c0324a80)
Stack: f5e16c80 00000001 f5e16c80 f7297c00 f5e16c80 0000003c f7297e40 f88abb3d
       00000064 003c0300 f6e98008 00000002 00000000 00000040 f7f0c680 00000000
       00000000 40000000 f7297c00 00000040 f7f0c680 f88ac2b3 c03d1fd4 00000000
Call Trace:
 [<f88abb3d>] sky2_status_intr+0x227/0x46a [sky2]
 [<f88ac2b3>] sky2_poll+0x5c/0xbf [sky2]
 [<c0282728>] net_rx_action+0xae/0x160
 [<c0126a14>] __do_softirq+0x4c/0xb1
 [<c010819f>] do_softirq+0x4f/0x56
 =======================
 [<c0107ab4>] do_IRQ+0x1a2/0x1ae
 [<c02d6dcc>] common_interrupt+0x18/0x20
 [<c01040e8>] mwait_idle+0x33/0x42
 [<c01040a0>] cpu_idle+0x26/0x3b
 [<c0397786>] start_kernel+0x199/0x19d
Code: 00 50 68 d6 a1 30 c0 e8 67 06 ea ff 8b 44 24 10 ff 70 18 68 f6 a1 30 c0 e8
56 06 ea ff 8b 44 24 18 89 44 24 10 8b 50 18 83 c4 10 <83> ba 7c 01 00 00 00 74
6f 31 c0 f6 42 58 20 74 14 0f b7 82 ae
 <0>Kernel panic - not syncing: Fatal exception in interrupt
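The oops can be read from the register dump and code bytes. The bytes "8b 50 18" decode as "mov 0x18(%eax),%edx", loading a pointer stored 0x18 bytes into the skb (eax holds f5e16c80, the SKB address printed just before; we assume this field is skb->dev), and the faulting "83 ba 7c 01 00 00 00" decodes as "cmpl $0x0,0x17c(%edx)". With edx = 0, the CPU reads address 0 + 0x17c, exactly the reported 0000017c. The sketch below shows that arithmetic with a hypothetical struct padded so one member sits at 0x17c; it is not the real struct net_device layout:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout, padded so that field_of_interest sits at 0x17c,
 * the offset seen in the oops.  NOT the real struct net_device. */
struct fake_netdev {
    char     leading_members[0x17c];
    uint32_t field_of_interest;
};

static_assert(offsetof(struct fake_netdev, field_of_interest) == 0x17c,
              "layout must reproduce the offset from the oops");

/* Address the CPU touches when reading field_of_interest through a
 * pointer whose value is `base`. */
uintptr_t member_address(uintptr_t base)
{
    return base + offsetof(struct fake_netdev, field_of_interest);
}
```

A small faulting address like this usually means a NULL pointer plus a member offset, which is why the "DEV = 00000000" debug line points at the device pointer rather than at the skb itself.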
Comment 44 Neil Horman 2007-08-07 16:03:04 EDT
Created attachment 160849
patch to fix null dev pointer

Well, that definitely shows the problem, although I'm not sure how it's
occurring.  le->link is valid, but the sky2_hw struct's dev array seems to have
been nulled out (or never initialized, though it certainly should have been).
Anywho, I think this attached patch should fix it.  Please replace the
debug patch you were just using with this one and see if the problem clears up.
Thanks!
Comment 45 Greg Bailey 2007-08-07 17:48:33 EDT
Same output from "2.6.9-55.3.EL.bz228733.4smp" (patch from comment #44):

ip_tables: (C) 2000-2002 Netfilter core team
SKY2 DEBUG: le->link = 0
SKY2 DEBUG: le->link = 0
NETIF_RECEIVE_SKB: SKB = f7db2c80
NETIF_RECEIVE_SKB: DEV = 00000000
Unable to handle kernel NULL pointer dereference at virtual address 0000017c
 printing eip:
c02822aa
*pde = 372b7001
Oops: 0000 [#1]
SMP
Modules linked in: parport_pc lp parport autofs4 i2c_dev i2c_core sunrpc
dm_mirror dm_mod button battery ac ftdi_sio usbserial uhci_hcd ehci_hcd
hw_random e1000 sky2 ext3 jbd ata_piix libata sd_mod scsi_mod
CPU:    0
EIP:    0060:[<c02822aa>]    Not tainted VLI
EFLAGS: 00010286   (2.6.9-55.3.EL.bz228733.4smp)
EIP is at netif_receive_skb+0x3d/0x310
Comment 46 Neil Horman 2007-08-08 08:52:58 EDT
Sorry, but I can't buy the same error on this build; I don't think you managed
to compile the patch in.  If dev was still null in this case, it should have
oopsed back in the sky2_poll routine.  Please add the following line:
printk(KERN_CRIT "SKY2_DEBUG: dev = %p\n",dev0);
at the top of the sky2_poll routine, right after the variable declarations,
rebuild and try again.  Thanks!
Comment 47 Greg Bailey 2007-08-08 11:16:06 EDT
OK, I added the following line to sky2_poll:

printk(KERN_CRIT "SKY2_DEBUG: sky2_poll: dev = %p\n",dev0);

And get:

ifup eth0
ip_tables: (C) 2000-2002 Netfilter core team
SKY2_DEBUG: sky2_poll: dev = f731f800
SKY2_DEBUG: sky2_poll: dev = f731f800
SKY2 DEBUG: le->link = 0
SKY2 DEBUG: le->link = 0
NETIF_RECEIVE_SKB: SKB = f5c52080
NETIF_RECEIVE_SKB: DEV = 00000000
Unable to handle kernel NULL pointer dereference at virtual address 0000017c
 printing eip:
c02822aa
*pde = 3700a001
Oops: 0000 [#1]
SMP
Modules linked in: parport_pc lp parport autofs4 i2c_dev i2c_core sunrpc
ftdi_sio usbserial dm_mirror dm_mod button battery ac uhci_hcd ehci_hcd
hw_random e1000 sky2(U) ext3 jbd ata_piix libata sd_mod scsi_mod
CPU:    0
EIP:    0060:[<c02822aa>]    Not tainted VLI
EFLAGS: 00010286   (2.6.9-55.3.EL.bz228733.4smp)
EIP is at netif_receive_skb+0x3d/0x310
Comment 48 Neil Horman 2007-08-08 13:54:46 EDT
Ok, clearly something is going wrong here.  In sky2_poll the dev pointer looks
perfectly valid, but by the time we call sky2_status_intr it has been corrupted
to NULL.  And in my tree here, I can't see how that happens (nor does it happen
in my testing).  Can you please attach the sky2.c file from your tree so that I
can compare the two?  Thanks
Comment 49 Greg Bailey 2007-08-08 14:02:54 EDT
Created attachment 160924
sky2.c version related to comment 47

Requested version of sky2.c that relates to comments #47, #48.
Comment 50 Greg Bailey 2007-08-10 14:38:50 EDT
Not sure why this bug is still NEEDINFO; did it not update correctly or is there
any other information or testing I can provide?  thanks!
Comment 51 Neil Horman 2007-08-10 15:40:54 EDT
you needed to either set the state back to assigned or click the "I am providing
the requested info" checkbox.
Comment 52 Neil Horman 2007-08-10 15:50:07 EDT
Well, I see the problem.  I'm not sure how it happened, but your version of the
file has some significant (and critical) differences from what my last patch
produces when applied to the base version of the file.  Most notably, your
version of the file never assigns skb->dev, which my version does.  I'm going to
build & verify a binary kernel here for you, and post it to my people page.
What arches do you need?  Is x86 sufficient, or do you need others as well?
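The missing assignment matters because the stack dereferences skb->dev as soon as the packet is handed over, which matches the NULL DEV printed in comments 43, 45 and 47. A minimal userspace model of the receive step with the assignment in place (hypothetical types and names; not the driver's code):

```c
#include <assert.h>
#include <stddef.h>

struct model_netdev { const char *name; };
struct model_skb    { struct model_netdev *dev; unsigned len; };

/* Stand-in for handing the packet to the stack (netif_receive_skb() in
 * the kernel), which uses skb->dev without checking it for NULL. */
static const char *model_deliver(const struct model_skb *skb)
{
    return skb->dev->name;   /* a NULL dev here is the oops in this bug */
}

/* Receive step for one status element: tag the buffer with the device
 * it arrived on before handing it off. */
const char *model_rx_one(struct model_netdev *dev, struct model_skb *skb)
{
    assert(dev != NULL && skb != NULL);
    skb->dev = dev;          /* the line missing from the mis-applied patch */
    return model_deliver(skb);
}
```

One plausible reason the line was dropped, offered as an assumption rather than something confirmed in this bug: newer upstream kernels have eth_type_trans() set skb->dev itself, so upstream sky2 no longer needs the explicit assignment, while the 2.6.9 eth_type_trans() does not do this.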
Comment 53 Greg Bailey 2007-08-10 16:09:40 EDT
I just need i686 smp.  In theory I could take a standard Red Hat kernel source
tree and replace sky2.h and sky2.c, right?  I'm not sure how my sky2.c would be
different; I basically appended your patch to the linux-kernel-test.patch file...
Comment 54 Neil Horman 2007-08-10 16:32:46 EDT
You could just replace the code that way, but it's error-prone, since your
patched sky2.c file is wrong.  I don't know how you got your file off track
either, but somewhere between your base file and my patch you added something
extra.  Perhaps you had something erroneous in your linux-kernel-test.patch file
previously.  Anywho, I'm building now, and will have binaries posted for you on
Monday.
Comment 55 Neil Horman 2007-08-13 10:07:55 EDT
Ok, I've posted an i686 smp kernel here:
http://people.redhat.com/nhorman/rpms/kernel-smp-2.6.9-55.3.EL.bz228733.i686.rpm
I've been testing it on my sky2 card for a few hours here this morning.  It has
survived dhcp/scp/ping flooding for the past two hours here, and should be good
to go.  Please give it a try and let me know your results.  Thanks!
Comment 56 Greg Bailey 2007-08-13 14:55:38 EDT
The kernel referenced in Comment #55 boots fine and can access the network, etc.

I'm investigating the missing skb->dev reference and see the following in the
"linux-kernel-test.patch" file that's part of your
kernel-2.6.9-55.3.EL.bz228733.2.src.rpm file that used to be available from your
people page:

@@ -1955,17 +2068,20 @@
                dev = hw->dev[le->link];

                sky2 = netdev_priv(dev);
-               length = le->length;
-               status = le->status;
+               length = le16_to_cpu(le->length);
+               status = le32_to_cpu(le->status);

                switch (le->opcode & ~HW_OWNER) {
                case OP_RXSTAT:
-                       skb = sky2_receive(sky2, length, status);
-                       if (!skb)
-                               break;
+                       skb = sky2_receive(dev, length, status);
+                       if (unlikely(!skb)) {
+                               sky2->net_stats.rx_dropped++;
+                               goto force_update;
+                       }

-                       skb->dev = dev;
                        skb->protocol = eth_type_trans(skb, dev);
+                       sky2->net_stats.rx_packets++;
+                       sky2->net_stats.rx_bytes += skb->len;
                        dev->last_rx = jiffies;

 #ifdef SKY2_VLAN_TAG_USED

Can you attach the linux-kernel-test.patch file I should be using?  It appears
the one in the earlier .src.rpm might not be right.

Thanks!
Comment 57 Neil Horman 2007-08-14 08:47:02 EDT
Created attachment 161265
correct patch for sky2 from cvs

Sure, here it is.  Not sure how it got changed in the srpm.  It's exactly the
same patch, just without the - in front of the skb->dev = ... line (and the
corresponding line-number changes that go with it).  Very odd; it was correct
in our CVS tree here, so I have no idea how that would have changed.  Anywho,
given that the kernel I built worked for you, I'm thinking that this is ready
for me to post for inclusion here, unless you would like to rebuild with this
patch and do some more testing.  What's your preference?
Comment 58 Greg Bailey 2007-08-21 17:46:30 EDT
I built a kernel on 8/14 with the supplied patch and have been running it on a
half dozen servers or so since then.

Just this morning I hit some sort of a timeout issue and found this in
/var/log/messages on one of the servers:

Aug 21 10:36:01 el4-node1 kernel: sky2 eth0: tx timeout
Aug 21 10:36:01 el4-node1 kernel: sky2 eth0: disabling interface
Aug 21 10:36:01 el4-node1 kernel: sky2 eth0: enabling interface
Aug 21 10:36:01 el4-node1 kernel: sky2 eth0: ram buffer 48K
Aug 21 10:36:04 el4-node1 kernel: sky2 eth0: Link is up at 1000 Mbps, full duplex, flow control rx

Then I do "service network stop", "rmmod sky2", "service network start" to
restore network connectivity:

Aug 21 14:30:38 el4-node1 network: Setting network parameters:  succeeded
Aug 21 14:30:38 el4-node1 network: Bringing up loopback interface:  succeeded
Aug 21 14:30:38 el4-node1 kernel: ACPI: PCI Interrupt 0000:02:00.0[A] -> GSI 16 (level, low) -> IRQ 169
Aug 21 14:30:38 el4-node1 kernel: sky2 0000:02:00.0: v1.14 addr 0xdeefc000 irq 169 Yukon-EC (0xb6) rev 2
Aug 21 14:30:38 el4-node1 kernel: sky2 eth0: addr 00:0e:0c:6a:c9:54
Aug 21 14:30:38 el4-node1 kernel: sky2 eth0: enabling interface
Aug 21 14:30:38 el4-node1 kernel: sky2 eth0: ram buffer 48K
...
Aug 21 14:30:40 el4-node1 kernel: sky2 eth0: Link is up at 1000 Mbps, full duplex, flow control rx
Aug 21 14:30:42 el4-node1 network: Bringing up interface eth0:  succeeded

The output of "dmesg" shows:

NETDEV WATCHDOG: eth0: transmit timed out
sky2 eth0: tx timeout
sky2 eth0: transmit ring 412 .. 371 report=413 done=413
sky2 eth0: disabling interface
sky2 eth0: enabling interface
sky2 eth0: ram buffer 48K
sky2 eth0: Link is up at 1000 Mbps, full duplex, flow control rx
sky2 eth0: disabling interface
divert: freeing divert_blk for eth0
ACPI: PCI Interrupt 0000:02:00.0[A] -> GSI 16 (level, low) -> IRQ 169
PCI: Setting latency timer of device 0000:02:00.0 to 64
sky2 0000:02:00.0: v1.14 addr 0xdeefc000 irq 169 Yukon-EC (0xb6) rev 2
divert: allocating divert_blk for eth0
sky2 eth0: addr 00:0e:0c:6a:c9:54
sky2 eth0: enabling interface
sky2 eth0: ram buffer 48K
sky2 eth0: Link is up at 1000 Mbps, full duplex, flow control rx

When I was getting the "timeout" messages, I was unable to ping any other
network devices.  Is this related to either Comment #5, Comment #15, or Comment #22?
Comment 59 Neil Horman 2007-08-21 19:51:38 EDT
Don't know for sure, although MSI might be a possibility.  I'd try booting with
pci=nomsi just to see.  I'll look upstream and see if something more recent has
gone in.
Comment 60 Greg Bailey 2007-08-28 19:41:03 EDT
I've rebooted with "pci=nomsi" on the command line, and still see in
/proc/interrupts:

217:     710348          0          0          0         PCI-MSI  eth0

Is this expected?

I'm trying to understand the "tx timeout" messages, and how to reproduce them. 
In my test environment, I have 2 servers, each of which has a sky2 Marvell NIC
connected to a switch as "eth0".

On server "A", I type "nc serverB 3409 < /dev/zero"

On server "B", I type "nc -p 3409 > /dev/null"

I see lots of traffic from A->B, as expected.  If I shut down eth0 on server "B",
wait a while, and then re-enable eth0 on server "B", I see the following in
"dmesg" output on server B:

sky2 eth0: disabling interface
sky2 eth0: enabling interface
sky2 eth0: ram buffer 48K

As expected...  The problem is that server B is now unable to ping or access the
local network anymore.  "mii-tool" shows a link present, but "ethtool eth0" shows
that a link is NOT present.  tcpdump of eth0 shows no activity (meanwhile server
A is still spewing out lots of zeroes...)

If I perform "service network restart" on server B, it doesn't help anything. 
If I unload the sky2 module, then things clear up and I'm back on the network again.

I'm curious about this testcase because the symptom seems to match the earlier
"tx timeout" messages; the driver tried to re-enable itself after a timeout, but
it's still not able to see any traffic.

Any ideas?  Is there some other way to trigger a "tx timeout"?  Seems like the
restart that's supposed to happen misses something.
Comment 61 Neil Horman 2007-08-29 09:34:22 EDT
Dang!  That would be because RHEL4 is too old to support pci=nomsi.  Sorry about
that; I should have checked that first.  I'll add to the patch to see if I can
disable MSI interrupts manually for sky2 specifically.
Comment 62 Neil Horman 2007-08-29 10:07:35 EDT
Scratch that, we seem to be in luck: there is already a sky2 module parameter
called disable_msi.  Just add this line to /etc/modprobe.conf:

options ethX disable_msi=1

where ethX is the alias name for the interface driven by your sky2 module.
Comment 63 Greg Bailey 2007-08-29 13:07:29 EDT
OK, I added "options eth0 disable_msi=1" to /etc/modprobe.conf on both server
"A" and server "B" as described in Comment #60.

I still see the same behavior as described in Comment #60, except that
/proc/interrupts now shows:

169:        340     114654        615         71   IO-APIC-level  uhci_hcd, eth0

The loss of connectivity described in that comment still applies, and
"service network restart" does NOT restore connectivity; I have to unload and
reload the sky2 module.
Comment 64 Neil Horman 2007-08-29 14:43:02 EDT
Hmm, I wonder if this is what you're seeing?
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=c59697e06058fc2361da8cefcfa3de85ac107582
Looks like sky2 had a tx timer workaround that got lost upstream during a
driver rebase, and was then re-added when the problem recurred.
That should apply with just an offset to the current sky2 build.  Think you can
apply it on top of what we have, or shall I build you a new kernel?
Comment 65 Greg Bailey 2007-08-29 16:06:54 EDT
OK, I included that patch.  I get the same results as described in Comment #60,
except that now I get the additional line in the dmesg output:

sky2 eth0: Link is up at 1000 Mbps, full duplex, flow control rx

And "ethtool eth0" actually shows "Link detected: yes", unlike before.

However, these changes notwithstanding, I still am not able to access the
network until I reload the "sky2" module.  :-(
Comment 66 Neil Horman 2007-08-29 16:34:17 EDT
And you still get the tx_timeout messages?
Comment 67 Greg Bailey 2007-08-29 16:59:56 EDT
My goal is to somehow trigger the scenario that generates the "tx timeout"
messages, but I've been unsuccessful in doing so.  I can't forcibly get the "tx
timeout" message to be displayed.

Comment #60 was an attempt to do that, and I thought it suspicious that the
symptoms after a "tx timeout" message and the result of my exercise (bouncing
the network interface while lots of traffic is being received) seemed pretty
similar.
Comment 68 Neil Horman 2007-08-29 19:35:57 EDT
Ok, that changes things.  Sounds like we may need to reset the hardware on open.
 I'll see if I can enhance the patch to do that, incorporating the patch from
comment 64 along the way.  Thanks!
Comment 69 Greg Bailey 2007-08-30 08:46:24 EDT
Just for point of reference, I tried the test described in Comment #60 on a
vanilla 2.6.23-rc4 kernel, and I get the same failure condition.  How does this
type of issue get reported upstream?  (Do I report a bug on kernel.org or does
Red Hat generally do that type of thing?)

Also, for what it's worth, the vendor sk98lin 10.20.3.3 driver does not appear
to have this issue...
Comment 70 Neil Horman 2007-08-30 09:16:04 EDT
Oh, that is good to know.  You can report it here:
http://bugzilla.kernel.org/

If the problem is upstream as well, then perhaps it would be prudent to move
forward with this bz, roll it into RHEL 4.6 or 4.7, pursue the problem
upstream, and backport when it's fixed.  What are your thoughts?
Comment 71 Greg Bailey 2007-08-30 14:53:26 EDT
The problem happens intermittently with the vanilla 2.6.23-rc4 kernel, whereas
it happens consistently with the RHEL4 kernel (sky2 1.14).

I've opened:
http://bugzilla.kernel.org/show_bug.cgi?id=8962
to track the upstream issue.

Re: your question in Comment #70, how big or involved would the patch you
reference in Comment #68 be?  I'd be interested in trying it...
Comment 72 Neil Horman 2007-08-30 15:44:50 EDT
Well, I honestly don't know.  My initial inclination was that there would be
some relatively straightforward way to reset the chip in the sky2_probe routine
that we could borrow and put in the open routine, making for an easy patch.
Looking at it though, it may be rather more complicated than that.  I'll get up
with the upstream maintainer and see what his thoughts on the matter are, since
he's much more familiar with sky2 than I am.
Comment 73 Neil Horman 2007-08-30 15:50:02 EDT
By the way, it hasn't escaped my notice from the upstream bz that you are
testing this on CentOS, not on RHEL per se.  While it doesn't particularly bother
me one way or the other, and this clearly isn't a distribution-specific problem,
I'd be curious to know if you've contacted them for support on this issue?
Comment 74 Greg Bailey 2007-08-30 18:15:17 EDT
I hadn't contacted them because my original problem with the sky2 driver was on
a RHEL 4 system and so I opened a support ticket with Red Hat.  After 3 months
the response from support was that I should view bugzilla #198808 for more
information about this problem.  I don't have permissions to view that bug (and
complained as such, to no avail), so I opted to write my own bugzilla entry
instead, and here it is...

Out of curiosity, in trying to determine what you had to modify in the upstream
sky2 driver to retrofit it to a RHEL4 kernel, is it mostly removing wake-on-lan
stuff?  (And which upstream kernel did you pull sky2 1.14 from: a specific
kernel rev. or a GIT commit?)
Comment 75 Andy Gospodarek 2007-08-30 18:27:05 EDT
Just as an FYI, Stephen Hemminger posted a sky2 update to netdev today; it
might be worth trying his patches, though I don't see anything in his
descriptions that will really help with the issues discussed here.
Comment 76 Neil Horman 2007-08-30 20:21:00 EDT
I pulled from Linus's tree, I don't remember the exact tag/version, but I'll
look it up for you if you like

Andy, I'll go over Stephen's update in the AM.  Thanks!
Comment 77 Neil Horman 2007-09-07 10:06:30 EDT
Since I'm unable to reproduce this, were you able to get the debug info that
Stephen asked for?
Comment 78 Greg Bailey 2007-09-14 03:01:44 EDT
Yes, comment posted to that bug report on 9/7.
Comment 80 RHEL Product and Program Management 2008-01-16 09:37:17 EST
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.
Comment 82 Neil Horman 2008-03-24 16:18:10 EDT
We've managed to get some extra sky2 hardware available and are working on
setting up reproducers now.
Comment 84 Neil Horman 2008-04-16 14:24:07 EDT
Created attachment 302652
new sky2 backport

I've not tested it yet (no sky2 hardware in hand at the moment), but I've done
this backport of the latest sky2 driver that a co-worker has been using on a
2.6.25 kernel, and he has been unable to reproduce any lockups or crashes with
it.  If you could give it a spin, I'd appreciate it.  Thanks!
Comment 85 RHEL Product and Program Management 2008-09-03 09:10:28 EDT
Updating PM score.
Comment 88 RHEL Product and Program Management 2009-03-12 14:53:20 EDT
Since RHEL 4.8 External Beta has begun, and this bugzilla remains 
unresolved, it has been rejected as it is not proposed as exception or 
blocker.
Comment 89 Neil Horman 2009-03-23 07:02:19 EDT
Closing: no activity from the reporter for over a year.
Comment 90 Greg Bailey 2009-03-23 09:37:57 EDT
Unfortunately I found myself in the same situation as the reporter of #216799 (sky2 transmitter lockup), and moved on to other, less problematic hardware (that didn't have Marvell interfaces).  Before doing so, however, I made use of the patches for this bugzilla and #216799 and was able to make satisfactory use of the hardware.  The prior patches posted to this bug helped tremendously; unfortunately by the time the last request for testing occurred (4/16/2008) I was no longer using this hardware...
